All scrapers, crawlers, and other 'bots' SHOULD respect robots.txt. The original intent was to block what were termed, at the time (1994), "crawlers", but the scope has broadened as the Internet has evolved.
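Checking is cheap, too. A minimal sketch of what "respecting robots.txt" looks like, using only Python's standard library (the user-agent string and URL here are made up for illustration):

```python
# Hypothetical sketch: how any well-behaved automated fetcher could check
# robots.txt before touching a URL. Standard library only.
from urllib import robotparser
from urllib.parse import urlsplit

USER_AGENT = "example-fetcher/1.0"  # made-up agent name

def allowed(url: str) -> bool:
    """Return True if the site's robots.txt permits fetching this URL."""
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt
    return rp.can_fetch(USER_AGENT, url)

if __name__ == "__main__":
    print(allowed("https://example.com/some/page"))
```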
Justifying crawling behavior by saying it's "just scraping, and then loading additional pages..." is, well, fucky logic to say the least. Following your logic here: if I access a single page, then extract all the links in it and add them to an RSS feed, I'm then free to access all those subsequent pages, because now they're in an RSS feed and I get to scrape them. I just run this in a loop, iterating over all of them, because hey, I just want the contents of all those pages, and...
Where do you draw the line? Your "amending the fulltext to your RSS feed" example is crawling. You iterated over a series of links, accessing every linked page to get its full text with an automated process. It's a nonsense argument to claim you weren't "crawling". Just because you added --max-depth=1 doesn't mean it wasn't an iterative, automated process retrieving the contents of those pages.
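Put it in code and the point is obvious: a "depth 1" fulltext fetcher is structurally the same loop as any crawler, the frontier just never gets refilled. A rough sketch (hypothetical, not anyone's actual feed tool; the seed URL is invented):

```python
# Hypothetical sketch of a "--max-depth=1" fulltext fetcher: one seed page,
# extract its links, then fetch every linked page. Structurally this is a
# crawl loop with a frontier one level deep.
import re
import urllib.request

def fetch(url: str) -> str:
    with urllib.request.urlopen(url) as resp:   # automated retrieval
        return resp.read().decode("utf-8", errors="replace")

seed = "https://example.com/feed-source"        # made-up seed URL
html = fetch(seed)

# Crude link extraction for illustration only; a real tool would use an HTML parser.
links = re.findall(r'href="(https?://[^"]+)"', html)

fulltexts = {}
for link in links:          # iterate over every discovered page...
    fulltexts[link] = fetch(link)   # ...and retrieve its contents
```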
An AI agent, acting on a set of instructions given by a user, accessing multiple pages and traversing them based on findings in the preceding pages (such as executing a search, then following links to scrape for values) isn't crawling then?
Automated processes aren't just dumb "crawling and scraping" any more.
I don't believe any automated process should allow itself to access content denied by robots.txt, no matter what logical leaps are made to justify it. If I wanted automated processes to access those URLs, they wouldn't be covered by a Disallow rule in robots.txt. A robots.txt file is a statement saying "I specifically do or do not allow access based on these criteria". Those criteria aren't dependent on your use case. They're dependent on mine.
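For illustration, a robots.txt as simple as this one (the path is invented) already draws that line for every automated client, whatever its purpose:

```
# Illustrative robots.txt: anything under /private/ is off-limits
# to all automated clients, regardless of their use case.
User-agent: *
Disallow: /private/
Allow: /
```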