The difference lies in how easily each attack can be spotted and avoided.
A remote desktop connection is easy to spot and avoid, even though MS RDP is a horrible mess from a security point of view. For an RD connection, I first have to reach your computer from the outside world, meaning I have to initiate the connection. That already fails on most private setups, let alone corporate networks: my attempt would usually die no later than the router (in a private setup) or the relevant firewall (in a corporate one).
Even if I somehow manage to get a connection going, that connection is interactive: I have to stay connected for as long as I want to watch the attacked computer. That is likely to be tricky precisely while sensitive data is being handled, which is when additional care is most likely taken to ensure that only valid, known connections are allowed.
All in all, this scenario requires very sloppy security on the attacked end.
Compare that to an attack where I record what I want to see with a planted trojan (something that would probably have to exist for the former attack to work as well, since RDP, despite its insecurities, is usually not configured to be open to everyone). The trojan starts recording when a certain tool is run or a certain web page is opened in a browser. The recording is then stored in a "secret" location, most likely somewhere in the user's documents or %appdata% folder. The user has read/write access there without elevated privileges, and that access cannot easily be revoked because programs constantly need to write data to those locations.
The transfer happens the next time the user connects to a server I control, or whenever I can estimate that opening a connection to my C&C server would go unnoticed (for example, while he is updating his system or programs; any time large amounts of data are being transferred qualifies). Ideally that is while he is sending bulk data, but in general any moment when I can reasonably assume security is not as tight as during the critical use (i.e. what I wanted to record) will do.
That is what makes the recording approach the more dangerous mess.