The idea is that the only thing you upload to the server is input, such as mouse, keyboard, and voice data. The game logic and assets all reside on the server itself, and thus don't have to be uploaded by your machine. It's as if you were playing a game over a VNC connection.
One thing that is really cool about this technology is that it has the potential to eliminate cheating in games such as first-person shooters. Much of the cheating in the past has been possible because the game client running on a user's machine knows a lot more than it actually shows the user. If a user can get past those artificial barriers to the information, for example with hacked graphics drivers that let them see through walls or hacked sound drivers that pinpoint the exact location of footsteps, then they have a huge advantage over another user who leaves those artificial "information limiters" in place.

It turns out it's very difficult for a server to send a game client only exactly the information it needs at any given moment. Theoretically, if the only thing being received from a game server is pre-rendered images, then a user couldn't use that information to cheat with wallhacks or any other current cheats that I know of. The problem is that there would also be no way to do client-side prediction (which is why extra information generally has to be sent in the first place) to mitigate the lag that still exists between nearly all servers and clients today.
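To make the client-side prediction point concrete, here is a minimal sketch of prediction with server reconciliation, assuming a hypothetical one-dimensional movement model. The names `Input`, `apply_input`, and `PredictingClient` are made up for illustration and don't come from any particular engine. The key thing it shows is that the client has to hold real game state (its own position) and the rules for advancing it. A stream of pre-rendered frames gives the client nothing to predict with.

```python
from dataclasses import dataclass, field


@dataclass
class Input:
    seq: int   # client-assigned sequence number
    dx: float  # movement requested this tick (hypothetical 1-D model)


def apply_input(x: float, inp: Input) -> float:
    # Shared movement rule, assumed to be identical on client and server.
    return x + inp.dx


@dataclass
class PredictingClient:
    x: float = 0.0
    next_seq: int = 0
    pending: list = field(default_factory=list)  # inputs the server hasn't confirmed

    def local_input(self, dx: float) -> Input:
        inp = Input(self.next_seq, dx)
        self.next_seq += 1
        self.pending.append(inp)
        # Predict immediately instead of waiting a full round trip.
        self.x = apply_input(self.x, inp)
        return inp  # in a real game this would also be sent to the server

    def on_server_state(self, server_x: float, last_acked_seq: int) -> None:
        # The server reports its authoritative position after processing
        # every input up to and including last_acked_seq.
        self.pending = [i for i in self.pending if i.seq > last_acked_seq]
        self.x = server_x
        # Re-apply the inputs the server hasn't seen yet on top of its state.
        for inp in self.pending:
            self.x = apply_input(self.x, inp)
```

For example, after three local inputs of `1.0` the client shows `x == 3.0` right away. If the server then reports `server_x=0.5` with `last_acked_seq=0` (say, a collision cut the first move short), the client snaps to `0.5` and replays inputs 1 and 2, ending at `2.5`. That replay step only works because the client has positions and movement rules, not just pixels.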