I'm pretty sure that in the worst case, ATSC MPEG-2 streams have at least one I-frame every 15 frames, so the total latency should still be well under half a second per channel change EVEN IF you had to wait 1/4 second for an I-frame, then spend another 1/60th of a second analyzing it and another 1/60th outputting it to the display. If switching between a 720p60 and a 1080i60 channel, maybe add another 1/15th of a second of delay (assuming the box can't transmit the resolution/framerate metadata ahead of each frame, so the TV could get started switching output modes even while the box was still waiting for the next I-frame).
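The latency budget above is just arithmetic, so here it is as a quick sanity check. The GOP length, frame rate, and mode-switch penalty are all the assumed figures from the paragraph, not measured values:

```python
# Back-of-the-envelope worst-case channel-change latency.
# All figures are assumptions from the discussion above, not measurements.

FRAME_RATE = 60.0        # 720p60 / 1080i60 frame-or-field rate
GOP_LENGTH = 15          # assumed worst case: one I-frame every 15 frames

iframe_wait  = GOP_LENGTH / FRAME_RATE  # just missed an I-frame: wait a full GOP
analyze_time = 1 / FRAME_RATE           # decode/analyze the I-frame
output_time  = 1 / FRAME_RATE           # scan it out to the display
mode_switch  = 1 / 15                   # extra penalty if resolution/rate changes

total = iframe_wait + analyze_time + output_time + mode_switch
print(f"worst-case switch: {total * 1000:.0f} ms")  # 350 ms, well under 0.5 s
```

Even with the extra mode-switch penalty thrown in, the worst case lands around 350 ms, which supports the "well under half a second" claim.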
As far as encryption is concerned, there's no reason the box shouldn't already have every channel's current encryption key pre-negotiated and ready to go. RAM isn't actually expensive anymore, and 2GHz+ quad-core ARM processors are now almost free. Worst case, add another dollar or two for a second DSP that constantly walks the channel list, updates its metadata, and renegotiates encryption keys in the background as necessary.
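A minimal sketch of that background key-prefetch idea, with a thread standing in for the hypothetical second DSP. Every name here (Channel, negotiate_key, refresh_all) is made up for illustration; the point is just that keys get refreshed continuously so a channel change never blocks on negotiation:

```python
import threading
import time

class Channel:
    def __init__(self, number):
        self.number = number
        self.key = None                     # pre-negotiated key lives in cheap RAM

def negotiate_key(channel):
    # Stand-in for the real conditional-access handshake.
    return f"key-for-{channel.number}"

def refresh_all(channels):
    # One walk through the channel list, updating every key.
    for ch in channels:
        ch.key = negotiate_key(ch)

def key_refresher(channels, interval=1.0):
    # The "second DSP": loop forever in the background so tuning to any
    # channel finds its key already sitting in memory.
    while True:
        refresh_all(channels)
        time.sleep(interval)

channels = [Channel(n) for n in range(2, 70)]
refresh_all(channels)                       # prime every key up front
threading.Thread(target=key_refresher, args=(channels,), daemon=True).start()
```

With that running, a channel change only has to look the key up instead of negotiating it on demand.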
I really wish I knew why American HDTVs are so completely "dumb" in their operation. On paper, at least, there's NO REASON a broadcaster shouldn't be able to seamlessly transition from a 720p60 newscast to a 1080i60 commercial, then to a 720p50 imported TV show, and follow it up with a 1080p24 movie (with more 720p60 and 1080i60 commercials seamlessly inserted along the way). I'd love to know where in the transmission chain the whole thing breaks down and makes mode-changing such a big deal. IMHO, changing from 1080i60 to 720p50 (for example) should AT WORST cause somewhere between 1/60th and 2/24ths of a second of blackness before video resumes in the new mode.
By the same token... it drives me nuts that 1080p60 wasn't one of the official ATSC modes. Yes, I know that realtime compression of 1080p60 back in the '90s would have been almost impossible (at least at acceptable quality while keeping the bitrate below ~19 Mbps). HOWEVER, I also have a pile of old VCDs I made from ripped DVDs using TMPGEnc that got near-DVD quality out of 2.7 Mbps burned to a CD-R, so I know what's possible when you let the encoder take its time chewing on the file and re-analyzing the video at its leisure... especially when variable bitrate and long GOPs are available options. With the exception of sports, news, and award shows, almost NOTHING is actually encoded in realtime anymore. And even news and award shows now get delayed by 15-30 seconds so broadcasters can block anything obscene or shocking (like someone blowing his head off on a live news feed, or a flashed boob at the Super Bowl) before it airs. For any other content, there's plenty of time to aggressively cross-reference frames and use motion estimation to shave 1080p60 down to something you could send at high quality in just 18 Mbps.
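To put a number on how much harder 1080p60 at 18 Mbps is than what ATSC actually ships, here's the raw bits-per-pixel arithmetic (the 19.39 Mbps figure is the nominal ATSC channel payload; no codec details are modeled):

```python
# Compare the bit budget per pixel for 1080p60 at 18 Mbps versus
# 1080i60 at ATSC's ~19.39 Mbps payload. Pure arithmetic, no codec model.

def bits_per_pixel(width, height, fps, mbps, interlaced=False):
    # Interlaced video transmits half the lines per picture, so the
    # effective pixel rate is halved.
    pixels_per_sec = width * height * fps * (0.5 if interlaced else 1.0)
    return mbps * 1e6 / pixels_per_sec

p60 = bits_per_pixel(1920, 1080, 60, 18.0)                   # ~0.145 bits/pixel
i60 = bits_per_pixel(1920, 1080, 60, 19.39, interlaced=True) # ~0.312 bits/pixel

print(f"1080p60 @ 18.00 Mbps: {p60:.3f} bits/pixel")
print(f"1080i60 @ 19.39 Mbps: {i60:.3f} bits/pixel")
```

The encoder gets less than half the bits per pixel in the progressive case, which is exactly why multipass, long-GOP, offline encoding (the VCD trick above, scaled up) would have been the only plausible way to hit that target.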