I'd imagine they decompress the video into its constituent frames. That's easy to do with various Linux command line tools. Then you have to determine whether each adjacent pair of images is moving forwards or backwards in time. You can split this task up into small tiles to make use of parallel processing. You'll see various sorts of movement: no change (e.g. blue sky), upwards movement (smoke, clouds, rockets), sideways movement (cars, people), downwards movement (stuff falling, parachutists). Each of those will have its own pattern of pixel movement and colors.
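To make the tiling idea concrete, here is a rough sketch of how it might look. It is only my guess at one possible approach, not what any real system does. It assumes the frames are already decoded into grayscale grids (lists of rows of integers), and all the function names, the tile size and the brute-force shift search are made up for illustration. Threads are used just to show the per-tile fan-out; for real CPU-bound work in Python you would probably want processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(frame, tile_h, tile_w):
    """Cut a 2D grayscale frame (list of rows) into tiles keyed by (tile_row, tile_col)."""
    tiles = {}
    for top in range(0, len(frame), tile_h):
        for left in range(0, len(frame[0]), tile_w):
            tiles[(top // tile_h, left // tile_w)] = [
                row[left:left + tile_w] for row in frame[top:top + tile_h]
            ]
    return tiles


def shift_cost(prev_tile, next_tile, dy, dx):
    """Mean absolute pixel difference if the content moved by (dy, dx) between frames."""
    h, w = len(prev_tile), len(prev_tile[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                total += abs(prev_tile[y][x] - next_tile[ny][nx])
                count += 1
    return total / count if count else float("inf")


def classify_tile_motion(prev_tile, next_tile, max_shift=2):
    """Label a tile 'none', 'up', 'down' or 'sideways' from its best-matching shift."""
    candidates = [
        (dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    # Try small shifts first so that "no movement" wins ties (e.g. plain blue sky).
    candidates.sort(key=lambda s: abs(s[0]) + abs(s[1]))
    dy, dx = min(candidates, key=lambda s: shift_cost(prev_tile, next_tile, *s))
    if dy == 0 and dx == 0:
        return "none"
    if abs(dy) >= abs(dx):
        # Row 0 is the top of the image, so a negative dy means upwards movement.
        return "up" if dy < 0 else "down"
    return "sideways"


def classify_frame_pair(prev_frame, next_frame, tile_h=6, tile_w=6):
    """Classify every tile of an adjacent frame pair, one task per tile."""
    prev_tiles = split_into_tiles(prev_frame, tile_h, tile_w)
    next_tiles = split_into_tiles(next_frame, tile_h, tile_w)
    keys = sorted(prev_tiles)
    with ThreadPoolExecutor() as pool:
        labels = pool.map(
            classify_tile_motion,
            (prev_tiles[k] for k in keys),
            (next_tiles[k] for k in keys),
        )
        return dict(zip(keys, labels))
```

The per-tile labels on their own don't say which way time is running. They are just the raw movement patterns that a later stage would have to interpret.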
If you can understand what an object is, you can impose some expectations on how it will move. Just look at those early comedy movies where the directors discovered how to play a film reel backwards. A tractor and trailer going backwards wasn't unusual, but someone lying on the ground, rolling backwards, then jumping back into a standing position on the trailer was. Another example would be paratroopers receiving an order to retreat: standing in a field, inflating their parachutes and jumping upwards into the back of an aircraft. So some rules are: human figures don't jump higher than 2 or 3 feet without the help of a trampoline, unless they are a superhero. Smoke doesn't concentrate itself back into a small tube. Liquids don't fall upwards onto the ceiling. If the system can understand those rules, it can tell whether a video is being played backwards or forwards.
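Those rules could be written down as a toy checker along these lines. This is purely a sketch under my own assumptions. It presumes some upstream recognizer has already labelled each object and measured how it moved. The `Observation` fields, the 3-foot threshold and the function names are all hypothetical. The idea is to count rule violations in the clip as played, count them again with every movement time-reversed, and pick whichever direction breaks fewer rules.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    kind: str                # hypothetical label: "human", "smoke" or "liquid"
    vertical_rise_ft: float  # net upward movement in feet (negative = falling)
    spread_change: float     # change in occupied area (negative = concentrating)
    assisted: bool = False   # trampoline, superhero, etc.


def violates_rules(obs):
    """Return True if the observed movement breaks one of the common-sense rules."""
    if obs.kind == "human":
        # Unassisted humans don't jump higher than about 2 or 3 feet.
        return obs.vertical_rise_ft > 3.0 and not obs.assisted
    if obs.kind == "smoke":
        # Smoke doesn't concentrate itself back into a small tube.
        return obs.spread_change < 0
    if obs.kind == "liquid":
        # Liquids don't fall upwards.
        return obs.vertical_rise_ft > 0
    return False


def time_reversed(obs):
    """The same observation as it would look with the clip played the other way."""
    return Observation(obs.kind, -obs.vertical_rise_ft, -obs.spread_change, obs.assisted)


def guess_direction(observations):
    """Guess 'forwards', 'backwards' or 'unknown' from rule violations."""
    as_played = sum(violates_rules(o) for o in observations)
    if_reversed = sum(violates_rules(time_reversed(o)) for o in observations)
    if as_played > if_reversed:
        return "backwards"
    if if_reversed > as_played:
        return "forwards"
    return "unknown"
```

A clip with nothing but blue sky gives no violations either way, so it comes back as "unknown". The trailer gag, where a person rises several feet unaided, would come back as "backwards".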