HDR means you use more bits when recording the image than the usual 8 bits per color component. You can already do a bit of HDR by taking the raw image from most photo cameras, which have 10 to 14 bits of depth. However, those 10 to 14 bits are linear light (as opposed to gamma corrected for the display), so their dynamic range is not that much better.
The real improvement comes from taking multiple exposures of different lengths of the same subject and combining them into a single image. Basically, you use the pixels from the long exposure (the more accurate measurement) unless a pixel is overexposed, in which case you use the same pixel from the short exposure; in practice you use a weighted average to smooth the transition. The more exposures you have, the wider the range of accurate measurements. A sketch of the merge is below.
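Here is a minimal sketch of that merge for the two-exposure case, assuming both frames are already linearized to floats in [0, 1]; the function name, the 0.95 saturation threshold, and the fade width are illustrative choices, not anyone's actual pipeline:

```python
import numpy as np

def merge_exposures(long_img, short_img, long_time, short_time,
                    sat=0.95, fade=0.10):
    """Merge two linear-light exposures into one HDR radiance map."""
    # Divide by exposure time so both frames are in the same
    # radiance units (sensor value per second of exposure).
    long_rad = long_img / long_time
    short_rad = short_img / short_time

    # Trust the long exposure fully until it nears saturation,
    # then fade smoothly over to the short exposure.
    w = np.clip((sat - long_img) / fade, 0.0, 1.0)
    return w * long_rad + (1.0 - w) * short_rad
```

With more than two exposures you would extend the weighting so every pixel becomes a weighted average over all the frames in which it is neither underexposed nor clipped.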
In this case they took two cameras, set them to different exposure speeds, and later combined the two video streams into an HDR video.
Now comes the interesting part: displaying the HDR image/video. You can simply choose a virtual exposure time and show the image the normal way, which is more convenient than having to pick the exposure while filming.
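Rendering the merged radiance map with a virtual exposure could look like this (a sketch under the same assumptions as above; the gamma value and the function name are again just illustrative):

```python
import numpy as np

def virtual_exposure(hdr, exposure_time, gamma=2.2):
    """Render an HDR radiance map as an ordinary 8-bit image,
    choosing the exposure after the fact."""
    # Simulate the sensor: radiance * exposure time, clipped
    # where a real sensor would saturate.
    ldr = np.clip(hdr * exposure_time, 0.0, 1.0)
    # Gamma-encode for the display and quantize to 8 bits.
    return (ldr ** (1.0 / gamma) * 255.0).astype(np.uint8)
```

Varying exposure_time here is exactly like turning the exposure dial on the camera, except you get to do it in post.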
Or you use a special algorithm that changes the exposure on a per-pixel basis, depending on the surrounding pixels, in roughly the same way a human eye interprets the real world. This shows a picture with both the dark and the light patches clearly visible, and it looks more lively. However, such algorithms always make the result look a bit fake, though that may just be the conditioning we have from looking at normal photographs (like a transistor amp compared to a valve amp).
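A crude single-scale version of such a local operator might look like the sketch below; this is only an illustration of the idea, not the algorithm used in the video, and the sigma and key parameters are made up:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_tone_map(hdr_lum, sigma=25.0, key=0.5):
    """Per-pixel exposure: divide each pixel by a blurred estimate
    of its neighbourhood brightness, so dark and bright regions
    both end up visible."""
    # Local adaptation level: a heavily blurred copy of the image.
    local = gaussian_filter(hdr_lum, sigma=sigma)
    # A pixel at its local average maps to mid-grey (key); contrast
    # relative to the neighbourhood is preserved.
    return np.clip(key * hdr_lum / (local + 1e-6), 0.0, 1.0)
```

A simple version like this produces halos around strong edges, which is part of why local tone mapping so often reads as fake; serious operators go to some length to suppress that.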
Also, from the video it looks like the algorithm used here causes flickering in the image (unless the flickering was caused by the cameras themselves); I guess the algorithm needs to be modified to take moving images into account.
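One hypothetical way to do that (purely my speculation, not what the video's authors did) would be to smooth the per-pixel adaptation level across frames, e.g. with an exponential moving average, so the tone mapping cannot change abruptly from one frame to the next:

```python
def smooth_adaptation(prev_local, cur_local, alpha=0.1):
    """Blend this frame's adaptation level (the blurred image from
    local_tone_map above) with the previous frame's, to suppress
    frame-to-frame flicker."""
    if prev_local is None:  # first frame: nothing to blend with yet
        return cur_local
    return alpha * cur_local + (1.0 - alpha) * prev_local
```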