From Claim 1 of the patent filing:
A method of controlling a portable electronic device including an image capturing device, the method comprising: detecting, via the image capturing device, motions of an object over the image capturing device; determining a type of the detected motions using timing information related to the detected motions, the timing information comprising duration of at least one of the detected motions; and controlling the portable electronic device based on the determined motion type.
Claim 2 then says:
The method of claim 1, wherein the type of the detected motions comprises single tapping, double tapping, hovering, holding and swiping.
The remaining claims add a lot of refinement: edge detection, direction of movement, the usual definition of a computing device with memory, and finally kicking off predetermined actions based on recognized motions.
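To make "determining a type of the detected motions using timing information" concrete, here is a minimal Python sketch of how such a classifier could work. Everything in it is an assumption for illustration and not taken from the patent: the `Motion` record, the threshold values, and especially the rule that separates hovering from holding (here, whether the lens is fully covered).

```python
from dataclasses import dataclass

# Illustrative thresholds only; the patent text quoted above gives no numbers.
TAP_MAX_S = 0.3          # a motion shorter than this counts as a tap
DOUBLE_TAP_GAP_S = 0.4   # longest pause allowed between the two taps
SWIPE_MIN_TRAVEL = 0.5   # fraction of the frame the object must cross


@dataclass
class Motion:
    """Hypothetical record of one detected motion over the camera."""
    start: float    # seconds, when the object entered the camera's view
    end: float      # seconds, when it left
    travel: float   # fraction of the frame width crossed (0.0 = stationary)
    covered: bool   # True if the lens was fully covered, False if merely near

    @property
    def duration(self) -> float:
        return self.end - self.start


def classify(motions: list[Motion]) -> str:
    """Guess the gesture type from the first one or two motions."""
    if not motions:
        return "none"
    first = motions[0]
    # Movement across the frame wins over any timing rule.
    if first.travel >= SWIPE_MIN_TRAVEL:
        return "swipe"
    # Short motions are taps; a second short one soon after makes a double tap.
    if first.duration <= TAP_MAX_S:
        if len(motions) >= 2:
            second = motions[1]
            gap = second.start - first.end
            if second.duration <= TAP_MAX_S and 0 <= gap <= DOUBLE_TAP_GAP_S:
                return "double tap"
        return "single tap"
    # Long, stationary motions: assumed split between holding and hovering.
    return "hold" if first.covered else "hover"


if __name__ == "__main__":
    print(classify([Motion(0.0, 0.1, 0.0, True)]))
    print(classify([Motion(0.0, 0.1, 0.0, True), Motion(0.3, 0.4, 0.0, True)]))
    print(classify([Motion(0.0, 1.5, 0.0, False)]))
```

Note that nothing in `classify` depends on where the `Motion` records come from; a touchscreen driver could feed it the same fields as a camera.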
But look at Claim 2: "... comprises single tapping, double tapping, hovering, holding and swiping." To me, this patent seems to be a simple extrapolation of the gestures Apple made popular with their mobile UI, with the addition of "hovering" (assuming I understand what that word means here). Same gestures, different input control.
Is there a significant difference between, say, swiping across a phone's screen and making the same gesture a few inches away? (I'm thinking that if the device interpreted motions from a larger distance, then the only thing that would reliably happen is a series of hilarious DoS attacks via interpretive dance.)