Let's see if we can clear up a few things. Imagine looking at your monitor.
The pixel in the upper left corner is emitting a hemisphere of light. Or rather, it's emitting a bunch of rays of light that spread out in a hemisphere. Under ideal circumstances, it's the same color and intensity for any of those rays, though we know from experience that it tapers off and sometimes changes color as you see it from greater angles. But for most of the "straight on" angles, they're about the same.
A subset of that hemisphere of rays is entering one of your pupils. If you consider the shape of that subset, it forms a cone, with its apex at that pixel on the monitor and its base at the circle of your pupil. All those rays of light will (assuming your eye is focused on the monitor) focus to a point on the associated retina.
The individual rays in that cone are close to, but not quite, parallel to each other. The farther away your monitor is, the more parallel they are, and the closer the monitor is to you, the more the rays are spreading out. Each eye's lens takes care of focusing the parallel or spreading out rays back to a point on its retina. Note that if the rays are spreading out too much (ie, the monitor is too close to your face), you cannot refocus the rays back to a point. You'd need additional optics to help achieve this. (This is why Oculus needs a big fat lens in front of each screen.)
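To put a number on "close to, but not quite, parallel": the half-angle of that cone is just atan(pupil radius / distance). A quick sketch (the 4 mm pupil and the viewing distances are my own illustrative numbers, not from any particular setup):

```python
import math

def ray_spread_deg(pupil_diameter_m, distance_m):
    """Half-angle (degrees) of the cone of rays from one pixel into the pupil."""
    return math.degrees(math.atan((pupil_diameter_m / 2) / distance_m))

# Assumed numbers: 4 mm pupil, monitor at 10 cm, 60 cm, and 2 m.
for d in (0.1, 0.6, 2.0):
    print(f"monitor at {d} m: half-angle = {ray_spread_deg(0.004, d):.3f} deg")
```

Even at 10 cm the spread is only about a degree, which is why treating the bundle as parallel is a reasonable simplification — and why the closer the monitor, the harder the eye's lens has to work.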
For the purposes of this explanation, we'll simplify a bit and consider a bundle of rays that are parallel. Given this simplification, the only distinction between the pixels on the monitor (aside from their color and intensity) is that they arrive at your pupil from different directions.
In fact, you can replace the monitor with physical objects that are reflecting light, and the same principles apply. Going a step further, you can see that it doesn't really matter how those bundles of rays are generated; the only thing that matters is how they enter the pupil. The direction (ie, angle) that they enter from determines the location, and the color and intensity determine what you see there.
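Under the parallel-ray simplification, the mapping from entry angle to retinal location is roughly offset = f_eye * tan(theta). A tiny sketch, using the common textbook figure of about 17 mm for the eye's effective focal length (my assumption, not something this argument depends on):

```python
import math

def retinal_offset_mm(theta_deg, f_eye_mm=17.0):
    """Approximate distance from the retina's center at which a parallel
    bundle arriving theta_deg off straight-ahead comes to a focus."""
    return f_eye_mm * math.tan(math.radians(theta_deg))

# A bundle arriving straight on lands at the center; a bundle arriving
# one degree off-axis lands roughly a third of a millimeter away.
print(retinal_offset_mm(0.0))
print(retinal_offset_mm(1.0))
```

This is the sense in which direction determines location: each distinct arrival angle maps to a distinct spot on the retina, regardless of what generated the bundle.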
So let's take away the monitor, and instead imagine other ways that you can generate different parallel ray bundles directed at your pupil. The original "virtual retinal display" from the University of Washington was based on the following principle:
1) Generate a single collimated beam of light rays. Collimated means that all the rays within the beam are parallel (or close to it). Beam, in this case, does not mean a tiny dot, but rather a beam with some girth to it (on the order of a centimeter).
2) Use one or more tiltable mirrors to shine this beam at different angles at your pupil. By redirecting the beam in a raster-scan fashion, you can trace out a complete image.
3) For each different direction scanned (ie, each pixel), you also need to change the color and intensity of the beam appropriately (to correspond to the pixel you see from that direction).
Note that the beam has to be significantly wider than a single point, so that when you redirect it from one extreme angle to the other, it still hits your pupil. Light that doesn't enter your pupil is wasted.
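The three steps above can be sketched as a loop: for each pixel, aim the beam, then set its color and intensity. Here `set_mirror` and `set_beam` are hypothetical stand-ins for the actual mirror and light-source hardware, and the field-of-view numbers are made up:

```python
def scan_angles(width, height, fov_h_deg, fov_v_deg):
    """Yield (row, col, tilt_x, tilt_y) covering the field of view in raster order."""
    for row in range(height):
        for col in range(width):
            # Map the pixel index to a beam direction within the field of view.
            tilt_x = (col / (width - 1) - 0.5) * fov_h_deg
            tilt_y = (row / (height - 1) - 0.5) * fov_v_deg
            yield row, col, tilt_x, tilt_y

def draw_frame(image, set_mirror, set_beam):
    """One frame of the scan: step 2 (redirect the beam), then step 3
    (color/intensity for that direction), for every pixel."""
    height, width = len(image), len(image[0])
    for row, col, tx, ty in scan_angles(width, height, 40.0, 30.0):
        set_mirror(tx, ty)          # step 2: tilt the collimated beam
        set_beam(image[row][col])   # step 3: modulate color/intensity
```

The point of the sketch is the structure: direction comes from the mirror schedule, and the pixel data only ever modulates the beam.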
This is just one method. The subject of today's article appears to use a DMD array instead of one or two scanning mirrors. Assuming the DMD mirrors can scan in a 2D fashion, it's really the exact same principle.
Note that there are many other ways to achieve the same ends. If you have a point light source at the focus, you could use a parabolic mirror to generate a large collimated beam. Provide some way to scan that beam, and voila. You might also note that a spherical mirror approximates a parabola near its axis, and, because a sphere looks the same from every direction, it does so for arbitrary directions. Provide a way to scan the light source, and voila.
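The parabolic-mirror trick is easy to check numerically: every ray leaving the focus of y = x^2/(4f) reflects off the mirror parallel to the axis, which is exactly a collimated beam. A small sketch (the focal length and ray positions are arbitrary):

```python
import math

def reflect_from_focus(x, f):
    """Reflect a ray from the focus (0, f) of the parabola y = x^2/(4f)
    at the mirror point (x, x^2/(4f)). Returns the outgoing unit direction."""
    y = x * x / (4 * f)
    # Incoming direction: from the focus to the mirror point, normalized.
    dx, dy = x, y - f
    n = math.hypot(dx, dy)
    dx, dy = dx / n, dy / n
    # Surface normal: gradient of y - x^2/(4f), i.e. (-x/(2f), 1), normalized.
    nx, ny = -x / (2 * f), 1.0
    m = math.hypot(nx, ny)
    nx, ny = nx / m, ny / m
    # Mirror reflection: d' = d - 2 (d . n) n
    dot = dx * nx + dy * ny
    return dx - 2 * dot * nx, dy - 2 * dot * ny

# Every outgoing direction comes out as (0, 1): parallel to the axis,
# no matter where on the mirror the ray lands.
for x in (-0.8, 0.3, 1.5):
    print(reflect_from_focus(x, f=0.5))
```

The scanning part is then a matter of tilting that collimated beam, which is the hard part the rest of the post is about.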
As you can see, the trick is mainly in the scanning, since all the rest is "easy".