Physical modelling in sound generation is decades old; there was a lot of interest in it in the 90s with commercial hardware, but it has kind of died down since. It's computationally intensive for one, which a GPU can help with, but it's also a bitch to actually use well for most real-world instruments. Bell-like sounds are common and can be quite interesting, wind instruments can be done fairly well, and bowed instruments are a bit meh compared to the real thing or to samples.
The novelty is doing it on a GPU, which means greater processing capacity, and doing it in real time, which can be tricky with audio: partly because of the latency of transferring data to and from the device, but also because anything over 15 ms is simply too much latency to be comfortably usable as an actual playable instrument. If you're just playing something back it's different, of course, but then there's no real-time requirement in the first place.
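To make the buffer-size numbers below concrete: one audio buffer's contribution to latency is just its length divided by the sample rate. A minimal sketch (the sample rates are my assumption, and transfer and driver overhead come on top of this):

    #include <stdio.h>

    /* One buffer of N frames at sample rate R takes N / R seconds to fill
     * before it can even be handed over, so buffer size sets a hard floor
     * on latency. */
    static double buffer_latency_ms(unsigned frames, unsigned sample_rate)
    {
        return 1000.0 * (double)frames / (double)sample_rate;
    }

    int main(void)
    {
        printf("512 frames  @ 44.1 kHz: %.1f ms\n", buffer_latency_ms(512, 44100));  /* ~11.6 */
        printf("512 frames  @ 48 kHz:   %.1f ms\n", buffer_latency_ms(512, 48000));  /* ~10.7 */
        printf("1024 frames @ 44.1 kHz: %.1f ms\n", buffer_latency_ms(1024, 44100)); /* ~23.2 */
        printf("1024 frames @ 48 kHz:   %.1f ms\n", buffer_latency_ms(1024, 48000)); /* ~21.3 */
        return 0;
    }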
I've made a simple and rather crude virtual analog synth running on a GPU using OpenCL, without any shared memory, that can in a pinch go down to a 1024-point buffer (in stereo), which is about 22 ms, though not quite reliably. The calculations are obviously much simpler, but it can easily do a thousand or so oscillators in stereo. The article says a 512-point buffer (11 ms) is the smallest they could make usable, which is pretty good.
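For a rough idea of what that kind of thing looks like, here's a minimal OpenCL C sketch (not my actual code, and all names are made up): one work-item per output frame, each summing a bank of sine oscillators into an interleaved stereo buffer.

    /* One work-item per output frame; each sums a bank of sine oscillators
     * into an interleaved stereo output buffer. */
    __kernel void osc_bank(__global const float *freqs,  /* Hz, num_oscs entries  */
                           __global const float *amps,   /* linear gain           */
                           __global const float *pans,   /* 0 = left .. 1 = right */
                           const uint num_oscs,
                           const float sample_rate,
                           const ulong start_frame,      /* running frame counter */
                           __global float *out)          /* interleaved stereo    */
    {
        const size_t i = get_global_id(0);               /* frame index in buffer */
        const float t = (float)(start_frame + i) / sample_rate;
        float left = 0.0f, right = 0.0f;

        for (uint o = 0; o < num_oscs; ++o) {
            const float s = amps[o] * sin(6.2831853f * freqs[o] * t);
            left  += s * (1.0f - pans[o]);
            right += s * pans[o];
        }
        out[2 * i]     = left;
        out[2 * i + 1] = right;
    }

A real implementation would accumulate phase per oscillator rather than recomputing it from an absolute time value (which loses float precision after a while), and would use band-limited waveforms rather than pure sines, but the overall shape is the same: embarrassingly parallel per-sample work, with the buffer transfer as the awkward part.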
I can see it being interesting in a game, where different objects have different sounds and so on. Music playback is something else, though. I always return to the problem of string instruments: how do you actually create the model, and more importantly, the inputs? An actual violin has quite a few parameters that govern the sound. Consider just the bow: pressure, angle (leaning), speed, position on the string, tension of the hairs (more tension creates a slightly smaller contact area), static friction between bow and string (causing them to stick and then suddenly slip), exact time of contact. You'd have to record all of those from the live performance; I would guess a resolution of maybe 10 to 50 ms might be good for some of the parameters, more for others. But then you can also pluck the string, the strings and body resonate with each other, and even with nearby instruments, and you still have the whole left hand yet to be done.
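Just to illustrate the data problem, here's a hypothetical sketch of what captured bow control data might look like: a stream of parameter frames at some fixed control rate (say 100 Hz, i.e. one frame per 10 ms) that the string model would then interpolate. The field names and the rate are my guesses, not anything from the article.

    #include <stdint.h>

    /* One frame of bow control values per control period (here 10 ms),
     * to be interpolated by the string model between frames. */
    #define CONTROL_RATE_HZ 100          /* one frame every 10 ms (assumed) */

    typedef struct {
        float pressure;        /* bow force against the string          */
        float tilt;            /* leaning angle of the bow              */
        float speed;           /* bow velocity along the string         */
        float position;        /* contact point, 0 = bridge .. 1 = nut  */
        float hair_tension;    /* affects contact area                  */
        uint8_t on_string;     /* whether the bow is touching at all    */
    } bow_frame;

    typedef struct {
        uint32_t num_frames;   /* recording length in control frames    */
        bow_frame *frames;     /* num_frames entries, 10 ms apart       */
    } bow_take;

And that's only the bow of one instrument; plucking, left-hand fingering, and the coupling between strings, body, and neighbouring instruments would each need their own captured or modelled inputs on top.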
It also doesn't remove the problem of imperfect speakers. Even if you give each instrument its own speaker, that speaker still needs to reproduce the entire spectrum. And a speaker is physically a much smaller sound source than a large wooden resonator like a cello, which makes an acoustical difference.
In short, it's very cool but not a revolution. But also don't quote me on that.