The risk with figure out something really clever is that you want to apply it to more things, and the
Description of how it is working under the hood can be found here
The trick applied here is to put all the automatic variables on the stack just as expected, but a method marked as "async" might "lift" the local variables into a Task instance (i.e. allocate a piece of memory an copy the needed context from the stack into that memory area). This won't scale well in scenarios where you have hundred-of-thousands of concurrent clients to track, and they are doing a lot calls that need to block for some time.
Erlang and Go got this right, they don't allocate large number of memory areas and copy stuff around, they just ensure that every new process/goroutine (equivalent of Task in Erlang/Go) can start out with a _very_ small stack (like 200-300 bytes) and switching to a new process/gorouting will just involve switching the CPU-registers (including the stackpointer).
C# got another thing wrong in the process: marking a method returning T as async changes the signature of that function. It now returns Task instead. That is bad because you have to decide upfront which methods you want to run asynchronous and which to run synchronous, unless you are prepared to break your API later. Yet again something Erlang and Go got right, the signature of a function is always the same as you always implement it as if it would run synchronous. The _caller_ of your function determins if he/she want to make a synchronous ("normal" call) or asynchronous call (just wrap it into a process/goroutine).
They rewritten the async API 3 times in