I'm sorry, but it's ridiculous to count Samsung Microwave models 333-X and 334-X, which run 95% of the same code and differ by one feature, and each factory rev of each one of those, which may differ by zero or a few bytes of code, as separate programs. That's preposterous. By that measure every single installed copy of the Linux OS is a separate program, since they SURELY each link to a unique subset of all the possible libraries out there, use different libcs, etc. This kind of counting is utterly bogus.
You have to understand that when you say "small applications" you are talking about a VERY VERY small class of applications, ones that compile to no more than a few hundred or perhaps a thousand bytes of code and contain a few hundred instructions. ANYTHING more extensive than that is not human-optimizable to the level that a compiler can achieve. Frankly, it's just infeasible and not cost-effective to attempt full hand optimization of larger code bases, and anything less will fall short of what the best compilers will do. Understand, I say this as someone who has written a great deal of assembly language code, assemblers, real-time applications, system code, etc. I also wrote FORTH compilers which performed both local and global optimizations on performance-critical code. I have a good idea how this stuff works, even if I'm not really involved in the area currently. I can certainly put out some VERY good MC68k code, but I know for a fact that the best commercial compilers, starting in the early 90's, began to produce faster code than we could write by hand, except in very small sections of code like interrupt handlers.
I worked extensively with PASC FORTH and OS-9/68k back in the day. We did lots of very cool stuff, and a lot of that code (the entire OS and much of the compiler) was written in hand-coded assembler. Now, go look at the history of OS-9/68k, which is a high performance Posix-oid RTOS, and you will see that it was replaced in the mid-90's by OS-9000, which is entirely written in C, and was at the time equally fast, and is now MUCH faster (presumably, nobody even bothers with OS-9 anymore). I'd note that even in OS-9000 certain sections that handle clock ticks, basic interrupt controller handling logic, and some other bits ARE still assembly. Those are SMALL pieces of code where a human can consider every aspect of execution and, if you are a real deep subject matter expert on the specific architecture, eke out enough clock cycles to make it worthwhile. Other than that? You'd just make the bulk of the OS slower if you tried to code it in assembler.
I'd note that OSes like OS-9000 (VxWorks, for instance) are embedded in all sorts of things: many appliances, many commercial spacecraft, aircraft, industrial systems, etc. Truthfully, THESE are the super common things, because each one REALLY IS unique, or close to it.