Comment Re:A C app would be much faster (Score 2, Informative) 752
The proposed ratio of 1:10 is real, if not bigger. And here's why:
1.) For each request, PHP has to load the entire application responsible for that particular response, including its configuration, etc. With memcache(d), you have to instantiate connection classes and reconfigure them, per request. Languages like C/C++, Python and Ruby have a different architecture to begin with. They load ONCE and each request triggers a FUNCTION or METHOD of a class, with all the app-specific configuration, db and memcached connections done and configured on app init, NOT per request.
With caches like APC, overhead is very much mitigated. PHP can also use a pool of connections to memcache/database to minimize connection delays.
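To make the two models concrete, here is a minimal sketch in C. Everything in it is hypothetical: `app_ctx`, `app_init` and the two handlers are illustration names, the cache address is a placeholder, and the "work" is a stand-in. It only shows where the setup cost lands, not how PHP or any real server implements it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical application state: stands in for parsed configuration
   and already-open memcached/DB connections. */
typedef struct {
    char cache_host[64];
    int  init_count;   /* how many times the setup code has run */
    int  ready;
} app_ctx;

/* The expensive part: read config, open connections (simulated here). */
static void app_init(app_ctx *ctx) {
    assert(ctx != NULL);
    /* Placeholder address, an assumption for the sketch. */
    strncpy(ctx->cache_host, "127.0.0.1:11211", sizeof ctx->cache_host - 1);
    ctx->cache_host[sizeof ctx->cache_host - 1] = '\0';
    ctx->init_count++;
    ctx->ready = 1;
}

/* Load-once model: the handler only uses state prepared at startup. */
static int handle_request_persistent(const app_ctx *ctx, int request_id) {
    if (!ctx->ready)
        return -1;
    return request_id * 2;   /* placeholder for the real work */
}

/* Per-request model: setup is repeated before every request. */
static int handle_request_fresh(app_ctx *ctx, int request_id) {
    app_init(ctx);
    return request_id * 2;   /* same placeholder work */
}
```

Call `app_init` once and then `handle_request_persistent` a thousand times, and `init_count` stays at 1; do the same with `handle_request_fresh` and it reaches 1000. An opcode cache plus pooled connections moves PHP toward the first pattern, which is the point of the reply above.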
2.) TFA mentions microsecond relevance! Even a simple echo "Hello World" will take much more time than a similar action in C. I have yet to see a PHP helloworld app that does it in under 1 ms, let alone the microseconds required.
helloworld.php takes 0.363ms on average here on my laptop.
3.) Arrays in PHP are slow, being always hashmaps. Other data structures can speed things up. You don't always need hashmaps. SplFixedArray is a joke, btw, and available only as of 5.3. Can't compare it to a vector anyway, and lots of fixed structures can be represented by structs or classes in C, which are faster than in PHP anyway. Also, the app can instantiate them once on init, and just (re)load when required.
PHP can also instantiate it once, with the use of APC cache. It caches opcodes (and thereby constant values/arrays), and you can also cache any data you want, and loading that data is fast since APC is written in C (some small overhead).
4.) Even if all the app does is parse input vars and call memcache(d) / database funcs/methods to retrieve/store data, those calls are faster in C. Params can be parsed quicker in C, not requiring hashmaps for instance.
Is waiting on I/O in C faster than in PHP? Nope. So if you're mainly doing database lookups, you'll see very little speedup from porting your code to C.
5.) FastCGI is crap. If this app were to be done in C, then it would require its own HTTP layer, epoll based (for Linux). It can take out all the crap in HTTP that is not required to parse the AJAX calls, and does not need to be "generic" enough to deliver static content.
I know. I'd like to get rid of that crap too. But is it really worth it? Making a very efficient server with all the capabilities your application has now, with all the error checking, testing and security protections, would take months. HTTP is more complicated than you'd think. Is the small speedup you'd gain worth the maintenance of a much larger application?
6.) For such dedicated and distributed deployments, garbage collection is sometimes not required. For instance, fixed-length structures can be preallocated upon app init, and the app can really take as much RAM as possible on startup. Yes, that would limit the MAX number of users/connections per server, but so what? The app dominates the server, nothing else is required to run (except a basic OS environment for the app), so fixed memory consumption is not a problem.
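The preallocation idea in point 6 could look roughly like the sketch below: a fixed array of connection slots carved out at startup and handed out through a free list, so nothing is allocated or collected per request. The names (`conn`, `conn_pool`, `pool_acquire`, ...) and the sizes are assumptions made up for the example, not anything from TFA.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONNS 1024            /* hypothetical per-server connection cap */

/* One fixed-length slot per connection. */
typedef struct conn {
    int          fd;
    char         buf[512];        /* hypothetical fixed request buffer */
    struct conn *next_free;       /* link used only while the slot is free */
} conn;

/* All memory is reserved up front; nothing is allocated per request. */
typedef struct {
    conn   slots[MAX_CONNS];
    conn  *free_list;
    size_t in_use;
} conn_pool;

/* Thread every slot onto the free list, lowest index at the head. */
static void pool_init(conn_pool *p) {
    p->free_list = NULL;
    p->in_use = 0;
    for (size_t i = MAX_CONNS; i > 0; i--) {
        p->slots[i - 1].next_free = p->free_list;
        p->free_list = &p->slots[i - 1];
    }
}

/* O(1) acquire; returns NULL once the hard cap is reached. */
static conn *pool_acquire(conn_pool *p) {
    conn *c = p->free_list;
    if (c == NULL)
        return NULL;
    p->free_list = c->next_free;
    p->in_use++;
    return c;
}

/* O(1) release; the slot goes back to the head of the free list. */
static void pool_release(conn_pool *p, conn *c) {
    assert(c != NULL && p->in_use > 0);
    c->next_free = p->free_list;
    p->free_list = c;
    p->in_use--;
}
```

The trade-off is exactly the one the parent names: once all `MAX_CONNS` slots are taken, `pool_acquire` returns NULL and the server has to refuse or queue the connection.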
7.) Even though each request has to wait for I/O of some sort, either from memcache(d), from disk or from DB, you can process many more of these per front-end server and just scale backend servers as required. For example, with PHP your front-end server can serve 100k/sec, having X DB backends and Y memcached backends. With a C application, the front end can serve, say, 1M/sec. You still get to keep one front-end, even though you had to add more backends.
In short, you can significantly reduce the number of servers required if the app was written in C.
You're pulling those numbers out of thin air. You still have to wait for data! You're not magically reducing latencies by switching to C. There are many layers of latencies in a complicated web application, from the front end facing the user to the backend storage. Making one layer go faster doesn't magically fix the other bottlenecks. And the bottlenecks are mostly I/O or network latency. (And developers...)
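The latency argument is just arithmetic, and a tiny sketch shows it. Assume (hypothetically) that a request spends `cpu_ms` in application code and `io_ms` waiting on memcached or the database, and that a C rewrite speeds up only the application code by `cpu_factor`. The function name and the example numbers are made up for illustration.

```c
#include <assert.h>

/* Overall per-request speedup when only the CPU-bound portion gets faster.
   cpu_ms:     time spent in application code per request
   io_ms:      time spent waiting on I/O per request (unchanged by a rewrite)
   cpu_factor: how many times faster the rewritten application code runs */
static double overall_speedup(double cpu_ms, double io_ms, double cpu_factor) {
    assert(cpu_factor > 0.0);
    double before = cpu_ms + io_ms;
    double after  = cpu_ms / cpu_factor + io_ms;
    return before / after;
}
```

With 1 ms of application code and 9 ms of I/O wait, a 10x faster front end gives `overall_speedup(1.0, 9.0, 10.0)`, which is 10 / 9.1, or about 1.1x per request. This only models the latency of a single request; how many requests one box can keep in flight at once is a separate question.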