How Far Can Large Commercial Applications Scale?
clusteroid81 asks: "I've been working with customers who run large commercial applications on big iron (16- to 32-way symmetric multiprocessor systems with 64 GB or more of memory). There are always numerous front-end servers involved, but the application on the back-end server is often difficult to spread across multiple systems or clusters because of its architecture. Scaling is done by increasing memory and processor counts, and as things progress the bottleneck is usually contention within the application or the operating system. Are there folks here on Slashdot who work with large single-system commercial applications? What kind of processor counts and memory do the applications have, and how well do they scale?"
It's the network! (Score:2)
Re:It's the network! (Score:3, Informative)
It all depends on the applications... (Score:4, Insightful)
I've run (now obsolete) ATG Dynamo on the same, with similar results.
I've run Apache (1.3.x) on the same, with similar results.
I've seen applications which stopped scaling well at much less than that.
"Large business applications" isn't specific enough.
Enterprise (Score:4, Funny)
I like to add a couple hundred enterprise myself.
Re:Enterprise (Score:3, Funny)
How well does "Enterprise" scale? (Score:1)
Enterprise vs. Voyager (Score:1)
Re:Enterprise vs. Voyager (Score:2)
scale by hashing (Score:3, Insightful)
Re:scale by hashing (Score:1, Funny)
Re:scale by hashing (Score:2)
Re:scale by hashing (Score:4, Informative)
One way to divide the data is per-user or per-group: divide data according to its owner, so that each user account is hosted on a given machine and has first-class access to his/her own data and his/her group's data, but second-class (network-based) access to everyone else's data.
Another way, as you mention, is hashing on some well-defined key, but for this to be useful the front end must be thoroughly abstracted from the back end, so that multiple front ends share multiple back-end stores; otherwise you are probably just moving the bottleneck around. It also requires that the key be known in advance, which means it generally doesn't work well if, for example, you need to join two tables and one of them is scattered across multiple machines. The only way it works in that case is if the join key is the hashed key, or if each machine keeps a table index spanning other machines' content, at which point you are going to have cache coherency problems.
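To make that abstraction concrete, here is a minimal sketch of key-based sharding, assuming a fixed set of back-end stores and a key that is always known up front; the host names and function names are made up for illustration, and any real deployment also needs a re-sharding plan for when the shard count changes.

import hashlib

SHARDS = ["db-a.internal", "db-b.internal", "db-c.internal"]  # hypothetical hosts

def shard_for(key: str) -> str:
    """Map a well-defined key (e.g. a user id) to one back-end store."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# Every front end must use the exact same mapping, or requests for the same
# key land on different stores and the bottleneck has just been moved around.
print(shard_for("user:31337"))  # prints one of the three hosts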
Which brings us to a fairly nice compromise solution: a replicated database with each of the outer-ring database servers being read-only caches with some sort of built-in cache consistency protocol, and the central database accepting write queries from clients, but with all the read queries directed to the outer ring. Makes for seriously scalable database access.
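A rough sketch of that read/write split, under the assumption that anything mutating data goes to the central primary and everything else is spread over the read-only outer ring; the host names are placeholders, and a real system still needs the consistency protocol mentioned above because replicas can lag.

import random

PRIMARY = "db-primary.internal"
READ_REPLICAS = ["db-ro-1.internal", "db-ro-2.internal", "db-ro-3.internal"]
WRITE_VERBS = ("insert", "update", "delete", "create", "alter", "drop")

def pick_host(sql: str) -> str:
    """Route mutating statements to the primary, everything else to a replica."""
    first_word = sql.lstrip().split(None, 1)[0].lower()
    return PRIMARY if first_word in WRITE_VERBS else random.choice(READ_REPLICAS)

print(pick_host("SELECT name FROM products WHERE id = 7"))      # a replica
print(pick_host("UPDATE products SET price = 9 WHERE id = 7"))  # the primary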
This, of course, assumes that the app in question is a front-end for a database. If you're doing some other sort of application, then all bets are off. Give us more information.
Re:scale by hashing (Score:2)
Say you have a bunch of products, each sold by a different department of your business. So you split your data based on a hash of the department, to keep all of a department's information together. Later, a business decision consolidates two departments, or splits a department, or moves a whole slew of products to a different department, and your hash-based partitioning no longer matches the business.
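A toy illustration of that reorganisation problem, with made-up department names and machine counts: partition rows by a hash of the department, merge two departments, and count how many rows now live on the wrong machine.

import zlib

NUM_MACHINES = 4

def machine_for(department: str) -> int:
    return zlib.crc32(department.encode()) % NUM_MACHINES

rows = [("widgets", i) for i in range(100)] + [("gadgets", i) for i in range(100)]
before = {dept: machine_for(dept) for dept, _ in rows}

# Business decision: "gadgets" is folded into "widgets".
moved = sum(1 for dept, _ in rows
            if dept == "gadgets" and machine_for("widgets") != before["gadgets"])
print(f"{moved} of {len(rows)} rows must be rehomed onto another machine")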
It may seem offtopic.... (Score:3, Interesting)
Their game is little more than a MASSIVE database application supporting tens of thousands of simultaneous users... They have lag issues but, on the whole, seem to be scaling bloody well.
Re:It may seem offtopic.... (Score:2, Funny)
Re:It may seem offtopic.... (Score:1)
Re:It may seem offtopic.... (Score:2)
Cheers,
-Jar.
Re:It may seem offtopic.... (Score:2)
Re:It may seem offtopic.... (Score:1)
yes (Score:2, Interesting)
Anyhow, they started out on a 4-way machine and had scaled up to the 64-way without many code changes. If it had been cost effective, they would have kept on scaling upwards.
Vague question... Vague answers (Score:5, Insightful)
I work for a company that has a large commercial application. We knew we needed to scale our data set and processing power to be huge, so we made sure from the start that the heavy lifting could be divided into little chunks, and thrown to the cluster. For our purposes, back end scalability is basically linear. When we need more, we just bring another rack of little 1U critters online. There are a few theoretical bottlenecks, but we'll never see them before we have our own nuclear power plant to run the data centers.
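For what it's worth, here is a bare-bones sketch of that "little chunks thrown to the cluster" pattern, with a local process pool standing in for the rack of 1U machines and a made-up work function; the point is only that throughput grows with workers as long as the chunks are independent.

from concurrent.futures import ProcessPoolExecutor

def crunch(chunk):
    """Placeholder for the real heavy lifting on one chunk."""
    return sum(x * x for x in chunk)

def split(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

if __name__ == "__main__":
    data = list(range(1_000_000))
    with ProcessPoolExecutor() as pool:   # more workers, more throughput
        partials = pool.map(crunch, split(data, 10_000))
    print(sum(partials))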
For other applications we use, there is *no* scalability. The algorithm has to be single threaded. It doesn't matter if I run it on a cluster, or a machine bristling with CPUs. So we basically buy the data center equivalent of a gaming PC: The fastest processor and memory that fits our budget.
So there are the ends of the spectrum. Your scalability will be somewhere between zero and infinity, depending on the problem at hand.
Re:Vague question... Vague answers (Score:2, Informative)
Some problems are like the "baby" problem. It takes nine months to make a baby, no matter how many couples are assigned to the problem. BUT, if the task is to make 1000 babies, you can still do that in 9 months—if you can find 1000 couples. But, if you only need one, you're stuck. It's a parallelism granularity problem.
Other times you get stymied by serial bottlenecks in an application. Sometimes you can gain fractional benefit from additional compute resource by allowing various CPUs in the
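The "fractional benefit" above is just Amdahl's law in action; a quick back-of-the-envelope calculation (the parallel fractions below are assumptions for illustration, not numbers from the parent post):

def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cpus)

for p in (0.50, 0.90, 0.99):
    print(f"{int(p * 100)}% parallel: "
          f"{amdahl_speedup(p, 32):.1f}x on 32 CPUs, "
          f"{amdahl_speedup(p, 1024):.1f}x on 1024 CPUs")
# A job that is only 50% parallel never gets past 2x, no matter how many CPUs you add.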
Re:Vague question... Vague answers (Score:1)
Poorly worded. I meant: "In fact, the entire goal of this exercise was to convert the algorithmic description of the edge connections into an explicit form." (The end goal was to generate a database that expanded the compact-but-slow form into a fast-but-eats-my-hard-disk form.)
--Joe
baby problem (Score:2)
Technically, you need 1001 people to produce 1000 babies in 9 months, not subtracting for multiple births.
It's all in how you look at it.
Re:baby problem (Score:2)
-Jar.
Re:baby problem (Score:1)
180 MILLION (Score:2)
Re:180 MILLION (Score:1)
Re:baby problem (Score:1)
FWIW, I said "couples." Couples, in the specific context of making babies, tend to be pairings of males and females. The same male could be paired with two different females, giving two couples, but 3 people. As someone else pointed out, though, you do need enough sperm to go around, and the time to make all these couplings. :-)
That pedantry aside, I wanted to avoid the sometimes-called-misogynistic formulation that's somewhat more common. And, well, multiple births in this analogy are like data de
Posting Anon to save my ass (Score:1, Interesting)
That is just the database server, which handles approximately 40,000+ user sessions at a time.
Of course, in front of that you have your liberal sprinkling of app servers, database proxy servers and whatnot, amounting to about 100 other separate systems.
As others have noted, you need lots of enterprise, which costs money.
The fli
Must be made to scale (Score:1)
How big a machine can you buy? (Score:2)
We have a few well-designed apps, and there the answer is pretty much "how big a machine can you buy?"
Very little to go by ... (Score:4, Interesting)
You have to tell us many specific things before we can suggest specific solutions. All we know is that the application runs on a 32-CPU system with 64 GB of memory, which is all about the hardware. The application is a "large commercial application", and there is "contention within the application or the operating system". We do not even know what the hardware is, nor what operating system it runs.
Anyway, here are some generic suggestions from past experience, most of it on UNIX systems, many with Oracle, and most with commercial non-web systems.
- Is the application CPU bound, memory bound, or I/O bound? If you do not know then you have to find out first, then attack the area of
- Is the application transactional in nature or batch? Is it an operational system, or a decision support type of application?
- Does the application use a database (probably does)? Is the database on the same box that runs the application? If so moving the database to a separate box with a fast connection (FDDI or Gigabit Ethernet) may help things.
- Does the application use queues or message passing? Do these queues fill up at certain peak hours, causing slowdowns?
- Can you benchmark/load test the application on a similar box? If you have transaction generation/injection tools, then you can simulate the real load and then run tools for profiling, performance and the like in real time (e.g. sar, vmstat, top,
Performance tuning is an iterative process that is more of an art than a science. Start with the 80/20 rule and get the low-hanging fruit (attack the easiest and most obvious area that would gain you some performance, then move on to the next).
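As a starting point for the first question on that list, here is a rough way to see whether a Linux box is CPU, memory, or I/O bound by sampling vmstat; the column names (us, sy, wa, si, so) follow procps vmstat and will differ on other UNIX flavours.

import subprocess

out = subprocess.run(["vmstat", "1", "5"], capture_output=True, text=True).stdout
lines = out.strip().splitlines()
header = lines[1].split()                              # column names
samples = [dict(zip(header, line.split())) for line in lines[2:]]

def avg(col):
    return sum(int(s[col]) for s in samples) / len(samples)

print(f"user+sys CPU: {avg('us') + avg('sy'):.0f}%  "
      f"iowait: {avg('wa'):.0f}%  "
      f"swap in/out: {avg('si'):.0f}/{avg('so'):.0f}")
# High us/sy -> CPU bound; high wa -> I/O bound; sustained si/so -> memory pressure.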
My experience with Solaris/Oracle (Score:3, Interesting)
Not far enough. (Score:3, Interesting)
Even if you can magically get a single system that's big enough for your needs forever, you'll still pay orders of magnitude too much money for it, and get no added reliability through redundancy.
Any application that requires a solitary, unique, big server is just definitionally broken. It needs to be redesigned to allow it to be spread over an arbitrary number of small systems in geographically diverse locations. For reliability, your serving infrastructure needs to be at least n+1 at every layer to allow for planned maintenance, unexpected failures, and site-destroying disasters. And for scale, it needs to allow you to continue to plug in more batches of cheap little machines and get more throughput.
Re:Not far enough. (Score:2)
Why? Centralization is often the best solution for many reasons (performance, security, legal issues, recoverability, reliability can all be factors depending on the nature of the system).
Only an extremist advocates one type of computing solution for all problems. :-)
Disclaimer: my background is medium-scale airline online-transaction applications, where monolithic systems (read: mainframes) still tend to work very well.
Really... (Score:2)
I do SUN PS gigs, so if it's SUN hardware, I can help out (just contact SUN). Ask for "PACP" (Performance Analysis and Capacity Planning); I helped design the service. Also, google "adrian cockcroft". Or http://www.cs.washington.edu/homes/lazowska/qsp/ [washington.edu]
Or IBM or HP: they have equivalent services.
You can also get any number of other people to help: try datacenterworks.com, or treklogic.com (off the top of my head).
Yes, the problem falls directly into my domain,
Experience to share (Score:2)
For our application these machines are over-spec'd. While our app has many components in many languages, (COBOL, C, Java, P
Re:Experience to share (Score:1)
Problems scale too (Score:1, Interesting)
The moral of the story is:
You're not just scaling up your efficiency / workload. You are also scaling up the other va
Re:Problems scale too (Score:1)
We (the developers) did not notice because the websrv process only went to 100% capacity (30% of the CPU) when the SAS websrv crashed, which is not often. Also, this process was 'low priority' and was constantly trumped by most other jobs. During the day, programmer compiles (plus the online systems) took a lot of the CPU, and at night... no one noticed 30% of the CPU not being available.
On the old machine it did not *cost* too much
Reservation industry? (Score:2)
Large Project on Server Cluster (Score:2, Insightful)
Short-sighted project management (Score:3, Insightful)
This is caused by short-sighted project management, which translates into short-sighted programming. The necessary questions about throughput aren't asked, because it all works fine on the developers' PC with a test load. In our case, we eventually got the application running OK, but changes made since then have not taken I/O into account at all, so the fact that our CPU usage is not maxing out seems to indicate to the development team that we are not bound by server performance, and hence have not reached any scalability thresholds.
Obviously this is madness. If one were to investigate the scalability of this application properly, one would look at where I/O happens, where interprocess communication happens, where object creation and destruction happens, and so on... There is no other way to scale an application -- you have to define what the "load" is, find what happens when you increase it, work out where any bottleneck is, and how parallelisable that bottleneck is. Anything less is no more than buzzwords.
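A minimal sketch of that "define the load, then increase it" exercise, with a stand-in work function in place of the real application: drive it at growing concurrency and watch where throughput stops climbing.

import time
from concurrent.futures import ThreadPoolExecutor

def one_request():
    time.sleep(0.01)                        # pretend I/O wait
    sum(i * i for i in range(20_000))       # pretend CPU work

def throughput(concurrency: int, requests: int = 200) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: one_request(), range(requests)))
    return requests / (time.perf_counter() - start)

for c in (1, 2, 4, 8, 16, 32):
    print(f"{c:3d} workers: {throughput(c):6.1f} req/s")
# The knee in this curve is the bottleneck; past it, more concurrency only adds queueing.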
My experience (Score:2)
Sometimes it's just time to look for another job, because you're way out of your league when people ask vague questions!