User Journal

Journal: Why branch prediction doesn't help

In the discussion about IBM putatively buying Sun, we were having a side-discussion about prefetches and branch prediction.

I had forgotten why my branch prediction performance experiments had failed ("confirmed the null hypothesis") and had to go back to my notes.

It turns out that mature production software tends to be full of small blocks of error-handling and debug/logging code, which are rarely executed. A Smarter Colleague[TM] and I set out to test the newly available branch prediction logic, expecting to see a significant improvement. I manually set the branch prediction bits in a large production application, only to find no detectable improvement.

The test application was Samba, so we changed the driver script to only read a few files from a ram disk, to eliminate disk I/O overheads. Still no detectable advantage from predicting the branches correctly!

Then we tried just a few functions, under a test framework that did no I/O at all. Still nothing.

Eventually we tracked it down to the debug/log/else logic: the branches around it were always taken, but the branch-arounds were long enough that the next instructions were in a different icache line, and that cache line had to be fetched.

It turned out that we had reproduced in code what our HPC colleagues see in data: the cache doesn't help if you're constantly leaping to a different cache line!
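
The effect can be illustrated with a toy model. This is only a sketch, not a measurement: the 64-byte line size, the block sizes, the addresses and the function names are all assumptions made up for illustration.

```python
LINE_SIZE = 64  # assumed icache line size in bytes; real CPUs vary


def cache_line(addr, line_size=LINE_SIZE):
    """Return the index of the cache line that holds this address."""
    return addr // line_size


def lines_touched(hot_blocks, line_size=LINE_SIZE):
    """Count the distinct cache lines covered by the executed (hot) blocks.

    hot_blocks is a list of (start_address, length_in_bytes) pairs.
    """
    touched = set()
    for start, length in hot_blocks:
        first = cache_line(start, line_size)
        last = cache_line(start + length - 1, line_size)
        touched.update(range(first, last + 1))
    return len(touched)


# Hypothetical layout: four 16-byte hot blocks, each followed by
# 112 bytes of error-handling code that is always branched around.
interleaved = [(i * 128, 16) for i in range(4)]

# The same four hot blocks packed back to back, cold code moved elsewhere.
packed = [(i * 16, 16) for i in range(4)]

print(lines_touched(interleaved))  # 4 lines fetched for 64 bytes of hot code
print(lines_touched(packed))       # 1 line fetched for the same 64 bytes
```

In this model a perfectly predicted branch still lands in a line that has to be fetched, which is the behaviour we saw.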



Journal: Capacity planning in six paragraphs

An acquaintance asked about what to measure, and what tools to use, expecting to hear about vmstat, sar or the like.

However, the really interesting measurements are of the application's performance: response time and transactions per second.

Imagine you have a web site which responds in 1/10 second on average, is known to be running on a single CPU (queuing center, to be precise) and is averaging 6 transactions per second (TPS).

From that you know that the maximum performance will be 10 TPS, because ten 1/10ths fit into one second. You also know you're at 60% of the maximum, a nice safe number.

Now correlate this with your average CPU usage, network bandwidth and IO bandwidth, and you have a little estimator for what resources are needed to maintain good performance.

You also know that things will start getting bad at >8 TPS, so if you expect more business in future, you need to add more queuing centers (CPUs) with the appropriate amounts of network and disk I/O bandwidth.

You can also now use both the resource usage figures and the tools that all the other folks have suggested, and watch for growth in each of them. If the trend in their use looks like it will soon exceed the figure that corresponds to 8 TPS, then and only then do you need to start buying resources.
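
The arithmetic above fits in a few lines. This is a rough sketch using the numbers from the example; the function names are mine, the 45% CPU figure is a made-up measurement, and scaling resource use linearly with TPS is an assumption that only roughly holds in practice.

```python
def max_tps(service_time_s, servers=1):
    """Upper bound on throughput: each server completes 1/service_time per second."""
    return servers / service_time_s


def utilization(observed_tps, service_time_s, servers=1):
    """Fraction of the maximum throughput currently in use."""
    return observed_tps / max_tps(service_time_s, servers)


def resource_at(target_tps, observed_tps, observed_usage):
    """Scale a measured resource figure to a target load.

    Assumes usage grows linearly with TPS, which is only an approximation.
    """
    return observed_usage * target_tps / observed_tps


service_time = 0.1   # seconds per transaction, from the example above
current_tps = 6.0

ceiling = max_tps(service_time)                # 10 TPS
load = utilization(current_tps, service_time)  # 60% of the maximum
danger_tps = 0.8 * ceiling                     # the ">8 TPS" point

# Hypothetical measurement: CPU is 45% busy at 6 TPS.
cpu_at_danger = resource_at(danger_tps, current_tps, 45.0)
print(ceiling, load, danger_tps, cpu_at_danger)
```

The last figure is the one to watch the trend against: when measured CPU use heads toward it, you are heading toward 8 TPS.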

--dave c-b


Journal: Information for good log messages

This is a commonly reinvented wheel, and the version Stefan (metze) Metzmacher suggested in samba-technical is the round one (;-))

A maximally useful log message contains a number of fixed items, usually in a fixed-format header of some sort, and text for the human reader to use to understand the implications of the problem.

From memory, the fixed information includes enough to allow for mechanical sorting by nastiness and occasionally mechanical processing:

- date/time
- origin, meaning machine- or domain-name
- source, in some detail, including the executable name and process id as a minimum, if applicable, and optionally the file, function and line. It is good to make this one token, for ease of parsing and for resilience when one line has "sendmail:parse.c:parse_it:332:1948" and another has only "mconnect:1293"
- pre-classification, meaning the application type, error type and severity. DFAs can switch on this, and should.
  The old ARPA format was error type, source and severity as three decimal digits, which you still see when smtp says "250 ok". The 2 was permanent success, the 5 meant "the app", in this case smtp, and 0 was the severity. I prefer ascii, not numbers (;-))
- then the text for the human, saying the meaning of the error, the same way you're supposed to write the **meaning** of code in comments, not just say what the code does.

Syslog does about half of this; metze's did most of it.
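
Here is one way the fixed header could look in practice. It is a sketch and not metze's actual format: the field order, the space and dot separators, the classification words and all the function names are my own assumptions. Only the two source tokens come from the list above.

```python
from datetime import datetime, timezone


def make_source(exe, pid, file=None, func=None, line=None):
    """Build the single-token source field: exe[:file:func:line]:pid."""
    parts = [exe]
    if file is not None and func is not None and line is not None:
        parts += [file, func, str(line)]
    parts.append(str(pid))
    return ":".join(parts)


def parse_source(token):
    """Split a source token; the first field is the executable, the last the pid."""
    fields = token.split(":")
    result = {"exe": fields[0], "pid": int(fields[-1])}
    if len(fields) == 5:
        result.update(file=fields[1], func=fields[2], line=int(fields[3]))
    return result


def format_message(when, origin, source, app, err_type, severity, text):
    """Fixed header of four space-separated tokens, then free text."""
    classification = f"{app}.{err_type}.{severity}"
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp} {origin} {source} {classification} {text}"


def parse_message(record):
    """Recover the header fields; the human text may contain spaces."""
    stamp, origin, source, classification, text = record.split(" ", 4)
    app, err_type, severity = classification.split(".")
    return {
        "time": stamp, "origin": origin, "source": parse_source(source),
        "app": app, "type": err_type, "severity": severity, "text": text,
    }


# Hypothetical message; the date, host and wording are invented.
record = format_message(
    datetime(2009, 3, 20, 12, 0, 0, tzinfo=timezone.utc),
    "froggy",
    make_source("sendmail", 1948, "parse.c", "parse_it", 332),
    "mail", "perm", "error",
    "address could not be parsed, so the message was rejected",
)
print(record)
print(parse_message(record)["source"])
print(parse_source("mconnect:1293"))
```

Because the source is one token, the same parser copes with both the long and the short form, and a sorter can split the header without ever looking at the human text.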


Journal: ARPA result codes 1

Alas, many folks don't know the old ARPAnet tricks and have to reinvent them. Often inelegantly.

One very handy pair was the ARPA command and return-code standard.

A command was four letters or less at the beginning of a line (record, packet), often monocase, so it could be treated as a 4-byte integer and switched on.

For example, smtp starts up with
helo localhost
250 froggy Hello localhost [], pleased to meet you

The "HELO" is the command, and the next line the response.
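
The "treat it as a 4-byte integer" trick might look like this. It is a minimal sketch: the function names and the handler labels are invented, and padding short commands with blanks is my assumption about how "four letters or less" would be handled.

```python
def command_code(line):
    """Pack the command at the start of a line into one integer.

    The command is the first word, at most four letters; it is upper-cased
    and padded with blanks to exactly four bytes.
    """
    words = line.split(None, 1)
    word = words[0] if words else ""
    if len(word) > 4:
        raise ValueError("command longer than four letters: %r" % word)
    return int.from_bytes(word.upper().ljust(4).encode("ascii"), "big")


HELO = command_code("HELO")
QUIT = command_code("QUIT")
HELP = command_code("HELP")


def dispatch(line):
    """Switch on the integer rather than comparing strings."""
    code = command_code(line)
    if code == HELO:
        return "greet"
    if code == QUIT:
        return "close"
    if code == HELP:
        return "help"
    return "unknown"


print(hex(HELO))                   # 0x48454c4f
print(dispatch("helo localhost"))  # greet
```

In C the same idea is a single switch statement on the integer, which is why monocase four-letter commands were so convenient.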

The first character of the response is an ascii digit, where
1 means "informational message", and is rare
2 means permanent success
3 means partial success, as in a series of steps.
4 means temporary failure, such as "no space", and
5 means permanent failure

The second digit is 5 for "this app" and 9 for "the OS"

The third digit is the severity, so
599 I must close down, my CPU is on fire
is a very severe and permanent error (:-))

The fourth character is an ascii blank if the reply is complete on this line, a "-" if it continues to additional lines. For example, smtp has a help command:
214-2.0.0 This is sendmail version 8.12.8+Sun
214-2.0.0 Topics:
214-2.0.0 For more info use "HELP ".
214-2.0.0 To report bugs in the implementation contact Sun Microsystems
214-2.0.0 Technical Support.
214-2.0.0 For local information send email to Postmaster at your site.
214 2.0.0 End of HELP info

The three digits and the "-" for continuation allow you to write as simple or as complex a DFA as you like, by doing trivial masking on fixed-length strings.
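
A small reader for such replies could be sketched as follows. The function names and dictionary keys are mine, and the digit meanings are the ones described above rather than anything checked against a particular RFC; the sample lines are taken from the examples in this entry.

```python
def parse_reply_line(line):
    """Split one reply line using fixed character positions only."""
    code = line[:3]
    if len(line) < 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("not a reply line: %r" % line)
    return {
        "class": int(code[0]),       # 1-5, as in the list above
        "source": int(code[1]),      # e.g. 5 for "this app", 9 for "the OS"
        "severity": int(code[2]),
        "more": line[3:4] == "-",    # "-" means further lines follow
        "text": line[4:],
    }


def read_reply(lines):
    """Collect lines up to and including the first one without a "-"."""
    texts = []
    for line in lines:
        parsed = parse_reply_line(line)
        texts.append(parsed["text"])
        if not parsed["more"]:
            return parsed, texts
    raise ValueError("reply ended while a continuation was expected")


help_reply = [
    "214-2.0.0 This is sendmail version 8.12.8+Sun",
    "214-2.0.0 Topics:",
    "214 2.0.0 End of HELP info",
]
final, texts = read_reply(help_reply)
print(final["class"], len(texts))  # 2 3

fire = parse_reply_line("599 I must close down, my CPU is on fire")
print(fire["class"], fire["source"], fire["severity"], fire["more"])  # 5 9 9 False
```

A lazy client can look at the first character and nothing else; a fussy one can use all four positions. Neither needs a real parser.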
