Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Slashdot Log In

Log In

Create Account  |  Retrieve Password

Origin of Quake3's Fast InvSqrt()

Posted by Zonk on Fri Dec 01, 2006 03:20 PM
from the i-know-you-were-dying-inside-without-this dept.
geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power."
+ -
story
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More
Loading... please wait.
  • by ArchieBunker (132337) on Friday December 01 2006, @03:23PM (#17070412) Homepage
    "English motherfucker, do you speak it?" Anyone care to explain what that function does?
    • Re:A famous quote (Score:5, Informative)

      by GigsVT (208848) on Friday December 01 2006, @03:26PM (#17070456) Journal
      1/sqrt(x)
    • Re:A famous quote (Score:5, Informative)

      by 91degrees (207121) on Friday December 01 2006, @04:13PM (#17071420) Journal
      Okay - At times you want a normalised vector. A lot of the time you will have vectors of arbitrary length For example, the light is at the origin, the point is at (12, 4, 3). So the vector from the point to the light is (-13, -4, -3). The length of this vector can be calculated easily using pythagoras's theorem (sqrt (12^2+4^2+3^2). It's 13 units in length. We want a unit vector (i.e. a vector 1 unit in length) So we divide by the length to get (12/13, 4/13, 3/13).

      This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide. Both of thse are at least an order of magnitude slower than multiplications and additions.

      So what this function does is provide a value you can multiply each component by to get a unit vector.

      Well, there's the what and why parts. As for the , I have no idea. I think it uses magic.
      • Re:A famous quote (Score:5, Informative)

        by Red Flayer (890720) on Friday December 01 2006, @03:35PM (#17070624) Journal
        It's just John Carmack's way of telling everyone he knows mathematics and please worship the ground he walks on.
        RTFA.

        Carmack quite graciously denied the code was his and helped direct the author closer to the true source.
  • I have a truly marvelous proof of who wrote this code which this comment box is too narrow to contain.
  • by gukin (14148) on Friday December 01 2006, @03:33PM (#17070584)
    Anyone who has ever played a computer game should pay up.
  • by nels_tomlinson (106413) on Friday December 01 2006, @03:35PM (#17070626) Homepage
    The linked site seems to be down (gee, you think it might be slashdotted?), but this paper [purdue.edu] seems to be covering the same topic.
  • by zoftie (195518) on Friday December 01 2006, @03:50PM (#17070964) Homepage
    Introduction
    Note!

    This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.
    Origin of Quake3's Fast InvSqrt()

    To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.

    InvSqrt()

    Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist.
    How the code works

    The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.

    Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU.

    But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.
    So who did write it? John Carmack?

    While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.

    Asking John whether it was him or Michael returned a "not quite".

    -----Original Message-----
    From: John Carmack
    Sent: 26 April 2004 19:51
    Subject: Re: Origin of fast approximated inverse square root

    At 06:38 PM 4/26/2004 +0100, you wrote:

    >Hi John,
    >
    >There's a discussion on Beyond3D.com's forums about who the author of
    >the following is:
    >
    >float InvSqrt (float x){
    > float xhalf = 0.5f*x;
    > int i = *(int*)
    > i = 0x5f3759df - (i>>1);
    > x = *(float*)
    > x = x*(1.5f - xhalf*x*x);
    > return x;
    >}
    >
    >Is that something we can attribute to you? Analysis shows it to be
    >extremely clever in its method and supposedly from the Q3 source.
    >Most people say it's your work, a few say it's Michael Abrash's. Do
    >you know who's responsible, possibly with a history of sorts?

    Not me,
  • by kan0r (805166) on Friday December 01 2006, @04:01PM (#17071186)
    But the first thing I thought when I saw this was: "Damn, that code is a mess!"
    Seriously, try looking away from the genius who obviously wrote it.
    • There is no single comment which would make reading and understanding what happens here much easier!
    • Introduction of a magic number with no explanation whatsoever
    • Magic pointer arithmetics without demystification
    • Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
    I know it is good code. But it is simply bad code!
    • by ewhac (5844) on Friday December 01 2006, @03:44PM (#17070820) Homepage Journal
      C "understands" ints and floats. If you do the simple cast:

      int i = (int)x;

      Then C will simply convert the float value into an integer value (throwing away fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

      So first, we lie to the compiler by telling it we have a pointer to an int:

      (int *) &f

      And then we deference the pointer to get it into an operable int:

      i = *(int *) &f

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Schwab

      • by eis271828 (842849) on Friday December 01 2006, @03:46PM (#17070866)
        (int) x would convert the floating point value to an integer (truncation, basically).
        *(int*) &x treats the bits as an integer, with no behind the scenes conversion to an actual int value.
      • by radarsat1 (786772) on Friday December 01 2006, @03:47PM (#17070878) Homepage
        Because in C's own weird way, that's the only way of refering to a float as an int without changing the bits.

        If you do this:
        int i = (int)3.0f;

        You get i=3, like what you'd get from the floor() function.

        If you do this:
        float f = 3.0f;
        int i = *(int*)


        Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

        It's because C knows how to cast a float to an int by applying the floor function. However, if you do it the second way, you aren't casting a float to an int, you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

        By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.
        • by dsci (658278) on Friday December 01 2006, @04:12PM (#17071400) Homepage
          Newton-Raphson is a general algorithm for finding root of an equation f(x)=0.

          You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

          X(n+1) = X(n) - f(X(n)) / f'(X(n))

          where
          X(n+1) is the NEXT guess after the value you 'know',
          X(n) is that most recent value you know,
          f(X(n)) is the function evaluated at X(n) and
          f'(X(n)) is the first derivative of f(x) evaluated at X(n).

          It's not foolproof and a BOTH whether it converges at al AND how FAST it converges depends on the initial guess, X(0)

          The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

          Obligatory wikipediea followup: Newton's Method [wikipedia.org]
      • by ToxikFetus (925966) on Friday December 01 2006, @03:56PM (#17071070)
        Why does the coder use

        (1) int i = *(int*)

        instead of just

        (2) int i = (int)x;
        (some of my points added for emphasis)

        I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

        If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise opeartions (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

        x = *(float*)

        x now contains the value of the integer operation:

        i = 0x5f3759df - (i >> 1);

        which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.
    • Re:It was fast (Score:5, Insightful)

      by systemeng (998953) on Friday December 01 2006, @04:19PM (#17071500)
      First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.