Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Origin of Quake3's Fast InvSqrt()

Posted by Zonk on Fri Dec 01, 2006 02:20 PM
from the i-know-you-were-dying-inside-without-this dept.
geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power."
This discussion has been archived. No new comments can be posted.
Display Options Threshold:
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • A famous quote (Score:5, Funny)

    by ArchieBunker (132337) on Friday December 01 2006, @02:23PM (#17070412)
    (http://www.naawp.org/)
    "English motherfucker, do you speak it?" Anyone care to explain what that function does?
    • Re:A famous quote (Score:5, Informative)

      by GigsVT (208848) on Friday December 01 2006, @02:26PM (#17070456)
      (Last Journal: Saturday June 30, @01:22AM)
      1/sqrt(x)
      [ Parent ]
    • Re:A famous quote by theelectron (Score:1) Friday December 01 2006, @02:28PM
    • Re:A famous quote by Anonymous Coward (Score:1) Friday December 01 2006, @02:29PM
      • Re:A famous quote (Score:5, Informative)

        by mike260 (224212) on Friday December 01 2006, @04:41PM (#17073056)
        Nope, the reason you want a fast 1/sqrt(x) in graphics is so you can normalise 3D vectors quickly (something that most 3D apps have to do a LOT).

        Slow:
            const float length = sqrt( v.x*v.x + v.y*v.y + v.z*v.z );
            v.x /= length;
            v.y /= length;
            v.z /= length;

        Fast:
            const float recip_length = InvSqrt( v.x*v.x + v.y*v.y + v.z*v.z );
            v.x *= recip_length;
            v.y *= recip_length;
            v.z *= recip_length;

        The 2nd version has no divides, and no call to sqrt, which makes it *loads* faster.
        [ Parent ]
    • Re:A famous quote (Score:4, Informative)

      by Raul654 (453029) on Friday December 01 2006, @02:34PM (#17070604)
      (http://en.wikipedia.org/wiki/User:Raul654)
      Given a number, that function calculates the inverse of the square root - which, according to TFA, is very common in graphics applications.
      [ Parent ]
    • Re:A famous quote by nicklott (Score:2) Friday December 01 2006, @02:48PM
    • Re: A better question: by Anonymous Coward (Score:2) Friday December 01 2006, @02:49PM
      • Re: A better question: (Score:5, Informative)

        by Jerry Coffin (824726) on Friday December 01 2006, @04:45PM (#17073126)
        How does this function compute 1/x^(1/2)?


        It starts by taking a guess at the right answer, and then improving the guess until it's accurate enough to use.

        The first step depends heavily on the fact that a floating point number on a computer is represented as a significand (aka mantissa) and an exponent (a power of two). For the moment, consider taking just the square root of X instead of its inverse. You could separate out the exponent part of the floating point number, divide it by two, and then put the result back together with the original significand, and have a reasonable starting point.

        From there, you could improve your guesses to get a better approximation. The simplest version of that would be like a high-low game -- you split the difference between the current guess and the previous guess, and then add or subtract that depending on whether your previous guess was high or low. Eventually, you'll get arbitrarily close to the correct answer.

        This can take quite a few iterations to get to the right answer though. To improve that, Newton-Raphson looks at the curve of the function you're working with, and projects a line tangent to the curve at the point of the current guess. Where that line crosses the origin gives you the next guess. That's probably a lot easier to understand from picture [sosmath.com].

        In this case, we're looking for the inverse square root, which changes the curve, but not the basic idea. As a general rule, the closer your first guess, the fewer iterations you need to get some particular level of accuracy. That's the point of the:

        i = 0x5f3759df - (i>>1);

        While the originator of this constant is unknown, and some of it is rather obscure, the basic idea of most of it is fairly simple: we start by shifting the original number right a bit. This divides both the mantissa and the exponent part by two, with the possibility that IF the exponent was odd, it shifts a bit from the exponent into the mantissa. The subtraction from the magic number then does a couple of things. For one thing, if a bit from the exponent was shifted into the mantissa, it removes it. The rest of the subtraction is trickier. If memory serves, it's based on the harmonic mean of the difference between sqrt(x) and (x/2) for every possible floating point number of the size you're using.

        This is where the fact that it's 1/sqrt(x) instead of sqrt(x) means a lot: 1/sqrt(x) is a curve, but it's a fairly flat curve -- much flatter than sqrt(x). The result is that we can approximate a point on the curve fairly accurately with a line. In this case, it's really two lines, which gets it a bit closer still.

        From there, the number has had a bit of extra tweaking done -- it doesn't actually give the most accurate first guess, but its errors are often enough in the opposite direction from those you get in the Newton-Raphson iteration steps that it gives slightly more accurate final results.
        [ Parent ]
    • Evil math--it approximates the inverse square root by Anonymous Coward (Score:2) Friday December 01 2006, @03:08PM
    • Re:A famous quote (Score:5, Informative)

      by 91degrees (207121) on Friday December 01 2006, @03:13PM (#17071420)
      (Last Journal: Friday June 11 2004, @11:15AM)
      Okay - At times you want a normalised vector. A lot of the time you will have vectors of arbitrary length For example, the light is at the origin, the point is at (12, 4, 3). So the vector from the point to the light is (-13, -4, -3). The length of this vector can be calculated easily using pythagoras's theorem (sqrt (12^2+4^2+3^2). It's 13 units in length. We want a unit vector (i.e. a vector 1 unit in length) So we divide by the length to get (12/13, 4/13, 3/13).

      This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide. Both of thse are at least an order of magnitude slower than multiplications and additions.

      So what this function does is provide a value you can multiply each component by to get a unit vector.

      Well, there's the what and why parts. As for the , I have no idea. I think it uses magic.
      [ Parent ]
    • Re:A famous quote by Lord Ender (Score:1) Friday December 01 2006, @03:29PM
    • Re:A famous quote by Gilmoure (Score:2) Friday December 01 2006, @04:37PM
    • Re:A famous quote (Score:4, Funny)

      by ebyrob (165903) on Friday December 01 2006, @06:15PM (#17074620)
      (http://slashdot.org/)
      If this doesn't make your head swim:

      int i = *(int*)&x;
      i = 0x5f3759df - (i >> 1);


      Then I'm afraid the whole article is going to be lost on you...

      We've got a floating point being operated on as an integer.
      We've got a mysterious constant.
      We've got a two's complement sign-flip combined with a bit-shift.

      The only thing missing from this party is hookers and beer.
      [ Parent ]
      • 1 reply beneath your current threshold.
    • Re:A famous quote by Codename46 (Score:1) Friday December 01 2006, @02:35PM
    • Re:A famous quote (Score:5, Informative)

      by Red Flayer (890720) on Friday December 01 2006, @02:35PM (#17070624)
      (Last Journal: Friday November 10 2006, @02:16PM)
      It's just John Carmack's way of telling everyone he knows mathematics and please worship the ground he walks on.
      RTFA.

      Carmack quite graciously denied the code was his and helped direct the author closer to the true source.
      [ Parent ]
    • nope by rhombic (Score:2) Friday December 01 2006, @02:36PM
    • Re:A famous quote by Broken scope (Score:2) Sunday December 03 2006, @08:34PM
    • 3 replies beneath your current threshold.
  • And so why do we care? (Score:2, Insightful)

    by Codename46 (889058) on Friday December 01 2006, @02:27PM (#17070484)
    From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.

    [Insert rant about software patents]
  • I have a truly marvelous proof of who wrote this code which this comment box is too narrow to contain.
  • by gukin (14148) on Friday December 01 2006, @02:33PM (#17070584)
    Anyone who has ever played a computer game should pay up.
  • The linked site seems to be down (gee, you think it might be slashdotted?), but this paper [purdue.edu] seems to be covering the same topic.
  • What's with use of Pointers? (Score:3, Interesting)

    by MankyD (567984) on Friday December 01 2006, @02:35PM (#17070630)
    (http://millionnumbers.com/)
    Why does the coder use
    int i = *(int*)
    instead of just
    int i = (int)x;


    Can someone enlighten me?
    • Re:What's with use of Pointers? by MankyD (Score:2) Friday December 01 2006, @02:38PM
      • Re:What's with use of Pointers? by chrisbtoo (Score:2) Friday December 01 2006, @02:46PM
      • Re:What's with use of Pointers? (Score:5, Informative)

        by eis271828 (842849) on Friday December 01 2006, @02:46PM (#17070866)
        (int) x would convert the floating point value to an integer (truncation, basically).
        *(int*) &x treats the bits as an integer, with no behind the scenes conversion to an actual int value.
        [ Parent ]
      • Re:What's with use of Pointers? (Score:5, Informative)

        by radarsat1 (786772) on Friday December 01 2006, @02:47PM (#17070878)
        (http://www.music.mcgill.ca/~sinclair)
        Because in C's own weird way, that's the only way of refering to a float as an int without changing the bits.

        If you do this:
        int i = (int)3.0f;

        You get i=3, like what you'd get from the floor() function.

        If you do this:
        float f = 3.0f;
        int i = *(int*)


        Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

        It's because C knows how to cast a float to an int by applying the floor function. However, if you do it the second way, you aren't casting a float to an int, you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

        By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.
        [ Parent ]
        • Re:What's with use of Pointers? (Score:5, Informative)

          by dsci (658278) on Friday December 01 2006, @03:12PM (#17071400)
          (http://www.dsbscience.com/)
          Newton-Raphson is a general algorithm for finding root of an equation f(x)=0.

          You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

          X(n+1) = X(n) - f(X(n)) / f'(X(n))

          where
          X(n+1) is the NEXT guess after the value you 'know',
          X(n) is that most recent value you know,
          f(X(n)) is the function evaluated at X(n) and
          f'(X(n)) is the first derivative of f(x) evaluated at X(n).

          It's not foolproof and a BOTH whether it converges at al AND how FAST it converges depends on the initial guess, X(0)

          The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

          Obligatory wikipediea followup: Newton's Method [wikipedia.org]
          [ Parent ]
        • Re:What's with use of Pointers? by The boojum (Score:3) Friday December 01 2006, @03:21PM
        • Re:What's with use of Pointers? by mrogers (Score:1) Friday December 01 2006, @03:30PM
        • Re:What's with use of Pointers? by fractalus (Score:2) Friday December 01 2006, @03:31PM
        • Re:What's with use of Pointers? by Abcd1234 (Score:2) Friday December 01 2006, @03:35PM
        • Re:What's with use of Pointers? (Score:5, Informative)

          by arniebuteft (1032530) <buteft&gmail,com> on Friday December 01 2006, @04:58PM (#17073326)
          The article at http://www.math.purdue.edu/~clomont/Math/Papers/20 03/InvSqrt.pdf [purdue.edu] really explains it better, but the point is to trick the code into performing integer operations on a float. You start the function with a float, which is 32 bits, arranged in a very specific sequence: first bit is sign (0 for positive, 1 for neg), next 8 bits are exponent (offset by 127 to allow positive and negative exponents), and the last 23 bits are the mantissa (normalized to assume the decimal point has a binary 1 to its left). So, the simple good old number 21 in float is the sequence 0 10000011 01010000000000000000000, for the sign, exponent, and mantissa (omit the spaces of course). That same number 21, stored as a 32-bit integer is 00000000 00000000 00000000 00010101 (again, omit the spaces).

          The trick of this function is to take the 32 bits of data that are really a float, but process it as if it's an integer. So you take that cumbersome number 21 as a float, then BAM! presto, turn it directly to an integer not through type conversion, but by simply treating those same 32 bits as if they were representing an integer all along.

          Let's use the number 21 as an example in the function call.

          The binary representation of 21 as a float is 01000001 10101000 00000000 00000000 (broken into 8-bit words for clarity). The function then goes to create a new integer i, whose value is also 01000001 10101000 00000000 00000000 (which happens to be 1101529088 in decimal). The magical line of the code, i = 0x5f3759df - (i>>1), takes that integer i, shifts its bits one to the right (turning our 01000001 10101000 00000000 00000000 into 00100000 11010100 00000000 00000000, or 550764544 in decimal), then subtracts it (still doing integer math here) from 0x5f3759df (which is 01011111 00110111 01011001 11011111 or 1597463007 in decimal), and winds up with 00111110 01100011 01011001 11011111 (or 1046698463 in decimal).

          Now, for its next trick, it takes that wonky integer 1046698463, and turns it back into a floating point number, by the same trick used above, i.e. simply by looking at those same 32 bits, and pretending they're a float, not an int. The binary representation of 1046698463, 00111110 01100011 01011001 11011111, is the same as 0.22202251851558685 in float.

          From here on out, it's all floating math. Apply the Newton-Rhapson method (thats the next line), we get x = 0.22202251851558685 * (1.5 - ( (21*.5) * 0.22202251851558685^2 )) = 0.218117811. We return this value at the closing of the function. As it turns out, the inverse square root of 21 is 0.21821789... (thanks Google calc). So, I have no idea WHY the Float to Int to Float trick works, but it works very well.

          Whew!

          [ Parent ]
        • Re:What's with use of Pointers? by fireboy1919 (Score:2) Friday December 01 2006, @05:05PM
        • Re:What's with use of Pointers? by HTH NE1 (Score:3) Friday December 01 2006, @05:12PM
          • 1 reply beneath your current threshold.
      • Re:What's with use of Pointers? by The boojum (Score:2) Friday December 01 2006, @02:48PM
      • Re:What's with use of Pointers? (Score:5, Informative)

        by ToxikFetus (925966) on Friday December 01 2006, @02:56PM (#17071070)
        Why does the coder use

        (1) int i = *(int*)

        instead of just

        (2) int i = (int)x;
        (some of my points added for emphasis)

        I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

        If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise opeartions (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

        x = *(float*)

        x now contains the value of the integer operation:

        i = 0x5f3759df - (i >> 1);

        which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.
        [ Parent ]
      • Re:What's with use of Pointers? by mooingyak (Score:2) Friday December 01 2006, @03:00PM
      • Re:What's with use of Pointers? by eneville (Score:1) Friday December 01 2006, @02:51PM
      • Re:What's with use of Pointers? by Cyberax (Score:2) Friday December 01 2006, @03:37PM
      • 1 reply beneath your current threshold.
    • Re:What's with use of Pointers? by Anonymous Coward (Score:1) Friday December 01 2006, @02:42PM
    • Re:What's with use of Pointers? by MeanMF (Score:3) Friday December 01 2006, @02:43PM
    • Re:What's with use of Pointers? by Second_Derivative (Score:1) Friday December 01 2006, @02:43PM
    • Re:What's with use of Pointers? (Score:5, Informative)

      by ewhac (5844) on Friday December 01 2006, @02:44PM (#17070820)
      (http://ewhac.best.vwh.net/ | Last Journal: Saturday August 18 2001, @10:28PM)
      C "understands" ints and floats. If you do the simple cast:

      int i = (int)x;

      Then C will simply convert the float value into an integer value (throwing away fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

      So first, we lie to the compiler by telling it we have a pointer to an int:

      (int *) &f

      And then we deference the pointer to get it into an operable int:

      i = *(int *) &f

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Schwab

      [ Parent ]
    • Re:What's with use of Pointers? by uhoreg (Score:2) Friday December 01 2006, @02:47PM
    • Re:What's with use of Pointers? by ars (Score:2) Friday December 01 2006, @02:48PM
    • Because that's what he wanted to do. by mcg1969 (Score:2) Friday December 01 2006, @02:48PM
    • Re:What's with use of Pointers? by doshell (Score:2) Friday December 01 2006, @02:49PM
    • Re:What's with use of Pointers? by systemeng (Score:1) Friday December 01 2006, @02:54PM
    • Re:What's with use of Pointers? by 91degrees (Score:1) Friday December 01 2006, @03:19PM
    • Re:What's with use of Pointers? by gnasher719 (Score:2) Friday December 01 2006, @04:30PM
    • Re:What's with use of Pointers? by prockcore (Score:2) Friday December 01 2006, @06:52PM
    • 2 replies beneath your current threshold.
  • by Stavr0 (35032) on Friday December 01 2006, @02:42PM (#17070778)
    (http://slashdot.org/~Stavr0/journal/ | Last Journal: Thursday January 19 2006, @01:18PM)
    New hotness: Fast InvSqrt()

    Naah, just kidding. They both deserve a spot in the Clever Hacks Hall of Fame

  • Hmm... (Score:2, Funny)

    by porkmusket (954006) on Friday December 01 2006, @02:43PM (#17070798)
    (http://spiderweblabs.com/)
    Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.
    • Re:Hmm... by gstoddart (Score:3) Friday December 01 2006, @02:54PM
  • Poor function name (Score:4, Insightful)

    by bloobloo (957543) on Friday December 01 2006, @02:43PM (#17070800)
    (http://paul-mclaughlin.com/)
    The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.
  • It was fast (Score:4, Informative)

    by KalvinB (205500) on Friday December 01 2006, @02:43PM (#17070808)
    (http://www.icarusindie.com/)
    http://www.icarusindie.com/DoItYourSelf/rtsr/index .php?section=ffi&page=sqrt [icarusindie.com]

    That page compares the time it takes to calculate the sqrt various ways including Carmacks. Short version is that modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have the sqrt function available.

    His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.
    • Re:It was fast (Score:5, Insightful)

      by systemeng (998953) on Friday December 01 2006, @03:19PM (#17071500)
      First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.
      [ Parent ]
      • Re:It was fast by imsabbel (Score:2) Friday December 01 2006, @03:41PM
        • Re:It was fast by systemeng (Score:1) Friday December 01 2006, @04:40PM
      • Re:It was fast (Score:5, Informative)

        by adam31 (817930) <[moc.liamg] [ta] [13mada]> on Friday December 01 2006, @04:07PM (#17072390)
        Okay, let's try x86...

        rsqrtss xmm1, xmm0
        about 5 cycles. And it can pipeline.

        Not a fan of x86? Maybe altivec...
        vrsqrtefp V2, V1
        depends, but 12 cycles probably and pipelined.

        On PS3's SPU it's rsqrte (6 cycles), on 3dNow it's pfrsqrt (8 cycles) both pipelined. Even PS2 had rsqrt (13 cycles). There's just no reason for software reciprocal square root. It's a cool trick, but it's not even useful anymore.

        [ Parent ]