Understanding Quake's Fast Inverse Square Root (betterexplained.com)
88 points by ed on Jan 4, 2009 | hide | past | favorite | 23 comments


IIRC (it's been a long time) the Cray 1 could do inverse, multiply, and square root in single cycle instructions. Isqrt would take two cycles as wood divide. (Back to the rocking chair.)


Actually, it didn't (see http://klausler.com/cray2.txt). It had an instruction for a square root approximation and another instruction to iteratively refine it toward the real square root.
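The pattern described here, a cheap hardware approximation followed by an iterative refinement step, can be sketched with one Newton-Raphson step (this is an illustration, not the actual Cray microcode):

```c
/* Given a rough seed s for sqrt(x), one Newton step on
   f(s) = s*s - x roughly doubles the number of correct digits. */
float refine_sqrt(float x, float s)
{
    return 0.5f * (s + x / s);
}
```

For example, seeding sqrt(2) with 1.5 yields about 1.4167 after a single step, already within 0.2% of the true value 1.41421.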

But even if a certain computer did compute a square root (or even an approximation) in a single cycle, that would mean the clock had to be slowed so the more complex operation could complete within a single cycle, and simpler operations like adds, subtracts, and memory moves were sacrificed in its favor.


Not necessarily. The square root hardware could be heavily pipelined to get each stage short enough for one cycle. This increases latency, but lets you keep a one-per-cycle throughput without slowing down the clock.


I didn't know that Crays could split timber!


It's really easy to understand if you know what the Newton Method for computing roots of a polynomial is.


It's easy to understand the algorithm. But that initial guess is where the magic is. That's what makes the algorithm work so efficiently.
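For reference, the widely circulated Quake III routine combines that bit-level initial guess with a single Newton step. A sketch (rewritten with memcpy instead of the original pointer cast, which is undefined behavior under modern compilers):

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root, as popularized by Quake III. */
float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);       /* reinterpret the float's bits as an integer */
    i = 0x5f3759df - (i >> 1);      /* the "magic" initial guess */
    memcpy(&x, &i, sizeof x);
    x = x * (1.5f - half * x * x);  /* one Newton-Raphson refinement step */
    return x;
}
```

One iteration lands within roughly 0.2% of 1/sqrt(x); the original code includes a commented-out second iteration for tighter accuracy.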


Cool, but if you really want to blow your mind, look at some of the tricks for diagonalizing Hessians!


John Carmack FTW.


Although Carmack developed some incredible tech in the past and now runs a successful (I think?) rocket-building team, he seems to have missed the mark in graphics the last few times. He also deserves praise for open-sourcing his old engines, but there are definitely people who have made greater contributions to graphics in recent times.

The Quake 3 engine added procedural curved surfaces which were computed on the CPU. Meanwhile, the Unreal engine simply optimized the rendering of arbitrary meshes, which allowed artists to build any shape of surface they wanted and then let the GPU brute-force it.

The Doom 3 engine added a unified lighting model based on stencil shadows. Again, this was a heavily CPU based technique while the industry was trending towards being CPU-bound. Additionally, the chosen lighting equations created a very plastic-like look. Competitors used shadow maps which are generated by the GPU and these are still the standard in speed and quality for most games today.

The latest engine he is working on is pushing "mega textures" which effectively emulate a hardware paging technique in software. "Mega textures" is the next logical step in mip-mapping, but mip-mapping has been performed by the graphics card for over a decade now. The SeaDragon team over at Microsoft Research and Live Labs built a superior version of this idea before Carmack started on this and are working actively with hardware vendors and the Direct3D team to speed it up for use in games.

Once again, Carmack has opted to trade precious CPU cycles -- better budgeted for game logic, physics, and AI -- for generality. He seems to be fighting a crusade against special cases in his engines at the expense of the games made with that engine.

There is a great reason why the Unreal Engine has been the most popular middleware platform since Quake 3...


I'm talking without much experience here

"Again, this was a heavily CPU based technique while the industry was trending towards being CPU-bound." -- isn't the industry now trending in the opposite way? ie - everything is becoming a CPU (and it has more and more cores)


Yes and no.

GPUs are becoming more CPU-like. They are gaining more general-purpose instructions and supporting additional complexity in computations.

CPUs are becoming more GPU-like. They are increasingly pipelined and parallelized.

However, there are still fundamental differences between CPUs and GPUs that make them suited for varied tasks. CPUs tend to support branch prediction (although not always, see game consoles) and various volatile threading operations. GPUs gain a lot of their speed by making assumptions about side effects and causality in their calculations. Branching is rarely, if ever, needed at runtime and almost all of the operations performed by most applications can be encoded as vector operations.
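As a toy illustration (not from the thread) of why GPUs can largely avoid runtime branching: a per-element if/else can be computed as a branch-free select, the way SIMD/GPU lanes evaluate both sides and blend the results with a mask:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: pick a if cond > 0, else b, with no branch.
   The mask is all-ones or all-zeros, mirroring how vector hardware
   blends per-lane results. */
float select_branchless(float cond, float a, float b)
{
    uint32_t mask = -(uint32_t)(cond > 0.0f);  /* 0xFFFFFFFF or 0x00000000 */
    uint32_t ai, bi, ri;
    memcpy(&ai, &a, sizeof ai);
    memcpy(&bi, &b, sizeof bi);
    ri = (ai & mask) | (bi & ~mask);           /* blend by mask */
    float r;
    memcpy(&r, &ri, sizeof r);
    return r;
}
```

Both sides get evaluated unconditionally, which is wasteful on a scalar CPU but exactly what keeps thousands of GPU lanes in lockstep.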

The future of powerful computers is probably a hybrid of various processing units, memory architectures, and special purpose hardware. We're going to need software abstractions to deal with this complexity.


Concerning software abstraction layers: What do you think of OpenCL? It's advertised as a way to utilise the combined computing power of multi-cores and GPUs. I am not at all an expert on this but it seems just too good to be true.


I think all of the C-style languages are doomed in the increasingly parallelized world. I don't know much of OpenCL specifically, but all of the HLSL/Cg/GLSL derivatives seem woefully unprepared for tasks which are not "embarrassingly parallel". Some of the newer derivatives might be capable of solving tougher problems, but that doesn't mean they are good at it.


For a guy who got his name by cutting corners in clever ways (the 2.5D engines we see in Wolfenstein and DooM), I find it really odd that he's sort of going the other way.


Maybe his bet is that CPUs will become massively multicore "soon enough".


Quake 3 Arena was released in 1999.

When designing rendering engines, the general goal is to design for 80% to 90% of the expected market conditions when your game releases. Considering most games have a 1 to 3 year development cycle, this seems like a weak bet.


Carmack or Michael Abrash? Read Graphics Programming Black Book ( http://www.amazon.com/Michael-Abrashs-Graphics-Programming-S... )

Abrash did amazing things with assembly to get Quake to run on the 486.

I don't know if Carmack, Abrash, or someone at SGI (it's in OpenGL source code too) came up with it, but it seems like witchcraft to me.



According to this article, neither of them: http://www.beyond3d.com/content/articles/8/

That article also links to a paper (http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf) analyzing the initial approximation constant.


The Black Book is available as PDFs from http://www.gamedev.net/reference/articles/article1698.asp

Byte.com hosted a copy for a long time, but that seems to have issues atm.


I had a 486 when Quake came out. The framerate was far too low to make it playable. I remember trying to play the demo, and I had to cut the screen down to the size of a postcard (something like 160x100) to get a framerate of 25fps.

The first Pentiums, on the other hand, were more than up to the job.


The 486!? Wow


Yeah, I actually played it (choppily) for a while on my 486.



