Lazy web sites run faster (gojko.net)
30 points by drm237 on April 7, 2008 | hide | past | favorite | 11 comments


Not to nitpick (because I found the rest of the article fairly interesting), but it seems like he has a poorly chosen metaphor right off the bat.

"The key to solve this problem lies in the classical definition of speed. In Physics, speed is defined as the distance divided by time required to cross that distance. So we can make the requests run faster either by decreasing the time, but we can also do it by shortening the distance."

If S = D/T, then to increase speed while holding time constant, you'd need to increase distance, not decrease it. I think it would be better to call the performance he's talking about what it really is: throughput.

If you have a big pipe spewing water, you can increase the throughput by either increasing the pressure of the water (optimized code) or by removing the rocks in the pipe (long-running/synchronous I/O threads). (Removing said rocks increases cross-sectional area.)
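The pipe analogy can be made concrete: throughput is cross-sectional area times flow velocity, so raising either factor helps. A toy sketch (the numbers are purely illustrative, not from the article):

```python
# Throughput of a pipe = cross-sectional area * flow velocity.
# Raising either factor raises throughput -- the analogy the comment
# draws for servers: faster code vs. fewer blocking "rocks".
def throughput(area, velocity):
    return area * velocity

base          = throughput(area=2.0, velocity=5.0)   # 10.0
more_pressure = throughput(area=2.0, velocity=10.0)  # "optimized code"
fewer_rocks   = throughput(area=4.0, velocity=5.0)   # "non-blocking I/O"
print(base, more_pressure, fewer_rocks)              # 10.0 20.0 20.0
```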


I was a little surprised to get to the end of this and see no discussion of actual asynchronous web servers, using non-blocking I/O rather than threads. I write such things often, using Twisted. It's easy to handle a few thousand concurrent requests that way, but it doesn't seem to be an approach that's widely used, for whatever reason.
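The single-threaded event-loop idea can be sketched with Python's standard-library `selectors` module (this is an illustration of the approach, not Twisted's actual API). In-process socket pairs stand in for client connections:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def on_readable(conn):
    """Handle one ready socket; a real server would parse a request here."""
    data = conn.recv(1024)
    if data:
        conn.sendall(data.upper())  # pretend "processing", then reply
    else:
        sel.unregister(conn)
        conn.close()

# Register a few in-process connections to stand in for thousands of clients.
clients = []
for _ in range(3):
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, on_readable)
    clients.append(client_side)

for i, c in enumerate(clients):
    c.sendall(b"request %d" % i)

# The event loop: handle every socket that is ready, one thread,
# never blocking on any single connection.
handled = 0
while handled < len(clients):
    for key, _ in sel.select(timeout=1):
        key.data(key.fileobj)
        handled += 1

replies = [c.recv(1024) for c in clients]
print(replies)  # [b'REQUEST 0', b'REQUEST 1', b'REQUEST 2']
```

The point of the pattern: no thread ever sits blocked on a slow connection, so a single thread can multiplex thousands of sockets.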


I was also surprised he doesn't mention this.

One thing that holds true, however, is that a browser won't make another request to a single server if two requests are still outstanding. So if, say, you have a Comet connection open and a long-running server-side AJAX request, none of your other AJAX requests will run. This might make the interface appear to hang, regardless of how many connections your web server can handle. I would say the general idea of returning quickly is probably a good one.

I'm sure a lot of you have seen this already, but I love it: http://www.sics.se/~joe/apachevsyaws.html


Agreed. Even Apache supports a variety of other models with its mpm modules. You don't have to stick to the defaults even if you do use Apache.


Very few people write Web servers or other networking code; they just write code that plugs into some existing Web server (usually Apache) or framework (e.g. Rails).


I move everything that requires more work than just processing and displaying data to a simple little batch server. So, for example, if you submit a "contact us" form, the email isn't sent immediately. The info from the form is just stored, and a signal is sent to the batch server.

It seems trivial, but it is quite important.
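The "store now, send later" pattern can be sketched with a queue and a worker thread (hypothetical names; in the commenter's setup the queue is actually a database table, and the worker is a separate batch server process):

```python
import queue
import threading

jobs = queue.Queue()  # stands in for the batch server's database queue
sent = []

def web_request(form_data):
    """Fast path: the web request only records the work and returns."""
    jobs.put(form_data)

def batch_worker():
    """Slow path: actually 'send the email' for each stored job."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down the worker
            break
        sent.append("emailed %s" % job["to"])
        jobs.task_done()

worker = threading.Thread(target=batch_worker)
worker.start()

web_request({"to": "alice@example.com"})   # returns immediately
web_request({"to": "bob@example.com"})

jobs.put(None)
worker.join()
print(sent)  # ['emailed alice@example.com', 'emailed bob@example.com']
```

The web request's latency no longer depends on how long the email (or any other slow task) takes.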


The other great thing about this is that your batch script can essentially keep plugging away behind the scenes when the data set to process is large (e.g. a newsletter list to send). Even if something goes wrong and the batch script fails, maintaining a queue and monitoring the script via cron gives you a simple level of fault tolerance built in.


Correct. Those sorts of cases were the main motivation for building the batch server.

In regard to the other point: the batch server should never fail, since each task is spawned off in its own thread; and even if it does fail, the tasks are in fact stored in a database.

Fun times!


Assuming your queue doesn't die with your script :)


I believe that's what DBs and log files are for ;)


Ah man, that sounds like it takes forethought and effort. Just stick it in RAM and pray ;).



