How to Survive a Slashdotting on a Small Apache Server

…so your plain ordinary webserver just got listed on a high-traffic news site.  Slashdot?  Reddit?  Hacker News?  Well done, turns out you’re hosting something thousands of people want to read.  Now thousands of people want to come to your webserver at once.

But you’ve got a problem.  Your server’s hosed and none of those potential readers can see it.  Worse, your story will only be top of Slashdot/Reddit/whatever for a few hours, which means you’ve got to fix the problem right now before the world loses interest.  You haven’t time to buy new hardware, set up a complex load-balancing solution or rewrite half your app’s codebase.  What are you going to do?

Follow uncle Alex’s checklist and we’ll soon have you back on the air.

This is written on the assumption that you’re running a standard, out-of-the-box setup on RedHat, Ubuntu, CentOS or some other common Linux distro with SysV init scripts.  It also presumes you’re using the prefork MPM (the default), MySQL as your database and no other major apps running on the box.  If you’re already running some funky config then this tutorial isn’t for you.

NB: Uncle Alex makes no warranties about his checklist and accepts no liability if it destroys your server, fails to solve the problem, runs over your wife or sleeps with your dog.  Follow these instructions at your own risk.  But you’re desperate, right?

1) Gain Access

If your server is really hosed you will find it hard to SSH in.  If you have console access (e.g. on a virtual hosting package, HP’s excellent iLO, or you’re stood in front of the machine) log in through that – you’ll save a minute waiting to negotiate an SSH session into your overloaded machine.

Try to run ssh inside a terminal with a scrollback buffer (PuTTY, the Mac Terminal, gnome-terminal, whatever) because your server may be so laggy it’s quicker to hunt for the output of previous commands than to run them again.  If you’re running SSH from the command line use the ‘-v’ flag so you can see how the connection progresses.

It might take a while…

mock@hostname:~$ ssh -v mock@my.slashdotted.server
OpenSSH_5.2p1, OpenSSL 0.9.8l 5 Nov 2009
debug1: Reading configuration data /etc/ssh_config
debug1: Connecting to my.slashdotted.server [x.x.x.x] port 22.
debug1: Connection established.
....
....
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
....
....
You have mail.
Last login: Sat Nov  6 02:13:31 2010 from your.desktop.computer

mock@my.slashdotted.server:~$

Be patient.  Resist the temptation to open multiple sessions when the first doesn’t connect immediately: you’ll only load the machine even more.  For each connection the server needs to negotiate a secure pipeline, do authentication for your user and start a process for your shell.  On a healthy Linux box this is usually instantaneous – but right now yours ain’t healthy.

The waiting is the hardest part.

…………………………………………………… connected?

Good.

Best practice be damned – become root straight away.  Yeeee-haw cowboy.

mock@my.slashdotted.server:~$ sudo -s    # or 'su -' if you're in 1990

2) Now to Work.  Diagnosis…

You’re on a heavily loaded machine.  Commands will take forever to complete so you’ve got to make each one count.

First we want to diagnose the problem.  Since interaction with a hosed server is painfully slow we need to do this in as few steps as possible.  But we don’t want to kill Apache and free up resources before we’ve examined the system’s state to figure out what’s wrong…

Bandwidth or server load?

A slashdotting can kill your server in one of two ways: saturating its internet connection with the data you’re serving (rare these days, ISPs have a lot of bandwidth) or inundating your host with many more connections than it can handle.  We want to figure out which it is as quickly as possible.

root@my.slashdotted.server:~$ time cat /proc/loadavg
45.25 31.26 22.73 1/135 27844

real    0m3.002s
user    0m2.002s
sys     0m1.000s

Count in your head how long this really takes to return output.  One elephant, two elephant, three elephant…

The `cat /proc/loadavg` part is obvious: you want an idea of the system’s load.  But why `time`?  Because it tells us how long it took your host to load the `cat` program into memory, run it and print out the contents of /proc/loadavg.  Perhaps it took a long time because the system’s loaded, or perhaps it ran quickly and the result took forever to reach you through the congested network.  And the best part is that if your shell is bash, `time` is a builtin – so we know it imposed almost no overhead of its own.

So does the machine’s timing for ‘real’ match what you counted in your head?

  1. If `cat /proc/loadavg` ran quickly but the load is high, your network connection is likely okay.  But the load is huge – what do you do?  Read on…
  2. If the load is low (say <10) but it took forever for that output to reach you, your server’s network link is probably saturated.  Your server is healthy, it just has a bandwidth issue.  Skip ahead to section 4 – or run the quick throughput check below first.
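
If you want a second opinion on the bandwidth question you can sample the kernel’s interface counters a second apart.  This is only a rough sketch – it assumes your public interface is called eth0 (adjust to taste) and reads the transmit-bytes column from /proc/net/dev:

# How much data is the box pushing out right now?  Sample the eth0
# transmit-bytes counter twice, one second apart (field 9 after the colon).
tx_bytes() { awk '/eth0:/ { sub(/.*:/, ""); print $9 }' /proc/net/dev; }
TX1=$(tx_bytes); sleep 1; TX2=$(tx_bytes)
echo "outbound: $(( (TX2 - TX1) * 8 / 1000 )) kbit/s"

If that figure is bumping up against whatever your ISP actually gives you, it’s a bandwidth problem.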

3) Getting the Load Down (or “It’s almost always the RAM”)

Look at the first three figures in /proc/loadavg.  On a healthy Unix server the load averages ought to be no higher than your number of CPUs.  A little higher (maybe 2x) is bearable but really high numbers mean you’ve got a system load problem.  Figures as extreme as the “45.25 31.26 22.73” in our example would triangulate your position as “up shit creek”.

In case you were thinking “hey, how many CPUs have I got?” here’s a simple way to find out:

root@my.slashdotted.server:~$ grep ^processor /proc/cpuinfo
processor       : 0
processor       : 1

You’ll see a ‘processor:’ line for each CPU.  The server in this example has two cores.

For a better understanding of load averages read the excellent page at LinuxInsight.  In short – if the first number is higher than the last your problem is getting worse.

How many apache processes?

One dead giveaway for system load issues is the number of Apache processes.  That’s not to say Apache itself is the problem – but if you have hundreds of them backing up on a small server it means something’s wrong…

# on debian/ubuntu
root@my.slashdotted.server:~$ ps ax | grep -c /usr/sbin/apache2
151

# on redhat/centos/suse etc.
root@my.slashdotted.server:~$ ps ax | grep -c /usr/sbin/httpd
151

151?  That’s a lot of processes.  Why, if you’d maxed out Apache’s default ‘MaxClients’ setting of 150 then added one more for the `grep` it’d be 151.  I wonder if that’s a coincidence…
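
(Incidentally, that `grep -c` is counting its own process too, which is why the arithmetic works out so neatly.  If you want the count without the off-by-one, `pgrep` will oblige – swap in httpd on RedHat-style systems:)

root@my.slashdotted.server:~$ pgrep -c apache2
150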

If you’re running a modern webserver – one where Apache has modules enabled for PHP, Perl, Python or some other language interpreter – each of those processes is going to be big.  So big that together they eat way more RAM than your computer has.  For example on my little 32-bit Ubuntu 10.04 VM each Apache process eats 27MB of physical RAM.  And that’s with mod_php alone; add Python or Perl as well and it gets bigger.  If you live in 64-bit land, bigger still.

On a machine with limited memory a 32-bit OS is almost always better.

Do you have enough RAM to run 150 processes which each need 27MB?  Apache alone would eat around 4GB before you even begin to think about OS services or MySQL.  Unless you have tons of RAM you can’t afford that many.

[No, I don’t know why the distros all ship with a default so high.  Naughty Apache, why can’t you autodetect it and set a sensible value at start time?]

So let’s figure out how much memory each of your Apache processes is using:

root@my.slashdotted.machine:~$ top -u www-data -b -n 1
top - 21:21:32 up 3 days, 26 min,  2 users,  load average: 66.6 66.6 66.6
Tasks: 301 total,   1 running, 300 sleeping,   0 stopped,   0 zombie
Cpu(s):  ...
Mem:    408944k total,   310848k used,    98096k free,    12172k buffers
Swap:  1048568k total,   574372k used,   474196k free,    93884k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16228 www-data  20   0 64072  26m 4996 S  0.0  6.7   0:02.49 apache2
16275 www-data  20   0 62564  25m 3672 S  0.0  6.4   0:02.66 apache2
...
...

On a RedHat-derived machine you’ll want to use “-u apache” – they do things differently over there.
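
If you’d rather have one headline figure than eyeball top’s RES column, you can sum the resident set sizes directly.  A rough sketch (RSS counts shared pages once per process, so the total slightly overstates reality; use httpd instead of apache2 on RedHat-style boxes):

# Total and average resident memory of all Apache workers, in MB
ps -C apache2 -o rss= | awk '{ sum += $1; n++ }
    END { if (n) printf "%d processes, %.0f MB total, %.1f MB each\n", n, sum/1024, sum/1024/n }'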

[Image from Wikimedia Commons: a hard drive’s read/write head.  Right now half your RAM is on the disk and only this wee critter can get to it.  He ain’t in a hurry…]

Woah buddy, look at that “Swap:  1048568k total,   574372k used” line.  That means over half a gigabyte of memory has been pushed out to disk.  That ain’t good; your computer’s been reduced to the speed of that little mechanical head skittering around inside your hard drive.  The Linux kernel does its best to swap out the less-used pages of RAM first but with the kind of overload you’re seeing, half your code is executing out of swap.  Mr Babbage is on the phone and he wants his mechanical computer back…

Now you know enough about the problem to stop Apache.  Kill it dead with:

# in RedHat/Suse/Centos:
root@my.slashdotted.machine:~$ /etc/init.d/httpd stop

# in Debian/Ubuntu
root@my.slashdotted.machine:~$ /etc/init.d/apache2 stop

It might take a while for all those processes to terminate.  Apache will try to finish any requests it was servicing but if you have 150+ processes all fighting for memory and CPU they’ll take a minute or two to come back from swap and terminate.

All stopped? Good. Already the load is starting to fall and your machine is much more responsive to terminal commands.  You’re off the air so let’s keep it short and sweet….

Open up Apache’s main config file in your favourite editor.  In RedHat land it’s /etc/httpd/conf/httpd.conf; in Debian/Ubuntu you’re looking for /etc/apache2/apache2.conf.

If you’re running with the standard config that came with your machine Apache is likely using the prefork MPM.  This means it runs a bunch of processes waiting for incoming connections and will always try to keep a few spares ready for the next client.  If you have enough memory that’s great; otherwise it results in what’s called “swapping yourself to death”.

Look for the MPM settings. It’ll be a section of config something like:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers       10
    MaxClients            150
    MaxRequestsPerChild   0
</IfModule>

Most of these settings are sane.  But look at MaxClients: it’s the upper limit on the number of processes Apache will run when the whole world is trying to connect.  Reduce it to a level commensurate with your machine’s RAM.

As a rule of thumb:

  • For 512MB of RAM MaxClients should be no higher than 20
  • For 1GB, 20
  • For 4GB, 75

That default of 150 is way too generous for a small server.  It comes from a more innocent time before everyone had mod_php (or python or perl or whatever) loaded into Apache and processes were a lot smaller.
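
If you’d rather calculate than guess, the back-of-the-envelope version is simply the RAM you can spare for Apache divided by the size of one Apache process.  Here’s a hedged sketch: the 150MB reserve for the OS and MySQL is a pure guess you should adjust, and if Apache isn’t currently running it falls back to assuming 25MB per process:

# Rough MaxClients suggestion: spare RAM / average Apache process size
RESERVE_MB=150                         # guess at what the OS + MySQL need
TOTAL_MB=$(awk '/MemTotal/ { print int($2/1024) }' /proc/meminfo)
PROC_MB=$(ps -C apache2 -o rss= | awk '{ s += $1; n++ }
    END { if (n) print int(s/1024/n); else print 25 }')
echo "suggested MaxClients: $(( (TOTAL_MB - RESERVE_MB) / PROC_MB ))"

Treat the answer as a starting point, not a law of physics.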

KeepAlives considered harmful

So now you have Apache configured to run a number of processes roughly in line with what your hardware supports.  While you’re tinkering with it there are a couple more useful changes you can make:

  1. Search for the “Timeout” setting.  Usually it’s way too high – something like 300 seconds.  This means that if a client browser’s connection to you hangs (like if their DSL cut out or their PC crashed) the Apache process you had serving them will hang around for 300 seconds in case they come back again.  Screw ’em, drop that value right down.  15 seconds is enough.
  2. Search for the “KeepAliveTimeout” setting.  I could write a whole essay on the way modern browsers are greedy with HTTP KeepAlives.  If you can only afford a handful of Apache processes do you really want each tied up by an idle client waiting for their user to click the next link?  Until your load spike subsides set it to something like how long it takes to deliver your pages to the average user.  3 seconds, not 30.  HTTP KeepAlives deserve an essay all of their own – maybe one day I’ll sit down and write it.  [I did]
  3. Depending on the kind of content you’re serving you might even switch KeepAlive to “Off”.  If you’re only serving one or two objects in the average request (i.e. the page getting hammered does not contain much CSS or many images) KeepAlive costs you more under high load.  Think of this again if you shift content off to a CDN.  (The example block after this list pulls these settings together.)
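
Pulled together, the relevant parts of the config might end up looking something like this on a 1GB machine.  These are illustrative values, not gospel – plug in the MaxClients figure you worked out for your own RAM:

Timeout              15
KeepAlive            On
KeepAliveTimeout     3

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers       10
    MaxClients            20
    MaxRequestsPerChild   1000
</IfModule>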

For the time being resist the temptation to mess with your php.ini or any other interpreter-level settings.  They are complex and quick to anger.  You want the low-hanging fruit.

By now you should have a sane Apache config, one at least vaguely suited to your hardware.  Start the service up again…

# Ubuntu/Debian
root@my.slashdotted.machine:~$ /etc/init.d/apache2 start

# RedHat/CentOS/Suse
root@my.slashdotted.machine:~$ /etc/init.d/httpd start

If you made a mistake in the config Apache should tell you here. But probably you’re fine and Apache will go on its merry way.
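
(Next time you can catch typos before taking the site down and up at all – both flavours of Apache will check the config syntax without restarting anything.)

# Debian/Ubuntu
root@my.slashdotted.machine:~$ apache2ctl configtest

# RedHat/CentOS/Suse
root@my.slashdotted.machine:~$ httpd -t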

So now you have a webserver where:

  • Fewer clients can be connected at once, which means fewer of those heavyweight Apache processes will be competing for memory at any one time.  Those processes which do exist now run a lot faster.
  • Clients are not permitted to stay on the line for ages after you’ve delivered their web page.  If KeepAlive is still on they get cut off after KeepAliveTimeout seconds; if you disabled it entirely they get one object per connection and the process is immediately free to serve another user.

Run `top` and keep an eye on those load figures.  Are they going up or down?  Is your website responsive again?
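
If you’d rather not keep re-running top by hand, something like this refreshes the headline numbers every five seconds (swap in httpd for apache2 on RedHat-style boxes):

root@my.slashdotted.machine:~$ watch -n 5 "cat /proc/loadavg; pgrep -c apache2"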

What about the database?

Of course, there might be some underlying reason why your Apache processes were backing up like the holding pattern above Heathrow.   Maybe it was browsers holding open persistent connections forever (which you just stopped) or maybe it’s something else.  Maybe your code running inside all those Apache processes is waiting to use some shared resource on the system, something that’s dog-slow and causes each process to pause until it’s available.

People have described optimising MySQL far better than I can in this article, but here are a few tips:

  • Allocate an amount of RAM (mainly key_buffer_size and innodb_buffer_pool_size in my.cnf) appropriate for your server.  The defaults are measly; allowing MySQL to hold more data in memory can be a big win (see the sketch after this list).
  • Workloads vary, so there are no hard-and-fast rules.
  • Use `SHOW PROCESSLIST` to see which queries are running right now.
  • The query cache rarely helps – any write to a table invalidates its cached results.
  • Use the slow query log to identify queries that take a long time to run.
  • Slow queries can often be sped up by adding an index to the appropriate columns.
  • Use the smallest data type available for each column (unsigned integers where appropriate).  The smaller you make your table, the more of it will fit in RAM.
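
For example, giving MySQL a more realistic amount of memory and switching on the slow query log might look something like this in my.cnf.  The numbers below are purely illustrative (imagine a 1GB box running InnoDB), and the slow-log variable names have changed between MySQL versions, so check which ones yours uses:

[mysqld]
key_buffer_size         = 32M     # MyISAM index cache
innodb_buffer_pool_size = 256M    # InnoDB data + index cache
slow_query_log          = 1       # MySQL 5.1+; older versions use log_slow_queries
slow_query_log_file     = /var/log/mysql/slow.log
long_query_time         = 1       # log anything slower than one second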

I could go on.

The single biggest thing you can do to solve database issues is hit the database less.  See section 5 for suggestions on caching with common web applications.

4) Reducing your Bandwidth

So it turns out your server is connected to the Internet via a piece of wet string and you’re trying to push way more data than it can handle.  Did your ISP tell you it was on an “unlimited” 100Mbit connection?  Did they tell you how many users that 100Mbit is shared between?  Oops.

Or maybe you’re just terrified of the bandwidth bill.  Either way, while enduring a traffic spike it’s a good idea to reduce your network traffic as far as possible.  The benefits are not solely financial:

  • Delivering less data to your client means they need to be connected to your host for a shorter time.  Think how long it takes to deliver a 300k jpeg to the average DSL user – at least a couple of seconds, time your valuable Apache process could be using to run the code which generates your dynamic content – the only bit that really changes from user to user.
  • You want to deliver as few objects as possible per page impression.  The more you can shift to someone else’s server (or even eliminate entirely!) the better.  Get it down to the magic number of 1 and you can turn off KeepAlives and free up a lot of Apache processes.

How to do this?  Simple answer: move static assets out to a CDN.

“But that’ll take ages!” you wail.  “I haven’t time to do a deal with Akamai or open an Amazon CloudFront account!”

Don’t worry, there are some very simple CDNs out there.  Coral Cache is here to save your day.  For free these guys will cache static objects from your server and relay them to your visitors.  At times of low load the Coral Cache slows down your site (objects fall out of the cache and Coral must re-retrieve them for every hit) but when you’re sustaining a dozen hits per second that isn’t a problem.  Just remember to switch it off (or get an account with a proper CDN) once the panic is over.
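
Using Coral is as simple as sticking .nyud.net on the end of the hostname in an asset’s URL – http://my.slashdotted.server/images/kitten.jpg becomes http://my.slashdotted.server.nyud.net/images/kitten.jpg.  If editing your templates by hand is too slow, mod_rewrite can bounce image requests over for you.  This is a sketch only: the paths are made up, and the user-agent and query-string checks (reportedly how Coral’s own fetchers identify themselves) are there to stop Coral being redirected back to itself – verify them against Coral’s current documentation before relying on them.

# In the vhost config, with mod_rewrite enabled.
# The two RewriteConds stop requests from Coral itself being bounced back
# (the CoralWebPrx user agent and coral-no-serve flag are assumptions - check
# Coral's docs before trusting them).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
RewriteRule ^/images/(.*)$ http://my.slashdotted.server.nyud.net/images/$1 [R,L]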

Big stuff first

You don’t have hours to spend figuring out what to move.  Small objects (little user icons, spacer gifs, menu options) aren’t your problem – look at the pages seeing the highest traffic and figure out what the largest static objects are.  Probably they will be article images (e.g. that 150k jpeg of your mom) and maybe, if there’s a lot of it, your CSS and Javascript includes.  In almost all cases the body text of the page is the least of your worries.

Firefox will help  you figure out what the largest objects within a page are…

  1. Navigate to your page
  2. Right-click for the context menu
  3. Click “view page info”
  4. Move to the “Media” tab.  You’ll see a table of objects loaded as parts of the page.
  5. At the top-right hand corner you’ll see a button for selecting what columns the table should show.  Ensure “size” is checked.
  6. Look at the sizes of objects included in your page.  Starting with the largest items ask yourself “does this really need to be generated by my server for each hit?”

Of course, if your server’s really hosed loading the page will be hard.  You’ll just have to guess what the largest assets were.  Unless you’ve some big embedded media (videos, audio) it’s almost certainly the images.
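
You can also ask the server for an individual object’s size straight from the command line (or better, from another machine so you’re not adding to the load).  The image path here is just a stand-in for whatever your hammered page includes:

mock@hostname:~$ curl -sI http://my.slashdotted.server/images/header.jpg | grep -i content-length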

[Screenshot: Firefox’s Page Info window, Media tab, listing the page’s objects and their sizes.]

Hope the nerds don’t mind me using their page about steam engine design as an example.  Maybe steam engine design will come back into vogue and they’ll get slashdotted.

Immediately we can see the best candidates for moving off the server: those jpeg images add 131k per page view. If you have fifty hits a second that’s going to chew 6.4 megabytes of bandwidth every second.  Yowch.  I don’t wanna be the guy who pays your hosting bills!

Related: this is why big sites (eBay, Facebook etc) use a separate webserver for big static assets like images.  Delivering them takes much longer than the page’s dynamic content and can be done much more efficiently using simpler types of webserver.

5) Advanced: Caching

Probably the page getting hammered is being served using some third-party application you installed from the Internet.  Examples here are WordPress, Gallery, Drupal and Joomla.

Caching at the app level

Bad news: in their default states these applications are rarely efficient.

Good news: for most of the common web applications someone else has already noticed the problem and written a plugin to fix it.  See if you can find a caching plugin that’ll store generated content and regurgitate it for future hits.  I did this for my WordPress installation and won at least 10x extra performance for a few seconds’ work.

I won’t attempt to list the caching plugins for every big web app; a quick search for your app’s name plus “caching plugin” will turn up the popular ones.

Caching at the PHP level

Is the app you’re serving written in PHP?  There’s an easy win to be had by installing one of the PHP opcode caches.  I favour XCache but it’s a matter of personal choice.

# Ubuntu/Debian
root@my.slashdotted.machine:~$ apt-get install php5-xcache
root@my.slashdotted.machine:~$ service apache2 restart

# RedHat/CentOS/Suse
root@my.slashdotted.machine:~$ yum install php-xcache
root@my.slashdotted.machine:~$ /etc/init.d/httpd restart

Tweak the settings if you like but in both cases the cache modules should give a big improvement straight out of the box.

What does it do?  Two things:

  1. By default, for every hit on a dynamic page PHP loads the code from disk, parses it and then executes it.  A code cache skips the first two steps – once the code has been loaded and compiled it’s kept in memory for later use.  Unless you’ve changed your code (by default the module calls stat() on the file to check) subsequent hits mean a lot less work for PHP.
  2. It provides a special area of memory shared between all the processes serving your data.  If your application is written to use it (look for a plugin) this can be used to cache generated page content and other commonly loaded data.

6) Conclusions

If you’re running a stock install of any OS on minimal hardware as your webserver it’s common to run into problems under load.  There are plenty of simple, tried-and-tested techniques for reducing that load – with a few minutes and a little knowledge of your server’s internals you can squeeze an awful lot more performance out of a simple machine before needing to buy more hardware.

41 thoughts on “How to Survive a Slashdotting on a Small Apache Server”

  1. Nice summary, Alex, thanks. If you can tolerate stopping the webserver at all, then stop it straight away – whether you’re suffering CPU-bound load, IO-bound load (as you say, the most common) or network load, all of them are being driven by the webserver. Since on Linux the loadavg is only sampled every 5s, you’ll actually lose very little useful information about what’s going on by doing this.

    In respect of using MySQL for the database, apart from the standard advice of “don’t”, it’s important to understand what your load generators are. I recently wrote some stuff in my work wiki space (unfortunately not accessible to non-work people), about IO-load and databases. If you’re dealing with high numbers of reads, then more memory is the key. If you’re dealing with high numbers of transaction commits, then (if you have replication, you need to double this) you’ll have at least one IO operation for every transaction commit to the transaction log, as it’s forced to fsync(). This is, however, in a linear file, and so can be dealt with fairly well, if you, for example, stick that transaction log on its own RAID controller, with its own disks (and battery/capacitor-backed RAM or flash). Swap is harder, here, of course.

    The worst bit about all of the above, however, is the number of people who haven’t tuned their apache settings before the crisis hits, and the moral of the story is really – do the tuning you’ve very nicely highlit (how much memory do you want to use for Apache, for MySQL etc, cache like hell – APC, xcache or any of the others) before it actually becomes a problem, and if you can be clever with your image and other asset domains in advance (images.mocko.org.uk, for example), then it’s easier to turn them into CDN-hosted bases. Of course, if you expect regular high traffic, then going to Akamai, despite the cost, will probably be worth it…

    1. The points you make are all spot on. Databases are the one most in my attention at the moment – I’ve wrestled with them for weeks on end and while I always knew DB machines need massive IO it was a surprise that it should need to be so carefully allocated between different applications and databases. I assumed MySQL would be smarter; an RDBMS should be no more than a clever proxy between data on disk and one’s network cable. It isn’t yet and there’s still a lot of engineering in between.

      1. I’m not sure I agree with you about a “proxy for the data on disk to the network”. You’re asking for structured data, and you’re asking the computer to do the work of structuring it for you.

        I guess I wonder whether we’ll get to the point where we stop worrying, even for high-performance things as to how the internals work, and just treat those things as a black box. I don’t see that as a possibility. You have to tune a car engine for high performance, and make compromises based on an understanding of the mechanics and electr(on)ics of that particular engine, why should it not be the same for computers?

        A solid understanding of how your RDBMS works and what it’s doing under the hood is entirely necessary to tune it to the hardware and the hardware to it, IMO. Of course, they all work OK under low-load with default settings, but that’s the “disk-to-network proxy” setup. Do anything at higher volume, and you need to understand what your load profile looks like in order to get the most out of your budget and hardware.

  2. Hello, great article. Read it, decided it might be a good idea to tune up my apache config.

    I’m hosted on a gogrid.com centos XEN VM with 512 meg of ram.

    Looking at my config below I suspect the maxclients and perhaps everything else is massively overspecced for the machine it’s on.

    StartServers 8
    MinSpareServers 5
    MaxSpareServers 20
    ServerLimit 256
    MaxClients 256
    MaxRequestsPerChild 4000

    However I wasn’t sure how serverlimit and maxrequestsperchild fit into the equation. Any suggestions on correct values? Apps being run are a couple of wordpress blogs with wp-cache enabled and a drupal install.

    1. Hallo there. You haven’t said whether your VM is 32 or 64-bit (I’ll assume 32) or the Apache process size but given the limited memory I’d change them to something like:

      StartServers 5
      MinSpareServers 5
      MaxSpareServers 10
      MaxClients 15 # Or higher if your processes are small
      MaxRequestsPerChild 4000

      If you’re using the prefork MPM ServerLimit won’t do much so can be omitted.

      That “MaxClients 256” value was way too high – if you ever did have 256 Apache processes running they’d need a lot more memory than your server has and the machine would grind to a halt as 240 of them tried to execute from swap.

      1. It’s a 64bit box and apache top output was as follows.

        [root@dooby etc]# top -u apache -b -n 1
        top – 15:25:38 up 171 days, 13:09, 1 user, load average: 0.02, 0.02, 0.00
        Tasks: 76 total, 1 running, 75 sleeping, 0 stopped, 0 zombie
        Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.2%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
        Mem: 510824k total, 501936k used, 8888k free, 161952k buffers
        Swap: 1097756k total, 65844k used, 1031912k free, 74132k cached

        PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        12347 apache 15 0 402m 62m 33m S 0.0 12.5 0:00.85 httpd
        12348 apache 15 0 386m 41m 24m S 0.0 8.2 0:00.96 httpd
        12349 apache 15 0 385m 41m 25m S 0.0 8.3 0:00.62 httpd
        12350 apache 15 0 387m 40m 25m S 0.0 8.2 0:00.84 httpd
        12351 apache 15 0 401m 55m 27m S 0.0 11.1 0:00.83 httpd
        12352 apache 15 0 404m 60m 32m S 0.0 12.2 0:00.98 httpd
        12353 apache 15 0 401m 56m 28m S 0.0 11.4 0:00.68 httpd
        12354 apache 15 0 385m 41m 26m S 0.0 8.4 0:00.76 httpd
        16460 apache 15 0 385m 30m 14m S 0.0 6.1 0:00.09 httpd
        23734 apache 15 0 420m 77m 34m S 0.0 15.5 0:01.29 httpd
        28269 apache 15 0 385m 30m 14m S 0.0 6.1 0:00.20 httpd
        28270 apache 15 0 387m 32m 16m S 0.0 6.6 0:00.35 httpd
        28271 apache 15 0 385m 31m 15m S 0.0 6.2 0:00.24 httpd

        Thank you very much for your advice.

  3. You may want to add that as few apache modules as possible should be loaded.
    But having said that, two modules that may be useful are deflate and expires.

    Although, with deflate I’m not sure adding compression is advisable if you are being slashdotted.

    1. Well noted – I left that out for simplicity’s sake since apart from the language interpreters (mod_php, mod_perl, mod_python etc.) most Apache modules are quite small and fiddling with them can have disastrous consequences.

      Mod_deflate is a Good Thing but it only has an effect on compressible content (i.e. HTML, JavaScript and CSS) and does nothing for images which in most cases form the bulk of the transfer. In most cases there are larger savings to be made by shifting assets away from the server.

      1. Also, separate things – back to older schools of thought. Things like fastcgi where you have few processes doing the work and thin clients on top means your apache children can have as little as 5-6MB resident set size. Also means that a few slow clients getting the images don’t completely tie you up, which is, of course the downside of MaxClients. Really, though, you start to want webservers where the connection cost is minimal, but the expense is in the page serve. In apache, the two are linked. In a single-threaded (or multi-event-loop system) such as nginx/fastcgi, you’ll probably do a lot better under that kind of load.

  4. May I ask what’s the significance of “Yeeee-haw cowboy” and other mockery in the article? It’s really distracting for a technical reading.

    1. It’s because I can’t write anything without injecting a bit of humanity into it. Terrible flaw I know.

      Besides nobody is going to wade through a 4,000 word post if you don’t make them smile a couple of times :)

      1. I fully agree…I am a human trying to tame a machine…a little humour/light speech helps the man-machine-interface inside of my brain….that’s what I like about “you brits”…you gotta listen carefully because every sentence in itself might be packed with humour sometimes…

    2. If you’re reading this for technical advice on how to do scaling, then you’re doing scaling wrongly. This is about dealing with the slashdotting of your home colo, so there’s an element of reasonable light-heartedness in Alex’s post. I say, good for him, it makes it a lot more readable.

    3. I thought that was one of the best snippets in the article (besides the actual useful data). He perfectly stated the feeling I get every time I su.

      The safety is off cowboy, don’t shoot your foot off :)

  5. This was a very helpful article, thanks. For any readers who are not familiar with the difference between bits and bytes, I wanted to point out that bandwidth is measured in bits. In the previous example of 6.4 Mbytes/sec, the bandwidth usage would be 51.2 Mbits/sec.

    1. Memcache can make things worse if you’re swapping – the particularly nasty failure mode is that the object times out (so is assumed to be not in cache), you do the calculation, but then the connection is really quick, so you assume the cache is there and re-insert.

      You also have to be aware of that in your application design. You can’t just go “add memcache and suddenly everything will be OK”. If you’re using something off-the-shelf – WordPress, Drupal, whatever – then you’ll find that the “Just Add Memcache” may not help as much.

  6. Nice article! Although I’d like to make a few comments.

    First of all I think the MaxClients setting and the keep-alive settings are arguable. I’ll try to explain why the keep-alive setting is very useful first. It’s not “just” there. Yes, the default setting (300 seconds) is out of this world, but if you set it right, it saves you loads! Literally ;).. Let’s say your apache process is indeed a (massive) 27 megabytes, for every new request it’ll have to go in memory! So that means, for every image, it’ll load 27 megabytes for that particular apache process. That’s costly (cowboy)! So please just hang on there for let’s say 2 seconds (because you’d need a hell of a slow browser to let there be 2 seconds between the css and the images and these won’t be your problem anyway) and save your suffering box a lot of headache..

    Then the MaxClients setting. You’re right about the fact that the default MaxClients setting is absolutely ridiculous, but like you said, there is no such thing as default. So you can’t say things like “For 512Mb of RAM MaxClients should be no higher than 20, For 1Gb, 20 For 4Gb, 75”. (make that b’s capital btw, we’re talking about bytes here). If you’re talking about small sites mod_php is likely to be the only major apache plugin there. Python and Perl are not that common as far as I know. So people should look at how much memory one of their apache processes uses and use that as a guide for the MaxClients setting. And they’ll find that an average apache process won’t consume more than 6-10 MB of memory.. I think [your RAM] – [the amount of memory your OS needs (something like 32 or 128, might vary a lot)] – [MySQL cache] would be a good calculation. At least to be safe.

    I hope this is useful.

  7. If your site is business critical, might I suggest that you just stick a Zeus Traffic Manager in front of it. :)

    (Disclaimer: I used to work for Zeus years ago. But, once you’ve worked on a revolutionary product that deserves more attention than it gets, you’re always willing to shout about your baby)

    1. Having come across Zeus gear in the past I can only agree with you – it really is outstanding. The only problem is the cost! A couple of times I’ve wanted to implement it for clients only for the quotes to come in way too high. Anyone who’s hosting their site on a single LAMP server will find Zeus hardware way out of their reach.

  8. First off, very awesomely thorough post. I loved both the amount of technical knowledge and your writing style. Having that kind of fun with the voice usually (key word) indicates to me that a person is very passionate about and is well-versed in what they’re writing about. I think the comment from @kaw was actually an unfeeling robot trolling you.

    Second, it seems as if many of these settings tweaks are just good practice for less powerful servers. I know you shouldn’t spend a ton of time pre-optimizing, but the premise of reacting to a slashdotting seems like waiting too long. Any items that you recommend not just doing to have it done?

    Finally, write more! I don’t do a ton of server-related work, but at the very least, want to stay up-to-date on such tactics. You definitely know your stuff and have the ability to write about it, so don’t keep all that to yourself. I know these kinds of posts don’t get cranked out in a short 5 minutes by any means, but hopefully you’re feeling rewarded in taking the time to share.

    Thanks!

  9. Which of these adjustments should one apply immediately just to be prepared? Are there any that you should only use as a last resort?

    Thanks for such a great article!

    1. I’d say all the Apache config changes are sensible things to do right from the start. Leave KeepAlive on unless things get really bad though, under most circumstances it’s a good thing so long as your timeout isn’t too high. Likewise database tuning – while premature optimization is bad there’s nothing wrong with making your site’s DB run faster while you have the luxury of time to examine it properly.

      As for shifting content off to a CDN – that’s only a win at times of high load and if you’re using the Coral Cache for your images it can actually make pages load a lot slower when the site isn’t in high demand.

  10. Great article Alex, nice to see a guide of this nature summarised so well, I especially liked the reference to a HDD arm’s inability to provide RAM access speeds AKA “Oh shit, I’m hitting swap”.

    My line of work involves advising some pretty large sites on topics of this nature and I’d agree with everything you’ve said only add a small note about how inappropriately MySQL is often used especially by framework built sites. As such having a standby version of your site with a non-dynamic feature set built in flat HTML can be an absolute godsend in slashdot situations.

    An approach I’ve often favoured is to have a lightweight ‘holding’ version of the site in question installed and ready to go in an alternative lightweight HTTP daemon such as nginx. If you want to ice the cake you can even cron a small script to check the load average (be wary of using the 5m average!) and kill apache/start up nginx above certain thresholds to automatically alleviate high load until it returns to normal levels.

  11. Hi Alex,

    Good article. I don’t know whether it was said in the comments, but one thing to mention about databases is that if you can start 50 Apache2 processes, you’ll want at least 50 connections available on your database. Otherwise, you are likely to get errors “could not connect to database.”

    This means if your Apache uses 30Mb and your database uses 20Mb, you need 50Mb for each process… and you better have that much memory available.

    This being said, I rarely get “cannot connect to server”. One thing you did not mention… when Apache runs out of child processes, it still accepts connections, it just doesn’t do anything with them until a child becomes available.

    One other thing I found strange in your settings… You set MaxRequestsPerChild to 0. I would use a larger number, but not too large unless you only serve static pages. I use 100 and it also helps making things go faster (but it uses more memory… there is a trade off to everything!)

    Thank you for the info. I reduced my KeepAlive to 15 instead of 300. We’ll see whether my many overloads still occur as frequently (although mine only go a little over # of CPUs… I still don’t like them!)

    Alexis

    1. Hi Alexis,

      Agreed; having one MySQL connection available per process is a sensible baseline although since workloads vary there’s no hard-and-fast rule. It’s also worth remembering that PHP will pool MySQL connections so if your site’s seeing a lot of traffic a connection from one of the last page impressions should get reused instead of the Apache/PHP process going to the trouble of throwing up a whole new one.

      I didn’t mention MaxRequestsPerChild but one of the comments did. It’s unlikely that a PHP-driven web app will have serious memory leaks (data should all get garbage collected after each impression!) so a much higher figure is correct. In most Apache installations the default is a sensible 1,000 which means a thousand impressions will be served before an Apache process terminates and is replaced. Even larger values would be reasonable (10k? 100k?) as well.

      All the best,
      Alex.

