Debugging 500 Errors

For the past two years, the AlliedModders site has been randomly dishing out “HTTP 500 – Internal Server Error” responses. It’s been pretty frustrating.

I actually tried debugging the errors two years ago. I got far enough to attach GDB to PHP, and saw it blocked on mysql_query(), but didn’t know how to proceed. We knew that during these incidents, the server load would spike to obscene levels – like 100-150. Lots of PHP processes were getting spawned. “iowait” time was through the roof, and CPU usage was normal. Something couldn’t read the disk, or the disk was getting hammered. I put MySQL on a separate hard drive, but the problems persisted and I gave up. Keep in mind: I didn’t know anything about running servers, and I still don’t – I just have a vague notion of how to debug things.

Fast forward to a few weeks ago, we had a serious hard drive crash and suffered some painful downtime. We lost a lot of data. I clung to a hope that our years of errors had been caused by a failing disk. Predictably, we weren’t so lucky, and the website problems continued. Someone joked, “Hey, I’m glad to see you got the server back up, even the 500 errors!” Yeah… it was time to fix this. Somehow.

Step One: Apache Logs

There were only two notable Apache errors. They’d appear over and over, but only in large clumps:

[Tue Jan 25 04:15:19 2011] [warn] [client xx.xxx.xxx.xx] (110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function
[Tue Jan 25 04:15:25 2011] [warn] [client xxx.xxx.xx.xx] mod_fcgid: read data timeout in 60 seconds

It was pretty clear that somehow, communication was breaking down between mod_fcgid (Apache) and PHP processes. I read, I think, every single webpage on the Internet involving these two errors. Twice. They were all dead ends, both for us and both for the person reporting their own problems. A little sad. I wanted to blame mod_fcgid, since historically it has been really buggy for us, but I wanted to pin absolute blame.

So the next step was to break out a debugger.

Step Two: Debugging PHP, Apache

AM developer Fyren edited the mod_fcgid source code, such that instead of killing PHP, it would attach gdb in a separate tty. It usually took around three hours to get a debugging session going. The call stacks looked like:

(gdb) bt
...
#4 0x00007f1f7cad804f in net_write_command () from /usr/lib/libmysqlclient.so.15
#5 0x00007f1f7cad4c51 in cli_advanced_command () from /usr/lib/libmysqlclient.so.15
#6 0x00007f1f7cad1751 in mysql_send_query () from /usr/lib/libmysqlclient.so.15
#7 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15
...
(gdb) fin
Run till exit from #0 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15

At this point, the session would hang, indicating that MySQL was not responding. I really had no idea what to do, but we got lucky. We noticed during these incidents, Apache’s server-status was stuck at “200 requests being processed,” all in the “W” (“Sending reply”) state. Our MaxClients setting was exactly 200. Now we had a hypothesis: MySQL was intermittently not responding for short bursts of time. This blocked PHP, which in turn made mod_fcgid block and timeout, causing Apache to spawn a bunch of workers that would lock up as well.

Unfortunately, the intermittent failures only happened a few times a day, so debugging went slow. We managed to catch one, and ran iostat:

Device:          r/s       w/s   rsec/s    wsec/s    avgrq-sz  avgqu-sz   await  svctm  %util
sdc              320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00
sdc1             320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00

Someone was reading tons of data off the disk, and iotop confirmed MySQL was the culprit.

Diagnosing MySQL

I managed to find this enlightening presentation by Percona. Case study #3 looked identical to ours, except in their case, MySQL was writing. But their conclusion about bad queries seemed plausible. I made a script that would peek at server-status every second, log a ton of data, and e-mail me if it went above 100 concurrent requests. The data included oprofiling, iotop spew, SHOW FULL PROCESSLIST, SHOW OPEN TABLES, SHOW GLOBAL STATUS, etc.

Oprofile’s report wasn’t too interesting, but the MySQL status showed around 150 queries waiting for a table lock. The active queries, in all cases, were SELECTs. This explained the reads, but not the table locks. This didn’t make sense, so I generated a few core files of MySQL from within GDB, then Fyren and I did some post-mortem analysis.

The analysis revealed that MySQL had about 200 threads, most of which were waiting on locks (pthread_cond_timedwait) with a timeout of one year. The locks were almost all read locks for SELECTs, and the actually succeeding queries were exclusively SELECTs. Fyren tried to track down the thread owning the locks, but it ended up being apparently circular (thread X’s lock was held by thread Y, and thread Y was waiting on a lock held by thread Y).

That was all pretty mysterious (and still is), but rather than try to understand exactly what locks are in MyISAM tables, I took a look at the non-deadlocked queries. They were all one of: the AMX Mod X plugin search, the SourceMod plugin search, or the vBulletin member list search.

It turns out all three of these queries were extremely expensive – containing ORs, joins, full table scans of 800MB tables, etc. On average they each took about 500ms to complete. That’s without MySQL freaking out. I turned off vBulletin member searching (most large boards do), and replaced the plugin searches with faster queries off a much a smaller table. This brought those page load times down to 20ms. Twelve hours later, no 500 errors had occurred, and there were no mod_fcgid failures in Apache’s logs.

Too Early to Declare Victory

While waiting to see if any more 500 errors would show up, suddenly the server load spiked, and lo and behold – mod_fcgid became overloaded. Our monitoring script kicked in, and this time, MySQL was nowhere in the profile! It turned out some forum users were effectively DoS’ing the site by holding down F5 on their profile pages to bump up the “view” count. OProfile suggested that 20% of CPU time, during this incident, went to PHP parsing and lexing source code. I installed eAccelerator and now that time appears to be gone. (And for good measure, I removed profile view counts.)

Growth and Traffic

It’s unclear what our load and traffic really is. I think we’re pretty good though. CPU utilization is usually under 20%, disk I/O is under 10%, and we rarely see more than 30 concurrent requests. We don’t need to invest in a separate MySQL server yet, and we definitely aren’t growing fast enough to expect to need one anytime soon.

Community Involvement

I would like to thank Fyren for his help in looking into these problems, and asherkin for helping beef up our non-existent monitoring. Also, thanks to AzuiSleet, Drunken_Fool, MatthiasVance, and devicenull for sticking around in IRC and answering random questions.

Diagnose, then Solve

Many people, well-intentioned, said “switch to lighttpd, nginx, IIS, use my favorite host, use a VPS, buy more servers,” etc. I got caught up in this too, and fueled it by initially blaming mod_fcgid. It’s really important to make sure a solution has targeted a specific problem, and I had to keep telling myself to keep a cool head. Sometimes I had to slap myself or wait for Fyren to slap me. They all might be good ideas in their own right, but without diagnosing the actual cause of the errors, there was no guarantee of a fix. By careful debugging (albeit in painful, hour-per-day sprees), we got pretty close to the root cause, whereas throwing our infrastructure into a salad bowl would have delayed the inevitable.

Part of the reason our original drive is still at the recovery lab is because someone didn’t diagnose before “solving.” When it failed, a technician (not one of us) ran fsck on the disk instead of doing a hardware diagnostic first. I have no doubt this caused untold extra damage.

Conclusion

tl;dr, the forums should be much faster now, and it shouldn’t error anymore. I’ve got the system set up to e-mail me and dump diagnostics when our load goes way above normal, so I’ll continue monitoring for potential problems.

I would like to sincerely thank the whole AlliedModders community for the successful donation drive. I won’t lie, the past month has been pretty sucky. Despite our failings, the fact that everyone cares enough to see the whole thing keep going is really awesome. Our seven year anniversary was just a few weeks ago. Here’s to an AlliedModders for 2011!

58 thoughts on “Debugging 500 Errors

  1. Aryeh Gregor

    I’ve been tech admin of twcenter.net since late 2005. We also run on only one server (until we just got a second for failover), but our vBulletin is significantly bigger than yours, so maybe I can offer some useful advice from experience.

    If you’re having locking problems with vBulletin, switching from MyISAM to InnoDB will solve them, since InnoDB uses only row-level locking for table modifications (instead of table-level), and no locking at all for selects (via the magic of MVCC). This isn’t a substitute for fixing slow queries, but MyISAM simply doesn’t scale except on very specific workloads. It takes out table-level locks for every query, and that just doesn’t work for concurrency.

    InnoDB doesn’t support MySQL fulltext search, so you’ll have to use the semi-official vB Sphinx plugin: http://www.vbulletin.com/forum/showthread.php/350427 But that’s a big performance improvement in itself; MyISAM’s fulltext search is hideously slow. Since you apparently haven’t upgraded to 4.0, you can use the totally unofficial one instead: http://www.vbulletin.org/forum/showthread.php?t=127868 My site has been using that for a few years with no problems. Even without that, you can switch all tables without fulltext indexes to InnoDB, but that leaves the post table unconverted, and of course that’s your biggest one.

    If you switch to InnoDB, make sure you reconfigure MySQL appropriately. This post will be helpful: http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/ Basically, key_buffer_size is used only by MyISAM, the InnoDB equivalent is innodb_buffer_pool_size. But the latter should be set around twice as large as the former if you switch from all-MyISAM to all-InnoDB, because MyISAM uses the OS disk cache for data (key_buffer_size is only for indexes, as the name suggests) while InnoDB doesn’t use the OS disk cache at all. The only really important InnoDB tuning setting is the buffer pool size; raise it until “SHOW INNODB STATUS\G” in MySQL shows a buffer pool hit rate of, ideally, 1000/1000 (should be attainable if you have a decent amount of memory).

    InnoDB has other useful features too. If all your tables are InnoDB, you can use mysqldump –single-transaction to get consistent backups without taking out any locks, so no site downtime regardless of how large your database is. mysqlhotcopy will no longer work, if you were using that, and mysqldump is much slower to restore than mysqlhotcopy, but having SQL backups rather than just file copies is good anyway in the event of database corruption.

    Generally, InnoDB is a far superior storage engine to MyISAM. MyISAM only clung on to life for so many years because InnoDB was owned by a different company and MySQL AB didn’t want to become dependent on it, so it kept developing MyISAM and left it as the default. Oracle now owns both MySQL and InnoDB, so they did the sensible thing and made InnoDB the default in 5.5, now that it had no conflict of interest. At least one fork of MySQL did the same thing.

    For detecting slow queries in MySQL, you want to enable the slow query log: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html mysqldumpslow will then tell you what all the problematic queries are, and you can try optimizing them. If your problem is locking, though, the results won’t be very useful — it will tell you which queries were blocked rather than which ones took out the locks. Switching to InnoDB will fix this.

    I’ve publicly recorded most of the optimization stuff I did with my forum in the last three and a half years here: http://www.twcenter.net/forums/forumdisplay.php?f=321 Maybe some of it will be useful to you. If you have any questions, I might be able to give you further advice. I’m happy to help out other open-source software developers when I can.

    Reply
    1. dvander Post author

      Aryeh – thank you for the in-depth reply! In fact, we aren’t using fulltext searches right now (we’re using the weird postindex one), so we might just be able to straight-up convert to InnoDB.

      Yeah, we had the slow query log, but as you said, the locking made it noise.

      It’s great you’ve documented your optimization pains. I’ll read over this today. Your forum was started around the same time as ours and has about 8X as many posts – and clearly handling the load much better. From the tech posts on your forums, it seems like you might use more than one server? Do you rent, or colo?

      I noticed you also haven’t upgraded to vB 4.0. I’m on the fence about this upgrade – the underlying code doesn’t look too different and we’d have to retheme again. So it doesn’t seem like there’s a performance motivation, but maybe there is a strong feature one.

      Reply
      1. Aryeh Gregor

        We colocate two servers, with hosting costs paid by advertising and hardware costs paid by donation drives. We always used to be on one server, but a few months ago we had a one-week outage (the longest since I became admin in 2005) caused by repeated hardware failures combined with me and the on-site tech admin being away/busy at various points. At that point we got a second server for failover, but we aren’t doing any load-balancing, so for performance purposes we have only one server.

        (It’s a pretty beefy server, though: sixteen cores, 16 GB of RAM, and now four 10k RPM SATA drives on the new machine. The disks are relatively weak, but that’s okay, since our data fits in memory well. If our dataset got too big, it would probably be cheaper to upgrade RAM instead of disks.)

        We haven’t upgraded to vB 4.x because I haven’t gotten around to doing the necessary work. I plan to get to it sometime after we switch to our new server (which has somewhat better hardware than the current one), which should happen soon. I don’t think it has any very interesting new features, but the fact is, as soon as 4.2 is released 3.8 is no longer getting security fixes. We’ve been attacked by bots exploiting vB security flaws before, so we can’t really take the risk. We don’t want to be forced to upgrade in a hurry to thwart a sudden attack. We will have to reskin, but we have a whole bunch of artsy sorts who’d be happy to do the work for us if it means their skin gets to be the new default. AFAIK, there’s no performance motivation to upgrade.

        Overall, vB’s scalability is decent but not great. I’m pretty sure most vB forums my forum’s size are on multiple servers. But I was able to keep things working efficiently on a single server by suitable hacks. MySQL is no problem for us at all at this point; our bottleneck is almost purely CPU. In addition to switching to InnoDB and Sphinx instead of MySQL, I wound up adding some indexes to certain tables and tweaking some queries.

        As far as MySQL performance goes, I’d recommend the book “High Performance MySQL, Second Edition”: . Trying to debug MySQL performance problems using gdb is an interesting approach, but probably not the most efficient way to go about things. :)

        Reply
  2. Lennie

    I’m really surprised you got that far with that storage engine and website traffic without knowing what to do.

    I know that sounds like an insult, it isn’t what I mean. I mean without experience in this field/subject or without finding the right advice.

    As Aryeh Gregor mentioned gdb and so on is probably not the best way to solve problems in such a complicated piece of software.

    Still an interresting read to see how people (try to) solve things

    Reply
  3. Refugio Neer

    Wer seinen neuen Schlafraum anlegen möchte, der wird meist seinen Traum auf ein großes Bett annullieren, da sich schlicht die Fläche für jenes Wunschbett nicht ausfindig machen lässt. Vielmals wird das Platzieren eines Schranks oder einer Kommode als sinnvoller angesehen, dadurch zur Genüge Platz vorhanden ist. Aber welche Person sich in der großen Bettenwelt umsieht, dieser wird einsehen, dass er sich ein Bett in Wunschgröße kaufen kann und so die gewünschte Liegefläche und hinreichend Raum und sogar Lampen gebrauchen kann. Im Besonderen in modernen Kinderzimmern ist der Platzmangel ausgebrochen und so ist das Funktionsbett ein beliebtes Kinderbett, dass durch seine naturbelassenen Materialien wie Gehölz voll dem Trend zu einem gesunden Raum folgt.

    Reply
  4. Luanna Boland

    The next time I read a blog, Hopefully it won’t fail me as much as this particular one. After all, I know it was my choice to read, nonetheless I genuinely believed you’d have something useful to say. All I hear is a bunch of moaning about something you could fix if you were not too busy seeking attention.

    Reply
  5. eevaedump

    На странице https://grishechkin.ru/ расположен официальный сайт бизнес-тренера Евгения Гришечкина. Если вам уже исполнилось 25 лет, то предприниматель готов оказать помощь — научит, как выйти на новый уровень заработка. Специалист уже не первый год проводит тренинги и может подтвердить свои результаты цифрами и прочими фактами. На веб-странице благодарные клиенты оставили свои отзывы, с которыми рекомендуется ознакомиться перед обращением к Евгению. Любой человек может получить практические навыки, которые кардинально изменят его жизнь.

    Reply
    1. Fernando Robinson

      Thanks for detailed explanation of the process, you helped me very much. You need to make a video instruction too :) You can upload it on youtube, buy youtube views for video’s promotion and add the video here.

      Reply
  6. Trevor Hovsepian

    The Secret To Maintaing A Healthy Diet Plan In order to make the correct dietary alternatives about nutrition, you need to be knowledgeable. When you don’t know what you are performing, you can’t make intelligent selections. The details incorporated into this item may help direct you in the direction of making healthful choices. You must consume adequate protein everyday. Protein improve muscle tissue and encourage healthful epidermis. In addition they assist your mobile phone processes and you will definitely metabolize electricity metabolism. Proteins help your system to assist combat ailments. Some food products that have proteins are chicken, grain, sea food, tofu, beans and species of fish and whole milk goods. Get proper numbers of supplement B12 to construct red blood flow cellular material. Non-meat eaters and senior citizens are often lacking in this from the diet programs. Folks who suffer from anemia also at an increased risk. You can go on a dietary supplement or eat breakfast cereal, or search for this nutrient in your morning meal cereals. The stealthy means of far better consuming is always to sneak well balanced meals in your regular food. This can certainly help when your children are fussy eaters mainly because it gets them to consume a lot more nourishing meals. This can help anyone try to eat much more nutritiously with no knowledge of. Try out loading your own personal meal for function or college dishes. It takes only ten minutes or a lot less to make yourself some treats and food. End eating meals as soon as your body gives you are nearly total. This can help to prevent you from eating too much food items. Most junk foods along with other unhealthy foods have lots of salt in them. Once you begin decreasing sodium, you’ll see that a bit of salt will certainly be a good deal more robust than it once was. Food items that are not healthy for you may instantly appear too salty. You will notice that you don’t desire to consume them as frequently. You are able to realize several health and fitness benefits, though many people aren’t as well fond of soil turkey’s dry skin, when you replacement floor poultry for floor meat. You could add extra virgin olive oil and cut red onion to improve the taste making your turkey a lot more fruit juices. Your meats will likely be less fat and still have wonderful taste. Small modifications to your diet program could bring numerous advantages for your well being. To lower your anxiety, find more energy and increase your focus, use the suggestions out of this report. There’s almost no time like the give to begin to make more healthy eating selections you’ll be blown away by the amount of an improvement it will make!

    Reply
  7. здоровье и здоровый образ жизни

    You might have altered your diet plan and they are eating rather well. Added to that, maybe you have started off training much more and also hardwearing . entire body in a fit condition. What’s missing out on? Vitamins and minerals, of course, and this post will help you via how to add more those to your regime once and for all overall health! For men 50 plus, a different list of nutritional vitamins are essential for maximum wellness. Vitamins like zinc, pantothenic acid and niacin all lower as gentlemen age group. Search for supplements which can be particularly focused toward men that are over 50 years of age to get the best merge for optimum health. The most effective way to obtain nutritional supplements are fresh fruit and vegetables. Making sure that you consume at the very least several helpings of veggies and fruits each day will help be sure you get every one of the nutritional supplements you have to be healthy. For the very best effects, take in them raw or extremely softly made by steaming or poaching. When you have been through menopause, don’t get prenatal vitamin supplements. A lot of ladies that aren’t expectant get prenatal nutritional vitamins to assist expand out their nails and head of hair. This is often dangerous should you not carry it in the appropriate medication dosage. In case you are looking to increase your your hair or fingernails, attempt to add a Biotin dietary supplement. Even though it won’t make the your hair or fingernails grow faster, it is going to make them more powerful and much more immune to breakage. The truth is, Biotin is often advised to chemotherapy people to help them grow back the hair which they dropped throughout remedy. The most effective way to obtain nutritional supplements are new fruit and vegetables. Ensuring you eat at least five portions of vegetables and fresh fruit every single day will help make sure you get each of the vitamins and minerals you ought to be healthy. To get the best results, eat them natural or really gently prepared by steaming or poaching. As stated in the above post, using minerals and vitamins is an important part of anyone’s life who wants to keep a healthful lifestyle. Since you now read through some terrific suggestions about this subject matter there is absolutely no basis for you not to begin on a far healthier course. Keep at it and remember how essential it is actually for your personal lifestyle to become wholesome.

    Reply
  8. Tomaser

    Thanks to the author’s comprehensive description, I learnt everything I needed to know about my topic after reading this post. I published a review that you may read on the https://bit.ly/3BItU1x. Thank you so much for your time and attention.

    Reply
  9. elopkjep

    Новички в сфере трейдерства часто сталкиваются с тем, что не знают, как выводить деньги, а статья на сайте https://forexvision.ru/help/instruktsiya-kak-vyvesti-dengi-s-brokera может всему научить. В статье опубликована инструкция о том, как правильно выводить деньги. Но в то же время в статье есть информация о том, как выбрать брокера – убедиться в его законности, а также уточнить доступные платежные системы. Эта статья будет полезна и для новичков, и для опытных трейдеров.

    Reply
  10. IDgod

    I’ve read several good stuff here. Certainly worth bookmarking for revisiting. I wonder how much effort you put to create this sort of excellent informative site.|

    Reply
  11. usimsnag

    Компания «Natural Dagkamen» работает в Крыму, предлагает широкий ассортимент дагестанского камня изделий на основе этого природного материала. В Дагестане одни из самых крупнейших месторождений на Кавказе доломита, известняка, ракушечника. Данные виды породы здесь получили название – дагестанский камень. На сайте https://natural-dagkamen.ru/ Вы найдете огромное количество разновидностей данных камней для фасадов и внутренней отделки. Компания располагает инновационным оборудованием, на котором выполняют все виды обработки камня.

    Reply
  12. mugraDus

    На сайте ttps://top100kursi.ru/ Вы сможете ознакомиться с актуальными и лучшими на сегодняшний день курсами по заработку. Недорогое пособие в формате видеокурсов – это возможность начать действительно зарабатывать хорошие деньги. Все, что от Вас потребуется, это полностью вникнуть в материал и обучаться. На сайте отобраны только проверенные схемы для обучения заработку.

    Reply
  13. SV388

    Metode Main Mesin Live Slot Online Game gambling mesin slot bet gratis memansdfglah ialah salah satu tipe opsi game klasik yang hingga dikala ini

    Reply
  14. SV388

    Metode Main Mesin Live Slot Online Game gambling mesin slot bet gratis memanglah ialah salah satu tipe opsi game klasik yang hingga dikala ini

    Reply

Leave a Reply

Your email address will not be published.