Optimizing Build Times for Large C Projects
Posted by JZig on February 11, 2009
If you’re not a programmer, you probably want to go ahead and skip this entry. Anyway, because of a temporary lack of fires at work and my inherent need to Optimize Things, I ended up spending last week looking into ways to speed up the programmer build times. As a bit of background, we use Visual Studio 2005 (Team Suite, so we can use MS’s static code analysis), and our code is nearly all pure C. We use Xoreax IncrediBuild to distribute our build process across idle machines; otherwise it would take significantly longer. I would describe our code base as Quite Large, but not having worked at other game companies I have nothing to compare it to. At the start of last week it took about 30 minutes to compile our entire code base from scratch (which is distributed), and about 10 minutes to relink our entire code base after changing one of our base libraries (which is entirely local). I knew I could do better, and here was my process:
Manual Include Cleanup
My first step was to take a look at a source file that took an average amount of time to compile. I changed it to output preprocessed text (in the properties for a file -> C/C++ -> Preprocessor -> Generate Preprocessed File) and took a look at what text was being included. The output gives you the line numbers and source file of what included what, so I fairly quickly tracked down that my random OS-independent C file was including all of windows.h! If you’re not aware, windows.h is a QUITE large include file, and ends up grabbing things like the Microsoft XML library headers. After discovering this, I found some OS-independent header files that were accidentally including windows.h, and fixed that up manually. I also made a special header file that included only the bits of windows.h (windef.h and winbase.h) actually needed to get symbols like HINSTANCE to work properly.
This step ended up reducing the size of the preprocessed text by about half. That took about 5 minutes off the from-scratch build time, and 2 minutes off the full link. It also reduced executable size by a few hundred KB. This was definitely worth it, and I recommend everyone look for any low-hanging fruit in their preprocessor output.
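If reading the preprocessed file by eye gets tedious, the same hunt can be roughed out with a small script. The sketch below is not what I used; it assumes the preprocessed output contains line markers of the form `#line 12 "path"` (or the `# 12 "path"` form some other compilers emit) and simply counts how many non-blank lines follow each marker. The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches `#line 12 "path"` and `# 12 "path"` style line markers.
LINE_MARKER = re.compile(r'^\s*#\s*(?:line\s+)?\d+\s+"([^"]+)"')


def lines_per_file(preprocessed_text):
    """Count non-blank preprocessed lines attributed to each originating file."""
    counts = Counter()
    current = None
    for line in preprocessed_text.splitlines():
        match = LINE_MARKER.match(line)
        if match:
            # Everything until the next marker came from this file.
            current = match.group(1)
            continue
        if current is not None and line.strip():
            counts[current] += 1
    return counts


def report(preprocessed_text, top=10):
    """Print the files that contribute the most preprocessed text."""
    for path, count in lines_per_file(preprocessed_text).most_common(top):
        print(f"{count:8d}  {path}")
```

Feeding it the contents of one preprocessed file and calling `report` should put the heaviest headers at the top of the list, which is usually enough to spot an unexpected windows.h.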
Incremental Linking/IncrediLink
IncrediBuild includes something optional called IncrediLink. Turning it on caused a bunch of errors so we hadn’t been using it. I went back to figure out why, and it appears that it has been more-or-less integrated directly into Visual Studio 2005. Basically, if you use static libraries (which we have many of), Visual Studio’s incremental linker doesn’t work very well. To get around this, you can enable “Use Library Dependency Inputs” in the linker properties for a project. This will make it not link in the .lib file, but instead directly link in the .obj files that make up the .lib file. This allows the incremental linker to work better, but does make the linker more paranoid about duplicate symbols. That ended up being okay, because I fixed some stupid code (includes of foo.c instead of foo.h) that was very likely slowing down linking.
After fixing all the duplicate symbol errors, I noticed about a minute decrease in build times. My suspicion is that this is mostly due to there no longer being tons of duplicate symbols. I’m skeptical that the setting itself is actually helping, but I left it enabled to force us to avoid duplicate symbols. It’s slightly faster, I guess.
Automated Include Cleanup
So link and from-scratch build times were getting better, but there was still one major problem. Because our code tends to be highly cross-linked and full of automatically generated code, changing a given .c file in a trivial way can often end up causing a large chunk of the code base to recompile. We’ve also been pretty lazy about #include directives, and many of these linkages have no real reason to exist. So, I came up with the sophisticated solution of Brute Force Perl Script. Basically, the script would scan through all of the .c files in our code and attempt to comment out each #include directive. It would then recompile the code base and see if there were errors. If there were, it would revert the comment-out and then go on to the next one. It would then continue to do this for a very long time. I ran it over the weekend and it only got through about half our code. I’ll finish it off this weekend.
This step was definitely worth it. The run this weekend removed a total of 5772 duplicate or unnecessary #include directives. It’s a bit harder to quantify the success of this (full build time went down by a small bit, but anecdotally it seems like trivial .c file changes cause fewer recompiles), but I think it was worth it. I put up a copy of the script I used on this site, so take a look if you want someone else to do the hard work for you. I looked around for a script like this before I wrote it, but couldn’t find any.
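For anyone who wants the shape of the idea without reading Perl, here is a minimal sketch of the comment-out-and-rebuild loop in Python. It is not the script I actually ran: the names `cull_includes` and `run_build` are made up for this example, the build command is whatever you would normally run, and it assumes your compiler accepts `//` comments in C files.

```python
import re
import subprocess
from pathlib import Path

INCLUDE = re.compile(r'^\s*#\s*include\b')


def cull_includes(source_path, build_ok):
    """Comment out each #include in one file, keeping only removals that still build.

    build_ok is a callable that returns True when the build succeeds.
    Returns the include lines that were successfully commented out.
    """
    path = Path(source_path)
    lines = path.read_text().splitlines(keepends=True)
    removed = []
    for index, original in enumerate(lines):
        if not INCLUDE.match(original):
            continue
        lines[index] = "// " + original      # try the build without this include
        path.write_text("".join(lines))
        if build_ok():
            removed.append(original.strip())
        else:
            lines[index] = original          # build broke, so put the include back
            path.write_text("".join(lines))
    return removed


def run_build(command):
    """Hypothetical build hook: treats exit code 0 as a successful build."""
    return subprocess.run(command, shell=True).returncode == 0
```

You would call it once per .c file, something like `cull_includes("foo.c", lambda: run_build(your_build_command))`. One caveat with any approach like this: a clean compile only shows the include was not needed to build, not that removing it left behavior unchanged, so the results deserve a test pass afterwards.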
New Hard Drive
Having gotten about as far as I could on software alone (most of the rest of the compile time is being spent in MS’s static code analysis, which I don’t have access to the internals of), I thought it was time to look into hardware. Okay, so I mainly wanted an excuse to get one of those sweet Solid State Hard Drives. So, I stopped by Fry’s and picked up a VelociRaptor 300 GB drive, and an Imation/mTron 32 GB Solid State drive. I installed the code base on both, and then used Junction to make Windows think they were on my C:\ drive (some of our tools care about that; we’re lazy). My primary hard drive is a fairly slow SATA 500 GB drive from a few years back. Here’s how the numbers came out:
| Hard Drive | Build From Scratch | Link From Scratch | Common Link | Partial Local Build |
|---|---|---|---|---|
| Old HD | 26:20 | 6:35 | 1:50 | 24:15 |
| VelociRaptor | 26:30 | 3:22 | 1:12 | 24:05 |
| SSD | 25:10 | 2:55 | 1:05 | 23:45 |
As you can see, the new drives didn’t make much of a difference on the full build (which depends on other computers) or the partial local build (which is all local but is stalling on the static analyzer), but they significantly sped up the link-only builds. The SSD was not much faster than the VelociRaptor, and is way more expensive per GB. With either new drive, my computer is much more responsive and usable during builds than it was before. Because my code is now on a different drive than my data or my swap file, I can do things like read email or edit text during compiles without my hard drive thrashing and stalling my computer. Opening Visual Studio and getting latest code from version control are also significantly faster. I highly recommend getting a second HD to store your source code, but it doesn’t have to be an SSD.
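To put the table in relative terms, the timings can be converted to seconds and compared against the old drive. This is just arithmetic on the numbers above; the helper names are made up for the example.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string such as '6:35' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_saved(baseline, candidate):
    """Percentage of the baseline time saved (negative means slower)."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Timings copied from the table above, in column order.
COLUMNS = ["Build From Scratch", "Link From Scratch", "Common Link", "Partial Local Build"]
TIMINGS = {
    "Old HD": ["26:20", "6:35", "1:50", "24:15"],
    "VelociRaptor": ["26:30", "3:22", "1:12", "24:05"],
    "SSD": ["25:10", "2:55", "1:05", "23:45"],
}


def savings_table(baseline="Old HD"):
    """Map each non-baseline drive to its per-column percent saved."""
    base = TIMINGS[baseline]
    return {
        drive: [percent_saved(b, t) for b, t in zip(base, times)]
        for drive, times in TIMINGS.items()
        if drive != baseline
    }


if __name__ == "__main__":
    for drive, row in savings_table().items():
        for column, pct in zip(COLUMNS, row):
            print(f"{drive:12s} {column:20s} {pct:6.1f}%")
```

By this arithmetic, the link from scratch drops by roughly 49% on the VelociRaptor and 56% on the SSD, while the build from scratch moves by less than 5% in either direction.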
Conclusion
After spending a few days off and on looking into build times, I managed to reduce the full build time by 5 minutes and the link-only build time by 6 minutes. I also reduced the linkages between our C files, so fewer recompiles will result from trivial .c changes. IncrediLink was possibly a waste of time, but the other 3 things I tried were worth it. I recommend that everyone bug IT until they give you a second hard drive, and spend a bit of time cleaning up your #includes. Saving 6 minutes on your link times doesn’t sound like much, but it quickly adds up if you’re in a fast development cycle.
Ben Wilhelm said
I’ve actually got a similar include culler I rigged up, but mine also tries replacing #include’s with all the include’s that the included file contains. I found this frequently cut things out as well, and I recommend it as a further step.
If you want my codebase I’ll send it to you, but I’m going to suggest right now that you don’t, as it’s quite horrifying.
Joe Ludwig said
Are you using precompiled headers? Putting those remaining windows.h includes into one would probably help quite a bit.
Just for comparison’s sake, the Pirates codebase was about 1 million lines of code at launch. That includes comments, whitespace, and all the generated code. (I suspect it was about half that without generated code, but I never measured it.)
Whaledawg said
What was Brute Force Perl Script doing at your place this weekend? He was supposed to be at mine mirroring file systems across multiple hard drives in a pathetic attempt to fake a raid array.
JZig said
Joe, what were your compile times like, with that much code? I was going to poke you to find out the size of your code base.
I need to actually measure ours. What tool did you use?
JZig said
Whaledawg, Brute Force Perl Script is good at multi tasking.
Joe Ludwig said
The compile times on Pirates were about half an hour for a full rebuild. About ten minutes of that time was spent linking. It turns out that linking DLLs with a huge number of exports is painfully slow, and we had about 15 of those. That’s something I’m going to try to avoid in future projects.
I don’t remember what I used to measure exactly. I think I just downloaded a LOC counting add-on for Visual Studio.
Amol Deshpande said
Interestingly, the windows.h in the new Windows SDK is now a small file that has much more control over what it includes (you can #define various things to turn off including other files).
Also, #define WIN32_LEAN_AND_MEAN (for pre-vista SDKs), precompiled headers, and “#pragma once” instead of old-school include guards can also help a lot.
Andy Brice said
IIRC PC-Lint from Gimpel can spot unnecessary #include’d files.
I have a RAID 1 harddisk for robustness. What would be the best way to add another drive (or 2) to speed up compile times – RAID X (what would X be?).
Anna-Jayne Metcalfe said
I can second Andy’s suggestion about PC-lint and unused include folders. It’s pretty effective and will identify all such issues in a single run without changing source files etc. if you set it up correctly (clue: -w0 +e766).
As a bonus, PC-lint is also (as our experience bears out) rather amenable to parallelisation to speed up the analysis runs (we were able to cut the analysis time for our 180k LOC codebase from over 4 hours to 15 minutes or so using IncrediBuild with 6 agents).
We’ve also found that refactoring a codebase for unit testing will naturally reduce include dependencies and hence compilation time. Michael Feathers’ “Working Effectively with Legacy Code” is an excellent guide to the techniques you need to do this.
A final plug: IncludeManager by Profactor Software (http://www.profactor.co.uk/includemanager.php) is a great way to visualise (in graphical form) the includes (and corresponding build time impact) for a given file or project so you can see what’s going on. We use it extensively, and it’s proved absolutely invaluable.
Paulo Pinto said
Include Manager is indeed a great application although I find it difficult to use it effectively on very large projects.
Any hints from anyone that has used it successfully for large projects?
Matt Hargett said
The “Include Browser” built into Eclipse CDT is great for this.
As far as distributed PC-Lint runs, I’m curious if you’re doing a file-by-file analysis or generating LOB files for an inter-module analysis. I have personally found that an inter-module analysis is really necessary for taking advantage of PC-Lint’s strengths (inter-function value tracking, etc) to find deep bugs.
As far as disk-related bottlenecks, I have a question: would it be better to get a single 15K RPM disk or two 10K RPM disks in a SATA2 RAID configuration? I’m trying to figure out if the elevator seeking would outperform the single spindle, even if it’s slower.
Anna-Jayne Metcalfe said
The IncrediBuild integration currently supports either single file or whole project analysis (the latter of which will run one project per core of course, so it’s only really worth doing if you have a lot of projects). FWIW the value tracking in PC-lint 9.0 is much improved over 8.0, and even a single file analysis can offer far more information on potential defects than most teams seem able to deal with (nobody likes bad news, right?). That’s a shame, as it’s capable of so much more.
LOB files do indeed offer a way to do detailed inter-module tracking which is not available in other modes (as does whole project analysis, of course). We’ve not yet implemented support for LOBs in Visual Lint as none of our customers have asked for it (something which surprised me, quite frankly. By contrast whole project analysis was much requested from the outset). If any of our customers wants full LOB file support badly enough we’ll happily implement it.
I’m not sure about the answer to your disk bottleneck question; the only way to be sure for real is to try it. I suspect either will bring a great improvement over a typical configuration, though. If you’re looking into performance, PC-lint 9.0 precompiled header file support is worth looking into once it’s mature, as it can dramatically reduce analysis times. However, as our tests in 9.00a through 9.00c show that additional messages are being produced when PCH is enabled, we’ve held off full support for it until that’s resolved.