Optimizing Build Times for Large C Projects
Posted by Ben Zeigler on February 11, 2009
If you’re not a programmer, you probably want to go ahead and skip this entry. Anyway, because of a temporary lack of fires at work and my inherent need to Optimize Things, I ended up spending last week looking into ways to speed up programmer build times. As a bit of background, we use Visual Studio 2005 (Team Suite, so we can use MS’s static code analysis), and our code is nearly all pure C. We use Xoreax IncrediBuild to distribute our build process across idle machines; otherwise it would take significantly longer. I would describe our code base as Quite Large, but not having worked at other game companies I have nothing to compare it to. At the start of last week it took about 30 minutes to compile our entire code base from scratch (which is distributed), and about 10 minutes to relink our entire code base after changing one of our base libraries (which is entirely local). I knew I could do better, and here was my process:
Manual Include Cleanup
My first step was to look at a source file that took an average amount of time to compile. I changed it to output preprocessed text (in the properties for a file -> C/C++ -> Preprocessor -> Generate Preprocessed File) and examined what was actually being included. The output gives you the line numbers and source file of each include, so I fairly quickly tracked down that my random OS-independent C file was pulling in all of windows.h! If you’re not aware, windows.h is a QUITE large include file, and ends up grabbing things like the Microsoft XML library headers. After discovering this, I found some OS-independent header files that were accidentally including windows.h, and fixed that up manually. I also made a special header file that includes only the bits of windows.h (windef.h and winbase.h) actually needed to get symbols like HINSTANCE to work properly.
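For illustration, here’s a minimal sketch of that kind of wrapper header. The file name slim_windows.h is made up, but windef.h and winbase.h are the real SDK headers mentioned above (and if you prefer the command line, cl.exe’s /P switch produces the same preprocessed output as the IDE setting):

```c
/* slim_windows.h -- hypothetical slimmed-down stand-in for windows.h.
 * Pulls in just enough of the Win32 SDK to get basic types like
 * HINSTANCE working, without dragging in the XML headers and friends. */
#ifndef SLIM_WINDOWS_H
#define SLIM_WINDOWS_H

#include <windef.h>   /* core types and handles: HINSTANCE, HWND, DWORD */
#include <winbase.h>  /* kernel basics that build on windef.h */

#endif /* SLIM_WINDOWS_H */
```

OS-independent files then include the slim header (or nothing at all), and only the genuinely Windows-specific files pay the full windows.h tax.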
This step ended up reducing the size of the preprocessed text by about half. This saved about 5 minutes off the from-scratch build time, and 2 minutes off the full link. It also reduced executable size by a few hundred KB. This was definitely worth it, and I recommend everyone look for low-hanging fruit in their preprocessor output.
Incremental Linking
IncrediBuild includes an optional feature called IncrediLink. Turning it on caused a bunch of errors, so we hadn’t been using it. I went back to figure out why, and it appears the feature has been more-or-less integrated directly into Visual Studio 2005. Basically, if you use static libraries (and we have many), Visual Studio’s incremental linker doesn’t work very well. To get around this, you can enable “Use Library Dependency Inputs” in a project’s linker properties. Instead of pulling in the .lib file, the linker then links directly against the .obj files that make up that .lib. This lets the incremental linker do its job, but it also makes the linker more paranoid about duplicate symbols. That ended up being okay, because it forced me to fix some stupid code (includes of foo.c instead of foo.h) that was very likely slowing down linking anyway.
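To see why includes of foo.c are a problem once the .obj files are linked directly (foo and bar are placeholder names here, not our real files): every translation unit that includes foo.c gets its own copy of foo’s definition, and the linker reports a duplicate symbol (LNK2005 in Visual Studio):

```c
/* foo.h -- declaration only; any number of files may safely include this */
#ifndef FOO_H
#define FOO_H
int foo(int x);
#endif

/* foo.c -- the one and only definition, compiled into foo.obj */
int foo(int x) { return x * 2; }

/* bar.c, WRONG: pastes the definition of foo() into bar.obj as well,
 * so the linker sees foo defined in both foo.obj and bar.obj */
#include "foo.c"

/* bar.c, RIGHT: only the declaration comes in, and the call to foo()
 * is resolved at link time from foo.obj */
#include "foo.h"
```

When linking whole .lib files, the linker only pulls in library members that resolve undefined symbols, so it can quietly skip the redundant copy; that’s why these duplicates only surfaced once the .obj files were linked directly.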
After fixing all the duplicate symbol errors, I noticed about a minute’s decrease in build times. My suspicion is that this is mostly due to there no longer being tons of duplicate symbols. I’m skeptical that the setting itself is what helped, but I left it enabled because it forces us to avoid duplicate symbols. It’s slightly faster, I guess.
Automated Include Cleanup
So link and from-scratch build times were getting better, but there was still one major problem. Because our code tends to be highly cross-linked and full of automatically generated code, changing a given .c file in a trivial way can often cause a large chunk of the code base to recompile. We’ve also been pretty lazy about #include directives, and many of these linkages have no real reason to exist. So, I came up with the sophisticated solution of Brute Force Perl Script. Basically, it scans through all of the .c files in our code and attempts to comment out each #include directive, one at a time. After each change it recompiles the code base and checks for errors; if the build breaks, it reverts that change and moves on to the next directive. It continues doing this for a very long time. I ran it over the weekend and it only got through about half our code. I’ll finish it off this weekend.
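For the curious, here’s the core loop as a minimal sketch, written in C to match the rest of this post rather than the Perl I actually used. Everything in it is illustrative: it handles a single file passed on the command line, assumes the build command in argv[2] exits nonzero on failure, only matches directives at the start of a line, and ignores complications like conditional compilation that the real script has to worry about:

```c
/* include_prune.c -- brute-force #include pruner (sketch).
 * For each #include line: comment it out, rebuild, and restore the
 * line if the build breaks. Assumes each line fits in MAX_LEN and
 * the file has at most MAX_LINES lines. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 8192
#define MAX_LEN   1024

static char lines[MAX_LINES][MAX_LEN];
static int  nlines;

static void load(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); exit(1); }
    nlines = 0;
    while (nlines < MAX_LINES && fgets(lines[nlines], MAX_LEN, f))
        nlines++;
    fclose(f);
}

static void save(const char *path)
{
    int i;
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(1); }
    for (i = 0; i < nlines; i++)
        fputs(lines[i], f);
    fclose(f);
}

int main(int argc, char **argv)
{
    int i;
    char saved[MAX_LEN];

    if (argc != 3) {
        fprintf(stderr, "usage: %s <source-file> <build-command>\n", argv[0]);
        return 1;
    }
    load(argv[1]);
    for (i = 0; i < nlines; i++) {
        if (strncmp(lines[i], "#include", 8) != 0)
            continue;
        strcpy(saved, lines[i]);
        /* Comment the directive out and try a rebuild. */
        snprintf(lines[i], MAX_LEN, "// %s", saved);
        save(argv[1]);
        if (system(argv[2]) != 0) {
            /* Build broke, so this include was needed: restore it. */
            strcpy(lines[i], saved);
            save(argv[1]);
        }
    }
    return 0;
}
```

Multiply this by every #include in every .c file, with a full error-checking build after each attempt, and it’s easy to see why one pass takes a whole weekend.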
This step was definitely worth it. The run this past weekend removed a total of 5,772 duplicate or unnecessary #include directives. The success here is a bit harder to quantify (full build time went down by a small amount, but anecdotally trivial .c file changes seem to cause fewer recompiles), but I think it was worth it. I put up a copy of the script I used on this site, so take a look if you want someone else to do the hard work for you. I looked around for a script like this before I wrote it, but couldn’t find any.
New Hard Drive
Having gotten about as far as I could on software alone (most of the remaining compile time is spent in MS’s static code analysis, which I don’t have access to the internals of), I thought it was time to look into hardware. Okay, so I mainly wanted an excuse to get one of those sweet solid-state drives. So, I stopped by Fry’s and picked up a VelociRaptor 300 GB drive and an Imation/mTron 32 GB solid-state drive. I installed the code base on both, and then used Junction to make Windows think they were on my C:\ drive (some of our tools care about that; we’re lazy). My primary hard drive is a fairly slow 500 GB SATA drive from a few years back. Here’s how the numbers came out:
| Hard Drive | Build From Scratch | Link From Scratch | Common Link | Partial Local Build |
| --- | --- | --- | --- | --- |
As you can see, the hard drive didn’t make much of a difference on the full build (which depends on other computers) or the partial local build (which is all local but stalls on the static analyzer), but it significantly sped up the link-only builds. The SSD was not much faster than the VelociRaptor, and is way more expensive per GB. With either new drive, my computer is much more responsive and usable during builds than it was before. Because my code is now on a different drive than my data or my swap file, I can do things like read email or edit text during compiles without my hard drive thrashing and stalling my computer. Opening Visual Studio and getting the latest code from version control are also significantly faster. I highly recommend getting a second HD to store your source code, but it doesn’t have to be an SSD.
Conclusions
After spending a few days off and on looking into build times, I managed to reduce the full build time by 5 minutes and the link-only build time by 6 minutes. I also reduced the linkages between our C files, so fewer recompiles result from trivial .c changes. IncrediLink was possibly a waste of time, but the other three things I tried were worth it. I recommend bugging IT until they give you a second hard drive, and spending a bit of time cleaning up your #includes. Saving 6 minutes on your link times doesn’t sound like much, but it quickly adds up if you’re in a fast development cycle.