How scalable is SCons?

The marquee feature in ElectricAccelerator 5.0 is Electrify, a new front-end to the Accelerator cluster that can distribute work from a wide variety of build tools, not just the make-based builds we have always managed. One example is SCons, an alternative build system implemented in Python, with a small but slowly growing market share compared to make. It’s sometimes touted as an ideal replacement for make, with a long list of reasons why it is considered superior. But not everybody likes it. Some have reported significant performance problems. Even the SCons maintainers agree that SCons “can get slow on big projects”.

Of course that caught my eye, since making big projects build fast is what I do. What exactly does it mean that SCons “can get slow” on “big” projects? How slow is slow? How big is big? So to satisfy my own curiosity, and so that I might better advise customers seeking to use SCons with Electrify, I set out to answer those questions. All I needed was some free hardware and some time. Lots and lots and lots of time.

Read the rest of this entry »

How to quickly navigate an unfamiliar makefile

The other day, I was working with an unfamiliar build and I needed to get familiar with it in a hurry. In this case, I was dealing with a makefile generated by the Perl utility h2xs, but the trick I’ll show you here works any time you need to find your way around a new build system, whether it’s something you just downloaded or an internal project you just transferred to.

What I wanted to do was add a few object files to the link command. Here’s the build log; the link command is the “gcc -shared” line:

gcc -c  -I. -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g   -DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" -fPIC "-I/usr/lib/perl/5.10/CORE"   mylib.c
rm -f blib/arch/auto/mylib/
gcc  -shared -O2 -g -L/usr/local/lib mylib.o   -o blib/arch/auto/mylib/   \
chmod 755 blib/arch/auto/mylib/

Should be easy, right? I just needed to find that command in the makefile and make my changes. Wrong. Read on to see how annotation helped solve this problem.

Read the rest of this entry »

Subbuilds: build avoidance done right

I’ve heard it said that the best programmer is a lazy programmer. I’ve always taken that to mean that the best programmers avoid unnecessary work, by working smarter and not harder; and that they focus on building only those features that are really required now, not allowing speculative work to distract them.

I wouldn’t presume to call myself a great programmer, but I definitely hate doing unnecessary work. That’s why the concept of build avoidance is so intriguing. If you’ve spent any time on the build speed problem, you’ve probably come across this term. Unfortunately it’s been conflated with the single technique implemented by tools like ccache and ClearCase winkins. I say “unfortunate” for two reasons: first, those tools don’t really work all that well, at least not for individual developers; and second, the technique they employ is not really build avoidance at all, but rather object reuse. But because the term build avoidance has been co-opted and associated with such lackluster results, many people have become dismissive of it.

Subbuilds are a more literal, and more effective, approach to build avoidance: reduce build time by building only the stuff required for your active component. Don’t waste time building the stuff that’s not related to what you’re working on now. It seems so obvious I’m almost embarrassed to be explaining it. But the payoff is anything but embarrassing. On my project, after making changes to one of the prerequisite libraries for the application I’m working on, a regular incremental build takes 10 minutes; a subbuild incremental takes just 77 seconds:

Standard incremental: 10 minutes
Subbuild incremental: 77 seconds
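The core computation behind a subbuild is straightforward: given the full dependency graph of the project, take the transitive closure of prerequisites for the target you actually care about, and build only those nodes. Here’s a minimal sketch of that idea (the graph and target names are invented for illustration; this is the concept, not SparkBuild’s actual implementation):

```python
def subbuild_targets(graph, target):
    """Return the set of nodes needed to build `target`:
    the target itself plus all of its transitive prerequisites."""
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        stack.extend(graph.get(node, []))
    return needed

# target -> direct prerequisites (hypothetical project layout)
graph = {
    "app":    ["main.o", "liba.a"],
    "liba.a": ["a1.o", "a2.o"],
    "libb.a": ["b1.o"],          # unrelated component: never visited
    "main.o": ["main.c"],
}

print(sorted(subbuild_targets(graph, "liba.a")))
```

A subbuild of liba.a touches only liba.a and its objects; libb.a and the rest of the tree are skipped entirely, which is where the time savings come from.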

Not bad! Read on for more about how subbuilds work and how you can get SparkBuild, a free gmake- and NMAKE-compatible build tool, so you can try subbuilds yourself.
Read the rest of this entry »

Using Markov Chains to Generate Test Input

One challenge that we’ve faced at Electric Cloud is how to verify that our makefile parser correctly emulates GNU Make. We started by generating test cases based on a close reading of the gmake manual. Then we turned to real-world examples: makefiles from dozens of open source projects and from our customers. After several years of this we’ve accumulated nearly two thousand individual tests of our gmake emulation, and yet we still sometimes find incompatibilities. We’re always looking for new ways to test our parser.

One idea is to generate random text and use that as a “makefile”. Unfortunately, truly random text is almost useless in this regard, because it doesn’t look anything like a real makefile. Instead, we can use Markov chains to generate random text that is very much like a real makefile. When we first introduced this technique, we uncovered 13 previously unknown incompatibilities — at the time that represented 10% of the total defects reported against the parser! Read on to learn more about Markov chains and how we applied them in practice.
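To give a feel for the technique, here is one simple way to build and sample a character-level Markov chain in Python. Everything here is a toy sketch for illustration (the order, corpus, and parameters are mine, not the details of our actual generator): each 4-character window maps to the characters observed after it, so the output preserves makefile-flavored structure like colons, tabs, and newlines that purely random text would never produce.

```python
import random
from collections import defaultdict

def build_chain(text, order=4):
    """Map every `order`-character window to the characters seen after it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def generate(chain, length=200, seed=None):
    """Random-walk the chain to emit `length` new characters."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    out = state
    for _ in range(length):
        followers = chain.get(state)
        if not followers:                  # dead end: restart the walk
            state = rng.choice(list(chain.keys()))
            followers = chain[state]
        out += rng.choice(followers)
        state = out[-len(state):]
    return out

# Tiny training corpus; a real run would feed in many real makefiles.
corpus = """\
CC = gcc
CFLAGS = -O2 -g
all: prog
prog: main.o util.o
\tgcc -o prog main.o util.o
clean:
\trm -f prog *.o
"""

print(generate(build_chain(corpus), length=120, seed=1))
```

The output is gibberish as a program, but it is makefile-shaped gibberish: it has targets, prerequisites, and tab-indented recipe lines, which is exactly what exercises the dark corners of a parser.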
Read the rest of this entry »

Are Clusters a Dying Technology?

I happened across a blog today that made the claim that accelerating builds by distributing to a cluster of computers is “a dying technology.” Instead, they said, you should take advantage of increasing CPU density to enable increasing levels of parallelism in your builds — a single system with eight or more cores is pretty affordable these days. So is it true? Have we reached a critical point in the evolution of computer hardware? Are clusters soon to become obsolete? In a word: no. This notion is rubbish. If the massive push to cloud computing (which is all about clusters) isn’t convincing, there are (at least) three concrete reasons why clusters will always beat individual multicore boxes.
Read the rest of this entry »

Rules with Multiple Outputs in GNU Make

I recently wrote an article for CM Crossroads exploring various strategies for handling rules that generate multiple output files in GNU make. If you’ve ever struggled with this problem, you should check out the article. I don’t want to spoil the exciting conclusion, but it turns out that the only way to really correctly capture this relationship in GNU make syntax is with pattern rules. That’s great if your input and output files share a common stem (e.g., “parser” in parser.i, parser.c and parser.h), but if your files don’t adhere to that convention, you’re stuck with one of the alternatives, each of which has some strange caveats and limitations.
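For reference, the pattern-rule approach looks roughly like this (a sketch; the bison-based rule is illustrative, not taken from the article):

```make
# A pattern rule with multiple targets: GNU make understands that a
# single invocation of the recipe produces both %.c and %.h.
%.c %.h: %.y
	bison -d -o $*.c $<

parser.o: parser.c parser.h
```

An explicit rule written “parser.c parser.h: parser.y”, by contrast, is treated by GNU make as two independent rules that happen to share a recipe, so a parallel build may run that recipe twice.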

Here’s a question for you: if ElectricAccelerator had an extension that allowed you to explicitly mark a non-pattern rule as having multiple outputs, would you use it? For example:

#pragma multi
something otherthing: input
	@echo Generating something and otherthing from input...

What do you think? Comments encouraged.

On the Proper Setup of a Virtual Build Infrastructure Part 2

In this post I will cover some additional findings from our deployment and tie them into deploying ElectricAccelerator in a virtual environment.

You can find part one of this article here.

Read the rest of this entry »