Timing the Linux -j(cpu+1) Myth

Quite a few times I've been told to use the make -j(cpu+1) flag when building on Linux. Once I got my SMP box up and running, I ran a series of tests to see whether this was correct. The theory, I guess, is that starting one more job than there are processors makes sure every processor always has something to do.

The original test was on a dual Tualatin 1000 MHz machine, and the results were negative: make -j2 gave the best times. For historical purposes, I'm leaving the original page intact below.

Recent tests on my new dual-core 2.8 GHz Intel machine, however, indicate that the twinned cores in the new architecture do slightly improve their timings with additional "j"s. It's still not cpu+1, though: the best numbers were with -j4, which on two cores is either cpu+2 or cpu x 2. I have no way of knowing which until I get my hands on a quad. Anyone care to test?

As soon as I find the file with the timings on it, I'll include them. Suffice it to say the difference between -j2 and -j3 was on the order of a couple of percent, and the difference between -j3 and -j4 was much smaller: a bit less than a percent, and in some cases nothing at all. Adding more jobs beyond that eventually slows the build down again.

Okay, tentative results from some Core i3s indicate that the answer is cpu x 2 as long as the processor has hyperthreading turned on (that is, -j equal to the number of "virtual" processors), but the same old -j equal to the number of processors without hyperthreading. This is hardly a shock.
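The rule of thumb above (one job per "virtual" processor) can be sketched in a few lines of Python. This is only an illustration, not something used in the tests: the function names are made up, and it relies on os.cpu_count() reporting logical processors, so hyperthreaded siblings are counted when hyperthreading is on and only real cores are counted when it is off.

```python
import os


def suggested_jobs():
    """Return a -j value equal to the number of logical processors.

    os.cpu_count() counts logical CPUs (hyperthreads included) and may
    return None if the count cannot be determined, so fall back to 1.
    """
    return os.cpu_count() or 1


def make_command(target="bzImage"):
    """Build the make invocation as an argument list (hypothetical helper)."""
    return ["make", f"-j{suggested_jobs()}", target]


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(make_command()))
```

On the command line, make -j"$(nproc)" should amount to the same thing on a system with GNU coreutils.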

Here's the text of the original dual-processor test (as opposed to dual core):

Timing the Linux -j(cpu+1) Myth

Quite a few times I've been told to use the make -j(cpu+1) flag when building on Linux. Once I got my SMP box up and running, I ran a series of tests to see whether this was correct. The theory, I guess, is that starting one more job than there are processors makes sure every processor always has something to do.

The title should be a clue to the results of my tests. The short version is, it's wrong. The best flag is the one that matches the number of processors (which means you needn't bother with the -j flag at all if you only have one).

I did do a uniprocessor test, but didn't bother to write down the times. Here are the results on a dual 1 GHz P3 running 2.4.19 (and building it). For each run I untarred a fresh kernel source, did make oldconfig, and then ran make -j(n) bzImage. The config was of course identical in each case. Timings were done using Big Clock on a Sony Clie. These results are averaged over two runs (which were usually identical).
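For anyone who would rather not use a stopwatch, the procedure can be scripted. The following is a rough sketch, not what I actually ran: the function names are invented, it assumes a freshly untarred 2.4-series tree with the saved config already in place, and it assumes the dep step is run with the same -j value as bzImage.

```python
import subprocess
import time


def build_steps(jobs=None):
    """Return (name, command) pairs for one run, in order.

    jobs=None means no -j flag at all, like the first row of results.
    """
    flag = [] if jobs is None else [f"-j{jobs}"]
    return [
        ("oldconfig", ["make", "oldconfig"]),
        ("dep", ["make", *flag, "dep"]),
        ("bzImage", ["make", *flag, "bzImage"]),
    ]


def timed_run(cmd, cwd):
    """Run one command in cwd and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def time_build(source_dir, jobs=None):
    """Time every step of one build; returns {step name: seconds}."""
    return {name: timed_run(cmd, source_dir)
            for name, cmd in build_steps(jobs)}
```

Calling time_build("linux-2.4.19", 2) on a fresh tree would return the dep and bzImage times for -j2 (the directory name is only an example). Untar a new tree before each call so that no run benefits from leftover object files.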

make -j1   dep=24 sec   bzImage=4 min 38 sec
make -j2   dep=13 sec   bzImage=2 min 31 sec
make -j3   dep=13 sec   bzImage=2 min 33 sec
make -j4   dep=19 sec   bzImage=2 min 34 sec
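
To put the numbers above in perspective, here is a small Python sketch that converts them to seconds and works out the bzImage speedup relative to the first row. The figures are copied from the table; the function and variable names are just for illustration.

```python
# jobs -> (dep seconds, bzImage seconds), copied from the table above
TIMINGS = {
    1: (24, 4 * 60 + 38),
    2: (13, 2 * 60 + 31),
    3: (13, 2 * 60 + 33),
    4: (19, 2 * 60 + 34),
}


def speedups(timings, baseline=1):
    """bzImage speedup of each -j value relative to the baseline row."""
    base = timings[baseline][1]
    return {jobs: base / bz for jobs, (_dep, bz) in timings.items()}


def best_jobs(timings):
    """The -j value with the lowest combined dep + bzImage time."""
    return min(timings, key=lambda jobs: sum(timings[jobs]))


if __name__ == "__main__":
    for jobs, factor in speedups(TIMINGS).items():
        print(f"-j{jobs}: bzImage speedup {factor:.2f}x")
    print(f"fastest overall: -j{best_jobs(TIMINGS)}")
```

This works out to roughly 1.84x for -j2, 1.82x for -j3 and 1.81x for -j4, with -j2 the fastest overall once the dep time is added in.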

Technically the first entry is with no flag. I also wrote down a time for a single 1 GHz Celeron without flags, but not the times for the -j2 and -j3 tests I remember running. Oh well.

Celeron:   dep=23 sec   bzImage=6 min 34 sec

All these tests were run from the console without much else running. You'll probably note that it doesn't matter much which of the -j flags is used on an SMP system, but I think the timings are accurate enough to say that matching the job count to the processor count is probably best.

Email:
D A V I D . M A R K . N O R T H