I predicted that, relative to the 4-core model, an 8-core Mac Pro would perform roughly like a hypothetical 6-core Mac Pro. The first test results at barefeats.com confirm this prediction: the 8-core Mac Pro does not scale in performance the way a layman might assume (twice as fast as the 4-core), but it can be about 50% faster on certain tasks.
Many of the tests show no improvement in performance at all, even though all 8 cores are in use. Why? As noted in my April 5 blog entry, unless a task is compute-intensive, memory bandwidth will limit performance to about that of a 4-core Mac Pro, because all 8 cores are trying to access memory at the same time.
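To make the bandwidth argument concrete, here is a minimal sketch of a bandwidth-ceiling model in the spirit of the roofline model: speedup grows with core count only until the cores together demand more memory traffic than the bus can deliver. The function name and the per-core demand and bandwidth figures are hypothetical illustrations, not measured Mac Pro numbers.

```python
def effective_speedup(cores: int, per_core_demand: float, bandwidth: float) -> float:
    """Speedup over one core when all cores share one memory bus.

    per_core_demand: memory traffic one core generates at full speed (GB/s).
    bandwidth: total sustainable memory bandwidth (GB/s).
    Both inputs are hypothetical, not measured Mac Pro figures.
    """
    # With unlimited memory bandwidth, speedup would be linear in core count...
    compute_bound = float(cores)
    # ...but the cores together cannot pull more data than the bus delivers.
    memory_bound = bandwidth / per_core_demand
    return min(compute_bound, memory_bound)


if __name__ == "__main__":
    # Hypothetical memory-heavy task: each core wants 2.5 GB/s, the bus delivers 10 GB/s.
    for n in (1, 2, 4, 8):
        print(n, effective_speedup(n, per_core_demand=2.5, bandwidth=10.0))
```

With these made-up numbers, 8 cores give the same 4.0 speedup as 4 cores, while a compute-intensive task that needs only 0.5 GB/s per core would still scale to all 8.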
It’s not just memory bandwidth, either: access to the hard disk and to system services in effect requires tasks to queue up single-file, waiting for the needed resource (memory access, disk, an exclusive lock, etc.). This is called contention. It works like a rush-hour freeway, where doubling the number of cars can cut speed by more than half, because the cars (tasks) must jostle with one another for the same lane space (system resource).
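The freeway effect can be illustrated with the textbook M/M/1 queue, a standard queueing model swapped in here for illustration (it is not how Mac OS X actually schedules disk or lock access). In that model the mean time a request spends in the system is W = 1 / (μ − λ), where μ is the service rate and λ the arrival rate, so waiting time explodes as load approaches capacity. The function name and the 100 requests/second disk are hypothetical.

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue (waiting plus service).

    Uses the textbook result W = 1 / (mu - lambda), valid only while
    arrival_rate < service_rate.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # demand exceeds capacity; the queue grows forever
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical disk that can serve 100 requests per second.
    light = mm1_time_in_system(45.0, 100.0)  # 45 requests/s offered
    heavy = mm1_time_in_system(90.0, 100.0)  # twice the load
    print(f"light: {light * 1000:.1f} ms, heavy: {heavy * 1000:.1f} ms, "
          f"slowdown: {heavy / light:.1f}x")
```

This prints about 18.2 ms versus 100.0 ms: doubling the offered load makes each request take 5.5 times as long, far worse than the 2x a naive estimate would suggest.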
Contention and overhead: operating systems (e.g. Mac OS X) must juggle the outstanding tasks, applications, and threads among the available processor cores, while also running the operating system itself. As the number of processor cores increases, so does this overhead.
In the real world, multi-core systems do not scale linearly: the gain of 8 cores over 4 cores is often modest, and performance can even degrade.
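One standard way to capture both "modest" and "degraded" scaling is Neil Gunther's Universal Scalability Law, swapped in here as an illustration rather than as the model behind these tests: C(N) = N / (1 + σ(N − 1) + κN(N − 1)), where σ models contention (serialized work) and κ models coherency overhead (cores coordinating with each other). The σ and κ values below are hypothetical, not fitted to the barefeats.com results.

```python
import math


def usl_capacity(n: int, sigma: float, kappa: float) -> float:
    """Relative throughput of n cores under the Universal Scalability Law.

    sigma: contention (serialized fraction, e.g. queuing for disk or a lock)
    kappa: coherency cost (pairwise coordination between cores)
    """
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_cores(sigma: float, kappa: float) -> float:
    """Core count at which throughput peaks (only meaningful when kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)


if __name__ == "__main__":
    # Hypothetical coefficients, not fitted to any Mac Pro measurements.
    for sigma, kappa in ((0.10, 0.02), (0.10, 0.05)):
        c4 = usl_capacity(4, sigma, kappa)
        c8 = usl_capacity(8, sigma, kappa)
        print(f"sigma={sigma}, kappa={kappa}: 8 vs 4 cores = {c8 / c4:.2f}x, "
              f"peak near {usl_peak_cores(sigma, kappa):.1f} cores")
```

With the first set of coefficients, 8 cores are only about 1.09x faster than 4; with slightly higher coherency cost, 8 cores come out at about 0.84x, i.e. slower than 4, because throughput peaks at roughly 4.2 cores.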
Poor scheduling of tasks across processor cores also degrades performance. In particular, Mac OS X is fond of switching a task (thread) between cores for no apparent reason. I’ve observed this myself on many occasions: with several cores idle, Mac OS X moves a task from one core to another at seemingly arbitrary intervals. Each time this happens, the task leaves behind the data it had built up in the “old” core’s large on-chip cache. As the task resumes on the new core, that core’s cache must be repopulated, from main memory or from the old core’s cache, which is very slow in relative terms.
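The cost of migration can be sketched with a toy model that counts cache misses for one thread looping over its working set, with and without being moved to another core. This is a deliberate simplification under stated assumptions: each core has a private LRU cache, accesses are read-only, and shared caches and coherency traffic are ignored. The class, function, and sizes are hypothetical, and the result counts cache-line loads, not time.

```python
from collections import OrderedDict


class LRUCache:
    """Toy model of one core's private cache, holding line addresses in LRU order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> None, oldest first

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, load the line and return False."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def count_misses(working_set: int, rounds: int, migrate_every: int,
                 cores: int = 2, capacity: int = 1024) -> int:
    """Loop one thread over its working set `rounds` times, moving it to the
    next core every `migrate_every` rounds (0 means it never migrates)."""
    caches = [LRUCache(capacity) for _ in range(cores)]
    core = 0
    misses = 0
    for r in range(rounds):
        if migrate_every and r > 0 and r % migrate_every == 0:
            core = (core + 1) % cores  # the scheduler moves the thread
        for addr in range(working_set):
            if not caches[core].access(addr):
                misses += 1
    return misses


if __name__ == "__main__":
    print("stays on one core:", count_misses(512, 10, migrate_every=0))
    print("moved once:       ", count_misses(512, 10, migrate_every=5))
```

A thread that stays put pays 512 misses only once, to warm its cache; a single move to a cold core doubles that to 1024. In this read-only model, moving back to the original core finds it still warm, which real hardware does not guarantee once the thread writes data.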
The core-swapping behavior can be seen below in two runs of the diglloydTools run-stress-test. In the first run, the single-threaded portion of the test ran on the 4th core for its full duration. In the second run, the single thread ran on the 1st core for about half its duration, then was moved to the 2nd core, in spite of the 3 other cores being idle! This had little effect under the circumstances, but it can have a large effect when multiple threads contend for the CPU.
[Figure: Mac OS X core swapping]
It seems clear that Mac OS X 10.4 has not been adequately optimized for even 4 cores; perhaps Mac OS X 10.5 will improve matters.