Multi-core Computers
All images Copyright 2007 Lloyd L Chambers


Apple’s Activity Monitor “CPU History” for a 4-core machine

Introduction

Updated: December 09, 2007

This article offers a beginner’s introduction to why today’s multi-core computers frequently don’t perform as well as expected. Why doesn’t an 8-core Mac Pro run programs twice as fast as a 4-core Mac Pro?

The primary issues are memory bandwidth and software design.

Background

Even low-end computers have dual CPU cores these days, and higher-end units like Apple’s Mac Pro have four or even eight CPU cores. A CPU “core” is what we used to call a CPU (Central Processing Unit). It used to be that one such “brain” was all that could be fabricated on a chip, so the focus was on increasing the clock speed of that single core. Clock speeds rose over the years from around 8MHz (the Motorola 68000 in the original Macintosh) to nearly 4000MHz (4GHz) in some of today’s chips. However, raising the clock rate is no longer an efficient way to get more performance; heat, memory bandwidth (speed) and cost make higher clock rates problematic.

Instead, today’s CPU manufacturers have turned to multi-core designs, i.e., putting more than one “brain” on a single chip. Intel offers two, four or eight core chips with more to come, Sun Microsystems has chips with up to 64 cores, AMD has its quad-core chip (and others), etc. The future belongs to multi-core systems, especially in the server marketplace.

But there are some dirty little secrets that you won’t find advertised by Apple, Intel or Dell. Quite the contrary: you will find rosy claims that have no basis in reality when 99% of the software programs used by the public are considered.

Memory bandwidth

Memory bandwidth is a major limitation with multi-core systems (see Apple Mac Pro and All About Mac Pro Memory). Think of a 3/4" garden hose as compared with a fire hose; the fire hose allows a much greater volume of water to pass through it per unit time. Or compare a dial-up modem to a DSL or cable internet connection. In every case it is the bandwidth of the “pipe” that makes the system as a whole speedy or sluggish.

All cores must share access to the same memory and therefore have to take turns reading from or writing to it. Like the ladies’ room at a movie theatre, a long queue can form waiting for access. The cores can sit idle for much or even most of the time that they are allegedly computing, because they are forced to wait for their memory access operations to complete.

Assuming a well-designed program, memory bandwidth is the key reason why an eight-core machine might not run any faster than a quad-core machine, or might even run a little bit slower. Until more sophisticated memory systems are implemented, eight-core machines will offer only modest improvements over quad-core systems, typically somewhere between 10% slower and 50% faster, not twice as fast (that’s right: sometimes slower). There are exceptions of course; programs that are primarily computation-intensive with modest bandwidth needs can show nearly double the performance with eight cores instead of four.
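To make the bandwidth ceiling concrete, here is a minimal sketch of a simplified model in which a program's throughput is capped either by how fast the cores can compute or by how fast shared memory can feed them. Every name and number in it (throughput, speedup_8_vs_4, the 12 GB/sec bandwidth, the per-core rate) is a hypothetical value chosen for illustration, not a measurement of any real machine. The model ignores contention overhead, so it cannot reproduce the case where eight cores run slower than four.

```python
def throughput(cores, per_core_ops, bytes_per_op, mem_bandwidth):
    """Operations per second under a simple bandwidth-capped model.

    cores         -- number of CPU cores working in parallel
    per_core_ops  -- ops/sec one core achieves when never starved for data
    bytes_per_op  -- memory traffic each operation requires
    mem_bandwidth -- bytes/sec the shared memory system can deliver
    """
    compute_limit = cores * per_core_ops          # what the cores could do
    memory_limit = mem_bandwidth / bytes_per_op   # what memory can feed
    return min(compute_limit, memory_limit)


def speedup_8_vs_4(per_core_ops, bytes_per_op, mem_bandwidth):
    """Ratio of eight-core to four-core throughput in the model above."""
    four = throughput(4, per_core_ops, bytes_per_op, mem_bandwidth)
    eight = throughput(8, per_core_ops, bytes_per_op, mem_bandwidth)
    return eight / four


# Hypothetical machine: 12 GB/sec of shared bandwidth, 1 billion ops/sec per core.
BANDWIDTH = 12e9
PER_CORE = 1e9

for bytes_per_op in (1, 2, 4):
    ratio = speedup_8_vs_4(PER_CORE, bytes_per_op, BANDWIDTH)
    print(f"{bytes_per_op} byte(s) per op: eight cores run {ratio:.2f}x as fast as four")
```

In this toy model, the program with light memory traffic doubles its speed on eight cores, while the program with heavy traffic gains nothing because four cores already saturate the memory system.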

Disk speed

Like memory bandwidth, disk speed can throttle performance. If a disk can read only 40MB/sec (a common figure for even a fast laptop hard drive), then a program that needs to read a 400MB file will take a minimum of 10 seconds to open it, even if it has no computation to do. Using a striped RAID array capable of 200MB/sec would drop that overhead down to two seconds.

Assuming 5 seconds of computation and a program not smart enough to compute while simultaneously reading from the disk, that’s a total run-time of 15 seconds (10+5) instead of 7 seconds (2+5), an approximately 2X speedup.

A well-designed program will be able to compute while simultaneously reading from disk; for the previous example that would result in run times of 10 seconds versus 5 seconds, still twice as fast.
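The arithmetic in the disk example can be checked with a short sketch. It uses the article's own figures (a 400MB file, 5 seconds of computation, 40MB/sec and 200MB/sec disks); the function names are made up for illustration, and the overlapped case assumes reading and computing can proceed fully in parallel, which is an idealization.

```python
def read_seconds(file_mb, disk_mb_per_sec):
    """Minimum time to read the file at the disk's sustained speed."""
    return file_mb / disk_mb_per_sec


def serial_runtime(read_s, compute_s):
    """Program reads everything first, then computes."""
    return read_s + compute_s


def overlapped_runtime(read_s, compute_s):
    """Program computes while reading; the slower activity sets the total."""
    return max(read_s, compute_s)


FILE_MB = 400
COMPUTE_S = 5

for label, speed in (("laptop drive", 40), ("striped RAID", 200)):
    r = read_seconds(FILE_MB, speed)
    print(f"{label}: read {r:.0f}s, "
          f"serial total {serial_runtime(r, COMPUTE_S):.0f}s, "
          f"overlapped total {overlapped_runtime(r, COMPUTE_S):.0f}s")
```

With the faster disk, the overlapped program becomes limited by computation rather than by the disk, which is why a still faster disk would not help it further.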

The lesson here is to be aware of what’s actually going on with the software you actually run—if it’s limited by disk speed, invest in a faster disk or RAID before considering a faster computer.

Software and scalability

Another multi-core problem is that software today is frequently not written to take advantage of multiple cores efficiently; it is technically challenging to write efficient threaded code that is free of threading bugs (threading bugs are those that result from incorrectly-written threaded code).

A highly scalable program will run approximately N times as fast on an N-core system as on a single-core system, at least for reasonable values of N (e.g., up to 16 cores). Creating a scalable program requires a very high degree of expertise; it is relatively easy to write code that runs efficiently on dual-core systems, runs a little faster on quad-core systems, then shows no further improvement beyond four cores. Adobe Photoshop CS3 is an example of a popular program that scales fairly well to dual-core systems, but shows little improvement on quad-core systems.

Some computing problems cannot be solved in a threaded (parallel) fashion; there are too many serial (ordered) dependencies to allow multiple worker threads to work simultaneously. But the fact remains that most programs are just poorly designed for today’s multi-core world, and that includes popular programs such as Adobe Photoshop CS3, which rarely utilizes more than two of four cores on my quad-core Mac Pro (CPU usage of 125% to 200%, out of a possible 400%, is typical).
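The cost of serial dependencies is commonly estimated with Amdahl's law, a standard formula that this article does not itself use. The sketch below applies it to a few illustrative parallel fractions; the fractions are assumptions, not measurements of Photoshop or any other program, and the function name is made up for the example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speedup when only part of the work can run in parallel.

    parallel_fraction -- share of single-core run time that can be split
                         across cores (0.0 to 1.0)
    cores             -- number of cores available
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for fraction in (0.5, 0.9, 1.0):
    row = ", ".join(
        f"{n} cores: {amdahl_speedup(fraction, n):.2f}x" for n in (2, 4, 8)
    )
    print(f"parallel fraction {fraction:.0%} -> {row}")
```

Only the fully parallel case scales as N times faster on N cores; any serial portion flattens the curve quickly, which matches the pattern of good dual-core gains followed by little improvement beyond that.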

Sometimes it’s just a matter of failure to do anything halfway intelligent. For example, Photoshop CS3’s Save command is not only single-threaded (uses only one CPU core), but it’s modal—it forces the user to wait until the operation completes. There is no legitimate reason that a Save could not be in progress while the user works on another image.
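As a rough illustration of a non-modal Save, the sketch below hands the write to a background thread while the main thread keeps editing. It is only a sketch under stated assumptions: save_document and save_in_background are hypothetical names, the "image" is a plain byte buffer, and nothing here reflects how Photoshop is actually built.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def save_document(path, data):
    """Write data to path; stands in for a slow Save operation."""
    with open(path, "wb") as f:
        f.write(data)
    return path


def save_in_background(executor, path, data):
    """Start a save on a worker thread and return a Future for it."""
    # Copy the bytes first so later edits cannot change what gets written.
    snapshot = bytes(data)
    return executor.submit(save_document, path, snapshot)


# Hypothetical document: an editable byte buffer standing in for an image.
image = bytearray(b"pixels" * 1000)
out_path = os.path.join(tempfile.mkdtemp(), "image.bin")

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = save_in_background(pool, out_path, image)
    image[0:6] = b"EDITED"      # the user keeps working while the save runs
    saved_path = pending.result()  # wait only when the result is needed

print("saved", os.path.getsize(saved_path), "bytes to", saved_path)
```

The design choice that matters is the snapshot: the save works from a copy taken at the moment Save was requested, so continued editing cannot corrupt the file being written.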

Activity Monitor

Apple’s Activity Monitor is a handy-dandy tool for seeing what’s going on in your system. It can be found in the Utilities folder (within the Applications folder).

Activity Monitor window

On Mac OS X, the Activity Monitor window (/Applications/Utilities/Activity Monitor) shows what percent of CPU time each program is using, along with memory usage and other figures. Click on the % CPU column to sort by CPU usage.


Apple Mac OS X 10.4.x Activity Monitor window

The figure 100% means 100% of one CPU core, so a four-core system has up to 400% available. For example, below is the CPU usage for Photoshop CS3 while saving a large TIF file; the 99.2% figure indicates that only one CPU core is being used (“single threaded”), a poor use of machine resources.


Photoshop CS3 CPU usage while saving a large TIF file
(The “Real Memory” number is an Apple bug)
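The percentage convention is easy to misread, so here is a small sketch that converts an Activity Monitor style "% CPU" figure into cores in use and into a share of the whole machine. The function names are made up for illustration; the 99.2% figure is the one quoted above, and 220% is the Smart Sharpen figure cited later in this article.

```python
def cores_in_use(cpu_percent):
    """Convert a '% CPU' reading (100 = one full core) to a core count."""
    return cpu_percent / 100.0


def machine_utilization(cpu_percent, core_count):
    """Fraction of the whole machine's CPU capacity a program is using."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return cpu_percent / (100.0 * core_count)


CORES = 4  # the quad-core Mac Pro discussed in this article

for label, percent in (("Save", 99.2), ("Smart Sharpen", 220.0)):
    print(f"{label}: {cores_in_use(percent):.2f} cores busy, "
          f"{machine_utilization(percent, CORES):.1%} of a {CORES}-core machine")
```

Seen this way, a reading just under 100% on a four-core machine means roughly three quarters of the available processing power is sitting idle.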

CPU History window

The CPU History window is very useful; it maintains a graphical history of CPU usage with one graph per CPU core; resize it to a comfortable size for your monitor, and if you have a dual-monitor system, place it on your secondary monitor so you can see it whenever you’d like.


CPU History window

Examples

One program that makes excellent use of the quad-core 3GHz Mac Pro is onOne Software’s Genuine Fractals 5.0. Scaling a Nikon D3 image up by 200% shows nearly ideal behavior in terms of using the available processing power. This is what Activity Monitor shows:


Genuine Fractals 5.0 CPU usage on 3GHz Mac Pro

DxO Optics Pro 5.2 also does a superb job, using all four cores of a 3GHz Mac Pro. This is the behavior you want to see!

DxO Optics Pro 5.2 CPU usage on quad-core 3GHz Mac Pro

Photoshop CS3 is comparatively poor with most operations in terms of threading. Shown below is the CPU utilization for a Smart Sharpen of a large file (left). Utilization is only about 220% out of a possible 400% (100% is one CPU core). This might be due to memory bandwidth, but is more likely due to an inefficient implementation.



At right is the usage for saving a 600MB file as a compressed TIF (zip) onto a hyper-fast RAID volume (capable of about 350MB/sec). The Save operation is single-threaded; Photoshop never uses more than 99.7% of one core while saving, and uses only a fraction of the disk speed available. This would be forgivable if another file could be worked on while saving, but even that is not possible.


Adobe Photoshop CS3 Smart Sharpen (left), save 600MB file (right)

Conclusions

That fancy new computer won’t deliver on its speed promise unless it offers adequate memory bandwidth for the programs you actually run, and those programs need to be especially well designed to exploit more than one or two CPU cores.
