
The technology behind dual CPU Opterons
Written originally for the Poweroid 9334; some parts also apply to the other dual Xeon and dual Opteron machines we sell.
Some
people may ask, what makes a good workstation?
Since "time is money", we'd say a computer
that gets your work completed in the least amount of
time, and provides the highest level of stability, is a good workstation.
The idea is to create a tool that eliminates performance bottlenecks where possible, and otherwise reduces them as far as current technology allows. The Poweroid 9334 is exactly that kind of machine.
The most obvious contributor to this machine's performance is the pair of Opteron processors working in concert. Two heads are better than one, right? Unlike Intel's "HyperThreaded" processors, or some future implementations of "dual core" CPUs, this is a full-fledged dual processor implementation. That translates to two full cores on separate dies, each with its own full-speed cache. Thanks to the design of the Opteron, each also has its own separate path to memory.
Direct Memory Access
Each processor has an integrated dual channel memory controller connected to its own set of memory slots. This is the opposite of a Xeon setup, where both processors must squeeze through the Front Side Bus (FSB) to a single shared dual channel memory controller. AMD is able to use a more independent setup thanks to the HyperTransport links (not to be confused with HyperThreading) built into each processor. One link connects the two CPUs directly and carries the traffic that keeps their caches coherent, while another goes to the AMD 8151 Graphics Tunnel (which fills the role of the "north bridge"). This lets the two CPUs exchange data at high bandwidth, which pays off as long as the software being used is optimised for dual processor operation.
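
To put rough numbers on the difference, theoretical peak memory bandwidth is simply channels × bytes per transfer × transfer rate. The sketch below assumes DDR400 memory (400 MT/s on 64-bit channels) behind each Opteron and, purely as a hypothetical comparison point, a 64-bit, 800 MT/s Xeon front side bus shared by both CPUs. These are illustrative theoretical peaks, not measurements of the 9334.

```python
def peak_bandwidth_gbs(channels: int, bus_bits: int, mega_transfers: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal): channels x bytes per transfer x MT/s."""
    bytes_per_transfer = channels * bus_bits // 8
    return bytes_per_transfer * mega_transfers / 1000


# Assumption: each Opteron drives its own dual channel DDR400 controller.
per_opteron = peak_bandwidth_gbs(channels=2, bus_bits=64, mega_transfers=400)
opteron_total = 2 * per_opteron  # two independent controllers, one per CPU

# Hypothetical comparison: one 64-bit, 800 MT/s front side bus shared by two Xeons.
shared_fsb = peak_bandwidth_gbs(channels=1, bus_bits=64, mega_transfers=800)

print(f"Per Opteron:        {per_opteron:.1f} GB/s")
print(f"Two Opterons total: {opteron_total:.1f} GB/s")
print(f"Shared Xeon FSB:    {shared_fsb:.1f} GB/s (split between both CPUs)")
```

The aggregate figure only materialises when each CPU mostly works on data held in its own memory; requests for the other processor's memory have to travel over HyperTransport and take longer.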
With Intel's HyperThreading feature becoming more mainstream, more programs are now being updated for multithreading, especially in the workstation and content creation area.
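
How much a second processor helps depends on how much of a program's work can actually run in parallel. Amdahl's law is the standard way to estimate this, and the sketch below applies it to a two-CPU machine. The parallel fractions are hypothetical examples, not measurements of any particular application.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: overall speedup when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


# Hypothetical workloads, from single threaded to almost fully multithreaded.
for p in (0.0, 0.5, 0.9, 0.99):
    print(f"{p:.0%} parallel -> {amdahl_speedup(p, 2):.2f}x on two CPUs")
```

A single threaded program gains nothing from the second CPU, while a well threaded one approaches, but never quite reaches, the ideal 2x.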
How does the Opteron differ from the original Athlon MP?
Unlike the previous Athlon MP processors, which Poweroid were the first to introduce to the UK, the Opterons also support x86 instruction set extensions such as SSE2. This is an important set of SIMD (Single Instruction, Multiple Data) instructions used in graphics work to speed up processing of large datasets. While these instructions are more dependent on clock speed than others (so Xeons, with their higher GHz ratings, benefit more from them), they still make the work go faster than it otherwise would.
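
The idea behind SIMD can be shown with a little arithmetic. SSE2 works on 128-bit XMM registers, so one packed instruction (ADDPD, for example) operates on two 64-bit doubles at once, and other SSE2 instructions handle four 32-bit floats or sixteen 8-bit integers. The sketch below models a register as a tuple of lanes in plain Python; `packed_add` and the other helpers are hypothetical illustrations of the lane-wise behaviour and instruction count, not real SSE2 code.

```python
import math

XMM_BITS = 128  # width of an SSE2 register


def lanes(element_bits: int) -> int:
    """How many elements of the given size fit in one 128-bit register."""
    return XMM_BITS // element_bits


def packed_add(a: tuple, b: tuple) -> tuple:
    """Model of a packed add: every lane is added by a single 'instruction'."""
    if len(a) != len(b):
        raise ValueError("registers must have the same number of lanes")
    return tuple(x + y for x, y in zip(a, b))


def simd_instructions(n_elements: int, element_bits: int) -> int:
    """Packed instructions needed for n elements (the last one may be partly filled)."""
    return math.ceil(n_elements / lanes(element_bits))


print(packed_add((1.5, 2.5), (10.0, 20.0)))  # two doubles in one register
print(simd_instructions(1_000_000, 64))      # doubles: 2 per instruction
print(simd_instructions(1_000_000, 32))      # floats: 4 per instruction
```

Compared with a scalar loop issuing one add per element, the packed version needs half (doubles) or a quarter (floats) as many arithmetic instructions, which is where much of the gain on large datasets comes from.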
Also squirrelled away in these Opteron processors are the x86-64 extensions, i.e. support for 64-bit processing. This is The Big Selling Point in all of AMD's marketing. At the moment there is a lack of true Windows support, but that doesn't mean support doesn't exist elsewhere: Linux has been much quicker to capitalise on this feature. It allows the system to use more than 4 GB of memory in total, and lets individual tasks (such as large databases or simulations) each use more than 2-3 GB of memory without having to rely on the current "hacked solutions". That, however, is not the only benefit for customers. Programs compiled for x86-64 also gain performance from the doubling of the number of general purpose and SSE registers. Registers are the smallest and fastest storage areas in a processor, and are where data is held just before it enters the execution core.
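
The 4 GB figure comes straight from address width: a 32-bit address can name 2^32 distinct bytes. The sketch below compares that with the 40-bit physical and 48-bit virtual addresses implemented by the first AMD64 processors such as this Opteron; the full 64-bit space is listed for reference only, as current chips do not implement all of it.

```python
GIB = 2**30


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2**address_bits


for label, bits in [
    ("32-bit x86", 32),
    ("AMD64 physical (first Opterons)", 40),
    ("AMD64 virtual (first Opterons)", 48),
    ("full 64-bit", 64),
]:
    print(f"{label:32} {addressable_bytes(bits) / GIB:>16,.0f} GiB")
```

A 32-bit operating system also reserves part of its 4 GB address space for itself, which is why individual 32-bit processes are typically limited to the 2-3 GB mentioned above.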

As you can see from the diagram, not only are the GPRs (general purpose registers) extended to 64 bits, but there are also twice as many of them, 16 instead of 8. The SSE registers aren't any wider, but there are also twice as many of them to play with. This means fewer calls to memory, as the compiler can keep more values tucked away in registers, ready to operate on as and when necessary. The Opteron does possess many more "unnamed" (rename) registers that it uses internally for storing values, but software cannot address these directly, so they are no substitute for extra architectural registers when it comes to executing code efficiently.
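
The register counts can be tallied directly. IA-32 code sees eight 32-bit general purpose registers and eight 128-bit XMM registers; x86-64 code sees sixteen 64-bit GPRs (RAX to RSP plus the new R8 to R15) and sixteen XMM registers. The sketch below simply lists the architectural register names and adds up how much data the compiler can keep in them; it is a tally, not a model of the CPU.

```python
IA32_GPRS = ["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"]
X86_64_GPRS = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"] + [
    f"r{i}" for i in range(8, 16)
]
IA32_XMM = [f"xmm{i}" for i in range(8)]
X86_64_XMM = [f"xmm{i}" for i in range(16)]


def register_bytes(names: list, width_bits: int) -> int:
    """Total storage in a set of registers of equal width."""
    return len(names) * width_bits // 8


ia32 = register_bytes(IA32_GPRS, 32) + register_bytes(IA32_XMM, 128)
x86_64 = register_bytes(X86_64_GPRS, 64) + register_bytes(X86_64_XMM, 128)
print(f"IA-32:  {len(IA32_GPRS)} GPRs, {len(IA32_XMM)} XMM, {ia32} bytes")
print(f"x86-64: {len(X86_64_GPRS)} GPRs, {len(X86_64_XMM)} XMM, {x86_64} bytes")
```

In practice the stack pointer (ESP/RSP) is reserved, so one fewer GPR is free for general use in each mode, but the doubling still holds.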
Another mention of each processor's integrated memory controller is necessary. While Sun and others have gone this route before, it hasn't been seen in the x86 world until now. By moving this typically discrete IC onto the processor die, there's one less bus stop for the data on its way from memory to the CPU core. Since latency is the hardest aspect of memory performance to improve, this is an invaluable advancement, especially for applications that read from many different parts of memory at random rather than sequentially. Database apps are a good example of "random" reads and writes.
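
Memory latency is usually measured with a "pointer chasing" loop, in which each load's address comes from the previous load, much like following links through a database index. The sketch below builds such a chain with Sattolo's algorithm, which yields a random permutation forming a single cycle, so a walk visits every slot once in an unpredictable order. It is written in Python only to show the access pattern; interpreter overhead would swamp the memory latency, so a real measurement would be done in a compiled language over an array much larger than the CPU caches. The function names are illustrative.

```python
import random


def sattolo_chain(n: int, seed: int = 0) -> list:
    """Random single-cycle permutation: chain[i] is the next slot to visit after i."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees one big cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def walk(chain: list, start: int = 0) -> list:
    """Follow the chain: each 'load' depends on the result of the previous one."""
    order, slot = [], start
    for _ in range(len(chain)):
        order.append(slot)
        slot = chain[slot]
    return order


sequential = list(range(16))                     # prefetch-friendly pattern
random_order = walk(sattolo_chain(16, seed=42))  # database-like pattern
print(sequential)
print(random_order)
```

With the sequential pattern the hardware can fetch ahead, while in the chained pattern every step must wait for the previous load to complete, so each step pays close to the full memory latency that the integrated controller helps reduce.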
The motherboard used here
Turning to the motherboard that all this processing horsepower is plugged into, Poweroid decided not to skimp on this vital component. Tyan is known for building some of the best server and workstation class motherboards, and the "Thunder K8W" doesn't disappoint. With four memory slots for each CPU, supporting 16GB of RAM in total, expandability in this area is not an issue. The Extended ATX PCB (printed circuit board) still has room for two independent 64-bit PCI-X buses with two slots each, as well as a standard 32-bit PCI slot. That's not to mention the 8x AGP Pro slot for graphics.
What's that "Pro" tag for? It indicates extra power handling capability, required by various high end, energy hungry workstation class video cards. The gigabit LAN, a standard feature on most boards these days, is hooked in unconventionally. Instead of sharing, and helping to saturate, the older PCI bus as it does in most cases, it's attached internally to a PCI-X bus. This keeps it from interfering with other high bandwidth devices in the system, such as video editing cards. The last feature is one that is not often given enough credit: the separate three-phase VRMs (Voltage Regulation Modules) used for each processor. Cheaper boards might try to get away with a single VRM shared between the two CPUs. Here, instead, each processor gets clean power from its own VRM, located close to the socket to reduce losses and variation over the distance the power has to travel.
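
The reasoning behind putting the gigabit LAN on PCI-X becomes clearer from peak bus bandwidth, which is bus width times clock rate. The sketch below assumes nominal clocks of 33.33 MHz for the 32-bit PCI slot and 133.33 MHz for a PCI-X bus (the clocks of the K8W's individual PCI-X buses aren't given here, so treat that as an assumption). Gigabit Ethernet running flat out in both directions moves about 250 MB/s.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus: bytes per clock x clock rate, in MB/s."""
    return width_bits / 8 * clock_mhz


pci = bus_peak_mb_s(32, 33.33)     # classic 32-bit PCI, shared by every device on it
pci_x = bus_peak_mb_s(64, 133.33)  # assumed 133 MHz PCI-X bus
gigabit_duplex = 2 * 1000 / 8      # 1000 Mbit/s each way, converted to MB/s

print(f"32-bit PCI:       {pci:7.0f} MB/s")
print(f"64-bit PCI-X 133: {pci_x:7.0f} MB/s")
print(f"GbE full duplex:  {gigabit_duplex:7.0f} MB/s")
```

A single gigabit port at full duplex could in theory ask for more than the entire shared PCI bus offers, so moving it onto PCI-X leaves the PCI slot's bandwidth free for other cards.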
There is more to a workstation than just the sexy bits, though. Coming as standard with the Poweroid 9334 is a 300GB SATA hard disk with 16MB of cache and another new feature called NCQ (Native Command Queuing).
NCQ, however, is something you won't benefit from in an AMD based solution, as the AMD boards don't currently support it. Still, the fast spindle speed and the extra-large disk cache (which responds at RAM speeds measured in nanoseconds, rather than the millisecond access times of the disk itself) do make a big difference to read/write operations.
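
The gap between cache and platter is easy to estimate. On average the disk must wait half a revolution for the wanted sector to come round, so average rotational latency is 60 / rpm / 2 seconds. The drive's spindle speed isn't stated above, so 7,200 rpm is assumed here, and the 100 ns cache response time is likewise a round illustrative figure rather than a specification.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the wanted sector: half a revolution, in milliseconds."""
    seconds_per_rev = 60 / rpm
    return seconds_per_rev / 2 * 1000


disk_ms = avg_rotational_latency_ms(7200)  # assumed 7,200 rpm drive
cache_ns = 100                             # illustrative cache response time
ratio = disk_ms * 1_000_000 / cache_ns     # convert ms to ns before comparing

print(f"Average rotational latency: {disk_ms:.2f} ms")
print(f"Cache memory is roughly {ratio:,.0f}x quicker to respond")
```

The rotational figure comes before any seek time or interface overhead is added, so a request that misses the cache typically waits longer still.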
Digital content creation requires plenty of space, of course, but making your work portable is also important. The 700MB available on a CD isn't always sufficient, but the 8.5GB available on a dual layer DVD should cover many more cases. And with 16x write speed, you won't have to wait the hours it would take on a slower drive to get that data onto the disc. The 9334 also comes with a 16x DVD reader, so you can copy directly from one DVD to another, or transfer large amounts of data onto the host computer through both drives at once.
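
How long a burn takes follows from the DVD speed rating: "1x" is 1,385,000 bytes per second, and "16x" means sixteen times that. The sketch below uses a standard 4.7GB single layer disc and assumes the drive holds its full rated speed for the whole disc, which real drives don't (they ramp up across the disc), so treat the results as best-case figures.

```python
DVD_1X_BYTES_PER_S = 1_385_000  # data rate defined as "1x" for DVD


def write_minutes(capacity_bytes: float, speed_x: float) -> float:
    """Best-case burn time in minutes at a constant rated speed."""
    return capacity_bytes / (DVD_1X_BYTES_PER_S * speed_x) / 60


single_layer = 4.7e9  # bytes on a single layer DVD

for speed in (1, 4, 16):
    print(f"{speed:>2}x: {write_minutes(single_layer, speed):5.1f} min")
```

Because burn time scales inversely with the speed rating, a disc that ties up a 1x drive for nearly an hour takes only a few minutes at 16x.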
Last, but not least, is the choice of an nVidia Quadro workstation class video card. What, you've never heard of a "Quadro" before? Think of a GeForce FX card meant for gaming, then add hardware support for the OpenGL features that workstation applications rely on. In workstation 3D graphics, features such as two-sided lighting, clipping planes, logic operations and culling are required, which are not called for in the gaming arena. To speed these operations up, cards like the Quadro execute them in the GPU (Graphics Processing Unit), rather than passing them through the driver path to the CPU as a standard game/desktop card does.
The hardware path of a Quadro is highly optimised to complete these in "real time", so that you can rotate and translate 3D models without having to wait for them to render. Again, it all boils down to getting your work done faster, and a workstation graphics card is a key part of accomplishing that goal when you work with 3D graphics.
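
To give a feel for what gets offloaded, here are two of the per-primitive tests mentioned above, written out in plain Python. Back-face culling checks the winding order of a triangle after projection (OpenGL treats counter-clockwise as front-facing by default), and a user clipping plane keeps a vertex when a*x + b*y + c*z + d is zero or more, which is the rule OpenGL's glClipPlane applies in eye coordinates. This is a CPU-side illustration of the maths only, with hypothetical function names; the point of a workstation card is to run such tests in hardware for every triangle and vertex, every frame.

```python
def is_back_facing(v0: tuple, v1: tuple, v2: tuple) -> bool:
    """True if the projected triangle winds clockwise (back-facing under GL_CCW)."""
    # Twice the signed area in screen space: positive for counter-clockwise winding.
    area2 = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])
    return area2 < 0


def inside_clip_plane(plane: tuple, p: tuple) -> bool:
    """Keep the point if a*x + b*y + c*z + d >= 0, as glClipPlane does."""
    a, b, c, d = plane
    return a * p[0] + b * p[1] + c * p[2] + d >= 0


front = ((0, 0), (1, 0), (0, 1))  # counter-clockwise
back = ((0, 0), (0, 1), (1, 0))   # same triangle, clockwise
print(is_back_facing(*front), is_back_facing(*back))

keep_y_positive = (0, 1, 0, 0)  # keeps everything with y >= 0
print(inside_clip_plane(keep_y_positive, (2, 3, -5)))
print(inside_clip_plane(keep_y_positive, (2, -1, -5)))
```

Each test is only a few multiplications, but a detailed CAD model can contain millions of triangles, which is why doing this work in the GPU rather than the CPU keeps rotation and translation interactive.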