LabVIEW


128-bit floating point numbers on new AMD quad-core Barcelona?

 
There's quite a lot of buzz over at Slashdot about the new AMD quad core chips, announced yesterday:
Much of the excitement is over the "new vector math unit referred to as SSE128", which is integrated into each [?!?] core; Tom Yager, of Infoworld, talks about it here:
Now here's my question - does anyone know what the inputs and the outputs of this coprocessor look like? Can it perform arithmetic [or, God forbid, trigonometric] operations [in hardware] on 128-bit quad precision floats? And, if so, will LabVIEW be adding support for it? [Compare here versus here.]
 
I found a little bit of marketing-speak blather at AMD about "SSE 128" in this old PDF Powerpoint-ish presentation, from June of 2006:
WARNING: PDF DOCUMENT
 
Page 13: "Dual 128-bit SSE dataflow, Dual 128-bit loads per cycle"
 
Page 14: "128-bit SSE and 128-bit Loads, 128b FADD, 128 bit FMUL, 128b SSE, 128b SSE"
 
etc etc etc
While it's largely just gibberish to me, "FADD" looks like what might be a "floating point adder", and "FMUL" could be a "floating point multiplier", and God forbid that the two "SSE" units might be capable of computing some 128-bit cosines. But I don't know whether that old paper is even applicable to the chip that was released yesterday, and I'm just guessing as to what these things might mean anyway.
 
Other than that, though, AMD's main website is strangely quiet about the Barcelona announcement. [Memo to AMD marketing - if you've just released the greatest thing since sliced bread, then you need to publicize the fact that you've just released the greatest thing since sliced bread...]
 
 
 
0 Kudos
Message 1 of 11
(5,782 Views)
Oops - that should say:
Page 17: "128-bit SSE and 128-bit Loads, 128b FADD, 128b FMUL, 128b SSE, 128b SSE"
 
0 Kudos
Message 2 of 11
(5,699 Views)
 
Bump for a Monday morning - anybody know anything about the capabilities of this "SSE 128" circuitry?
 
 
 
0 Kudos
Message 3 of 11
(5,670 Views)


@tarheel_hax0r wrote:
 
Bump for a Monday morning - anybody know anything about the capabilities of this "SSE 128" circuitry?
 
 
 


Hmmm, this is highly specialized and I don't see LabVIEW adding support for it just now. Yes, the LabVIEW extended floating point could potentially be adapted to it, but it currently uses the 80-bit extended format implemented in every x86 FPU since the 387, which occupies 10 or 12 bytes in memory instead of 16. The flattened format for extended-precision floats does, however, reserve 16 bytes, as that was necessary for some other CPU architectures.

However, this being an AMD-specific extension, and the extended format being seldom used in LabVIEW, I'm not sure NI would want to go to the effort of detecting this extension AND dynamically changing the floating-point execution core in the very next LabVIEW version.

So don't hold your breath for it.

Rolf Kalbermatter
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
0 Kudos
Message 4 of 11
(5,664 Views)
 
However, this being an AMD-specific extension, and the extended format being seldom used in LabVIEW, I'm not sure NI would want to go to the effort of detecting this extension AND dynamically changing the floating-point execution core in the very next LabVIEW version. So don't hold your breath for it.
 
Traditionally, LabVIEW has maxed out at 80-bit doubles for Intel/AMD hardware, but they've had the 128-bit "quad precision" floats for Sparc/Solaris for quite some time, and the new data type spec sheet clearly stipulates a capacity for both 128-bit real numbers & 256-bit complex numbers:
So the LabVIEW development team should already have all the templates prepared for type-casting/type-overloading/runtime interpretation of these data types - in theory, all they'd need to do would be to re-compile everything against the AMD math library for "SSE 128".
 
As far as I know, until now, the only "general purpose" CPUs capable of performing 128-bit floating point calculations in hardware were the 370/390/z-series mainframes [the Sparc/Solaris quad precision floats are actually just a software hack, and manipulating them is orders of magnitude slower than an equivalent hardware calculation would be].
 
If it's true that AMD Barcelona "SSE 128" really does offer 128-bit floating point calculations in hardware*, and if LabVIEW were to add support for them, then that would be like having the proverbial "super-computer" on your desktop.
 
For "scientists" [really mathematicians] on a budget, this could be a serious breakthrough in "scientific" computing - the likes of which we haven't seen in 15 or 20 years.
 
[*Although I have to admit that I'm dubious - this just seems too much like manna from heaven.]
 
 
 
Message 5 of 11
(5,630 Views)
This is really maddening; to date, the only thing I can find at AMD's website is this single sentence, from a press release yesterday:
AMD Details Native Quad-core Design Features for Breakthrough Performance and Advanced Power Efficiencies
February 12, 2007
 
...High-performance computing (HPC) applications can benefit tremendously from a doubling of Barcelona’s floating-point execution pipeline to 128-bit width, which includes an AMD-only doubling of instruction and data delivery capabilities...
 
Quad-Core AMD Opteron Processors are expected to be available in mid-2007...
 
But man, that sure does sound like 128-bit floating point calculations in hardware...
 
 
 
0 Kudos
Message 6 of 11
(5,612 Views)


@tarheel_hax0r wrote:
This is really maddening; to date, the only thing I can find at AMD's website is this single sentence, from a press release yesterday:

But man, that sure does sound like 128-bit floating point calculations in hardware...


Sounds quite like marketing buzz to me, and being on /. doesn't help that classification one bit. And while LabVIEW's flattened format reserves 16 bytes for an extended floating point, that only concerns the Flatten To String and Unflatten From String functions. It says absolutely nothing about how to build such a type into the actual compiler engine, and that is where things get nasty. It's not just a matter of adding some AMD-provided floating-point library to the LabVIEW code; it means extending the LabVIEW internal compiler to generate the correct code for those FPU registers. That's quite a bit of work, and the fact that it would have to be adapted dynamically based on the available CPU doesn't make it easier.
The actual variables used in the platform-specific code are hardcoded to the type for the CPU architecture, and suddenly they would need to adapt to the correct size at runtime. No easy task at all, and definitely not one I would recommend doing in a hurry. It's better not to have this feature for quite some time (and who knows, maybe there are even bugs in the core itself) than to have a bug-ridden integration that sometimes calculates wrong results.

Rolf Kalbermatter
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
0 Kudos
Message 7 of 11
(5,601 Views)
Hi tarheel,

for me this only sounds like 'now we can transport 128 bits of data in one go' instead of 'we can calculate on one 128-bit quad-precision value'!

The pipeline is maybe 128 bits wide, so it can handle 2 double or 4 single precision values at once. That's all...

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
0 Kudos
Message 8 of 11
(5,595 Views)
I'm with GerdW on this.  It sounds like a simple architecture change.

AMD would surely have to announce yet another set of x86 instructions to allow working with true 128-bit FP numbers - something I'm sure they haven't announced. I mean, the current instruction set doesn't allow manipulation of 128-bit FP numbers...

Shane.
Using LV 6.1 and 8.2.1 on W2k (SP4) and WXP (SP2)
0 Kudos
Message 9 of 11
(5,587 Views)
I posted a query over at the AMD forums, and here's what I was told.
 
I had hoped that e.g. "128b FADD" would be able to do something like the following:
/* "quad" is a hypothetical 128-bit quad precision  */
/* floating point number, similar to "long double"  */
/* in recent versions of C++:                       */
 
quad x, y, z;
 
x = 1.000000000000000000000000000001;
y = 1.000000000000000000000000000001;
 
/* the hope was that "128b FADD" could perform the  */
/* following 128-bit addition in hardware:          */
 
z = x + y;
However, the answer I'm getting is that "128b FADD" is just a set of two 64-bit adders running in parallel, which are capable of adding two vectors of 64-bit doubles more or less simultaneously:
double x[2], y[2], z[2];
 
x[0] = 1.000000000000000000000000000001;
y[0] = 1.000000000000000000000000000001;
 
x[1] = 2.000000000000000000000000000222;
y[1] = 2.000000000000000000000000000222;
 
/* Apparently the coordinates of the two "vectors" x & y       */
/* can be sent to "128b FADD" in parallel, and the following   */
/* two summations can be computed more or less simultaneously: */
 
z[0] = x[0] + y[0];
z[1] = x[1] + y[1];
Thus e.g. "128b FADD", working in concert with "128b FMUL", will be able to [more or less] halve the amount of time it takes to compute a dot product of vectors whose coordinates are 64-bit doubles.
 
So this "128-bit" circuitry is great if you're doing lots of linear algebra with 64-bit doubles, but it doesn't appear to offer anything in the way of greater precision for people who are interested in precision-sensitive calculations.
 
By the way, if you're at all interested in questions of precision sensitivity & round-off error, I'd highly recommend Prof Kahan's page at Cal-Berzerkeley:
 
PDF DOCUMENT: How JAVA's Floating-Point Hurts Everyone Everywhere
 
PDF DOCUMENT: Matlab's Loss is Nobody's Gain
 
 
 
0 Kudos
Message 10 of 11
(5,552 Views)