Macintoshes coming next week ? month ? year ?

blabla · August 6, 2002 7:02PM

[quote]Originally posted by Ed M.:



"To begin with Intel does not have a separate SIMD unit like PowerPC does. If you want to use MMX/SSE/SSE2 on a Pentium, you have to shut down the FPU. That is very expensive to do. As a work around, Intel has added Double precision to its SIMD so that people can do double precision math without having to restart the FPU. You can tell this is what they had in mind because they have a bunch of instructions in SSE2 that only operate on one of the two doubles in the vector. They are in effect using their vector engine as a scalar processing unit to avoid having to switch between the two. Their compilers will even recompile your scalar code to use the vector engine in this way because they avoid the switch penalty."

<hr></blockquote>

Now, that's interesting.

ed m. · August 6, 2002 7:17PM

blabla...

Keep reading... ;-)

--

Ed

programmer · August 6, 2002 7:58PM

The x86 need to "turn off" the FPU applies only to the MMX unit, I believe, and it is because the MMX unit actually uses the FPU's register "stack" as the 8 MMX registers. SSE/SSE2 added their own set of 8 128-bit registers in addition, so they can be used at the same time as the FPU. The problem with the FPU (and the reason why you might want to use SSE2 for doubles instead) is that it is a hideous stack-register based design that was outdated in 1990! The PowerPC FPU programming model is hugely superior to the x86 model, and that is why both AMD and Intel are trying to replace it with something better.

gamblor · August 7, 2002 12:22AM

[quote] I have a funny feeling that if they went SP their renderer would suddenly get legs, and the impact on accuracy would be negligible when all was said and done. I believe NewTek made the same discovery a while back. <hr></blockquote>

No. NewTek discovered that SP (and hence AltiVec) could be useful for modeling and previewing, but not for final rendering.

[quote] And with respect to 3D apps *requiring* double precision...

Most 3D rendering apps do not NEED double precision everywhere. They just need it in a few places, and often (if they really decide to look) they may find that there are more robust single-precision algorithms out there that would be just as good. In the end they should be using those algorithms anyway, because the speed benefits for SIMD are twice as good for single precision than they are for double precision. <hr></blockquote>

No. While it's true that there are places where SP can be used (transformations, hidden surface removal, light intensity calculations, etc.), the lion's share of the calculations in a high quality render MUST be DP. If you tried to substitute SP for DP, you'd find that the precision error would quickly affect the render, which is VERY BAD. It could even get so bad as to eclipse the actual signal; it depends on the complexity of the render. In a scene with millions of polygons, dozens of lights, dozens (at least) of textures, and using radiosity with the ray bouncing a couple dozen times, calculating a single pixel gets extremely complex, requiring hundreds or thousands of operations (perhaps tens of thousands?). The precision error adds up QUICKLY, so it's best to keep it as small as possible from the get-go.
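A rough way to see the accumulation effect described above is to add the same small value many times at each precision. The sketch below is only an illustration, not renderer code: `to_single`, `accumulate`, and the choice of 0.1 and 100,000 steps are made up for the example, and single precision is emulated by rounding every intermediate result to 32 bits with `struct`.

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    When `single` is true, the step and every intermediate total are
    rounded to single precision, mimicking a float32 accumulator.
    """
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


COUNT = 100_000          # hypothetical number of operations
EXPECTED = 10_000.0      # the mathematically exact sum of 0.1 * COUNT

sp_error = abs(accumulate(0.1, COUNT, single=True) - EXPECTED)
dp_error = abs(accumulate(0.1, COUNT, single=False) - EXPECTED)

print("single-precision error:", sp_error)
print("double-precision error:", dp_error)
```

A naive running sum is close to a worst case. A real renderer mixes many kinds of operations, so how fast the error grows there depends on the algorithm, which is the point both sides of this argument are circling.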

[quote] The SSE/SSE2 added their own set of 8 128-bit registers in addition, so they can be used at the same time as the FPU. <hr></blockquote>

Hmmmm... I don't think that's quite right. IIRC, SSE reduced the overhead in switching from SSE to x87 compared to MMX (from 50 clock ticks to 1, or something like that), but it still uses the same register set, as does SSE2. With SSE2, it shouldn't make any difference; Intel designed SSE2 to completely replace x87, so there shouldn't be a reason to mix x87 and SSE2 code.

It's been a while since I've looked at this stuff, so I could just be remembering things wrong. It's happened before, who knows.

programmer · August 7, 2002 12:43AM

[quote]Originally posted by Gamblor:

Hmmmm... I don't think that's quite right. IIRC, SSE reduced the overhead in switching from SSE to x87 compared to MMX (from 50 clock ticks to 1, or something like that), but it still uses the same register set, as does SSE2. With SSE2, it shouldn't make any difference; Intel designed SSE2 to completely replace x87, so there shouldn't be a reason to mix x87 and SSE2 code.

It's been a while since I've looked at this stuff, so I could just be remembering things wrong. It's happened before, who knows. <hr></blockquote>

http://x86.ddj.com/articles/sse_pt1/simd1.htm

to quote:

"A major difference between MMX and SSE is that no new registers were defined for MMX, while eight new registers have been defined for SSE. Each of the registers for SSE is 128 bits long and can hold four single-precision floating-point numbers (each being 32 bits long). The arrangement of the floating-point numbers in the new data type handled by SSE is illustrated in Figure 1. "

The original Pentium MMX has a fairly hefty switch cost to/from MMX mode -- about 50-60 cycles, IIRC. Starting with the Pentium II, however, that cost was reduced to almost nothing. Nonetheless, you can't really intermingle MMX and FPU instructions in the same way that the PowerPC can with FPU and VMX instructions. Even more important, the AltiVec unit handles both integer and float data types, whereas MMX does integer and SSE does floating point. I think SSE2 addresses this a little, but it still doesn't really compare to the AltiVec unit.
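To make the register layout from the quoted article concrete, here is a small model of a 128-bit register as 16 raw bytes. It is a sketch of the data layout only and does not execute any SIMD instructions; `lanes_per_register` and `add_lanes` are hypothetical helpers invented for this example. It shows four single-precision values, or two double-precision values, filling the same 128 bits.

```python
import struct

REGISTER_BITS = 128  # width of one SSE register, per the quoted article


def lanes_per_register(type_code):
    """Number of elements of a struct type code ('f' or 'd') that fit in 128 bits."""
    return REGISTER_BITS // (8 * struct.calcsize("<" + type_code))


def add_lanes(fmt, a, b):
    """Element-wise add of two packed 'registers' that share the layout `fmt`."""
    sums = [x + y for x, y in zip(struct.unpack(fmt, a), struct.unpack(fmt, b))]
    return struct.pack(fmt, *sums)


# Four 32-bit floats fill one 128-bit register.
single_reg = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
# Two 64-bit doubles fill the same 128 bits.
double_reg = struct.pack("<2d", 1.0, 2.0)

other = struct.pack("<4f", 10.0, 20.0, 30.0, 40.0)
summed = struct.unpack("<4f", add_lanes("<4f", single_reg, other))

print(lanes_per_register("f"), "single-precision lanes")
print(lanes_per_register("d"), "double-precision lanes")
print(summed)
```

The halved lane count for doubles is where the "twice as good for single precision" remark earlier in the thread comes from: one instruction covers four floats but only two doubles.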

gamblor · August 7, 2002 12:19PM

I was kinda right

From http://www.intel.com/support/processors/pentium4/sb/1059772772898408-prd483.htm:

"Streaming SIMD Extensions 2 (SSE2) extends the MMX(TM) technology and SSE technology with the addition of 144 new instructions that deliver performance increases across a broad range of applications. The SIMD integer instructions introduced with MMX technology have been extended from 64 to 128 bits, doubling the effective execution rate of SIMD integer type operations.

New double-precision floating point SIMD instructions allow for two floating-point operations to be simultaneously executed in the SIMD format, providing support for double-precision operations that help accelerate content creation, financial, engineering, and scientific applications. "

So, basically, SSE2 extends both MMX & SSE. Since it's supposed to replace x87 code, Intel has resolved the conflict of sharing MMX with x87 registers by enhancing SSE to the point where it can effectively replace the x87 unit.

Or something like that.

programmer · August 7, 2002 12:27PM

[quote]Originally posted by Gamblor:

I was kinda right

From http://www.intel.com/support/processors/pentium4/sb/1059772772898408-prd483.htm:

"Streaming SIMD Extensions 2 (SSE2) extends the MMX(TM) technology and SSE technology with the addition of 144 new instructions that deliver performance increases across a broad range of applications. The SIMD integer instructions introduced with MMX technology have been extended from 64 to 128 bits, doubling the effective execution rate of SIMD integer type operations.

New double-precision floating point SIMD instructions allow for two floating-point operations to be simultaneously executed in the SIMD format, providing support for double-precision operations that help accelerate content creation, financial, engineering, and scientific applications. "

So, basically, SSE2 extends both MMX & SSE. Since it's supposed to replace x87 code, Intel has resolved the conflict of sharing MMX with x87 registers by enhancing SSE to the point where it can effectively replace the x87 unit.

Or something like that.

<hr></blockquote>

Ah yes -- I didn't realize that SSE2 had expanded the MMX registers. So now they have 8 integer registers and 8 FPU registers? I suppose that's an improvement.

airsluf · August 7, 2002 1:22PM

rickag · August 7, 2002 2:52PM

[quote]Originally posted by AirSluf:

Hmmm, now why do we ABSOLUTELY NEED double precision in renderers? <hr></blockquote>

Not being a mathematician nor a computer programmer, I can only think of one good reason.

"Just because"

[ 08-07-2002: Message edited by: rickag ]

gamblor · August 7, 2002 3:47PM

[quote] But ALAS! Most programmers who have heard of quaternions are scared sh1tless of them because you have to understand the math first, and they don't want to make that time or effort. <hr></blockquote>

How do you know quaternions aren't used extensively in professional 3D renderers already? I don't imagine it would be something they'd advertise on the box.

(Certainly an interesting subject, though. It's been about a decade since I wrote any rendering code, and I don't recall learning about quaternions. In a quick Google search for them I stumbled across this web page: http://www.javaworld.com/javaworld/jw-08-1998/jw-08-step-p4.html.

Notice what data type is used for all of the elements.

I think I'm going to do some more digging on this. Thanks for the inadvertent tip!)

blabla · August 7, 2002 9:03PM

Amazing thread

Anyway, I grabbed this link from the yahoo AAPL board.

http://developer.apple.com/hardware/ve/pdf/oct3a.pdf

It discusses how to implement a 256-bit FP (!!) library using AltiVec.

airsluf · August 8, 2002 1:46AM

gamblor · August 8, 2002 2:07AM

Interestingly enough, my cursory search at Google on quaternions turned up the fact that 3DStudio MAX's file format supports them in some capacity. There's got to be something to that... If the code monkeys at Autodesk have caught on to them, then they must be in wide use all over the industry by now.

Thanks for the link! I'm going to be getting back into a bit of 3D programming in the next week or so (yay!), but I probably won't have a need for quaternions, which is a bummer. (It'll just be a straightforward format converter, in this case VRML->Lightwave objects, along with some triangle decimation code, or whatever the latest equivalent is.)

rick1138 · August 8, 2002 2:13AM

I'm really surprised that quaternions don't have wider use. The math is actually very easy, which makes them so powerful. The alternative, Euler angles, is really a pain to deal with, and they also have singularities (tears) in the state space. Differential forms and Clifford algebras are other gadgets that are very powerful, simple to use, and much better than what is already in use, but not used much because people just aren't familiar with them.
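The Euler-angle singularity mentioned above (often called gimbal lock) can be shown with plain rotation matrices. This is a small sketch assuming a yaw-pitch-roll (Z-Y-X) convention; all helper names are made up for the example. At a pitch of 90 degrees, yaw and roll end up rotating about the same axis, so two different angle triples collapse to one orientation.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyx(yaw, pitch, roll):
    """Rotation matrix for yaw about Z, then pitch about Y, then roll about X."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


# At pitch = 90 degrees only (yaw - roll) matters: both triples give the same matrix.
locked_diff = max_abs_diff(euler_zyx(0.3, math.pi / 2, 0.1),
                           euler_zyx(0.5, math.pi / 2, 0.3))

# Away from the singularity the same two yaw/roll pairs give different rotations.
free_diff = max_abs_diff(euler_zyx(0.3, 0.2, 0.1),
                         euler_zyx(0.5, 0.2, 0.3))

print("difference at pitch 90 degrees:", locked_diff)
print("difference at pitch 0.2 rad:", free_diff)
```

A unit quaternion describes the orientation directly rather than as three chained axis rotations, so it has no special pitch value where a degree of freedom disappears.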

rick1138 · August 8, 2002 2:23AM

i^2 = j^2 = k^2 = ijk = -1
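The identity above can be checked mechanically with the Hamilton product. A minimal sketch, assuming quaternions stored as `(w, x, y, z)` tuples; `qmul` and the constant names are hypothetical, chosen only for this example.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)

# i^2 = j^2 = k^2 = ijk = -1
print(qmul(I, I) == MINUS_ONE)
print(qmul(J, J) == MINUS_ONE)
print(qmul(K, K) == MINUS_ONE)
print(qmul(qmul(I, J), K) == MINUS_ONE)

# The product is not commutative: ij = k, but ji = -k.
print(qmul(I, J), qmul(J, I))
```

The whole algebra is four multiply-add rows, which fits the remark earlier in the thread that the math is easier than its reputation suggests.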
