Vmx 2


Comments

  • Reply 41 of 114
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Amorph

    Actually, IEEE specifies an 80 bit floating point type, which was implemented in the 68040 and lost in the transition to PowerPC.







    Yes; I always thought that this was a step backwards. Apple did at one time have a reference to a data type they called doubledouble; I'm not sure if it is still defined, but obviously somebody thought there was a need for such a data type and the code to support it. I'm not currently an Apple developer, but if doubledouble is still supported, it is only one of many extended-precision floats out in the wild.
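    For readers who haven't met the idea, a double-double stores a value as the unevaluated sum of two ordinary doubles, which gives roughly 107 bits of significand. The C sketch below shows only the general technique (Knuth's TwoSum plus a renormalization); the names are made up for illustration, and this is not Apple's actual doubledouble implementation.

        #include <stdio.h>

        /* A "double-double" keeps a value as the unevaluated sum hi + lo. */
        typedef struct { double hi, lo; } dd;

        /* Knuth's TwoSum: s = rounded a + b, e = the rounding error, exactly. */
        static void two_sum(double a, double b, double *s, double *e) {
            *s = a + b;
            double bb = *s - a;
            *e = (a - (*s - bb)) + (b - bb);
        }

        /* Add an ordinary double to a double-double value. */
        static dd dd_add(dd x, double y) {
            double s, e;
            two_sum(x.hi, y, &s, &e);
            e += x.lo;
            dd r;
            two_sum(s, e, &r.hi, &r.lo);   /* renormalize so |lo| stays small */
            return r;
        }

        int main(void) {
            dd acc = { 1e16, 0.0 };
            acc = dd_add(acc, 1.0);        /* 1e16 + 1 does not fit in one double */
            printf("hi = %.17g  lo = %.17g\n", acc.hi, acc.lo);
            return 0;
        }

    The low word ends up holding the 1.0 that a single double would have rounded away, which is the whole point of the wider type.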

    Quote:



    For the national debt, you'd want one of those big IBMs that can do fixed-point math in hardware.


    Real soon now we will have native 64-bit integers. This is a tremendous expansion in capability, application-wise. I sometimes believe that people are underestimating just how important 64 bits will be to the future of computing on the desktop, and to Apple specifically.

    Quote:



    None of this is really relevant to a revision of VMX - G4s have always been able to do 64 bit floating point just fine (if a bit slowly). The issue is whether it's worth the extra bandwidth and silicon to handle 64 bit values in vectors, and currently the answer appears to be no. SSE2 can do 2x64 bit FP, but that's only to make up for the fact that the x86's built-in floating point unit is hilariously bad. That's not the case on any PowerPC (that has an FP unit in the first place).



    I have to disagree just a bit here. As long as code and algorithms exist that make use of 64-bit vectors, it MAY be worthwhile to support such vectors in hardware. It may not be a good idea to support them with the current VMX design, which was architected to address specific issues on a desktop machine. We certainly would not want support for 64-bit operations to disturb or impact the performance of current VMX code.



    Just as SSE2 improved on the standard Intel FPU, an enhanced VMX unit could improve on the PPC FPU. It could be argued, though, that the thing to do would be to rev the PPC FPU to handle vector-type operations on 64-bit data types. Neither of these solutions or capability expansions would deal with the fact that past 64-bit vector machines worked on long vectors, as opposed to the short-vector design seen in VMX/AltiVec.



    To rev VMX to truly support 64-bit vectors well, I suspect one of two things would be required. One would be very wide registers, 256 to 512 bits. The other would be very long, 64-bit-wide FIFOs or buffers. I imagine the instantaneous power dissipated in a 512-bit-wide VMX unit would be very high, so 64-bit vector operations against a FIFO or buffer make sense, assuming of course that a FIFO or buffer could deliver data in a single cycle.



    Thanks

    Dave
  • Reply 42 of 114
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Nevyn

    ???

    If you're running out of bits in double precision floating point math doing anything related to money, fire your programmer.



    Yes, the US national debt exceeds $4B; it is still NOWHERE near something that needs floating point sizes beyond double precision. Yes, I grok compounded interest and other methods of getting "partial cents" etc - but _in_the_end_ it all has to be rounded. (Because we don't cut coins into pieces anymore). The other aspect is that something like the debt isn't really a 'single item'. It is a collection of a very large number of smaller items... each of which has its own individual interest rate, maturity etc. -> Each of those is _individually_ calculable with 100% accuracy on hand calculators (once you acknowledge the inherent discreteness + rounding rules).







    Hmm, I thought the debt was $4T. Well, yes, the budget (not just the debt) is a collection of smaller items, but that does not mean one goes about ignoring each and every dollar allocated to them. They cannot be rounded out of the picture until the budget is totaled up; the resulting error would be huge. So yeah, when it comes time to talk about the budget we may talk about trillions here or there, but in the end it is the summation of a bunch of little things that are all significant.
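    As a quick illustration of the fixed-point angle raised earlier: kept as integer cents in a 64-bit integer, even a multi-trillion-dollar budget sums exactly, since INT64_MAX is roughly $92 quadrillion worth of cents. The line items below are invented purely for the sketch.

        #include <inttypes.h>
        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            /* Hypothetical budget line items, in cents; every addition is exact. */
            int64_t items_in_cents[] = { 123456789012345LL, 987654321098765LL, 55555555555LL };
            int64_t total = 0;
            for (size_t i = 0; i < sizeof items_in_cents / sizeof items_in_cents[0]; i++)
                total += items_in_cents[i];
            printf("total = $%" PRId64 ".%02" PRId64 "\n", total / 100, total % 100);
            return 0;
        }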

    Quote:



    On another note, the text in front of me tabulates only e & pi beyond 16 digits. Well, and pi/2, and a slew of other silliness like that.



    You don't really mean "If you've got the more precise info, you're insane to chuck it, all calculations must proceed with all available information" do you? You really mean "I've assessed 1) the number of significant bits I need in the end, and 2) I've assessed how my algorithm will spread my error bars/reduce the number of significant bits, and 3) I've measured as accurately as I need to, with a fair chunk of extra accuracy"


    First off, it is never a good idea to throw away measured data if you have a way to represent it. I really hope that you aren't suggesting this.



    The algorithm du jour that you are using at the moment may not process all of that information, but that is nothing new in the history of mankind. We often improve our algorithms as we develop a better understanding of the subject.



    Often it is not a question of measuring as accurately as you need to, but of measuring as accurately as you are capable of. By a fair chunk of extra accuracy I think you are talking about resolution. A measurement's resolution has absolutely nothing to do with its accuracy.

    Quote:



    Because if you really mean "never use approximations when better data is available", please call me when you've accurately and precisely calculated the circumference of a circle exactly 1 meter in diameter. In units of _meters_ please, not multiples of pi. Oh, and here's the first million digits of pi or so.



    I mean exactly what I said. To clarify: never use 3.14 for a value of PI when the full resolution of the data type is available. Since you have kindly pointed to one of the PI sites that inhabit the net, you should be aware that this constant is covered for just about any data type we can dream up and process at the moment.



    I suppose that when a circle of exactly 1 meter in diameter is ever measured, we would then be able to know its circumference; I suspect that this is a wee bit in the future. Given that you have a circle of approximately 1 meter in diameter, you certainly would use all of the resolution of PI that you have available to calculate its circumference. I know that some people will respond in disgust at that last statement, but a little bit of thought should clear things up.
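    To put a number on the pi point: on a 1-meter circle, the gap between 3.14 and a full-precision constant is about 1.6 mm, which dwarfs the ~1e-16 relative error of the double itself. A minimal sketch (M_PI is the usual POSIX constant; a strict ISO C build may need its own definition):

        #include <math.h>    /* M_PI: pi to full double precision (POSIX, not strict ISO C) */
        #include <stdio.h>

        int main(void) {
            double d     = 1.0;           /* nominal 1 m diameter */
            double rough = 3.14 * d;      /* truncated constant   */
            double full  = M_PI * d;      /* full resolution of the data type */

            printf("circumference error from using 3.14: %.6f m\n", full - rough);
            /* prints roughly 0.001593, i.e. about 1.6 mm on a 1 m circle */
            return 0;
        }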



    Thanks

    Dave
  • Reply 43 of 114
    amorph Posts: 7,112 member
    Quote:

    Originally posted by wizard69

    Yes; I always thought that this was a step backwards. Apple did at one time have a reference to a data type they called doubledouble; I'm not sure if it is still defined, but obviously somebody thought there was a need for such a data type and the code to support it.



    In this case it fell to the RISC philosophy, which made no provision for large oddball sizes like 80 bits. The great battle cry was that all instructions and data were in identically sized chunks, to remove the complexity of instruction fetching and decoding inherent in CISC architectures.



    Quote:

    Real soon now we will have native 64-bit integers. This is a tremendous expansion in capability, application-wise. I sometimes believe that people are underestimating just how important 64 bits will be to the future of computing on the desktop, and to Apple specifically.



    Given that Apple is currently gunning for the UNIX workstation and enterprise server spaces, I don't think anyone is pooh-poohing the relevance of 64 bit there. The question is, when will it become a crucial feature of, say, the iBook? I'm not going to bet against the ingenuity of developers, but right now the obvious uses of 64 bit CPUs in consumer applications are thin on the ground.



    Quote:

    I have to disagree just a bit here. As long as code and algorithms exist that make use of 64-bit vectors, it MAY be worthwhile to support such vectors in hardware. It may not be a good idea to support them with the current VMX design, which was architected to address specific issues on a desktop machine.



    You aren't disagreeing with me, except that I think VMX has legs. Currently, there are not enough uses for 64 bit values in vector math to justify an implementation in hardware. Maybe the demands of high-end 3D apps will change that down the road. But it won't be a simple change: 2x64 bit "vectors" are hardly worth it, because the 970's dual FPUs can do that just as well, and without the need to pack and unpack the vectors. 4x64 bit vectors mean 256-bit registers, a whole slew of transistors, new instructions, and even more phenomenal bandwidth demands (currently, as fast as it is, the 970's bus can't even come close to keeping VMX fed).



    Quote:

    Just as SSE2 improved on the standard Intel FPU, an enhanced VMX unit could improve on the PPC FPU.



    No, there's no analogy there. The x86 FPU is a miserably designed piece of crap that they can't improve without breaking legacy code because of the nature of its design. So the SIMD engine gets to function as a replacement. The PowerPC FPUs have always been better, and more importantly, they've always been designed in a way that allows the implementation to be improved without breaking everything. So if you want better FPU performance in, say, the 971, you just beef up the FPUs or add more units. You don't touch the SIMD engine unless you want to improve that.



    Quote:

    It could be argued, though, that the thing to do would be to rev the PPC FPU to handle vector-type operations on 64-bit data types. Neither of these solutions or capability expansions would deal with the fact that past 64-bit vector machines worked on long vectors, as opposed to the short-vector design seen in VMX/AltiVec.



    I wouldn't argue that. FPUs should do FP, and SIMD engines should do SIMD.



    Quote:

    To rev VMX to truly support 64-bit vectors well, I suspect one of two things would be required. One would be very wide registers, 256 to 512 bits. The other would be very long, 64-bit-wide FIFOs or buffers.



    The stack based (FIFO) design is what crippled the x86 FPU permanently.
  • Reply 44 of 114
    nevyn Posts: 360 member
    Quote:

    Originally posted by wizard69

    Given that you have a circle of approximately 1 meter in diameter, you certainly would use all of the resolution of PI that you have available to calculate its circumference.




    Of course it is daft to use just 3.14.

    But it is equally daft to use "all of the resolution that you have available" unless you _know_ that the end result will use it.



    A guy with a hand saw and a string trying to make a 1 meter circle in a board doesn't need to go beyond a couple of digits -> he isn't going to come anywhere near the size he calculated he needed anyway.



    A guy with better normal tools might need another couple of digits.



    A guy using several sets of laser interferometry measurements with statistical precision calculations to ensure tool position could use a really solid set of calculations.



    But if the millionth digit of pi appears anywhere in any of these three calculations, one hell of a lot of wasted work occurred.



    I am _NOT_ saying anyone should wantonly discard useful starting information. Just that "useful" is dependent on context, and the contexts where the millionth digit of pi is useful are... rare. A reliance on "I'll just use higher precision in my calculation" is often a sloppy avoidance of analyzing the error propagation through your algorithm. It can also lead to overconfidence, particularly if there's an overlooked discontinuity.
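    A small C example of the error-propagation point: when a formula subtracts nearly equal quantities, the significant digits vanish, and reaching for a wider type is a poor substitute for restructuring the calculation. The function and values here are a textbook illustration, not anything from this thread.

        #include <math.h>
        #include <stdio.h>

        int main(void) {
            double x = 1e-8;

            /* Direct form: cos(x) rounds to 1.0 (or within an ulp of it), so the
               subtraction cancels essentially every significant digit. */
            double naive = 1.0 - cos(x);

            /* Algebraically identical form with no cancellation. */
            double stable = 2.0 * sin(x / 2.0) * sin(x / 2.0);

            printf("naive  = %.17g\n", naive);    /* 0, or a stray ulp of noise */
            printf("stable = %.17g\n", stable);   /* about 5e-17                */
            return 0;
        }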
  • Reply 45 of 114
    ed m. Posts: 222 member
    You wouldn't want double-precision in the vector unit. Period.



    21st post down from the top:



    http://forums.appleinsider.com/showt...X%2FSSE%2FSSE2



    --

    Ed
  • Reply 46 of 114
    Quote:

    Originally posted by Ed M.

    21st post down from the top:



    21? Sorry, can't count that high. Not enough bits.



    Oh, wait... missed one.
  • Reply 47 of 114
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Nevyn

    Of course it is daft to use just 3.14.

    But it is equally daft to use "all of the resolution that you have available" unless you _know_ that the end result will use it.







    Maybe I did not communicate this well. My point is that if your development system has PI defined as a full double-precision value, it makes no sense to round it off or use 3.14 before use. The same goes for a single-precision value or a doubledouble.

    Quote:



    A guy with a hand saw and a string trying to make a 1 meter circle in a board doesn't need to go beyond a couple of digits -> he isn't going to come anywhere near the size he calculated he needed anyway.



    A guy with better normal tools might need another couple of digits.



    A guy using several sets of laser interferometry measurements with statistical precision calculations to ensure tool position could use a really solid set of calculations.



    But if the millionth digit of pi appears anywhere in any of these three calculations, one hell of a lot of wasted work occurred.



    Yes, this was not communicated well: resolving PI beyond the resolution of your data type is wasted energy. Likewise, it is not too smart to use a single-precision value of PI when the rest of your calculations are double precision.



    It is funny, some of the examples you gave, as I was thinking along similar lines: a sheet-metal craftsman trying to make a tube 1 meter in diameter would have a very hard time getting good results using some of the rounding suggestions mentioned in this thread.

    Quote:



    I am _NOT_ saying anyone should wantonly discard useful starting information. Just that "useful" is dependent on context, and the contexts where the millionth digit of pi is useful are... rare. A reliance on "I'll just use higher precision in my calculation" is often a sloppy avoidance of analyzing the error propagation through your algorithm. It can also lead to overconfidence, particularly if there's an overlooked discontinuity.



    I could not agree more with the above statement. Yet at the same time I've seen many an instance where people have thrown away resolution and then wondered why they are having so much trouble. It is amazing that people accept that when you divide 1/2 by 2 you get 1/4, yet reject that 0.5 divided by 2 = 0.25 is a valid result. It certainly is in the real world. A simplification, of course, but I've had educated people try to convince me that this is the only point of view on the subject.



    I'm willing to state that, in a similar manner, these off-the-cuff old wives' tales about rounding and resolution often end up producing the same overconfidence. It takes a bit of thought to determine where the best place is to drop resolution or introduce rounding.
  • Reply 48 of 114
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Amorph

    Given that Apple is currently gunning for the UNIX workstation and enterprise server spaces, I don't think anyone is pooh-poohing the relevance of 64 bit there. The question is, when will it become a crucial feature of, say, the iBook? I'm not going to bet against the ingenuity of developers, but right now the obvious uses of 64 bit CPUs in consumer applications are thin on the ground.







    I'm also left with the impression that the workstation is the direction Apple is headed in. While it may be a while before the iBook moves to 64 bit, I think Apple will find that it has no choice in the matter. It will come down to an issue of addressable memory; not strictly a 64-bit issue, but 64 bit is probably the easiest way to solve it.

    Quote:



    You aren't disagreeing with me, except that I think VMX has legs. Currently, there are not enough uses for 64 bit values in vector math to justify an implementation in hardware. Maybe the demands of high-end 3D apps will change that down the road. But it won't be a simple change: 2x64 bit "vectors" are hardly worth it, because the 970's dual FPUs can do that just as well, and without the need to pack and unpack the vectors. 4x64 bit vectors mean 256-bit registers, a whole slew of transistors, new instructions, and even more phenomenal bandwidth demands (currently, as fast as it is, the 970's bus can't even come close to keeping VMX fed).



    Yes, we are really close here. The point I'm trying to make is that there are applications for floating point (single and double) vector processing that VMX is not optimised for. I'm thinking about the type of applications that old Crays and other supercomputers were optimised for. The current PPC register-based FPU could be improved a great deal for certain types of applications. But it may make more sense to leave the complexity out of the FPU and add it to a specialized unit such as the rumored VMX2.





    Quote:





    No, there's no analogy there. The x86 FPU is a miserably designed piece of crap that they can't improve without breaking legacy code because of the nature of its design. So the SIMD engine gets to function as a replacement. The PowerPC FPUs have always been better, and more importantly, they've always been designed in a way that allows the implementation to be improved without breaking everything. So if you want better FPU performance in, say, the 971, you just beef up the FPUs or add more units. You don't touch the SIMD engine unless you want to improve that.



    Again, I can't totally disagree here, other than to say that a capability for vector math with 64-bit floats is a positive addition to the CPU. Whether that happens in the FPU, the VMX unit, or a new unit doesn't make much difference. Logically, though, the VMX unit would take some of these capabilities cleanly; that is, we are talking about new data types for existing instructions.

    Quote:



    I wouldn't argue that. FPUs should do FP, and SIMD engines should do SIMD.



    Yep, all I'm really talking about is extending the SIMD unit to add doubles to its single-precision FP capability. Part of this involves a much wider register set, but that is an advantage all around.



    Thanks

    Dave





    Quote:

    The stack based (FIFO) design is what crippled the x86 FPU permanently.



  • Reply 49 of 114
    amorph Posts: 7,112 member
    Quote:

    Originally posted by wizard69

    Yet at the same time I've seen many an instance where people have thrown away resolution and then wondered why they are having so much trouble. It is amazing that people accept that when you divide 1/2 by 2 you get 1/4, yet reject that 0.5 divided by 2 = 0.25 is a valid result. It certainly is in the real world. A simplification, of course, but I've had educated people try to convince me that this is the only point of view on the subject.



    It is, if you think about what the problem they're identifying is.



    Fractional representations are perfectly precise because they are abstract: 1/2 is exactly 1 divided by 2, with no possibility of noise or inaccuracy.



    0.5, on the other hand, by scientific convention means 0.5<and any additional precision was lost to noise, coarse measuring tools etc.>. In other words, 0.5 does not mean 1/2. It could be 0.503, or even 0.55, or 0.49. In that case, the best you can do is say that 0.5/2 = 0.2 - the approximation is simply an admission that the data is noisy, and the equal sign is analogous at best to its mathematical counterpart. Along these lines, 0.50 / 2.00 = 0.25 - but that still isn't the same as 1/2 divided by 2 = 1/4. You've just pushed the noise back one significant digit.



    One of the things that floating point does, actually, is present the illusion of precision by ignoring the idea of significant digits, and this along with the lack of a built in "equality within a given delta" operator actually introduces inaccuracy to measurements, and instills false confidence. (FP also introduces inaccuracies via approximation, but at 64 bits that is only a problem at unusual extremes, and problem children like 1/3 and 1/10.) If all you know is that you have a measurement of 0.25-and-change, then it's not accurate to say that you have a measurement of 0.25395264823. You might... As a result, responsible FP code tracks the delta that represents the real, guaranteed accuracy, and uses it when appropriate to clip the machine's overly optimistic "precision" and compensate for its exacting comparison operators.



    Quote:

    Yep, all I'm really talking about is extending the SIMD unit to add doubles to its single-precision FP capability. Part of this involves a much wider register set, but that is an advantage all around.



    Yes, but you still haven't made a case for it. I'm sure that, somewhere, there's a problem involving 1,024 bit vectors. Maybe someone's trying to model the impact of the solar wind on the Milky Way down to the cubic femtometer? Before you build it into hardware, you have to ask what the benefit is vs. the cost. The cost is not inconsiderable: Much wider registers are a benefit all around until your massive vector unit spends most of its time twiddling its thumbs while the bus and main RAM struggle under the load, and the caches thrash constantly.



    In short, you can't really argue for the adoption of this technology until you sit down and figure out how hard it is to implement, and what the implications will be for the rest of the CPU and the rest of the board. Right now, today, VMX will cheerfully eat four times the total bandwidth of the 970's bus, starving out the rest of the CPU. Double the register width, and the bandwidth requirements double. If you want something to replace a supercomputer, you need to give it all the bandwidth it could ever want, or it'll sit there twiddling its thumbs at incredibly high speed. And, if you're Apple, you have to figure out when your ersatz Cray will appear in a PowerBook or an iMac - an eventuality which every extra transistor delays.
  • Reply 50 of 114
    programmer Posts: 3,467 member
    Quote:

    Originally posted by Amorph

    It is, if you think about what the problem they're identifying is.



    Fractional representations are perfectly precise because they are abstract: 1/2 is exactly 1 divided by 2, with no possibility of noise or inaccuracy.



    0.5, on the other hand, by scientific convention means 0.5<and any additional precision was lost to noise, coarse measuring tools etc.>.




    I don't think I agree with this... the decimal notation in and of itself doesn't imply the loss of any precision. If I write 0.5, I mean 0.5. If I write 0.5 +/- 0.05 then I mean there was a loss of precision. The problem is that people don't pay attention to their levels of precision.



    An additional problem is that 0.5 is a decimal representation, and IEEE floating point is a binary representation. Some floating point numbers represent values which cannot be written precisely in decimal, and vice versa. This leads to inaccuracies that people don't usually pay attention to.
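    That conversion gap is easy to see from C: 0.5 is a power of two and is stored exactly, while 0.1 has no finite binary expansion.

        #include <stdio.h>

        int main(void) {
            printf("0.5 is stored as %.20f\n", 0.5);   /* 0.50000000000000000000    */
            printf("0.1 is stored as %.20f\n", 0.1);   /* 0.10000000000000000555... */

            /* The usual surprise that follows from the conversion error: */
            printf("0.1 + 0.2 == 0.3 ? %s\n", (0.1 + 0.2 == 0.3) ? "yes" : "no");   /* no */
            return 0;
        }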



    The real problem in both cases is that people don't understand computational math, or choose to ignore its finer points.
  • Reply 51 of 114
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Programmer

    I don't think I agree with this... the decimal notation in and of itself doesn't imply the loss of any precision. If I write 0.5, I mean 0.5. If I write 0.5 +/- 0.05 then I mean there was a loss of precision. The problem is that people don't pay attention to their levels of precision.







    Yes, but by 0.5 he means 0.5 as displayed by a calculator, and calculators are limited in digits. Otherwise you are right: 0.5 has the same absolute precision as 1/2.

    However, there is no way to display 1/3 without loss of precision unless you use fractions.

    Your 0.5 +/- 0.05 is an interesting notion. Unfortunately, such information is not often offered in numerical computation. It would be nice if software gave this sort of information. It's easy to guess for a simple division, but very difficult to guess after a whole complex calculation. Do you know of any software that is able to give the loss of precision of a complex calculation (after millions of operations, for example)?
  • Reply 52 of 114
    programmer Posts: 3,467 member
    Quote:

    Originally posted by Powerdoc

    Yes, but by 0.5 he means 0.5 as displayed by a calculator, and calculators are limited in digits. Otherwise you are right: 0.5 has the same absolute precision as 1/2.



    But if you punch 1 / 2 = into a calculator you will get 0.5 and it will be an exact answer.
  • Reply 53 of 114
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Programmer

    But if you punch 1 / 2 = into a calculator you will get 0.5 and it will be an exact answer.



    Yes, but if you do 0.5 (log, then inverse log) n times, you won't have an exact answer; it will give you something like 0.500000001. And to the computer there is no difference between exact answers and approximate ones.



    Some mathematicians have written tricky programs that multiply the imprecision in such an exponential way that it leads to great errors. In this way, they show the limits of mathematical simulation.

    The important thing is to be able to detect when such a thing happens. For a basic calculation like 1/2 it's simple; for complex mathematical calculations or simulations it's a much more difficult task. You shouldn't blindly follow what the computer says.
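    Powerdoc's calculator experiment is easy to reproduce in code. Whether log followed by inverse log drifts depends on how good the particular math library is, so this sketch uses repeated square roots followed by the same number of squarings, which shows the same effect reliably: each rounded intermediate result quietly discards bits that the inverse step would have needed.

        #include <math.h>
        #include <stdio.h>

        int main(void) {
            double x = 0.5;
            int n = 60;

            for (int i = 0; i < n; i++) x = sqrt(x);   /* each result rounded to 53 bits   */
            for (int i = 0; i < n; i++) x = x * x;     /* exact inverse in real arithmetic */

            /* Prints 1 rather than 0.5: after enough square roots the value rounds
               to exactly 1.0, and no amount of squaring can bring the 0.5 back. */
            printf("%.17g\n", x);
            return 0;
        }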
  • Reply 54 of 114
    programmer Posts: 3,467 member
    Quote:

    Originally posted by Powerdoc

    Yes, but if you do 0.5 (log, then inverse log) n times, you won't have an exact answer; it will give you something like 0.500000001. And to the computer there is no difference between exact answers and approximate ones.



    Some mathematicians have written tricky programs that multiply the imprecision in such an exponential way that it leads to great errors. In this way, they show the limits of mathematical simulation.

    The important thing is to be able to detect when such a thing happens. For a basic calculation like 1/2 it's simple; for complex mathematical calculations or simulations it's a much more difficult task. You shouldn't blindly follow what the computer says.




    Yes, but the statement I'm objecting to was that 0.5 is somehow less accurate than 1/2. This is not correct. A given floating point number represents a number precisely; it just may not be the number you wanted, and it may not be possible to convert that number to a decimal representation in an exact way. The imprecision comes from the calculations (including conversion), and occurs because of using a fixed-format representation (e.g. 32-bit or 64-bit floating point).



    The IEEE standard contains the "inexact" flag and exception. Any calculation that involves rounding, overflow, or underflow will set this flag or throw this exception. Unfortunately it doesn't track the amount of error for you, but it could be used to detect when it happens. Unfortunately most calculations are going to have something inexact in them so that flag isn't going to help you a whole lot.
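    For the curious, the flag Programmer mentions is reachable from C99 through <fenv.h>; a minimal sketch:

        #include <fenv.h>
        #include <stdio.h>

        #pragma STDC FENV_ACCESS ON    /* we inspect floating-point status flags */

        int main(void) {
            volatile double a = 1.0;

            feclearexcept(FE_INEXACT);
            volatile double third = a / 3.0;   /* 1/3 cannot be stored exactly */
            printf("1/3 = %.17g, rounded: %s\n", third,
                   fetestexcept(FE_INEXACT) ? "yes" : "no");    /* yes */

            feclearexcept(FE_INEXACT);
            volatile double half = a / 2.0;    /* 0.5 is exact in binary */
            printf("1/2 = %.17g, rounded: %s\n", half,
                   fetestexcept(FE_INEXACT) ? "yes" : "no");    /* no  */

            return 0;
        }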
  • Reply 55 of 114
    amorph Posts: 7,112 member
    Quote:

    Originally posted by Programmer

    Yes, but the statement I'm objecting to was that 0.5 is somehow less accurate than 1/2. This is not correct.



    That depends on how you got it. If you measured 0.5 experimentally, there's an implicit +/-. The shorthand has always been that you explicitly give the number of significant digits you're sure of. So in this realm, 0.5, 0.50, 0.500, and 0.5000 are all slightly different, and converge on the mathematical real number 0.5, which is obviously equivalent to 1/2.



    Put it this way: If you have to write down thousands of measurements, would you consistently write down 0.5 +/- some delta, or would you adopt the "significant digits" shorthand?



    Quote:

    A given floating point number represents a number precisely; it just may not be the number you wanted, and it may not be possible to convert that number to a decimal representation in an exact way.



    If the only precision you care about is that of the number you want - and under what circumstance would any other definition apply? - then this isn't precision at all. In the best case, it's the value you want to the precision you're guaranteed (by the quality of your measurement or calculation), plus or minus some essentially random noise introduced by the FP hardware. This infects both attempts at pure mathematics (because of flaws in FP representation of real numbers) and calculations from experimental or observational results (because of the former reason, and because of illusory precision).



    Quote:

    The IEEE standard contains the "inexact" flag and exception. Any calculation that involves rounding, overflow, or underflow will set this flag or throw this exception. Unfortunately it doesn't track the amount of error for you, but it could be used to detect when it happens. Unfortunately most calculations are going to have something inexact in them so that flag isn't going to help you a whole lot.



    So you end up with the code I've seen that tracks the delta manually and ignores the built-in comparison operators in favor of hand-rolled functions that take the delta into account, and track the number of significant digits.
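    A bare-bones version of that kind of hand-rolled comparison, with the tolerance made explicit (the delta here is arbitrary, purely for the sketch):

        #include <math.h>
        #include <stdbool.h>
        #include <stdio.h>

        /* "Equal" means: agree within the accuracy we can actually vouch for. */
        static bool approx_equal(double a, double b, double delta) {
            return fabs(a - b) <= delta;
        }

        int main(void) {
            double computed = sqrt(2.0) * sqrt(2.0);   /* 2.0000000000000004 */
            double expected = 2.0;
            double delta    = 1e-9;

            printf("built-in ==      : %s\n", (computed == expected) ? "equal" : "not equal");
            printf("within our delta : %s\n", approx_equal(computed, expected, delta) ? "equal" : "not equal");
            return 0;
        }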
  • Reply 56 of 114
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Amorph

    It is, if you think about what the problem they're identifying is.



    Fractional representations are perfectly precise because they are abstract: 1/2 is exactly 1 divided by 2, with no possibility of noise or inaccuracy.



    0.5, on the other hand, by scientific convention means 0.5<and any additional precision was lost to noise, coarse measuring tools etc.>. In other words, 0.5 does not mean 1/2. It could be 0.503, or even 0.55, or 0.49. In that case, the best you can do is say that 0.5/2 = 0.2 - the approximation is simply an admission that the data is noisy, and the equal sign is analogous at best to its mathematical counterpart. Along these lines, 0.50 / 2.00 = 0.25 - but that still isn't the same as 1/2 divided by 2 = 1/4. You've just pushed the noise back one significant digit.







    This is exactly what I'm getting at. There is no reason to infer that there is precision lost, or coarse measuring tools, or anything of that nature. In the end you are specifying the same value - 0.25 represents the same thing as 1/4.

    Quote:

    One of the things that floating point does, actually, is present the illusion of precision by ignoring the idea of significant digits, and this along with the lack of a built in "equality within a given delta" operator actually introduces inaccuracy to measurements, and instills false confidence. (FP also introduces inaccuracies via approximation, but at 64 bits that is only a problem at unusual extremes, and problem children like 1/3 and 1/10.) If all you know is that you have a measurement of 0.25-and-change, then it's not accurate to say that you have a measurement of 0.25395264823. You might... As a result, responsible FP code tracks the delta that represents the real, guaranteed accuracy, and uses it when appropriate to clip the machine's overly optimistic "precision" and compensate for its exacting comparison operators.



    If you have a measurement of 0.25 and change, as you say, it is silly to dispose of that "change" until you know whether it is relevant. There may be little precision in that measurement, but you do want to keep and track all of the resolution that you had when you made the measurement. You seem to be making the mistake of confusing resolution with precision; they are not the same thing.

    Quote:



    Yes, but you still haven't made a case for it. I'm sure that, somewhere, there's a problem involving 1,024 bit vectors. Maybe someone's trying to model the impact of the solar wind on the Milky Way down to the cubic femtometer? Before you build it into hardware, you have to ask what the benefit is vs. the cost. The cost is not inconsiderable: Much wider registers are a benefit all around until your massive vector unit spends most of its time twiddling its thumbs while the bus and main RAM struggle under the load, and the caches thrash constantly.



    Frankly, you have not made any case at all for keeping doubles out of a SIMD unit of any type. All you have to accept is that singles, that is, 32-bit floats, do not have the dynamic range for many applications. It really doesn't matter whether the application needs one or two extra bits or a lot more; the next logical data type is double. This implies that it would be reasonable to expand VMX to handle 64-bit data types.



    I don't deny that implementing such capabilities will cost transistors. It is probably for this reason that the rumor is directed at 65nm devices. The same argument can be made that the reason AltiVec has its current limitations is one of economics with respect to the process it was first targeted at.



    The issues with caches and data transfers are real, which is one of the reasons I believe Apple & IBM are looking seriously at improving VMX. Maybe not the VMX2 the rumors describe, but improvements nonetheless. Since such improvements would address data movement and buffering as much as anything else, new data types and instructions would take less of a hit. Remember, this rumor revolves around a new or improved VMX unit; hopefully it will not be saddled with the current unit's limitations.



    Quote:

    In short, you can't really argue for the adoption of this technology until you sit down and figure out how hard it is to implement, and what the implications will be for the rest of the CPU and the rest of the board. Right now, today, VMX will cheerfully eat four times the total bandwidth of the 970's bus, starving out the rest of the CPU. Double the register width, and the bandwidth requirements double. If you want something to replace a supercomputer, you need to give it all the bandwidth it could ever want, or it'll sit there twiddling its thumbs at incredibly high speed. And, if you're Apple, you have to figure out when your ersatz Cray will appear in a PowerBook or an iMac - an eventuality which every extra transistor delays.



    Since this is a chip that apparently hasn't even left the stage of realization, the above concerns are really not valid. Bandwidth is and always will be an issue. Any new design would have to address those bandwidth issues, but those issues would be in place even if the SIMD unit was not touched. You only need to give the improved VMX unit enough bandwidth to make it a significant solution relative to other performance improvements. Trade-offs in processor design are not going away; it is a matter of getting the best bang for the buck for the processor's target market. As you have accurately indicated, there is a great deal of room for improvement in the current design.



    Thanks

    Dave
  • Reply 57 of 114
    programmer Posts: 3,467 member
    Not needing 64-bit floats is not the reason to avoid putting them into VMX2.



    Better arguments are to be found in examining the costs of such an addition in terms of the amount of machine state to be preserved, the opportunity cost, and the effects of fragmenting the PowerPC user programming model further. Intel and AMD have been changing their programming model on a whim for years, and look at the mess it has gotten them into: developers either don't bother using any of it or choose to support some very small subset (but the hardware has to support it all).



    For a given number of transistors, is it better to add double support to VMX2, or to improve the normal FPU implementation? If you add transistors to the VMX2 units then nobody gets the benefit until they recode for VMX2 specifically. If you add to the FPUs then everybody already doing double precision benefits immediately. In an SMT design there is probably a thread running on the processor somewhere that can use all those FPUs all the time. If you extend VMX to have doubles you pretty much have to double the register width to be useful, adding another 512 bytes to a full context switch.



    The AltiVec unit is great. It is a terrific design. I just don't see that increasing the register size and adding the huge complexity of a quad double precision unit is worth it, however. There are other instructions they could add first that improve what the unit can do already without having to introduce such an expensive new type. With this kind of an addition everybody pays the price, but few reap the benefits.
  • Reply 58 of 114
    amorph Posts: 7,112 member
    Quote:

    Originally posted by wizard69

    This is exactly what I'm getting at. There is no reason to infer that there is precision lost, or coarse measuring tools, or anything of that nature. In the end you are specifying the same value - 0.25 represents the same thing as 1/4.



    No, it's not what you're getting at. You're missing the point. The mathematical real number 0.25 is the mathematical real 1 divided by the mathematical real 4. Anything measured is a lot messier than that, and any responsible scientist has to account for that. A measured value of "0.25" could be any of: 0.24999, 0.253, 0.25000000001, or even precisely 1/4 (although what are the odds of that?). You don't know what the exact value is, so any assumptions beyond the initial significant digits are almost guaranteed to be false.



    Quote:

    If you have a measurement of 0.25 and change, as you say, it is silly to dispose of that "change" until you know whether it is relevant. There may be little precision in that measurement, but you do want to keep and track all of the resolution that you had when you made the measurement. You seem to be making the mistake of confusing resolution with precision; they are not the same thing.



    At this point I have no idea what you're talking about. Precision represents the accuracy with which something can be represented. It applies both to measurements and to representations in floating point, which is why people refer to "64 bit precision" and "precision tools".



    How is resolution different from precision, anyway? Both specify a quantum value beneath which the representation is no longer accurate.



    You've misinterpreted what I've said. If you have a measurement of "0.25 and change" you don't know what that change is. It could be zero, or it could not be. Disposing of it is not an issue, because you don't know what it is in the first place. If you could measure it in any meaningful way, there would be significant digits to represent it! The fact that FP might, over the course of calculations, introduce a whole bunch of extra digits (but not significant digits, because you can't get signal from noise), is an unwelcome artifact. It's not anything you can use, and it's not the "change" I was referring to.



    Quote:

    Frankly, you have not made any case at all for keeping doubles out of a SIMD unit of any type. All you have to accept is that singles, that is, 32-bit floats, do not have the dynamic range for many applications.



    All you have to do is answer the question: How many applications, and are they worth the cost of implementing a vector unit vs. using parallelism and conventional FP units? It's nice that there are supercomputers that can do this, but Apple doesn't make supercomputers (silly marketing hype aside). I don't pretend to know the answer to that question, but it's not a simple question, and it can't be blown off.



    We don't know that IBM and Apple (and probably Mot) are revisiting VMX to provide this functionality, either. There are all kinds of capabilities they could add that would greatly improve its appeal for streaming and signal processing work, without changing the sizes of the registers or supporting 64 bit anything.



    Quote:

    Since this is a chip that apparently hasn't even left the stage of realization, the above concerns are really not valid.



    No, they're valid, they just aren't anything more than "concerns." I'm not saying can't or won't or shouldn't. I'm merely pointing out that the concerns are hairy enough, and the payoff uncertain enough, and other features desirable enough, that the support you want might or might not happen even at 65nm. Whether it happens depends on a large number of variables whose values are currently unknown.



    Personally, looking at the problem, I am leaning toward "won't happen." The current top of the line PowerMac can crunch through 4 FP calculations per clock (2 CPUs with 2 FPUs each). There you go - no additional hardware necessary, and twice the memory bandwidth and twice the cache that would be available to a 64-bit-savvy VMX unit on one CPU.



    Quote:

    Trade-offs in processor design are not going away; it is a matter of getting the best bang for the buck for the processor's target market.



    So what are the tradeoffs in going to 256 bit or 512 bit registers, and how easy are they to surmount? Is it worth it to the target market? There is zero use for it in the embedded space (any time soon), zero use on the desktop (any time soon), so that leaves the workstation and server markets. Servers might be able to use it for IPv6 networking, but IBM seems to have other ideas for that sort of thing (FastPath, which will intercept a lot of the system interrupts and allow the main CPU to keep crunching away).
  • Reply 59 of 114
    bigc Posts: 1,224 member
    Quote:

    Originally posted by Amorph

    No, it's not what you're getting at. You're missing the point. The mathematical real number 0.25 is the mathematical real 1 divided by the mathematical real 4. Anything measured is a lot messier than that, and any responsible scientist has to account for that...





    ....




    now that's the way to start a mathematical argument...
  • Reply 60 of 114
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Bigc

    now that's the way to start a mathematical argument...



    It was just a syntax error. Don't be too hard on Amorph: he has the power