Powerlogix 2x1.6 GHz G4's???


Comments

  • Reply 41 of 49
    yevgeny Posts: 1,148 member
    Quote:

    Originally posted by User Tron

    (Isn't it amazing how much effort is put in out of order execution, branch prediction etc. and how little most programmers know about putting their statements in right order to help the compiler and cpu?) I know there's lot of legacy code out there that prevents radical changes but maybe the embedded space will show more variations in VLSI designs.



    Quote:

    Originally posted by THT

    Programmers are lazy. Or, the economics of software prevents software from being optimized while it is the reverse for hardware. It's simply cheaper to make software faster by using faster hardware, hence, OOOE, BP, and in the future, predication, SMT and speculative SMT, are put into processors while programmers concentrate on more features using higher order languages.



    You are both really wrong about programmers and programming. Let me count the ways.



    User Tron, I write code that is compiled for Linux, HPUX, Solaris, AIX, and Windows. These five platforms run on various CPUs: x86 (Intel), x86 (AMD), POWER4, SPARC, Alpha, etc. Which CPU should I arrange my C++ code for? Should I prefer Intel x86 over all others? Should everyone (somehow magically) write their high-level code for Intel, and leave every other CPU to cope with code ordered for someone else's chip?



    Further, one of the problems is that CPU vendors CHANGE THE PREFERRED ORDER OF EXECUTION BETWEEN CPU GENERATIONS. That's right, kids: hand-written assembly tuned for the original Pentium could run like a dog on the Pentium Pro. The transitions from PII to PIII to P4 were the same story. Each chip changes how it would like instructions to be fed to it, so the best approach is for the chip itself to do all sorts of creative instruction reordering, branch prediction, etc.



    Finally, if I had to write my code for specific CPUs, then I would have to be constantly changing it to keep up with new CPUs. Basically, I would be rewriting my code for each new CPU, which would obviously keep me from writing much new code.



    Now for THT. Put the crack pipe down, THT. I spend the occasional week in front of a code profiler trying to find ways to make my code run faster (both in terms of algorithm design and implementation). I am sure that some programmers are lazy, but most are not. Software is made faster by a programmer sitting down and looking at where it is slow. Faster hardware also makes software faster, but it isn't cheaper, since it requires the end user to buy a new machine and thus limits the potential market for released software. I am sure that some programmers look to new CPUs for their speed increases, but not all do.



    Abstraction has been the name of the game ever since IBM wrote the first compiler, for FORTRAN. Less abstraction means you can write more efficient code, but more abstraction means the programmer is insulated from the hardware changes that inevitably happen. Abstraction is a GOOD thing; it lets programmers get more done. Currently, "higher-order languages" means languages that abstract you from the hardware entirely by running on a virtual machine (Java, .NET). This is a valuable idea for programming environments where the platform is distributed.
  • Reply 42 of 49
    THT Posts: 5,608 member
    Quote:

    Originally posted by Yevgeny

    Now for THT. Put the crack pipe down, THT. I spend the occasional week in front of a code profiler trying to find ways to make my code run faster (both in terms of algorithm design and implementation).



    Occasional? Hehe, how does that jibe with these three paragraphs you wrote in answer to User Tron, which indicate that programmers don't optimize:



    User Tron, I write code that is compiled for Linux, HPUX, Solaris, AIX, and Windows. These five platforms run on various CPUs: x86 (Intel), x86 (AMD), POWER4, SPARC, Alpha, etc. Which CPU should I arrange my C++ code for? Should I prefer Intel x86 over all others? Should everyone (somehow magically) write their high-level code for Intel, and leave every other CPU to cope with code ordered for someone else's chip?



    Further, one of the problems is that CPU vendors CHANGE THE PREFERRED ORDER OF EXECUTION BETWEEN CPU GENERATIONS. That's right, kids: hand-written assembly tuned for the original Pentium could run like a dog on the Pentium Pro. The transitions from PII to PIII to P4 were the same story. Each chip changes how it would like instructions to be fed to it, so the best approach is for the chip itself to do all sorts of creative instruction reordering, branch prediction, etc.



    Finally, if I had to write my code for specific CPUs, then I would have to be constantly changing it to keep up with new CPUs. Basically, I would be rewriting my code for each new CPU, which would obviously keep me from writing much new code.




    Actually, the primary reason I said what I said is that code, and indeed whole applications, have such complexity that they are very hard to design and manage. I don't really see the economics of the business promoting speed over features and bug fixes, especially when the upgrade cycles are so short.



    Quote:

    Abstraction has been the name of the game ever since IBM wrote the first compiler, for FORTRAN. Less abstraction means you can write more efficient code, but more abstraction means the programmer is insulated from the hardware changes that inevitably happen. Abstraction is a GOOD thing; it lets programmers get more done. Currently, "higher-order languages" means languages that abstract you from the hardware entirely by running on a virtual machine (Java, .NET). This is a valuable idea for programming environments where the platform is distributed.



    Yes, very true, but it reduces the importance of machine-level optimizations, no? Why would a programmer go to the expense of machine-level optimizations if he didn't have to?
  • Reply 43 of 49
    Quote:

    Originally posted by Yevgeny

    You are both really wrong about programmers and programming.



    Sorry to disappoint you, but you're wrong about me! Besides having a degree in CS, I have been programming for 20 years now, and I teach programming and OS material. So, no offence, but I know what I'm talking about when I say the average programmer knows only a little about the trouble he causes at compile time. I'm pretty sure you do your stuff right, but don't think the majority of programmers are like you. As for abstraction, we agree, so no need for arguing. I was thinking more of things like people using string instead of StringBuilder when concatenating lots of text, or writing loops that jump around memory where linear access is possible, etc.
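
    To sketch those two examples in C++ (toy code, not from any real project): the first loop pair shows linear vs. strided memory access, and the string part shows the C++ analogue of the string-vs-StringBuilder point, namely reserving capacity up front.

    ```cpp
    #include <cstdio>
    #include <string>
    #include <vector>

    int main() {
        // Example 1: memory access order. Both loops compute the same sum,
        // but the row-major walk touches memory linearly (cache-friendly),
        // while the column-major walk jumps N*sizeof(int) bytes per step.
        const int N = 512;
        std::vector<int> m(N * N, 1);
        long rowMajor = 0, colMajor = 0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                rowMajor += m[i * N + j];   // linear access
        for (int j = 0; j < N; ++j)
            for (int i = 0; i < N; ++i)
                colMajor += m[i * N + j];   // strided access: same result, more cache misses
        std::printf("sums equal: %d\n", rowMajor == colMajor);

        // Example 2: the string vs. StringBuilder point. In C++ the analogue
        // is reserving capacity instead of letting repeated += reallocate.
        std::string s;
        s.reserve(26 * 1000);               // one allocation instead of many
        for (int k = 0; k < 1000; ++k)
            s += "abcdefghijklmnopqrstuvwxyz";
        std::printf("length: %zu\n", s.size());
        return 0;
    }
    ```

    Same answers either way; the only difference is how hard the memory system has to work.
    
    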



    End of Line
  • Reply 44 of 49
    yevgeny Posts: 1,148 member
    Quote:

    Originally posted by THT

    Occasional? Hehe, how does that jibe with these three paragraphs you wrote in answer to User Tron, which indicate that programmers don't optimize:



    Programmers do not do much in the way of machine-specific optimization. They do a great deal of proper algorithmic design, making sure that a good algorithm is implemented in a reasonable fashion, so the problem gets solved in an "optimal" way. That is a world of difference from thinking about how MS DEV or Metrowerks handles an if statement, versus thinking about what a memory- and CPU-cycle-efficient solution is in the first place. If statements are cheap compared to things that get run over and over...
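
    A toy C++ sketch of that difference (hypothetical code, just to make the point): the algorithmic choice between a linear scan and a hash lookup dwarfs any instruction-level tweak to either loop.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstdio>
    #include <unordered_set>
    #include <vector>

    // Count how many queries appear in `data`, two ways. The O(n*m) scan and
    // the O(n+m) hash approach give identical answers; only the algorithm differs.
    static int countByScan(const std::vector<int>& data, const std::vector<int>& queries) {
        int hits = 0;
        for (int q : queries)
            if (std::find(data.begin(), data.end(), q) != data.end())
                ++hits;                       // O(m) work per query
        return hits;
    }

    static int countByHash(const std::vector<int>& data, const std::vector<int>& queries) {
        std::unordered_set<int> set(data.begin(), data.end());
        int hits = 0;
        for (int q : queries)
            hits += set.count(q);             // O(1) average per query
        return hits;
    }

    int main() {
        std::vector<int> data, queries;
        for (int i = 0; i < 10000; ++i) data.push_back(i * 2);   // even numbers
        for (int i = 0; i < 1000; ++i)  queries.push_back(i);    // 0..999
        assert(countByScan(data, queries) == countByHash(data, queries));
        std::printf("hits: %d\n", countByHash(data, queries));   // the 500 even queries
        return 0;
    }
    ```
    
    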



    Quote:

    Actually, the primary reason I said what I said is that code, and indeed whole applications, have such complexity that they are very hard to design and manage. I don't really see the economics of the business promoting speed over features and bug fixes, especially when the upgrade cycles are so short.



    Yes, very true, but it reduces the importance of machine-level optimizations, no? Why would a programmer go to the expense of machine-level optimizations if he didn't have to?




    Yes, application complexity is difficult to manage. This is why you must design software in advance and spend quite a bit of time in the design phase. My company is notorious for doing lots of design and publishing amazingly large object models.



    Programmers do machine-level optimizations (e.g. AltiVec) when they can have a substantial performance impact. Secondly, the only reason programmers write AltiVec code is that Apple isn't going to ditch AltiVec. If Apple said the G6 would not support AltiVec, programmers would drop it tomorrow.



    When programmers do machine-level optimizations, they do it sensibly: they put the optimizations in a library that is accessed through a very straightforward interface. They (should) then write a version of the library that doesn't use the machine-dependent code. You optimize little chunks that are called frequently, not big parts that are everywhere. The idea is to speed up the parts that are slowest and called most often. What you don't do is scatter optimizations everywhere, because then you go through hell trying to maintain them. Hinting to the compiler which way an if statement is likely to go, to get a speed boost out of branch prediction, is nice, but just plain worthless for a lot of if statements.
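
    A hypothetical sketch of that pattern: one narrow interface, two interchangeable implementations. The "fast" path stands in for a vector-unit (AltiVec/SSE) version; here it is plain C++ unrolling so the sketch stays portable and runnable.

    ```cpp
    #include <cassert>
    #include <cstdio>

    // Portable fallback: works on any CPU, no machine-dependent code.
    static float dotPortable(const float* a, const float* b, int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i) sum += a[i] * b[i];
        return sum;
    }

    // "Optimized" path. Real code would use intrinsics here; the caller
    // never knows which version it got, only the interface.
    static float dotFast(const float* a, const float* b, int n) {
        float s0 = 0, s1 = 0, s2 = 0, s3 = 0;   // 4-wide unrolling as a stand-in
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i] * b[i];     s1 += a[i+1] * b[i+1];
            s2 += a[i+2] * b[i+2]; s3 += a[i+3] * b[i+3];
        }
        float sum = s0 + s1 + s2 + s3;
        for (; i < n; ++i) sum += a[i] * b[i];  // leftover elements
        return sum;
    }

    // Selected once, e.g. after probing the CPU's capabilities at startup.
    static float (*dot)(const float*, const float*, int) = dotFast;

    int main() {
        float a[6] = {1, 2, 3, 4, 5, 6};
        float b[6] = {1, 1, 1, 1, 1, 1};
        assert(dot(a, b, 6) == dotPortable(a, b, 6));  // both paths must agree
        std::printf("dot: %.0f\n", dot(a, b, 6));
        return 0;
    }
    ```

    Everything machine-dependent hides behind one function pointer, so dropping AltiVec means swapping one assignment, not combing through the whole codebase.
    
    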



    Speed is a required feature for some people.
  • Reply 45 of 49
    yevgeny Posts: 1,148 member
    Quote:

    Originally posted by User Tron

    Sorry to disappoint you, but you're wrong about me! Besides having a degree in CS, I have been programming for 20 years now, and I teach programming and OS material. So, no offence, but I know what I'm talking about when I say the average programmer knows only a little about the trouble he causes at compile time. I'm pretty sure you do your stuff right, but don't think the majority of programmers are like you. As for abstraction, we agree, so no need for arguing. I was thinking more of things like people using string instead of StringBuilder when concatenating lots of text, or writing loops that jump around memory where linear access is possible, etc.



    End of Line




    I guess I am not an average programmer. I have a good idea of the complexities that go on at compile time because I wrote a compiler once.



    Sorry if I came off a bit harsh. I do take my code seriously (I have to; I live in big-O-notation land, where n is large), and I get frustrated by people who think that software development is easy and blame problems on programmers.
  • Reply 46 of 49
    Quote:

    Originally posted by Yevgeny

    I guess I am not an average programmer. I have a good idea of the complexities that go on at compile time because I wrote a compiler once.



    Sorry if I came off a bit harsh. I do take my code seriously (I have to; I live in big-O-notation land, where n is large), and I get frustrated by people who think that software development is easy and blame problems on programmers.




    I know your problem all too well. You are far beyond the average programmer with the knowledge of how to build a compiler!



    To make what I meant clearer, a little example: take a loop whose body contains only code that does not rely on the iteration itself. Each iteration is therefore fully independent, and all iterations can run in parallel. If you write this code in a normal language and compile it, the result will be more or less optimized for the target CPU. Now add a second CPU. The compiler itself won't help you much. OK, we can try to multithread our loop, but creating and destroying threads is probably far too costly for a simple loop, and how many are we creating? 2? 1 for each CPU? 4 because they are SMT-enabled? 8 because it may run on machines with 4 CPUs? Why not 1 for each iteration? My answer is: I don't want to think about it; I just want to say "this code can run in parallel." As you know yourself, it's getting harder and harder to write code that takes advantage of the underlying hardware. The programmer hopefully knows best how his code operates, but still has little chance to give the compiler hints about what to do. Hope that makes it clearer. If you are interested in alternative computing solutions, you should look at Occam and Transputers. Both very dead, but still good food for the brain.
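
    In modern C++ terms (std::thread did not exist when this was written; this is purely illustrative), the bookkeeping I'd rather not write by hand looks something like this:

    ```cpp
    #include <cstdio>
    #include <numeric>
    #include <thread>
    #include <vector>

    // The independent-iterations loop, split by hand into one chunk per core.
    // All of the chunking and joining below is exactly the boilerplate the
    // programmer shouldn't have to think about.
    int main() {
        const int n = 1000;
        std::vector<long long> out(n);

        unsigned workers = std::thread::hardware_concurrency();
        if (workers == 0) workers = 2;          // the API may report "unknown"

        std::vector<std::thread> pool;
        for (unsigned w = 0; w < workers; ++w) {
            pool.emplace_back([&, w] {
                // Each thread takes a contiguous chunk; iterations never interact.
                int begin = n * w / workers, end = n * (w + 1) / workers;
                for (int i = begin; i < end; ++i)
                    out[i] = (long long)i * i;  // the actual loop body
            });
        }
        for (auto& t : pool) t.join();

        long long checksum = std::accumulate(out.begin(), out.end(), 0LL);
        std::printf("checksum: %lld\n", checksum);
        return 0;
    }
    ```

    And even this hard-codes one answer (a thread per reported core) to the "how many threads?" question; the point stands that the programmer should only have to mark the loop parallel.
    
    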



    As hardware starts to hit physical boundaries, we may see more varied approaches. Making the pipeline longer isn't the only road to take.



    End of Line
  • Reply 47 of 49
    Quote:

    Originally posted by THT

    This is pretty much the Itanium design: an 8-stage pipeline with massive execution resources, heavily dependent on compiler technology. We have yet to find out if Itanium will be successful; after all these years, only some 10,000 Itanium systems have been sold. Itanium does hold the FPU lead, but just imagine what the SPECfp of the 970 would be if it ran at 3 GHz, or if the P4 had 2 full FPUs. It'll be close, but guess which chip would be ten times cheaper to produce.





    Well, EPIC alone is not flexible enough IMO, and legacy code runs pretty badly on it. I'm thinking more of independent pipes and more freedom for the compiler. The biggest problem for Itanium is the fact that normal code is too sequential; the compiler has to transform it into parallel instructions, not an easy task. I'm very curious how IBM's Cell technology will work.



    Quote:



    Programmers are lazy. Or, the economics of software prevents software from being optimized while it is the reverse for hardware. It's simply cheaper to make software faster by using faster hardware, hence, OOOE, BP, and in the future, predication, SMT and speculative SMT, are put into processors while programmers concentrate on more features using higher order languages.




    Yup! Legacy code and legacy knowledge prevent radical changes.



    End of Line
  • Reply 48 of 49
    Programmer Posts: 3,467 member
    You're also ignoring the benefit that OoOE gains from doing dynamic analysis of the code and system state. In most modern systems the time to fetch data from memory varies significantly from fetch to fetch, never mind between machines (remember that this applies to both code and data fetching). Interrupts come along with disturbing frequency. Branches often have high repeatability and in some cases that repeatability isn't known until runtime. Some instructions run for varying numbers of cycles depending on data (early exit divides, multiplies, move multiples, and conditional fetches). The situation will become more pronounced in an SMT machine where the interactions between multiple threads cannot be predicted.



    I can't believe the direction that Intel chose to go with EPIC, and I think (and hope) that it is doomed to failure. Even the language designs in widespread use (i.e. C/C++) prevent effective compiler optimization in many ways (e.g. pointer aliasing).
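
    A small sketch of the pointer-aliasing point (toy code, not from any real compiler test): given only the types, the compiler must assume the two pointers might overlap, so it cannot keep the source value in a register across the stores. Calling the function with aliased arguments shows why that assumption is forced.

    ```cpp
    #include <cstdio>

    // Without aliasing information, the compiler must re-load *src after
    // each store through dst, because dst and src might be the same object.
    static void addTwice(int* dst, const int* src) {
        *dst += *src;   // if dst == src, this store changes *src...
        *dst += *src;   // ...so this load must re-read memory
    }

    int main() {
        int a = 1, b = 10;
        addTwice(&a, &b);        // distinct objects: 1 + 10 + 10
        std::printf("no alias: %d\n", a);

        int c = 1;
        addTwice(&c, &c);        // aliased: 1+1 = 2, then 2+2 = 4
        std::printf("aliased: %d\n", c);
        // C99's `restrict` (or the common `__restrict` extension) lets the
        // programmer promise no overlap, re-enabling the cached-load optimization.
        return 0;
    }
    ```
    
    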
  • Reply 49 of 49
    Quote:

    Originally posted by Programmer

    You're also ignoring the benefit that OoOE gains from doing dynamic analysis of the code and system state. In most modern systems the time to fetch data from memory varies significantly from fetch to fetch, never mind between machines (remember that this applies to both code and data fetching). Interrupts come along with disturbing frequency. Branches often have high repeatability and in some cases that repeatability isn't known until runtime. Some instructions run for varying numbers of cycles depending on data (early exit divides, multiplies, move multiples, and conditional fetches). The situation will become more pronounced in an SMT machine where the interactions between multiple threads cannot be predicted.





    First of all, I'm not an EPIC fan myself. And as I said before, EPIC is too static for my taste. Basically there's nothing wrong with OoOE, but it seems it may not be worth the effort in all cases (http://www.inria.fr/rrrt/rr-3391.html). We may see something more like this in the future: http://www.zytek.com/~melvin/flowstorm.html. We are already seeing things like dynamic multithreading (DMT), which tries to split loops into threads. All I was trying to say is that we may see a paradigm shift away from "the longer the pipeline, the better." I know very well that the problem is very complex and that there are no simple solutions.



    End of Line