Motorola: Dual-core, RapidIO, DDR, Altivec chip on roadmap.

Posted in Future Apple Hardware, edited January 2014

Comments

  • Reply 1 of 63
    leonis Posts: 3,427 member
    I will believe it when I see it. That PDF is pretty much telling us what the G4 can do for embedded purposes...



    What can you expect (on the desktop CPU side) from a company that's been lagging behind miserably for the last 5 years?
  • Reply 2 of 63
    nitzer Posts: 115 member
    That's nice. The 970 will still spank this thing.



    "Classic PPC with Altivec" == Same old cr*p



    It'll be sweet for Cisco and other embedded purposes though, whenever it comes out.
  • Reply 3 of 63
    zapchud Posts: 844 member
    Seems good, though 2004/2005 is very late for DDR. RapidIO, on the other hand, would be great in an iBook.
  • Reply 4 of 63
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Nitzer

    That's nice. The 970 will still spank this thing.



    "Classic PPC with Altivec" == Same old cr*p



    It'll be sweet for Cisco and other embedded purposes though, whenever it comes out.




    The PPC 970 will beat it on floating point but will be beaten on integer, because this chip will be dual core.
  • Reply 5 of 63
    nitzer Posts: 115 member
    Quote:

    Originally posted by Powerdoc

    The PPC 970 will beat it on floating point but will be beaten on integer, because this chip will be dual core.



    The 970 will beat it because it's shipping this year (hopefully this month). Whereas this... thing... from MOT won't exist for another year or two. By then we'll be comparing it to the dual-core POWER5 derivative.



    8)
  • Reply 6 of 63
    outsider Posts: 6,008 member
    I actually think IBM has plans to introduce a dual-core 970 as soon as they transition the 970 to the 90nm process. There would be a 970SX and a 970DX or something like that. The 970 has fewer transistors than the present G4, so this would be even more likely.



    I have a question for someone who has a deep understanding of chip design: What is the drawback of a more parallel processor approach (double the execution units, double the cache, etc.) in contrast to a dual core design? What type of software would take more advantage of a certain approach versus another?
  • Reply 7 of 63
    bigc Posts: 1,224 member
    I like how the PDF's summary info says it was produced with Acrobat for Windows. Guess they can't afford to buy a Mac.
  • Reply 8 of 63
    rickag Posts: 1,626 member
    I found one thing interesting in the "See page 34" link sc_markt mentioned and the "Roadmap Presentation" link M.Isobe provided @ Arse.



    On page 11, the presentation clearly states that 7 million G4s have been shipped to date.



    Any guesses as to how many were shipped to Apple? Half? More than half?



    Seems that if Apple were to switch their PowerMacs completely over to IBM's 970, Motorola stands to lose a very high percentage of their G4 sales. This seems to contradict many posts on several boards stating that G4 sales to Apple, or Apple's sales in general, are not significant to Motorola's overall sales.
  • Reply 9 of 63
    bigc Posts: 1,224 member
    Quote:

    Originally posted by Powerdoc

    The PPC 970 will beat it on floating point but will be beaten on integer, because this chip will be dual core.



    Hard to compare two chips that aren't even out yet.
  • Reply 10 of 63
    lemon bon bon Posts: 2,383 member
    Quote:

    The 970 will beat it because



    Lemon Bon Bon
  • Reply 11 of 63
    lemon bon bon Posts: 2,383 member
    Quote:

    I will believe it when I see it. That PDF is pretty much telling us what the G4 can do for embedded purposes...



    What can you expect (on the desktop CPU side) from a company that's been lagging behind miserably for the last 5 years?



    Lemon Bon Bon
  • Reply 12 of 63
    chu_bakka Posts: 1,793 member
    Page 32 says they are aiming for 3+ GHz with process and architecture improvements... and to remain at 10 watts and under.
  • Reply 13 of 63
    mmmpie Posts: 628 member
    When you increase the number of execution units, they have to find work to do from the same thread of execution. Until software is designed to be executed in parallel, there is a limited amount of work that can be done at any one time in a given thread. The 970 supports about 200 instructions in flight to make sure there is enough work to keep its units busy. That's a big window to keep open; in comparison, the G4's window is only about 16 instructions. Keeping twice as many units busy becomes much harder: it's a non-linear relationship, and may even have a hard limit for any given piece of code (probably undecidable). That implies that at some point you can double the number of units and get no increase in speed, because there is no more work to do.



    In an old OS, this was the only way to approach the problem. But modern, multi-threaded OSes have other stuff to do, so you can have a dual-processor system and actually find it to be very good at improving the system's responsiveness.



    It is comparatively easy, design-wise, to put two dies on a chip rather than make a single die faster/bigger, especially if the dies are already designed to be multi-processor capable.



    An alternative to putting two dies on a chip is simultaneous multi-threading (see Hyper-Threading on the P4), which allows all those unused units on the chip to work on another thread. As far as the OS is concerned, you have two CPUs. Benchmarks on intensive multi-threaded code indicate that it isn't necessarily a big win, though I haven't seen any discussion of system responsiveness. IMHO, a dual system is useful not for a two-times speed increase (although in some cases that might happen), but because you can do compute-intensive work and not have the machine grind to a halt. I still wish I had my dual-Pentium BeOS box back, even though it struggles compared to my 1GHz Athlon. It just 'felt' better.
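


    To make the "limited work per thread" point concrete, here is a toy C sketch (my own made-up example, nothing from the roadmap PDF). The first loop is one long dependency chain, so extra execution units have nothing to do; the second gives the scheduler four independent chains it can keep in flight at once:

    /* build: cc -O2 ilp.c */
    #include <stdio.h>

    #define N 1000000

    static double data[N];     /* contents don't matter for the example */

    static double sum_chained(void)
    {
        double s = 0.0;
        int i;
        for (i = 0; i < N; i++)
            s += data[i];      /* every add must wait for the last one */
        return s;
    }

    static double sum_unrolled(void)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;
        for (i = 0; i < N; i += 4) {
            s0 += data[i];     /* four independent chains: a wide    */
            s1 += data[i + 1]; /* core can overlap these additions   */
            s2 += data[i + 2];
            s3 += data[i + 3];
        }
        return s0 + s1 + s2 + s3;
    }

    int main(void)
    {
        printf("%f %f\n", sum_chained(), sum_unrolled());
        return 0;
    }

    On a chip with a big in-flight window the second version can go noticeably faster; past a certain point, though, adding units buys nothing, because the code simply has no more independent work.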
  • Reply 14 of 63
    yevgeny Posts: 1,148 member
    Quote:

    Originally posted by Outsider

    I have a question for someone who has a deep understanding of chip design: What is the drawback of a more parallel processor approach (double the execution units, double the cache, etc.) in contrast to a dual core design? What type of software would take more advantage of a certain approach versus another?



    It has been a while since my hardware courses, but here's my explanation.



    Adding tons more execution units does not make for a faster computer because the compiler (the program that takes computer code and makes a version of the code that runs on a particular CPU) must figure out a way to keep all the execution units busy (and the CPU must interpret what the compiler gives it so that it keeps itself busy). It is actually very hard to keep lots and lots of execution units busy, and if they aren't busy, then they are just sitting there taking up space on the CPU for no benefit. In the Itanium CPU, you will find a good deal of parallelism, and when the IA-64 can keep its execution units fed, it performs well (although Itanium basically makes the compiler do all the work, and it has been difficult to make a good compiler for the Itanium). More execution units are like having more arms: six arms are useless if you only need two for the tasks you have to get done.



    Doubling the cache is usually a good idea, but the catch is that cache also takes up space on the CPU. Larger CPUs are more costly (fewer CPUs per wafer means higher costs, and a greater chance of having a defect means more rejects). MHz and power consumption are related to the number of transistors, and fewer transistors (a smaller die) are generally considered better (I was surprised at how few transistors there are in the PPC 970). Also, the cache isn't very useful for bandwidth-intensive operations (e.g. Altivec) where the system's bus speed is more important.



    Extra cores on the die are great for software that has been parallelized (designed to run in such a way that the software tasks can be split up into chunks that preferably have very little to do with each other). Generally a server can benefit most from multiple CPUs or multiple cores per die, because servers usually are doing many unrelated things. I hope that one positive upshot of Moto's junk chips is that software for OS X is probably better parallelized than software for x86, because dual CPUs have been the only way for Apple to have better speeds, and developers (hopefully) have been writing with parallelization in mind. If I were developing software for OS X, I would be thinking about how I could parallelize CPU-intensive operations (this kind of stuff is fun for me).
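


    As a concrete illustration of what I mean by parallelizing, here's a minimal pthreads sketch (the function and array names are made up for the example, not any real OS X API). Each thread sums its own half of the data, so the two chunks have nothing to do with each other until the final add:

    /* build: cc -O2 par.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000

    static double data[N];
    static double partial[2];  /* one slot per worker, no locking needed */

    static void *sum_half(void *arg)
    {
        int half = *(int *)arg;
        int start = half * (N / 2);
        int end = start + N / 2;
        double s = 0.0;
        int i;
        for (i = start; i < end; i++)
            s += data[i] * data[i];  /* independent chunk of work */
        partial[half] = s;
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        int id0 = 0, id1 = 1;

        pthread_create(&t0, NULL, sum_half, &id0);  /* can run on one CPU  */
        pthread_create(&t1, NULL, sum_half, &id1);  /* ...and on the other */
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        printf("%f\n", partial[0] + partial[1]);
        return 0;
    }

    On a dual-processor (or dual-core) machine the two workers really do run at once; on a single CPU the same code still runs correctly, it just doesn't get faster.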
  • Reply 15 of 63
    yevgeny Posts: 1,148 member
    Well, the upside of this article is that Moto seems to understand that MPX won't work for much longer. Hopefully, they will manage to get RapidIO onto a G4 before this chip comes out (such a chip would be great for iBooks and low-end AlBooks).



    It is important to note that the dual-core stuff comes at the end of the presentation and has no further details. Speaking as one in the software business, I would say that this is typical management vision-casting. It may be real, or it may be a bone they are throwing customers to make them think Moto has a clue. Just because they put some words in print doesn't make them true.
  • Reply 16 of 63
    outsider Posts: 6,008 member
    First off, excellent post.



    Quote:

    Originally posted by Yevgeny

    It has been a while since my hardware courses, but here's my explanation.



    Adding tons more execution units does not make for a faster computer because the compiler (the program that takes computer code and makes a version of the code that runs on a particular CPU) must figure out a way to keep all the execution units busy (and the CPU must interpret what the compiler gives it so that it keeps itself busy). It is actually very hard to keep lots and lots of execution units busy, and if they aren't busy, then they are just sitting there taking up space on the CPU for no benefit. In the Itanium CPU, you will find a good deal of parallelism, and when the IA-64 can keep its execution units fed, it performs well (although Itanium basically makes the compiler do all the work, and it has been difficult to make a good compiler for the Itanium). More execution units are like having more arms: six arms are useless if you only need two for the tasks you have to get done.




    I had a feeling the issue would be heavy compiler optimization, similar to the problem the IA-64 platform has at present. I would imagine that IBM thinks the ideal balance of execution units in a desktop processor is 2 fixed-point, 2 floating-point, and a vector unit. Now let me ask this: instead of a dual-core processor with each core having the same execution units as I described above, how about a quad-core processor where each core has a single fixed-point unit, a single floating-point unit, and a vector unit, with half the cache per core? With a properly threaded application (and operating system), would this be advantageous or just an over-engineered flop?



    Quote:

    Doubling the cache is usually a good idea, but the catch is that cache also takes up space on the CPU. Larger CPUs are more costly (fewer CPUs per wafer means higher costs, and a greater chance of having a defect means more rejects). MHz and power consumption are related to the number of transistors, and fewer transistors (a smaller die) are generally considered better (I was surprised at how few transistors there are in the PPC 970). Also, the cache isn't very useful for bandwidth-intensive operations (e.g. Altivec) where the system's bus speed is more important.



    So would it make more sense to design a processor with minimal L1 (like the P4) and L2 cache (or none at all) and concentrate on a super-fast system bus: say, a fairly wide (64-bit), fast (1GHz DDR) bus that connects to a similarly fast memory interface? I always thought of cache as a temporary solution to a problem computer technology always had in the '80s and '90s: slow memory interfaces. I envision a futuristic processor with the memory right on the die. Several cores tapping into a central store of main memory, maybe 4GB or so, with an attachment to external solid-state magnetic memory, negating the use of any mechanical disk storage. But this is already off topic.
  • Reply 17 of 63
    Quote:

    Originally posted by Zapchud

    Seems good, though 2004/2005 is very late for DDR. RapidIO, on the other hand, would be great in an iBook.



    Bush's two-year roadmap for peace has a better chance of becoming a reality than Moto's two-year roadmap.
  • Reply 18 of 63
    hasapi Posts: 290 member
    I still believe Apple will NEED a 32-bit, low-power, G4-class CPU for the "consumer" range.



    Now if Moto can provide it... cough... then fine, but the entire product line cannot move while the G4 is stuck where it is. The 970 will free the "Pro" line, allowing faster and more palatable upgrades to Apple's "Consumer" range.



    A 1.42GHz (7455) G4 iMac would sell well right now, with 7457 speed bumps to follow, and maybe even one for the iBook, unless IBM has its low-power G4-class CPU available.



    The reality is that Apple NEEDS to get its products well beyond 1GHz if it intends on staying in business, and FAST!
  • Reply 19 of 63
    junkyard dawg Posts: 2,801 member
    Quote:

    Originally posted by rickag

    I found one thing interesting in the "See page 34" link sc_markt mentioned and the "Roadmap Presentation" link M.Isobe provided @ Arse.



    On page 11, the presentation clearly states that 7 million G4s have been shipped to date.



    Any guesses as to how many were shipped to Apple? Half? More than half?



    Seems that if Apple were to switch their PowerMacs completely over to IBM's 970, Motorola stands to lose a very high percentage of their G4 sales. This seems to contradict many posts on several boards stating that G4 sales to Apple, or Apple's sales in general, are not significant to Motorola's overall sales.




    It's also interesting to consider that Apple buys Moto's highest-end CPUs. These high-margin sales contribute more to profits than their proportional numbers would suggest.



    I suspect it would be a great loss to Moto if they lost Apple as a PPC customer. Would it put them under? No, but it would hurt profits until significant changes were made in their SPS division.
  • Reply 20 of 63
    the swan Posts: 82 member
    Outsider, you're always going to want to have cache; it's more a matter of economics than anything else. It is much easier to speed up a processor than a system bus. The physical distance of the system bus is so much larger that it will always be "too slow" for the processor to some extent. The more cache the better, really. The future vision you have amounts to making all memory what we think of as cache today; economically, I doubt that will ever be practical. Properly managed cache can keep up with a processor. With no cache, things creep almost to a halt: remember the original Intel Celeron with no L2 cache? What a flop.
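


    If you want to see the effect yourself, here's a toy C program (my own sketch; the sizes are arbitrary, just big enough to blow past any L2). Both loops do the same adds on the same data, but one walks memory in order while the other jumps 8KB between accesses:

    /* build: cc -O2 cache.c */
    #include <stdio.h>
    #include <time.h>

    #define ROWS 2048
    #define COLS 2048

    static int m[ROWS][COLS];  /* 16 MB, far larger than any cache */

    static long sum_rows(void)
    {
        long s = 0;
        int i, j;
        for (i = 0; i < ROWS; i++)
            for (j = 0; j < COLS; j++)
                s += m[i][j];  /* sequential: each cache line fully used */
        return s;
    }

    static long sum_cols(void)
    {
        long s = 0;
        int i, j;
        for (j = 0; j < COLS; j++)
            for (i = 0; i < ROWS; i++)
                s += m[i][j];  /* 8KB stride: a miss nearly every access */
        return s;
    }

    int main(void)
    {
        clock_t t;
        long r;

        t = clock();
        r = sum_rows();
        printf("row order:    %ld ticks (sum %ld)\n", (long)(clock() - t), r);

        t = clock();
        r = sum_cols();
        printf("column order: %ld ticks (sum %ld)\n", (long)(clock() - t), r);

        return 0;
    }

    Same work, same data; the only difference is whether the cache gets a chance to help. That gap is what the cacheless Celeron was drowning in.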



    Justin