Faster G4 - MOTO 7470


Comments

  • Reply 102 of 147
    razzfazz Posts: 728member
    Just looked it up, and c't magazine claims that, given randomly distributed 64 byte memory accesses (i.e. page misses only), PC133 gets 569MB/s, whereas DDR266 gets 776MB/s. If we increase the clock speed on PC133 while leaving everything else unchanged (measured in clock cycles), this would give around 710MB/s. While this is still slower than DDR266, it's not a huge difference. (Remember that this is pretty much worst-case for DDR, though.)
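    As a sanity check, the 710MB/s figure is just the measured PC133 throughput scaled proportionally to a 166MHz clock (valid only under the stated assumption that all timings stay fixed in clock cycles):

```python
# Scale c't's measured PC133 random-access throughput from 133 MHz
# to 166 MHz, holding all timings constant in clock cycles.
pc133_random = 569    # MB/s, measured (random 64-byte accesses)
ddr266_random = 776   # MB/s, measured

pc166_estimate = pc133_random * 166 / 133
print(round(pc166_estimate))   # ~710 MB/s, matching the figure above
```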



    Bye,

    RazzFazz
  • Reply 103 of 147
    rickag Posts: 1,626member
    RazzFazz



    Thank you for the response. I think I understand what you're saying, but I guess what I was trying to get to was-



    If the PCI cards are communicating at 133MHz (or maybe 66MHz?) with the controller, even if the controller is communicating with the RAM and CPU at 266MHz (133MHz DDR), isn't it still possible the CPU would be waiting on information from the PCI card or whatever?



    I understand that the CPU is juggling more than just the information from the PCI bus, but could it still be possible?



    Also, I understand the cache bus and MPX bus are separate, but it exists; therefore, hasn't Motorola already designed a DDR controller of sorts? And if Motorola were to decide to implement a special bus between the CPU, controller, and RAM, independent of the MPX bus, wouldn't this be kind of a blueprint?



    CPU -- DDR controller -- Ram

    |

    MPX

    Controller

    |

    PCI bus etc.



    OR



    CPU

    | DDR to CPU

    Mystery controller -- DDR to -- RAM

    |

    MPX controller

    |

    PCI , etc.



    In each case information is being fed into the controller not at DDR rates, but the controller communicates between the CPU and RAM using DDR?



    It just seems to me that if Motorola already has a controller, albeit a specialized controller for cache, that operates using a DDR bus, they should be able to implement this between the CPU and RAM. They should have plenty of experience by now, wouldn't they?
  • Reply 104 of 147
    junkyard dawg Posts: 2,801member
    [quote] (moto) should have plenty of experience by now wouldn't they??

    <hr></blockquote>



    Yeah, experience at fu[king up!
  • Reply 105 of 147
    razzfazz Posts: 728member
    [quote]Originally posted by rickag:

    <strong>RazzFazz



    Thank you for the response. I think I understand what you're saying, but I guess what I was trying to get to was-



    If the PCI cards are communicating at 133MHz (or maybe 66MHz?) with the controller, even if the controller is communicating with the RAM and CPU at 266MHz (133MHz DDR), isn't it still possible the CPU would be waiting on information from the PCI card or whatever?

    </strong><hr></blockquote>



    Certainly. In fact, the CPU always has to sit around waiting once it accesses anything other than a register. The farther away from the core you get, the longer those periods of waiting become.





    [quote]<strong>Also, I understand the cache bus and MPX bus are separate, but it exists; therefore, hasn't Motorola already designed a DDR controller of sorts? And if Motorola were to decide to implement a special bus between the CPU, controller, and RAM, independent of the MPX bus, wouldn't this be kind of a blueprint?



    CPU -- DDR controller -- Ram

    |

    MPX

    Controller

    |

    PCI bus etc.

    </strong><hr></blockquote>



    Well, what they are planning to do in the long run is moving the RAM controller into the CPU core, and replacing MPX with RapidIO. This would look very similar to what you constructed above.





    [quote]<strong>It just seems to me that if Motorola already has a controller, albeit a specialized controller for cache, that operates using a DDR bus, they should be able to implement this between the CPU and RAM. They should have plenty of experience by now, wouldn't they?</strong><hr></blockquote>



    Well, for one, the interface to L3 is a lot simpler than the interface to the northbridge (aka the MPX bus), because it is much more limited in scope. Also, I don't think the problem is whether Motorola can design a DDR-capable controller, but rather whether it can easily be integrated into the G4 and existing companion products - if I remember correctly, most northbridges in the embedded space still haven't even adopted the MPX bus (they still use the old 60x bus instead), Motorola's own included.



    Bye,

    RazzFazz
  • Reply 106 of 147
    programmer Posts: 3,503member
    [quote]Originally posted by RazzFazz:

    <strong>Just looked it up, and c't magazine claims that, given randomly distributed 64 byte memory accesses (i.e. page misses only), PC133 gets 569MB/s, whereas DDR266 gets 776MB/s. If we increase the clock speed on PC133 while leaving everything else unchanged (measured in clock cycles), this would give around 710MB/s. While this is still slower than DDR266, it's not a huge difference. (Remember that this is pretty much worst-case for DDR, though.)

    </strong><hr></blockquote>



    This isn't really the case we care about the most, however... signal-processing-style algorithms usually try to grind through memory in sequential order, and for this the AltiVec unit has streaming cache control instructions. It is also where Rambus does well. In this case Apple's SDRAM controller can get better numbers than you list. With DDR266 it should be able to get >1 GB/sec (the problem being that the basic cache unit of 32 bytes doesn't get larger, so there is essentially twice as much overhead per data transfer clock cycle). If the current overhead is ~20% (~800 MB/sec, for the sake of argument), then doubling the overhead to ~40% on 2100 MB/sec results in about 1250 MB/sec. Going to 166 MHz will increase the 800 to ~1000 MB/sec, so DDR wins by >25% over a 166 MHz bus. Of course DDR333 should get it to ~1575 MB/sec, gaining the benefit of both higher clock rate and DDR.
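    That arithmetic can be sketched out explicitly; the overhead percentages are Programmer's illustrative assumptions, and a 64-bit (8-byte) data path is assumed for the peak figures:

```python
# Peak bandwidth of an 8-byte-wide memory bus, in MB/s.
def peak_mb_s(clock_mhz, ddr=False):
    return clock_mhz * 8 * (2 if ddr else 1)

# Illustrative efficiency figures from the post: ~20% overhead for SDR,
# ~40% for DDR (32-byte cache lines mean roughly twice the overhead
# per data-transfer clock cycle at double the transfer rate).
sdr133 = peak_mb_s(133) * 0.80            # ~850 MB/s   ("~800")
ddr266 = peak_mb_s(133, ddr=True) * 0.60  # ~1277 MB/s  ("~1250")
sdr166 = peak_mb_s(166) * 0.80            # ~1062 MB/s  ("~1000")
ddr333 = peak_mb_s(166, ddr=True) * 0.60  # ~1594 MB/s  ("~1575")

# DDR266 vs a 166 MHz SDR bus: ~1.2x with these exact figures
# (~1.25x with the rounded numbers quoted above).
print(ddr266 / sdr166)
```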
  • Reply 107 of 147
    razzfazz Posts: 728member
    [quote]Originally posted by Programmer:

    <strong>

    This isn't really the case we care about the most, however... signal processing style algorithms usually try to grind through memory in sequential order, and for this the AltiVec unit has streaming cache control instructions.

    ...

    In this case Apple's SDRAM controller can get better numbers than you list.

    </strong><hr></blockquote>



    Yeah, as I said, the example I gave was pretty much worst-case, especially for DDR.

    But just as the worst case won't usually happen in real life, your ideal case won't either, since any ideal DSP-style process will get preempted regularly in OS X, and the other process(es) will access different regions of physical memory. Also, DSTs have to be restarted after each preemption, IIRC.



    Bye,

    RazzFazz



    [ 05-27-2002: Message edited by: RazzFazz ]
  • Reply 108 of 147
    telomar Posts: 1,804member
    [quote]Originally posted by AirSluf:

    <strong>



    You have it exactly backwards. Andy Moore made his original observations of doubling transistor counts on 24-month intervals. His engineers at Intel and competitors at DEC broke that rate and went at 18 months or better for nearly 2 decades, resulting in most folks re-basing the interval tighter to 18 months, not loosening it.</strong><hr></blockquote>



    Umm, no. For a start, the man's name was Gordon Moore. As it happens, even Intel hasn't followed it, and for two decades (~1978 - 1998) fell behind expectations. It is only since the PII was released that they have started to accelerate again beyond what Moore's law would predict over that time period, which ties into something similar that I won't deal with here.



    Let me give you a brief rundown on the theory, though. The basic idea behind the law is that if you plotted the function of cost per component vs. components per integrated circuit, there was an optimum minimum value (i.e. a certain density produced a minimum cost per component compared to other densities).



    The presumption was that every year, the minimum cost per component occurred at a twofold increase in the number of components per circuit. As a result, it could be inferred that each year the transistor density would double due to the new minimum-cost positioning.



    And if you read Moore's paper on the subject, you will find he then proceeds to present a lovely diagram of log base 2 of transistor count vs. the year. This clearly anticipates growth from 2^6 to 2^16 transistors over a 10-year period, 1965 - 1975, or a doubling every year. It also clearly shows an increase from 2^3 to 2^6 transistors from 1962 - 1965.
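    A quick check of the doubling rate implied by that plot (2^6 components in 1965 growing to 2^16 in 1975):

```python
import math

# Moore's 1965 extrapolation: 2^6 components in 1965, 2^16 in 1975.
growth = 2**16 / 2**6                          # 1024x over the decade
doublings = math.log2(growth)                  # 10 doublings
doubling_period = (1975 - 1965) / doublings
print(doubling_period)                         # -> 1.0, i.e. one doubling per year
```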



    It's worth mentioning, for the sake of completeness, that he does comment on his doubts that it will hold outside the short term (10 years), which paves the way for its revision. However, Moore's law hasn't been followed quite as stringently as people imagine, and certainly not well enough to truly be considered a law.



    Had Moore's law actually been followed, we would see around a 10 times greater density than we do (roughly 4.5 - 5 years of development).



    Edit: Corrected PIII to PII and made a few other addendums and corrections.



    P.S. If you're wondering the paper was written in 1965 for a journal called "Electronics".



    [ 05-28-2002: Message edited by: Telomar ]
  • Reply 109 of 147
    programmer Posts: 3,503member
    [quote]Originally posted by RazzFazz:

    <strong>Yeah, as I said, the example I gave was pretty much worst-case, especially for DDR.

    But as much as the worst-case won't usually happen in real life, your ideal case won't either, since any ideal DSP-style process will get preempted regularly in OS X, and the other process(es) will access different regions of physical memory. Also, DSTs have to be restarted after each preemption IIRC.

    </strong><hr></blockquote>



    It's pretty normal to have the DSTs restarted every few loop iterations anyhow -- it costs very little and it has the effect of restarting the streams after an interrupt or preemption. Also, I didn't quote an ideal case, I quoted what is measured in practice... I've seen silly cases that seem to indicate that Apple is sometimes getting >95% efficiency, but I don't believe them.



    Anyhow, my new wild-assed theory is that the 7470 is Apple's custom G4 with a HyperTransport link to the memory controller. :cool:
  • Reply 110 of 147
    xype Posts: 672member
    [quote]Originally posted by Programmer:

    <strong>Anyhow, my new wild-assed theory is that the 7470 is Apple's custom G4 with a HyperTransport link to the memory controller. :cool: </strong><hr></blockquote>



    Would that actually be hard to do? Also, I guess a HyperTransport "link" would enable the G4 to stay the same even if a new memory type is added to the architecture, right? If that's the case, it would be better than having a PC2700 DDR controller on-chip, I guess.



    As for wild-*** theories - I think Apple is about to release a Generation 5 chip, and it would be amazing if it had 32/64-bit abilities like the Hammer does, but even a high-bandwidth 32-bit chip @ 1.4GHz would be extremely nice. Or at least something that renders all that global-illumination-caustics funkiness fast enough (i.e., at most 20% slower than the x86 architecture).
  • Reply 111 of 147
    powerdoc Posts: 8,123member
    [quote]Originally posted by Programmer:

    <strong>



    Anyhow, my new wild-assed theory is that the 7470 is Apple's custom G4 with a HyperTransport link to the memory controller. :cool: </strong><hr></blockquote>



    Does that mean: 7470 ---> (via HyperTransport) ---> memory controller ---> DDR RAM?



    Do you think the new chipset of the rackmount will only be useful for racks (a very small market indeed), or do you expect it to be present in the next revision of the PowerMac?



    Is this chipset able to talk to the G4 via HyperTransport, or is it limited to the MPX protocol?
  • Reply 112 of 147
    razzfazz Posts: 728member
    [quote]Originally posted by Programmer:

    <strong>

    Also I didn't quote an ideal case, I quoted what is measured in practice... I've seen silly cases that seem to indicate that Apple is sometimes getting >95% efficiency, but I don't believe them

    </strong><hr></blockquote>



    With "ideal" I was referring to "walks through memory in a straight line" (i.e. an ideal, uninterrupted process doing branchless DSP-style calculations ad infinitum).





    [quote]<strong>

    Anyhow, my new wild-assed theory is that the 7470 is Apple's custom G4 with a HyperTransport link to the memory controller. :cool: </strong><hr></blockquote>



    Sounds a little wild indeed, since no one seems to be using (or planning to use) RapidIO or HyperTransport without also using on-chip memory controllers.



    Sure, might be easier to use the same CPU with different memory architectures this way, but I'd imagine that chances are pretty high that, in the near future, CPU revisions (i.e. chances to modify on-chip-controllers) will be more frequent than moves to a different memory architecture ("different" being more than just "clocked higher") anyway.



    Bye,

    RazzFazz
  • Reply 113 of 147
    programmer Posts: 3,503member
    [quote]Originally posted by RazzFazz:

    <strong>With "ideal" I was referring to "walks through memory in a straight line" (i.e. an ideal, uninterrupted process doing branchless DSP-style calculations ad infinitum).

    </strong><hr></blockquote>



    Ah yes, my mistake. It actually doesn't take much to get pretty close to the ideal case. Blasting over one multi-megabyte image in Photoshop, for example. Or compressing 10 seconds of MPEG video.



    [quote]<strong>

    Sounds a little wild indeed, since no one seems to be using (or planning to use) RapidIO or HyperTransport without also using on-chip memory controllers.



    Sure, might be easier to use the same CPU with different memory architectures this way, but I'd imagine that chances are pretty high that, in the near future, CPU revisions (i.e. chances to modify on-chip-controllers) will be more frequent than moves to a different memory architecture ("different" being more than just "clocked higher") anyway.

    </strong><hr></blockquote>



    I freely admit that this is unlikely to come to pass... but it occurred to me while replying to another message that Apple really doesn't want to be stuck with Moto's decisions and timelines, and they also don't want the expense of designing entire processors themselves. They already do the chipsets, so it seems like a logical step to create the biggest, baddest pipe from processor to chipset and then make the chipset better.



    And as for nobody else doing this... well, that is because nobody else is in Apple's situation. Intel designs chips & chipsets, so they have a point-to-point FSB. AMD is mainly about processors, so they are moving the chipset onto the chip. Motorola is in the embedded space and has a system bus (MPX) that everybody talks to; they are moving toward a new system bus (RapidIO) that everybody talks to, plus moving the most bandwidth-intensive part of that onto the chip (the memory controller) -- taking over more of the system design. IBM builds their own chips, chipsets and systems. There isn't really anybody else doing high-bandwidth stuff. So that leaves poor Apple as the only one doing just the chipset and system, based around a processor that somebody else is aiming at a different market.



    HyperTransport, by the way, is being used in a few places without on-chip memory controllers. It is just a fast chip-to-chip connection, and nVidia is using it as such. In this way it is quite different from RapidIO's intended use -- as a board-level packet network.



    I'm not sure why you're thinking that memory changes will be happening at a slower pace in the future. If you look at the graphics guys, they are on the leading edge of high-speed memories and are already pushing well upwards of 600 MHz DDR. DDR-II is on the way, and it will no doubt go through a few revisions. Apple likely wants to reduce its dependence on any outside supplier, so becoming more reliant on Motorola just doesn't seem like something they would be keen to do. Perhaps you are referring to the modularity of the 8540?



    [ 05-28-2002: Message edited by: Programmer ]
  • Reply 114 of 147
    razzfazz Posts: 728member
    [quote]Originally posted by Programmer:

    <strong>

    HyperTransport, by the way, is being used in a few places without on-chip memory controllers. It is just a fast chip-to-chip connection, and nVidia is using it as such.

    </strong><hr></blockquote>



    The difference is that the nForce (I guess that's what you're referring to here?) uses HT only to connect its north- and southbridge to each other, and especially does not use it to connect the CPU to the northbridge (which is what you were proposing above, if I didn't get you wrong).





    [quote]<strong>I'm not sure why you're thinking that memory changes will be happening at a slower pace in the future.</strong><hr></blockquote>



    I didn't say that (well, didn't mean to, at least). Rather, I said that substantial changes in memory architecture (DDR-II being more or less the only one that seems to be relatively close) will happen less frequently than processor revisions, and that an on-chip memory controller is likely to be designed into the chip in a way that would provide flexibility for modifications with processor revisions.



    [quote]<strong>If you look at the graphics guys, they are on the leading edge of high speed memories and are already pushing well upwards of 600 MHz DDR. DDR-II is on the way, and it will no doubt go through a few revisions.</strong><hr></blockquote>



    Well, memory on graphics cards is sort of a different beast because it's soldered on. This allows for high speeds, but also kills upgradeability, making it rather uninteresting as system RAM.

    So if you look at the development of main RAM in the past decade, you'll see only two major architectural changes (asynchronous DRAM -> SDRAM and SDR SDRAM -> DDR SDRAM); everything else was just about ramping up clock speeds (and consequently didn't require complete redesigns of memory controllers). I don't see that changing very much in the future. As you said yourself, there will be DDR-2 (which, from the controller point of view, is probably less of a change from DDR-1 than the latter was from SDR), and that's it for the near future (unless Rambus' RDRAM or Kentron's QBM suddenly becomes more popular than expected).



    As such, I don't think an on-chip memory controller is an unreasonable option for Apple, especially given



    a) an on-chip controller has some inherent performance advantages



    b) Motorola actually seem to aim at providing an up-to-date memory interface, at least for the 8540 and



    c) it would free Apple from the need to continuously invest significant resources into developing state-of-the-art memory controllers themselves (and I'm not sure a hypothetical on-chip memory controller would really get updated less frequently than the one in UniNorth).



    Bye,

    RazzFazz



    [ 05-28-2002: Message edited by: RazzFazz ]
  • Reply 115 of 147
    gumby5647 Posts: 241member
    ok...let me see if i got this right.



    lets say moto could get it up to 166mhz FSB by july.



    could we possibly see Towers with a 166Mhz FSB and a 266Mhz DDR BSB?



    basically Xserve but with a 166mhz FSB?
  • Reply 116 of 147
    razzfazz Posts: 728member
    [quote]Originally posted by gumby5647:

    <strong>ok...let me see if i got this right.

    lets say moto could get it up to 166mhz FSB by july.

    could we possibly see Towers with a 166Mhz FSB and a 266Mhz DDR BSB?

    basically Xserve but with a 166mhz FSB?</strong><hr></blockquote>



    First of all, "BSB" (back side bus?) is not the correct term here. But to answer your question, yes, having asynchronous front side and memory buses should be possible, at least it's possible (and pretty common in fact) in the x86 world.



    Bye,

    RazzFazz
  • Reply 117 of 147
    brendon Posts: 642member
    [quote]Originally posted by RazzFazz:

    <strong>

    As such, I don't think an on-chip memory controller is an unreasonable option for Apple, especially given



    a) an on-chip controller has some inherent performance advantages



    b) Motorola actually seem to aim at providing an up-to-date memory interface, at least for the 8540 and



    c) it would free Apple from the need to continuously invest significant resources into developing state-of-the-art memory controllers themselves (and I'm not sure a hypothetical on-chip memory controller would really get updated less frequently than the one in UniNorth).



    Bye,

    RazzFazz



    [ 05-28-2002: Message edited by: RazzFazz ]</strong><hr></blockquote>



    I guess I'm with Programmer on this one. I would rather see Apple spend its transistor budget on better FPU performance, including another FPU unit. It seems that Apple would like to spend this budget wisely to produce very fast and COOL chips. The cooler the chips, the more design options Apple has. As far as spending the resources to develop their own memory controller goes, the Xserve is the example. Apple has already done this.



    Ty
  • Reply 118 of 147
    razzfazz Posts: 728member
    [quote]Originally posted by Brendon:

    <strong>

    I guess I'm with Programmer on this one. I would rather see Apple spend its transistor budget on better FPU performance, including another FPU unit.

    </strong><hr></blockquote>



    Hm, you lost me somehow - what does Apple have to do with the FPU and its transistor budget at all? That's Motorola's affair.



    (Unless you were thinking of Apple taking the whole PPC design into their own hands - but that wasn't quite what Programmer suggested - "and they also don't want the expense of designing entire processors for themselves")





    [quote]<strong>As far as spending the resources to develop their own memory controller Xserve is the example. Apple already has done this.

    </strong><hr></blockquote>



    Yes, quite simply because they had no alternative.



    If the memory controller was on-chip instead, they could just use that and spend their resources on other areas (PCI-X, FireWire2, Serial ATA, whatever) instead of designing yet another memory controller (or adapting the current one to new specifications).



    Bye,

    RazzFazz
  • Reply 119 of 147
    brendon Posts: 642member
    [quote]Originally posted by RazzFazz:

    <strong>



    Hm, lost me somehow - what do Apple have to do with the FPU and its transistor budget at all? That's Motorola's affair.



    (Unless you were thinking of Apple taking the whole PPC design into their own hands - but that wasn't quite what Programmer suggested - "and they also don't want the expense of designing entire processors for themselves")



    *Snip*



    Bye,

    RazzFazz</strong><hr></blockquote>



    Um... yes, I was not clear, sorry about that. In the 8500 series chips, the Book E design is modular: Apple can choose what execution units they want on the chip without Moto having to do a total redesign. With this in mind, as the transistor count goes up, so does the heat. I think that Apple would be better off keeping the memory controller off-chip and trading that functionality for another FPU. If MaxBus is not going DDR, then another option would be RapidIO, but if Apple could have it be HT, that would be better, in that implementing HT on the CPU for I/O would also allow the CPU access to the system chip, which would give access to the whole system. In this design the system chip would act as a traffic cop, allowing all system resources access to each other.
  • Reply 120 of 147
    programmer Posts: 3,503member
    You're probably right RazzFazz, but there are some sticky issues with the on-chip memory controller. The biggest one is how to manage a memory pool per-processor, both from an OS point of view and from a user-upgrade point of view. Also a factor is that RapidIO is brand new and there are no devices (or very few at least) which speak that "language". Apple builds all of its devices into a single chip anyhow, so they are going to gain very little from a RapidIO-based system... and they could gain more from the faster and simpler HyperTransport. Connecting the north/south bridge, different communications chips, etc shouldn't be significantly different than connecting a processor to the motherboard chipset. What my suggestion boils down to is replacing MPX with HyperTransport. In a single processor system this is very straightforward. In an MP system the chipset would need multiple HyperTransport ports, and would have to make each CPU's transactions somehow visible to the other CPUs... a little ugly but doable.



    Again, you are probably right and Apple will just use what Motorola hands them. Given Moto's track record (and emphasis on embedded designs), I wouldn't mind seeing Apple have more independence in terms of how they implement their systems. I suspect Apple wouldn't mind having that independence either -- most companies don't like having their bread-and-butter depend so heavily on another company's R&D, especially when they are so obviously behind.