
PowerBook G5 - Page 6  

post #201 of 376
Quote:
Originally posted by fred_lj
It seems to be all theory for the moment... Jobs' mention of the close of 2004 as a date for the PB is really becoming quite probable, whether or not we see a scaled-down 970 derivative or this cell architecture.

It's not theory. It's implemented in the 90nm version of the PPC970, which is sampling now. Check the advance brochure for IBM's presentation on the 90nm PPC970, which will be their official announcement of the chip.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #202 of 376
I normally stay out of these conversations, but this one is just too much. I am with Tomb on this one - this thread makes for an interesting intellectual exercise in the possibility of parallel architecture in future machines, but anyone taking Nr9 as anything but a guy taking you for a ride shouldn't talk to strangers, is all I have to say.
Anyone for pie?
post #203 of 376
Quote:
Originally posted by Nr9
Cell is basically cluster technology

there is nothing new, except very high bandwidth, very low power core, lots of cores.

Really? You don't think they might have a proprietary messaging protocol? Or proprietary interchip busses? Or custom cores?

And you can prove this how?

You've repeatedly insisted that the STI Cell, which, as described, is an implementation of a cellular computing model, is the same thing as a standard cluster. (You've also conflated the implementation with the computing model, but that's another story.)

If Cell is just another cluster and there's nothing new, why aren't they done yet? How hard could it be? Why take several years and billions of dollars in development? How hard is it to just add a high bandwidth bus to some 440 cores and call it a day? Hell, they could just rename the Power5 and call it a day, since it meets your criteria already.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #204 of 376
Quote:
Originally posted by LowB-ing
I don't know too much about this stuff, but this thread is a great read, whether the rumor's true or not.
One thought struck me though. The original source might very well be right about the tech stuff and simultaneously wrong about the PowerBook "detail". What if there's a G5 PB in September and some kind of "future generation reference platform" released to devs at WWDC?


I was thinking along those exact same lines. I get the feeling that (if he's not completely pulling our legs) Nr9 does have access to some insiders, but it's not his own field and so is getting some details mixed up (or even is being fed disinformation by these "insiders" as a joke).

What Nr9 is describing would almost make sense for a gaming console. The PS3, XBox2, and GameCube successor are all apparently going with IBM, with the PS3 explicitly using this Cell technology (according to reports, anyway). An OSX (or Linux)-derived OS for these PPC-based systems might be feasible. Since games have a relatively short shelf-life as it is, it wouldn't be that big of a deal to require future versions to follow a new coding paradigm. Low power consumption is important for these units as they are mainly made out of plastic and have little room (or budget) for fancy cooling systems.

So I'm leaning toward this being more closely related to the next PlayStation than the next PowerBook. That's my current take on the issue, anyway.
"Mathematics is the language with which God has written the Universe" - Galileo Galilei
"Mathematics is the language with which God has written the Universe" - Galileo Galilei
post #205 of 376
Thread Starter 
Quote:
Originally posted by Tomb of the Unknown
Really? You don't think they might have a proprietary messaging protocol? Or proprietary interchip busses? Or custom cores?

And you can prove this how?

You've repeatedly insisted that the STI Cell, which, as described, is an implementation of a cellular computing model, is the same thing as a standard cluster. (You've also conflated the implementation with the computing model, but that's another story.)

If Cell is just another cluster and there's nothing new, why aren't they done yet? How hard could it be? Why take several years and billions of dollars in development? How hard is it to just add a high bandwidth bus to some 440 cores and call it a day? Hell, they could just rename the Power5 and call it a day, since it meets your criteria already.

Does a new messaging protocol and new busses make it not a cluster?

I never conflated implementation with computing model; I think you have. Cell is simply another implementation of the computing model, with new messaging protocols and busses. Do you know why Cell is called Cell? It's short for cellular computing.

The point of Cell is a specific implementation of the cellular computing paradigm that uses large numbers of extremely low power cores with extremely high bandwidth capabilities. Do you have evidence it is anything else?
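To make the programming model concrete, here is a toy C sketch of MPI-style message passing, using the standard MPI API (my own illustration of the style, not anything from the STI project):

    #include <mpi.h>
    #include <stdio.h>

    /* Run with at least two ranks, e.g. mpirun -np 4 ./a.out */
    int main(int argc, char *argv[])
    {
        int rank, size, token = 0;

        MPI_Init(&argc, &argv);               /* every core starts its own copy */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which core am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many cores in the "cluster"? */

        if (rank == 0) {
            token = 42;
            /* explicit send: no shared memory or cache coherency is assumed */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("core %d of %d received %d\n", rank, size, token);
        }

        MPI_Finalize();
        return 0;
    }

Whether the ranks are nodes on a network or cores on one die is invisible to the program; only the latency and bandwidth change.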
post #206 of 376
Thread Starter 
Quote:
Originally posted by TJM
I was thinking along those exact same lines. I get the feeling that (if he's not completely pulling our legs) Nr9 does have access to some insiders, but it's not his own field and so is getting some details mixed up (or even is being fed disinformation by these "insiders" as a joke).

What Nr9 is describing would almost make sense for a gaming console. The PS3, XBox2, and GameCube successor are all apparently going with IBM, with the PS3 explicitly using this Cell technology (according to reports, anyway). An OSX (or Linux)-derived OS for these PPC-based systems might be feasible. Since games have a relatively short shelf-life as it is, it wouldn't be that big of a deal to require future versions to follow a new coding paradigm. Low power consumption is important for these units as they are mainly made out of plastic and have little room (or budget) for fancy cooling systems.

So I'm leaning toward this being more closely related to the next PlayStation than the next PowerBook. That's my current take on the issue, anyway.

Even though I'm not an expert in this field, at least I know more than Tomb. At least I don't believe the STI Cell to be something other than on-chip clustering technology for a cellular implementation. At least I don't make comments like "the 440 is not cache coherent so it's impossible" when we were clearly talking about MPI in the first place, or "if it's 4 processors it must be SMP", or "this requires a new instruction set" when new libraries are sufficient. Tomb is just talking out of his ass; don't let him get you carried away.
post #207 of 376
Bear in mind I have no technical knowledge.

Just reading this thread, it would seem to me that using four 440 cores, then developing all the associated communication, cache and L2/L3 memory to provide any real level of performance, would end up with a design that has more transistors in total than a G5 and generates more heat than a G5. Then on top of this it would run most existing software at abysmally slow speeds. I mean, come on, 700MHz for single threads and apps that don't use multi-processor designs.

Maybe sometime in the distant future, when two 7-stage cores can instantly morph into one long 14-stage core and bump the clock speed up from 700MHz to 1.2GHz, or whatever, then morph back into two cores again for multi-threaded apps.
just waiting to be included in one of Apple's target markets.
Don't get me wrong, I like the flat panel iMac, actually own an iMac, and I like the Mac mini, but...........
post #208 of 376
Thread Starter 
Quote:
Originally posted by rickag
Bear in mind I have no technical knowledge.

Just reading this thread, it would seem to me that using four 440 cores, then developing all the associated communication, cache and L2/L3 memory to provide any real level of performance, would end up with a design that has more transistors in total than a G5 and generates more heat than a G5. Then on top of this it would run most existing software at abysmally slow speeds. I mean, come on, 700MHz for single threads and apps that don't use multi-processor designs.

Maybe sometime in the distant future, when two 7-stage cores can instantly morph into one long 14-stage core and bump the clock speed up from 700MHz to 1.2GHz, or whatever, then morph back into two cores again for multi-threaded apps.

Incorrect. Power dissipation depends on how many transistors are switching at any given moment, not on the total transistor count.
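The usual first-order formula for dynamic (switching) power is

    P ~ a * C * V^2 * f

where a is the activity factor (the fraction of transistors actually switching), C the switched capacitance, V the supply voltage and f the clock frequency. A big die full of mostly idle transistors can dissipate less than a smaller, hotter-clocked one, which is why transistor count alone tells you little.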
post #209 of 376
This is a ploy, I think, to find a loose-lipped guy.
_ _____________________ _
1ghz Powerbook SuperDrive yippeeee!!!!
post #210 of 376
Quote:
Originally posted by Nr9
Even though I'm not an expert in this field, at least I know more than Tomb. At least I don't believe the STI Cell to be something other than on-chip clustering technology. At least I don't make comments like "the 440 is not cache coherent so it's impossible" when we were clearly talking about MPI in the first place, or "if it's 4 processors it must be SMP", or "this requires a new instruction set" when new libraries are sufficient. Tomb is just talking out of his ass; don't let him get you carried away.

Nr9: I don't mean to sound like I'm putting you down or blowing you off. I'm no expert on any of this, so I'm simply standing on the sidelines trying to figure out what's up. We've had others in the past come by and assure us they had "inside" sources who proceeded to post absolute garbage that they just made up. So, I'm once bitten and twice shy in these matters. You and Tomb both appear to know what you're talking about, so I'm left in a quandary.

Reputation counts for a lot on these boards. Since you're new, I don't have that to draw on. If what you're posting pans out, your status will rise considerably in my eyes. If it turns out to be bogus, I'll never believe another thing you post. So, for your own sake, I hope that all you've been posting is absolutely legit. It's made for an interesting thread, in any event!
"Mathematics is the language with which God has written the Universe" - Galileo Galilei
"Mathematics is the language with which God has written the Universe" - Galileo Galilei
post #211 of 376
Quote:
Originally posted by fred_lj
Just to be explicit about what the brochure says:
....multi-gigahertz...........

I like the sound of that word. Hopefully, they mean much more than just 2 or 3.

Just a few comments on the discussion here. I'm not sure I buy into the idea that the PowerBook G5 will be this 440-based MCM. That said, I do find this discussion interesting and think this is a direction that Apple may eventually go. I don't think Apple is really looking at how to make Word or PowerPoint faster. In the case of Word, the industry reached the point that it was fast enough a couple of years ago. Apple should be looking at how to make things like Final Cut Pro and Shake faster. An approach that bears some similarity to what Nr9 describes may be what is needed.
post #212 of 376
Quote:
Originally posted by Nr9
Incorrect. The quad processors should have a total of around 10 million transistors or less

I find that hard to believe, but am in no position to argue effectively, due to a lack of knowledge. But the designs posted in this thread include a lot of L2 or L3 cache, some for each core, and I would have thought these alone would exceed 10 million transistors.

Oh well, we'll see some time in the future. Here's hoping you do have inside info and we'll all be celebrating our low-watt, energy-efficient laptops running BLAST genome projects while watching the latest Matrix DVD.
just waiting to be included in one of Apple's target markets.
Don't get me wrong, I like the flat panel iMac, actually own an iMac, and I like the Mac mini, but...........
post #213 of 376
Quote:
Originally posted by rickag
........while watching the latest Matrix DVD.

Ah jeez, you're kidding! I thought that 'Reloaded' marked both the low point and the last of that crappy series......


IMHO
post #214 of 376
Quote:
Originally posted by rickag
I find that hard to believe, but am in no position to argue effectively, due to a lack of knowledge. But the designs posted in this thread include a lot of L2 or L3 cache, some for each core, and I would have thought these alone would exceed 10 million transistors.

If he's only counting the cores, that's possible: The 440 core is only 4 square millimeters in size. It's when you start bolting stuff on that the number becomes problematic.

Speaking of that, a little extra browsing around reveals that CoreConnect, the means available to attach extra units to the core (which Motorola also uses), provides 6.4GB/s of bandwidth (at up to 550MHz). VMX running at 700MHz could chew through about 11GB/s, so it would be starved for data on die if attached to that interface. Not even the prefetch-into-cache trick borrowed from the G4 would get around that, because the bottleneck is between the VMX unit and any caches.
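Back of the envelope, assuming the VMX unit wants one 16-byte vector operand per clock:

    demand: 16 bytes/cycle * 700MHz = 11.2GB/s
    supply: CoreConnect at ~6.4GB/s

So even in the best case the unit would spend roughly 40% of its cycles waiting for data.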
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
post #215 of 376
Wow! It's a discussion on cells. I have been wondering where cell technology would lead. It's my impression that Sony is taking this approach for PS/3, while Microsoft is going with the Power5 derivative. So I have been wondering what is in Apple's future? It looks like Steve is getting his "options."

What I find interesting are the comments about using message passing, and the statement from Nr9 saying, ". . . each core runs a extremely small version of mac os x that only has basic MPI function." I don't know enough about computer science to know whether this might mean each core runs a separate Mach kernel? If so, it looks like there could be an advantage to Darwin, which uses a micro kernel. If this is true, it would indeed be a cluster of computers that are networking and under the control of a master computer, the "fifth" chip that was referred to. This master chip would run the full OS X operating system.

I think we will see both cell and PowerX technologies going forward in parallel for a while.
post #216 of 376
Quote:
Originally posted by snoopy
I don't know enough about computer science to know whether this might mean each core runs a separate Mach kernel? If so, it looks like there could be an advantage to Darwin, which uses a micro kernel.

But here's the problem: Darwin doesn't use a microkernel. Mach is in there, but it's "fused" to the rest of the kernel, meaning that it doesn't run in its own space and communicate with everything above it via message passing. This fusing trades modularity, reliability and security for brute performance: Generally, a true Mach microkernel imposes about a 10%-20% performance penalty over a monolithic kernel. So Darwin is a sort-of-message-passing monolithic kernel, which is unconventional but well suited to the task at hand.

We're a long long way from embedded kernels like L4 that can really do this: L4 is 32K in size (only eight times the size of the original UNIX!) so it can stuff itself into a CPU cache and run from there. There are Mach implementations that run in this space, such as TMach, but Apple doesn't use them, and changing Darwin to accommodate one separate Mach kernel - let alone four, or eight - would be a serious enterprise with an accompanying performance penalty. (Ironically, Mac OS 8.5 - 9 ran on a nanokernel, which is a really small, minimalist microkernel.)

That's not to say that Apple isn't considering a major revision to OS X that would separate Mach back out into its own thing with its own address space. It's a better design generally (you ain't seen uptime yet), and the power and bandwidth of forthcoming hardware platforms might make the performance hit forgivable. However, this is not a project to take lightly, at all, and that's with only one microkernel running. The sort of system Nr9 is describing would keep Apple's engineers up nights for a good long time. (Avie might relish the challenge, though.)
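For anyone curious what "communicating via message passing" actually looks like at the Mach layer, here is a rough user-space C sketch for OS X - a message sent to yourself through a port, error checking omitted. This shows the raw primitive only, and is not a claim about how Apple structures anything:

    #include <mach/mach.h>
    #include <stdio.h>

    int main(void)
    {
        mach_port_t port;
        struct { mach_msg_header_t header; int payload; } snd;
        struct { mach_msg_header_t header; int payload;
                 mach_msg_trailer_t trailer; } rcv;  /* kernel appends a trailer */

        /* A port we can receive on, plus a send right to it. */
        mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);
        mach_port_insert_right(mach_task_self(), port, port,
                               MACH_MSG_TYPE_MAKE_SEND);

        snd.header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0);
        snd.header.msgh_size = sizeof(snd);
        snd.header.msgh_remote_port = port;          /* destination */
        snd.header.msgh_local_port = MACH_PORT_NULL; /* no reply wanted */
        snd.header.msgh_id = 1;
        snd.payload = 42;

        mach_msg(&snd.header, MACH_SEND_MSG, sizeof(snd), 0,
                 MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
        mach_msg(&rcv.header, MACH_RCV_MSG, 0, sizeof(rcv),
                 port, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);

        printf("got payload %d back\n", rcv.payload);
        return 0;
    }

In a true microkernel every service request crosses a boundary like that; in Darwin's fused kernel most of it collapses into plain function calls, which is where the speed comes from.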
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
post #217 of 376
Quote:
Originally posted by Amorph
... It's when you start bolting stuff on that the number becomes problematic....

My view is obviously oversimplified, due to lack of knowledge, but going back to the G4 comparison: the G4 has 1 integer, 1 floating point and 4 AltiVec units (I think?), uses a 7-stage pipeline, and is now at 52 million transistors.

The proposed 440 derivative uses a 7-stage pipeline, tacks on a floating point and AltiVec unit, and the claim is that 2 dual-core 440s, each with AltiVec and added floating point units, will be 10 million transistors. I just can't wrap my head around this at all. Plus the fact that each core will have to have additional systems for communicating between themselves to make any parallelism work efficiently.

While the POWER4 is not an ideal example, it uses a ring topology (if that's the right terminology), has up to four dual-core CPUs, runs at a relatively pokey speed compared to desktops, but uses up to 128MB of L3 cache per module in order to perform its intended job, which I presume to be massive parallelism, multi-tasking and multi-threading on a huge scale. Scaling this philosophy down to a desktop using 10 million transistors seems, well, uh, mm, quite difficult, let alone having to cajole developers into optimizing software for this design.

For the technologically impaired like me, please don't put too much stock in what I say, but I'm still skeptical of Nr9's claims.
just waiting to be included in one of Apple's target markets.
Don't get me wrong, I like the flat panel iMac, actually own an iMac, and I like the Mac mini, but...........
post #218 of 376
Quote:
Originally posted by rickag
My view is obviously oversimplified, due to lack of knowledge, but going back to the G4 comparison: the G4 has 1 integer, 1 floating point and 4 AltiVec units (I think?), uses a 7-stage pipeline, and is now at 52 million transistors.

The proposed 440 derivative uses a 7-stage pipeline, tacks on a floating point and AltiVec unit, and the claim is that 2 dual-core 440s, each with AltiVec and added floating point units, will be 10 million transistors. I just can't wrap my head around this at all.

(n.b.: the G4 has four units that, together, make up AltiVec. It only has one AltiVec engine.)

I can't wrap my head around that number either. That's why I said that 10 million might make sense for the 440 cores (no FP, no AltiVec, no memory controller or L2 cache or message passing logic). The whole shebang, though? No. If memory serves, 4 AltiVec implementations alone, borrowed from the highly efficient design in the 7450 (also a 7-stage CPU), would total 20 million transistors.

Quote:
While the POWER4 is not an ideal example, it uses a ring topology (if that's the right terminology), has up to four dual-core CPUs, runs at a relatively pokey speed compared to desktops, but uses up to 128MB of L3 cache per module in order to perform its intended job, which I presume to be massive parallelism, multi-tasking and multi-threading on a huge scale. Scaling this philosophy down to a desktop using 10 million transistors seems, well, uh, mm, quite difficult, let alone having to cajole developers into optimizing software for this design.

This looks to me like a sound analysis. The trick to the POWER4 implementation is that it's so massively powerful that you don't really have to program for it. It's basically designed to run lots of conventional server applications and push massive amounts of data around for those applications all at once. (And, of course, if you do program for it, you get rewarded handsomely.) Once you scale the design down to something built around 440s, you either have to be content to run applications that are happy executing on a 440, or you have to start targeting the architecture specifically. The ratio of the power of each core to the demands of each application is a lot lower.
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
post #219 of 376
Quote:
Originally posted by Amorph
...I can't wrap my head around that number either. That's why I said that 10 million might make sense for the 440 cores (no FP, no AltiVec, no memory controller or L2 cache or message passing logic)...

Thank you for the responses to my rambling posts. One thing I would like to say: I'm not against this CPU design philosophy, I just believe we're looking much further into the future than next year. OS X is only the beginning, and as more AltiVec and multi-threading tech becomes available and is programmed for, it will feed the monster, and ultimately we'll be there, running our BLAST genome project while watching the latest Matrix DVD (not Revolutions, but sequel #10) on our cool, quiet laptops.
just waiting to be included in one of Apple's target markets.
Don't get me wrong, I like the flat panel iMac, actually own an iMac, and I like the Mac mini, but...........
post #220 of 376
Quote:
Originally posted by Tomb of the Unknown
I'm not saying it can't be done. I'm saying it doesn't make sense to do it. The 440 was expressly designed to be extensible. A good SoC solution needs to be extensible because in the embedded market one size rarely fits all, plus an extensible design will have longer legs as far as product family lifecycle goes. (You don't have to retool for every new fad in communications technology.)


The very fact that the 440 is extensible is what makes the concept plausible. As to whether or not it makes sense, that would depend on the goals of the developers. If you're going the SoC route, the 440 would be either a good base for prototyping or the real hardware. Again, it is a matter of what your goals are.
Quote:

So yes, you can add VMX and 440 FPU2 units to the 440 quite handily.

Well he said two MCMs with two 440s each. Which is nonsensical at the outset, since an MCM describes the Power4 and Power5 packaging and IBM does not use the terminology elsewhere, to my knowledge. But leaving that aside, there is the issue that what an MCM provides is interchip communications busses. Its purpose is to allow the cores to communicate, so separating them into pairs with only two per MCM is counterproductive if you know you'll need four cores. So I assumed that all four would be on one MCM.

The technology that produces an MCM has been around for a long time. It is not a POWER technology. In the past it hasn't been cheap either, often being used only for military and space-based projects. I understood it to be pairs of SoC processors mounted on an MCM. It really doesn't matter, though, because the minute you move a signal off a chip you change things.
Quote:

Well, if you want to create imaginary architectures to make Nr9's scenario more plausible, go right ahead.

Not so much my imagination as information picked off the web indicating that Apple and IBM are working together on a new laptop chip. That info could very well mean a mainstream chip, or it could mean they are pushing laptop technology in a new direction.
Quote:

There would also have to be support in the core for thread locking, etc. (Means more silicon, more heat.)

What thread locking?
Quote:

This is actually a self-contradictory two-part objection. Please rest assured that (as Nr9 points out) a distributed architecture would require an entirely new programmatic model despite Mac OS X's unix foundations. It would mean rewriting the OS from the ground up. You would not be able to use the Mach kernel, for instance.

Nope, I don't believe that Apple would produce a machine that would require a new operating system. They may add to the operating system, and some functionality may be implemented differently on the system side, but user apps should not be impacted.

Look at it this way: PowerMacs have been used for cluster computing for some time. Just because support is added to them to enable cluster computing does not mean that ordinary Mac applications do not run on them. In fact you can download at least a couple of different message passing libraries for UNIX/Linux off the internet.

It's not self-contradictory at all; it just reflects what is available now in the way of message passing technology.
Quote:

As to the second part of your objection, yes an SMP implementation is possible, just not with the 440 core.

I'm not willing to go so far as to say it is impossible. To an extent the cache system can be modified along with the bus interface.
Quote:


Whether he said it or not, it becomes a fundamental requirement, as the only alternative is to run it on a slow core, effectively undoing any expected performance benefit.

You are trying to say that something that is currently impossible is required for this to be successful. This is very convenient, but it is really not a very good argument. Yes, some applications may not take advantage of a threaded environment to the extent of others, but there is no reason to focus on them. Even then, some single-threaded applications will see very good performance due to the rest of the system being threaded.
Quote:

Well, in some applications, the advantages of distributed architectures are overwhelming. But that is not to say that this is true in all cases. Yes, it may be true that there are limits to SMP systems but can you tell me what those limits might be? IBM is heavily invested in the Power5 architecture which is carrying SMP down to the thread level (SMT).

Kinda depends on the application and hardware implementation, doesn't it? What Nr9 has described sounds almost like a cluster of SMP machines; this may or may not be the case. What I do know is that at some point it does pay to look at arrangements other than SMP. It would be very difficult to nail down any one size of SMP machine as being the best choice; there are too many variables.

As to SMT and SMP, the Power5 does not carry the abstraction down to the thread level. SMT allows the processor to work on two threads, from the hardware point of view, at the same time. These hardware threads could be application threads or they could be completely different processes. Generally, what is run in the two contexts is up to the scheduler. It is not quite the same thing as multithreaded programming.
Quote:

Sigh. You really can't just jumble up terms like MCM, SoC, SMP and "cluster". Each has a specific meaning in the context of this discussion, so you can't posit the integration of seemingly antipodal or contrasting technologies without a great deal more explanation as to how it would be achieved.

That is complete garbage. You can have a cluster of SMP machines just like VT built. If you have the money you can also build a cluster on an MCM; if things are small enough you can even build a cluster on a single piece of silicon, that being one variant of an SoC. There is no contrasting technology here; it is a matter of integration and the goals you have set. Sure, you won't get the entire VT cluster on one MCM, but you could very well put 4 machines on one if you really wanted, and two would not be a problem. Each step down in size also allows you to do things differently, so instead of networking you have interchip connects. What becomes a problem is our old friend heat, thus nothing suggested at the start of this thread is even possible without low-power parts.
Quote:

You miss my point. Right now, the only kinds of applications (software) that use the kind of cellular programming model described are high end, highly parallel, high performance applications such as climatological modeling software run by government agencies and academic institutions.

OK, so what you're saying is that all of those massively clustered SMP systems that corporations have installed in the last few years have no real application. You should pass this information on to the shareholders. Even Formula 1 race teams have their own clusters.
Quote:

You'd think so, wouldn't you? But then, you'd be wrong. It would be extremely difficult for any number of reasons, not the least of which is that "ain't no one been there yet".

It kinda amazes me that you continue to believe that software that uses message passing doesn't exist. Further, I find it funny that you believe that this can't exist on top of OS X and maintain backward compatibility. History is not on your side.
Quote:

No, it's more like taking the APU of the PPC 970 and replacing it with a 440 core. Then repeat with the FPUs. And so on.

I stand by my original statement: modifying the 440 core a bit does not imply that you now have a different family, just as minor modifications to the 970 do not imply a completely different family. That doesn't mean that IBM and Apple can't come up with something that is totally different and thus send the boat down a different branch in the river. It just means that the 440, being a core, can be modified and still be considered a 440.
Quote:

You got me. I have no clue what you are talking about.

Please, go here for info you might need.

I'm trying to figure out if you have a clue or not. Apparently you don't, as you have continued to confuse what multithreading is. The point is that multithreaded applications are not tied to the processor implementation on the motherboard (SMP), nor to the type of processor. A multithreaded application can be run on an 8-bit processor or on a single-processor 970; multithreading, as a programming concept, has nothing to do with the processor. What modern operating systems, along with SMP and SMT, do is provide a way to enhance what a multithreaded program can offer. The gotcha is that multithreaded applications are not dependent on features like SMP and SMT being there.
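A trivial illustration in plain pthreads C - note that nothing in it knows or cares how many processors the machine has; that is the scheduler's business:

    #include <pthread.h>
    #include <stdio.h>

    /* The same binary runs on a single-CPU iBook or a dual G5; the OS
       scheduler decides whether the threads actually run in parallel. */
    static void *worker(void *arg)
    {
        long id = (long)arg;
        printf("thread %ld doing its slice of the work\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }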
Quote:

As described, the architecture of the laptop in question is as foreign to Mac OS X applications as the P4 is. (Actually, the P4 is a kissing cousin compared to the implementation described.) So, if Apple were to ship this next week, there would be no software for it.

Again, garbage! You would just run your old software, but would not get full benefit from the system. Much in the same way as someone running Excel on VT's cluster and getting no benefit from all of the other nodes available.
Quote:

That, of course, is the thrust of my argument.

Yep, these are all rumors, but you have not provided one bit of compelling evidence that they are not at least possible.
Quote:

Nope, sorry. Not buying it.

I believe you're not buying it because you are not aware of the possibilities. You really need to take a serious look at what is already available, and then add a bit of imagination.
Quote:

No, at least some of us have been discussing the cell architecture under development by the STI group. Whether it is implemented starting from PPC or x86 designs won't matter that much. You could begin with Transmeta's architecture, but in the end you will have a bunch of instructions that make no sense to any other architecture. The problem space is too divergent. Hence, a new instruction set.

There is no new instruction set!!! This thread has been about a PPC implementation that could be new to laptops. That others have added the distractions of other systems does not mean that those systems are applicable to the base of this discussion. While I can see adding hardware support for certain sorts of functions in a machine of the type discussed, you will still have the same base instruction set. Just as the 440 is a PPC with an extended instruction set, with support for DSP for example.

Even if we did end up extending the PPC instruction set, there is very little likelihood that the user would ever see these new capabilities. They would be obscured by operating system and library abstractions. There is no doubt in my mind that the PPC instruction set will continue to evolve, as it has in the past; this evolution will not break user applications.
Quote:

Now, you might be able to convince me it could be done as an extension to an existing ISA, but you'd have to work pretty hard at it.

Just look around at all of the clusters operating in the world using standard off-the-shelf processors. I don't have to do the convincing; prior art exists. Even then, if they do extend the instruction set it doesn't mean anything to the user at all.

Did all of the user apps suddenly fail to work when AltiVec was introduced? That one addition added more capability and instructions to the PPC than we are ever likely to see from hardware additions to support message passing. On top of that is the reality that all of the above could be done without adding any new instructions at all to the PPC base. But again, this has been done again and again to the PPC programming model without breaking user apps, with DSP and vector instructions.

Thanks
Dave

post #221 of 376
Quote:
Originally posted by snoopy
Wow! It's a discussion on cells. I have been wondering where cell technology would lead. It's my impression that Sony is taking this approach for PS/3, while Microsoft is going with the Power5 derivative. So I have been wondering what is in Apple's future? It looks like Steve is getting his "options."

What I find interesting are the comments about using message passing, and the statement from Nr9 saying, ". . . each core runs a extremely small version of mac os x that only has basic MPI function." I don't know enough about computer science to know whether this might mean each core runs a separate Mach kernel? If so, it looks like there could be an advantage to Darwin, which uses a micro kernel. If this is true, it would indeed be a cluster of computers that are networking and under the control of a master computer, the "fifth" chip that was referred to. This master chip would run the full OS X operating system.

I think we will see both cell and PowerX technologies going forward in parallel for a while.

I've found this to be fascinating also. While I still have problems believing this will go into a PowerBook, the iBook or something smaller would be a very good fit.

As far as the OS goes there are a number of possibilities. One is that the slave processors run a tiny kernel that handles communications with the primary CPU. This sort of arrangement generates its own problems but is possible. There is even past experience with such systems in the PC world, where you had plug-in computation servers.

To be honest I really hope that Apple is not going this route. I'd much rather see 4 fully capable computer units.

Dave
post #222 of 376
Quote:
Originally posted by Amorph
Once you scale the design down to something built around 440s, you either have to be content to run applications that are happy executing on a 440, or you have to start targeting the architecture specifically. The ratio of the power of each core to the demands of each application is a lot lower.

Despite Nr9's insistence that Cell == clustering, keep in mind that future iterations of this technology will likely embody technologies that address this issue and may well change how we look at the problem.

OK, the following is entirely blue-skying and may not be practical at all, but it's why I don't think Cell is "just another clustering implementation".

Don't think of each core or dual-core chip as a standalone node with its own FPU, L2, and VMX unit. Instead think of each as a cell on the fabric, or better yet as each unit being built out of smaller, more generic execution units. Need some SIMD lovin' for your PS project? No problem, just dedicate a dozen or so cells (execution units) as a VMX unit and start decoding and processing instructions. Need some scalar FPU muscle? Just rededicate those VMX cells to DP math and rock and roll.

I think you might need some kind of magical load/prefetch units to keep this thing fed, but assuming that's possible, you can keep transistor counts down by making them do double and triple duty. (Obviously, some things, like cache, would be more or less dedicated, but if your bus is fast and your memory controller does really smart prefetches then you can keep cache sizes down.)

Eh, it's a pipe dream.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #223 of 376
Quote:
Originally posted by rickag
My view is obviously oversimplified, due to lack of knowledge, but going back to the G4 comparison: the G4 has 1 integer, 1 floating point and 4 AltiVec units (I think?), uses a 7-stage pipeline, and is now at 52 million transistors.

The proposed 440 derivative uses a 7-stage pipeline, tacks on a floating point and AltiVec unit, and the claim is that 2 dual-core 440s, each with AltiVec and added floating point units, will be 10 million transistors. I just can't wrap my head around this at all. Plus the fact that each core will have to have additional systems for communicating between themselves to make any parallelism work efficiently.

While the POWER4 is not an ideal example, it uses a ring topology (if that's the right terminology), has up to four dual-core CPUs, runs at a relatively pokey speed compared to desktops, but uses up to 128MB of L3 cache per module in order to perform its intended job, which I presume to be massive parallelism, multi-tasking and multi-threading on a huge scale. Scaling this philosophy down to a desktop using 10 million transistors seems, well, uh, mm, quite difficult, let alone having to cajole developers into optimizing software for this design.

For the technologically impaired like me, please don't put too much stock in what I say, but I'm still skeptical of Nr9's claims.

It is always good to be skeptical when presented with new ideas. But do realize that we have now had several implementations of the PPC in our Macs, all of them using a different number of transistors and delivering different performance.

If you look at this whole concept as a way to deliver low power and high performance to the portable market, then things look different. The nice thing is that you can get documentation for the 440, and for a couple of chips it has been implemented in, off IBM's web site. This is very much a low-power device; some of the low power comes from giving up functionality found in the desktop processors. How everything would work and perform when glued together is an open question.

There are a number of things to like about this machine even if it doesn't exist in reality. It is best to have a little fun with this discussion and wait for more signals from Apple. It has become apparent that Apple is working on something for the portable market. As time passes I have less and less belief that the 970 will function well in a laptop no matter how much it is shrunk. So time will tell.

Dave
post #224 of 376
Ok, you've basically ignored my arguments or ignored their context and I've really no interest in correcting your assumptions beyond addressing this bit:

Quote:
Originally posted by wizard69
Just look around at all of the clusters operating in the world using standard off-the-shelf processors. I don't have to do the convincing; prior art exists. Even then, if they do extend the instruction set it doesn't mean anything to the user at all.

Of course there are PPC clusters out there. The #3 in the world is VT's "X". So what?

Quote:
Did all of the user apps suddenly fail to work when AltiVec was introduced? That one addition added more capability and instructions to the PPC than we are ever likely to see from hardware additions to support message passing. On top of that is the reality that all of the above could be done without adding any new instructions at all to the PPC base. But again, this has been done again and again to the PPC programming model without breaking user apps, with DSP and vector instructions.

Point to one PPC cluster that will run Word for Mac OS X. There aren't any. Even VT's "Big Mac" is not running an OS X image. Can you run Word for Mac OS X on one of VT's nodes? Yes, of course you can. But you can't dispatch the job to it from a head node and you'll have a bit of a problem with scrolling through your documents since there's no monitor, but sure you can.

As I see it, the problem you have is that you don't seem to understand what it is that a cluster does, and like Nr9, you have confused clustering with Cell.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #225 of 376
Thread Starter 
Quote:
Originally posted by Tomb of the Unknown
Despite Nr9's insistence that Cell == clustering, keep in mind that future iterations of this technology will likely embody technologies that address this issue and may well change how we look at the problem.

OK, the following is entirely blue-skying and may not be practical at all, but it's why I don't think Cell is "just another clustering implementation".

Don't think of each core or dual-core chip as a standalone node with its own FPU, L2, and VMX unit. Instead think of each as a cell on the fabric, or better yet as each unit being built out of smaller, more generic execution units. Need some SIMD lovin' for your PS project? No problem, just dedicate a dozen or so cells (execution units) as a VMX unit and start decoding and processing instructions. Need some scalar FPU muscle? Just rededicate those VMX cells to DP math and rock and roll.

I think you might need some kind of magical load/prefetch units to keep this thing fed, but assuming that's possible, you can keep transistor counts down by making them do double and triple duty. (Obviously, some things, like cache, would be more or less dedicated, but if your bus is fast and your memory controller does really smart prefetches then you can keep cache sizes down.)

Eh, it's a pipe dream.

Eh, no, that would require something magical. Cell is more likely to be a really high-bandwidth MPI cluster of low-power cores. Programs written for Cell are likely to have to be MPI threaded. There is no way you can take a single thread and run it across these "cells".
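e.g. the canonical data-parallel pattern, sketched with standard MPI calls (an illustration of the style only, nothing Cell-specific):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        enum { N = 1000000 };
        static float data[N];      /* each rank holds its own copy */
        int rank, size;
        float local = 0.0f, total = 0.0f;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int chunk = N / size;      /* assumes size divides N evenly */
        for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
            local += data[i];      /* every core sums its own slice */

        /* Combine the partial results on rank 0 -- pure message passing. */
        MPI_Reduce(&local, &total, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", total);
        MPI_Finalize();
        return 0;
    }

The work splits because the data splits; a single sequential thread has no seam to split along, which is the whole point.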
post #226 of 376
Thread Starter 
Quote:
Originally posted by Tomb of the Unknown
Ok, you've basically ignored my arguments or ignored their context and I've really no interest in correcting your assumptions beyond addressing this bit:


Of course there are PPC clusters out there. The #3 in the world is VT's "X". So what?


Point to one PPC cluster that will run Word for Mac OS X. There aren't any. Even VT's "Big Mac" is not running an OS X image. Can you run Word for Mac OS X on one of VT's nodes? Yes, of course you can. But you can't dispatch the job to it from a head node and you'll have a bit of a problem with scrolling through your documents since there's no monitor, but sure you can.

As I see it, the problem you have is that you don't seem to understand what it is that a cluster does, and like Nr9, you have confused clustering with Cell.

There is no confusion; I think you are confused. There is no MPI version of Word, and Word doesn't need one.

The Cell concept is not just a hardware thing. It will also define how software is to be written.
post #227 of 376
Would it be possible that, despite being a message-passing architecture, main memory is shared? This would eliminate some of the advantage that SMP has over MPI, but I'm not sure that it would be all that easy to implement, and it might still be visible at the application level.
If it's possible to make MPI look a lot like SMP, this approach would be a whole lot more feasible.
Perhaps it is possible to modify Darwin/Mach to control the allocation of physical memory among the various CPUs and hand pages off at the appropriate time - i.e., the kernel manages some large-scale locking of pages. Would this be any more efficient than NUMA? (I'm assuming that the strict copy-all-the-data-over-the-connection style used in clusters isn't being considered; it just seems kinda slooow.)
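Something like this toy mailbox is the kind of thing I mean - message-passing semantics, but the "transfer" is just a store into coherent shared memory (a sketch only, written with C11 atomics for brevity; a real implementation would need to be far more careful about ordering and fairness):

    #include <stdatomic.h>

    typedef struct {
        atomic_int full;   /* 0 = empty, 1 = message waiting */
        int        data;
    } mailbox_t;

    static void send_msg(mailbox_t *mb, int value)
    {
        while (atomic_load(&mb->full))   /* wait for consumer to drain */
            ;
        mb->data = value;                /* "payload" never crosses a wire */
        atomic_store(&mb->full, 1);      /* publish */
    }

    static int recv_msg(mailbox_t *mb)
    {
        while (!atomic_load(&mb->full))  /* wait for a message */
            ;
        int value = mb->data;
        atomic_store(&mb->full, 0);      /* acknowledge */
        return value;
    }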

Just some random and not well thought out thoughts.

post #228 of 376
Thread Starter 
The L2 and L3 caches are coherent among pairs of processors, and yes, the main memory is shared.

Directory-based coherency is perfectly feasible for a small number of processors.

I do not believe sharing memory will be feasible on a large scale, however - i.e., thousands of processors.
post #229 of 376
Quote:
Originally posted by Amorph


. . . That's not to say that Apple isn't considering a major revision to OS X that would separate Mach back out into its own thing with its own address space. It's a better design generally (you ain't seen uptime yet), and the power and bandwidth of forthcoming hardware platforms might make the performance hit forgivable. However, this is not a project to take lightly, at all, and that's with only one microkernel running. The sort of system Nr9 is describing would keep Apple's engineers up nights for a good long time. (Avie might relish the challenge, though.)


Ah, thank you for an explanation of the Darwin kernel. All this time I believed OS X has a microkernel because it is based on Mach.

I'm guessing that cell computing must work the way Nr9 says, with each cell being a microcomputer passing messages, since cells can be extended over a network. With all the work involved to turn such a chip into a real product, maybe the processor Nr9 refers to is an engineering experiment or prototype.

On the other hand, companies often start working on promising technologies far ahead of time. With IBM involved in the cell project, maybe Apple has been working on it for a year or two already. Apple may have most of the details worked out and a prototype OS X ready to test. (Well, maybe that is unrealistic considering the amount of engineering that went into Jaguar and Panther.)
post #230 of 376
My thoughts keep bouncing around on this subject. Oh well. Another wait for a build, another post.

A point of trivia: Software honcho Avie Tevanian is the father of the Mach kernel. He worked on it for years as part of his graduate and postgraduate work (NeXTStep ran on Mach, as I recall). I doubt that he gets his hands dirty with that kind of plumbing work anymore (a shame, really, since good systems programmers are hard to come by), but he's doubtless familiar with the code and with the issues around it. And, of course, they have BSD guru Jordan Hubbard on board. So if there's any company that has the knowledge and the staff to go playing with microkernels in general and Mach in particular, it's Apple. (Darwin, in addition to rolling in Mach, apparently rolled in some of the work on NuKernel, the basis of the doomed Copland/Gershwin project, so they've already been playing.)

I'll have to do some more reading tonight.
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
post #231 of 376
Quote:
Originally posted by Nr9
There is no way you can take a single thread and run it across these "cells"

No, not currently. But give STI a few years and let's see. Those are some smart folk with deep pockets; if anyone can come up with a next generation architecture, they can. Of course, they're just looking for the next gaming chip, so we probably won't see anything quite as dramatic as I outlined. But I could see special purpose cores knitted together by a fast bus matrix of some kind. Almost like exploding out the execution units of a CPU.

By the way, if MS Word doesn't need MPI, why would it define how the software is written? Aren't these statements contradictory?

But as far as your idea of clustering being the next big thing: let's try this on for size. Let's say Apple rewrites their OS so it can run as a single image on the VT cluster. Then MS rewrites Office for this version of the OS.

Would you be able to scroll through a large Word document any faster than you can now? After all, you have 2200 G5 CPUs to throw at it. Shouldn't you be able to scroll to the end of the document before your finger even clicks the mouse?

No.

Why? Because it's essentially still only running on one CPU. You can't split the job of scrolling through a document up into 2200 pieces, do it, then reassemble it for display. If anything, it may even happen slower, because the head node has to find a free CPU, dispatch the task to it, and then return the results to you, so you actually have more overhead than running the same task on your machine at home.

Messaging is not the holy grail, folks. And clustering is not the answer to every computing problem.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #232 of 376
Thread Starter 
Quote:
Originally posted by Tomb of the Unknown
No, not currently. But give STI a few years and let's see. Those are some smart folk with deep pockets; if anyone can come up with a next generation architecture, they can. Of course, they're just looking for the next gaming chip, so we probably won't see anything quite as dramatic as I outlined. But I could see special purpose cores knitted together by a fast bus matrix of some kind. Almost like exploding out the execution units of a CPU.

By the way, if MS Word doesn't need MPI, why would it define how the software is written? Aren't these statements contradictory?

But as far as your idea of clustering being the next big thing: let's try this on for size. Let's say Apple rewrites their OS so it can run as a single image on the VT cluster. Then MS rewrites Office for this version of the OS.

Would you be able to scroll through a large Word document any faster than you can now? After all, you have 2200 G5 CPUs to throw at it. Shouldn't you be able to scroll to the end of the document before your finger even clicks the mouse?

No.

Why? Because it's essentially still only running on one CPU. You can't split the job of scrolling through a document up into 2200 pieces, do it, then reassemble it for display. If anything, it may even happen slower, because the head node has to find a free CPU, dispatch the task to it, and then return the results to you, so you actually have more overhead than running the same task on your machine at home.

Messaging is not the holy grail, folks. And clustering is not the answer to every computing problem.

My point is that MS Word is not an example of a future application. Why the hell do you want it to scroll fast anyway? For any applications that require computing power, there will be a way to apply clustering.

Single-threaded word processing applications aren't likely to run faster on Cell either.
post #233 of 376
I hate talking about Word, just because it's such a bletcherous pile of code, but:

Word is - or at least, acts - multithreaded in a number of ways (I might just fire it up and watch it for threads, actually, but not now). If one CPU is paginating a document while others are running the check-as-you-type formatting and spelling and grammar services, and one more controls access to the document itself, then the thread dedicated to the view of the document will be able to scroll smoothly and responsively.

The VT cluster example is a bit silly because of bandwidth and latency issues between machines that would not be a problem on an MCM or even on a common motherboard. But, in fact, a group of small processors just might be able to make Word run more responsively, if Word trusts all those worker tasks to threads.
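Schematically, something like this (a made-up pthreads C sketch; the division of labor is mine, not Word's):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t doc_lock = PTHREAD_MUTEX_INITIALIZER;
    static int pages_laid_out = 0;

    /* Background worker: repaginates the document a chunk at a time. */
    static void *paginator(void *unused)
    {
        for (int chunk = 0; chunk < 100; chunk++) {
            pthread_mutex_lock(&doc_lock);
            pages_laid_out++;            /* lay out one more page */
            pthread_mutex_unlock(&doc_lock);
            usleep(1000);                /* pretend layout takes time */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, paginator, NULL);

        /* Meanwhile the "view" thread keeps scrolling through whatever
           has been laid out so far, never blocking on the whole job. */
        pthread_mutex_lock(&doc_lock);
        printf("pages available to scroll: %d\n", pages_laid_out);
        pthread_mutex_unlock(&doc_lock);

        pthread_join(t, NULL);
        return 0;
    }

Give each of those workers its own small core and the view thread never has to wait its turn.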
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
"...within intervention's distance of the embassy." - CvB

Original music:
The Mayflies - Black earth Americana. Now on iTMS!
Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS!
post #234 of 376
Quote:
For any applications that require computing power, there will be a way to apply clustering.

Not true: not all applications are inherently (sensibly) parallelisable.
Stoo
post #235 of 376
Here's a handy related development:

"Engineered Intelligence Corp. (EI) on Thursday announced the release of its Mac OS X-compatible version of CxC. Aimed at research labs and other markets that require high-performance computing, CxC is parallel programming software designed to simplify the process of writing code that can run on clusters of computers."

Here's the link (MacCentral): http://maccentral.macworld.com/news/...=1069342314000

Apparently, EI thinks that there is a market for multi-threading apps in the Mac world.
"Mathematics is the language with which God has written the Universe" - Galileo Galilei
"Mathematics is the language with which God has written the Universe" - Galileo Galilei
post #236 of 376
Thread Starter 
Quote:
Originally posted by Stoo
Not true: not all applications are inherently (sensibly) parallelisable.

Most applications that require computing power will be to some degree parallelisable.

You would have to rethink your whole programming model; i.e., what problems should be solved, and what problems shouldn't be. We can live in a computing world where most applications are parallel; it's just that we are used to a different programming paradigm today.
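Amdahl's law is the usual yardstick here: if a fraction p of a program can be spread over n cores,

    speedup = 1 / ((1 - p) + p/n)

So p = 0.9 on 4 cores gives 1/(0.1 + 0.225), about 3.1x, while p = 0.5 gives only 1.6x. "To some degree parallelisable" matters a great deal.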
post #237 of 376
Quote:
Originally posted by TJM
Apparently, EI thinks that there is a market for multi-threading apps in the Mac world.

There is:
Quote:
Application areas include: cellular automata, artificial neural networks, fluid dynamics, particle dynamics and other numerical applications.

Folks involved in supercomputing applications all over the country are interested in building systems like VT's Big Mac because for the first time ever really serious computing power is available at relatively low cost. (It's unheard of for a system that breaks 10 TFlops to cost as little as VT's.) EI apparently realized it would be trivial to port their products to OS X and it would open up a significant market for them.

Amorph:
Yes the example was rather extreme.

But sometimes you have to go to extremes to make a point.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #238 of 376
Quote:
Originally posted by Nr9
We can live in a computing world where most applications are parallel; it's just that we are used to a different programming paradigm today.

What do you mean we, white man?

In the world I live in, saying it doesn't make it so.
"Spec" is short for "specification" not "speculation".
"Spec" is short for "specification" not "speculation".
post #239 of 376
Quote:
Originally posted by Tomb of the Unknown
Ok, you've basically ignored my arguments or ignored their context and I've really no interest in correcting your assumptions beyond addressing this bit:


It is not a question of ignoring your arguments; the problem is that your arguments are invalid and apparently not based on contemporary knowledge in the field.
Quote:


Of course there are PPC clusters out there. The #3 in the world is VT's "X". So what?

The point is that you have been arguing that the described technology is impossible because a new instruction set is required, or new hardware is required, or a new operating system is required. Clusters in general, and the PPC cluster in particular, should demonstrate clearly that it is possible to implement the discussed technology WITHOUT the need for instruction set modifications or special processors.

Quote:


Point to one PPC cluster that will run Word for Mac OS X. There aren't any. Even VT's "Big Mac" is not running an OS X image. Can you run Word for Mac OS X on one of VT's nodes? Yes, of course you can. But you can't dispatch the job to it from a head node and you'll have a bit of a problem with scrolling through your documents since there's no monitor, but sure you can.

First off, tcfslides.pdf states clearly, on page 13, that the cluster is running OS X. This comes right from VT's website, http://computing.vt.edu/research_computing/terascale/ if you will. So the only thing you would need to run Word, besides a licensed copy, is a keyboard and a video screen. Well, that and getting by the system administrator! So at least we agree that the nodes are capable of running conventional code while being part of a cluster. There is no new instruction set to be dealt with to support legacy applications. At the same time you have support for message passing programs.
Quote:

As I see it, the problem you have is that you don't seem to understand what it is that a cluster does, and like Nr9, you have confused clustering with Cell.

At this point I apparently have a rather good understanding of what a cluster does; it is not a mystery. If you really wanted to, you could build one in your basement. How you would make use of it is another matter.

It has become apparent that Nr9 is not confused at all; whether he is just pulling our legs or actually has a line on good information is yet to be seen. Nothing he has described is impossible, and it could very well be a future path of development.

I do know a couple of things though. One is that Apple will have a hard time stuffing a 970 into a laptop even with a die shrink. The second is that Apple is probably in the best position of any company in the computing business to be able to successfully launch such a machine. This is largely a result of previous efforts to support the G4 in dual processor configurations.
post #240 of 376
I think one problem with introducing a (hypothetical) "cluster-on-a-motherboard" laptop is that if the individual processors are slow, that will severely limit the performance of single-threaded tasks. Saying that people will have to change their apps is all well and good, but would anyone spend money on a product whose performance depends on app developers switching to a new paradigm?
Switching from SMP-style multithreading to an MPI-style multiple-process model is probably not trivial for most applications.
I'm not saying it's impossible, but I would find the appearance of the technology in a shipping Apple laptop in the next year surprising.
I see few advantages to MPI as opposed to NUMA SMP, unless we're talking about completely separate address spaces (i.e. separate machines networked together). We will probably see an MPI API push by Apple (for people who want their apps to be clusterable) before we see cluster-on-a-board computers.