TS: The 970MP is coming

rolandg · July 28, 2004 4:44PM

Some of you really seem way too far out with your wishes and predictions:

Not that there aren't people out there in need of the processing power a 4-way dual-core system offers. These kinds of systems are already offered, by IBM for example, but they are very rarely used.

But have a look at the number on the price tag: starting at a mere US$11,000 and going up to infinity for a Power-based system. And, on a side note, such a system will run very hot, consume a lot of power and be considerably bigger than your dual G5 tower.

And then think about where Apple currently stands: they make computers for the higher end of a mass market, both on the desk and in the server cabinet. And this is, from all that I can see, Apple's business model.

They made inroads into the scientific number crunching market lately, mainly because it is easy and fairly cheap to build a cluster of a large number of fairly "slow" and inexpensive systems.

And I guess this is where we are headed: scaling computing performance by adding and networking inexpensive systems. Using more cores per chip can be more cost-efficient than using the same number of single-core chips.

I have to agree on the pro graphics cards, though.

wizard69 · July 28, 2004 5:05PM

Quote:

Originally posted by hmurchison

I don't even want to think about the $$$$$$ that would be

Opteron hardware is surprisingly competitive. When your average desktop goes SMP, servers are going to need four sockets just to be able to keep a competitive advantage. The reality is that this thread shouldn't degenerate into a pricing discussion; better to look at Opteron as a technology benchmark.

Quote:

Yeah, the Quad Tyans are for servers. They generally have old ATI graphics on the board, suitable for monitoring but not much else.

Yeah, I agree with Amorph... there really is no reason not to do Quads since it can be accomplished in 2 sockets. I don't think the Quads would be cheap, but there would be a market for them, even at Pixar.

Those sorts of quads, that is, dual dual-core chips, shouldn't be any more expensive than today's hardware. Well, yeah, there may be a slight charge for the increased die size, but it won't be much bigger than the current 130nm hardware.

Quote:

I can wait for the Quads. I think it's time to get the Quadro on the Mac. Siggraph is just 2 weeks away.

It will be interesting to see what Apple has for Siggraph. An earlier-than-expected announcement of 970MP hardware would be neat too. A decent tower chassis or server chassis would also be nice. I still find the G5 tower to be an understatement as far as problem-solving platforms go.

dave

onlooker · July 28, 2004 5:15PM

Quote:

Originally posted by RolandG

They made inroads into the scientific number crunching market lately, mainly because it is easy and fairly cheap to build a cluster of a large number of fairly "slow" and inexpensive systems.

Fairly slow? They were #3 in the Top 500 supercomputers list, and used fewer processors at lower GHz than Intel and AMD, but still smacked the crap out of both of them handily. They also cost less than a quarter of what the others did. I think you're seriously misinformed as to why they have made inroads into super-clustering and science.

rolandg · July 28, 2004 5:17PM

Quote:

Originally posted by wizard69

Opteron hardware is surprisingly competitive. When your average desktop goes SMP, servers are going to need four sockets just to be able to keep a competitive advantage. The reality is that this thread shouldn't degenerate into a pricing discussion; better to look at Opteron as a technology benchmark.

You have to be careful with the Opteron CPUs, as they tend to become fairly expensive the more features the chip offers (FSB, number of memory channels, SMP capabilities etc.).

When desktops go SMP by implementing multiple cores, the server chips will go multi-core as well, hence no need for more CPU sockets.

I wonder how much more expensive chipsets become when going multi-core or multi-processor, especially with a UMA.

Quote:

Originally posted by wizard69

Those sorts of quads, that is, dual dual-core chips, shouldn't be any more expensive than today's hardware. Well, yeah, there may be a slight charge for the increased die size, but it won't be much bigger than the current 130nm hardware.

I think this is too optimistic. What are the yields on today's multi-core chips?

hmurchison · July 28, 2004 6:16PM

RolandG

Normally you'd be right. Quads would be an idea that we'd all sit back and dream about. But here are the recent changes that make Quads in mid 2005 a reality.

Hardware -

Today's processors typically have 700 pins or more, so each socket needs board traces routed to every one of those pins. This is why dual-socket motherboards with nice features are $500 and not the $150-200 of single-socket boards. Tyan makes quad-socket boards that are over $1k because of the low demand and the high complexity of designing such a board.

Software-

The OS has to understand how to manage threads down to the kernel level. If dual processors are by and large the most the OS is going to see, then why tune it for more? However, with the Tiger kernel we see that the SMP support from BSD 5.x for fine-grained locking of resources has been added, as well as support for virtual versus logical CPUs. This has to be added implicitly by Apple into Darwin. BSD 5.x now supports systems with 4 or more CPUs more efficiently, and Apple gets the same benefit in Tiger.

970MP -

The 970MP addresses the hardware issue. It is not pin compatible with the 970FX, which is common sense. It will have many more pins, but those pins will all be concentrated in 1 socket for a dual system or 2 sockets for a Quad. That is much easier than routing traces to 4 CPU sockets. There is now no real penalty for Quad systems beyond heat dissipation requirements.

Therefore we now see a clear path in software and hardware for the support of more than 2 CPUs. Remember, previous IBM and Intel roadmaps had consumers using 6GHz processors in 2006. We know that isn't going to happen, but what can happen is two 3.5GHz cores per processor. We'll get the speed either vertically or horizontally.

Your comparison with the Quad systems of today has no bearing. Their design is far more sophisticated in almost every aspect. We're talking different beasts here.

Quote:

Opteron hardware is surprisingly competitive. When your average desktop goes SMP, servers are going to need four sockets just to be able to keep a competitive advantage. The reality is that this thread shouldn't degenerate into a pricing discussion; better to look at Opteron as a technology benchmark.

Indeed, the Opteron is an aggressive platform, albeit a very costly one considering the die size at 130nm. AMD will not get any relief until they get the Opteron down to 90nm. I think Apple/IBM will take the profitable road and seek to compete with the Opteron's speed with a 65nm 980.

Quote:

When desktops go SMP by implementing multiple cores, the server chips will go multi-core as well, hence no need for more CPU sockets.

I wonder how much more expensive chipsets become when going multi-core or multi-processor, especially with a UMA.

Servers have been multicore for years. Sun plans an UltraSparc IV multicore, multithreading CPU in 2006. The pricing of chips is wholly dependent on demand and yield. Going multicore is a natural evolution of utilizing smaller fab processes. Initially they used the smaller processes to add more L2 cache, but now multicore is where it's at.

Things really will get interesting roughly around 2007, once plans move to 45nm. Don't be surprised to see Quad systems possible in 1 CPU socket.

onlooker · July 28, 2004 9:02PM

Will you guys please stop calling them Quads when they are really duals that act like quads?

2 sockets = duals

4 sockets = quads. new rule.

gavriel · July 28, 2004 10:07PM

Quote:

Originally posted by onlooker

What exactly is the iMac 3G?

iMac Third Generation.

onlooker · July 28, 2004 10:29PM

Quote:

Originally posted by Gavriel

iMac Third Generation.

duh.. I'm an imbecile when I'm tired. :P

charless · July 29, 2004 4:56AM

Quote:

Originally posted by onlooker

Ehhh.. ADMIN ALERT::::::! AI has been accidentally wiped clean of all post counts and registration dates at least 3 times since then.

Yeah, actually I'm pretty sure AppleInsider didn't exist in May 1990 since the first Web browser wasn't released until December 1990. Still, though, I kind of like my 1990 registration date, although I have no idea how it got that way.

zapchud · July 29, 2004 5:51AM

Quote:

Originally posted by onlooker

Will you guys please stop calling them Quads when they are really duals that act like quads?

2 sockets = duals

4 sockets = quads. new rule.

2 x 2 = 4.

Two processors with two cores each are no different to the user than four processors with one core each. Sure, they're not "quad processor machines", but "quad core machines". The difference is largely irrelevant.

onlooker · July 29, 2004 9:13AM

Quote:

Originally posted by Zapchud

2 x 2 = 4.

Two processors with two cores each are no different to the user than four processors with one core each. Sure, they're not "quad processor machines", but "quad core machines". The difference is largely irrelevant.

Then at least call them quad cores, not just quads, because people are talking about all different kinds of quad setups in here, and it's becoming hard to keep track of which one each person is referring to.

[edit] Even calling them quad cores will be confusing soon too, because AMD says their multi-core processors will have 4 cores each.

wizard69 · July 29, 2004 1:15PM

Quote:

Originally posted by RolandG

You have to be careful with the Opteron CPUs, as they tend to become fairly expensive the more features the chip offers (FSB, number of memory channels, SMP capabilities etc.).

Clearly you have not kept up with what AMD is doing to the x86 market. They have moved from the position of follower to trendsetter.

In any event, it is cheaper in the long run to integrate features onto the die instead of having them sit on separate chips. That is true in an economic sense but also in a performance sense. The more things that are on die, the fewer level transitions, and thus the less buffering, a signal has to go through.

In the end what AMD has found with Opteron is the cheapest way to offer high performance with a given technology.

Quote:

When desktops go SMP by implementing multiple cores, the server chips will go multi-core as well, hence no need for more CPU sockets.

You are simply missing the boat on this one. Servers will expand to add as many cores and chips as they can within the power dissipation limits of the selected housing. Sure, there are server applications that will do fine with one dual-core chip, but the vast majority of the servers out there would benefit from additional CPUs. If Apple can get a dual dual-core machine on the market, they will find very good demand for it.

Quote:

I wonder how much more expensive chipsets become when going multi-core or multi-processor, especially with a UMA.

Again, this is simply a question of chip size and yields. Intel currently produces Prescott at 90nm, and its die is close to the size of the rumored multicore PPC chips. If Intel can make Prescott affordable, there is good reason to believe that a 970MP could hit the same price points and be profitable.

Quote:

I think this is too optimistic. What are the yields on today's multi-core chips?

What multicore chips? It is simply a question of chip size; it would be very possible to deliver a multicore chip, a 970MP if you will, that is smaller than Prescott. Intel has not had the huge problems with 90nm Prescott production that IBM is having with 90nm PPC. As long as the chip size remains reasonable, there should be little in the way of problems beyond what is already being dealt with in the industry.

zapchud · July 29, 2004 3:00PM

Quote:

Originally posted by onlooker

Then at least call them quad cores, not just quads, because people are talking about all different kinds of quad setups in here, and it's becoming hard to keep track of which one each person is referring to.

[edit] Even calling them quad cores will be confusing soon too, because AMD says their multi-core processors will have 4 cores each.

I see your point about confusion and I think it's valid. But as processors will get multiple cores, the usefulness of sticking to the "quads are for four processors, quad cores are still quad cores" way of talking about it will be diminishing rapidly, and become obsolete. It makes sense to talk about quad cores and quads as in quad processors now, because it's not common for mainstream processors to have multiple cores. But when most mainstream processors are having them, it'll be meaningless to talk about quad processors, as it'll say nothing about the actual ability of the processor chip.

programmer · July 30, 2004 8:23PM

The number of sockets on the motherboard isn't particularly important. A quad socket board will typically be less efficient than a single socket quad core machine. It will also be more expensive. So why bother talking about the number of sockets/chips, when what we really care about is the number of processing cores?

I would be surprised if Apple ever went to more than 2 sockets on a single motherboard. The system controller(s) and FSBs just get too big and expensive. The chips coming in the next few years will move to dual core, and then beyond. The existing 970FX is ~60 million transistors. The 970MP reportedly doubles the cache per core, which probably pushes each core up to something like 80-90 million, for a chip total of 160-180 million (about the same as the original POWER4's 170 million), and about the same area as the original 970 on 130nm. That means that if they can get the 90nm yields under control (and reports are that they are), then the price of this 2-core chip will be roughly the same as the price of the original 970. Going to 65nm will allow this to double to 4 970FX cores on one chip for the same cost, and IBM is claiming that they'll be able to do that by the end of 2005 / early 2006.
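For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The transistor figures are the estimates quoted in the post above, not official IBM numbers, and the area scaling assumes the idealized rule that die area shrinks with the square of the feature size, which real processes only approximate. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the transistor and die-area estimates.
# All inputs are the post's own rough figures (assumptions, not IBM data).

POWER4_MTRANSISTORS = 170          # original POWER4, in millions
CORE_970MP_RANGE = (80, 90)        # estimated millions of transistors per 970MP core


def chip_total(per_core_millions, cores=2):
    """Total transistor count (millions) for a chip with identical cores."""
    return per_core_millions * cores


def ideal_area_scale(old_nm, new_nm):
    """Idealized area ratio when the same design moves to a new process.

    Assumes area scales with the square of the feature size, which is
    only a first-order approximation of real shrinks.
    """
    return (new_nm / old_nm) ** 2


low, high = (chip_total(c) for c in CORE_970MP_RANGE)
print(f"970MP estimate: {low}-{high} million (POWER4: {POWER4_MTRANSISTORS} million)")

# Under ideal scaling each step roughly halves the area per transistor,
# so doubling the core count at each step keeps the die about the same size.
print(f"130nm -> 90nm area factor: {ideal_area_scale(130, 90):.2f}")
print(f"90nm -> 65nm area factor:  {ideal_area_scale(90, 65):.2f}")
```

Both shrink factors come out close to one half, which is the reasoning behind "twice the cores for about the same die area" at each process step.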

Quote:

Posted by Henroik:

How does SMT work? I understand that it'll enhance the performance of a processor by around 30% or so by doing two threads simultaneously. These two threads... is one thread getting ~100% and the other ~30%, or is it that each thread gets ~65%?

If I was doing some heavy single-threaded stuff, I wouldn't like it to be stuck with only 30% or 65%. Will there be a way to disable SMT on a per-thread basis?

The first thing you have to understand in order to understand SMT is that modern processors are quite often idle, even when running flat out. This is because they are waiting for things and they have many long pipelines. Imagine the machine as a 2D grid with execution units along one axis, and pipeline stages along another. Each of the boxes in the grid holds an instruction that is part way through its execution. In some boxes it needs information to be obtained from somewhere else (a register, the cache, another instruction, etc). If that information is not available then it cannot advance to the next stage in the pipeline and as the clock ticks forward an empty box in the grid appears -- a "bubble". This bubble represents a little bit of inactivity. Imagine what the grid would look like if 50% of the time each instruction got held up... the whole grid would have bubbles strewn about, with less than half the boxes holding actual instructions. Each of the empty boxes represents work that could have potentially been done, but failed to happen due to something the thread needed to wait for.

The idea behind SMT is to introduce an additional thread (or threads in more extreme designs) which share this grid with the first thread, but have their own work to do and their own registers to hold their information. If this thread had its own processor then it would have its own grid, but instead it shares the grid with the first thread. The second thread's filled little boxes then get intermingled into the first thread's empty boxes, and in a perfect world the whole grid is filled up. In reality the grid is more full if both threads are inefficient, or overfull if both threads are too efficient. But the goal of increasing the net amount of work done by the available execution units is achieved.
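To make the "filling the bubbles" idea concrete, here is a deliberately crude toy model in Python. It is not how any real processor works: it assumes a single issue slot per cycle and that each thread is stalled independently with a fixed probability, both of which are simplifications invented for this sketch. The function name `slot_utilization` is hypothetical.

```python
import random


def slot_utilization(stall_prob, threads=1, cycles=10_000, seed=0):
    """Fraction of cycles in which the single issue slot does useful work.

    Toy assumptions: one slot per cycle, and each thread is stalled in a
    given cycle with probability stall_prob, independently of the others.
    The slot is filled if at least one thread is ready.
    """
    rng = random.Random(seed)
    filled = 0
    for _ in range(cycles):
        ready = [rng.random() >= stall_prob for _ in range(threads)]
        if any(ready):
            filled += 1
    return filled / cycles


one = slot_utilization(0.5, threads=1)
two = slot_utilization(0.5, threads=2)
print(f"one thread:  {one:.0%} of slots used")
print(f"two threads: {two:.0%} of slots used")

# A thread that never stalls leaves no bubbles for a second thread to fill.
print(f"no stalls:   {slot_utilization(0.0, threads=1):.0%}")
```

With a 50% stall rate, one thread fills about half the slots and two threads fill about three quarters, since a bubble only remains when both are stalled at once. When the first thread never stalls, a second thread has nothing to fill, which matches the "too efficient" case described above.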

How the two threads interact is a function of the processor hardware. In the IBM POWER5 each thread is assigned a number from 0-31, and an instruction group (1-5 instructions) is dispatched from one of the two threads each cycle depending on the relative priority numbers of the threads. If they have equal numbers they trade back and forth. If one has double the number of the other, then it dispatches two groups for each of the other's. By setting a thread to the maximum value it will get all of the instruction groups, starving the other. If a thread's turn to dispatch instruction groups comes up, but it can't because it is stalled, then the other gets an extra kick at the can. [note: this is actually just a very rough approximation of the actual scheme implemented].
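The rough scheme described above can be sketched as a small weighted scheduler. This Python toy follows only the post's approximation (turns shared in proportion to the priority numbers, with a stalled thread giving up its turn to the other one); it is not the actual POWER5 dispatch logic. The credit-based balancing, the function name `dispatch` and the `is_stalled` callback are all choices made for this illustration.

```python
def dispatch(prio_a, prio_b, cycles, is_stalled=lambda thread, cycle: False):
    """Return which thread ('A', 'B' or '-') dispatches a group each cycle.

    Turns are shared in proportion to the two priorities using a simple
    credit counter. If the thread whose turn it is happens to be stalled,
    the other thread gets the slot instead; if both are stalled, the slot
    is wasted ('-').
    """
    if prio_a <= 0 or prio_b <= 0:
        raise ValueError("this sketch only handles positive priorities")
    prio = {"A": prio_a, "B": prio_b}
    credit = {"A": 0, "B": 0}
    total = prio_a + prio_b
    issued = []
    for cycle in range(cycles):
        for thread in prio:
            credit[thread] += prio[thread]
        turn = "A" if credit["A"] >= credit["B"] else "B"
        credit[turn] -= total          # the turn is used up even if stalled
        other = "B" if turn == "A" else "A"
        if not is_stalled(turn, cycle):
            issued.append(turn)
        elif not is_stalled(other, cycle):
            issued.append(other)       # the "extra kick at the can"
        else:
            issued.append("-")
    return issued


print("".join(dispatch(1, 1, 4)))   # equal priorities alternate: ABAB
print("".join(dispatch(2, 1, 6)))   # A gets two groups for each of B's: ABAABA
print("".join(dispatch(2, 1, 6, is_stalled=lambda t, c: t == "A")))  # BBBBBB
```

The sketch leaves out the special case where the maximum priority starves the other thread entirely, and it ignores how many instructions are in each group.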

How does all this work out in practice? That depends on how the software and hardware interact. If your multi-threaded software spends most of its time waiting for memory to arrive in the cache and then be transferred to a register, then it is entirely possible that the POWER5 SMT will double the speed of your application. If you are completely computationally bound and you have no stalls in your code at all (very rare), then running a second thread will only slow down your computationally bound one. Fortunately, as I stated above, IBM's SMT lets a software author tell the hardware how much priority his thread should get. There are enough diagnostics in these processors for the OS to monitor a thread's behaviour and adjust its priority automagically in cases where the author hasn't explicitly done so.

Tasks switching processors is usually a bad thing because, as you observed, it kills the caches. This might not matter in some cases because the cache would get blown by the other task being timesliced in, but some OSes implement "processor affinity" to reduce the problem. IIRC, Darwin implements some level of this. SMT somewhat reduces the problem because the caches are at least partially shared between the threads of the same core. Multi-core designs reduce the problem a little as well since communication between the cores is usually at the chip's clock rate and they might share cache as well.

hmurchison · July 30, 2004 9:43PM

Rockin' post Programmer. You get the "Geek Award" of the day!

I forgot about Hannibal's Ars article on the POWER5.

But reading it now, it makes perfect sense. What I got from the article was:

1. SMT will hit you up for as much as 24% more transistor space. With the 970MP already at 154mm squared, that ain't happening.

2. The 970x CPUs have scenarios where one execution unit is full of data while the other sits starved and idle. The 970MP might be able to allay this issue with code cracking but I'm not sure.

3. The POWER5 ability to queue 10 instructions while dispatching 2 per cycle(double the POWER4) is really cool and I'd love to see the "980" support the same.

4. The thread prioritization of the POWER5 SMT sounds great from a programming POV. I like the idea of some threads being set to high priority while others can be set for very little or possibly none. I could see multimedia benefiting from this.

All in all it looks like we'll ride the 970MP into Q1 2006 where soon after IBM will drop the 65nm G6 on us. I'd love to see on-die memory controllers and nice phat Hypertransport 2 links between the 2 980MP socketed CPUs.

onlooker · July 30, 2004 11:04PM

Quote:

Originally posted by Programmer

The number of sockets on the motherboard isn't particularly important. A quad socket board will typically be less efficient than a single socket quad core machine. It will also be more expensive. So why bother talking about the number of sockets/chips, when what we really care about is the number of processing cores?

As has been stated a thousand times in numerous threads. Repeating it isn't going to change the reasons we will still need dual-socket motherboards after the chips become dual core.

hmurchison · July 30, 2004 11:08PM

Quote:

Originally posted by onlooker

As has been stated a thousand times in numerous threads. Repeating it isn't going to change the reasons we will still need dual-socket motherboards after the chips become dual core.

Onlooker I think he means more than 2 sockets. I definitely don't think Apple is moving to strictly 1 socket. But 4 sockets is tough to make affordable for anyone.

onlooker · July 30, 2004 11:19PM

Quote:

Originally posted by hmurchison

Onlooker I think he means more than 2 sockets. I definitely don't think Apple is moving to strictly 1 socket. But 4 sockets is tough to make affordable for anyone.

That's true, and I agree totally. Apple has never had a Quad board before, so why would they start now?

I'm interested in seeing what happens when Tyan releases a quad-socket board for AMD's 4-core processors. If Apple has 2-socket motherboards, and Tyan decides to market it as a killer 3D board, this will be the equivalent of having a 12-processor advantage.

programmer · July 30, 2004 11:43PM

Quote:

Originally posted by hmurchison

Rockin' post Programmer. You get the "Geek Award" of the day!

<bow>

Quote:

2. The 970x CPUs have scenarios where one execution unit is full of data while the other sits starved and idle. The 970MP might be able to allay this issue with code cracking but I'm not sure.

The POWER4, 970, 970FX, 970MP, and POWER5 all have the same code cracking mechanism. Indeed, they are all pretty much the same architecture, with the POWER5 just being an enhanced version with SMT added. The reason the 970MP doesn't have SMT is most likely because they want a low-risk upgrade to the 970FX.

Quote:

3. The POWER5 ability to queue 10 instructions while dispatching 2 per cycle(double the POWER4) is really cool and I'd love to see the "980" support the same.

If 980 == POWER5-lite, it probably will. Note that the queuing is improved, but the dispatch is still just 1 group of up to 5 instructions per clock.

Quote:

All in all it looks like we'll ride the 970MP into Q1 2006 where soon after IBM will drop the 65nm G6 on us. I'd love to see on-die memory controllers and nice phat Hypertransport 2 links between the 2 980MP socketed CPUs.

The simple nature of the 970MP design leaves me wondering if we'll see a single core 90nm POWER5-lite fairly soon after the 970MP appears, with a dual core version after it goes to 65nm (i.e. 980, 980FX, 980MP).

I'm not convinced we'll see the on-chip memory controller despite it being in the POWER5. Apple might want to keep control of that part of the system for a while longer. I suspect the existing FSB will continue to exist and scale, as opposed to IBM adopting HT directly into the processor.

hmurchison · July 30, 2004 11:44PM

Quote:

Originally posted by onlooker

That's true, and I agree totally. Apple has never had a Quad board before, so why would they start now?

I'm interested in seeing what happens when Tyan releases a quad-socket board for AMD's 4-core processors. If Apple has 2-socket motherboards, and Tyan decides to market it as a killer 3D board, this will be the equivalent of having a 12-processor advantage.

Good question. My guess would be

$1500 Quad Tyan Mobo

would be matched by 2 Macs running Xgrid.
