Quad processors ?

lowb-ing · May 18, 2002 3:55PM

[quote]Originally posted by Scott F.:



In a perfect world... "just adding soundproofing" would be fine... but in the real world... EVERYTHING is a tradeoff. To add ANY type of soundproofing is almost the equivalent to "insulating" the unit... the more "soundproofing" you do... the harder it will be for the enclosure to dissipate the internal heat OUT of the unit... unless you get into solutions that allow for multi-layer skins... and now we're talking added cost, added weight and added complexity to the unit for MINIMAL returns.

I'm not saying it "can't" be done... I'm just putting out the caveat that just "adding" a layer of sound absorption in the amounts to have any noticeable effect on the decible level WILL impact other aspects of the machine adversely... that's all.<hr></blockquote>

I'm not talking about just adding soundproofing (thats what some of the aftermarket vendors do, though), but rather designing the case from scratch with soundproofing and massproduction in mind. I'm aware of the difficulties (it'd ad weight and you'd need a more powerful fan, just for starters), but if someone could pull it off, it would be apple.

some modern soundprofing materials are quite amazing btw. I've seen 1/3" thick studio doors made out of some clear (and really heavy) plexiglass like material that made a marshall stack att full volume sound really quiet! not saying it would be suitable for a case nessecarilly (and this was a big commercial studio, so who knows how expensive those doors are), just an example.

thanx for the insightful reply, though

[ 05-18-2002: Message edited by: LowB-ing ]

gamblor · May 18, 2002 4:31PM

[quote] Symmetric means the processors are all the same kind. Multi-Processing means there are multiple processors. In what way is a RapidIO machine with more than one identical processor not an SMP machine?

<hr></blockquote>

Well, RapidIO doesn't specify that the processors have to be identical... In fact, they don't even have to be the same architecture-- Why not have different processors for different purposes-- say, an 8540 for networking, something for nVidia for graphics, another 8540 (or something else) for secondary storage, and a couple of 7470's (or whatever) for everything else.

[quote] NUMA architectures are unlikely to be the wave of the future (it's hard to architect an OS that actually runs well on such designs, I would say that the old NeXTdimension system was one of the exceptions, but that is an "easy" application). <hr></blockquote>

With Quartz Extreme, Apple is already offloading a whole bunch of stuff onto the GPU that otherwise has to be handled by the CPU. Is it all that outlandish to think that someday the entire graphics subsystem will be handled by the GPU?

It doesn't seem to be that far of a stretch to think that one day, the TCP/IP stack would be handled by a separate chip. Ditto filesystems, and I'm sure you guys can think of a few other things that may be offloaded to a dedicated processor.

I'm certainly not suggesting that this will happen overnight, but in a few years (perhaps more), I think it will be the norm.

programmer · May 18, 2002 9:01PM

The problem with having many different dedicated processors is that you don't have much flexibility in how you divide up your tasks. If you aren't doing any graphics, for example, then having a big expensive graphics chip doesn't do you any good. Graphics is a bad example since pretty much everybody uses graphics in the form of a GUI... but you get my point. General purpose processors are better because they are more flexible and can be applied to the task at hand.

Having some processors with some capabilities and some without is really just making life difficult for the OS, so I don't think having a machine with some AltiVec enabled processors and some non-enabled ones is a good idea. Clock rate doesn't really matter from this point of view, so I'd still consider a machine with multiple G4s at different speeds an SMP machine. Giving each machine its own memory controller and sharing a RapidIO-like bus can be handled in the OS... it may just require a little unconventional thinking. Virtual memory paging between processor memories across the RapidIO bus using a DMA engine, for example.

In the future I think we'll see one or more main processors, plus one or more parallel compute engines. The current graphics chips are specialized parallel compute engines, but the clear trend is for them to become much more general purpose and in the future I fully expect to see graphics chips turn into general purposed "parallelized" processors... optimized for running the same algorithm across huge sets of data in a highly parallel fashion. Currently they are too bound to the video system to be effectively used this way, but it is changing rapidly... with the DX9 level hardware we may start to see them used for things other than graphics.

[ 05-18-2002: Message edited by: Programmer ]

luca · May 18, 2002 9:31PM

Instead of using soundproofing materials, why not go by the same concept as the Bose noise canceling headphones and have little microphones next to the fan and HD, and then have the speaker automatically transmit equal opposite frequencies from those. This would cause the noise and the "anti-noise" to align each wave peak with the other's wave trough, canecling the noise without adding excess weight. Hopefully the cost on that sort of thing will go down also (it probably already costs less than modern soundproofing materials).

gamblor · May 19, 2002 1:16AM

[quote] The problem with having many different dedicated processors is that you don't have much flexibility in how you divide up your tasks.

<hr></blockquote>

I'm not sure what your point is here... For example, in the system outlined above, you'd have a processor dedicated to network tasks that should have enough power to keep a gig enet pipe stuffed (which as far as the network subsystem is concerned, is where the bottleneck is). Why would you want to further subdivide the tasks? Under what circumstance would you want to offload networking tasks from a processor dedicated to that purpose?

[quote]

Graphics is a bad example since pretty much everybody uses graphics in the form of a GUI... but you get my point.

<hr></blockquote>

Thank you for supporting my point.

[quote]

General purpose processors are better because they are more flexible and can be applied to the task at hand. <hr></blockquote>

You seem to be approaching the problem from a "zero-sum" standpoint-- that there can only be so much processing power in a box at a time. What I'm proposing is a system where the simple controllers used for networking, secondary storage, and other I/O be replaced with embedded microcontrollers that can execute non-trivial code (for example, current ethernet controllers that can only handle very basic enet packet processing replaced with embedded microcontrollers that can handle the entire TCP/IP stack). These controllers would be in addition to the general purpose processors already in the machine; not in place of them.

At some point (probably pretty quickly-- two or three years?) the cost of powerful embedded processors are going to reach the point where such systems are economically feasible in high end desktop systems and workstations. I'd simply like Apple to be ahead of the curve with this.

[quote] Having some processors with some capabilities and some without is really just making life difficult for the OS, so I don't think having a machine with some AltiVec enabled processors and some non-enabled ones is a good idea. <hr></blockquote>

I'm not proposing that these processors be available for general user-land computing; rather, they'd be used for specific system functions. There shouldn't be any need for analyzing userland code to determine if it needs execution units that are available on one set of processors in the system; such code would only be executed on the "main" CPUs, which should all be identical (in a traditional "SMP" configuration, if you wish).

[quote] In the future I think we'll see one or more main processors, plus one or more parallel compute engines. The current graphics chips are specialized parallel compute engines, but the clear trend is for them to become much more general purpose and in the future I fully expect to see graphics chips turn into general purposed "parallelized" processors... <hr></blockquote>

Well, to be quite honest, I don't really see this trend, I see PC hardware moving towards what has been available on SGI, Sun, HP, and probably IBM graphics workstation hardware for a while-- dedicated OpenGL hardware that is able to handle all of the OpenGL code, not just a subset of it. That's a far cry from a general purpose parallel processor.

amorph · May 19, 2002 1:45AM

Specialty ASICs/processors that applications have to be aware of (like the DSPs in the old AV series) tend to be doomed, but I can see some uses for chips that have a known use and a big OS layer hiding them. (Quartz Extreme actually has two: Quartz itself, and OpenGL.) So think about a specialized chip handling tasks that no application needs to know about, but which aggressively impact system performance. For example, imagine something like a chip that was little more than a PPC integer/addressing core whose job was bus arbitration; whether maintaining coherence between multiple processors on multiple busses or (more mundanely) fielding ATA bus requests (as well as those of any other bus that requires CPU intervention), so that no matter how heavily those busses were getting used, the G4s would never have to put down their work to arbitrate them. This would eliminate one of the last disadvantages of ATA relative to SCSI, for one thing (command queueing is probably too much to ask). This processor would not have to be expensive or powerful, it would have guaranteed work to do even on a single processor machine, and no userland process would have to know that it exists because it only handles system-level issues. Changing the OS to use it wouldn't be that hard, because the bus requests are still being handled by a PowerPC. It's just a matter of determining which PowerPC.

I don't know just how radical Apple feels like getting, but a few tactically chosen specialty processors could do wonders. Particularly, anything that can keep the main CPUs well fed and concentrated on crunching user applications will only help.

[ 05-19-2002: Message edited by: Amorph ]

xype · May 19, 2002 2:13AM

[quote]Originally posted by Gamblor:

Well, to be quite honest, I don't really see this trend, I see PC hardware moving towards what has been available on SGI, Sun, HP, and probably IBM graphics workstation hardware for a while-- dedicated OpenGL hardware that is able to handle all of the OpenGL code, not just a subset of it. That's a far cry from a general purpose parallel processor.<hr></blockquote>

Actually the DX9 and OpenGL 2.0 specification are both aiming at "programmable" graphic chips so the OpenGL-by-functions as we have known it form 1.0 on is going away and is going to be replaced by "tell me what you need computed and I'll do it for you" OpenGL. Many in the tech business are predicting NVidia/ATI/3DLabs to actually endanger the position of the CPU, since in theory it's possible to have the GPU do more than simple 3D stuff for you. If and how developers will take advantage of the GPUs power remains to be seen.

the bishop · May 19, 2002 5:31AM

[quote]Originally posted by Gamblor:



With Quartz Extreme, Apple is already offloading a whole bunch of stuff onto the GPU that otherwise has to be handled by the CPU. Is it all that outlandish to think that someday the entire graphics subsystem will be handled by the GPU?

<hr></blockquote>

Certainly not, that is what the NeXTdimension did 10 years ago. To a certain extent, the Quartz Extreme architecture is the NeXTdimension reborn 10 years later...

For those who aren't familiar with it, it was an early 90:s graphics subsystem for the NeXTcube which included a fast Risc processor and a bunch of RAM local to the card. The Windowserver could run on the graphics card and all drawing and compositing was done there making it able to drive 32bit RGBA at high resolution, which was quite good at that point in time.

In other words, the entire graphics subsystem was offloaded to the GPU which had its own memory (in a sense, a MP/NUMA machine where one CPU was dedicated for runnings graphics tasks!).

programmer · May 19, 2002 10:09AM

[quote]Originally posted by Gamblor:



I'm not sure what your point is here... For example, in the system outlined above, you'd have a processor dedicated to network tasks that should have enough power to keep a gig enet pipe stuffed (which as far as the network subsystem is concerned, is where the bottleneck is). Why would you want to further subdivide the tasks? Under what circumstance would you want to offload networking tasks from a processor dedicated to that purpose?

<hr></blockquote>

Okay, it depends on how large a piece you are talking about handing off. If you really just want to implement IP on the Ethernet chip, that is pretty trivial and rather than having another processor doing it I think we'll just see a smarter Ethernet chip that has a small processor built into it (which is hopefully flashable to fix bugs and support IPv6). Other devices are already that way to a large extent -- they handle the full protocol "in hardware" and the main processor just tells them where to get/put the data. This is called DMA. Even IDE (originally purely CPU dependent) has been moving that way for a long time.

 [quote]You seem to be approaching the problem from a "zero-sum" standpoint-- that there can only be so much processing power in a box at a time. What I'm proposing is a system where the simple controllers used for networking, secondary storage, and other I/O be replaced with embedded microcontrollers that can execute non-trivial code (for example, current ethernet controllers that can only handle very basic enet packet processing replaced with embedded microcontrollers that can handle the entire TCP/IP stack). These controllers would be in addition to the general purpose processors already in the machine; not in place of them.

At some point (probably pretty quickly-- two or three years?) the cost of powerful embedded processors are going to reach the point where such systems are economically feasible in high end desktop systems and workstations. I'd simply like Apple to be ahead of the curve with this.

<hr></blockquote>

Now that I understand what you were saying, I don't disagree with that... I just don't agree that it is really MP, it is just a smarter I/O system. To a large extent this already exists in current Macs... which have decent DMA subsystems and hardware to handle many of the protocols.

Unless the developers write code for it, however, I don't really consider it a multi-processor system. The Quadra 840av was an asymmetric multiprocessor, for example. The DSP in that machine couldn't run the same code as the '040 so developers had to code for it specifically... and didn't. It didn't offer enough advantage, it was hard to code for, and what it could do was severely limited. Putting multiple different application processors in a machine and telling developers to code for it just makes developers walk away from it... its typically not portable and breaks when you come out with a new machine.

Hardware design is a "zero-sum problem". The designers have a fixed price point they are aiming at, fixed space considerations, etc etc. With a given set of limitations they need to determine the most effective design that delivers the most powerful and flexible computer.

 [quote]

Well, to be quite honest, I don't really see this trend, I see PC hardware moving towards what has been available on SGI, Sun, HP, and probably IBM graphics workstation hardware for a while-- dedicated OpenGL hardware that is able to handle all of the OpenGL code, not just a subset of it. That's a far cry from a general purpose parallel processor. <hr></blockquote>

You obviously haven't been paying attention then.

The advantage that the new graphics hardware has is that it introduces (to the desktop anyhow) a whole new model of computing (pixel and vertex based) that can actually be applied on a much wider scope, and which is designed from conception to be extremely highly parallel. Conventional processors can only execute linear code so fast (as limited by current technology), and to do things in parallel you need to write threaded code and handle all of the management that that entails... and then they all get in eachother's way. These new graphics processors are designed from the ground up to do smaller tasks across a huge data set and handle the interaction, pipelining and memory access.

It may take a couple of years (or maybe not) but I think we'll start to see some very unconventional uses for these things. The main processor(s) is still going to be important to "run the show", but real computation may very well be done on the GPU. Apple seems to be getting closer to having leading edge GPUs in their machines, so they ought to be able to retain parity with the PC world in this department and perhaps the speed of the main processor will become less of an issue than it is currently. The nice thing from a developers point of view is that the industry is coming up with standard ways to program these things (OGL2 & DX9) so it usually doesn't matter exactly which board you have in your machine as long as it meets your code's minimum spec. At some point all these boards will meet your spec, they'll just run at different rates. Since every machine has a graphics engine, eventually developers will be able to count on it being there.

xype · May 19, 2002 12:24PM

[quote]Originally posted by Programmer:

At some point all these boards will meet your spec, they'll just run at different rates. Since every machine has a graphics engine, eventually developers will be able to count on it being there.<hr></blockquote>

This is where Apple will have quite an advantage over x86 PCs in my opinion. Apple can simply decide to put, say, a GeForce4+ class card in all new machines and say "from now on, the GPU is our b*tch!". The Wintella world however will be stuck with a whole lot of non ogl2 compilant GPUs before they become standard and even if dx9/ogl2 GPUs are there one will has as diverse platforms as 3DLabs', NVidias, ATIs and Matrox' GPUs (plus the many GPUs of integrated chipsets). It will be interesting to see how it develops and I wonder whether AltiVec could be replaced by GPUs...

gamblor · May 19, 2002 5:05PM

[quote]If you really just want to implement IP on the Ethernet chip, that is pretty trivial and rather than having another processor doing it I think we'll just see a smarter Ethernet chip that has a small processor built into it (which is hopefully flashable to fix bugs and support IPv6). <hr></blockquote>

I'm talking about implementing TCP, UDP, CIFS, WEBDAV, HTTP, SMTP, SNMP, FTP, DHCP, and a host of other acronyms that I can't think of right now

, along with protocols that haven't been developed yet.

[quote]Unless the developers write code for it, however, I don't really consider it a multi-processor system. The Quadra 840av was an asymmetric multiprocessor, for example. The DSP in that machine couldn't run the same code as the '040 so developers had to code for it specifically... and didn't. It didn't offer enough advantage, it was hard to code for, and what it could do was severely limited. Putting multiple different application processors in a machine and telling developers to code for it just makes developers walk away from it... its typically not portable and breaks when you come out with a new machine.<hr></blockquote>

Well, coders may write code for them, but in the case of an 8540, it's just a PPC... I wouldn't imagine it would be any more difficult to code for than a typical device driver. You might have to take into account syncing with the CPU, but that's a known quantity. That's the big benefit to this type of system-- you wouldn't have to learn assembly language for some wierd processor-- it would just be a matter of setting switches in a standard compiler.

[quote]Hardware design is a "zero-sum problem". The designers have a fixed price point they are aiming at, fixed space considerations, etc etc. With a given set of limitations they need to determine the most effective design that delivers the most powerful and flexible computer.<hr></blockquote>

Yeah, the basic idea is that these powerful embedded procs will get cheap enough that using one in place of a single purpose custom ASIC is economically viable. I'd imagine space & power considerations would still be a concern. That may be lessened if Moto were to produce an 8540 with, say, 64 or 128 MB of ram on chip.

[quote]The advantage that the new graphics hardware has is that it introduces (to the desktop anyhow) a whole new model of computing (pixel and vertex based) that can actually be applied on a much wider scope, and which is designed from conception to be extremely highly parallel. Conventional processors can only execute linear code so fast (as limited by current technology), and to do things in parallel you need to write threaded code and handle all of the management that that entails... and then they all get in eachother's way. These new graphics processors are designed from the ground up to do smaller tasks across a huge data set and handle the interaction, pipelining and memory access.<hr></blockquote>

Graphics is probably a bad example, because it's got higher performance requirements than just about anything else in the system, and is served quite well by processors designed specifically for it. That's the problem-- GPUs are designed for executing a 3D graphics pipeline as quickly as possible. If you're dealing with a problem that can be expressed in terms that can be evaluated by the tools provided by the processor, then it's gangbusters for ya. Otherwise, it's worthless. (The GPU has an execution unit for solving matrices! Yay! We need to solve arbitrarily-sized matrices, and it can only solve 4x4 matrices. Boo!)

Unless someone comes up with a major breakthrough in programming languages, I just don't see widespread adoption of parallel units happening any time soon. How long have we had these things around now? Since 97/98 (for MMX)? And programmers still avoid them like the plague. Unless someone writes a programming language that is easy to learn & use and can build SIMD code, I just don't see it happening. Hell, even with an easy-to-understand macro code, programmers avoid Altivec like it has horns.

[ 05-19-2002: Message edited by: Gamblor ]

programmer · May 20, 2002 12:50AM

[quote]Originally posted by Gamblor:

Graphics is probably a bad example, because it's got higher performance requirements than just about anything else in the system, and is served quite well by processors designed specifically for it. That's the problem-- GPUs are designed for executing a 3D graphics pipeline as quickly as possible. If you're dealing with a problem that can be expressed in terms that can be evaluated by the tools provided by the processor, then it's gangbusters for ya. Otherwise, it's worthless. (The GPU has an execution unit for solving matrices! Yay! We need to solve arbitrarily-sized matrices, and it can only solve 4x4 matrices. Boo!)

Unless someone comes up with a major breakthrough in programming languages, I just don't see widespread adoption of parallel units happening any time soon. How long have we had these things around now? Since 97/98 (for MMX)? And programmers still avoid them like the plague. Unless someone writes a programming language that is easy to learn & use and can build SIMD code, I just don't see it happening. Hell, even with an easy-to-understand macro code, programmers avoid Altivec like it has horns.

<hr></blockquote>

Well we use 'em, I can't speak to why the other weenies aren't.

Same goes for the programmable graphics engine and the future potential non-graphical uses for them. MMX, SSE, 3DNow! all pretty much suck. SSE2 doesn't suck as much, but its a pain in the butt to use and only recently has its installed base grown to the point where its worth coding for. AltiVec is definitely the most usable SIMD unit I've seen, but its market is quite small as well -- many developers are hard pressed to do a Mac port, coding specifically for a vector unit only on one platform just isn't worth it. This is one area where the new OGL2/DX9 specs are really interesting... they provide a hardware independant way of coding for these vector engnies, and compile to the native format at runtime.

The graphics engines and SIMD units operate on vectors of 4 floats (typically), so if you can figure out how to work in those units then you're in business. There are lots of problems that don't apply to this kind of a compute engine (for example: ones which don't require a lot of computation!)... but there are lots and lots of problems that do, including many surprising ones where you just wouldn't have thought it applicable.

And as for implementing the higher level Internet protocols on specialized processors... why bother? You may as well have another general purpose SMP processor that can share in all the system's tasks uniformly, even when no Internet protocols are in use. If you are doing a lot of Internet work, then all the SMP processors can share in the network duties. And some of those protocols are pretty expensive so to perform adequately the specialized processor would have to be almost a general purpose one anyhow.

razzfazz · May 20, 2002 5:11AM

[quote]Originally posted by Gamblor:



I'm talking about implementing TCP, UDP, CIFS, WEBDAV, HTTP, SMTP, SNMP, FTP, DHCP, and a host of other acronyms that I can't think of right now , along with protocols that haven't been developed yet.

<hr></blockquote>

Offloading IP, TCP and UDP would make sense, the client part of DHCP might, too, and probably IPsec and such. This could all happen inside the OS, invisible to the outside world.

The rest is purely userland stuff, though, and thus couldn't sensibly be offloaded to an ASIC, 'cos applications would have to be explicitely written to do that. In fact, I don't think it would be necessarily worth the effort either, 'cos the real CPU-intensive stuff is the TCP/IP stack. Once you offload that, the protocols that lie on top of it don't really eat that much processor power.

Bye,

RazzFazz

Quad processors ?

Comments