TOE NIC's In All Future PowerMacs?
OverToasty
01-25-2005, 09:38 PM
Try this on.
Given that Apple is doing way-snazzy things in the realm of Video, Audio and Networks (I'm personally still happily freaked over Logic Nodes) - with the X-SAN stuff, and no doubt, future "plug-and-play" (definitely for lack of a better term!) distributed processing integration with most future versions of their multi-media apps ...
... there seems to be one big thing missing from the mix, hardware wise, and that seems to be TCP/IP offload NIC's on the mobo of the actual PowerMac's themselves - rather than forcing folks who wish to go with a SAN to buy and install them separately after the fact. I realize TOE NIC's are generally thought of only in the realm of SAN access, but I suspect their benefit may be felt in distributed processing too.
I especially think Apple could be the first company to successfully bring this hi-end workstation technology down to the ease of use of virtually any moderately skilled user. Which, my guess says, would be a big bonus in the direction of Mac's in the hi end and mid multimedia world - particularly if Apple can offer a simple "plug-and-play" SAN option (software AND hardware), especially to the medium range video market. (pick up a few X-RAIDs, an X-Serve, X-SAN and a few new PowerMacs with FCP - boom, you're in business, without ever opening a single case, just like folks who had to pay 10x as much before, but without anywhere near the hassle - and, it's really easy to administrate).
I also suspect (with my limited knowledge on this), that sticking TOE NIC's on the mobo in future PowerMac's may beef up much of the distributed processing Apple almost certainly has coming down the pipe in their apps - much as they currently do with Logic Nodes. The advantage here is, you never need even think of a SAN: as long as you have other Mac's on the network operating as CPU nodes, you can borrow their unused cycles to greatly enhance your productivity over a standard NIC, SAN X-RAID/SERVE hardware or not.
Also Apple, to the best of my knowledge, was either the first, or one of the first, to put Gig Ethernet on the Mobo on their machines - so they have a bit of a tradition of being ahead of the curve in that manner.
Anyway, it's a thought, with a huge upside - I guess the only consideration is cost and heat - would it cost an arm and a leg to put TOE NIC's on the Mobo of all future PowerMacs? If not, and if heat isn't much of an issue (these things are basically small parallel processors), then I would not be at all surprised if Uncle Steve boasts Apple as being the only computer company having a complete distributed processing SAN solution, where the end user never needs to crack a case, which doesn't require a trained technician on site to maintain, and can leverage their TOE NIC's for all kinds of distributed processing beyond just SAN use alone.
If nothing else, it'll help intelligent, self-aware creative types (as Mac users generally are) get more content on the air, so that things such as this (http://www.wiredvideo.com/clips/av9/mark-mathis-1.wmv) can be avoided in the future.
Thoughts?
Rhumgod
01-25-2005, 10:09 PM
Originally posted by OverToasty
Thoughts?
TOE NICs only benefit high-bandwidth, sustained connections. A SAN storage server perhaps, but a PowerMac? I don't see the benefit for the end user.
For those not too informed on the matter, follow this link (http://limnos.csrd.uiuc.edu/notes/linux-networking/toe.html).
OverToasty
01-25-2005, 11:25 PM
Originally posted by Rhumgod
TOE NICs only benefit high-bandwidth, sustained connections. A SAN storage server perhaps, but a PowerMac? I don't see the benefit for the end user.
For those not too informed on the matter, follow this link (http://limnos.csrd.uiuc.edu/notes/linux-networking/toe.html).
I guess the issue I'm getting at above is, even if you have Gigabit ethernet on your Mac, you can't edit video over it - since your Mac is spending too much time dealing with the IP stack to properly handle cycle-hungry video at the same time; to get around this, folks typically have to buy a separate PCI TOE card to edit video over a SAN. Pain in the butt.
This kinda defeats a lot of the ease of use, especially when Apple went to so much trouble to create user friendly X-SAN - it also means users have to go to the trouble of paying the money after the fact to reap the benefit, which they probably won't, since all this cycle sharing won't be something they'll get into unless it's made very easy, and the benefits are clearly made for developers and users alike.
So, stick TOE cards on all new PowerMac's (hoping it can be done cheap enough to make it worth while), and reap the benefits of this - people will just think "Oh, Mac's can do that?" - SAN and other hi-bandwidth side benefits too - much as Logic Node currently does - but pushing far more over the pipe.
All at the cost of a mobo TOE NIC.
It's a winner if they can do it cheaply enough.
mmmpie
01-25-2005, 11:39 PM
Originally posted by OverToasty
folks typically have to buy a separate PCI TOE card to edit video over a SAN.
Aren't SANs typically defined by running over a medium other than IP/Ethernet? Isn't that the world of Fibre Channel and Infiniband? Is the Xserve RAID doing IP over FC, or a custom SAN protocol?
I thought that using IP/Ethernet you were running a NAS. I certainly wouldn't consider a NAS an appropriate solution for high bandwidth tasks like video editing. I can see how putting a NAS on its own subnet/vlan/cable with a TOE NIC would improve the situation. Is that cheaper than just using a SAN?
It seems likely to me that, as the computing environment improves, SANs will be replaced by NASes, and that in the future you will see TOE NICs showing up in all machines. I also think that that is some time away. But the ubiquity of IP/ethernet always seems to overcome its limitations.
Programmer
01-25-2005, 11:44 PM
POWER5 has FastPath technology, one part of which was handling of the TCP stack without requiring processor interrupts. That gets you most of the benefit of a TOE without requiring an extra part, and (best of all) it runs at processor clock rate and uses the processor's fast memory subsystem and cache resources.
Rhumgod
01-26-2005, 09:16 AM
I also remember reading somewhere that Altivec can handle this as well. Not sure how effectively, however. I think it was in their Developer section.
OverToasty
01-26-2005, 09:22 AM
Originally posted by mmmpie
Aren't SANs typically defined by running over a medium other than IP/Ethernet? Isn't that the world of Fibre Channel and Infiniband? Is the Xserve RAID doing IP over FC, or a custom SAN protocol?
I thought that using IP/Ethernet you were running a NAS. I certainly wouldn't consider a NAS an appropriate solution for high bandwidth tasks like video editing. I can see how putting a NAS on its own subnet/vlan/cable with a TOE NIC would improve the situation. Is that cheaper than just using a SAN?
It seems likely to me that, as the computing environment improves, SANs will be replaced by NASes, and that in the future you will see TOE NICs showing up in all machines. I also think that that is some time away. But the ubiquity of IP/ethernet always seems to overcome its limitations.
Yeup, you're correct - Fibre Channel is the way it's done now, and perhaps eventually it'll work its way over to NAS ... but I wasn't so concerned with the server end of things, nice if it comes along for the inexpensive ride too (thanks for the clarification though) ... I was more concerned with the workstation, and getting the most out of it vis a vis piping huge amounts of data over IP, while doing lots of heavy internal processing at the same time. I can see a need for this as two worlds appear to be converging - Network Storage & Cycle Sharing - storage, to greatly help workgroup workflows, such as in the video realm, where keeping all the video data on a single local machine is just silly ... and cycle sharing across machines, which I suspect is going to greatly grow in importance, especially for Apple, since they can leverage this so easily - which BTW, is particularly helpful in the media realm (think Apple, again).
OverToasty
01-26-2005, 09:40 AM
Originally posted by Programmer
POWER5 has FastPath technology, one part of which was handling of the TCP stack without requiring processor interrupts. That gets you most of the benefit of a TOE without requiring an extra part, and (best of all) it runs at processor clock rate and uses the processor's fast memory subsystem and cache resources.
WOW!
Well this is good news, I certainly hope Apple takes advantage of this: I was at a conference a few years back on Pooch technology (simple Multi-Computer Cycle sharing/clustering) and one of the big advantages Mac's had was that they were a known quantity - you could simply install Pooch on a group of Standard Mac's, and boom - instant cluster: as opposed to x86 boxes, where who knows what they're made of and what's on them - in short, for Microsoft to try to play on this field would be a hell of a lot more work, just as it is for the Linux Beowulf doods.
So - if TOE technology (or its FastPath counterpart) is as useful to heavy media processor clustering & storage as I think it might be ... then Apple should play this card.
Hmmmmmmm
Programmer
01-26-2005, 10:13 AM
Originally posted by Rhumgod
I also remember reading somewhere that Altivec can handle this as well. Not sure how effectively, however. I think it was in their Developer section.
Not exactly -- AltiVec can be used to make the processor's main thread do this work faster, but the processor's main thread still does the work. The FastPath is an additional section of the processor that you can think of as a dedicated thread which operates completely in parallel with the main thread (or threads in the case of POWER5). This is a huge advantage because the cost of interrupting a modern pipelined processor to handle an incoming message is very significant, and that overhead alone can bring a processor to its knees in a busy network environment.
Rhumgod
01-26-2005, 11:06 PM
Originally posted by Programmer
Not exactly -- AltiVec can be used to make the processor's main thread do this work faster, but the processor's main thread still does the work. The FastPath is an additional section of the processor that you can think of as a dedicated thread which operates completely in parallel with the main thread (or threads in the case of POWER5). This is a huge advantage because the cost of interrupting a modern pipelined processor to handle an incoming message is very significant, and that overhead alone can bring a processor to its knees in a busy network environment.
Ah, found the article I was referring to - it was IBM's library, not Apple's.
Link here (http://www-106.ibm.com/developerworks/library/pa-altivec1/).
Programmer
01-28-2005, 11:25 PM
Originally posted by Rhumgod
Ah, found the article I was referring to - it was IBM's library, not Apple's.
Link here (http://www-106.ibm.com/developerworks/library/pa-altivec1/).
Yes, that allows the conventional method of having the processor handle network messages to run somewhat faster. Unfortunately the processor still has to handle the network messages. That means it has to interrupt whatever it was doing, process the message, and then go back to what it was doing before -- and all of that requires time and tends to pollute caches and such. The TOEs that this thread is about (and the POWER5's FastPath) handle the network messages without requiring processor attention at all. Much better.
OverToasty
01-29-2005, 07:33 PM
Originally posted by Programmer
Yes, that allows the conventional method of having the processor handle network messages to run somewhat faster. Unfortunately the processor still has to handle the network messages. That means it has to interrupt whatever it was doing, process the message, and then go back to what it was doing before -- and all of that requires time and tends to pollute caches and such. The TOEs that this thread is about (and the POWER5's FastPath) handle the network messages without requiring processor attention at all. Much better.
Which leads me to suspect that, for a central point feeding a cluster, there's a balance point between the offloading of cycles vs interrupts on the processor - after a certain point, no matter how many other processors you've got kicking around, the amount of interrupts coming back at you would completely outweigh the benefit of the offloading (with cache clearing and pipeline bubbling).
Naturally, there's more to this phenomenon (and, in fact, there's almost certainly a name for it); however, with some form of TOE-ish NIC's, Mac's could push that balance point down to the next bottle-necking factor, which could perhaps be better by a factor of 2, perhaps an order of magnitude? (beats me) - whatever it may be, the nice thing about this is - if you've got the processors kicking around to take advantage of the new pipe - you'll be able to do things easily with clustered Mac's that most non-Mac shops could only dream of.
If so, this would give one hell of a competitive advantage to a Mac shop.
8)
Splinemodel
01-29-2005, 09:37 PM
Originally posted by Rhumgod
TOE NICs only benefit high-bandwidth, sustained connections. A SAN storage server perhaps, but a PowerMac? I don't see the benefit for the end user.
For those not too informed on the matter, follow this link (http://limnos.csrd.uiuc.edu/notes/linux-networking/toe.html).
I'm a bit confused at the whole wonderment behind "TOE." Maybe I'm incorrect about the nature of the concept, but it doesn't seem like a big deal to me. A fairly cheap microcontroller can easily handle a TCP/IP stack, even at gigabit speeds.
Programmer
01-29-2005, 11:53 PM
Originally posted by Splinemodel
I'm a bit confused at the whole wonderment behind "TOE." Maybe I'm incorrect about the nature of the concept, but it doesn't seem like a big deal to me. A fairly cheap microcontroller can easily handle a TCP/IP stack, even at gigabit speeds.
Not if that gigabit network is at maximum capacity with all the packets coming at the machine in question. At that kind of load level, even high end servers start to buckle under the strain. Conventional systems get to the point that they are interrupted so often they can't manage to do any real work.
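To put rough numbers on the interrupt load described above, here is a back-of-the-envelope sketch. It assumes one interrupt per frame (no interrupt coalescing), and the 5 microsecond cost per interrupt is a made-up illustrative figure, not a measurement from any machine in this thread. The function names are hypothetical.

```python
# Rough estimate: frames per second on a saturated gigabit link, and the
# share of one CPU that per-frame interrupts would consume.

LINK_BPS = 1_000_000_000      # gigabit Ethernet line rate, bits per second
WIRE_OVERHEAD_BYTES = 20      # preamble + start delimiter (8) + inter-frame gap (12)


def frames_per_second(frame_bytes, link_bps=LINK_BPS):
    """Maximum frame rate for frames of the given size (header and FCS included)."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cpu_share_lost(interrupts_per_second, cost_seconds):
    """Fraction of one CPU consumed by interrupt handling, capped at 1.0."""
    return min(1.0, interrupts_per_second * cost_seconds)


# ASSUMPTION: illustrative per-interrupt cost, chosen only to show the shape
# of the problem; real costs depend on the CPU, OS and driver.
ASSUMED_INTERRUPT_COST = 5e-6

for size in (1518, 64):       # largest and smallest standard Ethernet frames
    rate = frames_per_second(size)
    share = cpu_share_lost(rate, ASSUMED_INTERRUPT_COST)
    print(f"{size:>5}-byte frames: {rate:>12,.0f} frames/s, {share:.0%} of one CPU")
```

With full-size frames the rate is in the tens of thousands per second; with minimum-size frames it is well over a million, which is where the "can't manage to do any real work" situation comes from under the assumed cost.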
Splinemodel
01-30-2005, 12:30 AM
Originally posted by Programmer
Not if that gigabit network is at maximum capacity with all the packets coming at the machine in question. At that kind of load level, even high end servers start to buckle under the strain. Conventional systems get to the point that they are interrupted so often they can't manage to do any real work.
They buckle because they aren't designed specifically to manage a TCP/IP stack. The actual rate of stack update is irrelevant as long as the maximally long path in the code is less than the required time, which incidentally is a requirement for it being GigE. It just means that the chip won't get much idle time. More specifically, you select a high clock rate MCU with a simple set of integer and conditional instructions. Presumably you only need to make sure that it can do a data I/O every 30ns, perhaps even slower, and then mate it to a really high speed shift register. What we have now are people who are doing just this, and then padding up the cost a ton because they know how overpriced anything in the server market is.
Anyway, the bottom line is that you could put it all on a relatively tiny piece of silicon, at least compared to what you see in servers. Eventually, I bet that all ethernet controllers will be single chip TOE, and it's not unlikely that we'll see them in Powermacs first, since PC's still don't tend to have autosensing ports, DVI-out (in laptops) or anything that would make you think you're living in 2005.
Programmer
01-30-2005, 02:41 PM
Originally posted by Splinemodel
It just means that the chip won't get much idle time.
Well this is pretty much exactly the point -- what is the use in handling all the packets if you can't actually do any work on them because you're so busy handling them?
Sounds like we agree, however... implementing this isn't terribly complex and it could easily be integrated into either the chipset or the processor with a cost delta of approximately zero.
Splinemodel
01-30-2005, 04:02 PM
As usual, Programmer, you are very astute. I guess my point is that I think all ethernet should be TOE, and that I'm a big fan of decentralization of processing. Additionally, I don't think it's out of the question for some party to pursue a single chip TOE ethernet solution, and I bet something of that nature is in development now.
mmmpie
01-30-2005, 04:15 PM
Originally posted by Splinemodel
They buckle because they aren't designed specifically to manage a TCP/IP stack. The actual rate of stack update is irrelevant as long as the maximally long path in the code is less than the required time, which incidentally is a requirement for it being GigE. It just means that the chip won't get much idle time. More specifically, you select a high clock rate MCU with a simple set of integer and conditional instructions. Presumably you only need to make sure that it can do a data I/O every 30ns, perhaps even slower, and then mate it to a really high speed shift register. What we have now are people who are doing just this, and then padding up the cost a ton because they know how overpriced anything in the server market is.
I went and read some interesting articles about the performance of tcp/ip stacks, and TOE.
* Saturating a GigE link requires a 2.4 GHz Pentium 4.
* Memory bandwidth use is quite high when receiving packets, due to 3 memory copies.
Using a TOE relieves the load on the cpu, and if it is well designed it can dramatically reduce the memory bandwidth. The silicon cost depends on how much of the stack is implemented; maintaining the state of TCP connections is expensive.
* 16 MB of RAM to maintain the state for all TCP ports.
* RAM is required to buffer the incoming data. How much is pretty much an open question. iSCSI has very precise predefinition of memory requirements, and it seems that TOE chips are going to be targeted at those needs.
* 10gigE is going to require 10 times as much memory for its buffers.
It is pretty easy to envisage a TOE card having 256 or 512 MB of RAM.
I don't think any of this is too expensive, but it certainly isn't going to make it into a consumer machine any time soon.
For the original context of the discussion ( streaming media for editing over TCP ) FC was intended as a replacement for SCSI. It is the appropriate medium for this sort of work. It can do 2 Gb a second without any of the issues that TCP has ( for that matter, I'm sure you could do SCSI over ethernet - but I haven't seen any mention of it, people want routable protocols ). The issue is really one of asset management. It doesn't make a whole lot of sense to have hundreds of workstations with their own terabyte FC raids. Not only is it a maintenance nightmare, but it's also an asset management nightmare. The solution to that is to use the workstations as a front end to a compute server, which is what I think most graphics effect companies do. The desire then is to create the compute server out of off the shelf products ( xserves ) which brings you full circle.
Conclusion: TOE is inevitable. Although technologies like FC are cheaper to implement, their poor market penetration burdens them with high costs. Eventually TOE is going to be integrated into all ethernet chips, simply because it is possible ( initially a value add ). I don't see TOE being a huge benefit for the general market - its memory requirements are large - but for people running iSCSI, or fibre channel over ethernet ( there's about four protocols for that ) it will be invaluable.
Will Apple get an early start on this? They don't create their own ethernet chips. They will use TOE when their suppliers integrate it into their products. Apple has had a good history of early adoption of ethernet technology, and I do expect to see it shipping TOEs as standard before the general x86 workstation market does. The large amount of memory required makes it an obvious choice for a shared memory architecture. When you look at it that way it may well be cheaper ( initially ) to just use dual cpu machines. One cpu does the TCP stack while the other does the compute. With dual core this becomes even more likely.
TOE seems like a technology that is between a rock and a hard place. High end machines can just throw cpus at the problem, low end machines can't afford it to begin with, and by the time it is cheap enough cpus will be dual core. 10gigE will change that; I guess I see the TOE vendors waiting for 10gigE to really hit the market.
Splinemodel
01-30-2005, 09:12 PM
Originally posted by mmmpie
I went and read some interesting articles about the performance of tcp/ip stacks, and TOE.
* Saturating a GigE link requires a 2.4 GHz Pentium 4.
* Memory bandwidth use is quite high when receiving packets, due to 3 memory copies.
yadda yadda yadda
You're thinking about this the wrong way. A TCP/IP stack can be written comfortably in 64K of program space. That is, no OS, no over complex BS. Using a Pentium 4, or for that matter a G4, to do TOE is akin to killing a frog with a tank rather than a BB gun. As I said, all you need to do is make sure that a complete I/O loop and the maximally long stack operation can be performed in 30ns, perhaps even 120ns if you want to use 128bit bus rather than 32bit bus. Now, doing it the easy way, that is, using higher level coding paradigms and then giving it enough clock and memory until it works, may require a P4 with 16MB of RAM. Doing it the elegant way will not.
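The 30ns and 120ns figures above look like rounded versions of how long one bus-width word takes to arrive at gigabit line rate. A quick sketch of that arithmetic follows; the 500 MHz MCU clock is a hypothetical number picked only for illustration, and the function names are made up.

```python
# Time budget per bus-width word at line rate, and how many MCU clock cycles
# fit inside that budget.

GIGE_BPS = 1_000_000_000  # gigabit Ethernet line rate, bits per second


def ns_per_word(bus_bits, link_bps=GIGE_BPS):
    """Nanoseconds between successive bus-width words at full line rate."""
    return bus_bits * 1_000_000_000 / link_bps


def cycles_per_word(bus_bits, clock_hz, link_bps=GIGE_BPS):
    """Clock cycles available to handle each word before the next one arrives."""
    return bus_bits * clock_hz / link_bps


# ASSUMPTION: 500 MHz is a hypothetical MCU clock, not a figure from the thread.
ASSUMED_CLOCK_HZ = 500_000_000

for width in (32, 128):
    print(f"{width}-bit bus: {ns_per_word(width):.0f} ns per word, "
          f"{cycles_per_word(width, ASSUMED_CLOCK_HZ):.0f} cycles at 500 MHz")
```

So the exact budgets are 32 ns for a 32-bit bus and 128 ns for a 128-bit bus; whether the longest stack operation really fits in that many cycles is the part this sketch does not answer.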
Programmer
01-30-2005, 11:16 PM
Hence the advantage of IBM's choice to put that functionality into the POWER5 -- leverage the processor's clock rate and bus(es). No extra chips in the system either. Just a bit of hardware added to the CPU that replaces some software that used to run on the programmable part of the CPU.
mmmpie
01-30-2005, 11:27 PM
Originally posted by Splinemodel
You're thinking about this the wrong way. A TCP/IP stack can be written comfortably in 64K of program space. That is, no OS, no over complex BS. Using a Pentium 4, or for that matter a G4, to do TOE is akin to killing a frog with a tank rather than a BB gun. As I said, all you need to do is make sure that a complete I/O loop and the maximally long stack operation can be performed in 30ns, perhaps even 120ns if you want to use 128bit bus rather than 32bit bus. Now, doing it the easy way, that is, using higher level coding paradigms and then giving it enough clock and memory until it works, may require a P4 with 16MB of RAM. Doing it the elegant way will not.
I'm not discussing the complexity of the stack.
I'm not a TCP expert; the article I read claims that it takes ~256 bytes to maintain the state information for one open port ( sliding window, syn number, missing packets etc ).
There are 65536 ports, giving you a total of 16 MB of space required to maintain the state information for a system with all available ports open.
Now, it is fair to say that most systems won't encounter a situation where all their ports are open, but it certainly could happen.
Note that the high performance cost is for receiving data.
That 16 MB is just state, and doesn't cover the amount of memory required to buffer the incoming data while it is assembled.
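The state-memory arithmetic above can be checked in a few lines. This is only a sketch of that calculation: the ~256 bytes per port is the figure quoted from the article, taken here as an assumption, and the function name is made up.

```python
# Worst-case memory needed for TCP state if every port is open.

STATE_BYTES_PER_PORT = 256   # ASSUMPTION: approximate figure quoted from the article
PORT_COUNT = 2 ** 16         # 65536 port numbers


def state_memory_bytes(open_ports, bytes_per_port=STATE_BYTES_PER_PORT):
    """Bytes needed to hold connection state for the given number of open ports."""
    return open_ports * bytes_per_port


worst_case = state_memory_bytes(PORT_COUNT)
print(f"all ports open: {worst_case} bytes = {worst_case // (1024 * 1024)} MB")
print(f"1000 ports open: {state_memory_bytes(1000)} bytes")
```

The worst case works out to exactly 16 MB, while a machine with a thousand open ports needs only about a quarter of a megabyte, which is why the typical case is much cheaper than the worst case. As noted, none of this covers receive buffering.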
OverToasty
01-31-2005, 10:31 AM
Originally posted by mmmpie
TOE seems like a technology that is between a rock and a hard place. High end machines can just throw cpus at the problem, low end machines can't afford it to begin with, and by the time it is cheap enough cpus will be dual core. 10gigE will change that; I guess I see the TOE vendors waiting for 10gigE to really hit the market.
I agree with much of what you say Mr. Pie, but I'm afraid when it comes to extreme multimedia stuff - HD Video (which is quickly becoming standard on all Macs) and audio, computers need all the CPU horsepower they can get, and losing a CPU just to handle an IP stack pretty much negates all that could perhaps be gained by distributed processing: it's a pretty good example of the cure being no better than the disease.
I like Programmer's mentioning of the POWER5 FastPath trick - if whatever the next chip Apple comes out with takes advantage of this, this would be a wonderful use to put it to, however - short of this - some form of small, dedicated parallel processor is pretty much a must: I need my 20 Lexicon quality reverbs/real-time HD layered ChromaKey effects etc ... and the only way to guarantee that kind of stuff, is if I have both processors crunching away on my media App.
Sincerely
Still-Kinda-Freaked-At-What-I-Can-Do-With-All-This-Stuff
OT
:D
Programmer
01-31-2005, 11:17 PM
Originally posted by OverToasty
...losing a CPU just to handle an IP stack pretty much negates all that could perhaps be gained by distributed processing: it's a pretty good example of the cure being no better than the disease.
I like Programmer's mentioning of the POWER5 fastpath trick ...
This reminds me -- one reason that FastPath (or an offchip TOE) is especially important is because interrupting a deeply pipelined, cache dependent processor is really brutal to its performance. Just dealing with the context changes loses you a great deal of performance -- even if the time spent actually processing the TCP packets is zero. The dedicated TCP processing circuitry, however, is designed to respond quickly and exclusively to these messages and thus is very efficient about it... and it leaves the main threads undisturbed.
wmf
I have not seen any mention of FastPath from IBM in some time.
iReady (now part of nVidia) and Broadcom supposedly have cheap single-chip 1Gbps TOE chips. Chelsio and NetEffect claim to have 10Gbps TOEs, but those are $$$$.
The downside of TOE that has been discovered (in the few real-world implementations that exist) is that talking to the TOE costs almost as much as just doing the TCP processing.
I agree that eventually TOE will become free, but I am skeptical about the performance benefits.
Programmer
02-03-2005, 09:07 AM
Originally posted by wmf
I have not seen any mention of FastPath from IBM in some time.
...
The downside of TOE that has been discovered (in the few real-world implementations that exist) is that talking to the TOE costs almost as much as just doing the TCP processing.
I agree that eventually TOE will become free, but I am skeptical about the performance benefits.
IBM said hardly anything about FastPath even when they were talking about it. Looking at their server performance numbers compared to their SPEC numbers, however, seems to indicate that they did something significant in their TCP stack handling.
I'm not surprised about the off-chip TOE performance issues -- that's why I emphasize the benefits of on-chip implementations. In the future we may see other kinds of on-chip hardware useful for handling this kind of work (and other things).
Splinemodel
02-03-2005, 10:03 AM
Originally posted by wmf
I have not seen any mention of FastPath from IBM in some time.
iReady (now part of nVidia) and Broadcom supposedly have cheap single-chip 1Gbps TOE chips. Chelsio and NetEffect claim to have 10Gbps TOEs, but those are $$$$.
The downside of TOE that has been discovered (in the few real-world implementations that exist) is that talking to the TOE costs almost as much as just doing the TCP processing.
Then someone is doing something incorrectly. Generally, a chip's GPIOs (general purpose input output) are tied to a memory buffer (register), and are managed by onboard circuitry. That is, the chip doesn't have to strobe the IOs, it just dumps data to the output register for output, and waits for input interrupts before loading up the input registers. The port circuitry takes care of the command signaling, and ultimately I'd hope that a TOE would be intelligent enough to merely suck up some binary data and spit it out as ethernet.
mdriftmeyer
02-03-2005, 06:38 PM
Sure seems to be a budding business opportunity.
If this seems fishy to you then build a solution, hunt down a Venture Capital Firm and bring a cheaper solution to market.
wmf
Originally posted by Splinemodel
Then someone is doing something incorrectly. Generally, a chip's GPIOs (general purpose input output) are tied to a memory buffer (register), and are managed by onboard circuitry. That is, the chip doesn't have to strobe the IOs, it just dumps data to the output register for output, and waits for input interrupts before loading up the input registers. The port circuitry takes care of the command signaling, and ultimately I'd hope that a TOE would be intelligent enough to merely suck up some binary data and spit it out as ethernet.
Programmer and I have different theories about FastPath, but I'll say no more.
Splinemodel: TOE chips aren't attached directly to the processor; they have to go through PCI, which has its own limitations. But I was talking more about the software overhead anyway. To get data to the TOE you have to potentially enter the kernel, set up a descriptor, use a PIO to ring the doorbell, etc. Likewise on the RX side.
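For anyone who hasn't seen the descriptor-and-doorbell pattern mentioned above, here is a toy model of it. This is purely illustrative and assumes a simple circular ring; the class and method names are hypothetical and do not correspond to any real driver or TOE API.

```python
# Toy model of a host-to-NIC transmit descriptor ring. The host writes
# descriptors into a circular buffer, then "rings the doorbell" so the device
# picks them up. Each doorbell stands in for a slow programmed-I/O write.


class TxRing:
    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0              # next slot the host will fill
        self.tail = 0              # next slot the device will consume
        self.doorbell_writes = 0   # count of simulated PIO doorbell writes

    def post(self, buffer):
        """Host side: write one descriptor. Returns False if the ring is full."""
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:       # one slot is kept empty to tell full from empty
            return False
        self.slots[self.head] = {"length": len(buffer), "data": buffer}
        self.head = nxt
        return True

    def ring_doorbell(self):
        """Host side: notify the device, which then drains the ring."""
        self.doorbell_writes += 1
        return self._device_drain()

    def _device_drain(self):
        """Device side: consume every posted descriptor, oldest first."""
        sent = []
        while self.tail != self.head:
            sent.append(self.slots[self.tail]["data"])
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % len(self.slots)
        return sent


ring = TxRing(8)
for chunk in (b"frame-1", b"frame-2", b"frame-3"):
    ring.post(chunk)
sent = ring.ring_doorbell()        # one doorbell covers all three descriptors
print(len(sent), "buffers sent with", ring.doorbell_writes, "doorbell write")
```

The point of the sketch is that every send involves host-side bookkeeping plus at least one slow write across the bus, so the overhead only shrinks if several buffers can be batched behind a single doorbell.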
OverToasty
02-03-2005, 07:23 PM
Originally posted by wmf
Programmer and I have different theories about FastPath, but I'll say no more.
Splinemodel: TOE chips aren't attached directly to the processor; they have to go through PCI, which has its own limitations. But I was talking more about the software overhead anyway. To get data to the TOE you have to potentially enter the kernel, set up a descriptor, use a PIO to ring the doorbell, etc. Likewise on the RX side.
What are workstations currently using to solve the distributed processing problem? Perhaps we'll see something migrating down very shortly?
wmf
Originally posted by OverToasty
What are workstations currently using to solve the distributed processing problem? Perhaps we'll see something migrating down very shortly?
I'm not sure what you mean. Workstations still tend to use non-TOE Ethernet. Clusters often use Myrinet or Infiniband, both of which have full transport protocol offload, but I don't see those moving downmarket.
mmmpie
02-03-2005, 10:05 PM
Originally posted by wmf
But I was talking more about the software overhead anyway. To get data to the TOE you have to potentially enter the kernel, set up a descriptor, use a PIO to ring the doorbell, etc. Likewise on the RX side.
And this seems to be why TOE is moving towards the specialised area of iSCSI acceleration. The iSCSI protocol is very regimented ( being SCSI ) and so all of the buffers required in the host's memory can be preallocated. The DMA descriptors can be set up in advance etc etc. The TOE can simply offload incoming data into the buffers while the host consumes them.
What I have read indicates that the real value of TOE comes in receiving data ( not transmitting ) due to difficulty in making optimisations in the TCP stack's memory handling. I'm pretty sure that hosts have an easy time ( excluding PCI saturation ) of pushing the limits of gigabit, particularly in the streaming scenario that started this thread ( TOEs don't seem like a general purpose solution, but might become widely available due to cost ).
What are workstations currently using to solve the distributed processing problem? Perhaps we'll see something migrating down very shortly?
It all depends on the parallelism of the problem being solved. Things like distributed compilation work quite nicely over ethernet ( files can be copied while you edit them ). The more communication a task requires the more value you get out of low latency low overhead mechanisms. Hence the use of Infiniband, fibre channel etc. I'm pretty sure that both PCI express and Hypertransport are getting out of box connectors specified primarily to attack these sorts of problems.
Of course, at the high end supercomputers are built in big boxes so that high speed interconnects can be used, even shared memory ( although big SMP boxes aren't real popular anymore ).
Horses for courses.
Programmer
02-03-2005, 10:47 PM
Originally posted by wmf
Programmer and I have different theories about FastPath, but I'll say no more.
Okay, that's just blatant baiting. :)
Do you think they removed it from the POWER5 spec? Or something else? Any evidence to back that up?
mmmpie
02-21-2005, 05:02 PM
Looks like I was wrong about the introduction of TOE into low end machines.
Nvidia have turned on their TOE engine in their nforce pro chipset.
http://www.nvidia.com/page/nforce_Pro.html
They have had firewall functionality in previous chipsets, looks like this is an advance on that.
TOE is coming, TOE is coming.
Aphelion
02-21-2005, 08:58 PM
TOE IS COMING (http://www.theregister.co.uk/2005/02/18/intel_tcpip_attack/)