Rackmounts Coming


Comments

  • Reply 61 of 114
    thttht Posts: 5,451member
    Dual-channel Rambus is not memory interleaving. The processor will see 512 MByte of memory if a 256 MByte Rambus module is on one channel and another 256 MByte Rambus module is on the other channel.



    Rambus is a serialized DRAM memory protocol while DDR SDRAM is the usual parallel memory protocol. Rambus being serialized represents some advantages and disadvantages for computers.



    There is no denying it: Rambus has twice the bandwidth per pin of DDR SDRAM. In terms of pure bandwidth, Rambus will always be faster than DDR SDRAM. In sustained sequential memory transfers, Rambus will deliver twice the memory performance that DDR will. I.e., single-channel PC800 will deliver 1400 MByte/s while PC2100 DDR SDRAM will only deliver 700 MByte/s. It can deliver huge amounts of data, which is why the Pentium 4 performs better with DRDRAM than with DDR SDRAM. Serialized protocols can deliver huge amounts of data, as we will see with the upcoming RapidIO and HyperTransport buses.



    The problem with DRDRAM is that it's a serialized protocol. Multiple DRDRAM chips lie in series on the same bus and all must be synched properly. The latency for DRDRAM is the combination of all the chips on a Rambus channel. The more chips in a channel, the more time it takes to find the one chunk of memory, because each chip must be checked one at a time. Any sort of random I/O will kill Rambus performance. So for very large memories, 2+ GByte, and computing with lots of random I/O, Rambus isn't a great solution.
  • Reply 62 of 114
    matsumatsu Posts: 6,558member
    So a nice big fast cache between the CPU and a DDR set-up may be the best all-round solution for price and performance. What's the fastest L3 cache that wouldn't be prohibitively expensive?
  • Reply 63 of 114
    programmerprogrammer Posts: 3,458member
    [quote]Originally posted by THT:

    <strong>while PC2100 DDR SDRAM will only deliver 700 MByte/s</strong><hr></blockquote>



    Not to quibble with an otherwise good post, but PC2100 DDR will deliver quite a bit more than 700 MBytes/s. Apple's current non-DDR SDRAM will deliver 700-1000 MBytes/s. The double-data rate version will deliver about 2 GBytes/s. The RDRAM advantage comes at the high end where they can scale the clock rate and add multiple channels to a degree that DDR simply cannot match -- and it's not clear that they'll ever be able to catch up in terms of raw bandwidth.



    RamBus is also a political/economic nightmare, let us not forget that mess.
  • Reply 64 of 114
    daveleedavelee Posts: 245member
    It is also heartening to bear in mind that the Motorola 8540 (yes, yes, I know; it isn't the G5) sports a DDR 333MHz controller. I think it shows that Motorola (and therefore Apple) are not resting on their laurels with regard to memory architecture.



    It remains to be seen just how DYNAMIC they can be with future technology (the one thing, I think we can agree, that has been seriously lacking with Apple's later crop of 'professional' computers).



    I hope they can get them out the door quickly.



    These upcoming servers may be very interesting (especially if they have DDR support).
  • Reply 65 of 114
    thttht Posts: 5,451member
    <strong>Originally posted by Programmer:

    Not to quibble with an otherwise good post, but PC2100 DDR will deliver quite a bit more than 700 MBytes/s.</strong>



    I think I low balled it, but 40% bus utilization (of the theoretical maximum) should be about the optimum performance for SDRAM technology. So, perhaps around 800 to 900 MByte/s for PC2100.
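
    As a rough check on the 40% figure, here is a minimal sketch of the arithmetic. It assumes PC2100's theoretical peak is a 64-bit bus at 266 million transfers per second (that assumption and the function name are illustrative, not from the post). Under it, 40% utilization lands at roughly 850 MByte/s, inside the 800 to 900 range quoted above.

```python
def sustained_mbytes_per_s(transfers_mhz, bus_bits, utilization):
    """Estimate sustained bandwidth as a fraction of theoretical peak."""
    # Peak in decimal MByte/s: million transfers per second times bytes per transfer.
    peak = transfers_mhz * bus_bits / 8
    return peak * utilization


# Assumed PC2100 parameters: 266 MT/s on a 64-bit bus, 40% utilization.
pc2100 = sustained_mbytes_per_s(266, 64, 0.40)
print(f"PC2100 at 40% utilization: {pc2100:.0f} MByte/s")
```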



    <strong>The double-data rate version will deliver about 2 GBytes/s.</strong>



    I wouldn't be too sure of that. The 40% number has always been a very solid mark from what I've seen, and DDR SDRAM doesn't come very close to that <a href="http://www.aceshardware.com/Spades/read.php?article_id=20000191" target="_blank">mark</a>.



    I'm not sure if the MPX bus can truly improve upon that mark or not, but probably not, since the limit seems to be in the SDRAM chips themselves...



    <strong>The RDRAM advantage comes at the high end where they can scale the clock rate and add multiple channels to a degree that DDR simply cannot match -- and it's not clear that they'll ever be able to catch up in terms of raw bandwidth.</strong>



    I think a quad channel Rambus with only 1 chip per channel, similar to the Emotion Engine, acting as a backside L2 cache would be a very interesting solution. That can be 128 MBytes of L3 cache at 8.4 GB/s bandwidth with DDR SDRAM level latencies.
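
    For a sense of scale, here is a hedged sketch of the aggregate bandwidth of a quad-channel setup. It assumes a Rambus channel carries 16 data bits per transfer, so PC800 peaks at 1.6 GByte/s per channel; the function name is made up for illustration. On those assumptions four PC800 channels peak at 6.4 GByte/s, and the 8.4 GByte/s figure would need about 2.1 GByte/s per channel, which is closer to what a PC1066-class part would offer.

```python
def rdram_channel_gbytes_per_s(transfers_mhz, data_bits=16):
    """Peak bandwidth of one Rambus channel in decimal GByte/s (assumed 16 data bits)."""
    return transfers_mhz * data_bits / 8 / 1000


channels = 4

# Aggregate peak if every channel runs at PC800 speed.
pc800_total = channels * rdram_channel_gbytes_per_s(800)

# Per-channel rate implied by the 8.4 GByte/s figure in the post.
per_channel_needed = 8.4 / channels

print(f"4 x PC800 peak: {pc800_total:.1f} GByte/s")
print(f"Per-channel rate needed for 8.4 GByte/s: {per_channel_needed:.1f} GByte/s")
```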



    <strong>RamBus is also a political/economic nightmare, let us not forget that mess.</strong>



    Yes, unfortunate that Rambus looks to be dying from their business policies.
  • Reply 66 of 114
    g-newsg-news Posts: 1,107member
    The fastest RDRAM you can get (or can't even get, actually) today is about 32ns, that's like 4 times as much as cheap SDRAM...

    What's the lowest level for system SDRAM? 7ns?

    and graphics stuff goes down to 4ns or so...



    That's where they'd have to improve.



    SDRAM could improve by lowering the CAS latency, that'd cost, but would yield better performance.



    One thing that Apple has done great is the RAM performance: slow-ass RAM but nicely used, up to 700 MB/sec or so from 800 last time I checked, if I remember correctly.



    G_News
  • Reply 67 of 114
    bodhibodhi Posts: 1,424member
    [quote]Also, concerning the whole renderfarm theory: Usually, machines in a cluster are not referred to as "servers" (unless the whole cluster's purpose is to be some sort of a load-balanced server), and as SJ obviously specifically said "dedicated server hardware", I think this pretty much indicates that clusters or renderfarms are NOT the primary field of use those rackmounts are intended for.<hr></blockquote>



    First, Jobs said very little about these machines. Second of all, sure, those machines in a cluster are not usually 'called' servers, but the machines usually running a render farm 'ARE' servers, usually rackmounts.



    Note that Jobs also said that they can be headless, they do not need a monitor.
  • Reply 68 of 114
    thttht Posts: 5,451member
    <strong>Originally posted by G-News:

    The fastest RDRAM you can get (or can't even get, actually) today is about 32ns, that's like 4 times as much as cheap SDRAM...

    What's the lowest level for system SDRAM? 7ns?

    and graphics stuff goes down to 4ns or so...</strong>



    Read what I said carefully: 1 chip per Rambus channel acting as a backside cache. This eliminates most of the problems seen in main memory systems using Rambus. Read latency is lower because there is only 1 Rambus memory chip per channel, instead of the 4 to 16 seen in RIMMs, where the latency is set by the farthest DRDRAM chip in the channel. A single Rambus chip should have about the same read latency as SDRAM. Trace lengths are 10x shorter since it'll be right next to the CPU, 1 or 2 inches away instead of 10 to 20 inches in a multi-RIMM setup. The controller would be right on the CPU itself. All this will allow a Rambus chip to be clocked higher as well.



    128-bit DDR SDRAM used in graphics cards is very nice, but it won't be faster than a Rambus system in a multi-channel setup, because Rambus's bandwidth-per-pin advantage is hard for DDR SDRAM to overcome. However, I don't know why DDR SDRAM is preferred for graphics cards. Hmm... I will try to find out.



    Those latency numbers you give are latencies after the initial read latency. The read latency will last from 6 to 10 cycles, around 40 to 80 ns, for SDRAM. Rambus RIMM systems will be 80+ ns for read latency. It'll get lower with the PC1066 and PC1200 systems.
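
    The cycles-to-nanoseconds conversion behind the SDRAM figure can be sketched as below. The 133 MHz clock is an assumption (the post does not say which bus speed it has in mind), and the function name is just for illustration; at that clock, 6 to 10 cycles works out to roughly 45 to 75 ns, consistent with the quoted range.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a latency in bus clock cycles to nanoseconds."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles * 1000 / clock_mhz


# Assumed 133 MHz SDRAM bus clock.
low = cycles_to_ns(6, 133)
high = cycles_to_ns(10, 133)
print(f"SDRAM read latency: {low:.0f} to {high:.0f} ns")
```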



    <strong>One thing that Apple has done great is the RAM performance: slow-ass RAM but nicely used, up to 700 MB/sec or so from 800 last time I checked, if I remember correctly.</strong>



    Like to see benchmarks on this actually.
  • Reply 69 of 114
    welshdogwelshdog Posts: 1,898member
    All this talk of faster memory and such could be good for Apple in another way. Avid (maker of video editing systems for both Mac and PeeCee) announced at NAB this year that they would not have High Definition versions of their editing systems on Macs. Why? Even a dual 1 GHz couldn't handle the throughput, something the high-end Wintel boxes can do with ease.



    I realize that fast memory can't fix everything, but if we are talking faster RAM, faster CPUs and faster buses, then maybe Avid can make it work on the new Apple servers. How many slots do servers usually have? The current Mac-based Avids have to use an external PCI chassis because there are not enough slots.



    It's kind of sad. Avid was created on a Mac and now slowly but surely it is migrating to the PeeCee.
  • Reply 70 of 114
    razzfazzrazzfazz Posts: 728member
    [quote]Originally posted by Bodhi:

    <strong>

    First, Jobs said very little about these machines. Second of all, sure, those machines in a cluster are not usually 'called' servers, but the machines usually running a render farm 'ARE' servers, usually rackmounts.

    </strong><hr></blockquote>



    I think they would probably be referred to as "nodes".

    Anyway, how is a cluster node a server? How do you define "server"?





    [quote]<strong>

    Note that Jobs also said that they can be headless, they do not need a monitor.</strong><hr></blockquote>



    Um, yeah, but what does that have to do with the dedicated server vs. renderfarm discussion?



    Bye,

    RazzFazz
  • Reply 71 of 114
    bodhibodhi Posts: 1,424member
    [quote]Originally posted by RazzFazz:

    <strong>



    Um, yeah, but what does that have to do with the dedicated server vs. renderfarm discussion?



    Bye,

    RazzFazz</strong><hr></blockquote>





    I am not going to debate you back and forth, I guess we will both find out next week. I guess you will be one of Apple's rare rackmount customers that buy just one.
  • Reply 72 of 114
    brendonbrendon Posts: 642member
    [quote]Originally posted by THT:

    <strong>[qb]Originally posted by Programmer:

    Not to quibble with an otherwise good post, but PC2100 DDR will deliver quite a bit more than 700 MBytes/s.</strong>



    I think I low balled it, but 40% bus utilization (of the theoretical maximum) should be about the optimum performance for SDRAM technology. So, perhaps around 800 to 900 MByte/s for PC2100.



    <strong>The double-data rate version will deliver about 2 GBytes/s.</strong>



    I wouldn't be too sure of that. The 40% number has always been a very solid mark from what I've seen, and DDR SDRAM doesn't come very close to that <a href="http://www.aceshardware.com/Spades/read.php?article_id=20000191" target="_blank">mark</a>.



    I'm not sure if the MPX bus can truly improve upon that mark or not, but probably not, since the limit seems to be in the SDRAM chips themselves...<hr></blockquote>



    I wonder, in light of this, how hard it would be to widen the 133MHz bus and interleave it?? RAM is 32 bits wide, so how hard would it be to set two interleaved 32s to make a 64-bit bus, or to widen to 128 bits?? I guess it all depends on the memory controller. I know that there would be latency issues, but the throughput would be very high.



    [ 05-08-2002: Message edited by: Brendon ]
  • Reply 73 of 114
    amorphamorph Posts: 7,112member
    [quote]Originally posted by Brendon:

    <strong>



    I wonder, in light of this, how hard it would be to widen the 133MHz bus and interleave it?? RAM is 32 bits wide, so how hard would it be to set two interleaved 32s to make a 64-bit bus, or to widen to 128 bits?? I guess it all depends on the memory controller. I know that there would be latency issues, but the throughput would be very high.

    </strong><hr></blockquote>



    MaxBus is already 64 bits wide. It can theoretically go to 128 bits, but that's twice as many traces on the motherboard, which makes it expensive and hot.



    It's easier and less expensive (and, generally, less effective, but close enough) to double- or quad- pump the bus than it is to widen it.
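
    To make the widen-versus-pump trade-off concrete, here is a small sketch of theoretical peak bandwidth only. It assumes the 133 MHz, 64-bit bus discussed in this thread, and the function name is hypothetical. In peak terms a double-pumped 64-bit bus and a single-pumped 128-bit bus come out the same; the sketch says nothing about trace count, cost, heat, or how close real transfers get to peak.

```python
def peak_mbytes_per_s(clock_mhz, bus_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth in decimal MByte/s."""
    return clock_mhz * transfers_per_clock * bus_bits / 8


baseline = peak_mbytes_per_s(133, 64)          # current 64-bit bus, one transfer per clock
widened = peak_mbytes_per_s(133, 128)          # twice the data traces
double_pumped = peak_mbytes_per_s(133, 64, 2)  # same traces, two transfers per clock
quad_pumped = peak_mbytes_per_s(133, 64, 4)    # same traces, four transfers per clock

for name, value in [("baseline", baseline), ("widened", widened),
                    ("double-pumped", double_pumped), ("quad-pumped", quad_pumped)]:
    print(f"{name}: {value:.0f} MByte/s")
```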
  • Reply 74 of 114
    brendonbrendon Posts: 642member
    [quote]Originally posted by Amorph:

    <strong>



    MaxBus is already 64 bits wide. It can theoretically go to 128 bits, but that's twice as many traces on the motherboard, which makes it expensive and hot.



    It's easier and less expensive (and, generally, less effective, but close enough) to double- or quad- pump the bus than it is to widen it.</strong><hr></blockquote>



    Thanks for the clarification!
  • Reply 75 of 114
    programmerprogrammer Posts: 3,458member
    [quote]Originally posted by THT:

    <strong>I think I low balled it, but 40% bus utilization (of the theoretical maximum) should be about the optimum performance for SDRAM technology. So, perhaps around 800 to 900 MByte/s for PC2100.</strong><hr></blockquote>



    I wish I had benchmarks lying around, but I have seen some memory bandwidth tests from the latest QuickSilver machines... and they are astonishing, frankly. Apple is getting more out of a 133 MHz x 64-bit SDRAM system than is widely believed possible. This is in sharp contrast to their systems from a few years ago which typically under performed on the same memory technology when compared to PCs. I think this is a result of Apple paying careful attention to their memory controller (they have to because they can't go to faster memory), and the improvements to the 7455's load/store unit and MPX implementation. As I said, I don't have anything I can point to but the lowest numbers I've seen are 750 MBytes/sec, and there was at least one approaching 1000 (!!). If the DDR266 version of this controller can retain its current efficiency level, this means we'll see 1500 MBytes/sec... minimum.
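
    The projection to DDR266 can be sketched as follows, assuming efficiency is simply observed bandwidth divided by the theoretical peak of a 133 MHz x 64-bit bus, and that DDR doubles the peak while efficiency stays the same. The function name is hypothetical. Under those assumptions, 750 MBytes/sec is about 70% of peak and projects to about 1500 MBytes/sec, matching the minimum quoted above.

```python
def project_ddr(observed_mbytes_per_s, clock_mhz=133, bus_bits=64):
    """Return (efficiency, projected DDR bandwidth) for an observed SDR figure."""
    peak_sdr = clock_mhz * bus_bits / 8   # single data rate peak, MByte/s
    peak_ddr = 2 * peak_sdr               # double data rate: two transfers per clock
    efficiency = observed_mbytes_per_s / peak_sdr
    return efficiency, efficiency * peak_ddr


for observed in (750, 1000):
    efficiency, projected = project_ddr(observed)
    print(f"{observed} MBytes/sec -> {efficiency:.0%} of peak -> "
          f"{projected:.0f} MBytes/sec with DDR266")
```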



    The MPX/7455 combination is well pipelined, with multiple non-blocking transactions and a very useful cache streaming mechanism as part of the AltiVec extensions. I'm just hoping for a DDR version of the same bus, but if they manage to bump it to 166MHz as well, that'll be great. I don't expect to ever see a 128-bit bus unless they do a really high-end workstation ($10K+).
  • Reply 76 of 114
    g-newsg-news Posts: 1,107member
    128-bit buses have been possible since the introduction of the G4; Ars Technica actually said it was sad they didn't go for 128-bit. But it was a very clever move of Apple NOT to go 128-bit, because we're already bitching about high prices... a 128-bit mobo would have taken us far beyond the $5000 mark I'd say, and frankly nobody is ready to pay such an amount of money today. People buy computers for $1000 and $2000, and with $3000 we're already clearly above that level.



    Also, a 128-bit bus would only make sense if you had 128-bit memory or interleaved dual-bank 64-bit memory, which would again add costs.



    I'm looking forward to seeing what Apple has cooked up this time.



    G-News
  • Reply 77 of 114
    g-newsg-news Posts: 1,107member
    Here are some RAM performance tests. Check out the STREAM results, that's the good stuff there.



    <a href="http://www.xlr8yourmac.com/systems/g4_867/G4_867_rev2_1_CPU.html" target="_blank">http://www.xlr8yourmac.com/systems/g4_867/G4_867_rev2_1_CPU.html</a>



    and here



    <a href="http://www.xlr8yourmac.com/systems/dual_1ghz_performance_test.html#storytop" target="_blank">G4 DP 1GHz</a>



    [ 05-08-2002: Message edited by: G-News ]
  • Reply 78 of 114
    programmerprogrammer Posts: 3,458member
    [quote]Originally posted by G-News:

    <strong>Here are some RAM performance tests. Check out the STREAM results, that's the good stuff there.</strong><hr></blockquote>



    It's not really all that "good stuff"... none of them are 7455 optimized, and none of the RAM moving tests are even AltiVec optimized (except the one to the video card, which doesn't really count but does show >900 in one case). Proper use of the AltiVec streaming instructions is key to optimal memory throughput on these machines.



    Optimized memory moves are a bit tricky to get right, and not all 133 MHz SDRAM is created equal. I'm not up on all the terminology and details, but you can put either CL3 or CL2 SDRAM onto your motherboard and this will impact performance significantly.
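
    For reference, CL2 and CL3 denote the CAS latency in bus clock cycles, which is one component of the overall read latency. A minimal sketch of what that means in nanoseconds, assuming the 133 MHz SDRAM clock discussed in this thread (the function name is illustrative): CL2 comes to about 15 ns and CL3 to about 22.5 ns, a difference of one clock cycle per column access.

```python
def cas_latency_ns(cl_cycles, clock_mhz=133):
    """CAS latency in nanoseconds for a given CL rating and bus clock."""
    return cl_cycles * 1000 / clock_mhz


for cl in (2, 3):
    print(f"CL{cl} at 133 MHz: {cas_latency_ns(cl):.1f} ns")
```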
  • Reply 79 of 114
    thttht Posts: 5,451member
    <strong>Originally posted by Programmer:

    I wish I had benchmarks lying around, but I have seen some memory bandwidth tests from the latest QuickSilver machines... and they are astonishing, frankly. Apple is getting more out of a 133 MHz x 64-bit SDRAM system than is widely believed possible. This is in sharp contrast to their systems from a few years ago which typically under performed on the same memory technology when compared to PCs. I think this is a result of Apple paying careful attention to their memory controller (they have to because they can't go to faster memory), and the improvements to the 7455's load/store unit and MPX implementation. As I said, I don't have anything I can point to but the lowest numbers I've seen are 750 MBytes/sec, and there was at least one approaching 1000 (!!). </strong>



    G-News' links had some interesting and very impressive results. The results from the <a href="http://www.cs.virginia.edu/stream/mac/Bandwidth.html" target="_blank">Stream page</a> are also very good. If the Stream PPC601 optimized results and the AltiVec and CopyBits results are real, that is, if they represent real-world usage, then I can see why Apple is not in much of a hurry to move to PC2100 DDR SDRAM. Those are pretty close to DDR SDRAM numbers in the PC world. Somewhere in the 50 to 80% bus utilization range is mightily impressive.



    The primary reason it got better is because of the MPX bus and Apple moving from Moto's MPC105/106 northbridge used in all Power Mac G3 systems to their own custom northbridge for the G4 systems. All of the G4 computers should have twice as much (and in Steve Jobs' words 3x) memory bandwidth as G3 systems. Moto has yet to develop an MPX bus memory controller.



    <strong>I'm just hoping for a DDR version of the same bus, but if they manage to bump it to 166MHz as well that'll be great.</strong>



    PC166 (166 MHz SDRAM) v. PC2100 DDR SDRAM would be a very interesting comparison. It could come out in a dead heat in terms of performance.



    [ 05-08-2002: Message edited by: THT ]
  • Reply 80 of 114




    Found a pic.



    TING5