New Intel Technology to Put DRAM in the CPU


Comments

  • Reply 21 of 32
    bluematter
    Wouldn't this use more watts and run hotter?
  • Reply 22 of 32
    outsider Posts: 6,008 member
    Quote:
    Originally Posted by bluematter View Post


    Wouldn't this use more watts and run hotter?



    Compared to what? Having a CPU, an off-die memory controller, and the RAM on DIMMs, versus everything on one die? At worst you come out even, but you are likely to save energy on the whole, especially since this is planned to be used with an improved manufacturing process.
  • Reply 23 of 32
    Hiro Posts: 2,663 member
    Quote:
    Originally Posted by bluematter View Post


    Wouldn't this use more watts and run hotter?



    More watts on the die itself, incrementally yes, but at a far lower power density, so it is cooler in comparison. Even cache shows this pattern, and SPARC researchers have been fooling with methods that keep hot cores from running next to each other. Most of the work is currently on software core scheduling, but one of the hardware ideas is to sandwich cores between memory and let the memory act like a heatsink/buffer, rather than let two hot adjacent cores push each other to a heat-related failure.
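
    To picture the software-scheduling side, here's a toy sketch of the idea: among idle cores, run the next hot thread on the one whose hottest neighbor is coolest, so hot cores don't end up adjacent. All the temperatures and the 1-D core layout below are invented for illustration.

        /* Toy thermally-aware scheduler: among idle cores, choose the one
         * whose hottest adjacent core is coolest. Temperatures and the
         * 1-D neighbor layout are made up for illustration. */
        #include <stdio.h>

        #define NCORES 8

        int main(void) {
            double temp[NCORES] = { 72, 88, 65, 91, 60, 62, 85, 58 }; /* deg C */
            int    busy[NCORES] = {  1,  1,  0,  1,  0,  0,  1,  0 };
            int best = -1;
            double best_heat = 1e9;

            for (int c = 0; c < NCORES; c++) {
                if (busy[c]) continue;              /* only idle cores qualify */
                double hottest = 0.0;               /* hottest neighbor so far */
                if (c > 0 && temp[c - 1] > hottest) hottest = temp[c - 1];
                if (c < NCORES - 1 && temp[c + 1] > hottest) hottest = temp[c + 1];
                if (hottest < best_heat) {          /* coolest neighborhood wins */
                    best_heat = hottest;
                    best = c;
                }
            }
            printf("run next hot thread on core %d (hottest neighbor %.0f C)\n",
                   best, best_heat);
            return 0;
        }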
  • Reply 24 of 32
    Hiro Posts: 2,663 member
    Quote:
    Originally Posted by Programmer View Post


    Fair enough, but you're comparing today's apples to tomorrow's oranges. If this tech can be used for on-chip RAM, it can be used with off-chip RAM. So your statement is really that memory technology is going to become more reliable because of this. Good stuff!



    That's a historical perspective that may not last into the future, or at least not universally. The move to large numbers of simple in-order cores makes processors much more like a commodity than ever before... and embedding them with large on-chip memories even more so. Imagine a future where you buy your tightly coupled memories and cores on DIMMs and plug them into your motherboard. A single heavyweight central (multi-core) CPU may remain, but the majority of the computational power and parallelism will be embodied in the vast array of simple processors which are tightly coupled with memory (minimizing latency and maximizing bandwidth). Buying DIMMs suddenly gets a whole lot more complicated.



    I'm not saying it will go this way, but the idea of putting computational ability into the memory has been around for a long time... at least ever since Backus bemoaned Von Neumann's bottleneck over 30 years ago. And if we are looking at putting 100+ processors in a single machine, we've got to get serious about addressing this problem.



    It's possible this type of memory can be used for off-die memory, but some of the reliability benefits come from the memory not being on the bus, so those would be lost. It also wouldn't do anything special for bandwidth. The really big deal with this stuff is that there are no capacitors, so it can effectively be used on-die to substantially reduce latency and increase bandwidth. Everything else is just gravy.



    In a few years the question will be 64 cores per CPU die and off-chip memory, or 32 cores and ~8GB of on-die memory. I'll vote for the latter. Then imagine having several dozen GB worth of off-die DRAM on the motherboard acting as a hard drive cache. With OS X's memory manager already treating RAM as if it were a physical VM cache, which is just a different way of saying it treats RAM as an HD cache, it's already well placed to take advantage of this type of evolution.
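
    As a crude illustration of the RAM-as-HD-cache idea, here's a direct-mapped block cache in front of a (simulated) slow disk read. All sizes and the disk stub are invented, and a real page cache is far smarter than this.

        /* Direct-mapped RAM cache in front of a simulated slow disk. */
        #include <stdio.h>
        #include <string.h>

        #define BLOCK_SIZE   4096
        #define CACHE_BLOCKS 1024            /* 4 MB of "RAM cache" for the sketch */

        static char cache_data[CACHE_BLOCKS][BLOCK_SIZE];
        static long cache_tag[CACHE_BLOCKS]; /* which disk block each slot holds */

        static void disk_read(long block, char *buf) {
            /* Stand-in for the slow path; a real one would hit the drive. */
            memset(buf, (int)(block & 0xff), BLOCK_SIZE);
        }

        static const char *cached_read(long block) {
            long slot = block % CACHE_BLOCKS;
            if (cache_tag[slot] != block) {  /* miss: fill the RAM copy from disk */
                disk_read(block, cache_data[slot]);
                cache_tag[slot] = block;
            }
            return cache_data[slot];         /* serve from RAM */
        }

        int main(void) {
            memset(cache_tag, 0xff, sizeof cache_tag); /* mark all slots empty (-1) */
            cached_read(42);                           /* first touch goes to "disk" */
            printf("block 42, first byte: %d\n", cached_read(42)[0]); /* RAM hit */
            return 0;
        }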
  • Reply 25 of 32
    Programmer Posts: 3,503 member
    Quote:
    Originally Posted by Hiro View Post


    In a few years the question will be 64 cores per CPU die and off-chip memory, or 32 cores and ~8GB of on-die memory. I'll vote for the latter.



    Hmmm, I think you're missing what I'm saying. Imagine an array of 8GB memory chips with 32 cores on each memory chip in a machine where you can pop in new RAM/core chips as easily as we currently install DIMMs... and operating systems that adapt easily to hundreds of processors. Each core has ~256MB of "local" memory, plus very fast access to all of the 8GB on the chip, and NUMA-style access to the other "DIMMs".
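
    A toy way to see that hierarchy is to map an address to its tier. The flat address map below is my own invention for illustration; only the 256MB-per-core and 8GB-per-chip sizes come from what I described above.

        /* Toy address-to-tier mapper: 256 MB local per core, 8 GB per chip,
         * everything else reached over NUMA links to other "DIMMs". */
        #include <stdio.h>

        #define MB (1ULL << 20)
        #define GB (1ULL << 30)
        #define LOCAL_SZ (256 * MB)   /* per-core pool (from the post) */
        #define CHIP_SZ  (8 * GB)     /* per-chip pool (from the post) */

        static const char *tier(unsigned long long addr, int my_core, int my_chip) {
            unsigned long long chip = addr / CHIP_SZ;              /* owning "DIMM" */
            unsigned long long core = (addr % CHIP_SZ) / LOCAL_SZ; /* owning core   */
            if (chip != (unsigned long long)my_chip) return "remote NUMA";
            if (core == (unsigned long long)my_core) return "core-local";
            return "on-chip";
        }

        int main(void) {
            /* Core 0 on chip 0 touching three different addresses: */
            printf("%s\n", tier(1 * MB, 0, 0));   /* core-local  */
            printf("%s\n", tier(1 * GB, 0, 0));   /* on-chip     */
            printf("%s\n", tier(9 * GB, 0, 0));   /* remote NUMA */
            return 0;
        }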
  • Reply 26 of 32
    Amorph Posts: 7,112 member
    Quote:
    Originally Posted by Programmer View Post


    Hmmm, I think you're missing what I'm saying. Imagine an array of 8GB memory chips with 32 cores on each memory chip in a machine where you can pop in new RAM/core chips as easily as we currently install DIMMs... and operating systems that adapt easily to hundreds of processors. Each core has ~256MB of "local" memory, plus very fast access to all of the 8GB on the chip, and NUMA-style access to the other "DIMMs".



    This is where the architectures are gradually but inexorably heading now that we have fabrics on desktop motherboards that at least resemble the sort of architectures that used to be baked into high-end workstations.



    If there is any part I'm skeptical about it's the part where you pop things in. This sounds almost like a cluster-on-a-board, which is great, except that you'd better have a fabric with high reliability, high bandwidth and low latency. Put a slot or a plug between cores and you get maybe two out of three unless you're willing to put down serious money. If somebody pulls it off, hey, you have the "modular cube" speculation we were all tossing around in the late 1990s, only without FireWire as the interface.



    You would probably need a CPU to bootstrap the whole thing, but there's no need for it to be especially powerful: it only needs to be powerful enough to black-start the machine and delegate tasks. Turns out IBM was right after all; they were just building this architecture back when it cost millions of dollars and filled basements.
  • Reply 27 of 32
    nvidia2008 Posts: 9,262 member
    Not a big problem until maybe 2009. Intel has to demonstrate with Nehalem, etc., that moving the northbridge into the CPU is good. Then it can move towards throwing all the RAM into the CPU.
  • Reply 28 of 32
    Programmer Posts: 3,503 member
    Actually I don't think you need particularly high bandwidth or low latency... at least no better than we're currently getting between CPU and main memory. The reason is that these connections run between tightly coupled RAM/core nodes that internally have extremely high bandwidth and relatively low latency, so most traffic never needs to cross them. Where the "pop it in like DIMMs" analogy falls apart is in power management and heat dissipation -- cores are hungry and hot compared to memory. On the other hand, these things don't need to run at a particularly high clock rate, since the goal is to maximize parallelism.
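
    Rough numbers show why (every latency and the access split below are assumptions I made up): if most accesses stay inside a node, the slow inter-node links barely show up in the average.

        /* Back-of-envelope average-latency model for a NUMA farm.
         * All numbers are invented for illustration. */
        #include <stdio.h>

        int main(void) {
            double local_ns = 10, chip_ns = 40, remote_ns = 120;    /* assumed tiers */
            double f_local = 0.80, f_chip = 0.15, f_remote = 0.05;  /* access split  */
            double avg = f_local * local_ns + f_chip * chip_ns + f_remote * remote_ns;
            printf("average access latency: %.1f ns vs %.0f ns all-remote\n",
                   avg, remote_ns);   /* prints 20.0 ns vs 120 ns */
            return 0;
        }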



    I too would expect a central primary CPU (one chip with 4-8 hyperthreaded out-of-order cores and a significant pool of their own RAM or cache... i.e. Nehalem or Sandy Bridge). Conventional software will primarily run on this thing, while highly parallel software will leverage the "farm" of lower-clocked in-order cores tightly coupled to the bulk of the RAM. The GPU won't exist anymore... it'll become this pool of processors.
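
    In the simplest case the central CPU just carves a data-parallel job into chunks and hands them to the farm. A minimal pthreads sketch (the worker count and workload are placeholders I made up; build with -pthread):

        /* Central thread delegates independent chunks to simple workers. */
        #include <pthread.h>
        #include <stdio.h>

        #define NWORKERS 8
        #define N 800000

        static float data[N];

        typedef struct { int lo, hi; } chunk_t;

        static void *worker(void *arg) {
            chunk_t *c = arg;
            for (int i = c->lo; i < c->hi; i++)
                data[i] = data[i] * 2.0f + 1.0f;   /* stand-in for real work */
            return NULL;
        }

        int main(void) {
            pthread_t tid[NWORKERS];
            chunk_t chunk[NWORKERS];
            for (int w = 0; w < NWORKERS; w++) {   /* central core just delegates */
                chunk[w].lo = w * (N / NWORKERS);
                chunk[w].hi = (w + 1) * (N / NWORKERS);
                pthread_create(&tid[w], NULL, worker, &chunk[w]);
            }
            for (int w = 0; w < NWORKERS; w++)
                pthread_join(tid[w], NULL);
            printf("data[0] = %f\n", data[0]);     /* 0*2+1 = 1.0 */
            return 0;
        }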
  • Reply 29 of 32
    Programmer Posts: 3,503 member
    Quote:
    Originally Posted by nvidia2008 View Post


    Not a big problem until maybe 2009. Intel has to demonstrate with Nehalem, etc., that moving the northbridge into the CPU is good. Then it can move towards throwing all the RAM into the CPU.



    Saying "the CPU" is just wrong... for that matter, the term "central processing unit" ought to be decommissioned. Processing isn't going to be centralized anymore; it will be distributed. This isn't that far off... 2010-2012 and we should start to see it happening.



    Were you being facetious and poking at Intel with the "demonstrate" comment? Intel is playing catch up on this one, they aren't demonstrating anything that hasn't been seen before several times.
  • Reply 30 of 32
    Hiro Posts: 2,663 member
    Quote:
    Originally Posted by Programmer View Post


    Hmmm, I think you're missing what I'm saying. Imagine an array of 8GB memory chips with 32 cores on each memory chip in a machine where you can pop in new RAM/core chips as easily as we currently install DIMMs... and operating systems that adapt easily to hundreds of processors. Each core has ~256MB of "local" memory, plus very fast access to all of the 8GB on the chip, and NUMA-style access to the other "DIMMs".



    We seem to be describing very similar things from different directions. Your above plug-and-play architecture is a bit more ambitious than I think we will see in the next 4-6 years, but the individual chips are roughly the same.
  • Reply 31 of 32
    nvidia2008 Posts: 9,262 member
    Quote:
    Originally Posted by Programmer View Post


    Saying "the CPU" is just wrong... for that matter, the term "central processing unit" ought to be decommissioned. Processing isn't going to be centralized anymore; it will be distributed. This isn't that far off... 2010-2012 and we should start to see it happening.



    Were you being facetious and poking at Intel with the "demonstrate" comment? Intel is playing catch up on this one, they aren't demonstrating anything that hasn't been seen before several times.



    No, I was not being facetious... I guess it is just that Nehalem etc. shipping in volume will need to happen first before we can even see if Intel can move all that northbridge(?) stuff onto "the CPU".
  • Reply 32 of 32
    Programmer Posts: 3,503 member
    Quote:
    Originally Posted by nvidia2008 View Post


    No, I was not being facetious... I guess it is just that Nehalem etc. shipping in volume will need to happen first before we can even see if Intel can move all that northbridge(?) stuff onto "the CPU".



    I suppose there might be skeptics, but in my mind it is a fait accompli. Intel's full weight is behind Nehalem, and if there is a company out there that knows how to deliver the next generation of an architecture, it's Intel.



    Quote:
    Originally Posted by Hiro View Post


    We seem to be describing very similar things from different directions. Your above plug-and-play architecture is a bit more ambitious than I think we will see in the next 4-6 years, but the individual chips are roughly the same.



    We'll see. Intel is pushing the Terascale stuff hard, and they want to ensure their future stake in the integrated device space, from Atom to top-of-the-line processors to GPUs. Eventually people are going to stop laughing at "integrated graphics".