Intel gets official on Nehalem architecture (successor to Penryn)

Posted in Future Apple Hardware
Intel this week offered its first official overview of Nehalem, the highly scalable microarchitecture positioned to succeed Penryn in delivering a new generation of processors for notebooks, desktops, and servers that offer "dramatic" energy efficiency and performance improvements.



Slated to enter production later this year, the architecture marks the next step in the chipmaker's rapid "tick-tock" cadence of delivering new process technology (tick) or an entirely new microarchitecture (tock) every year. High-performance server chips are expected to be first out of the gate, with variants for mainstream notebook and desktop systems making their way to market sometime next year.



The leap in performance and energy efficiency offered by Nehalem will be similar to the jump made by Intel's Core microarchitecture over the first 90-nanometer (nm) Pentium M processors, according to company vice president Pat Gelsinger. Key to this, he said, is simultaneous multithreading (SMT), an advanced version of hyper-threading that will create a new dimension in parallelism by enabling a single processor core to run two threads at the same time.



Intel's plans for Nehalem call for chips with one to eight (or more) cores, meaning a quad-core processor could run eight threads simultaneously and an octo-core version up to sixteen. Depending on the application, the resulting performance boost over today's Penryn chips could be as much as 20 to 30 percent, according to the chipmaker. At the same time, the ability to run more instructions in a single clock cycle allows the processor to return to a low-power state more quickly, which also boosts power efficiency.
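To see SMT from the software side, here is a minimal sketch, not taken from Intel's materials: it assumes a Linux/glibc system, and the quad-core count in assumed_cores is a purely hypothetical value for illustration. With SMT enabled, the operating system simply sees twice as many logical processors as there are physical cores.

/*
 * Minimal sketch (assumes Linux/glibc; assumed_cores is hypothetical):
 * report how many logical processors the OS can schedule onto.
 * On a quad-core Nehalem with SMT, this would print 8.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN); /* logical CPUs currently online */
    long assumed_cores = 4;                       /* hypothetical quad-core part   */

    printf("logical processors online: %ld\n", logical);
    if (logical > 0)
        printf("hardware threads per core (assuming %ld cores): %ld\n",
               assumed_cores, logical / assumed_cores);
    return 0;
}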



Nehalem processors will also utilize a new point-to-point processor interconnect called Intel QuickPath Interconnect, which will serve as a replacement for the legacy front side bus (FSB). Instead of using a single shared pool of memory connected to all the processors in a server or high-end workstation through FSBs and memory controller hubs, most Nehalem processors will pack their own dedicated memory that will be accessible directly through an Integrated Memory Controller on the processor die itself.



In cases where a processor needs to access the dedicated memory of another processor in a multi-processor system, it can do so through the QuickPath interconnect that links all the processors. This improves scalability and eliminates the competition between processors for bus bandwidth, according to Gelsinger, as there is no longer a single bus for which multiple chips would need to contend in order to reach memory and I/O services.
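To make the local-versus-remote memory distinction concrete, here is a minimal sketch, not Intel sample code: it assumes a Linux system with libnuma installed (build with -lnuma), whose allocation calls place memory either on the node local to the calling thread or on another node, which on a multi-socket Nehalem system would be reached over QuickPath.

/*
 * Minimal sketch (assumes Linux + libnuma, link with -lnuma):
 * allocate one buffer on the local NUMA node and one on the
 * highest-numbered node. On a single-socket machine both end up
 * on the same node; on a two-socket system the second allocation
 * lands in the other processor's memory.
 */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    size_t len = 64UL * 1024 * 1024;                        /* 64 MB test buffer       */
    void *local  = numa_alloc_local(len);                   /* memory on our own node  */
    void *remote = numa_alloc_onnode(len, numa_max_node()); /* memory on the last node */

    printf("NUMA nodes visible: %d\n", numa_max_node() + 1);

    if (local)  numa_free(local, len);
    if (remote) numa_free(remote, len);
    return 0;
}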



A close-up of a Nehalem processor wafer.



Nehalem will also offer the option of an integrated graphics controller for highly efficient mobile designs, and add an inclusive, shared L3 (last-level) cache that can be up to 8MB in size. Because it is inclusive and shared across all cores, the L3 cache can increase system performance while reducing snoop traffic to the individual processor cores.



Other features discussed by Gelsinger during this week's Nehalem architectural briefing include support for DDR3-800, 1066, and 1333 memory, SSE4.2 instructions, a 32KB instruction cache and a 32KB data cache, a low-latency 256KB L2 cache (holding both data and instructions) per core, and a new two-level TLB (Translation Lookaside Buffer) hierarchy.
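As a rough way to inspect such a cache hierarchy from software, the sketch below is an illustration only: it assumes Linux/glibc, where these sysconf names are a non-portable extension that may report 0 on some systems.

/*
 * Minimal sketch (assumes Linux/glibc; the _SC_LEVEL*_CACHE_SIZE names
 * are glibc extensions and may return 0 or -1 elsewhere). On a
 * Nehalem-class part you would expect roughly 32KB + 32KB L1 and
 * 256KB L2 per core, plus the large shared L3.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("L1 instruction cache: %ld bytes\n", sysconf(_SC_LEVEL1_ICACHE_SIZE));
    printf("L1 data cache:        %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("L2 cache:             %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    printf("L3 cache:             %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
    return 0;
}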



The majority of these advancements are pertinent to Apple, as the company's ongoing Mac roadmap implies that it will at the very least adopt both server and mobile variants of Nehalem, beginning with a Xeon-based successor to the Harpertown chip presently powering its Mac Pro line of professional desktop systems.

Comments

  • Reply 1 of 48
    Sounds like these babies will fly.



    It also looks like the best chance for Apple to go ahead and release major redesigns of the hardware. Cooling is affected and several components are moved around.



    If ever there was a chance for a redesign of the laptops and Mac Pro, IMHO this is it.
  • Reply 2 of 48
    zunx Posts: 620 member
    What is the theoretical limit in chipmaking (in nm; currently at 45 nm)?
  • Reply 3 of 48
    yeah, this is great news, I'm so excited I can't hide it! yeah yeah, great news for video editing & power hungry people like myself.



    oh zunx, no way, they are already talking 32 nm and check out Moore's law, here.



  • Reply 4 of 48
    bageljoey Posts: 2,004 member
    Quote:
    Originally Posted by zunx View Post


    What is the theoretical limit in chipmaking (in nm; currently at 45 nm)?



    Ha ha. I'd like to hear the answer to that one!

    I know that +/-10 years ago the absolute theoretical physical limit was 100nm. With the properties of light and all, it would be impossible to get below that. What would happen to Moore's Law? everyone wondered.



    I don't think it matters what the best brains might think now, everything will be different in 10 years!
  • Reply 5 of 48
    melgross Posts: 33,510 member
    Quote:
    Originally Posted by zunx View Post


    What is the theoretical limit in chipmaking (in nm; currently at 45 nm)?



    The next node is 32 nm, which we will see in 2009 (maybe late 2008). The one after that, which Intel is working on is 22 nm, which, if everything comes out all right, will be here, possibly in 2010.



    22 nm is going to be interesting because the tools to do the lithography haven't been finalized as yet. At these small sizes, lithography becomes the most difficult part, other than the need to be able to hold the features to a much closer tolerance.



    After 22 nm comes 15, which, right now, is not possible, as the litho tools won't work at that size.



    But, they have been working on numerous methodologies, such as x-rays.



    After that is about 10 nm, which is still being argued over. Some scientists and engineers aren't sure we can do 10 nm at all.



    If not, perhaps carbon nanotubes can be used. IBM and HP have shown promising work with those, but it's thought that a production process, if possible, won't come for about ten years.



    There are other technologies under investigation as well, such as protein storage.



    And, of course, there's optical, an area in which there have been several breakthroughs recently, by IBM, for one.
  • Reply 6 of 48
    melgross Posts: 33,510 member
    Quote:
    Originally Posted by Bageljoey View Post


    Ha ha. I'd like to hear the answer to that one!

    I know that +/-10 years ago the absolute theoretical physical limit was 100nm. With the properties of light and all, it would be impossible to get below that. What would happen to Moore's Law? everyone wondered.



    I don't think it matters what the best brains might think now, everything will be different in 10 years!



    I don't remember 100 nm as ever having been thought of as a theoretical limit. They did have to change the tools used; perhaps that was what you meant. The tools used at larger sizes didn't work below 130 nm.
  • Reply 7 of 48
    zunx Posts: 620 member
    OK, here is the roadmap I found after Googling:



    CMOS manufacturing processes

    16 nanometer

    http://en.wikipedia.org/wiki/16_nanometer



    Succeeded by 11 nm, if ever possible.



    ---2007: The 45 nanometer (45 nm) process is the next milestone (commercially viable as of November 2007) in semiconductor fabrication. Intel started mass producing 45 nm chips in November 2007, AMD is targeting 45 nm production in 2008, while IBM, Infineon, Samsung, and Chartered Semiconductor have already completed a common 45 nm process platform.



    ---2009-2010: The 32 nanometer (32 nm) process (also called 32 nanometer node) is the next step after the 45 nanometer process in CMOS manufacturing and fabrication. "32 nm" refers to the expected average half-pitch of a memory cell at this technology level. The two major chip rivals, Intel and AMD, are both working on a 32 nanometer process for logic, which uses significantly looser design rules. AMD has partnered with IBM on this process, as it did with the 45 nm process. The 32 nm process is due to arrive in the 2009-2010 timeframe.



    ---2011-2012: The 22 nanometer (22 nm) node is the CMOS process step following 32 nm. It is expected to be reached by semiconductor companies in the 2011-2012 timeframe. At that time, the typical half-pitch for a memory cell would be around 22 nm. The exact naming of this technology node comes from the International Technology Roadmap for Semiconductors (ITRS).



    ---2013-2018: The 16 nanometer (16 nm) node is the technology node following 22 nm node. The exact naming of these technology nodes comes from the International Technology Roadmap for Semiconductors (ITRS). By conservative estimates the 16 nm technology is expected to be reached by semiconductor companies in the 2018 timeframe. It has been claimed that transistors cannot be scaled below the size achievable at 16 nm due to quantum tunneling, regardless of the materials used.[1] At that time, the typical half-pitch for a memory cell would be around 16 nm. However, in complying with its own "Architecture and Silicon Cadence Model",[2] Intel will need to reach a new manufacturing process every two years. This would imply going to 16 nm as early as 2013.



    ---2015 and beyond: Succeeded by 11 nm.
  • Reply 8 of 48
    mfago Posts: 24 member
    Quote:
    Originally Posted by melgross View Post


    The next node is 32 nm, which we will see in 2009 (maybe late 2008). The one after that, which Intel is working on is 22 nm, which, if everything comes out all right, will be here, possibly in 2010.

    [...]



    Great summary.



    How about quantum tunneling etc? The lattice spacing of Silicon is only 0.5 nm, so with 10 nm features you're talking only 20 or so atoms. Although I don't recall the exact definition of feature-size with respect to a "32 nm process" etc.
  • Reply 9 of 48
    booga Posts: 1,082 member
    Quote:

    SSE4.2 instructions



    What are SSE4.2 instructions?
  • Reply 10 of 48
    melgross Posts: 33,510 member
    Quote:
    Originally Posted by MFago View Post


    Great summary.



    How about quantum tunneling etc? The lattice spacing of Silicon is only 0.5 nm, so with 10 nm features you're talking only 20 or so atoms. Although I don't recall the exact definition of feature-size with respect to a "32 nm process" etc.



    This is one of the problems with the 10 nm node. Tunneling is something that is used now in "tunneling diodes" and other devices. In fact, it's something without which modern electronics wouldn't function. However, even now, it is causing problems for chips. It's responsible for the inefficiency of chips from 90 nm down, and it was one of the reasons why the G5 never made it to 3 GHz at 90 nm, and why the Prescott, at 90 nm, never made it to 4 GHz.



    New methods, such as metal gates at 45 nm, can lessen the effects, but when we get to 10 nm or so, it isn't yet clear whether it can be lowered to the point of functionality.



    What's interesting at these levels is that carbon nanotubes conduct electricity better than does copper.



    We begin to lose the properties of "bulk" matter, and quantum effects start to dominate.



    One way around this is to use "spintronics", something that is already being used in a few settings.



    The advantage to spin is that it uses almost no power, as compared to "electron" electronics, which depends upon moving electrons about and uses considerable energy.



    Sigh! But, of course, it isn't all that simple.
  • Reply 11 of 48
    melgross Posts: 33,510 member
    While I'm eagerly waiting for Nehalem, there are a couple of things that do bother me.





    The biggest, for the moment:



    I mentioned, in another thread, that while moving to DDR3 from FB-DIMMS can be a good thing, I'm not so sure about the specs Intel has chosen.



    It's true that DDR2 has less latency than FB-DIMMs (though it was criticized for having more than DDR1), but DDR3 has more latency than DDR2.



    The estimates I've read say that DDR3 must run at 1,333 or faster to have an advantage over DDR2-800.



    But, as of now, that is the HIGHEST speed that Intel allows (on a non over-clocked enthusiasts' board).



    BUT, new DDR3 memory modules have already broken through 2,000! By the time Nehalem comes out late this year, we may see 2,500, maybe more. Will Intel update their specs? I wonder.
  • Reply 12 of 48
    I may not know about all of these specs, but I certainly want a Nehalem chip in Apple's products. I will have graduated by 2009 or 2010 and plan to upgrade from my 2.16 GHz Intel Core 2 Duo iMac after that. Looks like I have a lot to look forward to.
  • Reply 13 of 48
    bageljoey Posts: 2,004 member
    Quote:
    Originally Posted by melgross View Post


    I don't remember 100 nm as ever having been thought of as a theoretical limit. They did have to change the tools used; perhaps that was what you meant. The tools used at larger sizes didn't work below 130 nm.



    True. As I think on this, I guess it was more like 15 years ago. I remember researching a paper on the topic. My recollection is fuzzy, but I think the discussion was that light wouldn't work below 100nm and that a switch to X-rays would be necessary. X-rays were seen as problematic and a long way off. So the theoretical limit was only for light.

    Certainly, new tools were developed which "broke" the "theoretical limit" and made this discussion quaint by today's standards, but that was what people were thinking (if I remember correctly).



    (Obviously I am not a computer engineer, so I apologize in advance if my recollection is off.)
  • Reply 14 of 48
    aplnub Posts: 2,605 member
    Take this for what it is worth in 2008. This was the point of view in 2005.



    I have summarized an article below from a BusinessWeek publication. I pulled up my summary when I caught this post. Obviously, this is old news since it is based on an article over two years old. But it is an interesting read regardless, especially when you notice they thought 45 nm wouldn't arrive until 2010; Intel seems to be ahead of the curve compared to the 2005 point of view. Exciting! Processor construction will keep changing to keep speed increases coming, in the form of 3-dimensional stacking.



    I was reading my wife's BusinessWeek magazine, the June 20, 2005 issue, and ran across an article titled "More Life for Moore's Law". It was a good read, but it also included some surprising quotes from IBM. The article is on page 108, which may be nice to know.



    Article Summary



    BusinessWeek reports that future solutions for keeping processor speeds increasing in sync with Moore's law are starting to emerge. Current processes rely on shrinking the transistors on chips, reducing the time needed for electrons to reach their destination. "This year and next they'll go down to 65 nm, followed by 45 nm by 2010, 32 nm by 2013, and 22 nm by 2016," increasing the speed of processors the old-fashioned way.



    The next step in increasing speeds without shrinking circuit lines would be the utilization of multicore processors, where two or more processor cores are coupled together on the same semiconductor. There is a big push from Intel to encourage software to take advantage of multicore processors: "Intel has committed 3,000 of its 10,000 software programmers to help accelerate the shift to multicore designs." Philip Emma, manager of systems technology and microarchitecture at IBM, predicts that personal computers will likely peak at 8-core processors.



    The next possible solution is to design "ways to stack circuitry, layer upon layer, into multi-story, 3D structures." This would allow the pathway distance for electrons to be reduced from 20,000 microns to 10 microns, allowing current 90 nm processors to perform similarly to the 32 nm processors scheduled for 2011. There are challenges to be overcome when stacking transistors one on top of the other, and this technology could take until 2011 to make an appearance.



    "We're going to see a lot of evolution happening very fast," said Philip Emma.
  • Reply 15 of 48
    petermac Posts: 115 member
    Quote:
    Originally Posted by Booga View Post


    What are SSE4.2 instructions?



    Here you go (and for a quick taste of what SSE4.2 adds, see the small CRC32 example at the end of the thread).



    http://en.wikipedia.org/wiki/SSE4
  • Reply 16 of 48
    mr. h Posts: 4,870 member
    Let's not forget, when talking about how small chip feature sizes can go, that we have to start considering chip lifetimes as well.



    We're used to a working microchip that is not abused essentially lasting forever; however, that's no longer the case. With these small feature sizes, atoms can and do move after manufacture, causing failures. I read an article in the IET (the UK equivalent of the IEEE) magazine stating that some 45 nm chips may not even last a year before failure. As the feature size goes down, this problem gets worse.
  • Reply 17 of 48
    I've been waiting 8 years to replace my old PowerPC G4, and it looks like Nehalem is it! As soon as Apple releases a Mac Pro based on Nehalem, I'm going to get one!
  • Reply 18 of 48
    hiro Posts: 2,663 member
    Quote:
    Originally Posted by Mr. H View Post


    Let's not forget, when talking about how small chip feature sizes can go, that we have to start considering chip lifetimes as well.



    We're used to a working microchip that is not abused essentially lasting forever; however, that's no longer the case. With these small feature sizes, atoms can and do move after manufacture, causing failures. I read an article in the IET (the UK equivalent of the IEEE) magazine stating that some 45 nm chips may not even last a year before failure. As the feature size goes down, this problem gets worse.



    I don't buy that. SRAM has been manufactured by Intel for over two years at 45nm. Memory always leads the process change parade because it is regular and therefore easier to verify the process. Those 45nm memories aren't failing en masse or EVERYONE would be aware of it by now.
  • Reply 19 of 48
    retroneo Posts: 240 member
    Quote:
    Originally Posted by Hiro View Post


    I don't buy that. SRAM has been manufactured by Intel for over two years at 45nm. Memory always leads the process change parade because it is regular and therefore easier to verify the process. Those 45nm memories aren't failing en masse or EVERYONE would be aware of it by now.



    SRAM is used for on-die cache, one of the simplest features and first used to demonstrate a new manufacturing process. SDRAM is used for "everyone's" memory and is manufactured at 70nm and moving towards 65nm. Sampling of 65nm memories began only 3 months ago.



    45nm SDRAM is a way off.



    http://www.physorg.com/news113668406.html
  • Reply 20 of 48
    retroneo Posts: 240 member
    Quote:
    Originally Posted by zunx View Post


    What is the theoretical limit in chipmaking (in nm; currently at 45 nm)?



    It's probably 16nm. Some say there could possibly be one process node after that (11nm).



    I remember reading, perhaps 10-15 years ago, about "the wall" they would reach around 15nm. Quantum effects will make it unworkable to make semiconductor transistors any smaller.



    After that, processors will find new ways to improve performance, such as optical interconnects on silicon. These are being worked on today, but they will take many years to commercialize.



    QuickPath (Intel) and HyperTransport (AMD) interconnects are both already designed today with an optical future in mind.



    Carbon nanotubes and the like are even further away.



    Each process node gives you twice the number of transistors on the same die size.



    45nm --> 32nm = 2x transistors

    32nm --> 22nm = 4x transistors (vs. 45nm)

    22nm --> 16nm = 8x transistors (vs. 45nm)



    Harpertown has 820 million transistors.



    Expect around 6.5 billion transistors on a high end 16nm processor.
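A quick sanity check on that last estimate, treating transistor count as scaling with the inverse square of the feature size (a rough rule of thumb, not an Intel figure):

\[
\left(\tfrac{45}{32}\right)^2 \approx 2.0, \qquad
\left(\tfrac{45}{22}\right)^2 \approx 4.2, \qquad
\left(\tfrac{45}{16}\right)^2 \approx 7.9, \qquad
820\ \text{million} \times 7.9 \approx 6.5\ \text{billion},
\]

which matches the 2x/4x/8x ladder and the roughly 6.5 billion figure above.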
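And on the SSE4.2 question raised earlier in the thread, here is a minimal sketch, not from the article or the linked Wikipedia page: it assumes a compiler and CPU with SSE4.2 support (e.g. gcc -msse4.2) and shows one of the new instructions, the hardware CRC32, through the _mm_crc32_* intrinsics.

/*
 * Minimal sketch (assumes SSE4.2 hardware and a compiler flag such as
 * gcc -msse4.2): compute a CRC32-C checksum one byte at a time using
 * the SSE4.2 CRC32 instruction via its intrinsic.
 */
#include <stdio.h>
#include <string.h>
#include <nmmintrin.h>

int main(void)
{
    const char *msg = "Nehalem";
    unsigned int crc = 0;

    for (size_t i = 0; i < strlen(msg); i++)
        crc = _mm_crc32_u8(crc, (unsigned char)msg[i]); /* fold in one byte */

    printf("crc32c(\"%s\") = 0x%08x\n", msg, crc);
    return 0;
}

Note that the instruction implements the Castagnoli (CRC32-C) polynomial rather than the common zlib CRC32, so the result differs from what a standard crc32() routine would give.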