zimmie

About

Username: zimmie
Joined:
Visits: 172
Last Active:
Roles: member
Points: 2,737
Badges: 1
Posts: 651
  • Apple Silicon iMac & MacBook Pro expected in 2021, 32-core Mac Pro in 2022

    dewme said:
    mattinoz said:
    dewme said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But I don't see why Apple couldn't put multiple physical packages with N cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1 apps natively.
    They could, but it would diminish some of the benefits of the M1's unified memory architecture. Yes, you'd still have more cores available for working on highly parallelizable problems, but once the parts of the work are chunked and doled out to different groups of cores in different packages you'd have to designate something, like a "master" package or CPU to reassemble (memory copy) all of the distributed work back to a central point for final processing - and it would have to fit into the memory space of the master.

    Keep in mind that the multiple CPUs on traditional multi-CPU motherboards all shared the same main memory. Additionally, each CPU or CPU core had its own memory caches, and all of those caches had to be maintained in a coherent state because they were all referencing shared memory locations. If one CPU/core changed the value of a cached location in shared memory, all of the other CPUs/cores holding that location in their caches had to be updated. With the M1, Apple can optimize its cache coherency logic much more easily within its unified memory architecture than it could with shared off-package memory. If Apple brings in multiple packages, each with multiple cores, to work on shared problems, how many of these traditional memory management problems come back into the fray? I ask this only because a lot of the architectural benefits that Apple achieves with the M1 are due to tight coupling between CPU cores, tight coupling between CPU and GPU cores, and having a unified memory architecture very close to all of the cores. Moving to a multi-package model brings loose coupling back into the picture, which will most definitely limit the overall speedup that is attainable. There are alternative ways to get something in between tight and loose coupling, as AMD has done, but you want to keep everything as close as possible.

    Two things to keep in mind with all Von Neumann-based architectures, which include all current Macs and PCs: 1) copying memory around always sucks from a performance standpoint, and 2) even small amounts of non-parallelizable code have a serious negative impact on the speedup attainable with any number of processing units/cores (Amdahl's Law). This doesn't mean that speedups aren't possible; it just means that unless your code is well above 90% parallelizable, you're going to max out on performance very quickly regardless of the number of cores. For example, in the 90% case, with 8 cores you'd get a speedup of about 5 over a single core, but with 32 cores you're only getting a speedup of about 8 over a single core even though you've quadrupled the number of cores. To double the speedup in the 90% case you'd have to go from 8 cores to around 512 cores.
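
    For anyone who wants to play with those numbers, here is a minimal sketch of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), written in Swift purely for illustration; the figures above are just this formula with p = 0.9:

    import Foundation

    // Amdahl's Law: p is the parallelizable fraction, n is the number of cores.
    func amdahlSpeedup(parallelFraction p: Double, cores n: Double) -> Double {
        return 1.0 / ((1.0 - p) + p / n)
    }

    for cores in [8.0, 32.0, 512.0] {
        let s = amdahlSpeedup(parallelFraction: 0.9, cores: cores)
        print("90% parallel, \(Int(cores)) cores: speedup ≈ \(String(format: "%.1f", s))x")
    }
    // 90% parallel, 8 cores:   speedup ≈ 4.7x  (roughly 5x)
    // 90% parallel, 32 cores:  speedup ≈ 7.8x  (roughly 8x)
    // 90% parallel, 512 cores: speedup ≈ 9.8x  (approaching the 10x ceiling)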

    An M1 coprocessor board for the Mac Pro is not a bad idea, and it's a strategy that's been used since the dawn of personal computers. One of DEC's first PC-compatible computers, the Rainbow 100, had an 8088 and a Z80 processor so it could run both MS-DOS and CP/M (and other) operating systems and applications. I've had PCs, Sun workstations, and industrial PLCs with plug-in coprocessor boards for doing specialized processing semi-independently of the host CPU. The challenge is always how to integrate the coprocessor with the host system so the two cooperate in a high-quality, seamless manner. Looking across the tight-coupling to loose-coupling spectrum, these plug-in board solutions tend to sit on the looser side unless the host system (and all of its busses) was purposely designed with the coprocessor in mind, like the relationship between the old 8086/8088 general-purpose CPU and the 8087 numerical coprocessor.

      
    Could the Unified Memory be used as a bridge between 2 or more chips?

    Could they be planning upgraded systems as multiples of the same chip with a memory controller / PCIe hub chip between them?
    That could let them use the 6-core bin of chips to make 12-core clusters, with options for 14 and 16 cores at a premium. Mirror again to get up to a 32-core beast.

    Based on the descriptions Apple has published, the unified memory pool is in the SoC package and is directly addressable by the CPUs and GPUs to avoid memory copying. That would seemingly make the unified memory address space local to the SoC. If a portion of this memory were shared with another SoC that has its own memory address space, this would introduce cache-coherency-style requirements into the architectural model.

    But there’s nothing preventing Apple from defining a range of addresses in unified memory that serve as I/O ports, so to speak, possibly with DMA semantics, to allow high-speed data interchange between SoCs. This would be more like a Thunderbolt bus interchange and less like CPUs on the same SoC sharing access to the same memory address space.

    If you look at the Mac in a slightly broader sense it is really a system of systems. There’s no doubt that Apple could leverage its Apple Silicon and M-series SoCs in particular as building blocks to compose an Apple SoC powered system of systems in a form factor like we’ve never seen in the markets that Macs currently target. We’re barely past the starting line on this journey. 
    There’s nothing special about the memory being on-package. The special part is that the CPU cores and GPU cores share a memory controller and an address space. The chiplet packaging AMD uses also involves a big IO die in the middle with the processor dies off to the sides. The IO die handles memory management, peripherals like PCIe, and so on. Apple can do the same thing relatively easily, and all cores connected to the IO die—whether CPU or GPU—would work with the same pool of memory.

    The M1’s on-package RAM is LPDDR4X. That might be a challenge to extend to slotted memory, so they could just switch it to normal DDR4. Easy enough. There’s nothing intrinsic to the M1’s design which precludes off-package RAM (such as in the 16” MBP) or slotted RAM.

    I expect most of the four-Thunderbolt machines to go ARM next. 4TB3 13” MBP, 16” MBP, a high-end Mac Mini, and the small iMac could all get the same processor differentiated by cooling solution. They’ll have off-package (but still soldered) RAM.
  • Apple Silicon iMac & MacBook Pro expected in 2021, 32-core Mac Pro in 2022

    razorpit said:
    mjtomlin said:
    rob53 said:
    Why stop at 32 cores? The current TOP500 supercomputer is comprised of a ton of Fujitsu A64FX CPUs with 48 compute cores each, based on ARMv8.2-A. Cray is also working on changing to the ARM architecture. The current implementation runs at only 2.2 GHz using a 7nm CMOS FinFET design. This is on one chip! (ref: https://en.wikipedia.org/wiki/Fujitsu_A64FX, also https://www.fujitsu.com/downloads/SUPER/a64fx/a64fx_datasheet.pdf)

    Those are designed for massive parallelism. This is not something that would be necessary for what is essentially a single user computer. You could get more perceived performance from a lower core, higher clock speed design.

    Even a 12 HP core design clocked higher would basically blow past almost everything on the market right now.
    I would agree with the current state of macOS, but who's to say macOS '11.1' wouldn't start pushing background processes to individual cores?

    Instead of applications sharing processor time on a smaller number of cores, you could theoretically scale it up for these new beast-mode™ processors. I'm guessing the average user wouldn't notice much of a difference, but once you have it figured out it would be easier to apply it to the entire product range and have everyone benefit from it.

    I'm sure someone will come along and happily point out how wrong I am, but I'm just floating an idea out there. Remember there was a time when a large part of the "tech elites" also thought 64-bit processors on an iPhone was ludicrous.
    The XNU scheduler already does this. POSIX processes and threads are kind of a headache for programmers to use, so Apple added Grand Central Dispatch in macOS 10.6 in 2009. Most of Apple's new APIs since then have been aggressively asynchronous. For example, URLSession/NSURLSession makes it harder to process returned data in the thread where you made the call than it is to process it in a completion handler block which gets scheduled on another queue. GCD also has a concept of priority levels, which tell it what kind of core to prefer for a given queue's tasks. User-interactive queues will usually send tasks to a high-performance core, while background queues will usually send them to a low-power core.
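
    To make that pattern concrete, here is a rough Swift sketch (the URL and the two helper functions are placeholders I made up, not Apple sample code): a URLSession completion handler hands work to global queues with different quality-of-service classes, and the scheduler uses those hints when deciding between performance and efficiency cores:

    import Foundation

    // Placeholder helpers, just so the sketch compiles.
    func archive(_ data: Data) { /* e.g. write the data out somewhere cheap */ }
    func refreshUI(with data: Data) { /* e.g. update something the user is watching */ }

    let url = URL(string: "https://example.com/data.json")!   // placeholder URL

    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        // This completion handler already runs off the calling thread.
        guard let data = data, error == nil else { return }

        // Background QoS: the scheduler tends to place this work on an efficiency core.
        DispatchQueue.global(qos: .background).async {
            archive(data)
        }

        // User-interactive QoS: the scheduler tends to place this on a performance core.
        DispatchQueue.global(qos: .userInteractive).async {
            refreshUI(with: data)
        }
    }
    task.resume()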

    Even so, single-core performance matters more to most users' perception of speed than parallelism. I expect the Mac Pro to have fairly extreme parallelism, of course, but it is also almost certainly going to have faster individual cores as well.

    I agree with Blastdoor: chiplets are probably the way it's going to go. GPU performance scales almost linearly with core count, and core count scales linearly with die area, but reject rate scales with the square of the die area. The main thing keeping Apple's GPUs from competing with AMD's or Nvidia's is die size, but big dies give you fewer parts per wafer, and fewer of those parts are functional. GPU chiplets would let Apple get to competitive core counts with small dies.
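
    To put some entirely made-up numbers on the yield argument, here is a Swift sketch using a simple Poisson yield model rather than any foundry's real data; the wafer size and defect density are assumptions chosen only to show the shape of the curve:

    import Foundation

    // Illustrative only: Poisson yield model, yield = e^(-defectDensity * dieArea).
    let waferDiameterMM = 300.0
    let defectsPerMM2 = 0.001        // assumed defect density, for illustration only

    func grossDiesPerWafer(dieAreaMM2: Double) -> Double {
        // Crude approximation ignoring edge losses and scribe lines.
        let waferAreaMM2 = Double.pi * pow(waferDiameterMM / 2.0, 2.0)
        return waferAreaMM2 / dieAreaMM2
    }

    func goodDiesPerWafer(dieAreaMM2: Double) -> Double {
        let yield = exp(-defectsPerMM2 * dieAreaMM2)
        return grossDiesPerWafer(dieAreaMM2: dieAreaMM2) * yield
    }

    for area in [100.0, 200.0, 400.0, 800.0] {
        print("\(Int(area)) mm^2 die: ~\(Int(goodDiesPerWafer(dieAreaMM2: area))) good dies per wafer")
    }
    // Each doubling of die area halves the gross die count and also cuts the yield,
    // so good parts per wafer fall off much faster than linearly with die size.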

    The only concern after that is heat dissipation. Chiplets don't help spread the heat into more heatsink volume the way a CPU and a separate GPU card do. That said, heat pipes are pretty good at getting heat from point A to point B, so it may not be too big a deal. As long as the total CPU+GPU power consumption is under 300W, the Mac Pro's existing CPU heatsink is plenty. With the performance and heat we've seen from the M1, I bet the Mac Pro's combined TDP will be maybe 180W. Probably less.
  • Apple service documents suggest new hardware release coming on Dec. 8



    melgross said:
    ...an updated Mac Pro. I didn’t buy one last year, because I wanted the new PCIe 4 bus, which is now becoming common enough for Apple to move to. 
    In an Apple Silicon Mac a PCIe bus may not do everything that we might want. For example, a PCIe video card may not work, since macOS Big Sur may not support any video cards, not even Metal-based AMD video cards.

    I agree that Apple will probably support PCIe v4 in an Apple Silicon Mac with expansion ports, but if it's not for video cards, maybe Apple should re-think the whole purpose of the expansion card idea entirely. Could Thunderbolt be enough? How about Thunderbolt 4, which is due any day now from Intel (and probably from Apple too, since Apple now supports Thunderbolt without any Intel chips)? What type of cards do people want to install in a Mac that can't be done through a Thunderbolt port in some other way? Thunderbolt 4 is 26% faster than even a 16-lane PCIe v4 connector, and it's almost as fast as 16-lane PCIe v5, which won't be out for a couple of years. Why would anyone want a slow connection like a 16-lane PCIe v4 when Thunderbolt 4 is so much faster? I'll answer that - because you don't have to buy a Thunderbolt cable. So to save $50 people want to settle for a lower bus speed?

    This is food for thought; I'm not sure Apple will do what I'm suggesting. I expect the usual claims from people here questioning my sanity. 
    Maybe not your sanity, but I think you messed up your units here.

    Thunderbolt 4: PCIe data at 32 Gb/s (GigaBITS)
    PCIe v4 x 16: 32 GB/s (GigaBYTES) = 256 GigaBITS

    "The PCIe 4.0 specification will also bring the OCuLink-2 connector, an alternative to Thunderbolt 3, that provides 8GB/s of bandwidth via four PCIe 4.0 lanes" — That's 64 Gb/s.

    Someone correct me if I screwed up here, but PCIe v4 x16 is quite a bit faster than Thunderbolt 4.
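
    A quick sanity check on those figures, done in Swift, using the commonly cited numbers (PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding; Thunderbolt 4 carries at most 32 Gb/s of PCIe data):

    // Bandwidth sanity check. Figures are the commonly cited spec numbers, not measurements.
    let pcie4PerLaneGbps = 16.0 * (128.0 / 130.0)   // ≈ 15.75 Gb/s usable per lane
    let pcie4x16Gbps = pcie4PerLaneGbps * 16.0      // ≈ 252 Gb/s ≈ 31.5 GB/s
    let tb4PcieGbps = 32.0                          // Thunderbolt 4's PCIe data cap, in Gb/s

    print(pcie4x16Gbps / tb4PcieGbps)               // ≈ 7.9, so x16 PCIe 4.0 is roughly 8x faster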

    Uh oh, that's embarrassing. I will believe you. I think part of the reason I was confused is that some websites talk about GT/s (gigatransfers/sec) and I was probably not discerning between "gigatransfers of bytes" and "gigatransfers of bits."

    But I won't give up so easily. Please note that Thunderbolt is currently using copper cabling. When Thunderbolt was first introduced as "Light Peak" it used fibre optic instead. Corning has bragged about having fibre-optic Thunderbolt (see the video below). I'm sure that fiber-optic is faster than copper, and it's potentially a way forward for Apple without using PCIe. Corning has been silent about this product for years, which could mean they've given up, or could mean they have an NDA with Apple for a future product.


    Optical fiber isn't faster than copper for this. Both can get full Thunderbolt speeds.

    You can also buy optical Thunderbolt cables already. Have been able to for years. The starting cost is a lot higher than that of copper cables. Corning makes cables up to 50m:

    AOC-CCU6JPN005M20
    AOC-CCU6JPN010M20
    AOC-CCU6JPN015M20
    AOC-CCU6JPN025M20
    AOC-CCU6JPN050M20

    The number just ahead of the "M20" is the cable's length in meters.
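
    For what it's worth, the length is easy to pull out programmatically; a throwaway Swift sketch of that naming convention:

    // Decode the length from the Corning part numbers listed above: the three
    // digits just ahead of the trailing "M20" are the cable length in meters.
    let partNumbers = [
        "AOC-CCU6JPN005M20",
        "AOC-CCU6JPN010M20",
        "AOC-CCU6JPN015M20",
        "AOC-CCU6JPN025M20",
        "AOC-CCU6JPN050M20",
    ]

    for pn in partNumbers {
        let meters = Int(pn.dropLast(3).suffix(3)) ?? 0   // e.g. "005" -> 5
        print("\(pn): \(meters) m")
    }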
  • Early 2021 Apple Silicon iMac said to have 'A14T' processor

    xsmi said:
    Apple said at their keynote that the Mac chips would be designed from the ground up with the Mac in mind. Wouldn't that preclude them from using a variant of iPhone/iPad chips?
    No, it doesn't. They could say they built the A14 "from the ground up" for Macs, and would you look at that! It just happens to work for the iPad Pro, too! What a coincidence!

    From a purely practical standpoint, we know the chips aren't going to be new "from the ground up". The instruction set is definitely shared. The core designs are definitely going to be shared, with the possibility of an additional desktop-only core design (I don't think a desktop-only core is likely, but it's possible).
  • Low-light performance of iPhone 12 Pro aided by wider ISO range & aperture

    “The iPhone 12 Pro Max also has a slightly narrower telephoto aperture of f/2.2.”

    Are you sure this is correct? I thought the light gathering was also a function of the focal length, which is different for the Max.
    Yes. The light collected is actually a matter of the area of the aperture, which isn't directly conveyed by the f-stop. The f-stop number (like f/2, f/2.2) is the ratio of the lens's focal length to the aperture diameter. You can't say whether an f/1.2 aperture is actually larger than an f/4 aperture without knowing the other part of the ratio.

    I don't know the real focal lengths of the lenses, so I can't calculate the real aperture sizes. The math in this next bit will be for 35mm-equivalent numbers (i.e., focal lengths and aperture diameters with which a 35mm film camera would give you the same field of view).

    The 12 Pro's long lens has a 35mm-equivalent focal length of 52mm, while the 12 Pro Max's long lens has a 35mm-equivalent focal length of 65mm. f/2 at 52mm means the aperture is 52/2, or 26mm in diameter, and 531mm^2 area (approximately). f/2.2 at 65mm gives 65/2.2, or 29.5mm in diameter, and 685mm^2 area (again, approximately). The 65mm lens would collect about 30% more light.
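
    The same arithmetic as a tiny Swift sketch, using the 35mm-equivalent numbers quoted above:

    import Foundation

    // Equivalent aperture diameter = focal length / f-number; area = pi * (d / 2)^2.
    func apertureAreaMM2(focalLength f: Double, fNumber n: Double) -> Double {
        let diameter = f / n
        return Double.pi * pow(diameter / 2.0, 2.0)
    }

    let pro    = apertureAreaMM2(focalLength: 52.0, fNumber: 2.0)   // ≈ 531 mm^2
    let proMax = apertureAreaMM2(focalLength: 65.0, fNumber: 2.2)   // ≈ 685 mm^2
    print(proMax / pro)                                             // ≈ 1.29, i.e. ~30% more light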

    I seriously doubt the long lenses have a focal length greater than 8mm, so they probably don't actually have a telephoto group. I don't know for sure one way or the other, though.