Nvidia 1080ti with new drivers in external enclosure quadruples MacBook Pro native perform...


Comments

  • Reply 21 of 34
    Mike Wuerthele Posts: 6,861 administrator
    Can we see tests on a 2016 Core i7 13 inch MacBook? I mean it's amazing that this works but it's kind of redundant to be testing this on a laptop that already has a pretty decent dedicated GPU.

    Seeing as the 13 inch Core i7s are running the Iris 550 chipsets, I can see where an eGPU would be far more useful in this scenario.
    If I can get my hands on one, sure. The benchmarks would change a little bit for the eGPU, maybe downwards by 10%.
  • Reply 22 of 34
    jmey267 Posts: 57 member
    I just tested my Mac Pro 5,1 with a GTX 1050 Ti using Valley 1.0. It scored 1694 and averaged 40 fps with the same settings the author used: 1680x1050, 8xAA, full screen. Not bad for a 7-year-old machine with a cheap new Nvidia card ($120 after rebate).
  • Reply 23 of 34
    xzu said:
    Is the Blade Pro worth a performance comparison (OS aside) with its GTX 1080 and phase change cooling system?
    https://www.razerzone.com/gaming-systems/razer-blade-pro
    http://www.pcmag.com/news/352646/razer-blade-pro-laptop-gets-thx-certification-kaby-lake
    That is the first laptop I would consider a desktop replacement. Kaby Lake, 32GB RAM, dual PCIe drives, and a 1080, oh my goodness.

    The eGPU is exciting as well. At least we are moving in the right direction.
    Though apparently no one told Razer that a large percentage of creative professionals are left-handed. Whoops.
  • Reply 24 of 34
    Sounds like this vector processor may be good for running a next generation AI platform on your desk to control a swarm of drones and droids, but if it's connected to (or near) certain "unshielded" external monitors, a loud pop may be heard and reportedly followed by UltraFine radiation emissions that may induce unusual changes in mood, behavior or thoughts of suicide, especially in children, teens and young adults (elderly dementia patients may also have an increased risk of death or suicide) as they realize how much this POS cost them and under no circumstances should it be used near an air traffic corridor so as to avoid interference with aircraft navigation systems. /s
  • Reply 25 of 34
    Can we see tests on a 2016 Core i7 13 inch MacBook? I mean it's amazing that this works but it's kind of redundant to be testing this on a laptop that already has a pretty decent dedicated GPU.

    Seeing as the 13 inch Core i7s are running the Iris 550 chipsets, I can see where an eGPU would be far more useful in this scenario.
    If I can get my hands on one, sure. The benchmarks would change a little bit for the eGPU, maybe downwards by 10%.
    That would be awesome and much appreciated. It may be the deciding factor for me when it comes to which laptop I'd like to go with. I like the portability of my MacBook Air, and the Core i7 MacBook Pro 13 inch is a little beast in and of itself. Not as powerful as the 15, but it's an acceptable margin I think.

    I'd love to see the eGPU numbers with the Core i7 13 inch. Thank you!!
  • Reply 26 of 34
    We have been keeping track of eGPU implementations. There are a handful of Late 2016 13" MacBook Pro + eGPU pairings.
  • Reply 27 of 34
    zimmie said:
    zimmie said:
    PCIe throughput matters less than most people think for most GPU use. You can block off PCIe lanes with tape. On most GPUs in most games, there is no difference at all dropping it from 16 lanes to eight. When you drop to four lanes, you typically get longer loading times and a small framerate drop. Dropping to two lanes typically gives significantly longer loading times and >30% framerate drops.

    This can matter for OpenCL use, but it frequently does not. PCIe throughput is only really used getting your dataset into the video card's RAM and getting the result out. It will slow down some really trivial data manipulation, and it will be slower to work on datasets too large for the card's RAM. Anything which requires more than a few seconds to compute won't be meaningfully slower on an eight-lane or four-lane bus.
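    The lane-count argument above can be put into rough numbers. The sketch below is illustrative only: it assumes PCIe 3.0 signalling (8 GT/s per lane with 128b/130b encoding), which the post does not state, and it ignores protocol overhead. The function names and the 4 GB dataset size are hypothetical.

```python
# Assumption: PCIe 3.0 link (8 GT/s per lane, 128b/130b line encoding).
# The post does not say which PCIe generation it has in mind.
GIGATRANSFERS_PER_SECOND = 8.0
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_gbps(lanes: int) -> float:
    """One-direction link bandwidth in gigabits per second, before protocol overhead."""
    return lanes * GIGATRANSFERS_PER_SECOND * ENCODING_EFFICIENCY


def transfer_seconds(dataset_gigabytes: float, lanes: int) -> float:
    """Time to copy a dataset of the given size across the link (hypothetical helper)."""
    return dataset_gigabytes * 8 / lane_bandwidth_gbps(lanes)


if __name__ == "__main__":
    # 4 GB is an arbitrary example size, not a figure from the thread.
    for lanes in (16, 8, 4, 2):
        print(
            f"x{lanes}: {lane_bandwidth_gbps(lanes):6.1f} Gb/s, "
            f"4 GB dataset copied in {transfer_seconds(4, lanes):.2f} s"
        )
```

    Under these assumptions the copy time for a dataset that fits in the card's RAM stays around a second or two even on narrow links, which is why a job that then computes for many seconds barely notices the narrower bus.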
    I know less than nothing about computer architecture, so please forgive me if I misunderstood what @"Mike Wuerthele" wrote, but the impression I got is that the bandwidth limitation becomes an issue because the Thunderbolt 3 bus is a two-way street in this scenario. It has to carry both the instructions to the card AND all the pixel data back to the internal display. That's why performance was better when using an external display.

    Is my understanding correct or have I missed something?
    They didn't give the pixel dimensions of the external display. The 15" MBP's internal display is 2880x1800 pixels for a total of a little under 5.2 million pixels. The most common external displays I see are 1920x1080, which is just under 2.1 million pixels. I don't know if these benchmarks correct for the output pixel count. If they don't, a 2x improvement could come solely from running 1/2 the pixels.
    On the 2016 15" MBP, the internal display automatically scales to 1680x1050 when running Unigine benchmarks in 1920x1080 fullscreen mode. Therefore, I set all tests to run at the Unigine Valley default Extreme settings, which use 1600x900. You can click on the result numbers to see the screen caps with additional information.

    Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article, however, was to showcase the latest Nvidia GPUs running in macOS.
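    The pixel counts discussed in this reply are easy to check. This is only a sketch of the arithmetic, using the resolutions quoted above; the `pixel_count` helper is a hypothetical name.

```python
def pixel_count(width: int, height: int) -> int:
    """Total pixels rendered per frame at the given resolution."""
    return width * height


# Resolutions quoted in the thread.
internal_native = pixel_count(2880, 1800)   # 15" MBP internal panel
common_external = pixel_count(1920, 1080)   # typical external display
benchmark_setting = pixel_count(1600, 900)  # resolution used for all tests

if __name__ == "__main__":
    print(f"internal native : {internal_native:,} pixels")
    print(f"1080p external  : {common_external:,} pixels")
    print(f"benchmark       : {benchmark_setting:,} pixels")
    print(f"1080p / native  : {common_external / internal_native:.0%}")
```

    A 1080p display carries 40% of the internal panel's native pixel count, so "half the pixels" is a rough estimate. Since every test ran at 1600x900, the render load per frame was the same on both displays.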
  • Reply 28 of 34
    Mike Wuerthele Posts: 6,861 administrator
    theitsage said:

    Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.
    Appreciate your input!
  • Reply 29 of 34
    zimmie Posts: 651 member
    Oh! I didn't notice those were links. So the test was controlled for pixel count. Good. Still doesn't seem like shuffling the pixels back to the internal display should cost that much performance, but there aren't many other variables.

    1600x900 at 16 bits per channel (three channels, so 48 bits per pixel) is 69.1 Mb per frame. At 60 frames per second, that's only 4.1 gigabits per second of traffic. Thunderbolt 3 limits PCIe to 22 gigabits per second. Assuming the return data has to be pulled from a framebuffer over PCIe, that's about 19% of the possible throughput. 82 FPS (the eGPU, but internal display) brings that to about 26% of the potential throughput.

    This changes proportionally with color depth, of course. I picked 16 bits per channel as a common "deep color" depth. Do the benchmarks actually use full color depth, or do they run at 8 bits per channel? Any idea on the bits per channel on the external monitor?
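    For anyone who wants to rerun this estimate with other color depths or frame rates, here is a minimal sketch of the same arithmetic. It assumes three color channels with no alpha and uses the 22 Gb/s Thunderbolt 3 PCIe figure quoted above; the function names are hypothetical.

```python
# Figure quoted in the post: Thunderbolt 3 caps PCIe traffic at 22 Gb/s.
TB3_PCIE_GBPS = 22.0


def frame_megabits(width: int, height: int, bits_per_channel: int, channels: int = 3) -> float:
    """Size of one uncompressed frame in megabits (assumes RGB, no alpha)."""
    return width * height * bits_per_channel * channels / 1e6


def traffic_gbps(width: int, height: int, bits_per_channel: int, fps: float) -> float:
    """Framebuffer readback traffic in gigabits per second."""
    return frame_megabits(width, height, bits_per_channel) * fps / 1000


def link_share(width: int, height: int, bits_per_channel: int, fps: float) -> float:
    """Fraction of the Thunderbolt 3 PCIe budget used by framebuffer readback."""
    return traffic_gbps(width, height, bits_per_channel, fps) / TB3_PCIE_GBPS


if __name__ == "__main__":
    for bits in (16, 8):
        for fps in (60, 82):
            print(
                f"{bits} bits/channel at {fps} fps: "
                f"{traffic_gbps(1600, 900, bits, fps):.1f} Gb/s, "
                f"{link_share(1600, 900, bits, fps):.0%} of the link"
            )
```

    Halving the depth to 8 bits per channel halves every figure, which is the proportionality described above.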

    This is all fascinating data. Thank you for providing it!
  • Reply 30 of 34
    Nvidia Optimus and AMD XConnect are official software solutions in Windows 10 to minimize performance loss (10-15%) when feeding the data from the eGPU back to the internal display. There's absolutely no official software or hardware support in macOS. All progress so far has come through trial and error. We've learned that theoretical max PCIe bandwidth and throughput mean very little. In the demo video, I basically tricked the MBP into using the ghost display (HDMI headless adapter) to accelerate Unigine Valley through the GTX 1080 Ti eGPU. Once Unigine Valley was running, I moved it to the internal display via a software utility called Spectacle.

    It's a demonstration of will rather than usability with eGPU in macOS at the moment. An external GPU is a great solution for the direction Apple Mac computers are heading, so it makes very little sense to us why Apple hasn't already endorsed it, unless Apple is building its own solution with the "Pro Display". While you wait, check out my not-so-pro display with an eGPU hanging out the back. :)

    theitsage said:

    Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.
    Appreciate your input!
    Thank you for spreading the word on eGPU for Mac!
    edited April 2017
  • Reply 31 of 34
    Having your main graphics card in a separate enclosure always seemed crippled to me.  Can someone explain the benefits of this over internal PCIe bus in terms of speed/latency etc.?
  • Reply 32 of 34
    Mike Wuerthele Posts: 6,861 administrator
    Having your main graphics card in a separate enclosure always seemed crippled to me.  Can someone explain the benefits of this over internal PCIe bus in terms of speed/latency etc.?
    It's not an advantage -- but it is the only game in town if you have a Thunderbolt Mac without PCI-e ports.
  • Reply 33 of 34
    zimmie Posts: 651 member
    Having your main graphics card in a separate enclosure always seemed crippled to me.  Can someone explain the benefits of this over internal PCIe bus in terms of speed/latency etc.?
    It's not an advantage -- but it is the only game in town if you have a Thunderbolt Mac without PCI-e ports.
    Which does sort of make it an advantage in that it allows the main computer to be smaller and to run cooler. If you don't need a desktop GPU when moving your laptop around, but you enjoy gaming at home, an external GPU potentially allows you to do both.

    It isn't an advantage in capability, but it is an advantage in flexibility.

    Of course, another viable option is a laptop when mobile and a separate desktop for gaming. Tons of people go that direction. That's another sort of compromise, because it's harder to share your files and so forth. It's just like storage: a small SSD for the OS and software plus a large rotational drive for music and videos is one compromise, and a single gargantuan SSD is a different one.
  • Reply 34 of 34
    zimmie Posts: 651 member

    Ah. So you started it on the "external display" provided by the dummy load, then got macOS to move the window back onto the internal display after the GL rendering target was running. I'm pretty sure the OS is just grabbing the contents of the framebuffer at that point, vaguely like operating on a remote memory cache in a NUMA system. I assume it would do that over PCIe rather than the DisplayPort virtual channel. From briefly poking through information on XConnect, it sounds like they're doing the switchable GPU stuff we've had since ~2008, just over a remote connection. I have a lot of reading to do.

    I wonder how it would do if you went to System Preferences > Mission Control and disabled "Displays have separate Spaces". That should take you back to the old compositor, which treats all monitors as one contiguous rendering target instead of each monitor being completely separate. Specifically, I wonder if that would give you the lower stats even on the external monitor. I also wonder what the old compositor would do with the window half on each monitor.

    Probably not useful tests, but they might give a better understanding of how the OS is shuffling the pixel data around.