Nvidia 1080ti with new drivers in external enclosure quadruples MacBook Pro native perform...

Mike Wuerthele · April 18, 2017 1:49AM

KrisArkade said:

Can we see tests on a 2016 Core i7 13 inch MacBook? I mean it's amazing that this works but it's kind of redundant to be testing this on a laptop that already has a pretty decent dedicated GPU.

seeing as the 13 inch Core i7's are running the Iris 550 chipsets I can see where an eGPU would be far more useful in this scenario.

If I can get my hands on one, sure. The benchmarks would change a little bit for the eGPU, maybe downwards by 10%.

jmey267 · April 18, 2017 1:52AM

I just tested my Mac Pro 5,1 with a GTX 1050 Ti using Valley 1.0 and got a score of 1694, averaging 40 fps with the same settings the author used (1680x1050, 8xAA, full screen). Not bad for a 7-year-old machine with a cheap new Nvidia card ($120 after rebate).

lorin schultz · April 18, 2017 2:17AM

xzu said:

bobolicious said:

Is the Blade Pro worth a performance comparison (OS aside) with its GTX 1080 and phase change cooling system ?
https://www.razerzone.com/gaming-systems/razer-blade-pro
http://www.pcmag.com/news/352646/razer-blade-pro-laptop-gets-thx-certification-kaby-lake

That is the first laptop I would consider a desktop replacement. Kaby Lake, 32 GB RAM, dual PCIe drives, and a 1080, oh my goodness.

The eGPU is exciting as well. At least we are moving in the right direction.

Though apparently no one told Razer that a large percentage of creative professionals are left handed. Whoops.

shapetables · April 18, 2017 5:27AM

Sounds like this vector processor may be good for running a next generation AI platform on your desk to control a swarm of drones and droids, but if it's connected to (or near) certain "unshielded" external monitors, a loud pop may be heard, reportedly followed by UltraFine radiation emissions that may induce unusual changes in mood, behavior or thoughts of suicide, especially in children, teens and young adults (elderly dementia patients may also have an increased risk of death or suicide) as they realize how much this POS cost them. Under no circumstances should it be used near an air traffic corridor, so as to avoid interference with aircraft navigation systems. /s

KrisArkade · April 18, 2017 6:54AM

Mike Wuerthele said:

KrisArkade said:

Can we see tests on a 2016 Core i7 13 inch MacBook? I mean it's amazing that this works but it's kind of redundant to be testing this on a laptop that already has a pretty decent dedicated GPU.

seeing as the 13 inch Core i7's are running the Iris 550 chipsets I can see where an eGPU would be far more useful in this scenario.

If I can get my hands on one, sure. The benchmarks would change a little bit for the eGPU, maybe downwards by 10%.

That would be awesome and much appreciated. It may be the deciding factor for me when it comes to which laptop I'd like to go with. I like the portability of my MacBook Air, and the Core i7 MacBook Pro 13 inch is a little beast in and of itself. Not as powerful as the 15, but it's an acceptable margin, I think.

I'd love to see the eGPU numbers with the Core i7 13 inch. Thank you!!

theitsage · April 18, 2017 12:45PM

KrisArkade said:

Mike Wuerthele said:

KrisArkade said:

Can we see tests on a 2016 Core i7 13 inch MacBook? I mean it's amazing that this works but it's kind of redundant to be testing this on a laptop that already has a pretty decent dedicated GPU.

seeing as the 13 inch Core i7's are running the Iris 550 chipsets I can see where an eGPU would be far more useful in this scenario.

If I can get my hands on one, sure. The benchmarks would change a little bit for the eGPU, maybe downwards by 10%.

That would be awesome and much appreciated. It may be the deciding factor for me when it comes to which laptop I'd like to go with. I like the portability of my MacBook Air, and the Core i7 MacBook Pro 13 inch is a little beast in and of itself. Not as powerful as the 15, but it's an acceptable margin, I think.

I'd love to see the eGPU numbers with the Core i7 13 inch. Thank you!!

We have been keeping track of eGPU implementations. There are a handful of Late 2016 13" MacBook Pro + eGPU pairings.

theitsage · April 18, 2017 1:01PM

zimmie said:

lorin schultz said:

zimmie said:

PCIe throughput matters less than most people think for most GPU use. You can block off PCIe lanes with tape. On most GPUs in most games, there is no difference at all dropping it from 16 lanes to eight. When you drop to four lanes, you typically get longer loading times and a small framerate drop. Dropping to two lanes typically gives significantly longer loading times and >30% framerate drops.

This can matter for OpenCL use, but it frequently does not. PCIe throughput is only really used getting your dataset into the video card's RAM and getting the result out. It will slow down some really trivial data manipulation, and it will be slower to work on datasets too large for the card's RAM. Anything which requires more than a few seconds to compute won't be meaningfully slower on an eight-lane or four-lane bus.
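A rough sketch of that reasoning, not taken from the post itself: it assumes PCIe 3.0 (8 GT/s per lane with 128b/130b encoding), ignores packet overhead and latency, and uses a made-up job of 2 GB in, 0.5 GB out and 10 seconds of GPU compute. The function names are illustrative only.

```python
# Rough estimate only: lane rate, dataset sizes and compute time are assumed values.
PCIE3_GBIT_PER_LANE = 8.0 * 128 / 130  # 8 GT/s per lane, 128b/130b encoding


def transfer_seconds(gigabytes: float, lanes: int) -> float:
    """Time to move `gigabytes` of data one way across `lanes` PCIe 3.0 lanes."""
    return gigabytes * 8 / (PCIE3_GBIT_PER_LANE * lanes)


def job_seconds(input_gb: float, output_gb: float, lanes: int, compute_s: float) -> float:
    """Upload the dataset, compute on the card, then download the result."""
    return (
        transfer_seconds(input_gb, lanes)
        + compute_s
        + transfer_seconds(output_gb, lanes)
    )


if __name__ == "__main__":
    # Hypothetical job: 2 GB in, 0.5 GB out, 10 s of GPU compute.
    baseline = job_seconds(2.0, 0.5, 16, 10.0)
    for lanes in (16, 8, 4, 2):
        total = job_seconds(2.0, 0.5, lanes, 10.0)
        print(f"x{lanes:<2} {total:6.2f} s  ({total / baseline - 1:+.1%} vs x16)")
```

With these assumed numbers, dropping from sixteen lanes to four adds well under a second to a ten-second job, which is the shape of the argument above. A job that streams data continuously would look different.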

I know less than nothing about computer architecture, so please forgive me if I misunderstood what @"Mike Wuerthele" wrote, but the impression I got is that the bandwidth limitation becomes an issue because the Thunderbolt 3 bus is a two-way street in this scenario. It has to carry both the instructions to the card AND all the pixel data back to the internal display. That's why performance was better when using an external display.

Is my understanding correct or have I missed something?

They didn't give the pixel dimensions of the external display. The 15" MBP's internal display is 2880x1800 pixels for a total of a little under 5.2 million pixels. The most common external displays I see are 1920x1080, which is just under 2.1 million pixels. I don't know if these benchmarks correct for the output pixel count. If they don't, a 2x improvement could come solely from running 1/2 the pixels.
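The pixel counts in that comparison can be checked in a few lines. This is a minimal sketch; the 1920x1080 external display is the post's guess at a typical monitor, not a measured value from the article.

```python
def pixel_count(width: int, height: int) -> int:
    """Total pixels for a display of the given dimensions."""
    return width * height


internal = pixel_count(2880, 1800)  # 15" MBP internal panel, per the post
external = pixel_count(1920, 1080)  # assumed "typical" external display
ratio = internal / external

print(f"internal: {internal:,} px, external: {external:,} px, ratio: {ratio:.2f}x")
```

Under that assumption the internal panel carries 2.5 times as many pixels, so the external display would be drawing 40% of the pixels rather than exactly half.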

On the 2016 15" MBP, the internal display automatically scales to 1680x1050 when running Unigine benchmarks in 1920x1080 fullscreen mode. Therefore, I set all tests to run at Unigine Valley default Extreme settings which is 1600x900. You can click on the result numbers to see the screen caps with additional information.

Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.

Mike Wuerthele · April 18, 2017 1:11PM

theitsage said:

Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.

Appreciate your input!

zimmie · April 18, 2017 5:22PM

theitsage said:

zimmie said:

lorin schultz said:

zimmie said:

PCIe throughput matters less than most people think for most GPU use. You can block off PCIe lanes with tape. On most GPUs in most games, there is no difference at all dropping it from 16 lanes to eight. When you drop to four lanes, you typically get longer loading times and a small framerate drop. Dropping to two lanes typically gives significantly longer loading times and >30% framerate drops.

This can matter for OpenCL use, but it frequently does not. PCIe throughput is only really used getting your dataset into the video card's RAM and getting the result out. It will slow down some really trivial data manipulation, and it will be slower to work on datasets too large for the card's RAM. Anything which requires more than a few seconds to compute won't be meaningfully slower on an eight-lane or four-lane bus.

I know less than nothing about computer architecture, so please forgive me if I misunderstood what @"Mike Wuerthele" wrote, but the impression I got is that the bandwidth limitation becomes an issue because the Thunderbolt 3 bus is a two-way street in this scenario. It has to carry both the instructions to the card AND all the pixel data back to the internal display. That's why performance was better when using an external display.

Is my understanding correct or have I missed something?

They didn't give the pixel dimensions of the external display. The 15" MBP's internal display is 2880x1800 pixels for a total of a little under 5.2 million pixels. The most common external displays I see are 1920x1080, which is just under 2.1 million pixels. I don't know if these benchmarks correct for the output pixel count. If they don't, a 2x improvement could come solely from running 1/2 the pixels.

On the 2016 15" MBP, the internal display automatically scales to 1680x1050 when running Unigine benchmarks in 1920x1080 fullscreen mode. Therefore, I set all tests to run at Unigine Valley default Extreme settings which is 1600x900. You can click on the result numbers to see the screen caps with additional information.

Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.

Oh! I didn't notice those were links. So the test was controlled for pixel count. Good. Still doesn't seem like shuffling the pixels back to the internal display should cost that much performance, but there aren't many other variables.

1600x900 at 16 bits per channel (48 bits per pixel, assuming three color channels) is 69.1 megabits per frame. At 60 frames per second, that's only about 4.1 gigabits per second of traffic. Thunderbolt 3 limits PCIe to 22 gigabits per second. Assuming the return data has to be pulled from a framebuffer over PCIe, that's about 19% of the possible throughput. At 82 FPS (the eGPU result on the internal display), that rises to about 26% of the potential throughput.

This changes proportionally with color depth, of course. I picked 16 bits per channel as a common "deep color" depth. Do the benchmarks actually use full color depth, or do they run at 8 bits per channel? Any idea on the bits per channel on the external monitor?
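The arithmetic above can be reproduced with a short script. This is a sketch under the post's own assumptions: an uncompressed frame with three color channels and no alpha, and the quoted 22 Gbit/s PCIe limit for Thunderbolt 3. The function names are illustrative.

```python
TB3_PCIE_GBIT_S = 22.0  # PCIe limit over Thunderbolt 3, as quoted in the post


def frame_megabits(width: int, height: int, bits_per_channel: int, channels: int = 3) -> float:
    """Size of one uncompressed frame in megabits (channels=3 is an assumption)."""
    return width * height * bits_per_channel * channels / 1e6


def link_share(width: int, height: int, bits_per_channel: int, fps: float,
               link_gbit_s: float = TB3_PCIE_GBIT_S) -> float:
    """Fraction of the link consumed by sending every frame back uncompressed."""
    gbit_s = frame_megabits(width, height, bits_per_channel) * fps / 1000
    return gbit_s / link_gbit_s


if __name__ == "__main__":
    for bits in (8, 16):
        for fps in (60, 82):
            share = link_share(1600, 900, bits, fps)
            print(f"{bits:>2} bits/channel @ {fps} fps: {share:.1%} of the link")
```

At 16 bits per channel this gives the 19% and 26% figures from the post. Halving the depth to 8 bits per channel halves both shares.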

This is all fascinating data. Thank you for providing it!

theitsage · April 18, 2017 7:33PM

zimmie said:

theitsage said:

zimmie said:

lorin schultz said:

zimmie said:

PCIe throughput matters less than most people think for most GPU use. You can block off PCIe lanes with tape. On most GPUs in most games, there is no difference at all dropping it from 16 lanes to eight. When you drop to four lanes, you typically get longer loading times and a small framerate drop. Dropping to two lanes typically gives significantly longer loading times and >30% framerate drops.

This can matter for OpenCL use, but it frequently does not. PCIe throughput is only really used getting your dataset into the video card's RAM and getting the result out. It will slow down some really trivial data manipulation, and it will be slower to work on datasets too large for the card's RAM. Anything which requires more than a few seconds to compute won't be meaningfully slower on an eight-lane or four-lane bus.

I know less than nothing about computer architecture, so please forgive me if I misunderstood what @"Mike Wuerthele" wrote, but the impression I got is that the bandwidth limitation becomes an issue because the Thunderbolt 3 bus is a two-way street in this scenario. It has to carry both the instructions to the card AND all the pixel data back to the internal display. That's why performance was better when using an external display.

Is my understanding correct or have I missed something?

They didn't give the pixel dimensions of the external display. The 15" MBP's internal display is 2880x1800 pixels for a total of a little under 5.2 million pixels. The most common external displays I see are 1920x1080, which is just under 2.1 million pixels. I don't know if these benchmarks correct for the output pixel count. If they don't, a 2x improvement could come solely from running 1/2 the pixels.

On the 2016 15" MBP, the internal display automatically scales to 1680x1050 when running Unigine benchmarks in 1920x1080 fullscreen mode. Therefore, I set all tests to run at Unigine Valley default Extreme settings which is 1600x900. You can click on the result numbers to see the screen caps with additional information.

Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.

Oh! I didn't notice those were links. So the test was controlled for pixel count. Good. Still doesn't seem like shuffling the pixels back to the internal display should cost that much performance, but there aren't many other variables.

1600x900 at 16 bits per channel (48 bits per pixel, assuming three color channels) is 69.1 megabits per frame. At 60 frames per second, that's only about 4.1 gigabits per second of traffic. Thunderbolt 3 limits PCIe to 22 gigabits per second. Assuming the return data has to be pulled from a framebuffer over PCIe, that's about 19% of the possible throughput. At 82 FPS (the eGPU result on the internal display), that rises to about 26% of the potential throughput.

This changes proportionally with color depth, of course. I picked 16 bits per channel as a common "deep color" depth. Do the benchmarks actually use full color depth, or do they run at 8 bits per channel? Any idea on the bits per channel on the external monitor?

This is all fascinating data. Thank you for providing it!

Nvidia Optimus and AMD XConnect are official software solutions in Windows 10 to minimize performance loss (10-15%) when feeding the data from the eGPU back to the internal display. There's absolutely no official software or hardware support in macOS. All progress so far has come through trial and error. We've learned that theoretical max PCIe bandwidth and throughput mean very little. In the demo video, I basically tricked the MBP into using the ghost display (HDMI headless adapter) to accelerate Unigine Valley through the GTX 1080 Ti eGPU. Once Unigine Valley was running, I moved it to the internal display via a software utility called Spectacle.

It's a demonstration of will rather than usability with eGPU in macOS at the moment. An external GPU is a great fit for the direction Apple's Mac computers are heading, so it makes very little sense to us why Apple hasn't already endorsed it, unless Apple is building its own solution with the "Pro Display". While you wait, check out my not-so-pro display with an eGPU hanging out the back.

Mike Wuerthele said:

theitsage said:

Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.

Appreciate your input!

Thank you for spreading the word on eGPU for Mac!

aussiepaul · April 19, 2017 7:15AM

Having your main graphics card in a separate enclosure always seemed crippled to me. Can someone explain the benefits of this over internal PCIe bus in terms of speed/latency etc.?

Mike Wuerthele · April 19, 2017 11:05AM

aussiepaul said:

Having your main graphics card in a separate enclosure always seemed crippled to me. Can someone explain the benefits of this over internal PCIe bus in terms of speed/latency etc.?

It's not an advantage -- but it is the only game in town if you have a Thunderbolt Mac without PCIe slots.

zimmie · April 19, 2017 10:16PM

Mike Wuerthele said:

aussiepaul said:

Having your main graphics card in a separate enclosure always seemed crippled to me. Can someone explain the benefits of this over internal PCIe bus in terms of speed/latency etc.?

It's not an advantage -- but it is the only game in town if you have a Thunderbolt Mac without PCIe slots.

Which does sort of make it an advantage in that it allows the main computer to be smaller and to run cooler. If you don't need a desktop GPU when moving your laptop around, but you enjoy gaming at home, an external GPU potentially allows you to do both.

It isn't an advantage in capability, but it is an advantage in flexibility.

Of course, another viable option is a laptop when mobile and a separate desktop for gaming. Tons of people go that direction. That's another sort of compromise, because it's harder to share your files and so forth. Just like having a small SSD for OS and software with a large rotational drive for music and videos is a compromise and having a single gargantuan SSD is a different compromise.

zimmie · April 19, 2017 11:11PM

theitsage said:

zimmie said:

theitsage said:

zimmie said:

lorin schultz said:

zimmie said:

PCIe throughput matters less than most people think for most GPU use. You can block off PCIe lanes with tape. On most GPUs in most games, there is no difference at all dropping it from 16 lanes to eight. When you drop to four lanes, you typically get longer loading times and a small framerate drop. Dropping to two lanes typically gives significantly longer loading times and >30% framerate drops.

This can matter for OpenCL use, but it frequently does not. PCIe throughput is only really used getting your dataset into the video card's RAM and getting the result out. It will slow down some really trivial data manipulation, and it will be slower to work on datasets too large for the card's RAM. Anything which requires more than a few seconds to compute won't be meaningfully slower on an eight-lane or four-lane bus.

I know less than nothing about computer architecture, so please forgive me if I misunderstood what @"Mike Wuerthele" wrote, but the impression I got is that the bandwidth limitation becomes an issue because the Thunderbolt 3 bus is a two-way street in this scenario. It has to carry both the instructions to the card AND all the pixel data back to the internal display. That's why performance was better when using an external display.

Is my understanding correct or have I missed something?

They didn't give the pixel dimensions of the external display. The 15" MBP's internal display is 2880x1800 pixels for a total of a little under 5.2 million pixels. The most common external displays I see are 1920x1080, which is just under 2.1 million pixels. I don't know if these benchmarks correct for the output pixel count. If they don't, a 2x improvement could come solely from running 1/2 the pixels.

On the 2016 15" MBP, the internal display automatically scales to 1680x1050 when running Unigine benchmarks in 1920x1080 fullscreen mode. Therefore, I set all tests to run at Unigine Valley default Extreme settings which is 1600x900. You can click on the result numbers to see the screen caps with additional information.

Keep in mind the Nvidia Pascal drivers are beta and there's lots of room for further optimization. Running these same benchmarks in Windows would have shown a more accurate difference. The intent of the article however was to showcase the latest Nvidia GPUs running in macOS.

Oh! I didn't notice those were links. So the test was controlled for pixel count. Good. Still doesn't seem like shuffling the pixels back to the internal display should cost that much performance, but there aren't many other variables.

1600x900 at 16 bits per channel (48 bits per pixel, assuming three color channels) is 69.1 megabits per frame. At 60 frames per second, that's only about 4.1 gigabits per second of traffic. Thunderbolt 3 limits PCIe to 22 gigabits per second. Assuming the return data has to be pulled from a framebuffer over PCIe, that's about 19% of the possible throughput. At 82 FPS (the eGPU result on the internal display), that rises to about 26% of the potential throughput.

This changes proportionally with color depth, of course. I picked 16 bits per channel as a common "deep color" depth. Do the benchmarks actually use full color depth, or do they run at 8 bits per channel? Any idea on the bits per channel on the external monitor?

This is all fascinating data. Thank you for providing it!

Nvidia Optimus and AMD XConnect are official software solutions in Windows 10 to minimize performance loss (10-15%) when feeding the data from the eGPU back to the internal display. There's absolutely no official software or hardware support in macOS. All progress so far has come through trial and error. We've learned that theoretical max PCIe bandwidth and throughput mean very little. In the demo video, I basically tricked the MBP into using the ghost display (HDMI headless adapter) to accelerate Unigine Valley through the GTX 1080 Ti eGPU. Once Unigine Valley was running, I moved it to the internal display via a software utility called Spectacle.

It's a demonstration of will rather than usability with eGPU in macOS at the moment. An external GPU is a great fit for the direction Apple's Mac computers are heading, so it makes very little sense to us why Apple hasn't already endorsed it, unless Apple is building its own solution with the "Pro Display". While you wait, check out my not-so-pro display with an eGPU hanging out the back.

Ah. So you started it on the "external display" provided by the dummy load, then got macOS to move the window back onto the internal display after the GL rendering target was running. I'm pretty sure the OS is just grabbing the contents of the framebuffer at that point, vaguely like operating on a remote memory cache in a NUMA system. I assume it would do that over PCIe rather than the DisplayPort virtual channel. From briefly poking through information on XConnect, it sounds like they're doing the switchable GPU stuff we've had since ~2008, just over a remote connection. I have a lot of reading to do.

I wonder how it would do if you went to Settings > Mission Control and disabled "Displays have separate Spaces". That should take you back to the old compositor, which treats all monitors as one contiguous rendering target instead of each monitor being completely separate. Specifically, I wonder if that would give you the lower stats even on the external monitor. I also wonder what the old compositor would do with the window half on each monitor.

Probably not useful tests, but they might give a better understanding of how the OS is shuffling the pixel data around.
