Galaxy S 4 on steroids: Samsung caught doping in benchmarks


Comments

  • Reply 141 of 153
    tooltalk Posts: 766 member

    Quote:

    Originally Posted by Bao D Nguyen View Post



    This is so funny. Reminds me of the cheating Korean students in college. Yes, they were all cheating all the time.


     


    Please, let's not go down this road. There are cheaters, liars, and criminals in every ethnic group.

  • Reply 142 of 153
    Who really pays attention to these benchmarks except people like Anand and the market he caters to?
  • Reply 143 of 153
    cnocbui Posts: 3,613 member


    That is pretty low of Samsung.


     


    Intel does something like this too. Apparently their C++ compiler checks to see if the CPU vendor ID is GenuineIntel, and if it isn't, it generates intentionally sub-optimal code.

     

  • Reply 144 of 153
    tallest skil Posts: 43,388 member
    Who really pays attention to these benchmarks…

    Unacceptable reaction.
  • Reply 145 of 153
    mikejones Posts: 323 member

    Quote:

    Originally Posted by LAKings33 View Post


    This is only in the International Exynos 5410 version of the Galaxy S4, NOT the Snapdragon 600 model.


     


    It would also be worth pointing out that even without the overclock, the Exynos 5410 still offers double the performance of the A6 inside the iPhone 5, and both are behind the Snapdragon 600.


     


    GFXBench 2.7 T-Rex HD C24Z16 - Offscreen (1080p):


     


    Apple A6 - 373 Frames


     


    Samsung Exynos 5410 - 794 Frames


     


    Qualcomm Snapdragon 600 - 971 Frames



    Is that supposed to be frames per second? If it is, who the hell cares? You can't play the games at those frame rates anyway. So who the hell cares other than someone trying to epeen brag?

  • Reply 146 of 153
    Clearly fraud. A class of Galaxy S4 buyers ought to get themselves a free iPhone out of Samsung.

    I wasn't damaged. I'm paying attention. But somebody must have been.
  • Reply 147 of 153
    kdarling Posts: 1,640 member

    Quote:

    Originally Posted by diplication View Post



    I'm waiting for Gatorguy and KDarling to weigh in on this issue!


     


    Really?  Okay.  


     


    It was underhanded.  It was also extremely dumb, publicity and reputation wise, for someone at Samsung to do this.  They need to send an edict from the top, telling their employees to straighten up or expect to be fired.


     


    At the same time, it's hardly shocking or rare. That's why I didn't care to comment.  It's like a news article about someone getting an email from a fake Nigerian prince.  Tuning for benchmarks has gone on since benchmarks were invented.  


     


    Remember the 2003 Power Mac G5 benchmark scandal? Where Steve Jobs quoted tests from a company they hired to "prove" that the G5 outperformed Intel desktops? It turned out they used Apple-supplied tools to tweak the performance, and also changed the malloc code to work faster.


     


    Graphics card makers are infamous for watching for benchmarks.


     


    Various browsers often had / have code that watches for benchmarks.  Then there's the opposite problem.  Sometimes companies come up with their own benchmarks that favor themselves.  Microsoft has done that.


     


    This is why benchmarks need to be changed fairly often, so that programmers can't watch for them.


     


    The upshot is, I'd actually be more surprised if every OS didn't have something tweaked to look good in benchmarks, even if they don't go so far as to explicitly watch for certain apps.  That's why I tell people to never trust benchmarks.  Or emails from Nigerian princes.

  • Reply 148 of 153

    Quote:

    Originally Posted by jragosta View Post





    Which merely points out how useless those benchmarks are - since the iPhone is easily as fast in real life (and probably faster).


    Actually, you fail to understand the purpose of those benchmarks and how they are applied. UI optimization is different from 3D performance, something the A6 is lacking in (it's also missing API support for OpenGL ES 3.0, which only the Adreno 320 inside the Snapdragon SoC has). You also fail to realise those scores are meant to compare SoCs on a level playing field, hence offscreen 1080p. On-screen performance is at a different resolution, which depends on the device.


     


    Example - native resolution performance:


     


    Apple iPad 4 (A6X) - 2048x1536 - 713 Frames


     


    Samsung Galaxy S4 (Exynos 5410) - 1920x1080 - 794 Frames


     


    iPhone 5 (A6) - 1136x640 - 817 Frames


     


    Samsung Galaxy S4 (Snapdragon 600) - 1920x1080 - 971 Frames


     


    Google/LG Nexus 4 (Snapdragon S4 Pro) - 1280x720 - 1252 Frames


     


    Samsung Galaxy S4 LTE-Advanced (Snapdragon 800) - 1920x1080 - 1479 Frames


     


    As you can see, the iPhone 5 performs much better at its native resolution, allowing it to even outperform the iPad 4.

  • Reply 149 of 153

    Quote:

    Originally Posted by MikeJones View Post


    Is that supposed to be frames per second? If it is, who the hell cares? You can't play the games at those frame rates anyway. So who the hell cares other than someone trying to epeen brag?



    No; the Fps score can be calculated from it.


     


    GFXBench 2.7 T-Rex HD C24Z16 - Offscreen (1080p):


     


    NVIDIA Tegra 3 - 234 Frames (4.2 Fps)


     


    Apple A6 - 373 Frames (6.7 Fps)


     


    Samsung Exynos 5410 - 794 Frames (14.2 Fps)


     


    Qualcomm Snapdragon S4 Pro - 871 Frames (15.6 Fps)


     


    Apple A6X - 971 Frames (17.3 Fps)


     


    Qualcomm Snapdragon 600 - 971 Frames (17.3 Fps)


     


    NVIDIA Tegra 4 - 1363 Frames (24.3 Fps)


     


    Qualcomm Snapdragon 800 - 1479 Frames (26.4 Fps)

  • Reply 150 of 153
    chick Posts: 35 member
    Re: the Intel benchmarking. I once consulted at Houston Instruments (they made plotters). I was hired to rewrite the code which governed how the inkjet nozzles spit out the ink droplets. In doing so, I discovered that the compiler I was provided was optimizing some of my code away, negating the changes they wanted. I had to turn off certain optimizations to make the desired changes.

    In further investigating the compiler, I discovered that it also optimized away the code which did a memory self-test. Nothing sinister: the compiler recognized that the code which copied the old memory data, wrote patterns to test the memory, and then restored the original data to the same address never used the temporary patterns anywhere else, so it optimized the patterns out of the code. No one at HI recognized that the memory test did nothing, because the plotter display still showed the test executing and counting down depending upon the total amount of memory installed. If you examined the assembly code you wrote, everything looked OK, but if you looked at the resulting machine code, you discovered the problem.

    I see too many programmers/engineers who don't do proper sanity checks. If it compiles with no errors and runs with no obvious errors, then all is good. If it runs twice as fast as the code it replaced, they say "Wow, that optimizing compiler really works!" Either inexperienced or incompetent.

    Another thing: optimizing compilers have switches that allow you to select which optimizations are allowed, precisely because of things like my experience above. They also read the CPUID in order to turn on or off the assembly code for various instructions, depending on the rev and type of processor. Different processors have different capabilities depending upon their design, and the CPUID is used to identify the differences between them. Some processors may have a bug which keeps one or more instructions from working as specified. If a compiler is aware, it can either flag instances where the instruction is used or substitute other code automatically (not my preference, since it may work but be very sub-optimal for my particular purposes).

    Anyway, what I am getting at is that the ARM versus Atom benchmark was probably an oops by the programmer(s) at AnTuTu. Btw, I have been retired for over 5 years now and haven't really kept up with the x86 community, so I have a question: who owns AnTuTu? If Intel owns or subsidizes them, I am more inclined to think this was deliberate.

    I used to work for Intel and I used to work for AMD. Intel always wrote their own reference compilers, but got really serious about it when they realized that many of the commercially available compilers weren't making full use of their instruction set (C, C++, etc.). With assemblers it is painfully obvious if an instruction is not fully supported, but this is not so with higher-level languages. Then AMD found that the Intel compilers weren't making good use of the AMD instructions, so AMD started making their own reference compilers... and so it goes.

    For instance, in the AMD K6, AMD optimized one instruction so it executed in 4 clock cycles; the Intel Pentium at the time took, IIRC, 15 clock cycles. Sorry, I no longer remember the opcode involved, but it meant that unless that opcode was absolutely necessary, the Intel compiler substituted a two-opcode sequence in its place which took less than 15 clock cycles. It did the same thing on the AMD processor, which made the AMD-compiled code take longer than necessary. So the compiler switches alone can make one processor look better than another, or vice-versa.

    I can't really reasonably comment on ARM vs. Atom. The SoC chips I have worked with are all older: 4-bit word/12- to 16-bit address space, plus some of the 8- to 16-bit data/16- to 24-bit address space x86- and 68x-based SoCs. My real expertise has been in the x86 space, from the 8088 through some of the early Xeons, and the K5 through the Opterons and Athlons. The last chip I had anything to do with was the Bulldozer core at AMD, and I retired before that chip was released.

    So, the atom vs arm benchmark may have been deliberate but I doubt it from the way things seem to have worked out. On the other hand there is no doubt that Samsung was gaming the benchmarks. That's my two cents.
  • Reply 151 of 153
    hjb Posts: 278 member


    GSMArena reported Samsung's response to this allegation.  http://blog.gsmarena.com/samsung-responds-to-benchmark-cheating-allegations/


     


    That is:


     


    Under ordinary conditions, the Galaxy S4 has been designed to allow a maximum GPU frequency of 533MHz. However, the maximum GPU frequency is lowered to 480MHz for certain gaming apps that may cause an overload, when they are used for a prolonged period of time in full-screen mode. Meanwhile, a maximum GPU frequency of 533MHz is applicable for running apps that are usually used in full-screen mode, such as the S Browser, Gallery, Camera, Video Player, and certain benchmarking apps, which also demand substantial performance.


    The maximum GPU frequencies for the Galaxy S4 have been varied to provide optimal user experience for our customers, and were not intended to improve certain benchmark results.


    We remain committed to providing our customers with the best possible user experience.

  • Reply 152 of 153
    mac voyer Posts: 1,294 member
    Headline of the year!
  • Reply 153 of 153
    qamf Posts: 87 member

    Quote:

    Originally Posted by Chick View Post



    Re: the Intel benchmarking. I once consulted at Houston Instruments (they made plotters). [...] So, the atom vs arm benchmark may have been deliberate but I doubt it from the way things seem to have worked out. On the other hand there is no doubt that Samsung was gaming the benchmarks. That's my two cents.


    The compiler used to build AnTuTu changed part of the code to all 0's:



     


     


    Quote: Exophase (http://forums.anandtech.com/showthread.php?t=2330027)


    What it's doing is, where possible, setting entire 32 bit runs to 0 or 1. The lines at f64c3 and f64c6 are critical. It's replacing 32 iterations of the ARM loop above with those two instructions. Needless to say, it's dozens of times faster doing it this way.



    This is what we call breaking the benchmark, where the compiler applies some logic that makes the benchmark much faster by doing a set of operations that the benchmark identifies as correct (if it even checks) but that do not perform the intended function of the benchmark. Classic examples include omitting code entirely if the results are never read, or performing a complex computation at compile time instead of run time if the inputs can be determined to be constant (then just reporting the results).



    In this case I'm sure Intel could claim that they're performing a legitimate optimization. Frankly, I doubt it; this kind of optimization would be difficult to recognize and apply in generic code. It'd also be for little benefit, because I've never seen someone use code like this to set or clear huge sets of bits. That part is kind of the catch, because this optimization would make the code slower if the run lengths weren't sufficiently large.  In nbench's case they are, but there's no way the compiler could have known that on its own.



    What's more, this optimization wasn't present in ICC until a recent release. Somehow I don't think that they just now discovered it has general purpose value. More likely case is that they discovered is they could manipulate AnTuTu's scores.



    How is this utilizing more of the Intel CPU?  This is blatantly cheating the purpose of the benchmark.  This does not make the Intel CPU run the benchmark faster thanks to special functions, it makes the benchmark easier to run.



    Much, much worse than what the S4 does, especially in light of Samsung's response, which, while not great, did show the boost wasn't limited to benchmarks (barely).




    -QAMF
