Why Cheating on Smartphone Benchmarks Matters to You
Earlier this month, a story on the popular tech review site AnandTech revealed some interesting data about the performance of Huawei's flagship smartphones. As it turns out, scores in some popular graphics benchmarks, including UL Benchmarks' 3DMark and the long-running mobile graphics test GFXBench, were being artificially inflated to give Huawei's phones and application processors an advantage over the competition.
These weren't small changes. Performance in one subset of the GFXBench test (T-Rex offscreen) jumped from 66.54 FPS to 127.36 FPS, an improvement of nearly 2x (about 91%). The lower score is what testing showed when "benchmark detection mode" was turned off. In other words, the operating system and device were operating under the assumption that this was a normal game. The higher score appears when the operating system (customized by Huawei) detects a popular benchmark application and raises the chip's power consumption beyond the levels the phone would actually run at in normal use. The goal is for reviews that use these common tests to paint Huawei devices in a more favorable light.
The team behind the Geekbench benchmark found similar results, and I recently posted about them on ShroutResearch.com. Those results showed multi-core performance deltas as high as 31% in favor of the "cheating" mode.
Higher scores are better, of course, but Huawei's efforts to mislead the press and consumers create significant problems.
First, and perhaps most important for Huawei going forward, this testing and the resulting revelation paint the newly announced Kirin 980 SoC (developed in-house by HiSilicon) in a very different light. While the launch press conference appeared to show a new mobile chipset that could run screaming past Qualcomm's Snapdragon 845 platform, we now have to view the benchmarks Huawei presented as dubious at best. Will the Kirin 980 actually live up to the claims the company put forward?
The most obvious group affected by Huawei's decision to misrepresent its currently shipping devices is the consumer. Buyers of flagship devices often depend on reviews, and on the benchmarks that inform an author's recommendation, to guide their purchases. Customers who are particularly interested in the gaming and high-end application performance of their smartphones pay even closer attention to benchmark results, some of which were falsely presented.
Other players in the smartphone market that are not cheating on benchmarks also suffer from Huawei's actions, which is obviously the point. Competing handset vendors like Samsung, Oppo, Vivo, and perhaps even Apple are handicapped by Huawei's performance claims, which show competing devices in an artificially negative light. In the Chinese market, where benchmarks and performance marketing matter even more than in the US, Huawei's attempt to stem the tide of competition has the greatest effect.
To a lesser degree, this hurts Qualcomm's Snapdragon and Samsung's Exynos products too, making their application processors look like they are falling behind when they may in fact be the leaders. Most high-end smartphones in China and the rest of the world are built around the Snapdragon line, and pressure on Qualcomm from its own customers had been growing after Huawei appeared to take the performance lead.
This also impacts the developers of tools like 3DMark, Geekbench, and GFXBench. To some outside observers, it will seem to invalidate their work and taint other, legitimate results in these tests. Consumers will start to worry that other scores are artificially inflated and don't represent the performance they should expect from their devices. Other silicon and device vendors might withdraw support for the tools, reducing the resources these companies have to improve and innovate on benchmark methodology.
Huawei's explanation, that "it's just some AI" purposefully causing the shifting benchmark scores, has the potential to damage the entire AI movement. If consumers begin to see AI-enabled devices and software as misrepresenting what they do, or conclude that anything integrating AI is a scam, we could roll back the significant momentum the market has built and risk halting it completely.
Measuring performance on smartphones is already a complicated and tenuous task. The benchmarks we have today are imperfect and arguably need to change to more accurately represent the real-world experiences consumers get with different devices and more capable processors. But cheating makes it harder for the community at large to work together and address these performance questions as a whole.
Do we need better mobile device testing for performance, features, cameras, and overall experience? Yes. But cheating isn't the way to change things, and when it is discovered, it can do significant damage to a company's reputation.