The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static, read-only archive was pulled together by NBR forum users between January 20 and January 31, 2022, in an effort to ensure that the valuable technical information posted on the forums is preserved. For current discussions, many NBR forum users moved to NotebookTalk.net after the shutdown.

    BenchmarkBoost-Gate: More Lies, More Damn Lies and More Damn Benchmarks

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by tilleroftheearth, Aug 4, 2013.

  1. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    Did some interesting listening this morning (~70 minutes) to Anand's latest podcast.

    Interesting how he admits that not only have benchmarks been 'gamed' again lately, but that it started at least as long ago as when he was 15/16 years old... :rolleyes:

    See:
    AnandTech | The AnandTech Podcast: Episode 24

    Yeah, we're talking 'BenchmarkBoost-Gate'.

    Whenever I've mentioned that benchmarks are not indicative of real-world use, that manufacturers often 'cheat' by tricking the benchmark software into producing high scores, or that unless your workflow is literally running benchmarks night and day, 'bm' scores are something to be ignored, I've been laughed at, accused of peddling conspiracy theories, and generally made fun of for not believing the hundreds of thousands of websites that use benchmarks to show one product is better than another. Or simply told to get lost because I'm raining on the 'score' parade...

    The most recent example of this started around this post I made, where, unsurprisingly, another product from the same Samsung (or now 'Shamesung') was found out once again:

    See:
    http://forum.notebookreview.com/sol...ung-ssds-before-you-will-now.html#post9302724


    The podcast says in black and white (hmmm... in audio, would that be soft and loud, loud and clear, or emphatic monotone?) that as soon as a reviewer confronted a manufacturer about a performance problem, the manufacturer would ask for the benchmarks so they could 'fix' the scores...

    They go on to state that code is injected into the product's driver/firmware that detects the benchmark running and simply spits out a better number than would otherwise be produced (which, as I have stated here, has happened before and will keep happening with 'bm's).
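The trick described above can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration (the function, the process names, and the numbers are all made up for the example, not taken from any real driver): a 'firmware' that reports one speed to ordinary callers and an inflated one when it thinks a known benchmark is asking.

```python
# Hypothetical sketch of benchmark detection, NOT any vendor's real code.
KNOWN_BENCHMARKS = {"crystaldiskmark.exe", "atto.exe", "as-ssd.exe"}

def reported_throughput_mbps(calling_process: str) -> int:
    """Return the throughput the 'firmware' chooses to advertise."""
    honest_speed = 320          # what real workloads actually see
    benchmark_speed = 520       # the number the vendor wants printed
    if calling_process.lower() in KNOWN_BENCHMARKS:
        return benchmark_speed  # benchmark detected: game the score
    return honest_speed

print(reported_throughput_mbps("photoshop.exe"))        # 320
print(reported_throughput_mbps("CrystalDiskMark.exe"))  # 520
```

The point of the sketch: nothing about the hardware changes, only the number that gets reported when the 'right' program is watching.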

    In this case, not only are 'scores' allowed to inflate; the actual hardware is pushed to potentially unsafe levels too, with the concern being heat burns, or worse, caused by the unchecked TDP levels the units are allowed to run at...

    Just so it doesn't seem like I'm picking on Samsung; Netgear also recently sued Asus for producing routers with higher-than-allowed radio output.

    See:
    NETGEAR Suing ASUS For Wireless Hanky Panky - SmallNetBuilder

    And as if to prove that Asus was up to some kind of hanky panky, they released new firmware within days, with lower performance...

    See:
    ASUS RT-AC66U Second Wireless Retest - SmallNetBuilder

    While it cost the RT-AC66U, the top-ranked 'ac' router, a one-level drop in the 'AC1750' rankings, it now also sits well below the 18-month-old RT-N66U for greatest wireless range and throughput.

    This doesn't help, because the next logical question is: will the RT-N66U also be named in a court document sooner or later, and will that also affect the performance of the units we've already bought?

    Anyway;
    With this post, I want to let others know (audio files are hard to search for on the web...) how insidious this 'cheating' is in the 'bm' scores. Not that I know for a fact that most manufacturers stoop to this level...

    What I do know is that I have not been able to trust 'bm' scores for many, many years now when I'm in a position to buy new tech and/or upgrade existing/older components. The only way to do it properly, for me, is to buy the various versions of the product(s) I'm interested in, compare them against my running systems (whose base performance I know), see whether the new component is faster, slower, or shows no discernible difference, and then act accordingly.

    While this process of evaluating new components against my current standard systems will not change soon, I am amazed and impressed at how Anand Lal Shimpi and Brian Klug are also on constant guard against this willingness (even today) by the biggest players (Samsung, ATI, Asus and... 'all' manufacturers...) to cheat and lie their way to presenting the best 'scores' for their products, even when they don't perform at that level under normal/comparable real-world use.

    With today's podcast, I have a newfound respect for benchmarks and for the people who try to keep them as pristine as possible, but that doesn't change the fact that they are still essentially unusable to me (in a buy/do-not-buy decision).


    My question to the forum today is this:

    Was there a time when you based a decision on a benchmark and, post-purchase, found out you had been fooled?

    My most recent example is partly stated above: recommending the RT-AC66U to clients, when at this point it is clear that the RT-N66U is the more desirable (and cheaper) option when maximum wireless range/throughput is the most important aspect at the time of purchase.

    The scary part is that these performance changes are possible even after purchase, via firmware upgrades (and possibly driver updates too, where applicable). No issues if performance increases!!! But I don't want a few dozen people knocking on my door asking why their hardware is now worse than what they already had...

    You can suggest that I/they stick with the firmware that provides the greatest range, but the bug fixes and other enhancements that later firmware offers can only be ignored for so long (and not at all when one of the clients has a specific issue that later firmware fixes).

    Of course, this post is in the hardware/aftermarket upgrade forum because this is where most people come to make a decision on new components.

    Looking forward to your 'bm' stories (horror, or otherwise)... :D


    And I'm going to make a prediction too: when (not if) a benchmark can be made undefeatable, and the 'scores' for the components it covers can be reliably used to make sound purchase decisions, that is when we'll have a true 'compute' revival once again. The manufacturers will no longer be able to make claims that can't be backed up (at the time of the product's introduction), and if they simply give us the same old, same old... well, I hope that at that point ALL of us vote with our wallets and let them know to give us a real improvement, or we'll let them sit on their empty promises and rot.


    Even if you have no 'bm' story to tell, I hope that you'll keep this real issue in mind and help spread the word around too.



    This post was written by tilleroftheearth at 9.87zTPS* (real world 'score' to be converted/verified at a later date).

    *zTPS = zillion thoughts per second :p :D :) ;) :nah:
     
  2. HTWingNut

    HTWingNut Potato

    Reputations:
    21,580
    Messages:
    35,370
    Likes Received:
    9,877
    Trophy Points:
    931
    There will always be artificial benchmarks and real-world performance tests. You can't rely on just one. By the same token, with any other product, you wouldn't/shouldn't take a single source's review to heart. The more informed the consumer, the better. I think this fixing of performance for a benchmark is simply ridiculous. We see it everywhere, but it only makes the OEM look dumb in the end. Again, it's part of the consumer being informed about the product they're buying.

    Sure, it has to stop, but as consumers we also have to make it clear to them that we are not stupid and understand that those benchmarks are only part of the buying decision and not the whole.
     
  3. djembe

    djembe drum while you work

    Reputations:
    1,064
    Messages:
    1,455
    Likes Received:
    203
    Trophy Points:
    81
    I understand how benchmarks can be gamed or hacked so certain devices score better, but they still provide a decent platform for cross-product comparison for those of us who cannot afford to try everything out in person. And if you know the focus of each benchmark, you can have a decent idea of performance against a familiar or established baseline.

    Categorically ignoring benchmarks can result in purchasing a device that has much less performance than you may have thought based on advertisement. While companies can cheat sometimes and in some ways on benchmarks, they are still more representative of a device's performance than advertisement, which lists everything as the best system since the invention of the microchip.
     
  4. TANWare

    TANWare Just This Side of Senile, I think. Super Moderator

    Reputations:
    2,548
    Messages:
    9,585
    Likes Received:
    4,997
    Trophy Points:
    431
    OEMs have long been known for tuning their hardware to run better under industry-standard benchmarks. Case in point: SSDs tuned for benchmarks that use highly compressible data, when real-world data often isn't. This is true of everything from the CPU to just about every interface device ever made. With the higher-than-spec transmission power, it seems they got caught with their hands in the cookie jar...
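The compressible-data point above is easy to demonstrate. A rough sketch (using Python's stdlib `zlib` as a stand-in for whatever compression a drive controller might apply internally, which is an assumption for illustration, not how any specific SSD works): a zero-filled benchmark buffer shrinks to almost nothing, while realistic incompressible data does not, so a compressing controller physically writes far less in the 'easy' benchmark case and posts a much higher score.

```python
import os
import zlib

# Rough illustration: why a compressing SSD controller can ace benchmarks
# fed with zero-filled buffers but not with realistic (incompressible) data.
block = 1024 * 1024              # 1 MiB test buffer

zeros = bytes(block)             # classic 'easy' benchmark payload
random_data = os.urandom(block)  # closer to media files, encrypted data, etc.

zeros_written = len(zlib.compress(zeros))
random_written = len(zlib.compress(random_data))

print(f"zero-fill payload compresses to {zeros_written} bytes")
print(f"random payload stays near {random_written} bytes")
# The drive physically writes orders of magnitude less for the zero-fill
# case, so its benchmark throughput looks far higher than real-world use.
```

Benchmarks that switched to incompressible test data were created precisely to close this gap.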
     
  5. Qing Dao

    Qing Dao Notebook Deity

    Reputations:
    1,600
    Messages:
    1,771
    Likes Received:
    304
    Trophy Points:
    101
    This is news to you? I take it you have been living under a rock since the dawn of the personal computer?

    There are two types of benchmarking. There are the popular benchmark programs whose only purpose is bragging rights; anybody can download these, run them on their system, and be given some numbers. Then there are benchmarks built from real programs that people use every day, whose performance can be compared between two systems, but only under controlled conditions. This is what computer websites usually do when they test hardware against each other, although they still throw in the former type of benchmark just for continuity.

    Benchmarking is amazing and I use it heavily to base new purchases on. I have never been "burned" by it. In fact, I don't know what I would do without it.

    Tiller, I know you don't like benchmarks, you say it all the time, but I think it is mostly because the well done controlled tests are often at odds with your over-enthusiastic claims.
     
  6. saturnotaku

    saturnotaku Notebook Nobel Laureate

    Reputations:
    4,879
    Messages:
    8,926
    Likes Received:
    4,701
    Trophy Points:
    431
    Based on his posting history, Tiller doesn't use his computer the way 90+% of the public does, yet his personal experience is what he uses to recommend purchases to others.
     
  7. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    (bold above, mine)

    HTWingNut, thanks for your on-topic input. Yeah, that is my goal: to inform the (new) consumers. And that the only 'real world tests' that count are the ones that directly compare our current platform to the new/proposed/updated component/version in each of our specific workflows.




    djembe, great points and I agree with all of them. But what happens when the benchmarks themselves have been compromised by certain manufacturers?

    What happens: we end up supporting a dishonest company which ends up fooling a lot more people (which we may not know/admit to for years).

    Look at AMD and the micro-stuttering issue... their response? There is no issue; look at our superior frame rates.

    Now (how many years later?), they're beginning to fix what they've denied for so long - all because bm's 'don't lie'. Until they do.


    See:
    AnandTech | AMD Frame Pacing Explored: Catalyst 13.8 Brings Consistency to Crossfire


    (while the link points to the article - the real story is in the comments).
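The micro-stuttering point above is worth making concrete. A toy sketch (the frame times are invented numbers for illustration, not measurements from any actual GPU): two cards can report the identical average FPS, the headline benchmark number, while one delivers frames evenly and the other alternates fast and slow frames, which is exactly what average FPS hides and what frame-time analysis exposed.

```python
from statistics import mean, pstdev

# Toy numbers: identical average FPS, very different frame pacing.
smooth = [16.7] * 6                           # ms per frame, even delivery
stutter = [8.0, 25.4, 8.0, 25.4, 8.0, 25.4]   # alternating fast/slow frames

for name, times in (("smooth", smooth), ("stutter", stutter)):
    avg_fps = 1000 / mean(times)
    print(f"{name}: {avg_fps:.1f} avg FPS, "
          f"frame-time stddev {pstdev(times):.1f} ms")
# Both report ~60 avg FPS, but only one of them *feels* like 60 FPS.
```

This is why reviewers moved from average FPS to frame-time percentiles when covering multi-GPU setups.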




    TANWare, I wish that them getting caught with their hands in the cookie jar would shame them enough to stop... But that has never happened. This is why I bring this up whenever I can - this isn't something that is fixed and goes away - new (and old!) consumers get caught in this year after year - to the detriment of real and innovative progress while huge profits get raked in by these modern day mass swindlers.




    Qing Dao, no - not news to me. You once again didn't read what I wrote, huh.

    What claims have I made? And which 'well done controlled tests' are they at odds with?

    Everything I've claimed has been eventually 'verified' by others - even the tech press (Anandtech, Tomshardware, HardOCP, etc.) - even if some of my claims seemed ahead of their time...


    Glad to hear that benchmarking has been so consistently great for you. But when it's time to put my cash down on the counter, benchmarks are the last consideration not because I ignore them completely - but because their relevance over the years/decades has proven to be less than critical and more often misleading.




    saturnotaku, Yes, of course I use my personal experience to recommend or at least inform others of their options. Whose experience would you have me base my recommendations on?

    You're also ignoring the point that I, along with anyone else, can adjust our main use case to potentially match the person we're trying to help. Sure, I'm not perfect, but I do try to help (fully) where I can. With my biases and 'value' judgments.

    I am confident that the person I respond to will come back and ask for more info, or simply correct me when/if my assumptions were wrong in their case.

    Flatly stated, I can only offer help if I'm somewhat above (experience-wise) the person needing it. Or do you suggest that the blind should lead the blind?
     
  8. triturbo

    triturbo Long live 16:10 and MXM-B

    Reputations:
    1,577
    Messages:
    3,845
    Likes Received:
    1,238
    Trophy Points:
    231
  9. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    Wow, I knew this benchmark mania had no bounds - but I didn't know that.

    Good thing I'm not the target group for Ferrari - I would be pissed if I had paid/bought one!

    Seriously, this just shows how much 'numbers/scores' mean, not just to people like Qing Dao, but even to manufacturers of the most exclusive products (and therefore some of the world's smallest markets).

    I think 'truth in advertising' died not too much later than the ink dried on that forlorn, trampled, abused and long forgotten idiom.
     
  10. R3d

    R3d Notebook Virtuoso

    Reputations:
    1,515
    Messages:
    2,382
    Likes Received:
    60
    Trophy Points:
    66
    Benchmarks are accurate... You just have to know what they're testing. ;)
     
  11. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631

    And to know that; you need to test it yourself. ;)
     
  12. Peon

    Peon Notebook Virtuoso

    Reputations:
    406
    Messages:
    2,007
    Likes Received:
    128
    Trophy Points:
    81
    Since benchmarks are so unreliable, what better method do you propose a prospective buyer use in order to evaluate what to get, short of buying, testing, and returning dozens of products and hoping that at least one of them is to their liking?
     
  13. idiot101

    idiot101 Down and Broken

    Reputations:
    996
    Messages:
    3,901
    Likes Received:
    169
    Trophy Points:
    131
    I am more pragmatic. I don't go for the absolute best, for fear of being disappointed, but I also won't cheap out; good enough is my target (I never choose the cheapest option; learned my lesson early). I look at benchmarks for what they are, a quantification of the quality of the device, and take them with a pinch of salt. I wait for user reviews to make my decisions. I look at well-rounded products rather than going for an out-and-out sector leader. If I don't like something, I will go the extra mile to denounce it and try to wipe it off the face of the Earth :D . I never forgive. Once burned is enough for me.

    I really don't like to see such things happen. I tend to cut out such companies slowly until I never buy from them again. I did it to OCZ for their atrocious wares, Puma for a crappy pair of sneakers, and Sony (I was cursed for my stupidity/vanity). For all their b**ls**t, the SSD from Samsung I own is still going strong. Looks like they still have me as a potential customer sometime down the line.

    Excuse me if this is irrelevant. Nice write up BTW.
     
  14. djembe

    djembe drum while you work

    Reputations:
    1,064
    Messages:
    1,455
    Likes Received:
    203
    Trophy Points:
    81
    I see benchmarks more as a quantification of the performance of a device, not the quality. That said, I also look at reviews (both professional and end-user) when making purchasing decisions. It's also important to know your own use, or anticipated use, for a device so you can purchase accordingly. As an example, I'm one of the rare "corner cases" that bought a ThinkPad as a gaming and home-use machine, as opposed to a professional-use system. Do I know there are other systems with up to double the graphics performance of my W530? Yes. Do I realize that I could get a similarly powered system (or even a higher-powered one) for less money? Yes, I do. But I also realize that nothing else had the combination of superior build quality, a great screen that tilts back the full way, a DVD burner, and 8 hours of battery life with the potential for more. And there are other reasons I chose this system as well. Even though people will say ThinkPads should primarily be used for business and professional purposes, I've found that mine is my ideal home computer. In a similar way, if you know what you're looking for, you'll be better able to find the benchmarks that will tell you whether a potential system lives up to your expectations.
     
  15. TANWare

    TANWare Just This Side of Senile, I think. Super Moderator

    Reputations:
    2,548
    Messages:
    9,585
    Likes Received:
    4,997
    Trophy Points:
    431
    This is why I hate to be an early adopter. I would much rather buy a 6-month-old product with lots of user reviews. I place much more faith in those than in all the benchmarks out there...
     
  16. Raidriar

    Raidriar ლ(ಠ益ಠლ)

    Reputations:
    1,708
    Messages:
    5,820
    Likes Received:
    4,311
    Trophy Points:
    431
    Well, the benchmarks and real-world results certainly were on point for the 7970M vs. 680M for professional 3D work.

    But really, benchmarks are worthless IMHO. Real world numbers/experience trumps everything else, for me at least.
     
  17. Marksman30k

    Marksman30k Notebook Deity

    Reputations:
    2,080
    Messages:
    1,068
    Likes Received:
    180
    Trophy Points:
    81
    Sigh, the age-old problem. You've basically identified the paradox that drives this world, right down to the education system:
    benchmarks, which are highly reproducible, standardized, and quantifiable, but less relevant;
    or real-world testing, which is less reproducible, tough to standardize, and often qualitative.

    You basically cannot trust either fully and the best decision is still a combined evaluation.