Intel core 2 duo vs dual,core i5 or i7 | Page 2 | NotebookReview

Qing Dao Notebook Deity

Reputations:: 1,600

Messages:: 1,771

Likes Received:: 304

Trophy Points:: 101

octiceps said: ↑

i7-3540M: 3.5 GHz
i5-4330M: 3.4 GHz
C2D T9900: 3.06 GHz

So the alternative is to compare completely different classes of processors TDP-wise (i3 ULV...are you kidding me?!) and be even more misleading?

Click to expand...

No. The point is not to say that in general, Haswell far outclasses Penryn. We all know this. We also know that for the same performance, Haswell is going to use less power. You don't need to post benchmarks showing an i7 demolishing a C2D. That doesn't help anything. What does help a lot is to see where the C2D fits, performance-wise, into the current Haswell lineup. What we can see is that the Penryn lines up performance-wise with Haswell ultra-low and low-voltage dual core i3, i5, and i7 processors. I think that is a much more valid comparison than just saying Haswell > Penryn.

Also these low-voltage processors are a lot more common than they used to be. Today they are less the exception and more the rule in mainstream laptops. For many home-users, that is all the performance they need, and even a T9900 is perfectly adequate, at least performance-wise. The explosive growth in CPU performance of years past has slowed down immensely, and performance requirements for mainstream non-gaming computer use has plateaued.

ComradeQuestion said: ↑

They'll be drastically faster.

Modern CPUs have larger caches, meaning that they have to fetch things from RAM less often. Every time your CPU leaves the cache, your performance drops from 50-500x.

Another way to avoid that is what's called branch prediction. If in your code there is an "if" statement, or some condition that won't be known until runtime, the CPU can compute the most likely future branch, again avoiding unneeded computation or latency. Modern CPUs are far better at this than old CPUs.

Clock cycles for computation have improved a lot. All computation comes down to clock cycles, though the bottleneck is almost always latency.

Oh, and more cores/ hyperthreading means that more code can be run in parallel.
If I've misunderstood the question, let me know. I can try to answer more suitably.

Click to expand...

ComradeQuestion said: ↑

Is it only on some benchmarks? I'm jus tlooking at ones that look meaningful so far and I haven't seen both of them show up. Can you point one out?

This site would really benefit from a basic unified test suite.

Of course, the comparison between these CPUs is completely flawed anyways. But I'm curious. Even if that i3 is 3x as efficient power wise, I'd imagine it's still faster.

Click to expand...

ComradeQuestion said: ↑

Interesting. They seem quite close, despite the fact that one CPU uses 1/3rd of the energy. Still, I would have expected better, but I suspect these benchmarks are largely throughput.

octiceps appears to have posted a more relevant benchmark, with similar TDP's. Results are obviously in favor of modern CPUs.

Even still, the comparison is weak, because the largest factors are in latency. I'd like to see something that's difficult on cache locality, and really shows off the massive effects of modern CPU prefetching.

For example, if you have a giant segment of memory, which you read from indexed in predictable but >cache size locations, you'd see 50x performance because modern prefetching will predict the next offset, and old CPUs will have to replenish their cache.

Honestly, I find most benchmarks that aren't very very explicit in the technology they target, to be like reading a magic 8 ball.

If you understand how a CPU works, and how modern programs work, you'll appreciate that 1% of what a CPU does is calculate data, and 99% of it is fetching data. Benchmarks never seem to reflect that properly.

Click to expand...

Your pseudo-scientific arguments have no merit if all you can do is try to discredit the evidence against them with even more techno-babble and you are completely incapable of offering any evidence to support your theories.

ComradeQuestion said: ↑

Well, the proof is the rest of my post that explains each of the improvements to modern CPUs. If you want me to clarify any of it I can.

Computation-wise it's easier to get hard numbers because you can look at FLOPS, and I'm sure there's a graph showing exponential progression there. But FLOPS are rarely where your bottleneck is, typically it's latency, which is why my post is discussing that aspect in particular.

Click to expand...

You can't actually be serious by saying that your words are your proof? Are you sure this isn't Tilleroftheearth's second account?

Qing Dao, Jan 13, 2015

#51

Starlight5 likes this.

tilleroftheearth Wisdom listens quietly...

Reputations:: 5,398

Messages:: 12,692

Likes Received:: 2,717

Trophy Points:: 631

Yeah; that is exactly the point. Duh.

Did we take a stupid pill today? Lol...

Qing Dao said: ↑

No. The point is not to say that in general, Haswell far outclasses Penryn. We all know this. We also know that for the same performance, Haswell is going to use less power. You don't need to post benchmarks showing an i7 demolishing a C2D. That doesn't help anything. What does help a lot is to see where the C2D fits, performance-wise, into the current Haswell lineup. What we can see is that the Penryn lines up performance-wise with Haswell ultra-low and low-voltage dual core i3, i5, and i7 processors. I think that is a much more valid comparison than just saying Haswell > Penryn.

Also these low-voltage processors are a lot more common than they used to be. Today they are less the exception and more the rule in mainstream laptops. For many home-users, that is all the performance they need, and even a T9900 is perfectly adequate, at least performance-wise. The explosive growth in CPU performance of years past has slowed down immensely, and performance requirements for mainstream non-gaming computer use has plateaued.

Your pseudo-scientific arguments have no merit if all you can do is try to discredit the evidence against them with even more techno-babble and you are completely incapable of offering any evidence to support your theories.

You can't actually be serious by saying that your words are your proof? Are you sure this isn't Tilleroftheearth's second account?

Click to expand...

tilleroftheearth, Jan 13, 2015

#52

Krane Notebook Prophet

Reputations:: 706

Messages:: 4,653

Likes Received:: 108

Trophy Points:: 131

ajkula66 said: ↑

Judging by this thread, the OP has chosen to go QC:

http://forum.notebookreview.com/wha...593-dell-precision-vs-hp-elitebook-zbook.html

Click to expand...

But does that mean he saw the light or simply acquiesced?

Krane, Jan 13, 2015

#53

ComradeQuestion Notebook Consultant

Reputations:: 204

Messages:: 120

Likes Received:: 9

Trophy Points:: 31

Fun.

The point is not to say that in general, Haswell far outclasses Penryn.

Click to expand...

That was the conversation I was having, and seemed to be the question. See my first post where I ask for clarification on the question. In fact, it appears that benchmarks *did* have to be posted, so I'm left wondering what topic you're reading?

What we can see is that the Penryn lines up performance-wise with Haswell ultra-low and low-voltage dual core i3, i5, and i7 processors. I think that is a much more valid comparison than just saying Haswell > Penryn.

Click to expand...

How is "Haswell > Penryn" not simply shorthand for "High end penryn's line up with low end haswell's that focu son low voltage over performance" ? Seems to be the same exact statement to me.

Your pseudo-scientific arguments have no merit if all you can do is try to discredit the evidence against them with even more techno-babble and you are completely incapable of offering any evidence to support your theories.

Click to expand...

Techno-babble, eh? Interesting lol I didn't know that mentioning cache locality was "techno babble".

How about this, you go wikipedia all of the big words that you didn' tunderstand from my post, and after that you get back to me?

You can't actually be serious by saying that your words are your proof?

Click to expand...

Well he focused on a very specific part of my post, and the evidence for my assumptions was given in the rest of my post.

Again, if you want to Google/ Wikipedia how cache works, how latency affects performance, and everything les eI mention, be my guest. I provided the information, it's not hard to find ou tmore.

Herb Sutter discusses these things a lot:
https://www.youtube.com/watch?v=L7zSU9HI-6I

I don't know your technical background, but generally any compiled language experience will be enough to understand these concepts.

https://en.wikipedia.org/wiki/Locality_of_reference

If you have questions about any of this I can answer them. But you seem to want to frame the conversation in a different light, even though it appears that you want to have the same exact conversation.

ComradeQuestion, Jan 13, 2015

#54

ajkula66 Courage and Consequence

Reputations:: 3,018

Messages:: 3,198

Likes Received:: 2,318

Trophy Points:: 231

Krane said: ↑

But does that mean he saw the light or simply acquiesced?

Click to expand...

Well, OP would have to answer that one for you...

Having said that, a 8770W will run circles around T500 no matter how one wants to look at it...as it should, in all fairness...

ajkula66, Jan 13, 2015

#55

Qing Dao Notebook Deity

Reputations:: 1,600

Messages:: 1,771

Likes Received:: 304

Trophy Points:: 101

tilleroftheearth said: ↑

Qing Dao said:

The point is not to say that in general, Haswell far outclasses Penryn. We all know this. We also know that for the same performance, Haswell is going to use less power. You don't need to post benchmarks showing an i7 demolishing a C2D.

Click to expand...

Yeah; that is exactly the point. Duh.

Did we take a stupid pill today? Lol...

Click to expand...

If a single Penryn can perform better than a single Haswell, then that point is clearly either wrong or over-simplified.

If relating performance of different processors is too intellectually challenging for you, let's just finish off by saying that all i7's are better than all i5's and call it a day.....

ComradeQuestion said: ↑

Fun.

Qing Dao said:

The point is not to say that in general, Haswell far outclasses Penryn.

Click to expand...

That was the conversation I was having, and seemed to be the question. See my first post where I ask for clarification on the question.

Qing Dao said:

What we can see is that the Penryn lines up performance-wise with Haswell ultra-low and low-voltage dual core i3, i5, and i7 processors. I think that is a much more valid comparison than just saying Haswell > Penryn.

Click to expand...

How is "Haswell > Penryn" not simply shorthand for "High end penryn's line up with low end haswell's that focus on low voltage over performance" ? Seems to be the same exact statement to me.

Click to expand...

Not even close. Haswell > Ivy Bridge, Haswell > Netburst, Haswell > 4004. These sorts of comparisons leave a lot to be desired.

ComradeQuestion said: ↑

Techno-babble, eh? Interesting lol I didn't know that mentioning cache locality was "techno babble".

How about this, you go wikipedia all of the big words that you didn' tunderstand from my post, and after that you get back to me?

Well he focused on a very specific part of my post, and the evidence for my assumptions was given in the rest of my post.

Again, if you want to Google/ Wikipedia how cache works, how latency affects performance, and everything les eI mention, be my guest. I provided the information, it's not hard to find ou tmore.

Herb Sutter discusses these things a lot: https://www.youtube.com/watch?v=L7zSU9HI-6I

I don't know your technical background, but generally any compiled language experience will be enough to understand these concepts. https://en.wikipedia.org/wiki/Locality_of_reference

If you have questions about any of this I can answer them. But you seem to want to frame the conversation in a different light, even though it appears that you want to have the same exact conversation.

Click to expand...

Sorry, that was rude. But please, show some real world examples of what you are talking about. You are just having a theoretical monologue. Your logic is correct, but you vastly overestimate the performance differences. Nothing shows the kinds of leaps and bounds of cache prediction that you are talking about.

Qing Dao, Jan 13, 2015

#56

Starlight5 likes this.

ComradeQuestion Notebook Consultant

Reputations:: 204

Messages:: 120

Likes Received:: 9

Trophy Points:: 31

It's not really theoretical, and the talk I link discusses, in depth, how modern CPUs can have massive effects. I'm not sure if that talk references prefetching, Herb Sutter has dozens of talks on these subjects, but it becomes clear that changes in these technologies can have very significant effects.

I'm trying to think of a more direct example. Like, Bjarne Stroutsup at one point shows that O(log n) data structures end up being slower than O (n) data structures (an exponential difference) purely because of cache locality, which proves quite definitively that cache is incredibly important to performance.

I found some of his slides here:
c++ : Locality, Locality, Locality

Unfortunately, to fully appreciate how this difference is exponential requires at least a cursory knowledge of data structures and algorithms. I can't really "source" something like that, but this is a literal demonstrable effect that the creator of C++ is demonstrating, so that should be a hint.

As cache sizes increase we can fit larger objects into the cache. This is very important.

As prefetching technology increases, we can predict what to put into the cache. Also incredibly important - I wish I had the Herb Sutter talk on this, but again, we see a difference between logarithmic speed and linear, all because a prefetcher was improved on the CPU.

These are the ways CPUs increase performance now these days. We've stopped increasing clock speed as drastically for years, and the focus has been on increasing cache, increasing throughput with multithreading (and also intel's shared cache for hyperthreading), branch prediction, etc.

I don't really look at benchmarks. They don't matter to me, because they rarely explain the exact technical implementation. But very simple demonstrations like the one in the link above will show how drastic it can be. These features are just not there, or not as refined, in older CPUs.

That is why I think modern CPUs are significantly better than older ones.

Not to mention instruction sets, but that's not really as worth going into or as important for performance.

ComradeQuestion, Jan 13, 2015

#57

Qing Dao Notebook Deity

Reputations:: 1,600

Messages:: 1,771

Likes Received:: 304

Trophy Points:: 101

ComradeQuestion said: ↑

It's not really theoretical, and the talk I link discusses, in depth, how modern CPUs can have massive effects. I'm not sure if that talk references prefetching, Herb Sutter has dozens of talks on these subjects, but it becomes clear that changes in these technologies can have very significant effects.

I'm trying to think of a more direct example. Like, Bjarne Stroutsup at one point shows that O(log n) data structures end up being slower than O (n) data structures (an exponential difference) purely because of cache locality, which proves quite definitively that cache is incredibly important to performance.

I found some of his slides here:
c++ : Locality, Locality, Locality

Unfortunately, to fully appreciate how this difference is exponential requires at least a cursory knowledge of data structures and algorithms. I can't really "source" something like that, but this is a literal demonstrable effect that the creator of C++ is demonstrating, so that should be a hint.

As cache sizes increase we can fit larger objects into the cache. This is very important.

As prefetching technology increases, we can predict what to put into the cache. Also incredibly important - I wish I had the Herb Sutter talk on this, but again, we see a difference between logarithmic speed and linear, all because a prefetcher was improved on the CPU.

These are the ways CPUs increase performance now these days. We've stopped increasing clock speed as drastically for years, and the focus has been on increasing cache, increasing throughput with multithreading (and also intel's shared cache for hyperthreading), branch prediction, etc.

I don't really look at benchmarks. They don't matter to me, because they rarely explain the exact technical implementation. But very simple demonstrations like the one in the link above will show how drastic it can be. These features are just not there, or not as refined, in older CPUs.

That is why I think modern CPUs are significantly better than older ones.

Click to expand...

Cache sizes are not entirely relevant. Cache sizes of Penryn and Haswell are very consistent. Also, comparing performance of differences of the same processor with different cache sizes shows that it hardly makes any difference at all besides in some exceptional circumstances where cache size is cut dramatically. (65nm Core 2 based Celerons that only had 512KB of L2 cache are the best example of this.)

Another thing is that Penryn is a modern CPU. We aren't comparing Haswell to a 486DX here. Each successive generation between Penryn and Haswell has offered only slight incremental improvements. On a per clock basis, Haswell offers a best possible performance increase over Penryn is nearly 50%. This works out perfectly when comparing high speed Penryns to low speed Haswells. You can talk about how cache prediction on Haswell is a million times better than on Penryn, but even if that is so, it doesn't seem to translate very well to real-world performance.

ComradeQuestion said: ↑

Not to mention instruction sets, but that's not really as worth going into or as important for performance.

Click to expand...

Using new instruction sets is the only way one will ever see absolutely dramatic clock for clock performance differences with Haswell compared to Penryn. For example, take a look at AES encryption.

Qing Dao, Jan 13, 2015

#58

Starlight5 likes this.

nipsen Notebook Ditty

Reputations:: 694

Messages:: 1,686

Likes Received:: 131

Trophy Points:: 81

ComradeQuestion said: ↑

It's not really theoretical, and the talk I link discusses, in depth, how modern CPUs can have massive effects. I'm not sure if that talk references prefetching, Herb Sutter has dozens of talks on these subjects, but it becomes clear that changes in these technologies can have very significant effects.

I'm trying to think of a more direct example. Like, Bjarne Stroutsup at one point shows that O(log n) data structures end up being slower than O (n) data structures (an exponential difference) purely because of cache locality, which proves quite definitively that cache is incredibly important to performance.

I found some of his slides here:
c++ : Locality, Locality, Locality

Unfortunately, to fully appreciate how this difference is exponential requires at least a cursory knowledge of data structures and algorithms. I can't really "source" something like that, but this is a literal demonstrable effect that the creator of C++ is demonstrating, so that should be a hint.

As cache sizes increase we can fit larger objects into the cache. This is very important.

As prefetching technology increases, we can predict what to put into the cache. Also incredibly important - I wish I had the Herb Sutter talk on this, but again, we see a difference between logarithmic speed and linear, all because a prefetcher was improved on the CPU.

These are the ways CPUs increase performance now these days. We've stopped increasing clock speed as drastically for years, and the focus has been on increasing cache, increasing throughput with multithreading (and also intel's shared cache for hyperthreading), branch prediction, etc.

I don't really look at benchmarks. They don't matter to me, because they rarely explain the exact technical implementation. But very simple demonstrations like the one in the link above will show how drastic it can be. These features are just not there, or not as refined, in older CPUs.

That is why I think modern CPUs are significantly better than older ones.

Not to mention instruction sets, but that's not really as worth going into or as important for performance.

Click to expand...

Other than that last part, we have a winner!

But yeah, that's one way of explaining it. The problem is that what you've really explained is that all the intel offerings are practically identical. And that to get the huge payoffs in anything other than special cases where you literally gate the compiler and feed the processor manufactured junk data that never occurs in a practical example (at least if it's to produce anything meaningful) -- we need a completely different architecture. And along with it, new programming languages and compiler techniques.

nipsen, Jan 14, 2015

#59

ajkula66 likes this.

ComradeQuestion Notebook Consultant

Reputations:: 204

Messages:: 120

Likes Received:: 9

Trophy Points:: 31

Cache sizes are not entirely relevant. Cache sizes of Penryn and Haswell are very consistent. Also, comparing performance of differences of the same processor with different cache sizes shows that it hardly makes any difference at all besides in some exceptional circumstances where cache size is cut dramatically. (65nm Core 2 based Celerons that only had 512KB of L2 cache are the best example of this.)

Another thing is that Penryn is a modern CPU. We aren't comparing Haswell to a 486DX here. Each successive generation between Penryn and Haswell has offered only slight incremental improvements. On a per clock basis, Haswell offers a best possible performance increase over Penryn is nearly 50%. This works out perfectly when comparing high speed Penryns to low speed Haswells. You can talk about how cache prediction on Haswell is a million times better than on Penryn, but even if that is so, it doesn't seem to translate very well to real-world performance.

Click to expand...

I agree that cache sizes are not entirely relevant. It's less common to have a single cache line required to be larger than your cache size, outside of more specific contexts.

But it has grown, certainly. Architecturally, the cache hierarchy has been split to per-core (for hyperthreading among other things). This means that the entire cache can't be invalidated by a single thread, only a small portion.

And yes, Penryn is a modern Core 2 Duo, certainly. It was part of the first 'Core' lineup, which started a much larger focus on cache and parallelism.

Clock per clock being 50% higher, to me, is quite drastic when they also use such a considerable amount less energy. I guess that's maybe where I differ here, but if you use 30% of the energy to get 50% improved results, that's really quite significant.

As for cache prediction and real-world performance, I suppose it really just depends on the case. Most CPUs at this point are so fast that even latency issues won't be as noticeable. But I spend a lot of time benchmarking and profiling my own applications to optimize for throughput and latency, and it makes a big difference there.

For example, in two identical pieces of code, there is a 50x performance gain from making it cache friendly. Of course, both pieces of code run in milliseconds, so you won't ever notice it.

Why I think it's more important is because the gap between CPU and RAM has only grown, and it continues to grow. So I think that latency is only going to get *worse*, and therefor any time you can actually avoid it, it will be all that much more important.

Perhaps real world applications haven't gotten to that point yet.

Using new instruction sets is the only way one will ever see absolutely dramatic clock for clock performance differences with Haswell compared to Penryn. For example, take a look at AES encryption.

Click to expand...

I disregarded instruction sets because they are very rarely used, and the program has to be built with support. Naturally, instruction sets can make a significant difference. AES-NI is basically the best you'll get though - AES was built to be implemented in hardware, and encryption always benefits from instruction sets. Most programs probably won't be doing that sort of thing.

nipsen,

But yeah, that's one way of explaining it. The problem is that what you've really explained is that all the intel offerings are practically identical. And that to get the huge payoffs in anything other than special cases where you literally gate the compiler and feed the processor manufactured junk data that never occurs in a practical example (at least if it's to produce anything meaningful) -- we need a completely different architecture. And along with it, new programming languages and compiler techniques.

Click to expand...

The real issue here is that programs are rarely CPU bound. If they were, we'd see quite large speedups.

Last edited: Jan 15, 2015

ComradeQuestion, Jan 14, 2015

#60

nipsen likes this.

nipsen Notebook Ditty

Reputations:: 694

Messages:: 1,686

Likes Received:: 131

Trophy Points:: 81

ComradeQuestion said: ↑

The real issue here is that programs are rarely CPU bound. If they were, we'd see quite large speedups.

Click to expand...

Mm. Well, in computer science classes, you're usually taught that if you can exchange a short algorithm with exponential factor for one with linear execution time, then you should do so. Almost regardless of the size of the constant. The reason for that is that we always work with limited data sets. So it's actually so rare that you run into a real world example where say, a search has to return in a short amount of time, and there's a difference between a brute-force algorithm of some sort and a heap-implementation, for example, that it's considered a special case.

This also fits perfectly with the general industry narratives, of course. Because if you design linear algorithms - that all are completely cpu-bound - then you will always encourage short development time, low cost implementations (i.e., outsource the implementations to India and China). And the company that use the solution will then, when newer technology that is marginally faster, simply need to buy the latest generation offering. Which increases the performance - or trawls more data in a shorter amount of time - without having to change any implementation of the expensively developed software. And in practice, this is typically cheaper than developing completely new software with completely new ideas. If that is even an option.

So it's not that most tasks running on a PC aren't cpu-bound. It's just that they're technically designed in such a way that they're only /occasionally/ cpu-bound. And you don't see that as an issue even in real-time applications, because the return time usually is fast enough anyway. And when it isn't, like you mention a few examples of, then we're moving on to special cases where we need "optimization".. right? ;D

I mean, I agree that in the short term, it's possible to speed things up considerably if the transport layer was improved. We'd be able to design UI with better response, we'd easily smuggle in some reduction and occlusion detection algorithms for any 3d contexts. Switching contexts would be faster, you could rely on either core logic or external sources mostly interchangeably, that sort of thing. That would be useful.

But it's not giving you an integrated bus with several elements that have common access to working memory. To, say, design a 3d imaging program that could increase and decrease the complexity of the model from a selection of data - a data store that would be too big to crunch with brute force, but where the selection you'd fetch would still be representative and sufficient for the visual representation at all times. VR, basically. We just can't design something like that on current architecture. And we can't really do it with the tools we have now either, because they're stuck in a paradigm where "when in doubt, increase the constant of the algorithm" is a perfectly valid rule, both practically and in theory as well..

nipsen, Jan 21, 2015

#61

jedisurfer1 Notebook Deity

Reputations:: 39

Messages:: 785

Likes Received:: 50

Trophy Points:: 41

I'll have to to say I still use a c2d t9300 on my t61p everyday. and yes it still runs 3-4 virtual machines concurrently very well. The only thing holding it back is that it maxes out at 8gb of ram and that limits my vm work. I'd say it's faster than my i5 4300u ultrabook. I still adore that machine and it is just the gimped by the max ram. I also run a decked out w520, w530, w540 and the t9300 even in non dual ida mode is not slow to me. Then again I can't even tell much of a difference between sata2 vs sata3 difference in everyday use.

I don't play games so I can't compare it in that department. But for vt-x vmware stuff it's surprisingly still runs perfectly fine.

It's very comparable to all the current dual core cpu. I'm wondering who runs a quad core c2d in a laptop if it's on par with quad core current cpu

Last edited: Jan 22, 2015

jedisurfer1, Jan 22, 2015

#62

hirobo2 Notebook Consultant

Reputations:: 32

Messages:: 119

Likes Received:: 13

Trophy Points:: 31

I'm still using Merom (1st gen c2d) as my main PC. The only thing I can't do with it is play 1080p files larger than 8.0GB (I get audio syncing issues after 30 mins). CPUs have only advanced since then in terms of using less power and having faster RAM on board. I also own a current i5. I can attest, were the i5 on the same RAM as my c2d, it would have only performed marginally faster...

hirobo2, Jan 22, 2015

#63