No. The point is not to say that in general, Haswell far outclasses Penryn. We all know this. We also know that for the same performance, Haswell is going to use less power. You don't need to post benchmarks showing an i7 demolishing a C2D. That doesn't help anything. What does help a lot is to see where the C2D fits, performance-wise, into the current Haswell lineup. What we can see is that Penryn lines up with Haswell ultra-low and low-voltage dual-core i3, i5, and i7 processors. I think that is a much more valid comparison than just saying Haswell > Penryn.
Also, these low-voltage processors are a lot more common than they used to be. Today they are less the exception and more the rule in mainstream laptops. For many home users, that is all the performance they need, and even a T9900 is perfectly adequate, at least performance-wise. The explosive growth in CPU performance of years past has slowed down immensely, and performance requirements for mainstream non-gaming computer use have plateaued.
Your pseudo-scientific arguments have no merit if all you can do is try to discredit the evidence against them with even more techno-babble, while being completely incapable of offering any evidence to support your theories.
You can't seriously be saying that your words are your proof? Are you sure this isn't Tilleroftheearth's second account?
-
tilleroftheearth Wisdom listens quietly...
Yeah; that is exactly the point. Duh.
Did we take a stupid pill today? Lol...
-
ComradeQuestion Notebook Consultant
Fun.
How about this: you go Wikipedia all of the big words that you didn't understand from my post, and after that you get back to me?
Again, if you want to Google/Wikipedia how cache works, how latency affects performance, and everything else I mention, be my guest. I provided the information; it's not hard to find out more.
Herb Sutter discusses these things a lot:
https://www.youtube.com/watch?v=L7zSU9HI-6I
I don't know your technical background, but generally any compiled language experience will be enough to understand these concepts.
https://en.wikipedia.org/wiki/Locality_of_reference
If you have questions about any of this I can answer them. But you seem to want to frame the conversation in a different light, even though it appears that you want to have the same exact conversation. -
Having said that, an 8770W will run circles around a T500 no matter how one wants to look at it...as it should, in all fairness...
-
If relating performance of different processors is too intellectually challenging for you, let's just finish off by saying that all i7's are better than all i5's and call it a day.....
Qing Dao said: ↑The point is not to say that in general, Haswell far outclasses Penryn. What we can see is that Penryn lines up performance-wise with Haswell ultra-low and low-voltage dual-core i3, i5, and i7 processors. I think that is a much more valid comparison than just saying Haswell > Penryn.
Techno-babble, eh? Interesting, lol. I didn't know that mentioning cache locality was "techno-babble".
Well, he focused on a very specific part of my post, and the evidence for my assumptions was given in the rest of my post. -
ComradeQuestion Notebook Consultant
It's not really theoretical, and the talk I linked discusses, in depth, how these features of modern CPUs can have massive effects. I'm not sure if that particular talk references prefetching (Herb Sutter has dozens of talks on these subjects), but it becomes clear that changes in these technologies can have very significant effects.
I'm trying to think of a more direct example. Like, Bjarne Stroustrup at one point shows that O(log n) data structures end up being slower than O(n) data structures (an exponential difference on paper) purely because of cache locality, which proves quite definitively that cache is incredibly important to performance.
I found some of his slides here:
c++ : Locality, Locality, Locality
Unfortunately, fully appreciating why this difference is exponential requires at least a cursory knowledge of data structures and algorithms. I can't really "source" something like that, but it is a demonstrable effect being shown by the creator of C++ himself, so that should be a hint.
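If you want to see it yourself, here is a rough toy version of that kind of experiment (my own sketch written for this post, not Stroustrup's actual benchmark): keep a std::vector and a std::list sorted while inserting random values. On paper the two containers do comparable work, but the vector usually wins badly in practice because every search walks contiguous memory.

// Toy locality demo: keep a sorted std::vector and std::list while
// inserting random values, and time both. The vector scans contiguous
// memory; the list chases pointers scattered across the heap.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <list>
#include <random>
#include <vector>

template <typename Container>
double insert_sorted(Container& c, const std::vector<int>& values) {
    auto start = std::chrono::steady_clock::now();
    for (int v : values) {
        auto pos = std::find_if(c.begin(), c.end(),
                                [v](int x) { return x >= v; });
        c.insert(pos, v);
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    const int n = 50000;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1 << 30);
    std::vector<int> values(n);
    for (int& v : values) v = dist(rng);

    std::vector<int> vec;
    std::list<int> lst;
    std::printf("vector: %.3f s\n", insert_sorted(vec, values));
    std::printf("list:   %.3f s\n", insert_sorted(lst, values));
}

The exact numbers will vary by machine and compiler, but the ordering (vector faster despite all the element shifting) is the whole point of the demo.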
As cache sizes increase we can fit larger objects into the cache. This is very important.
As prefetching technology improves, the CPU gets better at predicting what to put into the cache. Also incredibly important - I wish I had the Herb Sutter talk on this, but again, we see a difference between logarithmic and linear speed, all because a prefetcher was improved on the CPU.
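A crude way to watch the prefetcher at work (again, my own illustrative sketch, not taken from any of the talks): sum the same array once in index order and once in a shuffled order. The arithmetic is identical; only the access pattern changes, and the sequential pass is usually several times faster because the hardware can stream data in ahead of the loop.

// Sequential vs. random traversal of the same array. The work is
// identical; only the access pattern (and thus cache/prefetcher
// behaviour) differs.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const size_t n = 1 << 24;                 // 64 MB of ints, far bigger than any cache
    std::vector<int> data(n, 1);
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), 0);

    auto run = [&](const char* label) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (size_t i : order) sum += data[i];
        auto end = std::chrono::steady_clock::now();
        std::printf("%s: sum=%lld, %.3f s\n", label, sum,
                    std::chrono::duration<double>(end - start).count());
    };

    run("sequential");
    std::mt19937 rng(1);
    std::shuffle(order.begin(), order.end(), rng);
    run("shuffled  ");
}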
These are the ways CPUs increase performance these days. Clock speeds stopped rising drastically years ago, and the focus has been on increasing cache, increasing throughput with multithreading (and also Intel's shared cache for hyperthreading), branch prediction, etc.
I don't really look at benchmarks. They don't matter to me, because they rarely explain the exact technical implementation. But very simple demonstrations like the one in the link above will show how drastic it can be. These features are just not there, or not as refined, in older CPUs.
That is why I think modern CPUs are significantly better than older ones.
Not to mention instruction sets, but that's not really worth going into and not as important for performance. -
ComradeQuestion said: ↑It's not really theoretical, and the talk I linked discusses, in depth, how these features of modern CPUs can have massive effects. ...
Another thing is that Penryn is a modern CPU. We aren't comparing Haswell to a 486DX here. Each successive generation between Penryn and Haswell has offered only slight incremental improvements. On a per-clock basis, the best possible performance increase Haswell offers over Penryn is nearly 50%. This works out perfectly when comparing high-speed Penryns to low-speed Haswells. You can talk about how cache prediction on Haswell is a million times better than on Penryn, but even if that is so, it doesn't seem to translate very well to real-world performance.
ComradeQuestion said: ↑Not to mention instruction sets, but that's not really worth going into and not as important for performance. -
ComradeQuestion said: ↑It's not really theoretical, and the talk I linked discusses, in depth, how these features of modern CPUs can have massive effects. ...
But yeah, that's one way of explaining it. The problem is that what you've really explained is that all the Intel offerings are practically identical. And that to get the huge payoffs in anything other than special cases - where you literally gate the compiler and feed the processor manufactured junk data that never occurs in a practical example (at least if it's to produce anything meaningful) - we need a completely different architecture. And along with it, new programming languages and compiler techniques. -
ComradeQuestion Notebook Consultant
Qing Dao said: ↑Cache sizes are not entirely relevant. The cache sizes of Penryn and Haswell are very consistent. Also, comparing the performance of versions of the same processor with different cache sizes shows that it hardly makes any difference at all, besides some exceptional circumstances where the cache is cut dramatically. (65nm Core 2 based Celerons that only had 512KB of L2 cache are the best example of this.)
Another thing is that Penryn is a modern CPU. We aren't comparing Haswell to a 486DX here. ...
But it has grown, certainly. Architecturally, the cache hierarchy has been split into per-core levels (for hyperthreading, among other things). This means that the entire cache can't be invalidated by a single thread, only a small portion.
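To make that concrete with a toy example (my own sketch, and a big simplification of the real hierarchy): two threads each bumping their own counter slow each other down dramatically when the two counters happen to sit on the same cache line, because every write forces that line to bounce between the cores' private caches. Pad the counters onto separate lines and the effect goes away.

// Two threads increment independent counters. When the counters share
// a cache line, the line ping-pongs between the cores' private caches;
// padding them onto separate 64-byte lines avoids that.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct Shared {                       // both counters on one cache line
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};
struct Padded {                       // each counter on its own line
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename Counters>
double run(long iters) {
    Counters c;
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { for (long i = 0; i < iters; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (long i = 0; i < iters; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    const long iters = 50000000;
    std::printf("same cache line: %.3f s\n", run<Shared>(iters));
    std::printf("padded apart:    %.3f s\n", run<Padded>(iters));
}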
And yes, Penryn is a modern Core 2 Duo, certainly. It was part of the first 'Core' lineup, which started a much larger focus on cache and parallelism.
Clock for clock being 50% higher is, to me, quite drastic when they also use considerably less energy. I guess that's maybe where I differ here, but if you use 30% of the energy to get 50% better results, that's really quite significant.
As for cache prediction and real-world performance, I suppose it really just depends on the case. Most CPUs at this point are so fast that even latency issues won't be as noticeable. But I spend a lot of time benchmarking and profiling my own applications to optimize for throughput and latency, and it makes a big difference there.
For example, between two pieces of code doing identical work, there can be a 50x performance gain from making one of them cache-friendly. Of course, both pieces of code run in milliseconds, so you won't ever notice it.
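As a made-up but representative illustration of what I mean: summing a matrix row by row versus column by column does exactly the same additions, but the row order walks memory contiguously while the column order misses the cache on almost every element.

// Row-major vs. column-major traversal of the same matrix.
// Same additions, very different cache behaviour.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const size_t n = 4096;
    std::vector<int> m(n * n, 1);             // row-major storage

    auto time_sum = [&](bool by_rows) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j)
                sum += by_rows ? m[i * n + j] : m[j * n + i];
        auto end = std::chrono::steady_clock::now();
        std::printf("%s: sum=%lld, %.3f s\n",
                    by_rows ? "row-major   " : "column-major",
                    sum, std::chrono::duration<double>(end - start).count());
    };

    time_sum(true);
    time_sum(false);
}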
Why I think it's more important is because the gap between CPU and RAM speed has only grown, and it continues to grow. So I think that latency is only going to get *worse*, and therefore any time you can actually avoid it, it will be all that much more important.
Perhaps real world applications haven't gotten to that point yet.
Qing Dao said: ↑Using new instruction sets is the only way one will ever see absolutely dramatic clock-for-clock performance differences between Haswell and Penryn. For example, take a look at AES encryption.
nipsen,
But yeah, that's one way of explaining it. The problem is that what you've really explained is that all the Intel offerings are practically identical. ... -
ComradeQuestion said: ↑The real issue here is that programs are rarely CPU bound. If they were, we'd see quite large speedups.
This also fits perfectly with the general industry narratives, of course. Because if you design linear algorithms - all of them completely CPU-bound - then you will always encourage short development times and low-cost implementations (i.e., outsource the implementations to India and China). And the company that uses the solution will then, when newer and marginally faster technology arrives, simply need to buy the latest-generation offering. Which increases the performance - or trawls more data in a shorter amount of time - without having to change any implementation of the expensively developed software. And in practice, this is typically cheaper than developing completely new software with completely new ideas. If that is even an option.
So it's not that most tasks running on a PC aren't CPU-bound. It's just that they're technically designed in such a way that they're only /occasionally/ CPU-bound. And you don't see that as an issue even in real-time applications, because the return time usually is fast enough anyway. And when it isn't, as in the few examples you mention, then we're moving on to special cases where we need "optimization".. right? ;D
I mean, I agree that in the short term, it's possible to speed things up considerably if the transport layer were improved. We'd be able to design UIs with better response, and we'd easily smuggle in some reduction and occlusion-detection algorithms for any 3d contexts. Switching contexts would be faster, you could rely on either core logic or external sources mostly interchangeably, that sort of thing. That would be useful.
But it's not giving you an integrated bus with several elements that have common access to working memory. To, say, design a 3d imaging program that could increase and decrease the complexity of the model from a selection of data - a data store that would be too big to crunch with brute force, but where the selection you'd fetch would still be representative and sufficient for the visual representation at all times. VR, basically. We just can't design something like that on current architecture. And we can't really do it with the tools we have now either, because they're stuck in a paradigm where "when in doubt, increase the constant of the algorithm" is a perfectly valid rule, both practically and in theory as well.. -
I have to say I still use a C2D T9300 in my T61p every day, and yes, it still runs 3-4 virtual machines concurrently very well. The only thing holding it back is that it maxes out at 8GB of RAM, and that limits my VM work. I'd say it's faster than my i5-4300U ultrabook. I still adore that machine; it is just gimped by the max RAM. I also run a decked-out W520, W530, and W540, and the T9300, even in non-dual-IDA mode, is not slow to me. Then again, I can't even tell much of a difference between SATA II and SATA III in everyday use.
I don't play games so I can't compare it in that department. But for VT-x VMware stuff it surprisingly still runs perfectly fine.
It's very comparable to all the current dual-core CPUs. I'm wondering who runs a quad-core C2D in a laptop, and if it's on par with current quad-core CPUs. -
I'm still using Merom (first-gen C2D) as my main PC. The only thing I can't do with it is play 1080p files larger than 8.0GB (I get audio-sync issues after 30 minutes). CPUs have only advanced since then in terms of using less power and having faster RAM on board. I also own a current i5. I can attest that, were the i5 on the same RAM as my C2D, it would have performed only marginally faster...
Intel Core 2 Duo vs dual-core i5 or i7
Discussion in 'Hardware Components and Aftermarket Upgrades' started by The Fire Snake, Dec 31, 2014.