RAID 6 is like RAID 5, but with two sets of parity instead of one. It can withstand any two drives failing. There are many, many ways to build redundant arrays of disks besides the standard RAID levels, but I think RAID 6 is the most common and easiest one that can withstand any two disks failing.
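For anyone curious about the math, here's a minimal sketch of how the two parities can be computed, assuming the usual scheme (P is plain XOR like RAID 5; Q weights each data byte by powers of a generator in GF(2^8)). The disk count and data bytes here are made up:

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply two bytes in GF(2^8) with the polynomial x^8+x^4+x^3+x^2+1 (0x11D),
 * the field RAID 6 implementations normally use. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

int main(void) {
    /* One byte from each of four hypothetical data disks (values made up). */
    uint8_t d[4] = {0x12, 0x34, 0x56, 0x78};

    uint8_t p = 0, q = 0, g = 1;   /* g steps through powers of the generator 2 */
    for (int i = 0; i < 4; i++) {
        p ^= d[i];                 /* P parity: plain XOR, same as RAID 5       */
        q ^= gf_mul(g, d[i]);      /* Q parity: data weighted by generator powers */
        g = gf_mul(g, 2);
    }
    /* With P and Q stored on two extra disks, any two of the six disks can be
     * lost and the two parity equations solved for the missing data. */
    printf("P = %02x, Q = %02x\n", p, q);
    return 0;
}
```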
RAID 10 can withstand any single disk failing, but if two disks fail, the whole array can be toast. If we are talking about an array of just 4 disks, RAID 10 is fine. Read and write performance is good, and the processing power to run it is minimal. But if you want more storage space and more data security, RAID 10 falls on its face.
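To put a number on that: in a 4-disk RAID 10, only the failure combinations that take out both halves of one mirror kill the array. A quick enumeration (mirror layout assumed in the comments):

```c
#include <stdio.h>

int main(void) {
    /* Assumed layout: disks 0 and 1 are one mirror pair, 2 and 3 the other,
     * with the two pairs striped together (RAID 1+0). */
    int mirror_of[4] = {0, 0, 1, 1};    /* mirror-pair id for each disk */

    int fatal = 0, total = 0;
    for (int a = 0; a < 4; a++) {
        for (int b = a + 1; b < 4; b++) {
            total++;
            /* The array dies only if both copies of one mirror are gone. */
            if (mirror_of[a] == mirror_of[b])
                fatal++;
        }
    }
    printf("%d of %d possible two-disk failures destroy the array\n",
           fatal, total);               /* prints: 2 of 6 */
    return 0;
}
```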
-
Actually, no. It was an Athlon XP 1600. This was the low end of the Thoroughbred(?) line, and if you were lucky, you could clock it up extremely high. It was a very specific processor that had amazing OCs. Not sure how I found that out.. -
It also adds some power draw and extra circuitry to have the functionality there. ..Not entirely sure exactly which components they're duplicating down the pipeline, though. -
Well, you are comparing two different architectures, not counting of course the Xeon from Foster.
But the "added" hardware was there to improve the pipeline and the performance of HT.
The thing is that HT basically uses the resources already available to a given core, emulating the presence of another core. It engages when the core is underutilized or given a task that would benefit from more cores. The FP units are the same and are shared across the pipeline.
Getting programs to use HT is another can of worms: you have to have multithreaded code, and that isn't simple to write. The logic behind it is far more advanced and entails more coding hours, and that's why a lot of software today isn't multithreaded. And sometimes that's not a problem, since the OS is getting better coded; it can juggle more things so that you can multitask without a major compute bottleneck -
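As a rough illustration of the extra work involved, here's a toy parallel sum in C with pthreads (sizes and thread count are arbitrary). The programmer has to split the loop into independent chunks and merge the partial results; the compiler doesn't do any of it for you:

```c
#include <pthread.h>
#include <stdio.h>

#define N       1000000
#define THREADS 4

static long data[N];

struct chunk { int lo, hi; long sum; };

/* Each thread sums its own slice; no shared mutable state, so no locks. */
static void *partial_sum(void *arg) {
    struct chunk *c = arg;
    c->sum = 0;
    for (int i = c->lo; i < c->hi; i++)
        c->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++)
        data[i] = 1;

    pthread_t tid[THREADS];
    struct chunk c[THREADS];
    for (int t = 0; t < THREADS; t++) {
        c[t].lo = t * (N / THREADS);        /* split the work by hand... */
        c[t].hi = (t + 1) * (N / THREADS);
        pthread_create(&tid[t], NULL, partial_sum, &c[t]);
    }

    long total = 0;
    for (int t = 0; t < THREADS; t++) {     /* ...then merge the results by hand */
        pthread_join(tid[t], NULL);
        total += c[t].sum;
    }
    printf("total = %ld\n", total);         /* prints: total = 1000000 */
    return 0;
}
```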
All CPUs have been trying to extract ILP from code to improve IPC, but this is very difficult and inconsistent, which results in some of the execution logic sitting around idle when there isn't enough ILP to go around. Core 2, in fact, has extremely good IPC partly due to its superb ILP extraction.
Pentium 4 wasn't as wide in decoding and was very serial in its execution, so I believe HT didn't result in as big a gain simply because there were fewer idle execution units.
Nehalem took the Core 2 front end and duplicated more downstream decode logic, so it resulted in beastly HT performance. Sandy Bridge then took Nehalem and included a uop cache to cut down on branch prediction penalties, among other improvements, and thus dramatically improved IPC. Ivy Bridge then took Sandy's hardware and included logic to allow the duplicated decode logic to dynamically either serve its own thread or gang together when only one thread is active per core. Haswell supposedly has a beefier execution backend with Ivy's front end. -
^ If people wonder: TLP - thread-level parallelism. ILP - instruction-level parallelism. IPC - instructions per cycle.
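And for a concrete feel of what ILP means: the two loops below compute the same sum, but the second gives the core four independent dependency chains to overlap (a toy C example; compile with optimization low enough that the compiler doesn't vectorize it for you):

```c
#include <stdio.h>

#define N (1 << 20)
static float a[N];

int main(void) {
    for (int i = 0; i < N; i++)
        a[i] = 1.0f;

    /* Serial chain: every add depends on the previous result, so the core
     * can only retire roughly one add per FP-add latency. */
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        s += a[i];

    /* Four independent accumulators: the adds don't depend on each other,
     * so an out-of-order core can keep several in flight at once. */
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    printf("%.0f %.0f\n", s, s0 + s1 + s2 + s3);  /* both print 1048576 */
    return 0;
}
```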
So.. the goal is better instruction-level parallelism on existing processing elements. And adding IPC.. more MHz.. won't do that. Neither does adding cores.. So adding larger caches and an automatic/simple symmetric multiprocessing scheduler opens up some of the potential gain from the increased speed and number of processing elements..?
-
Hyper-Threading took up something like 5% of the die on Nehalem, but in the best-case scenarios it increased performance by 30%. Like almost every other change or tweak in the architecture, it is designed to boost the performance the processor gets out of a given amount of real estate on a silicon wafer.
-
The goal was to keep as much ILP in flight as possible on existing elements; TLP was a means to an end. That way the cores utilize their peak IPC capability more efficiently. The speedup is not the same magnitude as true TLP with a separate core, but it is greater than serial execution.
Designing a CPU is full of tradeoffs.
Adding cores is a particularly power- and space-expensive way to improve TLP performance. IPC improvements are also very difficult to extract, because you have to re-engineer things like branch predictors or even the ALUs/FPUs in the core. ILP depends very much on the code, so it's impossible to have perfect ILP unless you are using the same instruction over and over and your instructions are completely independent of each other (e.g. cryptography).
ILP is tricky to extract; AMD found this out the hard way with their VLIW architecture for GPUs. Graphics workloads are crazy parallel (I think only cryptography is more so), so AMD actually had 5 (later 4) ALUs that could execute in parallel to extract ILP from each thread. Even then, only about 3.4 units were used on average.
You are spot on with the cache and scheduler.
The Core 2 architecture actually would've benefited a lot from HT, but Intel did not see fit to include it, due to lack of competition from AMD.
It was a very wide design, but I suspect a lot of the hardware sat idle because it's tough to have lots of ILP with typical CPU workloads.
It's a delicate balance between front end and execution; this is probably why IPC alone has only crawled along with every refresh, with the most significant speed improvements coming from pairing those gains with the higher clock speeds afforded by better manufacturing. -