Just curious as to how the Core i7s handle their stuff. Take the 820QM for example. 4 physical + 4 virtual cores, so 8 threads total.
When a dual-threaded app comes along, does it use 1 physical + 1 virtual core or 2 physical cores and 0 virtual cores?
When turbo boost turns off certain cores, does it
- Turn off all cores except the one(s) that have a thread being worked on
- Redirect all thread(s) to a specific core and then turn off all other cores?
-
I think that if a multi threaded application came along, it would probably use two physical cores but put the clock speed down. That way it would use less power since power draw increases exponentially with clock speed.
Again, for turboboost I would think that it would try to spread the load evenly between as many cores as possible to reduce power consumption. This way it also reduces heat, producing faster processors as they can be reliably clocked higher without worry of overheating -
1) As far as I'm aware, Nehalem will power down all cores under a certain threshold.
2) @Funky - I believe you have it backwards. Power draw increases more with voltage than with clock speed. Though they tend to go hand in hand since more voltage is often required for higher clock speeds. Note, take a look at undervolting and why it makes such a big difference.
3) The thing about turbo boost is that it works best when only a single core is working. In the papers I studied, having two cores loaded meant that the clock speed gains from TB were smaller than a single core and thus, if you're able to just have one core fully loaded, the application will perform fastest that way (note that this is only for single-threaded applications). -
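To put rough numbers on point 2: the usual first-order model for CMOS dynamic power is P ~ C * V^2 * f, so power scales linearly with clock speed but with the square of voltage. A minimal Python sketch, assuming that model only (it ignores leakage, and the function name and the example scale factors are made up for illustration):

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Both arguments are ratios to the baseline (1.0 = unchanged).
    Leakage power is ignored, so this is only a first-order estimate.
    """
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical example: a 10% undervolt at the same clock.
    print(relative_dynamic_power(0.90, 1.00))
    # Hypothetical example: a 10% overclock that also needs 10% more voltage.
    print(relative_dynamic_power(1.10, 1.10))
```

In this model the 10% undervolt alone cuts dynamic power by about 19%, while the combined voltage and clock bump raises it by about a third.
-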
From a conversation we were having in another thread ( http://forum.notebookreview.com/showthread.php?t=440539), it apparently would depend on your OS. Windows 7 server version would go physical core + logical core, while non server versions of Windows 7 would prefer to go with physical cores.
-
from what I've read, it is VERY dependent on what is actually being processed and how well the OS recognizes HT-processors.
for something 'heavy' like a dual-threaded video encoder(bear with me), it would go straight for two physical cores. it would also use a HT-thread to run OS processes.
however, if the system was in idle, it would run OS processes on a real-core and then move those processes onto a ht-thread when a more CPU-hungry application is run.
basically, it will try to use HT-threads when the OS thinks that the HT-thread has enough power for the process, otherwise, it will go for a real thread. -
thinkpad knows best Notebook Deity
That is a good idea, since virtual "cores" usually perform worse than physical cores.
-
You have 4 physical cores, capable of handling two threads each. The difference is when you refer to "virtual cores".. Using that concept, you could have 4 virtual cores and 4 physical cores maxed out at 100% cpu utilization. That's not how it works.. What you actually have is 4 cores, handling 8 threads.. If the 4 cores are running at 100% cpu utilization, you have 8 threads, and each thread using 50% (or 60/40, 70/30, etc..) of the cpu time.
Similar ideas, but very different implementation.
Also, the application has the final say over how the threads are utilized, not the OS. That's why a single threaded app can make your Windows super-slow and almost unresponsive at 100% utilization. If it was up to the OS, that wouldn't happen, but the OS handles the requests of the app, not vice-versa.
That's why you also read a lot about thread-optimization and "properly threaded" apps.
Edit - I should add, that what I was referring to was default behavior. You can, of course, tell Windows to only use specific cpu's/cores for an application. You, as the user also have final control over the threading, as you can turn HT off in the bios. -
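For the "tell Windows to only use specific cores" part: Windows takes the allowed CPUs as a bitmask, one bit per logical CPU (Task Manager's "Set affinity" dialog and the `start /affinity <hexmask>` command both work this way). A small Python sketch that builds such a mask. The helper name is made up, and the assumption that logical CPUs 0, 2, 4 and 6 sit on four different physical cores is only the typical enumeration, not something guaranteed:

```python
def affinity_mask(logical_cpus):
    """Return a bitmask with one bit set per allowed logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Assumption: even-numbered logical CPUs are one thread per physical core.
    one_per_core = affinity_mask([0, 2, 4, 6])
    # Hex string in the form used by: start /affinity <hexmask> app.exe
    print(format(one_per_core, "X"))
```

With that mask an app would be kept to one hardware thread per core, which is roughly what turning HT off does for that one application only.
-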
Well, the point is that to the OS, it "looks" like 8 cores. It wasn't until windows 7 that windows could even tell the difference between a physical and logical core. In terms of actual engineering, yes, you're right in that it's really only 4 physical cores that can handle up to 2 threads (relatively) simultaneously. We're not quite at the point where we can create processors out of thin air yet (although I'm sure _someone's_ trying!).
And I would clarify that it's not so much that the application has the final say over how the threads are utilized, as that the application defines what's in a thread, and how many threads are presented to the OS. That single threaded app that makes your Windows super-slow and almost unresponsive is an example of an application that presents the OS with only one thread stuffed full of everything that needs to be done, as opposed to a more intelligently programmed application (in this era of multi-core processors) that splits up that thread into multiple threads to share the load. -
-
davepermen Notebook Nobel Laureate
actually, it is simultaneous execution.. if enough free processing units are available on the core.
-
-
Ahh Wikipedia agrees with me
-
Thanks weinter! That's what I thought.. I think for most purposes saying "simultaneous" is OK, but technically it's not, since there's a single point of execution. Good find!
-
Tinderbox (UK) BAKED BEAN KING
Anybody know how to test the i7 turbo mode? The 720 is supposed to be able to run 1 core at 2.8ghz.
EDIT : I was just watching the core frequency with cpu-z and saw core 0 go to just under 2.8ghz for a split second. -
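One simple way to provoke turbo is to load exactly one thread and watch the clocks in CPU-Z while it runs. A minimal Python sketch, assuming a plain busy loop is enough load to trigger it (the function name is an arbitrary choice, and the OS is free to move the thread between cores while it runs):

```python
import time


def burn_one_thread(seconds):
    """Spin in a single-threaded busy loop for roughly `seconds` seconds.

    Returns the number of loop iterations completed.
    """
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations
```

Calling something like `burn_one_thread(30)` gives about half a minute to watch the per-core clocks.
-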
I will try and explain what I know on this.
I have a 720QM and so far this is how it goes.
When stressing ONE core the load will jump between cores... it is rather strange behaviour, but it works so no complaints.
The cores seem to have some priority order: core 0.1 (my naming for the first virtual core) seems to have priority over cores 1.0 and 1.1.
Just checking my task manager, the cores are:
Core 0 is 4%
Core 0.1 is 0%
Core 1 is 0%
Core 1.1 is 0%
Core 2 is 8%
Core 2.1 is 3%
Core 3 is 0%
Core 3.1 is 3% -
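The naming used above can be written down as a tiny mapping. This is only a sketch of the labelling scheme, assuming Task Manager lists the two hardware threads of each physical core next to each other (the function name is invented):

```python
def logical_cpu_labels(physical_cores, threads_per_core=2):
    """Label logical CPUs as 'Core N' / 'Core N.1', siblings adjacent."""
    labels = []
    for core in range(physical_cores):
        for thread in range(threads_per_core):
            # The first hardware thread keeps the plain core name.
            suffix = "" if thread == 0 else f".{thread}"
            labels.append(f"Core {core}{suffix}")
    return labels


if __name__ == "__main__":
    for index, label in enumerate(logical_cpu_labels(4)):
        print(index, label)
```
-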
the execution units are constantly swapping data in and out of the pipeline and the caches. this is context switching between threads.
with HT processors though, there is no need to swap out the caches, the 'new' info it needs is ALREADY in the second set of caches. meaning that memory latency is much shorter.
there's a lot more technical detail than that, but i'm pretty sure that's an acceptable working model. -
-
Well, the theory is that you have a two-lane road. The actual core can take in both lanes of information, and that is why you see two cores. See it this way: each lane you have is a "core" (which in reality is a thread, but let's use the word core since marketing does). Each one of these "cores" can do one thing at a time, so in the case of the 720QM you get a total of eight "cores" working.
The logic behind it is that you use as few physical cores as possible while doing as many things as possible. The physical core has priority when doing a single task. When launching a second task, if the first one has some overhead, it will have priority and lower the other 3+3 unless they are needed. When you launch a third thread, the next physical core will come into play and the virtual core accompanying it gets ready for use, so the first 2 threads on core 0 and 0.1 get a speed cut since they have to share. And so on until you occupy all eight cores.
Now enough theory. In real life it is somewhat different: when stressing one core, that single core is not the only one active. All 8 are active, but the other seven are in a very low power consumption mode. Why? My guess is that one core cannot stay at the full 2.8GHz for too long, since when you check the processor with the task manager or TMonitor or CPU-Z you will see how the TB jumps around, and the processes go from 0 to 1 to 2 to 3 and their respective virtual cores. I have yet to fully test mine, but on regular tasks the HT has priority over the physical when one thread is already running on the physical core.
And that is why I see the loads jump all over the CPU... -
Serg, the reason Windows jumps threads around is because way back in the single core days, it helped improve multi-threading performance.
(Actually the more used term for TMT is SoEMT)
SMT is explicitly meant to increase utilization of otherwise idle execution units. Therefore two threads can process simultaneously.
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18353&lang=eng -
-
Explosivpotato Notebook Consultant
I also understood (as some previous posters already stated), that it allowed simultaneous processing of 2 dissimilar threads on a single core, but only when one thread leaves an execution portion of the core idle that another thread could be using. -
davepermen Notebook Nobel Laureate
the result will be two computations happen in parallel, for two independent threads.
as far as i know, it does NOT interleave if the workloads are completely independent, and don't require some parts of the cpu that only exist once.
that's how i learned hyperthreading: two (or more) schedulers scheduling jobs onto the single core, that has still the same amount of computational units. so it can't do twice the integer, or sse, or floatingpoint work (which a dualcore could). BUT it can do two different things at the same time, putting the cpu to better usage.
of course, the main feature of hyperthreading is to hide memory latencies. whenever one thread waits for some data, the other can use the cpu at its fullest. -
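The latency-hiding point can be illustrated with a toy model. This is only a sketch under loud assumptions, not how a real core is built: the core runs at most one compute step per cycle, memory waits of different threads overlap freely, and the first thread in the list always gets first pick of the core. All names are invented.

```python
def cycles_to_finish(threads):
    """Toy model of latency hiding on one core.

    Each thread is a string of steps: 'C' needs the core for one cycle,
    'M' is one cycle of waiting on memory. Only one 'C' step can run per
    cycle, while any number of threads may wait on memory at once.
    """
    positions = [0] * len(threads)
    cycles = 0
    while any(pos < len(t) for pos, t in zip(positions, threads)):
        core_busy = False
        for i, thread in enumerate(threads):
            if positions[i] >= len(thread):
                continue  # this thread has finished
            if thread[positions[i]] == "M":
                positions[i] += 1  # memory wait progresses regardless
            elif not core_busy:
                core_busy = True  # this thread gets the core this cycle
                positions[i] += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    work = "CMMM" * 2  # one compute step, then three cycles of waiting
    print(cycles_to_finish([work]))        # one thread alone
    print(cycles_to_finish([work, work]))  # two threads sharing the core
```

In this model one thread alone takes 8 cycles, and two identical threads sharing the core finish in 9 cycles instead of the 16 they would need back to back, because each one computes while the other waits.
-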
Well i7 architecture is a share-the-hardware-between-two-cores thing if you want to see it that way.
This means the hardware is one, but the core can do two things.
Launch thread one. Core 0 processes thread one. Launch thread two. Core 1 could process it, but why bother, Core 0 can handle more. Thread 2 goes to core 0, and while thread one is not being processed and is in idle waiting for response from the peripherals or software, the core 0 can get to work on thread 2. This is interpreted as another core by the computer. But it is only one core. So thread 1 has a response, and now thread 2 needs one. So core 0 swaps between thread 1 and thread 2. Thread 1 is being processed while thread 2 is on stand by for response. And so on.
When you launch a thread 3, it cannot go into the stand by mode on core 0, since core 0 is already working on two threads. So thread 3 goes to the next core. Process repeats for thread 4.
If you launch a thread 5 and thread 1 is done, thread 5 will take thread 1's place and work there, so as not to bother cores 2 and 3 since they are not needed. -
Wow, this got lively! Good discussion everyone! I just want to put a couple of things out there (as I understand them).
1 - Windows is an SMT aware OS. That is why with no HT enabled, it can process parallel instructions (since we have multiple physical cores).
2 - Intel takes advantage of Windows being SMT aware for HT, and advertises HT as additional cores, this is why we see 8 CPU's for Quad-core HT. The OS treats these 'virtual' cores as actual cores, it is on the hardware end where the actual 'division of labor' happens.
3 - HT as it applies to a single core, is TMT. A single core cannot process two instructions at once (unless you're using specific cases, like Judicator mentioned, with FPU / integer operations). The processor will switch between threads to execute instructions, but most of the time, it's not actually simultaneous.
4 - If we have two threads (thread 1 and thread 2) assigned to core 0, and core 0 is busy executing thread 1, thread 2 (assuming the application is optimally coded) can be executed by another available (non-busy) core. In that sense, HT can be parallel. However, as it might apply to the P4 (or any single core solution), this isn't possible, as there's only one core.
I think that sums up the main points everyone was trying to make.. agree? Did I miss something, or incorrectly word anything? -
3) I wasn't the one that mentioned specific cases (it was more notyou and davepermen), but at that point I think a lot depends on exactly how many functional processing units are in each "processor core", and what the thread that's attempting to run currently actually needs out of those resources. That, of course, is highly dependent on the actual architecture of said core as well as the thread itself.
4) I think that decision is up to the OS, not the processor. The OS assigns threads to cores, and the cores then run said threads. This was a big part of why Windows 7 was supposed to be so important for i7 with its SMT parking; it'll deliberately try to assign tasks to separate physical cores before using hyperthreading to put 2 threads onto one core. Vista and XP, from what I can tell, can't tell the difference between a physical and logical core, and thus are just as likely to put the 2 most taxing threads on a single physical core, and thus slow things down overall. Note that Windows 7 server apparently goes the other way, and thanks to Core parking, will schedule things on as few (physical) cores as possible, to save power. -
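The two placement behaviours described in point 4 can be sketched as a toy model. This is not Windows' actual scheduler, just an illustration of "spread across physical cores first" versus "pack onto as few cores as possible"; the function name and policy names are invented.

```python
def place_threads(thread_count, physical_cores, policy, threads_per_core=2):
    """Return a (core, hardware_thread) slot for each software thread.

    policy "spread": one thread per physical core before using siblings.
    policy "pack": fill both hardware threads of a core before the next core.
    """
    capacity = physical_cores * threads_per_core
    if thread_count > capacity:
        raise ValueError("more threads than logical CPUs in this toy model")
    if policy == "spread":
        slots = [(core, ht) for ht in range(threads_per_core)
                 for core in range(physical_cores)]
    elif policy == "pack":
        slots = [(core, ht) for core in range(physical_cores)
                 for ht in range(threads_per_core)]
    else:
        raise ValueError("policy must be 'spread' or 'pack'")
    return slots[:thread_count]


if __name__ == "__main__":
    # A dual-threaded app on a quad core with HT.
    print(place_threads(2, 4, "spread"))  # two different physical cores
    print(place_threads(2, 4, "pack"))    # both threads on core 0
```

For the dual-threaded app from the original question, "spread" gives each thread its own physical core, while "pack" puts both on core 0 so the other three cores can stay parked.
-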
Interesting, thanks Judicator. Apologies if I mixed up who said what.
Just to clarify, "Windows 7 Server" = Server 2008, correct? -
davepermen got it right. Each physical core can handle two threads AT ONCE. Each physical core has smaller units, like floating point unit (FPU), integer unit (IU), and so on. If thread 1 requires FPU only and not IU, so that core 1 IU is idle, then core 1 can handle another thread 2 IF thread 2 only requires IU but not FPU. That's the whole concept of SMT, and the reason why some programs will see advantage (usually scientific program) while some don't in using SMT-capable CPU.
I saw a paper somewhere by Intel or a graduate student that analyzed the new SMT in the i7 and how it is now more efficient than the old P4-HT. And in fact, Intel is not the first one to do SMT. IBM and SUN are already talking about 8 logical cores per CPU (or is it 80?), although theirs are different architectures altogether. -
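The FPU/IU example above can be turned into a toy simulation. It is a sketch under heavy assumptions: the core has exactly one unit of each kind, every instruction takes one cycle, and the two threads take turns at first pick. Real cores have several execution ports, and the names here are invented.

```python
def smt_cycles(threads):
    """Cycles for one core to finish all threads in a toy SMT model.

    Each thread is a list of unit names, one per instruction (for example
    "fpu" or "int"). In each cycle every thread may issue its next
    instruction, provided no other thread already took that unit.
    """
    positions = [0] * len(threads)
    cycles = 0
    first = 0  # which thread gets first pick this cycle
    count = len(threads)
    while any(positions[i] < len(threads[i]) for i in range(count)):
        busy_units = set()
        for k in range(count):
            i = (first + k) % count
            if positions[i] >= len(threads[i]):
                continue  # this thread has finished
            unit = threads[i][positions[i]]
            if unit not in busy_units:
                busy_units.add(unit)
                positions[i] += 1
        first = (first + 1) % count
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(smt_cycles([["fpu"] * 4, ["int"] * 4]))  # different units
    print(smt_cycles([["fpu"] * 4, ["fpu"] * 4]))  # same unit
```

In this model an FPU-only thread and an integer-only thread finish together in 4 cycles, while two FPU-only threads need 8, which matches the point that only some workloads gain from SMT.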
-
-
davepermen Notebook Nobel Laureate
no. the os knows which physical and virtual core it assigns the thread to, and prioritizes according to workload.
-
-
-
AFAIK the OS assigns the threads to the cores.
A little off topic: I ran a small test using CATIA V5 R19 on my 720QM. The 4 physical cores got a small load shared between them, and the 4 virtual cores saw little to no action, or very little compared to their physical brothers.
Same when testing Microsoft ISE. I have yet to install CS4 on my laptop and test it. -
Core i7 logic
Discussion in 'Hardware Components and Aftermarket Upgrades' started by fred2028, Dec 14, 2009.