This thread will summarize the enormous effort and complex steps taken by Intel and manufacturers to get the power savings Haswell will achieve.
(IMPORTANT: Most of the power-saving techniques only apply to Ultrabooks, meaning U-series and Y-series chips.)
A few years ago, information about a future chip called Haswell was released, and one of the claims was that it had REVOLUTIONARY power management.
3 steps taken to save power in Haswell(ULT/Y):
1. New super low CPU and Package C-states, C8/C9/C10, only for (ULT/Y)
2. An Intel-created framework called Power Optimizer to manage interrupts between devices
3. Collaboration with numerous hardware vendors to achieve lower power, and enable low power states
#1 explained: When current CPUs are said to be "idle", it really means the critical blocks are idle. Modern chips like Ivy Bridge and Haswell don't contain only CPU cores. Haswell has the CPU cores, the GPU subsystems, the L3 cache, the memory controller, and the System Agent (Power Control Unit or PCU, I/O connections, router), all connected by the Ring Bus.
In Ivy Bridge, basically only the CPU and GPU can go idle. They'll consume very little power (milliwatts), but the rest of the chip stays on. The reason? Various devices in the computer and its I/O have to wake up the chip once in a while, so part of the chip has to be "ready". All Haswell chips decouple the Ring Bus and L3 cache from the cores, so the cores can be asleep while the GPU uses the ring bus, for example. In Haswell ULT, C8/C9/C10 allow it to turn off everything.
#2 will explain how #1 is done. Contrary to what most people think, the 17W Ivy Bridge CPU still draws 2.2W in its lowest power state, which is C7.
Basically, the software/firmware/OS handles interrupts in bursts, saving them up and servicing them all at once rather than waking the CPU very often for just one device. It's called "Interrupt Coalescing". Every interrupt from every device is handled at the same time if possible. Intel created a range of specifications and hardware called Power Optimizer to achieve this. Every device is required to support "LTR", or Latency Tolerance Reporting. Basically, the device tells the Power Optimizer how much delay it can tolerate, i.e. how long the platform can sleep before the next interrupt has to be serviced. And every device really means every device: touchscreen controllers, keyboard controllers, CPU, GPU, PCH, sensors (GPS, NFC, cameras, etc.), system memory, hard drive, PCI Express, USB 2/3, etc.
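To make the batching idea concrete, here is a minimal Python sketch. It is not how Power Optimizer is implemented: the device names, periods and tolerances are made-up numbers, and `interrupt_windows` and `coalesced_wakeups` are hypothetical helpers. Each device raises an interrupt at a fixed period and reports how long that interrupt may wait. The platform then sleeps until the earliest deadline and services everything pending in one wakeup.

```python
# Hypothetical devices: name -> (interrupt period in ms, tolerated delay in ms).
# These numbers are invented for illustration; they are not Intel data.
DEVICES = {
    "touch": (10, 30),
    "usb": (16, 40),
    "audio": (20, 50),
    "network": (25, 100),
}


def interrupt_windows(devices, horizon_ms):
    """Return (raised_at, must_be_served_by) for every interrupt in the horizon."""
    windows = []
    for period_ms, tolerance_ms in devices.values():
        t = period_ms
        while t <= horizon_ms:
            windows.append((t, t + tolerance_ms))
            t += period_ms
    return windows


def coalesced_wakeups(windows):
    """Greedy batching: sleep until the earliest deadline, then serve all pending work."""
    wake_times = []
    for raised, deadline in sorted(windows, key=lambda w: w[1]):
        # Interrupts raised before the last wakeup were already served by it.
        if not wake_times or raised > wake_times[-1]:
            wake_times.append(deadline)
    return wake_times


windows = interrupt_windows(DEVICES, horizon_ms=1000)
uncoalesced = len({raised for raised, _ in windows})  # one wake per distinct interrupt time
coalesced = len(coalesced_wakeups(windows))
print(f"wakeups without coalescing: {uncoalesced}")
print(f"wakeups with coalescing:    {coalesced}")
```

In this toy model the device with the smallest tolerance sets the sleep length, which is why every device has to report one.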
Even the Operating System, and this is where Windows 8 comes in. Windows 7 woke the CPU on a periodic timer tick to check for work; Windows 8 takes that away and only wakes it when something needs it. Because the time between interrupts is longer, the CPU (and the rest of the devices) can go into deeper power states. This is important because going in and out of C-states actually takes time. Frequent transitions may even cause it to use more power.
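A rough way to see why frequent transitions can cost more than they save is to compare the energy of each state over one idle interval. The Python sketch below assumes a simple model (a fixed energy cost per entry/exit plus a steady idle power), and every number in `STATES` is hypothetical, not a measured Haswell value. `best_state` is an illustrative helper, not an OS or Intel API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    idle_power_w: float       # power drawn while resident in the state
    transition_cost_j: float  # energy spent entering and leaving the state
    exit_latency_s: float     # time needed to wake up again


# Hypothetical values chosen only to show the trade-off.
STATES = [
    CState("C1", 1.0, 0.0, 0.000002),
    CState("C7", 0.3, 0.0005, 0.0001),
    CState("C10", 0.005, 0.005, 0.003),
]


def idle_energy_j(state, idle_s):
    """Energy used by one idle interval spent in the given state."""
    return state.transition_cost_j + state.idle_power_w * idle_s


def best_state(idle_s, latency_tolerance_s):
    """Cheapest state whose wake-up latency the devices can tolerate."""
    allowed = [s for s in STATES if s.exit_latency_s <= latency_tolerance_s]
    return min(allowed, key=lambda s: idle_energy_j(s, idle_s))


for idle_s in (0.0005, 0.005, 0.1):
    print(f"{idle_s * 1000:6.1f} ms idle -> {best_state(idle_s, 0.01).name}")
```

With these made-up numbers, the deepest state only pays off once the idle interval is long enough to amortize its transition cost, and it is ruled out entirely when some device cannot tolerate its wake-up latency.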
#3 Devices get new power states as well. Storage subsystems like SATA SSDs get Runtime D3, which is effectively "off". You have intermediate states like Slumber, which wakes up faster, and uses considerably more power, but still much less than traditional SATA sleep. Again, ALL devices get more, and lower power states. The power delivery system will get better as well with much better efficiencies in the low power region.
A smaller part of the power reduction comes from the integrated voltage regulator, which will make switching between states and frequencies faster, and from a TDP that goes from the current 17W + 3-3.6W PCH to 15W (and from 13W + 3W to 11.5W on Y). There's also Panel Self Refresh (PSR), which allows the display to be refreshed without requiring a signal from the platform. That will save power when the display isn't changing much.
-
"It's called 'Interrupt Coalescing'"
..thanks for posting a very good explanation. But that's it? They're adding latency to the interrupt controller, along with a new idle power state - instead of reducing the "idle" drain? -
Karamazovmm Overthinking? Always!
Given that most of the Intel lineup is made up of U and Y parts (3/4, I think), this is a good measure. I wonder if they will extend that to quads in Broadwell.
-
Good read for me. Maybe some of it is a bit above my comprehension, but you gotta start somewhere. I am certain that with some thought and effort I will understand it in depth.
I am so ready for Haswell to come out. When it gets in the hands of people other than Intel we will likely learn and see if all this theory actually works.
I read a comment today that it offers a 50% battery improvement with no performance hit. I guess it could be true, but I want to see it.
I myself would rather see improved performance with power reduction. Quads in Ultras with 17W/25W TDP is what I would like. Not 10W TDP CPUs.
We will all see shortly. I might be way off. -
-
Nipsen: Sorry if it wasn't clear. I explained it in the later sentences.
It didn't matter that you had all these low power states if the chip woke up so often. Combining the interrupts into one means there's a much greater opportunity to go into the lower power states. Remember, lower power states take LONGER to enter and exit.
Of course, that doesn't mean it lacks lower power states. At the lowest C-state, C10, CPU idle is said to be in the few-mW range, and storage, for example, will go from the current several hundred mW to a few mW as well.
HTWingnut: -
-
tilleroftheearth Wisdom listens quietly...
If I were to guess:
Each device says it needs to be polled a minimum of 'xxx' seconds...
The O/S (Win8 forward...) collects all this information and 'knows' to wake up at a common minimum time slice and then polls each and every device ...
Simple, eh? -
-
tilleroftheearth Wisdom listens quietly...
Yeah, I don't see a contradiction?
Each device tells Windows when to poll, and Windows collects and acts on this information in the smallest timeframe possible (so it could be in 'deep sleep' for as long as possible). -
I said it was "off", not off.
When a CPU core gets power-gated, the CPU core is really off, but it needs dedicated SRAM for storing the state of the core. It probably works in a similar way for other devices as well. -
Can't wait for the English reviews to start to hit from the bigger sites. When do the embargoes break?
-
Karamazovmm Overthinking? Always!
no one knows, some say june 3rd, the others say june 3rd, but we are still in the wait and see for a really confirmed date
-
OTOH web sales are 24/7/365, so Sunday June 2 also won't surprise me. -
Karamazovmm Overthinking? Always!
I know Im that gorgeous and with a wonderful personality, but that was just saying the release date
-
No, I understand what it does, and how that would look fairly good on a run-through where we have long idle periods. As in, idle periods that last longer than at least one of the longest "100ms" interrupts. This would make sense for laptops that are powered up but idle with the screen turned off, while all input devices, filewriters, etc. are inactive.
But in practice, if my assumptions are correct at least, what it really means is this: as long as you're actually running something that requires an interrupt poll shorter than 100ms at the hardware level, the power drain will be identical to the previous platform. (The hardware level is where we're really interested in looking: IO and so on can be slow, but still require updated caches and of course math operations performed continuously, independent of the filestream.) And that really is literally all programs ever written for x86 in the first place.
Let's assume that you can get all input devices and the sound card, hdd and anything with dma to lower the frequency for polling, though. So that you are able to run a burst of cpu activity every 100ms instead of having it active continuously. Given that it is possible to schedule that. With writes to cache, etc., how much is it possible to shave off here?
I'm just saying that that's the kind of examples Intel needs to show us. -
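For a rough sense of the arithmetic behind that question, here is a small Python sketch of average power under a burst-then-idle duty cycle. The 10W burst power and 5ms burst length are made-up assumptions, and `average_power_w` is a hypothetical helper. The 2.2W idle figure is the Ivy Bridge number quoted earlier in the thread, and 5mW stands in for the "few mW" claimed for C10.

```python
def average_power_w(active_w, idle_w, burst_ms, period_ms, wake_cost_mj=0.0):
    """Mean power when the CPU runs one burst per period and idles for the rest."""
    if not 0 <= burst_ms <= period_ms:
        raise ValueError("burst must fit inside the period")
    # Watts times milliseconds gives millijoules.
    energy_mj = active_w * burst_ms + idle_w * (period_ms - burst_ms) + wake_cost_mj
    return energy_mj / period_ms


# Assumed workload: a 5 ms burst at 10 W every 100 ms (illustrative numbers).
shallow = average_power_w(active_w=10, idle_w=2.2, burst_ms=5, period_ms=100)
deep = average_power_w(active_w=10, idle_w=0.005, burst_ms=5, period_ms=100)
print(f"idle at 2.2 W: {shallow:.2f} W average")
print(f"idle at 5 mW:  {deep:.2f} W average")
```

Under these assumptions the idle floor, not the burst, dominates the average. The saving shrinks as the bursts get longer or more frequent, which is the case being asked about.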
An example of what's possible is well demonstrated by the Clover Trail platform versus the previous Atom platforms.
Despite having a similar platform-level TDP, my 5-inch Atom device had a battery life of 7 hours at screen-on idle, using a 24WHr battery. Video and browsing would drop that figure to 5 hours. In comparison, a 10-inch Atom Z2760 tablet I have at home gets 8-9 hours of video playback from a nearly identical 25WHr capacity. The difference is, it can idle at 1.4W! And it can go into Connected Standby, which wakes up the device literally in a blink and uses maybe only 3-4% of the battery in a 12-hour period. -
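Those battery figures can be turned into average power draw with simple division. The Python sketch below uses only the numbers quoted above and assumes the full rated capacity is usable; the helper names are illustrative.

```python
def average_power_w(capacity_wh, runtime_h):
    """Average draw if the whole rated capacity is used over the runtime."""
    return capacity_wh / runtime_h


def standby_power_w(capacity_wh, fraction_used, hours):
    """Average draw when a fraction of the battery is consumed over some hours."""
    return capacity_wh * fraction_used / hours


old_idle = average_power_w(24, 7)    # about 3.4 W
old_video = average_power_w(24, 5)   # 4.8 W
new_video_low = average_power_w(25, 9)   # about 2.8 W
new_video_high = average_power_w(25, 8)  # about 3.1 W
standby_low = standby_power_w(25, 0.03, 12)   # about 0.06 W
standby_high = standby_power_w(25, 0.04, 12)  # about 0.08 W

print(f"older Atom, screen-on idle: {old_idle:.2f} W")
print(f"older Atom, video/browsing: {old_video:.2f} W")
print(f"Z2760 tablet, video: {new_video_low:.2f}-{new_video_high:.2f} W")
print(f"Connected Standby: {standby_low * 1000:.0f}-{standby_high * 1000:.0f} mW")
```

So the tablet plays video on less power than the older device needed just to idle with the screen on, and Connected Standby averages well under a tenth of a watt.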
Right. And that's useful for a tablet or a mobile phone, where we know it's going to sit and idle completely for hours at a time.
When it comes to a windows-computer, or a PC setup in general, then we know it's going to run processes fairly often, and wake up pretty much constantly.. And when the ambient drain is larger than what the actual core activity is most of the time in the first place.. Then it's difficult to see how much benefit we're going to see for a laptop.. just saying..
Just to take an example. I have a tegra3 device. As long as that device runs on the helper core (which it can do with out of order execution for general processing, etc) - then that draws very little power. If I put that on a screen and connect my bt keyboard, I basically have a pc I can write on constantly for a full day. I can add dropbox sync and spotify in the background, and still not start to pull the battery-drain into overdrive.
So, you know.. yes, I can see there's an improvement of some sort here. I suppose I can see where Intel wants to apply this.
What I don't see is where that particular solution is going to make a laptop draw less power while active. Or while sitting with the screen on standby, and the OS being run in "powerconfig" mode, etc.
I mean, if Intel could show any of that specifically, I'd be interested. And that's just me being suspicious, but that they're not being specific doesn't fill me with much confidence that the power-draw improvement over Sandy and Ivy bridge isn't... half invented and half made up, more or less.. -
Nipsen: Yes, you have a point. But that's what most of the power improvements have targeted. Load power hasn't really improved. It's web browsing and video playback, which are bursty workloads, that stand to gain the greatest benefit from C-states and such.
Not to say Haswell won't have load power improvements. Peak TDP goes down by 4-5W, and the integrated voltage regulator allows faster switching, meaning more and more applications that used to peg the CPU at high clocks, because the timeframe between peak and idle was too short, can take advantage of such situations. One piece of Intel research from a couple of years ago was a modified Pentium M laptop that applied fine-grained power management for a 30-50% improvement in battery life in games.
Ideally, nearly all applications fall between 0-100% load, and power should scale exactly with load. But practical limitations (like how you can't instantaneously switch between 0s and 1s, because real square waves have a ramp-up time) prevent that, and loads that are consistently high enough might never take advantage of idle/C-state power advancements.
Also, it's important to consider this: just because an application is under load doesn't mean it loads every single device to the same degree. So if every device in the system really gets those new low power states, you stand to benefit even when you are doing something very demanding, simply because some devices don't need to run that high.
My XPS 12 Ultrabook uses 6.5W idle at minimum display brightness (display-off idle = 5.5W), meaning even if I am just doing word processing, it can't get lower than that. Web browsing gets it to 8-9W, so dropping that by a few watts would help immensely. Of course games and 3D rendering will push that up high, but even Clover Trail and ARM-based devices get a lot less battery life when fully loaded. And since full-load power depends on TDP, Y-series Haswell chips at 11.5W will do even better than the 15W U series. -
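To put a number on "help immensely": for a fixed battery, runtime is inversely proportional to average power, so the gain does not depend on the battery size. The Python sketch below assumes "a few watts" means 2W, which is an illustrative guess and not a measured Haswell saving.

```python
def runtime_gain(before_w, after_w):
    """Fractional increase in battery life when average power drops."""
    if before_w <= 0 or after_w <= 0:
        raise ValueError("power must be positive")
    return before_w / after_w - 1


# Web browsing at 8-9 W today, with an assumed 2 W saving.
for before_w in (8.0, 8.5, 9.0):
    gain = runtime_gain(before_w, before_w - 2.0)
    print(f"{before_w:.1f} W -> {before_w - 2.0:.1f} W: {gain:.0%} longer runtime")
```

Under that assumption, browsing runtime would grow by roughly 29-33%.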
-
To be very clear, the current ULVs are not the ones of old, which, besides having lower clocks, also did less work clock for clock. The current ULVs have the same per-clock computational power as the SV CPUs. They are also currently underrated in clock speed vs SV. Call it Turbo or whatever, the ULV i7 has a 9/11 multiplier over its base, while the SV has a 5/7 multiplier. How you want to look at this comes down, at some level, to whether the glass is half full or half empty. My 3617U is rated at 1.9GHz, but against an SV it would be correct to compare it to a 2.4GHz SV. OK, so an SV is now at what, 2.9GHz/3.0GHz? Faster, I know, but nothing more than a shade of gray if you ask me.
On your comment about quads. I am playing a little bit in a fantasy world. If your comment is more in the real world. And by that I mean current and making quad I agree with you. Yes I would not want a 1GHz quad which I think is sort of what you are saying. Yes I do not want that. I might take a 2GHz quad but sure maybe not possible. I am and was just having a fantasy. -
In the meantime, do we have real scenario tests between Windows 7 and Windows 8 running on Haswell regarding the interrupt handling and resulting power consumption / battery usage?
Is the Linux kernel already aware of it?
Haswell power savings explained
Discussion in 'Hardware Components and Aftermarket Upgrades' started by IntelUser, May 25, 2013.