Haswell power savings explained | NotebookReview

IntelUser Notebook Deity

This thread summarizes the enormous effort and complex steps taken by Intel and manufacturers to achieve Haswell's power savings.

(IMPORTANT!! Most of the power-saving techniques apply only to Ultrabooks, meaning U-series and Y-series chips.)

A few years ago, information about a future chip called Haswell was released, and one of the claims was that it had REVOLUTIONARY power management.

Three steps taken to save power in Haswell (ULT/Y):

1. New ultra-low CPU and package C-states (C8/C9/C10), only on ULT/Y
2. An Intel-created framework called Power Optimizer to coordinate interrupts from devices
3. Collaboration with numerous hardware vendors to lower power and enable low-power states

#1 explained: When current CPUs say "idle", they really mean that only certain critical parts of the chip are idle. Modern chips like Ivy Bridge and Haswell don't contain only CPU cores. Haswell has the CPU cores, the GPU subsystem, the L3 cache, the memory controller and the System Agent (Power Control Unit or PCU, I/O connections, router), all connected by the ring bus.

In Ivy Bridge, basically only the CPU cores and the GPU can go idle. They'll consume very little power (milliwatts), but the rest of the chip stays on. The reason? Various devices and I/O in the computer have to wake up the chip once in a while, so part of the chip has to stay "ready". All Haswell chips decouple the ring bus and L3 cache from the cores, so the cores can stay asleep when, for example, the GPU needs the ring bus. In Haswell ULT, C8/C9/C10 allow it to turn off everything.

#2 explains how #1 is done. Contrary to what most people think, the lowest power state on the 17W Ivy Bridge CPU still draws 2.2W: that is the C7 power state.
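To see why a 2.2 W floor matters so much, average power can be modeled as the residency-weighted sum of per-state power. This is a minimal sketch under simplifying assumptions: each state draws a constant power and entry/exit costs are ignored. Only the 2.2 W C7 figure comes from the post; the C0 and C10 values in `POWER_W` are hypothetical placeholders.

```python
def average_power_w(residency: dict[str, float], power_w: dict[str, float]) -> float:
    """Average power = sum over states of (fraction of time in state) * (state power)."""
    if abs(sum(residency.values()) - 1.0) > 1e-9:
        raise ValueError("residency fractions must sum to 1")
    return sum(frac * power_w[state] for state, frac in residency.items())


# Hypothetical per-state power in watts; only the 2.2 W C7 floor is from the post.
POWER_W = {"C0": 10.0, "C7": 2.2, "C10": 0.005}

# Same 5% active time in both cases; only the idle state differs.
ivy_like = average_power_w({"C0": 0.05, "C7": 0.95}, POWER_W)
haswell_like = average_power_w({"C0": 0.05, "C10": 0.95}, POWER_W)
print(f"idle at C7: {ivy_like:.2f} W average, idle at C10: {haswell_like:.2f} W average")
```

In this toy model the C7 floor accounts for most of the Ivy-like average, which is why pushing the idle floor down matters more than trimming active power for light workloads.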

Basically, you have the software/firmware/OS "bursting" interrupts: it saves them up and handles them all at once, rather than waking the CPU very often for just one device. This is called "Interrupt Coalescing". Every interrupt from every device is handled at the same time if possible. To achieve this, Intel created a range of specifications and hardware called Power Optimizer. Every device is required to support "LTR", or Latency Tolerance Reporting, which basically means it tells Power Optimizer how long it can wait before it must be serviced, i.e. how long the platform can sleep until the next interrupt. Every device really means every device: touchscreen controllers, keyboard controllers, CPU, GPU, PCH, sensors (GPS, NFC, cameras, etc.), system memory, hard drive, PCI Express, USB 2/3, etc.
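A minimal sketch of LTR-style interrupt coalescing, assuming each request carries the time it was raised and a latency tolerance (how long it can wait to be serviced). The greedy rule used here (hold requests and wake once, at the earliest deadline among everything pending) is an illustrative policy chosen for this example, not Intel's actual Power Optimizer logic; `Request` and `coalesce` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    device: str
    arrival_ms: float    # when the device raises its request
    tolerance_ms: float  # how long it reports it can wait to be serviced


def coalesce(requests: list[Request]) -> list[tuple[float, list[str]]]:
    """Batch requests so each CPU wakeup services as many devices as possible.

    A batch is serviced at the earliest deadline (arrival + tolerance) of its
    members, so no request waits longer than it said it could.
    Returns one (wake_time_ms, devices) pair per wakeup.
    """
    wakes: list[tuple[float, list[str]]] = []
    deadline: Optional[float] = None
    batch: list[str] = []
    for r in sorted(requests, key=lambda req: req.arrival_ms):
        if deadline is not None and r.arrival_ms > deadline:
            # The pending batch must be serviced before this request arrives.
            wakes.append((deadline, batch))
            deadline, batch = None, []
        due = r.arrival_ms + r.tolerance_ms
        deadline = due if deadline is None else min(deadline, due)
        batch.append(r.device)
    if deadline is not None:
        wakes.append((deadline, batch))
    return wakes


# Hypothetical request stream: four requests end up needing only two wakeups.
stream = [
    Request("touch", 0, 8),
    Request("usb", 2, 20),
    Request("ssd", 5, 50),
    Request("sensor", 30, 40),
]
for wake_ms, devices in coalesce(stream):
    print(wake_ms, devices)
```

The tighter a device's tolerance, the earlier the whole batch has to be serviced, which is why every device reporting a realistic (not worst-case) tolerance matters.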

Even the operating system takes part, and this is where Windows 8 comes in. Windows 7 periodically polled for interrupts; Windows 8 takes that away and only polls when a device needs it. Because the time between interrupts is longer, the CPU (and the rest of the devices) can go into deeper power states. This is so important because going in and out of the different C-states actually takes time, and frequent transitions may even cause it to use more power.
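The point about transitions can be made concrete with a simple break-even calculation. This is a minimal sketch, assuming a fixed energy cost per enter/exit round trip and ignoring the time spent transitioning: going deeper only pays off when the idle gap is longer than that cost divided by the power saved. The 2.2 W shallow-state figure echoes the post; the deep-state power and transition energy are hypothetical.

```python
def break_even_ms(shallow_w: float, deep_w: float, transition_mj: float) -> float:
    """Idle length above which entering the deeper state saves energy.

    Staying shallow costs shallow_w * t; going deep costs transition_mj + deep_w * t.
    Since 1 W * 1 ms = 1 mJ, the result comes out directly in milliseconds.
    """
    if deep_w >= shallow_w:
        raise ValueError("the deeper state must draw less power")
    return transition_mj / (shallow_w - deep_w)


def worth_going_deep(idle_ms: float, shallow_w: float, deep_w: float,
                     transition_mj: float) -> bool:
    return idle_ms > break_even_ms(shallow_w, deep_w, transition_mj)


# Hypothetical numbers: 2.2 W shallow state, 5 mW deep state, 5 mJ per round trip.
print(round(break_even_ms(2.2, 0.005, 5.0), 2))   # break-even idle length in ms
print(worth_going_deep(1.0, 2.2, 0.005, 5.0))      # short gap: stay shallow
print(worth_going_deep(10.0, 2.2, 0.005, 5.0))     # long gap: go deep
```

Frequent wakeups chop idle time into gaps shorter than the break-even point, which is exactly what coalescing and removing periodic OS interrupts try to avoid.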

#3: Devices get new power states as well. Storage subsystems like SATA SSDs get Runtime D3, which is effectively "off". There are intermediate states like Slumber, which wake up faster and use considerably more power, but still much less than traditional SATA sleep. Again, ALL devices get more, and lower, power states. The power delivery system will improve as well, with much better efficiency in the low-power region.
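One plausible way a device could choose among such states is to take the lowest-power state whose wake-up (exit) latency still fits within the latency tolerance it has reported. The sketch below assumes that policy; it reuses the state names from the post plus a hypothetical "Active idle" state, and every power and latency number is a placeholder, not a datasheet value.

```python
from typing import NamedTuple


class DeviceState(NamedTuple):
    name: str
    power_mw: float
    exit_latency_ms: float


# Hypothetical ladder for a SATA SSD; numbers are placeholders for illustration.
SSD_STATES = [
    DeviceState("Active idle", 500.0, 0.0),
    DeviceState("Slumber", 50.0, 10.0),
    DeviceState("Runtime D3", 5.0, 100.0),
]


def deepest_allowed(states: list[DeviceState], tolerance_ms: float) -> DeviceState:
    """Lowest-power state the device can wake from within its latency tolerance."""
    allowed = [s for s in states if s.exit_latency_ms <= tolerance_ms]
    if not allowed:
        raise ValueError("no state satisfies the latency tolerance")
    return min(allowed, key=lambda s: s.power_mw)


for tolerance in (1.0, 50.0, 500.0):
    print(tolerance, deepest_allowed(SSD_STATES, tolerance).name)
```

The longer the tolerance a device can report, the deeper it can go, which ties the device states back to the LTR mechanism in #2.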

Smaller reductions come from the integrated voltage regulator, which makes switching between states and frequencies faster, and from a TDP that goes from the current 17W CPU + 3-3.6W PCH to 15W (and from 13W + 3W to 11.5W on Y). There's also Panel Self Refresh (PSR), which allows the display to be refreshed without requiring a signal from the platform. That saves display power when the image isn't changing much.
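A quick check of the TDP arithmetic, using only the figures in the post (Ivy Bridge CPU plus a separate PCH versus Haswell's single 15 W / 11.5 W figure). Note these are TDP numbers, so they bound sustained load power rather than idle power.

```python
def tdp_reduction(old_w: float, new_w: float) -> tuple[float, float]:
    """Return (watts saved, fraction of the old TDP saved)."""
    saved = old_w - new_w
    return saved, saved / old_w


cases = [
    ("U, 3 W PCH", 17 + 3.0, 15.0),
    ("U, 3.6 W PCH", 17 + 3.6, 15.0),
    ("Y", 13 + 3.0, 11.5),
]
for label, old_w, new_w in cases:
    saved, frac = tdp_reduction(old_w, new_w)
    print(f"{label}: {old_w:.1f} W -> {new_w:.1f} W, {saved:.1f} W ({frac:.0%}) lower")
```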

IntelUser, May 25, 2013

#1

nipsen Notebook Ditty

"It's called 'Interrupt Coalescing'"

Thanks for posting a very good explanation. But that's it? They're adding latency to the interrupt controller, along with a new idle power state, instead of reducing the "idle" drain?

nipsen, May 25, 2013

#2

Karamazovmm Overthinking? Always!

Given that most of Intel's lineup is made up of U and Y chips (I think 3/4), this is a good measure. I wonder if in Broadwell they will extend that to quads.

Karamazovmm, May 25, 2013

#3

Ultra-Insane Under Medicated

Good read for me. Maybe some of it is a bit above my comprehension, but you gotta start somewhere. I am certain that with some thought and effort I will understand it in depth.

I am so ready for Haswell to come out. When it gets into the hands of people other than Intel, we will likely learn whether all this theory actually works.

I read a comment today that it offers a 50% battery life improvement with no performance hit. I guess it could be true, but I want to see it.

I myself would rather see improved performance with power reduction. Quads in Ultrabooks with 17W/25W TDP are what I would like, not 10W TDP CPUs.

We will all see shortly; I might be way off.

Ultra-Insane, May 25, 2013

#4

HTWingNut Potato

IntelUser said:

> Even the Operating System, and this is where Windows 8 comes in. Windows 7 used to periodically poll for interrupts, Windows 8 takes it away. It only polls it when a device needs it. Because time between interrupts are longer, the CPU(and rest of devices) can go into deeper power states.

So how does it poll only when a device needs it? How can it know if it doesn't poll?

ASUS-UX32VD said:

> I myself would rather see improved performance with power reduction. Quads in Ultra's with 17w/25w TDP is what I would like. Not 10w TDP CPU's.
>
> We will all see shortly I might be way off.

Well, that's it. It's not maximum TDP that matters for battery life; it's low-load and idle power consumption. I'd rather have power on demand with very low idle power consumption.

HTWingNut, May 25, 2013

#5

IntelUser Notebook Deity

Nipsen: Sorry if it wasn't clear. I explained it in the later sentences:

> Because time between interrupts are longer, the CPU(and rest of devices) can go into deeper power states. The reason this is so important, is because going in and out of different power C-states actually takes time.

Coalescing means combining. On previous platforms, each device interrupted the CPU whenever it needed to, without regard to the rest of the platform. So 12 devices might interrupt the CPU 12 times in, say, 100 milliseconds. Interrupt coalescing on the Haswell platform tries to combine several of those interrupts, if not all of them, into one, so you'd have only 1 interrupt in those 100 milliseconds. To do that, all devices need to work together: some may need an interrupt every 60ms, while others can wait 100ms.
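A small simulation of that argument, using the 60 ms and 100 ms intervals from the post. It assumes the simplest possible coalescing policy (wake at the shortest interval and service every device each time, which is fine because servicing a device early never violates its tolerance); the 7 ms phase offset for the uncoordinated case is hypothetical, and this is not a description of how Power Optimizer actually schedules.

```python
def independent_wakes(periods_ms: list[int], window_ms: int, offsets_ms: list[int]) -> list[int]:
    """Distinct wake times when each device interrupts on its own schedule."""
    times = set()
    for period, offset in zip(periods_ms, offsets_ms):
        t = offset + period
        while t <= window_ms:
            times.add(t)
            t += period
    return sorted(times)


def coalesced_wakes(periods_ms: list[int], window_ms: int) -> list[int]:
    """Wake on the shortest period and service every device at each wakeup."""
    step = min(periods_ms)
    return list(range(step, window_ms + 1, step))


periods = [60, 100]  # ms, from the post
window = 600         # ms
print("uncoordinated:", len(independent_wakes(periods, window, [0, 7])))
print("coalesced:    ", len(coalesced_wakes(periods, window)))
```

With more devices on unrelated schedules, the uncoordinated count grows with every device added, while the coalesced count stays pinned to the shortest interval.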

All those low-power states wouldn't matter if the chip woke up that often. Combining interrupts into one gives a much greater opportunity to go into the lower power states. Remember, lower power states take LONGER to enter and exit.

That's not to say it doesn't also have lower power states: in the deepest C-state, C10, CPU idle power is said to be in the range of a few mW, and storage, for example, will go from several hundred mW today to a few mW as well.

HTWingnut:

> So how does it poll only when a device needs it? How can it know if it doesn't poll?

I guess I could have explained it better. Windows 7 would just generate interrupts every few ms. So with Windows 7 you have two kinds of interrupts: device-based interrupts, which occur whenever a device requires service, and OS-based interrupts, which just happen periodically. Windows 8 takes out the periodic OS-based interrupts, so you are left with waking only when devices need it.
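A rough sketch of the difference, counting distinct CPU wakeups over one second. It assumes a 15.625 ms periodic tick (the commonly cited Windows default timer interval; the post only says "every few ms") and a hypothetical device that needs service every 60 ms. A real OS would also fold its own timer work into the remaining wakeups; this sketch only counts them.

```python
from typing import Optional


def count_wakeups(device_events_ms: list[float], window_ms: float,
                  tick_ms: Optional[float] = None) -> int:
    """Count distinct wakeups in (0, window_ms].

    tick_ms set: the OS also wakes on a fixed periodic timer (Windows 7 style, per the post).
    tick_ms None: only device events wake the CPU (Windows 8 style).
    """
    times = {t for t in device_events_ms if 0 < t <= window_ms}
    if tick_ms is not None:
        n = 1
        while n * tick_ms <= window_ms:
            times.add(n * tick_ms)
            n += 1
    return len(times)


events = [float(t) for t in range(60, 1001, 60)]  # hypothetical device, every 60 ms
print("periodic tick:", count_wakeups(events, 1000.0, tick_ms=15.625))
print("tickless:     ", count_wakeups(events, 1000.0))
```

Beyond the raw count, the periodic tick also caps every idle gap at one tick interval, which keeps the CPU from ever reaching a long enough gap to justify its deepest states.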

IntelUser, May 25, 2013

#6

HTWingNut Potato

IntelUser said:

> (...)
>
> I could have explained it better I guess. Windows 7 would just send interrupts every few ms. So with Windows 7 you have two kinds of interrupts. Device-based interrupts that would occur every time it requires it, and OS-based interrupts that just happens. Windows 8 takes out the periodic OS-based interrupts, so you are left with doing it only when devices need it.

Thanks. So you have to make sure the device polls the system itself then? And how does it do that if there's no power to it?

HTWingNut, May 25, 2013

#7

tilleroftheearth Wisdom listens quietly...

If I were to guess:

Each device says it needs to be polled at least every 'xxx' seconds...

The O/S (Win8 forward...) collects all this information and 'knows' to wake up at a common minimum time slice and then polls each and every device ...

Simple, eh?

tilleroftheearth, May 25, 2013

#8

HTWingNut Potato

tilleroftheearth said:

> If I were to guess:
>
> Each device says it needs to be polled a minimum of 'xxx' seconds...
>
> The O/S (Win8 forward...) collects all this information and 'knows' to wake up at a common minimum time slice and then polls each and every device ...
>
> Simple, eh?

Not really, because it says Windows 8 doesn't poll; the device does.

HTWingNut, May 25, 2013

#9

tilleroftheearth Wisdom listens quietly...

Yeah, I don't see a contradiction?

Each device tells Windows when to poll, and Windows collects and acts on this information in the smallest timeframe possible (so it could be in 'deep sleep' for as long as possible).

tilleroftheearth, May 25, 2013

#10

IntelUser Notebook Deity

tilleroftheearth said:

> The O/S (Win8 forward...) collects all this information and 'knows' to wake up at a common minimum time slice and then polls each and every device ...

This depends on the hardware. For example, Ivy Bridge can't do that. Clover Trail has Intel's version, Qualcomm has its own version, etc. For Haswell it's called Power Optimizer. It was stated back in the early days of Windows 8 that hardware vendors could plug in their own power-management framework (somewhere in the OS, integrated, or whatever). I guess this is what they were talking about, and it's an optional feature as well, given that Ivy Bridge doesn't use it.

> Not really because it says Windows 8 doesn't poll, the device does.

It would be wrong to say it won't poll. It just goes from timer-based interrupts to dynamic ones. Whatever the OS needs to do will simply be combined with what the various devices need. And all of that is actually handled by Power Optimizer, which decides what to do based on the devices' reports. So it's both (OS and device), and neither.

As for how a device can do that with no power: I said it was "off" in quotes, not literally off.

When a CPU core gets power-gated, the CPU core is really off, but it needs dedicated SRAM for storing the state of the core. It probably works in a similar way for other devices as well.

IntelUser, May 25, 2013

#11

Mr. Wonderful Notebook Evangelist

Can't wait for the English-language reviews from the bigger sites to start hitting. When do the embargoes lift?

Mr. Wonderful, May 26, 2013

#12

Karamazovmm Overthinking? Always!

No one knows. Some say June 3rd, the others say June 3rd, but we are still in wait-and-see mode for a truly confirmed date.

Karamazovmm, May 26, 2013

#13

OtherSongs Notebook Evangelist

Karamazovmm said:

> no one knows, some say june 3rd, the others say june 3rd,

Cute, but I'm smiling.

Karamazovmm said:

> but we are still in the wait and see for a really confirmed date

As June 3 is a Monday, that makes sense to me.

OTOH web sales are 24/7/365, so Sunday June 2 also won't surprise me.

OtherSongs, May 26, 2013

#14

Karamazovmm Overthinking? Always!

I know I'm that gorgeous and have a wonderful personality, but I was just stating the release date.

Karamazovmm, May 26, 2013

#15

nipsen Notebook Ditty

IntelUser said:

> Nipsen: Sorry if it wasn't clear. I explained it in the later sentences.
> (...)
> Coalescing means to combine it together. Previous platforms had the devices polling the CPU whenever it needs it, without regards to the platform. So 12 devices may interrupt the CPU 12 times in say, 100 milliseconds. Interrupt coalescing on the Haswell platform would try to combine multiple, if not all into one if possible. So you'd have only 1 interrupts in that 100 milliseconds. In order to do that, you need all devices to work together. Some may only need interrupts every 60ms, while others may need 100ms.

No, I understand what it does, and how it would look fairly good in a run-through with long idle periods, i.e. idle periods that last longer than even the longest "100ms" interrupt interval. This would make sense for laptops that are powered up but idle with the screen off, while all input devices, file writers, etc. are inactive.

But in practice, if my assumptions are correct, what it really means is this: as long as you're actually running something on the computer that needs an interrupt poll more often than every 100ms at the hardware level (which is where we really want to look: IO and so on can be slow, but still requires updated caches, and of course math operations are performed continuously, independent of the file stream), the power drain will be identical to the previous platform. And that describes literally every program ever written for x86 in the first place.

Let's assume, though, that you can get all input devices, the sound card, the HDD and anything with DMA to poll less frequently, so that you can run a burst of CPU activity every 100ms instead of being active continuously, given that it's possible to schedule that. With writes to cache, etc., how much can actually be shaved off here?

I'm just saying that's the kind of example Intel needs to show us.

nipsen, May 26, 2013

#16

IntelUser Notebook Deity

A good example of what's possible is the Clover Trail platform versus the previous Atom platforms.

Despite a similar platform-level TDP, my 5-inch Atom device had a battery life of 7 hours at screen-on idle with a 24 Wh battery, and video or browsing dropped that figure to 5 hours. In comparison, a 10-inch Atom Z2760 tablet I have at home gets 8-9 hours of video playback from a nearly identical 25 Wh capacity. The difference is that it can idle at 1.4W! It can also go into Connected Standby, which wakes the device literally in a blink and uses maybe only 3-4% of the battery over a 12-hour period.
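Back-of-envelope arithmetic on those figures: average power is just capacity divided by runtime. This assumes the full rated capacity is usable and ignores conversion losses, so the results are rough platform averages, not measurements.

```python
def average_power_w(capacity_wh: float, runtime_h: float) -> float:
    """Average platform power implied by draining capacity_wh in runtime_h."""
    return capacity_wh / runtime_h


def runtime_h(capacity_wh: float, power_w: float) -> float:
    """Runtime implied by a constant average power draw."""
    return capacity_wh / power_w


old_idle = average_power_w(24, 7)    # 5-inch Atom, screen-on idle
old_video = average_power_w(24, 5)   # same device, video/browsing
z2760_video = (average_power_w(25, 9), average_power_w(25, 8))    # 8-9 h video
standby_mw = tuple(pct * 25 / 12 * 1000 for pct in (0.03, 0.04))  # 3-4% of 25 Wh per 12 h
idle_runtime = runtime_h(25, 1.4)    # if the 1.4 W idle were sustained

print(f"old device: {old_idle:.2f} W idle, {old_video:.2f} W video")
print(f"Z2760: {z2760_video[0]:.2f}-{z2760_video[1]:.2f} W video")
print(f"Connected Standby: {standby_mw[0]:.1f}-{standby_mw[1]:.1f} mW")
print(f"1.4 W idle on 25 Wh: {idle_runtime:.1f} h")
```

The Connected Standby figure works out to well under 0.1 W, more than an order of magnitude below even the 1.4 W idle.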

IntelUser, May 26, 2013

#17

nipsen Notebook Ditty

Right. And that's useful for a tablet or a mobile phone, where we know it's going to sit and idle completely for hours at a time.

When it comes to a Windows computer, or a PC setup in general, we know it's going to run processes fairly often and wake up pretty much constantly. And when the ambient drain is larger than the actual core activity most of the time in the first place, it's difficult to see how much benefit we're going to get on a laptop. Just saying.

To take an example: I have a Tegra 3 device. As long as it runs on the helper core (which it can do, with out-of-order execution for general processing, etc.), it draws very little power. If I hook it up to a screen and connect my BT keyboard, I basically have a PC I can write on all day. I can add Dropbox sync and Spotify in the background and still not push the battery drain into overdrive.

So, you know, yes, I can see there's an improvement of some sort here. I suppose I can see where Intel wants to apply this.

What I don't see is how that particular solution is going to make a laptop draw less power while active, or while sitting with the screen on standby and the OS running in a "powerconfig" mode, etc.

I mean, if Intel could show any of that specifically, I'd be interested. Maybe that's just me being suspicious, but the fact that they're not being specific doesn't fill me with much confidence that the power-draw improvement over Sandy Bridge and Ivy Bridge isn't... half invented and half made up, more or less.

nipsen, May 26, 2013

#18

IntelUser Notebook Deity

Nipsen: Yes, you have a point. But that's what most of the power improvements have targeted. Load power hasn't really improved. It's web browsing and video playback, which are bursty, peaky workloads, that stand to gain the most from C-states and such.

That's not to say Haswell won't improve load power. Peak TDP goes down by 4-5W, and the integrated voltage regulator allows faster switching, so more applications that used to keep the CPU pegged high, because the time between peak and idle was too short, can now take advantage of those gaps. One Intel research project from a couple of years ago was a modified Pentium M laptop that applied fine-grained power management for a 30-50% improvement in battery life in games.

Nearly all applications fall somewhere between 0% and 100% load, and ideally power would scale exactly with load. But practical limitations (you can't switch instantaneously between 0s and 1s, because real square waves have a ramp-up time) prevent that, and loads that are consistently high enough might never take advantage of idle/C-state power advancements.

Also, it's important to consider this: just because an application is at load doesn't mean it loads every device to the same degree. So if every device in the system really gets those new low-power states, you stand to benefit even when doing something very demanding, simply because some devices don't need to run that high.

My XPS 12 Ultrabook uses 6.5W at idle with minimum display brightness (5.5W idle with the display off), meaning even if I'm just doing word processing, it can't get lower than that. Web browsing gets it to 8-9W, so dropping that by a few watts would help immensely. Of course games and 3D rendering will push that up high, but even Clover Trail and ARM-based devices get a lot less battery life when fully loaded. And since full-load power depends on TDP, the 11.5W Y-series Haswell chips will do even better than the 15W U-series.
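Because runtime on a given battery scales inversely with average power, the effect of shaving "a few watts" can be estimated without knowing the battery capacity. A minimal sketch: the 2 W saving is a hypothetical stand-in for "a few watts", and 8.5 W is simply the midpoint of the 8-9W browsing figure.

```python
def runtime_gain(old_w: float, new_w: float) -> float:
    """Runtime multiplier on the same battery when average power drops from old_w to new_w."""
    return old_w / new_w


display_min_w = 6.5 - 5.5  # idle at minimum brightness minus display-off idle (from the post)
browsing_w = 8.5           # midpoint of the 8-9 W web-browsing figure
saving_w = 2.0             # hypothetical "few watts" saving

print(f"display at minimum brightness: ~{display_min_w:.1f} W")
print(f"browsing runtime: x{runtime_gain(browsing_w, browsing_w - saving_w):.2f}")
```

A 2 W saving would bring browsing down to the 6.5 W the machine currently draws doing nothing at all, which shows how much the idle floor dominates light workloads.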

IntelUser, May 26, 2013

#19

Peon Notebook Virtuoso

ASUS-UX32VD said:

> I myself would rather see improved performance with power reduction. Quads in Ultra's with 17w/25w TDP is what I would like. Not 10w TDP CPU's.

But would it even be desirable? I mean, ULV CPUs aren't particularly fast to begin with and getting a quad core down to 17W will inevitably mean a huge clockspeed hit on top of that.

Peon, May 27, 2013

#20

Ultra-Insane Under Medicated

Peon said:

> But would it even be desirable? I mean, ULV CPUs aren't particularly fast to begin with and getting a quad core down to 17W will inevitably mean a huge clockspeed hit on top of that.

Good comment, you have me thinking, but first let me clear away some baggage. You say ULV isn't particularly fast? OK, but just to be sure you understand, as many don't: ULV chips are not particularly fast compared with "quads". To be really correct, what you and everyone else actually mean is that "dual" core CPUs are not particularly fast. ULV chips are only dual core, as you know, but SV dual cores are nothing more than a different shade of gray, versus the more "black and white" difference of the quads.

To make it very clear: current ULVs are not the ULVs of old, which, besides likely having lower clocks, also performed worse clock for clock. Current ULVs have the same per-clock computational power as SV CPUs. They are also currently underrated in clock speed versus SV. Call it Turbo or whatever, but the ULV i7 has a 9/11 multiplier over its base, while the SV has a 5/7 multiplier. At some level, how you look at this is a glass-half-full or half-empty question. My 3617U is rated at 1.9GHz, but compared with SV it would be fair to compare it to a 2.4GHz SV. OK, so SV is now at what, 2.9GHz/3.0GHz? Faster, I know, but nothing more than a shade of gray if you ask me.

On your comment about quads: I am playing a little bit in a fantasy world. Your comment is more grounded in the real world, and by that I mean current technology and making a quad, so I agree with you. No, I would not want a 1GHz quad, which I think is sort of what you are saying. I might take a 2GHz quad, but sure, that may not be possible. I was just having a fantasy.

Ultra-Insane, May 27, 2013

#21

oled Notebook Evangelist

In the meantime, do we have real-world tests comparing Windows 7 and Windows 8 on Haswell with regard to interrupt handling and the resulting power consumption / battery life?

Is the Linux kernel already aware of it?

oled, Jun 21, 2013

#22