The Clock Modulation register 0x19A is pretty easy to understand. Only the lower bits in EAX are used; the upper bits are Reserved by Intel. Bit[0] is also reserved and is not used.
Intel® 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide
14.5.3 Software Controlled Clock Modulation
http://www.intel.com/Assets/PDF/manual/253668.pdf
Bit[4] is the one that controls whether Clock Modulation is enabled. When this bit is set, clock modulation is enabled. Keep that bit at zero and there is no more clock modulation. That's all ThrottleStop does to disable this.
Bit[0] is reserved and not used, and bits [3:1] select the duty cycle. Three bits give 8 encodings, but 000 is reserved, which leaves 7 levels of clock modulation from 12.5% to 87.5% in 12.5% steps. The Intel documentation explains this very well.
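To make the bit layout concrete, here is a minimal Python sketch that decodes a value read from 0x19A (for example with MSR Tool) and clears the enable bit. The function names are made up for illustration, and this is not ThrottleStop's actual code, only the same bit arithmetic described above.

```python
def decode_clock_modulation(eax: int) -> dict:
    """Decode the low bits of MSR 0x19A (hypothetical helper)."""
    enabled = bool(eax & (1 << 4))       # bit 4: clock modulation enable
    level = (eax >> 1) & 0b111           # bits 3:1: duty cycle selector
    # Level 0 is a reserved encoding; levels 1..7 map to 12.5% .. 87.5%.
    duty_percent = level * 12.5 if level else None
    return {"enabled": enabled, "level": level, "duty_percent": duty_percent}


def disable_clock_modulation(eax: int) -> int:
    """Return the value with bit 4 cleared and all other bits untouched."""
    return eax & ~(1 << 4)
```

For instance, a reading of 0x12 decodes as enabled at 12.5%, and clearing bit 4 turns it into 0x02.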
You can use my MSR Tool to have a look at how this register changes. Use ThrottleStop to play around with clock modulation and then use MSR Tool to read what is in register 0x19A.
The register beside that one, 0x199, is very simple for Core i7 CPUs. For the i7-720 the default multiplier is 12, so you write that value plus one, which tells the CPU to use as much turbo boost as is available. Keep this register at 13 for the 720 and 14 for the 820 and there will be no more multiplier throttling.
Fixing things at the BIOS level is a much better approach than the way ThrottleStop tries to fix things after it detects throttling. It's better to stop throttling before it happens than to try to reverse it after the fact.
-
unclewebb to the rescue...
you don't happen to recall anything about other methods it uses to control the multiplier, eg. available power?
unless there's a master register (or different set of registers) that intel's PowerManagement module writes to, our nop replacement won't work. -
just a random question. is that a normal warning in Event Viewer ?
"The speed of processor 0 in group 0 is being limited by system firmware. The processor has been in this reduced performance state for 71 seconds since the last report."
There's one for each processor. -
thanks again thalanix and thanks unclewebb for the assistance as well.
": The embedded controller (EC) did not respond within the specified timeout period. This may indicate that there is an error in the EC hardware or firmware or that the BIOS is accessing the EC incorrectly. You should check with your computer manufacturer for an upgraded BIOS. In some situations, this error may cause the computer to function incorrectly."
and a handful of some similar warnings from the same time periods:
": The embedded controller (EC) returned data when none was requested. The BIOS might be trying to access the EC without synchronizing with the operating system. This data will be ignored. No further action is necessary; however, you should check with your computer manufacturer for an upgraded BIOS."
Hmmm...
Peter -
thalanix: I sort of understand how these CPUs operate at the user level but I don't understand anything beyond that or how the bios controls throttling, etc. The publicly available Intel documentation is full of Reserved registers. I'd give my right arm to get the full docs from Intel.
Usually you only get those warning messages when the CPU is throttling. If you are getting these messages when you're not working your computer very hard then something else must be wrong. -
hopefully you can get that mass-msr-dump implemented, if not, no worries. thanks for all the help.
for now i suggest we look through http://developer.intel.com/Assets/PDF/manual/253669.pdf appendix B.5 for power-related MSRs -
-
I think it may be more likely that the errors may be linked to CPU throttling - and I may even have started getting those EC errors while running tests with ThrottleStop - they only appear in the log during the 5 hours or so that I was testing with it, not before or since. The Processor "reduced performance state" messages have always been there though, since I got my G51Jx.
Peter -
i had those messages as well before disabling performance counters. may or may not be related, i don't think it matters.
now for some potential bad news, disabling ACPI does not stop throttling. i did this via acpi=off boot stanza and made sure lsmod didn't have it loaded. keyboard and backlight controls didn't work so i assume it was off. not sure why i have 2 CPUs listed when groceries only had 1. the results:
full screenshots, same thing as above.
http://i137.photobucket.com/albums/q228/HoldFire/ss28.jpg
http://i137.photobucket.com/albums/q228/HoldFire/ss29.jpg
http://i137.photobucket.com/albums/q228/HoldFire/ss30.jpg
http://i137.photobucket.com/albums/q228/HoldFire/ss31.jpg -
how did u get non-throttled and throttled test scores?
-
non-throttled: running the benchmark normally. throttled: limiting CPU to 7x with cpufreqd. the right three were without ACPI and running the bench alongside phoronix-test-suite unigine-sanctuary.
-
how much cpu is being used when trying to cause throttling by the other app.
-
by pts, the monitor says 100% on one thread. i don't think that would be possible, so it's most likely throttling.
-
I'm trying to understand all these results. Do you have a test with ACPI and running the bench alongside phoronix-test-suite Unigine-Sanctuary to compare to? Ah I see, you did a test without ACPI but forcing the CPU to 7x with cpufreqd to get the Throttled Score, and then compared that to the score when the test was being run concurrently with some other process but (supposedly) still non-throttled?
The results tables don't really make any sense - some runs of Prime95 (especially in the last column) are going faster at higher FFT lengths than at lower ones. If the effect of running with throttling = low score, without throttling = good score and without throttling and with other load = ~throttled score - wouldn't that be a better thing since it indicates we are getting more out of the system when under heavier load?
The tests with Unigine-Sanctuary show 3 cores in use as opposed to the 1 in the lone Prime95 test, which would essentially put Turbo Boost at a low level since it cannot push up all the speed on the one core (something it also couldn't do in the simulated Throttle test since you locked the CPU Freqs at 7x).
As well, we know that the GPU/CPU will operate more slowly as it reaches the actual wattage limits of the 120W adapter - something it seems able to max out with ACPI off quite easily. Could that be where you are seeing this throttling in the later tests? Edit: I don't think this is the case, more likely the option I mentioned in the last paragraph - but I left this here since it is something to keep in mind as we push these systems to their limits.
Sorry, I may be missing something here.
Thanks very much,
Peter -
lower/faster times = better. throttled was with ACPI (else cpufreqd wouldn't work), but the same would work by running prime95 with it to keep consistency.
ideally, the scores should be falling somewhere between the throttled and non-throttled score.
eg 1280k - best time is 29, worst is 73, and in three tests it falls at 72, 72 and 48. sometimes it falls between, sometimes it hits rock bottom. when it hits the lower threshold, it matches the score i get while throttled (2 threads @ 933 < 1 core @ 2.8)
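One way to put a number on "where it falls" is to normalise each time between the non-throttled and throttled times. This is only an illustrative Python sketch; the function name is made up and the figures are the 1280k ones quoted above.

```python
def throttle_position(time_ms: float, best_ms: float, worst_ms: float) -> float:
    """0.0 means the run matched the non-throttled time, 1.0 the throttled time."""
    return (time_ms - best_ms) / (worst_ms - best_ms)


# 1280k example from above: best 29, worst 73, observed 72, 72 and 48
positions = [throttle_position(t, 29, 73) for t in (72, 72, 48)]
```

A value close to 1.0 means the run was effectively fully throttled, which is what the two 72 results show.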
without ACPI, i have one core available with hyperthreading. -
@thalanix, is there a program that you used to prevent throttling, if not how did you get the non-throttled scores?
-
without running pts
if the 3 unigine scores are confusing, those are sequentially (not parallel). -
what's "pts"? Sorry, I was not reading the last 22 pages.
-
phoronix test suite. benchmark utility on linux.
-
one final bump for victory: i don't know who added the last edit, but you are my hero. i love you with all my manly heart.
forget the BIOS editing. the fix is simply to disable thermal monitoring in 0x1FC, by disabling bit 0 (eg. using the MSR tool to write 0x02 into a single core; it propagates on its own to the other three).
initial testing:
furmark + stock nvidia clocks can now run
furmark + stock nvidia clocks + prime95 = shut down the power brick, meaning that we are now at the PSU level. when this happens, you have to unplug/replug it and pray it works (did twice so far).
temps can get extremely high and there is no automatic shutdown; use at your own risk and keep an eye on them whenever possible -
I see - thanks for the clarification. So AllurGroceries was getting the same sorts of results with no ACPI but the Hyperthreaded core did not come up in the stats?
It's interesting looking at those stats. They cover a period of only just over 100 ms if I read it right, and Turbo Boost can take a few seconds to fully cycle up (or get throttled down). Some of the test results get better as the test continues, while others remain low. Since these are all only taking the best time out of multiple runs of a few ms, we can assume that for those that do have good timings, Turbo Boost was in effect for at least part of that run, or nearly all of it, whereas other runs might have been half or entirely throttled.
It's too bad we don't have some times for all those iterations to get data from a longer time window. If there is even a slight pause in UE then it could have a profound impact on those timings - and that might be why some of the #s don't match the trend (768K FFT length. Best time: 43.801 ms, 1280K FFT length. Best time: 48.125 ms, 1536K FFT length. Best time: 41.762 ms). I like how superpi does it, showing us the data for more than a single iteration in most cases.
To confirm, you could get that Non-Throttled score with ACPI on or off and it would be the same in both cases right? As long as nothing else is running concurrently you are not seeing any throttling?
By the way, do you have any Kill-A-Watt meter? Or just AllurGroceries?
Thanks,
Peter -
So this has been "fixed"? What temps are we talking about? And I can test the 150w with furmark+stock clocks+Prime95 to see if I can maintain it.
Can you tell me how to disable the thermal monitor? I am not as versed as some of you haha. If possible, also tell me how to re-enable it later
-
Does seem we are at the limits of the PSU then - can you tell me how you disabled thermal monitoring and I'll run a test and see what my Power Meter readings are giving me. Sounds like this might be a fix to show the potential of the system to show ASUS we need a higher limit/stronger PSU to make use of our laptop fully.
Peter -
http://g51jbsod.wikia.com/wiki/CPU_throttling#fix
* grab the MSR tool at the bottom of the page
* in MSR Number, type in 0x1FC
* in the box under -0003 (EAX), type in 2 and Write MSR
* Read MSR to make sure all cores are set to 0x02 and enjoy
using http://www.fileden.com/files/2008/3/3/1794507/MSR.zip
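For anyone on Linux, the same change can be sketched in Python through the kernel's msr driver (/dev/cpu/N/msr, needs root and the msr module loaded). This is an assumption about an equivalent route, not what the Windows MSR Tool does internally, and the helper names are made up. It does a read-modify-write that clears only bit 0 instead of writing a literal 0x02, so a register reading 0x03 would become 0x02.

```python
import os
import struct

POWER_CTL = 0x1FC  # the MSR discussed in this thread


def clear_bit0(value: int) -> int:
    """Clear bit 0, keep every other bit of the 64-bit value."""
    return value & ~1 & 0xFFFFFFFFFFFFFFFF


def set_bit0(value: int) -> int:
    """Set bit 0 again, to restore the original throttling behaviour."""
    return value | 1


def read_msr(cpu: int, reg: int) -> int:
    # The msr driver uses the file offset as the register number.
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(cpu: int, reg: int, value: int) -> None:
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


def toggle_throttle_bit(cpu: int, enable: bool) -> None:
    """Read-modify-write bit 0 of 0x1FC on one logical CPU."""
    old = read_msr(cpu, POWER_CTL)
    write_msr(cpu, POWER_CTL, set_bit0(old) if enable else clear_bit0(old))
```

Calling toggle_throttle_bit(0, enable=False) corresponds to the steps above, and enable=True puts the bit back.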
the brick turning off can be a little scary the first time, everything goes blank. this was at nvidia clocks with furmark and prime95, i don't suppose it'll happen on stock (hopefully). -
Wow! Great work.
So this would stop the laptop from shutting itself off if it overheats?... is there any way a 3rd party program could be used to do the same thing, or is temperature monitoring completely disabled using this method? -
yep. you can still read your temperatures, but the auto-shutdown and auto-throttling when DTS hits 0 are no longer enabled.
they aren't much higher than just furmark, but they are higher. i don't think we'll be getting to the furmark+prime95 level any time soon, so i wouldn't worry about temperatures too much.
we can change the default bits in PowerManagement and Intelligent Power Sharing, but that's far riskier than just setting it before playing. -
Thermal shut down is not supposed to happen in Intel CPUs until approximately 125C. When the DTS=0, the CPU is only at about 100C.
When the core temperature is between 100C and 125C, it is no longer possible for any software to monitor this because the DTS will keep showing zero.
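A small Python sketch of that relationship, assuming a TjMax of 100C as in the figure above and assuming the DTS value is taken from the digital readout field (bits 22:16) of IA32_THERM_STATUS, MSR 0x19C. The function names are made up for illustration.

```python
def dts_from_therm_status(eax: int) -> int:
    """Extract the digital readout (distance to TjMax) from bits 22:16."""
    return (eax >> 16) & 0x7F


def estimate_core_temp(dts: int, tj_max: int = 100) -> tuple:
    """Return (temperature estimate in C, saturated flag).

    tj_max=100 is an assumption taken from the post above. When dts is 0
    the readout has bottomed out, so the real temperature may be anywhere
    from tj_max up to the shutdown point and software cannot tell.
    """
    return tj_max - dts, dts == 0
```

So a DTS of 29 works out to about 71C, while a DTS of 0 only tells you the core is at 100C or hotter.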
If this truly is a fix then I can write a very simple tool with a single button to disable MSR 0x1FC. -
I will check in a bit.
Thalanix or anyone else, can you guys test with 120w and stock clocks (not the nvidia reference clocks) to see if there is also a shutdown?
I can help with Asus to get 150w in case 120w is not sufficient. But Asus won't accept overclocked results. -
after having the brick shut off 3 times, i'm a little bit of a chicken
the irony if the power brick becomes a brick...
this is breaking asus' throttle scheme, so i doubt they'll hand out 150's anyway.
edit: we already claimed the top of google with MSR 0x1FC, so there has to be a drought of info about it. -
I can confirm with my G51Jx running with these settings, the CPU temps aren't going abnormally high - but I am getting 2.96GHz constant Turbo Boost while running FurMark at ASUS GTS360M stock clocks.
Usually Turbo Boost would cease working when I reached 80*C GPU temps in my previous tests - but now it's still going at 89*C and it's going at 2.96GHz almost 95% of the time (occasionally going down to 2.82 GHz) - much much better than before.
The CPU Core 0 temp never passed 71*C - the only thing that reached the limit was the PSU - it was peaking at 130W constant on my Power Meter (something I never was able to reach before). The system kept at this level, with nothing changing except the GPU temp inching up minute by minute for 5 mins, at which point I shut it off to write this report.
This was just with only Furmark running, GPU at stock, CPU w/Extreme Turbo enabled and with no USB devices plugged in and the screen brightness at 20%.
Peter -
This looks very exciting. I'm going to go develop a quick tool with a single button so it will be easy for any user to toggle this on and off for testing purposes.
-
130W without prime95? if that's true, then these are some mighty PSUs... i was able to handle furmark @ nvidia clocks for more than a few minutes on highest fan state.
-
Whoa, at a very mild GPU overclock (575/1900/1450) and brightness at 100% when I started FurMark I hit 135W within 3 seconds with the fan speeds never having kicked up. It seems there really is no limit now outside of the PSU!
Peter -
-
I just put the screen at full LCD brightness w/ backlight KB on and my USB mouse plugged in - by the time I got to 82*C GPU temp the Power Meter was showing 134W constant draw from the wall. I'm sure I could get a shutdown at stock clocks with this PSU with this setup - and that's not even taking into account having my phone or any other USB device plugged in to recharge, or any joysticks/keyboards, external HDDs, ODD, Camera or any other part of the laptop active.
The G51Jx needs 150W at Stock ASUS clocks.
Peter -
after 15 minutes of just furmark on stock clocks, it peaked at around 99-100C GPU for me, active core at 70. i can see why they put in the throttling.
-
What would be ideal is if we can have an app that we can put in what we feel are our max temps for GPU and CPU and when we reach those temps, have it reset the MSR to what it was originally until the temps get to some other level we decide - and then reactivate the non-throttled state.
That way we can track what temps we feel are OK as well as knowing by our own testing how much power the PSU can supply safely (since some people with cooling pads, better paste, etc... know their temps can get higher than their PSU can allow).
What do you think?
Peter -
not needed. the GPU will still throttle at 105, and the CPU will never reach that high unless the fan is broken (which would be pretty obvious).
-
Peter -
still not worth it imo. power consumption will vary day-to-day depending on what you have plugged in, and that's discounting the possibility of the PSU degrading from consecutive beatings like these.
it'll be shutting down long before it hits temperatures over 90. -
I want to avoid as many system shut downs due to power as I can - I think you do too - without being able to monitor the power input all the time this is one way to avoid that imho - do you know a better one?
Thanks,
Peter -
as nice as it would be, it's not possible in any way i know.
to measure power, is to measure temps. but to get temps to extremely high levels, means getting the power extremely high. power gives before temps. catch 22.
edit: to clarify, stuff like the ODD and camera will be negligible compared to the 50-70W of the GPU. the difference might even be attributed to the power brick preferring colder weather. going simply by GPU clocks isn't enough, especially not on windows7, so that leaves only going by temperatures. unfortunately there are too many variables that can change the "safe" point, so unless the goal is to set 110W, it may as well be right back down with asus' 90-100. -
In my tests, at stock levels, the PSU was maxed out even at ASUS stock clocks with very little extra power drain.
If it were possible to set a limit to re-enable the throttling system when the GPU reached 80*C and one of the various CPU 70*C (where I know between the fan levels, GPU power use, and Turbo Boost I'm at ~130W on my PSU) then I could save myself from a system shut down and still have 5W of USB and other additional power drains factored in.
Then if I get a 150W PSU I would increase that GPU limit to 90*C and the CPU cores likewise to 80*C for example. That's why the app would need to be completely customizable in letting us put in the limits - otherwise we would be back to the ASUS/Dell 90/100/110W type of limits.
If I have a system shut down with my safe levels inserted, then I would lower those safe levels some more.
I know it's not perfect, but it would make me feel safe gaming with this MSR fix you discovered. Without any sort of protection I would feel I'm on pins and needles all the time I have it active, unless we find the absolute max power drain, confirm it is safe, and I have a PSU that can support that.
Thanks,
Peter -
that would give a false sense of security.
if you set the limit too tight, then a harddrive spinning up after it's inactive, plugging in your phone, or loading a map each can go over. these events are unpredictable and unmeasurable.
if you set it too loose, then you will go back into the state that asus has it in but with a lot more cycling. it's at the very limit with furmark as it is. the simple truth is that asus should have a 150W adapter with this notebook and they don't. (it should also come with a better cooling system, but that's a different topic...) -
Peter -
even if we can figure out a formula and method to estimate (within 5W, any more or less and it's for nothing) the actual consumption, event-based cuts like that will have to be made by ACPI. i don't think anyone (except maybe unclewebb) here can write a full driver to replace the one asus provides.
-
It wouldn't be possible to have an app run in the background that can monitor the temps every 5 seconds or so and have a trigger based on what you input that could either bring up a message or auto-throttle you when the temps reach the limits you set? Unclewebb already offered to make a button that can change the MSR at a click - we would be doing that, but adding the extra dimension that it would also change the MSRs when the temps reached those user defined levels.
It seems possible to me, and not too difficult imho...
We could come up with a formula it's true, taking into account all the different Cores combined with the GPU temp and adding in a ceiling and trying to take into account extra user usage (or having been given some modifier by the user based on what extra loads they anticipate) - but just allowing the user to input the temps would be straightforward and give us a starting point imho.
Peter -
the only input we have is clocks, temperatures, and device IDs.
if you go by temperature, then as i said before; the power draw will get there first every time. i can run furmark + prime95 while oc'd, but the temps never go over 87 (when the fan should spin up a notch) because the brick will shut off the instant it does. 5 seconds is nowhere near enough. reading the fan speed is beyond our reach.
if you go by clocks, those are too unstable to provide any meaning. the GPU likes to jump up on moving a window and the CPU is all over the place.
if you go by connected devices, i don't think it's worth the trouble to go through every device you have and fill in its max usage (which is still very variable).
adding everything together assuming the potential max draw will get you the limit asus set. -
We ignore clocks (we are dealing with a case when they are the max that the user defined anyway), we ignore connected devices (we just trust the user to put the temps to a level that takes into account extra devices loaded). Users can use Kill-A-Watt meters to get precise power usage amounts from the wall and if they observe when the adapter shuts down (~137W in my case) then they can observe how low below that they are and determine what safety margin they themselves want to allow.
We are not trying to calculate based on Dev-IDs - that is all left to the user to test and determine themselves.
The system I'm suggesting would be quite simple and you can definitely set a value that is lower than 87 combined with a CPU temp that you observed with that to prevent the shut down from happening before it does, since it is only when both are running at high levels that we see the crash. We could allow people to input different temps over different combination of usage over the different cores - but even that is refining things beyond what I suggested. The GPU max temp works as well since it is well beyond the spikes you get when you move the mouse - and as I said, you could set it so that when the GPU gets to 10*C or so below the cutoff, then you re-activate the MSR de-throttle. You could also wait for the GPU to be at the max temp for 3 seconds or something before doing the MSR throttling - because the fan activity is part of the power drain.
At this moment though, all I am suggesting is a system that when GPU reaches XX and any CPU core is also at YY then de-activate the MSR de-throttle - then we can use that as a basis from which to test and then refine the system as we use it if we find improvements through testing that are worth implementing.
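The rule I have in mind could be sketched like this in Python. It is only the decision logic, with made-up names; reading the temps and actually writing the MSR are left out, and the 10*C re-arm margin is the figure suggested above, not a tested value.

```python
from dataclasses import dataclass


@dataclass
class DethrottleGuard:
    gpu_limit: float          # XX: user-chosen GPU temperature limit
    cpu_limit: float          # YY: user-chosen limit for any CPU core
    gpu_margin: float = 10.0  # re-arm once the GPU is this far below XX
    dethrottled: bool = True  # True while the MSR de-throttle is active

    def update(self, gpu_temp: float, core_temps: list) -> bool:
        """Feed one temperature sample, return the desired de-throttle state."""
        hottest_core = max(core_temps)
        if self.dethrottled:
            # Trip only when the GPU AND at least one core are at their limits.
            if gpu_temp >= self.gpu_limit and hottest_core >= self.cpu_limit:
                self.dethrottled = False
        elif gpu_temp <= self.gpu_limit - self.gpu_margin:
            # Hysteresis: wait for the GPU to cool before re-activating.
            self.dethrottled = True
        return self.dethrottled
```

With limits of 80 and 70, a sample of GPU 82 / core 71 switches the de-throttle off, and it stays off until the GPU is back down to 70 or lower. The gap between the two thresholds is there to avoid rapid on/off cycling.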
Peter -
Toggle1FC - Version 2.0
http://www.sendspace.com/file/07d762
Here's a nice simple program to let users toggle this magic bit on and off.
I might build this into ThrottleStop but I'll wait and see how feedback goes first and how many people blow up their power adapters.
Furmark is a beast but for regular gaming I don't think you will have to worry about your power adapter shutting down.
This new tool only works on Core i CPUs so let me know if it works OK.
[Fixed/Workaround] Asus G51J(x) CPU throttling investigation
Discussion in 'ASUS Gaming Notebook Forum' started by thalanix, Jan 20, 2010.