O.K., the test @ 1024x768 passed, but the test @ 1680x1050 "failed" at the 14 min. mark. The interesting thing was that the program didn't crash. The screen froze for about 20 sec, turned blank (white), then the Vista desktop popped up and I got a message saying the graphics driver had stopped responding. After about a minute, the test resumed for about 2 min, and then the "FAIL" message popped up. I guess I'll try again later to see if I can duplicate the problem.
EDIT: I kind of agree with Renee that this could be a thermal issue.
-
Alright, I ran it again using the same settings but with load at 100 (didn't have to change the max refresh setting, as that was the default) for 15 mins and it passed. GPU temp was 71C max.
Running 1680x1050 now.. -
I just ran the test again and this time it crashed after 5 minutes. The crash is just like you describe, white screen, then message that the driver stopped responding. This is the error I usually get in games too.
SOMETIMES I get a blue screen of death crash that says NMI Parity Error.
All 3 of these types of crashes have occurred with the loaner T61P as well as my original T61P, but I have never been able to reproduce ANY of these errors when running with only 1 stick of RAM in the machine. -
About what FPS were you getting?
I get around 70 or so.
I remember reading someone on the thinkpads.com forum who felt very strongly that this was a thermal issue.
He said he was convinced that when running the RAM in dual channel mode, the north bridge runs hotter. The north bridge apparently has no heatsink whatsoever. It isn't a problem when in single channel mode but when running dual channel the north bridge runs hot enough to overheat. Maybe this person was right?
I need to pick up one of those laser thermometers and take some temperature readings on the northbridge chip when it crashes and see how bad they are... -
Hellbore, do you run a temp monitor for CPU and GPU? Please report your temps if you can.
-
Morphy, what's your avg fps? I'm only getting about 48-51 fps. You mentioned earlier that you got as high as 70? We have the same processor and similar RAM (4GB vs 3GB) and HD (7200 rpm). The difference is that you're running a 64-bit OS.
EDIT: Did you do a clean install? I suspect that since mine is factory installed, there is a whole bunch of bloatware running in the background. Right now I'm simply testing my T61p to see if there are any problems before I do a clean install. -
I get about 50+ on 1680x1050..not sure about 1024...test was too boring to watch...should have made a few planes crash into the side of a mountain or something and make one huge fireball
The 70 was the GPU temp in Celsius.
Edit: actually it dips under 50 quite a bit, into the 40s, so you're in the right range. I wouldn't be too concerned. (Typing this on my desktop.) -
I don't have a temperature monitor program, is there one you can recommend that works well on the T61P?
galin, are you talking about FPS on 1024x768x32? At that resolution I get about 70 fps on average. Did you make sure duty cycle was 100% and, when you set the resolution for the test, that you set it on "max refresh rate"?
Just a thought... If this turns out to be a heat related issue, I wonder if Lenovo would allow customers to add a heatsink or if that would risk voiding the warranty... -
I have one Hellbore, but I need to test it on a t61. It measures Cpu and Ambient... not gpu. I suspect your problems are coming from the GPU.
-
At this point I'm more interested in knowing what the north bridge temperature is... that is directly related to the RAM and it makes perfect sense that this chip would get hotter with 2 channels vs. 1.... As far as I know, the north bridge has no heatsink whatsoever...unless I am mistaken... -
Another temp monitor that works well is RivaTuner's. The nice thing about that is you can set it up so that the temps show up in-game, much like Fraps but with temps. It can also display fps. Gophn made a good post on setting up RivaTuner here for temp monitoring. The only caveat is that it doesn't support CPU monitoring, I think anyway. Haven't really looked into it.
Another good monitoring util I've seen mentioned is SpeedFan (free), but I've only used it on my desktop. -
Does anyone know what the round yellow part is in this picture? (ignore the red circle)
http://jerry.cleedo.com/t61guts4.jpg
And, is the north bridge the chip you can see peeking out from under the yellow part?
If so, it shouldn't be hard to check up on the temperature, just would need to lift the keyboard. -
btw the 1680x1050 test passed
-
It may be more complex than that. In terms of air flow .... is the extra memory interrupting?
Or in terms of system throughput at reduced memory.... is the system just not hitting the card as hard with reduced memory? In other words, is the reduced memory slowing things down enough so the GPU doesn't heat up as much and the same would be true for the Northbridge.
All I have to say is thank goodness, I'm not a gamer. -
The yellow part looks an awful lot like a lithium bios battery for NVRAM.
-
As for the NB chip, since it probably doesn't have any cooling on it, I doubt the temps would vary that much from one system to another. If it had a heatsink on it, then there's a possibility it wasn't seated properly during assembly. But since, like you said, it's uncooled, I doubt that's your problem.
-
Large desktop systems go to some lengths to cool Northbridges.
-
Morphy said: "But since like you said it's uncooled I doubt that's your problem."
That seems backwards to me, you're saying since there's nothing to cool the northbridge, you don't think it could be overheating?
I would say, since there is no heatsink on the northbridge, that makes me think it is MORE likely to overheat. -
hehe true that...sometimes they go overboard with it ie. the Gigabyte P35-DQ6. The heatsinks on that look like a mini cityscape.
-
-
but isn't that what Hellbore is saying?
-
Is there any software that will allow you to "underclock" your RAM or to disable dual-channeling on it? I'm glad you guys are slowly making progress toward finding a solution, or at least pinpointing the problem.
-
As far as I know there isn't a way to turn off dual channel mode manually. Not that I know of anyway. -
How about swapping in 2 matched 1gb sticks and allowing them to run in dual channel? See if the same problem pops up?
-
" Maybe dual channel mode is the issue here."
That would be consistent with what I've been suggesting about throughput heating things up. If you can't develop the throughput, you won't tax the gpu enough to really heat it up.
Earlier you had an error in the driver. When hardware is flaky from heat you'll see all kinds of indeterminate results.
Shouldn't we send Lenovo a bill for this? Perhaps they already know it? -
Later tonight I'm going to also try Morphy's ram setup of 1gb + 2gb and see if mine stops crashing.
First I want to get a thermometer though, I want to take temperature readings while doing this test. -
He did give me some interesting information when he wasn't telling me to f*** off and not to be a sh*thead f*g etc.
-
I dont have 2x1gb sticks to test.
But if it's a design flaw in the cooling system like you suggest, then there would be a hell of a lot of ThinkPads ending up at the depot. I just ran 3dm06 back to back to back earlier, and if there was a heat issue it would have manifested itself in shutdowns.
I've had this machine for over a month running Bioshock, Orange Box, GRAW2, UT 3 demo and not ONCE did I get a bsod or freeze.
A problem with dual channel mode is more plausible or probably even a defective part. -
FWIW, my ram setup was 2Gb stick in the topmost slot and the 1Gb at the bottom. Not sure why I did it that way or if it'd made any difference..
-
-
Another possible cause that would lead it to become a heat issue is improper installation. I've seen new ThinkPads that overheat for no apparent reason, but after they're sent back for a planar swap the temps are back to normal. That doesn't mean a design flaw, though. The culprit there is bad QC and/or poor assembly, which can happen.
-
but if you only had one stick of DDR2 ram in there, wouldn't it already be running in dual-channel? maybe it's only when you open up the pipelines with dual-channel AND you have high amounts of memory info passing through?
Has anyone tried to limit the memory the GPU borrows from the RAM? -
What part of the motherboard would only overheat when running 4gb of RAM but not 2gb? Why would that be the case? Wouldn't it only make sense for the north bridge to run hotter with 4gb vs. 2gb? Or do you think the CPU or GPU could run hotter with 4gb vs. 2gb?
I need to take some temperature measurements with 4gb vs. 2gb, we need to know, when running the exact same 3d graphics test:
1. Does the CPU run hotter with 4gb than with 2gb
2. Does the GPU run hotter with 4gb than with 2gb
3. Does the north bridge run hotter with 4gb than with 2gb
Then maybe we can have some clue what is going on... -
-
Not sure why dual channel was brought up in relation to it or why that has much to do with heat. Of course an extra stick will add more heat, but not to the extreme that it would cause BSODs when one stick doesn't. Dual channel is an entirely different matter. This isn't the first time I've seen dual channel mentioned in crashes.
Also, if it was a heat issue it would have crashed on you with that 1 2Gb stick since, like you say, you were pushing it really hard already. -
Remember I'm running 2 sticks too. Unless someone can show me that dual channel mode adds so much heat that instability issues arise, it's more likely not thermal but the BIOS not implementing dual channel properly.
-
Another forum user (I forget his name) was explaining that running dual channel mode causes the north bridge to run A LOT hotter than single channel mode.
Maybe you're right, then again maybe HE is right.
The temperature readings will tell us for sure.
In your case, yes you have 2 sticks of RAM but they are different sizes so obviously they aren't being used in dual channel mode. -
-
and for Lenovo to recognise it as a problem first if indeed thats the problem..
-
Well my understanding is that Lenovo is looking into this now and testing to try and reproduce the problems we have been having, so that is a step in the right direction.
I just thought maybe the north bridge might run hotter in dual channel mode because it would be physically accessing twice as much memory per clock cycle as in single channel mode (even if you have 2 sticks). -
One thing you didn't catch is that I had the same problem WITH 1 GB in bottom and 2 GB in the top.
Now, I had the game on for a few hours just messing around (ie NOT pushing the system) and it didn't give me the BSOD...just hang up every once in a while. When I was raiding (ie pushing the system) I did get the BSOD - frequently.
What was the difference? Temperature! The laptop was very warm to the touch on the bottom during all the BSODs.
Granted, I got the BSOD the next morning when the laptop was cool. -
When a Northbridge chip overheats, it's usually due to excessive voltage, which is often the case when the system is overclocked and voltage is increased to support the higher clocks. That is why heatsinks are there on a typical desktop Northbridge chip: they allow for overclocking headroom. And it's not uncommon on non-enthusiast boards to find no active or even passive cooling on the Northbridge. An extra channel isn't going to have nearly the same impact as an overclocked NB, so the NB has no need for passive cooling if it's not designed to support overclocking. It's the same reason we don't see heat spreaders on SODIMMs, at least not at the speeds they are currently designed for.
Granted, I could be totally wrong, but it's hard to know for sure since NBs don't come equipped with thermal diodes. Instead, the diodes are usually on the more critical components heat-wise, i.e. CPU, GPU, HDD, and there's a reason why: if there's a heat issue, those subsystems are going to be affected first (in non-overclocked systems). -
OK I took some temperature measurements.
These measurements were taken with the keyboard removed.
Interestingly, the 3D graphics test that usually fails with 4 gigs of RAM, passed 3 times in a row with the keyboard removed. That would seem to indicate a heat issue right? By removing the keyboard and exposing the parts, better access to ambient air could provide better cooling right?
However, the temperatures noted didn't seem significantly different with 2 2gb sticks vs. 1 2gb stick. This is just a rudimentary observation, my test was very basic. My gut tells me there isn't a significant difference.
I powered on the machine and ran the test once just to warm things up, then after that I started taking measurements. I took the temperature readings for each component 3 times for each memory configuration, then averaged them. The temperature readings were taken during the 3D graphics test at 5 minutes, 10 minutes, and 15 minutes into the test.
All temperatures are in Celsius. Temperatures were measured using a non-contact laser thermometer. Next I'll try it with the software that measures temperature using the system's built-in temperature measurement capabilities.
Two 2gb sticks of RAM:
CPU: 68
GPU: 70
Northbridge: 72
Ambient temperature: 27
One 2gb stick of RAM:
CPU: 67
GPU: 69
Northbridge: 71
Ambient temperature: 28
So to me, these temperatures don't seem significantly higher with 2 sticks vs. 1.
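Since the ambient temperature differed by a degree between the two runs, one rough way to compare them is to look at each part's rise over ambient rather than the raw reading. The snippet below is only a sketch using the averaged numbers posted above; the function name is made up for illustration, and a handheld laser thermometer probably isn't accurate enough to read much into a 1-2 C gap either way.

```python
# Averaged laser-thermometer readings from the post above, in Celsius.
two_sticks = {"CPU": 68, "GPU": 70, "Northbridge": 72, "Ambient": 27}
one_stick = {"CPU": 67, "GPU": 69, "Northbridge": 71, "Ambient": 28}


def rise_over_ambient(readings):
    """Return each component's reading minus the ambient reading."""
    ambient = readings["Ambient"]
    return {part: temp - ambient
            for part, temp in readings.items() if part != "Ambient"}


rise_two = rise_over_ambient(two_sticks)
rise_one = rise_over_ambient(one_stick)

for part in rise_two:
    diff = rise_two[part] - rise_one[part]
    print(f"{part}: {rise_two[part]} C vs {rise_one[part]} C over ambient "
          f"(difference {diff} C)")
```

Corrected for ambient, the gap between the two configurations is 2 C per part instead of 1 C, which still looks small.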
What do you guys think about these temperatures? Do they fall within safe operating ranges for these parts?
I also measured most of the larger and more prominent chips on the board to see if I could notice a dramatic heat difference in any area of the motherboard but I wasn't able to find any.
It will be interesting to see if the CPU and GPU temperatures vary much with the keyboard on, I won't be able to observe the northbridge temperature though.
One thing interesting I did notice: There appears to be a silicone pad that sits atop the GPU heatsink, whose purpose I assume is to conduct heat from the GPU into the metal bottom of the keyboard in order to help cool the GPU. This pad is shaped to fit the top of the GPU heatsink, but when I removed the keyboard, the pad was shifted to the side so that it was only providing contact to about half of the GPU heatsink top. I re-applied it so that it is lined up with the heatsink so it gets full contact. I wonder if this could make any difference.
If you look at the following picture, you'll see that this silicone pad is also not perfectly aligned on the T61P pictured:
http://jerry.cleedo.com/t61guts2.jpg
The one in the picture isn't off by much though.
I wonder if the alignment of this pad could possibly make any difference, especially since the one in the picture is only off alignment by I dunno, probably less than a centimeter...
I wonder if the pads are supposed to be offset like this? I can't imagine why, it seems like it would be best to align them perfectly so they cover the top of the heatsink and provide maximum contact with the keyboard's metal bottom. I kind of doubt this would make enough difference to cause overheating though. -
With the new C2D processors and the 8XXX series (or equivalent Quadro), 70C is perfectly normal. In fact it's often expected in laptops, believe it or not.
What would concern me is how hot it gets with the keyboard on, I am tempted to agree with you that the Northbridge is what is causing the problems. The CPU, RAM & GPU should be able to handle up to 80~90 C without having any issues, but anything above 90C would be getting on the dangerous side. (Heat problem and lockup wise, I'm pretty sure the newer components are designed to handle a tad bit more heat than that) I know for a fact that Nvidia rates the 8XXX Series able to handle 100 C~ without endangering the components, though I would never dream of letting the temperature get that high.
Do you know if the Northbridge has its own heatsink or fan on the ThinkPad T61P? There is a reason a majority of desktop motherboards have them, they tend to have a low tolerance for excessive temperatures and are much more sensitive than other components. (In my experience)
It definitely doesn't help that Intel Processors tend to strain the Northbridge significantly more than AMD Processors. -
Another thing I messed up about my test was I didn't cover up the northbridge with the little yellow lithium battery. Normally there is a yellow shrink-wrapped lithium battery that sits right atop the north bridge, touching it. It is held in place with tape. Do you think this would make the north bridge get hotter?
The modder in me would love to try installing a heatsink on the north bridge, there is room if you relocate the yellow battery. All you would need is a longer battery wire.
Unfortunately I really can't do this, I'm pretty sure Lenovo would not be OK with it. This is just a loaner laptop, and even if it belonged to me, I'm pretty sure attaching a heatsink to the north bridge would somehow void my warranty. And I paid for 3 years of coverage...
-
Are you kidding me, a lithium battery shrink wrapped on top of the Northbridge?!?!?
Sounds like they decided to tape a spare motherboard battery inside the case. (Not a bad idea, but you'd think they could have put it somewhere else...) It shouldn't hurt anything as long as it's shrink wrapped, though it might cause the temperature to increase. But if it did, it would be marginal at best, 1-3 C~. (Though I wouldn't bet on it)
As for the Northbridge, I'd suggest getting some mini copper heatsinks from Newegg.com or a similar store, I have them on my desktop's Asus Motherboard to cover up several places. (Not the Northbridge though, Asus was smart enough to give it a heatsink with a small mounted fan) They are cheap and should be risk free, but be warned they are applied with thermal adhesive and would void your warranty. (They come with the thermal adhesive already on them)
The MAXIMUM safe operating temperature for a Northbridge hovers around 60-65C. However, anything around 65C is pushing it and could contribute to issues with lockups like you are experiencing. (Especially with more RAM present as the NB has to work harder) -
Never mind.....
-
sounds like a heat issue to me.... i think you're running at the upper end of the accepted thermal ranges - and pushing it 1 or 2 C might be all it's taking to cause the Northbridge to auto-shut down to save itself. there's a threshold hard-wired in that shuts it and the computer down to protect itself.
I think any number of small things are causing this slight increase to over the thermal threshold. Putting in more ram, dual-channeling, etc. etc. etc - that's why the problem comes up at seemingly inconsistent times.
i say relocate that battery and put on some kind of heatsink if you're not worried about the warranty (or do it very cleanly so you could revert it in case you need to).
I'm interested to see if this is a design flaw in.
Here is what I would think should be your next test:
With the keyboard off and a big fan or two (i mean like regular fans) blowing right into the Nb / internals, try to run the most demanding test with 4gb in there. Let it run and run and run and see if it fails.
If you've adequately cooled the nb/internals with the fan (i'm guessing it should drop at least 5 or 6 C) and the tests never fail, then I say you've got a heat issue for sure. Where the heat is coming from / the possible solution could be next. You can then pinpoint the problem and once and for all rule out bios or driver compatibility with 4gb. -
http://www.thinkwiki.org/wiki/Thermal_Sensors
http://www.thinkwiki.org/wiki/Talk:Thermal_Sensors
The only tools that I know of that can read these sensors are either ThinkPad Fan Control:
http://sourceforge.net/projects/tp4xfancontrol/
Or Notebook Hardware Control (requires some configuration files for reading the TP sensors):
http://www.pbus-167.com/
My problem is that none of them support Vista 64!
For a while I wanted to see if the temperature was the problem that causes the shutdowns. I tested with SpeedFan before, but SF only reports the internal temperature of the CPU, GPU and HD, and none of the results show anything outstanding. Less than a week ago I decided to modify the original source code of TP fan control so it would work on Vista 64, and I can gladly report that I got it to work on my system. I replaced the WinIO dependency with WinRing0 for the 64-bit support and added some temperature logging. I must say that I did all that without any C/C++ skills whatsoever, but with my Perl skills and the help of Google.
Now, a couple of days ago I ran some tests to see if it is indeed a temperature problem:
Test1: 1 x 2GB, 10 min without a crash (shutdown).
Test2: 2 x 2GB, bios controlled fan, ended with shutdown after a couple of min.
Test3: 2 x 2GB, fan speed set to max (7), ended with shutdown after about 3min. I tried this test because I noticed that @ max setting the fan runs faster by 200-300 RPM compared to the bios control.
Here are the results:
Test 1
After 10m
========
CPU: 76°C
APS: 56°C
PCM: 35°C
GPU: 74°C
BAT: 50°C
BAT: 34°C
BUS: 38°C
PCI: 60°C
PWR: 60°C
Test 2
Last reading
==========
CPU: 72°C
APS: 51°C
PCM: 33°C
GPU: 70°C
BAT: 50°C
BAT: 31°C
BUS: 36°C
PCI: 55°C
PWR: 56°C
Test 3
Last reading
==========
CPU: 69°C
APS: 50°C
PCM: 33°C
GPU: 68°C
BAT: 50°C
BAT: 32°C
BUS: 36°C
PCI: 54°C
PWR: 54°C
As you can see, almost all sensors reported higher temperature readings when the laptop ran longer without a crash (Test 1). As for the tests that ended with a shutdown, it seemed that they ran longer with the fan control set to max in manual mode.
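To make the comparison between the three tests easier to eyeball, here is a small sketch that lines the readings up per sensor and counts how many read strictly highest in Test 1. The values are copied from the lists above. The BAT(1)/BAT(2) labels are my own way of telling the two BAT rows apart, not something the tool reports.

```python
# Sensor readings copied from the three tests above, in Celsius.
# BAT(1)/BAT(2) is an assumed labelling for the two BAT rows, in listed order.
sensors = ["CPU", "APS", "PCM", "GPU", "BAT(1)", "BAT(2)", "BUS", "PCI", "PWR"]
test1 = [76, 56, 35, 74, 50, 34, 38, 60, 60]  # 1 x 2GB, 10 min, no shutdown
test2 = [72, 51, 33, 70, 50, 31, 36, 55, 56]  # 2 x 2GB, BIOS fan, shutdown
test3 = [69, 50, 33, 68, 50, 32, 36, 54, 54]  # 2 x 2GB, fan at max, shutdown

strictly_higher = []
for name, t1, t2, t3 in zip(sensors, test1, test2, test3):
    # How far Test 1 sits above the hotter of the two crashing runs.
    margin = t1 - max(t2, t3)
    if margin > 0:
        strictly_higher.append(name)
    print(f"{name}: {t1} / {t2} / {t3}  (Test 1 margin: {margin:+d} C)")

print(f"{len(strictly_higher)} of {len(sensors)} sensors read highest in Test 1")
```

Keep in mind that Tests 2 and 3 were cut short by the shutdown, so their last readings come from a much shorter run than the 10-minute reading in Test 1.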
Unfortunately I don’t have the exact location of all the sensors on the T61p, but relying on the T43 design the “BUS” sensor is the closest one to the Northbridge.
You guys have been mentioning the Northbridge as a probable cause of the problem, and you might be heading in the right direction. For a while I have been wondering about the Intel specification page showing that the PM965 supports 4GB @ 533MHz and not 667MHz (Third line: 2 SO-DIMMs / up to 4GB Max System Memory @ 533 MHz), see: http://compare.intel.com/PCC/showchart.aspx?mmID=28116&familyID=7&culture=en-US
On my laptop, the 4GB memory is running @ 667MHz, is the Northbridge overclocked?
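For a sense of scale, the theoretical peak bandwidth at the two speeds can be worked out from the transfer rate and the 64-bit width of a DDR2 channel. This is just back-of-the-envelope arithmetic, assuming the "MHz" figures above are effective transfer rates (DDR2-533 and DDR2-667); the function name is made up, and the numbers say nothing by themselves about how much heat the Northbridge produces.

```python
BUS_WIDTH_BYTES = 8  # one DDR2 channel is 64 bits wide


def peak_bandwidth_mb_s(mega_transfers_per_sec, channels):
    """Theoretical peak bandwidth in MB/s for the given rate and channel count."""
    return mega_transfers_per_sec * BUS_WIDTH_BYTES * channels


for rate in (533, 667):
    for channels in (1, 2):
        mode = "single" if channels == 1 else "dual"
        bandwidth = peak_bandwidth_mb_s(rate, channels)
        print(f"DDR2-{rate}, {mode} channel: {bandwidth} MB/s peak")

# Relative increase from running the memory at 667 instead of 533.
print(f"667 vs 533: {667 / 533:.2f}x")
```

So going from 533 to 667 is roughly a 25% bump in peak throughput, while dual channel doubles the theoretical peak at either speed.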
If the Northbridge is overclocked, then it might explain the shutdowns that I am seeing with 4GB and not with any other memory configuration (1 x 2GB, 2x512MB and 2GB+512MB) that I tried. My problem with that theory is that:
1. The PM965 datasheet doesn’t seem to show the 4GB @ 533MHz limitation, so is there an error on Intel's web site?
2. For some users a planar card replacement fixed their problem. Did it really fix their problem? Is there a newer mobo rev? Better QC installation? What is the memory clock speed?
3. Some users report of having crashes with 3GB (2+1). Hmmm?
I have a problem with some of the crashes that some users might be reporting here. Different crash types might indicate different problems:
1. Software Crash (Application exit) – Would most likely indicate a software problem, such as application errors, driver errors and, rarely, hardware.
2. Shutdown Crash (Instant shutdown) – This is mostly a hardware issue related to either temperature or power problem or even a HW errata.
3. BSOD – This can be both hardware and software issues, and it can lead us to issues with things like bad memory, defective graphic card, bad drivers etc… another thing to keep in mind is that there are a lot of known BSOD issues with Nvidia drivers for Vista!
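If anyone wants to sort the reports in this thread by type, the three categories above boil down to a simple lookup. This is only a sketch of the list as written, with made-up names; it doesn't diagnose anything. Reports that don't fit one of the three categories just come back empty.

```python
# Crash categories and suspects exactly as listed above; nothing new is added.
CRASH_TYPES = {
    "software crash": ["application errors", "driver errors", "hardware (rarely)"],
    "shutdown crash": ["temperature", "power problem", "hardware errata"],
    "bsod": ["bad memory", "defective graphics card", "bad drivers"],
}


def likely_causes(crash_type):
    """Return the suspects listed for a crash type, or [] if it is not listed."""
    return CRASH_TYPES.get(crash_type.strip().lower(), [])


print(likely_causes("Shutdown Crash"))
print(likely_causes("BSOD"))
```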
So, can we consider the BSOD reports as part of the problem? Well, maybe, but IMO it brings too many factors into the equation. This leads me to ask the following:
Did anyone have a shutdown crash with a 3GB (2+1) configuration? -
I think you're confusing the Southbridge for the Northbridge. The Northbridge sits directly under the GPU and is covered by the copper plate and heat pipe.
http://blog.mobile01.com/merlin/article/981
Scroll down to view pictures of the thermal system removed for clarification.
Strange things happening at the depot
Discussion in 'Lenovo' started by Hellbore, Oct 22, 2007.