OK, it's officially a problem.
Since I'm running my PC 24/7, I get this problem about once a day.
There is a 24/7 load on the GPUs, since I play EVE day and night.
The problem is this: all the graphics go to hell and the screen goes black, then recovers, then goes black again, and in between the recoveries I see the balloon message saying that the driver has stopped responding and has recovered.
The only difference is that it hasn't actually recovered: I have to force a shutdown with the power button and restart before it's good to go again.
The problem persists both with and without SLI turned on.
Does this mean the graphics cards have a hardware fault? Or does it mean there is an issue with the driver?
I must point out that I had this issue with older drivers as well.
I'm at work now. When I get home, I will give specific details: the driver version, plus pictures of the laptop layout.
Also, is there some sort of utility that records GPU temps and graphs them over the whole day? It would be interesting to see such a graph.
Perhaps there is a:
1) Heat issue with the GPUs (I will open the laptop when I get home to check for dust)
2) Hardware fault with the GPUs (in which case brand new GPUs would fix the problem)
3) Software issue of some sort.
What are your recommendations?
-
Driver version 280.26
Photos:
Is that not enough room for cooling?
Is there any way to check whether my GPU's video memory is OK, I mean, that there are no corrupted bits? -
J.P.@XoticPC Company Representative
The Folding@home project offers a handy program to check for "soft errors" in NVIDIA GPU memory: Folding@home: NVIDIA GPU memory checker
Other troubleshooting steps include grabbing a hardware temperature monitoring utility like HWMonitor and stress testing with FurMark to see whether the problem is temperature related, or grabbing the most recent drivers from your reseller or manufacturer.
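For the all-day temperature graph asked about earlier in the thread, here is a minimal sketch of how a logged temperature file could be summarized. It assumes the monitoring tool can export a CSV with a `time` column and one column per GPU; the column names, the sample values, and the 90 degree limit are all made up for illustration and are not the real export format of any particular tool.

```python
import csv
import io

def summarize_temps(csv_text, limit_c=90.0):
    """Return {column: (peak_temp, samples_over_limit)} for each GPU column.

    Assumes a hypothetical CSV layout: a 'time' column plus one
    temperature column (degrees C) per GPU.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    stats = {}
    for row in reader:
        for name, value in row.items():
            if name == "time":
                continue
            temp = float(value)
            peak, over = stats.get(name, (temp, 0))
            # Track the highest reading and count samples above the limit.
            stats[name] = (max(peak, temp), over + int(temp > limit_c))
    return stats

# Invented sample data, only to show the expected shape of the log.
sample_log = """time,gpu0,gpu1
10:00,62,58
10:05,84,78
10:10,91,80
"""

for gpu, (peak, over) in summarize_temps(sample_log).items():
    print(f"{gpu}: peak {peak:.0f} C, {over} sample(s) over the limit")
```

A day-long log summarized like this would show quickly whether the crashes line up with temperature peaks or happen at ordinary temperatures.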
That does look like more than enough space for cooling though. Have you checked the vents to make sure they're not clogged with dust at all? -
Do you have the back elevated at all? I had to elevate my notebook to get it to cool properly. The GPU makes wicked heat. Sometimes issues like that can be hard to diagnose. If you have a spare HD kicking around, maybe do a clean install of Windows and see if the problem persists. It's a lot of work, but at least you don't have to send the computer away to have it tested.
-
Click on your Action Center flag in the Taskbar and look up your errors. Are you seeing any that contain Locale ID: 1033 and BCCode: 117?
-
-
I have run FurMark for 10 minutes at most, the dual core exe; max temps were 84 and 78. No problem.
Stripped the laptop, removed what little dust it had, and ran it again for 10 minutes, temps are
I ran memtestG80 as well. The bad thing is that it only tests 128 or 256 MB of the memory, and I can't test all 1536 MB.
Is there a chance that the error happens less frequently because the GPU has so much memory? More memory means a program will run into the bad memory block less often...
I've seen that memtestG80 is a DOS-like executable. Would it be possible to run it without loading Windows so that I could test the whole 1536 MB of each GPU?
Damn, these video cards should really come with diagnostic software, like the ones in Star Trek:
"Running level 5 diagnostic on fusion coils" etc..
For now, everything points to a driver incompatibility somewhere.
Something similar happened to me before.
I had internet from a 3G stick and wanted to share it through the PC's WiFi card using software, thus creating a hotspot.
Every time I did that, the PC got a BSOD. Some sort of incompatibility. I tried sharing it with different methods and different software. Same BSOD. Thus a software incompatibility.
Perhaps later on I will upgrade to CrossFire 28nm Radeon 7000. We'll see whether their drivers cause any more problems.
I had, and still have, a Packard Bell with a GeForce 240M. The same thing happens: driver has stopped responding and has recovered. Another NVIDIA GPU.
Am I that lucky? -
I have better pinpointed the cause of the so-called freezes.
Sometimes I get a BSOD, and in that BSOD I can see the faulting module:
nvlddmkm.sys
The same problem I had with the GeForce 240M in the Packard Bell laptop.
A quick Google search shows that this error has been plaguing NVIDIA-equipped PCs since the beginning of time. It seems there is no way to fix it unless you change manufacturer.
I was hoping the (then) latest DX11 GPUs from NVIDIA no longer suffered from this problem. And come to think of it, dual 580Ms cost 800 euros more than dual 6990s, and with the higher price tag you also buy the chance of having this error plague your system.
Way to go NVIDIA, I will never ever buy NVIDIA cards again in my life.
Looks like this 460M SLI is the last time I will ever use NVIDIA.
Does anyone here have any idea whatsoever what the hell this BSOD and freeze are about?
Next update:
Radeon 7000 series CrossFire. -
Download WhoCrashed and post the results here, as it gives one of the best dump reports.
-
Here it is:
On Sun 10/23/2011 10:20:57 GMT your computer crashed
crash dump file: C:\Windows\Minidump\102311-11434-01.dmp
This was probably caused by the following module: nvlddmkm.sys (nvlddmkm+0x170CD8)
Bugcheck code: 0x116 (0xFFFFFA800F65E4E0, 0xFFFFF8800F3F2CD8, 0x0, 0xD)
Error: VIDEO_TDR_ERROR
file path: C:\Windows\system32\drivers\nvlddmkm.sys
product: NVIDIA Windows Kernel Mode Driver, Version 280.26
company: NVIDIA Corporation
description: NVIDIA Windows Kernel Mode Driver, Version 280.26
Bug check description: This indicates that an attempt to reset the display driver and recover from a timeout failed.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: nvlddmkm.sys (NVIDIA Windows Kernel Mode Driver, Version 280.26 , NVIDIA Corporation).
Google query: nvlddmkm.sys NVIDIA Corporation VIDEO_TDR_ERROR -
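As a rough illustration of how the report above can be read, here is a minimal sketch that pulls the bug check code and its four parameters out of the "Bugcheck code" line. The two code names in the table are the ones Microsoft documents for 0x116 and 0x117 (WhoCrashed labels 0x116 as VIDEO_TDR_ERROR); the helper name `parse_bugcheck` is made up for this example.

```python
import re

# Microsoft's documented names for the two TDR-related bug checks.
TDR_BUGCHECKS = {
    0x116: "VIDEO_TDR_FAILURE",           # reset after a timeout failed
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",  # driver did not respond in time
}

def parse_bugcheck(line):
    """Return (code, [params]) from a 'Bugcheck code: 0x.. (..)' line, or None."""
    match = re.search(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params

# The line exactly as it appears in the report.
line = "Bugcheck code: 0x116 (0xFFFFFA800F65E4E0, 0xFFFFF8800F3F2CD8, 0x0, 0xD)"
code, params = parse_bugcheck(line)
print(hex(code), TDR_BUGCHECKS.get(code, "not a TDR bug check"))
print([hex(p) for p in params])
```

The BCCode 117 asked about earlier in the thread is the milder sibling of this one: the timeout was detected, but the reset did not fail.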
Has anyone else with a Clevo laptop equipped with an NVIDIA card had this nvlddmkm.sys error before?
-
Kingpinzero ROUND ONE,FIGHT! You Win!
That's commonly called a TDR issue, and the drivers you're using are known to have this problem on some setups.
NVIDIA recently fixed the TDR issue observed since the 280.xx drivers in their releases starting from 285.62 WHQL.
Newer betas are out (285.79); you may want to see if they fix your problem.
Before upgrading to newer drivers, be sure you uninstall the old ones properly. Use the guide linked in my signature for that.
Also, if you have any sort of overclock, revert to stock settings before testing the new drivers.
If you did all the steps correctly, you can exclude a software-related problem and move on to hardware troubleshooting.
Hope it helps. -
What does TDR stand for?
I now have 285.62, and if I remember correctly, I may have had this problem with these drivers too, but possibly not; I cannot say for sure.
I have never overclocked the GPUs.
Maybe, just maybe, it's a heat-related issue, since I have stripped them a few times by now and the GPU paste has worn off. Maybe I'm better off buying new thermal paste and repasting the GPUs and the CPU.
Also, what can I do if a fan simply breaks down? Can I find replacement parts?
Or do you get the fans too when you buy a new GPU kit? -
Kingpinzero ROUND ONE,FIGHT! You Win!
Temperatures are the first thing to check. But if your temps in FurMark are below 90°C you're fine IMHO.
If a fan breaks, you need to ask a spare parts reseller or look on eBay for a replacement; the latter usually has a lot of them.
A new GPU kit will come with only the GPUs themselves and nothing more, I'm afraid. -
Kingpinzero ROUND ONE,FIGHT! You Win!
Anyway, replacing the thermal compound (TC) is always a good thing; a repaste usually improves temps a lot.
About TDR: it's the acronym for Timeout Detection and Recovery.
Basically, when the card soft crashes, Windows tries to recover it at stock clocks without rebooting the machine.
With NVIDIA 280.xx the TDR timing on the driver side was a bit bugged and messed up, which led to strange TDRs out of nowhere for no reason.
With 285.62/79 they reverted to the old 275.xx version, which is stable compared to the newer iteration.
That should teach them the saying "if it ain't broke, don't fix it".
But also "fix it yourself or die trying" -
Thanks for the info in your signature regarding Driver Sweeper. I never knew you had to go so deep (safe mode) to remove old driver files.
Although I knew of this trick, I never thought I needed to do it, especially since I saw the clean install option in the newer driver.
But now I know.
Hopefully I won't have any more problems.
Another question: why would the card soft crash (does "soft" mean software)?
Is it because of a hardware fault, or temperature, or an unstable frequency, or a software incompatibility, or what? -
Kingpinzero ROUND ONE,FIGHT! You Win!
That's why I told you to clean install the new drivers and to test the cards at stock speeds, with no overclock.
If the laptop "survives" these tests, it's probably not a hardware-related problem.
As for the TDR issue and drivers, like I reported, it was partially NVIDIA's fault. When clocks or voltages fluctuate beyond the point where they can be kept under control, or when data gets corrupted on the GPU end, the driver crashes.
In this small span of time, Windows tries to query the GPU and the driver: when it doesn't receive any response, it invokes a TDR, thus resetting both GPU and driver back to stock settings.
What NVIDIA messed up was that their algorithm basically created a driver crash out of nowhere, even when doing basic tasks such as opening an Explorer window or browsing.
Even a small increase in clocks could trigger a TDR. In short, their new implementation was "extremely" sensitive, which led to TDRs out of the blue, even on newer hardware or peripherals known to be working.
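The detection-and-reset sequence described above can be pictured with a toy model. This is only a sketch of the idea, not how Windows or the NVIDIA driver implements it: the function name and the return strings are invented, and the 2 second default is the documented Windows TdrDelay default, used here as an assumption about this machine's configuration.

```python
def tdr_outcome(gpu_response_s, tdr_delay_s=2.0, reset_succeeds=True):
    """Toy model of Timeout Detection and Recovery.

    gpu_response_s: how long the GPU takes to answer the OS (seconds).
    tdr_delay_s:    how long the OS waits before declaring a timeout
                    (assumed 2 s, the documented Windows default).
    reset_succeeds: whether resetting the driver and GPU works.
    """
    if gpu_response_s <= tdr_delay_s:
        return "ok"                # GPU answered in time, nothing happens
    if reset_succeeds:
        return "recovered"         # "driver stopped responding and has recovered"
    return "bugcheck 0x116"        # reset failed: blue screen

print(tdr_outcome(0.5))                        # normal frame
print(tdr_outcome(5.0))                        # hang, then successful reset
print(tdr_outcome(5.0, reset_succeeds=False))  # hang, reset fails
```

In this picture, an over-sensitive driver corresponds to the first check failing for workloads that should have answered in time, and the BSOD posted earlier in the thread corresponds to the last branch.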
I've experienced them a lot with my previous laptop, which curiously had a single GTX 460M. But I experienced them only when overclocking.
Basically I wasn't able to reach certain clock speeds considered "standard" for that card; I had to stick with a fairly weak overclock.
Driver after driver, I came across their official statement after carefully watching a few forums, especially guru3d, where people with desktop GTX 4xx cards had these TDRs for no reason and with NO overclock.
Later they stated the issue could be fixed by reverting their system to the likes of the 275.xx driver branch. And they fixed it starting from 285.62 WHQL, AFAIK.
But as for the drivers, an incorrect installation messed up by old files and registry entries could lead to instability of all sorts.
So before concluding that the hardware could be the problem, try the steps I've advised.
Also keep an eye on your temps... remember, it doesn't matter how many times you dismantle the laptop heatsinks and fans, EACH time you need to reapply the thermal paste PROPERLY.
You can't just reseat everything and hope it works. It might, but it could also burn everything. -
Each time I opened it up, coolers and all, I never bothered to reapply thermal paste, because I had none at my disposal. Basically, the paste is the same as when the laptop was new, and it has been through multiple disassemblies, 4 or 5 I think.
I did send the laptop back to the seller because of an LCD cable issue I could not pinpoint myself; perhaps they reapplied thermal paste, but I don't know.
Now the question arises:
Do I have to reapply thermal paste each time I take the coolers and the cards out?
What happens if I don't?
Does that mean I need to keep a stock of thermal paste at home?
It seems to me the best thing to do now is to reapply thermal paste and then stop taking them out for no reason.
The reason I disassemble them is that they gather dust and I want to clean it out.
The way you describe the TDR issue, it seems to me this is an NVIDIA mess-up.
That's why the next upgrade is going to be some high-end 100W 28nm Radeon part, times 2.
I hope the Radeons don't suffer from these out-of-nowhere TDRs.
Basically this bull of a story will convince me to switch vendors.
Not to mention, I ALSO have this problem on a Packard Bell laptop with a GeForce 240M. I need to keep it at lower than normal frequencies if I want to play anything on it, and even so, the infamous nvlddmkm.sys error pops up.
I sure hope the RADEONS don't have this problem.
It's bye-bye NVIDIA from my side. -
Kingpinzero ROUND ONE,FIGHT! You Win!
With a syringe of TC like ICD24 or MX-4 you're good for 100 applications, if not more; you need a small pea-sized amount on the center of the core, nothing more.
As for what happens when you don't repaste each time you disassemble the heatsinks and fans, well, it's not hard to understand.
The TC tends to dry up when removed, not to mention that it has likely spread around the surface, leaving some parts not covered by the grease.
Those parts don't fill up when you reseat the heatsink, and since air will probably get trapped in there, it will further dry out the TC.
When a heatsink without TC touches the core, which is also without TC, bad things happen. Considering the high temperatures they reach, the core can overheat, and at some point something will break.
It's unfortunate to read about your problems with NVIDIA, but I hope your habit of constantly disassembling fans and heatsinks doesn't apply to all your laptops.
Honestly, it's not a good thing. If you switch to AMD and keep doing this, your new GPU probably won't last any longer either, just saying.
Fan cleaning should be done once a year IMHO, if you take good care of your laptop. Repasting maybe once a year, or once every two years. In most Clevo systems you can remove the fans without touching the heatsink; I don't know how it works with your system, though. -
Thanks for all the information you gave me.
I also want to mention that I encountered several other BSODs; however, since I repasted they seem to be less frequent.
The thing is, attached to my laptop are:
1) A no-name powered USB 2.0 hub plugged into a USB 3.0 port, with 4 ports holding:
1a) Logitech G13
1b) Logitech Z Cinema
1c)+1d) Asus T500 3G modem, which uses a SIM to give me internet.
2) Logitech MX 1100 in the other USB 3.0 port
3) A D-Link router plugged into the network port: the internet connection from the Asus 3G modem is shared to the network port that connects to the router, so the router can give internet to my wife's laptop or any other WiFi device for that matter.
Not to mention that I have lots of programs installed in Windows.
Perhaps there is an incompatibility somewhere that I cannot trace.
One more interesting thing I must point out.
Before I managed to pull off the router trick, I tried to share the internet connection through the laptop's own WiFi link, using a purchased USB 3G modem (Medion Mobile Germany, with the SIM inside this modem).
The attempt was made using a program that can create a hotspot with the laptop's own WiFi and share internet to it from whatever connection you have.
The internet connection was created using Medion Mobile's own connection program, or using the Asus T500 modem.
Note that every time I tried this I also got a BSOD.
Now that I am using the Asus T500 modem, I can connect to the internet through its own software, but I also noticed that once the connection has been created, Windows remembers it and can dial it on its own, without the need to start the Asus program.
Somewhere in this mess there has to be an incompatibility that I cannot trace.
Is there any good program you can recommend to clean Windows of useless files that have piled up through use of the system, perhaps something like TuneUp Utilities?
PS: is there a way to subscribe to this thread so that I can easily find it in the future? -
Anthony@MALIBAL Company Representative
Resplendence Software - WhoCrashed, automatic crash dump analyzer
To clean up the machine, I'd say either use Driver Sweeper or Revo Uninstaller to remove drivers/programs, or just do a fresh Windows install to be safe.
Revo Uninstaller Pro - Uninstall Software, Remove Programs easily
Phyxion.net - Driver Sweeper
As for subscribing to the thread, there are a couple of options. Depending on your settings, you should be auto-subscribed just by posting. You can also change the "additional options" at the bottom of your window when replying and change the notification type to email notifications, or you can just scroll to the top of the page and go to Thread Tools > Subscribe. -
Here is a video of the error happening.
Clevo x7200 460m sli TDR Error - YouTube
Also, on a later date I got this message after the error was triggered, after switching from "economy mode" to "normal mode" in TuneUp Utilities 2012.
Now I understand what these TDR errors are all about. The question is, why do they happen?
And another question: if I replace the GPUs with a pair of Radeon 7xxx, will I get rid of the TDR error, or is it bound to happen on AMD GPUs as well?
To me it seems these TDR errors are DRIVER related AND GPU related, meaning (at least so I believe) that if I change GPUs, I may escape the errors that are plaguing my system. Please correct me if I'm wrong.
Driver problem with 460m sli
Discussion in 'Sager and Clevo' started by Bytales, Oct 14, 2011.