Does this always point to a faulty card? Or can it be a driver issue or something? A Windows 1903 issue, perhaps?
I've tried just about everything to troubleshoot it. It's not temps and it's not overclocking. SFC and DISM bring back nothing. All memory checks, stress tests, etc. come back fine, too. I think it may be a driver issue somewhere, but I can't imagine which driver, because I've tried a fresh install of Windows and that didn't change anything. Driver Verifier on 3rd-party (non-Microsoft) drivers has not caused any crashes, but verifying Microsoft-only drivers caused BSOD loops on startup (when loading into an account) that forced me to reset/turn off the verifier in safe mode (this was especially bad after a particular Windows update). I can't fathom that a laptop with an RTX 2060 that's only ~2 months old is already failing... and the nature of this issue has changed from previous times.
I noticed something peculiar: the longer the PC is on, the more likely it is to fail. For example, I can play a game for, say, 2-3 hours, and then it'll crash. It never happens before that 2-3 hour mark if the PC was freshly rebooted. Basically no early crashes. But that's not the case for another game. Victor Vran has a map where you choose a destination. When you open and close it, it'll crash. At some point it was so bad (I don't know how it got worse) that opening the map guaranteed an immediate crash (video TDR, wait ~30-40 seconds, then it says Windows is shutting down with the fans suddenly blasting, and then the whole thing unexpectedly shuts off). If I try to shut down the PC manually during the calm period before the system auto-shuts off, it goes to a BSOD that names ntoskrnl as the responsible driver instead of nvlddmkm.sys. Sometimes it even does initiate the shutdown procedure and goes to a blue screen to shut down, but it never finishes--it just turns off completely.
Very confused... anyone have any idea? :\
-
What temperatures are you getting under full load and at idle?
-
DaMafiaGamer Switching laptops forever!
Please try downloading and using nvidiainspector: offset the core in the negative by around 100 to 150 MHz and try running your games. Let me know how that goes.
Offsetting the core in the negative means that it needs less vcore to power the GPU, which leads to less wattage, which in turn stresses the VRMs less.
The fact that there is no artifacting of any sort shows that this is INDEED A VOLTAGE ISSUE!
Clevo fix your vrm schematics! -
Note when it refreshes, if I have not yet saved the settings in adjustments for offsets, it will refresh those back to 0 the next time it gets info from the card (e.g. sensor).
Here's what happens when I reduce the clocks. I happened to capture it when it flashes. Check the sensor data now (I had to capture it fast since it goes away just as fast):
Last edited: Sep 3, 2019 -
But here's something I found out: if I close nvidia inspector, the overclocking resets. Maybe something to do with the BIOS's speed scaling being enabled? I don't really know. But yeah, after applying clocks and voltages to the card, everything goes back to stock if I close the inspector. Seems like it doesn't stick.
I opened Furmark just now. The sensors are stably reading the card. No flashing, no flickering.
Here's with Furmark running:
Here's with Furmark with -150MHz on the core clock:
And ~3 minutes in:
Furmark closed, BUT the application itself is still open. Still with -150MHz on the core clock.
What could possibly be the problem...? Is there another way to permanently underclock the card? And again, I don't understand how the card is reaching such high clock values when I thought it was supposed to be downclocked for laptops.
Last edited: Sep 3, 2019 -
This is with Furmark @ 5 mins runtime and with nvidia inspector restarted. Seems that it is generally not going above ~1500 MHz, which is still more than what the card is rated for (especially with boost).
This is a really weird problem... really weird.
Last edited: Sep 3, 2019 -
I wish I had an answer. Something driver-related is my guess, but singling it out is impossible. It could simply be that Microsoft fked up or something. I get Intel HD driver errors saying "The description for Event ID 0 from source igfxCUIService2.0.0.0 cannot be found." I've reinstalled the GPU driver 100 times. Doesn't help. Tried all versions. This is a Clevo P970ED, one of Clevo's newest models, so I can only guess at why there are so many issues.
I don't know what's going on X_X...I've had driver issues from the beginning. I've gotten rid of most of them at this point. I also had Windows 10 upgrade 1903 fail on me and BSOD on me with updates with ntoskrnl driver being the culprit. And yet all tests, memory, SSD, HDD, etc, pass with no problems.
There is another funky thing: I have a custom fan profile currently. After I updated my drivers (this was not a problem before the Windows and other driver updates), the GPU fan, and only that one, would shoot to the sky at start-up and stay that way if on the performance profile in the CCC. If on the entertainment profile, this doesn't happen (CCC 3.0). But yeah, idk what's going on anymore. The computer doesn't BSOD or crash or anything anymore unless dealing with Win10 updates. Then it may. But that's been rare and only happened a couple of times in the last month of using it. The other ~30 crashes were all related to the GPU.
Next time I boot, I am going to undo the overclocking speed stepping technology that Clevo has enabled in the BIOS. But that's after I try the Nvidia Inspector underclock. I used to have MSI Afterburner, but that was pretty useless for this card other than to change the clock speeds (voltages are locked for RTX laptop cards).
Last edited: Sep 3, 2019 -
A negative core offset is more like an overvolt than an underclock, but what it actually does to the card depends on the load and the power limit. It can either force a lower clock at the same voltage (when under the power limit, which is the case under Furmark) or the same clock at a higher voltage. If the card is experiencing instability due to transient voltage drops this *may* provide extra stability, but the GPU VRM could also be faulty, in which case it'll have no effect either way.
Conversely, a positive core offset mostly acts like an undervolt, allowing the card to boost to a higher MHz, or run at a lower voltage, or a bit of both, and should induce crashing more often by eating into the stability tolerance zone.
Locking the card to a specific voltage/frequency (Ctrl+L in the Afterburner curve editor window) may be helpful to test stability. It doesn't override power limits, which is what the mobile cards spend almost all their time under, so the core will still drop clocks, moving left along the boost curve.
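To make the offset idea a bit more concrete, here is a rough Python sketch of a voltage/frequency curve and what a core offset does to it. Everything in it is an assumption for illustration: the curve points, the 55 W limit, the activity factors, and the toy power model (P roughly proportional to activity x V^2 x f) are made up and are not measurements from an RTX 2060 or anything the driver exposes.

```python
# Hypothetical V/F curve: (millivolts, MHz) pairs, lowest point first.
# Illustrative numbers only, not taken from a real RTX 2060.
CURVE = [(650, 1200), (700, 1350), (750, 1500), (800, 1650), (850, 1800)]


def apply_offset(curve, offset_mhz):
    """Shift every point's clock by offset_mhz; the voltages stay put."""
    return [(mv, mhz + offset_mhz) for mv, mhz in curve]


def voltage_for_clock(curve, target_mhz):
    """Lowest voltage on the curve that reaches target_mhz (None if unreachable)."""
    for mv, mhz in curve:
        if mhz >= target_mhz:
            return mv
    return None


def estimated_power(mv, mhz, activity, k=0.06):
    """Toy dynamic-power model in watts; k is an assumed scaling constant."""
    volts = mv / 1000
    return k * activity * volts ** 2 * mhz


def point_under_power_limit(curve, limit_w, activity):
    """Highest V/F point whose estimated power fits under the limit.

    Falls back to the lowest point if nothing fits.
    """
    best = curve[0]
    for mv, mhz in curve:
        if estimated_power(mv, mhz, activity) <= limit_w:
            best = (mv, mhz)
    return best


negative = apply_offset(CURVE, -150)
positive = apply_offset(CURVE, +150)

# Same 1500 MHz target: a negative offset needs more voltage, a positive one less.
print(voltage_for_clock(CURVE, 1500))     # 750
print(voltage_for_clock(negative, 1500))  # 800
print(voltage_for_clock(positive, 1500))  # 700

# Assumed 55 W limit: a lighter game load (activity 0.7) reaches the top point,
# a Furmark-like load (activity 1.0) settles further left on the curve.
print(point_under_power_limit(CURVE, 55, 0.7))     # (850, 1800)
print(point_under_power_limit(CURVE, 55, 1.0))     # (750, 1500)
print(point_under_power_limit(negative, 55, 1.0))  # (750, 1350)
```

With these made-up numbers, the -150 MHz offset under the heavy load lands on the same voltage at a lower clock, while for a fixed clock target it asks for a higher voltage. Which of the two you see on a real card depends on where the power limit bites.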
But any problem that comes on only after multiple hours, and can't be specifically induced, is a giant pain in the backside to troubleshoot. Hopefully you have warranty and a service line to call to help you with it as it does sound like a hardware issue.
As for clocks under Furmark, I'm not seeing anything other than normal behaviour: the core runs the fastest clock possible under the power limit. Furmark is a "heavier" and less variable load than anything else the core will ever run, so it is constantly operating in a power-limited condition and stabilises at a lower overall clock (and the lower voltage tied to that clock) than it would during a game load.
Last edited: Sep 3, 2019 -
https://us.forums.blizzard.com/en/overwatch/t/render-device-lost-fix-for-rtx/263106/472
Maybe it really is a BIOS problem. I will need to explore it. The game I played, btw, is Vampyr. I play it on medium settings and at 1440x900 resolution. I even tried playing Victor Vran as I said, and that is played in windowed mode (I think 1280x720, out of the 1920x1080 the desktop is capable of). There's no way a game like that, on med-high settings, that came out like 4-5 years ago should be causing the GPU to overwork itself and be "lost"... and it used to happen immediately when I opened the map in the game, as I mentioned. There was a time that I thought the Nvidia Experience app was the problem. After uninstalling it, the map crash stopped. I reinstalled drivers only. Now the problem only happens after playing the game for a longer time (2+ hours, generally). The map no longer crashes on load. Baffling. I'd say that Nvidia Experience was responsible for it, but I've since reinstalled the Experience software and there are no issues with the map crashing. How can it be this capricious?
Edit 1:
I've disabled performance scaling in BIOS. Let's see what happens... I'm willing to try anything at this point, lol, but will test later (going to work now).
Lastly:
After rebooting, I get this when trying to access the NVidia Control Center:
But then when I right clicked again, to test if I'd get the same issue, the Nvidia Control Center started with no errors! So confusing.
Edit 2:
With Furmark, these are the results after ~3 mins (with GPU Scaling OFF in BIOS). Seems more "stable" with the clocks (no OC offset is used here).
Edit 3:
Apparently this also happened, but I had not noticed it (I think it happened when I tried to close it, so it decided to crash instead... just guessing):
Last edited: Sep 3, 2019 -
What version of BIOS and EC do you have?
Reboot your laptop and repeatedly press F2 to enter the BIOS, and you will see said info on the first screen that pops up -
I hope the numbers are the same since I took a shortcut and got this info from MSINFO32: BIOS is INSYDE CORP. 1.07.03P dated 1/15/2019 and EC version is 7.04.
Last edited: Sep 4, 2019 -
Well, your BIOS is a bit old!
The latest one for your model is: BIOS Version 1.07.09
And the latest EC: 1.07.08
Can you ask the store that sold you the laptop to send you an update of both EC and BIOS? Worth a shot
(I got the info from clevo e-channel directly)
you got a mirror here for Clevo BIOS and EC:
https://repo.palkeo.com/clevo-mirror/P9xxEx/
But it isn't updated with the latest updated BIOS/EC from Clevo and clevo's ftp for downloading BIOS/EC has been down today (like always lol)
Edit:
was able to grab the latest EC 1.07.08:
https://mega.nz/#F!TBZhGYJA!tuzyRICl5OSs1oPoulkywg <- This is a link to a folder. If I can grab the latest BIOS from Clevo's ftp I will post it there too. For now it only has the EC for your model
Last edited: Sep 4, 2019 -
And I have no idea how to flash the BIOS properly on a laptop like this. Especially an EC, something I don't recall ever having to mess with on a desktop computer from ~2005. I've done it before, but it was on a DELL desktop many years ago and I'm afraid something may break. Windows 10 1903 already doesn't like this computer with the BSODs that it has given me post-update. 1807 or w/e the version was before this didn't give me this many issues.
ALSO, I think I should say that no BIOS updates are shown for my laptop model: https://www.clevo.com.tw/en/e-services/download/ftpOut.asp?Lmodel=P9xxEx&Ltype=1&submit=+GO+
Idk why. I guess they think it doesn't need an update.
Last edited: Sep 4, 2019 -
Meaker@Sager Company Representative
-
It wasn't until after I uninstalled everything, reinstalled Windows, and skipped the driver updates from Clevo's FTP site that all errors resolved themselves (generally, except now there are GPU problems that seem to somehow be a result of Windows itself). That was also when I requested a BIOS update, because why else would there be ACPI errors? I'd get hkmoufltr BSODs, ntoskrnl.exe BSODs, driver verifier BSOD loops (twice, and all on Microsoft drivers), storage data corruption BSODs, etc. It was a nightmare... I was ready to throw the laptop out the window because I originally thought it was all hardware related. How can so many things go wrong in the first month of a brand new laptop's life? Impossible. Right?
Anyway, I thank you all for your help thus far. I've been interested in testing out many things and you've given me ideas on what I can try (helps with the brainstorming). You're much more knowledgeable on this stuff than I am. I really do appreciate it! At some point, I think, a solution will be found. Just a matter of when, I guess.
Edit 1: I've emailed support and they're reluctant to help me get the BIOS. They've so far said that they want me to reinstall windows again, without keeping files or anything else. They said this may fix it if it's windows-caused. -__- and they said don't update anything except Windows.
Best advice ever... /sarcasm
Edit 2: Erased everything as suggested by you and the support people. Time to re-setup the laptop. Will need a couple days to test out.
Last edited: Sep 4, 2019 -
Ok, update:
It seems I fixed it. I used to get intermittent sensor reports and blanking out of values as seen previously in my posts, but now everything seems to be stable and reporting as it is supposed to. I can say that this is definitely a Windows problem! What worked for me was this: go into safe mode, uninstall nvidia drivers, reinstall them, then install intel management engine components. This is the order it *must* be done in or it won't work. I know this because there was a sequence in a game that I tried to get through but it'd give me TDR errors and crash. That sequence is no longer a problem and there is no crash anymore! All because of this...
I also followed these steps (removed Windows, reinstalled but didn't allow auto-updates to anything--most importantly the GPU, because it appears that Windows is dropping in corrupted nvidia files), changed the TDR values in the registry (just in case), and installed GPU drivers straight from Nvidia's site (latest--even newer than what GeForce Experience offers) without installing any extra software (like GFE or audio drivers). Then I let Windows do updates, but not any that involve hardware drivers (e.g., realtek audio). I had to roll back the realtek driver and not let it update through device manager (my mistake) because it was a messed up driver from Windows itself! (Just proves that Windows has issues with updating hardware stuff and should NOT be used for that!). BTW, this is where I got the info on how to fix it: https://www.nvidia.com/en-us/geforc...isplay-driver-nvlddmkm-stopped/?commentPage=2
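For anyone wondering what "changed TDR values in registry" looks like in practice, below is a small Python sketch that only builds the text of a .reg file for the two usual TDR timeouts. `TdrDelay` and `TdrDdiDelay` live under the `GraphicsDrivers` key and are given in seconds (Windows' documented defaults are 2 and 5). The post doesn't say which values were actually used, so the 10-second numbers and the function name here are placeholders, not the author's settings.

```python
# Registry key Windows reads its TDR (Timeout Detection and Recovery) settings from.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(tdr_delay_s=10, tdr_ddi_delay_s=10):
    """Return the text of a .reg file setting TdrDelay and TdrDdiDelay (seconds).

    The 10-second defaults are placeholder values, not the ones used in the post.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        f'"TdrDelay"=dword:{tdr_delay_s:08x}',        # DWORDs are written as 8 hex digits
        f'"TdrDdiDelay"=dword:{tdr_ddi_delay_s:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Only prints the file contents; it does not touch the registry.
    print(tdr_reg_file())
```

The output can be saved as a .reg file, checked by eye, and then imported manually, followed by a reboot so the new timeouts take effect. A longer timeout only gives the driver more time before Windows resets it; it doesn't fix whatever is making the GPU hang.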
Nothing intermittent: solid reporting. No patchy info anymore and no flickering of temp reporting or any of the other boxes!
Typical values during gaming (this is Vampyr):
Last edited: Sep 14, 2019
New computer, but experiencing nvlddmkm.sys VIDEO_DXGKRNL_FATAL_ERROR (code 141)?
Discussion in 'Sager and Clevo' started by Amnvex, Sep 3, 2019.