Hi guys,
I have a broken GTX 980 in my XMG U716. This notebook is a relabeled Clevo P775DM1-G.
The problem is sporadic shutoffs.
It first happened under heavy video load (when I started playing New World) and then got worse: from shutting off at idle, to shutting off in the BIOS (no OS booted), to now shutting off during POST. Currently it does not start at all.
But sometimes it works as if nothing had happened, until a few minutes of FurMark kill the machine again.
This happens on battery only, on the PSU only, and with both connected. Sometimes it is stable under high load, sometimes unstable with no load. So it is not related to the power source.
Now that it won't even POST, I tried unplugging the MXM card, and the notebook gets through POST, maybe into the BIOS. But then it seems to be missing a proper temperature reading, because it shuts off after about 30 s with lots of beeping. That is still far longer than it lasts with the GPU installed. Unfortunately the notebook needs the dedicated GPU to display anything; it does not use the Intel integrated GPU.
Nevertheless, the GPU seems to be defective. I measured the voltage levels at several points on the mainboard and on the GPU and observed brief short circuits that pull the supply down. The main bus voltage drops hard, down to ~4 V. First the PSU shuts off due to OCP, and a few ms later the battery does too. The PSU LED goes off, and the PSU has to be unplugged from the wall and plugged back in before it runs again. The battery has to be removed before it starts again, although replugging the PSU also resets the battery OCP.
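The trip order described above (PSU first, battery a few ms later) can be read off logged samples by finding the first time each rail falls below a threshold. This is a minimal sketch, assuming each rail is logged as (time in ms, voltage in mV) pairs; the 6000 mV threshold, the rail names and the sample values are hypothetical and are not the measurements from this thread.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time in ms, voltage in mV)


def first_drop(samples: Sequence[Sample], threshold_mv: float) -> Optional[float]:
    """Return the time (ms) of the first sample below threshold_mv, or None if the rail never drops."""
    for t_ms, mv in samples:
        if mv < threshold_mv:
            return t_ms
    return None


def trip_order(rails: Dict[str, Sequence[Sample]], threshold_mv: float) -> List[Tuple[float, str]]:
    """List (drop time, rail name) pairs, earliest first; rails that never drop are left out."""
    drops = {name: first_drop(samples, threshold_mv) for name, samples in rails.items()}
    return sorted((t, name) for name, t in drops.items() if t is not None)


# Hypothetical samples for illustration only.
psu = [(0, 19500), (1, 19400), (2, 4100), (3, 0)]
battery = [(0, 15000), (1, 14900), (2, 14800), (3, 14700), (4, 3900)]
print(trip_order({"psu": psu, "battery": battery}, threshold_mv=6000))
```

With these made-up numbers it prints `[(2, 'psu'), (4, 'battery')]`, i.e. the PSU trips 2 ms before the battery. A single fixed threshold is the simplest choice; with a noisy divider-and-ADC setup you might want to require a few consecutive low samples before counting a drop.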
Does anybody know these symptoms and can tell me: "This is typical, part XYZ is defective"?
So it is a sporadic short circuit. F***, a permanent short circuit is far easier to find.
So I kept power-cycling it until it failed to start several times in a row, hoping the short circuit would stay.
I took the GPU out again and started measuring. The main power pins had no short, just around 100k.
But around the 3 MOSFETs (CSD87350Q5D) I observed something. There seems to be a fault from TG (high-side gate) to VSW (switch node, the output). I measured 5.7, 5.8 and 10k from TG to GND.
After desoldering the MOSFETs I measure infinite resistance (open) on the same pins. The empty pads on the board measure 12k, 12k and infinite from TG to GND. Maybe the heat from the hot air station "healed" the MOSFETs? That could explain why the short is not permanent.
Here is what I measured: first with the FET mounted (hopefully in the failure state), then the desoldered FET on its own, and then the empty pads themselves.
The first FET's pad values differ from the other two due to its different driver.

Resistance to GND (ohms; inf = open circuit)

Pin   FET on board   FET solo   Empty board pad
FET 1 (3rd phase)
VIN   100k           inf        100k
TG    10k            inf        inf
TGR   4.8            2.2M       inf
VSW   4.8            2.2M       4.7
BG    29k            inf        29k
FET 2
VIN   100k           inf        100k
TG    5.8            inf        12k
TGR   4.8            1.1M       12k
VSW   4.8            1.1M       4.7
BG    23M            inf        38M
FET 3
VIN   100k           inf        100k
TG    5.7            inf        12k
TGR   4.8            780k       12k
VSW   4.8            780k       4.7
BG    22M            inf        38M
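The reasoning behind the TG-to-VSW suspicion can be written as two simple checks on the table: (1) if TG reads almost as low to GND as VSW does, TG is probably tied to the switch node; (2) if a pin reads far lower with the FET mounted than either the loose FET or the empty pad, the extra path only exists while the FET is on the board. This is a minimal sketch under stated assumptions: readings are ohms, "M" means mega (not milli), and the 2x and 10x thresholds and the helper names are hypothetical choices for illustration, not values from any datasheet.

```python
import math


def parse_ohms(text: str) -> float:
    """Parse readings like '4.8', '12K', '100k', '2.2M' or 'inf' (open circuit) into ohms."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty reading")
    if text == "inf":
        return math.inf
    multipliers = {"k": 1e3, "m": 1e6}  # 'm' here is mega, as in the table above
    if text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def tg_shorted_to_vsw(tg_on_board: str, vsw_on_board: str, ratio: float = 2.0) -> bool:
    """Flag a gate-to-switch-node short: TG-to-GND reads almost as low as VSW-to-GND."""
    return parse_ohms(tg_on_board) <= ratio * parse_ohms(vsw_on_board)


def leaks_only_when_mounted(on_board: str, solo: str, pad: str, factor: float = 10.0) -> bool:
    """Flag a pin whose mounted reading is far lower than both the loose FET and the empty pad."""
    return parse_ohms(on_board) * factor < min(parse_ohms(solo), parse_ohms(pad))


# (on board, FET solo, empty pad) readings for TG and VSW, copied from the table.
readings = {
    "FET 1": {"TG": ("10k", "inf", "inf"), "VSW": ("4.8", "2.2M", "4.7")},
    "FET 2": {"TG": ("5.8", "inf", "12K"), "VSW": ("4.8", "1.1M", "4.7")},
    "FET 3": {"TG": ("5.7", "inf", "12k"), "VSW": ("4.8", "780k", "4.7")},
}
for fet, pins in readings.items():
    short = tg_shorted_to_vsw(pins["TG"][0], pins["VSW"][0])
    leak = leaks_only_when_mounted(*pins["TG"])
    print(f"{fet}: TG-VSW short={short}, TG leak only when mounted={leak}")
```

On these readings FET 2 and FET 3 are flagged as TG-VSW shorts, while FET 1 only shows a leak that disappears once it is desoldered. A ratio test is used instead of an absolute cutoff because the VSW reading already includes whatever the core rail presents to GND.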
The first one here is the 3rd phase with a separate driver: https://pdf1.alldatasheet.com/datasheet-pdf/view/1115200/UPI/UP1962.html
The other two are driven by: https://datasheetspdf.com/pdf-file/1347385/uPISemiconductor/uP1624P/1
Here is the datasheet of the MOSFETs: https://www.ti.com/lit/ds/symlink/csd87350q5d.pdf?ts=1638818626629
Can these values explain the failure mode?
-
Here are some voltage curves:
Time is in milliseconds. Voltage in millivolts.
Blue is the battery voltage on the mainboard, after the switches.
Orange is the mainboard main supply voltage. It is fed by the PSU or the battery.
Yellow is the GPU supply voltage after the GPU supply switch, before the MXM socket.
Green is the GPU supply voltage after the MXM socket, on the MXM board.
This was measured with an Arduino Uno and some resistor dividers.
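Because each rail was read through a resistor divider, the raw analogRead() counts have to be scaled back up to get the rail voltage in millivolts. This is a minimal sketch, assuming the Uno's default 5 V AVcc reference and a hypothetical 30k/10k divider; the actual resistor values used here are not stated. The helper name is made up for illustration.

```python
def adc_to_millivolts(adc_count: int, r_top_ohm: float, r_bottom_ohm: float,
                      vref_mv: float = 5000.0) -> float:
    """Convert a 10-bit Arduino Uno ADC count, read through a resistor divider, back to the rail voltage in mV."""
    if not 0 <= adc_count <= 1023:
        raise ValueError("a 10-bit ADC count must be in 0..1023")
    pin_mv = adc_count * vref_mv / 1024  # ATmega328P datasheet: ADC = Vin * 1024 / Vref
    return pin_mv * (r_top_ohm + r_bottom_ohm) / r_bottom_ohm  # undo the divider ratio


# Hypothetical 30k/10k divider: 4:1 ratio, so the 5 V ADC range covers rails up to ~20 V.
print(adc_to_millivolts(512, r_top_ohm=30_000, r_bottom_ohm=10_000))
```

This prints `10000.0` (2500 mV at the pin times the 4:1 ratio). The default reference follows the Uno's own 5 V supply, which is not exact, so comparing one reading against a multimeter and adjusting `vref_mv` gives more trustworthy absolute values.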
Maybe someone can explain these plots? -
Sorry for the double post.
-
Problem found and solved. There was a sporadic gate-to-source short in two MOSFETs. After replacing the FETs and adding thermal pads to them, the notebook is now running nicely and stably. This was my first time doing such a diagnosis and repair, and I'm a little bit proud of myself.
-
This is the most systematic, data-driven graphics card diagnosis I have ever seen anyone perform. Congrats on getting it working!
I am unclear on your gate-to-source short. Shorting gate to source would just turn the FET off. If it's on one of the two main core phases, this would cause a shutdown, but you wouldn't see the big voltage drop indicating a short like you do.
I have found that many 980M boards have the 3rd GPU core phase driver die, which disables the phase and causes the card to shut down under heavy load. On another card there is a PCB short if I have the 3rd phase MOSFETs installed. In general there are many problems with the 3rd phase.
Have you added the 3 missing core MOSFETs? Adding them improves the card's efficiency under heavy load quite a bit, reducing heat. If you add those MOSFETs, you should also add the missing high voltage supply caps, otherwise the RAM won't overclock as well.
Last edited: Dec 12, 2021 -
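The efficiency point can be made concrete with the conduction-loss term alone. This is a rough sketch, assuming the core current I splits evenly across N identical devices (or phases), each with on-resistance R_DS(on), and ignoring switching, gate-drive and inductor losses:

```latex
P_\text{cond} = N \left(\frac{I}{N}\right)^{2} R_{DS(on)} = \frac{I^{2} R_{DS(on)}}{N}
```

Under that equal-sharing assumption, doubling the number of devices carrying the current roughly halves the conduction loss, and with it that part of the heat. Whether the missing footprints add phases or parallel the existing FETs is not stated here; the scaling is the same either way.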
Well, I can't explain either why a gate-source short would cause a drain-source short. Maybe it had nothing to do with the sporadic shorts.
Maybe I never actually captured the short itself in a measurement. But it was close enough to get the card working again.
No, I only replaced the 3 existing FETs with new ones, and I added thermal pads to the MOSFETs and the surrounding caps.
Now, with new thermal paste, the system runs great and stable.
If the FETs ever die again and Mouser or someone has them in stock again, I'm going to populate the other FETs and caps for better load distribution. I hope the thermal pads are enough for good. These little bugs are no joy to solder because the PCB sinks so much heat. Using a soldering iron to heat the big pad beneath the FET and hot air on the FET itself did the job.
I have been using my notebook again for a week with no trouble. With new thermal paste, the CPU and GPU both stay at max. 65°C under high load (FurMark + CPU burner). I hope this lasts forever... or at least until I want better hardware... which probably means years... sorry, OEMs.
XMG broken GTX 980M
Discussion in 'Hardware Components and Aftermarket Upgrades' started by Arson, Dec 7, 2021.