The Notebook Review forums were hosted by TechTarget, who shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    XMG broken GTX 980M

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by Arson, Dec 7, 2021.

  1. Arson

    Arson Newbie

    Reputations:
    5
    Messages:
    5
    Likes Received:
    1
    Trophy Points:
    6
    Hi guys,
    I have a broken GTX 980 in my XMG U716. This notebook is a relabeled Clevo P775DM1-G.
    The problem is sporadic shut-offs.
    It first happened under high video load (when I started playing New World) and then got worse: from shutting off at idle, to shutting off in the BIOS (no OS booted), to now shutting off during POST. At the moment it does not start at all.
    But sometimes it works as if nothing had happened, until a few minutes of FurMark kill the machine again.

    This happens on battery only, on PSU only, and with both. Sometimes it is stable under high load, sometimes unstable with no load. So it is not related to the power source.

    Now, as it won't even POST, I tried unplugging the MXM card, and the notebook gets through POST, maybe into the BIOS. But I think it then misses a proper temperature reading, because it shuts off after about 30 s with lots of beeping. That is far longer than it runs with the GPU installed. But the notebook needs the dedicated GPU to display anything; it does not use the Intel integrated GPU.

    Nevertheless, the GPU seems to be defective. I measured the voltage levels at several points on the mainboard and the GPU and observed brief short circuits that pull the supply down. There is a big voltage drop, down to ~4 V, on the main bus voltage. First the PSU shuts off due to OCP (overcurrent protection), and some ms later the battery does too. The PSU LED goes off, and the PSU needs to be replugged at the wall to run again. The battery needs to be removed to start again. Replugging the PSU resets the battery OCP too.
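    A minimal sketch of how dips like this could be picked out of a logged voltage trace. The sample values, the 10 V threshold and the name `find_dips` are made up for illustration; they are not the real measurements.

```python
from typing import List, Optional, Tuple


def find_dips(samples: List[Tuple[float, float]],
              threshold_mv: float) -> List[Tuple[float, float, float]]:
    """Return (start_ms, end_ms, min_mv) for each run of samples below threshold_mv."""
    dips = []
    start: Optional[float] = None
    last_t = 0.0
    min_mv = 0.0
    for t_ms, mv in samples:
        if mv < threshold_mv:
            if start is None:
                # First sample of a new dip
                start = t_ms
                min_mv = mv
            else:
                min_mv = min(min_mv, mv)
            last_t = t_ms
        elif start is not None:
            # Voltage recovered, so close the current dip
            dips.append((start, last_t, min_mv))
            start = None
    if start is not None:
        # Trace ended while still inside a dip
        dips.append((start, last_t, min_mv))
    return dips


# Hypothetical trace: (time in ms, bus voltage in mV)
trace = [(0, 19500), (1, 19400), (2, 9000), (3, 4100),
         (4, 4000), (5, 12000), (6, 19300)]
print(find_dips(trace, threshold_mv=10000))  # [(2, 4, 4000)]
```

    Running the same function over the PSU, battery and GPU rails and comparing the start times would show which rail collapses first.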

    Does anybody recognize these symptoms and can tell me: this is typical, part XYZ is defective?

    So it is a sporadic short circuit. F***, a permanent short circuit is far easier to find.
    But I played around with it until it failed to start several times in a row, hoping the short circuit might stay.
    I took out the GPU again and started measuring. The main power pins showed no short, but read around 100k.
    But around the 3 MOSFETs (CSD87350Q5D) I observed something. There seems to be a fault from TG (high-side gate) to VSW (output). I measured from TG to GND a resistance of 5.7, 5.8 and 10k.

    After desoldering the MOSFETs I measure inf on the same pins. The pads on the board read 2x 12k and 1x inf from TG to GND. Maybe the MOSFETs were healed by the heat of the hot air station? That could explain why the short circuit is not permanent.

    Here is what I measured: first with the FET mounted (hopefully in the failure state), then desoldered, and then the pads themselves.
    Code:
    Pin    FET on board    FET solo    Board pad empty
           R to GND        R to GND    R to GND
    VIN    100k            inf         100k
    TG     10k             inf         inf
    TGR    4.8             2.2M        inf
    VSW    4.8             2.2M        4.7
    BG     29k             inf         29k

    VIN    100k            inf         100k
    TG     5.8             inf         12k
    TGR    4.8             1.1M        12k
    VSW    4.8             1.1M        4.7
    BG     23M             inf         38M

    VIN    100k            inf         100k
    TG     5.7             inf         12k
    TGR    4.8             780k        12k
    VSW    4.8             780k        4.7
    BG     22M             inf         38M
    
    The pad values of the first FET differ from the other two due to different drivers.
    The first one here is the 3rd phase with a separate driver: https://pdf1.alldatasheet.com/datasheet-pdf/view/1115200/UPI/UP1962.html
    The other two are driven by: https://datasheetspdf.com/pdf-file/1347385/uPISemiconductor/uP1624P/1
    Here is the datasheet of the MOSFETs: https://www.ti.com/lit/ds/symlink/csd87350q5d.pdf?ts=1638818626629

    Can these values explain the failure mode?
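    One way to read the "FET on board" column, as a rough sketch: if TG to GND is only slightly above VSW to GND, the gate is probably tied to the switch node through a low resistance. This assumes that a bare number in the table means ohms and that the TG path to GND runs through VSW. The 100 ohm margin and the function names are arbitrary choices for illustration.

```python
def parse_ohms(text: str) -> float:
    """Parse a reading such as '4.8', '12k', '2.2M' or 'inf' into ohms.

    Assumption: a bare number (no k/M suffix) is a value in ohms.
    """
    text = text.strip()
    if text.lower() == "inf":
        return float("inf")
    multipliers = {"k": 1e3, "K": 1e3, "M": 1e6}
    if text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


# In-circuit readings (FET on board) copied from the table, one dict per FET.
on_board = [
    {"TG": "10k", "VSW": "4.8"},
    {"TG": "5.8", "VSW": "4.8"},
    {"TG": "5.7", "VSW": "4.8"},
]


def suspicious_gate(reading: dict, margin_ohms: float = 100.0) -> bool:
    """Flag a FET whose TG-to-GND reading is within margin_ohms of VSW-to-GND."""
    tg = parse_ohms(reading["TG"])
    vsw = parse_ohms(reading["VSW"])
    return tg - vsw < margin_ohms


flags = [suspicious_gate(r) for r in on_board]
print(flags)  # [False, True, True]
```

    With these assumptions the second and third FET are flagged, and the first one (on the separate driver) is not.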
     
  2. Arson

    Here are some voltage curves:
    [IMG]
    [IMG]
    [IMG]

    Time is in milliseconds. Voltage in millivolts.
    Blue is the battery voltage on the main board after switches.
    Orange is the mainboard main supply voltage. It is fed by PSU or battery.
    Yellow is the GPU supply voltage after GPU supply switch, before MXM socket.
    Green is the GPU supply voltage after MXM socket on the MXM board.
    This was measured with an Arduino Uno and some resistor dividers.
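    For reference, a sketch of how raw ADC counts from such a setup convert back to the rail voltage. The Uno has a 10-bit ADC; the 5 V reference and the 30k/10k divider values here are assumptions for illustration, not the values actually used.

```python
ADC_STEPS = 1024     # 10-bit ADC on the Arduino Uno (counts 0..1023)
VREF_MV = 5000.0     # assumption: default 5 V analog reference


def adc_to_input_mv(count: int, r_top_ohms: float, r_bottom_ohms: float) -> float:
    """Convert a raw ADC count to the voltage at the top of the divider, in mV."""
    # Voltage at the ADC pin, i.e. across the bottom resistor
    pin_mv = count * VREF_MV / ADC_STEPS
    # Undo the divider ratio to get the voltage on the measured rail
    return pin_mv * (r_top_ohms + r_bottom_ohms) / r_bottom_ohms


# Hypothetical 4:1 divider so that a ~20 V rail stays below 5 V at the pin
R_TOP, R_BOTTOM = 30_000.0, 10_000.0
print(adc_to_input_mv(512, R_TOP, R_BOTTOM))  # 10000.0
```

    With a 4:1 divider one ADC step is about 20 mV on the rail, which is fine for seeing a drop of several volts.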

    Maybe someone can explain these plots?
     
  3. Arson

    Sorry for the double post.
     
  4. Arson

    Problem found and solved. There was a sporadic gate-to-source short in two MOSFETs. After replacing the FETs and adding thermal pads to them, the notebook is running nice and stable now. This was the first time I did a diagnosis and repair like this. I'm a little bit proud of myself. :)
     
  5. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    This is the most systematic, data-driven graphics card diagnosis I have ever seen someone perform. Congrats on it working out!

    I am unclear on your gate to source short issue. Shorting gate to source would just turn the FET off. If it's on one of the two main core phases this would get you a shutdown, but you wouldn't see your big voltage drop indicating a short like you do.
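    The reasoning can be written down as a toy model of one buck phase. This is a deliberately simplified sketch (ideal switches, no driver behavior, no partial conduction), and the function names are made up for illustration.

```python
def high_side_conducts(gate_drive_on: bool,
                       gate_source_shorted: bool = False,
                       drain_source_shorted: bool = False) -> bool:
    """Toy model of the high-side FET in one buck phase."""
    if drain_source_shorted:
        return True       # conducts regardless of what the gate does
    if gate_source_shorted:
        return False      # Vgs is held near 0 V, so the FET stays off
    return gate_drive_on


def input_shorted_to_gnd(high_side_on: bool, low_side_on: bool) -> bool:
    """The input rail sees a direct path to GND only if both FETs conduct at once."""
    return high_side_on and low_side_on


# Gate-source short on the high side: the phase just stops delivering power.
hs = high_side_conducts(gate_drive_on=True, gate_source_shorted=True)
print(hs, input_shorted_to_gnd(hs, low_side_on=True))   # False False

# Drain-source short on the high side: a short appears when the low side turns on.
hs = high_side_conducts(gate_drive_on=False, drain_source_shorted=True)
print(hs, input_shorted_to_gnd(hs, low_side_on=True))   # True True
```

    In this model only the second case pulls the input rail down, which is why a pure gate to source short does not fit the observed voltage drop.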

    I have found that on many 980M boards the 3rd GPU core phase driver dies, which disables the phase and causes the card to shut down under heavy load. On another card there is a PCB short if I have the 3rd phase MOSFETs installed. In general there are many problems with the 3rd phase.

    Have you added in the 3 missing core MOSFETs? Adding those in helps the card's efficiency under heavy load quite a bit, reducing heat. If you add those MOSFETs you should also add in the missing high voltage supply caps, or else the RAM overclocks less.
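    A rough sketch of why more FETs reduce heat, assuming the current splits evenly across the FETs and counting only resistive conduction loss (no switching loss, no duty cycle). The 60 A and 2 mOhm figures are placeholders, not values from the card or the datasheet.

```python
def conduction_loss_w(total_current_a: float, r_on_ohms: float, n_paths: int) -> float:
    """Total I^2 * R loss when total_current_a is shared evenly by n_paths FETs."""
    per_path_a = total_current_a / n_paths
    return n_paths * per_path_a ** 2 * r_on_ohms


# Placeholder numbers, for the ratio only
loss_3 = conduction_loss_w(60.0, 0.002, n_paths=3)
loss_6 = conduction_loss_w(60.0, 0.002, n_paths=6)
print(loss_3 / loss_6)  # about 2: doubling the FET count halves the conduction loss
```

    The loss works out to I^2 * R / N, so the per-FET heating drops by a factor of four when the count doubles.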
     
    Last edited: Dec 12, 2021
  6. Arson

    Well, I can't explain either why a gate-source short would cause a drain-source short. Maybe it had nothing to do with the sporadic shorts.
    Maybe I never got a real measurement of the short while it was failing. But it was close enough to get it working again.

    No, I only replaced the 3 existing FETs with new ones, and I added thermal pads to the MOSFETs and the caps around them.
    Now with new thermal paste the system runs great and stable.

    If the FETs ever die again, and Mouser or someone has them in stock again, I'm going to place the other FETs and caps for better load distribution. I hope the thermal pads are enough for good. It is no joy to solder these little bugs, because the PCB sinks so much heat. Using a soldering iron to heat the big pad beneath the FET and hot air for the FET itself did the job.

    I have been using my notebook again for a week now with no trouble. With new thermal paste the CPU and GPU both run at max. 65°C under high load (FurMark + CPU burner). I hope it stays this way forever... until I want better hardware... which probably means years... sorry OEMs :D