The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    Broken GTX 980M

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by Darker01, Nov 11, 2017.

  1. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Hello everyone.

    I made this thread with the hope that I would learn more about what caused my GTX 980M to fail after ~2 years of moderate/heavy gaming and academic use. Pictures can be found here.
    As far as I can tell, the front and back of the card looked normal. R47 and R22 had some small deformations on the surface that were a bit hard to see. I believe R22 is some sort of VRM/inductor, but I have no clue what R47 is. The power supply's tip was noticed to be discolored after the failure, but I'm not certain why it happened or how it was related to the failure.

    TL;DR: My GTX 980M failed despite my best effort to keep my P750ZM running cool. I'm not sure if the thing shorted, overheated, had an on-board temperature sensor failure, failed as a result of something else failing, or simply just reached the end of its life. No component was OCed.

    EDIT 8, 04/14/2020: I started looking into GPU repair recently, and a lot of the things I didn't understand back when I wrote this started to clear up after I spent some time reading about how the MOSFETs used in GPUs work. Look up N-channel enhancement-type MOSFETs on Wikipedia. If you suspect one of your 980M's MOSFETs bit the dust without it or something nearby exploding, get a multimeter and check the resistance between T_G and V_SW. Replace the one(s) with only a couple of Ohms of resistance between these 2 pins, or maybe replace all of the MOSFETs to reduce the likelihood of failure in the near future. Some MOSFETs are like $3 shipped per chip, so do what's best for you.

    EDIT 7: Added new MOSFETs to the card. It's working y'all. Pictures here.

    EDIT 6: Rework station arrived. Shorted MOSFET identified. Pictures here.

    EDIT 5: Row of 6 capacitors shorted to the power input pad (right one). Requested this from Texas Instruments (free with .edu email extensions) along with hot air rework station + flux + solder.

    EDIT 4: Measured the resistances of the black capacitor between the R22 coils (core) and the 2 adjacent to the one on the right (memory). Got 9.5 Ohms for the row of 6 and 24.0 Ohms for the row of 2. Remeasured resistance across the power input pins and got 9.5 Ohms. Possible correlation between resistances of core capacitors and the short.

    EDIT 3: @Khenglish suggested measuring resistance across power input pins. If >2kOhm then something else other than the power FET broke. Measured 7 Ohms. Picture here.

    EDIT 2: Added picture of backplate with thermal pad to the album.

    EDIT 1: R47 and R22 in the pictures had a resistance of about 1.5 Ohms as measured with the red Centech digital multimeter - same as the adjacent clean-looking R22.

    Background: the 8 GB GTX 980M card with copper backplate (removed to take pictures) came with a used Clevo P750ZM I bought around October 2015. The laptop was originally purchased from RJTech, and the previous owner used it mainly for software development. Other than coil whine at high FPS, the machine was fine for the most part with VSync enabled. GPU temperature was checked with HWiNFO64 and MSI Afterburner every so often, and I had never seen it go above 80°C. The laptop itself was always cooled by a home-made cooling pad with four 120 mm ~1800 RPM case fans installed, and I normally had the internal fans on max speed whenever I played something demanding. Vents and fans were cleaned every 3-5 months, and the GPU vent wasn't blocked when the card failed. In short, I think I took care of the laptop decently given how much it cost me when my salary was $0. I got a few IRQL_not_less_or_equal BSODs related to the touch pad driver here and there, but that was about it for unusual behaviors. I didn't know about ThrottleStop before the failure, so the CPU was running at stock voltage, in case anyone thinks a power supply issue was involved. I bought the laptop hoping it would last 5 years or more, so I avoided OCing any component.

    On average, I gamed 1-2 hours each day for the first ~1.5 years and 4+ hours a day for a couple of months leading up to the failure. The laptop itself was kept on for ~6-12 hours daily. For games like DOOM, BF1, GTA V, and Witcher 3, I lowered the texture and lighting settings to ensure that the GPU temperature was decent at 60 FPS with graphics above classic Runescape level. Extraneous settings like bloom, blur, and AA were turned off entirely. Even with those precautions taken to control the thermal behavior, the failure occurred when I was walking around looting things in Witcher 3 ~2-3 hours into the session.

    The Failure:
    The screen turned black. MSI Afterburner overlay was active at the time, but I didn't check it before the crash. The laptop turned off mid-game without any warning sign such as freezing or distorted audio. Gameplay was smooth for the most part, and there wasn't any cue (e.g. freezing, audio distortion, micro-stuttering, MSI overlay readout, etc.) to suspect that the CPU or the GPU was running too hot. I attempted to turn on the laptop, and the power supply (Chicony 230W) started clicking at the same time that its indicator light flickered. Both the battery and the power indicator LEDs remained amber while the laptop was plugged in. Pressing the power button resulted in the power indicator briefly turning green and then back to amber again (along with the clicking in the power supply). The fans did not turn on. After holding down the power button long enough, the power supply's indicator light turned off completely, and no more clicking could be heard. Re-plugging the power supply turned the indicator light on again, and the same scenario repeated when the power button was pressed while the laptop was plugged in. I noticed that the area around the right speaker (which is directly above the exhaust for the GPU fan), the power supply tip, and the power supply were all really hot to the touch. I measured the temperature with an infrared thermometer and got readings of around 45-50 degrees Celsius for those regions ~5-10 minutes after the failure. I noticed the discoloration on the tip then.

    I then removed the bottom panel and checked for anything unusual. Everything visible looked fine (i.e. no exploded components/charred regions). The power adapter on the motherboard showed no sign of shorting/melting. There wasn't any "burnt-plastic" or solder odor. RAM sticks all looked fine. Battery wasn't hot. I tried holding down the power button with the battery removed for over half a minute before plugging the power supply in, both with and without the battery, but the clicking persisted. NVRAM reset didn't work. The failure occurred at midnight on a Saturday, so I decided to let the laptop sit unplugged without the battery and check it again early on Sunday morning. The problem persisted, so I sent RJTech an RMA request, which they promptly granted on Monday.

    The Aftermath: I had suspicions that the graphics card might be the cause of the failure to POST, but I decided that it was best to send the laptop to RJTech for them to evaluate the extent of the damage. I figured that I wouldn't be able to do much even if I removed the heatsink to check for the damage, and I was too busy with work for the most part to buy an MXM card and do the diagnostics myself. Upon receiving the laptop, technical support noticed that there was some unidentified liquid on the VRAM chips (which I believed to be thermal pad oil) and sent me 2 pictures with the affected chips circled. I asked them to check if the motherboard was still functional with a new GTX 980M installed, and after some stress testing they confirmed that the other components survived. I confirmed with later testing that the power supply was still functional (enough to sustain the CPU under heavy load at least), although I never tried to push it into the 200W+ regime. While I'm grateful that RJTech accommodated my request for additional testing with a functional card, I decided to get the broken laptop back. I was uncertain about the reliability of my P750ZM at the time, so both getting a new card to restore the P750ZM to pre-failure performance and getting the broken card refurbished by Clevo were out of the question for me.

    Now: I found a surprisingly cheap Sager NP9870-S originally from Xotic-PC up for sale on Craigslist of all places. The thing has 980Ms in SLI, so at least I am comforted by the fact that if 1 card fails, there's always another one inside. It is also nice to know that in the event of simultaneous 2x card failure, I can always build a PE4C eGPU setup like what @bloodhawk did with his P870DM here. Call me paranoid but I already got a PE4C v4.1 and power supply just in case. I also upgraded my cooling pad with 4 of these. As for the P750ZM, I grabbed a GTX 765M and brought it back to life with that. Installed ThrottleStop on both machines and spent a while getting the voltages as low as possible.

    I still want to know what exactly went wrong with the failed GTX 980M so that I can hopefully prevent future failures. I've been looking around to see if anyone else posted something similar regarding their MXM graphics card causing the power supply to short itself while still leaving other components unharmed. There's no schematic for the card floating around, so I hope people with intimate knowledge of the board will be able to help. I'll gladly provide close-up pictures to the best of my ability.
     
    Last edited: Apr 15, 2020
    Ashtrix and Vasudev like this.
  2. Arrrrbol

    Arrrrbol Notebook Deity

    Reputations:
    3,235
    Messages:
    707
    Likes Received:
    1,054
    Trophy Points:
    156
    Is that the original power supply that came with the laptop?
     
  3. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Yes. Never bothered to check how much power was drawn from the wall. I do now.
     
  4. Arrrrbol

    Arrrrbol Notebook Deity

    Reputations:
    3,235
    Messages:
    707
    Likes Received:
    1,054
    Trophy Points:
    156
    Hard to tell what caused the problem, but hopefully it's just the GPU failing and nothing else. That liquid you can see on the VRAM is probably just oil from the thermal pads though.
     
  5. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Other components were fine for the most part. I ran a couple of wPrime benchmarks on the 4790K while undervolting it without encountering anything unexpected.
    I believe the pads were from Fujipoly, given how they looked.
     
    Vasudev likes this.
  6. Dr. AMK

    Dr. AMK Living with Hope

    Reputations:
    3,961
    Messages:
    2,182
    Likes Received:
    4,654
    Trophy Points:
    281
  7. Mobius 1

    Mobius 1 Notebook Nobel Laureate

    Reputations:
    3,447
    Messages:
    9,069
    Likes Received:
    6,376
    Trophy Points:
    681
  8. Danishblunt

    Danishblunt Guest

    Reputations:
    0
    I think you already have a very good idea what went wrong, also the card might still be alive. As you already noticed, the inductors are fine, with only some damage on the surface which really doesn't mean anything, but the thing that "killed" it was the VRAM shorting. Your power supply acted like a classic power supply that refused to power on a shorted system. If you clean the card with isopropyl alcohol and replace some VRAM chips (you can buy them from eBay for around 3 USD each) then your card is back on track. If you're really really lucky, then an isopropyl bath alone might even "fix" the card.

    I think you realize yourself, that this is very likely caused by the cheap thermal pads you were using. So you might want to consider buying high quality ones in the future (grizzly minus for instance).
     
  9. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    I thought it was the oily substance initially too, but then I dismissed that idea for the past couple of months thinking that the silicone oil shouldn't be conductive. After reading your comment and some more searching on the forums, I think I was wrong to assume that the silicone oil wasn't contaminated. OP of this thread here had an oily substance on the VRAM of their GTX 580M along with a blown cap. Other people pointed out that the oil might be contaminated, and OP later stated that they lived in a highly humid area. In my case, I think the pads themselves might have broken down significantly after 2 years, and stuff leached into the oil. Although I don't have those pads anymore, I recall that they were particularly spongy and yielded easily when stretched. Lesson learned. I hope there's a sale on Fujipoly thermal pads on Amazon this upcoming Cyber Monday.

    I'll look around in my local area (SoCal) to see if there's any computer repair shop offering ultrasonic cleaning service. Given how the card looked, I think you may be correct that the majority of the card itself was intact still.
     
  10. Danishblunt

    Danishblunt Guest

    Reputations:
    0
    I fixed a GTX 980M not too long ago. It was shorted on 1 VRAM chip; I reballed it and tried it out, and it worked again. So yes, you might be lucky.

    Every substance can be more or less conductive. For instance, distilled water has very bad conductivity while saltwater is way more conductive. Taking into account that the oil soaked up dirt and other substances, it's not that unlikely for it to cause some issues, so it doesn't really matter whether or not the oil itself is conductive.
     
  11. woodzstack

    woodzstack Alezka Computers , Official Clevo reseller.

    Reputations:
    1,201
    Messages:
    3,495
    Likes Received:
    2,593
    Trophy Points:
    231
    Damn, if you hadn't removed the serial number you could have RMA'd this card directly to a Clevo facility, and I could have helped you. Which is weird because no one would remove that serial, ESPECIALLY RJTech, I assume. Also that liquid stuff looks like the oil we use to clean the surfaces or maybe some of the flux from soldering. It really couldn't be anything else IMO, unless a larger component was cracked and leaking, but on both sides of the card and near no such components???

    Seems fishy to me, honestly.

    Do not know how your coils took physical damage either, because it's sort of just not possible. If that card was not touched by you, then either whoever put it in when upgrading did it, or it came stock like that, which would mean it's a defect. And because I doubt it came stock like that or that you put it in there and somehow damaged it long ago, and given the fact the serial is ripped off, I am suggesting someone is replacing your "Alive" card with a dead one. If someone were to touch my serial numbers, that's the first thought I'd have. Why would it get removed, and even show signs of being ripped off? The heatsink doesn't even touch there, the laptop doesn't make contact with it there, there's no reason, honestly. All it can do is help you RMA it or get warranty and identify the card. Since ALL of those apply to this card currently, there is even less chance you'd touch it.

    Unless I'm missing something here, or just call me paranoid. That's my thoughts. I think the RAM was resoldered on, or it's some sort of oil from a broken component, or foul play because the serial is missing. Do not know what caused your card's death. It's not plausible that the oil or damage was there on a new card installed by your seller if it was new, so no ideas.

    Get grizzly Minus 8 pads, they're great! Fujipoly are way too expensive and make no difference in GPU applications, honestly.

    Thanks for the mention. Cheers!
     
    Last edited by a moderator: Nov 12, 2017
    Dr. AMK likes this.
  12. Danishblunt

    Danishblunt Guest

    Reputations:
    0
    Actually the oil is very likely from thermal pads. I repaired a great deal of notebooks and have seen this issue a couple of times on many different models. I've even had this issue myself on my GT 70 back in the days, I mounted the card myself and applied everything, so I know it was impossible from another source.

    idk about the serial number tbh.
     
    woodzstack likes this.
  13. woodzstack

    woodzstack Alezka Computers , Official Clevo reseller.

    Reputations:
    1,201
    Messages:
    3,495
    Likes Received:
    2,593
    Trophy Points:
    231
    I thought about the pads, they can absorb humidity sure, but not oil, really. Normally pads get dry and in the cases they do not, they end up bonding with the components touching them. Least from my experience.
     
  14. Danishblunt

    Danishblunt Guest

    Reputations:
    0
    The thing is my GT 70 shorted one of the MOSFETs of my old GTX 570M because of the oily substance. It was exactly like OP's case. It wasn't just any kind of water; I assume this is actually melted glue mixed with some kind of fluid. I did the mounting and used cheap pads. I've also recently had an Asus G750 which also got shorted by this oily substance, only the pads were literally melted, so idk what happened there. It was disastrous, I should have taken a picture. It literally looked like someone took the pads, threw them into a mixer, threw them onto the card, and then heated them up. The oily substance was all over the place as well.
     
  15. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    RJTech had nothing to do with the serial sticker being removed. I removed it to check the components below after receiving the card back, but I saved the torn sticker chunk. I had no idea what I was doing, so I tried poking around. The serial number is missing at least 2-3 digits, but the barcode is "intact" for the most part. The card was indeed the one I sent them, as I took the precaution of marking the card where it wouldn't interfere with operation. Had they resoldered the VRAM chips and used an ultrasonic cleaner to get rid of the flux residue, the VRAM chips would look "cleaner," and the mark would have been washed away.

    Given the amount of oil covered on the R22 coils + VRAM chips, it has got to be from the thermal pads. Even Fujipoly themselves admitted here in the Warranty Statement that silicone oil can leach from their products. I spent some time looking at the components up close, and I couldn't find any trace of something big enough that would potentially hold/leach that much liquid.

    How R22 and R47 appeared damaged is beyond me as well. Not sure if this has any connection with the coil whine issue I mentioned in the original post. There wasn't anything sharp on the old pads that would have caused such damage. The temperatures looked good, so I never bothered to do a repaste. I just checked the original purchase invoice that the seller sent me, and I noticed that the P750ZM was purchased as a barebone with 980M installed. Hey @win32asmguy, do you remember seeing anything strange on the 980M when you installed the 4790K?

    I considered the Clevo RMA option, but given RJTech's estimate of 4-6 weeks + ~$400 fee + 90(?) days warranty and my "what else would fail next" mentality at the time, I decided to proceed with the NP9870-S purchase and shelve the P750ZM. One can only tolerate so much eye strain and frustration on an 11" Chromebook. I originally planned to use the slave 980M from the NP9870-S to resurrect the P750ZM and sell the Sager to recoup losses. After looking at the bottom panel of the NP9870 long and hard, I conceded that it had a superior cooling solution to the P750ZM and kept it for good. I thought about selling the P750ZM for parts, but I gave replacing the GPU w/ something cheaper a try, which fortunately happened to work. I figured I'd revisit the 980M when I had more time + a better understanding of what happened to the card. I've been looking at causes since August without much progress, probably due to focusing too much on the shorting + power supply rather than the contaminated oil on the pad. This is why this thread was made.

    The thermal pads on the copper backplate were also oily as seen here. It is important to note that the thermal pads on the backplate are completely different from the ones on the main heatsink. Note that only the VRAM chips on the back side had a noticeable amount of oil on them. Front side VRAM chips had nothing on them at time of discovery.
     
    Last edited: Nov 12, 2017
  16. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    I'm planning on placing an order for ~1-2 120 mm x 20 mm strip of 0.5, 1.0, 1.5, and 2.0 mm. Do you think these will be enough to replace all the stock pads in a P870DM + P750ZM?
     
    woodzstack likes this.
  17. woodzstack

    woodzstack Alezka Computers , Official Clevo reseller.

    Reputations:
    1,201
    Messages:
    3,495
    Likes Received:
    2,593
    Trophy Points:
    231
    Yes I do think so.

    have some rep too for being on the forum, and welcome to NBR !
     
    Dr. AMK likes this.
  18. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Thank you! Order placed.
     
    Dr. AMK likes this.
  19. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    Honestly I don't see anything really wrong with the card. Thermal pads can leave a bunch of liquid goo behind. I don't think anything you're looking at has anything to do with the failure.

    It looks like at some point someone may have scraped up the two inductors (the R22 is the inductor for the 3rd core phase, the R47 is the inductor for either the pci-e or 1.8V voltages). That will have zero effect on their performance though. They are not metal coils internally, but a solid block of bonded iron powder.

    You probably had a power FET blow. Usually that's the only failure that can cause a lot of heat to be generated. Power FETs can blow and not look like it. The voltage on anything else like memory is too low to make much heat. Also the laptop would still power up.

    Check the resistance between the two giant pins on one side of the mxm slot. This is the card's supply voltage. The resistance should be over 2k ohm. If a fet is dead you'll read 0. A blown cap can also cause a similar failure, but it is less likely.
     
    woodzstack, Darker01 and Papusan like this.
  20. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41

    Checked. 7 Ohms, effectively shorted. How do I check which power FET was blown? All of the identical looking ones between/near the R22 inductors had the same resistance across them (~7.7 Ohms).
    Nothing unusual was noticed on the package as well.

    EDIT: Caps mistaken for MOSFET. Ignore the ~7.7 Ohms measurements. See later posts.
     
    Last edited: Nov 13, 2017
  21. woodzstack

    woodzstack Alezka Computers , Official Clevo reseller.

    Reputations:
    1,201
    Messages:
    3,495
    Likes Received:
    2,593
    Trophy Points:
    231
    Well the mosfets are easy enough to replace, any electrical engineer should be able to do that for you.
     
  22. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    7 ohms is definitely bad. The only way to find a short is to pull FETs one at a time until the short disappears. There are only 3 FETs for the core so there are not many to try.

    7 is odd though. I would expect a blown FET to be 0. Measure the resistance across the big rows of caps. There is a row of 6 and a row of 2. They are black. The row of 6 is the core and any non-zero resistance is fine, even like .5 ohm. The row of 2 is the memory. Memory is usually between 10 and 50 ohms.
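The rules of thumb so far can be collected into a small Python sketch. The windows come from the posts in this thread (supply pins over 2 kOhm, core caps anything non-zero, memory caps roughly 10 to 50 Ohms). The function name, the rail labels, and the 0.1 Ohm "reads as 0" cutoff are assumptions made up for illustration, and other cards may well differ. The example readings are the ones from EDIT 3 and EDIT 4 of the first post.

```python
# Expected resistance windows in Ohms, following the rules of thumb in this thread.
ZERO_CUTOFF = 0.1  # assumption: anything below this is treated as "reads 0"

EXPECTED = {
    "supply": (2000.0, float("inf")),     # across the two large MXM power pins
    "core": (ZERO_CUTOFF, float("inf")),  # row of 6 caps: any non-zero value is fine
    "memory": (10.0, 50.0),               # row of 2 caps
}


def check_rail(rail, ohms):
    """Classify one multimeter reading against the window for that rail."""
    low, high = EXPECTED[rail]
    if ohms < ZERO_CUTOFF:
        return "dead short"
    if ohms < low:
        return "too low"
    if ohms > high:
        return "higher than usual"
    return "ok"


# Readings taken from EDIT 3 and EDIT 4 of the first post.
readings = {"supply": 7.0, "core": 9.5, "memory": 24.0}
for rail, ohms in readings.items():
    print(f"{rail}: {ohms} Ohm -> {check_rail(rail, ohms)}")
```

With those readings only the supply rail gets flagged. A rail that passes here can still be shorted to a different rail, which this per-rail check does not look at.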
     
    Last edited: Nov 13, 2017
    Vasudev likes this.
  23. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    I measured some 12 Ohms +/- 5% resistors and an 8 Ohm 20W power resistor to confirm that my multimeter is accurate enough at low resistances. Got ~13.5 Ohm and 9.5 Ohm, respectively, so I reckoned I'm off by at least 1 Ohm.

    I measured the resistances in the caps that you mentioned again, and I got another set of values this time. I thought they were MOSFET by mistake. The row of 6 all measured around ~9.5 Ohms, and I noticed the resistance measured across the power input pins is also the same (temperature effect? late evening vs. 6:00 AM?). Row of 2 measured at around 24.0 Ohms.

    I edited my previous post to indicate a mistake with the memory cap resistances.
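A rough way to apply that sanity check is to estimate the meter's offset from the two known resistors and subtract it from the card readings. This is only a sketch: it assumes the error is a constant additive offset (lead and contact resistance, for example), which two data points cannot really confirm, and it ignores the 5% tolerance on the 12 Ohm resistors. The function names are made up for illustration.

```python
def estimate_offset(pairs):
    """pairs: list of (nominal_ohms, measured_ohms). Returns the mean of measured - nominal."""
    diffs = [measured - nominal for nominal, measured in pairs]
    return sum(diffs) / len(diffs)


def corrected(reading, offset):
    """Subtract the assumed constant offset, never going below 0 Ohm."""
    return max(reading - offset, 0.0)


# Reference measurements from this post: 12 Ohm read as ~13.5, 8 Ohm read as 9.5.
reference = [(12.0, 13.5), (8.0, 9.5)]
offset = estimate_offset(reference)
print(f"estimated offset: {offset:.1f} Ohm")

# Card readings from this post: row of 6 (core) and row of 2 (memory).
for label, reading in [("core caps", 9.5), ("memory caps", 24.0)]:
    print(f"{label}: {reading} Ohm measured, about {corrected(reading, offset):.1f} Ohm corrected")
```

Under that assumption the offset comes out to 1.5 Ohm, so the memory reading stays well inside the 10 to 50 Ohm range either way.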
     
  24. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    Interesting. It sounds like the core voltage may have gotten shorted to the supply voltage. On the row of 6 the bottom side is the core voltage, and on the mxm power tab the right side is power. Check the resistances between these. It should be in the thousands but it sounds like you'll read 0.

    Other than the GPU core's power FETs there are only 2 components with connections between the GPU core voltage and the card's supply voltage. They are the VRM, and the phase driver for the GPU core's 3rd power phase. Both of these chips are on the backside near the top of the card, and the VRM is the bigger of the two. I've never seen these chips fail and short the GPU voltage and supply voltage together, but it's possible. If your VRM died I expect the GPU core to be fried. A working VRM can protect the core from overvoltage if a FET of the phase driver died, but if the short is in the VRM there's nothing to detect it.

    I'd still first pull and check each power FET for the GPU core. They're the 3 big chips at the very top of the card.
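The reasoning in this post boils down to a short checklist, sketched here in Python. The 1000 Ohm cutoff is an assumed stand-in for "in the thousands", the function name is hypothetical, and only the "FETs first" part of the ordering comes from the post; the order of the other two entries is arbitrary.

```python
HEALTHY_MIN_OHMS = 1000.0  # assumption: stand-in for "should be in the thousands"


def core_supply_suspects(core_to_supply_ohms):
    """Return the parts worth checking, in order, for a core-to-supply reading.

    An empty list means the reading does not suggest a core-to-supply short.
    """
    if core_to_supply_ohms >= HEALTHY_MIN_OHMS:
        return []
    return [
        "core power FETs (the 3 big chips at the very top of the card)",
        "3rd phase driver (back side, the smaller chip)",
        "VRM (back side, the bigger chip)",
    ]


for reading in (0.0, 5000.0):
    suspects = core_supply_suspects(reading)
    print(f"{reading} Ohm:", suspects if suspects else "no core-to-supply short indicated")
```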
     
    Last edited: Nov 13, 2017
  25. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Thank you. I'll check the resistances after I get home.
     
  26. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    Here are images showing what's what assuming that you do read 0.


    [​IMG]

    The core power FETs are boxed in red. One of them is probably dead.

    [​IMG]

    If it's not a dead power FET, then it's either the VRM or the 3rd phase's driver (phases 1 and 2 are integrated with the VRM).
     
    Ashtrix and Falkentyne like this.
  27. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Long day. Just got home. Checked the resistances between the power tab and the capacitors' end with the band and got essentially 0.
    I'll start looking for the replacement power FET later this evening.
    I'm curious. Did I just so happen to have a bad 980M, or are the more recent clevo cards bound to fail like mine eventually?
     
  28. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    The power FET has 87350D written on it. Found the product page from TI ( link). I still have my university email, so I requested 5 samples from them free of charge. I think they'll arrive in a couple of days.
    In the meantime I guess I'll start ordering equipment to desolder those FETs.
     
  29. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    You just need a hot air gun, solder flux, and a heat gun for it.

    Remember that 2 of the FETs are still good, so just pull one at a time and check the card if it is ok. I recommend filling all 6 FET pads. For just getting the card working though the unused FET pad already has all the required solder and is easier to solder a FET onto than reusing the original pad. Just remember to follow the pin 1 arrows so you don't put the FET on backwards.
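The "pull one at a time and check" procedure can be written out as a loop. The sketch below runs against a simulated card, not real hardware: `make_fake_card`, the FET labels, and the 5000 Ohm "healthy" value are all made up for illustration, and it assumes only one FET is shorted. The 2 kOhm threshold is the supply pin figure given earlier in the thread.

```python
def find_shorted_fet(fets, measure_after_removing, healthy_min_ohms=2000.0):
    """Pull FETs one at a time and re-measure the supply pins after each pull.

    Returns (culprit, pulled): the FET whose removal cleared the short (or None),
    and the list of FETs pulled so far.
    """
    pulled = []
    for fet in fets:
        pulled.append(fet)
        if measure_after_removing(pulled) > healthy_min_ohms:
            return fet, pulled
    return None, pulled


def make_fake_card(shorted_fet):
    """Simulated card: supply pins read 7 Ohm until the shorted FET is pulled."""
    def measure(pulled):
        return 5000.0 if shorted_fet in pulled else 7.0
    return measure


measure = make_fake_card("FET3")
culprit, pulled = find_shorted_fet(["FET1", "FET2", "FET3"], measure)
print("culprit:", culprit, "| pulled so far:", pulled)
```

If the loop ends with `None`, the short is somewhere else and the other parts connected to both rails become the next suspects.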
     
  30. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Thank you for the equipment advice. By hot air gun and heat gun, I assume you are talking about those hot air rework station and the hand held extremely hot air dryer thing?
    Not sure how it go with other components, but do you think if I get away with just using the hot air gun to remove the FET directly with sufficient heating of the surrounding area?

    With regard to the part selection, I think I'm going for this by the virtue of the reviews + EEVBlog video of a similar device. Hopefully it'll work well enough such that I won't have to return it.
     
  31. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    The heat gun you selected will do the job fine if it works. I got one and it was DOA. The replacement worked for 1-2 years before the heating element blew. The replacement head for the replacement unit did not heat properly.

    I'm not sure what you mean by "but do you think if I get away with just using the hot air gun to remove the FET directly with sufficient heating of the surrounding area". You only want to use a heat gun. You should not be using an iron at all. You blow hot air directly on the component and board to remove and place a new component. There is nothing in the area that is significantly temperature sensitive that can be damaged by the heat.
     
    woodzstack and Darker01 like this.
  32. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Thank you. It'll take at least another week for all the parts to arrive. I'll take a look at hot air reworking tutorials in the meantime.
    I'll let you know the results.
     
  33. NGX83

    NGX83 Notebook Enthusiast

    Reputations:
    0
    Messages:
    13
    Likes Received:
    4
    Trophy Points:
    6
    Last edited: Nov 23, 2017
  34. MahmoudDewy

    MahmoudDewy Gaming Laptops Master Race!

    Reputations:
    474
    Messages:
    1,654
    Likes Received:
    744
    Trophy Points:
    131
    That card in the thread you linked is running in my CLEVO machine atm. I wouldn't say adding the MOSFETs helps with power draw or higher clocks (That is still dependent on your chip), it just helps the card to sustain higher loads and not die as fast as it would without the MOSFETs.
     
    NGX83 and Darker01 like this.
  35. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Happy Thanksgiving everyone. I'm back to provide an update regarding the progress of the repair.

    TL;DR: 1 MOSFET is indeed shorted. This one had silicone oil on it where the V_SW and V_IN pins are supposed to be. Still need flux and wick to clean the pads before soldering the MOSFETs back on. Might take another week or so.

    The W.E.P. 858D hot air rework station mentioned in one of my posts arrived on Wednesday. Popped it open and found that the thing wasn't put together haphazardly like some other 858D clones. Fuse's present and was hooked up correctly for the most part. Both the chassis and the metal casing on the heat gun were properly and securely grounded. There was a loose piece of broken plastic inside the heat gun case, and I'm not really sure where that came from. I guess it's a good thing I opened everything up to check. I was a bit worried about some strange magic smoke coming out of the heating element, but it turned out that I had a screw stuck in between the add-on tip and the heat gun's mouth.

    Flux is bound to arrive on Friday or Saturday, so I decided to start removing the MOSFETs and checking which one shorted. The 858D didn't explode, which was nice. This was my first time working with surface mount components, so needless to say it took a lot of trial and error to remove all 3 MOSFETs with the last one being the culprit of the short. Pictures are here.

    Only 1 MOSFET has V_IN and V_SW pins shorted to ground. Removing that one got rid of the short between the power pads altogether. The other 2 MOSFETs and the brand new ones did not have shorted pins, which is great I suppose. @Khenglish was right about 1 MOSFET being the issue. I also noticed that this shorted MOSFET had a noticeable amount of silicone oil on the package where the V_IN and V_SW pins are supposed to be. Could the factory default thermal pads be the culprit?
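
    For anyone repeating this check, the pass/fail logic can be sketched in Python. This is only an illustration: the resistance readings, the 5 Ohm threshold, and the function name are made-up assumptions, not measurements taken from this card.

```python
# Hypothetical pin-to-ground readings in ohms, taken with a multimeter.
# None of these values come from the thread; they only illustrate the check.
SHORT_THRESHOLD_OHMS = 5.0  # assumption: a few ohms or less counts as a short


def find_shorted_fets(readings, threshold=SHORT_THRESHOLD_OHMS):
    """Return the names of FETs whose V_IN or V_SW reading is at or below threshold."""
    shorted = []
    for name, pins in readings.items():
        if any(pins[pin] <= threshold for pin in ("V_IN", "V_SW")):
            shorted.append(name)
    return shorted


example_readings = {
    "FET1": {"V_IN": 120000.0, "V_SW": 85000.0},
    "FET2": {"V_IN": 98000.0, "V_SW": 91000.0},
    "FET3": {"V_IN": 1.2, "V_SW": 0.8},  # the kind of reading a dead FET might give
}

print(find_shorted_fets(example_readings))
```

    A healthy FET should read far above the threshold on both pins, so the exact cutoff matters little as long as it sits well below normal readings.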

    I think I'll clean up the pads and apply leaded solder to them before soldering the MOSFETs back. Not sure when the wick I ordered nearly 2 weeks ago from China is going to arrive.

    I'm not gonna OC the GTX 980M even if I manage to get the thing working again. I'm the kind of person that wants his things to last. That said, I'll still fill out all 6 MOSFET pads just to spread the heat.
     
  36. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    Good to hear it was just a blown FET. The core should be ok.

    Too bad Radioshack no longer exists for flux. You don't need very good flux for soldering FETs, so you'd just spend a couple bucks and not have to wait.

    Btw, usually it's best to just get flux from a USA Ebay source. Amtech 4300 is usually the go-to solder. Lacks nasty chemicals which sometimes show up in the China solders.

    Don't try soldering without flux. Heat transfer from the FET to the pcb will be terrible, so you could easily overheat and kill the FET.
     
    Darker01, woodzstack and Vasudev like this.
  37. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Thank you. I already ordered Amtech NC-559-V2 from here since it should be genuine. This is the flux that's going to arrive on Friday.
    I sure hope that I didn't damage the 2 functional FETs pulled from the PCB. I was still getting used to the hot air station while removing those.

    EDIT: Apparently I can still request more of the CSD87350Q5D MOSFETs. I guess I don't have to worry about reusing the original FETs. Knowing how fast TI ships things I think I'll resume the project on Monday or so.
     
    Last edited: Nov 24, 2017
    Vasudev likes this.
  38. Khenglish

    Khenglish Notebook Deity

    Reputations:
    799
    Messages:
    1,127
    Likes Received:
    979
    Trophy Points:
    131
    If you put new FETs on the original FETs' pads then you'd need to add solder to the board. I'd just reuse the originals if their solder is still in the correct spots.
     
  39. NGX83

    NGX83 Notebook Enthusiast

    Reputations:
    0
    Messages:
    13
    Likes Received:
    4
    Trophy Points:
    6
    Ok. For now, with only 3 FETs, I'm pretty sure my GPU can't pull more than 130W without some black screens.


    Ok, no problem. I'm waiting to see how it goes with adding these FETs.
     
  40. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Hello everyone.

    Flux arrived on Friday as expected, but the wick didn't. Regardless, I decided to proceed anyway with makeshift wick from a spare composite video cable. The wick wasn't perfect, but it helped get rid of the extra solder on the center pads after I added leaded solder. I went through this to make soldering the MOSFETs easier since leaded solder melts at a lower temperature than the lead-free solder on the board.

    To the point: I added the MOSFETs, checked to see if the pins made contact, repositioned a few MOSFETs, removed excess solder with the soldering iron, wiped nearly all of the leftover flux off with IPA, dried the card with hot air, replaced the crummy thermal pads with Thermal Grizzly Minus Pad 8, installed the card, and booted the laptop.

    Laptop booted.

    There were a lot of things that could have killed the card for good during the past 5 months, ranging from physical damage to ESD. It still amazes me that I, with my lack of expertise in electronics and my janky setup, somehow managed to get the card repaired. Overheating MOSFETs, stripping pads off of the PCB, burning surrounding components, blowing small capacitors into oblivion, jamming the soldering iron tip where it shouldn't be, not drying the card well enough, not pressing down the MOSFETs to squeeze out excess solder, giving myself a 2nd-degree burn, etc. were concerns that troubled me up to the point of booting the laptop up. I was prepared to be disappointed, but I guess setting the expectations low made seeing the laptop boot after 5 months all the more satisfying.

    After using DDU, I installed the driver. That got GPU-Z to recognize the card, and the specs looked about the same as other CLEVO GTX 980M cards.
    The next step was to check if the card is stable. I ran Heaven benchmark for about 5 minutes, and the card drew ~95-104W the entire time. I didn't notice any artifacts or anything unusual on the screen. At this point I decided to stop and have some food since I had been working non-stop for about 6 hours.

    I'll do more stability testing on the card in the near future, probably tomorrow. I'll occasionally post follow-up test results here after that. I have yet to decide whether or not I want to sell this laptop to recoup the cost of the Sager NP9870-S. As far as I'm concerned, the GTX 980M won't be accepted at the CLEVO repair center with its torn serial number and tampered PCB.

    Anyways, I believe thank-yous are in order. This repair wouldn't be possible without @Khenglish's expertise with MXM GPU modification. His diagnosis was spot-on, and through that I saved quite a lot of $ by repairing the board myself (858D ~$50, soldering station $35 off of Craigslist, flux ~$25, FETs were free samples).
    I would also like to thank @Danishblunt and @woodzstack for suggesting replacement thermal pads. The Thermal Grizzly Minus Pad 8 is much more robust than the stock pads. Couldn't find anything to replace the pad on the row of MOSFETs though.
    As for everyone else, thank you for staying with me for the ride. It's been one hell of an adventure going from knowing nothing about what caused my GTX 980M to fail to burning the card with Heaven benchmark.
     
    Last edited: Nov 25, 2017
  41. woodzstack

    woodzstack Alezka Computers , Official Clevo reseller.

    Reputations:
    1,201
    Messages:
    3,495
    Likes Received:
    2,593
    Trophy Points:
    231
    Yeah, I've been keeping a good stock of the Grizzly pads. I fell in love with them about a year ago and always offer them at cost, which frankly is not very expensive at all. What I like about them is that they last: they do not break up or become brittle when you change them, and they do not squish too much either, so you don't get indents that make them thinner over time. Being squished does make them somewhat thinner, but it doesn't ruin them or leave them paper thin.
     
    Darker01 likes this.
  42. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Hello everyone.

    I ran Heaven benchmark 15 times and had HWInfo64 log GPU stats the entire time. The card consistently drew ~100W for about 1 hour total with about 10-20s of down time between each test. Given that 5 of the 6 added MOSFETs were brand new, I think the card should last for quite some time now. Results + log file are attached.
    It struck me as odd that having AA disabled during the first run caused the card to cap out on power consumption. Runs 2 to 14 all have AA enabled.
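
    For readers who want to pull similar numbers out of their own HWInfo64 CSV log, here is a minimal Python sketch. The column name "GPU Power [W]" and the sample rows are assumptions made up for illustration; the real column header depends on the HWInfo64 version and sensor layout, so check the header row of your own log first.

```python
import csv
import io
import statistics

# Assumption: the real column name depends on the HWInfo64 version and sensors.
POWER_COLUMN = "GPU Power [W]"


def summarize_power(csv_text, column=POWER_COLUMN):
    """Return (min, mean, max) of the numeric values in one column of a CSV log."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric entries
    if not values:
        raise ValueError(f"no numeric data found in column {column!r}")
    return min(values), statistics.mean(values), max(values)


# Made-up sample data, not taken from the attached log file.
sample_log = """Date,Time,GPU Power [W]
25.11.2017,10:00:00,95.0
25.11.2017,10:00:02,104.0
25.11.2017,10:00:04,101.0
"""

low, mean, high = summarize_power(sample_log)
print(f"min={low:.1f} W  mean={mean:.1f} W  max={high:.1f} W")
```

    To use it on a real log, read the file into a string and pass it in along with the correct column name.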

    I guess that's it for now. I think I'll Dremel the bottom plate to improve air flow somewhat and then build a cooling pad for the P750ZM. I'll compile all of the information I learned thus far and add it to the 1st post sometime next week.
     

    Attached Files:

    NGX83 likes this.
  43. NGX83

    NGX83 Notebook Enthusiast

    Reputations:
    0
    Messages:
    13
    Likes Received:
    4
    Trophy Points:
    6
    Such a great job. Thanks a lot for your feedback and your pictures. I can now do this for my GTX 980M to increase overclocking stability.

    Thanks again, and have fun with your modded GTX 980M.
     
    Darker01 and Vasudev like this.
  44. Kostasgreece

    Kostasgreece Notebook Guru

    Reputations:
    2
    Messages:
    53
    Likes Received:
    2
    Trophy Points:
    16
    Hello. I read all your comments and I have a problem with my 980M. I bought it faulty with a short circuit. I replaced the shorted up1642p and 1 shorted CSD87350 MOSFET out of the 3, but sometimes the laptop shuts down after an hour of gaming. Is there any solution for me?
     
  45. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Hard to tell without things like GPU temperature and power draw, but if the laptop shuts down or black-screens under high load, I think it is a hardware problem. Could be poor contact between the replaced MOSFETs and the original thermal pads. The original ones on the heatsink were falling apart when I took the card out, and I had to buy replacement pads for them.
     
  46. Kostasgreece

    Kostasgreece Notebook Guru

    Reputations:
    2
    Messages:
    53
    Likes Received:
    2
    Trophy Points:
    16
    Thank you for your answer. I'm changing the thermal pads now. When I say it shuts down, I mean the whole system powers off and I have to push the power button again. The GPU temperature maxes out at 70-80 degrees Celsius. The laptop is a Eurocom P150EM. Is it better to use it without the battery?
     
  47. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    My 980M maxes out at 65°C during Heaven Unigine benchmarking. 70-80°C seems high to me. Try reapplying the thermal paste.
     
  48. Kostasgreece

    Kostasgreece Notebook Guru

    Reputations:
    2
    Messages:
    53
    Likes Received:
    2
    Trophy Points:
    16
    I use Xilence thermal paste and I'm waiting for the MX-4 to arrive. What is the brand of your laptop?
     
  49. Darker01

    Darker01 Notebook Consultant

    Reputations:
    46
    Messages:
    118
    Likes Received:
    81
    Trophy Points:
    41
    Clevo P750ZM.

    I did an hour-long stress test with Heaven Unigine. Saved the results of the benchmarks here.
     
  50. Kostasgreece

    Kostasgreece Notebook Guru

    Reputations:
    2
    Messages:
    53
    Likes Received:
    2
    Trophy Points:
    16
    Your Clevo is a newer generation than my laptop. I use the Prema mod BIOS to get the 980M GPU working. In Far Cry 5 I get 76 degrees Celsius.
     