Hello,
I have a few problems with my XMG P724. In another thread I tried to find help with my dead GTX 980M, but maybe it was in the wrong section.
I tried to fix my GTX 980M, as it seemed to be a MOSFET problem. Now the card works again, but not normally.
The first time each day I switch on the computer, everything works: it boots normally and the card is recognized.
The card handles everything I throw at it (benchmarks, games, rendering tests) without any problems.
But when I reboot the machine, one of three things can happen:
1. The computer doesn't boot at all: black screen, 22 beeps, then shutdown.
2. The computer boots normally, but the card isn't recognized: Code 43 in Device Manager, no external display.
3. The computer boots normally, but after 20 seconds or so, once the OS has loaded, it gives 22 beeps and shuts down.
When the computer works normally, sometimes the fans don't spin up and the card overheats. All readable sensors are present and working, so I assume there is another sensor, readable only by the vBIOS, that is malfunctioning.
Maybe someone in this forum has knowledge of and experience with this kind of defect and can tell me more about the components involved, especially MOSFET power management in relation to temperature, the VRM third-phase driver, and what the motherboard EC does.
I have read all the technical documents and datasheets, but there is not much to find for the Clevo side.
Thank you
-
Meaker@Sager Company Representative
I went the whole hog and got all 6 VRM chip positions filled, but I did not see this kind of behaviour. What chip did you replace it with? Where did you source it?
For reference.
Last edited: Feb 22, 2018
-
Hi,
I replaced the MOSFETs as shown in your reference picture; I bought them from a seller in Hong Kong on eBay. But the VRM controller, a uP1642, is on the back side of the card.
Its datasheet mentions a temperature-sensing circuit that controls MOSFET power (80% load or high temperature causes throttling with 3 MOSFETs; with 6 this doesn't happen as long as you don't raise the power limit).
http://international.download.nvidia.com/openvreg/openvreg-type2-plus-1-specification.pdf
I think that when I reboot, the card starts in a protection mode and stays at boot voltage, and sometimes during boot the EC loses the signal from the VRM (uP1642) temperature sensor.
But I don't understand the communication with the EC and with the power management chip through the video BIOS. My next step is to replace the uP1642 with one I bought from eBay GB, because I think a false reading could be causing the fan problem and the protection mode.
Maybe you know something about the EC/vBIOS checks; I couldn't find anything for Clevo, or anything more specific about the power-up sequence of the computer.
Thank you
Sorry for my bad English, it's not my native language.
-
OK, I have to give up on this one: the VID trace ripped off the PCB on my second attempt to remove the QFP24 IC. My first attempt at soldering the new chip was good, but I had a little too much solder on the ground contact, so there was a short, and while trying to fix that I had bad luck. There's no way to fix it now.
For anyone who's interested: the problem was that the card's video BIOS could not report the card load (INA3221 IC, current through the MOSFETs) and the voltage regulator temperature (uP1642 IC, which monitors itself) to the EC. That's why the PC sometimes started but the fan didn't spin up at high temperatures (wrong conditions were being reported to the vBIOS and EC), and after a reboot the signal was lost completely, resulting in a boot attempt with 22 beeps and then shutdown.
When Code 43 appeared in Device Manager, the uP1642 was unable to switch from the boot voltage to any other voltage requested by the driver and was running in safe mode.
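To make the diagnosis above easier to follow, here is a minimal triage sketch that just encodes the symptom-to-cause mapping reported in this thread for one GTX 980M in an XMG P724. The symptom keys and the `likely_cause` helper are hypothetical names; the causes are the poster's own conclusions, not confirmed facts or a general diagnostic.

```python
# Hypothetical lookup table: symptoms seen on this one card, mapped to the
# causes the poster arrived at. Not a general MXM diagnostic tool.
SYMPTOMS = {
    "no_boot_22_beeps": "EC lost the load (INA3221) and VRM temperature (uP1642) "
                        "signals; 22-beep alarm, then shutdown",
    "beeps_after_os_load": "same signal loss, but only after roughly 20 s of uptime",
    "code_43": "uP1642 stuck at boot voltage (safe mode); driver cannot request "
               "any other voltage",
    "fan_not_spinning": "wrong load/temperature conditions reported to vBIOS and EC",
}


def likely_cause(symptom: str) -> str:
    """Return the cause reported in the thread, or a note if the symptom is unknown."""
    return SYMPTOMS.get(symptom, "not covered in this thread")


print(likely_cause("code_43"))
```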
With everything OK, my card reached 82°C max under heavy load; when malfunctioning, it went over 90°C easily and quickly, with strange throttling behavior.
I didn't understand all of it, but maybe what I found out is useful for someone with similar problems.
The central part of all this is the uP1642 and its temperature sensor, and my card was not cooled on the back side; Clevo decided the backplate was too heavy or something.
That's why I think this chip got a bit messed up, and I tried to repair it.
Thank you
-
Meaker@Sager Company Representative
I had the chips professionally soldered on. The connections need to be not just making contact but very good, to ensure better load balancing and less wasted power.
-
Yes, soldering the MOSFETs wasn't that hard, but on my card they weren't dead, as I found out later. The power management on the back side was broken; that's why my card could not undervolt stably, even with an ASIC quality score of nearly 78%. Only one step down was possible under load, and even that was not really stable (from 1.025 V to 1.012 V).
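For a sense of scale, the single undervolt step mentioned above can be worked out as below. The power figure assumes the usual first-order approximation that dynamic power scales with V² at a fixed clock; it ignores leakage and is only indicative. The function name is hypothetical.

```python
# Worked arithmetic for one undervolt step (1.025 V -> 1.012 V).
# Assumption: dynamic power ~ V^2 at constant clock (leakage ignored).


def undervolt_step(v_stock: float, v_new: float) -> tuple[float, float, float]:
    """Return (step in mV, % voltage reduction, approx. % dynamic power reduction)."""
    step_mv = (v_stock - v_new) * 1000.0
    pct_voltage = (v_stock - v_new) / v_stock * 100.0
    pct_dyn_power = (1.0 - (v_new / v_stock) ** 2) * 100.0
    return step_mv, pct_voltage, pct_dyn_power


step, dv, dp = undervolt_step(1.025, 1.012)
print(f"{step:.0f} mV step, {dv:.2f}% lower voltage, ~{dp:.1f}% less dynamic power")
```

So the one step that was possible amounts to roughly 13 mV, about a 2.5% dynamic-power saving under that approximation.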
I had problems with this card from day one. In the middle of a game it started overheating because the fan went quiet at 80°C, then spun up again, and there was no more power throttling: strange behavior.
This chip caused the problem all along, but it took some kind of "reverse engineering" to understand it.
https://www.upi-semi.com/en-article-upi-362-1472
-
Yes, I thought the same, but whatever was triggering the 22-beep alarm must be some other kind of sensor on the card. It's the "THALERT" function, triggered by the NVIDIA card.
It's a contact from the MXM slot to the EC, and the notebook behaves the same when no card is inserted. But it was starting with a picture on screen, then after 10 to 20 seconds it gave the alarm and shut down.
The GPU temperature sensor is readable without the driver and showed around 40°C before shutdown. From my understanding, the GPU sensor triggers the temperature-limit function, but the other sensor, shown in the uP1642 datasheet and the NVIDIA documentation, triggers the power-limit function; my card sometimes showed a "no load" limit at 99% load and 85 to 87°C.
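The two-limit idea described above (GPU core sensor driving the temperature limit, VRM-side sensing driving the power limit) can be sketched as follows. This is a hypothetical model for illustration only: the class, function, and the 90°C threshold are assumptions, not values from NVIDIA or uPI documentation; the 80% load figure is the one quoted earlier in the thread.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions, not datasheet values).
GPU_TEMP_LIMIT_C = 90.0   # hypothetical core temperature limit
VRM_LOAD_LIMIT = 0.80     # "80% load" figure quoted earlier in the thread


@dataclass
class Readings:
    gpu_temp_c: float   # GPU core sensor (readable without the driver)
    vrm_load: float     # load as a fraction of VRM capacity (0.0-1.0), e.g. via INA3221
    vrm_temp_ok: bool   # False if the uP1642 side reports over-temperature


def limit_reasons(r: Readings) -> list[str]:
    """Return which (hypothetical) limits would be active for these readings."""
    reasons = []
    if r.gpu_temp_c >= GPU_TEMP_LIMIT_C:
        reasons.append("temp limit")
    if r.vrm_load >= VRM_LOAD_LIMIT or not r.vrm_temp_ok:
        reasons.append("power limit")
    return reasons


# A faulty VRM sensor can force the power limit even with a cool GPU core.
print(limit_reasons(Readings(gpu_temp_c=40.0, vrm_load=0.1, vrm_temp_ok=False)))
```

The point of separating the two inputs is that a bad reading on the VRM side shows up as a power limit even when the core temperature looks normal.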
I tested my notebook with an HD 7970M, and everything worked as it should.
As I said, I didn't understand all of it; that's why I asked for help in this forum.
-
I'm not sure how the whole card is built, so I can't tell you exactly how everything fits together. Maybe people like @Prema or @Khenglish can tell you more about it.
-
Exactly, but it seems it's not the GPU core sensor alone. The service manuals mention several different VGA temperature lines from the MXM slot.
In the case of the AMD RX cards, there is no sensor the EC can read; SMBus is involved, or maybe they use Ic2 instead. I'm sure this is hardware-related, as it was with the older HD 7xxx cards in Alienware and Clevo machines.
Anyway, my card is dead now; when I'm in the mood, maybe I'll try to fix the broken trace on the PCB. But thank you for your interest.
-
Sorry, I meant the I²C interface.
-
Meaker@Sager Company Representative
The EC monitors a variety of signals from different components, it's indeed not just a simple single sensor.
-
Did you measure the graphics card? Is it shorted anywhere? I have a 980M with a shorted voltage regulator and one burned MOSFET. I'd like to try to fix it. What did you measure between the coils and ground?
-
From the inductors (coils) to ground it's 0.1 ohm. What do you mean by the voltage regulator? Given my card's behavior, I suspected the uP1642 IC was malfunctioning.
Take a look at this:
-
-
Yes, the regulator is the same. When I removed it, the short went away. I measured the two CSD87350 MOSFETs, and one of them is shorted. Now I'm waiting for a new MOSFET and voltage regulator from China. I measured the coils to ground now and got 0.8 on the beeper (continuity) setting. Is your card's voltage line shorted? It's the first line on the right.
-
Meaker@Sager Company Representative
Depending on where you are, places like Digikey are great for getting chips.
-
@Kostasgreece
What is your card's behavior? 22 beeps, black screen, sometimes working, Code 43 in Device Manager?
Be careful when you measure the 87350s: the result is only reliable when they are completely discharged. It's easier to check the PCI Express connector for a short.
The large contacts on the right (top view) are the 7-20 V input rail and should read about 6 kOhm; check the polarity first. If it reads a lot less, there could be a short in the MOSFETs. If you have already removed them, simply put your finger on the contacts for a second; your body's resistance will discharge them quickly, and the measurement will then be more reliable.
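The input-rail check described above boils down to comparing a reading against the roughly 6 kOhm quoted in the thread. A minimal sketch, assuming the capacitors are discharged and the meter polarity has been checked; the 50% cut-off for "a lot less" is an assumption chosen for illustration, and the function name is hypothetical.

```python
# Rough interpretation of an ohmmeter reading across the 7-20 V input rail
# contacts. Nominal value (~6 kOhm) comes from the thread; the cut-off
# fraction is an illustrative assumption, not a specification.
NOMINAL_OHMS = 6_000.0


def check_input_rail(reading_ohms: float, cutoff_fraction: float = 0.5) -> str:
    """Classify a discharged, polarity-checked reading of the input rail."""
    if reading_ohms < 0:
        raise ValueError("resistance cannot be negative")
    if reading_ohms < NOMINAL_OHMS * cutoff_fraction:
        return "suspect short: check the MOSFETs"
    return "plausible: no obvious short on the input rail"


print(check_input_rail(6_100.0))
print(check_input_rail(2.0))
```

A reading that creeps upward while you watch usually just means the capacitors are still charging from the meter, which is why discharging them first matters.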
@Meaker@Sager
Yes, those places are great, but very expensive. I tried to buy these MOSFETs here in Germany, but most shops don't sell them; only Mouser did, and they wanted around 20€ delivery for 5 pieces, which is ridiculous.
The Chinese parts work well and are cheap: at $2 a piece with no extra delivery cost, you can't go wrong if you have time to wait.
-
Meaker@Sager Company Representative
You just have to be careful regarding genuine parts.
-
No, my card stopped working: it froze on the same picture for 3 seconds, then the whole system shut down, and with the card installed the laptop was shorted and would not start. I found the first line from the right shorted. When I desoldered the uP1642P and the other MOSFET, the line was no longer shorted. I hope my card will be OK after that.
-
I found the parts from China (AliExpress). In my country, Greece, it's very difficult to find parts like these.
-
Maybe I'll receive the parts tomorrow. I will try to fix it.
-
Today I changed the MOSFET and the uP1642P regulator. The card is working perfectly, but it's the 125 W one, and I think I want a new 230 W charger because the 180 W one is not enough.
-
Hi friend, I have the same issue with my 980M. Thanks for your tips.
-
Did you fix your card?
Clevo P370sm-a fan and graphics card problem
Discussion in 'Sager and Clevo' started by KuroSan, Feb 22, 2018.