The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    Time to eat crow - the real problem with ssd&sandforce

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by nipsen, May 23, 2013.

  1. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    So I've been pretty adamant about how a current ssd is effectively more reliable than a mechanical drive, and posted about that on the forum a lot. The truth is that I've seen this demonstrated several times - that a mechanical laptop drive has given out after a couple of years when suddenly the entire disk is full of bad sectors for basically no apparent reason. What happens here is that the reading mechanism simply stops working, and this can happen relatively quickly, even on a fairly expensive disk, and even if you're not terribly careless with your drive.

    The SSDs don't have that problem, having no mechanical parts. And like I said, it's demonstrably true that ssds can take a lot of physical beating.

    It's also true that the advertised problem of the memory giving out after so many write and read operations is a non-issue for personal users, and that this has been the case for a very long time. Even the early card slot designs won't simply give out even after many years. And when it happens, it will be in irregular patterns: the controller software simply tags an unworkable piece of the flash chips as if it were a bad sector on a regular disk, and you can continue using it with the same usually very minor issues.

    No, the real problem with an ssd is something I didn't know existed at all. Apparently this is actually the heart of the problem OCZ had with their drives that were recalled. And it is the same issue Corsair had with the drives they let people RMA before they "broke". It's also what Intel has been known for having issues with.

    What happens is that the Sandforce controllers that serve as the interface between the actual "disk" design and the computer have several weaknesses. Some of them aren't much of a problem, such as queue depth, curious addressing designs, etc. These can cause blue-screens or protection faults in Windows, they can cause errors in extremely rare situations (actually rare - situations that practically have to be programmed specifically to happen), and even then they are recoverable.

    There is one unrecoverable error that can happen with these drives, though. It is actually possible that a sandforce 2xxx series controller can lock in a panic mode that is impossible to get out of. This won't happen under stress-tests or anything having to do with the performance of the flash areas or the throughput of the drive or the sata-controller. But it can happen if the system doesn't recover cleanly from hibernation. A wrong enable/disable cycle, and the disk essentially disappears from the system, and cannot be recovered again.

    So basically what I had happen is that after using a Corsair disk with a SandForce 2 controller for about a year, with no issues, and great performance - it suddenly gave up the ghost after the computer revived from sleep. There's no warning, and no recovery afterwards - the disk is dead, because the controller has entered a panic mode. It cannot be accessed for low-level formatting - it is simply not present, since the controller refuses to "unlock". Poof, all data lost. It's still there - but cannot be accessed.

    In other words, the real problem with ssds has nothing to do with the performance, reliability or longevity of the actual disk -- but the controller software from SandForce. Corsair has admitted that some of their drives had this problem. Intel and OCZ did the same with some of their models. But what I had was a current drive with an updated and allegedly safe firmware.

    Nevertheless, when researching this now, the "controller is locked" issue is a known problem with the entire Sandforce line. It allegedly affects, on "rare occasions", any drive that uses it. But loads of expensive drives still currently being sold have this ticking time-bomb issue with hibernation/recovery. And the only reason we don't hear much about it is that the usage patterns of people who run enterprise setups rarely involve using hibernation/suspend at all.

    Private users, on the other hand, do use this feature, and end up being met with a company representative who insists this is a "rare issue" that only affects a tiny, tiny little part of their customer base. Which is statistically true, very likely. But nevertheless, it is an issue that will occur in the right circumstances as certainly as int 21h.

    But honestly -- that a problem like this can actually be there on all these drives (even though the "advertised" issues are non-issues), and not actually be consciously taken care of until now? Crazy.

    The next ssd I buy will in other words not have a Sandforce controller.
     
  2. HTWingNut

    HTWingNut Potato

    Reputations:
    21,580
    Messages:
    35,370
    Likes Received:
    9,877
    Trophy Points:
    931
    It's not just Sandforce, but every controller out there, maybe not a sleep/hibernation issue, but the controller just not responding or dying. The controller dying is the main reason for SSD failure and warranty returns. I've had three drives die this way, a Kingston V100 (JMicron), an Intel X25 G2 (Intel's own controller) and an OCZ (Sandforce). All were rapidly replaced under warranty though. But the laptop just wouldn't boot up, didn't recognize the drive, like it didn't even exist. Everyone freaks out about write/erase durability but that's a far cry from what the real concern is. If my drive can no longer write but I can read, I'm ok with that. But if the controller dies, there is no way to retrieve data, period, or even boot a machine.

    The three examples I showed were earlier models no doubt, and happened in three different PC's so it's not like it was a single PC issue either. On the other hand, newer SSD's like my Samsung 830, 840, Crucial M4 so far have been running strong and no issues to date. But just another reason why users need to back up their data and their machines, on a regular basis.

    Controller locked I've resolved by performing a hot plug, basically unplug and then plug back in with the machine powered up.

    I'm no fan of Sandforce by any means, but a bricked controller is not unique to Sandforce alone.
     
  3. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    ..ouch. Some fix. >_< Thanks. I'll try and see if it works.

    But good point, that the controller locks probably aren't unique to Sandforce2xxx. Even if I haven't seen that hibernate crash elsewhere. It's still a strange way to design it.. Think it could just be that I triggered some "rest/sleep" function that usually triggers right after a power-down? Some driver used an acpi/ahci call of some kind out of order..?
     
  4. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    In 1995 when MS started pushing 'plug and pray', sleep, hibernate and all the other 'green' features I was excited for about 10 minutes (until the brand new system I tried it on didn't 'wake up' and had to be hard reset...).


    Sleep, hibernate, and almost all other such features have been immediately turned off after a clean Windows install to avoid such situations where hardware and/or drivers/software is not 100% compatible with the power states that have been the 'standard' for almost 20 years now.

    With SSD's, I immediately found that they continued to be a problem (I thought I could leave the power states at the defaults because I was simply testing the SSD...) and in production use, I made sure to disable them as I always have 'as usual'.


    Interestingly, with the ability to have 16GB+ RAM on a notebook, the ability to disable the pagefile has also helped immensely with the stability of the systems I run***.


    What is very clear to me is that SSD's don't introduce new problems (with regards to power related issues) - but rather, they amplify any problem(s) that might exist in any given configuration.


    The best 'tweak' for SSD's is to disable the sleep/hibernate functions immediately - not that they are at fault (or not) on their own - it is the underlying (for decades) base that has never been 'fixed' properly to operate (without question or concern) how it should.
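    For anyone who wants to script that tweak rather than click through the power options, here is a minimal sketch. It assumes Windows and the stock `powercfg` tool; the switch spellings below are the commonly documented ones and may differ slightly between Windows versions, so treat the command list as an assumption to check against `powercfg /?` on your own machine. The function name and the `dry_run` flag are made up for this example, and by default it only prints what it would run.

```python
import subprocess
import sys

# Assumption: commonly documented powercfg switches; a timeout of 0 means "never".
# Spelling may vary between Windows versions, so verify with `powercfg /?` first.
COMMANDS = [
    ["powercfg", "/hibernate", "off"],
    ["powercfg", "/change", "standby-timeout-ac", "0"],
    ["powercfg", "/change", "standby-timeout-dc", "0"],
    ["powercfg", "/change", "hibernate-timeout-ac", "0"],
    ["powercfg", "/change", "hibernate-timeout-dc", "0"],
]


def disable_sleep_and_hibernate(dry_run=True):
    """Print the commands (dry run) or execute them on Windows.

    Returns the commands as plain strings either way.
    """
    if not dry_run and sys.platform != "win32":
        raise RuntimeError("powercfg is only available on Windows")
    for cmd in COMMANDS:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Needs an elevated (administrator) prompt to succeed.
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in COMMANDS]


if __name__ == "__main__":
    disable_sleep_and_hibernate(dry_run=True)
```

    Calling `disable_sleep_and_hibernate(dry_run=False)` from an administrator prompt would actually apply the settings; the dry run is the default so nothing changes by accident.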


    ***A common cure/way to fix random Windows problems/glitches was to periodically disable the pagefile, reboot a couple of times - making sure that the file was actually deleted... and then re-enabling the pagefile again. This may be needed to be done 3 or more times in a year on some systems... almost as many times as Windows (XP and Vista) needed to be re-installed in the same time frame...



    While I've never believed an SSD is/was more reliable than a HDD... I do know that proper power delivery (and proper startup/shutdown sequences) is important to making them almost as reliable as the HDD's they're replacing.
     
  5. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    *nods* Smartest thing I've heard anyone say about ssds in a long while. :)

    Have to say, though.. I've been running this laptop on linux, sleeping and waking, without any problems for almost a year now. See, I use the linux boot while doing any work on it, so I kind of only boot Windows once in a while. And I don't tend to sleep the computer in windows unless I'm on battery (which happens once in a while, or else it's back in linux). So when I think about it, I might actually never have forced it to sleep while on power before.. until the entire thing crashed..

    Just boggles the mind that that actually was possible..

    (I mean, I can see the process when hibernating in linux. It's very simple, and very similar to anything with an hdd. Only difference is that the controller renegotiates speeds programmatically instead of having a reset controller, that then just reports a specific value, etc. Intuitively, you'd think that is actually safer as well, since the controller has to be present as a device before anything in the OS assumes anything is there. So no process or call will ever be sent to cause an interrupt before the device is ready. Meaning that presumably there's no way to get a status wrong, logically, and then get a signal sent or a flag set that is actually not correct..

    But it somehow still managed to get done in Windows. Think it could be Asus' "superhybridengine" scripts that did it? That they're able to run monitor commands on wake-up that might pre-empt normal calls.. and cause the controller to get an uplink signal before the power (for dram - could be using some voltage set script) was ready, that sort of thing.. Maybe SSDs and the dram controller will obey the same "undervoltage" protection during "unsleep", for example..)
     
  6. Ajfountains

    Ajfountains Notebook Deity

    Reputations:
    700
    Messages:
    923
    Likes Received:
    139
    Trophy Points:
    56
    Reading this has me a bit worried. I have two systems, one with ssd and one with hdd, and I use the sleep function all the time. I have yet to see any performance issues (5 months now). I do still regularly shut down and reboot about once a week. Samsung 840 256gb and a hitachi 750gb. Am I just lucky?
     
  7. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    Considering that I haven't seen this issue fixed for almost two decades, it doesn't boggle my mind. lol...

    I also wouldn't be so sure of the robustness in Linux/Ubuntu either - as your situation shows?


    The problem is that there are so many 'links' between the O/S and the actual device - not all possibilities can be debugged/accounted for in 100% of the cases - at least not in my lifetime.

    Unless of course you control the hardware like apple does (unfortunately, they also control their users likewise...)... but even there I have seen 'sleep issues' with apple hardware. And now that I think of it... apple actually needed to make this work because their systems run so hot (when they're on).


    All I know for sure is that a stable platform is a REQUIREMENT for me, not an optional bonus. That is why sleep/hibernate, system restore and lately the pagefile are disabled on all the systems I would like to depend on (whether they're mine or not).
     
  8. Ajfountains

    Ajfountains Notebook Deity

    Reputations:
    700
    Messages:
    923
    Likes Received:
    139
    Trophy Points:
    56
    I regularly back up my data, so should the drives crash I am ok. I always thought hibernate was the issue, not sleep.
     
  9. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    ..Mm. No, I can follow the log, see. Even if you run out of power and you need to restart.. or you wake after the power has been completely off, etc., the log essentially was the same. I had a wifi driver I tried to debug a while back, so I read a lot of wake logs, and experimented a lot with it. Never caused the drive to drop or the kernel to panic, because it always initialized the drive the same way, then negotiated speeds, etc. That's why this was so completely out of the blue that it croaked after a windows wake. I just didn't think it was possible to do it...
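    To make the "follow the log" part concrete, here is a rough sketch of pulling the SATA link negotiation lines out of a kernel log after a wake. It assumes the usual libata message shape ("ataN: SATA link up X Gbps ..." / "ataN: SATA link down ..."); the sample text is made up for illustration and is not taken from the machine discussed here, and the function name is hypothetical.

```python
import re

# Assumption: libata-style messages such as
#   "ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
#   "ata2: SATA link down (SStatus 0 SControl 300)"
LINK_RE = re.compile(r"(ata\d+(?:\.\d+)?): SATA link (up|down)(?: ([\d.]+) Gbps)?")


def sata_link_events(log_text):
    """Return (port, state, speed_gbps) tuples for every link message found."""
    events = []
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            port, state, speed = match.groups()
            events.append((port, state, float(speed) if speed else None))
    return events


# Illustrative sample only, not a real capture.
SAMPLE = """\
[ 1234.567890] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1234.890123] ata2: SATA link down (SStatus 0 SControl 300)
[ 1235.000000] some unrelated line
"""

if __name__ == "__main__":
    for event in sata_link_events(SAMPLE):
        print(event)
```

    Feeding it the output of `dmesg` after each resume would show whether the drive's port came back up and at what negotiated speed; a port that stays "down" after a wake is the "disk is simply not present" symptom from the first post.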

    You hear about people getting constant bsods because of ssd controller issues as well. And then you expect the drive actually can deal with just losing power unexpectedly, or getting a bunch of junk sent to it, etc. And it clearly can deal with that... I've really punished this drive before too, with reformats, etc. It didn't even croak when I destroyed the file-system and the process stalled/power lost before writing the new table either. The controller was still present, even if the drive was full of junk.

    But the wake/hibernation thing causing a controller "lock" actually was mentioned several places on Corsair's pages as an issue. Some of the forum threads from 2011 when the Force3 drives were recalled also mentioned that as a possible "cause" for the recall. So I don't think it's actually as completely out of the blue as it seemed to me, at least. Presumably there is a way to send sequences of calls, or set flags, in a way that causes the controller to shut down as a "safety" measure, in order to not destroy the data (and I'd suspect that it's a very specific situation as well, since like I said, it's possible to completely wreck the drive's information and partition table, etc., and still have the drive be present in the system).

    So if you're well placed enough and know enough about controller software, I kind of doubt this would be as mysterious and random as it perhaps seems to be.
     
  10. vsg28

    vsg28 Notebook Consultant

    Reputations:
    59
    Messages:
    248
    Likes Received:
    7
    Trophy Points:
    31
    Huh, I was never aware the newer SSDs such as my Samsung 830 were also prone to the sleep/hibernate issue. I put my laptop to sleep all the time and I have never had any issue so far. I guess I should just shut down each time now. Just need to re-stream my videos each time :D
     
  11. Aeny

    Aeny Notebook Consultant

    Reputations:
    110
    Messages:
    169
    Likes Received:
    93
    Trophy Points:
    41
    If I recall correctly that's almost how my Vertex III died (after ~4000 hours and ~2TB writes): it didn't wake up from sleep mode with the laptop. Turned the laptop off -> on and the drive was gone. (Laptop gets put to sleep about 6 times/weekday between classes and rebooted roughly once a week.)
    However, my Corsair Force III that I recently replaced (after ~6000 hours and ~4TB writes and a few days before warranty was over) had another controller issue. It looked like it failed to read properly from its memory chips, causing the drive to mark random places as faulty and ruining the data on them, picking new random places every reboot. Hopefully my 840 will last longer than both SSDs before it (both died within a year) since I essentially have it without any warranty now.
    I'm not too worried though, backup frequently, learned that the hard way when my Vertex III died. And this thread just reminded me of that so I'll go do that now :thumbsup:
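    For a sense of scale on the write figures above, a back-of-the-envelope sketch. The hours and terabytes are the ones quoted in this post; everything else (a 120GB drive, 3000 program/erase cycles per cell, a write amplification of 3) is a hypothetical assumption picked only for illustration, not a spec for either drive.

```python
def gb_per_power_on_hour(written_tb, hours):
    """Average host writes per power-on hour, in GB (1 TB taken as 1000 GB)."""
    return written_tb * 1000 / hours


def fraction_of_write_budget(written_tb, capacity_gb, pe_cycles, write_amplification):
    """Share of a rough flash write budget consumed by the host writes.

    Budget = capacity * P/E cycles; host writes are scaled by write amplification.
    All three drive parameters are assumptions supplied by the caller.
    """
    flash_writes_gb = written_tb * 1000 * write_amplification
    return flash_writes_gb / (capacity_gb * pe_cycles)


if __name__ == "__main__":
    # Figures from the post: ~2TB in ~4000 hours, ~4TB in ~6000 hours.
    for name, tb, hours in [("Vertex III", 2, 4000), ("Force III", 4, 6000)]:
        rate = gb_per_power_on_hour(tb, hours)
        # Hypothetical drive: 120GB, 3000 P/E cycles, write amplification 3.
        used = fraction_of_write_budget(tb, 120, 3000, 3.0)
        print(f"{name}: {rate:.2f} GB/hour, {used:.1%} of the assumed budget")
```

    Under those assumptions both drives had used only a few percent of their write budget when they died, which fits the point made throughout the thread: it was the controller that failed, not worn-out flash.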

    Oh and if there was somehow a lost sleep command for the SSD that gets executed when the pc wakes up because of some delay, I'd try putting the locked SSD in a PC which is in sleep mode and see if waking that up would wake the drive too. But truth be told I'd have no idea what I'm doing since I've got no clue how SSDs (firmware)/hibernate/sleep work internally.

    ~Aeny
     
  12. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    nipsen,

    I stand corrected. I thought that this happened to you in Linux/Ubuntu... what version/SP of Windows then?
     
  13. HTWingNut

    HTWingNut Potato

    Reputations:
    21,580
    Messages:
    35,370
    Likes Received:
    9,877
    Trophy Points:
    931
    No, not lucky, just fortunate.

    I would not say SSD is more reliable than HDD, just that it's more durable. You can shock an SSD while it's running and have a 99.99% probability (made that up) of it not failing, whereas a hard drive is quite likely to fail. But as far as reliability, controller failure is as bad as it can get, and it's not fun. That's why all my most important documents are backed up daily.
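    A daily backup like that can be as simple as copying a folder into a dated directory on another drive. The following is only a minimal sketch of the idea, not the tool used in this post; the function name and paths are hypothetical, and it needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_folder(source, backup_root, today=None):
    """Copy `source` into <backup_root>/<YYYY-MM-DD>/<source folder name>.

    `backup_root` should live on a different physical drive than `source`,
    otherwise a dead controller takes the backup with it.
    """
    source = Path(source)
    stamp = (today or date.today()).isoformat()
    target = Path(backup_root) / stamp / source.name
    # dirs_exist_ok=True lets a second run on the same day refresh the copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Hypothetical usage, e.g. from a scheduled task:
#   backup_folder(r"C:\Users\me\Documents", r"E:\backups")
```

    Each day gets its own folder, so an SSD that silently corrupts files (as described earlier in the thread) doesn't overwrite the last good copy.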
     
  14. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    It's also been unusually resistant to things like powering the device up with an external sata to usb controller, dropping power while read/write operations happen, disconnecting the controller while powered on, etc., etc. :D Like I said, I really thought it wouldn't actually break like this.

    It's almost as if you need to basically have started a low-level interface with the controller software to cause any problems.. That you need to do something clearly "illegal" to trigger a firmware panic..

    Windows 7, sp1, up to date, nothing special or curious running (other than the entire Power4GearAwesome app Asus uses for power profile management..). I've turned off the page file, etc. And this was just a sleep-then-wake cycle while still on AC power. It's possible to hibernate, even if I've not set that to happen.

    I kind of wonder if there's something going on with an attempt to go from sleep to hibernate when the computer has been asleep for a period of time, though.. A partial wake-up?
     
  15. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    I have used the Asus P4G SE power manager - but ultimately uninstalled it as it gave worse battery life (and system 'responsiveness') with it installed.

    Maybe it is also making some 'illegal' power calls too...
     
  16. OtherSongs

    OtherSongs Notebook Evangelist

    Reputations:
    113
    Messages:
    640
    Likes Received:
    1
    Trophy Points:
    31
    Meaning that you were able to get the laptop functional again?

    Would you kindly provide more specifics on what you actually did, thanks.

    Did this work on all 3 of your laptops with dead SSD?

    When you have a laptop with a problem SSD, how does one figure out if anything can be done by oneself to get the laptop functional again?
     
  17. Karamazovmm

    Karamazovmm Overthinking? Always!

    Reputations:
    2,365
    Messages:
    9,422
    Likes Received:
    200
    Trophy Points:
    231
    I do a safe boot on the Mac, which tests the hardware, or just use the hardware check utility that they have.
     
  18. HTWingNut

    HTWingNut Potato

    Reputations:
    21,580
    Messages:
    35,370
    Likes Received:
    9,877
    Trophy Points:
    931
    No, only for "locked" SSD's, not dead controllers. All I did was turn on the machine, unplug the SSD power for a few seconds, plug it back in, and reboot. I think. It's been a while. I read about it somewhere at the time and used it as a last ditch effort and it worked. I don't recall what brand or what SSD it was. I've used so many now, lol.
     
  19. OtherSongs

    OtherSongs Notebook Evangelist

    Reputations:
    113
    Messages:
    640
    Likes Received:
    1
    Trophy Points:
    31
    You mean you physically pulled a 2.5" SSD out of the laptop drive bay, while the laptop was running? And then to top that off, you shoved the same 2.5" SSD back into the laptop (which was still "on") and then forced a reboot???

    Given that the SSD might already be dead, it might not matter for the SSD. But doesn't doing stuff like that carry some risk for the laptop itself?
     
  20. Fat Dragon

    Fat Dragon Just this guy, you know?

    Reputations:
    1,736
    Messages:
    2,110
    Likes Received:
    305
    Trophy Points:
    101
    I've probably put my Envy 14 to sleep 2,000 times or more in 33 months of ownership (though only a couple dozen hibernates), and I've never had a problem with the SSD. Can problems and failures occur? Yes. That doesn't mean they're all that likely to occur. As with any computing situation, you should always have important data backed up, important installations reclaimable, and a backup plan if you can't afford much or any downtime with your machine. As long as you're prepared for problems like this, which are a reality but hardly an inevitability, there's no real reason to worry about them most of the time.

    For a data point, I've personally owned six computers over the last decade, and I've been a regular user of and often the household "IT guy" for probably 15-20 more, and I've never had to deal with a drive failure on an internal drive. It's actually always been my USB backup drives that have failed or started throwing bad sectors at me, where I've had problems (but never actual failures) with three out of six backup drives I've used.