Sandforce + Sandy Bridge = WTF?

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
I built a Sandy Bridge PC (Gigabyte H67 motherboard) with a Corsair Force (Sandforce SSD). After 3 weeks, it's close to death. Because so many of the components are newly released, I'm having trouble isolating the actual cause.

After a lot of experimentation, what I think is happening is that the SSD is disconnecting (or being disconnected) after an unknown amount of access. At this point, Windows does a very good impression of someone being slowly strangled, after which it either crashes to a BSOD or just quietly drops the SSD from the drive list.

On restart (even with the reset button), the motherboard no longer wants to play with the AHCI SATA devices (there's also a Samsung F4 and an Optiarc DVD-RW), or at least, it sits there for about a minute, then gives up trying to find the SSD.

So I tried the SSD in another PC (a 4-year-old AMD box) and it worked fine. I was able to 'Check' the disk, extract the SMART data and run an 'extended' test (according to HDDScan). I tried it twice and at no stage did it miss a beat. Everything on the drive appeared intact.

"Aha!", I hear you cry. Intel H67 chipsets have a latent SATA bug, so that explains everything. Unfortunately, it was already plugged into one of the two SATA 3.0 ports that Intel assures us are unaffected by said bug. Just in case, I tried it on two of the SATA 2.0 ports as well, with and without the Samsung F4, and also with a different cable.

The SMART data says there's absolutely nothing wrong with the SSD, EXCEPT that it's experienced 30-odd 'Unexpected power loss' events. So I tried it with three different power connectors, and then with a different power supply.
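One way to tie that counter to the dropouts would be to record it before and after provoking a failure and see whether it moves. Below is a minimal sketch, assuming the attribute table is captured as text from smartmontools (`smartctl -A`) rather than HDDScan, which is what was actually used here. The function names are made up for illustration, and the attribute name `Unexpect_Power_Loss_Ct` is an assumption: check what your own drive's report calls it.

```python
def parse_smart_attributes(report):
    """Parse the attribute table from `smartctl -A` text output.

    Returns a dict mapping attribute name to its raw value string.
    """
    attrs = {}
    in_table = False
    for line in report.splitlines():
        fields = line.split()
        # The table starts after the column header row.
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True
            continue
        if not in_table:
            continue
        # Attribute rows have ten columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Everything from the tenth column onward is the raw value.
        attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def raw_count(attrs, name):
    """Return the leading integer of an attribute's raw value."""
    return int(attrs[name].split()[0])


def counter_delta(before_report, after_report, name="Unexpect_Power_Loss_Ct"):
    """How much a counter grew between two saved reports.

    The default attribute name is an assumption, not taken from the thread.
    """
    before = raw_count(parse_smart_attributes(before_report), name)
    after = raw_count(parse_smart_attributes(after_report), name)
    return after - before
```

If the count goes up by one each time the drive vanishes, the drive itself believes it lost power, whatever the cables and power supply say.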

I managed (with some difficulty) to upgrade both the motherboard BIOS (from F3 to F8) and the SSD firmware (to Corsair v2). The Corsair update is supposed to fix problems where their drive causes a BSOD stop 0x4, and initially, I thought the problem was gone.

But then it came back, and now all I have to do is 'Check' the disk and - if it's a secondary drive - after going through the motions for a few seconds, the drive just disappears from the list. If it's the boot drive, you're more likely heading for the BSOD.

On top of that, after such an event, the motherboard (often) chucks its guts and declares a CMOS checksum error, then proceeds to downgrade to the initial F3 BIOS. After this has happened a few times (and you've managed to claw your way through re-upgrading it), you start to curse the names "Gigabyte" and "Corsair" and hope that unpleasant things might happen to them very soon.

The supplier has put me on notice that they have very few Corsair Force returns, and several of those were bounced back by Corsair as "no fault found". So I wonder HTF I'm supposed to prove there's anything wrong with the SSD. That's the rock.

At this stage, they can't even source a B3 fixed version of the Gigabyte board, and existing stock is exhausted, so that would appear to be the 'hard place'.

I'm leaning towards buying another SSD (big choice here ATM: Corsair or OCZ), but really don't want to be stuck with a near-$300 SSD (AU pricing); I build very few PCs these days and that's a premium component. If I had any kind of confidence, I'd lean on the supplier for a credit (wave arms about, shout a bit), but I can't be certain that it's not the stupid motherboard (which works flawlessly with the Samsung F4). Perhaps I'm over-analyzing the situation - what do you think?
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Already tried that, by changing the settings in the power profile for HDD sleep to "Never".

Can't find any other settings, is there somewhere else I should be looking? It's Windows 7 64-bit.
 

BingBangBop

Storage is cool
Joined
Nov 15, 2009
Messages
667
Not that I know of. That being said, I think I would test whether stopping the machine (not just the HDs) from going to sleep at all helps.

I would also see if your CMOS setup has the ability to delay/stagger drive detection (some SSDs are too fast and some MBs have issues with drive detection because of it). Normally the delay is designed to stagger the power-up of drives to limit the startup power spike, but it is also useful for slowing down the drive detection process for SSDs, allowing them to be detected properly.

My hypothesis is that when the machine awakens from sleeping, it has to re-detect the attached drives and the BIOS is having a hard time of it.
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
There is a setting to shut the power off to the hard drive.
Right-click on the desktop and click on Personalize. Then click on Screen Saver and then on Change power settings. Then click on Change plan settings and then on Change advanced power settings. There in the window is Hard disk, where you select the power settings for the hard drive.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Given that the SSD works in another PC, and the troubleshooting you've done, I'd suspect the Gigabyte motherboard. I'd opt for replacing the motherboard.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
BBB: Sleep is set to 30 minutes - I can reproduce the problem within a couple of minutes of boot up.

I had already tried setting a disk boot delay in the BIOS, although only 1 second; I figured that would be enough to help detection if that was a problem.

Bozo: Thanks, but done that, see previous post.

I have been speculating that it might be something to do with the Hotswap capability of AHCI, i.e. its ability to tolerate disconnection on the fly. Perhaps the SSD disconnects much faster than the mechanical drive, so it's the only one showing the problem? The other PC I tried isn't running with AHCI.

So far, I have one vote from LiamC. Any others? And any ideas how to test this MF?
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
I just let the PC sleep, then restored it. The SSD remained fully accessible, so the problem only manifests when the drive is being used to a reasonable extent - and possibly only in that PC.

Oh, forgot to mention that I tried Safe Mode as well.
 

BingBangBop

Storage is cool
Joined
Nov 15, 2009
Messages
667
If I were to vote on the likely kind of hardware failure, I too would vote MB over the SSD. I agree that the fact that it wasn't a problem with another MB is highly suspicious. You can't totally rule out a compatibility issue between the Sandy Bridge chipset (Intel) or Gigabyte's MB/BIOS and the specific Corsair/Sandforce firmware on that drive, with the flaw being caused by any of those.

There are still diagnostic options that should be checked out. I would want to find out if there is a more recent chipset driver because that is also a possibility; though that is somewhat discounted because the failure transcends a soft-reboot. I would also want to check another SB motherboard (Different brand & model because you are trying to limit the test to the chipset) to make sure it wasn't the SB chipset before condemning the MB or Gigabyte.
 

timwhit

Hairy Aussie
Joined
Jan 23, 2002
Messages
5,278
Location
Chicago, IL
How about booting from a Linux live image and testing the drive that way?
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Good news, of sorts: the drive now misbehaves on the second PC. It seems to be steadily deteriorating and I was unable to get everything off it.

Bad news is that the failure mode really, really sucks, causing fatal instability in Windows and even in Clonezilla, which is Linux based. In fact, even remounted in a USB enclosure, the drive still managed to drag Windows into oblivion every time. The only way I could pull it out of the death spiral was to physically unplug the drive.

Frankly, I can't see how this could possibly be an option in a RAID situation; it might well take down the whole array. And it's worth remembering that the drive self-diagnostics (SMART) completely failed to reveal anything wrong with the drive.

Thanks for the contributions, sorry about the grim result. F**k Anandtech, Sandforce firmware is in no way ready for prime time (Sandforce writes the firmware for all Sandforce drives, regardless of custom differences).
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
So did I, but if you review what I wrote, it was happy as a clam when not asked to do anything. Only when I performed enough reads did it go into its kamikaze routine.

What I didn't say is that as things deteriorated, I began to suspect that it was caused by reading certain blocks on the drive. Multiple locations, so I was unable to work around it like you might on a hard disk drive. This is why I'm so alarmed at the way it (mis)handled the situation. It was hard enough trying to localize the problem with the machine in bits in front of me. How on Earth would you diagnose this if the PC was remote?
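For anyone trying to pin down which regions trigger this kind of failure, here is a rough sketch of one approach (not something that was run in this thread): read the raw device sequentially from a Linux live environment, and write each offset to a log on a different disk before reading it. If the machine locks up, the last line in the log names the region that was being read. The function name `scan`, the 1 MiB chunk size and the one-second "slow" threshold are all assumptions chosen for illustration.

```python
import os
import time

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def scan(path, log_path, chunk=CHUNK, slow_seconds=1.0):
    """Read `path` sequentially, logging each offset before it is read.

    Returns a list of (offset, description) tuples for reads that
    failed or took longer than `slow_seconds`.
    """
    problems = []
    offset = 0
    with open(path, "rb", buffering=0) as dev, open(log_path, "w") as log:
        while True:
            # Record the offset first and force it to disk, so the log
            # still names the region if the machine hangs mid-read.
            log.write(f"{offset}\n")
            log.flush()
            os.fsync(log.fileno())

            started = time.monotonic()
            try:
                data = dev.read(chunk)
            except OSError as exc:
                # Stop at the first hard error: once the drive has
                # dropped off the bus, further reads tell us nothing.
                problems.append((offset, f"error: {exc}"))
                break
            elapsed = time.monotonic() - started

            if not data:
                break  # end of device or file
            if elapsed > slow_seconds:
                problems.append((offset, f"slow: {elapsed:.1f}s"))
            offset += len(data)
    return problems
```

On a live image this might be run as root with something like `scan("/dev/sdb", "/mnt/usb/scan.log")`, where both paths are hypothetical. Reads go through the OS page cache, so only the first pass over a region says anything about the drive itself.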
 

kenkong

What is this storage?
Joined
Mar 29, 2011
Messages
1
6 months ago I built a Gigabyte tower with SSDs in RAID 1 using the X58 chipset and have not had a problem yet. It screams: 20-second startup and less than 30-second shutdown with Windows 7.

So I wanted a new laptop using Sandy Bridge with 2 SSDs in RAID as soon as I saw it available. My vendor says he will no longer build them due to a trend of SSD failures across all makes at this time. They don't believe it has anything to do with the Intel chipset failure/recall because it's been happening with first-generation mobos as well. They say it's actual SSD failure and nothing else. I wonder whether the controller is actually damaging the SSD over time; it's new technology and not yet mature. The vendor says it's mostly Sager, but also a few MSI and Asus (the only three I know of that currently offer this configuration in a laptop). Thought I'd share my experience.
 