Hard disk failures

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Rhetorical question:

Why is it always the day you've decided to do a complete backup that your hard disk fails?

Yesterday, every time I'd power on my computer, after a very short period of time (before even reaching the point it could start booting the OS) the system would lock up solid. After a day of switching out components and stripping the system down, I proved conclusively that the motherboard was dead. I remember thinking at the time, "well, at least the hard disks weren't the problem." I had two disks in the system to act as a crude backup - I manually copied anything remotely important from the first disk over to the second disk. Not a perfect backup method (I know at least one copy of the data should be physically detached from the others, and preferably physically separated by a distance proportional to the importance of the data) but it was better than nothing. The odds of _both_ disks dying simultaneously seemed sufficiently low.

However, having now moved both disks over to another system, I can't get this computer to work with them. I'm beginning to think the power supply in the original computer did something awful to the system - and that it didn't just kill the motherboard. The BIOS on this system is able to see these disks when I go into the Automatic IDE Hard Disk Detection section, but when the system boots up, they aren't seen at all and I get an HDD failure. Likewise, if I configure the BIOS to just auto-detect the drives on startup, they aren't seen at all.

What could I be doing wrong? How often does a drive fail in such a way that the IDE Hard Disk Detection function in the BIOS will see them, but then fail to do so during the boot process?

4 months of work on two hard disks, gone. This really, really sucks.

I wish I could find a job ... then maybe I could afford to get a RAID array and a decent CDRW drive. :(
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Blah!

I managed to get the disks at least detected at startup in an older system, but no OS can actually access them ... they're BOTH fried.

This has to be the worst stroke of luck I've had in years. I can't believe they're both dead. What the heck are the odds for something like that?
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,737
Location
USA
Did you have some type of virus, or maybe a voltage spike??

When you tried them in another system, did you try different EIDE cables, or were they the same? Maybe a cable is bad?
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,737
Location
USA
To answer one of your questions i, I had my WD 6.4 Caviar drive fail on me a little while back and the BIOS continued to recognize it, but Windows would not. It must have had a cluster failure or maybe something else, but it was still detectable.

I find it odd that both of your drives have died at the same time. While it's possible, the chances are very low. Maybe you should play the lottery soon? :)

Jokes aside, I'm hoping in your case that there is something common enough that you are doing which makes it seem as though both drives are bad when in fact they are not.

Can you list all the steps you have tried in more detail? It could be something very small causing the problem, or it may be both drives died due to some condition.

Just trying to help,

Doug
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Thanks for the replies everyone. Here are the details:

System #1:
MSI 440TX motherboard (now fried)
Pentium 233MMX CPU
2 x 128 MB DIMMs
2 x WD Caviar 136AA HDD (now fried) mfd Feb 27/2000 and Aug 2/2000
"Synergy" power supply (now in the trash)

The system was fine two days ago. I turned it on yesterday morning, and everything appeared normal. It had just made it into Windows and was loading the last few programs in the startup when it froze solid. Unusual for that system, but not for Windows, so I rebooted. Froze again, almost immediately. Powered off. Powered back on again after 15 or 20 seconds, and it froze again during the memory test. So ... I figured something had worked itself loose in the system, and began removing the installed components one at a time. As I do so, I begin to notice that, the longer the system is powered off, the longer I can get the system to run. For example, if I just hard-reset the system, or power it back on after just 5 seconds, the system never comes up at all - just black screen as if the reset button were stuck. If I leave the system powered off for about an hour, I might get 20 seconds of functionality before it locks solid. It becomes apparent that the freeze is completely uptime-dependent: I could even be in the BIOS setup screen, but if the time was up, the system would lock solid. Anyway, I eventually strip the thing down to just an ISA VGA card and floppy disk drive and it still doesn't work. I move the now-barebones setup out of the original case and into a new case (complete with different power supply). Still no luck. I swap out the RAM. No luck. I even swap out the processor with a 133MHz one. No luck. Final diagnosis: motherboard has somehow become fried. I figure that either some weak on-board component has finally given out, or the power supply has done something evil. I wasn't sure which at that point.

As I mentioned earlier, I breathed a sigh of relief thinking that at least the disks were safe.

I tried accessing the disks today on a ASUS P2B-D system. The disks were detected by the BIOS "IDE Auto-Detection" option, but not during the boot sequence (got the old "Primary Master failure" message). I was trying both disks one at a time - but both produced the same error. If I set the BIOS to just auto-detect the devices during the boot phase, the system didn't see the drives at all.

So ... with things not looking so great, I tried both drives (again, one at a time) in another system (an old Intel motherboard, with a P75). This time the BIOS took a looooong time to detect the disks - but it did manage to do that. The system managed to load Windows 98 (again takes a looooong time to get to the point where the system is looking to load an OS) from the already installed and functioning 3.2 GB HDD. I'd hoped I'd then be able to look at the suspect 13 GB drive as drive "D". No luck. Once Windows started, there was no trace of the drive in "My Computer". Under "System ... Properties", two drives were listed but with blank descriptions. Neither the computer nor Windows was happy.

I tried booting into a command prompt only, and started FDISK to at least see if the other drive was visible. FDISK immediately posted a flashing warning on its main menu stating that disk 2 was inaccessible. This happened with both drives.

Any suggestions? I have to say this has just not been a good experience so far.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,564
Location
I am omnipresent
Couple of years ago I was on a job moving servers from one location to another. Lots and lots of servers. 70 or 80, I think. We could only move a couple machines at a time, and only on weekends (no interruption in service).

At first we were just taking the machines down, unplugging everything, throwing them in a van, taking them to the new data center and firing them up.

Every weekend, one of the servers wouldn't come back up for whatever reason.

Now, it was springtime. No temperature extremes.
We started padding the carts we were carrying the machines around on, underinflating the tires on the van, avoiding bumps in the road... didn't matter.

In 10 weeks I think we broke 11 servers. Not that me or the other two guys doing the work weren't taking all the precautions we could possibly take. It just happened (the machines, by the way were HP NetServers running Netware5 and NT4).

I guess the moral of the story is, computer hardware breaks. It's inescapable. The best you can do is to be as careful as you can be.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,564
Location
I am omnipresent
Time to break out data recovery software, i.

You might also try using a non-retarded fdisk. Disk Manager in 2000 works, so does the fdisk from Linux.

Other than that, the best you can do is try data recovery software.
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Given the odds of this sort of thing, I guess the other things worth mentioning are that I've been working casually with computer hardware since 1988, and that I'm the second most paranoid person I know with respect to avoiding ESD and physical shocks. The system itself was getting power via a good quality Tripp-Lite isobar surge protector (which is still showing green status lights).
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Thanks Mercutio. You're right of course ... and I really should have had some type of alternate backup around somewhere. I've got a full backup on CDR from mid-February, but as luck would have it, I did some really critical stuff just a couple of weeks later.

Any suggestions for data recovery software? I think I've got Norton Utilities V8.0 for DOS somewhere, but I'm not sure how the heck I'm going to move 6 or GB off those disks.
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Thanks for the advice Mercutio, et al. I managed to get the data I needed back!

Once I'd found a motherboard that could detect the drives, I created a basic system using a spare, functional hard disk and installed Red Hat Linux 7.1. Then I moved one of the damaged hard drives back onto the system as the secondary master. I booted up Linux, and voila! During the OS startup, I could see Linux had detected the problematic disk and even its partitions. A simple "mount -t vfat" command worked, and despite the response time being alarmingly slow, I managed to copy the files I needed over to the good drive before anything else went wrong. I just finished writing additional copies onto CDRs. Phew!
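For anyone in the same spot, the "copy what you can before anything else goes wrong" step can be sketched roughly as below. This is only an illustration, not what was actually run here: it assumes the damaged partition is already mounted (ideally read-only), and the function name rescue_copy and any paths passed to it are made up for the example. The idea is simply to keep going past unreadable files instead of aborting the whole copy.

```python
import os
import shutil


def rescue_copy(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    Hypothetical helper: unreadable files and directories are recorded
    and skipped so one bad sector does not stop the whole rescue.
    Returns (copied, failed) as lists of paths relative to src_root.
    """
    copied, failed = [], []

    def note_dir_error(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append(os.path.relpath(err.filename, src_root))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_dir_error):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                # copy2 also preserves timestamps where the target allows it.
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(target_dir, name))
                copied.append(rel_path)
            except OSError:
                # Read error on the failing disk: note it and move on.
                failed.append(rel_path)
    return copied, failed
```

With the damaged disk mounted at, say, /mnt/damaged (a made-up mount point), calling rescue_copy("/mnt/damaged", "/home/rescued") would return the list of files that made it across and the list that did not, so the failures can be retried later.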
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
i said:
Once I'd found a motherboard that could detect the drives, I created a basic system using a spare, functional hard disk and installed Red Hat Linux 7.1. Then I moved one of the damaged hard drives back onto the system as the secondary master. I booted up Linux, and voila! During the OS startup, I could see Linux had detected the problematic disk and even its partitions. A simple "mount -t vfat" command worked, and despite the response time being alarmingly slow, I managed to copy the files I needed over to the good drive before anything else went wrong. I just finished writing additional copies onto CDRs. Phew!

My expectations are that the drives are physically fine. First of all, your P75 system won't properly recognize a 13 GB drive if it actually gets past POST, so that is why it didn't work properly with that system. If the drives are bad, taking them from one system to the next won't make them work.
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Buck said:
My expectations are that the drives are physically fine. First of all, your P75 system won't properly recognize a 13 GB drive if it actually gets past POST, so that is why it didn't work properly with that system.

But the P75 system was the only system that worked at all with these drives after the original system failed! It's an Intel EV6 series motherboard, with the last BIOS update Intel ever made for it (if that makes a difference). Out of the motherboards I tried (an ASUS P2B-D and another board I haven't identified), that Intel EV6 was the only one that could fully detect the drives (although it took ages to detect the disk on every system startup).

Buck said:
If the drives are bad, taking them from one system to the next won't make them work.

That's a totally reasonable assertion. However, the P2B-D board only detected the drive in the "Auto IDE detection" option in the BIOS setup. However, if I chose the suggestion it made (which was correct), the system would reboot and immediately report a disk failure. I tried this several times: BIOS setup auto-detect works fine, normal system startup subsequently reports a failure. If I selected the option to just auto-detect the drives each time on startup, it didn't see any drives at all. Incidentally, these disks had originally been in that P2B-D system up until a few months ago, and had worked just fine using both configuration methods. None of this made any sense to me, and that's what prompted me to post my original question.

On the other system I tried, the system couldn't detect the drive at all. Not in the BIOS setup or otherwise.

In each case I tried at least two known-good IDE cables.

What can I say? It doesn't make any sense to me. All I know is that the drives are useless now because none of the "decent" motherboards I have can detect them anymore. If that doesn't qualify them as damaged, I don't know what would. Yet I still managed to get the data I needed off them by moving them over to that Intel EV6 board. Coincidence they just started working at that moment (after having been shuffled from system to system and configuration to configuration for a few hours) ? Who knows.

On the upside, this gives me yet another excuse to upgrade my computer to something newer and (hopefully) more reliable. Like an abacus.
 

myself

What is this storage?
Joined
Mar 25, 2002
Messages
29
Hi all :)

i is still a little frustrated about this whole failure thing, and frankly needed a break from the whole affair, so I volunteered to post an update.

Given that the pressure was now off of him (he'd gotten all the data back he needed), he decided to test the drives one more time in the ASUS P2B-D system. The same thing happened as before.

The motherboard detected the drive in the "Auto IDE hard disk detection" portion of the BIOS setup, but when booting, reported:

Primary master hard disk fail

It did this for both drives. This was while testing the drives one at a time, with no other drives present (not even a floppy).

So, he decided to download the latest DLGDIAG program from Western Digital and try and test the drives anyway, one at a time.

With the system stripped down to just a floppy drive and video card, he tested both drives. He broke out a brand new, 80-wire IDE cable that was still sealed in its plastic bag to run these tests...



DRIVE #1
Ignoring the "Primary master hard disk fail" message, the system was booted off a freshly formatted disk into DOS. Then the DLGDIAG software was started up.

The "Quick Test" function reported:

NO ERRORS DETECTED FOR THIS DRIVE

However, the "Extended Test" function reported:

FOR ADDITIONAL INFORMATION, PLEASE CONTACT WD TECH SUPPORT
FINAL CODE FOR THIS DRIVE: 0457


Checking the Western Digital website, it seems code "0457" means:

Code 457 / IDNF 2-9 / Identified Data Not Found. Several instances of information on data positioning and location could not be found. Drive should be replaced.

Nice.

DRIVE #2
i powered off the system, and simply exchanged the disks, keeping the same brand new 80-wire cable. Again, ignoring the "Primary master hard disk fail" message, the system was booted off a freshly formatted disk into DOS. Then the DLGDIAG software was started up.

After selecting the "Quick Test" function, the following message came up:

DLGDIAG detected errors while testing the cable for the currently selected drive.

Interesting, given that the exact same cable was used minutes earlier for the other drive. Unfazed, the software is exited, and the system powered off. The IDE cable is switched to an older, known-good 40-wire cable. The HDD (configured as master) is placed on the end connector of the cable. The system and test are restarted.

After re-selecting the "Quick Test" function, the message appears again:

DLGDIAG detected errors while testing the cable for the currently selected drive.

Still unfazed, the software is exited, and the system powered off. A third known-good 40-wire cable is installed. This one has only one possible connector for the drive (connectors at both ends, none in the middle). The system and test are again restarted.

And again, the following message appears after selecting "Quick Test":

DLGDIAG detected errors while testing the cable for the currently selected drive.

Something's clearly not right here. Three different cables have been tested, including the first one (a brand new 80-wire cable) that worked just fine when the tests ran on DRIVE #1.

The option to ignore the warning is given, so this time that option is selected. The "Quick Test" function subsequently reported:

NO ERRORS DETECTED FOR THIS DRIVE

The "Extended Test" function also posts the warning about errors detected while testing the cable, but given the record so far, the warning is ignored and the test proceeds. The "Extended Test" function finally reports:

NO ERRORS DETECTED FOR THIS DRIVE

We are, of course, skeptical about this. Something is clearly wrong here ... because the drive can't be detected correctly by the motherboard, AND the testing software is reporting communication errors that really can't be explained by a bad cable.



Some sort of vindication about one of the two drives is better than nothing I suppose. I wonder how long the warranty lasts/lasted on that first drive.
 

J-Frog

What is this storage?
Joined
May 30, 2002
Messages
19
Maybe loose pins, bad controller board...

These drives in question are Western Digital 13.6gb 5400rpm Spartan :diablo: drives, are they not?
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
J-Frog said:
These drives in question are Western Digital 13.6gb 5400rpm Spartan :diablo: drives, are they not?

They're Caviars. And they're now enjoying their retirement in the bottom of a box in my closet.
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
i said:
J-Frog said:
These drives in question are Western Digital 13.6gb 5400rpm Spartan :diablo: drives, are they not?

They're Caviars. And they're now enjoying their retirement in the bottom of a box in my closet.

Why not RMA them i? You can check their warranty here: http://websupport.wdc.com/websupport/clearexp_scripts/warrantystart.asp

As far as Spartan drives go, they were pretty good, considering they were Caviars with 1 year warranties. Now Protege, that's a different story.
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
i,

The test may return a bad data cable error if there is a communication problem related to the 16-bit bus which does data transfer through pins 4, 6, 8, 10, 12, 14, 16, 18. This can be related either to the cable or the 40-pin interface. Since you’ve tried different cables, you can rule out the first option. This naturally leads to a connectivity issue with the HDD’s circuitry.

Having IDNF errors is bad. The ID is data located within a sector and acts as a beacon for positioning purposes - such as reporting what Cylinder, Head, and Sector the drive is at. An ID can also be used to identify more than one sector. IDNF errors are not relocated.
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Thanks Buck. :) I had looked into the warranties - both are still covered until at least early next year. I'm not sure I'll bother returning them though for 3 reasons:

1) I'm experiencing a lazy streak. Packaging up a hard disk - even without the special packaging required - seems like an awful lot of effort right now.

2) It's not clear to me why this failure occurred. Given that the power supply was evidently a cheap one, and that the motherboard died at the same moment as the drives, how can I be sure that the motherboard or the power supply didn't damage these disks? It's hardly Western Digital's fault if that's the case. Of course, given that I don't know how damage can occur, I guess I could just as easily wonder if the hard disks didn't fry the motherboard.

3) If I did send the hard disks in under warranty, I'd just get someone else's defective disks back. Sure, they'd be "refurbished," but how much would I trust them?
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
i said:
Thanks Buck. :) I had looked into the warranties - both are still covered until at least early next year. I'm not sure I'll bother returning them though for 3 reasons:

1) I'm experiencing a lazy streak. Packaging up a hard disk - even without the special packaging required - seems like an awful lot of effort right now.

2) It's not clear to me why this failure occurred. Given that the power supply was evidently a cheap one, and that the motherboard died at the same moment as the drives, how can I be sure that the motherboard or the power supply didn't damage these disks? It's hardly Western Digital's fault if that's the case. Of course, given that I don't know how damage can occur, I guess I could just as easily wonder if the hard disks didn't fry the motherboard.

3) If I did send the hard disks in under warranty, I'd just get someone else's defective disks back. Sure, they'd be "refurbished," but how much would I trust them?

1) That happens; considering how inexpensive HDDs are, I guess you can let the whole incident slide.

2) Regardless of the cause, the drives have failed and will be replaced by WD.

3) You really don’t get someone else’s defective drive. If the drive is refurbished, it has been opened and rebuilt, plus the latest firmware has been written to it. If the drive arrived into WD defect free, then there was nothing wrong with the drive to begin with. If the drive has been RMAd twice before, it is automatically scrapped.

It’s good being a WD reseller, you get to learn a lot. :D
 

i

Wannabe Storage Freak
Joined
Feb 10, 2002
Messages
1,080
Blarg!

This is not my year for hardware.

I never mentioned here that a few months ago, my Plextor 'Ultraplex Wide' CD-ROM drive lost its tiny mind. It was only about a year and a half old, and rarely used, but over the span of a couple of months, it started doing things that made me suspect trouble. At first, about once every other day I'd hear it suddenly start trying to spin up, as if I'd put a CD into it - except I hadn't touched it. After a few weeks, it would start doing this each day. And then a few times each day.

And then I'd notice it occasionally flashing its LEDs, spinning up, spinning down, flashing its LEDs a few more times, spin up, down, etc, etc, until finally just going completely dead. The power light would be out, and pressing the tray eject button did nothing at all.

And then finally, after a couple of months of this escalation of symptoms, I was sitting at my desk when the Plextor started spinning up. Only it kept spinning up! It started making the worst noise I've ever heard coming from a PC ... it was honestly as loud as a drill press! I expected the drive motor to burn up and see little fragments of metal and plastic explode through the front bezel. I hit the power button.

I'd never really liked the drive with its 68 pin SCSI connector (only device I had with one of those connectors, so running a dedicated cable to it was a bit of a pain), so I just gave up and pulled it out of the system. I still had a Plextor 'Plexwriter' CDRW drive left anyway. That was good enough for when I needed to read a CD. Months later, the Ultraplex Wide is still under warranty, and it's still sitting in my closet.

But now the Plexwriter is showing problems! It didn't start up normally today - the 'Low Speed Write' LED was blinking in a sequence of five flashes. According to the Plextor site, that means, "Offset adjustment failed: Could not properly set the tracking offset for a CD-R/RW disc." There's no CD in the drive!

I can't say I'm very impressed with Plextor anymore.

I'm sure the next thing that's going to fail is my 19" Viewsonic monitor. It's been doing progressively strange stuff over the span of about a year. Scares the crap out of me too - it makes this crackling noise and the image dims and swells way past the viewable part of the picture tube.
 