SMART failure on a dumb drive

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
What do you guys make of this one?

Customer of mine has a more-or-less new system: Gigabyte 7IXE4 main board (AMD 751), Duron 800, nice new case, GeForce II. He kept his old hard drive, a Western Digital 2.5GB, and CD drive - whatever crap Hewlett Packard palmed him off with in the first place.

Just now, it has started giving a SMART error on startup; the message is something like:
Code:
SMART error detected. Backup data and replace drive. Press F1 to continue.
This is a BIOS-level error message, no SMART software involved. Press F1 and it boots.

I haven't pulled the drive out yet so I don't have the model number, nor have I tested it in another machine. The mainboard BIOS does not offer a SMART enable/disable option, it's stuck permanently on, I gather.

So far, it sounds like a dying drive. But I have been utterly unable to fault the unit. Sounds normal, performs as expected for a 2.5, no mysterious data errors, no bad sectors, Scandisk is happy, and even the eight-hour Spinrite torture test can't fault it.

Should I:
  • Tell him to buy a replacement drive right away?
  • Tell him not to worry about it, the drive is fine?
  • See what the WD diagnostic software has to say?
  • Tell him that there is no such thing as SMART on a 2.5GB drive out of a Pentium 166?
  • Play games with the transfer mode on the mainboard?
  • Replace the hard drive cable?
  • Tell him that I've never seen a SMART error in my whole life before and please can I have a screenshot?
 

P5-133XL

Xmas '97
Joined
Jan 15, 2002
Messages
3,173
Location
Salem, Or
c, followed by downloading a SMART reporting utility; then, if nothing has been answered, do a.

Don't ignore a warning like this. You will look really, really, really stupid if the drive actually fails, and there may be liability involved.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,330
Location
I am omnipresent
I've found that my perfectly good disks function HIGHLY irregularly while IBM's free SMART tool is loaded; this is particularly true with Quantum ASes and I've observed this behavior on more than one machine. I don't know whether to blame SMART or IBM's tool.

Anyway, the obvious, simple thing to do is to run wddiag. It takes just a couple minutes.
 

Will Rickards

Storage Is My Life
Joined
Jan 23, 2002
Messages
2,012
Location
Here
Website
willrickards.net
I've seen errors like that on some support calls I've done.
We just replaced the drive.

I seem to remember some people having the error for a while
but no actual drive failure. All this error means is that the drive is outside
of the SMART tolerances. While this might indicate a soon-to-be-dead drive,
it might also just indicate a drive that has seen better days.
In my opinion I'd suggest replacement.
2.5GB has to be cramped anyway.
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
WD Diag said "SMART Error #2, call WD support." Fat lot of help that was!

I talked it over with my customer and, in view of his wife's expressed intention to take to his anatomy with a pair of scissors if she loses her family tree data, he's decided he can spring for the cost of a nice new 20GB drive.

He's thinking about a burner too, though he just paid for the Athlon upgrade so that might have to wait a while.

My favourite Samsung Spinpoints are out of stock, alas, so we've had to make do with a Western Digital Protege. It's still way faster than the 2.5, and he was only using 1.5GB anyway so 20GB is heaps for him.

If I find out anything more about the drive (such as if it fails anytime soon) I'll let you know.

Thanks guys.
 

Barry K. Nathan

What is this storage?
Joined
Feb 9, 2002
Messages
42
Location
Irvine, CA
Tannin said:
WD Diag said "SMART Error #2, call WD support." Fat lot of help that was!

I'm almost done putting together a Linux boot floppy which should allow you to get more SMART information from that drive.

I'll post in this thread again when I have it done, with a URL to the boot floppy image and instructions for turning it into a real floppy.
 

Barry K. Nathan

What is this storage?
Joined
Feb 9, 2002
Messages
42
Location
Irvine, CA
Ok, here it is:

http://members.cox.net/barrykn/

sb010.img is a 1440K raw disk image file. sb010.zip is a zipped version of this file -- it's only a 740K or so download. You'll want to use your browser's "save to disk" function, or whatever your browser has, on the .img file if you decide to download that. I have a 300MB/month download limit, so I'd prefer if you download the zip file, but in the worst case I can post these files again on another server.

(BTW, this is "SmartBoot 0.1.0", although that name and version number currently appear nowhere on the disk itself.)

Once you download the image file, you can put it onto a floppy in several ways. If it's under Linux, you can run a command like this: "dd if=sb010.img bs=8k of=/dev/fd0". Under Windows (esp. NT/2000/XP), you can run something like this program (which I haven't tried, to be honest): http://uranus.it.swin.edu.au/~jn/linux/rawwrite.htm , or (another program I've never tried) http://ntrawrite.sourceforge.net/ . Under DOS or Win9x/ME, you can use RAWRITE, which is available from lots of places. Here's one URL for it: ftp://ftp.uci.edu/mirrors/redhat/linux/7.2/en/os/i386/dosutils/rawrite.exe -- if you have any Linux distribution CDs, RAWRITE is probably somewhere on there too (perhaps in a "dosutils" directory).
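For anyone without dd or RAWRITE to hand, the raw write is just a block-by-block copy of the image onto the floppy device. Below is a minimal Python sketch of that idea; it is not part of SmartBoot. The function name write_image is made up for illustration, and the 8 KB block size simply mirrors the bs=8k in the dd command above. On Linux the destination would be /dev/fd0, and writing to it usually needs root.

```python
def write_image(image_path: str, device_path: str, block_size: int = 8 * 1024) -> int:
    """Copy a raw disk image onto a device (or plain file), block by block.

    Returns the number of bytes written. write_image is a hypothetical
    helper for illustration, not something shipped with SmartBoot.
    """
    written = 0
    with open(image_path, "rb") as src, open(device_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the image file
                break
            dst.write(block)
            written += len(block)
    return written


# Example (Linux, as root, with a floppy in the drive):
#     write_image("sb010.img", "/dev/fd0")
```

A full 1440K image should report 1474560 bytes written; anything less suggests a truncated download.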

If you need me to post the image in some other format instead, feel free to ask.

Estimated system requirements: Pentium, 4MB RAM, and Intel, AMD, VIA, or CMD IDE controller. (If you need this to support Promise, HighPoint, or whatever, I can recompile with that.)

I'll stop this post here and write more detailed instructions in my next post.
 

Barry K. Nathan

What is this storage?
Joined
Feb 9, 2002
Messages
42
Location
Irvine, CA
How to use SmartBoot:

-Put the floppy disk in the drive and boot the computer

-The computer should display "Loading...." and a bunch of dots, then "Uncompressing Linux" or something like that. If it displays any hexadecimal numbers before it gets to "Uncompressing" then it could be a bad floppy. If it reports a CRC error, that could also be a bad floppy, although it's more likely to be some kind of Linux bug and if that happens I'll try to fix it for SmartBoot 0.1.1.

-A bunch of messages will fly by, then it will continue loading off the floppy and eventually reach a command prompt.

-If you want to view any of the boot messages that scrolled off the screen, you can use Shift-PageUp and Shift-PageDown to scroll up and down in half-screen increments.

-Before I discuss possible commands/options, I must mention device names. SmartBoot currently supports up to two IDE controllers, each with two channels, per system. Drives are named /dev/hdX, where X is a letter from a to h. a is the 1st IDE channel's master, b is 1st channel's slave, c is 2nd channel's master, d is 2nd channel's slave, and e-h repeat the pattern for channels 3 and 4.

-"smartctl -a /dev/hdX" displays all SMART info for hdX. If it says SMART is disabled, run "smartctl -e /dev/hdX" to enable it. You'll need to use scrolling, as described above, to view all this data.

-"smartctl" with no other options will list all command line options (this is a few screens long and will also require scrolling)

-The only way to save any information at this point is to remove the SmartBoot floppy, put a blank one in, and do "smartctl -a /dev/hdX > /dev/fd0" -- this will overwrite the floppy's contents, starting with the boot sector, and you'll need to use od or Norton Disk Doctor or something to view the raw sectors later. I know this is pretty awful; I'll do something better in the next release.
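The /dev/hdX naming described in the steps above boils down to simple arithmetic on the drive letter. Here is a rough Python sketch of that mapping; the function name describe_ide_device is hypothetical, and it only covers the a-h range that SmartBoot is said to support.

```python
PREFIX = "/dev/hd"


def describe_ide_device(name: str) -> str:
    """Translate /dev/hda .. /dev/hdh into channel number and master/slave role.

    describe_ide_device is an illustrative helper, not part of SmartBoot.
    """
    if not name.startswith(PREFIX) or len(name) != len(PREFIX) + 1:
        raise ValueError(f"not an IDE device name: {name!r}")
    index = ord(name[-1]) - ord("a")  # a=0, b=1, ... h=7
    if not 0 <= index <= 7:
        raise ValueError(f"drive letter out of range a-h: {name!r}")
    channel = index // 2 + 1  # two drives per channel
    role = "master" if index % 2 == 0 else "slave"
    return f"IDE channel {channel} {role}"


if __name__ == "__main__":
    for letter in "abcdefgh":
        device = PREFIX + letter
        print(device, "->", describe_ide_device(device))
```

So a single drive on the first channel, set as master, would be /dev/hda.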

The interesting information will probably be in the list of attributes/thresholds. See which attributes are below their thresholds (this will probably make more sense once you see and scroll through some smartctl output).

Also, just to make sure I don't get misunderstood: smartctl is part of SmartSuite 2.1, which is a Linux program I didn't make. SmartBoot is (currently) a floppy-based Linux distribution for running smartctl. (It might become more than that in the future.)

I hope this is enough info to get you started.
 

Barry K. Nathan

What is this storage?
Joined
Feb 9, 2002
Messages
42
Location
Irvine, CA
I think I need to clarify my previous post a bit:

Barry K. Nathan said:
-A bunch of messages will fly by, then it will continue loading off the floppy and eventually reach a command prompt.

-If you want to view any of the boot messages that scrolled off the screen, you can use Shift-PageUp and Shift-PageDown to scroll up and down in half-screen increments.

At this point, if you want lots of familiar Unix commands to be accessible, type "aliasall" and press Enter. To see a list of these commands, type "alias". The shell also has "help", for what it's worth. None of this is necessary to simply use smartctl though.

The interesting information will probably be in the list of attributes/thresholds. See which attributes are below their thresholds (this will probably make more sense once you see and scroll through some smartctl output).

More detail: Each SMART attribute is rated on a scale of (I think) 253 to 0. Aside from Seagate drive temperatures (where the attribute seems to literally be the temperature in degrees Celsius), higher is better. If any attribute is below (or at...?) its threshold, that's when the drive's going to consider declaring a SMART failure.
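The value-versus-threshold comparison can be written down in a few lines. This is only a sketch under stated assumptions: it treats a value at the threshold as failed (the post above leaves that open), it treats a threshold of 0 as "never fails", and it ignores the Seagate temperature exception. The names SmartAttribute and has_failed, and the numbers in the example, are made up for illustration and are not from any real drive.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One normalised SMART attribute (illustrative structure only)."""
    attr_id: int
    name: str
    value: int      # normalised value, higher is better
    threshold: int  # failure threshold set by the drive vendor


def has_failed(attr: SmartAttribute) -> bool:
    # Assumption: at-or-below the threshold counts as failed, and a
    # threshold of 0 means the attribute can never trip a failure.
    return attr.threshold > 0 and attr.value <= attr.threshold


if __name__ == "__main__":
    # Made-up numbers, purely to show the comparison.
    examples = [
        SmartAttribute(1, "Raw read error rate", 40, 51),
        SmartAttribute(10, "Spin retry count", 100, 51),
        SmartAttribute(5, "Reallocated sector count", 195, 0),
    ]
    for attr in examples:
        state = "FAILED" if has_failed(attr) else "ok"
        print(f"({attr.attr_id}) {attr.name}: {state}")
```

Note that it is the normalised value that gets compared, not the raw value, so a large raw count on its own does not trip anything.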

Also, when you're done, just turn the machine off or hit control-alt-delete.
 

Vlad The Impaler

Learning Storage Performance
Joined
Jan 27, 2002
Messages
166
Location
UK
Replace the drive.

The customer will not appreciate that you are trying to save them money by keeping a drive that you are not sure about. They sure as hell will be upset if it does fail, though. It is not worth the risk.
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
A drive under warranty, Vlad, I'd agree without question. "If in doubt, swap it out" is my motto.

But it's his money I'm wanting to spend, and while I have the right to recommend, I can't insist on it. You will know from your own experience just how many stupid things customers want you to do (or not do) to "save money", I guess - and just how often it costs them a hell of a lot more when their rickety, clapped-out system eventually goes down. But in this case, he's done the sensible thing and is going with a brand new drive.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,330
Location
I am omnipresent
The proper enticement would probably be your recommendation along with a "discount" for a trade-in on his current drive. That could take the form of free labor for the ghosting or an offer of the drive at cost, whatever, but nothing shows sincerity at retail like offering a discount. ;)
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Quite right, Mercutio.

I gave it to him at just above my cost. Kristi did the data transfer in ~15 minutes and seeing as he just spent a fair bit on his other new bits a week or two ago, I don't need to bleed his wallet too hard. That's his only freebie though! When he comes back for the burner in a month or so, he can pay full price for it.
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
Tannin,

Good of you to help your customer. Personally, I would not have used a Protege drive, as it is my least favorite "new" drive, save the U series. Didn't WD DIAG return a four digit code?

BR
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Nope: just "error #2".

We had two Proteges because we ordered five Samsungs, far and away my favourite 5400 RPM drive, but they were out of stock. The WD Caviar 20GB 5400 has finished up now, so the only other thing my supplier had in 20GB was the Protege. The alternatives were Maxtor 20GB (same price as a Samsung 40GB, which we already had in stock but was not needed size-wise, and cost too much for my customer's domestic bliss factor), Seagate U-Series things (no comment) or 40GB 7200s. Not a great drive, the Protege, but still vastly faster than his old 2.5GB WD, so I just got a couple to tide us over.

You know what really irks me though? The bloody Proteges cost A$10 to $15 more than the far superior Samsungs!
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
What version of WDDIAG did you run? What is the Model number on the drive?

BR

PS - Protege drives are at a premium since they are in half of the Microsoft XBOX consoles.
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
Tannin said:
I need to be at the office to answer those questions Buck - and damn it! - it's 10:05AM already and I need to be at the office!

Cry, O weeping Tannin - I just step into the garage!

BR
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
> smartctl -a /dev/hda
Device: WDC AC22500L Supports ATA Version 3
Drive supports SMART and is enabled
Attribute ID 1 Failed
Please save all data and call drive manufacture immediately

General SMART Values:
Off-line data collection status: (0x00) Offline data collection activity was never started
Total time to complete off-line data collection: (800) Seconds

Offline data collection Capabilities:
(0x03) SMART EXECUTE OFF-LINE IMMEDIATE
Automatic timer ON/OFF support
Suspend offline collection upon new command
NO Offline surface scan supported
NO Self-test supported

Smart capabilities:
(0x0002) does not save SMART data before entering power-saving mode
Supports SMART auto save timer

Error logging capabilitiy:
(0x00) Error logging NOT supported

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 5


(1) Raw read error rate:
Flag: 0x000b Value: 001 Worst: 001 Threshold: 051 Raw Value: 2624

(4) Start Stop Count
Flag: 0x0012 Value: 098 Worst: 098 Threshold: 040 Raw Value: 2579

(5) Reallocated Sector Ct:
Flag: 0x0012 Value: 195 Worst: 195 Threshold: 000 Raw Value: 9

(10) Spin retry count:
Flag: 0x0013 Value: 100 Worst: 100 Threshold: 051 Raw Value: 0

Callibration retry count:
Flag: 0x0013 Value: 100 Worst: 100 Threshold: 051 Raw Value: 0

(199) UDMA CRC Error Count:
Flag: 0x000a Value: 200 Worst: 200 Threshold: 000 Raw Value: 0

(200) Unknown Attribute:
Flag: 0x0009 Value: 100 Worst: 253 Threshold: 051 Raw Value: 0

Device does not support error logging
Device does not support Self Test logging
>
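Since scrolling through this by hand is tedious, here is a rough Python sketch that pulls the attribute entries out of output shaped like the listing above and reports which ones are at or below their thresholds. It assumes the two-line layout shown here (a name line followed by a "Flag: ... Value: ..." line), which may differ in other smartctl versions. The function names are hypothetical, and treating a threshold of 0 as "never fails" is an assumption. The sample text is copied from three of the entries above.

```python
import re

# Matches the "Flag: ... Value: ... Worst: ... Threshold: ... Raw Value: ..." line.
FLAG_LINE = re.compile(
    r"Flag:\s*(0x[0-9a-fA-F]+)\s+Value:\s*(\d+)\s+Worst:\s*(\d+)\s+"
    r"Threshold:\s*(\d+)\s+Raw Value:\s*(\d+)"
)

SAMPLE = """\
(1) Raw read error rate:
Flag: 0x000b Value: 001 Worst: 001 Threshold: 051 Raw Value: 2624

(5) Reallocated Sector Ct:
Flag: 0x0012 Value: 195 Worst: 195 Threshold: 000 Raw Value: 9

(10) Spin retry count:
Flag: 0x0013 Value: 100 Worst: 100 Threshold: 051 Raw Value: 0
"""


def parse_attributes(text: str) -> list:
    """Return one dict per attribute, using the preceding line as its name."""
    attributes = []
    previous = ""
    for line in text.splitlines():
        line = line.strip()
        match = FLAG_LINE.match(line)
        if match:
            attributes.append({
                "name": previous.rstrip(":"),
                "value": int(match.group(2)),
                "worst": int(match.group(3)),
                "threshold": int(match.group(4)),
                "raw": int(match.group(5)),
            })
        elif line:
            previous = line  # remember the most recent non-blank line as the name
    return attributes


def failing(attributes: list) -> list:
    """Names of attributes at or below a non-zero threshold (assumed rule)."""
    return [a["name"] for a in attributes
            if a["threshold"] > 0 and a["value"] <= a["threshold"]]


if __name__ == "__main__":
    print(failing(parse_attributes(SAMPLE)))
```

On this drive the only entry it flags is the raw read error rate (value 001 against a threshold of 051), which matches the "Attribute ID 1 Failed" line at the top of the output.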
 

James

Storage is cool
Joined
Jan 24, 2002
Messages
844
Location
Sydney, Australia
... uh, except obviously that the drive is getting higher error rates than it should - high enough that they're above what the SMART firmware on the drive considers the threshold, anyway.

What I meant is, I wonder what that means is going wrong? It certainly is a sick drive.
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Well, funny thing is, aside from the SMART error, the drive behaves perfectly. As I said in the first post in this thread:

"But I have been utterly unable to fault the unit. Sounds normal, performs as expected for a 2.5, no mysterious data errors, no bad sectors, Scandisk is happy, and even the eight-hour Spinrite torture test can't fault it."

I'm trying to think of something interesting to do with it.
 

Tea

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
3,749
Location
27a No Fixed Address, Oz.
Website
www.redhill.net.au
I have actually dead dead drives for photos. I might use it in the workshop, load anti-virus software on it and use it to scan customers' drives. If it fails, who cares?

Then the photos. :)
 

Tea

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
3,749
Location
27a No Fixed Address, Oz.
Website
www.redhill.net.au
Barry, your utility is indeed a useful tool. Thankyou!

I used the rawrite utility for Windows 2000 you linked to: it worked no problem and is so easy to use that even Tannin can understand it.

Would you like me to put sb010.zip up on my server? I have no particular bandwidth limit to worry about and it can have a permanent address.

Now all I have to do is bone up on what it's actually telling me. I'll run it on some other drives, healthy and bad, and see what I can discover.

Thankyou again.

Tony
 

Barry K. Nathan

What is this storage?
Joined
Feb 9, 2002
Messages
42
Location
Irvine, CA
Tea said:
Barry, your utility is indeed a useful tool. Thankyou!
Would you like me to put sb010.zip up on my server? I have no particular bandwidth limit to worry about and it can have a permanent address.

Now all I have to do is bone up on what it's actually telling me. I'll run it on some other drives, healthy and bad, and see what I can discover.

From a brief glance, I think I can tell you what it's saying: internally, the drive is failing to read data too often. It's saying that it might go 75GXP on you (i.e., tons of bad sectors and data loss) at any point. At least, that's what the drive firmware thinks. It's already remapped 9 sectors, FWIW. In any case, the problem is absolutely not cabling.

The contents of sb010.zip are all licensed under the GNU General Public License (GPL). It doesn't say that anywhere, nor are there any instructions for obtaining source code. I'll fix those problems with the next version that I post. So, if it disappears off the 'Net or anything I guess you could repost it, but from a legal standpoint it'd be safer to wait until the next version.

I'm not sure when I'll get the next version done; perhaps this week, though. If not this week, then soon. :) (I have some enhancements planned for it...)

Tea said:
Thankyou again.

Tony

You're welcome.
 