Delayed Write Failed

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
I'm getting a delayed write failed on the drive that serves some of my media files. Windows says that the writes are bad, but no file appears to be corrupted.

Now, the weird part is that I turned off write caching in the control panel. So, Windows shouldn't even be caching writes to this drive.

Any ideas?
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,916
Location
USA
In the past I've seen delayed write failures on my SCSI drives under Win2k and WinXP on occasion. I believe it was the SCSI adapter in my situation. That machine has since died, so I never resolved the root cause. At work, we sometimes see delayed write failures when we lose a connection to a storage array (via a Fibre Channel connection).

Is your data in good health after a reboot of the troubled system? In my situations at work, it took some time before Windows would "give in" to the connection loss. That's about the best way I can explain it. I could continue to write files to a non-existent drive for several minutes before the OS threw in the towel. A reboot made the problem more evident.

Have you run a full scan on the drive? Maybe the drive, the cable, or your HBA is going bad. How long have you been seeing these errors, and do they show up in the Event Viewer or as popup dialogs?
 

CityK

Storage Freak Apprentice
Joined
Sep 2, 2002
Messages
1,719
Windows is corrupted :p

As a complete guess, I'm thinking it's a SCSI drive you're using and it's a controller-driver-related issue. If not, then so much for my fledgling career as a clairvoyant.

Questions:
- what kind of drive?
- what controller?
- what firmware and drivers ?
- anything special about the configuration ?
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
This is a 10.6 with an OEM firmware. It's on an LSI20320-R with three other drives (which do not show this behavior). I'm using the latest driver from LSI.

There are no problems if I use the drive on my Adaptec 39160 controller.
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
I am also getting a lot of event ID 51s (error during a paging operation), but only when copying files to the disk.

PerfectDisk complains about several files on the volume that it cannot move. Windows defrag has no apparent problems.
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
This is getting ridiculous. I'm now getting intermittent event ID 51s even when nothing is being written to the disk.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Is there a warning or information event immediately before ID# 51?

ID# 9 by chance?
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Nope.

I just get a series of event ID 51s and an intermittent delayed write failed when writing to the disk.
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Ah hah!... Maybe

CIMBrowser has started reporting an intermittent SMART error on this disk. The ASC is 47h and ASCQ is 3h. If I read this correctly (http://www.t10.org/lists/asc-num.htm#ASC_47), this translates to, "INFORMATION UNIT iuCRC ERROR DETECTED."

I have absolutely no idea what this means.
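
For anyone else decoding sense data by hand, the lookup above can be sketched in a few lines of Python. This is only a rough illustration: the table is a partial, hand-copied transcription of the 47h entries from the T10 list linked above, and the function name `describe_sense` is made up for this example, not part of any tool.

```python
# Partial lookup for SCSI additional sense codes under ASC 47h.
# Entries are transcribed by hand from the T10 ASC/ASCQ list; this is an
# illustrative sketch, not a complete or authoritative table.
ASC_47_DESCRIPTIONS = {
    0x00: "SCSI PARITY ERROR",
    0x01: "DATA PHASE CRC ERROR DETECTED",
    0x02: "SCSI PARITY ERROR DETECTED DURING ST DATA PHASE",
    0x03: "INFORMATION UNIT iuCRC ERROR DETECTED",
    0x04: "ASYNCHRONOUS INFORMATION PROTECTION ERROR DETECTED",
    0x05: "PROTOCOL SERVICE CRC ERROR",
}


def describe_sense(asc: int, ascq: int) -> str:
    """Return the description for an ASC/ASCQ pair (hypothetical helper)."""
    if asc != 0x47:
        # Only the 47h family is covered by this sketch.
        return f"ASC {asc:02X}h is not in this partial table"
    return ASC_47_DESCRIPTIONS.get(
        ascq, f"ASC 47h / ASCQ {ascq:02X}h is not in this partial table"
    )


# The pair reported for this disk: ASC 47h, ASCQ 3h.
print(describe_sense(0x47, 0x03))
```

Anything outside the 47h family falls through to the "not in this partial table" message, so the full T10 list is still the place to check other codes.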
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Drive passes Seagate's tests with no errors. As I've said previously, I don't get these errors when the drives are on the Adaptec controller.

Maybe the LSI card doesn't like the OEM firmware? Rechecking cabling and connections is on my to-do list for this weekend.
 

P5-133XL

Xmas '97
Joined
Jan 15, 2002
Messages
3,173
Location
Salem, Or
The error code indicates that your drive is receiving uncorrectable CRC errors. That typically means that you have developed some bad blocks on the drive. I agree that it is time to replace the drive, regardless of the status of the Seagate diagnostics, especially since you are receiving OS-level errors in addition to the low-level SMART errors. I seriously doubt that the controller has anything to do with this.

If you really don't want to replace the drive, you can try a low-level format (LLF), which may detect the bad blocks and prevent them from being used. However, that is a problematic solution because it may not successfully detect them, and furthermore a drive that starts developing bad blocks tends to keep creating new ones, causing more problems in the future.

All-in-all it is time to replace the drive.
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Ugh. I'm confused, however, as to why the drive does not seem to exhibit this behavior when on the other controller.
 

P5-133XL

Xmas '97
Joined
Jan 15, 2002
Messages
3,173
Location
Salem, Or
Don't know, but my guess is that one controller requests more retries than the other and the retries are successful so it doesn't feel the need to report an error state.
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Good news: It's not the drive. I swapped the drive to another slot and it works fine.

Bad news: The drive with which I swapped it now exhibits the behaviors.

So, this is likely either a problem with that particular rack or a loose connection on the back. Loose connectors are easy to fix... bad rack will take money.
 

Platform

Learning Storage Performance
Joined
May 10, 2002
Messages
234
Location
Rack 294, Pos. 10
sechs said:
Good news: It's not the drive. I swapped the drive to another slot and it works fine.

Bad news: The drive with which I swapped it now exhibits the behaviors.

So, this is likely either a problem with that particular rack or a loose connection on the back. Loose connectors are easy to fix... bad rack will take money.


Is this a Dell server?

I'm just trying to get a picture of what you have -- mechanically speaking.

 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
I have the drives in individual IcyDocks in an external case.

Since jimmying with the cables and doing some dusting, the errors have completely ceased on the original disk. However, one of the other disks in the case has coughed up a couple of paging errors. It's nothing serious, but I'll have to go back in and check the cable to see if I knocked it loose from that rack.
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Just so that things get a little weirder...

After jimmying with cables, I've found that I can change which drive has problems by swapping the connectors for the external cable and terminator. The other two drives in this enclosure still show no signs of issues.

I've forced the currently affected drive to sync at 160 rather than 320, and this seems to have arrested the problem. I will need a few days of use to be sure.

Any advice on troubleshooting the source?
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Hmmm.

I am having an intermittent fault with a WD and after opening the case on Tuesday, I noticed that the molex plug that connected the drive had a loose ground (black) pin. Basically, if you pushed the connector into the drive, this loose pin would push out so that you could see the tinned connector from the back of the plug. If this connector was making intermittent connection, then the drive would work happily and power down every now and again--which are the symptoms I'm getting. I have since changed the power connector and I am waiting to see if that has made any difference--but I think I'd need a week without error (and preferably two) to be sure.

Can you use a new cable/connector on your drives, or have you already?
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
Problem (Apparently) Solved

It appears that my issues were caused by insufficient voltage on the 3.3V line. I recently replaced my quiet Tagan PSU with a beefier Silverstone model with 40A on the 3.3V rail, and all problems have ceased.

Time will tell....
 