RAID Failure

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
What is the best course of action when one drive is kicked out and the array is still working? One option is to replace the funky drive and let the RAID5 rebuild itself. Another option is to nuke the array, replace the drive, create a new array and then copy/verify the data to it. A third option would be to copy the data from the degraded array to somewhere else and then rebuild the array with a new drive.
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,920
Location
USA
Just replace the drive and rebuild. That's the whole point of having RAID for availability. Is there a reason why you wouldn't do this? That has always been the preferred approach in my experience.
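If you want to keep an eye on the rebuild, Synology runs the array on Linux md under the hood, so from SSH something like this should show progress (I'm assuming the data array is md2 here; check mdstat for the actual device on your unit):

Code:
# Overall view of all md arrays and any rebuild in progress
cat /proc/mdstat
# More detail on the data volume's array (often md2 on Synology, but verify with mdstat)
sudo mdadm --detail /dev/md2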
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
I was wondering whether the rebuild takes that long. It is taking about 3 days to expand the NAS, but maybe that is a different process.
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,920
Location
USA
Are you asking if a RAID5 rebuild takes longer when there is existing data? I don't know the answer to that, as I have no idea what kind of configuration or hardware you're asking about. Even if it does, at some point moving the entire data set to another array and back would take as long as or longer than letting the one drive rebuild.

Are you saying you're also doing a RAID5 expansion while considering a rebuild from a failed drive? I would not attempt that until your volume is healthy and no longer in a degraded state.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
The NAS is doing a normal RAID 5 expansion, which is very slow (over 50 hours and ~75% so far). I was wondering if a rebuild will take that long. Drives are >98% full.
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
The NAS is doing a normal RAID 5 expansion, which is very slow (over 50 hours and ~75% so far). I was wondering if a rebuild will take that long. Drives are >98% full.
That's impossible for us to answer because we don't know anything about the hardware you're using. There's no reason a rebuild would have to take that long, but that doesn't mean it couldn't.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
60 hours later there are now 5*10TB 5400 RPM drives. How do I calculate rebuild time?
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
60 hours later there are now 5*10TB 5400 RPM drives. How do I calculate rebuild time?
Well, it can't be any shorter than the time it takes the drive to write all 10TB. As far as calculating how long it will take, you need to find out whether your NAS is drive limited or processing limited. If it's drive limited, it's however long it takes to write 10TB to the drive. If it's processing limited, it's however long it takes to write 10TB of data at the rate the NAS can process the data and reconstruct the missing data from the other four drives.
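As a back-of-the-envelope sketch (the write speed here is an assumed number, not something measured on your drives):

Code:
# Lower bound on rebuild time: drive capacity / sustained write speed
# Assuming ~160 MB/s average for a 5400 RPM 10TB drive (illustrative only)
echo "scale=1; 10 * 10^6 / 160 / 3600" | bc -l
# => about 17.3 hours, before any parity or other-I/O overhead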
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
Thanks. I suppose the expansion process is just slow and the rebuild will be decently fast, maybe closer to 24 hours. There are five drives, so the four viable ones would be read from if I understand correctly, and the Xeon CPU is not a bottleneck for the parity. I could have moved to 4x12TB, but that is less capacity and would still need an expansion later. I think this 5x10TB set will have enough space and fit in the smaller 5-bay NAS units of the future as well.
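Quick sanity check on the capacity comparison (RAID 5 usable space is roughly one drive's worth less than the total):

Code:
# RAID 5 usable capacity ~= (drive count - 1) * drive size
echo "(5 - 1) * 10" | bc   # 5x10TB -> 40 TB usable
echo "(4 - 1) * 12" | bc   # 4x12TB -> 36 TB usable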
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
OCE (online capacity expansion) is usually quite a bit slower than a rebuild. Personally, I wouldn't even bother with OCE; it's just too slow. I could trash the array, build a new array with the new drive set, and restore from a backup in way less time.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
So now I have another 9.1TB of space. After filling most of it, the total capacity is 99.4% used.
For some reason the stupid Synology OS continues to alert that it is running out of space, and there is no way to disable it. :(
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
The time it takes to rebuild an array is limited by the size of a disk divided by its sequential write speed. The bigger the disk, the longer it takes. The slower the disk, the longer it takes. And, because reading a disk's worth of data is a lot of work, you're putting stress on the remaining drives for as long as the rebuild is in progress.

This is why I switched from a single RAIDZ1 (the equivalent of RAID 5) to two RAIDZ2 (RAID 6) vdevs using disks of half the size. It's less likely to lose data to drive failures, and resilvers (rebuilds) take hours instead of days.
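For anyone curious, that layout is a single pool built from two RAIDZ2 vdevs, created roughly like this (pool name, vdev widths, and device names are placeholders, not my exact setup):

Code:
# One pool, two RAIDZ2 vdevs; each vdev survives two drive failures
sudo zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 \
  raidz2 da6 da7 da8 da9 da10 da11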
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
The data is about half archive, with the rest split between full and incremental backups. Five drives are the most I could allocate for practical reasons, and 10TB is an economical size. I was hoping to be able to ship the drives without a NAS and have a different Synology in the middle section. I'm not seeing many 5-bay Synology models, which is a concern. I expected a DS1519+ at least by now.
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,920
Location
USA
The time it takes to rebuild an array is limited by the size of a disk divided by its sequential write speed. The bigger the disk, the longer it takes. The slower the disk, the longer it takes. And, because reading a disk's worth of data is a lot of work, you're putting stress on the remaining drives for as long as the rebuild is in progress.

This is why I switched from a single RAIDZ1 (the equivalent of RAID 5) to two RAIDZ2 (RAID 6) vdevs using disks of half the size. It's less likely to lose data to drive failures, and resilvers (rebuilds) take hours instead of days.

I would also add that other random I/O on the array will make the rebuild take longer. Heavy use by clients and devices while it's being repaired is not going to help things.

I thought from reading your concerns in the past that you were not a fan of ZFS?
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,920
Location
USA
This is why I switched from a single RAIDZ1 (the equivalent of RAID 5) to two RAIDZ2 (RAID 6) vdevs using disks of half the size. It's less likely to lose data to drive failures, and resilvers (rebuilds) take hours instead of days.

This is my config also. I run two 10-drive vdevs, both in RAIDZ2. When I replaced the one drive that failed, it took about 9 hours to resilver the new 6TB drive.

Code:
sudo zpool list
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
naspool_01   109T  53.1T  55.9T         -    21%    48%  1.00x  ONLINE  -


Code:
sudo zpool status
  pool: naspool_01
 state: ONLINE
  scan: scrub repaired 0 in 14h18m with 0 errors on Sun Feb 10 14:42:44 2019
config:

        NAME                                   STATE     READ WRITE CKSUM
        naspool_01                             ONLINE       0     0     0
          raidz2-0                             ONLINE       0     0     0
            ata-HGST_HDN726060ALE610_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE610_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE610_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
          raidz2-1                             ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE610_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0
            ata-HGST_HDN726060ALE614_XXXXXXXX  ONLINE       0     0     0

errors: No known data errors
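
A replacement like that is kicked off with a plain zpool replace, something along these lines (the serials below are placeholders):

Code:
# Swap the failed member for the new disk; the resilver starts automatically
sudo zpool replace naspool_01 ata-HGST_HDN726060ALE614_OLDSERIAL ata-HGST_HDN726060ALE614_NEWSERIAL
# Watch resilver progress
sudo zpool status naspool_01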
 

sechs

Storage? I am Storage!
Joined
Feb 1, 2003
Messages
4,709
Location
Left Coast
I find that FreeBSD, while more difficult to use, is more stable and less hacky.

Merging ZFS development has been a long time coming (the projects have always traded code), but it was the effective death of Illumos that is the real impetus.
 