RAID Superblock disappeared from one of my disks after crash. Fix?

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
My fileserver has crashed a couple of times recently, and I just noticed that the RAID array didn't come back up the last time. I had stored the configuration in /etc/mdadm.conf:
Code:
# cat /etc/mdadm.conf
DEVICE  /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hda1
ARRAY /dev/md0 level=raid5 num-devices=5 UUID=20694a10:5013084e:6dde8b75:f422ec4e

So I figured I could simply run:
Code:
# mdadm --assemble --scan -v

But only two of the five disks were added to the array. Not enough to start it. So I tried:
Code:
 # mdadm --assemble --scan -fv
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/hda1 is identified as a member of /dev/md0, slot 4.
mdadm: forcing event count in /dev/sda1(0) from 312084 upto 312087
mdadm: forcing event count in /dev/sdb1(1) from 312084 upto 312087
mdadm: RAID superblock has disappeared from /dev/sda1
# cat /proc/mdstat
Personalities : [raid5] [raid4]
unused devices: <none>
# mount /dev/md0 /exports
mount: /dev/md0: can't read superblock
# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md0
mdadm: no recogniseable superblock on /dev/sda1
mdadm: /dev/sda1 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sdb1
mdadm: /dev/sdb1 has wrong uuid.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/hda1 is identified as a member of /dev/md0, slot 4.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdd1 to /dev/md0 as 3
mdadm: added /dev/hda1 to /dev/md0 as 4
mdadm: added /dev/sdc1 to /dev/md0 as 2
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.

Now, I suppose I have to restore the superblocks on sda1 and sdb1. Does anyone have any advice? I'm thinking I have to recreate the array identically in order to restore them. I'm 90% sure I can do that from memory, but is there any way to confirm the exact configuration of the array in case my memory is wrong? Since most of the superblocks are still present, I think this should be possible.
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Another question: if I recreate the array, should I use the "--assume-clean" option? That's what I'm leaning towards right now.

If I don't, will the data be wiped? I'm worried the sync will wipe everything. OTOH, if I do, do I risk corruption because of the crashes? The filesystem was a journalling one (XFS).
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
I found:
Code:
mdadm --misc --examine

I'm going to fiddle with that and see if I can learn anything.
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Oh, and I think
Code:
mdadm --assemble --update=uuid
might be handy. It should update the UUID in the superblocks of the RAID member disks.

Also:
Code:
mdadm --assemble --update=resync
is something I'm considering trying after that, before I use the array -- if the first command can fix the superblocks.
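Spelled out against the mdadm.conf above, I think the full commands would look something like this (untested so far; the member list is just copied from my DEVICE line):
Code:
# rewrite the UUID stored in every member superblock; mdadm generates a
# new random UUID unless --uuid= is also supplied
mdadm --assemble /dev/md0 --update=uuid \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hda1

# mark the array as possibly dirty so the kernel runs a resync pass to
# verify the parity after assembly
mdadm --assemble /dev/md0 --update=resync \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hda1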
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Here's the contents of the superblocks on the good disks:
Code:
# mdadm --misc --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 20694a10:5013084e:6dde8b75:f422ec4e
  Creation Time : Fri Jul  7 14:56:21 2006
     Raid Level : raid5
    Device Size : 245111552 (233.76 GiB 250.99 GB)
     Array Size : 980446208 (935.03 GiB 1003.98 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Sep  6 18:03:51 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 5
  Spare Devices : 0
       Checksum : 13fe93de - correct
         Events : 0.312087

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       0        0        3      faulty removed
   4     4       3        1        4      active sync   /dev/hda1

It appears that some disks have been marked as removed/faulty when I tried to reassemble the array, because of their bad superblocks.
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Oops. The other two disks:
Code:
# mdadm --misc --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 20694a10:5013084e:6dde8b75:f422ec4e
  Creation Time : Fri Jul  7 14:56:21 2006
     Raid Level : raid5
    Device Size : 245111552 (233.76 GiB 250.99 GB)
     Array Size : 980446208 (935.03 GiB 1003.98 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Sep  6 10:03:55 2006
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 13fe2340 - correct
         Events : 0.312084

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       3        1        4      active sync   /dev/hda1
# mdadm --misc --examine /dev/hda1
/dev/hda1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 20694a10:5013084e:6dde8b75:f422ec4e
  Creation Time : Fri Jul  7 14:56:21 2006
     Raid Level : raid5
    Device Size : 245111552 (233.76 GiB 250.99 GB)
     Array Size : 980446208 (935.03 GiB 1003.98 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Sep  6 18:03:51 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 5
  Spare Devices : 0
       Checksum : 13fe93bd - correct
         Events : 0.312087

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     4       3        1        4      active sync   /dev/hda1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       0        0        3      faulty removed
   4     4       3        1        4      active sync   /dev/hda1
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Attempting to re-add the devices doesn't work:
Code:
# mdadm --manage /dev/md0 --re-add /dev/sda1
mdadm: cannot get array info for /dev/md0
# mdadm --manage /dev/md0 --re-add /dev/sdb1
mdadm: cannot get array info for /dev/md0

Annoying. Sorry for the spam-style thread. It helps me think through this. I suppose it may be useful to someone else with a similar problem one day.
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
One thing I don't understand is why mdadm doesn't just rewrite the superblocks when I specify "--force" along with "--scan". It has the configuration information in mdadm.conf; that's why I put it there, so that if things went wrong I would be able to restore the array easily.

So much for that :mad:. Bah!
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
1. I examined the good superblocks to make sure I knew how to recreate the array exactly (i.e. chunk size, etc).
2. I zeroed all the superblocks on all the disks.
3. I recreated the array with the "--assume-clean" option.
4. I marked it possibly dirty with:
Code:
mdadm --assemble /dev/md0 --update=resync
5. I let it try to resync. This only took about 30-60 minutes, probably because all the data was good and it was just reading from each disk and not writing.
6. I mounted it & everything so far appears to be good.

Alternatively, I think I could have simply recreated it without "--assume-clean" and the initial creation sync wouldn't have destroyed the data, but that wasn't a risk I was willing to take.

Once I got over my fear of accidentally destroying the data, it didn't really take me long.
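Roughly, the commands were along these lines. I'm writing this from memory, so treat it as a sketch: double-check the geometry against your own --examine output before copying anything, and make sure the device order matches the original slots.
Code:
# 1. read the geometry (level, chunk size, layout, device order) off a surviving superblock
mdadm --misc --examine /dev/sdc1

# 2. wipe the old md superblocks (metadata only; the data area is untouched)
mdadm --zero-superblock /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hda1

# 3. recreate with identical geometry; --assume-clean skips the initial rebuild
mdadm --create /dev/md0 --level=5 --chunk=128 --layout=left-symmetric \
    --raid-devices=5 --assume-clean \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hda1

# 4. stop it, then reassemble marked dirty so md verifies the parity
#    (the --update options only take effect while assembling)
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=resync \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hda1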
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,275
Location
I am omnipresent
This is very useful information, Gilbo. Thank you for sharing it. Messing around with data recovery on a RAID array is pretty much a cold-sweat nightmare for any techie.
 

anupindi007

What is this storage?
Joined
Nov 26, 2010
Messages
2
Hi guys,
Can you help me? My RAID failed; I'm trying to recover it, and the details are as follows:
#mdadm --assemble -v /dev/sd[b-i]1 --run
mdadm: looking for devices for /dev/sdb1
mdadm: /dev/sdc1 is identified as a member of /dev/sdb1, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/sdb1, slot 7.
mdadm: /dev/sde1 is identified as a member of /dev/sdb1, slot 6.
mdadm: /dev/sdf1 is identified as a member of /dev/sdb1, slot 2.
mdadm: /dev/sdg1 is identified as a member of /dev/sdb1, slot 3.
mdadm: /dev/sdh1 is identified as a member of /dev/sdb1, slot 4.
mdadm: /dev/sdi1 is identified as a member of /dev/sdb1, slot 5.
mdadm: no uptodate device for slot 0 of /dev/sdb1
mdadm: added /dev/sdf1 to /dev/sdb1 as 2
mdadm: added /dev/sdg1 to /dev/sdb1 as 3
mdadm: added /dev/sdh1 to /dev/sdb1 as 4
mdadm: added /dev/sdi1 to /dev/sdb1 as 5
mdadm: added /dev/sde1 to /dev/sdb1 as 6
mdadm: added /dev/sdd1 to /dev/sdb1 as 7
mdadm: added /dev/sdc1 to /dev/sdb1 as 1
mdadm: failed to RUN_ARRAY /dev/sdb1: Input/output error
mdadm: Not enough devices to start the array.

http://www.storageforum.net/forum/showthread.php?p=139021

# mdadm --misc --examine
mdadm: No devices to examine

# mdadm --assemble --update=uuid /dev/md0
mdadm: failed to add 8:17 to /dev/md0: Device or resource busy
mdadm: /dev/md0 assembled from 0 drives - not enough to start the array.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : inactive sdc1[1] sdd1[7] sde1[6]
2930279808 blocks

md0 : inactive sdi1[5](S) sdh1[4](S) sdg1[3](S) sdf1[2](S)
3907039744 blocks

unused devices: <none>


------------
dcerouter_1000085:~# cat /etc/mdadm.conf
cat: /etc/mdadm.conf: No such file or directory
dcerouter_1000085:~# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdi
mdadm: /dev/sdi has wrong uuid.
mdadm: no RAID superblock on /dev/sdh
mdadm: /dev/sdh has wrong uuid.
mdadm: no RAID superblock on /dev/sdg
mdadm: /dev/sdg has wrong uuid.
mdadm: no RAID superblock on /dev/sdf
mdadm: /dev/sdf has wrong uuid.
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: /dev/sde1 has wrong uuid.
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sde has wrong uuid.
mdadm: cannot open device /dev/sdd1: Device or resource busy
mdadm: /dev/sdd1 has wrong uuid.
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: /dev/sdd has wrong uuid.
mdadm: cannot open device /dev/sdc1: Device or resource busy
mdadm: /dev/sdc1 has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: no RAID superblock on /dev/sdb
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda5: Device or resource busy
mdadm: /dev/sda5 has wrong uuid.
mdadm: no RAID superblock on /dev/sda2
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 2.
mdadm: 8:17 is identified as a member of /dev/md0, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdf1 to /dev/md0 as 2
mdadm: added /dev/sdg1 to /dev/md0 as 3
mdadm: added /dev/sdh1 to /dev/md0 as 4
mdadm: added /dev/sdi1 to /dev/md0 as 5
mdadm: no uptodate device for slot 6 of /dev/md0
mdadm: no uptodate device for slot 7 of /dev/md0
mdadm: failed to add 8:17 to /dev/md0: Device or resource busy
mdadm: /dev/md0 assembled from 0 drives - not enough to start the array.
dcerouter_1000085:~# mdadm --assemble --scan -fv
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdi
mdadm: /dev/sdi has wrong uuid.
mdadm: no RAID superblock on /dev/sdh
mdadm: /dev/sdh has wrong uuid.
mdadm: no RAID superblock on /dev/sdg
mdadm: /dev/sdg has wrong uuid.
mdadm: no RAID superblock on /dev/sdf
mdadm: /dev/sdf has wrong uuid.
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: /dev/sde1 has wrong uuid.
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sde has wrong uuid.
mdadm: cannot open device /dev/sdd1: Device or resource busy
mdadm: /dev/sdd1 has wrong uuid.
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: /dev/sdd has wrong uuid.
mdadm: cannot open device /dev/sdc1: Device or resource busy
mdadm: /dev/sdc1 has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: no RAID superblock on /dev/sdb
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda5: Device or resource busy
mdadm: /dev/sda5 has wrong uuid.
mdadm: no RAID superblock on /dev/sda2
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 2.
mdadm: 8:17 is identified as a member of /dev/md0, slot 0.
mdadm: forcing event count in /dev/sdf1(2) from 696186 upto 696190
Segmentation fault (core dumped)
dcerouter_1000085:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdc1[1] sdd1[7] sde1[6]
2930279808 blocks

unused devices: <none>
dcerouter_1000085:~# history |grep stop
456 mdadm --stop /dev/md0
458 mdadm --stop /dev/md0
466 mdadm --stop /dev/md0
489 mdadm --stop /dev/md0
491 mdadm --stop /dev/md0
495 mdadm --stop /dev/md0
501 mdadm --stop /dev/md0
515 history |grep stop
516 mdadm --stop /dev/md0
518 mdadm --stop /dev/md0
520 mdadm --stop /dev/md0
521 mdadm --stop /dev/md0
522 mdadm --stop /dev/md0
531 mdadm --stop /dev/md0
539 history |grep stop
dcerouter_1000085:~#
dcerouter_1000085:~# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
dcerouter_1000085:~#



#mdadm --misc --examine /dev/md0
mdadm: No md superblock detected on /dev/md0.


Thanks,
Srini
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
Thanks for the helpful post. I believe I am having the same issue with my new XFS mdadm RAID6.
The /dev device names have changed in my system and it looks like the superblocks are incorrect on the disks. My hunch is that having the old array and the new array hooked up in the same box when I built this array, and then removing the old array, might have caused the problem. It also crashed while resyncing (the SATA cables got yanked out).

Note: I have not let it successfully resync, so that may be part of the issue.

[root@machine ~]# mdadm --assemble --scan -fv
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdf has wrong uuid.
mdadm: /dev/sde has wrong uuid.
mdadm: /dev/sdd has wrong uuid.
mdadm: /dev/sdb has wrong uuid.
mdadm: /dev/sdc has wrong uuid.

I then started the array by using the update=uuid option.

[root@poopsock ~]# mdadm --assemble --update=uuid /dev/md1
mdadm: /dev/md1 has been started with 5 drives.

It is currently rebuilding and is estimated to finish in 16 hours.

I had a few questions

1) Should I let resync finish before mounting/using the array? I've been copying data to it but it will randomly fail all the drives after some read/write activity.

2) I read that you used mdadm --zero-superblock $devices - will this destroy data? I see that listed as one of your steps.

I will wait for it to resync before remounting it and see.

Thanks.
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
Ok, as an update, I think I'm having multiple issues.

The resync finished but something else made all the drives fail out of the array. Also, I seem to have a mystery "md127" device appear occasionally and usurp my /dev/sdc.

I also feel like NCQ being enabled might be an issue, as I'm using low-power Seagate and Samsung drives.

Finally, I think it's possible that the drives aren't responding within the default timeout when under load, which would explain the stack traces (all timeout-related) and the subsequent tossing of drives out of the array.

Here is what I've done so far.

(disable NCQ for each drive)
echo "1" > /sys/block/sdb/device/queue_depth

(increase drive timeout of each device)
echo "60" > /sys/block/sdb/device/timeout

The ghost train /dev/md127 array would sometimes snatch /dev/sdc from me. I also zeroed the superblock of /dev/sdc and re-added it to my array.

Right now it's recovering, I'll post my results.

Personalities : [raid6] [raid5] [raid4]
md1 : active raid6 sdc[5] sdf[0] sdb[4] sdd[2] sde[1]
5860540224 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/4] [UUU_U]
[>....................] recovery = 1.4% (28604096/1953513408) finish=1442.1min speed=22245K/sec
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
I think I might have fixed this issue by just swapping the cards. I'm currently recovering the RAID (I purposely failed a drive to fully test the superblock thing from earlier).

Judging by forum posts and the behavior I saw, this is a Marvell chipset that will randomly offline disks.

Avoid the Marvell 88SE9123 like the plague. I just ended up using the HighPoint RocketRAID 622A 2-port SATA card with my rig.

This is exactly my issue:
http://vip.asus.com/forum/view.aspx...1&model=P6X58D+Premium&page=1&SLanguage=en-us

(other articles)
https://koitsu.wordpress.com/2009/07/15/marvells-faulty-88se9123-sata-6g-controller/
http://www.extremetech.com/article2/0,2845,2350340,00.asp
http://opensolaris.org/jive/thread.jspa?messageID=490180

in RHEL6 it requires compilation of a kernel module, I found a good post here:
https://help.ubuntu.com/community/RocketRaid
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
I think the issue I was having came from using a buggy Marvell chipset.

Avoid the Marvell 88SE9123 if you're doing a 6-8 drive setup with an external SATA bay (SANS Digital).

This was exactly my issue:
http://vip.asus.com/forum/view.aspx...1&model=P6X58D+Premium&page=1&SLanguage=en-us

(other articles about the chipset)
https://koitsu.wordpress.com/2009/07/15/marvells-faulty-88se9123-sata-6g-controller/
http://www.extremetech.com/article2/0,2845,2350340,00.asp
http://opensolaris.org/jive/thread.jspa?messageID=490180

Right now I've swapped out the Marvell chipset for the RocketRAID 622A that came with the SANS Digital unit, and the array is rebuilding (I manually failed a drive to troubleshoot the superblock thing from earlier).

Crossing my fingers, but I'm hoping the buggy Marvell chipset was the culprit here.
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
Apologies for duplication of info above.

Unfortunately I spoke too soon: after about 20 minutes of running the rr62x module I started seeing hpt resets, and FIS calls were unable to go through. The same behaviour occurred - drives were booted out of the controller one at a time and were no longer visible to the system.

I found a post here that pretty much summarizes the same issue I had:
http://ubuntuforums.org/showthread.php?t=1592227

I've since ordered a SiL-based SATA controller (ADSA3GPX1-2E RAID 5 2CH eSATA II PCIe), because other folks have had good experience with it and an external SATA bay, and I should receive it today. It will be slower internally as it's only SATA II/3Gbps, but most of my writes are limited to gigabit line speed anyway.

For clarification here is the hardware I had issues with. Everything was done in RHEL6 using software raid (mdadm) and the XFS filesystem.

(storage)
Sans Digital TowerRAID TR8M-B 8 Bay SATA (comes with rocketraid622a below)
4 x 2TB Samsung spinpoint F4 5400rpm low-power SATA
1 x 2TB Seagate 5400 rpm low-power SATA (all drives in a 5-drive mdadm RAID6)

(SATA controllers)
RocketRAID 622A RAID card (rr62x kernel module) - hpt resets, drives disappear
RocketRAID 622 controller card (Marvell 88SE9123) - drives disappear.

When the new SiL 3132 card arrives today I'll give that a shot and post my findings.

Here are the logs below from the last card I tried.
Note: both the Marvell 88SE9123 (622) and the RocketRAID 622A (rr62x) showed the same I/O errors below right before the drives were kicked out of the system. Only the rr62x module was verbose enough to log the hpt resets and FIS traversal errors.

== rr62x ==
rr62x:hpt_reset(8/0/1)
rr62x:[0 0 ] failed to disable comm status change bits
rr62x:[0 0 ] start port.
rr62x:[0 0 ] start port hard reset (probe 1).
rr62x:[0 1 ] failed to disable comm status change bits
rr62x:[0 1 ] start port.
rr62x:[0 1 ] start port hard reset (probe 1).
rr62x:[0 0 ] start port soft reset (probe 1).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port soft reset (probe 1).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port hard reset (probe 2).
Clocksource tsc unstable (delta = 202255234 ns)
Switching to clocksource jiffies
rr62x:[0 1 ] start port hard reset (probe 2).
rr62x:[0 1 ] start port soft reset (probe 2).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port soft reset (probe 2).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 3).
rr62x:[0 0 ] start port hard reset (probe 3).
rr62x:[0 1 ] start port soft reset (probe 3).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port soft reset (probe 3).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 4).
rr62x:[0 0 ] start port hard reset (probe 4).
rr62x:[0 1 ] start port soft reset (probe 4).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port soft reset (probe 4).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 5).
rr62x:[0 0 ] start port hard reset (probe 5).
rr62x:[0 1 ] start port soft reset (probe 5).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port soft reset (probe 5).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 6).
rr62x:[0 0 ] start port hard reset (probe 6).
rr62x:[0 1 ] start port soft reset (probe 6).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port soft reset (probe 6).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 7).
rr62x:[0 0 ] start port hard reset (probe 7).
rr62x:[0 1 ] start port soft reset (probe 7).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 0 ] start port soft reset (probe 7).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 8).
rr62x:[0 0 ] start port hard reset (probe 8).
rr62x:[0 1 ] start port soft reset (probe 8).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 9).
rr62x:[0 0 ] start port soft reset (probe 8).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 0 ] start port hard reset (probe 9).
rr62x:[0 1 ] start port soft reset (probe 9).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 10).
rr62x:[0 0 ] start port soft reset (probe 9).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 0 ] start port hard reset (probe 10).
rr62x:[0 1 ] start port soft reset (probe a).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 11).
rr62x:[0 0 ] start port soft reset (probe a).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 0 ] start port hard reset (probe 11).
rr62x:[0 1 ] start port soft reset (probe b).
rr62x:[0 1 f] failed to send 1st FIS
rr62x:[0 1 ] start port hard reset (probe 12).
rr62x:[0 1 3] device disconnected on port.
rr62x:[0 0 ] start port soft reset (probe b).
rr62x:[0 0 f] failed to send 1st FIS
rr62x:[0 0 ] start port hard reset (probe 12).
rr62x:[0 0 0] device disconnected on port.
rr62x:[0 0 1] device disconnected on port.
rr62x:[0 0 2] device disconnected on port.
rr62x:[0 0 3] device disconnected on port.
rr62x:hpt_reset(8/0/8)
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d1 08 00 00 78 00
end_request: I/O error, dev sdc, sector 885117192
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d8 80 00 00 80 00
end_request: I/O error, dev sdc, sector 885119104
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d9 00 00 00 50 00
end_request: I/O error, dev sdc, sector 885119232
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d9 50 00 00 20 00
end_request: I/O error, dev sdc, sector 885119312
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d9 70 00 00 80 00
end_request: I/O error, dev sdc, sector 885119344
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d9 f0 00 00 10 00
end_request: I/O error, dev sdc, sector 885119472
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 da 00 00 00 80 00
end_request: I/O error, dev sdc, sector 885119488
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 da 80 00 00 80 00
end_request: I/O error, dev sdc, sector 885119616
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d3 80 00 00 80 00
end_request: I/O error, dev sdc, sector 885117824
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 db 00 00 00 80 00
end_request: I/O error, dev sdc, sector 885119744
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 db 80 00 00 80 00
end_request: I/O error, dev sdc, sector 885119872
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 dc 00 00 00 80 00
end_request: I/O error, dev sdc, sector 885120000
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 dc 80 00 00 80 00
end_request: I/O error, dev sdc, sector 885120128
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 dd 00 00 00 08 00
end_request: I/O error, dev sdc, sector 885120256
scsi 8:0:1:0: [sdc] Unhandled error code
scsi 8:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
scsi 8:0:1:0: [sdc] CDB: Read(10): 28 00 34 c1 d6 00 00 00 80 00
end_request: I/O error, dev sdc, sector 885118464
scsi 8:0:8:0: rejecting I/O to offline device
scsi 8:0:8:0: [sdf] Unhandled error code
scsi 8:0:8:0: [sdf] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
scsi 8:0:8:0: [sdf] CDB: Write(10): 2a 00 34 c1 d8 00 00 00 80 00
end_request: I/O error, dev sdf, sector 885118976
raid5: Disk failure on sdf, disabling device.
raid5: Operation continuing on 4 devices.
raid5: Disk failure on sdc, disabling device.
raid5: Operation continuing on 3 devices.
end_request: I/O error, dev sde, sector 8
md: super_written gets error=-5, uptodate=0
raid5: Disk failure on sde, disabling device.
raid5: Operation continuing on 2 devices.
end_request: I/O error, dev sdb, sector 8
md: super_written gets error=-5, uptodate=0
raid5: Disk failure on sdb, disabling device.
raid5: Operation continuing on 1 devices.
end_request: I/O error, dev sdd, sector 8
md: super_written gets error=-5, uptodate=0
raid5: Disk failure on sdd, disabling device.
raid5: Operation continuing on 0 devices.
md: md1: recovery done.
RAID5 conf printout:
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
I've got the ADSA3GPX1-2E Raid 5 2CH Esata II Pcie in place.

My drive order changed (sda is no longer my OS drive, it's now sdf)
My raid drives are now sda/sdb/sdc/sdd/sde and I'm recovering the raid.

I tried setting my mobo SATA chipset to AHCI but it would only boot if the external 8-bay was not connected.

I'll post my findings but it will be a bit. I'd like to be optimistic but this is the 3rd SATA controller I've tried, though a different chipset (SiL based vs. Marvell based).

[root@poopsock ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid6 sda[5] sde[0] sdb[4] sdc[2] sdd[1]
5860540224 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/4] [UUU_U]
[>....................] recovery = 3.2% (64301120/1953513408) finish=1061.4min speed=29663K/sec

(yes, my hostname is poopsock)
http://www.urbandictionary.com/define.php?term=poopsock
 

jobski

What is this storage?
Joined
Jun 8, 2011
Messages
1
I CANNOT THANK YOU ENOUGH!

I have subscribed to this forum just to say THANK YOU after searching Google!

I was expanding my RAID 6 array when one drive failed during reshaping.

I thought the array was gone for good -- 4TB worth of precious data (photos, videos, music, ALL PERSONAL).

I tried recreating it and all sorts of stuff.

This command saved my life:
mdadm --assemble --scan -fv

THANK YOU SO MUCH FROM AUSTRALIA!
 

ogp

What is this storage?
Joined
Oct 22, 2011
Messages
4
So I've got a 6-disk RAID 5 array and recently lost two disks. I replaced them but can't get the array back online. I believe part of the problem is that the order of the disks has changed. What I'm getting from this post is that I can delete the superblocks on ALL the disks and then recreate the RAID array with the same settings (number of disks, chunk size, etc.) without necessarily losing all the data?
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,729
Location
Horsens, Denmark
RAID5 only protects against a single drive failure. If you weren't able to replace that drive before another drive failed, you might be in big trouble.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
Yeah. It's time for a rebuild and restoration of data from backup.
 

ogp

What is this storage?
Joined
Oct 22, 2011
Messages
4
RAID5 only protects against a single drive failure. If you weren't able to replace that drive before another drive failed, you might be in big trouble.

Yea, I realize that. I was just holding onto some sliver of hope that I might be able to recover some of the data, no matter how small. I guess since I have nothing to lose I'll nuke the superblocks of the existing drives and see what happens.....
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,729
Location
Horsens, Denmark
How did your drives fail? If they can still be detected, you can try some RAID recovery programs and see what they can find.

If the drives are completely inaccessible, it's time to throw in the towel. Since every file is striped across every drive, you have lost some part of any file big enough to fill a stripe.
 

ogp

What is this storage?
Joined
Oct 22, 2011
Messages
4
I'll have to hook them back up and see if I can get them to respond. When I noticed, they were clicking loudly and the machine would essentially lock when detecting them (upon reboot).
Perhaps I'll try freezing them and then doing a dd. I'm not really sure how much data is on each drive...
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,729
Location
Québec, Québec
When I noticed, they were clicking loudly and the machine would essentially lock when detecting them (upon reboot).
Dead drives exhibiting the click-of-death syndrome are unrecoverable by common means. You'll need to deal with a professional recovery service, meaning a company that disassembles your hard drive in a clean room. Here, they charge between one and two thousand dollars per drive and the result isn't guaranteed. If it's valuable company data, it might be worth it, but if it's just the family photos of your last trip to Disneyland and the ugly face of Aunt Betty, then you might want to reconsider.
 

ogp

What is this storage?
Joined
Oct 22, 2011
Messages
4
Yea, nothing important or work related. Just lots of large media files... I'm not holding my breath for any sort of restoration.
 

sadsfae

What is this storage?
Joined
Mar 13, 2011
Messages
8
Location
Raleigh, NC
Hey Folks,

As an update, I had no problems with the Addonics ADSA3GPX1-2E (SiL Based).
It cost me around $30 from Amazon.

http://www.amazon.com/ADSA3GPX1-2E-Raid-2CH-Esata-Pcie/dp/B000UUN70A/

The SiL 3132 chipset-based cards are the only ones that worked for me.
My setup is now working 100% with no issues; the SATA I/O disconnect problems are gone.

A friend of mine purchased a similar tower made by Rosewill which included a SiL 3132-based card, and his works as intended; it seems the SANS Digital units ship with HighPoint (Marvell-based) cards.

My setup ::

8x 2TB Samsung Spinpoint F4
Linux mdadm RAID6
RHEL6.2
 

mdadm-newbie

What is this storage?
Joined
Feb 9, 2012
Messages
1
1. I examined the good superblocks to make sure I knew how to recreate the array exactly (i.e. chunk size, etc).
2. I zeroed all the superblocks on all the disks.
3. I recreated the array with the "--assume-clean" option.
4. I marked it possibly dirty with:
Code:
mdadm --assemble /dev/md0 --update=resync
5. I let it try to resync. This only took about 30-60 minutes, probably because all the data was good and it was just reading from each disk and not writing.
6. I mounted it & everything so far appears to be good.

Alternatively, I think I could have simply recreated it without "--assume-clean" and the initial creation sync wouldn't have destroyed the data, but that wasn't a risk I was willing to take.

Once I got over my fear of accidentally destroying the data, it didn't really take me long.

Still saving lives!!!

Had an old TeraStation Buffalo drive die and using google and this post was able to finally get it up and running!

Thanks again!!!
 

StuartRothrock

What is this storage?
Joined
Feb 26, 2012
Messages
2
I am getting "failed to send 1st FIS" with the Sans Digital TowerRAID TR8M-B 8 Bay SATA and rocketraid 622a and rr62x userland raid module. I am wondering if I replaced the rocketraid with another controller if my errors would go away. Any suggestions are welcome.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,729
Location
Horsens, Denmark
I have that hardware configuration, but in Win7, and without error. Nice kit, but one of the Linux guys around here will need to help you. Shouldn't be long ;)
 

StuartRothrock

What is this storage?
Joined
Feb 26, 2012
Messages
2
Thanks, ddrueding. I have it working in Win7 too. As you say, it works well. My issues are with the Linux driver/card, Fedora 14 to be more specific. For RAID redundancy, it's just not stable enough with my configuration. I'm willing to buy a GOOD card for good stability and speed.
 

chandan_raka

What is this storage?
Joined
Jun 14, 2012
Messages
1
I had a failed drive (sda) and replaced it, but when I wanted to re-add it to the md devices I encountered the following issue.

Before adding the disk I ran fdisk; this was the output:

[root@host ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 65 522081 fd Linux raid autodetect
/dev/sda2 66 121601 976237920 fd Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 65 522081 fd Linux raid autodetect
/dev/sdb2 66 121601 976237920 fd Linux raid autodetect

Disk /dev/md1: 999.6 GB, 999667531776 bytes
2 heads, 4 sectors/track, 244059456 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md0: 534 MB, 534511616 bytes
2 heads, 4 sectors/track, 130496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table



==============================

But when I tried to add sda1 to my md0 RAID it went perfectly; when I tried to add sda2 to md1, however, it failed, saying that no such device was found. When I ran fdisk -l again I saw:



[root@host ~]# fdisk -l

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 65 522081 fd Linux raid autodetect
/dev/sdb2 66 121601 976237920 fd Linux raid autodetect

Disk /dev/md1: 999.6 GB, 999667531776 bytes
2 heads, 4 sectors/track, 244059456 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md0: 534 MB, 534511616 bytes
2 heads, 4 sectors/track, 130496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 * 1 65 522081 fd Linux raid autodetect
/dev/sdc2 66 121601 976237920 fd Linux raid autodetect
You have new mail in /var/spool/mail/root

==============



Surprisingly, Linux suddenly detected the new drive as sdc. And now, when I try to remove sda1 from md0 so that I can add sdc1, it won't let me, saying sda1 is no such device. Please help...

dmesg output is at the pastebin below:

http://fpaste.org/qwdh/
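In case it helps, what I'm planning to try next (untested, just going from the mdadm man page, so please correct me if this is wrong) is to drop the vanished sda1 from md0 using the 'detached' keyword and then add the renamed partitions back:
Code:
# remove any member whose device node no longer exists (needs an mdadm
# recent enough to accept the 'detached' keyword for --remove)
mdadm /dev/md0 --remove detached

# then add the replacement partitions, which now show up as sdc1/sdc2
mdadm /dev/md0 --add /dev/sdc1
mdadm /dev/md1 --add /dev/sdc2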
 

ScottAStanley

What is this storage?
Joined
Aug 25, 2012
Messages
1
I have a RAID 5 array that is in a bit of a messed-up state, fully detailed here: http://www.linuxquestions.org/questions/showthread.php?p=4763622#post4763622

I am trying to follow the procedure Gilbo outlines above to rebuild the array, since the superblock is missing from one of the partitions, and when I attempt to build the array after zeroing out the superblocks, I get this:

[root@moneypit RAID_RECOVERY]# mdadm --build /dev/md5 --chunk=128 --level=raid5 --raid-devices=3 --assume-clean /dev/sdc1 /dev/sdd1 /dev/sdf1
mdadm: Raid level raid5 not permitted with --build.

I am not sure exactly which command he meant when he said, "I recreated the array with the '--assume-clean' option." I assumed it was the --build option, but clearly it is not.

Anybody have any suggestions?
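My best guess (untested, and I'd appreciate confirmation before running it) is that he meant mdadm --create rather than --build, since --build is only for legacy arrays without superblocks. With the devices and geometry from my attempt above, that would be something like:
Code:
# --create writes fresh superblocks; --assume-clean skips the initial rebuild.
# Device order, chunk size, and layout must match the original array exactly.
mdadm --create /dev/md5 --level=5 --chunk=128 --raid-devices=3 \
    --assume-clean /dev/sdc1 /dev/sdd1 /dev/sdf1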
 