Quantcast
Channel: Question and Answer » software-raid
Viewing all articles
Browse latest Browse all 22

mdadm RAID – Restart during Grow (Shrink) reshape – not working (stuck)

$
0
0

I have a linux software Raid6 array (mdadm). I grew it from 6x4TB disks (16TB usable) to 7x4TB (20TB usable). The reshape went fine, but when I did resize2fs, I got the fairly well known issue of the EXT4 16TB filesystem limit. I checked and the filesystem does NOT have the 64 bit flag. So in an effort to reclaim the extra drive I had just added to the array, I did this:

johnny@debian:~$ sudo resize2fs /dev/md0 16000G
johnny@debian:~$ sudo mdadm --grow /dev/md0 --array-size=16000G
johnny@debian:~$ sudo mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/tmp/backup

Notice the backup-file location. That’s going to be important in a minute, because I’m on Debian.

So things were going fine, slow but working. The progress got to 3.7% and it had slowed to a crawl. I had assumed this was because I was reshaping a few other arrays during this same time. When those other jobs finished and this one didn’t speed up, I got really worried. Since it said it would take years to finish, I decided I should restart and see if it would speed up, so I restarted the system.

This is when bad things start happening…

I’m on Debian, and it is my understanding that the /tmp folder is wiped out when the system comes up, so my backup-file from the reshape was lost. Also, because my /etc/fstab file was trying to mount md0, which wasn’t assembling now, the system failed to come up a few times. I started from a live cd and fixed the fstab file and got the system to come back up.

Once I sorted that out, the system was up, and that was the first time I saw that md0 had not simply assembled itself and continued reshaping. Panic set in…

I don’t have the output of the following commands, but I managed to find the commands I typed in. Brief explanation of what happened to follow…

johnny@debian:~$ sudo mdadm --assemble /dev/md0
johnny@debian:~$ sudo mdadm --assemble --force /dev/md0
johnny@debian:~$ sudo mdadm --assemble --force /dev/md0 --backup-file=/tmp/backup

The first command failed, so I tried the –force option, which also failed, but the error message told me the failure was because it needed the –backup-file option, so I ran the third command. I expected the backup file to still exist, but it didn’t because it was in the /tmp folder and had been deleted. This didn’t seem to cause any problems though, because the array assembled.

Here is what md0 looks like now. Notice the disk marked “removed”. I suspect this is the disc that was being removed, sdj1.

johnny@debian:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Mar  5 20:45:56 2016
          State : clean, degraded, reshaping 
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

 Reshape Status : 3% complete
  Delta Devices : -1, (7->6)

           Name : BigRaid6
           UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
         Events : 4339739

    Number   Major   Minor   RaidDevice State
      11       8      224        0      active sync   /dev/sdo
       2       0        0        2      removed
       6       8       80        2      active sync   /dev/sdf
       7       8      176        3      active sync   /dev/sdl
      12       8       16        4      active sync   /dev/sdb
       8       8       32        5      active sync   /dev/sdc

       9       8      128        6      active sync   /dev/sdi

And here is the current progress of the reshape. Notice it’s completely stuck at 0K/sec.

johnny@debian:~$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md0 : active raid6 sdo[11] sdi[9] sdc[8] sdb[12] sdl[7] sdf[6]
      15627548672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [U_UUUU]
      [>....................]  reshape =  3.7% (145572864/3906887168) finish=284022328345.0min speed=0K/sec
      bitmap: 5/30 pages [20KB], 65536KB chunk

unused devices: <none>

Here are the individual discs still in the array.

johnny@debian:~$ sudo mdadm --examine /dev/sd[oflbci]
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : 99b0fbcc:46d619bb:9ae96eaf:840e21a4

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : fca445bd - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : b8d49170:06614f82:ad9a38a4:e9e06da5

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : 5d867810 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : dd56062c:4b55bf16:6a468024:3ca6bfd0

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : 59045f87 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : 92831abe:86de117c:710c368e:8badcef3

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : dd2fe2d1 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : 8404647a:b1922fed:acf71f64:18dfd448

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : 358734b4 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdo:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : d7e84765:86fb751a:466ab0de:c26afc43

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : c3698023 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing

Here is /dev/sdj1, which used to be the only member of the array that was not a “whole disk” member. This was the one being removed from the array during the reshape. I suspect it is still needed to finish the reshape although it is not currently a member of the array because it has the data on it from before the reshape.

johnny@debian:~$ sudo mdadm --examine /dev/sdj1
mdadm: No md superblock detected on /dev/sdj1.

So here are my problems…
1. I can’t get the reshape to finish.
2. I can’t mount the array. When I try, I get this.

johnny@debian:~$ sudo mount /dev/md0 /media/BigRaid6
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
johnny@debian:~$ sudo dmesg | tail
[42446.268089] sd 15:0:0:0: [sdk]  
[42446.268091] Add. Sense: Unrecovered read error - auto reallocate failed
[42446.268092] sd 15:0:0:0: [sdk] CDB: 
[42446.268093] Read(10): 28 00 89 10 bb 00 00 04 00 00
[42446.268099] end_request: I/O error, dev sdk, sector 2299575040
[42446.268131] ata16: EH complete
[61123.788170] md: md1: data-check done.
[77423.597923] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)
[77839.250590] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)
[78525.085343] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)

I’m sure mounting would succeed if the reshape was finished, so that’s probably what is most important. Just FYI, the data on this array is too big to be backed up, so if I lose it, the data is gone. Please help!

EDIT 1:

I could shell out $1000 (or more) and get enough disks to copy everything over, but I’d need to be able to mount the array for that to work.

Also, I just noticed that the “bad geometry” error message I get when trying to mount the array has some interested info in it.

[146181.331566] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)

The size of device, 3906887168, is exactly 1/4 of the array size of md0, 15627548672. From “mdadm –detail /dev/md0″

Array Size : 15627548672 (14903.59 GiB 16002.61 GB)

I don’t know where the 4194304000 number is coming from… but doesn’t that mean the array is the right size to fit on these disks? Or do these sizes not account for the mdadm metadata? Is 4194304000 including the metadata perhaps?

I could swear I tried a few times to get the sizes right before the reshape would even start, so I thought everything was good to go. Maybe I was wrong.


Viewing all articles
Browse latest Browse all 22

Latest Images

Trending Articles





Latest Images