How to replace a brick in GlusterFS

Scenario 1: The data drive in your machine fails, and you really don’t want to go through the process of reconfiguring a whole new system just to create and join a new brick.

Scenario 2: You proactively replace your hard drive, keeping the OS intact, and just want to use the new drive.

The solution to either scenario is quite simple.

Let’s stop cron just in case we have any watchdog processes running.

service cron stop

The first thing we need to do is tell GlusterFS to take the brick offline. We do that with the reset-brick start command:

gluster volume reset-brick your-volume your-hostname-or-ip:/path-to-failed-brick start
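
If you want to confirm the brick really is offline before pulling any hardware, the volume status output should now list it as N/A. Using the same placeholder names as above:

gluster volume status your-volume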

Then we can unmount the failed brick’s filesystem and swap in the new disk. As long as our config remains the same, we will later use the reset-brick command to get all of the data back from the redundant copy on another node.
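
For example, assuming the failed brick’s filesystem sits at /mnt/gluster (the same mount point we’ll reuse below), unmounting it is just:

umount /mnt/gluster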

Now you need to create a partition on the new drive and then format it. Let’s assume that your new partition will be /dev/sdb1 and that the drive is an SSD, so you want to use f2fs.
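
If the replacement disk is blank, you’ll need to partition it first. Here’s a minimal sketch, assuming the new disk shows up as /dev/sdb (adjust for your hardware):

parted -s /dev/sdb mklabel gpt mkpart primary 0% 100%

With the partition created, format it: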

mkfs.f2fs /dev/sdb1

Now you need to update /etc/fstab so that the new partition mounts automatically at boot. Let’s say your mount point is /mnt/gluster. You will want something like this:

echo "UUID=`blkid | grep /dev/sdb1 | cut -d'\"' -f2` /mnt/gluster        f2fs     defaults,noatime,nofail    0 1" >> /etc/fstab

Note: You should comment out the old entry in your fstab file.

Next, you will want to mount your new partition.

mount -a
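
It’s worth a quick sanity check that the new filesystem actually landed on the expected mount point:

findmnt /mnt/gluster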

Now let’s recreate the brick directory. Because we’re using reset-brick, the path can stay exactly the same as the old one (it’s the older replace-brick workflow that demands a new name). Assuming that your brick is /mnt/gluster/brick, we can simply do:

mkdir -p /mnt/gluster/brick

Now let’s use the gluster volume reset-brick command to do the heavy lifting for us.

gluster volume reset-brick your-volume your-hostname-or-ip:/path-to-failed-brick your-hostname-or-ip:/path-to-failed-brick commit force

If you stopped cron, start it back up now.

service cron start

That’s it.

Now you can check on the progress of the self-heal (the re-population of data onto the new brick) by using the heal info command.
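
With the placeholder volume name used throughout this post, that looks like:

gluster volume heal your-volume info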

Hope this helps!