Today is a good day, and it has been a long time coming: I finally have all of the equipment to set up the Openfiler cluster I have been talking about! I did run into some snags that I hadn't hit in testing, mainly because of the large logical disks in my production system. Other than that, everything worked great. Below is some of what I learned during the setup.
To give you a little background in case you haven't been following along: I decided to build an Openfiler SAN to supplement our NetApp FAS2020, which is starting to run low on space, and we were budgeted to buy more storage this year. Another FAS2020 runs roughly $20,000. I figured I could get more bang for my buck by building my own SAN with SuperMicro servers and Openfiler, and I was right! My redundant setup gives us about 20TB of space for about $12,000!
I followed the steps outlined here: (Openfiler 2.99 Cluster with DRBD, Pacemaker and Corosync)
I found, though, that since my logical partition was bigger than 2TB, I couldn't use cfdisk or fdisk like I did in my testing. MBR partition tables top out at 2TB, so for disks that big you have to use parted with a GPT label. Here are the commands I used to partition my disks:
root@filer01# parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart logical 1MB 512MB
(parted) mkpart logical 513MB 20971520MB
(parted) set 2 lvm on
(parted) quit
After that I rebooted, and I was able to follow the instructions. You will need to change the numbers above to match your config. If you haven't figured it out, 20971520 is 20TB expressed in binary megabytes (20 × 1024 × 1024). You can do your number conversions here: (TB to MB Converter)
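If you'd rather skip the web converter, the same conversion is a quick bit of shell arithmetic (the 20 here is just my disk size; swap in your own):

```shell
# 20 TB in binary megabytes: 20 * 1024 (to GB) * 1024 (to MB)
echo $((20 * 1024 * 1024))
# → 20971520
```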
Another thing I ran into was on the second page of the documentation. You need to change anything that says nfs-lock to nfslock or you will get errors. I think it was a typo.
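If you are pasting the documentation's commands into a file before running them, a quick sed pass can fix the name up front rather than editing each occurrence by hand (the filename cluster-setup.sh is just a hypothetical example):

```shell
# Replace every nfs-lock reference with nfslock, in place
sed -i 's/nfs-lock/nfslock/g' cluster-setup.sh
```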
Finally, after I set everything up, I immediately wanted to test failover. However, when I pulled the heartbeat cables, failover didn't work: services wouldn't start on the second node. After a while I figured out it was because DRBD hadn't finished syncing the disks. Earlier documentation, for configuring a cluster on Openfiler 2.33, notes that large disks can take 24+ hours to sync. Sweet damn! You can check the status of the disk syncing by running this command:
root@filer01# cat /proc/drbd
When everything shows UpToDate/UpToDate, you should be able to test failover. You can also edit /etc/drbd.d/global_common.conf and raise the sync rate under the syncer section to speed up replication.
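The knob in question is DRBD's syncer rate. As a sketch, the section would look something like this; the 100M is an example figure to tune to your disks and heartbeat network, not a value from the Openfiler docs:

```
syncer {
    rate 100M;
}
```

Setting it too high can starve your application I/O during the initial sync, so it's worth starting conservative and watching /proc/drbd.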
Also, you can increase the MTU on your heartbeat NICs to 9000 to boost the sync speed. You can do that by editing the ifcfg-IFACE files located in /etc/sysconfig/network-scripts/. Change the line saying MTU=1500 to MTU=9000 and reboot or restart your networking services. If your heartbeat interface is eth1, edit the ifcfg-eth1 file. If you have bonded NICs like me, it will probably be ifcfg-bond1. You get the point though.
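As a sketch, assuming your heartbeat interface is eth1, the relevant part of /etc/sysconfig/network-scripts/ifcfg-eth1 would end up looking something like this (the DEVICE and addressing lines are illustrative, not copied from my config):

```
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
MTU=9000
```

You can also apply it on the fly with `ip link set dev eth1 mtu 9000` to test before making it permanent in the file. Keep in mind both ends of the heartbeat link (and any switch in between) have to support jumbo frames, or the change will hurt more than it helps.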
Here is a little video I made of my new setup, check it out!
Have you set up an Openfiler cluster before? What issues did you come across with the setup? Have any recommendations for my readers? Hit us up in the comments!