Well, today is a sad day in Bauer-Power land. Today I have to tuck tail and say that something I designed and thought was rock solid is not as solid as I would have liked. I am referring to the Bauer-Power SAN that I originally wrote about back in August of last year. In that article I talked about building an iSCSI SAN using Ubuntu, IET, ZFS, Heartbeat and GlusterFS. It all seemed to be going well until a month ago, during a data center move.
In my initial tests, I was able to fail the nodes over without issue. However, I don't think those tests were very realistic, because the amount of data I tested with was really small. Now that I have been running this thing for 7 months and have accumulated almost 7TB of data, I am finding that there are some data integrity issues with GlusterFS. What happened was that we powered off both nodes in order to do a forklift move. When we powered them back on, for some reason the data on the primary node wasn't being served by IET, so we failed over to the secondary node to get things rolling. Everything appeared fine for a few weeks, but by the time I finally brought the primary back up, the two nodes had been out of sync for a while, and the GlusterFS healing process corrupted a few things.
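For anyone who wants to see how far two nodes have drifted before letting a heal run, here is a minimal sketch of the idea. It is not a GlusterFS tool and it is not something I ran during the move. It is just a plain Python script that hashes every file under two directory trees and reports what differs. The function names are made up for illustration, and the two paths you pass in are assumptions: point them at read-only copies or snapshots of each node's data, not at live storage. On multiple terabytes this will take a long time to run.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(primary_root, secondary_root):
    """Return (only_on_primary, only_on_secondary, differing) as sorted lists.

    Both arguments are hypothetical paths to copies of each node's data.
    """
    primary = tree_digests(primary_root)
    secondary = tree_digests(secondary_root)
    only_primary = sorted(set(primary) - set(secondary))
    only_secondary = sorted(set(secondary) - set(primary))
    differing = sorted(
        path
        for path in set(primary) & set(secondary)
        if primary[path] != secondary[path]
    )
    return only_primary, only_secondary, differing
```

If all three lists come back empty, the two copies match byte for byte. Anything in the third list is a file that exists on both sides with different contents, which is exactly the kind of file I would want backed up from the known-good node before any healing touches it.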
Luckily, I didn't keep anything mission critical on the cluster, so my rebuild process isn't that big a deal. The issue also doesn't seem to be with Ubuntu or IET. On its own, that combination is running rock solid. I just don't think mirroring the nodes with GlusterFS is safe.
On a separate note, IET isn't certified with VMware, although I haven't had any issues with it. It does, however, have a serious issue with Microsoft failover clustering. Microsoft will not use IET disks for clustering. Period. Since this issue came up, I am going to set up each node separately, and will purchase a motherboard and an additional RAID card to have on standby in the event of a hardware failure. The drives, NICs and power supplies are all already redundant on their own.
That being said, I think I will rebuild the primary node with SCST instead of IET, because SCST is VMware certified and should work with Microsoft failover clustering. If any of you built one of these SANs, I apologize, but I don't think the clustered setup is ready for prime time. If you haven't done so already, I would reconfigure the nodes as standalone storage devices and purchase a backup set of the non-redundant hardware.