High Availability Storage

I would like to make 2 TB or so available via NFS and CIFS. I am looking for a 2 (or more) server solution for high availability, with the ability to load balance across the servers if possible. Any suggestions for clustering or other high availability solutions?

This is for a business planning 5-10 TB of growth over the next few years. Our facility runs almost 24 hours a day, six days a week. We can tolerate 15-30 minutes of downtime, but we want to minimize data loss, and I want to minimize 3 AM calls.

We are currently running a single server with ZFS on Solaris, and we are considering AVS for the HA part, but we have had minor issues with Solaris (the CIFS implementation does not work with Vista, etc.) that have held us back.

We started to look at

  • DRBD on top of GFS (GFS for distributed locking)
  • Gluster (needs client pieces, no native CIFS support?)
  • Windows DFS (docs say replication happens only after the file is closed?)

Basically, we are looking for a black box that just serves up data.

We are currently taking ZFS snapshots of the data and shipping a snapshot over the wire to a remote data center for offsite backup.

Our initial plan was to have a second machine and rsync every 10-15 minutes. The problem with a failure is that the production processes running at the time would lose 15 minutes of data and be left "in the middle". It would almost be easier to start over from the beginning than to figure out where to pick things up in the middle. That is what drove us to look at HA solutions.

+4
10 answers

I recently set up hanfs using DRBD as the backend; in my situation I am running active/standby, but I have also tested it successfully with OCFS2 in primary/primary mode. Unfortunately there is not much documentation on how best to achieve this; most of what exists is of little use at best. If you do go down the drbd route, I highly recommend joining the drbd mailing list and reading all of the documentation. Here is my ha/drbd setup and the script I wrote to handle failover:


DRBD8 is required - it is provided by drbd8-utils and drbd8-source. Once those are installed (I believe they are provided by backports), you can use module-assistant to build and install it - m-a a-i drbd8. Either depmod -a or reboot at that point; if you depmod -a, you will need to modprobe drbd.
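
For reference, the package and module steps above amount to roughly the following on a Debian/Ubuntu box of that era (a sketch only; exact package availability depends on your release and backports setup):

  apt-get install drbd8-utils drbd8-source module-assistant
  m-a a-i drbd8      # module-assistant builds and installs the kernel module
  depmod -a          # or just reboot instead
  modprobe drbd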

You will need a backing partition for drbd to use; do not make that partition an LVM volume, or you will run into all sorts of problems. Do not put LVM on the drbd device either, or you will run into all sorts of problems.

hanfs1 /etc/drbd.conf:

  global { usage-count no; }
  common { protocol C; disk { on-io-error detach; } }
  resource export {
      syncer { rate 125M; }
      on hanfs2 {
          address   172.20.1.218:7789;
          device    /dev/drbd1;
          disk      /dev/sda3;
          meta-disk internal;
      }
      on hanfs1 {
          address   172.20.1.219:7789;
          device    /dev/drbd1;
          disk      /dev/sda3;
          meta-disk internal;
      }
  }

hanfs2 /etc/drbd.conf:

  global { usage-count no; }
  common { protocol C; disk { on-io-error detach; } }
  resource export {
      syncer { rate 125M; }
      on hanfs2 {
          address   172.20.1.218:7789;
          device    /dev/drbd1;
          disk      /dev/sda3;
          meta-disk internal;
      }
      on hanfs1 {
          address   172.20.1.219:7789;
          device    /dev/drbd1;
          disk      /dev/sda3;
          meta-disk internal;
      }
  }

Once that is configured, we need to bring drbd up:

  drbdadm create-md export
  drbdadm attach export
  drbdadm connect export

Now we have to perform the initial data synchronization; obviously, if this is a brand new drbd cluster, it does not matter which node you choose.
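
As a sketch, assuming you pick hanfs1 as the initial primary for the resource named export (as in the configs above), the drbd8 initial sync looks like this; run it on that node only and watch /proc/drbd until the resync finishes:

  drbdadm -- --overwrite-data-of-peer primary export   # on hanfs1 only
  watch cat /proc/drbd                                 # wait for the resync to complete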

Once that is done, you will need to run mkfs.yourchoiceoffilesystem on your drbd device; the device in the config above is /dev/drbd1. http://www.drbd.org/users-guide/p-work.html is a useful document to read while working with drbd.
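
For example, a minimal sketch (the choice of ext3 here is an assumption; any ordinary journaling filesystem will do for active/standby, since only one node mounts it at a time):

  mkfs.ext3 /dev/drbd1    # on the current primary only
  mkdir -p /export        # the mount point used by the wrapper script below; create it on both nodes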

Heartbeat

Install heartbeat2. (Pretty simple, apt-get install heartbeat2).

/etc/ha.d/ha.cf on each machine should consist of:

hanfs1:

  logfacility local0
  keepalive 2
  warntime 10
  deadtime 30
  initdead 120
  ucast eth1 172.20.1.218
  auto_failback no
  node hanfs1
  node hanfs2

hanfs2:

  logfacility local0
  keepalive 2
  warntime 10
  deadtime 30
  initdead 120
  ucast eth1 172.20.1.219
  auto_failback no
  node hanfs1
  node hanfs2

/etc/ha.d/haresources should be the same on both boxes:

  hanfs1 IPaddr::172.20.1.230/24/eth1
  hanfs1 HeartBeatWrapper

I wrote a wrapper script to deal with the idiosyncrasies caused by nfs and drbd in a failover scenario. This script should exist in /etc/ha.d/resource.d/ on each machine.



#!/bin/bash

#heartbeat fails hard.
#so this is a wrapper
#to get around that stupidity
#I'm just wrapping the heartbeat scripts, except for in the case of umount
#as they work, mostly

if [[ -e /tmp/heartbeatwrapper ]]; then
    runningpid=$(cat /tmp/heartbeatwrapper)
    if [[ -z $(ps --no-heading -p $runningpid) ]]; then
        echo "PID found, but process seems dead. Continuing."
    else
        echo "PID found, process is alive, exiting."
        exit 7
    fi
fi

echo $$ > /tmp/heartbeatwrapper

if [[ x$1 == "xstop" ]]; then

    /etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

    #NFS init script isn't LSB compatible, exit codes are 0 no matter what happens.
    #Thanks guys, you really make my day with this bullshit.
    #Because of the above, we just have to hope that nfs actually catches the signal
    #to exit, and manages to shut down its connections.
    #If it doesn't, we'll kill it later, then term any other nfs stuff afterwards.
    #I found this to be an interesting insight into just how badly NFS is written.

    sleep 1

    #we don't want to shutdown nfs first!
    #The lock files might go away, which would be bad.
    #The above seems to not matter much, the only thing I've determined
    #is that if you have anything mounted synchronously, it's going to break
    #no matter what I do. Basically, sync == screwed; in NFSv3 terms.
    #End result of failing over while a client is mounted synchronous is that
    #the client hangs waiting for its nfs server to come back - thing doesn't
    #even bother to time out, or attempt a reconnect.
    #async works as expected - it insta-reconnects as soon as a connection seems
    #to be unstable, and continues to write data. In all tests, md5sums have
    #remained the same with/without failover during transfer.

    #So, we first unmount /export - this prevents drbd from having a shit-fit
    #when we attempt to turn this node secondary.

    #That's a lie too, to some degree. LVM is entirely to blame for why DRBD
    #was refusing to unmount. Don't get me wrong, having /export mounted doesn't
    #help either, but still.

    #fix a usecase where one or other are unmounted already, which causes us to terminate early.

    if [[ "$(grep -o /var/lib/nfs/rpc_pipefs /etc/mtab)" ]]; then
        for ((test=1; test <= 10; test++)); do
            umount /var/lib/nfs/rpc_pipefs >/dev/null 2>&1
            if [[ -z $(grep -o /var/lib/nfs/rpc_pipefs /etc/mtab) ]]; then
                break
            fi
            if [[ $? -ne 0 ]]; then
                #try again, harder this time
                umount -l /var/lib/nfs/rpc_pipefs >/dev/null 2>&1
                if [[ -z $(grep -o /var/lib/nfs/rpc_pipefs /etc/mtab) ]]; then
                    break
                fi
            fi
        done
        if [[ $test -gt 10 ]]; then
            rm -f /tmp/heartbeatwrapper
            echo "Problem unmounting rpc_pipefs"
            exit 1
        fi
    fi

    if [[ "$(grep -o /dev/drbd1 /etc/mtab)" ]]; then
        for ((test=1; test <= 10; test++)); do
            umount /export >/dev/null 2>&1
            if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then
                break
            fi
            if [[ $? -ne 0 ]]; then
                #try again, harder this time
                umount -l /export >/dev/null 2>&1
                if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then
                    break
                fi
            fi
        done
        if [[ $test -gt 10 ]]; then
            rm -f /tmp/heartbeatwrapper
            echo "Problem unmounting /export"
            exit 1
        fi
    fi

    #now, it's important that we shut down nfs. It can't write to /export anymore, so that's fine.
    #if we leave it running at this point, then drbd will screw up when trying to go to secondary.
    #See contradictory comment above for why this doesn't matter anymore. These comments are left in
    #entirely to remind me of the pain this caused me to resolve. A bit like why churches have Jesus
    #nailed onto a cross instead of chilling in a hammock.

    pidof nfsd | xargs kill -9 >/dev/null 2>&1

    sleep 1

    if [[ -n $(ps aux | grep nfs | grep -v grep) ]]; then
        echo "nfs still running, trying to kill again"
        pidof nfsd | xargs kill -9 >/dev/null 2>&1
    fi

    sleep 1

    /etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

    sleep 1

    #next we need to tear down drbd - easy with the heartbeat scripts
    #it takes input as resourcename start|stop|status
    #First, we'll check to see if it's stopped

    /etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
    if [[ $? -eq 2 ]]; then
        echo "resource is already stopped for some reason..."
    else
        for ((i=1; i <= 10; i++)); do
            /etc/ha.d/resource.d/drbddisk export stop >/dev/null 2>&1
            if [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Secondary" ]] || [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Unknown" ]]; then
                echo "Successfully stopped DRBD"
                break
            else
                echo "Failed to stop drbd for some reason"
                cat /proc/drbd
                if [[ $i -eq 10 ]]; then
                    exit 50
                fi
            fi
        done
    fi

    rm -f /tmp/heartbeatwrapper
    exit 0

elif [[ x$1 == "xstart" ]]; then

    #start up drbd first
    /etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1
    if [[ $? -ne 0 ]]; then
        echo "Something seems to have broken. Let's check possibilities..."
        testvar=$(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2)
        if [[ $testvar == "Primary/Unknown" ]] || [[ $testvar == "Primary/Secondary" ]]; then
            echo "All is fine, we are already the Primary for some reason"
        elif [[ $testvar == "Secondary/Unknown" ]] || [[ $testvar == "Secondary/Secondary" ]]; then
            echo "Trying to assume Primary again"
            /etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1
            if [[ $? -ne 0 ]]; then
                echo "I give up, something's seriously broken here, and I can't help you to fix it."
                rm -f /tmp/heartbeatwrapper
                exit 127
            fi
        fi
    fi

    sleep 1

    #now we remount our partitions

    for ((test=1; test <= 10; test++)); do
        mount /dev/drbd1 /export >/tmp/mountoutput
        if [[ -n $(grep -o export /etc/mtab) ]]; then
            break
        fi
    done

    if [[ $test -gt 10 ]]; then
        rm -f /tmp/heartbeatwrapper
        exit 125
    fi

    #I'm really unsure at this point of the side-effects of not having rpc_pipefs mounted.
    #The issue here is that it cannot be mounted without nfs running, and we don't really want to start
    #nfs up at this point, lest it ruin everything.
    #For now, I'm leaving mine unmounted, it doesn't seem to cause any problems.

    #Now we start up nfs.

    /etc/init.d/nfs-kernel-server start >/dev/null 2>&1
    if [[ $? -ne 0 ]]; then
        echo "There's not really that much that I can do to debug nfs issues."
        echo "probably your configuration is broken. I'm terminating here."
        rm -f /tmp/heartbeatwrapper
        exit 129
    fi

    #And that's it, done.

    rm -f /tmp/heartbeatwrapper
    exit 0

elif [[ x$1 == "xstatus" ]]; then

    #Let's check to make sure nothing is broken.

    #DRBD first
    /etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
    if [[ $? -ne 0 ]]; then
        echo "stopped"
        rm -f /tmp/heartbeatwrapper
        exit 3
    fi

    #mounted?
    grep -q drbd /etc/mtab >/dev/null 2>&1
    if [[ $? -ne 0 ]]; then
        echo "stopped"
        rm -f /tmp/heartbeatwrapper
        exit 3
    fi

    #nfs running?
    /etc/init.d/nfs-kernel-server status >/dev/null 2>&1
    if [[ $? -ne 0 ]]; then
        echo "stopped"
        rm -f /tmp/heartbeatwrapper
        exit 3
    fi

    echo "running"
    rm -f /tmp/heartbeatwrapper
    exit 0
fi


With all of the above done, you then just want to configure /etc/exports:

  /export 172.20.1.0/255.255.255.0(rw,sync,fsid=1,no_root_squash)

Then it is just a case of starting heartbeat on both machines and issuing hb_takeover on one of them. You can verify that it is working by making sure the node you issued the takeover on is primary - check /proc/drbd, that the device is mounted correctly, and that you can access nfs.
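
Roughly, that last step looks like the following; the hb_takeover path varies between heartbeat packages, and the showmount check against the virtual IP is just one way of confirming that NFS is answering:

  /etc/init.d/heartbeat start       # on both nodes
  /usr/lib/heartbeat/hb_takeover    # on the node that should become primary (path may differ)
  cat /proc/drbd                    # this node should now show Primary
  showmount -e 172.20.1.230         # confirm NFS is exporting on the virtual IP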

-

Good luck. It was a very painful experience for me.

+6

These days, 2TB fits in a single machine, so you have options, from simple to complex. These all assume Linux servers:

  • You can get poor man's HA by setting up two machines and doing a periodic rsync from the primary to the backup (see the sketch after this list).
  • You can use DRBD to mirror one from the other at the block level. This has the disadvantage that it is difficult to expand in the future.
  • You can use OCFS2 to cluster the disk, for future expandability.
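
As a sketch of the first option, a cron entry on the primary along these lines would do it; the host name, paths, and 15-minute interval are only assumptions for illustration:

  # /etc/cron.d/poor-mans-ha (on the primary)
  */15 * * * * root rsync -a --delete /export/ standby:/export/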

There are also plenty of commercial solutions, but 2 TB is a bit small for most of them these days.

You have not specified your application, but if hot failover is not required and all you really want is something that can survive losing a disk or two, find a NAS that supports RAID-5 with at least 4 disks and hot-swap, and you should be good to go.

+3

I would recommend NAS storage (network-attached storage).

HP has some good ones you can choose from.

http://h18006.www1.hp.com/storage/aiostorage.html

as well as cluster versions:

http://h18006.www1.hp.com/storage/software/clusteredfs/index.html?jumpid=reg_R1002_USEN

+1

Are you looking for an "enterprise" solution or a "home" solution? It is hard to tell from your question, because 2TB is very small for an enterprise and a little high-end for a home user (especially with two servers). Could you clarify the need so we can discuss the trade-offs?

0

There are two ways to go. First, just buy a SAN or a NAS from Dell or HP and throw money at the problem. Modern storage hardware makes all of this easy to do, saving your expertise for more domain-specific problems.

If you want to do it yourself, take a look at using Linux with DRBD.

http://www.drbd.org/

DRBD allows you to create networked block devices. Think RAID 1 across two servers instead of across two disks. DRBD deployments are usually done together with Heartbeat for failover in case one system dies.

I'm not sure about the load-balancing part, but you could investigate whether LVS can be used to load balance across the DRBD hosts (a rough sketch follows the link below):

http://www.linuxvirtualserver.org/
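
If you do experiment with LVS, the virtual-service setup looks roughly like this; the virtual IP and real-server addresses are reused from the drbd example above purely for illustration, balancing NFS over TCP port 2049 is an assumption, and in practice NFS server state and file locking make genuine load balancing across DRBD nodes difficult:

  ipvsadm -A -t 172.20.1.230:2049 -s rr                       # virtual service on the VIP, round robin
  ipvsadm -a -t 172.20.1.230:2049 -r 172.20.1.218:2049 -g     # real server 1, direct routing
  ipvsadm -a -t 172.20.1.230:2049 -r 172.20.1.219:2049 -g     # real server 2, direct routing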

To conclude, let me just reiterate that you will probably save yourself a lot of time in the long run by simply paying money for a NAS.

0

From your question I assume you are a business user? I purchased a 6TB RAID 5 unit from Silicon Mechanics, attached it as NAS storage, and had my engineer set up NFS to our servers. Backups are performed via rsync to another high-capacity NAS.

0

Take a look at Amazon Simple Storage Service (Amazon S3)

http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA

This may also be of interest from a high availability perspective:

Dear AWS Customer:

Many of you have asked us to inform you in advance about features and services that are currently being developed so that you can better plan for how that functionality might integrate with your applications. To this end, we are pleased to share some early details with you about a new offering we are developing here at AWS - a content delivery service.

This new service will provide you with a high-performance method of distributing content to end users, giving your customers low latency and high data transfer rates when accessing your objects. The initial release will help developers and enterprises that need to deliver popular, publicly accessible content over HTTP connections. Our goal is to create a content delivery service that:

  • Lets developers and businesses get started easily - there are no minimum fees or commitments; you pay only for what you actually use.
  • Is simple and easy to use - a single, simple API call is all that is needed to start delivering your content.
  • Works with Amazon S3 - this gives you durable storage for the original, definitive versions of your files while making the content delivery service simple to use.
  • Has a global presence - we use a global network of edge locations on three continents to deliver your content from the most appropriate location.

You will start by storing the original version of your objects in Amazon S3, making sure they are publicly accessible. You will then make a simple API call to register your bucket with the new content delivery service. This API call will return a new domain name for you to include in your web pages or application. When clients request an object using this domain name, they will be automatically routed to the nearest edge location for high-performance delivery of your content. It's that simple.

We are currently working with a small group of private beta customers, and expect to make this service broadly available before the end of the year. If you would like to be notified when we launch, please let us know by clicking here.

Yours faithfully,

Amazon Web Services Team

0

Your best bet may be to work with folks who do this sort of thing for a living. These guys are actually in our office complex... I have had the chance to work with them on a similar project that I was part of.

http://www.deltasquare.com/About

0

May I suggest you visit the F5 website and check out http://www.f5.com/solutions/virtualization/file/

0

You could also look at a mirror file system. It replicates files at the file system level; the same file on the primary and backup systems is a live file on both.

http://www.linux-ha.org/RelatedTechnologies/Filesystems

0

Source: https://habr.com/ru/post/1276791/

