How to replace a brick in GlusterFS

Scenario 1: Your data drive fails in your machine and you really don’t want to go through the process of reconfiguring a new system just to create and join a new brick.

Scenario 2: You proactively replace your hard drive, keep the OS intact, and just want to use the new drive.

The solution to either scenario is quite simple.

Let’s stop cron just in case we have any watchdog processes running.

service cron stop

The first thing we need to do is tell GlusterFS that we want to take the brick offline. We do that with the start form of the reset-brick command.

gluster volume reset-brick your-volume your-hostname-or-ip:/path-to-failed-brick start

Then we can unmount the old brick’s filesystem and swap in the new disk. As long as our config remains the same, we will later use the reset-brick command again to get all of the data back from the redundant copy.

Now, you need to create a partition and then format the new drive with the following. Let’s assume that your new drive/partition is /dev/sdb1 and that it’s an SSD so that you want to use f2fs.

mkfs.f2fs /dev/sdb1

Now, you need to update /etc/fstab so that it automatically mounts. Let’s say your mount point is /mnt/gluster. You will want to use something like this:

echo "UUID=$(blkid -s UUID -o value /dev/sdb1) /mnt/gluster        f2fs     defaults,noatime,nofail    0 1" >> /etc/fstab

Note: You should comment out the old entry in your fstab file.
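
As a rough sketch of that note, the helper below comments out any active fstab line for a given mount point. The function name comment_out_mount is my own (it is not part of any tool), and it is best tried on a copy of the file before touching /etc/fstab.

```shell
#!/bin/bash
# Hypothetical helper, not part of any package: comment out every active
# line in an fstab-style file whose mount point (second field) matches.
comment_out_mount() {
    local fstab_file="$1" mount_point="$2" tmp
    tmp=$(mktemp) || return 1
    awk -v mp="$mount_point" '
        $0 !~ /^[ \t]*#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab_file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the file keeps its owner and permissions
    cat "$tmp" > "$fstab_file"
    rm -f "$tmp"
}

# Example on a scratch copy (paths are assumptions):
#   cp /etc/fstab /tmp/fstab.test && comment_out_mount /tmp/fstab.test /mnt/gluster
```

Run it before appending the new UUID line, otherwise the new entry gets commented out as well.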

Next, you will want to mount your new partition.

mount -a

Now let’s create the new brick directory. Because reset-brick reuses the existing brick definition, the directory must have the same path as the old brick. Assuming that your old brick is /mnt/gluster/brick, we could simply do:

mkdir -p /mnt/gluster/brick

Now let’s use the gluster volume reset-brick command to do the heavy lifting for us.

gluster volume reset-brick your-volume your-hostname-or-ip:/path-to-failed-brick your-hostname-or-ip:/path-to-failed-brick commit force

If you stopped cron, start it back up now.

service cron start

That’s it.

Now you can check on the progress of the re-population of data by using the heal info command (gluster volume heal your-volume info).

Hope this helps!

Simple Fail2Ban Alternative for FreePBX + SSH

I created a script to help secure Asterisk servers: fail-to-ban. It runs once per minute and blocks any IP that fails to authenticate more than a set number of times within a set interval. It keeps track of each block and automatically unblocks an IP after a given amount of time elapses. All of these variables are easily set and tuned at the beginning of the script. It’s basic, easy to install, and effective. The installation is just two extremely easy steps.

  • Create the file /scripts/fail-to-ban.
  • Add a cron job entry to run it.

/scripts/fail-to-ban

    #!/bin/bash
    
    PATH=$PATH:/usr/bin:/bin:/sbin:/usr/sbin
    
    ATTEMPTS=10;    # NUMBER OF ATTEMPTS IN A GIVEN INTERVAL
    INTERVAL=300;   # INTERVAL (IN SECONDS) TO WATCH FOR FAILED ATTEMPTS - HISTORICALLY FROM CURRENT TIME
    PERMBAN=100;    # AFTER THIS NUM OF FAILED ATTEMPTS, BAN UNTIL LOG ROTATES
    BLOCKSECS=3600;   # AFTER THIS TIME (IN SECONDS), UNBLOCK A BLOCKED IP
    BLOCKED_ALREADY=""
    BLOCKED_NOW=""
    SKIPPED=""
    EXPIRED_BLOCK=""
    NOW=`date '+%s'`
    
    isip() {
            # Sets ISIP=1 when the argument contains exactly three dots (a rough IPv4 check), otherwise 0
            ISIP=0
            if [ "$(echo "$1" | sed 's/[^.]//g' | awk '{print length; }' 2> /dev/null)" = "3" ]; then
                    ISIP=1
            fi
    }
    
    fail2ban() {
            # echo failing $IP with count $COUNT and lastcount $LASTCOUNT
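            # Match the IP literally and as a whole word so 1.2.3.4 does not also match 1.2.3.40
            EXISTS=`iptables -n -L | grep -wF -- "$IP" | wc -l`
            # Anchor every private or loopback prefix to the start of the address
            IS_LOCAL=`echo $IP | grep -E '^(10\.|192\.168\.|127\.)' | wc -l`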
            if [ $EXISTS -gt 0 ]; then
                    BLOCKED_ALREADY+=",$IP:$COUNT"
                    # echo "IP $IP is already blocked"
            elif [ $IS_LOCAL -eq 1 ]; then
                    SKIPPED+=",$IP:$COUNT"
                    # echo "IP is local IP.  Not blocking"
            else
                    if [ ! "$IP" == "" ]; then
                            # echo "Blocking IP $IP after $COUNT abuses."
                            BLOCKED_NOW+=",$IP:$COUNT"
                            iptables -I INPUT 1 -j DROP -s $IP
                            echo "`date`:$IP:$NEWCOUNT:$COUNT" >> /tmp/banned.log
                    fi
            fi
    }
    
    updateList() {
            NOW=`date '+%s'`
            # Anchor the match so that only the entry for this exact IP is rewritten
            sed -i -e "s/^${IP}:${LASTCOUNT}:.*$/${IP}:${COUNT}:${NOW}/" /tmp/ip-list.log
    }
    
    
    showList() {
            LIST="$2"
            DESCRIPTION="$1"
            if [ ! "$LIST" == "" ]; then  
                    echo "$DESCRIPTION"
                    for i in `echo "$LIST"`                                                                       
                    do                                                                                                   
                            BIP=$(echo $i | sed -e 's/:.*$//')                                                           
                            BCOUNT=$(echo $i | sed -e 's/^.*://')                                                        
                            if [ ! "$BIP" == "" ]; then
                                    echo $BIP $BCOUNT                                                                            
                            fi
                    done
            fi
    }
    
    checkExpired() {
            BLOCKED=$(iptables -L INPUT -n | grep "^DROP" | sed -e 's/^.*--  //' | sed -e 's/ .*$//')
            for i in `grep -e "$BLOCKED" /tmp/ip-list.log`                                                                                                                                              
            do                                                                                                                                                                           
                    IP=`echo $i | cut -d':' -f1`                                                                                                                                         
                    isip $IP                                                                                                                                                             
                    COUNT=`echo $i | cut -d':' -f2`                                                                                                                                      
                    LASTACTION=`echo $i | cut -d':' -f3`                                                                                                                                 
                                                                                                                                                                                         
                    if [ $((NOW-LASTACTION)) -gt $BLOCKSECS ] && [ ! "$IP" == "" ] && [ $ISIP -eq 1 ] && [ $COUNT -lt $PERMBAN ]; then                                                   
                            # Look only at the INPUT chain and take the first matching rule number
                            LINE=`iptables -L INPUT -n --line-numbers | grep -wF -- "$IP" | head -n 1 | cut -d' ' -f1`
                            if [ ! "$LINE" == "" ]; then                                                                                                                                 
                                    echo "Removing block on $IP"                                                                                                                         
                                    # EXPIRED_BLOCK+=",$IP"                                                                                                                              
                                    echo iptables -D INPUT $LINE                                                                                                                         
                                    iptables -D INPUT $LINE                                                                                                                              
                            fi                                                                                                                                                           
                    fi                                                                                                                                                                   
            done                                   
    }
    
    
    if [ ! -f /tmp/ip-list.log ]; then
            touch /tmp/ip-list.log
    fi
    
    # Do some checking to see if the logs actually changed
    if [ -f /tmp/this-run ]; then
            mv /tmp/this-run /tmp/last-run
    else
            touch /tmp/last-run
    fi
    ls -al /var/log/asterisk/security > /tmp/this-run
    CHANGE=$(diff /tmp/last-run /tmp/this-run | wc -l)
    if [ $CHANGE -eq 0 ]; then
            echo "No change since last run"
            checkExpired
            exit
    fi
    
    IPLIST=$(/bin/grep -E "InvalidAccount|ChallengeResponseFailed" /var/log/asterisk/security | sed -e 's/^.*RemoteAddress="IPV4\/UDP\/\([0-9.]*\)\/.*$/\1/' | sort | uniq -c | sed -e 's/^ *//' | sed -e 's/ /:/')
    
    for i in `echo "$IPLIST"`
    do
            #echo $i
            COUNT=`echo $i | cut -d':' -f1`
            IP=`echo $i | cut -d':' -f2`
            isip $IP
            LASTCOUNT=`cat /tmp/ip-list.log | grep "^$IP:" | cut -d':' -f2`
            ELAPSED=`cat /tmp/ip-list.log | grep "^$IP:" | cut -d':' -f3 | sed -e 's/\n//g'`
            ELAPSED=$((NOW-ELAPSED))
            if [ "$COUNT" == "" ]; then
                    COUNT=0
            fi
            if [ "$LASTCOUNT" == "" ]; then
                    LASTCOUNT=0
            fi
            NEWCOUNT=$((COUNT-LASTCOUNT))
            if [ ! "$LASTCOUNT" == "" ] && [ $LASTCOUNT -eq 0 ] && [ $ISIP -eq 1 ]; then
                    echo "$IP:$COUNT:$NOW" >> /tmp/ip-list.log
                   # echo "Adding $IP to the IP tracking log with count $COUNT"
            fi
            if [ $NEWCOUNT -ge $ATTEMPTS ] && [ $ISIP -eq 1 ] && ( [ $ELAPSED -le $INTERVAL ]  ||  [ $COUNT -gt $PERMBAN ] ); then
                    if [ $LASTCOUNT -ne 0 ]; then
                            # echo "Updating IP:$IP with NEWCOUNT:$NEWCOUNT ATTEMPTS:$ATTEMPTS ELAPSED:$ELAPSED INTERVAL:$INTERVAL ISIP:$ISIP"
                            updateList
    
                    fi
                    fail2ban
            fi
    done
    
    checkExpired
    
    IFS=","
    
    showList "Blocked | Attempts" "$BLOCKED_ALREADY"
    showList "Newly Blocked | Attempts" "$BLOCKED_NOW"
    showList "Skipped | Attempts" "$SKIPPED"
    showList "Expired" "$EXPIRED_BLOCK"
    

    /scripts/crontab

    * * * * * /scripts/fail-to-ban > /dev/null 2>&1 &
    

    After creating these files, simply run the following commands:

    chmod a+x /scripts/fail-to-ban
    crontab /scripts/crontab
    

    That’s really all you need to do!
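
If you want to see what the script would count before letting it touch iptables, here is a minimal sketch of just the log-parsing step. It assumes the same log format that the script's sed expression expects (RemoteAddress="IPV4/UDP/<ip>/<port>"). The function names count_failures and offenders are made up for this example, and unlike the script it uses sed -n so that lines without a RemoteAddress are dropped.

```shell
#!/bin/bash
# Print "COUNT:IP" for every remote address with failed attempts in a log file.
count_failures() {
    grep -E "InvalidAccount|ChallengeResponseFailed" "$1" \
        | sed -n -e 's/^.*RemoteAddress="IPV4\/UDP\/\([0-9.]*\)\/.*$/\1/p' \
        | sort | uniq -c | awk '{ print $1 ":" $2 }'
}

# Print only the IPs whose failure count is at or above the given threshold.
offenders() {
    count_failures "$1" | awk -F: -v min="$2" '$1 >= min { print $2 }'
}

# Example (nothing is blocked, this only prints):
#   offenders /var/log/asterisk/security 10
```

This only reports totals for the whole log file. The interval and expiry handling stay in the full script above.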

    Prepping VyOS + OpenVPN for use with a Chromebook

    First, you need to create the certificates. Use EasyRSA for this. Follow the instructions at the OpenVPN site for that.
    https://openvpn.net/index.php/open-source/documentation/miscellaneous/77-rsa-key-management.html

    To build the CA certificate, use the command:
    ./easyrsa build-ca

    To build the server certificate, use the command:
    ./easyrsa build-server-full server

    To build a client, use the command:
    ./easyrsa build-client-full client

    You may want to remove the password for the private keys for the server and client certificates. To do that, use these commands.

    cd private
    openssl rsa -in server.key -out server-nopass.key
    openssl rsa -in client.key -out client-nopass.key
    cd ..

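    Now copy the ./issued/server.crt, the ./ca.crt, and the ./private/server-nopass.key to the VyOS server. Create a new folder called /config/auth/openvpn and store them there. The configuration below also expects a Diffie-Hellman parameters file named dh.pem in the same folder; EasyRSA can generate one with ./easyrsa gen-dh.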

    The OpenVPN connection for VyOS should look like this:

     openvpn vtun0 {
         local-port 1194
         mode server
         openvpn-option --persist-tun
         protocol udp
         replace-default-route {
             local
         }
         server {
             domain-name mydomain.com
             max-connections 10
             name-server 10.0.0.1
             push-route 10.0.0.0/23
             subnet 10.0.1.0/24
             topology subnet
         }
         tls {
             ca-cert-file /config/auth/openvpn/ca.crt
             cert-file /config/auth/openvpn/server.crt
             dh-file /config/auth/openvpn/dh.pem
             key-file /config/auth/openvpn/server-nopass.key
         }
     }
    

    Now we have to prep the Chromebook certificate.

    openssl pkcs12 -export -out client.pfx -inkey private/client-nopass.key -in issued/client.crt -certfile ca.crt

    Now upload the ca.crt and the client.pfx files to the Chromebook (You can use the SCP addon for the file manager to transfer them there.)

    Navigate to chrome://certificate-manager.
    Click on Authorities and then click Import.
    Navigate to the ca.crt and import it.
    Now click back to “Your certificates” and click “Import and Bind to Device”.
    Navigate to the client.pfx and import it. When it asks for a password, hit enter without one if you did not set one.
    Now navigate to chrome://settings and click “Add Connection”. Choose OpenVPN.
    Set the “Provider Type” to be OpenVPN.
    Set the CA certificate to be the one that we uploaded.
    Set the client certificate to be the client.pfx. Note: It will show up with the common name of the certificate, not the file name.
    Now type in any username and password. It doesn’t matter since we are using certificate based authentication. Save the user/pass so that you do not have to type it every time. Again, it does not matter what you type here.

    Now you should be able to connect!

    XenCenter Storage Repository Rescan Issue – Solved

    When clicking rescan on a storage repository in XenCenter, you may encounter an error like the one below.

    [Screenshot: rescan-failure]

    The problem is usually due to a foreign file in the storage repository. To find out what the offending file is, turn to the power of the command line. Log in to the master server and run the following command.
    tail -n 100 /var/log/SMlog

    Review the log and find the offending entry. It might look something like this.
    ***** VHD scan error: vhd=/var/run/sr-mount/3219ad4f-242e-1744-6d38-78e17efdce69/9411b73c-7a3a-4812-b64b-edfe8f04ee70.vhd scan-error=-32 error-message='opening file'

    In this case, the file /var/run/sr-mount/3219ad4f-242e-1744-6d38-78e17efdce69/9411b73c-7a3a-4812-b64b-edfe8f04ee70.vhd could not be opened. A quick look at that file showed that it was a 0 byte file; it had not been created properly. I moved it out to the /tmp folder and tried the rescan again. This resolved my issue.
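
If the log is noisy, it can be quicker to look for empty VHD files directly. This is only a sketch: find_empty_vhds is a name I made up, and it assumes the disks sit directly in the SR mount directory, as in the path above. It lists files and moves nothing.

```shell
#!/bin/bash
# List zero-byte .vhd files that sit directly inside an SR mount directory.
find_empty_vhds() {
    find "$1" -maxdepth 1 -type f -name '*.vhd' -size 0
}

# Example, using the SR mount path from the log entry above:
#   find_empty_vhds /var/run/sr-mount/3219ad4f-242e-1744-6d38-78e17efdce69
```

Anything it prints is a candidate to move aside before rescanning.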

    VPN Passthrough for DD-WRT

    For a long time, I ran VPN passthrough on DD-WRT for an IPsec endpoint inside the network with the SPI firewall disabled. The VPN was mostly reliable, but every once in a while I needed to reset the tunnel on both sides. I did this until I discovered the magic of enabling the SPI firewall. Just as with a Cisco router, the DD-WRT router needs to inspect the packets to know what to do with IPsec. So, if you want IPsec passthrough support, you really should enable both the option in step 1 AND the option in step 2 (below).

    1. [Screenshot: vpn-passthrough]

    2. [Screenshot: spi-firewall]

    Tuning Ext4 for MythTV

    When using an ext4 filesystem for MythTV, here are a few tricks to help improve speed.

    First of all, always create a separate partition for your recordings! Here’s how to ensure that it is optimal.

    Let’s say that you create the partition /dev/sdb1 on device /dev/sdb.

      mkfs.ext4 -T largefile /dev/sdb1
      tune2fs -o journal_data_writeback /dev/sdb1
      echo "/sbin/blockdev --setra 4096 /dev/sdb" >> /etc/rc.local

    In the first step, we tell mkfs to format the filesystem tuned for large files. This reduces the number of inodes, maximizing the available space and (unconfirmed, but hypothesized) minimizing the journal size.
    In the second step, we enable writeback mode for the filesystem as a default mount option. This is normally not a safe step, but we are only protecting TV content that is already very robust at working in the face of errors. Setting this option allows the OS to write to the disks without 100% lock-step synchronicity.
    In the third step, we raise the read-ahead of the device to 4096 sectors at every boot, which favors the large sequential reads that recordings produce.
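
For reference, blockdev --setra takes its value in 512-byte sectors, so the 4096 in the rc.local line is not a byte count. The tiny helper below (setra_to_kib is just a name for this example) converts a sector count to KiB.

```shell
#!/bin/bash
# blockdev --setra counts 512-byte sectors; convert a sector count to KiB.
setra_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

setra_to_kib 4096   # prints 2048, i.e. 2 MiB of read-ahead
```

You can check the value currently in effect with blockdev --getra /dev/sdb.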

    Now add these options to /etc/fstab.

      nodiratime,noatime

    These options in the fstab tell the OS not to update the last-accessed times on files and folders every time they are accessed. This reduces unnecessary writes, which improves overall performance.

    Your fstab might look something like this:

    UUID=2a59a579-9779-455c-bfed-d7e69c5ec533 /shared/recordings ext4 defaults,nodiratime,noatime 0 0

    That’s it! Happy watching.

    Xenserver Heterogeneous CPUs in Pool

    I have been working to get some XenServer hosts set up in the same pool even though they have different CPU versions. Because their feature sets differ, I get the error message that the CPUs are not homogeneous. To rectify this, I created the following script and then followed the instructions below.

    /root/get-cpu-features

    #!/usr/bin/perl
    $FEATURES2=$ARGV[0];
    if (exists $ARGV[1]) {
            $FEATURES1=$ARGV[1];
    } else {
            chomp($FEATURES1=`xe host-cpu-info | grep " features:" | cut -d':' -f2 | cut -d' ' -f2`);
    }
    
    if (!exists $ARGV[0]) {
            print "\nUsage: ./get-cpu-features [features-key1] [features-key2]\n\n";
            print "Local CPU Features: $FEATURES1\n";
            exit;
    }
    
    sub dec2bin {
        my $str = unpack("B32", pack("N", shift));
        $str =~ s/^0+(?=\d)//;   # otherwise you'll get leading zeros
        return $str;
    }
    sub bin2dec {
        return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
    }
    
    print "Local : ".$FEATURES1."\n";
    print "Remote: ".$FEATURES2."\n";
    $count=1;
    @GROUP1ARRAY=();
    @GROUP2ARRAY=();
    
    foreach $GROUP1 (split(/\-/,$FEATURES1)) {
            my $GROUP1BIN=dec2bin(hex($GROUP1))."\n";
            my $GROUP1BINPADDED = sprintf("%033s", $GROUP1BIN);
            $GROUP1ARRAY[$count]=$GROUP1BINPADDED;
            $count++;
    }
    
    $count=1;
    
    foreach $GROUP2 (split(/\-/,$FEATURES2)) {
            my $GROUP2BIN=dec2bin(hex($GROUP2))."\n";
            my $GROUP2BINPADDED = sprintf("%033s", $GROUP2BIN);
            $GROUP2ARRAY[$count]=$GROUP2BINPADDED;
            $count++;
    }
    
    @output=();
    
    print "\nMerged: ";
    for ($i=1; $i<=4; $i++) {
            $newgroup="";
            my @bin1=split(//,$GROUP1ARRAY[$i],33);
            my @bin2=split(//,$GROUP2ARRAY[$i],33);
            for ($j=0; $j<32; $j++) {
                    $result=$bin1[$j]*$bin2[$j];
                    $newgroup=$newgroup.$result;
            }
            $result=printf('%08x',bin2dec($newgroup));
            if ($i < 4) {
                    print "-";
            }
    }
    
    print "\n";
    
    

    On server 1

    /root/get-cpu-features

    The output will look something like this:

    Usage: ./get-cpu-features [features-key1] [features-key2]

    Local CPU Features: 0000e3bd-bfebfbff-00000001-20100800

    On server 2, run the script again, but this time specify the key from server 1 as an argument to the script.

    /root/get-cpu-features 0000e3bd-bfebfbff-00000001-20100800

    Now the output will look like this:

    Local : 0098e3fd-bfebfbff-00000001-28100800
    Remote: 0000e3bd-bfebfbff-00000001-20100800

    Merged: 0000e3bd-bfebfbff-00000001-20100800

    The compatible feature key for all nodes is the one labeled merged, in this case, 0000e3bd-bfebfbff-00000001-20100800.

    On each of the two nodes, we now need to set the feature key. We can do this by running this command on each node:

    xe host-set-cpu-features features=0000e3bd-bfebfbff-00000001-20100800

    Now reboot the nodes and then join them to the same pool.

    Note: If you have more than two nodes, then you will need to add a second argument when running the script on nodes 3+. That second argument should always be the "Merged" key from each previous server that the script was run against. This will ensure that all nodes are accounted for in the final merged key. On the last node, the merged key that is produced will be the one that you need to use for every node.
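
The merge itself is just a bitwise AND of each of the four hex groups, so it can also be sketched in plain bash if Perl is not handy. This is an illustration of the idea, not a replacement for the script above. The function name merge_features is made up, and it assumes keys in the four-group format shown in this post.

```shell
#!/bin/bash
# AND two feature keys (four 8-digit hex groups separated by dashes), group by group.
merge_features() {
    local -a a b
    local i out=""
    IFS='-' read -r -a a <<< "$1"
    IFS='-' read -r -a b <<< "$2"
    for i in 0 1 2 3; do
        # 16#... tells bash arithmetic to read the group as hexadecimal
        out+=$(printf '%08x' $(( 16#${a[i]} & 16#${b[i]} )))
        if [ "$i" -lt 3 ]; then
            out+="-"
        fi
    done
    echo "$out"
}

# The two keys from the example output above
merge_features 0098e3fd-bfebfbff-00000001-28100800 0000e3bd-bfebfbff-00000001-20100800
```

With the two keys from the example it prints 0000e3bd-bfebfbff-00000001-20100800, the same merged key as above. For three or more nodes, feed the result back in as one of the two arguments.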

    A Poor Man’s Solution for Automatic ISP Failover

    This file should be installed on your router. In my case, I was testing with a VyOS router so I was able to easily extend it with this script. Just paste this into a new file and chmod +x to make it executable. Update the IP information below – most importantly, your LAN information. Then add it to a cron job and have it run once per minute. That’s it!

    
    #!/bin/bash
    PATH=$PATH:/bin:/usr/bin:/sbin:/usr/sbin
    
    # LAN IPs of ISP Routers to Use as the Default Gateway
    PREFERRED="10.1.10.1"
    ALTERNATE="10.1.10.3"
    
    # Public Internet Hosts to Ping
    PUBLICHOST1="8.8.8.8"
    PUBLICHOST2="8.8.4.4"
    
    
    # Ping the first public Internet device first.  See if it fails.
    RETURNED=`/bin/ping -c 5 $PUBLICHOST1 | grep 'transmitted' | cut -d',' -f2 | sed -e 's/^ *//' | cut -d' ' -f1`
    RETURNED=${RETURNED:-0}   # No summary line (e.g. network unreachable) counts as zero replies
    
    # If the first pings fail, check the secondary Internet device.
    if [ $RETURNED -eq 0 ]; then # Do a second check just to be sure, and to a different IP.  Fewer pings, since we are already pretty sure of an issue
            echo "Failed ping test to $PUBLICHOST1.   Trying against $PUBLICHOST2."
            RETURNED=`/bin/ping -c 2 $PUBLICHOST2 | grep 'transmitted' | cut -d',' -f2 | sed -e 's/^ *//' | cut -d' ' -f1`
            RETURNED=${RETURNED:-0}
    fi
    
    # If it still fails, assume that the Internet is down.
    if [ $RETURNED -eq 0 ]; then
            echo "Everything is not ok.  Looks like the Internet connection is down.  Better switch ISPs."
            date +%s > /tmp/cutover-start.log
    else
            echo "Everything is ok.  Checking if we are using the preferred connection."
            CURRENT=`route -n | grep "^0.0.0.0" | cut -d' ' -f2- | sed -e 's/^ *//' | cut -d' ' -f1`
            if [ -f /tmp/cutover-start.log ]; then
                    LASTSWITCHTIME=`cat /tmp/cutover-start.log | sed -e 's/\n//'`
            fi
            NOW=`date +%s`
            TIMEDIFF=$((NOW - LASTSWITCHTIME))
            if [ "$CURRENT" == "$ALTERNATE" ] && [ $TIMEDIFF -gt 300 ]; then           
                    date +%s > /tmp/cutover-start.log
                    echo "Testing if the primary gateway is back online."
                    route add -host $PUBLICHOST2 gw $PREFERRED
                    # Ping the host that was just routed through the preferred gateway
                    RETURNED=`/bin/ping -c 5 $PUBLICHOST2 | grep 'transmitted' | cut -d',' -f2 | sed -e 's/^ *//' | cut -d' ' -f1`
                    if [ $RETURNED -eq 5 ]; then
                            echo "Switching back now that primary ISP is back online."
                            route del -net 0.0.0.0/0
                            route add default gw $PREFERRED
                    else
                            echo "The primary host is not back online yet."
                    fi
                    route del -host $PUBLICHOST2
            fi
            exit
    fi
    
    # If the script does not exit before it reaches this point, then assume the worst.  Time to cutover to the other ISP
    
    CURRENT=`route -n | grep "^0.0.0.0" | cut -d' ' -f2- | sed -e 's/^ *//' | cut -d' ' -f1`
    
    # Delete the existing default gateway
    route del -net 0.0.0.0/0
    
    if [ "$CURRENT" == "$PREFERRED" ]; then
            echo "Switching to the alternate ISP $ALTERNATE"
            route add default gw $ALTERNATE
    else
            echo "Switching to the preferred ISP $PREFERRED"
            route add default gw $PREFERRED
    fi
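
The grep/cut/sed chain that pulls the reply count out of ping's summary line is used three times in the script. As a sketch, it could be folded into one small function. The name ping_received is made up, and it assumes the usual Linux summary format ("5 packets transmitted, 5 received, ..."). It prints 0 when there is no summary line at all, so the numeric tests never see an empty value.

```shell
#!/bin/bash
# Read ping output on stdin and print the number of replies received.
# Prints 0 when no summary line is present.
ping_received() {
    awk -F', *' '
        /transmitted/ { split($2, f, " "); print f[1]; found = 1 }
        END { if (!found) print 0 }
    '
}

# Example:
#   RETURNED=$(/bin/ping -c 5 8.8.8.8 2>/dev/null | ping_received)
```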
    
    

    Why the public cloud is the fast food of the IT industry.

    The challenge for any IT manager is how to meet business requirements with technology. More often than not, those requirements dictate rapid deployment. The cloud excels at this, since the infrastructure is already available and waiting to be used. Rapid deployment, just as with fast food, does not guarantee quality, however. In fact, the performance and reliability of public cloud services are currently variable and largely unpredictable. Despite the high availability of fast food restaurants, people do not eat there for breakfast, lunch, and dinner. They know that the quality of the food is sub-par, and they intuitively recognize that its usefulness runs out once speed and short-term cost are no longer the two foremost drivers. The same holds true for public cloud services. They excel at speed-to-deploy and cost-of-entry for a solution. And, just as with fast food, public cloud services are almost irresistibly appealing on the surface. But after considering quality and long-term costs, it is clear that the best solutions remain in private clouds.

    Cloudsourcing – What is this “IT revolution” all about?

    First there were servers.  Each server had a single purpose.  In some cases, 90% of the time, they ran idle — a big waste of resources (processor, disk, memory, etc.)

    Then there were server farms, which were groups of servers distributing loads for single applications.   Distributing loads to multiple servers allowed for higher loads to be serviced, which allowed for tremendous scalability.

    Then there was SAN storage, the aggregation of disks to provide performance, redundancy, and large storage volumes.  A company could invest in a single large-capacity, highly-available, high-performance storage device that multiple servers could connect to and leverage.   No more wasted disk space.

    Then there was virtualization, the concept of running more than one server on the same piece of hardware.   No more wasted memory.  No more wasted processors.  No more wasted disk space.  But the management of isolated virtual servers became difficult.

    Then there was infrastructure management, the idea that we could manage all of our servers from a single interface. No more wasted time connecting to each server individually. But there were still inefficiencies when managing the applications and configuration of the virtual servers.

    Then there was devops, the concept of having software scripts manage the configuration and deployment processes for virtual or physical servers.

    …  With all of these inefficiencies addressed, you would think that there is no more room for improvement.  But we do have a “gotta-have-it-now” society, and in this age of fast-food and mass produced goods, we had to see this coming.  The “Cloud”.   The “cloud” is the fast-food of servers.  It is the idea that someone else can leverage all of the aforementioned concepts to build and manage an infrastructure much larger than yours, and cheaper and faster (per unit) than you can.  It is the idea that economies of scale drive costs so low for the cloud provider that those savings translate into big savings for the companies or people that leverage them.  We all use “the cloud” in some way.   Microsoft has OneDrive/SkyDrive, Google has Google Drive, and then there’s Dropbox.   All of these services represent “the cloud” for individual people.  But there are clouds for companies, too, like Amazon’s AWS, Google’s Cloud Platform, Microsoft’s Azure, etc.

    So how do we know that the cloud is good for a company? Well, it’s really difficult to tell. For one, the traditional model for companies was one where they owned their infrastructure. All expenses were considered capital expenses; that is, there were no recurring expenses for their infrastructure equipment (except perhaps for software components). But when it comes to cloud infrastructure, the business model is, of course, the one that profits the cloud owner the most: the holy grail of business models, the recurring revenue stream known in IT speak as Infrastructure as a Service. Cloud businesses are booming right now! The big question is: is the “cloud” as good for the customer as it is for the provider?

    In my experience, owning hardware has distinct advantages. The most recognizable difference is that a company can purchase infrastructure while sales are up, and then coast along on that infrastructure equity when sales dip. In my experience, once hardware is paid for, the recurring cost to run and maintain it is much less than the cost of any cloud solution. After all, the cloud providers need to pay for their infrastructure, too. They pass that cost on to you after padding it with their soon-to-be profits, so your recurring costs with them will potentially always be higher than they would be if you owned your infrastructure. If you were to transform your initial investment into an operational expense and distribute it over the lifespan of the equipment (let’s say, 5 years), and then add in the cost of ownership of that equipment, the numbers become close enough that I am comfortable declaring that at least the first year of ownership will be a wash. In other words, if you intend to own hardware for a year or less, then the cloud is really where you should be. The longer that you intend to continue to extract use out of your hardware, though, the more appealing it is to own it rather than use the cloud.

    To put this into perspective… it is common for townships to employ a mechanic to maintain their vehicles. If townships were to, instead, lease all of their vehicles, the mechanic would not be necessary and costs could initially be reduced. But, over time, the value that the mechanic introduces would more than pay for his salary. Companies have mechanics, too, except that they are called systems administrators. They keep systems running and squeeze every last drop of efficiency out of a piece of hardware. If you are a company without “mechanics”, the cloud is for you. But if you do have systems administrators in your employ, then there is yet another consideration. What is the reason for leveraging the cloud? If your answer is, “Because it’s the latest trend.”, then you need to take a step back and reconsider taking another sip of the kool-aid that you have been drinking. If your answer is, “We have been growing rapidly and our infrastructure cannot keep up.”, then perhaps the cloud is the right spot for you. If your answer is, “Our business is seasonal and sometimes we have a lot of wasted infrastructure that is online, but unused.”, then perhaps the cloud is for you.

    There are other cases, and the math certainly needs to be done.   Economies of scale make this topic interesting because it is certainly plausible that one day cloud services will be cheaper than owning your own infrastructure.   There will always be the differences in CapEx vs OpEx, and that requires an assessment of your sales patterns before taking any plunge into the cloud.   One thing is certain, though.  The cloud is not for everyone.  Assess your business needs, assess the competence of your IT group, assess your revenue streams, and then make a careful and calculated decision.   But whatever you do, don’t do it just to do it.   Because chances are that the results will not meet your business needs.