Can you imagine losing $66,240 per minute? That’s the rate at which Amazon.com lost revenue when its site was down for 30 minutes in August 2013, according to an estimate by Forbes, which works out to close to $2 million in total. Outages can cripple or ruin organizations, and while some outages are unpreventable, contingency plans can be put into place to minimize downtime.
Companies often strive to architect for success when, in fact, more of them need to architect for failure. In IT, the pessimist is more useful than the optimist: if something can fail, companies should operate with the mindset that it most likely will.
Build with Failure in Mind
When building website architecture and topology, you need to design for every possible failure. Any line that you draw within the network diagram should have a dotted line signifying a backup path to accompany it. Any box representing a server or a piece of hardware must have a corresponding box as a hot-swappable, redundant failover.
If you are the individual tasked with being the IT pessimist — the one who is the fall guy if the infrastructure fails — you have probably developed a nervous twitch that makes itself known whenever talk of modifying your infrastructure comes up.
Any misstep that renders your website inaccessible to your customers is the equivalent of putting a “closed” sign on your brick-and-mortar store. Actually, it’s worse. It’s like having a wrecking ball crash through that storefront while your customers are lining up to buy things. It ruins their customer experience and potentially drives them away for good.
Are you scared yet? You should be.
Global Failover to the Rescue
Fortunately, some service providers, like Lunarpages, offer global redundancy and failover, which is specifically designed to mitigate outages. With this solution, companies set up mirrored environments in separate data centers, so if disaster strikes one data center, the other is available to accept rerouted traffic.
If you set up a global load-balancing and failover solution for your business, you are actually helping your customers in two ways:
1) If there is an outage at one data center, the other one will provide business/website continuity.
2) Customers coming from diverse geographic locations will notice speed and performance improvements when there isn’t an outage and all data centers are up and running.
Depending on how the business sets up its routing rules, web users reach the data center closest to them. East Coast users will typically hit an East Coast data center, and West Coast users a West Coast one. Requests travel fewer network hops to reach the infrastructure, which reduces latency and speeds up response times.
Compare global load-balancing solutions carefully, because their features vary. At a minimum, look for the following:
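To make the routing idea concrete, here is a minimal sketch of proximity-based selection with failover. It is an illustration only, not how any particular provider implements it: the `DataCenter` type, the `pick_data_center` function, and the coordinates are all hypothetical, and real global load balancers usually rely on DNS and network measurements rather than raw geographic distance.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class DataCenter:
    name: str
    lat: float
    lon: float
    healthy: bool = True  # assumed to be set by some external health check


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_data_center(user_lat, user_lon, data_centers):
    """Return the closest healthy data center; unhealthy ones are skipped."""
    candidates = [dc for dc in data_centers if dc.healthy]
    if not candidates:
        raise RuntimeError("no healthy data center available")
    return min(candidates, key=lambda dc: distance_km(user_lat, user_lon, dc.lat, dc.lon))


# Illustrative coordinates only (roughly East Coast and West Coast).
EAST = DataCenter("east", 39.0, -77.5)
WEST = DataCenter("west", 37.4, -122.1)

new_york_user = (40.7, -74.0)
print(pick_data_center(*new_york_user, [EAST, WEST]).name)
```

With both sites healthy, the New York user lands on the East Coast site. Mark that site as unhealthy and the same call returns the West Coast site instead, which is the failover behavior described above.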
- Health monitors: Make sure you can configure more than one monitor. A global load balancer should check and double-check before it marks an entire data center out of service, rather than reacting to a single anomaly.
- Groups: Make sure you can group related health monitors and infrastructure components. If a database server goes down in one data center, the global load balancer should know not to send traffic to the web servers that depend on it.
- Isolation of routing: Be sure that your global failover solution is not dependent on hardware/software located in or tied to a particular data center within that solution. By using global DNS, for example, routing will still occur even if an entire data center is offline.
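The first two requirements can be sketched in a few lines. This is a hedged illustration under stated assumptions, not a vendor API: `ComponentGroup`, `serviceable`, and the `min_failures` threshold are hypothetical names, the monitors are stand-in functions, and the dependency graph is assumed to contain no cycles. In line with the third requirement, logic like this would need to run somewhere that does not depend on any one data center.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Monitor = Callable[[], bool]  # returns True when the check passes


@dataclass
class ComponentGroup:
    name: str
    monitors: List[Monitor]
    depends_on: List[str] = field(default_factory=list)
    min_failures: int = 2  # assumption: this many monitors must fail before the group is down

    def is_up(self) -> bool:
        failures = sum(1 for check in self.monitors if not check())
        # Never require more failures than there are monitors, and at least one.
        threshold = max(1, min(self.min_failures, len(self.monitors)))
        return failures < threshold


def serviceable(group_name: str, groups: Dict[str, ComponentGroup]) -> bool:
    """A group takes traffic only if it and everything it depends on is up."""
    group = groups[group_name]
    return group.is_up() and all(serviceable(dep, groups) for dep in group.depends_on)


# Stand-in monitors: one database check fails, the other passes.
groups = {
    "db": ComponentGroup("db", monitors=[lambda: False, lambda: True]),
    "web": ComponentGroup("web", monitors=[lambda: True, lambda: True], depends_on=["db"]),
}
print(serviceable("web", groups))
```

Here a single failed database check is treated as an anomaly, so the web servers keep receiving traffic. If both database monitors fail, the `db` group is marked down and `web` stops being serviceable as well, even though its own checks still pass.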
Global failover is not always inexpensive. But when you weigh its cost against the revenue lost for every minute of downtime, and consider that customers may not easily forgive a business that has no recovery solution in place, the investment is much easier to justify.
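As a rough way to run that comparison, the sketch below multiplies a per-minute revenue figure by downtime and computes how many minutes of avoided downtime would pay for a failover solution. The per-minute figure is the Forbes estimate quoted at the top of this article; the annual failover cost is a made-up number for illustration only, and the function names are hypothetical.

```python
def outage_cost(revenue_per_minute: float, minutes_down: float) -> float:
    """Revenue lost during an outage of the given length."""
    return revenue_per_minute * minutes_down


def break_even_minutes(annual_failover_cost: float, revenue_per_minute: float) -> float:
    """Minutes of avoided downtime per year needed to cover the failover cost."""
    return annual_failover_cost / revenue_per_minute


REVENUE_PER_MINUTE = 66_240        # Forbes estimate cited above
ANNUAL_FAILOVER_COST = 250_000     # hypothetical figure, not from any price list

print(outage_cost(REVENUE_PER_MINUTE, 30))  # 1987200
print(break_even_minutes(ANNUAL_FAILOVER_COST, REVENUE_PER_MINUTE))
```

At those numbers, the solution would pay for itself by preventing fewer than four minutes of downtime a year. Your own revenue and pricing will differ, but the same two lines of arithmetic apply.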