Replication is the obvious next step toward higher service levels once a solid backup plan has been deployed. But if backup is the building block of a disaster recovery strategy and replication is the next step, adding real-time data protection, where does system availability fit in?
Gunstein Løken, Operations and Development Manager, Orkla Media Service Senter IT
Backup and replication technologies both deal with minimising data loss, but neither can keep systems running, least of all during a disaster. For maximum system availability we need to look at other alternatives. Traditionally, if backup is the only technology used to protect systems, the only way to get back to business is to restore from disk or tape, which can take anywhere from a couple of hours to days or even weeks. An example of how crucial it is to keep systems alive comes from a few years back, when a large online brokerage company suffered four system outages within two months, resulting in a 22% dip in its stock price as customers lost faith in the company.

To minimise downtime as well as data loss, a combination of clustering and replication must be used. Clustering is simply the process of moving an application from a system that has failed or is experiencing a disaster to a working system, whether in the same data centre or in another location. This process can take anywhere from seconds to minutes.
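The move described above can be sketched in a few lines of Python. This is an illustrative model only, not any vendor's implementation: the `Node` class, the `fail_over` function, and the rule of preferring a standby in the same data centre are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical model of one system in a cluster."""
    name: str
    site: str                      # data centre the system lives in
    healthy: bool = True
    applications: List[str] = field(default_factory=list)


def fail_over(app: str, failed: Node, nodes: List[Node]) -> Optional[Node]:
    """Move `app` from `failed` to a healthy node.

    Assumption for this sketch: a standby in the same data centre is
    preferred, and a node at another location is used otherwise.
    Returns the node that took over, or None if no healthy node exists.
    """
    candidates = [n for n in nodes if n.healthy and n is not failed]
    if not candidates:
        return None
    # Stable sort: same-site nodes (key False) come before remote ones.
    candidates.sort(key=lambda n: n.site != failed.site)
    target = candidates[0]
    failed.applications.remove(app)
    target.applications.append(app)
    return target
```

For example, if a primary system in one data centre fails while a local and a remote standby are both healthy, `fail_over` moves the application to the local standby; if only the remote one is healthy, the application moves there instead.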
What is a Cluster?
Before clustering emerged as a viable technology for keeping systems alive, users simply connected to a system, and if it went down they could do nothing until it was fixed.
This also meant that if you were the administrator of an IT environment, you were held responsible for getting the system back up and running as quickly as possible, and everyone would hound you until it was fixed.

Figure 2. In the above environment, if the server went down the entire IT environment would be crippled and unavailable.
Though the concept of clustering had been available for many years on mainframes, it wasn't until the 1990s that it became widespread on open systems such as Windows, Unix and Linux. With clustering, IT administrators could secure access to systems and minimise downtime by having extra systems available in case of failure.

Figure 3. By having another system available to take over in case of system outages, downtime can be minimised. If one system fails, another takes over.
How does clustering work?
Clustering is by no means magic; it simply automates the process of rebuilding a system and starting an application on a standby system. Without clustering, rebuilding a system can take a very long time: first the operating system has to be installed, then the applications, then patches have to be downloaded and applied, then the system has to be configured, and so on.
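A small Python sketch makes the difference concrete. The step names come from the list above; the `remaining_steps` helper, and the assumption that a standby system has already completed every rebuild step in advance, are illustrative only.

```python
# Recovery steps in the order they must happen (taken from the text above).
RECOVERY_STEPS = [
    "install operating system",
    "install applications",
    "download and apply patches",
    "configure system",
    "start application",
]


def remaining_steps(completed):
    """Return the steps still to be done, in order, given those already completed."""
    return [step for step in RECOVERY_STEPS if step not in completed]


# A bare replacement machine has completed nothing, so every step remains.
bare_machine = remaining_steps(set())

# Assumption: a cluster standby is kept installed, patched and configured,
# so only the final step is left when a failure occurs.
standby = remaining_steps(set(RECOVERY_STEPS[:-1]))
```

Here `bare_machine` contains all five steps, while `standby` contains only the application start, which is the step the cluster automates at failover time.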

Symantec, Middle East



