The A to Z of Windows disaster recovery

Windows-based computing has become the hub of many businesses, particularly small and medium-sized organisations. Typically, as companies expand, so too does their dependency on their computer systems.

  • Sunday, November 21 - 2004 at 17:42


With that dependency comes risk. The greater the dependency, the greater the potential threat posed by a disaster. And in computing terms a disaster doesn't have to be an entire site burning down or a flooded data centre. A simple application failure can constitute a disaster, depending on its impact in terms of data loss and downtime.

When looking at disaster recovery, the first thing for an organisation to consider is the range of threats to its IT environment. These are normally categorised as physical, logical or site outages. A physical outage is a hardware or component failure; a logical outage is something such as an accidental deletion or data corruption; and a site outage is what is more traditionally associated with disasters, such as a flood or power failure.

The planning stage should look at how to avoid the impact of each of these threats should it materialise. Having done this, the next factor to consider is what would happen if the applications running the business became unavailable and lost data.

Every application has a different level of importance in the context of a business, and the effects of downtime will differ from application to application. For example, a bank's trading systems being offline will be far more costly than a file and print server being unavailable.

It is important for an organisation to understand the downtime costs for its particular industry across its different applications. Once this is done it is possible to understand the risk and therefore what level of protection is needed to mitigate that risk.
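As a rough illustration of this step, the sketch below ranks applications by the direct cost of an outage. It assumes a simple model in which cost grows linearly with the length of the outage; the `Application` class, the function names and the hourly figures are all hypothetical and are not taken from any study.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    hourly_downtime_cost: float  # hypothetical figure, in any currency


def downtime_cost(app: Application, hours_down: float) -> float:
    """Direct cost of an outage under a linear model: hourly cost x hours."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    return app.hourly_downtime_cost * hours_down


def rank_by_exposure(apps, hours_down):
    """Return (name, cost) pairs, most expensive outage first."""
    costs = [(a.name, downtime_cost(a, hours_down)) for a in apps]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, used only to show the ranking.
    apps = [
        Application("file-and-print", 500.0),
        Application("trading", 100_000.0),
    ]
    for name, cost in rank_by_exposure(apps, hours_down=4):
        print(f"{name}: {cost:,.0f}")
```

A real assessment would also need to account for indirect costs, which a linear model like this one does not capture.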

The illustration below comes from a Meta Group study of the financial effect of downtime on organisations and their applications across different industry sectors. Direct costs here means revenue lost as a result of data loss and downtime. Every business is different, so an organisation should assess its own circumstances.

Meta Group study

Recovery Point Objective and Recovery Time Objective (RPO/RTO)
The next part of the DR planning process should be to look at what level of recovery is needed for the various applications. This will differ from application to application. To do this we use the principles of RPO and RTO.

RPO is the point in time to which an organisation needs to recover its data for a particular application. Put another way, it is how much data loss can be tolerated. For example, suppose a Service Level Agreement (SLA) between IT and the business says that the recoverable data for an application should be no more than four hours old. If the most recent copy of the data available is a backup taken twelve hours ago, the organisation is not going to meet the RPO using that particular technology.

RTO is the maximum acceptable time between a failure occurring and the application and its data being recovered and brought back online, i.e. the acceptable amount of downtime.
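The two definitions above reduce to simple comparisons, sketched below. The function names are hypothetical, and the four-hour and twelve-hour values come from the SLA example in the text.

```python
from datetime import timedelta


def meets_rpo(age_of_latest_copy: timedelta, rpo: timedelta) -> bool:
    """True if the newest recoverable copy is no older than the RPO allows."""
    return age_of_latest_copy <= rpo


def meets_rto(time_to_restore: timedelta, rto: timedelta) -> bool:
    """True if the application can be back online within the RTO."""
    return time_to_restore <= rto


if __name__ == "__main__":
    rpo = timedelta(hours=4)           # SLA: data no more than four hours old
    last_backup = timedelta(hours=12)  # newest copy is a twelve-hour-old backup
    print("RPO met:", meets_rpo(last_backup, rpo))
```

With these inputs the check fails, which is the situation described in the SLA example: a twelve-hour-old backup cannot satisfy a four-hour RPO.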



A business should look at each of its applications in this way and decide what an acceptable RPO and RTO should be across each of the individual applications. This profiling of applications then provides the basis for making decisions around the technology.

Once there is an understanding of the recovery requirements for the various applications then the process of mapping the most appropriate technology to achieve the required recovery profile can begin.

Different technologies will achieve different recovery profiles in terms of RPO and RTO. Typically, tape-based backup and recovery delivers longer RPOs and RTOs. As technologies such as disk-based backup, remote mirroring, replication, and local and global clustering are introduced, the RPO and RTO reduce accordingly.
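One way to picture this mapping is as a lookup over a catalogue of technologies, choosing the simplest one that satisfies an application's profile. The sketch below is purely illustrative: the `Technology` type, the `simplest_fit` function and every RPO/RTO figure in the catalogue are placeholder assumptions, not measured or vendor-published values.

```python
from typing import NamedTuple, Optional


class Technology(NamedTuple):
    name: str
    rpo_hours: float  # placeholder: data loss this technology can hold to
    rto_hours: float  # placeholder: downtime this technology can hold to


# Placeholder figures for illustration only, ordered simplest first.
CATALOGUE = [
    Technology("tape backup", 24.0, 24.0),
    Technology("disk backup", 4.0, 4.0),
    Technology("replication", 0.25, 2.0),
    Technology("replication + clustering", 0.25, 0.25),
]


def simplest_fit(required_rpo_hours: float,
                 required_rto_hours: float,
                 catalogue=CATALOGUE) -> Optional[Technology]:
    """Return the first (simplest) technology meeting both objectives."""
    for tech in catalogue:
        if (tech.rpo_hours <= required_rpo_hours
                and tech.rto_hours <= required_rto_hours):
            return tech
    return None  # nothing in the catalogue meets the profile


if __name__ == "__main__":
    print(simplest_fit(required_rpo_hours=4, required_rto_hours=4))
```

In practice the catalogue would be filled with figures measured in the organisation's own environment, since achievable RPO and RTO depend on data volumes, bandwidth and hardware.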

The diagram below highlights how different technologies will deliver different RPOs and RTOs on an application-by-application basis.



Technologies for Windows based DR
The starting point for any DR strategy should be an effective backup and recovery solution. This is a prerequisite and should underpin any other technology being used. A backup to tape represents the last valid copy of your data.

This is important for a number of reasons. Firstly tapes can be easily transported offsite to give protection against a site wide disaster. Also, even when technology such as mirroring is being used, if there is a corruption to the data, it is highly likely that you will mirror the corruption. Backup represents the data safety net - the last valid copy of your data when all else fails.
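The point about mirrored corruption can be shown with a toy model. The sketch below assumes a synchronous mirror that copies every write immediately and a backup that is a point-in-time copy; the `MirroredVolume` class and `take_backup` function are hypothetical and do not represent any product's behaviour.

```python
class MirroredVolume:
    """Toy model of a volume whose writes go to a primary and a mirror."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, key, value):
        # A synchronous mirror applies every write to both copies,
        # including writes that corrupt the data.
        self.primary[key] = value
        self.mirror[key] = value


def take_backup(volume: MirroredVolume) -> dict:
    """Point-in-time copy, unaffected by later writes to the volume."""
    return dict(volume.primary)


if __name__ == "__main__":
    vol = MirroredVolume()
    vol.write("ledger", "good data")
    backup = take_backup(vol)
    vol.write("ledger", "corrupted")   # logical corruption
    print("mirror:", vol.mirror["ledger"])
    print("backup:", backup["ledger"])
```

After the corrupting write, both halves of the mirror hold the bad value, while the earlier backup still holds the good one.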

VERITAS Backup Exec for Windows can deliver this starting point. It has a number of application specific agents so that applications such as Exchange can be backed up without incurring downtime and can be backed up in a fast and efficient manner.

From a recovery point of view, using a combination of backup to tape and backup to disk can allow faster and more frequent backups, as well as faster recovery.

Recovering a server or servers after a complete failure can be notoriously difficult, time consuming and cumbersome. The Intelligent Disaster Recovery option automates this process, which can dramatically reduce the time and effort involved.

Further enhancements can be made to the recovery process by eliminating old, duplicate, non-business and stale data from the server environment. This can be done in an automated manner through the use of VERITAS StorageCentral.

In a typical scenario, if a company were to free up 30% of its storage capacity using StorageCentral, there would be less data to restore and so a correspondingly shorter recovery time.
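The arithmetic behind this can be sketched as follows, assuming restore time is proportional to the volume of data and that restore throughput stays constant. Both are simplifying assumptions, and the function names and figures are hypothetical.

```python
def restore_hours(data_gb: float, throughput_gb_per_hour: float) -> float:
    """Estimated restore time, assuming constant throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


def restore_hours_after_cleanup(data_gb: float,
                                fraction_removed: float,
                                throughput_gb_per_hour: float) -> float:
    """Estimated restore time once a fraction of the data has been removed."""
    if not 0 <= fraction_removed <= 1:
        raise ValueError("fraction_removed must be between 0 and 1")
    remaining_gb = data_gb * (1 - fraction_removed)
    return restore_hours(remaining_gb, throughput_gb_per_hour)


if __name__ == "__main__":
    # Made-up figures: 1,000 GB restored at 100 GB per hour.
    before = restore_hours(1000, 100)
    after = restore_hours_after_cleanup(1000, 0.30, 100)
    print(f"before cleanup: {before:.1f} h, after cleanup: {after:.1f} h")
```

Under these assumptions, removing 30% of the data cuts the estimated restore time by the same 30%. Real restores include fixed overheads, so the actual saving may be smaller.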

Downtime Avoidance and Fast Recovery
It is better to eliminate downtime in the first place. VERITAS Storage Foundation for Windows gives a stable platform to protect your physical storage environment. It provides a central point of management for all the storage on all your Windows servers, regardless of which vendor supplied it.

By allowing you to connect physical storage devices from different vendors to the server, directly or via a Storage Area Network (SAN), and presenting them as logical volumes, it eliminates a single point of hardware failure. Not only that, but more storage can be added and volumes grown without having to take the application offline. This approach allows many day-to-day management tasks to be carried out without incurring downtime.

One of the other capabilities of Storage Foundation for Windows is the ability to mirror data at a software level, meaning mirrors can be created that span physical storage devices. From a disaster recovery perspective, if a company has two sites connected by a fibre-based SAN, a mirrored copy of the data can be made available at the secondary site. This secondary offsite copy means faster recovery from more up-to-date data.

For critical applications there is also a FlashSnap option, which allows an additional mirror of the data to be created and 'snapped' off. This then represents a point-in-time, on-disk backup image of the data. This image can be used for much faster recovery of critical applications such as Exchange. Instead of having to recover data from a previous tape-based backup, the data can be recovered far faster from an on-disk backup image, with much less data loss.



High Availability
Storage Foundation is also available in a High Availability version, known as Storage Foundation for Windows HA. This combines the technology already outlined, with VERITAS' high availability clustering and comes with some additional options.

We have previously discussed the use of the mirroring capabilities in SFW to do remote mirroring over Fibre Channel SANs. Where there isn't a SAN connecting sites, another way has to be found to mirror data to a remote location. This is where the Volume Replicator option comes in. It replicates all data between sites continuously over a standard IP network without the need for any specialist hardware. What's more, it is non-proprietary and can replicate from any vendor's disk to any vendor's disk.

This gives the ability to continuously copy data to another location anywhere in the world to protect against an entire site outage. But this is only part of the picture.

When this is combined with clustering technology it is possible to build a state of the art disaster recovery solution to virtually eliminate downtime from both local and site wide disasters.

VERITAS' clustering technology protects against both server and application failure. It allows up to 32 servers to be clustered together so that, whether an application or an entire server fails, another server can take over the work with little or no impact on users. What's more, it is far more cost efficient than traditional approaches to clustering in that there is no need to have a standby machine for every production server. With VERITAS clustering you can have one standby machine shared amongst several active machines, or you can have every server in the cluster active.
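The cost argument and the failover idea can be sketched in a few lines. The code below is a simplified model, not a description of how VERITAS clustering actually selects a failover target: the "fewest applications" policy and all function names are assumptions made for illustration.

```python
def servers_needed_one_to_one(production: int) -> int:
    """Traditional approach: one dedicated standby per production server."""
    return 2 * production


def servers_needed_shared_standby(production: int, standbys: int = 1) -> int:
    """Shared approach: a small pool of standbys covers all production servers."""
    return production + standbys


def fail_over(assignments: dict, failed: str) -> dict:
    """Move the failed node's applications to the least-loaded survivors.

    `assignments` maps node name -> list of application names. The policy
    (fewest applications wins) is a hypothetical choice for this sketch.
    """
    survivors = {n: list(apps) for n, apps in assignments.items() if n != failed}
    if not survivors:
        raise RuntimeError("no surviving node to take over")
    for app in assignments.get(failed, []):
        target = min(survivors, key=lambda n: len(survivors[n]))
        survivors[target].append(app)
    return survivors


if __name__ == "__main__":
    print(servers_needed_one_to_one(4), "servers with dedicated standbys")
    print(servers_needed_shared_standby(4), "servers with one shared standby")
    cluster = {"node-a": ["exchange"], "node-b": ["sql"], "standby": []}
    print(fail_over(cluster, failed="node-a"))
```

For four production servers, the dedicated model needs eight machines while a single shared standby needs five, which is the saving the paragraph above describes.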

Locally, this protects against server and application failure. When combined with the Global Cluster Option and the Volume Replicator option, it becomes possible to centrally manage multiple clusters in multiple locations and, in the event of a site outage, automate the migration of an entire site to a DR site. Again, this can be done with minimal impact on users, with DNS updated automatically so the user never has to know. For critical systems such as Exchange and SQL this is enormously beneficial.



In conclusion, whether you need simple tape-based backup or a global disaster recovery solution, VERITAS technology can help.

Start off by understanding the threats that you need to protect against. Look at the impact of downtime in your organisation, both in terms of direct costs and the indirect knock-on costs. This should give you an idea of the criticality of different applications.

Profile your applications in terms of RPO and RTO in order to define a suitable recovery profile. Use the appropriate technology to deliver the required level of protection.

Finally, always ensure that a plan is thoroughly tested. The time to find out that there are problems is not after a disaster has taken place. And revisit the plan regularly to ensure that changes in the IT environment don't impact the ability to recover.




Symantec Symantec, Middle East
Sunday, November 21 - 2004 at 17:42 UAE local time (GMT+4)

Replication or redistribution in whole or in part is expressly prohibited without the prior written consent of AME Info FZ LLC / Emap Limited.


