Traditionally, large data warehouses were deployed on high-end symmetric multiprocessing computers that didn't necessarily need to meet the same availability requirements as transaction processing systems.
Today, as companies extend their data warehouse assets to more users—both inside and outside the organization—there is a critical need to control IT costs and deliver higher service levels. That's why many companies are choosing to deploy their data warehouses on clustered low-cost commodity servers running Linux.
Enter Linux power
Is Linux ready for these high-end data warehouses and business intelligence systems?
IT pros at Vanderbilt University think so. Until recently, this Nashville, Tennessee-based educational institution struggled with the cost of managing its growing information systems.
Vanderbilt discovered that running Oracle Database with Oracle Real Application Clusters on Linux would allow it to use low-cost Intel-based hardware. The economy of the solution was compelling: A two-node RISC-based server configuration would have cost US$100,000, whereas a two-node Intel-based solution cost only US$30,000.
'Our tests showed that we would get three times the server power and performance for the dollar, plus greater availability, if we switched from UNIX to Linux,' reports Tim Getsay, assistant vice chancellor of Vanderbilt's management information systems.
Today, Vanderbilt uses Oracle Database 10g with Oracle Real Application Clusters configured on 16 HP ProLiant DL580 servers running Red Hat Enterprise Linux.
Oracle Real Application Clusters makes it easy to scale the data warehouse, because low-cost servers can be added to the cluster incrementally. According to Getsay, Vanderbilt expects to add 20 processors per year as it scales its data warehouse to several terabytes.
'It's not just the operating system but also the overall cost of commodity components that is driving down costs in these data warehouse installations,' says Lou Agosta, an analyst at Cambridge, Massachusetts-based Forrester Research.
'The operating system is only about five percent of the price of the overall configuration and often less than one percent. It is not Linux, per se, that is driving down costs for Oracle data warehouse customers, but all of the things Oracle is doing with Linux in grid computing environments to create low-cost systems.'
The Linux cluster convergence
Apart from costs, Agosta believes that the adoption of open source operating systems such as Linux is being driven by a variety of factors.
'For one thing, large vendors such as IBM, HP, Oracle, and Dell are getting behind Linux,' he says. 'Second, many people want a low-cost, nonproprietary alternative to Windows. And, finally, because Linux runs on commodity components, it allows you to avoid technology lock-in.'
Oracle's commitment to Linux is part of the reason for Linux's widespread adoption in the enterprise today. Oracle continues to work closely with Red Hat, Novell, and the Linux community to ensure that Oracle products and the Linux kernel are optimally configured and tuned to the underlying hardware, and Oracle provides seamless and integrated 24/7 customer support for Linux.
These business dynamics motivated Vanderbilt to deploy an enterprise grid that can manage all of the university's core databases on a centralized infrastructure, consolidating the data warehouse environments for both Vanderbilt University and the Vanderbilt Medical Center.
Vanderbilt's Linux-based information systems keep data secure yet highly available to all authorized personnel. If any server in a cluster fails, the remaining servers continue to operate seamlessly, ensuring 24/7 availability.
According to Mainstay Partners, an IT consulting firm based in Redwood City, California, this infrastructure affords Vanderbilt University the benefit of US$6.2 million in cost avoidance for hardware investments and hardware maintenance as the university continues to scale out its grid with standardized, commodity-priced servers and storage devices.
While better reliability at a lower cost is a benefit any IT manager can appreciate, there is another reason companies are turning to the performance and scalability of Linux clusters for data warehousing.
The growing popularity of end-user reporting signals a change in how data warehouses are being used. Traditionally, these systems were utilized by a small group of information analysts, and possibly some senior executives and line-of-business managers.
Today's business intelligence solutions typically extend business information to many different types of employees throughout the enterprise, as well as to external partners and customers. These operationally focused business intelligence systems influence many of the company's core information systems.
They are used for both strategic and tactical decisions—not only by professional analysts and power users but also by rank-and-file employees throughout the organization.
'Our users are becoming more independent,' confirms Ron Reinsma, manager of the applications group at MLT Vacations, one of the largest providers of vacation packages in the United States.
'They want to dig into the data and get their own answers. It's a challenge for us to keep up with that demand, because they keep asking more-complex questions. We're seeing 10 percent annual growth just from the new report requests and data structures we're building.'
Oracle Real Application Clusters enables companies such as MLT Vacations to scale their information systems to support changing business demands and to create an infrastructure with built-in high availability and business continuity.
According to Chris Corona, manager of system services at MLT Vacations, these clustered environments gracefully handle unscheduled outages by automatically recovering a failed server and continuing to provide database services by using surviving servers.
Data is always accessible—as long as there is at least one server running in the cluster. This resilient configuration enables MLT Vacations' Web site, reservation systems, and data warehouse applications to remain online during routine maintenance or when there is a problem with a server.
'This stability is becoming increasingly critical, not just for our transaction processing systems but also for our data warehouse,' says Corona. 'Many users depend on the warehouse to analyze revenue, inventory, and pricing data as well as to track customer issues, credit vouchers, and profitability.'
System essentials
One of the strategies behind grid computing is to maximize the use of processors and storage capacity. When purchasing computing power, companies typically overestimate the amount they will need—and then pay support and maintenance on that additional capacity.
Even if they ultimately end up using all available capacity, it's an expensive way to do business. Oracle Real Application Clusters changes this scenario, allowing companies to scale their information systems incrementally, minimizing capital expenditures by adding server capacity only when needed.
'Oracle data warehouses built with Oracle Real Application Clusters technology have great flexibility, because of their shared-everything architecture,' says William Hardie, senior director of Database Product Marketing at Oracle.
If a server malfunctions in an Oracle clustered database environment, processing continues on the remaining servers. This ensures that data remains accessible and applications function without interruption. 'Plus it's easy to scale database clusters on demand,' Hardie adds, 'because Oracle Real Application Clusters automatically harnesses the processing power of additional servers as they are brought into the cluster.'
Others have seen the wisdom of this way of thinking. 'We wanted to be able to easily scale our information systems as our business grows,' says Reinsma. 'We were reaching the boundaries of scalability with our SMP server, which meant we would soon need to buy another big box. With Oracle Real Application Clusters, we can add more small servers as we need them.'
By replacing its SMP servers with clustered Intel-based servers running Oracle Database and Oracle Real Application Clusters on Linux, MLT Vacations was able to improve system performance while decreasing technology costs.
The travel company expects to save approximately US$1 million in software, hardware, training, and maintenance costs over the next five years as a result of its IT investments.
'Moving to Oracle on Linux has exceeded our expectations in terms of performance and cost efficiencies,' says Michael Kress, director of enterprise technology services at MLT Vacations. 'With our SMP server, even though there were multiple processors, they were all tied together, so we couldn't take one down without taking down the others.'
