press release

           
Destiny Corporation Analysis on Netezza Data Warehouse Appliance for Large Data

Netezza Data Warehouse Appliance

The Business Problem

Many organizations have stated various challenges when accessing data:

  • Access to Data is slow
  • Long running queries
  • The need to reduce costs

The Result on the Business

  • Data analysis is constrained by the time it takes to load and retrieve the data.
  • Organizations adapt and analyze only the data they can access within a reasonable amount of time and effort; they forgo critical analysis because it is too time-consuming or not possible with their current infrastructure.

Innovative Solution

Times have changed. An environment now exists that offers access to and analysis of data 25 – 500 times faster than traditional methods. This lets Business Intelligence and Analytics software access and analyze the data as quickly as the business can use it. Imagine summarizing and getting statistics from the organization’s data in a few minutes or seconds instead of many hours. Imagine setting up this environment in a day, running it without a team of Database Administrators, and letting the business community concentrate on the analysis instead of the hassles of accessing and organizing the data. Imagine real-time profile analysis of a customer against the warehouse of data while the customer is still on the phone.

The excuse ‘we cannot get access to the data fast enough to support the business’ is now history.

Destiny Perspective

As a Business and Information Technology Consulting firm, we constantly assess the vendors in the industry and the market niches they support. We work for our customers to help them answer their business questions and support their business needs. We have found a proven, simple, cost-effective way to solve the problem of not being able to get fast access to data to make decisions – the Netezza Data Warehouse Appliance.

The Netezza Data Warehouse Appliance looks like a refrigerator that rolls into the Data Center. It holds 12.5 Terabytes of storage. For more storage, chain several of these ‘refrigerator-looking’ cabinets together.

How can it load data at 500 Gigabytes an hour and retrieve data in seconds? It is due to its design.

Design

Each Data Warehouse Appliance (refrigerator) contains 108 computers called Snippet Processing Units (SPUs). Each SPU is an integrated circuit board with a CPU, a 400-Gigabyte hard disk, memory and a 1 Gigabit Network Interface Card. This gives you parallel processing across 108 computers inside each cabinet.

As data is loaded, the Appliance intelligently distributes each table across the 108 SPUs. Typically, the hard disk is the slowest part of a computer. Imagine 108 of these spinning up at once, each loading a small piece of the table. This is how Netezza achieves a load rate of 500 Gigabytes an hour.
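
To make the idea of spreading a table across the SPUs concrete, here is a minimal sketch in Python. It is not Netezza's actual algorithm: the CRC32 hash, the function names and the sample table are assumptions chosen only for illustration. The sketch also divides the stated 500 Gigabytes an hour by 108 to show how modest the load rate per disk needs to be.

```python
import zlib

NUM_SPUS = 108          # SPUs per cabinet, as stated in the text
LOAD_GB_PER_HOUR = 500  # load rate stated in the text


def spu_for_key(key: str, num_spus: int = NUM_SPUS) -> int:
    """Map a distribution key to an SPU number (hypothetical hashing scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_spus


def distribute(rows, key_field, num_spus=NUM_SPUS):
    """Split a table into per-SPU slices based on one key column."""
    slices = {spu: [] for spu in range(num_spus)}
    for row in rows:
        slices[spu_for_key(str(row[key_field]), num_spus)].append(row)
    return slices


# Hypothetical sample table of 10,000 rows.
rows = [{"customer_id": i, "amount": i * 1.5} for i in range(10_000)]
slices = distribute(rows, "customer_id")
sizes = [len(s) for s in slices.values()]
print("smallest slice:", min(sizes), "largest slice:", max(sizes))

# Each disk only has to absorb its share of the total load rate.
per_spu_gb_per_hour = LOAD_GB_PER_HOUR / NUM_SPUS
print(f"per-SPU load rate: {per_spu_gb_per_hour:.2f} GB/hour")
```

Every row lands on exactly one SPU, so each disk writes only a small fraction of the table while all 108 work at the same time.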

After a piece of the table is loaded and stored on each SPU (a computer on an integrated circuit card), each column is analyzed to gather descriptive statistics such as minimum and maximum values. These values are stored on each of the 108 SPUs in place of indexes, which take time to create and update and take up unnecessary space. Imagine your environment without the need to create indexes.

When it is time to query the data, a master computer inside the Appliance queries the SPUs to see which ones contain the data required. Only the SPUs that contain appropriate data return information, so less information moves across the network to the Business Intelligence/Analytics Server.
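
The way stored minimum and maximum values can stand in for indexes can be sketched as follows. This is an illustration under our own assumptions, not Netezza's internal code: the class name SpuSlice, the order_dates column and the sample values are hypothetical, and the sample data is laid out in ranges so that the effect is easy to see.

```python
from dataclasses import dataclass, field


@dataclass
class SpuSlice:
    """One SPU's piece of a single column, plus its min/max statistics."""
    spu_id: int
    order_dates: list
    min_date: int = field(init=False)
    max_date: int = field(init=False)

    def __post_init__(self):
        # Statistics are gathered once, when the slice is loaded.
        self.min_date = min(self.order_dates)
        self.max_date = max(self.order_dates)

    def may_contain(self, low, high):
        """True if the range [low, high] overlaps this slice's min/max range."""
        return self.max_date >= low and self.min_date <= high

    def scan(self, low, high):
        """Read the slice and keep only the matching values."""
        return [d for d in self.order_dates if low <= d <= high]


def query(slices, low, high):
    """Scan only slices whose statistics say they may hold matches."""
    scanned, results = 0, []
    for s in slices:
        if s.may_contain(low, high):
            scanned += 1
            results.extend(s.scan(low, high))
    return results, scanned


# Hypothetical data: 108 slices, each holding 100 consecutive values.
slices = [SpuSlice(i, list(range(i * 100, i * 100 + 100))) for i in range(108)]
results, scanned = query(slices, 250, 420)
print(f"scanned {scanned} of {len(slices)} slices, found {len(results)} values")
```

In this example only 3 of the 108 slices are read, and nothing had to be built or maintained beyond two numbers per slice.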

For joining data, it gets even better. The Appliance spreads data in multiple tables across multiple SPUs by a key. Each SPU contains partial data for multiple tables. It joins parts of each table locally on each SPU, returning only the local result. All of the ‘local results’ are assembled over the internal hardware backplane of the cabinet and then returned to the Business Intelligence/Analytics Server. This methodology also contributes to the speed story.
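
The local join described above can be sketched in a few lines. The placement rule (key modulo 108), the table contents and the function names are assumptions made for illustration only; the point is that when two tables are spread by the same key, matching rows always sit on the same SPU, so each SPU can join its own pieces without looking at any other.

```python
from collections import defaultdict

NUM_SPUS = 108  # SPUs per cabinet, as stated in the text


def spu_for(key: int) -> int:
    """Hypothetical placement rule: the same key always lands on the same SPU."""
    return key % NUM_SPUS


def distribute(rows, key_index=0):
    """Spread a table's rows across SPUs by the key column."""
    slices = defaultdict(list)
    for row in rows:
        slices[spu_for(row[key_index])].append(row)
    return slices


def local_join(customers, orders):
    """Join two table pieces on customer id."""
    by_id = {c[0]: c for c in customers}
    return [(cid, by_id[cid][1], amount) for (cid, amount) in orders if cid in by_id]


# Hypothetical tables: (customer_id, name) and (customer_id, amount).
customers = [(i, f"customer-{i}") for i in range(1000)]
orders = [(i % 1000, 10.0 + i) for i in range(5000)]

cust_slices = distribute(customers)
order_slices = distribute(orders)

# Each SPU joins only its own pieces; the local results are then assembled.
joined = []
for spu in range(NUM_SPUS):
    joined.extend(local_join(cust_slices.get(spu, []), order_slices.get(spu, [])))

print(len(joined), "joined rows assembled from", NUM_SPUS, "local joins")
```

The assembled result is the same as joining the two full tables in one place, but no SPU ever needed rows held by another SPU.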

The key to all of this is ‘less movement of data across the network’. The Appliance returns only the required data to the Business Intelligence/Analytics server across the organization’s 100/1000 Mbit network. This is very different from traditional processing, where the Business Intelligence/Analytics software typically extracts most of the data from the database to do its processing on its own server. With the Appliance, the database does the work to determine the data needed, returning a smaller subset result to the Business Intelligence/Analytics server.

Backup and Redundancy
               
To understand how the data and system are set up for almost 100% uptime, it is important to understand the internal design. The Appliance uses the outer, fastest third of each 400-Gigabyte disk for data storage and retrieval. Another third stores descriptive statistics, and the remaining third stores a hot backup of data from other SPUs. Each Appliance cabinet also contains 4 additional SPUs for automatic failover of any of the 108 SPUs.
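
As a quick arithmetic check of the one-third layout, the short sketch below works out the raw disk space per cabinet from the figures given in this analysis. It assumes decimal units (1 Terabyte = 1000 Gigabytes) and counts raw space only; the source does not say how the 12.5 Terabyte figure quoted earlier relates to this raw number, so treat the result as an upper bound and not as a specification.

```python
ACTIVE_SPUS = 108   # active SPUs per cabinet, from the text
SPARE_SPUS = 4      # additional failover SPUs, from the text
DISK_GB = 400       # disk size per SPU, from the text

# Raw disk space across the active SPUs (decimal units assumed).
raw_gb = ACTIVE_SPUS * DISK_GB

# Each disk is described as split into three equal regions:
# data, descriptive statistics, and hot backup of other SPUs.
per_region_gb = raw_gb / 3

print(f"SPU boards per cabinet: {ACTIVE_SPUS + SPARE_SPUS}")
print(f"raw disk space:         {raw_gb / 1000:.1f} TB")
print(f"space per region:       {per_region_gb / 1000:.1f} TB")
```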

New Trends

Several Business Intelligence and Analytics vendors are now exploring the ability to push most of their processing to the Netezza database, further reducing the movement of data. One of the initial offerings is in the area of scoring in the database. This is typically a very simple process where an algorithm calculates across a set of columns in the database. The problem is that it typically does this across an entire table that has first been relocated to the Business Intelligence/Analytics server, hence movement of the table across the network. Quite often, after the scoring process completes, the table is loaded back into the database requiring a second move of the data across the network.
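
The difference between the two scoring approaches can be sketched with a small example. It uses Python's built-in sqlite3 module purely as a stand-in for a database; it is not Netezza, and the table, column names and scoring weights are hypothetical. The first approach moves every row out to the client and the scores back again; the second asks the database to compute the score where the data already lives.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, recency REAL, frequency REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, recency, frequency) VALUES (?, ?, ?)",
    [(i, i % 30, i % 7) for i in range(1000)],
)

W_RECENCY, W_FREQUENCY = -0.1, 0.5  # hypothetical scoring weights

# Approach 1: extract the table, score on the analytics server, load back.
rows = conn.execute(
    "SELECT id, recency, frequency FROM customers ORDER BY id"
).fetchall()
client_scores = [(i, W_RECENCY * r + W_FREQUENCY * f) for (i, r, f) in rows]
rows_moved_client = len(rows) + len(client_scores)  # out, then back in

# Approach 2: push the same formula into the database.
conn.execute(
    "UPDATE customers SET score = ? * recency + ? * frequency",
    (W_RECENCY, W_FREQUENCY),
)
rows_moved_db = 0  # only the statement itself crosses the network

# Fetched here only to confirm that both approaches give the same scores.
db_scores = conn.execute("SELECT id, score FROM customers ORDER BY id").fetchall()
print("rows moved, extract and reload:", rows_moved_client)
print("rows moved, in-database:", rows_moved_db)
```

Both approaches produce the same scores; the difference is how many rows had to travel to get them.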

Destiny Corporation, a member of the Netezza Developer Network, is developing one of the first scoring processes that automatically takes a Business Intelligence/Analytics vendor’s algorithms and transforms them to a compiled process that runs directly inside the Netezza database.

Case Study

Recently, the New York Stock Exchange purchased a 100 Terabyte Netezza Data Warehouse Appliance. One of its challenges is that its data needs are growing exponentially while it must also be able to analyze different types of data daily. Every day the Securities and Exchange Commission goes onsite to the NYSE to audit trade data on different publicly traded firms across several years. The NYSE must prepare the data for the SEC Auditors to analyze. Netezza’s ability to load 500 Gigabytes an hour allows the NYSE to load different, ad hoc data upon request, making it available to the SEC Auditors. The NYSE offloads the data once the SEC has completed its work, and new data is loaded for the next business day’s audit. The NYSE has stated that no other warehouse infrastructure was able to support this type of business need.

With Netezza, the NYSE Euronext has simplified its data load process while providing more flexible data modeling to support increased data feeds and volumes. With its four NPS 10400s, the NYSE can simultaneously load multiple servers, enabling business continuity. Its transformation process can support both batch and trickle batch feeds with Netezza. Data is available by 4:00 AM each day, as opposed to 12:00 noon in the previous environment. SLAs can finally be met, and the NPS system provides high performance on detailed queries without requiring aggregations or indexing.

Netezza has also simplified the NYSE’s data warehouse environment. Its existing hardware and software were effectively repurposed, and the Netezza system was easily integrated within the NYSE’s current On Demand Database (ODD) and Flat File Farm (FFF). In fact, the NYSE successfully deployed its first application on Netezza only two months after the NPS system was installed. The ODD can now provide users with a high performance facility to load archived data from commoditized file servers to Netezza in a highly transparent way. The FFF can also be quickly and easily reloaded in the new environment, driven by the ODD application. Additionally, Netezza has minimized DBA and SAN administration resources.

NYSE analysts can now ensure that the trading systems are able to handle much greater volume spikes, and the NYSE’s surveillance organization can identify and investigate many more trading anomalies using Netezza, thus ensuring a fair market. Overall, Netezza is providing significant benefits to the entire NYSE organization.

Competitive Alternatives

In our studies of other vendor solutions, we have seen a trend. Since Netezza coined the phrase ‘Data Warehouse Appliance’ at the turn of the century, other vendors have been trying to follow. The offerings are software based, hardware based or a combination of the two, leading to complicated solutions. Software companies can devise the logic, but need an array of hardware devices chained together in parallel from hardware vendors. The large hardware vendors have billions of dollars already invested in their legacy storage technology. This is not easy to transform overnight. We have seen several vendors offering their version of the ‘Data Warehouse Appliance’ to the marketplace, but to date, the solutions are very complex and require several vendors, a combination of skill sets not readily available in the industry, and several database and architect resources. Some are simply repositioning legacy hardware designs.

Test Drive

Netezza is a very aggressive organization and constantly puts its reputation on the line, standing behind its promises. Netezza will offer a proof of concept to any qualified organization with a large data business need. The way it works is simple. We define the business benefit and ROI for fast data loading and retrieval, and the customer defines the data they want to test on the appliance. We load the data on a Netezza box at the customer site or at Netezza headquarters in Framingham, Massachusetts. The customer runs their normal Business Intelligence/Analytics processes. We document the benchmarks. Customers always see incredible results.

Summary

In conclusion, Destiny Corporation has tracked trends in Business and Information Technology since 1987. We have heard our customers’ issues from both a Business and an IT perspective. Living and working with large data has been a growing challenge for our customers as data acquisition continues to grow, yielding masses of stored data available to the organization for analysis. The storage and analysis mechanisms must adapt. Our analysis has shown that Netezza is the leader as it elegantly answers today’s large data business needs.

About Destiny Corporation

Destiny Corporation is a Business and Technology Consulting firm incorporated in 1987. Destiny helps organizations understand and solve their business problems using appropriate information technology.

For more information about Destiny Corporation, please contact us at:

Phone: (860) 721-1684 or US toll free (800) 721-1684
Fax: (860) 721-9784
Email: info@destinycorp.com
Website: www.destinycorp.com