In this tutorial we are going to discuss deep detail into data warehousing concepts, its architecture and components of a data warehouse.
Data warehouses are multidimensional databases which generalize and consolidate data. It is a data repository maintained at a different place from other operational databases. It provides an integrated platform for collection of data from variety of applications. Data warehouse is a platform for information processing and analysis of accumulated historical data.
Data warehouses contains historical data unlike transactional databases which contains current information.
According to William H. Inmon, a leading architect in the construction of data warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process”.
What is Data warehousing?
Data warehousing is a technique for businesses to use the data for decision making process. Data warehousing provides necessary tools and architecture for business executives to systematically understand their data and use it for customer insights and improve their businesses.
So we can say, data warehousing is construction and using of data warehouses.
Data warehousing is a powerful concept emerging in today fast evolving world. Many organizations are focusing of building large data warehouses and using it for making strategic decision and improving its customer base.
Features of a Data warehouse
The properties of data warehouse which distinguish it from other databases such as relational database, transactional databases, and file system are:
- Subject –Oriented: A data warehouse focuses on analysis and modelling of collected historical data for decision making process, unlike relation databases which mainly focus on day to day information processing. Data warehouses give a clear, simple and concise view of a particular area such as sales, customer, supplier etc by excluding the unnecessary data not useful for decision making process.
- Integrated: Data in data warehouse comes from multiple heterogeneous data sources such as online transactions, flat files and relation databases. Since different data sources have different naming conventions, encoding formats therefore data warehouse use data cleaning and data integration techniques to maintain consistency. Data cleaning is applied to remove noisy data.
- Time-Variant: Data warehouse store historic information of an organization ranging from 5- 10 years data. Therefore the time element is present in the data warehouse.
- Nonvolatile: Data warehouse is different from operational database as it does not require the mechanisms such as concurrency control. Recovery and transactional processing. Only “data loading” and “data access” operations are done on a data warehouse. This is because it is stored separate from the operational environment.
How is information from Data warehouse used by organizations?
Businesses use data warehouse to support many decision making processes such as:
- Customer centric decisions: Companies focus on customer buying trends, spending trends, buying time, total time in store, liking and disliking of customer etc.
- Product analysis: Companies use Data warehouses to decide on how the products should be placed on shelves, the sale of product in year, quarter and month, sale of product in particular geographical area. Based on the data it makes strategic decisions.
- Increasing Profits: The companies analyses the transactions to increase their profits.
- Reducing Costs: Companies use data warehousing to improve their business models, strategies, correct environmental conditions and manage customer relationships to reduce unnecessary costs.
- Integration from multiple resources: Data warehouse is useful as the companies are able to collect data from multiple heterogeneous, autonomous data sources and integrate it on a single solid platform. Access to data is very efficient and it helps in important decision making process.
Query Driven Approach vs Update Driven Approach in Data warehousing
In traditional databases the integration from multiple heterogeneous data sources is very complex, expensive and inefficient. It requires filtering and complex integration processes. It requires building of “wrappers” and “integrators” over the databases. When the client queries the database, the metadata is used to translate the query from local to format understood by all the heterogeneous sites. These queries are then interpreted and sent to local query processor. The results which come from local query processors are collected and integrated into a global result. This approach is called query driven approach.
Data warehousing uses update driven approach. In this approach the information and data from multiple heterogeneous data sources is integrated in advance and stored. It can be used for direct analysis. This approach gives high performance as the data is copied, pre-processed, integrated and restructured and summarized into one data space. Also, the day to day query processing is not affected by data warehousing as it is maintained separately.
Operational database vs Data warehouse
Data warehouse is kept separate from transactional and relational databases so as to maintain high performance in both databases.
- The operational databases are simple queries maintained for tasks like searching records, indexing and optimizing queries while data warehouses are complex queries, handling large amount of data and multidimensional views. If such big tasks are done using operational databases it would degrade its performance.
- Operational database allows multiple transactions to run at a single instance. Mechanisms such as concurrency control, recovery, locking protocols are used to maintain consistency of database. The data warehousing analytical processing has read-only access of data. It summarizes and aggregates the data. So these mechanisms of cannot be applied to data warehouse.
- The operational databases does not store historic data while DW stores only historic data. The DW has uses in decision making process as it contains huge data records which are cleaning, organized and integrated. Operational databases contain raw data which needs to pre-process before analysis.
Thus looking at the functionality, structure and kind of data present in operational database and data warehouse it is important to maintain separate database and data warehouse. Now, many RDBMS vendors are making their systems optimizable for analytical processing. In near future, this gap between RDB and DW is likely to reduce.
Conclusion
A data warehouse is a data store for implementation of decision making activities. It stores historical data which is used by enterprises for making strategies in their business processes.
DW is constructed from multiple heterogeneous data sources. The data from these sources is collected, cleaned, pre-processed, integrated and summarized. DW supports multidimensional views which can be accessed by complex queries. Data warehouse management and utilization is sometimes referred to warehouse DBMS.
⇓ Subscribe Us ⇓
If you are not regular reader of this website then highly recommends you to Sign up for our free email newsletter!! Sign up just providing your email address below:
Happy Testing!!!