Lesson 1, Topic 1
In Progress

3.2 Data Warehousing Process


Warning: Attempt to read property "post_author" on null in /home/palanfou/evarsity.my/wp-content/themes/buddyboss-theme/learndash/ld30/topic.php on line 196

Organizations, private and public, continuously collect data, information, and knowledge at an increasingly accelerated rate and store them in computerized systems. Maintaining and using these data and information becomes extremely complex, especially as scalability issues arise. In addition, the number of users needing to access the information continues to increase as a result of improved reliability and availability of network access, especially the Internet. Working with multiple databases, either integrated in a data warehouse or not, has become an extremely difficult task requiring considerable expertise, but it can provide immense benefits far exceeding its cost. As an illustrative example, Figure 3.3 shows business benefits of the EDW built by Teradata for a major automobile manufacturer.

Many organizations need to create data warehouses—massive data stores of time series data for decision support. Data are imported from various external and internal resources and are cleansed and organized in a manner consistent with the organization’s needs. After the data are populated in the data warehouse, DMs can be loaded for a specific area or department. Alternatively, DMs can be created first, as needed, and then integrated into an EDW. Often, though, DMs are not developed, but data are simply loaded onto PCs or left in their original state for direct manipulation using BI tools. In Figure 3.4, we show the data warehouse concept. The following are the major components of the data warehousing process:

FIGURE 3.3 Data-Driven Decision Making—Business Benefits of the Data Warehouse.Source: Teradata Corp.
FIGURE 3.4 A Data Warehouse Framework and View

Data sources. Data are sourced from multiple independent operational “legacy” systems and possibly from external data providers (such as the U.S. Census). Data may also come from an OLTP or enterprise resource planning (ERP) system. Web data in the form of Web logs may also feed to a data warehouse.

Data extraction and transformation. Data are extracted and properly transformed using custom-written or commercial software called ETL.

Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/or DMs.

Comprehensive database. Essentially, this is the EDW to support all decision anal-ysis by providing relevant summarized and detailed information originating from many different sources.

Metadata are maintained so that they can be assessed by IT personnel and users. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search, especially with Web tools.

Middleware tools enable access to the data warehouse. Power users such as analysts may write their own SQL queries. Others may employ a man-aged query environment, such as Business Objects, to access data. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools.