Data Warehouse Architecture
Data warehouse architecture describes how a data warehouse is structured and organized. Several architectural approaches are in common use, each with its own strengths and use cases. The main types are:
Single-Tier Architecture:
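Keeps all data in a single layer, with no separation between operational and analytical workloads
Rarely used in practice because analytical queries compete with transaction processing, which limits performance and scalability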
Two-Tier Architecture:
Separates the data sources from the data warehouse
Consists of a warehouse database server that clients query directly, with no intermediate OLAP or application layer
Three-Tier Architecture: The most widely used architecture, consisting of:
a) Bottom Tier: Database server, usually a relational database system
b) Middle Tier: OLAP (Online Analytical Processing) server
c) Top Tier: Client front-end tools for querying and reporting
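The three tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational bottom tier, and the `sales` table and the functions `build_bottom_tier`, `rollup`, and `report` are hypothetical names, not part of any product.

```python
import sqlite3

# Bottom tier: a relational store (in-memory SQLite stands in for the warehouse database).
def build_bottom_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EU", "widget", 100.0), ("EU", "gadget", 50.0), ("US", "widget", 70.0)],
    )
    return conn

# Middle tier: an OLAP-style roll-up that aggregates the measure along one dimension.
def rollup(conn, dimension):
    # Column names cannot be bound as SQL parameters, so validate against a whitelist.
    if dimension not in {"region", "product"}:
        raise ValueError(f"unknown dimension: {dimension}")
    rows = conn.execute(
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()
    return dict(rows)

# Top tier: a client-facing report that only talks to the middle tier.
def report(conn, dimension):
    return [f"{key}: {total:.2f}" for key, total in rollup(conn, dimension).items()]
```

The point of the layering is that `report` never issues SQL itself, so the storage engine or the aggregation logic could change without touching the client tools.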
Bus Architecture:
Utilizes shared dimensions (conformed dimensions) across different data marts
Allows for incremental development of the data warehouse
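A conformed dimension is what makes it possible to combine results from separate data marts. The sketch below assumes two hypothetical fact tables, `fact_sales` and `fact_inventory`, that share one `dim_product` table; the schema and the `drill_across` function are illustrative only.

```python
import sqlite3

def build_bus():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conformed dimension: one definition of "product" shared by every mart.
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
        -- Two data marts, each with its own fact table keyed to the shared dimension.
        CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, units_sold INTEGER);
        CREATE TABLE fact_inventory (product_key INTEGER REFERENCES dim_product, units_on_hand INTEGER);
        INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
        INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 3);
        INSERT INTO fact_inventory VALUES (1, 40), (2, 7);
    """)
    return conn

def drill_across(conn):
    # Aggregate each fact table separately, then join the results on the shared key.
    return conn.execute("""
        SELECT p.name, s.sold, i.on_hand
        FROM dim_product AS p
        JOIN (SELECT product_key, SUM(units_sold) AS sold
              FROM fact_sales GROUP BY product_key) AS s
          ON s.product_key = p.product_key
        JOIN (SELECT product_key, SUM(units_on_hand) AS on_hand
              FROM fact_inventory GROUP BY product_key) AS i
          ON i.product_key = p.product_key
        ORDER BY p.name
    """).fetchall()
```

Because both marts reference the same `product_key`, a new mart can be added later without reworking the existing ones, which is the incremental development the bus approach aims for.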
Hub-and-Spoke Architecture:
Central data warehouse (hub) feeds departmental data marts (spokes)
Ensures consistency across the organization while allowing for customization
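As a rough sketch of the hub feeding a spoke, the following assumes a hypothetical `transactions` table in the hub and builds a departmental mart from it. In-memory SQLite databases stand in for both systems, and `build_hub` and `build_spoke` are made-up names.

```python
import sqlite3

def build_hub():
    # Hub: the central warehouse holding integrated data for all departments.
    hub = sqlite3.connect(":memory:")
    hub.execute("CREATE TABLE transactions (department TEXT, account TEXT, amount REAL)")
    hub.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [("sales", "north", 120.0), ("sales", "south", 80.0), ("finance", "payroll", 300.0)],
    )
    return hub

def build_spoke(hub, department):
    # Spoke: a departmental mart populated only from the hub, never from raw sources.
    spoke = sqlite3.connect(":memory:")
    spoke.execute("CREATE TABLE mart (account TEXT, total REAL)")
    rows = hub.execute(
        "SELECT account, SUM(amount) FROM transactions "
        "WHERE department = ? GROUP BY account ORDER BY account",
        (department,),
    ).fetchall()
    spoke.executemany("INSERT INTO mart VALUES (?, ?)", rows)
    return spoke
```

Each spoke can shape its own summary tables, but since every figure is derived from the hub, two departments reporting on the same account should not disagree.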
Federated Architecture:
Integrates multiple autonomous data sources without centralizing them
Useful when full integration is impractical or undesirable
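A federated query can be pictured as a thin mediator that asks each source for its own partial answer and combines the results at query time. The sketch below assumes each autonomous source exposes a hypothetical `orders` table; `make_source` and `federated_totals` are illustrative names, and real federation engines also handle schema mapping and query pushdown, which this omits.

```python
import sqlite3

def make_source(rows):
    # Each source is its own independent database; nothing is copied to a central store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

def federated_totals(sources):
    # Mediator: push the aggregation down to each source, then merge the partial results.
    totals = {}
    for conn in sources:
        query = "SELECT customer, SUM(total) FROM orders GROUP BY customer"
        for customer, subtotal in conn.execute(query):
            totals[customer] = totals.get(customer, 0.0) + subtotal
    return totals
```

For example, calling `federated_totals([eu, us])` on two sources built with `make_source` returns one combined total per customer while the data stays where it lives.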
Data Vault Architecture:
Focuses on long-term resilience to change and auditability
Separates business keys, relationships, and descriptive attributes
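The separation into business keys, relationships, and descriptive attributes corresponds to hub, link, and satellite tables. The following is a minimal sketch, assuming a hypothetical customer and order domain; the table names, the SHA-256 hash key, and the helper functions are illustrative choices rather than a prescribed implementation.

```python
import hashlib
import sqlite3

DDL = """
-- Hubs hold business keys only.
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT UNIQUE,
                           load_ts TEXT, record_source TEXT);
CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY, order_id TEXT UNIQUE,
                        load_ts TEXT, record_source TEXT);
-- Links hold relationships between hubs.
CREATE TABLE link_customer_order (link_hk TEXT PRIMARY KEY,
                                  customer_hk TEXT REFERENCES hub_customer,
                                  order_hk TEXT REFERENCES hub_order,
                                  load_ts TEXT, record_source TEXT);
-- Satellites hold descriptive attributes, versioned by load timestamp.
CREATE TABLE sat_customer (customer_hk TEXT REFERENCES hub_customer, load_ts TEXT,
                           name TEXT, city TEXT, record_source TEXT,
                           PRIMARY KEY (customer_hk, load_ts));
"""

def hash_key(*business_keys):
    # Deterministic surrogate key derived from the normalized business key(s).
    joined = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def create_vault():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn

def load_customer(conn, customer_id, name, city, load_ts, source):
    hk = hash_key(customer_id)
    # The hub row is inserted once; repeated loads of the same key are ignored.
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (hk, customer_id, load_ts, source))
    # Satellite rows are insert-only, so every change is kept for audit.
    conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
                 (hk, load_ts, name, city, source))

def current_city(conn, customer_id):
    row = conn.execute(
        "SELECT city FROM sat_customer WHERE customer_hk = ? "
        "ORDER BY load_ts DESC LIMIT 1",
        (hash_key(customer_id),),
    ).fetchone()
    return row[0] if row else None
```

Nothing is updated in place: a change of city adds a new satellite row, so earlier states remain queryable, which is where the auditability comes from.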
Kimball's Dimensional Model:
Uses fact tables and dimension tables
Optimized for query performance and ease of use
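A small star schema makes the fact and dimension split concrete. The sketch below assumes one hypothetical fact table, `fact_sales`, surrounded by `dim_date` and `dim_product`; the sample rows and the `revenue_by_month_and_category` function are invented for illustration.

```python
import sqlite3

def build_star():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimensions: descriptive context used to filter and group.
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        -- Fact: numeric measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date,
                                 product_key INTEGER REFERENCES dim_product,
                                 quantity INTEGER, revenue REAL);
        INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
        INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
        INSERT INTO fact_sales VALUES (20240115, 1, 2, 20.0),
                                      (20240115, 2, 1, 5.0),
                                      (20240210, 1, 3, 30.0);
    """)
    return conn

def revenue_by_month_and_category(conn):
    # A typical star join: one hop from the fact table to each dimension.
    return conn.execute("""
        SELECT d.month, p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
    """).fetchall()
```

Every query follows the same shape (join the fact table to the dimensions it needs, then group), which is a large part of why the model is considered easy to use.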
Inmon's Corporate Information Factory (CIF):
Advocates for a centralized, normalized data warehouse
Data marts are derived from this central repository
Lambda Architecture:
Combines batch processing and real-time processing
Consists of batch layer, speed layer, and serving layer
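The interplay of the three layers can be sketched with a simple page-view counter. This is a toy model under stated assumptions: events are plain dictionaries with a `page` field, and `batch_view`, `SpeedLayer`, and `serve` are hypothetical names; production systems would use distributed batch and stream processors instead.

```python
from collections import Counter

def batch_view(master_events):
    """Batch layer: recompute counts from the full, immutable event log."""
    return Counter(event["page"] for event in master_events)

class SpeedLayer:
    """Speed layer: incrementally track events that arrived after the last batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def ingest(self, event):
        self.realtime_view[event["page"]] += 1

    def reset(self):
        # Called once a new batch view covers these events.
        self.realtime_view.clear()

def serve(batch, realtime, page):
    """Serving layer: answer queries by merging the batch and real-time views."""
    # Counter returns 0 for missing keys, so unseen pages need no special case.
    return batch[page] + realtime[page]
```

The batch view is accurate but stale, the real-time view is fresh but covers only recent events, and the serving layer adds the two so that queries see both.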
Data Lakehouse:
Combines elements of data warehouses and data lakes
Aims to provide structure and performance on top of low-cost storage
Each of these architectures suits different organizational needs, data volumes, and analytical requirements. The right choice depends on factors such as the organization's size, data complexity, reporting needs, and existing infrastructure.