CS代考 FIT3003 – Business Intelligence and Data Warehousing – cscodehelp代写
INFORMATION TECHNOLOGY
FIT3003 – Business Intelligence and Data Warehousing
Week 3a – Bridge Tables Semester 2, 2021
Developed by:
1. Bridge Tables
1. Product Sales Case Study 2. TruckDeliveryCaseStudy
Using FLUX
1. Visit http://flux.qa/ on your internet enabled device
2. Log in using your Monash account (not required if
you are already logged in to Monash)
3. Click on the “+” to join audience
4. Enter the Audience Code:
• Clayton – Z9J7MT
• Malaysia – PMYBD6
5. Select FIT3003 in the Active Presentation menu
Bridge Tables
Bridge Tables
▪ A bridge table is a table that links between two dimensions; and only one of these two dimensions are linked to the fact.
➢ As a result, the star schema becomes a snowflake schema.
Bridge Tables
▪ Two reasons on why a dimension cannot be connected directly to the Fact: ▪
a) The Fact table has a fact measure, and the dimension has a key identity. In order to connect a dimension to the Fact, the dimension’s key identity must contribute directly to the calculation of the fact measure. Unfortunately, this cannot happen if the operational database does not have this data.
b) The operational database does not have this data if the relationship between two entities in the operational database that hold the information about dimension’s key identity and the intended fact measure is a many-many relationship.
Bridge Tables Case Study #1
Case Study #1 – A Product Sales Case Study
▪ A company management team would like to analyze the statistics of its product sales history. The analysis is needed to identify popular products, suppliers supplying those products, the best time to purchase more stock, etc.
▪ A small data warehouse is to be built to keep track of the statistics.
▪ The management is particularly interested in analyzing the total sales (quantity * price) by product, customer suburbs, sales time periods (month and year), and supplier.
Case Study #1 – A Product Sales Case Study
▪ The management is particularly interested in analyzing the total sales (quantity * price) by product, customer suburbs, sales time periods (month and year), and supplier.
Case Study #1 – A Product Sales Case Study
▪ The management is particularly interested in analyzing the total sales (quantity * price) by product, customer suburbs, sales time periods (month and year), and supplier.
▪ Sales Star Schema ➢ Fact:
• Total Sales
➢ Dimensions: • Product
• Customer locations/suburbs
• Time period
• Supplier
Case Study #1 – A Product Sales Case Study
▪ Possible Two-Column Methodology Tables:
TotalSales
TotalSales
$2,500,000
TotalSales
(a) Product point of view
(b) Time point of view (c) Suburb point of view
Case Study #1 – A Product Sales Case Study
▪ Sales Star Schema ➢ Fact:
• Total Sales
➢ Dimensions:
• Customer locations/suburbs
• Time period
• Supplier
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
SupplierID
TotalSales
Supplier point of view
Case Study #1 – A Product Sales Case Study
SupplierID
TotalSales
Supplier point of view
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
SupplierID
TotalSales
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
Bridge Table
Case Study #1 – A Product Sales Case Study
▪ To create Time Dimension:
– create table TimeDim as
distinct to_char(SalesDate, ’YYYYMM’) as TimeID, to_char(SalesDate, ’YYYY’) as Year, to_char(SalesDate, ’MM’) as Month
from Sales;
▪ To create Customer Location Dimension: – create table CustLocDim as
select distinct Suburb, Postcode from Customer;
Case Study #1 – A Product Sales Case Study
▪ To create Product Dimension:
– create table ProductDim as
select distinct ProductNo, ProductName from Product;
▪ To create Bridge Table:
– create table ProductSupplierBridge as
from StockSupplier;
▪ To create Supplier Dimension:
– create table SupplierDim as
select SupplierID, Name as SupplierName
from Supplier;
Case Study #1 – A Product Sales Case Study
▪ To create Fact Table:
– create table ProductSalesFact as
to_char(S.SalesDate, ’YYYYMM’) as TimeID, P.ProductNo,
sum(SI.QtySold*P.Price) as TotalSales
from Sales S, Product P, Customer C, SalesItem SI
where S.SalesNo = SI.SalesNo
and SI.ProductNo= P.ProductNo
and C.CustomerID = S.CustomerID
to_char(S.SalesDate, ’YYYYMM’), P.ProductNo, C.Suburb;
Bridge Tables Case Study #2
Case Study #2 – A Truck Delivery Case Study
▪ A trucking company is responsible for picking up goods from warehouses of a retail chain company, and delivering the goods to individual retail stores.
▪ A truck carry goods during a single trip, which is identified by TripID, and delivers these goods to multiple stores. Trucks have different capacities for both the volumes they can hold and the weights they can carry.
▪ At the moment, a truck makes several trips each week. An operational database is being used to keep track the deliveries, including the scheduling of trucks, which provide timely deliveries to stores.
Case Study #2 – A Truck Delivery Case Study
▪A trip may pick up goods from many warehouses
o i.e. a many-many relationship between Warehouse and Trip
▪ A trip uses one truck only, and a truck may have many trips in the history
o i.e. a many-1 relationship between Trip and Truck
▪ A trip delivers goods (e.g. TVs, fridges, etc) potentially to several stores
o a many-many relationship between Trip and Store, which is represented by the Destination table
Case Study #2 – A Truck Delivery Case Study
▪ Sample data in the operational database:
(a) Warehouse Table
(b) Trip Table
(c) TripFrom Table
Date TotalKm TruckID
14-Apr-2018 370 Truck1
14-Apr-2018
14-Apr-2018
15-Jul-2018
WarehouseID
WarehouseID Location
W1 Warehouse1
W2 Warehouse1
W3 Warehouse1
(d) Truck Table
(e) Store Table
(f) Destination Table
TruckID VolCapacity
Truck1 250
WeightCategory CostPerKm
Medium $1.20
StoreName Address
MyStore City Melbourne
MyStore Chaddy
MyStore HiPoint
High Point
MyStore Donc
MyStore North
MyStore South
MyStore East
MyStore Knox
Case Study #2 – A Truck Delivery Case Study
▪ The management of this trucking company would like to analyze the deliver cost, based on trucks, time period, and store.
Case Study #2 – A Truck Delivery Case Study
▪ Sales Star Schema
• Total Delivery Cost
(distance * cost per kilometre)
➢ Dimensions: • Truck
• Time period • Store
Case Study #2 – A Truck Delivery Case Study
▪ From the Truck point of view, Truck1 has two trips (e.g. Trip1 and Trip4), with the total kilometres of 820km (370km + 450km). The cost for Truck1 is $1.20. Hence, calculating the cost for Truck1 is straightforward. Other trucks can be calculated this way.
▪ From the Period point of view, 14-Apr-2018 has three trips (e.g. Trip1,Trip2, and Trip3). Trip1 (370km) is delivered by Truck1 which costs $1.20/km. Trip2 and Trip 3, on the same day, can be calculated the same way. Hence, on 14-Apr-2018, the total cost can be calculated.
▪ From the Store point of view; The cost is calculated based on Trip, but a trip delivers goods to many stores. Therefore, the delivery cost for each store cannot be calculated. The delivery cost is for the trip – not for the store.
Case Study #2 – A Truck Delivery Case Study
Solution Model 1 – Using a Bridge Table
Case Study #2 – A Truck Delivery Case Study
Solution Model 2 – add a Weight Factor attribute
A weight factor is only needed if we want to estimate the contribution that a dimension made to the fact
Case Study #2 – A Truck Delivery Case Study
(a) Trip Dimension Table
(b) Bridge Table (c) Store Table
TripID StoreID
StoreID StoreName Address
M1 MyStore City Melbourne
MyStore Chaddy
MyStore HiPoint
MyStore Donc
MyStore North
MyStore South
MyStore East
MyStore Knox
High Point
Case Study #2 – A Truck Delivery Case Study
▪ To create Trip Dimension:
– create table TripDim2 as
select T.TripID, T.TripDate, T.TotalKm, 1.0/count(*) as WeightFactor
from Trip T, Destination D
where T.TripID = D.TripID
group by T.tripid, T.tripdate, T.totalkm;
Case Study #2 – A Truck Delivery Case Study
Solution Model 3 – a List Aggregate version
Case Study #2 – A Truck Delivery Case Study
(a) Trip Dimension Table
(b) Bridge Table
TripID StoreID
listagg (Attr1, ‘_’) within group
(order by Attr1) as ColumnName
Case Study #2 – A Truck Delivery Case Study
▪ To create Trip Dimension:
– create table TripDim3 as
select T.TripID, T.TripDate, T.TotalKm, 1.0/count(D.StoreID) as WeightFactor,
(D. , ’_’) by D. ) as
from Trip T, Destination D
where T.TripID = D.TripID
group by T.TripID, T.TripDate, T.TotalKm;
StoreGroupList
Case Study #2 – A Truck Delivery Case Study
▪ Joining based on the StoreGroupList attribute in the Trip dimension table and the StoreID in the Store dimension table:
– select *
from TripDim3 T, StoreDim3 S
where T.StoreGroupList like ’%’||S.StoreID||’%’;
▪ Without the StoreGroupList attribute in the Trip dimension, we need to join three tables:
– select *
from TripDim3 T, BridgeTable3 B, StoreDim3 S where T.TripID = B.TripID
and B.StoreID = S.StoreID;
Bridge Tables Summary
Bridge Tables – Summary
▪ In principal, a Bridge Table is used: ▪
a) When it is impossible to have a dimension connected directed to the Fact table, because simply there is no relationship between this dimension and the Fact table (e.g. in the Product Sales case study, it is impossible to have a direct link from SupplierDim to ProductSalesFact)
b) When an entity (which will become a dimension) has a many-many relationship with another entity (dimension) in the E/R schema of the operational database (e.g. Supplier and Stock has a many-many relationship).
c) When temporality aspect (data history) is maintained in the operational database and the bridge table can be used to accommodate the dimension that has temporal attributes (e.g. product supply history is maintained in the second snowflake schema example).
Bridge Tables – Summary
▪ When a Bridge Table is used in the schema, there are two additional options: ▪
a) A Weight Factor is used to estimate the contribution of a dimension in the calculation of the fact measure. Because this is only an estimate, a weight factor is option.
b) Every snowflake schema (whether it has Weight Factor or not) can be implemented in two ways: a List Aggregate version, and a non-List Aggregate version.