How to create Create Data Ware house from scratch

Sharifdeen Ashshak
8 min readJul 1, 2021

here I have created data warehouse using ETL tools.without wasting too much time on Introduction let’s move into the procces to get the hands on dirty.

Inorder do that in your local Machine you should have to Install the following requirements in your machine

The dataset here selected is as the IDA statement transactional data. This dataset contains the column of project Borrower ,his credidt card number,country ,region also it includes the project details which was bought by the Borrower. Also it includes some 3 rdparty details . further more it has first Repayment Date, last Repayment Date, AgreementDate, BoardApprovalDate etc.

This csv types of dataset contains multiple sectors of column. I have attached the csv files in this document.

That what we call data warehouse. In order to create a data warehouse we have following Architecture.

Here I have used my data scource as my csv file to load into the database

A stagingarea, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The datastagingarea sits between the data source(s) and the data target(s), which are often datawarehouses, data marts, or other data repositories. The ETL tool means Extract Transfer Load. This will Extract the data from data scources and transfer and load it into the Data warehouse.

Now we take look at how to load the data from the data source to data warehouse

Data warehouse design and development

Here I have used csv file as data scource to load into the data warehouse.

Star schema

The data could be stored in a single table where the attributes are repeated on each row (like the source for an Excel pivot table). However, that would take up much more space and makes managing the dimensions more difficult.so star schema maps the table very well structure and reduce the data redundancy.

Here is my star schema for my dataset . It contains fact tables and multiple dimensional tables. The fact table contains foreign keys of other dimensional tables. Also it includes depentat colums values that interects with other dimensional tables. The dimensional tables contain descriptive attributes.

Open your Report server configuration manager and and click start it. This tool helps us to customize SQL Server Reporting Service parameters and configurations

Then go to the database option then click onto the change database option.

Then click onto create a new report server database option.after that click onto next option.

Then let the options as defult and click onto the next option.

Again let the options as defult and click onto the next option.

Then click onto Apply option

Then open your Microsoft SQL server management studio to create a database. Then click on connect it.

To create a new project go to file system click on new file and click to open a new project.

Then select the directory and save your project using your wishfull name.

Click on the Database option after that create a database

Give a name for your database and click ok

Then expand the table to create new table.

Then create other column for my previously arranged dimensional tables

Here I have set my Region_IDS as my region dimensional table’s primary key.

Following images are that created for other dimensional tables.

Then create a Fact table by placing the dimensional tables primary key into the fact table as foreign key

Then take look how we will be going to make relation those primary key with the fact table.

Right click onto the value that you want to make foreign key and click on Relationship

Click on add

Then click on the symbol shows “…”

Here you have to set the value primary key of dimensional table and another side you have to define which value that you will be going to make sure in the foreign key table.

As well n as you have to do for other values too

create ETL tools to using visual studio

First you have to install Sql server intergration service using manage extension

Then now create new project by using Intergration service project

Then click on control flow layer and select the data flow task from the side bar and drag and drop it on the work place. If you wish rename it. And again double click on it.It will allow to extract the data file from scource to the destination.

Then in side bar select flat file and drag and drop it on the work flow. And rename . then double click to open it.

Then click on the new

Then select the csv file that you want to upload

From the side bar select the Data conversation and drag and drop it on the workflow. Then rename it and double click to open it

Here you can select the data column and can change the data type as well

Now here I have used Loook up tools to compare scource and destination data. It filters out the matched and unmatched data in the specific destinations. In my Warehouse table I have 5 dimensional table and one fact table. To make sure to initialize and insert the values into the fact table and dimensional table. I have used Look up tool to insert the value from scource file to dimensional table. The dimensional tables primary key has been assigned into the fact table foreign.

key. In order to refer the Dimension tables primary key into the fact table foreign key, I have to map the scource column which are corresponded to the dimensional table. So the steps are follows.

Here I have used flate file manager to transfer the data scource column type to that corresponds with destination column data type.Here In this picture I have done for one column like wise I have to do for other remaing columns.

Now double click on the Look up here. And then select yout server,dimensional table, then click on the column for mapping. Map the corresponded column from scource to destination column. Then check the primary key which wants to refer in the fact table. Like wise you have to do for other dimensional tables.

Use connections Lookup for match output

Then select the OLE DB destination fromside bar and drag and drop it on work flow.then change its name. then double click to open it.

Then click on new

Then select your server name and the database name .

Then click on Mappings and connect the corresponded columns from scources to destination column.

Populate the data into Dimensional Table

Then again create a new file . Then click on control flow layer and select the data flow task from the side bar and drag and drop it on the work place. then double click on it .place the flat file on the data flow layer. Then use Multicast tool from the SSIS tools. It can populate the data to one or more destination. Here I have used five de

Then map the each and every column from the source to the dimensions tables as follows

Then double click on the data flow layer and click the execute button

Now I have execute the SQL command to get the updated data in region dimension table

Then execute the previous flat file flow

Then I have used SQL command to get the populated flat file data

Now you can see in the above picture the dataset from the csv file has been successfully uploaded into the flat file and other dimensional tables.

Hope you have get the idea of how to load the data from OLTP database structure to OLAP database structure.

Originally published at https://www.blackkeyhole.com.

--

--