Foundational Data Warehouse – Analysis Phase

Previously, I blogged on the Foundational Data Warehouse Checklist, which should be used as a high-level reference when building a data warehouse. In the blogs to come I will be taking a deeper dive into each of the phases that help build the foundation of a data warehouse. Hopefully, by the end of the blogs you will come out with a good understanding of why each phase is important and possible deliverables you can produce per phase. Regardless of which type of project lifecycle your company uses, they all have the same basic phases: Analysis, Design, Testing, Deployment. This blog will be focused on the Analysis phase.

Generally, the project will begin with the Analysis phase. The focus of this phase is to clearly define the business requirements and to understand what is required to complete the project. Often it is referred to as the WHAT? (What does my customer want, what will make them happy, what should the end result look like). More often than not people make the mistake of jumping into the HOW; save that for the Design phase. You really want to focus on what the customer wants.

I often use the analogy that building a data warehouse is similar to building a house. It takes many meetings with the customers to understand what they are looking for so you can meet their needs, it goes through multiple phases to get to the end result, you need to be flexible with changes, and it takes time to get to the perfect product. Throughout my blogs I will be references the following analogy, Let’s pretend we have a young couple that have lived in their start-up home for five years, they know what they like about it and what they dislike about it. They’ve been going to all the parade-of-home showings and have their dream home all planned out and are ready to build a new home. This is often how the business will come to you, they have their current reports, they know what they like about them and dislike about them. They often have an idea of what they want for a new reports and the sky is the limit. The reality is there is always a budget and there will have to be cut-backs or items moved out of scope, which is mainly done through the Analysis phase. Similar to building a house, your initial meetings with the customer is to gain a clear understanding of their basic needs. This is where you start to define the scope of the customer’s request and resources required to complete the project.

During the Analysis phase you should not only be meeting with the business to understand their requirements, but you should be understanding the source systems as well. You need to make sure you can extract all data points defined in the requirements. You need to understand the Operational tables, the joins between them, how the data flows throughout the system, the valid values allowed to be inputted within the system, data types, etc.

Most phases within a project require some sort of deliverable (output). When you come out of source system analysis you should have a good understanding of the Operational systems, the business flow, and the scope of the project. Below are some examples documents that may come out of the Analysis phase:

Requirements Documentation: The Business Analyst generally creates this document which contains the requirements, business flow, any items out of scope, assumptions, etc. This document should be used as a mechanism to communicate the requirements and deliverables between the business and IT.

Business Process Flow: The Business Analyst generally creates this document as part of the requirement documentation. The document is basically a diagram that explains the businesses work flow.

Source System Analysis Document: This document is usually created by the technical lead or some other lead that plans on moving the data. The document should include basically everything about your sources. I usually create this document using Excel so I can add tabs to meet my specific needs. Example Tabs: questions, responses, watchouts, glossary-of-terms, assumptions, risk, source table analysis (retention period, deletes, join criteria, etc.), field level analysis (valid values, cardnaility, nullability, data types, etc.), test cases (as you think of them during analysis), etc.

ER Diagram: This document is generally created by the technical lead or someone who has a good understanding of a star schema and has a clear vision of the company’s end-state. The ER diagram provides a high-level visual of the requirements. Usually, when you are sitting through the requirement gathering sessions you can start to hear the group by’s, which many times becomes the dimensions. In this phase you should be able to start to visualize your facts and dimensional tables.

ETL Process Flow: This document is usually created by the technical lead or some other lead that plans on moving the data. This person should have an overall vision of the data warehouse and understand the strategic direction. The document is basically a high-level diagram that explains the data movement strategy.

Prototyping: This document is generally created by the BI representative and is used to help ensure the requirements are defined and well understood between the business and IT.

In conclusion, the Analysis phase is a critical part of the project lifecycle. This is where you get to meet with your customers to gain a clear understanding of the deliverables, this is where you can ask questions about the systems and the data between them. This is where your inputs and outputs to the data warehouse are starting to be defined. Most importantly, this is where you shake hands with the business partners solidifying the requirements and agreed upon outcome.

Now that you have clearly defined requirements you are ready to begin the Design phase. In my next blog we will be discussing the importance of the Design phase and possible deliverables.