They are different but interlinked
In this post, I would like to talk briefly about data analysis and data modelling, for two reasons. Firstly, I have written several posts on data analysis in a rather random order, so it is worth laying out the main steps of data analysis in a systematic way. Secondly, I am planning to write about some modelling topics soon.
1. Data Analysis and Data Modelling
Data analysis and data modelling are two distinct concepts, but they are usually interlinked. In some contexts, they are used to mean more or less the same thing.
(1) What is data analysis
In general, data analysis is:
- a process of collecting, transforming, cleaning, and modelling data
- to discover useful information (data mining)
(2) What is data modelling
Data modelling refers to:
- the process of analyzing and organizing the data elements
- to find how the data elements relate to one another
2. Why Data Analysis and Data Modelling
The main reasons include, but are not limited to, the following:
- Explore the useful patterns and information in the data
- Understand the direction of development and the objectives
- Discover the causes of certain events based on data findings
- Understand the problems being faced
- Present technical insights
- Drive effective decision-making
- Increase productivity
3. Examples of Data Analysis and Data Modelling
There are four types of data analysis:
- Descriptive Analysis: investigates past data to find out what happened.
- Diagnostic Analysis: determines why it happened.
- Predictive Analysis: predicts what is likely to happen in the future (i.e. if past and current trends continue).
- Prescriptive Analysis: uses the information discovered by the previous three types of analysis to recommend actions or strategies for different situations and scenarios (i.e. what-if questions).
All these types of analysis typically rely on developing one or more models based on the data.
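To make the difference between the first and third types concrete, here is a minimal sketch using only the Python standard library. The monthly sales figures and the helper `fit_line` are invented for illustration and are not taken from any real dataset; the straight-line model is just one simple choice among many.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for illustration only.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# Descriptive analysis: summarise what happened in the past.
average_sales = mean(sales)
total_growth = sales[-1] - sales[0]


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Predictive analysis: build a simple model on past data and
# extrapolate it one month ahead, assuming the trend continues.
slope, intercept = fit_line(months, sales)
forecast_month_7 = intercept + slope * 7

print(f"average sales: {average_sales}, growth: {total_growth}")
print(f"forecast for month 7: {forecast_month_7}")
```

The descriptive part only summarises the existing numbers, while the predictive part needs a model, which is where data modelling enters the picture.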
4. Process of Data Analysis and Modelling
The main steps are more or less the following, many of which have been touched on in previous posts. You can reach them through the links in the text below.
- Determine Objective: goals, reasons
- Data Preparation: any process of collecting, gathering, combining, structuring and organizing data for further analysis
- Data Exploration: discover useful information and/or problems through different operations, such as missing-value and outlier detection, data slicing, sorting, filtering, grouping, data visualization, etc.
- Data Preprocessing: clean data (such as missing-value imputation and outlier treatment), integrate data, reduce data, transform data, encode the dataset, find relations between variables, normalize data, split the data, etc.
- Data Modelling: estimate or train the model, validate, test and evaluate it, visualize the results, etc.
- Model Deployment: apply the model for prediction, strategy making under different scenarios, etc.
- Result Communication: report the results, publish the results, etc.
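The steps above can be sketched end to end on a toy dataset. This is only an illustration under stated assumptions: the records (hours studied, exam score) are invented, the names `raw`, `fit_line` and `predict` are hypothetical, and the sketch uses only the Python standard library, whereas a real project would more likely rely on dedicated data analysis libraries. The ordered train/test split is also a simplification; a random split is more common.

```python
from statistics import mean

# Data preparation: hypothetical raw records (hours studied, exam score).
# None marks a missing score. All values are invented for illustration.
raw = [(1, 52.0), (2, 54.0), (3, None), (4, 58.0),
       (5, 60.0), (6, 62.0), (7, 64.0), (8, 66.0)]

# Data exploration: look for problems such as missing values.
n_missing = sum(1 for _, score in raw if score is None)

# Data preprocessing: impute missing scores with the mean of the
# observed scores, then split the data into training and test sets.
observed = [score for _, score in raw if score is not None]
fill_value = mean(observed)
clean = [(hours, score if score is not None else fill_value)
         for hours, score in raw]
train, test = clean[:6], clean[6:]


def fit_line(points):
    """Fit score = intercept + slope * hours by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Data modelling: train on the training set, evaluate on the test set
# using the mean absolute error.
slope, intercept = fit_line(train)
mae = mean(abs(y - (intercept + slope * x)) for x, y in test)


# Model deployment: apply the fitted model to new inputs.
def predict(hours):
    return intercept + slope * hours


# Result communication: report the findings.
print(f"missing values found: {n_missing}")
print(f"test mean absolute error: {mae:.2f}")
print(f"predicted score for 9 hours: {predict(9):.1f}")
```

Each comment in the sketch maps to one bullet in the list; the first step, determining the objective, happens before any code is written.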
5. Online Course
If you are interested in learning the essentials of Python data analysis and modelling in detail, you are welcome to enroll in one of my courses:
Master Python Data Analysis and Modelling Essentials