7 Steps To Quick & Effective Data Preprocessing In Machine Learning
Streamlining the Path from Raw Data to Machine Learning Models
6 min read · May 29, 2023
Have you ever tried to build a house of cards? It takes patience, a delicate touch, and, above all, a level and stable foundation. Without that firm base, your painstakingly constructed edifice is doomed to collapse.
Much like that house of cards, a Machine Learning (ML) model is only as good as the data it’s built upon. In the vast, chaotic universe of raw data, it is the art and science of data preprocessing that ensures we’re not just building castles on sand.
We’ll explore seven steps:
- Data Collection: Where does our data come from, and how do we gather it?
- Data Cleaning: How do we tidy up our data, dealing with missing values, duplicates, and outliers?
- Data Transformation: How do we mold our data into a form suitable for Machine Learning algorithms?
- Data Integration: How do we bring together diverse data sources into a coherent whole?
- Data Reduction: How do we distill our data, keeping what’s important and discarding the rest?
- Data Discretization: How do we convert continuous data into discrete buckets when necessary?
- Data Splitting: How do we divide our data into training and testing sets?
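To make the list above concrete, here is a minimal sketch of four of these steps (cleaning, transformation, discretization, and splitting) in plain Python. Everything in it is an assumption made for illustration: the toy `raw` records, the helper names `clean`, `min_max_scale`, `discretize`, and `train_test_split`, the age bucket edges, and the 25% test ratio. Real projects commonly lean on libraries such as pandas and scikit-learn for this work; the standard-library version here is only meant to show the shape of the pipeline.

```python
import random

# Hypothetical toy records: (age, income). None marks a missing value.
raw = [
    (25, 40000.0),
    (32, None),       # missing income
    (25, 40000.0),    # exact duplicate of the first row
    (47, 82000.0),
    (51, 60000.0),
    (38, 52000.0),
]

def clean(rows):
    """Cleaning: drop rows with missing values and exact duplicates."""
    seen, out = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        out.append(row)
    return out

def min_max_scale(values):
    """Transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(value, edges):
    """Discretization: index of the first bucket whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Splitting: shuffle a copy reproducibly, then cut off the test portion."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

rows = clean(raw)
incomes = min_max_scale([income for _, income in rows])
age_bins = [discretize(age, edges=[30, 45]) for age, _ in rows]
features = list(zip(age_bins, incomes))
train, test = train_test_split(features)
```

In this sketch the cleaning step runs first so that the scaling range is not distorted by missing values or repeated rows, and the split happens last. In practice, scaling parameters are usually computed from the training set only, to avoid leaking information from the test set.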