Splitting the dataset in Machine Learning

Do you know how to split a dataset in Machine Learning? 🤔

If you want to become a data scientist 👨‍💻, then you must be good at working with datasets. Being familiar with the dataset is the first step towards building ML models.

Before working with the data, it’s important to split the dataset into three non-overlapping parts:

  • Training set: Used for training the model.

  • Validation set: Used to pick the best model for prediction. After training several candidate models (or one model with different settings), you compare how well each one predicts on the validation set and keep the one that performs best.

  • Testing set: Once you have finalized the model, evaluate it on the testing set to estimate how well it will perform on data it has never seen.
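To make the roles of the three sets concrete, here is a minimal sketch using NumPy. The data is synthetic, and the "candidate models" are polynomial fits of different degrees; both are assumptions chosen for illustration, not something this post prescribes. Each candidate is fit on the training set only, the validation set picks the winner, and the test set is touched exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): y = 3x - 1 plus Gaussian noise.
x = rng.uniform(-1, 1, size=300)
y = 3 * x - 1 + rng.normal(scale=0.3, size=x.size)

# Shuffle the indices once, then cut 60% / 20% / 20% (180 / 60 / 60 rows).
idx = rng.permutation(x.size)
train_idx, val_idx, test_idx = idx[:180], idx[180:240], idx[240:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# 1. Train: fit one candidate model per polynomial degree, using the training set only.
candidates = {d: np.polyfit(x[train_idx], y[train_idx], deg=d) for d in (0, 1, 3, 9)}

# 2. Validate: keep the candidate with the lowest validation error.
val_errors = {d: mse(c, x[val_idx], y[val_idx]) for d, c in candidates.items()}
best_degree = min(val_errors, key=val_errors.get)

# 3. Test: evaluate the chosen model once on the untouched test set.
test_error = mse(candidates[best_degree], x[test_idx], y[test_idx])
print(best_degree, round(test_error, 3))
```

The test error is reported only for the model the validation set already chose; using the test set to compare candidates would make it just another validation set.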

The ratio in which you divide the data depends on the size of the dataset, and there is no fixed rule for it. A common starting point, however, is 60% for the training set, 20% for the validation set, and 20% for the testing set.
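The 60/20/20 split can be sketched in plain Python: shuffle once with a fixed seed, then slice. `train_val_test_split` is a hypothetical helper written for this example, not a library function; whatever remains after the training and validation slices becomes the testing set.

```python
import random

def train_val_test_split(items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle items and cut them into disjoint train/validation/test lists.

    Hypothetical helper for illustration. The test set gets the remainder
    (20% with the default fractions).
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test set")
    shuffled = list(items)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed -> reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

Shuffling before slicing matters when rows are stored in some order (for example, grouped by class). In scikit-learn, `train_test_split` returns two parts, so a three-way split is usually done by calling it twice: first hold out 20% for testing, then take 25% of the remaining 80% for validation (0.25 × 0.8 = 0.2).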

That’s all about splitting the dataset in Machine Learning. In the next post, we will see how to explore the dataset with Pandas.