# Splitting the dataset in Machine Learning

Do you know how to split [#datasets](https://www.linkedin.com/feed/hashtag/?keywords=datasets&highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7050464383780405248) in Machine Learning? 🤔

If you want to become a data scientist 👨‍💻, then you must be good at working with datasets. Being familiar with the dataset is the first step towards building ML models.

Before training a model on a dataset, it’s important to split the dataset into three parts:

* Training set: Used for training the model.
    
* Validation set: Used to pick the best model for prediction. Once you have trained several candidate models on the training set, you compare their predictions on the validation set and keep the model that performs best.
    
* Testing set: Once you have finalized the model, you evaluate it on the testing set to estimate how well it performs on data it has never seen.
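
To make the three roles concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the toy `(feature, label)` data, the `fit_threshold` and `accuracy` helpers, and the idea of using a threshold offset as the thing that differs between candidate models. It only shows the workflow: fit on the training set, choose on the validation set, report on the testing set.

```python
# Toy data (hypothetical): (feature, label) pairs, already divided into three parts.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
val = [(2, 0), (4, 0), (5, 1), (9, 1)]
test = [(1, 0), (4, 0), (6, 1), (9, 1)]


def fit_threshold(data, offset):
    """'Train' a threshold classifier: midpoint of the two class means plus an offset."""
    zeros = [x for x, y in data if y == 0]
    ones = [x for x, y in data if y == 1]
    midpoint = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return midpoint + offset


def accuracy(threshold, data):
    """Fraction of rows where 'feature > threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)


# 1. Training set: fit every candidate model.
candidates = {offset: fit_threshold(train, offset) for offset in (-2.0, 0.0, 2.0)}

# 2. Validation set: keep the candidate with the best validation accuracy.
best_offset = max(candidates, key=lambda o: accuracy(candidates[o], val))

# 3. Testing set: evaluate the chosen model once, at the very end.
test_accuracy = accuracy(candidates[best_offset], test)
print(best_offset, test_accuracy)
```

The testing set is touched only in the last step, so its score is not influenced by the choice made in step 2.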
    

The ratio in which you divide the data depends on the size of the dataset, and there is no fixed rule for how to split it. However, a common starting point is 60% training set, 20% validation set and 20% testing set.
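
As a rough sketch of a 60/20/20 split, the snippet below shuffles the rows and slices them using only Python's standard library. The function name `split_dataset` and its parameters `train_frac`, `val_frac` and `seed` are hypothetical names chosen for this example, and the list of 100 integers simply stands in for a real dataset.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the rows and cut them into train, validation and test lists."""
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, val, test


rows = list(range(100))  # stand-in for 100 dataset rows
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

Shuffling before slicing matters when the rows are stored in some order (for example sorted by date or by label), because otherwise the three parts could end up looking very different from each other.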

That’s all about splitting the dataset in Machine Learning. In the next post, we will see how to explore the dataset with Pandas.
