In the previous blog, I mentioned that I tried k-fold cross-validation with 10 folds.
Here a question arises: why 10 folds and not 3, 4, or 5, and does the number of folds even matter?
The short answer is that there is no fixed rule. Nothing says you must use 5 folds or 10 folds (as in our case); you can choose as many folds as your project requires.
K-fold cross-validation is simply a way to make the most of limited training data, and the number of folds just depends on what the project needs at the time.
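To show that the fold count is just a parameter and not something dictated by a rule, here is a minimal sketch using scikit-learn's cross_val_score on a toy dataset (the iris data and the k-nearest-neighbours model are stand-ins, not the actual project data or model):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy data purely for illustration; the real project data would go here
X, y = load_iris(return_X_y=True)

model = KNeighborsClassifier(n_neighbors=3)

# The number of folds is just the cv argument; try a few values
for k in (3, 5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"{k}-fold CV mean accuracy: {scores.mean():.3f}")
```

Whichever value of k you pass, the procedure is the same: split the data into k parts, train on k-1 of them, and test on the remaining one.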
Now let us understand training error and test error.
- Training error: in simple words, when we apply the model to the same data it was trained on, the error we measure is the training error.
- Testing error: in simple words, when we apply the model to data that was unseen during training (that is, the test data), the error we measure is the test error.
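To make the distinction concrete, here is a minimal sketch that computes both errors for one model (again assuming scikit-learn and a toy dataset rather than the actual project data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Training error: evaluated on the same data the model was fitted on
train_error = 1 - model.score(X_train, y_train)
# Test error: evaluated on data the model never saw during training
test_error = 1 - model.score(X_test, y_test)

print(f"Training error: {train_error:.3f}")
print(f"Test error:     {test_error:.3f}")
```

The training error is usually the lower of the two, since the model has already seen that data.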
One more important point the professor made in class today: when we train the model on the training data and then evaluate it on the test data, if the test data contains more than one identical value, the model does not treat them as separate observations, and this affects the accuracy of the model.
In this case, the accuracy we were getting previously was around 33%, which is questionable.
So, to overcome this, we can label every data point with a unique identifier.
For example, suppose we have the data points
[x1, y1, z1] and [x2, y2, z2]
We can prepend a number to each point to give it a unique identification:
[1, x1, y1, z1] and [2, x2, y2, z2]
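As a rough sketch of this labelling step (assuming the data sits in a NumPy array; the real project data, shapes, and values will differ), prepending an ID column could look like this:

```python
import numpy as np

# Two example data points [x, y, z]; the second is a deliberate duplicate
data = np.array([
    [0.5, 1.2, 3.4],
    [0.5, 1.2, 3.4],
])

# Prepend a unique ID (1, 2, ...) to each row so duplicates stay distinguishable
ids = np.arange(1, len(data) + 1).reshape(-1, 1)
labelled = np.hstack([ids, data])

print(labelled)
# [[1.  0.5 1.2 3.4]
#  [2.  0.5 1.2 3.4]]
```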
For today, my task is to go through the data and label it, in the hope of getting better accuracy from the model.