# Problems occurring during validation

## Validation stage

Causes of scores and optimal parameters differing between validation splits:

1. Too little data
2. Too diverse and inconsistent data

In these cases we should do more extensive validation:

1. Average scores over several different KFold splits
2. Tune the model on one set of splits, evaluate the score on another
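
The two steps above can be sketched with scikit-learn as follows. This is only an illustration under assumptions: the synthetic dataset, the `LogisticRegression` model, the `C` grid, the seeds, and the helper name `repeated_kfold_score` are all made up for the example and are not part of the original notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in data (assumption); kept small so split-to-split noise is visible.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)


def repeated_kfold_score(model, X, y, seeds=(0, 1, 2, 3, 4), n_splits=5):
    """Average accuracy over several KFold partitions, one partition per seed."""
    per_seed = []
    for seed in seeds:
        cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        per_seed.append(scores.mean())
    # Mean over partitions, plus the spread between partitions.
    return float(np.mean(per_seed)), float(np.std(per_seed))


# 1. Average scores from different KFold splits.
mean_score, seed_std = repeated_kfold_score(LogisticRegression(max_iter=1000), X, y)
print(f"mean accuracy: {mean_score:.3f}, std across splits: {seed_std:.3f}")

# 2. Tune on one set of splits, evaluate on a differently seeded set of splits.
tune_cv = KFold(n_splits=5, shuffle=True, random_state=100)
eval_cv = KFold(n_splits=5, shuffle=True, random_state=200)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
    cv=tune_cv,
    scoring="accuracy",
)
search.fit(X, y)

tuned_model = LogisticRegression(max_iter=1000, C=search.best_params_["C"])
eval_scores = cross_val_score(tuned_model, X, y, cv=eval_cv, scoring="accuracy")
print(f"best C: {search.best_params_['C']}, score on other splits: {eval_scores.mean():.3f}")
```

A large `seed_std` relative to the gap between candidate models suggests that a single split is too noisy to choose between them.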

## Submission stage

We can observe that:

- LB score is consistently higher/lower than validation score
- LB score is not correlated with validation score at all
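
One way to tell these two situations apart is to log the validation and LB score of each submission and look at the mean gap and the correlation between them. The sketch below assumes such a log exists. The helper names are illustrative and the numbers are hypothetical placeholders, not real competition results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_scores(val_scores, lb_scores):
    """Return (mean LB-minus-validation gap, correlation) over paired submissions."""
    gaps = [lb - val for val, lb in zip(val_scores, lb_scores)]
    return sum(gaps) / len(gaps), pearson(val_scores, lb_scores)


# Hypothetical placeholder scores, one pair per submission.
val_scores = [0.80, 0.82, 0.85, 0.87]
lb_scores = [0.77, 0.79, 0.82, 0.84]

mean_gap, corr = compare_scores(val_scores, lb_scores)
print(f"mean gap (LB - validation): {mean_gap:+.3f}, correlation: {corr:.3f}")
```

A stable non-zero gap with high correlation matches the first case: validation is offset but still ranks models in the same order as the LB. A correlation near zero matches the second case.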


## Expect LB shuffle because of

- Randomness
- Small amount of data
- Different public/private distributions
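
The effect of randomness combined with a small amount of data can be illustrated with a small simulation. It is a sketch under simplifying assumptions: two hypothetical models with true accuracies 0.80 and 0.81, independent errors, and a public set drawn from the same distribution as the private one. The function name `flip_rate` and all numbers are made up for the example.

```python
import random


def flip_rate(n_public, acc_a=0.80, acc_b=0.81, trials=500, seed=0):
    """Fraction of trials in which the truly worse model A ties or beats model B
    on a public set of n_public rows."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        # Each row is answered correctly with the model's true accuracy.
        correct_a = sum(rng.random() < acc_a for _ in range(n_public))
        correct_b = sum(rng.random() < acc_b for _ in range(n_public))
        if correct_a >= correct_b:
            flips += 1
    return flips / trials


small_public = flip_rate(100)
large_public = flip_rate(5000)
print(f"flip rate with 100 rows: {small_public:.2f}, with 5000 rows: {large_public:.2f}")
```

With few rows, the observed ranking of two close models is often the reverse of their true ranking, which is one source of shuffle between public and private leaderboards.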