# Hyperparameter tuning

## Plan for the lecture

- Hyperparameter tuning in general
  - General pipeline
  - Manual and automatic tuning
  - What should we understand about hyperparameters?
- Models, libraries and hyperparameter optimization
  - Tree-based models
  - Neural networks
  - Linear models

### Plan for the lecture: models

- [[Tree-based models]]
  - [[GBDT]]: [[XGBoost]], [[LightGBM]], [[CatBoost]]
  - [[RandomForest]] / [[ExtraTrees]]
- [[Neural nets]]
  - [[PyTorch]], [[TensorFlow]], [[Keras]]...
- [[Linear models]]
  - [[SVM]], logistic regression
  - Vowpal Wabbit, [[FTRL]]
- Factorization Machines (out of scope)
  - [[libFM]], [[libFFM]]

#### What framework to use?

- [[Keras]], [[Lasagne]]
- [[TensorFlow]]
- [[MxNet]]
- [[PyTorch]]
- sklearn's [[MLP]]
- ...

They all implement the same functionality! (except [[sklearn]]'s [[MLP]])

I recommend:

- [[PyTorch]]
- [[Keras]]

### Tips

**Don't spend too much time tuning hyperparameters.**
- Do it only when you are out of other ideas or you have spare computational resources.

**Be patient.**
- It can take thousands of rounds for [[GBDT]] or [[neural nets]] to fit.

**Average everything.**
- Over random seeds
- Or over small deviations from the optimal parameters
  - e.g. average over `max_depth = 4, 5, 6` when the optimum is 5 (see the sketch at the end of these notes)

## How do we tune hyperparameters

1. Select the most influential parameters
   - There are tons of parameters and we can't tune all of them
2. Understand how exactly they influence the training
3. Tune them!
   - Manually (change a parameter, examine the effect)
   - Automatically ([[hyperopt]], etc.)

## Hyperparameter optimization software

A lot of libraries to try:

- [[Hyperopt]]
- [[Scikit-optimize]]
- [[Spearmint]]
- [[GPyOpt]]
- [[RoBO]]
- [[SMAC3]]

For example, tuning [[XGBoost]] with [[Hyperopt]]:

```python
from hyperopt import fmin, hp, tpe

def xgb_score(param):
    # Run XGBoost with parameters 'param' and return the validation
    # loss to be minimized (a sketch of one possible implementation
    # is given at the end of these notes).
    ...

def xgb_hyperopt():
    space = {
        'eta': 0.01,
        # quantized uniform distributions to search over
        'max_depth': hp.quniform('max_depth', 10, 30, 1),
        'min_child_weight': hp.quniform('min_child_weight', 0, 100, 1),
        'subsample': hp.quniform('subsample', 0.1, 1.0, 0.1),
        'gamma': hp.quniform('gamma', 0.0, 30, 0.5),
        'colsample_bytree': hp.quniform('colsample_bytree', 0.1, 1.0, 0.1),
        # fixed values are passed through to xgb_score unchanged
        'objective': 'reg:linear',  # renamed 'reg:squarederror' in newer XGBoost
        'nthread': 28,
        'silent': 1,  # legacy flag; newer XGBoost uses 'verbosity'
        'num_round': 2500,
        'seed': 2441,
        'early_stopping_rounds': 100,
    }
    # TPE proposes candidate parameters; run up to 1000 evaluations
    best = fmin(xgb_score, space, algo=tpe.suggest, max_evals=1000)
    return best
```

## Color-coding legend

1. Underfitting (bad, red)
2. Good fit and generalization (good)
3. Overfitting (bad, green)

**A parameter in red:**
- Increasing it impedes fitting.
- Increase it to reduce overfitting.
- Decrease it to allow the model to fit more easily.

**A parameter in green:**
- Increasing it leads to a better fit (overfit) on the train set.
- Increase it if the model underfits.
- Decrease it if the model overfits.

{{https://i.imgur.com/Wl60v5Z.jpg}}

## Conclusion

Hyperparameter tuning in general:
- General pipeline
- Manual and automatic tuning
- What should we understand about hyperparameters?

Models, libraries and hyperparameter optimization:
- Tree-based models
- Neural networks
- Linear models

## Ref

- https://www.coursera.org/learn/competitive-data-science/lecture/Hg3xw/hyperparameter-tuning-iii
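
## Appendix: code sketches

The `xgb_score` stub in the [[Hyperopt]] example above is left unimplemented in the lecture. Here is a minimal sketch of one way to write it with `xgboost.cv`; the `dtrain` DMatrix, the 5-fold split and RMSE as the metric are assumptions, not part of the original notes.

```python
import xgboost as xgb

# Assumption: 'dtrain' is an xgb.DMatrix built from the training data elsewhere.
def xgb_score(param):
    param = dict(param)
    # hp.quniform returns floats; XGBoost requires an int for max_depth
    param['max_depth'] = int(param['max_depth'])
    # these two entries control the training loop, not the booster itself
    num_round = int(param.pop('num_round'))
    early_stop = int(param.pop('early_stopping_rounds'))
    cv = xgb.cv(param, dtrain,
                num_boost_round=num_round,
                nfold=5,
                early_stopping_rounds=early_stop,
                seed=param['seed'])
    # fmin minimizes its objective, so return the final validation RMSE
    return cv['test-rmse-mean'].iloc[-1]
```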
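
Similarly, a minimal sketch of the "average everything" tip, assuming `X_train`, `y_train` and `X_test` are prepared elsewhere: predictions are averaged over several random seeds and over small deviations from the optimal `max_depth` of 5.

```python
import numpy as np
from xgboost import XGBRegressor

# Assumption: X_train, y_train, X_test are prepared elsewhere.
preds = []
for seed in range(5):            # average over random seeds...
    for max_depth in (4, 5, 6):  # ...and around the optimal depth of 5
        model = XGBRegressor(max_depth=max_depth, random_state=seed)
        model.fit(X_train, y_train)
        preds.append(model.predict(X_test))
final_prediction = np.mean(preds, axis=0)  # simple unweighted average
```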