
When Models Meet Data 2
Australian National University

Overfitting
• The aim of a machine learning predictor is to perform well on
unseen data.
• We simulate the unseen data by holding out a proportion of the
whole dataset.
• This hold-out set is called the test set.
• In practice, we split data into a training set and a test set.
• Training set: fit the model
• Test set: not seen during training, used to evaluate generalization performance
• It is important for the user to not cycle back to a new round of training after having observed the test set.
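As a concrete illustration, a random hold-out split can be done in a few lines of NumPy. This is a minimal sketch; the dataset sizes and the 80/20 split ratio are assumptions chosen only for the example.

import numpy as np

# Toy dataset: 100 examples with 3 features (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Shuffle the example indices, then hold out 20% of them as the test set.
idx = rng.permutation(len(X))
n_test = int(0.2 * len(X))
test_idx, train_idx = idx[:n_test], idx[n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
# Fit only on (X_train, y_train); evaluate on (X_test, y_test) once, at the end.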

• Empirical risk minimization can lead to overfitting.
• the predictor fits too closely to the training data and does not
generalize well to new data
[Figure: regression example, two fits of the same training data plotted against 𝑥 and 𝑦.
One fit: a simple model fits the training data less well (a larger empirical risk) but is a good machine learning model.
The other: a complex model fits the training data very well (a very small empirical risk) but is a poor machine learning model due to overfitting.]
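The regression picture above can be reproduced numerically with a small sketch: fit polynomials of increasing degree to a few noisy training points and compare the empirical risk (mean squared error) on training and test data. The ground-truth function, the noise level, and the degrees 1/3/9 are assumptions made only for illustration.

import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, size=10)          # small training set
x_test = rng.uniform(-1, 1, size=200)          # larger held-out set
true_f = lambda x: np.sin(np.pi * x)           # assumed ground-truth function
y_train = true_f(x_train) + 0.1 * rng.normal(size=x_train.size)
y_test = true_f(x_test) + 0.1 * rng.normal(size=x_test.size)

for degree in (1, 3, 9):                       # too simple, reasonable, too complex
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

The degree-9 fit typically drives the training MSE close to zero while the test MSE grows, which is the overfitting pattern shown in the figure.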

• Empirical risk minimization can lead to overfitting.
• the predictor fits too closely to the training data and does not generalize well to new data
[Figure: classification example, comparing a good model with a poor (overfitting) model.]

8.2.3 Regularization to Reduce Overfitting
• When overfitting happens, we have
• very small average loss on the training set but large average loss on the test set
• Given a predictor 𝑓, overfitting occurs when
• the risk estimate from the training data $\mathbf{R}_{\text{emp}}(f, \boldsymbol{X}_{\text{train}}, \boldsymbol{y}_{\text{train}})$ underestimates the expected risk $\mathbf{R}_{\text{true}}(f)$. In other words,
• $\mathbf{R}_{\text{emp}}(f, \boldsymbol{X}_{\text{train}}, \boldsymbol{y}_{\text{train}})$ is much smaller than $\mathbf{R}_{\text{true}}(f)$, which is estimated using $\mathbf{R}_{\text{emp}}(f, \boldsymbol{X}_{\text{test}}, \boldsymbol{y}_{\text{test}})$
• Overfitting usually occurs when
• we have little data and a complex hypothesis class

• How to prevent overfitting?
• We can bias the search for the minimizer of empirical risk by introducing a penalty term
• The penalty term makes it harder for the optimizer to return an overly flexible predictor
• The penalty term is called regularization.
• Regularization is an approach that discourages complex or extreme solutions to an optimization problem.

• Example
• Least-squares problem
$\min_{\boldsymbol{\theta}} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\theta}\|^2$
• To regularize this formulation, we add a penalty term
$\min_{\boldsymbol{\theta}} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\theta}\|^2 + \lambda \|\boldsymbol{\theta}\|^2$
• The additional term $\|\boldsymbol{\theta}\|^2$ is called the regularizer or penalty term, and the parameter $\lambda$ is the regularization parameter.
• $\lambda$ enables a trade-off between minimizing the loss on the training set and the amplitude of the parameters $\boldsymbol{\theta}$
• It often happens that the amplitude of the parameters in $\boldsymbol{\theta}$ becomes relatively large if we run into overfitting
• $\lambda$ is a hyperparameter
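A minimal sketch of this regularized least-squares problem (often called ridge regression), solved with the closed-form solution θ = (XᵀX + λI)⁻¹Xᵀy; the toy data and the value λ = 0.1 are assumptions.

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||y - X theta||^2 + lam * ||theta||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data (assumed): a known parameter vector plus a little noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)

theta_ls = ridge_fit(X, y, lam=0.0)    # plain least squares
theta_reg = ridge_fit(X, y, lam=0.1)   # regularized: parameters are shrunk
print(np.linalg.norm(theta_ls), np.linalg.norm(theta_reg))

Increasing λ shrinks the norm of θ further at the price of a larger training loss, which is exactly the trade-off described above.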

8.2.4 Cross-Validation to Assess the Generalization Performance
• We mentioned that we split a dataset into a training set and a test set
• we measure generalization error by applying the predictor on test data.
• This data is also sometimes referred to as the validation set.
• The validation set is drawn from the entire dataset and has no overlap with the training data.
• We want the training set to be large
• That leaves the validation set small
• A small validation set makes the result less stable (large variances)

• Basically, we want the training set to be large
• We want the validation set to be large, too
• How to solve these contradictory objectives?
• Cross-validation: 𝐾-fold cross-validation
Example: 𝐾 = 5

Cross-validation
• 𝐾-fold cross-validation partitions the data into 𝐾 chunks
• 𝐾 − 1 chunks form the training set ℛ
• The remaining chunk is the validation set 𝒱
• This procedure is repeated for all 𝐾 choices for the validation set, and the performance of the model from the 𝐾 runs is averaged
Example: 𝐾 = 5

Cross-validation
• Formally, we partition our training set into two sets 𝒟 = ℛ ∪ 𝒱, such that they do not overlap, i.e., ℛ ∩ 𝒱 = ∅
• We train our model on ℛ (training set)
• We evaluate our model on 𝒱 (validation set)
• We have 𝐾 partitions. In each partition 𝑘:
• Training set $\mathcal{R}^{(k)}$ produces a predictor $f^{(k)}$
• $f^{(k)}$ is applied to the validation set $\mathcal{V}^{(k)}$ to compute the empirical risk $R(f^{(k)}, \mathcal{V}^{(k)})$
• All the empirical risks are averaged to approximate the expected generalization error
$\mathbb{E}_{\mathcal{V}}[R(f, \mathcal{V})] \approx \frac{1}{K} \sum_{k=1}^{K} R(f^{(k)}, \mathcal{V}^{(k)})$
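A sketch of this procedure in plain NumPy; the function and argument names are hypothetical, and the commented usage assumes the ridge_fit sketch from the regularization example, with K = 5 and a squared-error risk.

import numpy as np

def kfold_risk(X, y, K, fit, risk, seed=3):
    # Partition the data into K chunks; train on K-1 of them, validate on the
    # remaining chunk, and average the K validation risks.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), K)
    risks = []
    for k in range(K):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train_idx], y[train_idx])            # predictor f^(k)
        risks.append(risk(model, X[val_idx], y[val_idx]))  # R(f^(k), V^(k))
    return np.mean(risks)

# Possible usage (assumes ridge_fit and the toy X, y from the earlier sketch):
# avg_risk = kfold_risk(X, y, K=5,
#                       fit=lambda Xr, yr: ridge_fit(Xr, yr, lam=0.1),
#                       risk=lambda th, Xv, yv: np.mean((Xv @ th - yv) ** 2))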

Cross-validation: some remarks
• The training set is limited, so it may not produce the best predictor $f^{(k)}$
• The validation set is limited, so it gives an inaccurate estimate of $R(f^{(k)}, \mathcal{V}^{(k)})$
• After averaging, the results are stable and indicative
• An extreme: leave-one-out cross-validation, where the validation set only contains one example.
• A potential drawback – computation cost
• The training can be time-consuming
• If the model has several hyperparameters to tune, evaluating every candidate setting with cross-validation becomes expensive.
• This problem can be solved by parallel computing, given enough computational resources
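For example, tuning the regularization parameter λ with cross-validation costs one full K-fold run per candidate value. The sketch below reuses the hypothetical kfold_risk and ridge_fit helpers (and the toy X, y) from the earlier sketches; the candidate grid is an assumption.

import numpy as np

# One K-fold run per candidate lambda: K * len(lambdas) model fits in total.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
cv_risk = {
    lam: kfold_risk(X, y, K=5,
                    fit=lambda Xr, yr, lam=lam: ridge_fit(Xr, yr, lam),
                    risk=lambda th, Xv, yv: np.mean((Xv @ th - yv) ** 2))
    for lam in lambdas
}
best_lam = min(cv_risk, key=cv_risk.get)

# Leave-one-out cross-validation is the extreme case K = len(X).
# The individual fits are independent, so they parallelize easily.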

Check your understanding
• When your model works poorly on the training set, your model will also work poorly on the test set.
• When your model works poorly on the training set, your model may also be overfitting.
• Overfitting happens when your model is too complex given your training data.
• Regularization alleviates overfitting by improving the complexity of your training data.
• In 𝐾-fold cross-validation, we will get more stable test accuracy if 𝐾 increases.
• In 2-fold cross-validation, you can obtain 2 results from the 2 test sets, and they may differ a lot from each other.
