Machine Learning and Data Mining in Business
Week 6 Tutorial
When studying these exercises, please keep in mind that they are about problem-solving techniques for machine learning. In general, they’re not about particular distributions or learning algorithms.
Question 1
Let Y1, Y2, . . . , Yn ∼ Poisson(λ). Recall that the Poisson distribution has probability mass function

p(y; λ) = e^{−λ} λ^y / y! .
(a) Write down the likelihood for a sample y1, . . . , yn.
(b) Derive a simple expression for the log-likelihood.
(c) Let the objective function for optimisation be the negative log-likelihood. Find its critical point.
(d) Show that the critical point is the MLE.
(e) You can create as many additional exercises of this type as you like by picking any simple statistical distribution and answering the same questions.
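As a quick numerical sanity check for parts (c) and (d), the sketch below uses a made-up sample and verifies that the negative log-likelihood is smallest at the sample mean, which is the standard closed-form MLE for the Poisson rate. The sample values are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical sample of Poisson counts; any nonnegative integers would do
y = [2, 0, 3, 1, 2, 4, 1]
n = len(y)

def neg_log_lik(lam):
    # Negative log-likelihood: n*lam - (sum of y)*log(lam) + sum(log(y_i!))
    # math.lgamma(v + 1) computes log(v!)
    return n * lam - sum(y) * math.log(lam) + sum(math.lgamma(v + 1) for v in y)

lam_hat = sum(y) / n  # candidate critical point: the sample mean

# The negative log-likelihood should be strictly larger at nearby values,
# consistent with lam_hat being the minimiser
for eps in (-0.2, -0.05, 0.05, 0.2):
    assert neg_log_lik(lam_hat) < neg_log_lik(lam_hat + eps)
```

A check like this does not replace the derivation asked for in (b)–(d), but it is a useful habit for confirming hand-derived critical points.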
Question 2
In addition to being good practice, this exercise derives results that will be very useful later.
Consider the model Y1, Y2, . . . , Yn ∼ Bernoulli(σ(β)), where β ∈ R is a parameter and σ is the sigmoid function

σ(β) = 1 / (1 + exp(−β)).
You can think of this model as a logistic regression that only has the intercept.
Following the lecture, the optimisation problem for estimating this model is

minimise over β ∈ R:  J(β) = − Σ_{i=1}^{n} [ y_i log σ(β) + (1 − y_i) log(1 − σ(β)) ].
(a) Differentiate σ(β).
(b) Show that σ′(β) = σ(β)(1 − σ(β)).
(c) Find the derivative of J(β) using the chain rule and the previous result.
(d) Find the critical point of J(β).
(e) What is the second derivative of the cost function? Show that the objective function is convex.
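The identity in (b) and the critical point in (d) can both be checked numerically. The sketch below uses a hypothetical 0/1 sample; for the intercept-only model, the critical point satisfies σ(β̂) = ȳ, i.e. β̂ is the log-odds of the sample mean.

```python
import math

def sigmoid(b):
    return 1.0 / (1.0 + math.exp(-b))

# Check the identity sigma'(b) = sigma(b) * (1 - sigma(b))
# against a central finite-difference approximation
b = 0.7
h = 1e-6
num_deriv = (sigmoid(b + h) - sigmoid(b - h)) / (2 * h)
assert abs(num_deriv - sigmoid(b) * (1 - sigmoid(b))) < 1e-8

# Critical point of J(beta): sigma(beta_hat) = ybar,
# so beta_hat = log(ybar / (1 - ybar)) (hypothetical sample below)
y = [1, 0, 1, 1, 0, 1]
ybar = sum(y) / len(y)
beta_hat = math.log(ybar / (1 - ybar))
assert abs(sigmoid(beta_hat) - ybar) < 1e-12
```

This is only a numerical confirmation; the exercise still asks for the algebraic derivation via the chain rule.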
Question 3
Support vector machines (SVMs) were a major development in machine learning in the mid-1990s due to their state-of-the-art performance and novelty at the time. Since then, researchers have discovered that support vector machines can be reformulated as regularised estimation, establishing a deep connection to classical methods such as logistic regression.
In support vector classification (SVC), we consider a binary classification problem and encode the response as y ∈ {−1, 1}. The method is based on the linear decision function

f(x) = β0 + β1 x1 + . . . + βp xp

and the classification rule

y = sign(f(x)),

which means that y = 1 if f(x) > 0 and y = −1 if f(x) < 0.
The set {x : f(x) = 0} is the decision boundary. Observations far from the boundary have large |f(x)|, so we can view |f(x)| as a measure of the learning algorithm's confidence that the observation is correctly classified.
The support vector classifier learns the coefficients β0, β1, . . . , βp by regularised empirical risk minimisation based on the hinge loss
L(y, f(x)) = max(0, 1 − y f(x)).
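The hinge loss is simple to compute directly. A minimal sketch, with illustrative values of y and f(x):

```python
def hinge_loss(y, fx):
    # L(y, f(x)) = max(0, 1 - y*f(x)), with y coded as -1 or +1
    return max(0.0, 1.0 - y * fx)

assert hinge_loss(1, 2.0) == 0.0    # confidently correct: zero loss
assert hinge_loss(1, 0.5) == 0.5    # correct but inside the margin: small loss
assert hinge_loss(1, -1.0) == 2.0   # misclassified: loss grows linearly
```

Working through a few such values by hand is a good warm-up for part (c) below.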
This figure from the ISL textbook plots the hinge loss and the cross-entropy loss (negative log-likelihood loss) for y = 1. The figure calls the latter the logistic regression loss because, in this formulation, the prediction f(x) in the loss function L(y, f(x)) is a prediction for the logit of the probability.
[Figure: hinge loss and logistic regression loss plotted against yi(β0 + β1 xi1 + . . . + βp xip).]
(a) Write down the learning rule for a support vector classifier based on ℓ2 regularisation.
(b) Consider the term y f(x) from the hinge loss. What is the classification when yi f(xi) > 0 compared to yi f(xi) < 0?
(c) Interpret the hinge loss function by considering the following cases:
1. y f(x) > 1
2. 0 < y f(x) ≤ 1
3. y f(x) ≤ 0