Nonlinear Econometrics for finance HOMEWORK 1: solutions
(Review of linear econometrics and conditional expectations)
Problem 1. Real estate is a key asset. Investing in real estate represents the biggest investment decision for most households over their lifetimes. A real estate company in Baltimore wants to estimate a model that relates house prices to several characteristics of the house. The data come from Zillow and consist of a sample of houses in the Baltimore area for the year 2014. The data are contained in the file housing data.xlsx and provide the following information:
1. Zillow id of the house (id)
2. price in dollars (price)
3. street address (street)
4. postal code (zip)
5. year the house was built (yearBuilt)
6. size of the house measured in square feet (sqft)
7. number of bathrooms (bathrooms)
8. number of bedrooms (bedrooms).
Given this information, you need to analyze house prices using Matlab.
1. Generate a histogram of the house prices and compute descriptive statistics (mean, median, variance, standard deviation, minimum, maximum). What do you notice?
Answer. The histogram of the prices is shown in Figure 1 (top panel). The descriptive statistics are shown in the table below:
Variance: 22,002,505,131
From the figure, we observe that there are a few outliers (in the right tail) and the distribution of the prices is not symmetric.
2. Now, take a log transformation of the house prices. Plot the histogram of the log-prices. What do you notice?
Answer. The distribution of the log-prices is considerably more bell-shaped. In the regression below, we will therefore assume that the errors are normal. One thing that we should take into account when moving from prices to log-prices is that the regression coefficients should be interpreted differently (see the response to Question 4).
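As an informal numerical check of this claim (not part of the original answer), one can compare the skewness of prices and log-prices using the skewness function from Matlab's Statistics Toolbox; the variable names below are illustrative and assume y holds the prices, as in the code at the end of these solutions.

% Illustrative check: skewness of prices vs. log-prices (assumes y = data(:,2) as in the code below).
skew_levels = skewness(y);       % strongly positive for the right-skewed prices
skew_logs   = skewness(log(y));  % much closer to zero after the log transformation
fprintf('Skewness of prices: %4.2f; skewness of log-prices: %4.2f \n', skew_levels, skew_logs);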
3. Run a regression of the log-prices on the explanatory variables:
log(pricei) = β0 + β1agei + β2sizei + β3bathroomsi + β4bedroomsi + εi
where εi is an error term.
Answer. The following table reports the estimated coefficients, the standard errors, the t statistics (for testing H0 : βj = 0 ∀j) and the p-values (again, for testing H0 : βj = 0 ∀j):
Figure 1: Histograms of prices and log-prices

              β̂             se(β̂)          t stats     p-values
intercept     11.224         0.088575       126.71      0
age           0.0013385      0.00055896     2.3946      0.016906
size          0.00050645     0.0000439      11.534      0
bathrooms     0.35332        0.027232       12.974      0
bedrooms      -0.26537       0.023925       -11.092     0

Table 1: Regression output.
4. Interpret the coefficients of the regression above. What does the model say about the house prices?
Answer. The dependent variable is in logs. This means that each coefficient should be interpreted as the percentage change in price if we change the corresponding regressor by one unit, keeping everything else fixed. Let us focus on age, for example, but the same logic applies to all regressors. Because
log(pricei) = β0 + β1agei + β2sizei + β3bathroomsi + β4bedroomsi + εi, we have that
∂log(pricei)/∂agei = β1  ⇒  (∂log(pricei)/∂pricei) · (∂pricei/∂agei) = β1  ⇒  (∂pricei/∂agei)/pricei = β1,

since the derivative of the log-price with respect to price is one over the price (i.e., ∂log(pricei)/∂pricei = 1/pricei). Thus, β1 represents the percentage price change when we increase the age of the house by one year, keeping everything else the same. One can think of it as the return on the house if we keep it for another year and sell it next year, without doing any renovation or changes (everything else being the same).

As most realtors would suggest, there is value in adding one bathroom to your house. Indeed, our estimates suggest that if we add one bathroom, the price will go up by an average of 35%. And, of course, bigger houses are more expensive.

5. Why do you think the number of bedrooms has a negative effect on the log-price?

Answer. A possible explanation is that everything else equal (so, given the overall size of the house), an additional bedroom implies that each bedroom is smaller in size. The living areas could be smaller too, which may also negatively affect the overall value of the house.
6. We want to test whether the effect of age on log-prices is statistically significant. What test would you use? Compute the test statistic and interpret the result.
Answer. The hypothesis test has the null H0 : β1 = 0 and the alternative Ha : β1 ̸= 0. The test statistic for this hypothesis is
(β̂1 − 0)/se(β̂1) →d t_{n−k},   (1)
where tn−k is the t distribution with n − k degrees of freedom, n is the number of observations and k is the number of regressors. In our model, k = 5 (I am including the column of ones in the regressors because it is associated with the intercept). At the 5% level, we reject if the t statistic is larger than (about) 2 in absolute value. Similarly, we reject if the p-value of the test is smaller than 5%. The t statistic is 2.3946. It is larger than 2, so we reject the null hypothesis. The p-value of the test is 0.0169 which, once more, leads to rejection. Thus, βˆ1 is “statistically significant” (which means that it is “statistically different from zero”). Said differently, age seems to matter as a predictor.
7. Test whether age and size are jointly different than zero.
Answer. This is a test of the following joint hypothesis: H0 :β1 =0andβ2 =0
against the alternative hypothesis
Ha :eitherβ1 ̸=0orβ2 ̸=0.
We cannot use a t statistic for this test. The reason is that we are testing 2 linear restrictions, not just one. We can, however, use an F test. We set up the null hypothesis in the following way. Let R be the matrix
[ 0 1 0 0 0 ]
[ 0 0 1 0 0 ]

and let γ = [0 0]⊤. Thus, we can write the null hypothesis as

Rβ = γ  ⇔  [0 1 0 0 0; 0 0 1 0 0] [β0 β1 β2 β3 β4]⊤ = [0 0]⊤.

The test statistic for this test is

σ̂⁻² (Rβ̂ − γ)⊤ [R(X⊤X)⁻¹R⊤]⁻¹ (Rβ̂ − γ)/q →d F_{q,n−k},   where σ̂² = ε̂⊤ε̂/(n−k).
(See lecture notes on linear econometrics, Chapter 1, in OneDrive.) The critical value (for a 5% test) is computed according to an Fq,n−k distribution. Here, q is the number of linear restrictions, in our case q = 2, and k is the number of regressors, in our case k = 5. The critical value is, therefore, 3.009.
The F test is one-sided. We obtain a value for the test statistic of 68.1901 > 3.009 and a p-value very close to zero. Thus, we reject the null hypothesis at any meaningful significance level (1% or 5%), since the p-value < 1%.
8. Test whether β1 = 2β2.
Answer. This is a single linear restriction on the coefficients. We can indeed write the null hypothesis as H0 : β1 −2β2 = 0 and the alternative as Ha : β1 −2β2 ̸= 0. In matrix notation, we define the vector c = [0 1 −2 0 0]⊤ and the scalar γ = 0, and write the null hypothesis as
c⊤β = γ  ⇔  [0 1 −2 0 0] [β0 β1 β2 β3 β4]⊤ = 0.   (2)
Because there is only one restriction, we could test the null hypothesis by using a classical t statistic (see lecture notes on linear econometrics, Chapter 1, in OneDrive):
(c⊤β̂ − γ) / √(σ̂² c⊤(X⊤X)⁻¹c) →d t_{n−k}.
We obtain a t value of 0.58 and a p-value of 0.562. Thus, we fail to reject the null hypothesis at the 5% significance level (because 0.58 < 2 and, similarly, because p-value = 0.562 > 0.05). The restriction is supported by the data.
Alternatively, we could use a one-sided F test:
σ̂⁻² (c⊤β̂ − γ)⊤ [c⊤(X⊤X)⁻¹c]⁻¹ (c⊤β̂ − γ)/1 →d F_{1,n−k},   where, as before, σ̂² = ε̂⊤ε̂/(n−k).
In this case, we obtain a value of 0.336 (with a critical value of 3.855) and a p-value of 0.56. We note that the F statistic is the same as the t statistic squared: in fact, 0.58² = 0.3364. Again, we fail to reject the null hypothesis at the 5% significance level.
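The Matlab code at the end of these solutions stops at Question 7. A minimal sketch of how this test could be computed, reusing beta_hat, sigma2_hat, X and n from that code, is given below; the names c_vec, gamma0, t_stat_c and F_test_c are illustrative and not part of the original script.

% Sketch for Question 8 (assumes beta_hat, sigma2_hat, X and n from the code below).
c_vec  = [0 1 -2 0 0]';   % single restriction: beta_1 - 2*beta_2 = 0
gamma0 = 0;
% t statistic: (c'*beta_hat - gamma0) / sqrt(sigma2_hat * c'*(X'*X)^(-1)*c)
se_c      = sqrt(sigma2_hat * (c_vec' * inv(X'*X) * c_vec));
t_stat_c  = (c_vec'*beta_hat - gamma0) / se_c;
p_value_t = 2*(1 - tcdf(abs(t_stat_c), n - 5));
% Equivalent F statistic for the single restriction (equals t_stat_c^2)
F_test_c  = t_stat_c^2;
p_value_F = 1 - fcdf(F_test_c, 1, n - 5);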
9. Using your model, predict the price of a house with 3 bedrooms, 3 bathrooms, size of 3500 square feet and built in 1985. Explain how you compute your prediction.
Answer. To predict the log price we just plug the values of the regressors in the estimated model and obtain
log(pricei) = β̂0 + β̂1 agei + β̂2 sizei + β̂3 bathroomsi + β̂4 bedroomsi,   (3)

where the β̂k for k = 0, 1, 2, 3, 4 are estimated by least squares. One may be tempted to predict the prices in levels (pricei) as

pricei = e^{log(pricei)} = e^{β̂0 + β̂1 agei + β̂2 sizei + β̂3 bathroomsi + β̂4 bedroomsi},   (4)
but this is not completely exact (even though it will not be penalized). If we are interested in predicting price levels, we need to apply a simple correction based on the
following result.
Aside. If X is a normal random variable, X ∼ N(μ, σ²), and we define another random variable Y = e^X, then Y is a log-normal random variable. Specifically,

Y = e^X ∼ logN(m, v), with mean m = e^{μ + σ²/2} and variance v = (e^{σ²} − 1) e^{2μ + σ²}.
You will see the same result in derivatives in the context of the Black and Scholes op- tion pricing model.
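As a quick numerical check of the aside (not part of the original solution), one can verify the log-normal mean formula by simulation; the values mu0 and sig0 below are arbitrary illustrative choices.

% Illustrative check of the log-normal mean: E[e^X] = exp(mu + sig^2/2)
rng(1);                          % fix the seed for reproducibility
mu0 = 0.3; sig0 = 0.8;           % arbitrary example values
X0  = mu0 + sig0*randn(1e6,1);   % X0 ~ N(mu0, sig0^2)
[mean(exp(X0)), exp(mu0 + sig0^2/2)]   % the two numbers should be close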
Since our error terms look normal (after the log transformation), log(pricei) ∼ N(xiβ, σ²). So, we can assume that the prices are log-normal, that is,

pricei = e^{log(pricei)} ∼ logN(mi, vi),

where, as before in the aside, the letters m and v stand for the mean and variance of the log-normal distribution. In our case, the mean mi is mi = e^{xiβ + σ²/2}. Thus, our prediction for the price of the house is
pricei = e^{β̂0 + β̂1 agei + β̂2 sizei + β̂3 bathroomsi + β̂4 bedroomsi + (1/2)·ε̂⊤ε̂/(n−k)},

where we have substituted the value σ² in the formula for the mean of the log-normal random variable with its estimator σ̂² = ε̂⊤ε̂/(n−k). Using the numbers in Table 1, we obtain a predicted price of 679113 (the prediction for log-prices is 13.2989).
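The appendix code does not cover this prediction. A minimal Matlab sketch, reusing beta_hat and sigma2_hat from the code below and the same age convention (2014 − yearBuilt), could look as follows; x_new and the other names are illustrative.

% Sketch for Question 9 (assumes beta_hat and sigma2_hat from the code below).
age_new    = 2014 - 1985;                   % house built in 1985
x_new      = [1, age_new, 3500, 3, 3];      % [constant, age, sqft, bathrooms, bedrooms]
log_pred   = x_new * beta_hat;              % predicted log-price
price_pred = exp(log_pred + sigma2_hat/2);  % log-normal mean correction
fprintf('Predicted log-price: %4.4f; predicted price: $%4.0f \n', log_pred, price_pred);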
Problem 2. Suppose you have two fair coins, so that for each coin there is a 50% chance of heads or tails. Consider tossing each coin and let X1 denote the random variable that describes the toss of the first coin. X2 is the random variable which describes the second coin toss. If we code heads as 1 and tails as 0, the random variables X1 and X2 are Bernoullis with probability of success 0.5.
P(X1 =1)=0.5; P(X1 =0)=0.5; P(X2 =1)=0.5; P(X2 =0)=0.5.
We are interested in the random variable that describes the number of heads when we toss both coins. We will call it Y = X1 + X2.
1. Compute E(Y ). Show your math.
Answer. The random variables X1 and X2 can only take on the values 0 or 1 and, by the very nature of this experiment, they are independent. Their sum Y can only take on the values 0, 1, 2. The value 0 occurs when both X1 and X2 are 0, which happens with probability 0.5 × 0.5 = 0.25. Analogously, the value 2 occurs when both X1 and X2 are equal to 1, which has a probability equal to 0.5 × 0.5 = 0.25 of occurring. Finally, the value 1 occurs when either X1 = 0 and X2 = 1 or X1 = 1 and X2 = 0. This happens with probability 1 − 0.25 − 0.25 = 0.5. Thus, the random variable Y has the following probability distribution (outcomes and probabilities):
y     P(y)
0     0.25
1     0.50
2     0.25
The expected value of Y is easily computed as
E(Y) = p(0)·0+p(1)·1+p(2)·2
= 0.25·0+0.5·1+0.25·2 = 1.
2. Compute E(Y |X1 = 1)
Answer. If we already know that the first coin toss was 1, then it is easier to predict
Y . Let us write the expected value first:
E(Y|X1 =1) = P(Y =0|X1 =1)·0 + P(Y =1|X1 =1)·1 + P(Y =2|X1 =1)·2
Clearly, the probability P(Y = 0|X1 = 1) = 0. We cannot get 0 from the sum if the first toss is a one. What about the other probabilities?
P(Y =1|X1 =1) = P(X1 +X2 =1|X1 =1) = P(X2=0)
P(Y =2|X1 =1) = P(X1 +X2 =2|X1 =1) = P(X2=1).
Thus, we end up with a conditional expected value of
E(Y|X1 =1) = 0.5·1+0.5·2=1.5
3. Verify the Law of Iterated Expectations (LIE). In particular, show that for this example
it is true that E(Y ) = E[E(Y |X1)].
Answer. We need to first compute E(Y |X1 = 0). Following the same steps as in the
previous question, we get
E(Y|X1 =0) = p(Y =0|X1 =0)·0 + p(Y =1|X1 =0)·1 + p(Y =2|X1 =0)·2
= 0.5·0 + 0.5·1 + 0·2 = 0.5.

We can now compute the expected value E[E(Y |X1)] as
E[E(Y|X1)] = P(X1 = 0)E(Y|X1 = 0)+P(X1 = 1)E(Y|X1 = 1) = 0.5·0.5+0.5·1.5=1.
Thus, we have proven – numerically – that E(Y ) = E[E(Y |X1)] = 1.
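Although the assignment does not require code here, a short Monte Carlo simulation in Matlab (illustrative only; the variable names below are our own) confirms these numbers.

% Illustrative simulation check of Problem 2 (two fair coin tosses).
rng(1);
N  = 1e6;
X1 = (rand(N,1) < 0.5);          % first coin, heads = 1
X2 = (rand(N,1) < 0.5);          % second coin
Y  = X1 + X2;                    % number of heads
mean(Y)                          % close to E(Y) = 1
mean(Y(X1 == 1))                 % close to E(Y|X1 = 1) = 1.5
mean(Y(X1 == 0))                 % close to E(Y|X1 = 0) = 0.5
% Law of iterated expectations: weight the conditional means by P(X1 = 0) and P(X1 = 1)
0.5*mean(Y(X1 == 0)) + 0.5*mean(Y(X1 == 1))   % close to E(Y) = 1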
Problem 3. A financial analyst wants to predict the return on a portfolio. The portfolio gives a return of either 1 or 3 percent in each period. She knows that the joint probability of returns at time t and t+1 is given by the following table:

            rt = 1    rt = 3
rt+1 = 1     0.2       0.3
rt+1 = 3     0.4       0.1
Given this data, please address the following questions:
1. Compute the unconditional expected value E(rt). Show your math and derivations.
Answer. From the table of joint probabilities (below), we can easily derive all the marginal probabilities. For example, we know that:
P(rt = 1) = P(rt+1 = 1, rt = 1) + P(rt+1 = 3, rt = 1) = 0.2 + 0.4 = 0.6   (6)
P(rt = 3) = P(rt+1 = 1, rt = 3) + P(rt+1 = 3, rt = 3) = 0.3 + 0.1 = 0.4.  (7)

We can do the same for rt+1 and obtain the following table with joint and marginal probabilities (the marginals are in the last row and last column):

            rt = 1    rt = 3
rt+1 = 1     0.2       0.3      0.5
rt+1 = 3     0.4       0.1      0.5
             0.6       0.4
In order to calculate the expected value, we just compute
E(rt)=P(rt =1)·1+P(rt =3)·3=0.6·1+0.4·3=1.8. (9)
2. Compute the unconditional expected value E(rt+1). Show your math and derivations.

Answer. Using the table above, similarly to rt, write
E(rt+1) = 0.5 · 1 + 0.5 · 3 = 2.
3. Compute the conditional expected value Et(rt+1) = E(rt+1|rt). Show your math and
derivations.
Answer. We use the joint and marginal distributions to compute the conditional probabilities.

E(rt+1|rt = 1) = 1·P(rt+1 = 1|rt = 1) + 3·P(rt+1 = 3|rt = 1)
               = 1·P(rt+1 = 1, rt = 1)/P(rt = 1) + 3·P(rt+1 = 3, rt = 1)/P(rt = 1)
               = 1·(0.2/0.6) + 3·(0.4/0.6)
               = 2.33333.

E(rt+1|rt = 3) = 1·P(rt+1 = 1|rt = 3) + 3·P(rt+1 = 3|rt = 3)
               = 1·P(rt+1 = 1, rt = 3)/P(rt = 3) + 3·P(rt+1 = 3, rt = 3)/P(rt = 3)
               = 1·(0.3/0.4) + 3·(0.1/0.4)
               = 1.5.
4. Verify the law of iterated expectation for this example. In particular, show that E(rt+1) = E[E(rt+1|rt)]. Show your math and derivations.
Answer. Using the previous derivations, we can show that

E[E(rt+1|rt)] = P(rt = 1)·E(rt+1|rt = 1) + P(rt = 3)·E(rt+1|rt = 3) = 0.6·2.33333 + 0.4·1.5 = 2.

Thus, again, we have shown numerically that E(rt+1) = E[E(rt+1|rt)] = 2.
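For completeness (this is not required by the assignment), the same calculations can be carried out in Matlab directly from the joint probability matrix; the variable names below are illustrative.

% Illustrative check of Problem 3 from the joint probability matrix.
r = [1; 3];                     % possible returns
P = [0.2 0.3;                   % rows: r_{t+1} = 1, 3; columns: r_t = 1, 3
     0.4 0.1];
p_t    = sum(P, 1);             % marginal of r_t:     [0.6 0.4]
p_tp1  = sum(P, 2);             % marginal of r_{t+1}: [0.5; 0.5]
E_rt   = r' * p_t';             % E(r_t) = 1.8
E_rtp1 = r' * p_tp1;            % E(r_{t+1}) = 2
E_cond = (r' * P) ./ p_t;       % E(r_{t+1}|r_t = 1) and E(r_{t+1}|r_t = 3): [2.3333 1.5]
LIE    = E_cond * p_t';         % E[E(r_{t+1}|r_t)] = 2, which equals E(r_{t+1})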
Matlab Code for Problem 1

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Solutions: First question of the first assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Let's clean the environment
close all;
clear variables; clc;

% Let us load the data
data = xlsread('housing data.xlsx');

%%%%%%%%%%%%%%%%%%%
% QUESTION 1 and 2
%%%%%%%%%%%%%%%%%%%

% The price is the second column of our data
y = data(:,2);

% Computing descriptive statistics
fprintf(' ------ Some descriptive statistics for the prices ------ \n')
fprintf('The mean is $%4.2f \n', mean(y))
fprintf('The median is $%4.2f \n', median(y))
fprintf('The standard deviation is $%4.2f \n', std(y))
fprintf('The variance is %4.2f \n', var(y))
fprintf('The minimum is $%4.2f \n', min(y))
fprintf('The maximum is $%4.2f \n', max(y))

% Here is another way to display the results
fprintf('--------------------------------------------------------\n')
desc_stats = table(mean(y), median(y), std(y), var(y), min(y), max(y), ...
    'VariableNames', {'mean' 'median' 'std' 'var' 'minimum' 'maximum'});
disp(desc_stats);
fprintf('--------------------------------------------------------\n')

subplot(2,1,1);
histogram(y)
title('Histogram of the prices')
subplot(2,1,2);
histogram(log(y))
title('Histogram of the log prices')
%%%%%%%%%%%%%%%%%%%
% QUESTION 3
%%%%%%%%%%%%%%%%%%%

% Compute age
age = 2014 - data(:,5);

% The other regressors
size = data(:,6);   % note: this variable name shadows Matlab's built-in size function
bathrooms = data(:,7);
bedrooms = data(:,8);

% The sample size
n = length(age);
% The regressand (log of the house prices)
Y = log(y);
% Let us create a matrix X of regressors, including a constant
X = cat(2, ones(n,1), age, size, bathrooms, bedrooms);
% The OLS estimate
beta_hat = inv(X'*X)*(X'*Y);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Inference in the regression model (explicit step-by-step code)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Let us compute predicted values Y_hat
Y_hat = X*beta_hat;
% Let us compute residuals
eps_hat = Y - Y_hat;
% The estimated variance of the residuals (sigma2_hat)
sigma2_hat = (eps_hat'*eps_hat)/(n - 5);
% The variance-covariance matrix of beta_hat
varcov_beta = sigma2_hat*inv(X'*X);
% The variances of the slope estimates are on the diagonal of the variance/covariance matrix
var_beta = diag(varcov_beta);
% The standard errors are the square roots of the estimated variances
stderror_beta = sqrt(var_beta);
% The t-statistics (for testing H0: beta_j = 0)
t_stats = beta_hat./stderror_beta;

% -----------------------------------------------------------
% Testing the null H0: beta_j = 0 for j = 0,1,...,4
% It is sufficient for us to look at the t statistics.
% If |t statistic| > 2, reject the null.
% We could also do it with p-values.
% If p-value < 5%, reject the null.
% -----------------------------------------------------------
p_value0 = 2*(1 - tcdf(abs(t_stats), n-5)); % I am selecting the area in both tails given where the t stat falls

% The classical regression output: estimates, standard errors, t stats and p-values (for H0: beta_j = 0)
reg_output = table(beta_hat, stderror_beta, t_stats, p_value0, ...
    'VariableNames', {'estimates' 'standard_errors' 't_statistics' 'p_values'});
disp(reg_output);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Inference in the regression model (using a regression function in Matlab)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X1 = cat(2, age, size, bathrooms, bedrooms); % Notice that I am excluding the intercept because it is added by the function fitlm below
reg = fitlm(X1, Y); % You can run fitlm to check if your manual regression (above) was well-designed.
disp(reg);
%%%%%%%%%%%%%%%
% QUESTION 4
%%%%%%%%%%%%%%%
% see comments in the solutions

%%%%%%%%%%%%%%%
% QUESTION 5
%%%%%%%%%%%%%%%
% see comments in the solutions
%%%%%%%%%%%%%%%
% QUESTION 6
%%%%%%%%%%%%%%%
% The test statistic is a t test:
t_stat_age = beta_hat(2)/stderror_beta(2);
% The p-value is
p_value0_age = 2*(1 - cdf('T', abs(t_stat_age), n-5));
fprintf(' --- Testing if age is significant ----------\n')
fprintf('The t-statistic is %4.3f \n', t_stat_age)
fprintf('The p-value is %4.3f \n', p_value0_age)
fprintf('-----------------------------------------------\n')
%%%%%%%%%%%%%%%
% QUESTION 7
%%%%%%%%%%%%%%%
% ---------------------------------------------------------------------
% Testing the multiple linear restriction: H0: beta_1 = 0 and beta_2 = 0
% ---------------------------------------------------------------------
% We write a matrix R containing q = 2 rows (# of restrictions)
% and k = 5 columns (# of parameters); We also write a vector gamma,
% with 2 rows equal to zero (the values of the linear restrictions).
R = [0 1 0 0 0; 0 0 1 0 0];
gamma = [0 0]';
% The test statistic is an F test (remember to divide by q = 2):
F_test = sigma2_hat^(-1)*(R*beta_hat - gamma)'*inv(R*(inv(X'*X)*R'))*(R*beta_hat - gamma)/2;
c_value = icdf('F', 0.95, 2, n-5); % This is calculating the critical value for a 5% level test
% The p-value is
p_value0_F = 1 - cdf('F', F_test, 2, n-5); % Notice that the p-value is one-sided
fprintf(' --- Testing if age and size are jointly significant --- \n')
fprintf('The F-statistic is %4.3f \n', F_test)
fprintf('The critical value is %4.3f \n', c_value)
fprintf('The p-value is %4.3f \n', p_value0_F)
fprintf('---------\n')