# Linear Regression Models Study Guide

LINEAR REGRESSION MODELS W4315 HOMEWORK 2 ANSWERS February 15, 2010 Instructor: Frank Wood 1. (20 points) In the ? le ”problem1. txt”(accessible on professor’s website), there are 500 pairs of data, where the ? rst column is X and the second column is Y. The regression model is Y = ? 0 + ? 1 X + a. Draw 20 pairs of data randomly from this population of size 500. Use MATLAB to run a regression model speci? ed as above and keep record of the estimations of both ? 0 and ? 1 . Do this 200 times. Thus you will have 200 estimates of ? 0 and ? 1 . For each parameter, plot a histogram of the estimations. . The above 500 data are actually generated by the model Y = 3 + 1. 5X + , where ? N (0, 22 ). What is the exact distribution of the estimates of ? 0 and ? 1 ? c. Superimpose the curve of the estimates’ density functions from part b. onto the two histograms respectively. Is the histogram a close approximation of the curve? Answer: First, read the data into Matlab. pr1=textread(’problem1. txt’); V1=pr1(1:250,1); V2=pr1(1:250,2); T1=pr1(251:500,1); T2=pr1(251:500,2); X=[V1;V2]; Y=[T1;T2]; Randomly draw 20 pairs of (X,Y) from the original data set, calculate the coe? ients b0 and b1 and repeat the process for 200 times b0=zeros(200,1); b1=zeros(200,1); i=0 for i=1:200 indx=randsample(500,20); x=X(indx); 1 y=Y(indx); avg x = mean(x); avg y = mean(y); sxx = sum((x ? avg x). 2 ); sxy = sum((x ? avg x). ? (y ? avg y)); b1(i) = sxy/sxx; b0(i) = avg y ? b1(i) ? avg x; end; Draw histograms of the coe? cients b0 and b1 hist(b0) hist(b1) Figure 1: Histogram of b0 Figure 2: Histogram of b1 2 i b. As we have known, b1 = i i(Xi ? X)2 = i (Xii ? X)2i = i Ki Yi whereKi = Xi ? X? 2 ? ? i i i (Xi ? X) So, b1 is a linear combination of Yi .

Since Yi has a normal distribution, b1 also follows a normal distribution. E(b1 ) = i Ki E(Yi ) = i Ki (? 0 + ? 1 Xi ) = i Ki ? 0 + ( i Ki Xi )? 1 ? i (Xi ? X) =0 ? i Ki = (Xi ? X)2 i i i i i i i =1 ? 2 = ? 2 i Ki X i = i (Xi ? X) i (Xi ? X) E(b1 ) = 0 + 1 ? ?1 = ? 1 ? 2 V ar(b1 ) = (Xi ? X)2 (see the proof in homework 1 solution) ? i ? ? (X ? X)(Y ? Y ) ? (X ? X)Y ? ? (X ? X)X ? ? (X ? X)(X ? X) ? Therefore, b1 ? N (? 1 , (Xi ? X)2 ) ? i ? ? b1 X ? b0 = Y E(b0 ) = ? 0 2 1 V ar(b0 ) = ( n + (XXi X)2 )? 2 ? i? i 2 1 Therefore, b0 ? N (? 0 , ( n + (XXi X)2 )? 2 ) ? i? Since the data are generated by the model Y = 3 + 1. 5X + , where ? N (0, 22 ). ?0 = 3; ? 1 = 1. 5 and ? 2 = 4. The mean and variance of b0 and b1 can thus be determined. Calculate the variance of b0 and b1 in Matlab avg X = mean(X); avg Y = mean(Y); SXX = sum((X ? avg X). 2 ); SXY = sum((X ? avg X). ? (Y ? avg Y )); B1 = SXY /SXX; B0 = avg Y – b1*avg X; var B1=4/SXX var B0 = 4 ? (1/500 + ((avg X). 2 )/SXX) sd B0 = sqrt(var B0) sd B1 = sqrt(var B1) The results showed that V ar(b0 ) = 0. 0334; V ar(b1 ) = 9. 457E ? 004 The exact distribution of the estimates of ? 0 and ? is b0 ? N (3, 0. 0334); b1 ? N (1. 5, 9. 457E? 004) 2 c. We have obtained the estimates’ exact distribution in part(b), we can now plot the curve of their pdf function and compare them with the histograms. a = 0 : 0. 1 : 6; mu = 3; sigma = sd B0; pdfNormal = normpdf(a, mu, sigma); 3 [n, xout] = hist(b0); n = 6 ? n/200; bar(xout,n) hold on; plot(a, pdfNormal) hold o? xlabel(’b0’) ylabel(’6*Frequency’) Figure 3: Histogram and the pdf curve of b0 on the same plot b = 1 : 0. 1 : 2 mu = 1. 5; sigma = sd B1; pdfNormal = normpdf(b, mu, sigma); [n, xout] = hist(b1); n = 40 ? /200; bar(xout,n) hold on; plot(b, pdfNormal) hold o? xlabel(’b1’) ylabel(’40*Frequency’) 4 Figure 4: Histogram and pdf curve ofb1 on the same plot As we can see from Figure 3 and Figure 4, the shape of the histogram of the coe? cients obtained from the 200 times simulations is similar to that of the curve of the estimated distubtioin of the coe? cients. 2. (20 points) Use the same data set in the last problem, we will estimate ? 0 and ? 1 using Newton-Raphson method. a. Draw a 3d plot using MATLAB(check ”surf” command for example) to illustrate how the SSE varies according to di? rent combinations of estimates of ? 0 and ? 1 . So to speak, draw a 3d plot where x and y axes represent di? erent values of slope and intercept of the regression line respectively, while z axis is the SSE. b. Use Newton-Raphson method to minimize the SSE and give estimates of the parameters(slope and intercept) of the regression line. Give a geometrical interpretation of the method and explain how it works. Answer: a. Use the” surf” command in Matlab to draw the 3D plot z = zeros(61, 61); x = [0 : 0. 1 : 6]; y = [? 1. 5 : 0. 1 : 4. 5]; i = 0; j = 0; for i=1:61 for j=1:61 5 (i, j) = sum((Y ? x(j) ? y(i) ? X). 2 ); end end meshgrid(x,y,z) surf(x,y,z) Figure 5: 3D plot of SSE versus the slope and the intercept of the regression line b. Use Newton-Raphson method to minimize the function F (? ) = SSE and get the estimates of the parameters. Here we use the iterations HF (? n )(? n+1 ? ?n ) = F (? n ) where HF (? n ) is the Hessian matrix(second-order partial derivatives of the function SSE) and F (? n ) is gradient. ?2 i (Yi ? ?0 ? ?1 Xi ) 2n 2 i Xi ? 0 F = HF = ? = 2 ? 2 i ((Yi ? ?0 ? ?1 Xi )Xi ) 2 i Xi 2 i (Xi ) ? The Matlab code is: function[beta,SSE]= NR linear(data,beta start) x = data(:, 1); y = data(:, 2); n = length(x); di? =1;beta=beta start; while di? > 0. 0001 beta old=beta; J=[-2*sum(y-beta(1)-beta(2)*x);-2*sum((y-beta(1)-beta(2)*x). *x)] 6 H = [2 ? n, 2 ? sum(x); 2 ? sum(x), 2 ? sum(x. 2 )] H 1=inv(H); SSE = sum((y ? beta(1) ? beta(2) ? x. 2 ) beta=beta old – H 1*J di? =sum(abs(beta – beta old)); end hw1=[X,Y] beta0=[0;0] [betaml, sse] = N R linear(hw1, beta0) Using Newton Ralphson method, we got the same result with the least square method, b0 = 2. 725 b1 = 1. 5297. The geometric interpretation of Newton’s method is that at each iteration one approximates by a quadratic function around F(x), and then takes a step towards the maximum/minimum of that quadratic function. 3. (10 points) a. In simple linear regression setting y = ? 0 + ? 1 x + , write out the explicit form the error function. b. Prove this function is convex with respect to its variables(? 0 and ? 1 ). Answer: The error functionE = n (Yi ? ?0 ? ?1 ? Xi )2 i=1 To prove the error function is convex with respect to ? and ? 1 , we need to show that the Hessian matrix of the error function is postive- semide? nite. z1 Suppose we have a non zero vector Z = z2 2n 2 i Xi z1 z1 = 2nz1 + 2z2 i Xi 2z1 i Xi + 2z2 i Xi2 2 i Xi 2 i (Xi2 ) z2 z2 2 2 2 = 2nz1 + 4z1 z2 i Xi + 2z2 i Xi 2 2 = 2[nz1 + 2z1 z2 i Xi + z2 i Xi2 ] = 2[(z1 + Xi z2 )2 ] ? 0 for any non zero vector Z ? Rn The Hessian Matrix of the error function with respect to ? 0 and ? 1 is postive-semide? nite and therefore the error function is a convex function. Z T HZ = z1 z2 7