Problem #4

(i) Drawbacks of Linear Regression
1. It assumes a linear relationship between the independent and dependent variables.
2. To test the goodness of the regression, the data have to be Normally distributed. If not, the data have to be transformed to Normality, which is often difficult to achieve with real data sets.
3. Local nonlinearities cannot be modeled.
4. A few outliers can greatly influence the fit.

--

(ii) PRESS is the predictive error sum of squares. Regression models (or any models) are fit to data in order to predict future observations skillfully, so it is intuitive to fit the model parameters to achieve this. PRESS is a score function that provides a measure of the model in a predictive mode without actually having to fit the model in a predictive mode. Thus it is attractive relative to other measures such as R2, MSE, etc.

--

(iii) Steps for a bootstrap confidence interval:
1. Generate a bootstrap sample of the same size as the original data (when a data point is selected at random, both x (the independent) and y (the dependent) variables are selected together).
2. Fit a linear regression to the bootstrap sample.
3. Estimate the value of the dependent variable using this regression at the desired x point(s).
4. Repeat steps 1-3 a number of times, say 500, thus obtaining 500 estimates of the dependent variable at each desired x point.
5. Select the 2.5th and 97.5th percentiles from these 500 estimates, forming the 95% CI.
The advantage of this approach is that Normality of the errors need not be assumed, and asymmetric confidence intervals are possible.

------------

(iv)
1. Generate a bootstrap sample from X: x1, x2, x3, ..., xN, of size N.
2. Compute the mean.
3. Repeat steps 1-2 a number of times, say 500, thus obtaining 500 estimates of the mean.
4. Select the 2.5th and 97.5th percentiles from these 500 estimates, forming the 95% CI of the true mean of X.
5. Repeat steps 1-4 for the sample Y: y1, y2, ..., yN.
6. Compare the CIs of X and Y.
If the intervals overlap, the means are not significantly different at the 95% confidence level. If they do not overlap, the means are significantly different.

-------

(v) Strength of Evidence. It is a measure of the "believability" of the Null hypothesis given the data. It is also the smallest level of significance that would lead to rejection of the Null hypothesis with the given data.
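The bootstrap comparison of two means in (iv) can be sketched as below. This is a minimal illustration, not part of the original solution; the sample data, the seed, and B = 500 resamples are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_mean_ci(sample, B=500, alpha=0.05):
    """Percentile-bootstrap CI for the mean of `sample` (steps 1-4)."""
    n = len(sample)
    means = np.empty(B)
    for b in range(B):
        # Step 1: resample with replacement, same size as the original data.
        resample = rng.choice(sample, size=n, replace=True)
        # Step 2: compute the mean of the bootstrap sample.
        means[b] = resample.mean()
    # Steps 3-4: the 2.5th and 97.5th percentiles of the B bootstrap
    # means form the 95% CI.
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical samples X and Y (illustrative only).
x = rng.normal(loc=0.0, scale=1.0, size=50)
y = rng.normal(loc=1.5, scale=1.0, size=50)

ci_x = bootstrap_mean_ci(x)   # steps 1-4 for X
ci_y = bootstrap_mean_ci(y)   # step 5: repeat for Y

# Step 6: compare the CIs; non-overlapping intervals indicate that the
# means are significantly different at the 95% level.
overlap = ci_x[0] <= ci_y[1] and ci_y[0] <= ci_x[1]
print("95% CI of mean(X):", ci_x)
print("95% CI of mean(Y):", ci_y)
print("Means differ at the 95% level:", not overlap)
```

Note that this uses the simple percentile method described in the text; other bootstrap interval constructions (e.g. bias-corrected) exist but are not required here.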