Lab 8 Assignment
Assignments must be submitted in the format described below.
• All R code you write must be compiled with R Markdown, with the output and your answers
rendered to MS Word or PDF format.
– The assignment must be done in R Markdown. The rendered document must include all of
the R code, the R output and the answers to the questions. Answers must be written in
full sentences.
Questions
1. This question is based on Question 6, Chapter 7, Page 299. Please modify it as follows.
(a) Use polynomial regression to predict wage using age. Use cross-validation to select
the optimal degree d for the polynomial. What degree is chosen, and what is the test
error estimated by cross-validation?
Fit the degree-d model to the full data set and plot the resulting polynomial fit over
the data.
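A minimal sketch of part (a), assuming the Wage data comes from the ISLR package (which must be installed) and using cv.glm() from the boot package for 10-fold cross-validation. The search range of degrees 1 to 10, K = 10 and the seed are illustrative choices, not requirements of the lab. which.min() picks the lowest CV error; a simpler degree with nearly the same error may also be defensible.

```r
# Part (a) sketch: pick the polynomial degree for wage ~ age by 10-fold CV.
# Assumes the ISLR package (Wage data) is installed; boot ships with R.
library(ISLR)
library(boot)

set.seed(1)                    # CV folds are assigned at random
max_degree <- 10               # illustrative search range, not set by the lab
cv_error <- numeric(max_degree)

for (d in seq_len(max_degree)) {
  # A gaussian glm is the same model as lm, but works with cv.glm()
  fit <- glm(wage ~ poly(age, d), data = Wage)
  # delta[1] is the raw K-fold CV estimate of the test MSE
  cv_error[d] <- cv.glm(Wage, fit, K = 10)$delta[1]
}

best_d <- which.min(cv_error)
cat("Chosen degree:", best_d, "CV test MSE:", round(cv_error[best_d], 2), "\n")

# Refit the chosen degree on the full data and overlay the curve
best_fit <- lm(wage ~ poly(age, best_d), data = Wage)
age_grid <- seq(min(Wage$age), max(Wage$age), length.out = 200)
pred <- predict(best_fit, newdata = data.frame(age = age_grid))
plot(wage ~ age, data = Wage, col = "grey",
     main = paste("Degree", best_d, "polynomial fit"))
lines(age_grid, pred, col = "blue", lwd = 2)
```

Printing cv_error alongside best_d makes it easy to see how flat the error curve is before committing to a degree.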
(b) Use a step function to predict wage using age. Use cross-validation to choose the
optimal number of cuts. What number of cuts is chosen, and what is the test error
estimated by cross-validation? For this assignment, set the maximum number of cuts
to 8.
Fit the model with the selected number of cuts to the full data set and plot the fit
obtained.
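A minimal sketch of part (b), under the same ISLR and boot assumptions. It reads "number of cuts" as the number of equal-width intervals passed to cut(), tried from 2 up to the maximum of 8; if your course counts cut points instead, shift the range by one. The cut is made on the full data before cross-validation so that every fold shares the same interval boundaries. make_breaks() is a hypothetical helper introduced only for this sketch.

```r
# Part (b) sketch: choose the number of step-function intervals by 10-fold CV.
# Assumes the ISLR package (Wage data) is installed; boot ships with R.
library(ISLR)
library(boot)

set.seed(1)
k_values <- 2:8                       # number of intervals; lab caps this at 8
cv_error <- numeric(length(k_values))

# Hypothetical helper: k equal-width intervals spanning the observed ages
make_breaks <- function(k) seq(min(Wage$age), max(Wage$age), length.out = k + 1)

for (i in seq_along(k_values)) {
  # Cut on the full data so all CV folds use identical interval boundaries
  Wage$age_cut <- cut(Wage$age, breaks = make_breaks(k_values[i]),
                      include.lowest = TRUE)
  fit <- glm(wage ~ age_cut, data = Wage)
  cv_error[i] <- cv.glm(Wage, fit, K = 10)$delta[1]
}

best_k <- k_values[which.min(cv_error)]
cat("Chosen intervals:", best_k, "CV test MSE:", round(min(cv_error), 2), "\n")

# Refit on the full data with the chosen breaks and plot the step function
brks <- make_breaks(best_k)
step_fit <- lm(wage ~ cut(age, breaks = brks, include.lowest = TRUE), data = Wage)
age_grid <- seq(min(Wage$age), max(Wage$age), length.out = 200)
pred <- predict(step_fit, newdata = data.frame(age = age_grid))
plot(wage ~ age, data = Wage, col = "grey",
     main = paste(best_k, "interval step function"))
lines(age_grid, pred, col = "red", lwd = 2)
```

Fixing the breaks explicitly (rather than calling cut(age, k) inside the formula) keeps the interval boundaries identical between fitting and prediction.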
2. Question 1 considers two models. Choose the model you prefer and briefly explain why
you chose it.
3. This question is based on Question 9, Chapter 7, Page 299.
(a) Use a natural spline, ns(), to predict nox using dis. Use cross-validation or another
approach to select the best degrees of freedom for the spline. What degrees of freedom
were selected, and what was the test error?
Fit a natural spline model with the selected degrees of freedom to the full data set and
plot the resulting fit over the data.
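A minimal sketch of part (a), assuming the Boston data from the MASS package (which ships with R) and 10-fold CV with cv.glm() from boot. The range of 3 to 10 degrees of freedom and the seed are illustrative choices, not ones set by the lab.

```r
# Part (a) sketch: choose df for a natural spline of nox on dis by 10-fold CV.
library(MASS)     # Boston data
library(splines)  # ns()
library(boot)     # cv.glm()

set.seed(1)
df_values <- 3:10                     # illustrative search range
cv_error <- numeric(length(df_values))

for (i in seq_along(df_values)) {
  # ns() places interior knots at quantiles of the training dis values
  fit <- glm(nox ~ ns(dis, df = df_values[i]), data = Boston)
  cv_error[i] <- cv.glm(Boston, fit, K = 10)$delta[1]
}

best_df <- df_values[which.min(cv_error)]
cat("Chosen df:", best_df, "CV test MSE:", signif(min(cv_error), 4), "\n")

# Refit on all of Boston and overlay the fitted spline
ns_fit <- lm(nox ~ ns(dis, df = best_df), data = Boston)
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 200)
pred <- predict(ns_fit, newdata = data.frame(dis = dis_grid))
plot(nox ~ dis, data = Boston, col = "grey",
     main = paste("Natural spline, df =", best_df))
lines(dis_grid, pred, col = "blue", lwd = 2)
```

signif() is used instead of round() because the MSE for nox is small enough that rounding to two decimals would hide it.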
(b) Fit a smoothing spline to the data, using cross-validation to select the degrees of
freedom. What degrees of freedom were selected? Plot the resulting fit over the data.
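A minimal sketch of part (b), assuming the Boston data from MASS. With cv = TRUE, smooth.spline() chooses the smoothing parameter by ordinary leave-one-out cross-validation and reports the resulting effective degrees of freedom in $df. Because dis contains tied values, R will warn that cross-validation with non-unique x values is doubtful; the default cv = FALSE (generalized cross-validation) avoids that warning and is a reasonable alternative.

```r
# Part (b) sketch: smoothing spline with the penalty chosen by LOOCV.
library(MASS)   # Boston data

# cv = TRUE requests ordinary leave-one-out CV; expect a warning about
# tied dis values (use cv = FALSE for generalized CV instead)
ss_fit <- smooth.spline(Boston$dis, Boston$nox, cv = TRUE)
cat("Effective degrees of freedom:", round(ss_fit$df, 2), "\n")

# Evaluate the fitted spline on a fine grid and overlay it on the data
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 200)
ss_curve <- predict(ss_fit, x = dis_grid)   # list with components x and y
plot(nox ~ dis, data = Boston, col = "grey",
     main = paste("Smoothing spline, df =", round(ss_fit$df, 1)))
lines(ss_curve, col = "darkgreen", lwd = 2)
```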
(c) Fit a local regression using the loess() function. Use a span of 0.375, so that each
local neighbourhood contains 37.5% of the observations. Plot the resulting function over
the data.
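A minimal sketch of part (c), assuming the Boston data from MASS. loess() keeps its default local quadratic fit (degree = 2) here; only span = 0.375 comes from the question.

```r
# Part (c) sketch: local regression of nox on dis with span = 0.375.
library(MASS)   # Boston data

# span < 1: each local fit uses the nearest 37.5% of observations
lo_fit <- loess(nox ~ dis, data = Boston, span = 0.375)

# Predict on a grid inside the observed range (loess returns NA outside it)
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 200)
lo_pred <- predict(lo_fit, newdata = data.frame(dis = dis_grid))
plot(nox ~ dis, data = Boston, col = "grey", main = "loess fit, span = 0.375")
lines(dis_grid, lo_pred, col = "purple", lwd = 2)
```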
This leaves only generalized additive models, which you can try on your own time.