Advanced Statistical Learning
Assignment 1 (Due: 16:00, 8th August 2025)
The data come from a study of prostate cancer. The variables in the file "prostate.xls" are as follows.
• Predictors (columns 2 to 9)
– lcavol: log(cancer volume)
– lweight: log(prostate weight)
– age
– lbph: the logarithm of the amount of benign prostatic hyperplasia
– svi: seminal vesicle invasion
– lcp: log(capsular penetration)
– gleason: gleason score
– pgg45: percentage Gleason score 4 or 5
• Response variable (column 10)
– lpsa: the logarithm of prostate-specific antigen
• train/test indicator (column 11)
– This last column indicates which 67 observations were used as the "training data set" and which 30 observations as the "test data set". "T" indicates training data while "F" means test data.
Question 1 [20 marks]: Conduct principal component analysis (PCA) on the 8 predictors using all the training data. Choose the number of principal components so that the cumulative proportion of explained variance is no less than 85%. Report the principal component scores, the principal component loadings and the number of principal components. Provide the R code as well.
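One possible way to approach Question 1 in R is sketched below. It assumes the predictors are standardised before PCA (the assignment does not say either way), uses base R's `prcomp`, and wraps the selection rule in a hypothetical helper named `choose_pcs`. Because "prostate.xls" is not reproduced here, the demonstration runs on R's built-in `USArrests` data; for the assignment, `X` would be the 8 predictor columns of the 67 training rows.

```r
# Hypothetical helper (not part of the assignment): run PCA and keep the
# smallest number of components whose cumulative proportion of explained
# variance reaches `threshold`.
choose_pcs <- function(X, threshold = 0.85) {
  # Assumption: standardise the columns, since predictors are on different scales.
  pca <- prcomp(X, center = TRUE, scale. = TRUE)
  pve <- pca$sdev^2 / sum(pca$sdev^2)        # proportion of variance per component
  k   <- which(cumsum(pve) >= threshold)[1]  # first index reaching the threshold
  list(
    pca      = pca,
    k        = k,
    pve      = pve,
    loadings = pca$rotation[, 1:k, drop = FALSE],  # p x k loading matrix
    scores   = pca$x[, 1:k, drop = FALSE]          # n x k score matrix
  )
}

# Demonstration on built-in data, NOT the prostate data.
res <- choose_pcs(USArrests)
res$k
round(cumsum(res$pve), 3)
head(res$scores)
res$loadings
```

The same `pca` object can be reused later to project the test rows, which keeps the centring and scaling fixed at their training-set values.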
Question 2 [20 marks]: Consider a linear regression model which regresses the response variable "lpsa" on the principal component scores obtained in Question 1. Estimate the linear coefficients with the least squares method on the training data, and then predict the response variable on the test data. Please calculate the MSE of the predicted response variable. Provide the R code.
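A minimal sketch of the regression on principal component scores follows, again under stated assumptions: the function name `pcr_test_mse` is hypothetical, the PCA is fitted on the training rows only, and the test rows are projected with `predict()` so that they use the training centring, scaling and loadings. The demonstration uses the built-in `mtcars` data with an arbitrary split (first 22 rows as training), purely so the code can run without the prostate file.

```r
# Hypothetical helper: principal component regression with k components,
# fitted on the training set and evaluated by test-set MSE.
pcr_test_mse <- function(X_train, y_train, X_test, y_test, k) {
  pca <- prcomp(X_train, center = TRUE, scale. = TRUE)

  # Training scores for the first k components (columns PC1, ..., PCk).
  Z_train <- as.data.frame(pca$x[, 1:k, drop = FALSE])
  fit <- lm(y ~ ., data = data.frame(y = y_train, Z_train))

  # Project the test rows with the TRAINING centre, scale and loadings.
  Z_test <- as.data.frame(predict(pca, newdata = X_test)[, 1:k, drop = FALSE])
  pred <- predict(fit, newdata = Z_test)

  mean((y_test - pred)^2)
}

# Demonstration on built-in data with an arbitrary split, NOT the prostate data.
X <- mtcars[, -1]
y <- mtcars$mpg
train <- 1:22
mse <- pcr_test_mse(X[train, ], y[train], X[-train, ], y[-train], k = 2)
mse
```

For the assignment, `k` would be the number of components chosen in Question 1 and the split would come from the train/test indicator column.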
Question 3 [20 marks]: Consider the linear regression model that regresses the response variable "lpsa" on the 8 predictors. With the training data, estimate the linear coefficients with ridge estimation and lasso estimation, respectively. For these two methods, please calculate the MSEs of the predicted response variable on the test data. Provide the R code.
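One way the ridge and lasso fits might be set up is with the `glmnet` package, which is an assumption here since the assignment does not name a package. In `glmnet`, `alpha = 0` gives ridge and `alpha = 1` gives the lasso; the sketch picks the penalty by cross-validation on the training data only (`lambda.min`), which is also a choice the assignment leaves open. The helper name `penalised_test_mse` is hypothetical, and the demonstration again uses `mtcars` with an arbitrary split rather than the prostate data.

```r
# Hypothetical helper: fit ridge (alpha = 0) or lasso (alpha = 1) with
# cross-validated lambda on the training set, then return the test-set MSE.
penalised_test_mse <- function(X_train, y_train, X_test, y_test,
                               alpha, nfolds = 10) {
  cv_fit <- glmnet::cv.glmnet(as.matrix(X_train), y_train,
                              alpha = alpha, nfolds = nfolds)
  pred <- predict(cv_fit, newx = as.matrix(X_test), s = "lambda.min")
  mean((y_test - as.numeric(pred))^2)
}

# Demonstration on built-in data, NOT the prostate data.
# Runs only if glmnet is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)  # cross-validation folds are random
  X <- mtcars[, -1]
  y <- mtcars$mpg
  train <- 1:22
  ridge_mse <- penalised_test_mse(X[train, ], y[train], X[-train, ], y[-train],
                                  alpha = 0, nfolds = 5)
  lasso_mse <- penalised_test_mse(X[train, ], y[train], X[-train, ], y[-train],
                                  alpha = 1, nfolds = 5)
  print(c(ridge = ridge_mse, lasso = lasso_mse))
}
```

Other reasonable choices exist, for example `s = "lambda.1se"` instead of `lambda.min`; whichever rule is used should be stated in the answer.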
Question 4 [20 marks]: Compare the three test MSEs derived in Question 2 and Question 3. Which method has the smallest test MSE? Please explain why this method produces a smaller MSE than the other two methods.
Question 5 [20 marks]: Propose a method to estimate the variance of the ridge estimator of the linear coefficients in Question 3. Please write out the details of the proposed method and provide the R code.
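One candidate method, offered only as an illustration, is the nonparametric bootstrap: resample the training rows with replacement, refit the ridge estimator on each resample, and take the sample variance of each coefficient across resamples. The sketch below makes several assumptions that are not in the assignment: the penalty `lambda` is held fixed across resamples, predictors are standardised and the response centred within each resample, and the closed-form solution is used, so `lambda` here is not on the same scale as `glmnet`'s. The function names `ridge_coef` and `bootstrap_ridge_var` are hypothetical, and the demonstration uses `mtcars`, not the prostate data.

```r
# Closed-form ridge coefficients on standardised predictors and a centred
# response: (X'X + lambda * I)^{-1} X'y. No intercept is needed after centring.
ridge_coef <- function(X, y, lambda) {
  Xs <- scale(X)          # centre and scale each predictor
  yc <- y - mean(y)       # centre the response
  p  <- ncol(Xs)
  drop(solve(crossprod(Xs) + lambda * diag(p), crossprod(Xs, yc)))
}

# Bootstrap estimate of the variance of each ridge coefficient.
bootstrap_ridge_var <- function(X, y, lambda, B = 500) {
  X <- as.matrix(X)
  n <- nrow(X)
  boot <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # resample rows with replacement
    ridge_coef(X[idx, , drop = FALSE], y[idx], lambda)
  })
  apply(boot, 1, var)     # rows are coefficients, columns are replicates
}

# Demonstration on built-in data, NOT the prostate data.
set.seed(1)
v <- bootstrap_ridge_var(mtcars[, -1], mtcars$mpg, lambda = 1, B = 200)
round(v, 4)
```

An analytic alternative for fixed `lambda` is the sandwich form `sigma^2 (X'X + lambda I)^{-1} X'X (X'X + lambda I)^{-1}`, with `sigma^2` replaced by an estimate of the error variance; the bootstrap avoids that plug-in step at the cost of more computation.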