Dataset: Boston Housing
The dataset comprises town-level socio-economic data on housing in the 506 towns that make up Greater Boston, including data on pollution levels.
Variable definitions are included in the Excel workbook containing the dataset.
Objective
To understand the determinants of the median value of housing in 506 towns comprising Greater Boston.
Tasks
(All tasks must be undertaken using Excel. You will then use the results generated by your data analysis to complete the Answer Sheet included in this Assessment Brief. Only the completed Answer Sheet including the required screenshots is to be submitted.)
1. Calculate descriptive statistics for CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE.
2. Generate histograms for CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE.
3. Undertake skewness/outlier analysis and normality tests for CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE.
4. Generate the correlation matrix for CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE.
5. Create a scatterplot for VALUE vs ROOMS; include a linear trendline with the equation of the line and R².
6. Estimate a general regression model of VALUE using all the variables in the dataset.
7. Develop a specific regression model of VALUE that eliminates irrelevant variables and maximises R²(adj).
8. Undertake residual analysis for the estimated specific regression including residual plots and auxiliary regression analysis.
9. Complete the Answer Sheet and submit it on Minerva (via Turnitin).
Assignments should be a maximum of 2000 words in length.
All coursework assignments that contribute to the assessment of a module are subject to a word limit, as specified in the assessment brief. The word limit is an extremely important aspect of good academic practice, and must be adhered to. Unless stated otherwise in the relevant module handbook (if one has been provided), the word count includes EVERYTHING (i.e. all text in the main body of the assignment including summaries, subtitles, contents pages, tables, supportive material whether in footnotes or in-text references) except the main title, reference list and/or bibliography and any appendices. It is not acceptable to present matters of substance, which should be included in the main body of the text, in the appendices (“appendix abuse”). It is not acceptable to attempt to hide words in graphs and diagrams; only text which is strictly necessary should be included in graphs and diagrams.
You are required to adhere to the word limit specified and state an accurate word count on the cover page of your assignment brief. Your declared word count must be accurate, and should not mislead. Making a fraudulent statement concerning the work submitted for assessment could be considered academic malpractice and investigated as such. If the amount of work submitted is higher than that specified by the word limit or that declared on your word count, this may be reflected in the mark awarded and noted through individual feedback given to you.
The deadline date for this assignment is 12:00:00 noon on Wednesday 14th May 2025.
Semester 2, 2024/25
Assessed Coursework: Answer Sheet
1. Descriptive Statistics (Worth 12%)
(i) Complete the following table:
| | CRIME | ROOMS | AGE | TAX | PTRATIO | VALUE |
|---|---|---|---|---|---|---|
| Mean | | | | | | |
| Median | | | | | | |
| Minimum | | | | | | |
| Maximum | | | | | | |
| 1st Quartile | | | | | | |
| 3rd Quartile | | | | | | |
| St Dev | | | | | | |
| CoV | | | | | | |
Note: CoV = coefficient of variation
(ii) Which of the variables (CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE) are most/least dispersed? Explain your answer.
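The submitted work must be produced in Excel, but the table's statistics can be cross-checked with a short Python sketch. It assumes the sample standard deviation (Excel's STDEV.S) and inclusive quartiles (Excel's QUARTILE.INC) are intended. The function name `describe` and the sample values are hypothetical and are not taken from the workbook.

```python
import statistics

def describe(values):
    """Return the summary statistics requested in the table for one variable."""
    # Inclusive quartiles interpolate between data points, including min and max.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    st_dev = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return {
        "Mean": mean,
        "Median": statistics.median(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "1st Quartile": q1,
        "3rd Quartile": q3,
        "St Dev": st_dev,
        "CoV": st_dev / mean,  # coefficient of variation: dispersion relative to the mean
    }

# Made-up illustrative values, NOT taken from the Boston Housing workbook.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the CoV divides the standard deviation by the mean, it is unit-free, which is what allows dispersion to be compared across variables measured on different scales.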
2. Histograms (Worth 12%)
Insert histograms:
(i) CRIME
(ii) ROOMS
(iii) AGE
(iv) TAX
(v) PTRATIO
(vi) VALUE
3. Distributional Properties (Worth 10%)
(i) Is the distribution skewed for any of the variables (CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE)? Explain your answer.
(ii) Is there evidence of any outliers in any of the variables (CRIME, ROOMS, AGE, TAX, PTRATIO and VALUE)? Explain your answer.
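One way to cross-check skewness and outlier findings outside Excel is sketched below. It assumes the adjusted sample skewness formula documented for Excel's SKEW function and, as an assumption not stated in the brief, the common 1.5 × IQR rule for flagging outliers. The function names and sample values are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

# Made-up illustrative values with one large observation on the right.
sample = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 20.0]
skew = sample_skewness(sample)
outliers = iqr_outliers(sample)
print(f"skewness = {skew:.3f}, outliers = {outliers}")
```

A positive skewness indicates a long right tail and a negative value a long left tail; the outlier threshold `k` is a convention and can be adjusted.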
4. Correlation Analysis (Worth 10%)
(i) Correlation matrix
Complete the correlation matrix
| | CRIME | ROOMS | AGE | TAX | PTRATIO | VALUE |
|---|---|---|---|---|---|---|
| CRIME | | | | | | |
| ROOMS | | | | | | |
| AGE | | | | | | |
| TAX | | | | | | |
| PTRATIO | | | | | | |
| VALUE | | | | | | |
(ii) Comment on the key points of the correlation analysis.
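The correlation matrix holds the pairwise Pearson correlation coefficient for every pair of variables. A minimal Python sketch of that calculation follows, intended only as a cross-check on the Excel output. The function names, column labels and values are hypothetical and are not taken from the workbook.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Build a nested dict: matrix[row][col] = correlation of the two columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}

# Made-up illustrative columns, NOT taken from the Boston Housing workbook.
data = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # moves exactly with A
    "C": [4.0, 3.0, 2.0, 1.0],   # moves exactly against A
}
matrix = correlation_matrix(data)
for row, cells in matrix.items():
    print(row, {col: round(r, 3) for col, r in cells.items()})
```

The matrix is symmetric with ones on the diagonal, so only the lower (or upper) triangle carries distinct information.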
5. Scatterplot (Worth 6%)
(i) Insert the scatterplot for VALUE vs ROOMS including a linear trendline with the equation of the line and R².
(ii) Comment on the scatterplot.
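A linear trendline is an ordinary least squares fit of one variable on another, so its equation and R² can be reproduced with a few lines of Python as a cross-check. This is a sketch under that assumption; the function name and the values below are hypothetical and are not taken from the workbook.

```python
def linear_trendline(x, y):
    """Least squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up illustrative values, NOT taken from the Boston Housing workbook.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
value = [11.0, 15.0, 22.0, 25.0, 32.0]
slope, intercept, r2 = linear_trendline(rooms, value)
print(f"VALUE = {intercept:.3f} + {slope:.3f} * ROOMS, R^2 = {r2:.4f}")
```

In a regression with a single explanatory variable, R² equals the square of the Pearson correlation between the two variables, which gives a quick consistency check against the correlation matrix.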
6. General Regression Model (Worth 18%)
Complete the following table for the general regression model of VALUE
| Outcome: VALUE | Coefficient | Standard Error | T Stat | P-Value |
|---|---|---|---|---|
| Intercept | | | | |
| CRIME | | | | |
| ZONE | | | | |
| INDUSTRY | | | | |
| RIVER | | | | |
| NOX | | | | |
| ROOMS | | | | |
| AGE | | | | |
| DISTANCE | | | | |
| HIGHWAY | | | | |
| TAX | | | | |
| PTRATIO | | | | |
| DIVERSITY | | | | |
| LOWSTAT | | | | |
| Goodness of Fit | | | | |
| R² | | | | |
| R²(adj) | | | | |
| F statistic (P-value) | | | | |

Comments on Key Points:
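For reference, the goodness-of-fit measures in the table are linked by the standard formulas for a regression with an intercept, where n is the number of observations and k the number of explanatory variables. For the general model described above, n = 506 and k = 13, so n - k - 1 = 492. The symbols n and k are notation introduced here, not taken from the brief.

```latex
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1},
\qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
\]
```

Adjusted R² penalises additional explanatory variables, so it can fall when an irrelevant variable is added even though R² itself never decreases.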