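# Data Analysis Assignment Writing, R Programming Assignment Help, R Course Project Writing, System Assignment Help for International Students, Prolog Help for International Students | Haskell Programming Help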


Categorical Data Analysis: Take-home Assignment №2

Deadline: March 12, 2020, at 23:59 MT

Please submit your assignment via LINK. Make sure that you name your assignment files clearly. You are expected to submit two files: a PDF or Word document and an R script. Instead of giving the document a file name like “Assignment 2”, name it something like “Your surname-A2-CDA-2020”.

Course Policy Reminder

∙ Working Together. Unless instructed otherwise (e.g. for the Replication Project), students may work together on assignments. However, students have to write up their own solutions in their own words. If a student turns in material that is in the same words as a fellow student's, the work will be considered plagiarized. Plagiarism will be dealt with according to the policies of the HSE.

∙ Late Assignments. Unless otherwise instructed, assignments have to be submitted before the beginning of the corresponding lecture. Credit for a late assignment has a half-life of 24 hours: a student gets 50% credit if the assignment is handed in within 24 hours after the due time, 25% credit if it is handed in within the following 24 hours, and so on.

∙ Academic Fraud. Plagiarism and any other activity in which students present work that is not their own is academic fraud. Academic fraud is a serious matter and is reported to the Academic Supervisor and the Manager. All cases of academic fraud will be discussed individually and resolved according to the policies of the HSE.
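The late-assignment rule above can be read as a simple step function. The sketch below is one possible reading, not an official calculator: it assumes that credit drops in 24-hour steps (rather than decaying continuously) and that a submission exactly 24 hours late still counts as "within 24 hours". The function name `late_credit` is made up for illustration.

```python
import math

def late_credit(hours_late: float) -> float:
    """Fraction of credit kept under the 24-hour half-life rule.

    Assumption: credit halves once per started 24-hour period after the
    due time, so 0 < hours_late <= 24 gives 0.5, 24 < hours_late <= 48
    gives 0.25, and so on.
    """
    if hours_late <= 0:
        return 1.0  # submitted on time
    periods = math.ceil(hours_late / 24)
    return 0.5 ** periods

# A submission 30 hours late falls in the second 24-hour period.
print(late_credit(30))
```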

Part 1: Prove Yourself!

1. Calculate $P(y_i = 0 \mid x_i)$, $P(y_i = 1 \mid x_i)$, $P(y_i = 2 \mid x_i)$ and $P(y_i = 3 \mid x_i)$ for $y_i$, which is measured on an ordered 4-point scale. Consider that the observed $y_i = 0$ if the latent $y_i^* \le \tau_1$; $y_i = 1$ if $\tau_1 < y_i^* \le \tau_2$; $y_i = 2$ if $\tau_2 < y_i^* \le \tau_3$; and $y_i = 3$ if $y_i^* > \tau_3$.
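One way to set this up is to assume the usual latent-variable form $y_i^* = x_i\beta + \varepsilon_i$, where $\varepsilon_i$ has cumulative distribution function $F$ (the standard normal CDF for probit, the logistic CDF for logit). Neither $\beta$ nor $F$ is given in the task itself, so both are assumptions of this sketch. Each probability is then a difference of $F$ evaluated at adjacent cutpoints:

```latex
\begin{aligned}
P(y_i = 0 \mid x_i) &= P(y_i^* \le \tau_1 \mid x_i) = F(\tau_1 - x_i\beta),\\
P(y_i = 1 \mid x_i) &= F(\tau_2 - x_i\beta) - F(\tau_1 - x_i\beta),\\
P(y_i = 2 \mid x_i) &= F(\tau_3 - x_i\beta) - F(\tau_2 - x_i\beta),\\
P(y_i = 3 \mid x_i) &= 1 - F(\tau_3 - x_i\beta).
\end{aligned}
```

The four expressions sum to one, which is a quick check on any derivation.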

2. Let’s assume that we have an ordered probit model. A dependent variable $y_i$ is measured on an ordered 4-point scale. An independent variable $x_i$ is continuous and distributed as $(-\infty; +\infty)$. If $\hat{\beta}_0 = -0.50$, $\hat{\beta}_1 = 0.052$, $\hat{\tau}_1 = 0.75$, $\hat{\tau}_2 = 3.5$ and $\hat{\tau}_3 = 5.0$, then calculate predicted probabilities for the following cases:

| | x = 15 | x = 40 | x = 80 |
|---|---|---|---|
| P(y = 1 \| x) | ??? | ??? | ??? |
| P(y = 2 \| x) | ??? | ??? | ??? |
| P(y = 3 \| x) | ??? | ??? | ??? |
| P(y = 4 \| x) | ??? | ??? | ??? |
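The arithmetic behind such a table can be sketched in a few lines. The assignment asks for R; the Python below is only an illustration of the calculation, under two assumptions that the task does not spell out: the latent variable is $y^* = \beta_0 + \beta_1 x + \varepsilon$ with $\varepsilon \sim N(0, 1)$, and the intercept enters the linear predictor alongside the cutpoints. The function name `ordered_probit_probs` is hypothetical.

```python
from statistics import NormalDist

def ordered_probit_probs(x, beta0, beta1, cutpoints):
    """Return [P(y = 1), ..., P(y = K)] for a single value of x.

    Assumes y* = beta0 + beta1 * x + e with e ~ N(0, 1); category k is
    observed when y* lies between cutpoint k-1 and cutpoint k.
    """
    phi = NormalDist().cdf          # standard normal CDF
    xb = beta0 + beta1 * x          # linear predictor
    # Cumulative probabilities P(y <= k), padded with 0 and 1 at the ends.
    cum = [0.0] + [phi(t - xb) for t in cutpoints] + [1.0]
    # Category probabilities are differences of adjacent cumulative values.
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

# Estimates quoted in the task; categories are labelled 1..4 as in the table.
for x in (15, 40, 80):
    probs = ordered_probit_probs(x, -0.50, 0.052, [0.75, 3.5, 5.0])
    print(x, [round(p, 4) for p in probs])
```

Each row of four probabilities should sum to one; if it does not, a cutpoint or sign has gone astray.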

Part 2: Ordered Logit and Gologit

To complete these tasks, use the European Quality of Government Index (EQI) data (France, 2017). I’ve already prepared a dataset with all the variables recoded. See the description of the variables and their scales in the table below. Feel free to practice your data management skills and use the files from the official website here, or use this file otherwise.

| Code | Question wording | Scale |
|---|---|---|
| Q10 | All citizens are treated equally in the public education system in my area | Ordered: 4-Agree; 3-Rather agree; 2-Rather disagree; 1-Disagree |
| Q1 | Have you or any of your immediate family been enrolled or employed in the public school system in your area in the past 12 months? | Binary: Yes/No |
| Q4 | How would you rate the quality of public education in your area? | Continuous: from 1 (Very poor) to 10 (Excellent quality) |
| Q13 | Corruption is prevalent in my area’s local public school system | Continuous: from 1 (Strongly disagree) to 10 (Strongly agree) |
| Q17_1 | In the past 12 months have you or anyone living in your household paid a bribe in any form to Education services? | Binary: Yes/No |
| D1 | Gender of respondent | Binary: M/F |
| D3 | Age of respondent | Continuous: 18-99 |
| D2 | Education level of respondent | Factor: 1-Elementary school or less; 2-High school (but did not graduate); 3-Graduation from high school; 4-Graduation from college, university; 5-Post-graduate degree (Masters, PhD) |
| RECODED4 | Please tell me your average total household net income per month | Factor: 1-low; 2-medium; 3-high |

1. Show descriptive statistics for all the variables in the dataset. Use tables, simple tests (correlations, t-tests, chi-square, etc.) and visualization to describe your data. If there are missing values, analyze whether they are missing completely at random, missing at random or missing not at random. Compare descriptive statistics before and after you omit missing values.

2. Rearrange the levels of Q10 in the ‘Agree → Rather Agree → Rather Disagree → Disagree’ order. Build an ordered logistic regression model.


3. Use Q10 as the dependent variable and the rest of the variables in your dataset as independent ones. Build an ordered probit and a linear regression model. Then create a table with three columns (you might use the stargazer package to create a fancy table). Compare the results.

4. Test two hypotheses about linear restrictions using the Wald test. Use any variables for which you find it relevant. Interpret the results, i.e. include in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation.

5. Get rid of variables whose coefficients are not statistically significant. Compare this model with the full one using the Likelihood Ratio Test. Interpret the results, i.e. include in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation.

6. Calculate odds ratios for one continuous and two binary variables and interpret them. Use both statistical and substantive interpretation of your results.

7. Create one graph with predicted probabilities and one graph with cumulative probabilities of the different categories depending on a continuous variable. Interpret the resulting graphs. Use both statistical and substantive interpretation of your results.

8. Test your model for the parallel regression assumption. Interpret the results, i.e. include in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation.

9. Build (if necessary) the least constrained model (Gologit), a Partial Proportional Odds Model and an Ordered logit. Compare these models using the Likelihood Ratio Test. Show the estimates of the best model in a table. Interpret the results.
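Several of the tasks above rely on the Likelihood Ratio Test for nested models. The assignment asks for R, so the Python below is only a sketch of the arithmetic. It assumes the two log-likelihoods come from models fitted on the same observations, and that the degrees of freedom equal the number of restricted parameters. The function names and the log-likelihood values in the usage lines are hypothetical, not results from the EQI data.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        k, q = 2, math.exp(-half)             # closed form for df = 2
    # Step up two degrees of freedom at a time using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), a = k/2, z = x/2.
    while k < df:
        a = k / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        k += 2
    return q

def lr_test(loglik_restricted: float, loglik_full: float, df: int):
    """Return (LR statistic, p-value); H0 is that the restrictions hold."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, df)

# HYPOTHETICAL log-likelihoods: the full model has 3 more parameters.
stat, p_value = lr_test(-520.0, -515.0, df=3)
print(round(stat, 3), round(p_value, 4))
```

A small p-value is evidence against the restricted model; a large one means the dropped terms are not adding much fit.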

Part 3: Multinomial Logit

To complete these tasks, use the data from the European Election Study on elections in the Netherlands. I’ve already prepared a dataset with all the variables recoded. See the description of the variables and their scales in the table below. Use this file.

| Code | Question wording | Scale |
|---|---|---|
| party | Which party did you vote for in the last elections? | Factor: 0 - PvdA (Social Democrats - Left); 1 - CDA (Christian Democrats - Right, Religious); 2 - VVD (Liberals - Right Secular); 3 - D66 (Democrats 66 - Social-liberals, Democrats) |
| income | Income of respondent’s household | Continuous: 1 - less than 21,000; 2 - 21,000 - 23,999; 3 - 24,000 - 29,999; 4 - 30,000 - 35,999; 5 - 36,000 - 43,999; 6 - 44,000 - 54,000; 7 - more than 54,000 |
| age | Age of respondent | Continuous: 1 - 17-20 years; 2 - 21-25 years; 3 - 26-30 years; 4 - 31-35 years; 5 - 36-40 years; 6 - 41-45 years; 7 - 46-50 years; 8 - 51-55 years; 9 - 56-60 years; 10 - 61-65 years; 11 - 66-70 years; 12 - 71-75 years; 13 - 76 years and older |
| educ | Education of respondent | Continuous: from 1 (low) to 5 (high) |
| relig | Is respondent religious? | Binary: 1 - yes; 2 - no |
| union | Is respondent a member of any labor union? | Binary: 1 - yes; 2 - no |

1. Use party as the dependent variable and income, age, educ, union as independent ones. Let’s assume that the reference party is PvdA - Social Democrats (0). Build a multinomial logistic regression.

2. Build another model with the same set of predictors, but add relig as well. Test the significance of this coefficient using the Wald test. Compare these models using the Likelihood Ratio Test. Interpret the results, i.e. include in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation.

3. Run tests for combining categories of the dependent variable. Interpret the results, i.e. include in your answer $H_0$ and $H_1$, the test statistic, the p-value and a substantive interpretation. Combine categories if necessary.

4. Calculate odds ratios for one continuous and one binary variable. Interpret them. Use both statistical and substantive interpretation of your results.

5. Interpret the predicted probabilities for the parties depending on the values of one binary and one continuous variable using graphs.

6. Describe individuals who are most and least likely to support Social Democrats.

7. Test the IIA assumption and interpret the results.
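For the predicted probabilities and odds ratios asked for in the Part 3 tasks, it may help to see how a multinomial logit turns coefficients into probabilities. The assignment asks for R; the Python below is a minimal sketch of the formula only. It assumes category 0 (PvdA) is the reference with all coefficients fixed at zero. The function name `mnl_probs`, the two-predictor setup and every coefficient value are hypothetical and are not estimates from the election data.

```python
import math

def mnl_probs(x, coefs):
    """P(party = j | x) with category 0 as the reference category.

    x     : predictor values with a leading 1.0 for the intercept
    coefs : {category: [intercept, slope, ...]}, reference category omitted
    """
    scores = {0: 0.0}  # the reference category has linear predictor 0
    for cat, beta in coefs.items():
        scores[cat] = sum(b * v for b, v in zip(beta, x))
    top = max(scores.values())  # subtract the max for numerical stability
    expo = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(expo.values())
    return {cat: e / total for cat, e in expo.items()}

# HYPOTHETICAL coefficients in the order (intercept, income, union).
coefs = {1: [-0.2, 0.10, 0.5], 2: [-1.0, 0.30, 0.8], 3: [-0.5, 0.15, 0.4]}

probs = mnl_probs([1.0, 4.0, 1.0], coefs)
# exp(coefficient) is the factor by which the odds of VVD (2) versus the
# reference PvdA (0) change for a one-unit increase in income.
odds_ratio_income_vvd = math.exp(coefs[2][1])
print({k: round(v, 3) for k, v in probs.items()}, round(odds_ratio_income_vvd, 3))
```

Because the odds ratio is defined relative to the reference category, changing the reference party changes every coefficient and odds ratio, but not the predicted probabilities.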
