Problem Set 5

AEM 4110/5111 – Introduction to Econometrics

Instructions

• This problem set is due by 11/20 at 11:59pm.

• Submit your answers via Canvas in the assignments section of the course.

• Submit a zipped folder with the following documents. Name the zipped folder “PS5” followed by your last name.

1. A write-up in PDF format with your answers to the questions below and the full names of all your group members.

2. A do-file with the Stata code you use for your answers. In the do-file, comment your script specifying which sections correspond to each answer in your write-up.

3. For the questions that require filling a table, you can create one in Excel or using LaTeX.

4. Important! Please write each answer on a separate page and clearly label it with the corresponding question number (for example, Question I.1.a, Question I.1.b, etc.).

Question I: RDD

Goal: In this problem, you will analyze the causal effect of harsher DUI (driving under influence) punishments on recidivism (repeat offenses) using a Regression Discontinuity Design.

Set up

In many U.S. states, drivers arrested for driving under the influence (DUI) face different penalties depending on their blood alcohol content (BAC) at the time of arrest. Specifically:

• Drivers with BAC ≥ 0.08 face stricter punishments: higher fines, longer license suspensions, mandatory jail time, and a permanent criminal record.

• Drivers with BAC < 0.08 face lighter punishments: smaller fines, shorter suspensions, and often no jail time.

This sharp cutoff at BAC = 0.08 creates a natural experiment. Drivers just above and just below the threshold are likely very similar in terms of their drinking behavior, demographic characteristics, and driving patterns. The only difference is that those just above 0.08 receive much harsher punishment.

You will use this discontinuity to estimate whether harsher DUI penalties reduce the likelihood that offenders commit another DUI offense within the next 4 years (recidivism).

The Data

The dataset hansen_dwi.dta contains information on DUI offenders in Washington State from 1999 to 2007. The key variables are:

• bac1: Blood alcohol content (BAC) at the time of arrest (the running variable)

• recidivism: Indicator for whether the individual was arrested for DUI again within 4 years (the outcome variable)

• male: Indicator for male

• white: Indicator for white

• aged: Age at the time of arrest

• acc: Indicator for whether the arrest involved an accident

1 Let’s start by thinking about why it’s hard to estimate the causal effect of the DUI penalty.

(a) Why would it be problematic to simply compare recidivism rates between all offenders with BAC ≥ 0.08 and all offenders with BAC < 0.08 (so, even individuals far away from the cutoff)? Explain in 2-3 sentences, being specific about the source(s) of bias and using the data to support your statement.

(b) Explain the key assumption that allows RDD to identify the causal effect of harsher punishment in this context. What must be true about offenders just above vs. just below the BAC cutoff of 0.08? Answer in 2-3 sentences.

(c) Is this a Sharp RDD or Fuzzy RDD design? Explain your reasoning based on the treatment assignment rule. Answer in 2-3 sentences.
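For part (a), one way to "use the data" is to compare average characteristics of offenders on either side of 0.08 in the full sample. The lines below are only a sketch: the file name hansen_dwi.dta assumes the spaces in the extracted file name stand for underscores, and dui_all is a hypothetical variable name introduced here for illustration.

```stata
* Sketch: compare group means across the 0.08 cutoff using ALL observations.
* Assumption: the data file is named hansen_dwi.dta and sits in the working directory.
use "hansen_dwi.dta", clear

* dui_all is a hypothetical name: 1 if BAC >= 0.08, 0 otherwise, missing if bac1 is missing
generate dui_all = (bac1 >= 0.08) if !missing(bac1)

* Mean outcome and covariates by side of the cutoff
tabstat recidivism male white aged acc, by(dui_all) statistics(mean)
```

Large gaps in covariate means between the two groups would indicate that offenders far from the cutoff differ in ways other than the punishment they received.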

2 Let’s now start with our regression analysis. The simplest RDD specification estimates the following regression:

$\text{recidivism}_i = \beta_0 + \beta_1\,\text{dui}_i + u_i \qquad (1)$

where $\text{dui}_i = 1$ if BAC ≥ 0.08, and 0 otherwise.
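Because the only regressor in (1) is a dummy, the OLS estimates have a simple reading in terms of group means. The notation $\bar{y}_1$ and $\bar{y}_0$ (sample mean recidivism for dui = 1 and dui = 0) is introduced here for illustration and does not appear in the original problem set.

```latex
\begin{align*}
E[\text{recidivism}_i \mid \text{dui}_i = 0] &= \beta_0 \\
E[\text{recidivism}_i \mid \text{dui}_i = 1] &= \beta_0 + \beta_1 \\
\hat{\beta}_1 &= \bar{y}_1 - \bar{y}_0
\end{align*}
```

In other words, regression (1) reproduces the raw difference in recidivism rates between the two groups within the analysis sample.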

(a) Load the dataset in Stata. Keep only observations where BAC is within 0.05 of the cutoff (i.e., BAC between 0.03 and 0.13). This is your analysis sample. How many observations remain in your sample?

Hint: Use keep if bac1 >= 0.03 & bac1 <= 0.13

(b) Generate the dui dummy variable.

(c) Run regression 1 in Stata.

Report the coefficient on dui and its standard error.

Coefficient: Standard Error:

(d) Is this coefficient statistically significant at the 5% level? State your null hypothesis, alternative hypothesis, and decision rule clearly.

H0:

H1:
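The steps in parts (a) through (d) could be laid out in a do-file roughly as follows. This is a sketch under assumptions: the file name hansen_dwi.dta is inferred from the text, and default (non-robust) standard errors are used because the question does not specify otherwise.

```stata
* Question I.2: simple comparison around the cutoff (sketch)
* Assumption: the data file is named hansen_dwi.dta
use "hansen_dwi.dta", clear

* (a) Keep observations within 0.05 of the cutoff, then count what remains
keep if bac1 >= 0.03 & bac1 <= 0.13
count

* (b) Treatment dummy: 1 at or above the 0.08 cutoff, 0 below
* Caveat: if bac1 is stored as float, comparing with float(0.08) avoids
* precision surprises for observations recorded at exactly 0.08
generate dui = (bac1 >= 0.08)

* (c) Regression (1)
regress recidivism dui

* (d) t statistic for H0: beta1 = 0, and the two-sided 5% critical value
display _b[dui] / _se[dui]
display invttail(e(df_r), 0.025)
```

The decision rule compares the absolute value of the first displayed number with the second. If the course uses heteroskedasticity-robust standard errors, adding the option vce(robust) to regress is the usual adjustment.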

3 Now estimate a more flexible RDD specification that controls for the running variable (BAC).

$\text{recidivism}_i = \beta_0 + \beta_1\,\text{dui}_i + \beta_2\,\text{bac\_centered}_i + u_i \qquad (2)$

where $\text{bac\_centered}_i = \text{bac1}_i - 0.08$ (BAC centered at the threshold).

(a) First, create the centered BAC variable in Stata. What is the mean of bac_centered?

(b) Run the regression. Report the coefficient on dui and its standard error:

Coefficient: Standard Error:

(c) Has your estimate changed compared to Question 2? If so, why? Explain in 2-3 sentences what role the bac_centered variable plays.
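A minimal sketch of the Stata steps for this question is given below. It rebuilds the analysis sample from scratch so that it runs on its own, and it assumes the file name hansen_dwi.dta and the variable name bac_centered (underscores inferred from the text).

```stata
* Question I.3: RDD controlling linearly for the running variable (sketch)
use "hansen_dwi.dta", clear
keep if bac1 >= 0.03 & bac1 <= 0.13
generate dui = (bac1 >= 0.08)

* (a) Center BAC at the threshold and report its mean
generate bac_centered = bac1 - 0.08
summarize bac_centered

* (b) Regression (2): the coefficient on dui is the estimated jump at the cutoff
regress recidivism dui bac_centered
```

Centering does not change the slope on BAC; it only moves the point at which the intercept and the dui coefficient are evaluated to BAC = 0.08.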

4 [Optional question, not graded] Now estimate an RDD specification that allows for different slopes on either side of the cutoff:

$\text{recidivism}_i = \beta_0 + \beta_1\,\text{dui}_i + \beta_2\,\text{bac\_centered}_i + \beta_3\,\text{duiXbac\_centered}_i + u_i \qquad (3)$

where duiXbac_centered is the interaction between dui and the centered BAC variable.
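Writing out the regression line implied by (3) on each side of the cutoff may help when interpreting the coefficients. Here $x$ is shorthand for the centered BAC value, introduced for illustration only, and the expressions assume the error term has conditional mean zero.

```latex
\begin{align*}
\text{Below the cutoff } (\text{dui}_i = 0):\quad
  & E[\text{recidivism}_i \mid x] = \beta_0 + \beta_2\,x \\
\text{Above the cutoff } (\text{dui}_i = 1):\quad
  & E[\text{recidivism}_i \mid x] = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,x \\
\text{Gap at } x = 0:\quad
  & (\beta_0 + \beta_1) - \beta_0 = \beta_1
\end{align*}
```

So $\beta_3$ is the difference in slopes between the two sides, while $\beta_1$ remains the jump at the threshold because the running variable is centered there.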

(a) Generate the interaction term and run this regression. Report the coefficient on dui and its standard error:

Coefficient: Standard Error:

(b) Provide an interpretation for the coefficient β3. Can you reject the null hypothesis that the coefficient is 0 at the 5% level? What can you conclude? Use 2-3 sentences.

Hint: Review Lecture 12 (dummy X continuous variable interaction).
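One possible way to build the interaction and run regression (3) is sketched below. The file name hansen_dwi.dta and the underscore in duiXbac_centered are assumptions based on the text.

```stata
* Question I.4: separate slopes on each side of the cutoff (sketch)
use "hansen_dwi.dta", clear
keep if bac1 >= 0.03 & bac1 <= 0.13
generate dui = (bac1 >= 0.08)
generate bac_centered = bac1 - 0.08

* (a) Interaction between the treatment dummy and centered BAC
generate duiXbac_centered = dui * bac_centered
regress recidivism dui bac_centered duiXbac_centered

* (b) Test H0: beta3 = 0 (equal slopes on both sides)
test duiXbac_centered
```

The same model can also be written with factor-variable notation, regress recidivism i.dui##c.bac_centered, which avoids generating the interaction by hand.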

5 Now we are going to test whether the RDD assumptions are satisfied. A key assumption in RDD is that individuals just above and below the cutoff should be similar in terms of observable characteristics (other than the treatment). In other words, we want balance on covariates.

(a) For this exercise, we want to test whether there is a discontinuity in male (gender) at the threshold by running:

$\text{male}_i = \beta_0 + \beta_1\,\text{dui}_i + \beta_2\,\text{bac\_centered}_i + u_i \qquad (4)$

Report the coefficient on dui and its p-value:

Coefficient: p-value:

(b) Is there evidence of a discontinuity in gender at the threshold? If this result holds for other covariates, what does it suggest about the validity of the RDD design? Explain in 2-3 sentences.
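The balance check in (a) and its extension to the other covariates mentioned in (b) could be sketched as a short loop. Looping over white, aged, and acc goes beyond what the question requires and is shown only as an illustration; the file and variable names with underscores are assumptions.

```stata
* Question I.5: covariate balance at the cutoff (sketch)
use "hansen_dwi.dta", clear
keep if bac1 >= 0.03 & bac1 <= 0.13
generate dui = (bac1 >= 0.08)
generate bac_centered = bac1 - 0.08

* Regression (4) for male, then the same specification for the other covariates
foreach v of varlist male white aged acc {
    display "Outcome: `v'"
    regress `v' dui bac_centered
}
```

In each regression, the coefficient on dui and its p-value indicate whether that covariate jumps at the threshold.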

6 Another key concern in RDD is whether individuals can manipulate the running variable to get on one side of the threshold or the other.

(a) In this context, why might we be worried about manipulation of BAC levels? Give one specific example of how manipulation could occur.

(b) Create a histogram of the BAC variable (bac1) using narrow bins (e.g., 0.002 width). You can use the following Stata command:

histogram bac1, width(0.002) xline(0.08)

Based on the histogram, do you see any evidence of unusual bunching or gaps around the 0.08 threshold that would suggest manipulation? Explain what you observe in 2-3 sentences.
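As a hedged variation on the command above, the sketch below plots frequencies instead of densities, saves the figure for the write-up, and then zooms in on a window around the threshold. The output file name bac1_hist.png and the 0.06 to 0.10 window are arbitrary choices made here for illustration.

```stata
* Question I.6(b): density of the running variable (sketch)
* Assumption: the data file is named hansen_dwi.dta
use "hansen_dwi.dta", clear

* Full-sample histogram with bin counts on the vertical axis
histogram bac1, width(0.002) frequency xline(0.08)
graph export "bac1_hist.png", replace

* Closer look at the bins around the 0.08 threshold
histogram bac1 if inrange(bac1, 0.06, 0.10), width(0.002) frequency xline(0.08)
```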

Optional Question (Not Graded) In a more advanced RDD analysis, researchers often test the validity of their design by examining whether there are discontinuities at placebo cutoffs (fake thresholds where there should be no effect).

Choose a placebo cutoff at BAC = 0.10 (above the real threshold) and re-run your RDD regression from question 3, but this time:

• Define a new treatment indicator: placebo_dui = 1 if BAC ≥ 0.10, 0 otherwise

• Use a new centered variable: bac_placebo_centered = bac1 - 0.10

• Restrict your sample to BAC between 0.07 and 0.13

Report the coefficient on placebo_dui. Is it statistically significant? What does this placebo test tell you about the credibility of your main RDD results?
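The three bullet points above translate fairly directly into Stata. The sketch below assumes the file name hansen_dwi.dta and the variable names placebo_dui and bac_placebo_centered (underscores inferred from the text).

```stata
* Placebo cutoff at BAC = 0.10 (sketch)
use "hansen_dwi.dta", clear

* Restrict the sample to the window given in the question
keep if bac1 >= 0.07 & bac1 <= 0.13

* Placebo treatment indicator and BAC centered at the placebo cutoff
generate placebo_dui = (bac1 >= 0.10)
generate bac_placebo_centered = bac1 - 0.10

* Same specification as regression (2), with the placebo variables
regress recidivism placebo_dui bac_placebo_centered
```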

Question II: Diff-in-Diffs

The Set-up

For this problem, we return to the setting of Problem Set 4: estimating the treatment effect of a mentoring program in high school. We continue to assume that there is self-selection into the mentoring program, that is, students voluntarily choose whether to enroll.

In Problem Set 4 you already verified that students who selected into the program have, on average, a lower GPA than those who did not. You also saw that this generates a bias when we try to estimate the treatment effect of the program by comparing students who enrolled with those who did not.

Now you are going to use a Diff-in-Diffs estimator to estimate the treatment effect of the program, where you exploit the panel structure of your data.

The Data

The dataset mentoring_data_panel.dta contains the following variables:

• student_id: Student identifier

• time: time variable = 0 if before treatment, = 1 if after treatment

• income: parental income

• parent_educ: parental education

• treat: dummy = 1 if the student self-selected into the mentoring program

• gpa: student’s GPA

1 We saw that we can compute the Diff-in-Diff estimator by running the following regression

$\text{gpa}_{it} = \beta_0 + \beta_1\,\text{post}_t + \beta_2\,\text{treat}_i + \beta_3\,\text{treatXpost}_{it} + u_{it} \qquad (5)$

where post = 1 if time = 1 (and 0 otherwise), and treatXpost = 1 if treat = 1 AND post = 1 (and 0 otherwise).
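Taking treatXpost to be the product of treat and post, regression (5) has exactly four group-by-period cells, and writing out the expected GPA in each may help with the interpretation. The expressions below assume the error term has mean zero within each cell.

```latex
\begin{align*}
E[\text{gpa}_{it} \mid \text{treat}_i = 0,\ \text{post}_t = 0] &= \beta_0 \\
E[\text{gpa}_{it} \mid \text{treat}_i = 0,\ \text{post}_t = 1] &= \beta_0 + \beta_1 \\
E[\text{gpa}_{it} \mid \text{treat}_i = 1,\ \text{post}_t = 0] &= \beta_0 + \beta_2 \\
E[\text{gpa}_{it} \mid \text{treat}_i = 1,\ \text{post}_t = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[4pt]
\beta_3 &= \big[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2)\big]
         - \big[(\beta_0 + \beta_1) - \beta_0\big]
\end{align*}
```

The last line is the change over time for enrolled students minus the change over time for non-enrolled students, which is the difference-in-differences.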

(a) Using 1-2 sentences each, please provide an interpretation for each coefficient.

(b) Run regression 5 and report the coefficients and p-values.

Note: You need to generate the variables post and treatXpost.

β1: p-value:

β2: p-value:

β3: p-value:

(c) [For AEM 5111 only] Based on the estimates you got in point (b), do you think that there’s a time trend in GPA? Explain in 2-3 sentences.
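A minimal do-file sketch for part (b) follows. The file name mentoring_data_panel.dta assumes the spaces in the extracted name stand for underscores, and default standard errors are used because the question does not specify otherwise.

```stata
* Question II.1: difference-in-differences regression (sketch)
* Assumption: the data file is named mentoring_data_panel.dta
use "mentoring_data_panel.dta", clear

* post = 1 in the after-treatment period; treatXpost = 1 only for enrolled students after treatment
generate post = (time == 1)
generate treatXpost = treat * post

* Regression (5)
regress gpa post treat treatXpost

* Cross-check: mean GPA in each of the four treat-by-post cells
tabulate treat post, summarize(gpa) means
```

The coefficient on treatXpost should match the double difference of the four cell means. Since each student appears in both periods, the option vce(cluster student_id) is a common way to account for within-student correlation, if the course calls for it.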

2 Let’s now think about the assumption we need to make to claim causality.

(a) Which assumption do you need to make in order to interpret the Diff-in-diff estimator as the causal effect of the program on GPA? Using 2-3 sentences, discuss the assumption and describe which data you would need to test it.

(b) Can you think of a case when the assumption in (a) may be violated? Discuss using 2-3 sentences.

(c) Optional (Not Graded) You find that the grades of the students in the treated group are trending downward, whereas the GPA of the other students is approximately stable.

• Why would this be a problem for your estimation of β3?

• Under this scenario, how would your estimate of β3 compare to the one you estimated in the previous part?

Hint: You can use a figure to help your reasoning.
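To complement a figure, the assumption discussed in Question 2 can be written in potential-outcomes notation. This notation is an assumption introduced here and is not used in the original problem set: $Y_{it}(0)$ denotes the GPA student $i$ would have in period $t$ without mentoring, and $Y_{it}(1)$ the GPA with mentoring.

```latex
% Parallel trends: absent the program, average GPA would have changed
% by the same amount in both groups
\begin{equation*}
E[\,Y_{i1}(0) - Y_{i0}(0) \mid \text{treat}_i = 1\,]
  = E[\,Y_{i1}(0) - Y_{i0}(0) \mid \text{treat}_i = 0\,]
\end{equation*}

% Without that assumption, the diff-in-diff estimand is the effect on the
% treated plus the gap in untreated trends between the two groups
\begin{equation*}
\beta_3 = E[\,Y_{i1}(1) - Y_{i1}(0) \mid \text{treat}_i = 1\,]
  + \Big( E[\,Y_{i1}(0) - Y_{i0}(0) \mid \text{treat}_i = 1\,]
        - E[\,Y_{i1}(0) - Y_{i0}(0) \mid \text{treat}_i = 0\,] \Big)
\end{equation*}
```

The term in large parentheses is zero under parallel trends; its sign under the scenario in (c) determines the direction of the bias in the estimate of β3.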