
2025.08.08

Probability Component

1 Discrete Probability (Counting)

Consider a group of n people. We want to understand the probabilities of members of the group sharing a birthday. For simplicity, you can assume that there are 365 days in a year (i.e. you can ignore leap days), that birthdays are uniformly distributed over the year (i.e. there is a 1/365 probability that a person’s birthday falls on a specific day), and that birthdays are statistically independent across members of the group.

• Question 0: (not graded; just for fun/to get you thinking) Intuitively, what would you guess is the probability that, among a group of 30 people (n = 30), at least 2 people in the group share the same birthday? How big do you think n needs to be for there to be at least a 50% chance that 2 people in the group share the same birthday? What about 99%?

• Question 1: Write the formula for the probability that, out of the group of n members, at least 2 of them share the same birthday. Plug n = 10, 20, ..., 60 into this formula and report the probability of at least 2 people sharing a birthday for each n. How do these probabilities compare to your guesses in Question 0?

Hint: First write out the probability that all n people have unique birthdays, then take the probability complement.
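As a rough way to check a hand-derived formula, the hint above can be turned into a short Python sketch. It multiplies out the probability that all n birthdays are distinct and then takes the complement. The function name `prob_shared_birthday` and the `days` parameter are illustrative choices, not part of the assignment.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), assuming uniform,
    independent birthdays over `days` equally likely days."""
    if n > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_unique = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_unique *= (days - k) / days
    return 1.0 - p_all_unique


if __name__ == "__main__":
    for n in range(10, 61, 10):
        print(n, round(prob_shared_birthday(n), 4))
```

Running the loop prints one probability per group size, which can be compared against the guesses from Question 0.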

• Question 2: Write the formula for the number of unique pairs of people you can construct from a group of size n (i.e. how many unique subsets of size two there are). Plug n = 10, 20, ..., 60 into your formula. Can this help explain the intuition behind your answer to Question 1?
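One possible way to explore the pair-counting intuition is sketched below. `num_pairs` counts subsets of size two with `math.comb`. `approx_shared_birthday` is a heuristic added here for illustration only: it treats every pair as an independent 1/365 chance of a match, which is not exactly true, so it is an approximation rather than the formula from Question 1.

```python
from math import comb


def num_pairs(n: int) -> int:
    """Number of unordered pairs (subsets of size two) in a group of n."""
    return comb(n, 2)


def approx_shared_birthday(n: int, days: int = 365) -> float:
    """Heuristic only: pretend all pairs are independent, each matching
    with probability 1/days. This is an assumption, not an exact result."""
    return 1.0 - (1.0 - 1.0 / days) ** num_pairs(n)


if __name__ == "__main__":
    for n in range(10, 61, 10):
        print(n, num_pairs(n), round(approx_shared_birthday(n), 4))
```

The number of pairs grows quadratically in n, which is the point the question is driving at.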

And a question not about birthdays:

• Question 3: Suppose your class has 28 people, and the instructor splits you at random into groups of 4 for a final project. You have 3 good friends in the class. What is the probability that all 4 of you get placed together? What about the probability that you get placed in a group with at least 1 of your friends?

Hint: For the purposes of this question, all that matters about the groupings is which people are in your group vs. not in your group. So, you can hold the group you’re in fixed and just think about the problem as counting all the ways you could pick 3 people out of the other 27 people to be in your group, and how many of those possible combinations contain 1, 2, or 3 of your friends.
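The counting argument in the hint can be sketched in Python as follows. The function `prob_k_friends` is a hypothetical helper: it counts the ways to fill the 3 remaining seats in your group with exactly k friends and divides by all ways to pick 3 of the other 27 classmates. Exact fractions are used so that the result can be compared directly with a hand calculation.

```python
from fractions import Fraction
from math import comb


def prob_k_friends(k: int, friends: int = 3, others: int = 27, seats: int = 3) -> Fraction:
    """P(exactly k friends land in your group), where `seats` groupmates
    are drawn at random from `others` classmates, `friends` of whom are friends."""
    ways_with_k = comb(friends, k) * comb(others - friends, seats - k)
    return Fraction(ways_with_k, comb(others, seats))


if __name__ == "__main__":
    p_all = prob_k_friends(3)
    p_at_least_one = 1 - prob_k_friends(0)
    print("all three friends:", p_all)
    print("at least one friend:", p_at_least_one, float(p_at_least_one))
```

Computing "at least one" as the complement of "none" avoids summing the cases k = 1, 2, 3 separately, although both routes should agree.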

2 Joint and Conditional Distributions

• Question 4: In the public policy response to COVID-19, a big concern with virus testing is the possibility of false negatives, i.e. the possibility that someone who has the virus incorrectly tests negative. Some studies suggest that the false negative rate can be as high as 20%, even once a patient has started exhibiting symptoms.

Suppose that we test everyone in a population who is exhibiting symptoms, and that 80% of the people exhibiting symptoms actually have contracted COVID-19. Suppose the false negative rate for the test is 20% (for simplicity we assume the false positive rate is 0%).

If a person tests negative, what is the conditional probability that they actually have COVID-19 despite testing negative?

Suppose we test each person twice, and diagnose them as positive if either test comes up positive (and negative only if both tests come up negative). Assuming the two tests are independent given a person’s true COVID-19 status (i.e. the probability of a false negative is independently 20% for both tests), now what is the conditional probability that a person actually has COVID-19 despite being diagnosed negative?
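A minimal sketch of the Bayes' rule calculation for Question 4 is given below, assuming the setup described above: a person is diagnosed negative only if every test is negative, and tests are independent given true status. The function name and its parameters are illustrative and not part of the assignment.

```python
def prob_infected_given_negative(
    prevalence: float,
    false_neg_rate: float,
    n_tests: int = 1,
    false_pos_rate: float = 0.0,
) -> float:
    """P(infected | all n_tests come back negative) via Bayes' rule,
    assuming tests are independent given the person's true status."""
    # Probability that every test is negative, for each true status.
    p_neg_if_infected = false_neg_rate ** n_tests
    p_neg_if_healthy = (1.0 - false_pos_rate) ** n_tests
    joint_infected_and_neg = prevalence * p_neg_if_infected
    p_neg = joint_infected_and_neg + (1.0 - prevalence) * p_neg_if_healthy
    return joint_infected_and_neg / p_neg


if __name__ == "__main__":
    print("one test: ", prob_infected_given_negative(0.8, 0.2, n_tests=1))
    print("two tests:", prob_infected_given_negative(0.8, 0.2, n_tests=2))
```

Comparing the two printed values shows how much a second independent test reduces the chance of a missed infection under these assumptions.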

Now consider two random variables X and Y, both of which have support on the [0, 1] interval. Their joint probability density function is given by:

$f_{X,Y}(x, y) = 2(1 - x - y + 2xy)$

• Question 5: Verify that X (marginally) follows a uniform distribution by solving for the marginal density of X, $f_X(x) = \int_0^1 f_{X,Y}(x, y)\,dy$. Note that since the density is symmetric (i.e. $f_{X,Y}(x, y) = f_{X,Y}(y, x)$), it follows that Y is also marginally uniform.

• Question 6: What is the conditional expectation function of Y given X = x, that is, $E[Y \mid X = x]$?

Hint: recall that by Bayes’ theorem, $f_{Y|X}(y \mid x) = f_{X,Y}(x, y)/f_X(x)$. Once you have the conditional density of Y given X, you can solve for the expectation based on that density.
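The integrals in Questions 5 and 6 are meant to be solved by hand, but a numerical check can catch algebra slips. The sketch below integrates the joint density over y with a simple midpoint rule. The helper names (`integrate_over_y`, `marginal_x`, `conditional_mean_y`) and the step count are assumptions made for illustration.

```python
def joint_density(x: float, y: float) -> float:
    """Joint density from the problem statement, on the unit square."""
    return 2.0 * (1.0 - x - y + 2.0 * x * y)


def integrate_over_y(g, steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of g(y) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(g((j + 0.5) * h) for j in range(steps))


def marginal_x(x: float) -> float:
    """Numerical marginal density f_X(x)."""
    return integrate_over_y(lambda y: joint_density(x, y))


def conditional_mean_y(x: float) -> float:
    """Numerical E[Y | X = x], using f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)."""
    fx = marginal_x(x)
    return integrate_over_y(lambda y: y * joint_density(x, y) / fx)


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, round(marginal_x(x), 6), round(conditional_mean_y(x), 6))
```

If the marginal column is constant in x and the conditional means match the closed-form answer at each x, the hand derivation is probably right.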

• Question 7: What is the correlation between X and Y?

Hint: recall that Cor(X, Y) = Cov(X, Y)/(SD(X) SD(Y)), and Cov(X, Y) = E[XY] − E[X] E[Y]. You can take as given that uniform distributions on [0, 1] have mean 1/2 and variance 1/12, so you only need to solve for E[XY].
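Following the hint, only E[XY] needs to be computed; the sketch below approximates it with a two-dimensional midpoint rule and then plugs in the given uniform mean and variance. The function names and the grid size are illustrative assumptions, and the result is a numerical approximation meant for checking a hand calculation.

```python
def joint_density(x: float, y: float) -> float:
    """Joint density from the problem statement, on the unit square."""
    return 2.0 * (1.0 - x - y + 2.0 * x * y)


def expected_xy(steps: int = 200) -> float:
    """Midpoint-rule approximation of E[XY] over the unit square."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(steps):
            y = (j + 0.5) * h
            total += x * y * joint_density(x, y)
    return total * h * h


def correlation(steps: int = 200) -> float:
    """Cor(X, Y), taking mean 1/2 and variance 1/12 as given for both margins."""
    mean, variance = 0.5, 1.0 / 12.0
    covariance = expected_xy(steps) - mean * mean
    return covariance / variance  # SD(X) * SD(Y) equals the common variance


if __name__ == "__main__":
    print("E[XY] ~", round(expected_xy(), 5))
    print("Cor(X, Y) ~", round(correlation(), 4))
```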

FYI: this kind of joint distribution, where each dimension is marginally uniform, is called a copula. It is a very useful idea in modeling dependencies between multiple variables, because it turns out that the CDF of any multivariate distribution can be written as a combination of the marginal CDFs of each dimension and the CDF of a copula (this result is known as Sklar’s theorem). So, if we want to model the joint distribution of several variables, instead of proposing the joint distribution explicitly, we can specify a separate marginal (univariate) model for each variable, then specify a copula to model how all the variables are related.
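For reference, Sklar's theorem mentioned above is usually written as follows, where $F$ is the joint CDF, $F_1, \dots, F_d$ are the marginal CDFs, and $C$ is the copula's CDF. The symbol names are the conventional ones and do not appear in the assignment itself.

```latex
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)
```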

3 Means, Variances, and Normal Approximations

Suppose you are an actuary for a car insurance company. Your job is to calculate the expected profit and associated risk for a specific type of insurance policy.

Denote the number of insurance claims that a random customer files in a given year as X, denote the dollar amount paid out for the customer’s i-th insurance claim by Y(i), and denote the total amount paid out to the customer as $Y = \sum_{i=1}^{X} Y(i)$. For instance, if a customer files 3 claims in a year, and the insurance company has to pay out $150, $250, and $200 for each claim respectively, then X = 3, Y(1) = 150, Y(2) = 250, Y(3) = 200, and Y = 600.

Based on historical data, you estimate that the number of claims in a year for a given customer has mean E[X] = 0.3 and standard deviation SD(X) = 2, while the dollar amount paid out per claim has mean E[Y(i)] = 175 and standard deviation SD(Y(i)) = 350.

• Question 8: Assume that all Y(i) in a given year for a given customer are independent and identically distributed (i.i.d.), and are all independent of X. Calculate the mean and standard deviation of Y.

Hint: Use the law of total expectation $E[Y] = E_X\left[E_{Y|X}[Y \mid X]\right]$ and the law of total variance $\mathrm{Var}[Y] = E_X\left[\mathrm{Var}_{Y|X}[Y \mid X]\right] + \mathrm{Var}_X\left[E_{Y|X}[Y \mid X]\right]$.
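One way to apply the two laws in the hint is sketched below. It relies on the stated assumptions that the per-claim payouts are i.i.d. and independent of the claim count, so that conditional on X the total has mean X times the per-claim mean and variance X times the per-claim variance. The function name `compound_sum_moments` is a hypothetical helper, not something defined by the assignment.

```python
from math import sqrt


def compound_sum_moments(mean_count: float, sd_count: float,
                         mean_claim: float, sd_claim: float) -> tuple[float, float]:
    """Mean and SD of a sum of a random number of i.i.d. claims,
    assuming claim sizes are independent of the claim count."""
    # Law of total expectation: E[Y] = E[X] * E[claim].
    mean_total = mean_count * mean_claim
    # Law of total variance:
    #   E[Var(Y | X)] = E[X] * Var(claim)
    #   Var(E[Y | X]) = Var(X) * E[claim]^2
    var_total = mean_count * sd_claim ** 2 + sd_count ** 2 * mean_claim ** 2
    return mean_total, sqrt(var_total)


if __name__ == "__main__":
    mean_y, sd_y = compound_sum_moments(0.3, 2.0, 175.0, 350.0)
    print("E[Y] =", mean_y, " SD(Y) =", round(sd_y, 2))
```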

• Question 9: Suppose the company sells 1,000 such policies in a given year, and charges an annual premium of $70 to each customer. Assuming the customers’ claims are i.i.d., what is the expected total margin (i.e. total premiums minus total payouts, summed across the 1,000 policies), and what is the standard deviation in total margin? You can assume that the payouts are uncorrelated across policies.

• Question 10: Assuming that the central limit theorem holds (i.e. 1,000 policies is large enough that the total margin is approximately normally distributed), what is the probability that the company makes a positive margin on the 1,000 policies?

• Question 11: Does the assumption that the Y(i) are all mutually i.i.d. for a given customer in a given year seem reasonable? What about the assumption that X is independent of the Y(i)? How might these assumptions be violated?
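The portfolio calculation in Questions 9 and 10 can be sketched as below, under the stated assumptions that policies are i.i.d. with uncorrelated payouts and that the total margin is approximately normal. The function name `total_margin_summary` is illustrative, and the numbers in the usage example are made up for demonstration; the per-policy payout mean and standard deviation from Question 8 would be substituted in practice.

```python
from math import sqrt
from statistics import NormalDist


def total_margin_summary(n_policies: int, premium: float,
                         mean_payout: float, sd_payout: float) -> tuple[float, float, float]:
    """Mean, SD, and normal-approximation P(total margin > 0) for a
    portfolio of i.i.d. policies with uncorrelated payouts."""
    # Means add across policies; premiums are fixed, so only payouts vary.
    mean_total = n_policies * (premium - mean_payout)
    # Variances add for uncorrelated payouts, so the SD scales with sqrt(n).
    sd_total = sqrt(n_policies) * sd_payout
    p_positive = 1.0 - NormalDist(mean_total, sd_total).cdf(0.0)
    return mean_total, sd_total, p_positive


if __name__ == "__main__":
    # Made-up inputs for illustration only, not the assignment's values.
    mean_total, sd_total, p_positive = total_margin_summary(100, 60.0, 50.0, 100.0)
    print(mean_total, sd_total, round(p_positive, 4))
```

The standard deviation grows with the square root of the number of policies while the mean grows linearly, which is why pooling more policies raises the probability of a positive margin whenever the per-policy margin is positive.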