
2020.07.27

Problem 1
1) If a ray of light passes from air into glass, the angles of incidence i and refraction r are defined as follows:

They are related by Snell’s law, sin(i) = n*sin(r), where n is the refractive index of glass. Thus, if you measure the angles i and r, you can calculate the refractive index n as n = sin(i)/sin(r).

1. What is the uncertainty in the answer, δn, given uncertainties in i (δi) and r (δr)? Provide an analytical solution.
2. Suppose we measure the angle r for several values of i and get the results shown in the table below. Calculate the relative error in sin(i), sin(r) and n.
i (deg)    r (deg)
10 ± 1     7 ± 1
20 ± 1     13 ± 1
30 ± 1     20 ± 1
50 ± 1     29 ± 1
70 ± 1     38 ± 1
3. Comment on how the size of the error changes with the incident angle. Is there an optimal angle at which to perform this measurement that minimizes the error?
4. The refractive index, n, depends on the wavelength of light, but not on the angle of incidence. Calculate the value of n for each individual measurement (different incident angle) with the associated absolute error for both measurements.
5. What single value of n would you report to a customer of the material for that wavelength if you were the vendor? What error is appropriate?
6. What would you do to improve on (i.e. reduce) the error in the determined value of n?
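
The error propagation asked for above can be sketched numerically as follows. This is a minimal sketch, not a definitive solution: it assumes the errors in i and r are independent and combine in quadrature, which gives δn/n = sqrt((cot(i)·δi)^2 + (cot(r)·δr)^2) with angles in radians. If the course uses a different combination rule (for example a linear sum), the last step changes. The function name refractive_index is illustrative only.

```python
import math

def refractive_index(i_deg, r_deg, di_deg=1.0, dr_deg=1.0):
    """Return (n, rel. error in sin i, rel. error in sin r, absolute error in n).

    Assumption: independent errors in i and r, combined in quadrature.
    """
    i, r = math.radians(i_deg), math.radians(r_deg)
    di, dr = math.radians(di_deg), math.radians(dr_deg)

    n = math.sin(i) / math.sin(r)
    # d(sin x) = cos(x) dx, so the relative error in sin x is cot(x) * dx
    rel_sin_i = di / math.tan(i)
    rel_sin_r = dr / math.tan(r)
    # For a quotient, relative errors add in quadrature (assumed rule)
    rel_n = math.hypot(rel_sin_i, rel_sin_r)
    return n, rel_sin_i, rel_sin_r, n * rel_n

# (i, r) pairs in degrees, taken from the table in the problem statement
measurements = [(10, 7), (20, 13), (30, 20), (50, 29), (70, 38)]

for i_deg, r_deg in measurements:
    n, rel_i, rel_r, dn = refractive_index(i_deg, r_deg)
    print(f"i={i_deg:2d} r={r_deg:2d}  "
          f"rel(sin i)={rel_i:.3f}  rel(sin r)={rel_r:.3f}  "
          f"n={n:.3f} +/- {dn:.3f}")
```

Because cot(x) falls as x grows, the relative errors in this sketch shrink at larger angles, which is the trend the question about the incident angle asks you to comment on.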

Problem 2
2) A dataset is provided in “MAT395-495-HW3-Distributions1.xlsx”. The data is the result of an X-ray 2θ scan, from which the peak value can be used to determine the lattice constant of the material. Assume μ = 28.5 for the parent population (i.e. you already know the lattice constant, and hence the corresponding 2θ, because someone else has measured it with a high degree of accuracy). Import this dataset into a Jupyter Notebook. Make sure to use native Python code to import the data directly into your script. Do not copy-paste the numbers into your code.
1. Plot the data as a scatter plot. Which distribution does this data set visually seem to follow most closely?
2. Determine the mean, median, root-mean-square, average deviation, variance and standard deviation. Perform the calculation using the equations provided in class (perform the summation using Python).
3. Use native Python function calls to do the previous calculations for you automatically. You need to determine which package you want to use to perform the statistical function calculations for you. Justify why you chose the package/functions you did and why you think they are reliable.
4. In your scatter plot, find a way to appropriately and effectively visually present the information you calculated: mean, median, root-mean-square, average deviation, variance and standard deviation. Make sure you properly label the axes and make an appropriate legend.
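
The summation-based statistics and the library cross-check can be sketched as below. The class equations are not reproduced in this assignment text, so the definitions here are assumptions: average deviation is taken as the mean absolute deviation, and the variance divides by N when the parent mean μ is supplied and by N − 1 when the sample mean is used. The list sample is a made-up stand-in, not the homework data, and summary_by_summation is a hypothetical helper name. Only the standard library is used; loading the .xlsx file is left out.

```python
import math
import statistics

def summary_by_summation(x, mu=None):
    """Compute basic statistics with explicit sums.

    Assumed definitions (check against the class notes):
    - deviations are taken about mu if given, else about the sample mean
    - variance divides by N if mu is given, else by N - 1
    """
    n = len(x)
    mean = sum(x) / n
    ordered = sorted(x)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    rms = math.sqrt(sum(v * v for v in x) / n)

    center = mean if mu is None else mu
    divisor = n - 1 if mu is None else n
    avg_dev = sum(abs(v - center) for v in x) / n
    variance = sum((v - center) ** 2 for v in x) / divisor
    return {"mean": mean, "median": median, "rms": rms,
            "avg_dev": avg_dev, "variance": variance,
            "std": math.sqrt(variance)}

# Made-up stand-in values, NOT the data from the homework file
sample = [28.1, 28.4, 28.5, 28.6, 28.9]

by_hand = summary_by_summation(sample)
by_library = {"mean": statistics.mean(sample),
              "median": statistics.median(sample),
              "variance": statistics.variance(sample),  # N - 1 divisor
              "std": statistics.stdev(sample)}

for key, value in by_library.items():
    print(f"{key:9s} by hand: {by_hand[key]:.4f}   library: {value:.4f}")
```

Passing mu=28.5 switches the sketch to deviations about the known parent mean, which corresponds to statistics.pvariance(sample, mu=28.5) in the standard library.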

Problem 3
3) A dataset is provided in “MAT395-495-HW3-Distributions2.xlsx”. This dataset is the result of measuring the outcome of an experiment 400 times, with each run independent of the others. You also know, based on the experimental setup, that the outcomes must follow the binomial distribution with n = 100 and p = 0.3.
1. Use Python to calculate the histogram for this data. Use whole integers as the bins. Plot the resulting data as a probability function (the y-axis is the probability of obtaining the corresponding value indicated on the x-axis).
2. Calculate the parent binomial probability distribution and overlay it on your previous plot.
3. Calculate the mean and standard deviation of the data set using Python functions on the sample distribution and on the parent distribution.
4. Calculate the error in your experimentally determined mean and standard deviation relative to the real values (parent distribution).
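
The histogram, parent distribution and error comparison can be sketched as follows. This is a minimal sketch under stated assumptions: the 400 outcomes are simulated with a seeded random generator as a stand-in for the homework file, the helper names binomial_pmf and empirical_pmf are illustrative, and plotting is omitted. The parent mean and standard deviation use the standard binomial results np and sqrt(np(1 − p)).

```python
import math
import random
import statistics
from collections import Counter

N_TRIALS, P = 100, 0.3  # parent binomial parameters from the problem statement

def binomial_pmf(k, n=N_TRIALS, p=P):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def empirical_pmf(outcomes):
    """Histogram with integer bins, normalized so the values sum to 1."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {k: counts[k] / total for k in sorted(counts)}

# Simulated stand-in for the 400 measured outcomes, NOT the homework data
rng = random.Random(0)
outcomes = [sum(rng.random() < P for _ in range(N_TRIALS)) for _ in range(400)]

observed = empirical_pmf(outcomes)
parent = {k: binomial_pmf(k) for k in range(N_TRIALS + 1)}

sample_mean = statistics.mean(outcomes)
sample_std = statistics.stdev(outcomes)
parent_mean = N_TRIALS * P
parent_std = math.sqrt(N_TRIALS * P * (1 - P))

print(f"mean: sample {sample_mean:.3f}, parent {parent_mean:.3f}, "
      f"error {sample_mean - parent_mean:+.3f}")
print(f"std:  sample {sample_std:.3f}, parent {parent_std:.3f}, "
      f"error {sample_std - parent_std:+.3f}")
```

The dictionaries observed and parent share the same integer keys, so they can be drawn on one set of axes, for example as bars for the data and points for the parent distribution.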

Problem 4

4) Let’s empirically analyze the impact of the number of data points on the determination of the peak and standard deviation values for a Gaussian peak. To that end, we will use the dataset “MAT395-495-HW3-Distributions3.xlsx”.
1. Plot the data as a scatter plot. You can think of the data as an intensity vs. something plot (e.g. angle for X-ray, energy for spectroscopy measurements, etc.).
2. Using the complete data set, determine the mean, standard deviation, and full width at half max. Plot the Gaussian curve using those parameters as a continuous function on top of the discrete data set plot of part 1.
3. Remove every other data point from the previous data set (do not do this manually, but rather find a way to do this efficiently using a line or two of Python code or, better yet, a function or a loop) and determine the mean, standard deviation, and full width at half max. Make a plot for this data set and the corresponding Gaussian curve you determined.
4. Repeat part 3 until you do not have any data points left (the dataset size is 2^12 = 4096, so the set can be halved 12 times before only one data point remains). By deleting values you are in effect simulating a run on the same characterization tool with a larger step size during your measurement, i.e. instead of taking a measurement every 0.001 degrees, you are taking one every 0.002 degrees.
5. Make three plots: plot your determined values for the mean, standard deviation, and full width at half max as a function of the number of data points in the set, respectively. What do you observe? Is there a benefit in having a very nice-looking data set with very small step sizes in between measurements if you know the peak follows a Gaussian, Lorentzian or Voigt distribution?
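
The repeated halving can be sketched with slice notation inside a loop. This is a sketch under several assumptions, not the intended solution: the peak parameters are estimated as intensity-weighted moments of the curve (a least-squares fit is an equally valid choice and is more robust to noise or a background), the full width at half max is taken from the Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ, and the data are a noiseless synthetic Gaussian standing in for the homework file. The names gaussian_moments and halving_study, and the values TRUE_MEAN and TRUE_SIGMA, are illustrative only.

```python
import math

FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))  # FWHM = factor * sigma

def gaussian_moments(x, y):
    """Intensity-weighted mean, standard deviation and FWHM of a peak."""
    total = sum(y)
    mean = sum(xi * yi for xi, yi in zip(x, y)) / total
    var = sum(yi * (xi - mean) ** 2 for xi, yi in zip(x, y)) / total
    sigma = math.sqrt(var)
    return mean, sigma, FWHM_FACTOR * sigma

def halving_study(x, y):
    """Record (n_points, mean, sigma, fwhm), dropping every other point each pass."""
    results = []
    while True:
        results.append((len(x), *gaussian_moments(x, y)))
        if len(x) == 1:
            break
        x, y = x[::2], y[::2]  # keep every second point: doubles the step size
    return results

# Synthetic stand-in: noiseless Gaussian on 2**12 points, step 0.001
TRUE_MEAN, TRUE_SIGMA = 28.5, 0.2  # assumed values for illustration only
x = [26.5 + 0.001 * k for k in range(2**12)]
y = [math.exp(-0.5 * ((xi - TRUE_MEAN) / TRUE_SIGMA) ** 2) for xi in x]

results = halving_study(x, y)
for n_points, mean, sigma, fwhm in results:
    print(f"{n_points:5d} points: mean={mean:.4f} sigma={sigma:.4f} fwhm={fwhm:.4f}")
```

Each row of results supplies one point for the three requested plots of mean, standard deviation and full width at half max against the number of data points.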