Probability and Distribution Theory (PDT)

Semester 1 2021

Assignment 1

Submission due date: by midnight on Tuesday 4 May 2021

Note that the due date has been extended by 2 days from that stated in the PDT Study Guide.

Marks

This assignment contains 4 questions and you should attempt all parts of all questions. There are 35 marks in total for the assignment and it counts for 35% of your overall grade for PDT.

Use of Wolfram Alpha

If you use Wolfram Alpha for any integration or differentiation you still need to show all the steps in the derivation of the result. The exception is where questions indicate that you should use that software.

Submitting your assignment on Canvas

All assignments must be submitted through the submission link on Canvas as a single PDF file (a Word file may not be handled as well by the Canvas system). To complete a submission you will acknowledge compliance with the Academic Honesty policy and procedures as explained in Canvas. Submission of your work is through the “Turnitin” software, which is plagiarism detection software and is mandated for all BCA units.

Conventions

Probability density functions (pdf) are represented by $f(\cdot)$ and cumulative distribution functions (cdf) are represented by $F(\cdot)$.

Question 1 (8 marks)

Please answer the following three parts:

(a) 3 marks
If $X$ has an exponential distribution and if $P(X \le 1) = P(X > 1)$, what is the value of $\mathrm{Var}(X)$?

(b) 1 mark
What name do we give to the value $X = 1$ in this case?

(c) 4 marks
Suppose $X$ follows a Normal distribution with mean $\mu$ (with $\mu > 0$) and a variance $\sigma^2$ that is some function of the mean, i.e. $\sigma^2 = h(\mu)$, for some function $h(\cdot)$. Find $h(\mu)$ such that $P(X \le 0)$ does not depend on $\mu$. [Hint: Transform $X$ to the standard normal distribution.]

Question 2 (9 marks)

The following table lists the prevalence of 7 diagnoses in patients suspected of having congenital heart disease. The table uses the following abbreviations: ventricular septal defect (VSD), atrial septal defect (ASD), pulmonary stenosis (PS), pulmonary hypertension (PH).
The probability of having chest pain as a symptom given each diagnosis is presented in the third column. So, for example, the last line in the table indicates that the prevalence of VSD with PH is 0.126, and P(Chest pain | VSD with PH) = 0.10.

| Diagnosis | Prevalence | Pr(Chest pain given diagnosis) |
|---|---|---|
| Normal | 0.155 | 0.05 |
| ASD without PS or PH | 0.126 | 0.02 |
| VSD with valvular PS | 0.084 | 0.05 |
| Isolated PH | 0.020 | 0.10 |
| Transposed great vessels | 0.098 | 0.01 |
| VSD without PH | 0.391 | 0.01 |
| VSD with PH | 0.126 | 0.10 |

(a) 2 marks
Calculate the probability of a “VSD without PH” diagnosis for a patient who has chest pain.

(b) 7 marks (total)
A new diagnostic test has been proposed for “Isolated PH” that has sensitivity 75%, specificity 90%, and costs \$20. A cost-effectiveness calculation for this test needs to be conducted because its use has been proposed for screening of an entire adult population in which 1% of people would be suspected of having congenital heart disease. (Assume that all cases of “Isolated PH” are in people who would be suspected of having congenital heart disease.)

In addition to the cost of the test there is:

1) a \$100 cost for further tests in any person whose screening test is positive but who does not have “Isolated PH”. [A positive result leads such a person to present for further tests, which rule out Isolated PH.]

2) an average \$20,000 cost of further tests and treatment in any person whose screening test is positive and who is confirmed to have “Isolated PH”. [A positive result leads such a person to present for further tests, which confirm Isolated PH, and the person is then treated early for Isolated PH.]

3) an average \$150,000 cost of treatment in any person whose screening test is negative but who has “Isolated PH” that is later diagnosed. [A negative screening result means that such a person is not diagnosed with Isolated PH until late in the disease course and requires intensive, expensive treatment for an extended duration.]
(i) 1 mark
Calculate the prevalence of “Isolated PH” in this adult population.

(ii) 4 marks
Calculate the expected cost per person if this new screening test were used in the population. [Hint: Define a random variable $Y$ to be the cost for an individual and list the values $Y$ can take with their corresponding probabilities.]

(iii) 2 marks
Calculate the standard deviation of cost per person if this new screening test were used in the population.

In your calculations, include the costs of screening and of subsequent tests and treatment, and assume that the prevalences of the various diagnoses given in the table above apply here.

Question 3 (8 marks)

Consider arrivals at a particular emergency department. It is thought that the time between the arrivals of patients at this emergency department, as measured in minutes, can be modelled using an exponential distribution with parameter $\beta = 5$ minutes.

(a) 1 mark
Let $S$ be the time in hours between arrivals. Derive the probability density function of $S$.

(b) 1 mark
What is the probability that the time between two arrivals in the department is greater than half an hour?

It can be shown that if the number of minutes between arrivals does have an exponential distribution with parameter $\beta = 5$ minutes, then the number of arrivals per hour at the emergency department has a Poisson distribution with an expected value of 12 arrivals per hour.

(c) 1 mark
Simulate 100 hours’ worth of arrivals for this emergency department. (Hint: to simulate the number of arrivals in 1 hour for this emergency department, in Stata use rpoisson(12), or in R use rpois(1,12).) What is the mean number of arrivals across those 100 one-hour periods? What is the variance? (Note: be sure to set a random seed so that you can replicate your simulation. In Stata, use set seed XXXX; in R, use set.seed(XXXX), where XXXX is replaced by a number of your choice.)

(d) 3 marks
Recall that for a random variable $X$ with a Poisson distribution, $E[X] = \mathrm{Var}(X)$, so that $E[X]/\mathrm{Var}(X) = 1$.
Repeat the simulation of the previous part of the question 2000 times (i.e. generate 2000 samples that each consist of 100 hours’ worth of data). For each of these simulations, calculate the mean number of arrivals per hour and the variance, and include a histogram of the mean divided by the variance. What do you notice?

(e) 2 marks
For this emergency department, department records for 100 one-hour periods were investigated, and it was found that on average 12 patients arrived per hour, with a variance of 16. Using the simulation from the previous part, do you think it is likely that the number of arrivals per hour in this emergency department has a Poisson distribution? Why or why not?

HINT: The following Stata or R code may be useful:

Stata:

    * replace XXXX with any number (the starting seed)
    set seed XXXX
    set obs 2000
    gen sampnumber = _n
    expand 100
    gen poissamp = rpoisson(12)
    collapse (mean) my2000means=poissamp (sd) my2000sds=poissamp, by(sampnumber)
    gen my2000vars = my2000sds^2
    gen my2000ratios = my2000means/my2000vars

R:

    # replace XXXX with any number (the starting seed)
    set.seed(XXXX)
    mypoissons <- matrix(data=rpois(2000*100, 12), nrow=2000, ncol=100)
    my2000means <- apply(mypoissons, 1, mean)
    my2000vars <- apply(mypoissons, 1, var)
    my2000ratios <- my2000means/my2000vars

Question 4 (10 marks)

Suppose $X$ follows the distribution $f_X(x) = 2x e^{-x^2}$ with $x > 0$.

(a) 2 marks
Find the pdf, $f_Y(y)$, of the random variable $Y$ where $Y = X^2$. What distribution is this?

(b) 3 marks
Find the pdf, $f_U(u)$, of the random variable $U$ where $U = \frac{Y}{1+Y}$, and $Y$ is the random variable in part (a).

(c) 2 marks
Confirm that $f_U(u)$ from part (b) has the properties of a probability density function.

(d) 3 marks
Draw a random sample of size 10,000 from $f_U(u)$ (from part (b)) and construct a histogram of the values in your random sample. How does your histogram compare to the pdf you derived in part (b) above?

NOTE: In parts (c) and (d) you may use Wolfram Alpha to solve integrals.
Here is some example WolframAlpha code: Suppose $f(u) = 3u^2$, $0 < u < 1$.

To obtain $\int_{u=0}^{1} f(u)\,du$, in WolframAlpha type

    integrate 3*u^2 from u=0 to 1

[answer is 1]

To obtain $\int_{t=0}^{u} f(t)\,dt$, in WolframAlpha type

    integrate 3*t^2 from t=0 to u

[answer is u^3]
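
If you are working in R, the two example integrals above can also be checked numerically. The following is a minimal sketch, assuming base R's `integrate()` function; the names `f`, `F_at`, `total` and `half`, and the evaluation point 0.5, are illustrative choices and not part of the assignment. A numerical check of this kind does not replace showing the steps of a derivation.

```r
# Example density from the WolframAlpha illustration: f(u) = 3u^2 on 0 < u < 1
f <- function(u) 3 * u^2

# Integral over the whole support; this should be close to 1
total <- integrate(f, lower = 0, upper = 1)$value

# Integral from 0 to u, as a function of the upper limit u
# (F_at is a hypothetical helper name used only for this illustration)
F_at <- function(u) integrate(f, lower = 0, upper = u)$value

# Evaluate at the illustrative point u = 0.5; this should be close to 0.5^3
half <- F_at(0.5)

print(total)
print(half)
```

Unlike WolframAlpha, `integrate()` returns a numerical value for a specific upper limit rather than a symbolic expression such as u^3, so it is best used to confirm a result you have already derived by hand.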