Stat 427/527: Homework 9, due Thursday, December 1, 2005

Problem 1

The following data were collected from 48 women who were at least 40 years old when they gave birth to their first child. The data concern the gestation period of that pregnancy and related variables on the child and mother. The columns are, from left to right:

1) ID: case number
2) GEST: the child's gestation period, in weeks
3) SEX: sex of the child (0 = male, 1 = female)
4) BW: birth weight of the child, in grams
5) CIG: number of cigarettes smoked per day by the mother (on average)
6) HT: height of the mother, in cm
7) WT1: weight of the mother at the first prenatal visit, in kg
8) WT2: weight of the mother at the final prenatal visit, in kg

ID  GEST  SEX    BW  CIG     HT   WT1    WT2
 1    36    0  3300    0  160.0  67.3   82.7
 2    38    0  3300   60  167.6  52.7   76.0
 3    38    0  4100   20  167.6  64.2   79.6
 4    38    1  2900   10  163.9  72.7   95.8
 5    39    0  2820    0  161.3  50.0   63.3
 6    39    0  3040    0  158.8  49.1   61.5
 7    39    0  4120    0  160.0  57.7   73.5
 8    39    0  4200    0  174.0  68.0   86.8
 9    39    1  3100    0  171.5  67.3   85.6
10    39    1  3330    0  160.0  74.0   90.5
11    39    1  3410    0  165.1  55.9   70.7
12    39    1  3420    0  162.6  52.3   66.0
13    40    0  2450   20  167.6  61.4   72.5
14    40    0  2885    0  167.7  60.0   78.6
15    40    0  3235    0  170.2  50.0   65.5
16    40    0  3320    0  165.1  63.6   80.2
17    40    0  3600    0  165.1  53.2   68.7
18    40    0  3720    0  165.0  57.7   74.4
19    40    0  3720    0  172.7  61.4   80.0
20    40    0  3820    0  175.3  60.8   78.1
21    40    0  3840    0  167.0  60.5   83.9
22    40    0  3880    0  156.2  57.3   73.7
23    40    0  3960    0  157.5  52.7   68.2
24    40    0  4465    0  157.5  51.4   66.4
25    40    1  2980    0  160.0  47.7   55.2
26    40    1  3040    0  162.0  49.0   60.3
27    40    1  3060   20  157.5  61.0   75.0
28    40    1  3100    0  170.2  55.5   64.6
29    40    1  3120    0  160.3  56.8   75.4
30    40    1  3205    0  172.7  58.2   75.5
31    40    1  3220    0  170.0  64.6   86.0
32    40    1  4100   40  167.0  67.0   85.0
33    41    0  3100    0  168.9  61.4   69.2
34    41    0  3720    0  170.2  57.7   67.7
35    41    0  3720   20  170.2  57.7   80.5
36    41    0  3900    0  167.0  68.0   85.4
37    41    0  3990    0  165.1  52.3   71.2
38    41    0  4050    0  167.6  61.0   78.5
39    41    0  4080    0  162.6  59.1   83.1
40    41    0  4100    0  165.1  60.5   86.5
41    41    0  4460   20  165.1  56.8   88.0
42    41    0  5220    0  157.5  56.8   68.2
43    41    1  3300   40  162.6  74.1   89.7
44    41    1  3400    0  172.7  71.4   87.8
45    41    1  4000    0  165.1  90.0  100.8
46    41    1  4030    0  166.0  63.0   95.3
47    43    1  3220    0  166.4  60.9   72.0
48    43    1  4270    0  162.6  54.5   70.3

1) Plot birth weight (BW) against length of gestation (GEST). Describe the relationship. Judging from the plot, should the sample correlation between BW and GEST be positive, negative, or nearly zero?
2) Compute the Pearson and Spearman correlations between BW and GEST, and comment. Test the hypothesis that the population correlation between BW and GEST is zero, and comment on the tests.
3) Give the equation of the least squares (LS) line for predicting BW from GEST. Test the hypothesis that the slope of the population regression line is zero; this can be thought of as a test of whether GEST is important for explaining the observed variation in BW. Superimpose the LS line on the data plot and comment on whether the simple linear regression model adequately summarizes the relationship between BW and GEST.
4) What percentage (or proportion) of the variability in BW is explained by the linear relationship between BW and GEST?
5) Compute Cook's distance and the studentized residual for each case. Make appropriate residual plots and an index plot of Cook's D to check for inadequacies in the model and for potentially influential cases. Comment on what you find.
6) Calculate a 95% CI for the mean birth weight of all babies with a gestation period of 38 weeks. Also calculate a 95% prediction interval for the birth weight of a single baby born at 38 weeks. Which interval is wider, and why?
7) Carry out any further analyses that you feel are needed.
8) Provide a short summary of your analysis.

--------------------------------------------------------------------------

Problem 2

A study by Lea in 1965 investigated the relationship between mean annual temperature (deg F) in regions of Britain, Norway, and Sweden and the mortality rate from a type of breast cancer in women. The data appear below.
temp  mortality
51.3      102.5
49.9      104.5
50.0      100.4
49.2       95.9
48.5       87.0
47.8       95.0
47.3       88.6
45.1       89.2
46.3       78.9
42.1       84.6
44.2       81.7
43.5       72.2
42.3       65.1
40.2       68.1
31.8       67.3
34.0       52.5

1) Plot the data. Does it appear that the relationship can be adequately modeled by a linear function?
2) Estimate the regression line and add it to your plot.
3) Calculate and interpret R-squared.
4) Interpret the estimated slope coefficient in terms of mortality and temperature.
5) Find a 95% CI for the population slope coefficient.
6) Find a 95% prediction interval for the mortality of a region having a mean annual temperature of 45 deg F.
7) Do a thorough residual analysis using the diagnostics we have studied. Are any of the conclusions above affected by unusual observations?
8) Summarize what you see as the relationship between mortality and temperature, based on these data.
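The quantities requested in Problem 2 (and, with x = GEST and y = BW, in Problem 1) follow from a few closed-form simple linear regression formulas. Below is a minimal standard-library Python sketch, offered only as a way to check hand or software output on the Problem 2 data. It computes the LS intercept and slope, R-squared, a 95% CI for the slope, a 95% CI for the mean response and a 95% prediction interval at temperature 45, and, for each case, the leverage, studentized residuals, and Cook's distance. The helpers fit_slr, slope_ci and interval_at are hypothetical names chosen for illustration, and the t quantile T_975_DF14 is hard-coded from a t table rather than computed. "Studentized residual" is assumed to mean the externally studentized (deleted) residual, which is what R's rstudent() returns; the internally studentized version is kept as well because Cook's distance is defined in terms of it.

```python
import math

# Problem 2 data: mean annual temperature (deg F) and breast-cancer mortality rate.
TEMP = [51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
        46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0]
MORT = [102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
        78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5]

# 0.975 quantile of Student's t with n - 2 = 14 df, copied from a t table.
# Assumption: hard-coded so the sketch needs only the standard library;
# replace it (e.g. with qt(0.975, 46) from R) before reusing this on Problem 1.
T_975_DF14 = 2.144787


def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x, with per-case diagnostics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - 2)   # s^2 on n - 2 df
    r2 = 1 - (n - 2) * mse / syy                 # 1 - SSE/SST = SSR/SST
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_i
    # Internally studentized residuals r_i = e_i / (s * sqrt(1 - h_i)).
    stud = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]
    # Externally studentized residuals (what R's rstudent() reports).
    rstud = [r * math.sqrt((n - 3) / (n - 2 - r ** 2)) for r in stud]
    # Cook's distance with p = 2 coefficients: (r_i^2 / p) * h_i / (1 - h_i).
    cook = [r ** 2 / 2 * h / (1 - h) for r, h in zip(stud, lev)]
    return dict(n=n, xbar=xbar, sxx=sxx, b0=b0, b1=b1, mse=mse, r2=r2,
                lev=lev, stud=stud, rstud=rstud, cook=cook)


def slope_ci(fit, t):
    """Confidence interval for the population slope: b1 +/- t * SE(b1)."""
    se = math.sqrt(fit["mse"] / fit["sxx"])
    return fit["b1"] - t * se, fit["b1"] + t * se


def interval_at(fit, x0, t, predict):
    """CI for the mean response at x0, or a PI for one new response if predict."""
    yhat = fit["b0"] + fit["b1"] * x0
    var = fit["mse"] * (1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    if predict:
        var += fit["mse"]  # extra variance of a single new observation
    half = t * math.sqrt(var)
    return yhat - half, yhat + half


fit = fit_slr(TEMP, MORT)
print(f"mortality = {fit['b0']:.3f} + {fit['b1']:.4f} * temp, R^2 = {fit['r2']:.3f}")
print("95% CI for slope:", slope_ci(fit, T_975_DF14))
print("95% CI for mean at 45:", interval_at(fit, 45, T_975_DF14, predict=False))
print("95% PI at 45:", interval_at(fit, 45, T_975_DF14, predict=True))
for i, (h, r, d) in enumerate(zip(fit["lev"], fit["rstud"], fit["cook"]), start=1):
    print(f"case {i:2d}: h = {h:.3f}, rstudent = {r:6.2f}, Cook's D = {d:.3f}")
```

Writing the formulas out, rather than calling a regression package, keeps each printed number traceable to its textbook definition. The extra fit["mse"] term in interval_at is the only difference between the CI for the mean and the prediction interval, which is why the latter is always wider. For Problem 1 the same functions apply with the 46-df t quantile in place of T_975_DF14.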