FINAL EXAM
Stat 427/527

Due Thursday, Dec. 15, at 4 PM, hand delivered or mailed to my campus address. Please do not email. The campus address is Department of Mathematics and Statistics, MSC03 2150, 1 University of New Mexico, Albuquerque, New Mexico 87131-0001. My office is HUM 435.

Directions: The exam has two problems of equal point value. You must do both problems. The questions were deliberately constructed to be somewhat open-ended. It is your responsibility to decide what analysis is appropriate and to critically assess each analysis you perform. Keep the solutions for each problem separate. Each solution should consist of no more than 4 word-processed pages of discussion followed by all relevant computer output. The output must be clearly labeled and referred to in the text. Alternatively, you can intersperse the output with the discussion. Try to be thorough yet succinct. Unnecessary or inappropriate analyses included for the sake of completeness will be frowned upon; you should not tell me all the things you did that did not work out. However, all output relevant to your discussion should be included.

(1) The data below are the average wine consumption rates (liters per person) and the number of ischemic heart disease deaths (per 1,000 men aged 55 to 64 years old) for 18 industrialized countries.

Country          Wine Consumption   Heart Disease Mortality
-----------------------------------------------------------
Norway                   2.8                  6.2
Scotland                 3.2                  9.0
England                  3.2                  7.1
Ireland                  3.4                  6.8
Finland                  4.3                 10.2
Canada                   4.9                  7.8
United States            5.1                  *
Netherlands              5.2                  5.9
New Zealand              5.9                  8.9
Denmark                  5.9                  5.5
Sweden                   6.6                  7.1
Australia                8.3                  9.1
Belgium                 12.6                  5.1
Germany                 15.1                  4.7
Austria                 25.1                  4.7
Switzerland             33.1                  3.1
Italy                   75.9                  3.2
France                  75.9                  2.1

(* = value not available)

The questions of interest are whether the data suggest that the heart disease death rate is associated with average wine consumption and, if so, how that relationship can be described. Do a thorough analysis, carefully checking all assumptions.
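Before checking assumptions, it can help to fit a couple of candidate straight-line models and compare them. The following is a minimal sketch in Python (an assumption; the course itself uses MINITAB) that fits mortality on wine consumption and on log(wine consumption) by ordinary least squares, leaving out the United States because its mortality value is missing. The function name fit_line and the choice of a log transform as the second candidate are illustrative assumptions, not a prescribed analysis; residual plots and the other assumption checks the problem asks for still need to be done.

```python
import math

# Data from problem (1): country -> (wine consumption in liters per person,
# ischemic heart disease deaths per 1,000 men aged 55 to 64).
# The United States is left out because its mortality value is missing (*).
DATA = {
    "Norway": (2.8, 6.2),
    "Scotland": (3.2, 9.0),
    "England": (3.2, 7.1),
    "Ireland": (3.4, 6.8),
    "Finland": (4.3, 10.2),
    "Canada": (4.9, 7.8),
    "Netherlands": (5.2, 5.9),
    "New Zealand": (5.9, 8.9),
    "Denmark": (5.9, 5.5),
    "Sweden": (6.6, 7.1),
    "Australia": (8.3, 9.1),
    "Belgium": (12.6, 5.1),
    "Germany": (15.1, 4.7),
    "Austria": (25.1, 4.7),
    "Switzerland": (33.1, 3.1),
    "Italy": (75.9, 3.2),
    "France": (75.9, 2.1),
}


def fit_line(x, y):
    """Hypothetical helper: ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


wine = [w for w, _ in DATA.values()]
mortality = [m for _, m in DATA.values()]
log_wine = [math.log(w) for w in wine]

# Candidate A: straight line in wine consumption (raw scale).
b0_raw, b1_raw = fit_line(wine, mortality)
resid_raw = [m - (b0_raw + b1_raw * w) for w, m in zip(wine, mortality)]

# Candidate B (assumed alternative): straight line in log(wine), since wine
# consumption ranges from 2.8 to 75.9 and is strongly right-skewed.
b0_log, b1_log = fit_line(log_wine, mortality)
resid_log = [m - (b0_log + b1_log * lx) for lx, m in zip(log_wine, mortality)]

n = len(wine)
for label, b0, b1, resid in [
    ("wine", b0_raw, b1_raw, resid_raw),
    ("log(wine)", b0_log, b1_log, resid_log),
]:
    sse = sum(r * r for r in resid)
    s = math.sqrt(sse / (n - 2))  # residual standard error on n - 2 df
    print(f"{label:>9}: intercept={b0:.3f} slope={b1:.3f} s={s:.3f}")
```

A United States prediction would come from whichever model is chosen, evaluated at a wine consumption of 5.1 (or log 5.1). The interval around it needs a t quantile on n - 2 = 15 degrees of freedom, available from MINITAB or, for example, scipy.stats.t.ppf.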
Use whatever model you create to predict heart disease mortality in the United States, together with an appropriate confidence interval. Give me a complete summary of your analysis and conclusions in no more than 4 pages.

(2) Mark Twain is often credited with being the author of ten letters published in the New Orleans Daily Crescent under the name "Quintus Curtius Snodgrass." In 1963, C. Brinegar used statistical methods to compare the Snodgrass letters with works known to have been written by Twain, in an attempt to decide whether Twain wrote the Snodgrass letters. Brinegar used a very simple test of authorship based on the word-length distributions of the two authors: if the distributions are very different, we have some evidence that the authors are different people. Other tests of authorship have since been developed.

Brinegar counted the number of two-letter words, the number of three-letter words, and so on, for the ten Snodgrass letters. A similar summary was made for a collection of works known to have been written by Twain, including seven letters written to friends between 1858 and 1867 and samples of approximately 2500 words each from his 1872 work "Roughing It" and his 1897 work "Following the Equator." One-letter words, such as "I" and "a", were omitted because they tend to characterize content more than style. Brinegar then compared the proportions of words of a given length across authors; that is, he tested homogeneity of proportions. Initial analyses showed no significant differences in word-length proportions among the individual works of either author, so all of each author's works were combined for the final analysis. In the table below, the first column gives the word length, with words of 13 or more letters combined into a single 13+ category. The second and third columns give the word-length counts for Snodgrass's and Twain's works, respectively.
The goal for this problem is to devise an appropriate statistical analysis to assess the agreement between the word-length distributions for Twain and Snodgrass.

Word Length   Snodgrass   Twain
-------------------------------
2                  2685    2989
3                  2752    3917
4                  2302    3224
5                  1431    1954
6                   992    1283
7                   896    1026
8                   638     693
9                   465     495
10                  276     287
11                  152     134
12                  101      62
13+                  61      47

a. Use the MINITAB calculator to compute, for each author, the proportions of words of lengths 2 through 13+. For example, if Twain's word counts are in column C3, type C3/sum(C3) in the EXPRESSION box of the CALCULATOR dialog box and store the results in an appropriate column. Then plot the proportion of words of each length as a function of length, showing both Snodgrass and Twain on the same plot. (Note of 12/13: the proportions are the summaries.) Describe what you see, keeping in mind that the goal is to compare the word-length distributions for works by the two authors.

b. Carry out a formal test of the hypothesis that the word-length distributions for Twain and Snodgrass are identical.

c. Carry out any additional analyses that you deem appropriate, and summarize your conclusions. Use no more than 4 pages for this.
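For anyone working outside MINITAB, here is a minimal Python sketch, under that assumption, of the part (a) proportions (the same computation as C3/sum(C3)) and of one standard choice for part (b): Pearson's chi-square test of homogeneity for a 2 x 12 table of counts. This is the usual test of homogeneity of proportions, but it is not necessarily the analysis expected here, and the function names proportions and chi_square_homogeneity are hypothetical.

```python
# Word-length counts from problem (2); the last entry is the 13+ category.
LENGTHS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13+"]
SNODGRASS = [2685, 2752, 2302, 1431, 992, 896, 638, 465, 276, 152, 101, 61]
TWAIN = [2989, 3917, 3224, 1954, 1283, 1026, 693, 495, 287, 134, 62, 47]


def proportions(counts):
    """Divide each count by the column total, like the MINITAB C3/sum(C3)."""
    total = sum(counts)
    return [c / total for c in counts]


def chi_square_homogeneity(rows):
    """Pearson chi-square statistic and degrees of freedom for a table of
    counts given as a list of rows (here one row per author)."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under identical distributions for all rows.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


p_snod = proportions(SNODGRASS)
p_twain = proportions(TWAIN)
for length, ps, pt in zip(LENGTHS, p_snod, p_twain):
    print(f"{length:>4}  Snodgrass={ps:.4f}  Twain={pt:.4f}  diff={ps - pt:+.4f}")

stat, df = chi_square_homogeneity([SNODGRASS, TWAIN])
print(f"Pearson X^2 = {stat:.2f} on {df} df")
```

The p-value is the upper tail of the chi-square distribution with 11 degrees of freedom, for example scipy.stats.chi2.sf(stat, df); scipy.stats.chi2_contingency applied to the same 2 x 12 table returns the statistic, p-value, degrees of freedom, and expected counts in one call. With these counts every expected count is well above 5, so the large-sample chi-square approximation should be reasonable. One possible follow-up for part (c) is to inspect the individual cell contributions (observed - expected)^2 / expected to see which word lengths drive any difference.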