Problem #3

test = matrix(scan("prob3-spr07.txt"), ncol=3, byrow=TRUE)
x = test[,2]   # contamination at site 1 over the 20-day period
y = test[,3]   # contamination at site 2 over the 20-day period

Check Normality
---------------
# histogram
hist(x)
# QQ plot
qqnorm(x)
qqline(x)

> ks.test(x, "pnorm", mean=mean(x), sd=sd(x))

        One-sample Kolmogorov-Smirnov test

data:  x
D = 0.0924, p-value = 0.9956
alternative hypothesis: two.sided

# histogram
hist(y)
# QQ plot
qqnorm(y)
qqline(y)

> ks.test(y, "pnorm", mean=mean(y), sd=sd(y), exact=TRUE)

        One-sample Kolmogorov-Smirnov test

data:  y
D = 0.1215, p-value = 0.9294
alternative hypothesis: two.sided

-------
The K-S p-values are high, so Normality cannot be rejected for either site. A large p-value does not prove that the data are Normal, however, and the QQ plots and histograms suggest there might be slight non-Normality. The mean and sd passed to ks.test are also estimated from the same data. In that case the K-S p-values are too large, and a Lilliefors-corrected test or shapiro.test(x) is the more appropriate check.
---------------

(i) Parametric tests
--------------------
Two-sample t-test with unequal variances (Equation 10-15) and paired t-test (Equation 10-22).

Paired t-test
-------------
> t.test(x, y, paired=TRUE)   # var.equal is ignored when paired=TRUE

        Paired t-test

data:  x and y
t = 3.5992, df = 19, p-value = 0.001912
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.255446 4.744554
sample estimates:
mean of the differences
                      3

Two-sample (Welch) t-test
-------------------------
> t.test(x, y, var.equal=FALSE)

        Welch Two Sample t-test

data:  x and y
t = 2.8498, df = 37.59, p-value = 0.00706
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.8681746 5.1318254
sample estimates:
mean of x mean of y
    17.85     14.85

-----------
Both tests indicate that the mean contamination differs significantly between the two sites at the 5% level (95% confidence). Both p-values are in fact below 0.01.
--------

Nonparametric tests
-------------------
Two Wilcoxon tests are appropriate here: the rank sum test for two independent samples and the signed rank test for paired samples. Both assume continuous distributions and test for a difference in location (median) rather than in mean. The rank sum test assumes that the two distributions have the same shape and differ only by a shift, so in particular we have to assume equal variances. The signed rank test instead assumes that the paired differences are symmetric about their median.
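The two sites are sampled on the same 20 days, so each paired procedure in this problem is a one-sample analysis of the daily differences d = x - y:

- the paired t-test is a one-sample t-test on d;
- the signed rank test ranks |d|;
- the sign test is a binomial test on the number of positive differences.

Below is a minimal sketch in base R (no BSDA needed). The contents of prob3-spr07.txt are not reproduced here, so it runs on simulated stand-in data. The day/site model and its parameters are hypothetical.

```r
# Paired procedures as one-sample analyses of the daily differences d = x - y.
# x and y are SIMULATED stand-ins (hypothetical model and parameters),
# not the contamination data in prob3-spr07.txt.
set.seed(1)
day <- rnorm(20, mean = 16, sd = 3)        # shared day-to-day level
x <- day + rnorm(20, mean = 1.5, sd = 1)   # site 1
y <- day + rnorm(20, mean = -1.5, sd = 1)  # site 2
d <- x - y                                 # daily differences

# Paired t-test is the one-sample t-test of d against 0
paired_t   <- t.test(x, y, paired = TRUE)
one_samp_t <- t.test(d, mu = 0)

# Signed rank test: ranks |d| and sums the ranks of the positive d's (V)
signed_rank <- wilcox.test(d, mu = 0, exact = FALSE)

# Sign test: number of positive differences among the non-zero ones,
# compared with a Binomial(n, 0.5) distribution
n_nonzero <- sum(d != 0)
s_plus    <- sum(d > 0)
sign_test <- binom.test(s_plus, n_nonzero, p = 0.5)

cat("paired t:", paired_t$statistic, " one-sample t on d:", one_samp_t$statistic, "\n")
cat("signed rank V:", signed_rank$statistic, " p =", signed_rank$p.value, "\n")
cat("sign test S:", s_plus, "of", n_nonzero, " p =", sign_test$p.value, "\n")
```

To use the real data, replace the simulated x and y with test[,2] and test[,3]. The sketch can be checked against the sign-test output in this problem. Assume none of the 20 differences is zero, so S = 16 of n = 20 are positive. Then binom.test(16, 20) gives a two-sided p-value of 2 × 6196/2^20 ≈ 0.0118, the same value that sign.test reports.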
Wilcoxon Rank Sum Test
----------------------
> wilcox.test(x, y, exact=FALSE)

        Wilcoxon rank sum test with continuity correction

data:  x and y
W = 294, p-value = 0.01114
alternative hypothesis: true mu is not equal to 0

This test indicates that contamination differs in location between the sites at the 5% level (95% confidence). The p-value of 0.011 just misses the 1% level.
----

Wilcoxon Signed Rank Test (treating the data as paired)
-------------------------------------------------------
> wilcox.test(x, y, paired=TRUE, exact=FALSE)

        Wilcoxon signed rank test with continuity correction

data:  x and y
V = 180, p-value = 0.005294
alternative hypothesis: true mu is not equal to 0

From its p-value, this test also indicates that contamination differs between the sites, at better than 95% (indeed better than 99%) confidence.
----------

An even more general test is the sign test. It makes no assumption about the shape of the distribution, only that the paired differences are independent. It is available as sign.test in the R package BSDA: install the package with install.packages("BSDA") and load it with library(BSDA). In current versions of BSDA the function is called SIGN.test.

> sign.test(x, y)
$rval

        Dependent-samples Sign-Test

data:  test[, 2] and test[, 3]
S = 16, p-value = 0.01182
alternative hypothesis: true median difference is not equal to 0
95 percent confidence interval:
 1.116471 6.000000
sample estimates:
median of x-y
          3.5

The null hypothesis of zero median difference is rejected at the 5% level (95% confidence). As with the rank sum test, the p-value of 0.012 just misses the 1% level.
----------------------------------------------------------------------------

(ii)
The high K-S p-values mean only that Normality cannot be rejected, and the QQ plots and histograms suggest there might be slight non-Normality. Hence the nonparametric tests are to be preferred, since they do not assume Normality. Because the observations at the two sites can be paired by day, the paired versions are preferred: the Wilcoxon signed rank test or the sign test.
--

(iii)
delta = 0.42, N = 10
d = delta/sd(x) = 0.12
From Chart VII (g) with N = 10 and d = 0.12: beta = 0.15
Power of the test = probability of rejecting H0: mu = 2.03 when the true mean is shifted by delta = 1 - beta = 1 - 0.15 = 0.85
The sample size is therefore just adequate to achieve a power of 0.8.
More samples will increase the power of the test.
----------------
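The chart reading in (iii) can be cross-checked numerically with base R's power.t.test, which computes t-test power from the noncentral t distribution. This is a minimal sketch built on three assumptions:

- a one-sample t-test at alpha = 0.05;
- sigma scaled to 1, so that delta equals the standardized shift d;
- either alternative, since which one Chart VII (g) uses is not stated here, so both one-sided and two-sided power are computed.

```r
# Cross-check an OC-chart reading with base R's power.t.test.
# Assumptions (not from the original solution): one-sample t-test,
# alpha = 0.05, sd = 1 so that delta is the standardized shift d.
d <- 0.12   # standardized shift delta / sd, as computed in part (iii)
n <- 10     # sample size used in part (iii)

power_one_sided <- power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05,
                                type = "one.sample",
                                alternative = "one.sided")$power
power_two_sided <- power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05,
                                type = "one.sample",
                                alternative = "two.sided")$power
cat("beta (one-sided):", 1 - power_one_sided, "\n")
cat("beta (two-sided):", 1 - power_two_sided, "\n")

# Sample size for power 0.8 at this d (one-sided); power.t.test solves
# for whichever of n, delta, sd, power, sig.level is left NULL.
n_needed <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.8,
                         type = "one.sample",
                         alternative = "one.sided")$n
cat("n for power 0.8:", ceiling(n_needed), "\n")

# For a fixed d, power grows with the number of samples
ns <- c(10, 20, 50, 100)
powers <- sapply(ns, function(k)
  power.t.test(n = k, delta = d, sd = 1, sig.level = 0.05,
               type = "one.sample", alternative = "one.sided")$power)
print(data.frame(n = ns, power = round(powers, 3)))
```

For d = 0.12 and n = 10, this sketch gives a power of roughly 0.1 or less under either alternative, much lower than the 0.85 read from the chart. A beta of about 0.15 at n = 10 corresponds to a d on the order of 1. It is therefore worth rechecking the value of d (or of sd(x)) before relying on the conclusion above.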