For this exercise, produce a script file containing the code for reading the data into a data frame and for all the analyses that you do. Annotate the
script with comments (precede each with #) to indicate what each command does. Put the script at the bottom of your answer document.
1. Given below are the data for breast cancer incidence per 100,000 (age-standardized rate*) for Saudi women from the years 2001 to 2010.
Year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Rate 12.0 13.9 14.6 16.1 18.4 18.1 20.5 20.3 23.4 24.9
Enter the data into R and produce an appropriate scatterplot with a linear regression line. Include the plot in your answer.
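As a starting point, the data entry and plot might be sketched as follows. This is only one possible approach: the object names (cancer, year, incidence, fit) and the axis labels are illustrative choices, not requirements of the exercise.

```r
# Enter the years and the age-standardized incidence rates from the table above
year <- 2001:2010
incidence <- c(12, 13.9, 14.6, 16.1, 18.4, 18.1, 20.5, 20.3, 23.4, 24.9)

# Combine the two vectors into a data frame (names are illustrative)
cancer <- data.frame(year = year, incidence = incidence)

# Scatterplot with year on the x-axis and incidence on the y-axis
plot(incidence ~ year, data = cancer, pch = 19,
     xlab = "Year",
     ylab = "Incidence per 100,000 (age-standardized)")

# Fit a straight line by least squares and draw it on the existing plot
fit <- lm(incidence ~ year, data = cancer)
abline(fit)
```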
2. Perform a regression on the breast cancer data (be sure that you do the regression the right way round – this means correctly choosing
which variable is the independent/predictor variable).
a. write down the equation for the relationship between the variables
b. give a summary of the analysis as you would do for a research report
3. From the regression analysis:
a. predict the incidence for the year 2011
b. predict the incidence for the year 2040
c. predict the incidence for the year 1990
d. what is wrong with the prediction made for the year 1990?
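The regression and the predictions could be obtained along the following lines. This is a minimal sketch, assuming the data are held in a data frame called cancer with columns year and incidence (illustrative names); interpreting the output, and judging whether each extrapolated value is plausible, is left to you.

```r
# Re-create the data frame so that this sketch runs on its own
cancer <- data.frame(
  year = 2001:2010,
  incidence = c(12, 13.9, 14.6, 16.1, 18.4, 18.1, 20.5, 20.3, 23.4, 24.9)
)

# Regress incidence (response) on year (predictor)
fit <- lm(incidence ~ year, data = cancer)

# Intercept and slope for writing down the equation
coef(fit)

# Full summary: coefficients, standard errors, R-squared, F-test
summary(fit)

# Predicted incidence for years outside the observed range;
# the column name in newdata must match the predictor name in the model
new_years <- data.frame(year = c(2011, 2040, 1990))
predicted <- predict(fit, newdata = new_years)
predicted
```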
4. The effectiveness of a new therapy was tested on a randomly selected sample of 10 patients. Each patient’s health was assessed before and
after a week’s course of therapy. The scores are given below (using a 0 – 20 scale, low scores indicate better health):
Before 15 12 19 14 10 16 15 8 14 17
After 14 11 15 10 11 8 12 6 11 12
a. Use R to determine whether there was any significant change in health after the therapy. Summarize fully the results of your analysis
as you would do for a research report.
b. If the change in health was not due to the therapy, what else could have changed it?
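Because each patient is measured twice, the two sets of scores are paired. One way to analyse them is sketched below, on the assumption that a paired t-test is appropriate (that is, that the before-minus-after differences are roughly normally distributed). If that assumption looks doubtful, wilcox.test with paired = TRUE is a non-parametric alternative. The object names are illustrative.

```r
# Health scores for the same 10 patients, in the same patient order
before <- c(15, 12, 19, 14, 10, 16, 15, 8, 14, 17)
after  <- c(14, 11, 15, 10, 11, 8, 12, 6, 11, 12)

# Differences within each patient (positive = lower score afterwards)
differences <- before - after

# Check the normality assumption for the differences
shapiro.test(differences)

# Paired t-test: tests whether the mean difference is zero
result <- t.test(before, after, paired = TRUE)
result
```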
*age-standardized rate means the number of cases per 100,000 women, adjusted according to the age profile of the Saudi population.
This exercise will count for 15% of the total mark
