BIOSTATISTICS
Please note: this assessment task must be all your own work. Please do not discuss questions and answers in detail with your fellow students.
Assignments must be submitted on-line via the CloudDeakin dropbox facility before 5 pm on 13 June 2014. Assignments must be submitted in a Microsoft Word document or a pdf.
Some of the questions require calculations using Stata. Where you have used Stata for calculations, you should copy the Stata commands and output from the Stata results screen and paste them into your assignment so that the assessor can see how you have derived your answer. Note: this Stata output is required in addition to your answer to the question. Simply pasting in the Stata output will not be considered an adequate answer on its own.
The assignment has five main analysis tasks, numbered 1 to 5, and a concluding paragraph. You should use the five task headings in your answers to clearly identify the answer for each task.
This assignment is worth 40% of the final mark for HSH746 and the marks allocated for each question are shown.
Students should ensure that they keep a spare copy of their work.
Questions
The relationship between lung cancer and cigarette smoking is well known and has been extensively studied. However, cigarette smoking is also believed to be a risk factor for a number of other cancers. The aim of this assignment is to examine the relationship between cigarette smoking and kidney cancer and leukaemia. We wish to identify whether either of these conditions is associated with cigarette smoking.
There are two parts to this assignment. The first is a study of the relationship between cancer mortality rates for selected states in the USA and the average number of cigarettes sold per person in each state. (This is known as an ecological study). The second is a case-control study examining the relationship between smoking and kidney cancer/leukaemia incidence. You are to complete both parts of the assignment.
Part 1: The relationship between cancer mortality and the average number of cigarettes sold per person in selected states in the USA.
Read the document ‘assignment 3 ecological data description’, then perform and report on the analysis described below.
The data for this part of the assignment comprise mortality rates for kidney cancer and leukaemia (as numbers of deaths per 100,000 population) and the average number of cigarettes sold per person for 47 states in the USA. There is also a variable called hi_smoke, which takes the value 1 if the average number of cigarettes sold per person in that state is above the national median and 0 if it is below the median. A complete description of the data is in the document ‘assignment 3 ecological data description’. You should read this document before starting your data analysis.
The data are in the comma-separated file ecological.csv.
Your task is to decide whether these data show a relationship between cigarette smoking and either kidney cancer or leukaemia.
Note that all tables and graphs in this assignment should be presented with appropriate headings and footnotes. You may refer to the source of these data as ‘assignment 3: ecological study’.
Analysis tasks
1. Data cleaning
Find any incorrect or inconsistent data in the input data set and set any incorrect or inconsistent data to missing. Report on your findings – how many incorrect data points did you find and what were they? (3 marks)
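The checks can be organised as below. This is a minimal Python sketch for illustration only; the assignment itself requires Stata commands and output. It assumes the rows are read from ecological.csv with Python's csv module. The column names passed as cig_col and rate_cols, and the threshold (the national median number of cigarettes sold, which should come from the data description), are placeholders rather than names taken from the data set. It flags missing, non-numeric or negative values and any hi_smoke value that is not 0 or 1 or that disagrees with the median rule described above, then sets the flagged values to missing.

```python
import csv
from typing import Optional


def load_rows(path):
    """Read a CSV file (for example ecological.csv) into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def to_float(value) -> Optional[float]:
    """Convert a CSV field to float; blanks and non-numeric text become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flag_problems(rows, cig_col, rate_cols, threshold):
    """Return (row index, column, original value) for each suspect data point.

    cig_col, rate_cols and threshold are placeholders: take the real column
    names and the national median from the data description.
    """
    problems = []
    for i, row in enumerate(rows):
        # Mortality rates per 100,000 must be present, numeric and non-negative.
        for col in rate_cols:
            rate = to_float(row.get(col))
            if rate is None or rate < 0:
                problems.append((i, col, row.get(col)))
        # Cigarettes sold per person must also be numeric and non-negative.
        cig = to_float(row.get(cig_col))
        if cig is None or cig < 0:
            problems.append((i, cig_col, row.get(cig_col)))
            cig = None
        # hi_smoke must be 0 or 1 and agree with the median rule.
        raw_hi = row.get("hi_smoke") or ""
        hi = raw_hi.strip()
        if hi not in ("0", "1"):
            problems.append((i, "hi_smoke", raw_hi))
        elif cig is not None and cig != threshold:
            expected = "1" if cig > threshold else "0"
            if hi != expected:
                problems.append((i, "hi_smoke", raw_hi))
    return problems


def set_missing(rows, problems):
    """Return a copy of rows with every flagged value replaced by None."""
    cleaned = [dict(row) for row in rows]
    for i, col, _ in problems:
        cleaned[i][col] = None
    return cleaned
```

A state whose value equals the threshold exactly is not checked, because the description above does not say how ties are coded.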
2. Exploratory data analysis
Are the mortality data approximately normally distributed? You should check this and report it separately for high and low smoking states (ie where hi_smoke = 1 and hi_smoke = 0). (3 marks)
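Normality is usually judged from graphs such as histograms or normal quantile plots (in Stata, for example, histogram or qnorm, run separately for each value of hi_smoke). As a numerical complement, the Python sketch below (illustrative only, assuming the cleaned values are held in plain lists with None for missing) reports the size, mean, median and moment-based skewness of each group. A mean far from the median, or a skewness well away from 0, suggests asymmetry.

```python
import statistics
from collections import defaultdict


def skewness(values):
    """Moment-based sample skewness; NaN when there are too few values or no spread."""
    n = len(values)
    if n < 3:
        return float("nan")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return float("nan")
    return m3 / m2 ** 1.5


def summarise_by_group(values, groups):
    """Return {group: (n, mean, median, skewness)}, ignoring missing (None) entries."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None and group is not None:
            by_group[group].append(value)
    return {
        group: (len(vs), statistics.fmean(vs), statistics.median(vs), skewness(vs))
        for group, vs in sorted(by_group.items())
    }
```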
3. The relationship between cancer rates and high/low numbers of cigarettes sold.
We will carry out a test of whether or not the mean of the mortality rates for states with high numbers of cigarettes sold differs from the mean of the mortality rates for states with low numbers of cigarettes sold.
3.1 The difference in the mean of the mortality rates between states with high numbers of cigarettes sold and states with low numbers of cigarettes sold.
What probability distribution will you use to calculate the 95% confidence interval for the difference between the mean mortality rates of states with high numbers of cigarettes sold (hi_smoke = 1) and states with low numbers of cigarettes sold (hi_smoke = 0)? Why? (2 marks)
For each of kidney cancer and leukaemia:
Calculate the difference between the mean mortality rates of states with high numbers of cigarettes sold (hi_smoke = 1) and states with low numbers of cigarettes sold (hi_smoke = 0), and its associated 95% confidence interval. State these to 2 decimal places. (2 marks)
Interpret the 95% confidence interval in words. (1 mark)
What does this confidence interval tell you about the difference between the mean of the mortality rates between states with high numbers of cigarettes sold and states with low numbers of cigarettes sold for this condition (kidney cancer or leukaemia)? (1 mark)
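For reference, the usual equal-variance (pooled) form of this interval is shown below, writing subscript 1 for states with hi_smoke = 1 and subscript 0 for states with hi_smoke = 0, with x̄, s and n the group mean, standard deviation and number of states. It assumes approximately normal data and similar group variances; if the variances look clearly different, the unequal-variance (Welch) interval is the usual alternative (Stata's ttest with the unequal option).

```latex
(\bar{x}_1 - \bar{x}_0) \pm t_{n_1 + n_0 - 2,\;0.975}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_0}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_0 - 1)\, s_0^2}{n_1 + n_0 - 2}
```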
3.2 Is the mean of the mortality rates for states with high numbers of cigarettes sold different to the mean of the mortality rates for states with low numbers of cigarettes sold?
For each of kidney cancer and leukaemia, test the null hypothesis that the mean of the mortality rates for states with high numbers of cigarettes sold (hi_smoke = 1) is the same as the mean of the mortality rates for states with low numbers of cigarettes sold (hi_smoke = 0), against the alternative that they are different. Report the results, stating the test statistic to 2 decimal places together with its degrees of freedom. (10 marks)
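The test statistic for the equal-variance two-sample t test can be reproduced with the short Python sketch below, as a cross-check on the Stata output rather than a substitute for it. It assumes missing values have already been removed from both lists; the function and argument names are illustrative.

```python
import math
import statistics


def pooled_t_test(high, low):
    """Equal-variance two-sample t test of mean(high) - mean(low).

    Returns (difference, standard error, t statistic, degrees of freedom).
    Both lists must already have missing values removed.
    """
    n1, n0 = len(high), len(low)
    diff = statistics.fmean(high) - statistics.fmean(low)
    df = n1 + n0 - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(high)
                  + (n0 - 1) * statistics.variance(low)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return diff, se, diff / se, df
```

The two-sided p value for the resulting t and degrees of freedom can then be read from Stata's ttest output, or obtained with its ttail() function as 2*ttail(df, abs(t)).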
4. The relationship between cancer rates and the average numbers of cigarettes sold per person.
Draw a scatterplot of, and calculate the correlation coefficient between:
a) kidney cancer mortality and the average number of cigarettes sold per person; and
b) leukaemia mortality and the average number of cigarettes sold per person. (2 marks)
Report each correlation coefficient to 2 decimal places.
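The Pearson correlation coefficient can be checked with the Python sketch below (illustrative only). Pairs with a missing value are dropped, which mirrors pairwise deletion in Stata's pwcorr; Stata's correlate instead uses only observations that are complete on all listed variables. In Stata the scatterplots themselves can be drawn with scatter.

```python
import math


def pearson_r(x, y):
    """Pearson correlation for paired values, dropping pairs with a missing (None) value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)
```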
What do these suggest about the relationship between the average number of cigarettes sold per person and each of kidney cancer mortality and leukaemia mortality? (2 marks)
Fit a regression model for whichever of leukaemia mortality and kidney cancer mortality appears to be linearly related to the average number of cigarettes sold per person, and report the equation with coefficients to 2 decimal places. (1 mark)
Note: you should only fit one regression model. Only one of leukaemia mortality and kidney cancer mortality is linearly related to the average number of cigarettes sold per person.
Is the independent variable (average number of cigarettes sold per person) a statistically significant predictor of mortality? Quote the test statistic, to 2 decimal places, and the p value in your answer. (1 mark)
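The least-squares intercept, slope and t statistic for the slope (on n - 2 degrees of freedom) can be reproduced with the Python sketch below, as a cross-check on Stata's regress output. It is illustrative only and assumes a simple linear model with one predictor, with x the average number of cigarettes sold per person and y the mortality rate; pairs with a missing value are dropped.

```python
import math


def simple_regression(x, y):
    """Least-squares fit y = b0 + b1*x; returns (b0, b1, t statistic for b1, df)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual variance is estimated on n - 2 degrees of freedom.
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in pairs)
    df = n - 2
    se_b1 = math.sqrt(rss / df / sxx)
    return b0, b1, b1 / se_b1, df
```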
Part 2: Case control study of the relationship between cigarette smoking and kidney cancer or leukaemia.
Ecological studies are subject to potential confounding, so they cannot be regarded as strong evidence for a relationship between two variables. To study the potential relationship between cigarette smoking and each of leukaemia and kidney cancer more closely, two case-control studies were carried out. In the first study, 40 people with kidney cancer and 80 similar people without kidney cancer were identified, and the smoking status of each person was recorded as either current smoker or not a smoker. In the second study, 40 people with leukaemia and 80 similar people without leukaemia were identified, and their smoking status was recorded in the same way.
The data for the first study are in the data set kidney.csv. This data set contains two variables. The first is called cigarette and takes the value 1 if the person is a current smoker and 0 if they are not a smoker. The second is called kidney and takes the value 1 if the person has kidney cancer and 0 if they do not have kidney cancer.
The data for the second study are in the data set leukaemia.csv. This data set also contains two variables. The first is called cigarette and takes the value 1 if the person is a current smoker and 0 if they are not a smoker. The second is called leukaemia and takes the value 1 if the person has leukaemia and 0 if they do not have leukaemia.
Analysis tasks
5. Analysis of the results of the case-control study
In task 4, you fitted a regression model to one of kidney cancer or leukaemia mortality. For that condition only, carry out the following calculations using the results of its case-control study.
5.1 Calculate the odds ratio and its associated 95% confidence interval
Calculate the odds ratio and the associated 95% confidence interval for the relationship between smoking and the condition (kidney cancer or leukaemia) to 2 decimal places. (1 mark)
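The odds ratio and a log-based (Woolf) 95% confidence interval can be computed from the 2 x 2 table of smoking status by case status, as in the Python sketch below (illustrative only; the function names are not from the assignment). It assumes the 0/1 variables have been converted to integers and that no cell of the table is zero. Stata's cc command (for example cc kidney cigarette) reports an exact interval by default, so its limits may differ slightly; its woolf option requests the log-based interval.

```python
import math


def two_by_two(exposure, outcome):
    """Count (a, b, c, d) from paired integer 0/1 lists, e.g. cigarette and kidney.

    a = cases who smoke,    b = cases who do not smoke,
    c = controls who smoke, d = controls who do not smoke.
    """
    pairs = list(zip(exposure, outcome))
    a = sum(1 for e, o in pairs if e == 1 and o == 1)
    b = sum(1 for e, o in pairs if e == 0 and o == 1)
    c = sum(1 for e, o in pairs if e == 1 and o == 0)
    d = sum(1 for e, o in pairs if e == 0 and o == 0)
    return a, b, c, d


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Woolf (log-scale) confidence interval; needs no zero cells."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)
```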
5.2 Test the relationship between the condition and cigarette smoking
Use the odds ratio to test the hypothesis that the proportion of people with the condition who smoke is the same as the proportion of people without the condition who smoke. Report the test statistic to 2 decimal places. (5 marks)
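One common way to carry out this test, shown here as a sketch, uses the log odds ratio. With a and b the numbers of cases who do and do not smoke, and c and d the numbers of controls who do and do not smoke, the null hypothesis of equal proportions of smokers among cases and controls is equivalent to OR = 1, that is ln(OR) = 0:

```latex
z = \frac{\ln(\widehat{\mathrm{OR}})}{\sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}},
\qquad
\widehat{\mathrm{OR}} = \frac{ad}{bc}
```

Under the null hypothesis z is approximately standard normal, so |z| > 1.96 corresponds to a two-sided p < 0.05. Stata's cc command reports a chi-squared test of the same 2 x 2 table instead; the two are closely related but not numerically identical.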
Part 3: Conclusions
Write a brief paragraph (around 250 words) summarising the results of your analyses. This should include:
• Which of kidney cancer or leukaemia is associated with cigarette smoking
• A statement of each point of evidence from your analysis supporting your conclusion.