Statistics Case Study

This guide will assist you in completing the case studies for this semester. Please refer to it often to ensure that you are
completing the case studies appropriately. Keep in mind that it is only a guide. In particular, the rubric specified in this
guide is only being provided to give students an idea of how we have graded these assignments in the past. Students
should NOT expect future rubrics to follow these grading guidelines.
This guide is separated into four sections: instructions, example case study assignment, example case study
solutions/rubric, and examples of actual completed case studies and their scores.
Instructions:
It is imperative that students follow the instructions for the case studies exactly as they are given below. These
instructions have been created to ensure the student’s files are read correctly by Carmen, that TA’s can read and grade
the files quickly and efficiently, and to ensure academic integrity. You may receive points for following these directions.
For instance, if you answer a question correctly but do not do so in full sentences, you may receive partial or no credit.
• Case study submissions MUST be submitted in PDF format. DO NOT upload your Excel file. IT WILL NOT BE
GRADED!!! This is for your benefit. Carmen does not always read Word files correctly.
• Under no circumstances should you include the questions in your file with the answers. Just include your
answers, given according to these instructions, and numbered accordingly.
• You do not have to type your case study, although doing so will create a more professional final product.
• Files should be uploaded to the Dropbox on Carmen by the due date. You may not email the Case Study to your
TA – you must upload the file to the Dropbox in Carmen. (You can find the Dropbox under the ACTIVITIES section
on Carmen, right next to the CONTENT button.)
• Please give your Case Study the following name: LastName_FirstName_CaseStudy#. (Example: Jennifer Mann’s
first case study would be labeled Mann_Jennifer_CaseStudy1).
• Make sure to include your name in the Case Study itself.
• Case studies are due BY 5:00 PM. This means they must be submitted BEFORE 5 p.m. Case studies submitted
up to 1 hour late will receive an automatic 25% penalty. Case studies cannot be submitted more than 1 hour
late. THERE WILL BE NO EXCEPTIONS TO THIS POLICY!!!
• You will receive an email informing you that you have successfully submitted a file to Carmen. Please keep this
email in case there are any technical issues with your submission.
• You may resubmit your assignment as many times as you’d like without penalty. Only the last file that you
submit will be kept in the Dropbox.
• It is the student’s responsibility to ensure that the file uploaded in the Dropbox is correct, complete, and
accurate. Please make sure to check that the file you submit is complete and accurate before the due date has
passed. You can access the file once it is submitted by going back to the Dropbox. TA’s will only grade the file
that appears in the Dropbox. Resubmissions after the deadline will not be allowed for any reason.
• Answers should be given in complete sentences, use appropriate units, and include a check of all conditions
when necessary. Remember, treat these case studies as if they are reports you are providing in a professional
environment.
• When performing hypothesis tests, be sure to specify ALL parts of a hypothesis test, including both hypotheses,
the test statistic, the p-value, the decision to reject or fail to reject the null hypothesis, and your final
conclusion.
• When reporting confidence intervals, be sure to give your answer as an interval: (lower value, higher value).
• Remember that all confidence levels are 95% and all significance levels are 0.05 unless otherwise noted.
• You should work on this case study on your own with no help from others, including tutors in the tutoring lab,
your TA, or your fellow classmates. You may ask for clarification about a question, but you may not ask TA’s if
your answers are correct or complete. Students who do not follow this policy will be reported to the
Committee on Academic Misconduct.
• Remember, all of the Case Study grades will count toward your final grade in the course. No Case Study scores
will be dropped, so plan accordingly.
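As an illustration of the interval format requested in the instructions above, the following Python sketch prints a 95% confidence interval for a mean as (lower value, higher value). The function name format_interval, the summary numbers, and the use of a normal critical value are assumptions made for this example only; substitute whatever critical value (for example, t*) your course materials call for.

```python
import math
from statistics import NormalDist


def format_interval(mean, sd, n, critical_value):
    """Return a confidence interval for a mean as the text '(lower, higher)'."""
    margin = critical_value * sd / math.sqrt(n)  # margin of error
    return f"({mean - margin:.2f}, {mean + margin:.2f})"


# Normal critical value for 95% confidence (about 1.96); an assumption here.
z_star = NormalDist().inv_cdf(0.975)

# Hypothetical summary statistics, not taken from any course data set.
interval = format_interval(mean=50.0, sd=10.0, n=100, critical_value=z_star)
print(f"The 95% confidence interval for the mean age is {interval} years.")
```

Passing the critical value as an argument keeps the formatting separate from the choice of distribution, so the same helper works with a value read from a table.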
Case Study Assignment Example
STAT 1430 – Case Study #1 – This is the actual assignment from Autumn 2015
The goal of this case study is to take what you’ve learned and apply it to a real world data set.
On Carmen, under the “Case Studies” section, you will find an Excel data set called “CEO_Sal”. Use this data set to
address the following questions.
1. We don’t know how this data was collected. Explain briefly (2-3 sentences) why this is a problem when it comes
to using this data set for inference.
2. Assume moving forward that we know that the data was produced using a valid simple random sampling
method. Further assume that the data were collected on 60 CEO’s of publicly traded companies here in
Columbus. Find the following pieces of information:
a. What is the mean age of the CEO’s? The standard deviation of ages? The median age? The IQR of the
ages?
b. What is the mean salary of the CEO’s? The standard deviation of salary? The median salary? The IQR of
the salary?
c. Find the statistics needed to make a boxplot for ages. (Do not make the boxplot.)
d. Create a histogram of CEO ages, selecting the number of bins that you feel are appropriate for the data.
Briefly describe the histogram. Based on the graph, which do you feel is the more appropriate measure
of central tendency – the mean or the median? Include the histogram in your report.
e. Create a histogram of the CEO salaries, selecting the number of bins that you feel are appropriate for
the data. Based on the graph, which do you feel is the more appropriate measure of central tendency –
the mean or the median? Include the histogram in your report.
3. Briefly discuss any interesting results (2-3 sentences). These can be any details about the statistics you created
above which stand out to you. Use the statistical knowledge you have gained in class to pick out specific details.
Case Study Solutions/Rubric Example
STAT 1430 – Case Study #1 – SOLUTIONS/RUBRIC
1. Not knowing how the data was collected means we have no way of knowing what kinds of bias may or may
not be present. This means that any information we get from the data should not be considered conclusive.
(In statistical terms, we can’t rely on the data for valid inference.)
(WORTH 3 POINTS)
• Just one sentence, +1 point
• 2-3 sentences, +1 point
• Mention bias or randomness in any way, +1 point
2. See answer for each part below:
a. The mean age of CEO’s in Columbus, Ohio for this data set was about 51.47 years. The CEO ages had a
standard deviation of about 8.92 years. The median age of CEO’s in Columbus, Ohio for this data set
was 50 years. The interquartile range of ages for CEO’s was 11.25 years.
(WORTH 2 POINTS)
• All four numbers (mean, SD, median, and IQR), +1
  o Missing any of the numbers, no point
• Included units for all the numbers, or indicated in some way that the units for each number are
the same, +1
  o If they only list units once or twice, no point
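The four statistics in part (a) can also be checked outside Excel. The Python sketch below is one way to do that, not the method used to produce the solutions. The ages are hypothetical (they are NOT the CEO_Sal data), the function name describe is made up for the example, and the quartiles assume the "inclusive" interpolation method, which matches Excel's QUARTILE function; other software may report slightly different quartiles.

```python
import statistics


def describe(values):
    """Return the mean, sample SD, median, and IQR of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "standard deviation": statistics.stdev(values),  # sample SD (n - 1)
        "median": median,
        "interquartile range": q3 - q1,
    }


# Hypothetical ages in years, for illustration only.
ages = [32, 41, 45, 46, 50, 50, 53, 57, 61, 74]
summary = describe(ages)

# Report every value in a full sentence with its units.
for name, value in summary.items():
    print(f"The {name} of the ages is about {value:.2f} years.")
```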
b. The mean salary of CEO’s in Columbus, Ohio for this data set was about $404,169. The CEO salaries
had a standard deviation of about $220,534. The median salary of CEO’s in Columbus, Ohio for this
data set was $350,000. The interquartile range of salaries for CEO’s was $289,500.
Not graded
c. The youngest CEO (the minimum age in the data set) was 32 years old. 25% of the CEO’s (the first
quartile) were 45.75 years of age or younger. 50% of the CEO’s (the median) were 50 years of age or
younger. 75% of the CEO’s (the third quartile) were 57 years old or younger. The oldest CEO (the
maximum age in the data set) was 74 years old.
(WORTH 2 POINTS)
• All 5 numbers (min, Q1, med, Q3, max), +1
  o If they are missing any of the numbers, no point
• Included units for all the numbers, or indicated in some way that the units for each number are
the same, +1
  o If they only list units once or twice, no point
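The five statistics needed for a boxplot in part (c) can be gathered in one step. The sketch below is a minimal Python illustration under stated assumptions: the ages are hypothetical, the function name five_number_summary is invented for the example, and the quartiles use the "inclusive" interpolation method (the one Excel's QUARTILE function uses).

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical ages in years, for illustration only.
ages = [32, 41, 45, 46, 50, 50, 53, 57, 61, 74]

labels = ["minimum", "first quartile", "median", "third quartile", "maximum"]
for label, value in zip(labels, five_number_summary(ages)):
    # Units are stated for every number, as the rubric requires.
    print(f"The {label} age is {value:g} years.")
```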
d.
[Figure: histogram "Ages of CEO’s in Columbus, n=60"; x-axis "Ages, in years" (35 to 75, in steps of 5); y-axis "Number of CEO’s" (0 to 20)]
This histogram is approximately symmetric. It has a relatively small spread, since the data values are
all clustered fairly close to the mean and median ages. Since the data are approximately symmetric,
either the mean or the median can be used as a measure of central tendency. Most people, however,
will use the mean in this case since most individuals can readily understand what a mean is.
** Note to students: If you used fewer bins, it is likely that you got a histogram that appeared skewed,
like the histogram below. However, the descriptive statistics you found in part (a) above should have
told you that the histogram should look symmetric. Furthermore, your histograms should have had
appropriate titles which included the sample size. The axes should have been clearly labeled. Your
histogram should not have had gaps between the bars. Finally, there should have been no “More” bin
or empty bins in your histogram.
Histogram with too few bins that looks skewed:
[Figure: histogram "Ages of CEO’s in Columbus, n=60"; x-axis "Ages, in years" (40 to 80, in steps of 10); y-axis "Number of CEO’s" (0 to 30)]
Histogram that Excel creates, including bins that Excel chose, without any modifications:
[Figure: histogram titled "Histogram"; x-axis "Bin" (32, 38, 44, 50, 56, 62, 68, More); y-axis "Frequency" (0 to 20)]
(WORTH 10 points)
The write up is worth 2 points:
– Mentioned shape in any way, + 1 point
– Mentioned center or spread in any way, +1 point
Having a histogram is worth 8 points:
– No gaps, +1
– Appropriate title, + 1 point (Histogram alone is NOT appropriate)
– Appropriate y-axis name, + 1 point (Frequency alone is NOT appropriate)
– Appropriate x-axis name, + 1 point (Bins is NOT appropriate)
– Sample size, + 1 point (can be in the write up, too)
– Symmetric, +3 points
– OR
– Skewed, +2 points
– OR
– Default Excel histogram, without modifying the graph in any way, + 1 point
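The note to students in part (d) says that the number of bins changes how a histogram looks. The Python sketch below counts the same data with two different bin widths so the effect can be seen in text form. The ages are hypothetical, and the function name bin_counts and the starting value of 30 are assumptions made for this example.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count values in the bins [start, start + width), [start + width, ...)."""
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    counts = Counter(int((v - start) // width) for v in values)
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


# Hypothetical ages in years, for illustration only.
ages = [32, 38, 41, 44, 45, 46, 47, 48, 49, 50,
        50, 51, 52, 53, 55, 57, 58, 61, 66, 74]

for width in (5, 20):
    print(f"Ages of hypothetical CEOs, n={len(ages)}, bin width {width} years")
    for i, count in enumerate(bin_counts(ages, start=30, width=width)):
        low = 30 + i * width
        print(f"  {low}-{low + width - 1}: {'#' * count}")
```

With the narrow bins the counts rise and fall around the middle of the data; with the wide bins most of that detail is lost.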
e. Create a histogram of the CEO salaries, selecting the number of bins that you feel are appropriate for
the data. Based on the graph, which do you feel is the more appropriate measure of central tendency –
the mean or the median? Include the histogram in your report.
Based on the histograms, the median salary would be the more appropriate value to report, since the
data is skewed.
There are several options of histograms below. The first has bin widths of $50,000; the second has bin
widths of $100,000; the third has bin widths of $200,000; the last is the default histogram created in Excel
when the user does not specify the bins. ** It is important to note that the data set was missing data for
one of the CEO’s, and therefore the sample size for these histograms is smaller than the sample size for
ages. ** Each of the histograms looks right skewed, because there are a few higher salaries in the data set.
The histograms also appear to be quite spread out. For the measures of center, see part (b) above.
Notice that in the third histogram (the one with the fewest bins), while the distribution looks skewed, it
does not show the large gap (i.e., the part of the histogram where there are no columns for salaries
ranging from $900,000 – $1,100,000 in the first graph and for salaries ranging from $1,000,000 – $1,100,000
in the second). This is an important omission, and for that reason a histogram with so few columns is not
the best to use to report on the distribution of salaries.
The first graph also shows that a fairly high number of CEO’s receive salaries between $500,000 and
$550,000, as well as between $700,000 and $750,000, at least when you look at the surrounding columns.
This is not as apparent in the second histogram.
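The preference for the median with right-skewed data, stated at the start of part (e), can be illustrated with a short Python sketch. The salaries are hypothetical values in thousands of dollars, not the CEO_Sal data; the point is only to show how one very high value moves the mean much more than the median.

```python
import statistics

# Hypothetical salaries, in thousands of dollars, for illustration only.
salaries = [150, 200, 250, 300, 350, 400, 450]
salaries_with_high_value = salaries + [1100]  # add one very high salary

for label, data in [("Without the high salary", salaries),
                    ("With the high salary", salaries_with_high_value)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = ${mean:,.1f} thousand, "
          f"median = ${median:,.1f} thousand")
```

In this example the single high salary raises the mean by $100 thousand but the median by only $25 thousand.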
[Figure: histogram "Salaries of CEO’s in Columbus, n=59", bin width $50,000; x-axis "Annual Salary, in Thousands of Dollars" (50 to 1150); y-axis "Number of CEO’s" (0 to 10)]
[Figure: histogram "Salaries of CEO’s in Columbus, n=59", bin width $100,000; x-axis "Annual Salaries, in Thousands of Dollars" (100 to 1200); y-axis "Number of CEO’s" (0 to 20)]
[Figure: histogram "Salaries of CEO’s in Columbus, n=59", bin width $200,000; x-axis "Annual Salary, in Thousands of Dollars" (200 to 1200); y-axis "Number of CEO’s" (0 to 35)]
[Figure: default Excel histogram titled "Histogram"; x-axis "Bin"; y-axis "Frequency" (0 to 25)]
Here the problem with the default options becomes apparent, as the decimal places that Excel chooses make
the histogram difficult to understand.
Not graded
3. Briefly discuss any interesting results (2-3 sentences). These can be any details about the statistics you created
above which stand out to you. Use the statistical knowledge you have gained in class to pick out specific details.
Answers here may vary.
One interesting element of the data is that the ages were fairly symmetric, which might be surprising since
people typically tend to think that CEO’s are older individuals.
The minimum salary for the salary data set was much lower than some would expect, at $21,000/year. The
maximum salary was also very large, at $1,103,000/year. These outliers certainly affected the look of the
histograms.
Another interesting facet of the data is that the standard deviation and interquartile range for the salaries
were large, both of which indicate the large variation among the different salaries.
The missing data point is also of interest. It is reasonable to question how the results might change if the data
for this 47 year old CEO could have been included.
(WORTH 3 PTS)
• Fewer than 2 sentences, +1.5 points
• At least 2-3 sentences, full credit
Case Study Student Examples
Below are examples of actual student responses submitted last semester, along with their corresponding grade.
Student #1 – Received 18/20
STAT 1430
Case Study #1
1. It is a problem that we don’t know how this data was collected because we don’t know for sure whether
or not this was a random sample. We do not know if this sample is of men, women, or both and we also
don’t know where, when, and why this data was collected. This is an obvious problem because without
that information we have no reason for the data; it essentially tells us nothing but numbers listed as
ages and numbers listed as salaries.
2.
a) The mean age of the CEO’s is 51.47
The SD of the ages is 8.92
The median age is 50
The IQR of the ages is 11.25
b) The mean salary is 404.17
The SD of the salaries is 220.53
The median salary is 350
The IQR of the salaries is 289.5
c) The minimum for ages is 32
The 1st quartile is 45.75
The 2nd quartile is 50
The 3rd quartile is 57
The maximum for ages is 74
d)
The histogram shows the ages of 60 CEO’s in the Columbus area. The ages range from 32 to
74 and are divided into 7 groups in increments of 7.
The most appropriate measure of central tendency for this graph would be the mean. The data
is not affected by outliers and has a fairly low variability, both good when using the mean to
interpret data. The mode would not be appropriate, even though it provides a fairly similar result,
because it does not take in to consideration the ages of younger or older CEO’s.
e)
The histogram shows the salaries of 60 CEO’s in the Columbus area. Salaries range from $21,000 to
$1,221,000 and the data is divided into 9 groups with increments of 150.
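[Figure for part d): histogram "60 CEO Ages"; x-axis "Ages" (32, 39, 46, 53, 60, 67, 74); y-axis "#of CEO’s" (0 to 25)]
[Figure for part e): histogram "Salaries of 60 CEO’s"; x-axis "Salary (thousands of dollars)" (21, 171, 321, 471, 621, 771, 921, 1071, 1221); y-axis "# of CEO’s" (0 to 25)]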
The mode would be a better measurement of central tendency for this graph because the mean
becomes affected by the outliers. The mode will give us the most common salary of CEO’s and will give
us a general idea of what the average would be without outliers.
3. I found it interesting that the ages of the CEO’s were fairly symmetrical and were also not skewed.
While the salaries also had some symmetry, they were affected by outliers, skewing the data to the
right. It would be interesting to see a scatterplot of this data to see if there is any correlation between
the age of the CEO’s and their salaries. Taking a quick glance at the data would give you the thought
that there is not much correlation considering the maximum salary of $1,221,000 belongs to a 57 year
old CEO, which is closer to the mean of the ages.
Student #2 – Received 14/20
Statistics 1430
Case Study 1
September 14, 2015
1. The problem that arises when we do not know how the data was collected is it allows for a lot of
variables that could change the data. This becomes a problem when interpreting the data because it
could very well extremely skewed by the data collecting process.
2. A. The mean is 51.5, the standard deviation is 8.9, the median is 50, and the IQR is 11.3.
B. The mean is 404.2 thousands of dollars, the standard deviation is 220.53 thousands of dollars, the
median is 350 thousands of dollars, and the IQR is 289.5 thousands of dollars.
C.
D.
This histogram is a fairly good bell curve representation with a slight skew right. The best item to use
would be the mean.
[Figure for part D: histogram "CEO Ages"; x-axis "Ages" (32 to 74, in steps of 3); y-axis "Frequency" (0 to 12)]
E.
This histogram shows the CEO salaries and is an example of a skewed right histogram. The best
describer is the median.
3. I saw that the average age was a lower than I would have expected. As well as the difference between
the highest and lowest salaries was substantially more than I thought.
[Figure for part E: histogram "CEO Salaries"; x-axis "Salary" (21 to 1121, in steps of 100); y-axis "Frequency" (0 to 15)]
