Introduction to MATLAB™, Probability and Statistics

Aim

The aim of this session is to continue your introduction to MATLAB™, specifically considering aspects of probability and statistics. Refer to the handout on this topic.

Introduction

As in the session on Matrices and Complex Numbers, you must record everything you do in this session, noting down the results and discussing them as you go along. Do this by pasting MATLAB code, figures and results into the Word document ‘Statistics WorkReturn Template’, which you can download from Blackboard.

Exercise 1: Generate random data in MATLAB (30%)

You can use MATLAB to generate random data that are normally distributed (randn) or uniformly distributed (rand). To generate a normally distributed random variable x of length L, with mean xm and standard deviation xd, the MATLAB command is
>> x = xm + xd*randn(L,1);
Now generate a random variable x with 1000 data points, normally distributed with a theoretical mean of 10 and a standard deviation of 0.5. Plot x. Calculate the mean and the standard deviation of x in MATLAB. Are they the same as the theoretical mean and standard deviation? Explain why.
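
As a starting point, the steps above could be sketched as follows. The variable names sampleMean and sampleStd are illustrative and not part of the handout, and the plot style is only one possible choice.

```matlab
% Sketch for Exercise 1: draw 1000 normal samples and compare the
% sample statistics with the theoretical values.
L  = 1000;   % number of data points
xm = 10;     % theoretical mean
xd = 0.5;    % theoretical standard deviation

x = xm + xd*randn(L,1);   % L-by-1 column of normally distributed samples

figure;
plot(x, '.');
xlabel('Sample index'); ylabel('x');
title('1000 samples, theoretical mean 10, standard deviation 0.5');

sampleMean = mean(x);   % estimate of xm
sampleStd  = std(x);    % estimate of xd (normalised by L-1)
fprintf('mean = %.4f, std = %.4f\n', sampleMean, sampleStd);
```

Running the script several times gives slightly different values each time, which is worth noting when you answer the question above.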

Next, plot a histogram of x using the MATLAB command hist(x). The data should be approximately normally distributed.
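
A minimal sketch of the histogram step, assuming the same distribution as before. The choice of 30 bins is an assumption made for illustration (hist uses 10 bins by default). Newer MATLAB releases recommend the histogram function instead, but hist is the command named in this exercise.

```matlab
% Sketch: histogram of 1000 normally distributed samples.
x = 10 + 0.5*randn(1000,1);   % theoretical mean 10, standard deviation 0.5

figure;
hist(x, 30);                  % 30 bins (illustrative choice)
xlabel('x'); ylabel('Count');

% Called with outputs, hist returns the bin counts and bin centres
% instead of drawing the plot.
[counts, centers] = hist(x, 30);
```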

Use the same theoretical normal distribution as above, but reduce the sample size from 1000 to (a) 20 and (b) 100. For each sample size, generate 500 data sets by writing a loop in MATLAB. For each data set, calculate and collect the sample mean. Use the subplot command to plot the histogram of the sample means for each sample size so that the two histograms appear in the same MATLAB figure. To make visual comparison easier, make sure both histograms use the same range on the horizontal axis and the same range on the vertical axis. Use the Central Limit Theorem to explain your findings.
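
One possible way to organise the loop and the shared axes is sketched below. The bin centres, axis limits and variable names are assumptions chosen for illustration. For reference, the Central Limit Theorem predicts a standard deviation of the sample mean of xd/sqrt(n), that is about 0.112 for n = 20 and 0.05 for n = 100.

```matlab
% Sketch: distribution of the sample mean for two sample sizes.
xm = 10; xd = 0.5;      % theoretical mean and standard deviation
nSets = 500;            % number of data sets per sample size
sizes = [20 100];       % sample sizes (a) and (b)
means = zeros(nSets, numel(sizes));   % one column per sample size

for k = 1:numel(sizes)
    for i = 1:nSets
        x = xm + xd*randn(sizes(k),1);   % one data set
        means(i,k) = mean(x);            % collect its sample mean
    end
end

% Use the same bin centres for both histograms (illustrative range).
centers = 9.5:0.025:10.5;
counts  = zeros(numel(centers), numel(sizes));
for k = 1:numel(sizes)
    c = hist(means(:,k), centers);
    counts(:,k) = c(:);
end
yMax = 1.1*max(counts(:));   % common vertical limit with a small margin

figure;
for k = 1:numel(sizes)
    subplot(2,1,k);
    bar(centers, counts(:,k), 1);    % width 1 makes the bars touch
    axis([9.4 10.6 0 yMax]);         % identical axes in both panels
    xlabel('Sample mean'); ylabel('Count');
    title(sprintf('Sample size n = %d', sizes(k)));
end

% Compare the observed spread with the CLT prediction xd/sqrt(n).
predicted = xd ./ sqrt(sizes);
observed  = std(means);      % column-wise standard deviation
fprintf('n = %3d: predicted %.4f, observed %.4f\n', [sizes; predicted; observed]);
```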

Exercise 2: Effect of Variance (25%)

This exercise aims to demonstrate the effect of variance in the data on our ability to detect significance.

Generate a normally distributed random variable of size 50 with a theoretical mean of 1.1 and a standard deviation of 0.8. Plot the data. Use the MATLAB command ttest to test the hypothesis that the data come from a normal distribution with a mean of 1, even though you know the theoretical mean is 1.1. Record your findings. Repeat the exercise with the standard deviation changed to 0.1. Compare the two cases and comment on the effect of the variance in the data on hypothesis testing.
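
The two cases could be run side by side as in the sketch below. It assumes the Statistics and Machine Learning Toolbox is installed (ttest belongs to it), and the variable names hs and ps are illustrative. As a guide to interpreting the output, the standard error of the mean is sd/sqrt(50): about 0.113 when sd = 0.8 and about 0.014 when sd = 0.1, while the true offset from the hypothesised mean is 0.1.

```matlab
% Sketch: one-sample t-test against a mean of 1 for two noise levels.
n   = 50;          % sample size
mu  = 1.1;         % theoretical mean of the generated data
m0  = 1;           % hypothesised mean under the null hypothesis
sds = [0.8 0.1];   % the two standard deviations to compare

hs = zeros(size(sds));   % test decisions (1 = reject the null hypothesis)
ps = zeros(size(sds));   % p-values

figure;
for k = 1:numel(sds)
    x = mu + sds(k)*randn(n,1);

    subplot(2,1,k);
    plot(x, 'o');
    title(sprintf('Standard deviation = %.1f', sds(k)));

    [h, p, ci] = ttest(x, m0);   % default significance level is 5%
    hs(k) = h;
    ps(k) = p;
    fprintf('sd = %.1f: h = %d, p = %.4g, 95%% CI = [%.3f, %.3f]\n', ...
            sds(k), h, p, ci(1), ci(2));
end
```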

Exercise 3: Paired t-test vs two-sample t-test (25%)

This exercise aims to illustrate the importance of selecting the correct t-test when comparing two sets of data.

Generate a random variable x of size 20 with a theoretical mean of 100 and a standard deviation of 4. Then generate a variable y = 2 + x + e, where e represents noise on the data and is normally distributed with a mean of zero and a standard deviation of 1. You have just generated 20 pairs of data, with each element of y linked to the corresponding element of x: y is x plus 2, with some uncertainty due to noise. Plot x and y superimposed on the same axes in MATLAB.
Use the MATLAB command ttest to test whether x and y are significantly different at the 5% level of significance. Now use the MATLAB command ttest2 for the same hypothesis test. Discuss your results.
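
A sketch of the two tests on the same paired data follows. It assumes x is drawn from a normal distribution (the exercise gives only its mean and standard deviation) and that the Statistics and Machine Learning Toolbox is available. The output variable names are illustrative.

```matlab
% Sketch: paired t-test versus two-sample t-test on linked data.
n = 20;
x = 100 + 4*randn(n,1);   % assumption: x is normally distributed
e = randn(n,1);           % noise with mean 0 and standard deviation 1
y = 2 + x + e;            % each y(i) is tied to the matching x(i)

figure;
plot(1:n, x, 'o-', 1:n, y, 's-');
legend('x', 'y');
xlabel('Pair index'); ylabel('Value');

% Paired test: works on the differences x - y, so the large spread
% shared by x and y cancels out.
[hPaired, pPaired] = ttest(x, y);

% Two-sample test: treats x and y as independent samples, so the
% spread of x and y themselves enters the test statistic.
[hTwo, pTwo] = ttest2(x, y);

fprintf('paired:     h = %d, p = %.4g\n', hPaired, pPaired);
fprintf('two-sample: h = %d, p = %.4g\n', hTwo, pTwo);
```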

Exercise 4: Correlation (20%)

Within your class, collect the body height (in cm), shoe size (UK size) and heart rate (beats per minute) of everyone. Present the data as a table in your report. Plot height vs shoe size and height vs heart rate as two figures in MATLAB. Use MATLAB to find the correlation coefficient between height and shoe size, and between height and heart rate. In each case, test whether the correlation is significant at the 5% level of significance.
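
The analysis could be laid out as in the sketch below. The numbers are made-up placeholders, not real measurements, and must be replaced with the data collected in your class. The sketch relies on the second output of corrcoef, which holds the p-values for testing the hypothesis of no correlation.

```matlab
% Sketch: correlation and significance test.
% PLACEHOLDER VALUES ONLY: replace with your own class data.
height = [158 162 165 168 170 172 175 178 181 185]';   % cm
shoe   = [4 5 5.5 6 7 7 8 9 9.5 11]';                  % UK size
hr     = [72 65 80 70 76 62 74 68 79 66]';             % beats per minute

figure;
plot(height, shoe, 'o');
xlabel('Height (cm)'); ylabel('Shoe size (UK)');

figure;
plot(height, hr, 'o');
xlabel('Height (cm)'); ylabel('Heart rate (beats per minute)');

% corrcoef returns 2-by-2 matrices; the off-diagonal entry is the
% value for the pair of variables.
[R1, P1] = corrcoef(height, shoe);
[R2, P2] = corrcoef(height, hr);
rShoe = R1(1,2);  pShoe = P1(1,2);
rHr   = R2(1,2);  pHr   = P2(1,2);

alpha = 0.05;   % 5% level of significance
fprintf('height vs shoe size:  r = %.3f, p = %.4g, significant = %d\n', ...
        rShoe, pShoe, pShoe < alpha);
fprintf('height vs heart rate: r = %.3f, p = %.4g, significant = %d\n', ...
        rHr, pHr, pHr < alpha);
```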
