LINEAR REGRESSION
1. Introduction
o very often when 2 (or more) variables are observed, a relationship between them can be visualized
o predictions are often required in economics and the physical sciences from existing and historical data
o regression analysis is used to help formulate these predictions and relationships
o linear regression is a special kind of regression analysis in which 2 variables are studied and a straight-line relationship is assumed
o linear regression is important because
1. there exist many relationships that are of this form
2. it provides close approximations to complicated relationships which would otherwise be difficult to describe
o the 2 variables are divided into (i) independent variable and (ii) dependent variable
o Dependent Variable is the variable that we want to forecast
o Independent Variable is the variable that we use to make the forecast
o e.g. Time vs. GNP (time is independent, GNP is dependent)
o scatter diagrams are used to present the relationship between the 2 variables graphically
o usually the independent variable is drawn on the horizontal axis (X) and the dependent variable on vertical axis (Y)
o the regression line is also called the regression line of Y on X
2. Assumptions
o there is a linear relationship as determined (observed) from the scatter diagram
o the dependent values (Y) are independent of each other, i.e. if we obtain a large value of Y on the first observation, the second and subsequent observations are no more likely to give a large value. In simple terms, there should be no autocorrelation
o for each value of X the corresponding Y values are normally distributed
o the standard deviations of the Y values for each value of X are the same, i.e. homoscedasticity
3. Process
o observe and note what is happening in a systematic way
o form some kind of theory about the observed facts
o draw a scatter diagram to visualize relationship
o generate the relationship by mathematical formula
o make use of the mathematical formula to predict
4. Method of Least Squares
o from a scatter diagram, there is virtually no limit as to the number of lines that can be drawn to make a linear relationship between the 2 variables
o the objective is to create a BEST FIT line to the data concerned
o the criterion is called the method of least squares
o i.e. the sum of squares of the vertical deviations from the points to the line is a minimum (vertical because the dependent variable is drawn on the vertical axis)
o the linear relationship between the dependent variable (Y) and the independent variable (X) can be written as Y = a + bX, where a and b are parameters describing the vertical intercept and the slope of the regression line respectively
5. Calculating a and b
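o $b = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$ and $a = \bar{Y} - b\bar{X}$
o where X and Y are the raw values of the 2 variables
o and $\bar{X}$ and $\bar{Y}$ are the means of the 2 variables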
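The calculation of a and b can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual least-squares expressions b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄; the function name `fit_line` and the list names are hypothetical, and the data are the twelve Accounting and Statistics marks from the example table in these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared X deviations about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Accounting (X) and Statistics (Y) marks from the example table.
accounting_marks = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
statistics_marks = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

a, b = fit_line(accounting_marks, statistics_marks)
print(f"Y = {a:.2f} + {b:.3f}X")
```

Working with deviations from the means, rather than raw sums of squares, keeps the arithmetic close to the definitions of X̄ and Ȳ given above.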
6. Correlation
o when the value of one variable is related to the value of another, they are said to be correlated
o there are 3 types of correlation: (i) perfectly correlated; (ii) partially correlated; (iii) uncorrelated
o Coefficient of Correlation (r) measures such a relationship
o the value of r ranges from -1 (perfectly correlated in the negative direction) to +1 (perfectly correlated in the positive direction)
o when r = 0, the 2 variables are not linearly correlated
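The three cases can be illustrated with a short Python sketch. It assumes the standard product-moment definition r = Σ(X - X̄)(Y - Ȳ) / sqrt(Σ(X - X̄)² · Σ(Y - Ȳ)²), which these notes do not state explicitly; the function name `correlation` and the toy data are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Return the coefficient of correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4]
r_positive = correlation(x, [2, 4, 6, 8])    # Y rises exactly in step with X
r_negative = correlation(x, [8, 6, 4, 2])    # Y falls exactly in step with X
r_zero = correlation(x, [1, -1, -1, 1])      # no straight-line trend at all

print(r_positive, r_negative, r_zero)
```

The last data set has a clear pattern (high, low, low, high) yet gives r = 0, a reminder that r only measures straight-line association.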
7. Coefficient of Determination
o this calculates the proportion of the variation in the actual values which can be predicted by changes in the values of the independent variable
o denoted by $r^2$, the square of the coefficient of correlation
o ranges from 0 to 1 (r ranges from -1 to +1)
o expressed as a percentage, it represents the proportion that can be predicted by the regression line
o the value $1 - r^2$ is therefore the proportion contributed by other factors
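One way to see the "proportion of variation" reading is to compute the coefficient of determination directly as 1 minus (unexplained variation / total variation). The sketch below does this for the Accounting and Statistics marks from the example table; the function name `coefficient_of_determination` is hypothetical, and the decomposition used is the standard one for a least-squares line rather than something spelled out in these notes.

```python
def coefficient_of_determination(xs, ys):
    """Return the share of the variation in ys explained by the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Total variation of Y about its mean.
    total = sum((y - y_bar) ** 2 for y in ys)
    # Variation left over after the regression line has been fitted.
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total


accounting_marks = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
statistics_marks = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

r_squared = coefficient_of_determination(accounting_marks, statistics_marks)
print(f"explained by the line: {r_squared:.2%}")
print(f"other factors:         {1 - r_squared:.2%}")
```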
8. Standard Error of Estimate (SEE)
o a measure of the variability of the observed values about the regression line, i.e. the dispersion around the regression line
o it tells how much variation there is in the dependent variable between the raw value and the expected value in the regression
o this SEE allows us to generate the confidence interval on the regression line as we did in the estimation of means
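These notes do not give a formula for the SEE. A common definition, assumed here, is SEE = sqrt(Σ(Y - Ŷ)² / (n - 2)), where Ŷ = a + bX is the value on the regression line and n - 2 matches the degrees of freedom used for the confidence intervals. The sketch below applies it to the marks from the example table; the function name `standard_error_of_estimate` is hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Return sqrt(sum of squared residuals / (n - 2)) for the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Residual = raw value minus the value expected from the regression line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


accounting_marks = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
statistics_marks = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

see = standard_error_of_estimate(accounting_marks, statistics_marks)
print(f"SEE = {see:.2f} marks")
```

The result is in the same units as Y, so it can be read as a typical distance between an observed Statistics mark and the mark the line predicts.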
9. Confidence interval for the regression line (estimating the expected value)
o estimating the mean value of Y for a given value of X is a very important practical problem
o e.g. if a corporation’s profit Y is linearly related to its advertising expenditures X, the corporation may want to estimate the mean profit for a given expenditure X
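o this is given by the formula $\hat{Y}_0 \pm t \cdot \mathrm{SEE} \cdot \sqrt{\dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}$, where $\hat{Y}_0 = a + bX_0$ is the value on the regression line at the given $X_0$ and t is the critical value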
o at n-2 degrees of freedom for the t-distribution
10. Confidence interval for individual prediction
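o for technical reasons (a single observation scatters about the mean value of Y), the above formula must be amended and is given by $\hat{Y}_0 \pm t \cdot \mathrm{SEE} \cdot \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}$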
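The difference between the two intervals can be sketched in Python. This assumes the standard textbook forms: Ŷ ± t · SEE · sqrt(1/n + (X0 - X̄)²/Σ(X - X̄)²) for the mean value of Y, with an extra 1 under the square root for an individual prediction. The function name `regression_interval`, the choice X0 = 60 and the constant name are illustrative; the critical value 2.228 is the two-sided 95% value of the t-distribution at 10 degrees of freedom, read from a t-table rather than computed.

```python
import math


def regression_interval(xs, ys, x0, t_crit, individual=False):
    """Return (lower, upper) limits at x0 for the mean of Y, or for a single
    new Y value when individual=True."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(sse / (n - 2))
    # An individual value carries its own scatter, hence the extra 1.
    extra = 1.0 if individual else 0.0
    half_width = t_crit * see * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    y_hat = a + b * x0
    return y_hat - half_width, y_hat + half_width


accounting_marks = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
statistics_marks = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

T_CRIT_95_DF10 = 2.228  # assumption: 95% two-sided, n - 2 = 10 degrees of freedom

mean_lo, mean_hi = regression_interval(
    accounting_marks, statistics_marks, 60, T_CRIT_95_DF10)
pred_lo, pred_hi = regression_interval(
    accounting_marks, statistics_marks, 60, T_CRIT_95_DF10, individual=True)

print(f"mean Statistics mark at X = 60:       {mean_lo:.1f} to {mean_hi:.1f}")
print(f"individual Statistics mark at X = 60: {pred_lo:.1f} to {pred_hi:.1f}")
```

Both intervals are centred on the same point of the regression line; the interval for an individual prediction is always the wider of the two.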
An Example
No.  | Accounting X | Statistics Y |      X^2 |      Y^2 |       XY
-----|--------------|--------------|----------|----------|---------
1    |        74.00 |        81.00 |  5476.00 |  6561.00 |  5994.00
2    |        93.00 |        86.00 |  8649.00 |  7396.00 |  7998.00
3    |        55.00 |        67.00 |  3025.00 |  4489.00 |  3685.00
4    |        41.00 |        35.00 |  1681.00 |  1225.00 |  1435.00
5    |        23.00 |        30.00 |   529.00 |   900.00 |   690.00
6    |        92.00 |       100.00 |  8464.00 | 10000.00 |  9200.00
7    |        64.00 |        55.00 |  4096.00 |  3025.00 |  3520.00
8    |        40.00 |        52.00 |  1600.00 |  2704.00 |  2080.00
9    |        71.00 |        76.00 |  5041.00 |  5776.00 |  5396.00
10   |        33.00 |        24.00 |  1089.00 |   576.00 |   792.00
11   |        30.00 |        48.00 |   900.00 |  2304.00 |  1440.00
12   |        71.00 |        87.00 |  5041.00 |  7569.00 |  6177.00
Sum  |       687.00 |       741.00 | 45591.00 | 52525.00 | 48407.00
Mean |        57.25 |        61.75 |  3799.25 |  4377.08 |  4033.92
Figure 1: Scatter Diagram of Raw Data
Figure 2: Scatter Diagram and Regression Line
Interpretation/Conclusion
There is a linear relation between the results of Accounting and Statistics, as shown by the scatter diagram in Figure 1. A linear regression analysis was done using the least-squares method. The resultant regression line is represented by Y = 7.02 + 0.956X (coefficients computed from the table above), in which X represents the results of Accounting and Y those of Statistics. Figure 2 shows the regression line. In this example, the choice of dependent and independent variables is arbitrary. It can be said that the results of Statistics are correlated with those of Accounting, or vice versa.
The Coefficient of Determination is 0.8453. This shows that the two variables are correlated. Nearly 85% of the variation in Y is explained by the regression line.
The Coefficient of Correlation (r) has a value of 0.92. This indicates that the two variables are positively correlated (Y increases as X increases).