# Correlation

A statistical technique for determining the extent to which variations in the values of one variable are associated with variations in the values of another. For example, if we found that relatively high values of one variable tended to be associated with relatively high values of another, and also that relatively low values tended to occur together, we would say that the variables were closely correlated or associated. Statisticians have made this notion precise and have devised methods of measuring the degree of association, the most frequently used of which is the correlation coefficient (or, strictly, the product-moment correlation coefficient). This coefficient measures the degree of association on a scale which varies between −1 and +1 inclusive. If the sign of the coefficient is negative, this tells us that relatively high values of one variable tend to be associated with relatively low values of the other and vice versa: i.e. there is an inverse association. If the sign of the coefficient is positive, this tells us that relatively high values of both variables tend to occur together, as do relatively low values (throughout this explanation we are using ‘relatively high’ and ‘relatively low’ in the sense of ‘above average’ and ‘below average’ respectively). The absolute size of the coefficient tells us how strong the association is. A value close to ±1 tells us that the variables are closely associated. On the other hand, a value close to 0, whether positive or negative, indicates that relatively high values of one variable are as often associated with relatively high values of the other as with relatively low values. Stronger and stronger degrees of association are indicated as the coefficient varies from 0 to ±1.

The usefulness of correlation analysis lies in testing hypotheses about the relationships between variables.
Thus we could assert the following hypotheses: (a) the higher the household income, the higher will be household expenditure; (b) the higher the rate of interest, the lower will be the level of business investment; (c) the greater the rate of cigarette smoking, the greater will be the incidence of lung cancer; and (d) the larger the size of the family, the shorter will be the duration of each child’s full-time education. These hypotheses could be tested by measuring values of the variables and then calculating the correlation coefficients. These would show us how closely the variables were associated in practice.

Statisticians stress several limitations of correlation analysis in terms of the correlation coefficient here described, the most important of which is that the correlation coefficient does not itself prove anything about causation; it is possible for values of variables to be associated without there being a causal connection flowing from one variable to the other. One reason for this may be that both variables are in fact determined by some third variable; changes in the third variable cause the values of the other two to move together, without there being any causal relationship between them. An important special case of this is where time is the third variable: two variables may have strong time trends which lead to their being highly correlated without there necessarily being a causal relation. Alternatively, a high correlation may arise by pure chance, as in the well-known example of the high correlation between the number of storks nesting in Scandinavia and the birth rate in London. Thus correlation does not prove causation, and we are invariably thrown back on theoretical arguments for interpretation of the ‘facts’.
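
The product-moment coefficient and the time-trend case can be illustrated with a short Python sketch. It computes the coefficient as the sum of products of deviations from the mean, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and all of the data are assumptions made up for this illustration; they do not come from any real study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the mean: 'above average' is positive, 'below average' negative.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_products = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("coefficient is undefined when a variable never varies")
    return sum_products / sqrt(sum_sq_x * sum_sq_y)


# Hypothetical data: two series that share nothing except an upward time trend.
years = list(range(20))
wobble_a = [2 if t % 2 == 0 else -2 for t in years]  # pattern +, -, +, -, ...
wobble_b = [2 if t % 4 < 2 else -2 for t in years]   # pattern +, +, -, -, ...
series_a = [2 * t + w for t, w in zip(years, wobble_a)]
series_b = [3 * t + w for t, w in zip(years, wobble_b)]

r_trended = pearson_r(series_a, series_b)  # dominated by the common trend
r_wobbles = pearson_r(wobble_a, wobble_b)  # the trend-free parts on their own

print(f"with time trends:    r = {r_trended:.3f}")
print(f"trend-free wobbles:  r = {r_wobbles:.3f}")
```

In this constructed example the two trending series are very highly correlated, while the fluctuations around the trends are not correlated at all: time is the third variable producing the association. The coefficient alone cannot distinguish this from a causal link, which is why interpretation has to rest on theoretical argument.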