The Variables: Year and Percent of Observations
Each chart depicts the relation between two variables: the year of observation, and the species' observations as a percentage of all observations for its Class in that year. The percentage is computed as follows:
1. For each year, count how many observations were reported for that particular species.
2. For each year, count how many observations were reported for the Class that plant or animal belongs to. "Class" refers to the taxonomic class, such as "Aves" or "Mammalia".
3. Divide the count from step 1 by the count from step 2, and multiply the result by 100 (this gives a "percent", which means "per 100").
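The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the charts: the function name `percent_of_class` and all of the yearly counts are hypothetical. The made-up numbers are chosen so that raw observations of the species rise every year while its share of the Class falls.

```python
# Hypothetical yearly counts (assumption): not real observation data.
species_counts = {2003: 10, 2004: 12, 2005: 15}    # step 1: observations of one species
class_counts = {2003: 400, 2004: 500, 2005: 750}   # step 2: observations of its whole Class


def percent_of_class(species_counts, class_counts):
    """Return {year: 100 * species count / Class count} (step 3) for each year."""
    result = {}
    for year, class_total in class_counts.items():
        if class_total == 0:
            continue  # no Class observations that year, so the percentage is undefined
        result[year] = 100.0 * species_counts.get(year, 0) / class_total
    return result


percents = percent_of_class(species_counts, class_counts)
for year in sorted(percents):
    print(year, round(percents[year], 2))
```

With these invented numbers the species count grows from 10 to 15, yet the percentages are 2.5, 2.4 and 2.0: the Class as a whole was being observed more often, and faster, than this species.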
We use percentages rather than raw counts because they help control for the possibility that a change in the observation rate for some species is simply the result of a change in the observation rate for all similar species, perhaps because there are more observers. The example of the Java Tree Sparrow shows how useful this control can be.
The Correlation Coefficient
Correlation means co-relation: a change in one thing is related to a change in another. Such a correlation can be positive (more of one is accompanied by more of the other) or negative (more of one is accompanied by less of the other). When two things are not related at all, their correlation is zero. If the numbers of a species are increasing each year, we would expect observations of that species also to increase each year: a positive correlation between observations and year of observation.
A "correlation coefficient" is a number calculated from two variables (sets of paired numbers) using a statistical formula. Correlation coefficients range from -1.0 to +1.0. The closer the coefficient is to either end of that range (-1.0 or +1.0), the stronger the linear relationship between the two variables. A correlation of 0.0 indicates the absence of a linear relationship.
A positive correlation coefficient means that as variable A increases, variable B increases, and conversely, as variable A decreases, variable B decreases: the variables move in the same direction. A negative correlation means that as variable A increases, variable B decreases, and vice versa: the variables move in opposite directions. If we find a negative correlation between Percent of Observations and Time, it suggests that the population may be declining; if we find a positive correlation, it suggests that the population may be increasing.
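The text does not name the formula behind the coefficient; the most common choice, and the one assumed here, is Pearson's product-moment correlation. The minimal Python sketch below (the function name `pearson_r` and both data series are hypothetical) shows how the sign of the coefficient follows the direction of the trend.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


years = [2001, 2002, 2003, 2004, 2005]
rising = [1.0, 1.4, 1.9, 2.1, 2.6]    # hypothetical percent of observations
falling = [2.6, 2.2, 1.8, 1.5, 1.0]   # hypothetical percent of observations

print(round(pearson_r(years, rising), 3))
print(round(pearson_r(years, falling), 3))
```

The first value printed is close to +1 and the second close to -1; dividing by the spreads is what keeps the result between -1.0 and +1.0.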
Correlation coefficients are concerned only with linear relationships. If a curvilinear relation existed (e.g., an animal population first increased and then decreased), the coefficient might turn out to be close to 0, because no straight line describes such a curve well.
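A quick check of the curvilinear case, again assuming the Pearson coefficient: the made-up series below rises for five years, peaks, and then falls by exactly the same amounts. The two variables are plainly related, yet the coefficient comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = list(range(11))  # years 0 to 10 (hypothetical)
# Hypothetical population: rises for five years, peaks in year 5, then falls symmetrically.
counts = [100 - (y - 5) ** 2 for y in years]

r = pearson_r(years, counts)
print(r)  # prints 0.0
```

The rising half contributes exactly as much positive co-variation as the falling half contributes negative co-variation, so they cancel.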
Significance
"Statistical significance" addresses whether a result is likely to be due to chance alone or instead reflects some underlying relationship. A result is significant if it is unlikely to have occurred by chance.
Some correlations are clearly genuine: the correlation between the height and age of children reflects the fact that children grow taller as they age. But even if there is in fact no relation between two variables, a small study may still show one, due to random variation. P-values (probability values) measure how likely it is that a correlation at least as strong as the one observed would arise by chance alone if there were no real relation. We say that a result is statistically significant (at the usual .05 level) if a result that strong would occur less than 5% of the time when there is no real relation.
The expression "P<.05" means that, if there were no relation at all, you would find a correlation as strong as the one you observed less than 5% of the time.
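One way to see what "P<.05" means is to simulate the no-relation case directly. The sketch below uses a permutation test, which is a different technique from the F test described later in this section: it shuffles the yearly percentages many times, destroying any real link to year, and counts how often the shuffled data give a correlation at least as strong as the observed one. The data, the function names, the trial count and the seed are all assumptions for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Estimate how often |r| >= observed |r| when ys has no real link to xs."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # random reordering: any relation to year is destroyed
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Counting the observed ordering as one more trial keeps the estimate above zero.
    return (hits + 1) / (trials + 1)


years = list(range(2001, 2011))
percent = [3.1, 2.9, 3.0, 2.6, 2.7, 2.4, 2.5, 2.2, 2.1, 2.0]  # hypothetical, declining

r = pearson_r(years, percent)
p = permutation_p_value(years, percent)
print(f"r = {r:.3f}, estimated p = {p:.4f}, significant at .05: {p < 0.05}")
```

If the estimated p is below .05, shuffled (no-relation) data rarely produce a correlation as strong as the real one, which is exactly the reading of "P<.05" given above.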
Statistical significance does not guarantee practical significance. When we find that observations are declining with time, our hope is merely that such a finding will lead to closer examination of this species.
Statistical significance can be determined by calculating the F ratio, which is the ratio of explained variance to unexplained variance. The larger the F ratio, the more of the scatter in the plotted points a straight line accounts for. When the F ratio produced from a given number of measurement pairs exceeds a particular critical value, the result is statistically significant. For example, with 30 pairs of measurements the F ratio has 1 and 28 degrees of freedom, and an F ratio above about 4.20 is significant at the .05 level. In other words, if we obtain an F ratio of more than about 4.20 with 30 years of observations, then, if there were no relation at all, we would expect to find a correlation as strong as ours less than 5% of the time.
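The F ratio for a straight-line fit can be computed from sums of squares, as in the minimal sketch below. The function name `f_ratio` and the 30 years of data are hypothetical. The critical value 4.20 is the standard F-table entry at the .05 level for 1 and 28 degrees of freedom, which is what 30 pairs give (n - 2 = 28); the often-quoted 4.17 is the entry for 1 and 30 degrees of freedom.

```python
def f_ratio(xs, ys):
    """F ratio for a least-squares straight line: explained / unexplained variance.

    The explained (regression) variance has 1 degree of freedom and the
    unexplained (residual) variance has n - 2.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((y - my) ** 2 for y in ys)
    # Scatter left over around the fitted line.
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_explained = ss_total - ss_resid
    return (ss_explained / 1) / (ss_resid / (n - 2))


# Standard F-table value at the .05 level for 1 and 28 degrees of freedom (30 pairs).
F_CRIT_1_28 = 4.20

years = list(range(1977, 2007))            # 30 hypothetical years
wiggle = [0.2, -0.1, 0.0, 0.15, -0.2]      # made-up, repeating scatter around the trend
percent = [5.0 - 0.05 * i + wiggle[i % 5] for i in range(len(years))]

f = f_ratio(years, percent)
print(f"F = {f:.2f} with 1 and {len(years) - 2} df; significant at .05: {f > F_CRIT_1_28}")
```

For a straight-line fit this F ratio equals r²(n - 2)/(1 - r²), where r is the correlation coefficient, so for a fixed number of years a stronger correlation of either sign always gives a larger F.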
Last Revised: April 17, 2007