geek talk
The semivariance is based on the
idea that two points close together are likely to be more similar to
each other than two points farther apart. The degree of similarity
caused by spatial proximity is known as autocorrelation. The formula
for calculating the semivariance is:
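Semivariance (distance h) = 0.5 * average[(value at location i - value at location j)^2]
where the average is taken over all pairs of locations i and j that are
separated by the distance h.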
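For anyone who likes to see the arithmetic, here is a minimal Python
sketch of that formula for the simplest case I can think of:
measurements taken at equal spacing along a straight line (a transect),
so every pair of samples that are `lag` steps apart is separated by the
same distance h. The function name and the light readings are made up
for illustration.

```python
def semivariance(values, lag):
    """Semivariance for evenly spaced samples along a transect.

    values: measurements taken at equal spacing (e.g. every 10 cm)
    lag: separation in sample steps, so distance h = lag * spacing
    """
    if lag < 0:
        raise ValueError("lag must be zero or positive")
    # Every pair (i, i + lag) is separated by the same distance h.
    diffs_sq = [(values[i] - values[i + lag]) ** 2 for i in range(len(values) - lag)]
    if not diffs_sq:
        raise ValueError("lag is too large for this transect")
    # Half the average squared difference: the 'semi' in semivariance.
    return 0.5 * sum(diffs_sq) / len(diffs_sq)


# Hypothetical light readings taken every 10 cm along a line.
readings = [10.0, 12.0, 11.0, 15.0, 14.0]
print(semivariance(readings, 1))  # pairs one step (10 cm) apart
```

Here the four pairs one step apart have squared differences 4, 1, 16
and 1, so the semivariance at h = 10 cm is 0.5 * 22 / 4 = 2.75.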
In other words, for all pairs of points separated by a given distance h,
you calculate half the average squared difference between the two values
in each pair (hence 'semi' variance). Doing this for all distances means
that every possible pair of points gets used. You can then plot the
semivariance on the Y axis against the distance (h) on the X axis. The
resulting plot is called a semivariogram. There are some examples of
semivariograms on page 15 of this document:
http://www.esri.com/software/arcgis/arcgisxtensions/geostatistical/pdf/airqualityjgra.pdf
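Building the semivariogram is just a matter of repeating that
calculation for a series of distances. Here is a rough sketch, again
assuming evenly spaced readings along a transect (the data and function
names are hypothetical), that produces the (h, semivariance) points you
would plot. Any spreadsheet or plotting tool can draw them.

```python
def semivariance(values, lag):
    """Half the mean squared difference between samples 'lag' steps apart."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return 0.5 * sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def semivariogram(values, spacing, max_lag):
    """Return (distance h, semivariance) points for lags 1..max_lag.

    spacing: distance between neighbouring samples (e.g. 10 cm)
    max_lag: largest lag in steps; keep it well below len(values) so
    each lag still has a reasonable number of pairs behind it.
    """
    if not 1 <= max_lag < len(values):
        raise ValueError("max_lag must be between 1 and len(values) - 1")
    return [(lag * spacing, semivariance(values, lag)) for lag in range(1, max_lag + 1)]


# Hypothetical readings taken every 10 cm along a line.
readings = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 16.0]
for h, gamma in semivariogram(readings, spacing=10, max_lag=4):
    print(f"h = {h:3d} cm   semivariance = {gamma:.2f}")
```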
If there is autocorrelation in the data (indicating some spatial
structure), then the semivariance values toward the left of the X axis
will be low, because they come from pairs of points that are close
together and therefore similar. Toward the right end of the X axis the
semivariance will be higher (and often more scattered), because points
that are farther apart from each other are less alike. If you fit a line
to the scatter of points in the semivariogram, it typically crosses the
Y axis somewhere above zero, then rises for some distance until leveling
out. You can get some good information about the spatial structure of
the data from the fitted line.
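There are several standard shapes for that fitted line. As one example
(my choice for illustration, not the only option; exponential and
Gaussian models are also common), here is the 'spherical' model in
Python. Its three parameters correspond to the nugget, range and sill.
Strictly speaking the model is 0 at exactly h = 0 and jumps to the
nugget for any h above zero, so the sketch handles that case
separately. The parameter values in the usage lines are invented.

```python
def spherical_model(h, nugget, sill, rng):
    """Spherical semivariogram model.

    h: separation distance (h >= 0)
    nugget: where the curve meets the Y axis (limit as h -> 0)
    sill: level at which the curve flattens out
    rng: range, the distance at which the sill is reached
    """
    if h < 0:
        raise ValueError("distance must not be negative")
    if h == 0:
        return 0.0  # a point compared with itself has zero semivariance
    if h >= rng:
        return sill  # beyond the range: no more autocorrelation
    r = h / rng
    # Rises smoothly from the nugget and meets the sill exactly at h = rng.
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Invented parameters: nugget 1, sill 5, range 50 cm.
for h in (0, 10, 25, 50, 75):
    print(h, round(spherical_model(h, nugget=1.0, sill=5.0, rng=50.0), 3))
```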
Where the line crosses the Y axis is called the 'nugget'. In theory, two
points separated by zero distance should be identical and have a
semivariance of zero. However, sampling error and variation occurring at
scales smaller than the sampling interval or resolution add variance
even to points infinitesimally close together, and the nugget gives you
an estimate of that variance. The distance (h) at which the fitted line
flattens out is called the 'range'. Points farther apart than the range
are basically not autocorrelated, which is why the line flattens out and
the points become a random scatter. The range tells you the distance at
which points are no longer similar because of proximity; in other words,
it is an estimate of the patch size of the phenomenon being measured.
The value of the semivariance at the range is called the 'sill', and the
sill minus the nugget is the 'partial sill'. I'm not really sure what
those tell you other than the maximum variance of the autocorrelated
values, but somebody has probably figured out some handy use for them.
I guess I should also mention that you don't actually plot all point
pairs. Instead you group the pairs into 'lag bins' (e.g. all pairs of
points between 20 and 30 cm apart) and plot the average semivariance for
each bin. By playing with the lag (bin) width and seeing how it changes
the semivariogram, you can get an idea of the spatial scales that matter
most for the phenomenon.
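Here is one way the lag binning might look for measurements scattered
around an enclosure rather than taken along a line. It is a sketch that
assumes (x, y) coordinates in centimetres and fixed-width bins starting
at zero; the function name and the three example points are
hypothetical.

```python
import math
from itertools import combinations


def binned_semivariogram(points, values, bin_width, max_dist):
    """Empirical semivariogram for scattered points using lag bins.

    points: list of (x, y) coordinates
    values: measurement at each point
    bin_width: width of each lag bin (e.g. 10 cm)
    max_dist: pairs farther apart than this are ignored
    Returns a list of (bin centre, semivariance, number of pairs).
    """
    if bin_width <= 0 or max_dist <= 0:
        raise ValueError("bin_width and max_dist must be positive")
    n_bins = math.ceil(max_dist / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    # Look at every possible pair of points exactly once.
    for (p, zp), (q, zq) in combinations(zip(points, values), 2):
        d = math.dist(p, q)
        if d == 0 or d > max_dist:
            continue
        k = min(int(d // bin_width), n_bins - 1)  # bin this pair falls in
        sums[k] += (zp - zq) ** 2
        counts[k] += 1
    result = []
    for k in range(n_bins):
        if counts[k]:  # skip bins that received no pairs
            centre = (k + 0.5) * bin_width
            result.append((centre, 0.5 * sums[k] / counts[k], counts[k]))
    return result


# Three hypothetical readings; bins are 0-10, 10-20 and 20-30 cm.
pts = [(0, 0), (5, 0), (25, 0)]
vals = [1.0, 3.0, 6.0]
for centre, gamma, n_pairs in binned_semivariogram(pts, vals, bin_width=10, max_dist=30):
    print(centre, gamma, n_pairs)
```

The 10-20 cm bin gets no pairs here, so it is left out; with real data
you would want a decent number of pairs in every bin before trusting it.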
Semivariograms are often used to fit a model to the data that allows you
to interpolate between the measured points while taking the spatial
structure into account ('kriging' is a common technique for this). This
lets you plot a continuous surface of a phenomenon that hopefully
represents the real world. Those fancy weather maps that show
temperature or barometric pressure as a smooth continuous surface are
often interpolated (sometimes by kriging) from point measurements.
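Kriging comes in several flavours, and real software does a lot more
(fitting the model, handling direction-dependent structure, reporting
how uncertain each estimate is). Purely to show the idea, here is a
bare-bones 'ordinary kriging' estimate at a single location in plain
Python. It assumes a spherical model whose nugget, sill and range have
already been fitted (the values below are invented), and it solves the
usual kriging equations for a set of weights that must sum to 1. All
function names are my own.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model; gamma(0) = 0 by convention."""
    if h == 0:
        return 0.0
    r = min(h / rng, 1.0)
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, nugget, sill, rng):
    """Ordinary kriging estimate at `target`; returns (estimate, weights)."""
    n = len(values)
    # Semivariance between every pair of samples, bordered by ones; the
    # extra row/column (a Lagrange multiplier) makes the weights sum to 1.
    a = [[spherical(math.dist(p, q), nugget, sill, rng) for q in coords] + [1.0]
         for p in coords]
    a.append([1.0] * n + [0.0])
    # Semivariance between each sample and the location being estimated.
    b = [spherical(math.dist(p, target), nugget, sill, rng) for p in coords] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


# Hypothetical readings at the corners of a 10 cm square.
corners = [(0, 0), (10, 0), (0, 10), (10, 10)]
readings = [1.0, 2.0, 3.0, 4.0]
estimate, weights = ordinary_kriging(corners, readings, (5, 5), nugget=0.5, sill=2.0, rng=30.0)
print(round(estimate, 3), [round(w, 3) for w in weights])
```

With the target in the middle of the square, symmetry gives each corner
the same weight (0.25), so the estimate is simply the average, 2.5. Ask
for the value at one of the measured corners instead and, with the
gamma(0) = 0 convention used here, you get that reading back exactly,
which is why such kriged surfaces pass through the data points.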
The upshot is that if you took a whole bunch of point measurements of
something like light levels in vivaria of different sizes, you might be
able to use semivariograms to glean information about how the size of
the vivarium (or any manipulation, really) changes the spatial structure
of the measured phenomenon. If you haven't figured it out by now, by
'spatial structure' I mean how the measured phenomenon varies with
location.
Okay, now I'm REALLY sorry if anyone read through all of that hoping to
find anything that is useful to frogery. Personally, I now feel like I
have a brain bot.
Brent
_______from the notes and contributions of Frognet Patrons_______