Text 86, 207 lines
Written 2004-09-17 16:37:00 by Michael Ragland (1:278/230)
Subject: Re: Dawkins gives incorre
=================================
Guy Hoelzer <hoelzer@unr.edu> and Tim Tyler <tim@tt1lock.org> wrote or
quoted one another (articles ci7mqk$24qd$1, chvng2$2hqs$1 and
chsg65$1hqg$1 @darwin.ediacara.org):
GH:
Are you arguing that treating p_i as frequency is almost never done, or
that this practice has not increased in frequency? Or are you just
arguing that you don't think it has become sufficiently common to call
it a transition?
TT:
p_i is /always/ the probability of the i'th symbol arising.
Sometimes the probabilities are determined completely by symbol
frequencies - but the p_i's are never frequencies.
GH:
If they are "determined completely by symbol frequencies" then they
are frequencies.
TT:
A frequency is normally a measurement of the number of times that a
repeated event occurs per unit time.
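A minimal Python sketch of the distinction being argued here (the example
string and function name are illustrative, not from the thread): the raw
symbol counts are frequencies, while the normalised values that enter
Shannon's formula sum to 1.0 and behave as probabilities.

  from collections import Counter
  from math import log2

  def symbol_probabilities(stream):
      """Estimate p_i for each symbol as its relative frequency in the stream."""
      counts = Counter(stream)          # raw counts: "frequencies" in GH's sense
      total = len(stream)
      return {sym: n / total for sym, n in counts.items()}

  probs = symbol_probabilities("abracadabra")
  assert abs(sum(probs.values()) - 1.0) < 1e-9   # the p_i sum to 1.0

  # Shannon information carried by each symbol, in bits: -log2(p_i)
  info = {sym: -log2(p) for sym, p in probs.items()}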
GH:
I am aware of that definition, but I am using a different conventional
meaning. This distinction might be a source of some of our differences.
The definition I am using is the one I believe to be most commonly used
in the biological sciences, and it is well represented by the one expressed
by "A Dictionary of Ecology, Evolution, and Systematics." It reads:
"The number of items belonging to a category or class; the number of
occasions that a given species occurs in a series of examples."
This dictionary does not list any other definitions for "frequency."
TT:
I note that that still doesn't result in a series of numbers that add up
to 1.0.
GH:
How do you explain the information-theoretic methods of analysis, such
as the Akaike Information Criterion (AIC), which have been growing rapidly
in application? It is fundamental to these methods that they yield
precisely the same result in the hands of every scientist, so that they
are repeatable and verifiable. The role of perceiver, which was
Shannon's initial concern, has been dropped from information theory by
many.
TT:
I'm not sure about the Akaike Information Criterion, but - as far as I
can tell - it escapes observer-dependence by completely specifying a
particular hypothetical observer (its model) and then asking how
effective that observer is at predicting the data.
In other words, the term "information" in its title appears to refer not
to the information gained by someone measuring its value - but to the
information that can be expected to be gained by a completely-specified
hypothetical observer witnessing the data stream.
GH:
A good resource for learning about AIC and its application (IMHO) is the
book:
Burnham, K. P., and D. R. Anderson. 1998. Model selection and inference:
a practical information-theoretic approach. Springer-Verlag, New York,
New York, USA. 353 pp.
The authors explain why Kullback-Leibler information is more fundamental
than Shannon information and show that it is more general (it includes
Shannon information). It is Kullback-Leibler that is assumed under the
AIC paradigm, which does not posit a hypothetical observer, according
to the authors. Instead, they argue, the set of AIC values (or adjusted
analogues, such as AICc) that you get out of a comparative analysis
express the relative distance of competing models from objective Truth.
That claim took me by surprise when I first ran across it, but you
really have to examine the theory closely to make an informed judgment
about it.
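For concreteness, a minimal Python sketch of the AIC computation the
paradigm rests on (the model names, log-likelihoods and sample size are
hypothetical): AIC = 2k - 2 ln(L), plus the small-sample adjustment AICc
mentioned above.

  def aic(log_likelihood, k):
      """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
      return 2 * k - 2 * log_likelihood

  def aicc(log_likelihood, k, n):
      """Small-sample correction: AICc = AIC + 2k(k+1)/(n - k - 1)."""
      return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)

  # Two hypothetical fitted models of the same data set (n = 30):
  models = {"linear": (-42.1, 2), "quadratic": (-40.8, 3)}  # (max log-likelihood, k)
  scores = {name: aicc(ll, k, 30) for name, (ll, k) in models.items()}
  best = min(scores, key=scores.get)  # lower AICc = shorter estimated K-L distance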
TT:
I had never heard of Kullback-Leibler information.
MR:
Here's a brief source on it: "Date: 13. - 16. September 04
Location: Institute of Environmental Sciences, UniZH
STATISTICAL MODEL SELECTION AND INFERENCE: A PRACTICAL COURSE
Prof. David Anderson, author of the book "Model Selection and
Multi-Model Inference"
Model selection using information criteria is an alternative to
traditional null hypothesis testing that connects information theory and
likelihood theory. Traditional statistical models like regression and
ANOVA use null hypothesis tests and an arbitrary probability value P of
0.05 to decide whether a factor has an effect or not. These models are
sensitive to Type I and Type II errors. Model selection uses
Akaike's information criterion (AIC) to choose the best model from
a set of candidate models, and AIC is used to decide whether a factor
should be included in a model that describes the structure of the
data. AIC-based model selection can be applied to experimental and
observational data. The course is a practical one: the aim is to learn
about model selection and how to use it. The course focuses on
application, not theory. David Anderson, the teacher of the course, is
the leading expert in the field of model selection. Because he worked in
Fish and Wildlife Departments, he knows the needs of biologists. During
the course, participants will have the opportunity to analyse their own
data.
In particular, the course will cover the following topics:
- Some philosophy about science and data analysis issues
- Kullback-Leibler information and its centrality in the sciences
- Estimators of K-L information (AIC, AICc, QAICc, and TIC)
- Model selection, the principle of parsimony, bias/variance trade-offs
- Strength of evidence for models in the candidate set,
- Scaling models (delta values)
- Akaike weights (the likelihood of model i, given the data; sketched below)
- Incorporating model selection uncertainty into estimates of precision
- Multi-model Inference (MMI) -- making formal inference from several
models with special sessions on
- Likelihood theory, maximum likelihood estimates, etc
- Model building
- Null hypothesis testing, problems, limitations
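Two items from that list - delta values and Akaike weights - reduce to a
few lines of Python (the AIC scores below are hypothetical):

  from math import exp

  def akaike_weights(aic_values):
      """Delta values and Akaike weights for a candidate model set.

      delta_i = AIC_i - min(AIC);  w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
      """
      best = min(aic_values)
      deltas = [a - best for a in aic_values]
      raw = [exp(-d / 2) for d in deltas]
      total = sum(raw)
      return deltas, [r / total for r in raw]

  # Hypothetical AIC scores for three candidate models:
  deltas, weights = akaike_weights([204.2, 206.9, 211.5])
  # The weights sum to 1 and read as the relative likelihood of each
  # model, given the data.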
TT:
I visited http://googleduel.com/ with the terms
"shannon information" and "Kullback-Leibler information"
Shannon information won by more than 100 to 1.
MR:
I tried a variant on Google Duel (I left out the word "Information"
after "Kullback Leibler"). And the winner is...
Kullback Leibler (7,430)
Shannon Information (3,510)
TT:
Maybe an option for you would be to use one of the terms referring to
this quantity - if it is what you are talking about.
The terms "relative entropy", "divergence", "directed divergence", and
"cross entropy" all appear to refer to this metric.
TT:
The metric represents a measure of distance between two probability
distributions. If the distributions are given, then the metric does not
depend on who measures it.
However Shannon information does not normally consider the probabilities
it is considering to be given and agreed-upon in advance - instead it
allows the possibility that different observers may have different
information about the events and may make different estimates of their
probabilities. In the terminology of relative entropy, they would be
said to be considering different models.
If you calculate the /relative entropy/ between the predictions of
different models and some fixed set of observations then you would
indeed arrive at different values.
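A minimal Python sketch of that calculation (the distributions are made
up for illustration): the relative entropy D(P||Q) = sum_i p_i log2(p_i / q_i)
between a fixed set of observations and each model's predictions differs
from model to model, exactly as described.

  from math import log2

  def relative_entropy(p, q):
      """Kullback-Leibler divergence D(P||Q) in bits.

      p: the observed distribution; q: a model's predicted distribution.
      Assumes q_i > 0 wherever p_i > 0.
      """
      return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

  observed = [0.5, 0.3, 0.2]   # fixed set of observations
  model_a  = [0.4, 0.4, 0.2]   # two competing models of the same events
  model_b  = [0.6, 0.2, 0.2]
  d_a = relative_entropy(observed, model_a)
  d_b = relative_entropy(observed, model_b)   # d_a != d_b: model-dependent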
TT:
They always add up to 1.0 - like probabilities do.
GH:
Like frequencies always do.
TT:
Frequencies are usually measured in Hertz - and never add up to a
dimensionless quantity such as 1.0.
Indeed, adding the values of frequencies together is usually a bad move:
since 1 Hz + 2 Hz != 3 Hz.
GH:
Under the definition provided above frequencies must always add to one
if you have included all possible types in your data. For example, if
you consider the frequency of each allele present in a data set, those
frequencies must add to one.
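GH's allele example in a few lines of Python (the sample is invented):
the counts themselves are unbounded, but once every allele in the data
set is included, the relative frequencies necessarily add to one.

  from collections import Counter

  alleles = ["A", "A", "a", "A", "a", "B"]       # hypothetical sample
  counts = Counter(alleles)                      # raw counts, no upper bound
  freqs = {al: n / len(alleles) for al, n in counts.items()}
  assert abs(sum(freqs.values()) - 1.0) < 1e-9   # they add to one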
TT:
How could they possibly - if the frequency is defined to be a count of
the number of occurrences of an item in a set?
Frequencies have no upper bound. They can become as large as you like.
You appear to be talking about a proportion of some sort - not a
frequency.
Your unorthodox definition of frequency appears to match your unusual
definition of information. This sort of thing seems bound to cause
communication problems :-|
TT:
It doesn't appear to be what you are talking about - but it shares the
element of observer-independence (though it tends to become
language-dependent in the process).
GH:
You are correct that this is not exactly what I am talking about, but I
do not see how it is observer-dependent. [...]
TT:
I said it had "observer-*in*dependence" not "observer-dependence".
--
__________
|im |yler http://timtyler.org/ tim@tt1lock.org Remove lock to reply.
"It's uncertain whether intelligence has any long term survival value.
Bacteria do quite well without it."
Stephen Hawking