|
Markov Chains on general state spaces |
January 3
The following material was made available
December 13
We started out by proving that if P is irreducible, aperiodic, Harris
recurrent and positive with an atom &alpha of positive
irreducibilty measure, and if X0, X1,
X2,... and Y0, Y1,
Y2,... are two Markov chains with one-step
transitionprobability P, then there is probability 1 that sooner
or later the two chains will be in &alpha simultanously. The
proof was based on renewal theory. What we did, is essentially
equivalent to lemma 2.1 and corollary 2.3 in Niels Richard Hansens
manuscript below.
Next we proved the socalled aperiodic ergodic theorem for
Markov chains (Meyn and Tweedies Theorem 13.3.3). There were
three steps in the proof: First we demonstrated the theorem in the
case with an accessible atom. Next in the strongly aperiodic
case. And finally in the general aperiodic case.
Almost all the work lies in the atomic case. We gave a pretty
detailed exposition of the socalled coupling argument.
Perhaps the closest analogoue in the litterature is the exposition in
Olle Häggströms book
Finite Markov Chains and Algorithmic Applications.
This is by the way an extremely nice book. It
deals with Markov chains on finite spaces, so the concept of
an atom is somewhat degenerate - any one-point set will will do.
But the flow of the argument is very similar to what we did. He
spends some time proving that 'the two chains will meet' with
probability one - he actually proves that they will meet in a
specific point with probability one - and we allready have this
part covered by the above renewal theory.
The aperiodic ergodic theorem in the strongly aperiodic case follows
from the atomic case by a standard application of Athreya-Neys
splitting construction. And the general aperiodic case, say with
a k-small set of positive irreducibility measure, follows by
considerations of the k-skeletons.
Finally we proved a strong law of large numbers for Markov chains that
are irreducible,
aperiodic, Harris recurrent and positive.
We proved that if X0, X1,
X2,... is such a chain with invariant initial
distribution &pi, then averages of the form
      \frac 1n \sum_{i=1}^n f(X_i)
(sorry for the typography) will converge almost surely to \int f d&pi
whenever f is integrable with respect to &pi. For instance this
will be true for all bounded f's. The important fact is that the
initial distribution of the chains does not matter - both the
limit and the fact that we have almost sure convergence is
determined entirely by P.
We sketched a proof based on the ergodic theorem for stationary
processes. The point is that the aperiodic ergodic theorem
guarantees that the X-proces is strongly mixing, if the initial
distribution is &pi, and hence ergodic.
Perhaps the easiest way to extend this result to the non-stationary
case is to consider the three-step argument from before:
accessible atoms, strongly aperiodic, general aperiodic. The
atomic case can be swept away by the coupling argument: A chain
with a general initial distribution will at some time coallesque
with a stationary version of the chain. And as SLLN holds for the
stationary version, it will also hold for the general version.
The extensions to the strongly aperiod and the general aperiodic cases
will again follow standard patterns.
I actually had an entirely different proof of SLLN in mind - a proof
based on regeneration. In the atomic case, regeneration happens
whenever the proces visits the atom. And pieces of the proces
between visits to the atom are in fact iid. Hence SLLN for
Markov chains can be
established using only SLLN for iid. variables. The main
problem with this approach is actually to identify the limit...
The reason why I would have preferred the regeneration proof, is
because it gives a natural framework for a similar proof of a
central limit theorem, and it gives reasonable hints as to what
extra conditions are necessary for a CLT to hold.
But we did not get this far before the term ended...
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
December 6
We started out by proving Dynkins formula, which is a general fact for
many classes of stochastic processes - there is nothing particular
markovian about it. Essentially we proved it in the form it has
in Meyn and Tweedies Proposition 11.3.2, though we did sum all the
way up to tau (Meyn and Tweedies sums stop at tau-1), and though
we for the proof did not care how the Markov chain was started.
We remarked that the result is a rather trivial consequence of the
theorem on optional sampling.
Next we discussed what Meyn and Tweedie calls the criterion for
strict drift towards C (p. 270), and we used in in combination with
Dynkins lemma to prove lemma 11.3.9. We did not end up with the
exact formula (11.26), due to the fact that our version of
Dynkins lemma has a term more in the sums that Meyn and Tweedies,
but the difference is inconsequential.
Finally we applied the result from lemma 11.3.9 to prove Theorem
11.3.4, to show that if P is irreducible and aperiodic and
satisifies the criterion for
strict drift towards a small set C, then P is Harris recurrent and
has an invariant probability measure.
This is an extremely strong result, making it possible to establish
the existence of an invariant probability measure in a vast array
of Markov models, hardly without any effort.
It does not say much about the nature of this
invariant probability measure - but it guarantees that it exists.
This is very relevant - and non-trivial - knowledge in many models.
We indicated the ease with which this drift criterion for positivity
is applied, by studying AR(1)-processes in one and higher
dimensions (Proposition 11.4.2 is relevant reading here) and
the simplest version of an ARCH-proces.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan:
In principle, the plan for the lecture December 13
is to cover Meyn and Tweedies chapter 13, dealing with socalled
ergodic properties of Markov chains. We shall put
emphasis on Theorem
13.3.3. But the exposition will be entirely different from Meyn
and Tweedies... I
doubt that it is worthwhile to study chapter 13 in advance.
The ergodic theorem in the atomic case will be established by the
socalled coupling argument. This is the main part of the
work, the extension to strongly
aperiodic and general aperiodic chains is quite trivial in
comparison.
We will also use the coupling argument to give a rigorous proof of a
strong law of large numbers for Markov chains, as the
highlight of the course.
Finally, we will - to the extend that time permits - discuss the major
results in the remaining parts of Meyn and Tweedies book.
The main theme here are strong ergodic properties, giving
rates of convergence in the ergodic theorem. The two most
useful concepts go under the name of geometric
ergodicity and uniform ergodicity. They can be
checked by relevant drift criteria, and they yield very
strong results, for instance central limit theorems.
November 29
It is quite hard to describe what we did in the lecture, in terms of
material in Meyn and Tweedie. Basically, we covered section 10.4,
but we hardly proved a single theorem in the form it has in that
section....
We proved that any irreducible, aperiodic kernel has subinvariant measures - we
used the existence of an atomic lift to reduce the general
situation to a kernel with an atom, and hence the conclusion
followed from theorem 10.2.1. I would expect the theorem to be
true in periodic case as well, but we did not consider that.
Next we proved that for any irreducibel, aperiodic kernel, there can
at most exist one invariant probability measure - again
utilizing the Athreya-Ney construction to reduce the general
situation to the atomic case, and refering to theorem 10.2.1.
We did not discuss the fact that P does have an invariant probability
measure if some iterate Pk does, but this should have
been mentioned - the simple, necessary computation is contained in
Meyn and Tweedies proof of theorem 10.4.5.
We went through Proposition 10.4.6, and we gave a version of Theorem
10.4.7, holding for Harris recurrent kernels - this simplifies
matters, as the measure constructed in (10.28) becomes globally
equal to the original subinvariant measure.
We obtained a number of consequences of this approach. First we proved
that any subinvariant measure for a Harris recurrent kernel is
indeed proper invariant. Next, it satisfies (10.30) for all B, as
long as A has finite measure.
Thirdly, we proved that if an invariant probability measure exists
for a Harris recurrent kernel, then this probability measure is a
maximal irreducibility measure. Fourthly, that for a Harris
recurrent chain the condition in (10.34) for just one small
set C of positive irreducibility measure is sufficient to conclude
the existence of an invariant probability measure.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan:
The plan for the lecture December 3
is to cover Meyn and Tweedies chapter 11. We will deal almost
exclusively with the applications of Dynkins formula in section
11.3. This formula is a general decompositon theorem for
stochastic processes - it is quite trivial, in fact. But it can
be utilized to produce quite strong drift theorems in a Markov
chain setting - the main example being theorem 11.3.4.
November 22
Chapter 13 of Sean Meyn and Richard Tweedies book
Markov Chains and Stochastic Stability was handed out.
The following material was handed out
-
Paper 2 (
postscript - 130 kb /
pdf - 51 kb
)
-
A note entitled
Supplementary proofs on recurrence/transience
(9 handwritten pages, now available electronically
as a scanned pdf-file). This note
contains a proof of Meyn and Tweedies Theorem 8.3.6 (ii).
-
A note entitled
The drift criterion for recurrence
(6 handwritten pages, now available electronically as a pdf-file). This note
contains a proof of Meyn and Tweedies Theorem 8.4.3 under the
additional assumption that the drift function V is unbounded
off small sets, rather than unbounded off petite sets.
It is note very likely that such a drift function can be
constructed in the periodic case, so essentially we have
limited the theorem to proving recurrence for aperiodic
chains. For such chains petite and small sets are the same,
so in that case there is no difference between our theorem and
Meyn and Tweedies theorem. But not though, that Meyn and
Tweedie claim to prove that the set C is Harris recurrent -
as we have discussed on several occasions, this does not
follow from their proof.
In the first part of the lecture, we covered material on Harris
recurrence. We did not quite follow Meyn and Tweedies
definition (p. 207), which seems somewhat untransparent to me.
Instead we defined a set A to be Harris recurrent if Lx(A)
= 1 for all x in A. The we proved a version of Proposition 9.1.1,
which guarantees that the two definition coincides.
We gave a detailed proof of Theorem 9.1.3 (i), and obtained two
consequences: Theorem 9.1.4 (for which we added the extra
assumption that the space X can be written as a union of countably
many small sets, each of positive irreducibility measure - this
condition is in fact always true in the aperiodic case, but we
have not developped the tools to prove it).
And Proposition 9.1.7 (ii) (where we replaced the appearence of a
petite set C with the assumption that C is small). We highlighted
once again that Theorem 9.1.8 seems to be wrong.
We ended this part of the lecture with a brief discussion of the
decomposition of a recurrent kernel P in a part which is Harris
recurrent and a part which is transient (Meyn and Tweedies Theorem
9.1.5). We did not prove the
existence
of this
decomposition, though, and we did not show how it can be utilized
to demonstrate that an aperiodic kernel is Harris recurrent if and
only if some skeleton is Harris recurrent (Theorem 9.1.6).
After this we turned to a discussion of invariant measures.
We
followed Meyn and Tweedie closer here than usual, and proved
Proposition 10.1.1, Proposition 10.1.2 (i) and (iv), the whole
of Theorem
10.2.1 and Theorem 10.2.2 (Kac's theorem).
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan:
The plan for the lecture November 29
is to cover Meyn and Tweedies section 10.4 (p. 248-254). It is
undoubtably overly optimistic to considere more material, but
if time allows, we will also cover section 11.1
(p. 265-267).
November 15
Chapter 11 and 12 of Sean Meyn and Richard Tweedies book
Markov Chains and Stochastic Stability were handed out.
In the lecture, the focus was on the technical details behind the
drift criteria for transience and recurrence.
First we gave a very careful proof of a version of Theorem 8.3.5,
stating that if P is transient and C is a small set of positive
irreducibility measure then C is uniformly transient. This
seems somewhat weaker than the version in Meyn and Tweedie,
where they work with petite sets, not necessarily of positive
irreducibility measure.
The difference is not that important, though. As has been stated a
number of times, the difference between small and petite sets is
a hoax. And the added assumption of positive irreducibility
measure is to focus on the harder case. If we have a small set
of irreducibility measure zero, it is usually contained in a
small set of positive irreducibility measure. If this larger
set is uniformly transient, so is the original set.
After that we gave a proof of the claim in Proposition 8.3.1 (ii) that
a Harris recurrent set for an irreducible Markov kernel must
necessarilly have positive irreducibility measure. Coupling this
with the above, we obtained a proof of Theorem 8.3.6 (i), where we
just replaces the word petite with small.
The intention was to give a similarly careful proof of Theorem 8.3.6 (ii) as
well. But things took longer than planned, so instead I gave a
promise of a handwritten version of those proofs,
and we moved on.
We gave a proof of one half of Proposition 8.4.1 - that any function
satisfying the two conditions in (8.41) must necessarily be
larger that Lx(C) for those x's not in C. The other
half is irrelevant to us - I doubt very much that it has any
applications at all, but the book is full of these useless 'if
and only if' statements.
Finally, we gave a proof of Theorem 8.4.2. I promissed a handwritten
version of a similarly careful proof of Theorem 8.4.3
Plan:
The plan for the lecture November 22 is at present undecided.
November 8
We started out by recalling the definition of transient Markov kernels
and recurrent Markov kernels, and in particular the dichotomy
between these two concepts. We went on to prove Meyn and Tweedies
Theorem 8.2.6.
We spend considerable time disussing the meaning of the various
drift theorems in section 8.4, and the intuition behind
them. The idea is to somehow capture the overall movement of the
chain, the intrinsic forces that are driving the chain towards the
center of the space or away from the center.
We formulated (but did not prove) a drift criterion for recurrence
(Theorem 8.4.3) and a drift criterion for transience (Theorem
8.4.2 - note a misprint in condition (8.43) which should read
'greater than or equal to zero').
As our prime example, we discussed the use of a drift criterion to
prove recurrence of the standard AR(1)-proces, in the case where
the multiplication factor numerically is smaller than 1. It is
quite obvious that this proces will drift towards the 'center' of
the space (some finite region around zero) if it somehow is
released far away, and this intuition signals recurrence. The
intutition can be made precise by examining virtually any drift
function. We did the calculations with V(x) =
x2, but could equally well have used say V(x)
= |x| or V(x) = exp(|x|).
For some odd reason, this particular
example seems not to be present in Meyn and
Tweedie - which is a pity, as it is very easy, and very
educational.
They do consider the AR(1)-proces in proposition 11.4.2, but they
choose the somewhat more involved drift function V(x) = log(1 +
epsilon |x|), in order to arrive at stronger conclusion than
we care about at present.
Next we discussed random walks on Z. We insisted on bounded range of
the increment variables. If the mean of the increment variables
is zero, the random walk is recurrent as demonstrated in
Proposition 8.4.4. But if the mean of the increment variables is
different from zero, the random walk is transient, as demonstrated
in Proposition 8.4.5. Note that these examples involve a somewhat
atypical use of the drift criterium: the conclusion is that the
random walks have no drift at all, rather than a drift towards the
center or away from the center. Hence the conclusion is derived
from careful examination of the drift function itself
(boundedness or unboundedness), rather than
from the inert drift criterion.
Some confusion did arise from the application of condition (8.55).
The important thing here is not the equality, but that the right
hand side is smaller than or equal to 1 - I wrote the
opposite inequlity on the blackboard, so it was not wonder that I
could not make the argument work. The correct inequlity is
obtained for values of rho slightly to the left of 1.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan:
The plan for the lecture November 15 is:
We will give a proof of the two drift theorems in section 8.4.1 and
8.4.2. Unfortunately, there is a flaw in the logic behind
Theorem 8.4.3: recurrence is proved, but what we will learn to call
Harris recurrence of the set C is not - this is needed for later
applications, and I will see what I can do about correcting the
proof.
A key ingredient in the proofs is Theorem 8.3.5 and
Theorem 8.3.6, which will be disucussed, but hardly proved.
If time permits, we will initiate a discussion of Harris recurrence,
with emphasis on Proposition 9.1.1 and Theorem 9.1.3.
October 27
Unfortunately, I have to cancel the lecture the upcomming monday, that
is on november 1.
Next lecture will thus be held on november 8.
October 25
Chapter 9 and 10 of Sean Meyn and Richard Tweedies book
Markov Chains and Stochastic Stability were handed out.
In the lecture,
se covered parts of Meyn and Tweedies chapter 8 on the dichotomy
between transient Markov kernels and recurrent Markov kernels.
To be precise, we covered the definition of uniformly transient sets
and of recurrent sets, p. 179, though we wrote them down in terms
of the potential Ux(A) and not in terms
of the expectation of the occupation time variables for Markov
chains - but these things are really the same, and we demonstrated
that as a way of bringing intuition into the subject.
Then we gave the definition of transient resp. recurrent Markov
kernel. These definitions are in the boxes p. 188 and 189. We
gave slight variations of the definitions, that seems more natural
to me.
For transience we insisted that the countable cover of X
with uniformly transient sets must be increasing - but as the
unions of two uniformly transient sets is again uniformly
transient, this really does not matter (by the way: I forgot to
prove this fact, but it follows rather trivially from Proposition
8.3.1(i).)
For recurrence, we only demanded that all sets of positive
irreducibility measure were recurrent. We then proved that this
implies that Ux(A) is infinite for all sets
A of positive irreducibility measure and all x in X.
Then we discussed at some length the various versions of the
first-entry decomposition and the last-exit decomposition, which are
the topics of section 8.2.1, p. 186-187. In all honesty, we did
not prove the versions (8.20) and (8.21), but we claimed that
they follow from quite trivial 'dynamic' decompositions of probabilities
involving Markov chains. The difficulty is mainly that (8.20)
involves integration with respect to taboo probabilities, and I
suspect some effort is necessary to make sense of that. In
fact, (8.21) is considerably easier to prove, as it does not
have a similar technical difficulty.
But we did prove (8.27) in all details as a consequence of (8.20).
It is quite clear that (8.28) will follow from (8.21) in the same
manner. Note that we wrote HA,
xz(B) for the taboo kernels in (8.23),
as it seems to me that to many U's clobber the visual
appearence.
We proved Proposition 8.3.1, part (i)-(iii), in all details, except
for one minor fact: In part (ii) it is claimed that if P is
irreducible and A is Harris recurrent (meaning that
Lx(A) = 1 for all x in A), then A is of
positive irreducibility measure. I suspect we will return to
this problem at a later occasion.
Finally, we proved Theorem 8.3.4 by splitting it in two lemmas: We
proved that if P is irreducible and has at least one uniformly
transient set of positive irreducibility measure, then P is
transient. And subsequently we demonstrated that if P is
irreducible and not recurrent, then must have a uniformly
transient set of postive irreducibility measure, and hence be
transient. Apart from notational differences, this is actually
the same proof as Meyn and Tweedie presents.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan: There will be no lecture on November 1.
The plan for the lecture November 8 is:
We will go through Meyn and Tweedies theorem 8.2.6, and explain their
general line of argument in section 8.2, where they prove
the transience/recurren dichotomy through the splitting
technique - it is nice to know that there is a proof like that,
but the technical details are somewhat worse than in the proof
we have given, and Meyn and Tweedie do not elaborate on them.
After that, we will discuss the remaining parts of section 8.3, mainly
Theorem 8.3.5 and Proposition 8.3.7. We will go through section
8.4 quite carefully: this section contains the first of many
different versions of the socalled drift criteria. A
drift criterion is an extremely powerfull way of
examining the qualitative
behaviour of concrete examples - we will illustrate with as many
examples as possible, though we will probabably not make it as
far as to section 8.5.
October 18
The lecture proceeded quite rapidly, due to an electronically prepared
slide show. In the links below, a few typos have been corrected.
In relation to Meyn and Tweedies book, we covered something along the
lines of the following:
p. 11113-11512. The proof we gave of theorem
5.2.1 (Nummelins theorem) was however very much simplified - to
the extend that it
was actually erroneous, as it was based on an incorrect
measure theoretic assertion. But note that a true understanding of the
correct proof will start with an understanding of
the erroneous proof, in order to get
a feeling for why the strategy of the proof must be so
complicated, and what the proof is trying to acheive.
We did not spend any time on point (iii) of Propositon 5.2.4, but in
the terminology we developed, the claim is that any small set
of positive irreducibility measure is actually non-repelling, though
perhaps not for the original triple (n, delta, nu).
p. 12111-12411. We did not go through all the
technicalities of Theorem 5.4.4 - most of them have to do with how
you can construct a cycle in the periodic case. We emphasized the
opposite viewpoint: in the aperiodic case all skeletons must be
irreducible. It is not completely clear how to get that
information out of Meyn and Tweedies writings.
Never the less, aperiodicity is in practise almost always proved by
showing that P has an n-small set of positiv irreducibility
measure for any n large enough. Hence the abstract theorem
is not of much use.
p. 1247-1277 and
12914-1298 without bothering with details at
all - I did not try to hide my personal conviction that petite
sets is a grossly oversold concept.
Furthermore we gave a very brief introduction to the concepts of
chapter 6, mainly the notions of weak and strong Feller properties
and the definition of a T-chain. The definition of a T-chain
presented in the
handouts looks somewhat different from the definition given
p. 133, but they are in fact equivalent.
If you want to study this chapter, it may be relevant to know that a
function h : X -> R is lower semicontinous if all
sets of the form {x | h(x) > a} are open. The two nice properties
of lower semicontinous functions is that they attain their minimum
over a compact set (but not necessarily their maximum), and that
an increasing sequence of lower semicontinuous functions converges
to a limit that is also lower semicontinuous.
The two main theorems in chapter 6 are proposition 6.2.1 (showing how
the T-chain property plus a little more gives irreducibility) and
Theorem 6.2.5 (ii) (showing that for an irreducible aperiodic
T-chain, all
compact sets are small).
These two consequences can however usually be established directly in
practical examples,
without recourse to the T-chain concept. So in my opinion, the
concept is of little value.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan: The plan for the lecture October 25 is:
We will go through Meyn and Tweedies section 8.2 and 8.3
on transience and recurrence.
October 4
We went through chapter 5 of my own notes on Lifts and
Splittings, where the Athreya-Ney construction is considered
in detail. The construction has three levels:
-
An overall 'method of update' based on updates of the form
Xn -> Yn
and (Xn, Yn) -> Xn+1 which
gives the coexistence of two time homogenous Markov chains: the
X-chain, and the (X, Y)-chain. And clearly the (X,Y)-kernel
is a lift of the X-kernel.
- The idea of applying this method of update to a situation where
the Y's have values in a two-point space (the
splitting). The advantage of this is that it becomes
tractable what the the transition-kernel for the X-chain is.
- The idea of using a 1-small set to write the given Markov kernel
P as a convex combination of a constant kernel and some
other kernel, to which no attention is paid. This convex combination
enables us to invent Athreya-Ney update functions, such
that the X-chain has transition probability P, and such that
the lift has an atom.
We will go to some length in the sequel to avoid discussion of
the specifics of the Athreya-Ney construction. The general
message is that there is some lift of the the given Markov
kernel that posseses an atom. At various occasions we will need
stronger results - like Harris recurrence of the atom. In those
cases we have to go back to the Athreya-Ney construction, and show
that this specific lift har the desired properties - and having
done that, we again hide the gory details under the rug as fast as
we can.
We also gave examples of chains that evidently have 1-small sets. The
most obvious example was that random walk with normally
distributed increments, but similar considerations applied to
AR(1)-processes and in fact to any proces on Rn, which
has a transtion density f(x, y) - with respect to Lebesguemeasure
or otherwise - with the property that f is bounded away from zero
on products of compact sets.
After this, we went back to Meyn and Tweedie. We proved Proposition
5.1.1 (which is a triviality). And we applied that to the
Athreye-Ney construction to show the following:
Theorem If P is irreducible and C is a E+-set which is
1-small, then the Athreya-Ney lift Q, constructed by splitting on
C, has the upper X-copy as an accessible atom. Hence Q is
irreducible, and the atom is a +-set.
This theorem is the second half of Meyn and Tweedies Theorem
5.1.3(ii), but in all honesty they do not supply a proof.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan: The plan for the lecture October 18 is:
We will go through Meyn and Tweedies section 5.2 on general small
sets. We will not give a full proof of Theorem 5.2.1, but we
will pressumable sketch the ideas.
Then we give a discussion of the cyclic phenomena, discussed in
section 5.4.3. Cycles are a menace, and they are of no
practical importance in the theory of Markov chains on general
spaces: There are no examples known to man kind of
interesting Markov chains on a non-discrete space wich posses a
cycle. So what comes out of the discussion is not so much a
respect for cycles as it is an appreciation of
the nasty distinction between the
aperiodic chains and the strongly aperiodic chains. We will
deal almost exclusively with strongly aperiodic chains, though
we acknowledge that quite a number of chains occuring in
practise are aperiodic but not strongly aperiodic. The simplest
example of such chains is perhaps the stacking of an AR(2)-proces.
We will also give a brief discussion of the notion of petite
sets, section 5.5. The discussion will not be
penetrating, because the notion of petiteness is mainly a
technical gimmick, useful for studying chains that are aperiodic
but not strongly aperiodic. For strongly aperiodic chains,
petite sets and small sets are the same.
Pressumably we will end the lecture by giving a brief discussion,
covering the main points of Meyn and Tweedies chapter 6.
September 28
Two ways of producing a latex symbol for independence:
1) The simple hack: Construct a compound symbol \MyPerp
by putting the line
        \newcommand{\MyPerp}{\perp \! \! \! \perp}
somewhere in the manuscript, for instance in the preamble (that is,
before the \begin{document}). This compound symbol can be
be used in constructions like
        \begin{equation}
          X \MyPerp Y
        \end{equation}
2) Use the symbol \Perp from the
txfonts-package. The construction goes like this:
        \usepackage{txfonts}
        \begin{document}
        \begin{equation}
          X \Perp Y
        \end{equation}
txfonts is a font package. It is installed on the institutes
computer system, but it is quite unlikely that you have it
installed on your home computer - you will have to get it from
CTAN. Note that font packages are somewhat harder to install than
usual packages.
Be aware that if you include txfonts in you manuscript, it
will change all your fonts, and the result may not be to
your likening.
September 27
The following material was handed out
Furthermore, chapter 6, 7 and 8 of Sean Meyn and Richard Tweedies book
Markov Chains and Stochastic Stability were handed out.
We went through
Meyn and Tweedies p. 931-9613. Well, perhaps
not to the letter - we gave a proof of lemma 4.2.5 with a
somewhat stronger probabilistic flavour than Meyn and Tweedies
proof, which is based on formal manipulations with taboo
probabilites. Actually the proofs are the same, but that will
presumably remain hidden to most readers...
After that we went through chapter 4 of my own notes on atoms.
We gave an extended discussion of regeneration, and in
particular of why the existence of a Harris
recurrent atom almost for free gives a strong law of large
numbers for the Markov chain. The disucssion was heuristic, but
it is probably worthwhile to write down in a more formal way. I
will try to do that before the next lecture.
Plan: The plan for the lecture October 4 is:
chapter 5 of my own notes on Lifts and Splittings followed by a
disucssion of the existence of small sets, based on Meyn and
Tweedie p. 111-116. The great surprise is that small sets occur
all over the place... We will not give a proof of theorem 5.2.1,
but if time permits, the ideas of the proof will be revealed.
September 23
It has been pointed out to me that
an embarrasing mistake has crept into the formulation of problem 4-8
in 'Paper 1'. The error has to do with the usage of the term
geometric distribution.
Geometric distributions arise in situations where a coin is flipped
until it shows head. There are two things you can count: The number T
of tails occuring before the first head. Or the waiting time W until
the first heads shows up. Evidently the two numbers are related by
the trivial formula W = T + 1.
The formal definition of the geometric distribution is that it is the
distribution of T. But speaking colloquially, the word is frequently
used to refer to the distribution of W as well. I have on many
occasionens complained about the confusion that this dual use leads
to, so it is particularly embarrasing that I myself cannot keep my
mind straight on the matter.
I suggest we do the following:
- Invent a term, like simple waiting time distribution to describe
the distribution of W above.
- Replace the occurence of geometric distribution in problem 6 and
8 with simple waiting time distribution.
Do not change any of the formulas - just the words.
September 20
The following material was handed out
Note that a solution to the problems in paper 1 is part of the
course requirement.
We went through the above chapter 3 on taboo probabilities - the
chapter is very sketchy indeed, and merely contains a formal
correct definition of taboo probabilities and return
probabilities.
Based on the definiton of return probabilities we went through
Meyn and Tweedies p. 9022-937, where
irreducibility is introduced, and the maximal irreducibility
measure is constructed. We did not cover part (iii) of proposition
4.2.2, but we will do that next time.
We also discussed a few examples of irreducible chains: the reflected
random walk (which Meyn and Tweedie calls a random walk on a half
line) p. 9612-9716, which is a trivial
example with a onepoint measure as irreducibility measure. And
the general random walk p. 988-984. We did
not go as much into detail with the random walk as it deserves -
in fact we only considered the case with normally distributed
increments, where the Lebesgue measure trivially acts as an
irreducibility measure. So please pay close attention to the
elaborate details of the more general case, discussed in Meyn and
Tweedie.
A thorough understanding of this example should have
extremely high priority.
It was recommended that some time is spend studying p. 983-
10311. This recommendation is more casual.
The language is somewhat perculiar (at least it
seems so to me) but the content is good clean fun. I would not say
that it is important for the further study of Markov chains, as
the details are algebraic, rather than probabilistic, and so have
rather limited range of application. But the result is
certainly nice to know.
It is shown that under reasonable assumptions the update scheme given
p. 10110 leads to an irreducible chain if the error
distribution is a regular normal distribution.
Furthermore, it is shown that the stacked version of an AR(k)-proces
(the idea of stacking autoregressiv processes is explained in Meyn and
Tweedie p. 27) automatically satisfies the 'reasonable
assumptions'. Hence stacked versions of Gaussian AR(k)-processes
are alway irreducible with k-dimensional Lebesgue measure as
irreducibility measure.
A scan of my
preparational notes for the lecture as a pdf-file. The notes
may be quite unreadable, and they may well be almost unrelated to what
actually happened at the lecture.
Plan: The plan for the lecture September 27 is: part (iii) of
Proposition 4.2.2, p. 93-96 and p. 106-115. We will presumable
not give a proof of the existence of small sets (Theorem
5.2.1), as that would occupy almost the entire lecture. I also
have a feeling that the connection bewteen what I will say about
splitting and what Meyn and Tweedie says (p. 108) will seem
rather remote...
September 13
Chapter 4 and 5 of Sean Meyn and Richard Tweedies book
Markov Chains and Stochastic Stability were handed out.
The next lecture will be devoted to the concept of irreducibility,
centered around chapter 4 of Meyn and Tweedie.
A new version of the manuscript we have been discussing so far, was
handed out. It is still very rough, but at least some of the
edges have been smoothed out.
We went through p. 37-56, covering the topics of strong Markov
properties, weak and strong time homogeneity, and the
Chapman-Kolmogorov equations.
We did not go into a detailed
discussion of the lemmas
p. 41-42 containing the core details in the proof of the strong
Markov property. These details are not particular difficult, but
they lack a clear direction, and so they are potentially
bewildering.
We also skipped the proof of theorem 2.24, where it is proved that
is suffices to check weak time homogenity on bounded stopping
times. This proof builds on a rather difficult - but also quite
illuminating and potentially useful - lemma,
stated on p. 4-6, which relates conditional
probabilities of a fixed event, but where the conditioning is
varied.
Recapturing our effort so forth, we have more or less covered the
whole manuscript, except for a few of the examples and the above
technical exceptions. In many ways the essence of the
manuscript is given in example 2.29, where a straight forward
and pretty mechanical proof is given of the fact that the
forward recurrence time chain in a renewal setting is indeed a
Markov chain. We have developped a vocabulary and a technology
suitable for the demonstration of Markov properties, also in
highly non-trivial situations.
September 6
Chapter 1, 2 and 3 of Sean Meyn and Richard Tweedies book
Markov Chains and Stochastic Stability
was distributed.
We will not cover these chapters systematically in the lectures - the
main content is lifted to the notes I distributed last time (I
plan to
hand out updated versions of these notes at the next lecture).
In a sense we started from page 1 of my notes again. We covered
something like p.
23-79,
157-194,
215-2311
243-3018.
There are in fact three definitions of a Markov chain.
- A Markov chain is a proces given by an update scheme.
- A Markov chain is a proces,
where the finitedimensional distributions can be
computed via horrible-looking multiple integrals involving
the one-step transition kernels.
(Check out Meyn & Tweedie p. 66, and observe that the
measure-theoretic details of the exposition are in fact worse
than ours, in the sense that they have to rely on socalled
canonical representation spaces, mainly in order to
make sense of the strong Markov property.)
- A Markov chain is a proces, where the future is
conditionally independent
the past given the present (my notes, definition 2.1).
No effort was spared to convince the audience that the last definition is
preferable! Even though it is not standard in the
litterature. It is not really my own invention, it is
(a somewhat polished version of) the
customary definition of 'Markov properties' in the
graphical models community. I have felt for years that
someone ought to bring that back to Markov chains proper.
This turns out to be extremely succesfull, due to the fact that
it is possible to prove a strong Markov property whithin this
framework. We shall elaborate on this in the comming lectures.
Be that as it may, the three definitions are in fact equivalent.
That definition 3 implies definition 2 is the content
of the remarks on p. 22 of my notes. That definition 2
implies the existence of an update scheme and hence definition 1
is the content of theorem 2.4 of my notes.
And that condition 1 implies condition 3 is
the content of theorem 2.3.
Still, I think that the choice of definition matters quite a lot.
Definition 2 is a historical mistake. It is simply not useful,
except for in the discrete case. Examples of Markov chains on
general spaces come in
two classes:
- Processes given by update schemes - we discussed at some lengths
variations of random walk update schemes and autoregressiv
update schemes. The point is that the explicit
error structures are almost irrelevant in these models, and so
one-step transition kernels are not the key concept.
- Processes constructed on top of other processes - we discussed
the forward recurrence time chain in a renewal setting as an example
of this type of processes. The processes are given in an
extremely concrete way, but there is no update scheme in
sight, and there are no obvious handles on the
finitedimensional distributions.
It is virtually impossible to give direct proofs
that processes in the second class
satisfy definition 2 above. And hence the conspiracy in the
litterature that such a proof 'is not needed'.
As a very
illuminating example, check Meyn and Tweedies exposition of the
forward recurrence time chain. They are defined p. 44, where they
are discussed, but where nothing is proven (which is fair enough,
as no definition of anything has been given at that point). They
are taken up again p. 60, where the one-step transition
probabilities are calculated. But that calculation is completely
trivial, compared to the intricacies of showing that the forward
recurrence time chain is Markovian - a fact which can not be
established by computing the one-step probabilities, but
only by showing that the finite dimensional distributions can be
build up using them.
There is even a continuous-time analogue of the forward recurrence
time chain defined p. 74-75, with an equally vacuous discussion of
the Markov property.
It is a swindle! And it will be evident that there is
something to prove when we do it in the next lecture. It is not
particularly difficult, once you know what to do, but it certainly
is not something obvious.
But of course there is a reason behind the traditional approach.
'Markov chain theory' is not as much a term used for study of
concrete stochastic processes, as for the study of
a specific Markov kernel P = (Px), in the sense that it is
the simultanous study of
all processes where the the one-step transitions kernels all
equal P.
Many of the concepts we will disucss (irreducibility, Harris recurrence
etc.) are in fact properties of a specific kernel, rather than
properties of a stochastic proces. And Markov chain theory is
often thought of as a branch of functional analysis, as the Markov
kernel P gives rise to an operator on functions spaces, where a
function h is mapped to Ph (see Meyn & Tweedie
p. 128 for the exact definition). Traditional Markov chain theory
focuses very much on the spectral properties of this operator.
Though a number of insights can be obtained using this functional
analytic viewpoint, it will not occupy us very much. We will be
much more concerned with the pointwise behaviour of concrete
stochastic processes.
August 30
The following material was distributed:
Both manuscripts are grossly unfinished, and will soon be replaced by
more polished versions :-)
The lecture took place in a quite noisy atmosphere and it was
difficult to concentrate. As a result, the lecture was not in any
way a linear exposition of the notes. Personally I think of
conditional distributions as far more complicated objects
than conditional probabilities (althoug the hardest part
perhaps is to remember which is which) and I tried to postpone any
mentioning of Markov kernel stuff untill we have an environment
better suited for concentration.
Thus I think I have been covering p.
11-22,
710-156,
211-216 and
2312-244
so far.
Latest update: September 21, 2006
erhansen@math.ku.dk