Markov Chains on general state spaces




January 3





The following material was made available





December 13



We started out by proving that if P is irreducible, aperiodic, Harris recurrent and positive with an atom &alpha of positive irreducibilty measure, and if X0, X1, X2,... and Y0, Y1, Y2,... are two Markov chains with one-step transitionprobability P, then there is probability 1 that sooner or later the two chains will be in &alpha simultanously. The proof was based on renewal theory. What we did, is essentially equivalent to lemma 2.1 and corollary 2.3 in Niels Richard Hansens manuscript below.

Next we proved the socalled aperiodic ergodic theorem for Markov chains (Meyn and Tweedies Theorem 13.3.3). There were three steps in the proof: First we demonstrated the theorem in the case with an accessible atom. Next in the strongly aperiodic case. And finally in the general aperiodic case.

Almost all the work lies in the atomic case. We gave a pretty detailed exposition of the socalled coupling argument. Perhaps the closest analogoue in the litterature is the exposition in Olle Häggströms book Finite Markov Chains and Algorithmic Applications. This is by the way an extremely nice book. It deals with Markov chains on finite spaces, so the concept of an atom is somewhat degenerate - any one-point set will will do. But the flow of the argument is very similar to what we did. He spends some time proving that 'the two chains will meet' with probability one - he actually proves that they will meet in a specific point with probability one - and we allready have this part covered by the above renewal theory.

The aperiodic ergodic theorem in the strongly aperiodic case follows from the atomic case by a standard application of Athreya-Neys splitting construction. And the general aperiodic case, say with a k-small set of positive irreducibility measure, follows by considerations of the k-skeletons.


Finally we proved a strong law of large numbers for Markov chains that are irreducible, aperiodic, Harris recurrent and positive. We proved that if X0, X1, X2,... is such a chain with invariant initial distribution &pi, then averages of the form

      \frac 1n \sum_{i=1}^n f(X_i)

(sorry for the typography) will converge almost surely to \int f d&pi whenever f is integrable with respect to &pi. For instance this will be true for all bounded f's. The important fact is that the initial distribution of the chains does not matter - both the limit and the fact that we have almost sure convergence is determined entirely by P.

We sketched a proof based on the ergodic theorem for stationary processes. The point is that the aperiodic ergodic theorem guarantees that the X-proces is strongly mixing, if the initial distribution is &pi, and hence ergodic.

Perhaps the easiest way to extend this result to the non-stationary case is to consider the three-step argument from before: accessible atoms, strongly aperiodic, general aperiodic. The atomic case can be swept away by the coupling argument: A chain with a general initial distribution will at some time coallesque with a stationary version of the chain. And as SLLN holds for the stationary version, it will also hold for the general version.

The extensions to the strongly aperiod and the general aperiodic cases will again follow standard patterns.



I actually had an entirely different proof of SLLN in mind - a proof based on regeneration. In the atomic case, regeneration happens whenever the proces visits the atom. And pieces of the proces between visits to the atom are in fact iid. Hence SLLN for Markov chains can be established using only SLLN for iid. variables. The main problem with this approach is actually to identify the limit...

The reason why I would have preferred the regeneration proof, is because it gives a natural framework for a similar proof of a central limit theorem, and it gives reasonable hints as to what extra conditions are necessary for a CLT to hold. But we did not get this far before the term ended...



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.





December 6



We started out by proving Dynkins formula, which is a general fact for many classes of stochastic processes - there is nothing particular markovian about it. Essentially we proved it in the form it has in Meyn and Tweedies Proposition 11.3.2, though we did sum all the way up to tau (Meyn and Tweedies sums stop at tau-1), and though we for the proof did not care how the Markov chain was started. We remarked that the result is a rather trivial consequence of the theorem on optional sampling.

Next we discussed what Meyn and Tweedie calls the criterion for strict drift towards C (p. 270), and we used in in combination with Dynkins lemma to prove lemma 11.3.9. We did not end up with the exact formula (11.26), due to the fact that our version of Dynkins lemma has a term more in the sums that Meyn and Tweedies, but the difference is inconsequential.

Finally we applied the result from lemma 11.3.9 to prove Theorem 11.3.4, to show that if P is irreducible and aperiodic and satisifies the criterion for strict drift towards a small set C, then P is Harris recurrent and has an invariant probability measure.

This is an extremely strong result, making it possible to establish the existence of an invariant probability measure in a vast array of Markov models, hardly without any effort. It does not say much about the nature of this invariant probability measure - but it guarantees that it exists. This is very relevant - and non-trivial - knowledge in many models.

We indicated the ease with which this drift criterion for positivity is applied, by studying AR(1)-processes in one and higher dimensions (Proposition 11.4.2 is relevant reading here) and the simplest version of an ARCH-proces.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: In principle, the plan for the lecture December 13 is to cover Meyn and Tweedies chapter 13, dealing with socalled ergodic properties of Markov chains. We shall put emphasis on Theorem 13.3.3. But the exposition will be entirely different from Meyn and Tweedies... I doubt that it is worthwhile to study chapter 13 in advance. The ergodic theorem in the atomic case will be established by the socalled coupling argument. This is the main part of the work, the extension to strongly aperiodic and general aperiodic chains is quite trivial in comparison.

We will also use the coupling argument to give a rigorous proof of a strong law of large numbers for Markov chains, as the highlight of the course.

Finally, we will - to the extend that time permits - discuss the major results in the remaining parts of Meyn and Tweedies book. The main theme here are strong ergodic properties, giving rates of convergence in the ergodic theorem. The two most useful concepts go under the name of geometric ergodicity and uniform ergodicity. They can be checked by relevant drift criteria, and they yield very strong results, for instance central limit theorems.





November 29



It is quite hard to describe what we did in the lecture, in terms of material in Meyn and Tweedie. Basically, we covered section 10.4, but we hardly proved a single theorem in the form it has in that section....

We proved that any irreducible, aperiodic kernel has subinvariant measures - we used the existence of an atomic lift to reduce the general situation to a kernel with an atom, and hence the conclusion followed from theorem 10.2.1. I would expect the theorem to be true in periodic case as well, but we did not consider that.

Next we proved that for any irreducibel, aperiodic kernel, there can at most exist one invariant probability measure - again utilizing the Athreya-Ney construction to reduce the general situation to the atomic case, and refering to theorem 10.2.1.

We did not discuss the fact that P does have an invariant probability measure if some iterate Pk does, but this should have been mentioned - the simple, necessary computation is contained in Meyn and Tweedies proof of theorem 10.4.5.

We went through Proposition 10.4.6, and we gave a version of Theorem 10.4.7, holding for Harris recurrent kernels - this simplifies matters, as the measure constructed in (10.28) becomes globally equal to the original subinvariant measure.

We obtained a number of consequences of this approach. First we proved that any subinvariant measure for a Harris recurrent kernel is indeed proper invariant. Next, it satisfies (10.30) for all B, as long as A has finite measure.

Thirdly, we proved that if an invariant probability measure exists for a Harris recurrent kernel, then this probability measure is a maximal irreducibility measure. Fourthly, that for a Harris recurrent chain the condition in (10.34) for just one small set C of positive irreducibility measure is sufficient to conclude the existence of an invariant probability measure.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: The plan for the lecture December 3 is to cover Meyn and Tweedies chapter 11. We will deal almost exclusively with the applications of Dynkins formula in section 11.3. This formula is a general decompositon theorem for stochastic processes - it is quite trivial, in fact. But it can be utilized to produce quite strong drift theorems in a Markov chain setting - the main example being theorem 11.3.4.





November 22



Chapter 13 of Sean Meyn and Richard Tweedies book Markov Chains and Stochastic Stability was handed out.



The following material was handed out

In the first part of the lecture, we covered material on Harris recurrence. We did not quite follow Meyn and Tweedies definition (p. 207), which seems somewhat untransparent to me. Instead we defined a set A to be Harris recurrent if Lx(A) = 1 for all x in A. The we proved a version of Proposition 9.1.1, which guarantees that the two definition coincides.

We gave a detailed proof of Theorem 9.1.3 (i), and obtained two consequences: Theorem 9.1.4 (for which we added the extra assumption that the space X can be written as a union of countably many small sets, each of positive irreducibility measure - this condition is in fact always true in the aperiodic case, but we have not developped the tools to prove it). And Proposition 9.1.7 (ii) (where we replaced the appearence of a petite set C with the assumption that C is small). We highlighted once again that Theorem 9.1.8 seems to be wrong.

We ended this part of the lecture with a brief discussion of the decomposition of a recurrent kernel P in a part which is Harris recurrent and a part which is transient (Meyn and Tweedies Theorem 9.1.5). We did not prove the existence of this decomposition, though, and we did not show how it can be utilized to demonstrate that an aperiodic kernel is Harris recurrent if and only if some skeleton is Harris recurrent (Theorem 9.1.6).



After this we turned to a discussion of invariant measures. We followed Meyn and Tweedie closer here than usual, and proved Proposition 10.1.1, Proposition 10.1.2 (i) and (iv), the whole of Theorem 10.2.1 and Theorem 10.2.2 (Kac's theorem).



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: The plan for the lecture November 29 is to cover Meyn and Tweedies section 10.4 (p. 248-254). It is undoubtably overly optimistic to considere more material, but if time allows, we will also cover section 11.1 (p. 265-267).





November 15



Chapter 11 and 12 of Sean Meyn and Richard Tweedies book Markov Chains and Stochastic Stability were handed out.



In the lecture, the focus was on the technical details behind the drift criteria for transience and recurrence.

First we gave a very careful proof of a version of Theorem 8.3.5, stating that if P is transient and C is a small set of positive irreducibility measure then C is uniformly transient. This seems somewhat weaker than the version in Meyn and Tweedie, where they work with petite sets, not necessarily of positive irreducibility measure. The difference is not that important, though. As has been stated a number of times, the difference between small and petite sets is a hoax. And the added assumption of positive irreducibility measure is to focus on the harder case. If we have a small set of irreducibility measure zero, it is usually contained in a small set of positive irreducibility measure. If this larger set is uniformly transient, so is the original set.

After that we gave a proof of the claim in Proposition 8.3.1 (ii) that a Harris recurrent set for an irreducible Markov kernel must necessarilly have positive irreducibility measure. Coupling this with the above, we obtained a proof of Theorem 8.3.6 (i), where we just replaces the word petite with small.

The intention was to give a similarly careful proof of Theorem 8.3.6 (ii) as well. But things took longer than planned, so instead I gave a promise of a handwritten version of those proofs, and we moved on. We gave a proof of one half of Proposition 8.4.1 - that any function satisfying the two conditions in (8.41) must necessarily be larger that Lx(C) for those x's not in C. The other half is irrelevant to us - I doubt very much that it has any applications at all, but the book is full of these useless 'if and only if' statements.

Finally, we gave a proof of Theorem 8.4.2. I promissed a handwritten version of a similarly careful proof of Theorem 8.4.3



Plan: The plan for the lecture November 22 is at present undecided.





November 8



We started out by recalling the definition of transient Markov kernels and recurrent Markov kernels, and in particular the dichotomy between these two concepts. We went on to prove Meyn and Tweedies Theorem 8.2.6.

We spend considerable time disussing the meaning of the various drift theorems in section 8.4, and the intuition behind them. The idea is to somehow capture the overall movement of the chain, the intrinsic forces that are driving the chain towards the center of the space or away from the center.

We formulated (but did not prove) a drift criterion for recurrence (Theorem 8.4.3) and a drift criterion for transience (Theorem 8.4.2 - note a misprint in condition (8.43) which should read 'greater than or equal to zero').

As our prime example, we discussed the use of a drift criterion to prove recurrence of the standard AR(1)-proces, in the case where the multiplication factor numerically is smaller than 1. It is quite obvious that this proces will drift towards the 'center' of the space (some finite region around zero) if it somehow is released far away, and this intuition signals recurrence. The intutition can be made precise by examining virtually any drift function. We did the calculations with V(x) = x2, but could equally well have used say V(x) = |x| or V(x) = exp(|x|).

For some odd reason, this particular example seems not to be present in Meyn and Tweedie - which is a pity, as it is very easy, and very educational. They do consider the AR(1)-proces in proposition 11.4.2, but they choose the somewhat more involved drift function V(x) = log(1 + epsilon |x|), in order to arrive at stronger conclusion than we care about at present.

Next we discussed random walks on Z. We insisted on bounded range of the increment variables. If the mean of the increment variables is zero, the random walk is recurrent as demonstrated in Proposition 8.4.4. But if the mean of the increment variables is different from zero, the random walk is transient, as demonstrated in Proposition 8.4.5. Note that these examples involve a somewhat atypical use of the drift criterium: the conclusion is that the random walks have no drift at all, rather than a drift towards the center or away from the center. Hence the conclusion is derived from careful examination of the drift function itself (boundedness or unboundedness), rather than from the inert drift criterion.

Some confusion did arise from the application of condition (8.55). The important thing here is not the equality, but that the right hand side is smaller than or equal to 1 - I wrote the opposite inequlity on the blackboard, so it was not wonder that I could not make the argument work. The correct inequlity is obtained for values of rho slightly to the left of 1.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: The plan for the lecture November 15 is: We will give a proof of the two drift theorems in section 8.4.1 and 8.4.2. Unfortunately, there is a flaw in the logic behind Theorem 8.4.3: recurrence is proved, but what we will learn to call Harris recurrence of the set C is not - this is needed for later applications, and I will see what I can do about correcting the proof.

A key ingredient in the proofs is Theorem 8.3.5 and Theorem 8.3.6, which will be disucussed, but hardly proved. If time permits, we will initiate a discussion of Harris recurrence, with emphasis on Proposition 9.1.1 and Theorem 9.1.3.



October 27



Unfortunately, I have to cancel the lecture the upcomming monday, that is on november 1. Next lecture will thus be held on november 8.



October 25



Chapter 9 and 10 of Sean Meyn and Richard Tweedies book Markov Chains and Stochastic Stability were handed out.



In the lecture, se covered parts of Meyn and Tweedies chapter 8 on the dichotomy between transient Markov kernels and recurrent Markov kernels.

To be precise, we covered the definition of uniformly transient sets and of recurrent sets, p. 179, though we wrote them down in terms of the potential Ux(A) and not in terms of the expectation of the occupation time variables for Markov chains - but these things are really the same, and we demonstrated that as a way of bringing intuition into the subject.

Then we gave the definition of transient resp. recurrent Markov kernel. These definitions are in the boxes p. 188 and 189. We gave slight variations of the definitions, that seems more natural to me.

For transience we insisted that the countable cover of X with uniformly transient sets must be increasing - but as the unions of two uniformly transient sets is again uniformly transient, this really does not matter (by the way: I forgot to prove this fact, but it follows rather trivially from Proposition 8.3.1(i).)

For recurrence, we only demanded that all sets of positive irreducibility measure were recurrent. We then proved that this implies that Ux(A) is infinite for all sets A of positive irreducibility measure and all x in X.



Then we discussed at some length the various versions of the first-entry decomposition and the last-exit decomposition, which are the topics of section 8.2.1, p. 186-187. In all honesty, we did not prove the versions (8.20) and (8.21), but we claimed that they follow from quite trivial 'dynamic' decompositions of probabilities involving Markov chains. The difficulty is mainly that (8.20) involves integration with respect to taboo probabilities, and I suspect some effort is necessary to make sense of that. In fact, (8.21) is considerably easier to prove, as it does not have a similar technical difficulty.

But we did prove (8.27) in all details as a consequence of (8.20). It is quite clear that (8.28) will follow from (8.21) in the same manner. Note that we wrote HA, xz(B) for the taboo kernels in (8.23), as it seems to me that to many U's clobber the visual appearence.



We proved Proposition 8.3.1, part (i)-(iii), in all details, except for one minor fact: In part (ii) it is claimed that if P is irreducible and A is Harris recurrent (meaning that Lx(A) = 1 for all x in A), then A is of positive irreducibility measure. I suspect we will return to this problem at a later occasion.



Finally, we proved Theorem 8.3.4 by splitting it in two lemmas: We proved that if P is irreducible and has at least one uniformly transient set of positive irreducibility measure, then P is transient. And subsequently we demonstrated that if P is irreducible and not recurrent, then must have a uniformly transient set of postive irreducibility measure, and hence be transient. Apart from notational differences, this is actually the same proof as Meyn and Tweedie presents.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: There will be no lecture on November 1. The plan for the lecture November 8 is: We will go through Meyn and Tweedies theorem 8.2.6, and explain their general line of argument in section 8.2, where they prove the transience/recurren dichotomy through the splitting technique - it is nice to know that there is a proof like that, but the technical details are somewhat worse than in the proof we have given, and Meyn and Tweedie do not elaborate on them. After that, we will discuss the remaining parts of section 8.3, mainly Theorem 8.3.5 and Proposition 8.3.7. We will go through section 8.4 quite carefully: this section contains the first of many different versions of the socalled drift criteria. A drift criterion is an extremely powerfull way of examining the qualitative behaviour of concrete examples - we will illustrate with as many examples as possible, though we will probabably not make it as far as to section 8.5.





October 18

The lecture proceeded quite rapidly, due to an electronically prepared slide show. In the links below, a few typos have been corrected. In relation to Meyn and Tweedies book, we covered something along the lines of the following:

p. 11113-11512. The proof we gave of theorem 5.2.1 (Nummelins theorem) was however very much simplified - to the extend that it was actually erroneous, as it was based on an incorrect measure theoretic assertion. But note that a true understanding of the correct proof will start with an understanding of the erroneous proof, in order to get a feeling for why the strategy of the proof must be so complicated, and what the proof is trying to acheive.

We did not spend any time on point (iii) of Propositon 5.2.4, but in the terminology we developed, the claim is that any small set of positive irreducibility measure is actually non-repelling, though perhaps not for the original triple (n, delta, nu).

p. 12111-12411. We did not go through all the technicalities of Theorem 5.4.4 - most of them have to do with how you can construct a cycle in the periodic case. We emphasized the opposite viewpoint: in the aperiodic case all skeletons must be irreducible. It is not completely clear how to get that information out of Meyn and Tweedies writings.

Never the less, aperiodicity is in practise almost always proved by showing that  P has an n-small set of positiv irreducibility measure for any n large enough. Hence the abstract theorem is not of much use.

p. 1247-1277 and 12914-1298 without bothering with details at all - I did not try to hide my personal conviction that petite sets is a grossly oversold concept.

Furthermore we gave a very brief introduction to the concepts of chapter 6, mainly the notions of weak and strong Feller properties and the definition of a T-chain. The definition of a T-chain presented in the handouts looks somewhat different from the definition given p. 133, but they are in fact equivalent.

If you want to study this chapter, it may be relevant to know that a function h : X -> R is lower semicontinous if all sets of the form {x | h(x) > a} are open. The two nice properties of lower semicontinous functions is that they attain their minimum over a compact set (but not necessarily their maximum), and that an increasing sequence of lower semicontinuous functions converges to a limit that is also lower semicontinuous.

The two main theorems in chapter 6 are proposition 6.2.1 (showing how the T-chain property plus a little more gives irreducibility) and Theorem 6.2.5 (ii) (showing that for an irreducible aperiodic T-chain, all compact sets are small). These two consequences can however usually be established directly in practical examples, without recourse to the T-chain concept. So in my opinion, the concept is of little value.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: The plan for the lecture October 25 is: We will go through Meyn and Tweedies section 8.2 and 8.3 on transience and recurrence.





October 4

We went through chapter 5 of my own notes on Lifts and Splittings, where the Athreya-Ney construction is considered in detail. The construction has three levels: We will go to some length in the sequel to avoid discussion of the specifics of the Athreya-Ney construction. The general message is that there is some lift of the the given Markov kernel that posseses an atom. At various occasions we will need stronger results - like Harris recurrence of the atom. In those cases we have to go back to the Athreya-Ney construction, and show that this specific lift har the desired properties - and having done that, we again hide the gory details under the rug as fast as we can.

We also gave examples of chains that evidently have 1-small sets. The most obvious example was that random walk with normally distributed increments, but similar considerations applied to AR(1)-processes and in fact to any proces on Rn, which has a transtion density f(x, y) - with respect to Lebesguemeasure or otherwise - with the property that f is bounded away from zero on products of compact sets.

After this, we went back to Meyn and Tweedie. We proved Proposition 5.1.1 (which is a triviality). And we applied that to the Athreye-Ney construction to show the following:

Theorem If P is irreducible and C is a E+-set which is 1-small, then the Athreya-Ney lift Q, constructed by splitting on C, has the upper X-copy as an accessible atom. Hence Q is irreducible, and the atom is a +-set.

This theorem is the second half of Meyn and Tweedies Theorem 5.1.3(ii), but in all honesty they do not supply a proof.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: The plan for the lecture October 18 is: We will go through Meyn and Tweedies section 5.2 on general small sets. We will not give a full proof of Theorem 5.2.1, but we will pressumable sketch the ideas.

Then we give a discussion of the cyclic phenomena, discussed in section 5.4.3. Cycles are a menace, and they are of no practical importance in the theory of Markov chains on general spaces: There are no examples known to man kind of interesting Markov chains on a non-discrete space wich posses a cycle. So what comes out of the discussion is not so much a respect for cycles as it is an appreciation of the nasty distinction between the aperiodic chains and the strongly aperiodic chains. We will deal almost exclusively with strongly aperiodic chains, though we acknowledge that quite a number of chains occuring in practise are aperiodic but not strongly aperiodic. The simplest example of such chains is perhaps the stacking of an AR(2)-proces.

We will also give a brief discussion of the notion of petite sets, section 5.5. The discussion will not be penetrating, because the notion of petiteness is mainly a technical gimmick, useful for studying chains that are aperiodic but not strongly aperiodic. For strongly aperiodic chains, petite sets and small sets are the same.

Pressumably we will end the lecture by giving a brief discussion, covering the main points of Meyn and Tweedies chapter 6.





September 28

Two ways of producing a latex symbol for independence:

1) The simple hack: Construct a compound symbol \MyPerp by putting the line

        \newcommand{\MyPerp}{\perp \! \! \! \perp}

somewhere in the manuscript, for instance in the preamble (that is, before the \begin{document}). This compound symbol can be be used in constructions like

        \begin{equation}
          X \MyPerp Y
        \end{equation}

2) Use the symbol \Perp from the txfonts-package. The construction goes like this:

        \usepackage{txfonts}
        \begin{document}
        \begin{equation}
          X \Perp Y
        \end{equation}

txfonts is a font package. It is installed on the institutes computer system, but it is quite unlikely that you have it installed on your home computer - you will have to get it from CTAN. Note that font packages are somewhat harder to install than usual packages.

Be aware that if you include txfonts in you manuscript, it will change all your fonts, and the result may not be to your likening.





September 27

The following material was handed out Furthermore, chapter 6, 7 and 8 of Sean Meyn and Richard Tweedies book Markov Chains and Stochastic Stability were handed out.



We went through Meyn and Tweedies p. 931-9613. Well, perhaps not to the letter - we gave a proof of lemma 4.2.5 with a somewhat stronger probabilistic flavour than Meyn and Tweedies proof, which is based on formal manipulations with taboo probabilites. Actually the proofs are the same, but that will presumably remain hidden to most readers...

After that we went through chapter 4 of my own notes on atoms. We gave an extended discussion of regeneration, and in particular of why the existence of a Harris recurrent atom almost for free gives a strong law of large numbers for the Markov chain. The disucssion was heuristic, but it is probably worthwhile to write down in a more formal way. I will try to do that before the next lecture.



Plan: The plan for the lecture October 4 is: chapter 5 of my own notes on Lifts and Splittings followed by a disucssion of the existence of small sets, based on Meyn and Tweedie p. 111-116. The great surprise is that small sets occur all over the place... We will not give a proof of theorem 5.2.1, but if time permits, the ideas of the proof will be revealed.





September 23

It has been pointed out to me that an embarrasing mistake has crept into the formulation of problem 4-8 in 'Paper 1'. The error has to do with the usage of the term geometric distribution.

Geometric distributions arise in situations where a coin is flipped until it shows head. There are two things you can count: The number T of tails occuring before the first head. Or the waiting time W until the first heads shows up. Evidently the two numbers are related by the trivial formula W = T + 1.

The formal definition of the geometric distribution is that it is the distribution of T. But speaking colloquially, the word is frequently used to refer to the distribution of W as well. I have on many occasionens complained about the confusion that this dual use leads to, so it is particularly embarrasing that I myself cannot keep my mind straight on the matter.

I suggest we do the following:
  1. Invent a term, like simple waiting time distribution to describe the distribution of W above.

  2. Replace the occurence of geometric distribution in problem 6 and 8 with simple waiting time distribution.
Do not change any of the formulas - just the words.





September 20

The following material was handed out Note that a solution to the problems in paper 1 is part of the course requirement.



We went through the above chapter 3 on taboo probabilities - the chapter is very sketchy indeed, and merely contains a formal correct definition of taboo probabilities and return probabilities. Based on the definiton of return probabilities we went through Meyn and Tweedies p. 9022-937, where irreducibility is introduced, and the maximal irreducibility measure is constructed. We did not cover part (iii) of proposition 4.2.2, but we will do that next time.

We also discussed a few examples of irreducible chains: the reflected random walk (which Meyn and Tweedie calls a random walk on a half line) p. 9612-9716, which is a trivial example with a onepoint measure as irreducibility measure. And the general random walk p. 988-984. We did not go as much into detail with the random walk as it deserves - in fact we only considered the case with normally distributed increments, where the Lebesgue measure trivially acts as an irreducibility measure. So please pay close attention to the elaborate details of the more general case, discussed in Meyn and Tweedie. A thorough understanding of this example should have extremely high priority.



It was recommended that some time is spend studying p. 983- 10311. This recommendation is more casual. The language is somewhat perculiar (at least it seems so to me) but the content is good clean fun. I would not say that it is important for the further study of Markov chains, as the details are algebraic, rather than probabilistic, and so have rather limited range of application. But the result is certainly nice to know.

It is shown that under reasonable assumptions the update scheme given p. 10110 leads to an irreducible chain if the error distribution is a regular normal distribution.

Furthermore, it is shown that the stacked version of an AR(k)-proces (the idea of stacking autoregressiv processes is explained in Meyn and Tweedie p. 27) automatically satisfies the 'reasonable assumptions'. Hence stacked versions of Gaussian AR(k)-processes are alway irreducible with k-dimensional Lebesgue measure as irreducibility measure.



A scan of my preparational notes for the lecture as a pdf-file. The notes may be quite unreadable, and they may well be almost unrelated to what actually happened at the lecture.



Plan: The plan for the lecture September 27 is: part (iii) of Proposition 4.2.2, p. 93-96 and p. 106-115. We will presumable not give a proof of the existence of small sets (Theorem 5.2.1), as that would occupy almost the entire lecture. I also have a feeling that the connection bewteen what I will say about splitting and what Meyn and Tweedie says (p. 108) will seem rather remote...





September 13

Chapter 4 and 5 of Sean Meyn and Richard Tweedies book Markov Chains and Stochastic Stability were handed out. The next lecture will be devoted to the concept of irreducibility, centered around chapter 4 of Meyn and Tweedie.

A new version of the manuscript we have been discussing so far, was handed out. It is still very rough, but at least some of the edges have been smoothed out. We went through p. 37-56, covering the topics of strong Markov properties, weak and strong time homogeneity, and the Chapman-Kolmogorov equations.

We did not go into a detailed discussion of the lemmas p. 41-42 containing the core details in the proof of the strong Markov property. These details are not particular difficult, but they lack a clear direction, and so they are potentially bewildering.

We also skipped the proof of theorem 2.24, where it is proved that is suffices to check weak time homogenity on bounded stopping times. This proof builds on a rather difficult - but also quite illuminating and potentially useful - lemma, stated on p. 4-6, which relates conditional probabilities of a fixed event, but where the conditioning is varied.



Recapturing our effort so forth, we have more or less covered the whole manuscript, except for a few of the examples and the above technical exceptions. In many ways the essence of the manuscript is given in example 2.29, where a straight forward and pretty mechanical proof is given of the fact that the forward recurrence time chain in a renewal setting is indeed a Markov chain. We have developped a vocabulary and a technology suitable for the demonstration of Markov properties, also in highly non-trivial situations.





September 6

Chapter 1, 2 and 3 of Sean Meyn and Richard Tweedies book Markov Chains and Stochastic Stability was distributed. We will not cover these chapters systematically in the lectures - the main content is lifted to the notes I distributed last time (I plan to hand out updated versions of these notes at the next lecture).

In a sense we started from page 1 of my notes again. We covered something like p. 23-79, 157-194, 215-2311 243-3018.

There are in fact three definitions of a Markov chain.
  1. A Markov chain is a proces given by an update scheme.

  2. A Markov chain is a proces, where the finitedimensional distributions can be computed via horrible-looking multiple integrals involving the one-step transition kernels. (Check out Meyn & Tweedie p. 66, and observe that the measure-theoretic details of the exposition are in fact worse than ours, in the sense that they have to rely on socalled canonical representation spaces, mainly in order to make sense of the strong Markov property.)

  3. A Markov chain is a proces, where the future is conditionally independent the past given the present (my notes, definition 2.1).
No effort was spared to convince the audience that the last definition is preferable! Even though it is not standard in the litterature. It is not really my own invention, it is (a somewhat polished version of) the customary definition of 'Markov properties' in the graphical models community. I have felt for years that someone ought to bring that back to Markov chains proper. This turns out to be extremely succesfull, due to the fact that it is possible to prove a strong Markov property whithin this framework. We shall elaborate on this in the comming lectures.

Be that as it may, the three definitions are in fact equivalent. That definition 3 implies definition 2 is the content of the remarks on p. 22 of my notes. That definition 2 implies the existence of an update scheme and hence definition 1 is the content of theorem 2.4 of my notes. And that condition 1 implies condition 3 is the content of theorem 2.3.

Still, I think that the choice of definition matters quite a lot. Definition 2 is a historical mistake. It is simply not useful, except for in the discrete case. Examples of Markov chains on general spaces come in two classes: It is virtually impossible to give direct proofs that processes in the second class satisfy definition 2 above. And hence the conspiracy in the litterature that such a proof 'is not needed'.

As a very illuminating example, check Meyn and Tweedies exposition of the forward recurrence time chain. They are defined p. 44, where they are discussed, but where nothing is proven (which is fair enough, as no definition of anything has been given at that point). They are taken up again p. 60, where the one-step transition probabilities are calculated. But that calculation is completely trivial, compared to the intricacies of showing that the forward recurrence time chain is Markovian - a fact which can not be established by computing the one-step probabilities, but only by showing that the finite dimensional distributions can be build up using them. There is even a continuous-time analogue of the forward recurrence time chain defined p. 74-75, with an equally vacuous discussion of the Markov property.

It is a swindle! And it will be evident that there is something to prove when we do it in the next lecture. It is not particularly difficult, once you know what to do, but it certainly is not something obvious.



But of course there is a reason behind the traditional approach. 'Markov chain theory' is not as much a term used for study of concrete stochastic processes, as for the study of a specific Markov kernel P = (Px), in the sense that it is the simultanous study of all processes where the the one-step transitions kernels all equal P.

Many of the concepts we will disucss (irreducibility, Harris recurrence etc.) are in fact properties of a specific kernel, rather than properties of a stochastic proces. And Markov chain theory is often thought of as a branch of functional analysis, as the Markov kernel P gives rise to an operator on functions spaces, where a function h is mapped to Ph (see Meyn & Tweedie p. 128 for the exact definition). Traditional Markov chain theory focuses very much on the spectral properties of this operator.

Though a number of insights can be obtained using this functional analytic viewpoint, it will not occupy us very much. We will be much more concerned with the pointwise behaviour of concrete stochastic processes.





August 30

The following material was distributed: Both manuscripts are grossly unfinished, and will soon be replaced by more polished versions :-)

The lecture took place in a quite noisy atmosphere and it was difficult to concentrate. As a result, the lecture was not in any way a linear exposition of the notes. Personally I think of conditional distributions as far more complicated objects than conditional probabilities (althoug the hardest part perhaps is to remember which is which) and I tried to postpone any mentioning of Markov kernel stuff untill we have an environment better suited for concentration.

Thus I think I have been covering p. 11-22, 710-156, 211-216 and 2312-244 so far.






Latest update: September 21, 2006
erhansen@math.ku.dk