diff --git a/source/feature_selection.rst b/source/feature_selection.rst
index efdd31e..b23ffec 100644
--- a/source/feature_selection.rst
+++ b/source/feature_selection.rst
@@ -37,18 +37,6 @@ of novels
 containing works by two authors: Jane Austen and Charlotte Brontë. This
 :ref:`corpus of six novels ` consists of the following text files:
 
-.. ipython:: python
-
-   filenames
-
-.. raw:: html
-   :file: generated/feature_selection_bayesian.txt
-
-
-We will find that among the words that reliably distinguish Austen from Brontë
-are "such", "could", and "any". This tutorial demonstrates how we arrived at
-these words.
-
 .. ipython:: python
    :suppress:
 
@@ -66,6 +54,18 @@ these words.
    CBRONTE_FILENAMES = ['CBronte_Jane.txt', 'CBronte_Professor.txt',
                         'CBronte_Villette.txt']
    filenames = AUSTEN_FILENAMES + CBRONTE_FILENAMES
+.. ipython:: python
+
+   filenames
+
+We will find that among the words that reliably distinguish Austen from Brontë
+are "such", "could", and "any". This tutorial demonstrates how we arrived at
+these words.
+
+.. raw:: html
+   :file: generated/feature_selection_bayesian.txt
+
+
 .. note:: The following features an introduction to the concepts underlying
    feature selection. Those who are working with a very large corpus and are
    familiar with statistics may wish to skip ahead to the section on
@@ -345,13 +345,7 @@ in Brontë's
 novels were much more variable, say, 0.03, 0.04, and 0.66 (0.24 on average). Although
 the averages remain the same, the difference does not seem so pronounced; with only one
 observation (0.66) noticeably greater than we find in Austen, we might reasonably doubt that there is evidence of a systematic difference between
-the authors. [#fnlyon]_
-
-.. [#fnlyon] Unexpected spikes in word use happen all the time. Word usage in a large corpus
-   is notoriously "bursty" (a technical term!) :cite:`church_poisson_1995`.
-   Consider, for example, ten French novels, one of which is set in Lyon.
-   While "Lyon" might appear in all novels, it would appear much (much) more
-   frequently in the novel set in the city.]
+the authors. [#fn_lyon]_
 
 One way of formalizing a comparison of two groups that takes account of the
 variability of word usage comes from Bayesian statistics. To describe our
@@ -640,7 +634,7 @@ This produces
 a useful ordering of characteristic words. Unlikely `frequentist observations within groups.
 This method will also work for small corpora provided useful prior information is available.
 To the extent that we are interested in a close reading of differences of vocabulary use, the Bayesian
-method should be preferred. [#fnunderwood]_
+method should be preferred. [#fn_underwood]_
 
 .. _chi2:
 
@@ -936,7 +930,13 @@ Exercises
 
 .. FOOTNOTES
 
-.. [#fnunderwood] Ted Underwood has written a `blog post discussing some of the
+.. [#fn_lyon] Unexpected spikes in word use happen all the time. Word usage in a large corpus
+   is notoriously *bursty* :cite:`church_poisson_1995`.
+   Consider, for example, ten French novels, one of which is set in Lyon.
+   While "Lyon" might appear in all novels, it would appear much (much) more
+   frequently in the novel set in the city.
+
+.. [#fn_underwood] Ted Underwood has written a `blog post discussing some of the
    drawbacks of using the log likelihood and chi-squared test statistic in the
    context of literary studies `_.]
 
diff --git a/source/topic_model_mallet.rst b/source/topic_model_mallet.rst
index def33d7..5ec8e81 100644
--- a/source/topic_model_mallet.rst
+++ b/source/topic_model_mallet.rst
@@ -128,11 +128,6 @@ documentation in the Python library
 `itertools `_ describes a function called ``grouper`` using
 ``itertools.izip_longest`` that solves our problem.
 
-.. [#fnmapreduce] Those familiar with
-   `MapReduce `_ may recognize the pattern of
-   splitting a dataset into smaller pieces and then (re)ordering them.
-
-
 .. ipython:: python
    :suppress:
 
@@ -483,3 +478,11 @@ to be associated more strongly with Austen's novels than with Brontë's.
 
 .. raw:: html
    :file: generated/topic_model_distinctive_avg_diff.txt
+
+.. FOOTNOTES
+
+.. [#fnmapreduce] Those familiar with
+   `MapReduce `_ may recognize the pattern of
+   splitting a dataset into smaller pieces and then (re)ordering them.
+
+
diff --git a/source/topic_model_visualization.rst b/source/topic_model_visualization.rst
index 892fa23..cd90d51 100644
--- a/source/topic_model_visualization.rst
+++ b/source/topic_model_visualization.rst
@@ -438,6 +438,8 @@ This shows us that a greater diversity of vocabulary items are associated with
 topic 3 (likely many of the French words that appear only in Brontë's *The
 Professor*) than with topic 0.
 
+.. FOOTNOTES
+
 .. [#fnpritchard] The topic model now familiar as LDA was independently
    discovered and published in 2000 by Pritchard et al.
    :cite:`pritchard_inference_2000`.
diff --git a/source/working_with_text.rst b/source/working_with_text.rst
index a647de6..6ddc628 100644
--- a/source/working_with_text.rst
+++ b/source/working_with_text.rst
@@ -5,7 +5,7 @@ Working with text
 ===================
 
-.. note:: This tutorial is also available in download for interactive use
+.. note:: This tutorial is available for interactive use
    with `IPython Notebook `_:
    :download:`Working with text.ipynb `.
 
 Creating a document-term matrix
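For reviewers: the ``topic_model_mallet.rst`` hunk above refers to the ``grouper`` recipe from the itertools documentation, which splits a sequence into fixed-length chunks. A minimal sketch of that recipe follows, using Python 3's ``zip_longest`` (the tutorial's ``itertools.izip_longest`` is the Python 2 name for the same function); the example data is illustrative, not taken from the corpus.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Split ``iterable`` into tuples of length ``n``, padding the
    last tuple with ``fillvalue`` if it comes up short."""
    # Reuse the *same* iterator n times; zip_longest then pulls n
    # consecutive items per output tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Seven items in groups of three; the last group is padded with None.
print(list(grouper("ABCDEFG", 3)))
# → [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', None, None)]
```

The same pattern applies when splitting a novel's word list into equal-sized passages before passing them to MALLET, as the tutorial does.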