1 Introduction
This paper develops tools which allow projecting a random walk on a group to a Markov chain on special equivalence classes of the group. Fourier analysis on the group can then be harnessed to give sharp analysis of rates of convergence to stationarity for the Markov chain on equivalence classes. We begin with a motivating example.
Example 1.1 (Coagulation-fragmentation processes).
In chemistry and physics, coagulation-fragmentation processes are models used to capture the behavior of ‘blobs’ that combine and break up over time. These processes are used in population genetics to model the merging and splitting of family groups. A simple mean field model considers n unlabeled particles in a partition $\lambda = (\lambda _1, \dots , \lambda _k)$ with $\sum _{i = 1}^k \lambda _i = n$ . At each step of the process, a pair of particles is chosen uniformly at random and the partition evolves according to the rules:

1. If the particles are in distinct blocks, combine the blocks.

2. If the particles are in the same block, break the block uniformly into two blocks.

3. If the same particle is chosen twice, do nothing.
This defines a Markov chain on partitions of n. Natural questions are:

• What is the stationary distribution $\pi (\lambda )$ ?

• How does the process evolve?

• How long to reach stationarity?
All of these questions can be answered by considering the random transpositions process on the symmetric group $S_n$ . The transition probabilities for this process are constant on conjugacy classes, the conjugacy classes are indexed by partitions, and the conjugacy class containing the current permutation of the walk evolves as the coagulation-fragmentation process on partitions. The answers are (see Section 2.3):

• The stationary distribution is $\pi (\lambda ) = \prod _{i=1}^n 1/(i^{a_i} a_i!)$ for a partition $\lambda $ with $a_i$ parts of size i.

• Starting at $\lambda = 1^n$ , the pieces evolve as the connected components of a growing Erdős-Rényi random graph process: Initially there are n vertices and no edges. Random edges are added to the graph so that the connected components correspond to the parts of a partition $\lambda $ of n. This works as long as there are no repeated edges $(i, j)$ (so for a number of steps of smaller order than n). Repeated edges can be easily handled so that there is a tight connection between the growth of the random graph and the random transpositions Markov chain. In particular, random transpositions mix in order $(1/2)n \log n$ steps, which is the same amount of time for the random graph to become connected. See [Reference Berestycki, Schramm and Zeitouni6] for more details.

• It takes order $n \log n$ steps to reach stationarity.
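The first answer can be checked by brute force for small n. The following Python sketch is an illustration under stated assumptions, not part of the original argument: it reads the partition chain off the random transpositions walk (identity with probability $1/n$ , each transposition with probability $2/n^2$ , matching the three rules above) and verifies with exact rational arithmetic that $\pi (\lambda ) = \prod _i 1/(i^{a_i} a_i!)$ is stationary. The helper names `cycle_type`, `pi_formula` and `lumped_kernel` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def cycle_type(perm):
    """Cycle lengths of perm (tuple with perm[i] = image of i), in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def pi_formula(lam):
    """prod_i 1 / (i^{a_i} a_i!) for a partition lam with a_i parts of size i."""
    prob = Fraction(1)
    for size, mult in Counter(lam).items():
        prob /= size ** mult * factorial(mult)
    return prob

def lumped_kernel(n):
    """Exact transition probabilities of the partition chain for n particles."""
    kernel = {}
    for perm in permutations(range(n)):
        lam = cycle_type(perm)
        if lam in kernel:
            continue  # one representative per conjugacy class suffices
        row = Counter({lam: Fraction(1, n)})  # same particle chosen twice
        for i, j in combinations(range(n), 2):
            new = list(perm)
            new[i], new[j] = new[j], new[i]  # multiply by the transposition (i j)
            row[cycle_type(tuple(new))] += Fraction(2, n * n)
        kernel[lam] = row
    return kernel

n = 5
P = lumped_kernel(n)
pi = {lam: pi_formula(lam) for lam in P}
total = sum(pi.values())
stationary = all(sum(pi[lam] * P[lam][mu] for lam in P) == pi[mu] for mu in P)
print(total, stationary)
```

Using one representative per conjugacy class is justified only because the driving measure is a class function; for a general measure the rows would depend on the representative.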
The coagulation-fragmentation process is a special case of a double coset walk: Let G be a finite group, $H, K$ subgroups of G. The equivalence relation
$$\begin{align*}s \sim t \iff t = hsk \quad \text{for some } h \in H, \ k \in K \end{align*}$$
partitions the group into double cosets $H \backslash G / K$ . Let $Q(s)$ be a probability measure on G. That is, $0 \le Q(s) \le 1$ for all $s \in G$ and $\sum _{s \in G} Q(s) = 1$ . Further assume $Q(s)$ is a class function: It is constant on conjugacy classes, that is, $Q(s) = Q(t^{-1} s t)$ for all $s, t \in G$ . Then Q defines a random walk on G by repeated multiplication of random elements chosen according to Q. In other words, the random walk is induced by convolution, $Q^{*k}(s) = \sum _{t \in G} Q(t) Q^{*(k-1)}(s t^{-1})$ and a single transition step has probability $P(x, y) = Q(yx^{-1})$ .
This random walk induces a random process on the space of double cosets. While usually a function of a Markov chain is no longer a Markov chain, in this situation the image of the random walk on $H \backslash G / K$ is Markov. Section 2.2 proves the following general result. Throughout, we pick double coset representatives $x \in G$ and write x for $H x K$ .
Theorem 1.2. For $Q(s) = Q(t^{-1} s t)$ a probability on G, the induced process on $H \backslash G / K$ is Markov with the following properties.

1. The transitions are
$$\begin{align*}P(x, y) = Q(HyKx^{-1}), \quad x, y \in H \backslash G /K. \end{align*}$$ 
2. The stationary distribution is
$$\begin{align*}\pi(x) = \frac{|HxK|}{|G|}. \end{align*}$$ 
3. If $Q(s^{-1}) = Q(s)$ , then P is reversible with respect to $\pi $ :
$$\begin{align*}\pi(x)P(x, y) = \pi(y)P(y, x). \end{align*}$$
Further, suppose Q is concentrated on a single conjugacy class $\mathcal {C}$ (that is, $Q(s) = \delta _{\mathcal {C}}(s)/|\mathcal {C}|$ ). Then the Markov chain P has the following properties.

1. The eigenvalues of P are among the set
$$\begin{align*}\left\lbrace \frac{\chi_{\lambda}(\mathcal{C})}{\chi_{\lambda}(1)} \right\rbrace_{\lambda \in \widehat{G}}, \end{align*}$$where $\widehat {G}$ is the set of all irreducible representations of G and $\chi _{\lambda }$ is the character of the irreducible representation indexed by $\lambda $ . 
2. The multiplicity of $\chi _{\lambda }(\mathcal {C})/\chi _{\lambda }(1)$ is
$$\begin{align*}m_{\lambda} = \left\langle \chi_{\lambda}|_H, 1 \right\rangle \cdot \left\langle \chi_{\lambda}|_K, 1 \right\rangle, \end{align*}$$where $\left \langle \chi _{\lambda }|_H, 1 \right \rangle $ is the number of times the trivial representation appears when $\chi _{\lambda }$ is restricted to H. 
3. For any time $\ell> 0$ ,
$$\begin{align*}\sum_{x \in H \backslash G /K} \pi(x) \|P_x^{\ell} - \pi \|_{TV}^2 \le \frac{1}{4} \sum_{\lambda \neq 1} m_{\lambda} \left| \frac{\chi_{\lambda}(\mathcal{C})}{\chi_{\lambda}(1)} \right|^{2 \ell}. \end{align*}$$
This theorem shows that the properties of the induced chain are available via the character theory of G. It is proved with variations and extensions in Section 2.2. The main example is introduced next.
Example 1.3 ( $GL_n(q)$ and Gaussian elimination).
Fix a prime power q, and let $GL_n(q)$ be the invertible $n \times n$ matrices over $\mathbb {F}_q$ . Let $H = K = \mathcal {B}$ be the Borel subgroup: upper-triangular matrices in $GL_n(q)$ . A classical result is the Bruhat decomposition,
$$\begin{align*}GL_n(q) = \bigsqcup_{\omega \in S_n} \mathcal{B} \omega \mathcal{B}, \end{align*}$$
where $\omega $ is the permutation matrix for the permutation $\omega \in S_n$ . This decomposition shows that the double cosets $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ are indexed by permutations. As explained in Section 2.5 below, the permutation $\omega $ associated to $M \in GL_n(q)$ is the ‘pivotal’ permutation when M is reduced to upper-triangular form by row reduction (Gaussian elimination).
The set of transvections $\mathcal {T}_{n,q}$ is the conjugacy class containing the basic row operations $I + \theta E_{ij}$ ; here, $\theta \in \mathbb {F}_q$ , and $E_{ij}$ is the matrix with a $1$ in position $(i, j)$ and zeroes everywhere else (so $I + \theta E_{ij}$ acts by adding $\theta $ times row i to row j). Hildebrand [Reference Hildebrand35] gave sharp convergence results for the Markov chain on $GL_n(q)$ generated by the class function Q which gives equal probability to all transvections. He shows that n steps are necessary and sufficient for convergence to the uniform distribution for any q. Of course, convergence of the lumped chain on $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ might be faster. The results we found surprised us. Careful statements are included below. At a high level, we found:

• Starting from a ‘typical’ state $x \in S_n$ , order $\log n/\log q$ steps are necessary and sufficient for convergence. This is an exponential speedup from the original chain.

• Starting from $id$ , order n steps are necessary and sufficient for convergence.

• Starting from $\omega _0$ , the reversal permutation, order $\log n/(2\log q)$ steps are necessary and sufficient for convergence.
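To simplify the statement of an honest theorem, let us measure convergence in the usual chi-square or $L^2$ distance:
$$\begin{align*}\chi_x^2(\ell) = \sum_{y} \frac{(P^{\ell}(x, y) - \pi(y))^2}{\pi(y)} = \left\| \frac{P_x^{\ell}}{\pi} - 1 \right\|_{2, \pi}^2. \end{align*}$$
In Section 3, the usual total variation distance is treated as well.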
Theorem 1.4. The random transvections walk on $GL_n(q)$ induces a Markov chain $P(x, y)$ on $S_n \cong \mathcal {B} \backslash GL_n(q) / \mathcal {B}$ with stationary distribution
$$\begin{align*}\pi_q(\omega) = \frac{q^{I(\omega)}}{[n]_q!} \end{align*}$$
for $\omega \in S_n$ , where $I(\omega ) = |\{ (i, j): i < j, \omega (i)> \omega (j) \}|$ is the number of inversions in $\omega $ and $[n]_q! = \prod _{i=1}^n (1 + q + \dots + q^{i-1})$ .
Furthermore, if $\log q> 6/n$ then the following statements are true.

1. (Typical start) If $\ell \ge (\log n + c)/(\log q - 6/n)$ for any $c> 0$ , then
$$\begin{align*}\sum_{x \in S_n} \pi(x) \chi_x^2(\ell) \le (e^{e^{-c}} - 1) + e^{-cn}. \end{align*}$$Conversely, for any $\ell $
$$\begin{align*}\sum_{x \in S_n} \pi(x) \chi_x^2(\ell) \ge (n-1)^2 q^{-4 \ell}. \end{align*}$$These results show order $\log _q(n)$ steps are necessary and sufficient for convergence.

2. (Starting from $id$ ) If $\ell \ge (n \log q/2 + c)/(\log q - 6/n)$ with $c> 0$ , then
$$ \begin{align*} \chi_{id}^2(\ell) \le (e^{e^{-2c}} - 1) + e^{-cn}. \end{align*} $$Conversely, for any $\ell $ ,
$$\begin{align*}\chi_{id}^2(\ell) \ge (n - 1)(q^{n-1} - 1)q^{-4 \ell}. \end{align*}$$These results show that order n steps are necessary and sufficient for convergence starting from the identity.

3. (Starting from $\omega _0$ ) If $\ell \ge (\log n/2 + c)/(\log q - 6/n)$ for $c \ge 2 \sqrt {2}$ , then there is a universal constant $K> 0$ (independent of $q, n$ ) such that
$$ \begin{align*} \chi_{\omega_0}^2(\ell) \le - 2K \log(1 - e^{-c}) + K e^{-cn}. \end{align*} $$Conversely, for any $\ell $ ,
$$\begin{align*}\chi_{\omega_0}^2(\ell) \ge q^{-(n-2)}(n-1) (q^{n-1} - 1) q^{-4 \ell}. \end{align*}$$These results show that order $\log _q(n)/2$ steps are necessary and sufficient for convergence starting from $\omega _0$ .
Remark 1.5. Note that, while Hildebrand’s result of order n convergence rate was independent of the parameter q, the rates in Theorem 1.4 depend on q.
The stationary distribution $\pi _q$ is the Mallows measure on $S_n$ . This measure has a large enumerative literature; see [Reference Diaconis and Simper24] Section 3 for a review or [Reference Zhong64]. It is natural to ask what the induced chain ‘looks like’ on $S_n$ . After all, the chain induced by random transpositions on partitions has a simple description and is of general interest. Is there a similarly simple description of the chain in $S_n$ ? This question is treated in Section 5 using the language of Hecke algebras.
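For small n the Mallows measure can be tabulated directly. The sketch below assumes the standard form $\pi _q(\omega ) \propto q^{I(\omega )}$ with normalizing constant $\prod _{i=1}^n (1 + q + \dots + q^{i-1})$ ; the function names are hypothetical and permutations are written on $\{0, \dots , n-1\}$ for convenience.

```python
from fractions import Fraction
from itertools import permutations

def inversions(w):
    """Number of pairs i < j with w[i] > w[j]."""
    n = len(w)
    return sum(1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j])

def q_factorial(n, q):
    """prod_{i=1}^n (1 + q + ... + q^{i-1})."""
    out = 1
    for i in range(1, n + 1):
        out *= sum(q ** k for k in range(i))
    return out

def mallows(n, q):
    """Dictionary w -> q^{I(w)} / q_factorial(n, q) over all permutations w."""
    z = q_factorial(n, q)
    return {w: Fraction(q ** inversions(w), z) for w in permutations(range(n))}

pi_q = mallows(4, 3)
identity, reversal = (0, 1, 2, 3), (3, 2, 1, 0)
total = sum(pi_q.values())
ratio = pi_q[reversal] / pi_q[identity]
print(total, ratio)
```

For $q> 1$ the identity is the least likely permutation and the reversal $\omega _0$ the most likely, with probability ratio $q^{n(n-1)/2}$ . This is at least consistent with the slow convergence from $id$ and the fast convergence from $\omega _0$ in Theorem 1.4.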
Theorem 1.6. Let $H_n(q)$ be the Hecke algebra corresponding to the $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ double cosets and $D = \sum _{T \in \mathcal {T}_{n,q}} T \in H_n(q)$ be the sum of all transvections. Then,
$$\begin{align*}D = (n-1)q^{n-1} - [n-1]_q + (q-1) \sum_{1 \le i < j \le n} q^{(n-1) - (j-i)} T_{ij}, \end{align*}$$
with $T_{ij}$ in the Hecke algebra.
This gives a probabilistic description of the induced chain on $S_n$ . Roughly stated, from $\omega \in S_n$ pick $(i,j), i < j,$ with probability proportional to $q^{-(j - i)}$ and transpose i and j in $\omega $ using the Metropolis algorithm (reviewed in Section 2.6). This description is explained in Section 5; see also [Reference Diaconis and Ram18]. The probabilistic description is crucial in obtaining good total variation lower bounds for Theorem 1.4.
Outline
Section 2 develops and surveys background material on double cosets, Markov chains (proving Theorem 1.2), transpositions and coagulation-fragmentation processes, transvections and Gaussian elimination. Theorem 1.4 is proved in Section 3. Theorem 1.6 is proved in Section 5 using a row reduction. Section 6 contains another Markov chain from a lumping of the transvections chain, and Section 7 surveys further examples: contingency tables and extensions of the $GL_n$ results to finite groups of Lie type, for which the Bruhat decomposition holds and there are natural analogs of transvections. Of course, there is an infinite variety of groups $G, H, K$ , and we also indicate extensions to compact Lie groups.
We have posted a more leisurely, expository version of the present paper on the arXiv [Reference Diaconis, Ram and Simper19]. This contains more examples and proof details.
Notation
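Throughout, q will be a prime power. For a positive integer n, define the quantities
$$\begin{align*}[n]_q = \frac{q^n - 1}{q - 1} = 1 + q + \dots + q^{n-1}, \qquad [n]_q! = [n]_q [n-1]_q \cdots [1]_q. \end{align*}$$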
2 Background
This section gives the basic definitions and tools needed to prove our main results. Section 2.1 gives background on double cosets. In Section 2.2, Markov chains are reviewed and Theorem 1.2 is proved, along with extensions. Section 2.3 reviews the coagulation-fragmentation literature along with the random transpositions literature. Section 2.4 develops what we need about transvections, and Section 2.5 connects the Bruhat decomposition to Gaussian elimination. Finally, Section 2.6 reviews the Metropolis algorithm.
2.1 Double cosets
Let $H, K$ be subgroups of a finite group G. The double coset decomposition is a standard tool of elementary group theory. The original proof of Sylow’s first theorem uses double cosets, as do Mackey’s basic theorems on decomposing restrictions of induced representations. The Hecke algebra $\mathrm {End}_{G}(G/H)$ , the linear maps of the right H-invariant functions that commute with the action of G, has a basis induced by H-H double cosets. Hecke algebras are basic objects of study in modern number theory. For a detailed survey, see [Reference Diaconis and Simper24] or [Reference Curtis and Reiner14]. Double cosets can have very different sizes and [Reference Diaconis and Simper24], [Reference Paguyo48] develop a probabilistic and enumerative theory. For present applications, an explicit description of the double cosets is needed.
Example 2.1. Let $S_{\lambda }, S_{\mu }$ be parabolic subgroups of the symmetric group $S_n$ . Here, $\lambda = (\lambda _1, \dots , \lambda _I)$ and $\mu = (\mu _1, \dots , \mu _J)$ are partitions of n. The subgroup $S_{\lambda }$ consists of all permutations in which the first $\lambda _1$ elements may only be permuted amongst each other, the next $\lambda _2$ elements $\lambda _1 + 1, \dots , \lambda _1 + \lambda _2$ may only be permuted amongst each other and so on. It is a classical fact that the double cosets $S_{\lambda } \backslash S_n / S_{\mu }$ are in bijection with ‘contingency tables’, that is, arrays of nonnegative integers with row sums $\lambda $ and column sums $\mu $ . See [Reference James and Kerber38], Section 1.3. For proofs and much discussion of the connections between the group theory and applications and statistics, see [Reference Diaconis and Simper24], Section 5. Random transpositions on $S_n$ induce a natural Markov chain on these tables; see [Reference Simper60] and Chapter 3 of [Reference Simper59]. Contingency tables also label the double cosets of parabolic subgroups in $GL_n(q)$ . See [Reference Karp and Thomas39] and Section 7.1 below.
Example 2.2. Let M be a finite group, $G = M \times M$ , and $H = K = M$ embedded diagonally as subgroups of G (that is, $\{(m, m): m \in M \}$ ). The conjugacy classes in G are products of conjugacy classes in each coordinate of M. In the double coset equivalence classes, note that
$$\begin{align*}(s, t) \sim (id, s^{-1}t) \sim (id, k^{-1} s^{-1} t k) \quad \text{for all } s, t, k \in M, \end{align*}$$
and so double cosets can be indexed by conjugacy classes of M. If $Q_1$ is a conjugacy invariant probability on M, then $Q = Q_1 \times \delta _{id}$ is conjugacy invariant on G. The random walk on G induced by Q maps to the random walk on M induced by $Q_1$ . In this way, the double coset walks extend conjugacy invariant walks on M. Example 1.1 in the introduction is a special case.
Of course, the conjugacy classes in M (and so the double cosets) can be difficult to describe. Describing the conjugacy classes of $U_n(q)$ , the unit upper-triangular matrices in $GL_n(q)$ , is a well-known ‘wild’ problem. See [Reference Aguiar, André, Benedetti, Bergeron, Chen, Diaconis, Hendrickson, Hsiao, Isaacs, Jedwab, Johnson, Karaali, Lauve, Le, Lewis, Li, Magaard, Marberg, Novelli, Pang, Saliola, Tevlin, Thibon, Thiem, Venkateswaran, Vinroot, Yan and Zabrocki2] or [Reference Diaconis and Malliaris17] for background and details. Describing the double cosets for the Sylow p-subgroup in $S_n$ seems difficult.
Example 2.3. Let G be a finite group of Lie type, defined over $\mathbb {F}_q$ , with Weyl group W. Let $\mathcal {B}$ be the Borel subgroup (maximal solvable subgroup). Take $H = K = \mathcal {B}$ . The Bruhat decomposition gives
$$\begin{align*}G = \bigsqcup_{w \in W} \mathcal{B} w \mathcal{B}, \end{align*}$$
so the double cosets are indexed by W. See [Reference Carter11], Chapter 8, for a clear development in the language of groups with a $(\mathcal {B}, N)$ pair.
Conjugacy invariant walks on G have been carefully studied in a series of papers by David Gluck, Bob Guralnick, Michael Larsen, Martin Liebeck, Aner Shalev, Pham Tiep and others. These authors develop good bounds on the character ratios needed. See [Reference Guralnick, Larsen and Tiep30] for a recent paper with careful reference to earlier work. Of course, Example 1.3 with $G = GL_n(q), W = S_n$ is a special case. The present paper shows what additional work is needed to transfer character ratio results from G to W.
The double cosets form a basis for the algebra of H-K bi-invariant functions $L(H \backslash G /K)$ with product
$$\begin{align*}f \star g(s) = \sum_{t \in G} f(t) g(s t^{-1}). \end{align*}$$
This is usually developed for $H = K$ [Reference Curtis and Reiner14], [Reference Ceccherini Silberstein, Scarabotti and Tolli12], [Reference Diaconis15], but the extra flexibility is useful. We add a caveat: When $H = K$ , the algebra of bi-invariant functions (into $\mathbb {C}$ ) is semisimple and with a unit. This need not be the case for general H and K. David Craven tried many pairs of subgroups of $S_4$ and found examples which were not semisimple. For $G=S_4$ , Marty Isaacs produced the example H the cyclic subgroup generated by $(1234)$ and K the cyclic subgroup generated by $(1243)$ . The algebra does not have a unit and so cannot be semisimple. This occurs even for some pairs of distinct parabolic subgroups of $S_n$ . There are also distinct pairs of parabolics where the algebra is semisimple. Determining when this occurs is an open question.
Further examples are in Section 7. Since the theory is developed for general $H, K, G$ there is a large set of possibilities. What is needed are examples where the double cosets are indexed by familiar combinatorial objects and the walks induced on $H \backslash G / K$ are of independent interest.
2.2 Markov chain theory
Let $H, K$ be subgroups of a finite group G, and Q a probability on G. See [Reference Levin and Peres43] for an introduction to Markov chains; see [Reference Diaconis15] or [Reference Saloff-Coste53] for random walks on groups.
Proposition 2.4. Let Q be a probability on G which is H-conjugacy invariant ( $Q(s) = Q(h^{-1}s h) $ for $h \in H, s \in G$ ). The random walk on G driven by Q maps to a Markov chain on $H \backslash G /K$ with transition kernel
$$\begin{align*}P(x, y) = Q(HyKx^{-1}). \end{align*}$$
The stationary distribution of P is $\pi (x) = |HxK|/|G|$ . If $Q(s) = Q(s^{-1})$ , then $(P, \pi )$ is reversible.
Proof. The kernel P is well defined; that is, it is independent of the choice of double coset representatives for $x, y$ . Dynkin’s criterion ([Reference Kemeny and Snell40] Chapter 6, [Reference Pang49]) says that the image of a Markov chain in a partitioning of the state space is Markov if and only if for any set in the partition and any point in a second set, the chance of the original chain moving from the point to the first set is constant for points in the second set.
Fixing $x, y$ , observe that for any $h \in H$ and $k \in K$ ,
$$\begin{align*}Q(HyK(hxk)^{-1}) = Q(HyKk^{-1}x^{-1}h^{-1}) = Q(HyKx^{-1}h^{-1}) = Q(h^{-1}HyKx^{-1}) = Q(HyKx^{-1}), \end{align*}$$
where the third equality uses the H-conjugacy invariance of Q. Hence the chance of moving from a point of $HxK$ into $HyK$ does not depend on the point, and Dynkin’s criterion applies.
Since the uniform distribution on G is stationary for the walk generated by Q, the stationary distribution of the lumped chain is $\pi (x) = |HxK|/|G|$ . Finally, any function of a reversible chain is reversible and $Q(s) = Q(s^{-1})$ gives reversibility of the walk on G.
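Dynkin’s criterion can be checked mechanically on a small group. The sketch below is an illustration only: the choices $G = S_3$ , H generated by the transposition $(0\,1)$ , K generated by $(0\,2)$ (so $H \ne K$ ) and Q uniform on transpositions are assumptions made for the example, and all function names are hypothetical. It confirms that the chance of entering a double coset is constant on each double coset, then builds the lumped kernel and tests stationarity and reversibility exactly.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, r):
    """Product p*r of permutations stored as tuples: apply r first, then p."""
    return tuple(p[i] for i in r)

n = 3
G = list(permutations(range(n)))
e = tuple(range(n))
H = [e, (1, 0, 2)]                      # generated by the transposition (0 1)
K = [e, (2, 1, 0)]                      # generated by the transposition (0 2)
transpositions = [(1, 0, 2), (2, 1, 0), (0, 2, 1)]
Q = {s: Fraction(1, 3) for s in transpositions}   # a class function on S_3

def double_coset(x):
    return frozenset(compose(compose(h, x), k) for h in H for k in K)

cosets = sorted({double_coset(x) for x in G}, key=lambda c: (len(c), sorted(c)))

def prob_into(g, target):
    """Chance that one step g -> s*g of the walk on G lands in `target`."""
    return sum(p for s, p in Q.items() if compose(s, g) in target)

P = []
for C in cosets:
    row = []
    for D in cosets:
        values = {prob_into(g, D) for g in C}
        assert len(values) == 1, "Dynkin's criterion fails"
        row.append(values.pop())
    P.append(row)

pi = [Fraction(len(C), len(G)) for C in cosets]
size = len(cosets)
stationary = all(sum(pi[i] * P[i][j] for i in range(size)) == pi[j] for j in range(size))
reversible = all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in range(size) for j in range(size))
print([len(C) for C in cosets], P, stationary, reversible)
```

Replacing Q by a measure that is not H-conjugacy invariant will typically trigger the assertion, which is the content of Remark 2.6.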
Remark 2.5. A different sufficient condition for Proposition 2.4 is $Q(sh) = Q(s)$ for all $s \in G, h \in H$ .
Remark 2.6. Usually, a function of a Markov chain is not Markov. For relevant discussion of similar ‘orbit chains’, see [Reference Boyd, Diaconis, Parrilo and Xiao9].
In all of our examples, the measure Q is a class function ( $Q(s) = Q(t^{-1}st)$ for all $s, t \in G$ ), which is a stronger requirement than that in Proposition 2.4. The eigenvalues of the walk on G can be given in terms of the irreducible complex characters of G. Let $\widehat {G}$ be an index set for these characters. We write $\lambda \in \widehat {G}$ and $\chi _{\lambda }(\mathcal {C})$ for the character value at the conjugacy class $\mathcal {C}$ . Let
$$\begin{align*}\beta_{\lambda} = \frac{1}{\chi_{\lambda}(1)} \sum_{s \in G} Q(s) \chi_{\lambda}(s). \end{align*}$$
If Q is simply concentrated on a single conjugacy class $\mathcal {C}$ , then $\beta _{\lambda }$ is the character ratio
$$\begin{align*}\beta_{\lambda} = \frac{\chi_{\lambda}(\mathcal{C})}{\chi_{\lambda}(1)}. \end{align*}$$
For a review of a large relevant literature on character ratios and their applications, see [Reference Guralnick, Larsen and Tiep30].
The restriction of $\chi _{\lambda }$ to H is written $\chi _{\lambda }|_H$ and $\left \langle \chi _{\lambda }|_H, 1 \right \rangle $ is the number of times the trivial representation of H appears in $\chi _{\lambda }|_H$ . By Frobenius reciprocity, this is $\left \langle \chi _{\lambda }, \mathrm {Ind}_H^G(1) \right \rangle $ , where $\mathrm {Ind}_H^G$ is the induced representation from H to G.
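Proposition 2.7. Let Q be a class function on G. The induced chain $P(x, y)$ of Proposition 2.4 has eigenvalues
$$\begin{align*}\beta_{\lambda}, \quad \lambda \in \widehat{G}, \end{align*}$$
with multiplicity
$$\begin{align*}m_{\lambda} = \left\langle \chi_{\lambda}|_H, 1 \right\rangle \cdot \left\langle \chi_{\lambda}|_K, 1 \right\rangle. \end{align*}$$
The average square total variation distance to stationarity satisfies
$$\begin{align*}\sum_{x \in H \backslash G /K} \pi(x) \|P_x^{\ell} - \pi \|_{TV}^2 \le \frac{1}{4} \sum_{\lambda \neq 1} m_{\lambda} \beta_{\lambda}^{2 \ell}. \end{align*}$$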
Proof. The eigenvalues of a lumped chain are always some subset of the eigenvalues of the original chain. To determine the multiplicity of the eigenvalue $\beta _{\lambda }$ in the lumped chain, fix $\lambda : G \to GL_{d_{\lambda }}$ an irreducible representation of G. Let $M^{\lambda }$ be the $d_{\lambda } \times d_{\lambda }$ matrix representation of $\lambda $ . That is, each entry $M_{ij}^{\lambda }: G \to \mathbb {C}$ is a function on G. These functions are linearly independent and can be chosen to be orthogonal with respect to
$$\begin{align*}\langle f_1, f_2 \rangle = \frac{1}{|G|} \sum_{g \in G} f_1(g) \overline{f_2(g)} \end{align*}$$
(see Chapter 3 of [Reference Serre58]). Let $V_{\lambda }$ be the space of all linear combinations of the functions $M_{ij}^{\lambda }$ . If $f \in V_{\lambda }$ , then
$$\begin{align*}\sum_{y \in G} P(x, y) f(y) = \beta_{\lambda} f(x). \end{align*}$$
That is, $V_{\lambda }$ is the eigenspace for the eigenvalue $\beta _{\lambda }$ and it has dimension $d_{\lambda }^2 = \chi _{\lambda }(1)^2$ .
In the lumped chain on $H \backslash G / K$ , the eigenspace for eigenvalue $\beta _{\lambda }$ is spanned by the $H \times K$ invariant functions in $V_{\lambda }$ [Reference Boyd, Diaconis, Parrilo and Xiao9]. To determine the dimension of this subspace, note that $G \times G$ can act on $V_{\lambda }$ by $f^{g_1, g_2}(x) = f(g_1^{-1} x g_2)$ . This gives a representation of $G \times G$ on $V_{\lambda }$ . The matrix of this representation is isomorphic to $M \otimes M$ , since $M(s^{-1} t u) = M(s^{-1})M(t)M(u)$ , that is, $M_{ij}(s^{-1} t u) = \sum _{a, b} M_{ia}(s^{-1})M_{ab}(t)M_{bj}(u)$ .
This representation restricts to a representation $M_H \otimes M_K$ of $H \times K$ , and the dimension of the $H \times K$ invariant functions in $V_{\lambda }$ is the multiplicity of the trivial representation on $M_H \otimes M_K$ . This is
$$\begin{align*}\left\langle \chi_{\lambda}|_H, 1 \right\rangle \cdot \left\langle \chi_{\lambda}|_K, 1 \right\rangle. \end{align*}$$
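To note the total variation inequality, let $1 = \beta _1 \ge \beta _2 \ge \dots \ge \beta _{|S_n|} \ge -1$ be the eigenvalues with eigenfunctions $f_j$ (chosen to be orthonormal with respect to $\pi $ ), and we have
$$\begin{align*}\chi_x^2(\ell) = \left\| \frac{P_x^{\ell}}{\pi} - 1 \right\|_{2, \pi}^2 = \sum_{j \neq 1} f_j(x)^2 \beta_j^{2 \ell}, \end{align*}$$
where $\| \cdot \|_{2, \pi }$ denotes the $\ell ^2$ norm with respect to the distribution $\pi $ . Multiplying by $\pi (x)$ and summing over all x in the state space gives
$$\begin{align*}\sum_{x} \pi(x) \left\| \frac{P_x^{\ell}}{\pi} - 1 \right\|_{2, \pi}^2 = \sum_{x} \pi(x) \sum_{j \neq 1} f_j(x)^2 \beta_j^{2 \ell} = \sum_{j \neq 1} \beta_j^{2 \ell}, \end{align*}$$
using orthonormality of $f_j$ . The total variation bound arises since $4 \|P^{\ell }_x - \pi \|_{TV}^2 \le \|P^{\ell }_x/\pi - 1 \|_{2, \pi }^2$ .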
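The multiplicity formula and the total variation bound can be tested on a small case. The sketch below assumes $G = S_4$ , $H = K = S_2 \times S_2$ and Q uniform on transpositions; these choices, the hard-coded (standard) character table of $S_4$ and all function names are illustrative assumptions. It compares $\mathrm {tr}(P^{\ell })$ with $\sum _{\lambda } m_{\lambda } \beta _{\lambda }^{\ell }$ and checks the average squared total variation distance against the character bound, in exact arithmetic.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, r):
    """Product p*r of permutations stored as tuples: apply r first, then p."""
    return tuple(p[i] for i in r)

def cycle_type(p):
    """Cycle lengths of p in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

G = list(permutations(range(4)))
H = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2)]  # S_2 x S_2, used for H and K
T = [g for g in G if cycle_type(g) == (2, 1, 1)]               # class of transpositions

# Character table of S_4; columns follow the order of `classes`.
classes = [(1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1), (4,)]
chars = {
    "trivial":       [1, 1, 1, 1, 1],
    "sign":          [1, -1, 1, 1, -1],
    "two-dim":       [2, 0, 2, -1, 0],
    "standard":      [3, 1, -1, 0, -1],
    "standard*sign": [3, -1, -1, 0, 1],
}

def trivial_multiplicity(chi):
    """<chi|_H, 1>, computed as the average of chi over H."""
    return Fraction(sum(chi[classes.index(cycle_type(h))] for h in H), len(H))

beta = {name: Fraction(chi[1], chi[0]) for name, chi in chars.items()}     # chi(C)/chi(1)
m = {name: trivial_multiplicity(chi) ** 2 for name, chi in chars.items()}  # H = K here

def double_coset(x):
    return frozenset(compose(compose(h, x), k) for h in H for k in H)

cosets = sorted({double_coset(x) for x in G}, key=sorted)
P = [[Fraction(sum(compose(s, min(C)) in D for s in T), len(T)) for D in cosets]
     for C in cosets]
pi = [Fraction(len(C), len(G)) for C in cosets]
N = len(cosets)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

checks, power = [], P
for ell in range(1, 5):
    trace = sum(power[i][i] for i in range(N))
    spectral = sum(m[name] * beta[name] ** ell for name in chars)
    avg_tv2 = sum(pi[x] * (sum(abs(power[x][y] - pi[y]) for y in range(N)) / 2) ** 2
                  for x in range(N))
    bound = sum(m[name] * beta[name] ** (2 * ell) for name in chars if name != "trivial") / 4
    checks.append((trace == spectral, avg_tv2 <= bound))
    power = matmul(power, P)
print(m, checks)
```

Comparing traces of powers avoids a numerical eigenvalue routine: since P is reversible, hence diagonalizable, the traces of $P, P^2, \dots $ determine the eigenvalues with multiplicity.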
2.3 Random transpositions and coagulation-fragmentation processes
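Let $G = S_n$ be the symmetric group. The random transpositions Markov chain, studied in [Reference Diaconis and Shahshahani22], is generated by the measure
$$\begin{align*}Q(\omega) = \begin{cases} 1/n & \text{if } \omega = \mathrm{id}, \\ 2/n^2 & \text{if } \omega = (i, j) \text{ is a transposition}, \\ 0 & \text{otherwise}. \end{cases} \end{align*}$$
This was the first Markov chain where a sharp cutoff for convergence to stationarity was observed. A sharp, explicit rate is obtained in [Reference Saloff-Coste and Zúñiga54]. They show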
The asymptotic ‘profile’ (the limit of $\|Q^{*\ell } - u\|_{TV}$ as a function of c for n large) is determined in [Reference Teyssier63]. Schramm [Reference Schramm56] found a sharp parallel between random transpositions and the growth of an Erdős-Rényi random graph: Given vertices $1, 2, \dots , n$ , for each transposition $(i, j)$ chosen, add an edge from vertices i to j to generate a random graph. See [Reference Berestycki, Schramm and Zeitouni6] for extensions and a comprehensive review. The results, translated by the coagulation-fragmentation description of the cycles, give a full and useful picture for the simple mean field model described in the introduction.
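The parallel with random graphs is easy to observe in simulation. The sketch below is an illustration with hypothetical function names: it runs random transpositions alongside the graph that receives an edge $\{i, j\}$ for each transposition used. It returns the cycles of the permutation, the connected components of the graph, and a flag recording whether the graph is still a forest. Each cycle always lies inside a component, and the two partitions coincide as long as no edge closes a cycle in the graph.

```python
import random

def cycles(perm):
    """Cycles of perm (list with perm[i] = image of i) as a set of frozensets."""
    seen, out = set(), set()
    for start in range(len(perm)):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = perm[i]
        out.add(frozenset(cyc))
    return out

def simulate(n, steps, seed=0):
    """Run `steps` random transpositions; return (cycles, graph components, is_forest)."""
    rng = random.Random(seed)
    perm = list(range(n))
    parent = list(range(n))            # union-find structure for graph components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    is_forest = True
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue                   # same particle chosen twice: do nothing
        perm[i], perm[j] = perm[j], perm[i]   # multiply by the transposition (i j)
        ri, rj = find(i), find(j)
        if ri == rj:
            is_forest = False          # this edge closes a cycle in the graph
        else:
            parent[ri] = rj
    components = {}
    for x in range(n):
        components.setdefault(find(x), set()).add(x)
    return cycles(perm), {frozenset(c) for c in components.values()}, is_forest

cyc, comp, forest = simulate(60, 12, seed=1)
refines = all(any(c <= k for k in comp) for c in cyc)
print(len(cyc), len(comp), forest, refines)
```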
It must be emphasized that this mean field model is a very special case of coagulation-fragmentation models studied in the chemistry-physics-probability literature. These models study the dynamics of particles diffusing in an ambient space, and allow general collision kernels (e.g., particles close in space may be more likely to join). The books by Bertoin [Reference Bertoin7] and Pitman [Reference Pitman50] along with the survey paper of Aldous [Reference Aldous3] are recommended for a view of the richness of this subject. On the other hand, the sharp rates of convergence results available for the mean field model are not available in any generality.
There is a healthy applied mathematics literature on coagulation-fragmentation. A useful overview which treats discrete problems such as the ones treated here is [Reference Ball and Carr5]. A much more probabilistic development of the celebrated Becker–Döring version of the problem is in [Reference Hingant and Yvinec36]. This develops rates of convergence using coupling. See also [Reference Durrett, Granovsky and Gueron27] for more models with various stationary distributions on partitions.
Other lumpings of random transpositions include classical urn models, such as the Bernoulli–Laplace model [Reference Diaconis and Shahshahani23], [Reference Eskenazis and Nestoridi28], and random walks on phylogenetic trees [Reference Diaconis and Holmes26]. The sharp analysis of random transpositions transfers, via comparison theory, to give good rates of convergence for quite general random walks on the symmetric group [Reference Diaconis and Saloff-Coste20], [Reference Helfgott, Seress and Zuk34]. For an expository survey, see [Reference Diaconis16].
2.4 Transvections
Fix n, a prime p and $q = p^a$ for some positive integer a. A transvection is an invertible linear transformation of $\mathbb {F}_q^n$ which fixes a hyperplane, is not the identity and has all eigenvalues equal to $1$ . Transvections are convenient generators for the group $SL_n(q)$ because they generalize the basic row operations of linear algebra. These properties are carefully developed in [Reference Suzuki62] Chapter 1, 9; [Reference Artin4] Chapter 4.
Using coordinates, let $\mathbf {a}, \mathbf {v} \in \mathbb {F}_q^n$ be two nonzero vectors with $\mathbf {a}^{\top } \mathbf {v} = 0$ . A transvection, denoted $T_{\mathbf {a}, \mathbf {v}} \in GL_n(q)$ , is the linear map given by
$$\begin{align*}T_{\mathbf{a}, \mathbf{v}}(\mathbf{x}) = \mathbf{x} + \mathbf{v} (\mathbf{a}^{\top} \mathbf{x}), \quad \mathbf{x} \in \mathbb{F}_q^n. \end{align*}$$
It adds a multiple of $\mathbf {v}$ to $\mathbf {x}$ , the amount depending on the ‘angle’ between $\mathbf {a}$ and $\mathbf {x}$ . As a matrix, $T_{\mathbf {a}, \mathbf {v}} = I + \mathbf {v} \mathbf {a}^{\top }$ . Multiplying $\mathbf {a}$ by a nonzero constant and dividing $\mathbf {v}$ by the same constant doesn’t change $T_{\mathbf {a}, \mathbf {v}}$ . Let us agree to normalize $\mathbf {v}$ by making its last nonzero coordinate equal to $1$ . Let $\mathcal {T}_{n, q} \subset SL_n(q)$ be the set of all transvections.
An elementary count shows
$$\begin{align*}|\mathcal{T}_{n, q}| = \frac{(q^n - 1)(q^{n-1} - 1)}{q - 1}. \end{align*}$$
It is easy to generate $T \in \mathcal {T}_{n, q}$ uniformly: Pick $\mathbf {v} \in \mathbb {F}_q^n$ uniformly, discarding the zero vector. Normalize $\mathbf {v}$ so the last nonzero coordinate, say index j, is equal to $1$ . Pick $a_1, a_2, \dots , a_{j-1}, a_{j+1}, \dots , a_n$ uniformly in $\mathbb {F}_q^{n-1} \setminus \{0 \}$ , and set $a_j$ so that $\mathbf {a}^{\top } \mathbf {v} = 0$ . The transvection $T_{\mathbf {a}, \mathbf {v}}$ fixes the hyperplane $\{\mathbf {x}: \mathbf {a}^{\top } \mathbf {x} = 0 \}$ .
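The sampling recipe translates directly into code. The sketch below is restricted to a prime field $\mathbb {F}_p$ , an assumption made so that arithmetic is plain modular arithmetic (a general prime power q would need a finite field implementation). The function names are hypothetical. It also enumerates all pairs $(\mathbf {a}, \mathbf {v})$ with $\mathbf {v}$ normalized and compares the number of distinct matrices $I + \mathbf {v} \mathbf {a}^{\top }$ with $(q^n - 1)(q^{n-1} - 1)/(q - 1)$ .

```python
import random
from itertools import product

def transvection(a, v, p):
    """Matrix I + v a^T over F_p (p prime), as a tuple of rows."""
    n = len(v)
    return tuple(
        tuple(((1 if i == j else 0) + v[i] * a[j]) % p for j in range(n))
        for i in range(n)
    )

def normalize(v, p):
    """Scale v so its last nonzero coordinate is 1; return (v, index of that coordinate)."""
    j = max(i for i, x in enumerate(v) if x % p)
    inv = pow(v[j], -1, p)
    return [(x * inv) % p for x in v], j

def complete(rest, v, j, p):
    """Insert a_j among the other n-1 coordinates `rest` so that a . v = 0 (uses v[j] = 1)."""
    a = list(rest[:j]) + [0] + list(rest[j:])
    a[j] = (-sum(x * y for x, y in zip(a, v))) % p
    return a

def random_transvection(n, p, rng=random):
    """Sample a transvection following the recipe in the text."""
    while True:
        v = [rng.randrange(p) for _ in range(n)]
        if any(v):
            break                      # discard the zero vector
    v, j = normalize(v, p)
    while True:
        rest = [rng.randrange(p) for _ in range(n - 1)]
        if any(rest):
            break                      # the free coordinates of a must not all vanish
    return transvection(complete(rest, v, j, p), v, p)

def all_transvections(n, p):
    """Enumerate every transvection once via normalized v and nonzero free coordinates."""
    out = set()
    for v in product(range(p), repeat=n):
        if not any(v) or normalize(v, p)[0] != list(v):
            continue                   # keep only normalized nonzero v
        j = normalize(v, p)[1]
        for rest in product(range(p), repeat=n - 1):
            if any(rest):
                out.add(transvection(complete(rest, v, j, p), v, p))
    return out

n, p = 3, 3
everything = all_transvections(n, p)
count = len(everything)
formula = (p ** n - 1) * (p ** (n - 1) - 1) // (p - 1)
print(count, formula, random_transvection(n, p) in everything)
```

Uniformity follows because each pair (normalized $\mathbf {v}$ , nonzero free coordinates of $\mathbf {a}$ ) is equally likely and the map from pairs to matrices is a bijection.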
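Example 2.8. Taking $\mathbf {v} = \mathbf {e}_1, \mathbf {a} = \mathbf {e}_2$ gives the transvection with matrix
$$\begin{align*}T_{\mathbf{e}_2, \mathbf{e}_1} = I + E_{12} = \begin{pmatrix} 1 & 1 & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}. \end{align*}$$
This acts on $\mathbf {x}$ by adding the second coordinate to the first. Similarly, the basic row operation of adding $\theta $ times the ith coordinate to the jth is given by $T_{\mathbf {e}_i, \theta \mathbf {e}_j}$ .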
Lemma 2.9. The set of transvections $\mathcal {T}_{n, q}$ is a conjugacy class in $GL_n(q)$ .
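Proof. Let $M \in GL_n(q)$ , so $M T_{\mathbf {e}_2, \mathbf {e}_1} M^{-1}$ is conjugate to $T_{\mathbf {e}_2, \mathbf {e}_1}$ . Then,
$$\begin{align*}M T_{\mathbf{e}_2, \mathbf{e}_1} M^{-1} = M (I + \mathbf{e}_1 \mathbf{e}_2^{\top}) M^{-1} = I + (M \mathbf{e}_1)(\mathbf{e}_2^{\top} M^{-1}). \end{align*}$$
Let $\mathbf {a}$ be the second column of $(M^{-1})^{\top }$ and $\mathbf {v}$ the first column of M, and check this last is $T_{\mathbf {a}, \mathbf {v}}$ (and $\mathbf {a}^{\top } \mathbf {v} = 0$ ). Conversely, given nonzero $\mathbf {a}, \mathbf {v}$ with $\mathbf {a}^{\top } \mathbf {v} = 0$ , one can choose M with first column $\mathbf {v}$ and with $\mathbf {a}^{\top }$ as the second row of $M^{-1}$ , so every transvection arises this way. Thus, transvections form a conjugacy class.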
2.5 Gaussian elimination and the Bruhat decomposition
The reduction of a matrix $M\in GL_n(q)$ to standard form by row operations is a classical topic in introductory linear algebra courses. It gives efficient, numerically stable ways to solve linear equations, compute inverses and calculate determinants. There are many variations.
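Example 2.10. Consider a sequence of three row operations that brings a $3 \times 3$ matrix M to upper-triangular form U.
The first step subtracts $3$ times row $2$ from row $3$ , multiplication by
$$\begin{align*}L_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}. \end{align*}$$
The second step adds $2$ times row $1$ to row $3$ , multiplication by
$$\begin{align*}L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix}. \end{align*}$$
The third (pivot) step brings the matrix to upper-triangular form by switching rows $1$ and $2$ , which corresponds to multiplication by the permutation matrix
$$\begin{align*}\omega_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \end{align*}$$
This gives $\omega _1 L_2 L_1 M = U \implies M = L_1^{-1} L_2^{-1} \omega _1^{-1} U = L \omega U$ with $L = L_1^{-1} L_2^{-1}, \omega = \omega _1^{-1} = \omega _1$ .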
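The three steps can be replayed numerically. In the sketch below the matrix `M` is a hypothetical choice made only for illustration, and integer arithmetic is used for readability (over $\mathbb {F}_q$ the same products would be reduced in the field). The elementary matrices are those described in the text.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

M = [[0, 3, 2], [1, 2, 0], [3, 0, 5]]       # hypothetical matrix, chosen for illustration
L1 = [[1, 0, 0], [0, 1, 0], [0, -3, 1]]     # subtract 3 * row 2 from row 3
L2 = [[1, 0, 0], [0, 1, 0], [2, 0, 1]]      # add 2 * row 1 to row 3
w1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]      # switch rows 1 and 2

U = matmul(w1, matmul(L2, matmul(L1, M)))   # upper-triangular result

L1_inv = [[1, 0, 0], [0, 1, 0], [0, 3, 1]]  # undo step 1
L2_inv = [[1, 0, 0], [0, 1, 0], [-2, 0, 1]] # undo step 2
L = matmul(L1_inv, L2_inv)                  # lower-triangular factor
reconstructed = matmul(L, matmul(w1, U))    # w1 is its own inverse
print(U, L, reconstructed == M)
```

Here $L_1$ and $L_2$ happen to commute, so the order of the inverses does not matter; in general the factor is $L_1^{-1} L_2^{-1}$ .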
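If $\mathcal {L}, \mathcal {B}$ are the subgroups of lower- and upper-triangular matrices in $GL_n(q)$ , this gives
$$\begin{align*}GL_n(q) = \bigsqcup_{\omega \in S_n} \mathcal{L} \omega \mathcal{B}. \tag{2.6} \end{align*}$$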
Any linear algebra book treats these topics. A particularly clear version which uses Gaussian elimination as a gateway to Lie theory is in Howe [Reference Howe37]. Articles by Lusztig [Reference Lusztig45] and Strang [Reference Strang61] have further historical, mathematical and practical discussion.
Observe that carrying out the final pivoting step costs $d_c(\omega ,\mathrm {id})$ operations, where $d_c(\omega ,\mathrm {id})$ , the Cayley distance of $\omega $ to the identity, is the minimum number of transpositions required to sort $\omega $ (with arbitrary transpositions $(i,j)$ allowed). Cayley proved $d_c(\omega ,\mathrm {id}) = n - \#\mbox {cycles in} \ \omega $ (see [Reference Diaconis16]). In the example above $n=3$ , $\omega = 213$ has two cycles and $3 - 2 = 1$ , so one transposition sorts $\omega $ .
How many pivot steps are needed ‘on average’? This becomes the question of the number of cycles in a pick from Mallows measure $\pi _q$ . Surprisingly, this is a difficult question. Following partial answers by Gladkich and Peled [Reference Gladkich and Peled29], this problem was recently solved by Jimmy He, Tobias Möller and Teun Verstraaten in [Reference He, Möller and Verstraaten33]. They show that, when $q> 1$ , the limiting behavior of the number of even cycles under $\pi _q$ has an approximate normal distribution with mean and variance proportional to n, and that the number of odd cycles has bounded mean and variance.
The Bruhat decomposition
In algebraic group theory, one uses
$$\begin{align*}GL_n(q) = \bigsqcup_{\omega \in S_n} \mathcal{B} \omega \mathcal{B}. \tag{2.7} \end{align*}$$
This holds for any semisimple group over any field with $\mathcal {B}$ replaced by the Borel group (the largest solvable subgroup) and $S_n$ replaced by the Weyl group.
Let $\omega _0 = \left (\begin {smallmatrix} 1 & 2 & \cdots & n\\ n & n-1 & \cdots & 1 \end {smallmatrix}\right )$ be the reversal permutation in $S_n$ . Since $\mathcal {L} = \omega _0 \mathcal {B} \omega _0$ , equation (2.7) is equivalent to the LU decomposition (2.6). Given $M \in GL_n(q)$ , Gaussian elimination on $\omega _0 M$ can be used to find $\omega _0 M \in \mathcal {L} \omega ' \mathcal {B}$ and thus $M \in \mathcal {B} \omega \mathcal {B}$ with $\omega = \omega _0 \omega '$ .
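One direct way to find the double coset of a matrix is to eliminate using only operations that come from upper-triangular matrices. The sketch below is an illustration under stated assumptions: it works over a prime field $\mathbb {F}_p$ , the function name `bruhat_permutation` is hypothetical, and the convention `w[j]` = pivot row of column j may differ from the paper's convention by an inverse (which does not change the number of inversions). It then tabulates the double coset sizes in $GL_3(2)$ and compares them with $|\mathcal {B}| \, q^{I(\omega )}$ .

```python
from collections import Counter
from itertools import permutations, product

def bruhat_permutation(M, p):
    """Return w (w[j] = pivot row of column j) with M in B w B over F_p, p prime.

    A pivot row is only added to rows above it (left multiplication by an
    upper-triangular matrix) and a pivot column only to columns on its right
    (right multiplication by an upper-triangular matrix).
    """
    n = len(M)
    A = [[x % p for x in row] for row in M]
    w = []
    for j in range(n):
        nonzero = [i for i in range(n) if A[i][j]]
        if not nonzero:
            raise ValueError("matrix is singular")
        i = nonzero[-1]                       # lowest nonzero entry of column j
        inv = pow(A[i][j], -1, p)
        for r in nonzero[:-1]:                # clear the entries above the pivot
            c = A[r][j] * inv % p
            A[r] = [(x - c * y) % p for x, y in zip(A[r], A[i])]
        for k in range(j + 1, n):             # column j now has a single entry, so
            A[i][k] = 0                       # clearing row i only changes row i
        w.append(i)
    return tuple(w)

def inversions(w):
    return sum(1 for a in range(len(w)) for b in range(a + 1, len(w)) if w[a] > w[b])

n, p = 3, 2
sizes = Counter()
for entries in product(range(p), repeat=n * n):
    M = [list(entries[r * n:(r + 1) * n]) for r in range(n)]
    try:
        sizes[bruhat_permutation(M, p)] += 1
    except ValueError:
        pass                                  # singular matrices are skipped

borel = (p - 1) ** n * p ** (n * (n - 1) // 2)   # number of invertible upper-triangular matrices
matches = all(sizes[w] == borel * p ** inversions(w) for w in permutations(range(n)))
print(sum(sizes.values()), dict(sizes), matches)
```

Dividing the sizes by $|GL_3(2)| = 168$ gives probabilities proportional to $2^{I(\omega )}$ , which is the Mallows measure with $q = 2$ .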
The subgroup $\mathcal {B}$ gives rise to the quotient $GL_n(q)/\mathcal {B}$ . This may be pictured as the space of ‘flags’. Here, a flag F consists of an increasing sequence of subspaces $F_1 \subset F_2 \subset \dots \subset F_n = \mathbb {F}_q^n$ with $\text {dim}(F_i) = i$ . Indeed, $GL_n(q)$ operates transitively on flags and the subgroup fixing the standard flag is exactly $\mathcal {B}$ . This perspective will be further explained and used in Section 6 to study a function of the double coset Markov chain on $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ .
Remark 2.11. The double cosets of $GL_n(q)$ define equivalence classes for any subgroup of $GL_n(q)$ . For the matrices $SL_n(q)$ with determinant $1$ , these double cosets again induce the Mallows distribution on permutations. More precisely, for $x \in SL_n(q)$ , let $[x]_{SL_n(q)} = \{x' \in SL_n(q) : x' \in \mathcal {B} x \mathcal {B} \}$ be the equivalence class created by the double coset relation $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ , within $SL_n(q)$ . Note that two matrices $x, x' \in SL_n(q)$ could be in the same double coset with $x' = b_1 x b_2$ , but $b_1, b_2 \notin SL_n(q)$ (necessarily, $\mathrm {det}(b_1) = \mathrm {det}(b_2)^{-1}$ ).
Then, $|[\omega ]_{SL_n(q)}|/|SL_n(q)| = \pi _q(\omega )$ . This follows since $|GL_n(q)| = (q-1) \cdot |SL_n(q)|$ , and $|\mathcal {B} \omega \mathcal {B}| = (q - 1) \cdot |[\omega ]_{SL_n(q)}|$ . If $M \in GL_n(q)$ and $M \in \mathcal {B} \omega \mathcal {B}$ , then $M/\det (M) \in [\omega ]_{SL_n(q)}$ . Conversely, for each $M \in SL_n(q)$ there are $(q - 1)$ unique matrices in $GL_n(q)$ created by multiplying M by $1, 2, \dots , q - 1$ .
2.6 The Metropolis algorithm
The Metropolis algorithm is a basic algorithm of scientific computing which arises in describing the random walk induced by transvections on the double cosets $\mathcal {B}_n \backslash GL_n(q) / \mathcal {B}_n$ (Section 3.2). This section gives background.
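Given a probability distribution $\pi $ on a space $\mathcal {X}$, the Metropolis algorithm gives a way of changing the output of a Markov chain with transition matrix $K(x, y)$ to have stationary distribution $\pi $ on $\mathcal {X}$. For simplicity, suppose the original chain is symmetric, $K(x, y) = K(y, x)$ (as in our examples). This implies that $K(x, y)$ has a uniform stationary distribution. Define the Metropolis Markov chain with the transition matrix:
$$\begin{align*} M(x, y) = \begin{cases} K(x, y) & \text{if } \pi(y) \ge \pi(x),\ x \neq y, \\ K(x, y) \dfrac{\pi(y)}{\pi(x)} & \text{if } \pi(y) < \pi(x), \\ K(x, x) + \sum_{z : \pi(z) < \pi(x)} K(x, z) \left( 1 - \dfrac{\pi(z)}{\pi(x)} \right) & \text{if } x = y. \end{cases} \end{align*}$$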
These transition probabilities have a simple implementation: From x, pick y according to $K(x, y)$ . If $\pi (y) \ge \pi (x)$ , move to y. If $\pi (y) < \pi (x)$ , flip a coin with heads probability $\pi (y)/\pi (x)$ . If the coin is heads, move to y. If the coin is tails, stay at x. Elementary calculations show that $\pi (x) M(x, y) = \pi (y) M(y, x)$ , that is, M has $\pi $ as stationary distribution. Note that the normalizing constant of $\pi $ is not needed.
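The implementation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-state proposal chain `K` and the unnormalized weights `pi` are hypothetical toy data, not objects from this paper, and exact rational arithmetic is used so that detailed balance can be checked exactly.

```python
from fractions import Fraction

def metropolis_matrix(K, pi):
    """Metropolis modification of a symmetric proposal matrix K.

    K  : square list of lists with rows summing to 1 and K[x][y] == K[y][x]
    pi : positive weights (the normalizing constant is not needed)
    """
    n = len(K)
    M = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                # Accept the proposed move with probability min(1, pi(y)/pi(x)).
                accept = min(Fraction(1), Fraction(pi[y]) / Fraction(pi[x]))
                M[x][y] = Fraction(K[x][y]) * accept
        # All remaining mass (holding plus rejected proposals) stays at x.
        M[x][x] = 1 - sum(M[x])
    return M

# Hypothetical toy data: nearest-neighbour walk on a 4-cycle.
half = Fraction(1, 2)
K = [[0, half, 0, half],
     [half, 0, half, 0],
     [0, half, 0, half],
     [half, 0, half, 0]]
pi = [1, 2, 4, 8]
M = metropolis_matrix(K, pi)
```

Only ratios $\pi(y)/\pi(x)$ enter the code, which is the sense in which the normalizing constant is not needed.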
For background, applications, and theoretical properties of the Metropolis algorithm, see the textbook of Liu [Reference Liu44] or the survey [Reference Diaconis and Saloff-Coste21]. Sharp analysis of rates of convergence of the Metropolis algorithm is still an open research problem. The special cases developed in Section 3.2 show it can lead to fascinating mathematics.
3 Double coset walks on $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$
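Throughout this section, $\mathcal {B}$ is the group of upper-triangular matrices in $GL_n(q)$ and $\mathcal {T}_{n,q}$ is the conjugacy class of transvections in $GL_n(q)$. This gives the probability measure on $GL_n(q)$ defined by
$$\begin{align*} Q(M) = \begin{cases} \dfrac{1}{|\mathcal{T}_{n,q}|} & \text{if } M \in \mathcal{T}_{n,q}, \\ 0 & \text{otherwise}. \end{cases} \end{align*}$$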
Note the random transvections measure Q is supported on $SL_n(q)$ , a subgroup of $GL_n(q)$ . This means that the random walk on $GL_n(q)$ driven by Q is not ergodic (there is zero probability of moving x to y if $x, y$ are matrices with different determinants). However, Q is a class function on $GL_n(q)$ since transvections form a conjugacy class. The image of the uniform distribution on $GL_n(q)$ mapped to $\mathcal {B} \backslash GL_n(q) /\mathcal {B}$ is the Mallows measure $\pi _q(\omega ) = q^{I(\omega )}/[n]_q!$ .
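As a small sanity check on the Mallows measure $\pi _q(\omega ) = q^{I(\omega )}/[n]_q!$, the following sketch enumerates $S_n$ by brute force and confirms that the weights $q^{I(\omega )}$ sum to $[n]_q!$. The function names are illustrative choices, and permutations are written in one-line notation as tuples.

```python
from fractions import Fraction
from itertools import permutations

def inversions(w):
    """Number of pairs i < j with w[i] > w[j] (the length I(w))."""
    return sum(1 for i in range(len(w)) for j in range(i + 1, len(w)) if w[i] > w[j])

def q_factorial(n, q):
    """[n]_q! = prod_{k=1}^n (q^k - 1)/(q - 1), for an integer q >= 2."""
    out = 1
    for k in range(1, n + 1):
        out *= (q**k - 1) // (q - 1)
    return out

def mallows(n, q):
    """Mallows measure on S_n as a dict {permutation: probability}."""
    Z = q_factorial(n, q)
    return {w: Fraction(q ** inversions(w), Z) for w in permutations(range(1, n + 1))}

pi = mallows(3, 2)
```

For $n = 3$, $q = 2$ the reversal permutation receives mass $8/21$ and the identity receives $1/21$, which illustrates how $\pi _q$ favours permutations with many inversions.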
Section 3.1 introduces the definition of the Markov chain as multiplication in the Hecke algebra, which is further explained in Section 5. Section 3.3 gives the combinatorial expressions for the eigenvalues and their multiplicities, which are needed to apply Theorem 1.2 for this case. Section 4.2 shows that for the induced Markov chain on $S_n$, starting from $id \in S_n$, order n steps are necessary and sufficient for convergence. Section 4.3 studies the chain starting from the reversal permutation, for which only order $\log n/(2 \log q)$ steps are required. Finally, Section 4.4 considers starting from a ‘typical’ element, according to the stationary distribution, for which $\log n/\log q$ steps are necessary and sufficient.
These results can be compared to Hildebrand’s Theorem 1.1 [Reference Hildebrand35] which shows that the walk driven by Q on $GL_n(q)$ converges in $n + c$ steps (uniformly in q). Our results thus contribute to the program of understanding how functions of a Markov chain behave and how the mixing time depends on the starting state. In this case, changing the starting state gives an exponential speed up.
3.1 Hecke algebras and the Metropolis algorithm
The set of $\mathcal {B}$-$\mathcal {B}$ double cosets of $GL_n(q)$ has remarkable structure. For $\omega \in S_n$, let $T_{\omega } = \mathcal {B} \omega \mathcal {B}$. Linear combinations of double cosets form an algebra (over $\mathbb {C}$, for example).
Definition 3.1. The Iwahori–Hecke algebra $H_n(q)$ is spanned by the symbols $\{T_{\omega } \}_{\omega \in S_n}$ and generated by $T_i = T_{s_i}$ for $s_i = (i, i+1), 1 \le i \le n - 1$, with the relations
(3.1) $$ \begin{align} T_{s_i} T_{\omega} = \begin{cases} T_{s_i \omega} & \text{if } I(s_i \omega) = I(\omega) + 1, \\ q T_{s_i \omega} + (q - 1) T_{\omega} & \text{if } I(s_i \omega) = I(\omega) - 1, \end{cases} \end{align} $$
where $I(\omega )$ is the usual length function on $S_n$ ( $I(s_i \omega ) = I(\omega )\pm 1$ ).
Consider the flag space $\mathcal {F} = GL_n(q)/\mathcal {B}$. The group $GL_n(q)$ acts on the left on $\mathcal {F}$. One can see $H_n(q)$ acting on the right of $\mathcal {F}$, and in fact the Hecke algebra is the full commuting algebra of $GL_n(q)$ acting on $GL_n(q)/\mathcal {B}$. Because transvections form a conjugacy class, the sum of transvections is in the center of the group algebra $\mathbb {C}[GL_n(q)]$, and so it may be regarded as an element of $H_n(q)$. This will be explicitly delineated and the character theory of $H_n(q)$ used to do computations.
3.2 The Metropolis connection
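The relations (3.1) can be interpreted probabilistically. Consider what equation (3.1) says as linear algebra: Left multiplication by $T_{s_i}$ can take $\omega $ to $\omega $ or $s_i \omega $. The matrix of this map (in the basis $\{T_{\omega }\}_{\omega \in S_n}$) has, in the column indexed by $\omega $ and the row indexed by $\omega '$, the entry
$$\begin{align*} \begin{cases} 1 & \text{if } I(s_i \omega) > I(\omega) \text{ and } \omega' = s_i \omega, \\ q & \text{if } I(s_i \omega) < I(\omega) \text{ and } \omega' = s_i \omega, \\ q - 1 & \text{if } I(s_i \omega) < I(\omega) \text{ and } \omega' = \omega, \\ 0 & \text{otherwise}. \end{cases} \end{align*}$$
For example, on $GL_3(q)$ using the ordered basis $T_{id}, T_{s_1}, T_{s_2}, T_{s_1s_2}, T_{s_2s_1}, T_{s_1s_2s_1}$, the matrix of left multiplication by $T_{s_1}$ is
$$\begin{align*} \begin{pmatrix} 0 & q & 0 & 0 & 0 & 0 \\ 1 & q-1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & q & 0 & 0 \\ 0 & 0 & 1 & q-1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & q \\ 0 & 0 & 0 & 0 & 1 & q-1 \end{pmatrix}. \end{align*}$$
The first column has a $1$ in row $s_1$ because $I(s_1)> I(id)$. The second column has entries q and $q-1$ in the first two rows because $I(s_1^2) = I(id) < I(s_1)$.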
We can also write the matrices for multiplication defined by $T_{s_2}$ and $T_{s_1s_2s_1}$ as
Observe that all three matrices above have constant row sums (q, q and $q^3$ , respectively). Dividing by these row sums gives three Markov transition matrices: $M_1, M_2$ and $M_{121} = M_1M_2M_1$ .
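These matrices have a simple probabilistic interpretation: Consider, for $\overline {q} = 1/q$, the matrix defined by
$$\begin{align*} M_1 = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ \overline{q} & 1-\overline{q} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & \overline{q} & 1-\overline{q} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & \overline{q} & 1-\overline{q} \end{pmatrix}. \end{align*}$$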
The description of this Markov matrix is: From $\omega $ , propose $s_1 \omega $ :

• If $I(s_1 \omega )> I(\omega )$ , go to $s_1 \omega $ .

• If $I(s_1 \omega ) < I(\omega )$ , go to $s_1 \omega $ with probability $1/q$ , else stay at $\omega $ .
This is exactly the Metropolis algorithm on $S_n$ for sampling from $\pi _q(\omega )$ with the proposal given by the deterministic chain ‘multiply by $s_1$ ’. The matrices $M_i, 1 \le i \le n - 1$, have a similar interpretation and satisfy
The Metropolis algorithm always results in a reversible Markov chain. See Section 2.6 or [Reference Diaconis and Saloff-Coste21] for background. It follows that any product of $\{M_i \}$ and any convex combination of such products yields a $\pi _q$-reversible chain. Note also that the Markov chain on $S_n$ is automatically reversible since it is induced by a reversible chain on $GL_n$.
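The Metropolis description of the matrices $M_i$ can be checked by brute force for small n. The sketch below builds $M_i$ on $S_n$ as a dictionary and is meant as an illustration only; it assumes one-line notation for permutations, so that left multiplication by $s_i$ swaps the values $i$ and $i+1$, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def inversions(w):
    """Length I(w): number of pairs i < j with w[i] > w[j]."""
    return sum(1 for i in range(len(w)) for j in range(i + 1, len(w)) if w[i] > w[j])

def left_mult(i, w):
    """s_i * w in one-line notation: swap the values i and i+1."""
    swap = {i: i + 1, i + 1: i}
    return tuple(swap.get(v, v) for v in w)

def metropolis_si(i, n, q):
    """Transition probabilities of M_i as a dict {(w, w'): probability}."""
    M = {}
    for w in permutations(range(1, n + 1)):
        v = left_mult(i, w)
        if inversions(v) > inversions(w):
            M[(w, v)] = Fraction(1)          # longer element: always move
        else:
            M[(w, v)] = Fraction(1, q)       # shorter element: move w.p. 1/q
            M[(w, w)] = 1 - Fraction(1, q)   # otherwise stay
    return M

n, q = 3, 2
M1, M2 = metropolis_si(1, n, q), metropolis_si(2, n, q)
```

With the unnormalized weights $q^{I(\omega )}$, each $M_i$ built this way satisfies detailed balance, which is the reversibility property used in the text.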
Corollary 3.2. The random transvections chain on $GL_n(q)$ lumped to $\mathcal {B}$-$\mathcal {B}$ double cosets gives a $\pi _q$-reversible Markov chain on $S_n$.
Proof. Up to normalization, the matrix D in Theorem 1.6 is a positive linear combination of Markov chains corresponding to multiplication by
This yields a combination of the reversible chains .
Example 3.3. The transition matrix of the transvections chain on $GL_3(q)$ lumped to $S_3$ is $\frac {1}{|\mathcal {T}_{n, q}|}D$, with
When $q=2$ , the lumped chain has transition matrix
We report that this example has been verified by several different routes including simply running the transvections chain, computing the double coset representative at each step and estimating the transition rates from a long run of the chain.
Remark 3.4. The random transvections Markov chain on $S_n$ is the ‘q-deformation’ of random transpositions on $S_n$. That is, as q tends to $1$, the transition matrix tends to the transition matrix of random transpositions. To see this, recall $|\mathcal {T}_{n, q}| = (q^n - 1)(q^{n-1} - 1)/(q-1)$ and use L’Hôpital’s rule to note, for any integer k,
$$\begin{align*} \lim_{q \to 1} \frac{q^k - 1}{q - 1} = k. \end{align*}$$
The interpretation of multiplication on the Hecke algebra as various ‘systematic scan’ Markov chains is developed in [Reference Diaconis and Ram18], [Reference Bufetov and Nejjar10]. It works for other types in several variations. We are surprised to see it come up naturally in the present work.
The following corollary provides the connection to [Reference Ram52, (3.16), (3.18), (3.20)] and [Reference Diaconis and Ram18, Proposition 4.9].
Corollary 3.5. Let $J_1 = 1$, and let $J_k = T_{s_{k-1}}\cdots T_{s_2}T_{s_1}T_{s_1}T_{s_2}\cdots T_{s_{k-1}}$, for $k\in \{2, \ldots , n\}$. Then
Proof. Using that $J_k = T_{s_{k-1}}J_{k-1}T_{s_{k-1}}$, check, by induction, that
Thus,
where $D_{(21^{n-2})} := \sum _{i < j} q^{(n - 1) - (j-i)} T_{(i, j)}$.
3.3 Eigenvalues and multiplicities
Hildebrand [Reference Hildebrand35] determined the eigenvalues of the random walk driven by Q on $GL_n(q)$ . His arguments use Macdonald’s version of J.A. Green’s formulas for the characters of $GL_n(q)$ along with sophisticated use of properties of Hall–Littlewood polynomials. Using the realization of the walk on the Hecke algebra, developed below in Section 5, and previous work of Ram and Halverson [Reference Halverson and Ram31], we can find cleaner formulas and proofs. Throughout, we have tried to keep track of how things depend on both q and n (the formulas are easier when $q = 2$ ).
Theorem 3.6.

1. The eigenvalues $\beta _{\lambda }$ of the Markov chain $P(x, y)$ driven by the random transvections measure Q on $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ are indexed by partitions $\lambda \vdash n$ . These are
(3.2) $$ \begin{align} \beta_{\lambda} = \frac{1}{|\mathcal{T}_{n, q}|} \left( q^{n-1} \sum_{b \in \lambda} q^{ct(b)} - \frac{q^{n} - 1}{q - 1} \right), \end{align} $$with b ranging over the boxes of the partition $\lambda $. For the box in row i and column j, the content is defined as $ct(b) = j - i$, as in [Reference Macdonald46].
2. The multiplicity of $\beta _{\lambda }$ for the induced Markov chain on $\mathcal {B} \backslash GL_n(q) / \mathcal {B}$ is $f_{\lambda }^2$ , where
(3.3) $$ \begin{align} f_{\lambda} = \frac{n!}{\prod_{b \in \lambda} h(b)}. \end{align} $$Here, $h(b)$ is the hook length of box b [Reference Macdonald46].

3. The multiplicity of $\beta _{\lambda }$ for the Markov chain induced by Q on $GL_n(q)/\mathcal {B}$ is
(3.4) $$ \begin{align} d_{\lambda} = f_{\lambda} \cdot \frac{[n]_q!}{\prod_{b \in \lambda} [h(b)]_q}. \end{align} $$
The argument uses the representation of the Markov chain as multiplication on the Hecke algebra. This is developed further in Section 5.
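The formulas of Theorem 3.6 are easy to evaluate numerically. The following sketch computes $\beta _{\lambda }$ from (3.2) and $f_{\lambda }$ from (3.3) with exact rational arithmetic; the helper names (`partitions`, `boxes`, `hook`, `f_dim`, `beta`) are illustrative choices, boxes are indexed from zero (which leaves the content $j - i$ unchanged), and $|\mathcal {T}_{n,q}| = (q^n - 1)(q^{n-1} - 1)/(q - 1)$ is taken from Remark 3.4.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def boxes(lam):
    """Boxes (row i, column j) of the diagram of lam, indexed from 0."""
    return [(i, j) for i, row in enumerate(lam) for j in range(row)]

def hook(lam, i, j):
    """Hook length of box (i, j): arm + leg + 1."""
    arm = lam[i] - j - 1
    leg = sum(1 for r in lam[i + 1:] if r > j)
    return arm + leg + 1

def f_dim(lam):
    """Number of standard Young tableaux of shape lam (hook-length formula)."""
    prod = 1
    for (i, j) in boxes(lam):
        prod *= hook(lam, i, j)
    return factorial(sum(lam)) // prod

def beta(lam, q):
    """Eigenvalue (3.2) for the partition lam and an integer prime power q."""
    n = sum(lam)
    num_transvections = Fraction((q**n - 1) * (q**(n - 1) - 1), q - 1)
    content_sum = sum(Fraction(q) ** (j - i) for (i, j) in boxes(lam))
    return (q**(n - 1) * content_sum - Fraction(q**n - 1, q - 1)) / num_transvections

n, q = 4, 2
spectrum = {lam: (beta(lam, q), f_dim(lam) ** 2) for lam in partitions(n)}
```

For $n = 4$ and $q = 2$ this gives $\beta _{(4)} = 1$, $\beta _{(3,1)} = 3/7$ and $\beta _{(1^4)} = 0$, and the multiplicities $f_{\lambda }^2$ add up to $4! = 24$.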
Proof. (a): Let D be the sum of transvections as in Theorem 1.6. By [Reference Halverson and Ram31] (3.20), the irreducible representation $H_n^{\lambda }$ of $H_n$ indexed by $\lambda $ has a ‘seminormal basis’ $\{ v_T \mid T \in \widehat {S}_n^{\lambda } \}$ such that $J_k v_T = q^{k-1} q^{c(T(k))} v_T$, where $T(k)$ is the box containing k in T. Thus,
(b): The dimension of the irreducible representation of $H_n$ indexed by $\lambda $ is the same as the dimension of the irreducible representation of $S_n$ indexed by $\lambda $, which is well known as the hook-length formula [Reference Macdonald46, Ch. I(7.6)(ii), §6 Ex.2(a)]:
$$\begin{align*} f_{\lambda} = \frac{n!}{\prod_{b \in \lambda} h(b)}. \end{align*}$$
(c): With $G = GL_n(q), H = H_n$ , the result follows since [Reference Macdonald46, Ch. IV(6.7)],
Example 3.7. Equation 3.2 for some specific partitions gives
The following simple lemma will be used several times in subsequent sections. It uses the usual majorization partial order (moving up boxes) on partitions of n, [Reference Macdonald46]. For example, when $n = 4$ the ordering is $1111 \prec 211 \prec 22 \prec 31 \prec 4$ .
Lemma 3.8. The eigenvalues $\beta _{\lambda }$ of Theorem 3.6 are monotone increasing in the majorization order.
Proof. Moving a single box (at the corner of the diagram of $\lambda $) in position $(i, j)$ to position $(i', j')$ necessitates $i' < i$ and $j'> j$, and so $q^{j-i} < q^{j' - i'}$. Since any $\lambda \prec \lambda '$ can be obtained by successively moving up boxes, the proof is complete. For example,
The partition $1^n = (1, 1, \dots , 1)$ is the minimal element in the partial order, and since $\beta _{(1^n)} = 0$ , we have the following:
Corollary 3.9. If $\lambda \vdash n$ , then
Corollary 3.10. If $\lambda $ is a partition of n with $\lambda _1 = n - j$ and $j \le n/2$, then
Proof. The first inequality follows from Lemma 3.8. The formula for $\beta _{(n - j, j)}$ is a simple calculation from equation (3.2). Recall the elementary inequalities:
These give the inequality for $\beta _{(n - j, j)}$ :
In the following sections, we will use a further bound from Corollary 3.10.
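Corollary 3.11. Define $\kappa _{n, q, j} := (1 + q^{-(n - 2j + 1)})(1 + q^{-(n - 1)})(1 + q^{-(n - 2)})$ and
$$\begin{align*} \alpha_{n, q} := \max_{1 \le j \le n/2} \frac{\log(\kappa_{n, q, j})}{j}. \end{align*}$$
Then for all $n, q$,
$$\begin{align*} \alpha_{n, q} \le \frac{6}{n}. \end{align*}$$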
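Proof. Using $1 + x \le e^x$, there is the initial bound
$$\begin{align*} \log(\kappa_{n, q, j}) \le q^{-(n - 2j + 1)} + q^{-(n - 1)} + q^{-(n - 2)} \le 3 q^{-(n - 2j)}, \end{align*}$$
which uses that $j \ge 1$ and so $(n - 2) \ge (n - 2j)$. Then,
$$\begin{align*} \frac{\log(\kappa_{n, q, j})}{j} \le \frac{3 q^{-(n - 2j)}}{j}. \end{align*}$$
With $f(x) = 3q^{-(n - 2x)}/x$, we have
$$\begin{align*} f'(x) = \frac{3 q^{-(n - 2x)} \left( 2x \log q - 1 \right)}{x^2}. \end{align*}$$
Since $2 \log (2)> 1$, we see that $f(x)$ is increasing for $x \ge 1$ and any $q \ge 2$. Thus, for $1 \le j \le n/2$, $f(j)$ is maximized for $j = n/2$, which gives
$$\begin{align*} \alpha_{n, q} \le f(n/2) = \frac{6}{n}. \end{align*}$$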
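The bound on $\alpha _{n, q}$ can also be checked numerically. The sketch below assumes the reading in which all three exponents in $\kappa _{n, q, j}$ are negative, namely $\kappa _{n,q,j} = (1 + q^{-(n-2j+1)})(1 + q^{-(n-1)})(1 + q^{-(n-2)})$, together with $\alpha _{n,q} = \max _{1 \le j \le n/2} \log (\kappa _{n,q,j})/j$; the function names are hypothetical and floating-point arithmetic is used.

```python
from math import log

def kappa(n, q, j):
    """kappa_{n,q,j} with negative exponents (an assumed reading of the definition)."""
    q = float(q)
    return (1 + q ** -(n - 2 * j + 1)) * (1 + q ** -(n - 1)) * (1 + q ** -(n - 2))

def alpha(n, q):
    """alpha_{n,q} = max over 1 <= j <= n/2 of log(kappa_{n,q,j}) / j, for n >= 2."""
    return max(log(kappa(n, q, j)) / j for j in range(1, n // 2 + 1))

# Compare alpha_{n,q} with 6/n on a small grid of hypothetical test values.
grid = [(n, q, alpha(n, q), 6 / n) for n in range(2, 40) for q in (2, 3, 4, 5, 7)]
```

On this grid every value of $\alpha _{n,q}$ lies below $6/n$, in line with the condition $\log q> 6/n$ that appears in the mixing time theorems.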
4 Mixing time analysis
In this section, the eigenvalues from Section 3.3 are used to give bounds on the distance to stationarity for the random transvections Markov chain on $S_n$ . Section 4.1 reviews the tools which are needed for the bounds from specific starting states. Section 4.2 proves results for the chain started from the identity element, Section 4.3 proves results for the chain started from the reversal permutation, and Section 4.4 contains bounds for the average over all starting states.
4.1 Eigenvalue bounds
The following result from [Reference Diaconis and Ram18] will be the main tool for achieving bounds on the chi-square distance of the chain from different starting states.
Proposition 4.1 (Proposition 4.8 in [Reference Diaconis and Ram18]).
Let H be the Iwahori–Hecke algebra corresponding to a finite real reflection group W. Let K be a reversible Markov chain on W with stationary distribution $\pi $ determined by left multiplication by an element of H (also denoted by K). The following identities are true:

1. $\chi ^2_x(\ell ) = q^{-2 I(x)}\sum _{\lambda \neq \mathbf {1}} t_{\lambda } \chi _H^{\lambda }(T_{x^{-1}} K^{2 \ell } T_x)$, $x \in W$,

2. $\sum _{x \in W} \pi (x) \chi ^2_x(\ell ) = \sum _{\lambda \neq \mathbf {1}} f_{\lambda } \chi _H^{\lambda }(K^{2 \ell })$ ,
where $\chi _H^{\lambda }$ are the irreducible characters, $t_{\lambda }$ the generic degrees and $f_{\lambda }$ the dimensions of the irreducible representations of W.
In general, the right-hand side of (a) could be difficult to calculate, but it simplifies for the special cases $x = id, x = \omega _0$. These calculations, and the analysis of the sum, are contained in the following sections.
The right-hand side of the equations in Proposition 4.1 involves the following quantities, defined for $\lambda \vdash n$ :

• $n_{\lambda } = \sum _{i = 1}^{|\lambda |} (i - 1) \lambda _i$,

• $c_{\lambda } = \sum _{b \in \lambda } ct(b)$, where $ct(b) = j - i$ if box b is in column j and row i,

• $t_{\lambda } = q^{n_{\lambda }} \cdot r_{\lambda }$, where $r_{\lambda } = \frac {[n]_q!}{\prod _{b \in \lambda } [h(b)]_q}$, $[k]_q = (q^k - 1)/(q - 1)$ and $[n]_q! = [n]_q [n-1]_q \cdots [1]_q$,

• $f_{\lambda }$, which is the number of standard Young tableaux of shape $\lambda $. The formula is
$$\begin{align*}f_{\lambda} = \frac{n!}{\prod_{b \in \lambda} h(b)}. \end{align*}$$
Table 1 shows these values for $n = 4$ and general q. From this example, we can observe that $c_{\lambda }$ is increasing with respect to the partial order on partitions, while $n_{\lambda }$ is decreasing.
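The quantities $n_{\lambda }$, $c_{\lambda }$, $r_{\lambda }$, $t_{\lambda }$ and $f_{\lambda }$ can be tabulated mechanically. The sketch below does this for a numerical value of q rather than for general q, so it does not reproduce Table 1 itself; helper names are illustrative and rows and columns are indexed from zero, so that `i` plays the role of $i - 1$ in the formula for $n_{\lambda }$.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def boxes(lam):
    return [(i, j) for i, row in enumerate(lam) for j in range(row)]

def hook(lam, i, j):
    return (lam[i] - j - 1) + sum(1 for r in lam[i + 1:] if r > j) + 1

def q_int(k, q):
    """[k]_q = (q^k - 1)/(q - 1) for an integer q >= 2."""
    return (q**k - 1) // (q - 1)

def quantities(lam, q):
    """Return the dict of n_lam, c_lam, r_lam, t_lam, f_lam for the shape lam."""
    n = sum(lam)
    n_lam = sum(i * part for i, part in enumerate(lam))
    c_lam = sum(j - i for (i, j) in boxes(lam))
    hook_prod, q_hook_prod, q_fact = 1, 1, 1
    for (i, j) in boxes(lam):
        h = hook(lam, i, j)
        hook_prod *= h
        q_hook_prod *= q_int(h, q)
    for k in range(1, n + 1):
        q_fact *= q_int(k, q)
    r_lam = Fraction(q_fact, q_hook_prod)
    return {"n": n_lam, "c": c_lam, "r": r_lam,
            "t": q**n_lam * r_lam, "f": factorial(n) // hook_prod}

table = {lam: quantities(lam, 2) for lam in partitions(4)}
```

Along the chain $1111 \prec 211 \prec 22 \prec 31 \prec 4$ the computed $c_{\lambda }$ increase and the $n_{\lambda }$ decrease, matching the observation made from Table 1. As a further consistency check, $\sum _{\lambda } t_{\lambda } f_{\lambda }$ equals $[4]_2! = 315$, the number of flags.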
Let us record that
Since $n!\le n^n = e^{n\log n}$ then
We will use the following bounds from [Reference Diaconis and Ram18, Lemma 7.2]:
In addition, we need the following bounds for sums of $f_{\lambda }$ . Part (b) of the proposition below is Lemma 7.2(b) in [Reference Diaconis and Ram18]; the proof there is incomplete, so we give the simple proof below.
Proposition 4.2.

1. There is a universal constant $K> 0$ such that, for all $1 \le j \le n$ ,
$$\begin{align*}\sum_{\lambda: \lambda_1=n-j} f_{\lambda} \le \frac{n^{j}}{\sqrt{j!}} \cdot \frac{K}{j} e^{2 \sqrt{2j}}. \end{align*}$$ 
2. For $1\le j\le n$ ,
$$\begin{align*}\sum_{\lambda: \lambda_1=n-j} f^2_{\lambda} \le \frac{n^{2j}}{j!}. \end{align*}$$
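Part (b) of the proposition, read with $\lambda _1 = n - j$, is easy to test by enumeration for small n. The following sketch is only a numerical check under that reading, not a proof; it computes $f_{\lambda }$ with the hook-length formula, and the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def f_dim(lam):
    """Number of standard Young tableaux of shape lam (hook-length formula)."""
    prod = 1
    for i, row in enumerate(lam):
        for j in range(row):
            arm = row - j - 1
            leg = sum(1 for r in lam[i + 1:] if r > j)
            prod *= arm + leg + 1
    return factorial(sum(lam)) // prod

def check_part_b(n):
    """Return (j, sum of f^2 over lam with lam_1 = n - j, n^(2j)/j!) for 1 <= j < n."""
    rows = []
    for j in range(1, n):
        total = sum(f_dim(lam) ** 2 for lam in partitions(n) if lam[0] == n - j)
        rows.append((j, total, Fraction(n ** (2 * j), factorial(j))))
    return rows

results = {n: check_part_b(n) for n in range(2, 9)}
```

For example, with $n = 3$ and $j = 1$ the only shape is $(2, 1)$ with $f_{\lambda }^2 = 4$, against the bound $9$.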
Proof. Recall that $f_{\lambda }$ is the number of standard tableaux of shape $\lambda $, that is, the elements $1, 2, \dots , n$ are arranged in the shape $\lambda $ so that rows are increasing left to right and columns are increasing top to bottom. If $\lambda _1 = n - j$, then there are $\binom {n}{j}$ ways to choose the elements not in the first row of the tableau. For a fixed partition $\lambda $, the number of ways of arranging the remaining j elements is at most the number of Young tableaux corresponding to the partition of j created from the remaining rows of $\lambda $. This number is at most $\sqrt {j!}$ (Lemma 3 in [Reference Diaconis and Shahshahani22]). Thus,
$$\begin{align*} \sum_{\lambda: \lambda_1 = n-j} f_{\lambda} \le \binom{n}{j} \sqrt{j!} \, p(j) \le \frac{n^{j}}{\sqrt{j!}} \, p(j), \end{align*}$$
where $p(j)$ is equal to the number of partitions of j. It is well known that $\log (p(n)) \sim B \cdot \sqrt {n}$ for a constant B. More precisely, from (2.11) in [Reference Hardy and Ramanujan32], there is a universal constant $K>0$ such that for all $n \ge 1$,
$$\begin{align*} p(n) \le \frac{K}{n} e^{2 \sqrt{2n}}. \end{align*}$$
This gives (a).
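For part (b), we again use the inequality $f_{\lambda } \le \binom {n}{j} f_{\lambda ^*}$, where $\lambda ^* = (\lambda _2, \dots , \lambda _k)$ is the partition of j determined by the rest of $\lambda $ after the first row. Then,
$$\begin{align*} \sum_{\lambda: \lambda_1 = n-j} f_{\lambda}^2 \le \binom{n}{j}^2 \sum_{\lambda^* \vdash j} f_{\lambda^*}^2 = \binom{n}{j}^2 j! \le \frac{n^{2j}}{j!}. \end{align*}$$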
Proposition 4.3. The function $s(\lambda ) := q^{c_{\lambda }} t_{\lambda }$ is monotone with respect to the partial order on partitions. For any $\lambda \vdash n$ ,
Proof. Suppose that $\lambda \prec \widetilde {\lambda }$ and $\widetilde {\lambda }$ is obtained from $\lambda $ by ‘moving up’ one box. Suppose the box at position $(i, j)$ is moved to $(i', j')$ , with $i' < i, j'> j$ .
Let $g(\lambda ) = c_{\lambda } + n_{\lambda }$ . Then,
This implies that
Now, consider the change in $r_{\lambda }$ . The hook lengths of $\widetilde {\lambda }$ are:
and $\widetilde {h}(k, l) = h(k, l)$ for all other boxes $b = (k, l)$ . Thus,
Using the inequalities $q^{r-1} < (q^r - 1) < q^r$, we have
Combining this with the result for $g(\lambda )$ :
since $i> i'$ .
Assuming the monotonicity, then if $\lambda _1 \ge n/2$ we have $s(\lambda ) \le s((\lambda _1, n - \lambda _1))$. To calculate this quantity:
For $r_{\lambda }$, note that the hook lengths of $(\lambda _1, n - \lambda _1)$ are
If $\lambda _1 = n - j$, we see the terms that cancel:
This uses the inequality $q^{r-1} < q^r - 1 < q^r$. Thus, if $\lambda _1 \ge n/2$, we have shown
Now, suppose $\lambda _1 \le n/2$ , so $s(\lambda ) \le s((n/2, n/2))$ (assume n is even). To calculate this,
To bound $r_{\lambda }$ , use the same calculation as before to get
and so in total $s((n/2, n/2)) \le q^{n^2/2 - n/2} = q^{\binom {n}{2}}$.
4.2 Starting from $\mathrm {id}$
Theorem 4.4. Let P be the Markov chain on $S_n$ induced by random transvections on $GL_n(q)$ .

1. For $t_{\lambda }, \beta _{\lambda }$ defined in Theorem 3.6, we have
$$\begin{align*}4 \|P_{id}^{\ell} - \pi_q \|_{TV}^2 \le \chi_{id}^2(\ell) = \sum_{\lambda \vdash n, \lambda \neq (n)} t_{\lambda} f_{\lambda} \beta_{\lambda}^{2 \ell}. \end{align*}$$ 
2. Let $\alpha _{n, q}$ be as in Corollary 3.11 and $n, q$ such that $\log q> 6/n$. Then if $\ell \ge \frac {n \log q/2 + \log n + c}{(\log q - \alpha _{n, q})}, c> 0$, we have
$$ \begin{align*} \chi_{id}^2(\ell) &\le (e^{e^{-2c}} - 1) + e^{-cn}. \end{align*} $$ 
3. For any $\ell \ge 1$ ,
$$\begin{align*}\chi_{id}^2(\ell) \ge (q^{n-1} - 1)(n - 1)q^{-4 \ell}. \end{align*}$$ 
4. If $\ell \le n/8$ , then for fixed q and n large,
(4.4) $$ \begin{align} \|P_{id}^{\ell} - \pi_q \|_{TV} \ge 1 - o(1). \end{align} $$
Theorem 4.4 shows that restricting the random transvections walk from $GL_n(q)$ to the double coset space only speeds things up by a factor of $2$ when started from the identity. Hildebrand [Reference Hildebrand35] shows that the total variation distance on all of $GL_n(q)$ is only small after $n + c$ steps. Note this is independent of q.
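The chi-square formula in part (a) can be evaluated exactly for small n. The sketch below assumes the reading $\chi _{id}^2(\ell ) = \sum _{\lambda \neq (n)} t_{\lambda } f_{\lambda } \beta _{\lambda }^{2\ell }$ with $t_{\lambda } = q^{n_{\lambda }} [n]_q!/\prod _b [h(b)]_q$ and $\beta _{\lambda }$ from (3.2); the helper names are hypothetical and the values $n = 4$, $q = 2$ are arbitrary test inputs.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def q_int(k, q):
    return (q**k - 1) // (q - 1)

def shape_data(lam, q):
    """Return (t_lam, f_lam, beta_lam) for the shape lam."""
    n = sum(lam)
    hook_prod, q_hook_prod, content_sum = 1, 1, Fraction(0)
    for i, row in enumerate(lam):
        for j in range(row):
            h = (row - j - 1) + sum(1 for r in lam[i + 1:] if r > j) + 1
            hook_prod *= h
            q_hook_prod *= q_int(h, q)
            content_sum += Fraction(q) ** (j - i)
    q_fact = 1
    for k in range(1, n + 1):
        q_fact *= q_int(k, q)
    n_lam = sum(i * part for i, part in enumerate(lam))
    t = q**n_lam * Fraction(q_fact, q_hook_prod)
    f = factorial(n) // hook_prod
    num_transvections = Fraction((q**n - 1) * (q**(n - 1) - 1), q - 1)
    b = (q**(n - 1) * content_sum - Fraction(q**n - 1, q - 1)) / num_transvections
    return t, f, b

def chi2_id(n, q, ell):
    """Chi-square distance from the identity after ell steps, as an exact fraction."""
    total = Fraction(0)
    for lam in partitions(n):
        if lam != (n,):
            t, f, b = shape_data(lam, q)
            total += t * f * b ** (2 * ell)
    return total

values = [chi2_id(4, 2, ell) for ell in range(0, 6)]
```

At $\ell = 0$ the sum equals $[n]_q! - 1 = 1/\pi _q(id) - 1$, as it should for a chain started at a point mass, and for $\ell \ge 1$ the computed values stay above the lower bound $(q^{n-1} - 1)(n - 1)q^{-4\ell }$ of part (c).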
Proof. (a): The inequality follows from Proposition 4.1 (b):
(b): From Corollary 3.10 if $\lambda _1 = n - j$, then $\beta _{\lambda } \le \kappa _{n, q, j} q^{-j}$, where
Using the bound on $t_{\lambda }$ from equation (4.3), for $1 \le j \le \lfloor n/2 \rfloor $, we have
Recall the final inequality follows since $\alpha _{n, q} := \max _{1 \le j \le n/2} \log (\kappa _{n, q, j})/j$. If $\ell = \frac {n \log q/2 + \log n + c}{(\log q - \alpha _{n, q})}$, then the exponent in equation (4.5) is
This gives
Next, we need to consider the partitions $\lambda $ with $\lambda _1 \le n/2$ . For these partitions,
Then we have
using $\sum _{\lambda } f_{\lambda }^2 = n!$ and that if $n/2 \le j \le n$ , then
since the function is increasing in j. Note also that if $q \ge 3$, then $\log q> 1$ and $n! \le n^n = e^{n \log n}$. To finish the bound,
(c): The lower bound comes from considering the $\lambda = (n - 1, 1)$ term from the sum in (a). Using the quantities (4.1), this gives
This uses that $(q^{n-2} - 1)/(q^{n-1} - 1) \ge q^{-2}$.
(d): From the alternative version of the walk on the Hecke algebra, involving $D/|\mathcal {T}_{n, q}|$ with D from Theorem 1.6, the walk proceeds by picking a transposition $(i,j), i < j$ with probability proportional to
and multiplying by $T_{ij}$. As described in Section 3.2, multiplication by $T_{ij}$ corresponds to proposing the transposition $(i, j)$ and proceeding via the Metropolis algorithm. Thus, multiplication by $T_{ij}$ induces at most $2(j - i)$ inversions, always less than $2n$. From [Reference Diaconis and Shahshahani22] Theorem 5.1, under $\pi _q$, a typical permutation has $\binom {n}{2} - \frac {n - q}{q + 1} + O(\sqrt {n})$ inversions (and the fluctuations are Gaussian about this mean). If $\ell = n/8$, the measure $P_{id}^{\ell }(\cdot )$ is concentrated on permutations with at most $n^2/4$ inversions and $\pi _q$ is concentrated on permutations with order $n^2/2 - (n - 1)/(q + 1) + O(\sqrt {n})$ inversions.
4.3 Starting from $\omega _0$
Theorem 4.5. Let P be the Markov chain on $S_n$ induced by random transvections on $GL_n(q)$ , and let $\omega _0 \in S_n$ be the reversal permutation in $S_n$ .

1. With $t_{\lambda }, c_{\lambda }, \beta _{\lambda }$ defined in Section 3.3,
$$\begin{align*}4 \|P_{\omega_0}^{\ell} - \pi_q \|_{TV}^2 \le \chi_{\omega_0}^2(\ell) = q^{-\binom{n}{2}} \sum_{\lambda \neq (n)} q^{c_{\lambda}} t_{\lambda} f_{\lambda} \beta_{\lambda}^{2 \ell}. \end{align*}$$ 
2. Let $\alpha _{n, q}$ be as in Corollary 3.11 and $n, q$ such that $\log q> 6/n$. If $\ell \ge ((\log n)/2 + c)/(\log q - \alpha _{n, q})$ for $c \ge 2 \sqrt {2}$, then
$$\begin{align*}\chi_{\omega_0}^2(\ell) \le - 2K \log(1 - e^{-c}) + \sqrt{K} e^{- cn}, \end{align*}$$for a universal constant $K> 0$ (independent of $q, n$ ).

3. For any $\ell \ge 1$ ,
$$\begin{align*}\chi^2_{\omega_0}(\ell) \ge q^{-(n-2)}(n-1) (q^{n-1} - 1) q^{-4 \ell}. \end{align*}$$
Remark 4.6. Theorem 4.5 shows that the Markov chain has a cutoff in its approach to stationarity in the chi-square metric. It shows the same exponential speed up as the walk started at a typical position (Theorem 4.7 below) and indeed is faster by a factor of $2$. This is presumably because it starts at the permutation $\omega _0$, at which the stationary distribution $\pi _q$ is concentrated, instead of ‘close to $\omega _0$ ’.
Proof of Theorem 4.5.
(a): By Proposition 4.9 in [Reference Diaconis and Ram18], if $\omega _0$ is the longest element of W, then
where $\rho ^{\lambda }$ is the irreducible representation indexed by $\lambda $ . Using this and 4.1 (a),
since $K \in Z(H)$ , that is, K commutes with all elements of the Hecke algebra.
(b): Suppose $\lambda _1 = n - j$ for $1 \le j \le n/2$. Recall the definition $s(\lambda ) = q^{c_{\lambda }}t_{\lambda }$. From Proposition 4.3, $s(\lambda ) \le q^{\binom {n}{2}}$. Then,
The third inequality uses Proposition 4.2 for $\sum _{\lambda : \lambda _1 = n - j} f_{\lambda }$. Recall $\alpha _{n, q} := \max _{1 \le j \le n/2} \log (\kappa _{n, q, j})/j$. If $\ell = ((\log n)/2 + c)/(\log q - \alpha _{n, q})$, then the bound becomes
using the loose bound $\sqrt {j!}> j/2$ for all $j \ge 1$. With the assumption that $c \ge 2 \sqrt {2}$, we have $-2jc + 2 \sqrt {2j} \le -jc$ for all $j \ge 1$. Finally,
Now, for the $\lambda $ with $\lambda _1 \le n/2$, we have
where $p(n)$ is equal to the number of partitions of n (the inequality is Cauchy–Schwarz). Since $p(n) \le \frac {K}{n} e^{2 \sqrt {2n}}$ for a constant $K> 0$ ([Reference Hardy and Ramanujan32]) and $\sum _{\lambda \vdash n} f_{\lambda }^2 = n!$ , this gives
since $\sqrt {n!} \le e^{n \log n/2}$. Since $c \ge 2 \sqrt {2}$, the bound is $\le \sqrt {K} e^{-cn}$ for any $n \ge 1$.
(c): A lower bound comes from using equation (4.1) for the lead term on the right-hand side of (a):
4.4 Starting from a typical site
In analyzing algorithms used repeatedly for simulations, as the algorithm is used, it approaches stationarity. This means the quantity
is of interest. For the problem under study, $\pi _q$ is concentrated near $\omega _0$ so we expect rates similar to those in Theorem 4.5.
Theorem 4.7. Let P be the Markov chain on $S_n$ induced by random transvections on $GL_n(q)$ .

1. With $f_{\lambda }, \beta _{\lambda }$ defined in Section 3.3,
$$\begin{align*}\left( \sum_{x \in S_n} \pi_q(x) \|P_x^{\ell} - \pi_q \|_{TV} \right)^2 \le \frac{1}{4} \sum_{x \in S_n} \pi_q(x) \chi_{x}^2(\ell) = \frac{1}{4} \sum_{\lambda \neq (n)} f_{\lambda}^2 \beta_{\lambda}^{2 \ell}. \end{align*}$$ 
2. Let $\alpha _{n, q}$ be as in Corollary 3.11 and $n, q$ such that $\log q> 6/n$. If $\ell \ge (\log n + c)/(\log q - \alpha _{n, q}), c> 0$, then
$$\begin{align*}\sum_{x \in S_n} \pi_q(x) \chi_{x}^2(\ell) \le (e^{e^{-c}} - 1) + e^{-cn}. \end{align*}$$ 
3. For any $\ell \ge 1$ ,
$$\begin{align*}\sum_{x \in S_n} \pi_q(x) \chi_{x}^2(\ell) \ge (n-1)^2 q^{-4 \ell}. \end{align*}$$
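The averaged chi-square distance in part (a) only needs $f_{\lambda }$ and $\beta _{\lambda }$, so it is cheap to evaluate. The sketch below assumes the reading $\sum _{\lambda \neq (n)} f_{\lambda }^2 \beta _{\lambda }^{2\ell }$ with $\beta _{\lambda }$ from (3.2); the helper names and the test values $n = 4$, $q = 2$ are illustrative choices.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def f_and_beta(lam, q):
    """Return (f_lam, beta_lam) using the hook-length formula and (3.2)."""
    n = sum(lam)
    hook_prod, content_sum = 1, Fraction(0)
    for i, row in enumerate(lam):
        for j in range(row):
            hook_prod *= (row - j - 1) + sum(1 for r in lam[i + 1:] if r > j) + 1
            content_sum += Fraction(q) ** (j - i)
    num_transvections = Fraction((q**n - 1) * (q**(n - 1) - 1), q - 1)
    b = (q**(n - 1) * content_sum - Fraction(q**n - 1, q - 1)) / num_transvections
    return factorial(n) // hook_prod, b

def average_chi2(n, q, ell):
    """Sum over lam != (n) of f_lam^2 * beta_lam^(2*ell), as an exact fraction."""
    total = Fraction(0)
    for lam in partitions(n):
        if lam != (n,):
            f, b = f_and_beta(lam, q)
            total += f**2 * b ** (2 * ell)
    return total

avg = [average_chi2(4, 2, ell) for ell in range(0, 6)]
```

At $\ell = 0$ the sum is $n! - 1$, and for $\ell \ge 1$ each value dominates $(n-1)^2 q^{-4\ell }$, the contribution bound coming from $\lambda = (n-1, 1)$ in part (c).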
Proof. (a): This is simply a restatement of 4.1 part (b).
(b): We will divide the sum depending on the first entry of the partition. By Proposition 4.2, we have the bound (true for any $1 \le j \le n$ )
Combining this with Corollary 3.10, for $j \le \lfloor n/2 \rfloor $ ,
Define
so then if $\ell = (\log n + c)/(\log q - \alpha _{n, q})$, the bound is
Then summing over all possible j gives
Now, we have to bound the contribution from partitions $\lambda $ with $\lambda _1 \le n/2$ . Because $\beta _{\lambda }$ is monotone with respect to the order on partitions, we have for all $\lambda $ such that $\lambda _1 \le n/2$ ,
since $\lambda \preceq (n/2, n/2)$ (assuming without essential loss that n is even). Then,
using $\sum _{\lambda } f_{\lambda }^2 = n! \le n^n$ .
(c): The sum is bounded below by the term for $\lambda = (n - 1, 1)$. This is
5 Hecke algebra computations
This section proves Theorem 1.6 which describes the transvections Markov chain on $S_n$ as multiplication in the Hecke algebra from Definition 3.1. This is accomplished by careful and elementary row reduction. Our first proof used Hall–Littlewood symmetric functions. It is recorded in the expository account [Reference Diaconis, Ram and Simper19].
5.1 Overview
Let $\mathbb {C}[G]$ denote the group algebra for $G = GL_n(q)$. This is the space of functions $f: G \to \mathbb {C}$, with addition defined $(f + g)(s) = f(s) + g(s)$ and multiplication defined by
$$\begin{align*} (f * g)(s) = \sum_{t \in G} f(t) g(t^{-1} s). \end{align*}$$
Equivalently, $\mathbb {C}[G] = \mathrm {span} \{g \mid g \in G \}$ and we can write an element $f = \sum _{g} c_g g$ for $c_g \in \mathbb {C}$ , so $f(g) = c_g$ .
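To make the group algebra concrete, here is a minimal sketch of its multiplication in which elements are stored as dictionaries from group elements to coefficients. As a hypothetical small stand-in for $GL_n(q)$ it uses the symmetric group $S_3$, with permutations of $\{0, 1, 2\}$ in one-line notation; the function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations

def compose(s, t):
    """Product st of permutations in one-line notation: (st)(k) = s(t(k))."""
    return tuple(s[t[k]] for k in range(len(t)))

def convolve(f, g):
    """Product in the group algebra: (sum_s f(s) s)(sum_t g(t) t) = sum f(s) g(t) st."""
    out = defaultdict(int)
    for s, a in f.items():
        for t, b in g.items():
            out[compose(s, t)] += a * b
    return dict(out)

# Sum of all transpositions in S_3: a class sum, hence central in the group algebra.
D = {(1, 0, 2): 1, (0, 2, 1): 1, (2, 1, 0): 1}
group = list(permutations(range(3)))
commutes = all(convolve(D, {x: 1}) == convolve({x: 1}, D) for x in group)
```

The class sum `D` commutes with every group element, which is the same mechanism that places the sum of transvections in the center of $\mathbb {C}[GL_n(q)]$.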
Define elements in $\mathbb {C}[G]$ :