## 1. Introduction

Exploration of the growth of structure in the first billion years of the Universe is a key observational driver for many experiments. One tracer of the conditions within the early Universe is the 21-cm spectral line of neutral hydrogen, which encodes in its brightness temperature distribution details of the radiation field and gas properties in the intergalactic medium permeating the cosmos (Furlanetto, Oh, & Briggs Reference Furlanetto, Oh and Briggs2006; Pritchard & Loeb Reference Pritchard and Loeb2008). Redshifted to low frequencies, the 21-cm line is accessible with radio telescopes (*υ* < 300 MHz), including current and future instruments. These include the Murchison Widefield Array, MWAFootnote ^{a} (Bowman et al. Reference Bowman, Cairns, Kaplan, Murphy and Oberoi2013; Tingay et al. Reference Tingay2013; Jacobs et al. Reference Jacobs2016); the Precision Array for Probing the Epoch of Reionisation, PAPERFootnote ^{b} (Parsons et al. Reference Parsons2010); the LOw-Frequency ARray, LOFARFootnote ^{c} (van Haarlem et al. Reference van Haarlem2013; Patil et al. Reference Patil2016); the Long Wavelength Array, LWAFootnote ^{d} (Ellingson et al. Reference Ellingson, Clarke, Cohen, Craig, Kassim, Pihlstrom, Rickard and Taylor2009), and the future HERA (DeBoer et al. Reference DeBoer2016) and SKA-Low (Koopmans et al. Reference Koopmans2015).

The weakness of the signal, combined with the expectation that most of its information content is contained in the second moment (Wyithe & Morales Reference Wyithe and Morales2007), which is uncorrelated across spatial Fourier wave mode, motivates the use of the power spectrum as a statistical tool for detecting and characterising the cosmological signal. Despite the ease with which the power spectrum can be computed from radio interferometric data, the presence of strong, spectrally structured residual foreground sources (Trott, Wayth, & Tingay Reference Trott, Wayth and Tingay2012; Datta, Bowman, & Carilli Reference Datta, Bowman and Carilli2010; Vedantham, Udaya Shankar, & Subrahmanyan Reference Vedantham, Udaya Shankar and Subrahmanyan2012; Thyagarajan et al. Reference Thyagarajan2015), complex instrumentation (Trott & Wayth Reference Trott and Wayth2016), and imperfect calibration (Patil et al. Reference Patil2014; Barry et al. Reference Barry, Hazelton, Sullivan, Morales and Pober2016), yield power spectra that are dominated by systematics. Thus far, a detection of signal from the Epoch of Reionisation (EoR) has not been achieved (Patil et al. Reference Patil2016; Beardsley et al. Reference Beardsley2016; Trott et al. Reference Trott and Wayth2016; Cheng et al. Reference Cheng2018). These systematics, combined with the expectation that non-Gaussian information can be extracted usefully from cosmological data, lead the discussion for other statistics. The bispectrum, as a measure of signal non-Gaussianity, is one such statistic that contains cosmologically relevant information (Bharadwaj & Pandey Reference Bharadwaj and Pandey2005; Majumdar et al. Reference Majumdar, Pritchard, Mondal, Watkinson, Bharadwaj and Mellema2018; Watkinson et al. Reference Watkinson, Giri, Ross, Dixon, Iliev, Mellama and Pritchard2018), while being relatively straightforward to compute with interferometric data (Shimabukuro et al. Reference Shimabukuro, Yoshiura, Takahashi, Yokoyama and Ichiki2017).

The bispectrum is the Fourier Transform of the three-point correlation function, and extracts higher-order correlations between different spatial scales. Its spatial and redshift evolution can be used to place different constraints on the underlying processes that set the 21-cm brightness temperature, and therefore it provides complementary information to the power spectrum. In an early paper exploring the use of the bispectrum for a model EoR signal, and radio interferometers, Bharadwaj & Pandey (Reference Bharadwaj and Pandey2005) demonstrated that a strong non-Gaussian signal is produced by the presence of ionised regions, and discussed the behaviour of the power spectrum and bispectrum signals as a function of frequency channel separation, although they only consider non-Gaussianity due to the ionisation field modelled as non-overlapping randomly placed spherical ionised regions. Some recent work has explored the combination of bispectrum with other tracers (CII spectral features) to extract clean cosmological information (Beane & Lidz Reference Beane and Lidz2018). The bispectrum has also been used in the single-frequency (angular) case in the CMB community, where non-Gaussianities can be contaminated by structured foregrounds (Jung, Racine, & van Tent Reference Jung, Racine and van Tent2018).

Majumdar et al. (Reference Majumdar, Pritchard, Mondal, Watkinson, Bharadwaj and Mellema2018) explore the ability of the bispectrum to discriminate fluctuations in the matter density distribution from those of the hydrogen neutral fraction, reporting that for some triangle configurations the sign of the bispectrum is a marker for which of these processes is dominating the bispectrum. They show output bispectra for equilateral and isosceles configurations over a range of wavemodes and redshifts, including parameters of relevance to current low-frequency 21-cm experiments (*z <* 9, 0.1 *< k <* 1.0). For modes relevant to the MWA, the bispectrum amplitude fluctuates in sign with wavenumber and triangle geometry (stretched → equilateral → squeezed) with a range spanning 10^{3} − 10^{9} mK^{3}*h* ^{−6} Mpc^{6}. This range of potential signs and amplitudes in measurable modes and redshifts motivates us to study this signal in MWA data.

Watkinson et al. (Reference Watkinson, Giri, Ross, Dixon, Iliev, Mellama and Pritchard2018) provide a useful tool for visualising the correspondence of real-space structures and bispectrum. They highlight that equilateral *k*-vector configurations probe above-average signal concentrated in filaments with a circular cross-section (their Figure 1). Stretched (flattened) *k*-vector triangle configurations (with one *k*-mode larger than the other two), by extension, probe above-average signal concentrated in filaments with ellipsoidal cross-sections (at the extreme these filaments tend towards planes). Finally, squeezed *k*-vector triangle configurations (with one *k*-mode smaller than the other two) correspond to a modulation of a large-scale mode over small-scale plane-wave concentrations of above-average signal, and therefore measure the correlation of the small-scale power spectrum with large-scale modes. Notably, they introduce and explore other bispectrum normalisations that are found to be more stable to parameter fluctuations. In this work, we discuss the relative merits of different bispectrum statistics for use with real data in the presence of real systematics.

Crucially, the switch to positive bispectrum at the end of reionisation occurs as we reach regimes/scales at which the concentration of above-average signal drive the non-Gaussianity. This will occur before the EoR (on scales where the density field is the dominant driver of the temperature fluctuations, or, if the spin temperature is not yet saturated during this phase, when heated regions are driving the non-Gaussianity) and towards the end of reionisation (when islands of 21-cm signal drive the non-Gaussianity).

Conversely, a negative-valued bispectrum will be unique to the phase when ionised regions drive the non-Gaussianity. In general, foreground astrophysical processes are not expected to produce a negative bispectrum, because they are associated with overdensities in the brightness temperature distribution (Lewis Reference Lewis2011; Watkinson & Pritchard Reference Watkinson and Pritchard2014). These factors may play a future important role in discriminating real cosmological non-Gaussianity from contaminants.

Despite some work studying the sensitivity of current and future experiments for measuring the bispectrum (Shimabukuro et al. Reference Shimabukuro, Yoshiura, Takahashi, Yokoyama and Ichiki2017; Yoshiura et al. Reference Yoshiura, Shimabukuro, Takahashi, Momose, Nakanishi and Imai2015), these have used idealised scenarios that omit any residual foreground signal and systematics introduced by the instrument. Bharadwaj & Pandey (Reference Bharadwaj and Pandey2005) discuss foreground fitting tools using frequency separation to study the bispectrum over visibility correlations across frequency, but this method breaks down for large field-of-view instruments where the interferometric response affects the foreground smoothness (Morales et al. Reference Morales, Hazelton, Sullivan and Beardsley2012). Further, no 21-cm interferometric data have been used to estimate the bispectrum. In this work, we address both of these by presenting bispectrum estimators that can use real datasets, computing the expected impact of foregrounds measured by the instrument, and applying the estimators to 21 h of MWA EoR data.

## 2. MWA Phase II Array

The MWA is a 256-tile low-frequency radio interferometer located in the Western Australian desert, on the future site of the Square Kilometre Array (SKA) (Tingay et al. Reference Tingay2013; Bowman et al. Reference Bowman, Cairns, Kaplan, Murphy and Oberoi2013). The telescope operates from 80 to 300 MHz with antennas spread over a 5-km diameter. Its primary science areas include exploration of the EoR, radio transients, solar and heliospheric studies, study of pulsars and fast transients, and the production of a full-sky low-frequency extragalactic catalogue. In 2016 it underwent an upgrade from 128 to 256 antenna tiles (Wayth et al. Reference Wayth2018). At any time, 128 of the tiles can be connected to the signal processing system. The array operates in a ‘compact’ configuration, utilising redundant spacings and short baselines for EoR science, or an ‘extended’ configuration, maximising angular resolution and instantaneous *uv*-coverage. The compact configuration is employed in this work.

The compact configuration has a maximum baseline of 500 m and is optimised for EoR science. Figure 1 shows the tile layout, including the two 36-tile hexagonal subarrays of redundantly spaced tiles. The minimum redundant spacing is 14 m. The primary motivations for the hexagons are twofold: (1) to increase the sensitivity to angular scales of relevance for the EoR, allowing coherent addition of measurements from redundant baselines, and (2) enabling additional methods for calibrating the array (redundant calibration, Li et al. Reference Li2018, Joseph et al. Reference Joseph, Trott and Wayth2018). For the bispectrum, there is an additional advantage of multiple, redundant equilateral triangle baselines being formed from the short spacings. These can be added coherently to study the bispectrum signal on particular scales, and allows for a direct bispectrum measurement (perfectly defined triangles formed from discrete baselines). These direct bispectrum results can be compared to a more general gridded bispectrum, whereby all baselines formed by an irregularly spaced array (such as MWA Phase I, or the non-hexagon tiles of Phase II compact) can be gridded onto the Fourier (*uv*-) plane, using a gridding kernel that represents the Fourier response function of the telescope (in this case, the Fourier Transform of the primary beam response to the sky). These estimators will both be explored in this work.

## 3. Power spectrum

We briefly review the power spectrum as the primary estimator for studying the EoR with 21-cm observations. The power spectrum is typically used to describe radio interferometer observations from the EoR, and contains all of the Gaussian-distributed fluctuation information. The power spectrum is the power spectral density of the spatial fluctuations in the 21-cm brightness temperature field. It is used because it encodes the fluctuation variance (where most of the EoR signal is expected to reside), and sums signal from across the observing volume to increase sensitivity. It is defined as

where $V(\vec k) = V(u,v,\eta ) = {\cal F}{\cal T}\left( {V(u,v,\nu )} \right)$ is the measured interferometric visibility (Jansky), Fourier transformed along frequency (*υ*) to map frequency to line-of-sight spatial scales (Jy Hz) at a given point in the Fourier (*uv*-) angular plane (*u*, *v*); 〈〉 encode an ensemble average over different realisations of the Universe, and the δ* _{D}*-function ensures that we are expecting to measure a Gaussian random field where the different modes are uncorrelatedFootnote

^{e}. Further assuming spatial isotropy allows us to average incoherently in spherical shells, where $\vec k = k$. Ω

*provides the volume normalisation, where Ω*

_{V}*= (BW) Ω is the product of the observing bandwidth and angular field of view. Converting from measured to physical units maps Jy*

_{V}^{2}Hz

^{2}to mK

^{2}

*h*

^{−6}Mpc

^{6}. After volume normalisation this becomes, mK

^{2}

*h*

^{−3}Mpc

^{3}.

### 3.1. Power spectra with radio interferometric data

The power spectrum can be produced naturally with interferometric data. Unlike optical telescopes that produce images of the sky, or single-dish radio telescopes that acquire a single sky power, a radio interferometer visibility (Jy) directly measures Fourier representations of the sky brightness distribution at the projected baseline location (*u* = Δ*x*/λ; *v* = Δ*y/λ*). In the flat-sky approximationFootnote ^{f}:

where *A*(*l*, *m*, *υ*) is the instrument primary beam response to the sky at position (*l*, *m*) from the phase centre and frequency *υ*, *S*(*l*, *m*, *υ*) is the corresponding sky brightness (Jy/sr, which is proportional to temperature), and the exponential encodes the Fourier kernel. The physical correspondence of sky projected on to the tile locations yields a fixed set of discrete but incomplete Fourier modes to be measured. This incompleteness leads to parts of the Fourier plane where there is no information. The line-of-sight spatial scales are obtained by Fourier Transform of visibilities measured at different frequencies, along frequency to map *υ* to *η*:

where *N* _{ch} is the number of spectral channels, Δ*υ* is the spectral resolution, and *j* and *k* index frequency and spatial mode (Hz^{−1}, or seconds).

The attenuation of the sky due to the primary beam (and general sky finiteness) alters the complete continuous Fourier Transform to a windowed transform, whereby the primary beam response leaks signal into adjacent Fourier modes, as can be seen using the convolution theorem:

where the true sky brightness distribution is convolved with the Fourier Transform of the primary beam response. This leakage implies that the visibility measured by a discrete baseline actually contains signal from a region of the Fourier plane, as described by the Fourier beam kernel, *Ã* (*u*, *v*, *η*).

In general, to compute the power spectrum from a large amount of data, we are motivated by the signal weakness to add the data coherently; i.e., we sum complex visibilities directly that contribute signal to the same point in the Fourier *uv*-plane. To do this, the measurement from each baseline is convolved with the Fourier beam kernel and ‘gridded’ (added with a weight) onto a common two-dimensional plane. Signal will add coherently, while noise adds as the square-root (because the thermal noise is uncorrelated between measurements). The weights for each measurement are also gridded with the kernel onto a similar plane. After addition of all the data, the signal *uv*-plane is divided by the weights to yield the optimal-weighted average signal at each point. The resulting cube resides in (*u*, *v*, *υ*) space, and can be Fourier transformed along frequency to obtain a cube in (*u*, *v*, *η*) space. The power spectrum can then be formed by squaring and normalising the cube, and averaging incoherently (in power) in spherical shells:

where *W* are the weights and $|\vec k| = |({k_u},{k_v},{k_\eta })| = \sqrt {k_u^2 + k_v^2 + k_\eta ^2} $.

As an intermediate step, the cylindrically averaged power spectrum can be formed (e.g., Datta et al. Reference Datta, Bowman and Carilli2010):

and ${k_ \bot } = \sqrt {k_u^2 + k_v^2}, \ k_\parallel=k_\eta$. This is a useful estimator for discriminating contaminating foregrounds (continuum sources with power concentrated at small *k* _{||}) from 21-cm signal. Herein we will refer to this power spectrum, and its bispectrum analog, as the ‘gridded power spectrum’ and ‘gridded bispectrum’, respectively.

Alternatively, one can take the baselines themselves, and their visibilities measured along frequency, and take the Fourier Transform directly along the frequency axis. This ‘delay spectrum’ approach is utilised by some experiments with short baselines (Parsons et al. Reference Parsons, Pober, McQuinn, Jacobs and Aguirre2012; Ali et al. Reference Ali2015; Thyagarajan et al. Reference Thyagarajan2015), both to increase sensitivity when there are redundant spacings, and to work as a diagnostic. The frequency and *η* axes are not parallel, except at zero-length baseline. Because an interferometer is formed instantaneously from antennas with a fixed spatial offset, the baseline length in Fourier space (e.g., *u*) evolves with frequency as *u* = Δ*xυ/c*, and this evolution is therefore increased for larger bandwidths and for longer baselines. For the short spacings of interest to the EoR, the correspondence is good, and the delay transform can be used to mimic the direct *k* _{||} transform of gridded data (see Figure 1 of Morales et al. Reference Morales, Hazelton, Sullivan and Beardsley2012, for a visual explanation). In general, ‘imaging’ arrays with many non-redundant spacings are suited to gridded power spectra, whereas redundant arrays, with a lesser number of multiply-sampled modes, are suited to delay power spectra. For the Phase II compact MWA, the two hexagonal subarrays have these short-spaced redundant baselines, and the ‘delay power spectrum’ and its bispectrum analog can also be used effectively. In general, we would not suggest use of the delay spectrum to undertake EoR science, because of the limitations discussed, but it is the appropriate analogue for the direct bispectrum estimator, and is therefore pertinent for the normalised bispectrum analysis.

## 4. Bispectrum

The bispectrum is the Fourier Transform of the three-point correlation function. Akin to the two-point correlation function (the Fourier dual of which is the power spectrum), the three-point correlation function measures the excess signal over that of a Gaussian random field distribution measured at three spatial locations, averaged over the volume. For a field with Fourier Transform denoted by $\Delta (\vec k)$, the bispectrum is formed over closed triangles of *k* vectors in Fourier space:

Here the δ* _{D}*-function ensures closure in Fourier space. It has units of mK

^{3}

*h*

^{−6}Mpc

^{6}after volume normalisation. The bispectrum is often applied to matter density fields, where $\Delta (\vec k)$ is the Fourier Transform of matter overdensity, $\delta (\vec x) = {{\rho (\vec x)} \over {\overline \rho }} - 1$. In radio interferometric measurements, the coherence of the wavefront (the visibilities obtained by cross-correlating voltages from individual antennas) represents the Fourier Transform of the sky brightness temperature distribution, measured in Jansky.

As discussed earlier, this bispectrum estimator can be unstable, with cosmological simulations showing rapid fluctuations between positive and negative values as non-Gaussianity becomes negligible but the amplitude is still large. As such, Watkinson et al. (Reference Watkinson, Giri, Ross, Dixon, Iliev, Mellama and Pritchard2018) suggest the normalised bispectrum as a more stable statistic:

where $P(|\vec k|)$ is the three-dimensional power spectrum, which describes the volume-normalised variance on a given spatial scale, and is the Fourier Transform of the two-point correlation function (Eggemeier & Smith Reference Eggemeier and Smith2017; Brillinger & Rosenblatt Reference Brillinger, Rosenblatt and Bernard1967). This normalisation isolates the contribution from the non-Gaussianity to the bispectrum, by normalising out the amplitude part of the statistic. It is akin to normalising the third central moment by *σ* ^{3} to calculate the skewness.

### 4.1. Bispectrum with radio interferometric data

Because the bispectrum is formed from the triple product of a triangle of wavespace measurements, it can be formed directly through the product of three interferometric visibilities. In the limit where the array has perfect (complete) *uv*-sampling, individual measurements of signal on triangles of baselines can be multiplied to form the bispectrum estimate. In the more general case, where an interferometer has instantaneously incomplete, but well-sampled baselines, there are two options for extracting the triangles of signal measurements: direct (via multiplication of measurements from three tiles forming a triangle of baselines), or gridded, where each *uv*-measurement is gridded onto the *uv*-plane (with its corresponding Fourier beam kernel and weights), and the final bispectra are computed from the fully-integrated and gridded data.

Direct bispectrum estimators can be applied to specific triangles according to the array layout, but these are usually unique, with irregular configurations (all three internal angles are distinct), leading to difficult cosmological interpretation and poor sensitivity. These issues arise for imaging-like arrays with pseudo-random layouts, but are alleviated for redundant arrays, where regular triangles (isosceles and equilateral) exist and are instantaneously available in many copies in the array. These features make interpretation more straightforward and increase sensitivity to these bispectrum modes.

Gridded bispectrum estimators can be applied to any array, yield improved sensitivity by coherent gridding of data, and may allow for a wider range of triangles to be probed. Nonetheless, they suffer from the increased difficulty of extracting robust estimates that correctly account for the correlation of data in *uv*-space.

With the benefit of having a redundant array, we will apply both sets of estimators to our data.

## 5. The gridded estimator

Each measured visibility encodes information about a small range of Fourier modes of the sky brightness distribution. Although each baseline is usually reported as a single number representing the antenna separation measured between antenna centres, the baselines actually measure a range of separations when accounting for the actual physical size.Footnote ^{g} This translates to a range of Fourier modes being measured by a given baseline, and is equivalent to the statement that a finite primary beam response to the sky mixes Fourier modes through spectral leakage (effectively a taper on the continuous Fourier transform). Thus, when measurements from different baselines are combined coherently (with phase information) onto a *uv*-plane, they can be gridded with a kernel that is the Fourier Transform of the primary beam response to the sky. Such a gridding kernel captures the degree of spectral leakage introduced by the antenna response, and means that baselines of similar length and orientation have some shared information. The gridding kernel is represented by *Ã* in Equation (6). With a single defined visibility phase centre, all visibility measurements can be added with this beam kernel onto a single plane (for each frequency channel), along with their associated weights, to form a coherently averaged estimate for the Fourier representation of the sky brightness temperature:

where *i* indexes measurement and *W* is the weight associated with each. The bispectrum is then estimated as the sum over the beam-weighted gridded visibilities:

where

are the beam-gridded measurement weights.

### 5.1. Gridded estimator noise

The gridded bispectrum estimator is formed from coherent addition of visibilities over *all* observations. As such, if a given visibility has thermal noise level *σ* _{therm} (Jy Hz)Footnote ^{h}, the uncertainty on the bispectrum is

where the denominator is the sum over the gridding kernel of the weights triplets.

The uncertainty on the normalised bispectrum is then:

where the uncertainties can contain both thermal noise and noise-like uncertainty from residual foregrounds.

For 300 2-min observations, and 24 triangles per 28 m baseline triad group, the expected thermal noise level for a complete dataset is

The presence of residual foregrounds will be studied in Section 10.2.

## 6. The direct estimator

As an alternate approach to the gridded estimator, visibilities are Fourier-transformed along the frequency direction to compute the delay transform, and closed bispectrum triangles formed from the closed redundant triads of antennas. This approach does not use the primary beam and ignores the local spatial correlations generated by the primary beam spatial taper. It also transforms along a dimension that changes angle with respect to *k* _{||} as a function of baseline length, but approximates a *k* _{||} Fourier Transform for small *u* (small angle).

The bispectrum for a given observation is the weighted average over all triads:

where *i* indexes over redundant triangles (triads). The final bispectrum estimate then performs a weighted average over observations, such that:

where

### 6.1. Direct Estimator noise

The direct bispectrum estimator is formed from coherent addition of baseline triplets for a given observation, which are then averaged with relative weights to the final estimate. As such, if a given visibility has thermal noise level *σ* _{therm} (Jy Hz), the uncertainty on the bispectrum is

The uncertainty on the normalised bispectrum is then given by the same expression as for the Gridded Estimator [Equation (15)].

For 300 observations, and 24 triangles per 28 m baseline triad group, the expected noise level is

## 7. Triangles considered for estimation

Unlike bispectrum estimates that can be obtained from Phase I data, where the array is in an imaging configuration with no redundant triangles, we aim to take advantage of the 72 redundant tiles in the hexagonal sub-arrays, afforded by the Phase II layout. This allows for both direct and gridded bispectrum estimators to be applied to matched observations with matched data calibration.

The most numerous (highest sensitivity) groups of redundant triangles are the *angularly*-equilateral configurations of the 14 m and 28 m baselines (48 and 24 sets, respectively). For these triangles, the equilateral configurations exist only for the *η* = 0 (*k* _{||} = 0) line-of-sight mode. Other configurations of these closed angular triangles are isosceles or irregular triangles, depending on the *η* values chosen; however, the closed triangle requirement of the bispectrum demands that:

in addition to the angular components of the vectors summing to zero (as is enforced by choosing the closed triangle baselines).

For comparison with theoretical predictions, we will focus on equilateral and isosceles triangles. The 14 m and 28 m baselines are very short, corresponding to cosmological scales of *k* _{⊥} ≃ 0.01*h*Mpc^{−1} at *z* = 9. Thus, although the equilateral configuration is cosmologically relevant and the easiest to interpret, these modes are expected to be heavily foreground dominated (i.e., they correspond to the line-of-sight DC mode, and the large angular scales of diffuse and point-source foreground emission). We consider them for completeness, but will show them to be cosmologically irrelevant from an observational perspective when computed this way. These same angularly-equilateral triangle configurations will, however, be used to form relevant isosceles configurations with *η* _{1} = *η* _{2} and *η* _{3} = −2 *η* _{1}. Given that we aim to sample modes where foregrounds are not dominant in our power spectra, these isosceles configurations form ‘stretched’ (also referred to ‘flattened’ in Watkinson et al. Reference Watkinson, Giri, Ross, Dixon, Iliev, Mellama and Pritchard2018) configurations (*k* _{||} *> > k* _{⊥}). Figure 2 shows how the stretched isosceles configurations are extracted from the data with a redundant baseline triad. Figure 3 then shows schematically the approximate vectors for two of the four isosceles configurations considered here.

## 8. Observations

The direct and gridded estimators are applied to 21.0 h of Phase II high-band zenith-pointed data, comprising 10.7 h (320 observations) on the EoR0 field (RA = 0 h, Dec. = −27°) and 10.3 h (309 observations) on the EoR1 field (RA = 4 h, Dec. = −30°). We observe 30.72 MHz in 384 contiguous 80 kHz channels, with a base frequency of 167.035 MHz. Approximately 15% of the observations were obtained from drift-scan data, where the telescope remains pointed at zenith for many hours and the sky drifts through. For consistency with the drift-n-shift data, we chose drift scan data observed with the field phase centres within 3° of zenith. The data were observed over 5 weeks from 2016 October 15 to November 28, and 1 week in 2017 July. Because the delay spectrum is used as part of the power spectrum estimator for the direct bispectrum, each observation was individually inspected for poor calibration or data quality, and bad observations excised from the dataset. The excised observations comprised ∼ 5% of the dataset, and primarily were due to poor calibration solutions over sets of data contiguous in time due to poor instrument conditions (e.g., many flagged tiles or spectral channels).

The 2-min observations were each calibrated through the MWA Real-Time System (RTS; Mitchell et al. 2008), as is routinely performed for MWA EoR data, and 1 000 of the brightest (apparent) sources peeled from the dataset (Jacobs et al. Reference Jacobs2016). These 629 calibrated and peeled observations were used for bispectrum estimation.

## 9. Results

We begin by reporting the bispectrum estimates for the two methods and fields, and then report the normalised bispectra, which incorporate the power spectrum estimates. Table 1 shows the bispectrum estimates and their one-sigma uncertainties (thermal noise) for the direct and gridded estimators, both observing fields and for different triangle configurations. Bold-faced results indicate bispectrum estimates that are consistent with thermal noise. These tend to be those that are extremely stretched isosceles configurations (cos *θ ∼* 1), with estimates that sit well outside the primary foreground contamination parts of parameter space. Conversely, the equilateral triangle configurations that use the *k* _{||} = 0 mode exclusively show extremely large detections. There is no suggestion that these are 21-cm cosmological bispectrum detections, but rather are foreground contaminants. This will be explored more fully in Section 10. Note also that the thermal noise levels reported here are a factor of a few larger than the theoretical expectation derived in Sections 5 and 6. This is due to the fraction of data with weights that are less than unity, indicating flagged baselines and spectral channels.

Also listed in Table 1 is the expected bispectrum values from simulations that assume either bright or faint galaxies drive reionisation (Greig & Mesinger Reference Greig and Mesinger2017). The largest amplitudes are for the smallest *k*-modes, which also tend to be more foreground dominated.

The normalised bispectrum, $\cal {B}$, is normalised by the power spectra at each of the *k* modes forming the triangles. Figure 4 shows the power spectra for the EoR1 and EoR0 fields for the full datasets as used in the gridded estimator. These have been processed through the CHIPS power spectrum estimator (Trott et al. Reference Trott and Wayth2016). Figures 5 and 6 show the corresponding delay spectra, as used in the direct bispectrum. There are small differences between the two power spectrum estimators, as is expected given that delay spectra do not grid with primary beams, and Fourier Transform along frequency, yielding different results for longer baselines. The signature of Galactic emission from close to the horizon is evident in the EoR0 power spectra, while it is less structured in EoR1, where the Galactic Centre has set. Most notably, the delay spectra show large foreground leakage into the EoR window (*k* _{||} *<* 0.4), yielding large power spectrum denominator values for the normalised bispectrum.

Using these data, Table 2 describes the normalised bispectrum. Bold-faced results are broadly consistent with thermal noise (*<* 5*σ*), again reflecting the modes that are least affected by foregrounds. The difference between the dimensional and reduced bispectrum results is due to the different power spectral estimators. Also notable is the difference in amplitude of the gridded and direct normalised bispectrum estimates. Due to the division by the power spectrum, the normalised bispectrum is heavily dependent on the details of the power spectrum estimates, which fluctuate substantially in foreground-affected regions. The delay-space power spectra show increased foreground power in the EoR window, and this is reflected in a larger power spectrum estimate, and therefore a lower normalised bispectrum. This reliance highlights the complexity for interpreting the normalised bispectrum with foreground-affected data.

## 10. Bispectrum signature of foregrounds

Estimates of bispectrum sensitivity for operational and future 21-cm experiments are incomplete without a treatment of foregrounds. Despite the expectation that point source, continuum foregrounds only impact a region of the three dimensional EoR parameter space (*k _{x}*,

*k*,

_{y}*k*

_{||}), in reality the details of the instruments, complexity of extragalactic and Galactic emission, limited bandwidth and calibration errors leave residual contaminating signal throughout the full parameter space. Although these methods perform very well to remove such signal, the extreme dynamic range demanded by this experiment translate to bias that exceeds the expected cosmological signal strength. The results presented here are clearly foreground-dominated, particularly for the equilateral triangle configuration.

As such, the bispectrum signature of foregrounds can be computed for a simple point-source foreground model. We first consider the expected foreground bispectrum, which quantifies the bias in the measurement, and then turn to the variance of the foreground bispectrum, which quantifies the additional noise term.

We employ a model where the sky is populated with a random distribution of unresolved extragalactic point sources that follow a low-frequency number counts distribution (Intema et al. Reference Intema, van Weeren, Röttgering and Lal2011; Franzen et al. Reference Franzen2016):

where *α* ≃ 3900 and *β* = −1.59 for sources with flux density at 150 MHz of less than 1 Jansky. We assume there is no source clustering and spectral dependence, yielding a Poisson-distributed number of sources in each differential sky area.

The clustering of point sources in the power spectrum has been studied by Murray, Trott, & Jordan (Reference Murray, Trott and Jordan2017). They find that source clustering will be unimportant for the MWA (unless the clustering is extreme, which is not measured), but may be important for the SKA, which can clean to deeper source levels. Nonetheless, the structure due to clustered point-source foregrounds only changes the amplitude of the foreground structure in the EoR wedge as a function of angular scale (*k* _{⊥}). Because the line-of-sight spectral component is unaffected, the signature of clustered foregrounds in the EoR Window is mostly unchanged. These more realistic point-source foregrounds will be considered in the simulations of Watkinson & Trott (Reference Watkinson and Trott2019) and here we retain the analytic signature of the Poisson foregrounds.

We further assume that the primary beam can be approximated by a frequency-dependent Gaussian:

where A_{eff} is the tile effective area and ϵ encodes the conversion from an Airy disc to a Gaussian.

The visibility is given by Equation (4) for frequency *υ*. To compute the line-of-sight component to the visibility, we Fourier Transform over frequency channels, after employing a frequency taper (window function) to reduce spectral leakage from the finite bandwidth:

whee γ(*υ*) is the spectral taper, Δ*υ*. is the channel resolution, and we have performed the Fourier Transform over frequency. For analytic tractability, in this work we use a Gaussian taper, with a characteristic width, Σ ≃BW/7, such that the edges of the band are consistent with zero and it is well-matched to a Blackman–Harris taper:

with corresponding Fourier Transform,

The bispectrum is formed from the triple product of visibilities. Accounting for the fact that the point sources are only correlated locally (δ* _{D}*(

*l*

_{1}+

*l*

_{2}+

*l*

_{3}= 0)), its expected value with respect to foregrounds is

where

Here, the source counts have been separated from the spatial integral. This is a general expression for a triplet of baselines. We can now simplify this for triangles, particularly those with isosceles configurations (where the equilateral is a single case of an isosceles).

Closed triangles follow the relations:

and we define, without loss of generality, the following relations for the isosceles configurations considered in this work:

Making these substitutions in Equation (29), completing the squares and collecting terms, we find:

The source count expectation value uses the source number counts distribution and the fact that the number of sources at any sky location is Poisson-distributed to find:

Incorporating the primary beam from Equation 24, moving to polar coordinates, and performing the integral over (*l*, *m*), we find for the expected foreground bispectrum bias:

where

and BW is the experiment bandwidth. This factor combines the primary beam (spatial taper) and spectral taper components into a single factor.

The equilateral configuration can be derived from this expression with *η* _{2} = 0. For the 28-m baselines, a maximum source flux density of 1 Jy and A_{eff} = 21 m^{2}, and performing the cosmological conversions, we expect a bispectrum estimate of:

which is comparable to the estimates found in Section 9. For 14-m baselines, *B*(*x* = 14) ≃ 1.0 ×10^{20}mK^{3}*h* ^{−6}Mpc^{6}.

The isosceles configurations incorporate the *η* term. For *k* _{||} > 0.1, this term decays to below the noise, which is consistent with that observed in the data. The signature of this isosceles foreground dimensional bispectrum in *k* _{⊥} - *k* _{||} space is shown in Figure 7. For the *k* _{1} = 0.1 *h* Mpc^{−1} stretched configuration, we expect for 28 m (14 m):

The squeezed configurations of large *k* _{⊥} combined with small *k* _{||} might be interesting for future studies, depending on the expected cosmological signal on these scales. Given that the power spectrum is expected to be small on these combination of line-of-sight and angular scales, most EoR experiments are not designed for high sensitivity here (*k* _{⊥} = 0.1 corresponds to 200-m baselines).

Interestingly, the expected foreground bispectrum signal is positive, due to its constituent astrophysical sources being associated with overdensities. Conversely, the stretched isosceles 21-cm bispectrum from the cosmological signal will be negative on many scales during reionisation (Majumdar et al. Reference Majumdar, Pritchard, Mondal, Watkinson, Bharadwaj and Mellema2018).

### 10.1. Normalised foreground bispectrum

The normalised bispectrum also contains the expected power spectrum values for a foreground model. In line with the methodology developed in the previous section, we can write the expected power spectrum at (*u*, *v*, *η*) as

where

encode the spatial and spectral tapers, and |*x*|^{2} = *x* ^{2} + *y* ^{2} (without loss of generality). This expression is derived from the Fourier Transform over Gaussians, and then the integral over *dldm*.Footnote ^{i}

When *η* = 0 and for the 28-m baseline triangles, the expected bispectrum normalisation is

For the 14-m triangles, we find $\sqrt {P{{(u,v,\eta )}^3}} /V = 3.3 \times {10^{21}}{\kern 1pt} {\kern 1pt} {\rm{m}}{\rm{K}^3}{h^{ - 6}}{\rm{M}}p{c^6}$. When compared with the expected bispectrum value, we find that (28 m):

and $\left\langle \cal B \right\rangle = 0.6$ for the 14-m baselines, which exceed the equilateral triangle configuration estimates from the MWA data. As with the bispectrum estimate, the isosceles configurations have expected power values that fall rapidly with *η*, and are less comparable to the data in these idealised scenarios. However, for the *k* _{1} = 0.1*h* Mpc ^{−1} stretched configuration, we expect for 28 m (14 m):

These values are ratios of very small numbers, and therefore are highly dependent on numerical details and are not representative. However, they may lend support to the idea that the normalised bispectrum is difficult to interpret because it relies on foreground details in both the bispectrum and power spectrum.

Alternatively, the combination of power spectrum, dimensional bispectrum and normalised bispectrum may help to shed additional light on whether data are really foreground free. Given the different behaviour of foregrounds in these statistics, this information may be used to discriminate cosmological information from foregrounds, or to help to design some iterative foreground cleaning algorithm, taking into consideration their behaviour in cosmological simulations. In this scenario, the normalised bispectrum may provide useful information.

Figure 8 displays the normalised foreground bispectrum for isosceles configurations in *k* _{⊥} − *k* _{||} space. For the point-source foregrounds, the power spectrum denominator dominates over the expected bispectrum signal, yielding values *<* 10^{−3} across all of parameter space. This presents an interesting divergence from the usual expectation of foreground bias in the power spectrum, where foregrounds add overall signal. In this case, a large measurement that exceeds the thermal noise is consistent with a cosmological origin, and not with residual foregrounds.

### 10.2. Foreground bispectrum error

We now turn our attention to consideration of the signal variance due to residual foregrounds, $\langle B_{{\rm{F}}G}^2\rangle $, such that [cf., Equation (14)]:

and

This reduces to a relatively simple expression for the simple point-source case, due to the cancelling of complex components (this is not generally true for the covariance). Using the same formalism as earlier, and again considering the Poisson-distributed nature of the flux density of the sources, we find:

where $\vec \Upsilon \equiv \Upsilon ({\nu _1})\Upsilon ({\nu _2})\Upsilon ({\nu _3})\Upsilon ({\nu _4})\Upsilon ({\nu _5})\Upsilon ({\nu _6})$. This expression is flat in angular modes, and decays rapidly in line-of-sight modes.

Comparing this with the expected value of the foreground bispectrum, Equation (37), we can form the foreground bispectrum signal-to-error ratio (Figure 9). The signal bias exceeds the uncertainty for small scales, but on the larger scales of interest for EoR, the uncertainty dominates. Nonetheless, there is no line-of-sight dependence, demonstrating that the foreground bias and uncertainty both drop rapidly and are negligible for *k* _{||} > 0.12*h*Mpc^{−1}, implying that for larger *k* _{||} scales, point-source foregrounds are not significant in the signal or noise budget.

One can also now compare the foreground uncertainty to the expected thermal noise level. For the EoR1 field data, the measured uncertainty for the direct bispectrum estimator was 6.7 × 10^{12} mK^{3} Mpc^{6}. Figure 10 shows this level (green line) compared with the foreground bispectrum error (red line) as a function of line-of-sight scale. (The gridded estimator has slightly lower noise level, but the distinction is not significant when compared to the large gradient of the foreground contribution.) As with the foreground bias, the error induced by residual foregrounds drops steeply beyond *k* _{||} = 0.12*h*Mpc^{−1}, and falls below the thermal noise (even in this case with a small dataset for the EoR1 field).

As a final step to assessing the advantages of the bispectrum compared with the power spectrum to detect the cosmological signal, we divide the expected 21-cm bispectrum values for the faint and bright galaxies, presented in Table 1 by the foreground error for these modes, and compare with that for the power spectrum. The power spectrum values for faint and bright galaxies are taken from the same underlying dataset generated by 21-cm FAST (Mesinger, Furlanetto, & Cen Reference Mesinger, Furlanetto and Cen2011). For all but the *k* _{1} = 0.1*h*Mpc^{-1} mode, the foreground uncertainty is negligible, and the ratio is uninteresting. For *k* _{1} = 0.1*h*Mpc^{-1}, *k* _{2} = *k* _{3} = 0.2*h*Mpc^{-1}:

yielding better performance for the power spectrum, within the foreground dominated region.

However, outside of the foreground ‘wedge’, which exists in both power spectrum and bispectrum space, the data uncertainty is limited by the thermal noise, and here the bispectrum achieves higher signal-to-noise ratio for a set observation time:

Taking the ratios we find,

*Accounting for the fact that the gridded bispectrum averages down with t* ^{1.5} *while the gridded power spectrum averages with t*, the observing time multiple (above 10 h) for a detection (SNR = 1) is

Therefore, the bispectrum detection can theoretically be achieved in a fraction of the time of the power spectrum detection, for thermal noise-limited modes close to the EoR wedge. An SNR = 1 detection level could potentially be reached in 250 h, *for this wave mode.* This conclusion is relevant for the MWA, where the excellent instantaneous *uv*-coverage allows for rapid observation of triangle configurations. [We note that the power spectrum SNR shown here is not inconsistent with previous expectations for the performance of the MWA, because it applies only to this single mode (Beardsley et al. Reference Beardsley2016; Wayth et al. Reference Wayth2018)]. Future work presented in Watkinson & Trott (Reference Watkinson and Trott2019) will explore a more full range of triangle configurations and foreground bias and error.

## 11. Discussion and conclusions

As discussed, the model used for the foreground bispectrum signal predicts that isosceles configurations have amplitudes that fall rapidly with non-zero *k* _{||}. Nevertheless, we find that the numerator and denominator of the normalised bispectrum scale such that its amplitude in this model increases as a function of *η*. For the 28-m baselines, the ratio doubles by *k* _{||} > 0.014*h* Mpc ^{−1}. Over the same *k* range, the dimensional bispectrum is rapidly decaying, losing seven orders of magnitude from the *k* _{||} = 0 mode. In this model, the expected foreground signal has fallen to below the expected cosmological signal value by *k* _{||} ≥ 0.12. The results from the MWA datasets have some modes thermal noise-limited at 10 h, and only the *k* _{||} = 0 mode is clearly foreground dominated for all experiments. In line with the discussion of Bharadwaj & Pandey (Reference Bharadwaj and Pandey2005), it is possible that the dimensional bispectrum is less affected by foregrounds than the power spectrum. However, the normalised bispectrum is more difficult to interpret, given the different observational foreground effects on the bispectrum numerator and power spectra denominator.

Reduction of foreground contamination is an active field of research in 21-cm EoR experiments, and primary motivator for testing statistics other than the power spectrum. Despite the normalised bispectrum providing a cosmologically stable and robust estimate of non-Gaussianity compared with the dimensional bispectrum, the expected foreground value is difficult to discriminate from the expected signal value (Watkinson et al. Reference Watkinson, Giri, Ross, Dixon, Iliev, Mellama and Pritchard2018). Conversely, the dimensional bispectrum yields values that are highly significant detections, showing clear foreground contamination. Thus, the normalised bispectrum may not be the best discriminant in real EoR experiments. For future experiments, with higher sensitivity, exploration of modes with negative bispectra may help discrimination from foreground contamination, where the bispectrum is expected to be positive This is explored further in Watkinson & Trott (Reference Watkinson and Trott2019) and demonstrated previously in Lewis (Reference Lewis2011). It would also be interesting to study the signature of calibration errors in radio data on bispectrum estimates, to explore whether they have an imprint that can be discriminated from the cosmological signal.

The thermal noise levels, as are achieved in these 10 h datasets for large *k* isosceles configurations, are 3–4 orders of magnitude larger than the expected bispectrum value for these configurations at low redshifts (Majumdar et al. Reference Majumdar, Pritchard, Mondal, Watkinson, Bharadwaj and Mellema2018). The gridded bispectrum noise scales with observation time to the power of 1.5, requiring a 1 000 h observation with the MWA to achieve a cosmological detection. This estimate is in line with predictions from Yoshiura et al. (Reference Yoshiura, Shimabukuro, Takahashi, Momose, Nakanishi and Imai2015). Further advantage may be gained from incoherent addition of isosceles triangle configurations with similar vector lengths, where the bispectrum is expected to vary slowly with changing parameters. An initial test of this for the *k* _{1} = 0.1*h* Mpc^{−1} mode shows an improvement in sensitivity by a factor of ten for the gridded estimator, yielding a theoretical detection of the signal with 150 h of data. The direct estimator scales incoherently with time (*t* ^{0.75}), due to the incoherent addition of triangles from different observations, but does utilise coherent addition of instantaneously-redundant triads.

We have presented the first effort to estimate the cosmological bispectrum from the EoR with 21 h of MWA data, and have shown the parts of parameter space that are consistent with thermal noise at this level, using two types of bispectrum estimator. These two approaches are presented in order to demonstrate the practicalities of estimation of the bispectrum with real radio interferometer data. We have also derived a form for the expected bispectrum signature of point-source foregrounds for equilateral and isosceles configurations, and demonstrated broad consistency between the analytic model and the estimates obtained from the data.

By considering the foreground bispectrum variance in the noise estimation, we have demonstrated that both the foreground bias and variance are insignificant for *k* _{||} *<* 0.12*h* Mpc^{−1}, allowing these regions of parameter space to be probed with dominant thermal noise. Due to the ability of the gridded bispectrum estimator to reduce thermal noise proportional to *t* ^{1.5}, unlike the power spectrum which reduces with *t*, the 21-cm cosmological bispectrum may be detectable with fewer observing hours than the power spectrum for arrays with excellent instantaneous *uv-* coverage (i.e., with well-sampled baselines). This insight makes observational pursuit of the bispectrum worthwhile for some current instruments.

In a companion paper (Watkinson & Trott Reference Watkinson and Trott2019), we explore optimal triangles to study from a signal and foreground contamination ratio perspective. Future work can also study the signature of calibration errors on the bispectrum. This work helps to define the optimal observational strategy and approach to bispectrum studies.

## Acknowledgements

This research was supported by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013. CMT is supported by an ARC Future Fellowship under grant FT180100196. CAW and SM acknowledge financial support from the European Research Council under ERC grant number 638743-FIRSTDAWN (held by Jonathan R. Pritchard). KT’s work is partially supported by Grand-in-Aid from the Ministry of Education, Culture, Sports, and Science and Technology (MEXT) of Japan, No. 15H05896, 16H05999 and 17H01110, and Bilateral Joint Research Projects of JSPS. The International Centre for Radio Astronomy Research (ICRAR) is a Joint Venture of Curtin University and The University of Western Australia, funded by the Western Australian State government. The MWA Phase II upgrade project was supported by Australian Research Council LIEF grant LE160100031 and the Dunlap Institute for Astronomy and Astrophysics at the University of Toronto. This scientific work makes use of the Murchison Radio-astronomy Observatory, operated by CSIRO. We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site. Support for the operation of the MWA is provided by the Australian Government (NCRIS), under a contract to Curtin University administered by Astronomy Australia Limited. We acknowledge the Pawsey Supercomputing Centre which is supported by the Western Australian and Australian Governments.