The Murchison Widefield Array (MWA; Tingay et al. Reference Tingay2013; Wayth et al. Reference Wayth2018) is one of the low-frequency precursors of the Square Kilometer Array (SKAFootnote a). It is located at the Murchison Radio-astronomy Observatory (MRO), the future site of the low-frequency component of the SKA (SKA-Low). The MWA began its operations in 2013 and since then has recorded over 34 petabytes of visibility (output from the MWA correlator) and voltage capture system (Tremblay et al. Reference Tremblay2015) data, which are archived in the Pawsey Supercomputing Centre (PSC). Based on the MWA Data Access PolicyFootnote b, the data become publicly available 18 months after collection or immediately after collection for the members of the MWA collaboration. As a consequence, most of the archival visibility data (approximately 19.3 PB, representing 79% of the total) are now (as of March 2020) publicly available. Public data have been available via the MWA All-Sky Virtual Observatory (MWA ASVOFootnote c) since its initial pilot release in 2017. The pilot MWA ASVO interface enabled users to download raw MWA data in standard radio astronomy data formats such as CASA measurement sets (McMullin et al. Reference McMullin, Waters, Schiebel, Young, Golap, Shaw, Hill and Bell2007) or UV FITS files (Greisen Reference Greisen2019). The data sets returned to the end user are flagged for radio-frequency interference (RFI) using aoflagger software (Offringa et al. Reference Offringa, van de Gronde and Roerdink2012; Offringa et al. Reference Offringa2015) and averaged in time and frequency according to the requirements specified in the web interface or request file. However, these data are not calibrated and require several additional steps before sky images could be formed, which could have been difficult for users not familiar with MWA data calibration. 
Here, we present a calibration extension of the MWA ASVO, opening a new avenue for any researcher worldwide without deep knowledge of the details of the MWA instrument and data processing to download calibrated visibility data in the aforementioned formats.
The calibration database (CALDB) has been populated with calibration solutions from the entire history of MWA observations. To achieve this, we developed a dedicated calibration pipeline. The newly collected calibration observations are automatically processed in near-real time, and the resulting calibration solutions are uploaded to the CALDB. This enables us to monitor the stability of their phase and amplitude components, that is, the interferometric performance of the MWA, which allows the MWA Operations team to identify problems which may go undetected by other components of the Monitor and Control (M&C) softwareFootnote d.
This paper is organised as follows. In Section 2, we present the structure of the CALDB. In Section 3, we describe software pipelines developed to populate the database with calibration solutions derived from archived calibrator observations since 2013, and a near-real-time version of this pipeline which is used to process new calibrator observations (collected just after sunset and before sunrise). We also present a system developed for handling requests for missing calibration solutions, and web services developed to download calibration solutions for a specific observation via web browser or command line (wget command). In Section 4, we describe the applications of the CALDB, such as the MWA ASVO, monitoring of the interferometric performance of the MWA, providing calibration solutions to the new MWA correlator, and potential future applications for transient and ionosphere monitoring. Finally, in Section 5, we make summarising remarks and discuss the importance of these developments in the context of the future SKA-Low telescope.
2. Database of calibration solutions
The overview of the current MWA ASVO system is shown in Figure 1. The newly added calibration component consists of the CALDB, and several scripts and pipelines for: populating this database with calibration solutions, accessing the database, and applying solutions to uncalibrated data downloaded from the MWA archive at the PSC.
2.1. Database structure
The CALDB has been implemented as a PostgreSQL databaseFootnote e, which is an advanced open-source relational database and has also been used for storing other M&C MWA data. Presently, the MWA can record up to 30.72 MHz of bandwidth split into 24 coarse channels of 1.28 MHz each. Figure 2 shows the phase of calibration solutions in the frequency range 70–230 MHz computed from Pictor A observations recorded between 10:32 and 10:48 UTC on 2020 March 24. This figure shows that phase as a function of frequency is very well modelled by a linear function, as unaccounted delays (due to cables or fibres) are the main contributors to the MWA calibration terms. Therefore, fitted parameters of a linear function provide a compact and efficient way of storing the phase of calibration solutions, which is also robust against any ripples caused by reflections in cables (e.g., Tile 051 in Figure 2 or Figure 4) or inaccuracy of the sky model used in the calibration process. This way, four double precision values (two for each polarisation) are preserved for each tileFootnote f instead of two times the number of fine channels (typically 768). Amplitude, on the other hand, can have complicated frequency structure related to the MWA tile bandpasses. However, locally (in a sufficiently narrow frequency band), it can also be approximated by a linear function, and the natural choice of the narrow band for the linear fit is the 1.28 MHz MWA coarse channel. Therefore, the database was designed to store parameters of low-order (first-order) polynomials fitted to amplitudes and phases of calibration solutions as a function of frequency. The original calibration solutions are stored on a hard drive and are not inserted into the database in order to keep the database compact. We envisage that this approach will likely continue in the future. The CALDB is a part of the MWA M&C schema and consists of the following three tables (Figure 3):
calibration_fits: it provides versioning of calibration solutions stored in the database. It enables uploading newer (possibly better) calibration solutions without the necessity of removing the older versions from the database. The table contains its own unique identifier (fitid field), a reference to observation identifier (obsid field), and a timestamp of calibration solution (fit_time field).
calibration_solutions: each record of this table contains information about calibration solutions for both polarisations (X and Y) of a single MWA antenna. Besides references to calibration_fits record (fitid field) and the observation identification field (obsid), it contains calibration fields for both polarisations with names differing by prefix in field names (x_ or y_). Slope (fields x_delay_m and y_delay_m) and intercept (fields x_intercept and y_intercept) fitted to the phase of calibration solutions are used to describe the phase of the calibration solution over the entire 30.72 MHz of the MWA’s instantaneous bandwidth, which can be either continuous or non-continuous (the latter is commonly called ‘picket fence’ mode). The fitted slopes are converted to length units using speed of light in vacuum and these values are stored in the database. First-order polynomial fits (thus two database fields) are sufficient to accurately fit the phase over the full observing band provided an accurate sky model is used in the calibration process (this will be described in Section 3.2), which was verified during development and testing. Amplitudes of calibration solutions were also fitted with a linear function, but in this case the fit was performed over every 1.28 MHz coarse channel. Therefore, they are stored as two arrays (for X and Y polarisations) of real values (typically 24, but the arrays are of variable size). This table also contains quality flags, which are real values in [0, 1] range (x_phase_fit_quality and x_gains_fit_quality for X polarisation). These flags are calculated as ratios of the number of ‘good’ channels, where the difference between the original value of a calibration solution (either phase or amplitude) and the fitted curve is smaller than five standard deviations ( $5\sigma$ ), to the number of all channels. Ratio values above 0.6 are considered good quality calibration solutions.
calsolution_request: it enables requests for missing calibration solutions. If the user requests calibrated data which do not have corresponding calibration solutions for the same frequency channels within 12 h in the CALDB, a new request record is inserted into the table calsolution_request (if it is not already present there). Then an automatic script finds all the new records in this table, identifies corresponding calibration observations in the main MWA database, calibrates them, and uploads resulting calibration solutions into the CALDB. If the appropriate calibration observations cannot be found or the calibration procedure fails, the error message is stored in the field error and can be returned to the end user.
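As an illustration of how this compact representation can be expanded again, the following minimal sketch evaluates the phase at an arbitrary frequency from a stored delay length and intercept (such as x_delay_m and x_intercept). The function name, the assumption that the intercept is expressed in degrees, and the sign convention of the phase are illustrative choices and are not taken from the CALDB implementation.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def phase_from_fit(freq_hz, delay_m, intercept_deg):
    """Evaluate the linear phase model at freq_hz, wrapped to [-180, 180) degrees.

    Assumptions (not confirmed by the text): the intercept is in degrees and
    the model is phase = 360 * frequency * delay + intercept.
    """
    delay_s = delay_m / SPEED_OF_LIGHT_M_S      # stored length back to a time delay
    phase_deg = 360.0 * freq_hz * delay_s + intercept_deg
    return (phase_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
```

Evaluating this function at the centre of every fine channel would recover a per-channel phase from only two stored numbers per polarisation.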
2.2. Present status of the database
Currently (as of March 2020), the database contains calibration solutions from around 11 200 calibration observations, which provides, on average, five calibration solutions per day; one for each of the primary MWA frequency bands (at centre frequencies of 88.32, 119.04, 154.88, 185.6, and 216.32 MHz). The database grows every day as new calibrator observations are collected and the near-real-time pipeline calibrates them and inserts calibration solutions into the database.
In order to populate the database with the historic and new calibration solutions, we developed a dedicated reduction pipeline, described in Section 3.
3. Automatic calibration pipelines
Originally, the pipeline used CASA software to calibrate calibrator observations in near-real time and create control images of the calibrator observations. In order to calibrate many archival calibrator observations, we developed a new pipeline using software more suited to the MWA observations. We are upgrading the current contents of the database with calibration solutions from the new pipeline in order to create a uniform database of calibration solutions resulting from the same data reduction pipeline, software, and sky model. Both pipelines use the MWA ASVO interface to download uncalibrated CASA measurement sets of calibrator observations, which are produced on the MWA ASVO servers. As described in Section 3.5, the cotter program is used in the conversion process, which implies that RFI flagging is also applied at this stage.
3.1. Near-real-time calibration of new calibration observations
The CASA-based pipeline has been used to reduce newly collected MWA calibrator observations. Every day, the MWA observes a calibrator source shortly after sunset and before sunrise at five standard frequency bands (at centre frequencies of 88.32, 119.04, 154.88, 185.6, and 216.32 MHz) and in the so-called ‘picket fence’ mode with 24 coarse channels spread regularly over the frequency range 78–240 MHz. The calibrator script runs continuously on one of the MWA servers and checks the MWA schedule database for new calibrator observations. Whenever it detects that new calibrator observations were collected, they are automatically downloaded and calibrated, and control images of the calibrator field are formed at selected frequencies. Presently, the observations are downloaded from the PSC and processed on a server at Curtin Institute of Radio Astronomy (CIRA), which introduces additional delay. In the future, this processing will be relocated to the MRO, and the pipeline will use ‘raw’ visibility files as they are produced by the MWA correlator. If the quality of the resulting calibration solutions satisfies minimum requirements, they are uploaded to the CALDB. The requirements for the new calibration solutions to be loaded to the database are the following: (i) they must be better than the ones already in the database (if there are any for this obsid) and (ii) more than half of the antennas must have acceptable calibration solutions, where ‘acceptable’ means that more than 60% of the channels have a phase fit within $5\sigma$ from the data (this criterion is used to avoid storing calibration solutions of very low quality). These near-real-time calibration solutions are used for monitoring of the interferometric performance and enable us to examine the long-term stability of the MWA (Section 4).
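The two upload requirements can be sketched as a simple decision function. This is only an illustration: the text does not define how ‘better’ is measured, so the sketch assumes that a solution is better when more antennas have an acceptable phase fit. The function names and the list-based inputs are hypothetical.

```python
def count_acceptable(phase_fit_qualities, threshold=0.6):
    """Count antennas whose phase-fit quality ratio exceeds the threshold."""
    return sum(1 for quality in phase_fit_qualities if quality > threshold)


def should_upload(new_qualities, existing_qualities=None, threshold=0.6):
    """Decide whether a new calibration solution should be uploaded.

    new_qualities / existing_qualities: per-antenna phase-fit quality ratios
    in [0, 1]. Comparing the number of acceptable antennas is an assumed
    definition of 'better', not the pipeline's documented rule.
    """
    if not new_qualities:
        return False
    n_good = count_acceptable(new_qualities, threshold)
    # Requirement (ii): more than half of the antennas must be acceptable.
    if n_good <= len(new_qualities) / 2:
        return False
    # Requirement (i): must improve on what is already stored for this obsid.
    if existing_qualities is None:
        return True
    return n_good > count_acceptable(existing_qualities, threshold)
```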
3.2. Calibration solutions of archived data
The CALDB has been populated with calibration solutions starting from the beginning of 2013. In order to achieve this, we created a list of all calibrator observations and submitted them for processing by the calibration reduction pipeline HeraclesFootnote g. The pipeline uses the calibrate software (Offringa et al. Reference Offringa2016) upgraded with the newest 2016 MWA beam model (Sokolowski et al. Reference Sokolowski2017) and a sky model generated by PUMAFootnote h (Line Reference Line2018). It creates binary files with calibration solutions and control images of the field using the WSCLEANFootnote i program (Offringa et al. Reference Offringa2014).
3.2.1. HERACLES pipeline
Our initial attempt at calibrating the MWA archive used a single virtual machine on the cloud-based system ‘Nimbus’ hosted by the PSC. However, the computing resources afforded by this system quickly proved to be insufficient for our task. For this reason, heracles was converted into a generalised and distributed system, which could be run on any free resources available (such as unused desktop computers in CIRA).
heracles is primarily used through a single executable which has two modes of operation: server and client. The server mode is primarily concerned with which observations need to be calibrated, based on a SQLite database. This database tracks which observations have not yet been calibrated, which have been calibrated, and which have failed. The heracles server must also coordinate with the state of any MWA ASVO data downloads (available, in progress, not available, etc.). So as not to flood the MWA ASVO with download requests, the heracles server uses a runtime setting to prepare a certain number of observations for download as the clients progress.
Once connected to a server, the operation of a heracles client follows a simple loop:
(i) request an observation to calibrate. If an observation is ready, move to step (ii), otherwise, wait for 1 min before querying the server again;
(ii) download the observation;
(iii) operate upon the data with an executable (set at runtime, typically a bash script);
(iv) if the result of the executable was a success, the calibration solutions and any other useful products are transmitted to the server. Return to step (i); and
(v) if any failure occurs in the loop, it is also reported to the server, before returning to step (i).
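The loop above can be sketched as follows. This is not the heracles source code: the server object and its methods (next_observation, report_success, report_failure) are hypothetical stand-ins for the client-server protocol, and the download and process callables stand in for the MWA ASVO download and the runtime-configured executable.

```python
import time


def run_client(server, download, process, max_cycles,
               poll_seconds=60.0, sleep=time.sleep):
    """Run a fixed number of request/download/process/report cycles.

    server   -- hypothetical object exposing next_observation(),
                report_success(obsid, products), report_failure(obsid, msg)
    download -- callable fetching the data for an observation ID
    process  -- callable standing in for the executable; raises on failure
    """
    for _ in range(max_cycles):
        obsid = server.next_observation()            # step (i)
        if obsid is None:
            sleep(poll_seconds)                      # nothing ready: wait 1 min
            continue
        try:
            data = download(obsid)                   # step (ii)
            products = process(data)                 # step (iii)
        except Exception as exc:
            server.report_failure(obsid, str(exc))   # step (v)
            continue
        server.report_success(obsid, products)       # step (iv)
```

Because each cycle is independent, a client written this way can be stopped after any cycle without leaving the server in an inconsistent state, which is in line with clients being enabled and disabled dynamically.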
As the computational load of the server is negligible, clients may also be run on the same computer as the server.
The mode of operation of heracles clients allows users to enable or disable clients dynamically, which allows otherwise unused computing resources to be utilised, and proved to be an effective, efficient solution for calibrating a large volume of data. Within a few months, we were able to download, calibrate, and image observations and insert solutions into the CALDB from nearly 6 yr of MWA operations.
3.2.2. CASA pipeline
Originally, the CASA-based pipeline was used to calibrate evening and morning calibration scans and store the calibration solutions in the database. This pipeline used VLA Low-Frequency Sky Survey Redux [VLSSr; Lane et al. (Reference Lane, Cotton, van Velzen, Clarke, Kassim, Helmboldt, Lazio and Cohen2014)] images of calibrator sources (such as Hydra A, 3C444, Hercules A, and Pictor A) to derive calibration solutions. Following the creation of the new heracles pipeline, the CASA pipeline is being retired and the calibration solutions in the database are being superseded with the results from the new pipeline.
3.3. Uploading calibration solutions to the database
Phase and amplitude of calibration solutions resulting from the reduction pipeline are fitted with the first-order polynomial as a function of frequency. Figure 2 shows that phase of calibration solutions is a linear function of frequency over a very wide band. The lowest and highest four 40 kHz fine channels (160 kHz) in each coarse channel as well as the fine channels flagged due to RFI (during the conversion process) are excluded from the fitting.
First, the phase of calibration solutions over the 30.72 MHz band is ‘unwrapped’; phase values are not limited to $[-180,+180]$ degrees, but can range from minus infinity to plus infinity. Then the phase is fitted with a linear function over the entire observing band (30.72 MHz in continuous observations) resulting in two fit parameters: slope and intercept (right column in Figure 4). These are sufficient to accurately describe the phase of calibration solutions as a function of frequency, provided that the sky model used in the calibration process is complete (this was verified in the development and testing stage). The fitted slope is converted to a corresponding time delay ( $\Delta t$ ) and eventually the length $c \Delta t$ (in metres), where c is speed of light in vacuum, is saved to the CALDB.
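The unwrapping, linear fit, and conversion to a length can be illustrated with a short sketch. It uses an ordinary least-squares fit written with the Python standard library only; the function names are hypothetical and phases are assumed to be in degrees.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def unwrap_deg(phases_deg):
    """Remove 360-degree jumps between consecutive phase samples."""
    unwrapped = [phases_deg[0]]
    for phase in phases_deg[1:]:
        step = phase - unwrapped[-1]
        step -= 360.0 * round(step / 360.0)  # bring the step into [-180, 180]
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def phase_fit_to_delay(freqs_hz, phases_deg):
    """Return (delay length in metres, intercept in degrees)."""
    slope_deg_per_hz, intercept_deg = fit_line(freqs_hz, unwrap_deg(phases_deg))
    delay_s = slope_deg_per_hz / 360.0  # 360 degrees per Hz corresponds to 1 s
    return delay_s * SPEED_OF_LIGHT_M_S, intercept_deg
```

Since the unwrapping starts from the first (wrapped) sample, the fitted intercept in this sketch is only defined up to a multiple of 360°.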
The amplitude of calibration solutions is also fitted with the first-order polynomial and in this case the fit is performed over a single 1.28 MHz coarse channel, resulting in different slopes and intercepts for each of the 24 coarse channels (left column in Figure 4). It was verified that linear fit is the optimal polynomial order to fit amplitudes over an MWA coarse channel as a parabola had only slightly lower $\chi^2$ values and nearly two times higher Bayesian information criterion value (Schwarz Reference Schwarz1978), which proved that the linear fit is a more appropriate representation of data than the parabola.
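A minimal sketch of the per-coarse-channel amplitude fit is given below. It assumes 32 fine channels of 40 kHz per 1.28 MHz coarse channel and excludes the four lowest and four highest fine channels of each coarse channel, as well as any flagged channels, before fitting. The function names and the representation of flags as a list of booleans are illustrative assumptions.

```python
FINE_PER_COARSE = 32  # 1.28 MHz coarse channel / 40 kHz fine channels
EDGE_CHANNELS = 4     # fine channels excluded at each coarse-channel edge


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_amplitudes(freqs_hz, amplitudes, flagged=None):
    """Fit one straight line per coarse channel.

    Returns a list with one (slope, intercept) tuple per coarse channel,
    or None where fewer than two usable fine channels remain.
    """
    fits = []
    for start in range(0, len(amplitudes), FINE_PER_COARSE):
        stop = min(start + FINE_PER_COARSE, len(amplitudes))
        keep = [i for i in range(start + EDGE_CHANNELS, stop - EDGE_CHANNELS)
                if not (flagged and flagged[i])]
        if len(keep) < 2:
            fits.append(None)
            continue
        fits.append(fit_line([freqs_hz[i] for i in keep],
                             [amplitudes[i] for i in keep]))
    return fits
```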
If the fit satisfies quality requirements, the resulting fitted parameters are stored in the database. The current quality requirement is that the ratio of the number of good-quality channels to the total number of channels in the calibrated observation is above 0.6 (Figure 5).
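The quality ratio can be sketched as below, using the definition of a ‘good’ channel from Section 2.1 (residual smaller than $5\sigma$). How $\sigma$ is estimated is not specified in the text, so the sketch takes it as an input supplied by the caller; the function name is hypothetical.

```python
def fit_quality(values, fitted, sigma, n_sigma=5.0):
    """Fraction of channels whose residual from the fit is below n_sigma * sigma.

    values -- measured phase or amplitude per channel
    fitted -- value of the fitted line at the same channels
    sigma  -- standard deviation used for the cut (assumed to be provided)
    """
    n_good = sum(1 for value, model in zip(values, fitted)
                 if abs(value - model) < n_sigma * sigma)
    return n_good / len(values)
```

A fit would then be stored only if the returned ratio is above 0.6.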
3.4. Accessing calibration solutions in the database
The calibration solutions in the database can be accessed via a web service with a standard wget commandFootnote j. The request is executed on an MWA server, and if the appropriate (the same frequency band and within 24 h from the target observation) calibration solution exists in the database, it is returned to the user in the same binary file format (‘.bin’ file) as produced by the calibration procedure developed by Offringa et al. (Reference Offringa2016). If there is no suitable calibration solution, an error message stored in a text file is returned to the end user.
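The matching rule can be illustrated with a small sketch. It assumes that the most appropriate solution is the one closest in time among those with the same frequency channels and within the allowed time window; the record layout (dictionaries with time_s, channels, and fitid keys) and the function name are hypothetical and do not reflect the actual web service.

```python
def select_solution(target_time_s, target_channels, candidates,
                    max_gap_s=24 * 3600):
    """Pick the candidate nearest in time with matching frequency channels.

    candidates -- iterable of dicts with 'time_s', 'channels' and 'fitid'
                  keys (an assumed layout for illustration)
    Returns the chosen dict, or None when no candidate is suitable.
    """
    matches = [c for c in candidates
               if tuple(c["channels"]) == tuple(target_channels)
               and abs(c["time_s"] - target_time_s) <= max_gap_s]
    if not matches:
        return None  # the service would return an error message in this case
    return min(matches, key=lambda c: abs(c["time_s"] - target_time_s))
```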
3.5. Application of calibration solutions to data downloaded from the MWA ASVO
The MWA ASVO website and APIFootnote k allow users to submit ‘conversion’ jobs which, when run, retrieve the observation, pre-process the data, converting the raw MWA correlator visibility format into a standard CASA or UV fits format, and then make the data product available for download. The conversion/pre-processing steps expose the options available in the cotter pre-processing pipeline (Offringa et al. Reference Offringa2015). The MWA ASVO calibration option utilises a recently added feature of cotter, allowing calibration solutions retrieved from the CALDB to be applied to the data before any data averaging takes place.
In a typical conversion job with the calibration option set, the requested observation is staged from the Pawsey Long-Term Archive (LTA). The LTA has a hierarchical storage management system consisting of several different tiers of storage, ranging from 1.5 PB of spinning disc to an allocation of 40 PB of magnetic tape. Once the observation data are available on the disc cache, they are copied to a scratch area. A web service call is made to retrieve the metafits file that contains much of the metadata associated with the requested observation—this is also stored with the observation files in the scratch area.
The calibration web service is called by the MWA ASVO to retrieve the best calibration solution for this observation (Section 3.4). If the solution is found, then the calibration solution binary file is retrieved and stored with the rest of the observation files in the scratch area. If no calibration solution is suitable, then the job fails and a request record is added to the CALDB to produce a solution for this observation. The user is informed to try again once this is complete (usually within 24–48 h).
With all of the files now available, the server then executes cotter, with the ‘–full-apply’ command line argument, which applies the provided calibration solution to every integration before averaging (if requested). There is also another new cotter option ‘–apply’ which applies provided calibration solutions after averaging integrations over a requested interval. Once cotter has produced the output data in a standard radio astronomy data format, a download URL is provided to the user via the website or API so the data can be retrieved. During the conversion process, RFI flags (either pre-computed by aoflagger or calculated by cotter) are also applied to the data. Hence, the resulting data sets do not require any further pre-processing, and initial sky images can be formed using standard radio astronomy software tools, such as WSCLEAN, CASA, or MIRIAD (Sault, Teuben, & Wright Reference Sault, Teuben and Wright1995). These initial images can be used in a self-calibration procedure in order to improve calibration solutions, and/or further processing steps, such as primary beamFootnote l or ionospheric correctionsFootnote m, may be applied depending on the requirements of the specific science case.
4. Other applications of the calibration pipelines and database
Besides the main application of the CALDB, which is to enable downloading of calibrated data in standard astronomy data formats via the MWA ASVO interface, there are several other benefits of having a complete database of calibration solutions, which will be described in this section.
4.1. Monitoring performance and stability of the MWA telescope
The near-real-time pipeline reduces daily calibrator observations, fits their phases and amplitudes with first-order polynomials (Figure 4), and inserts the resulting cable delays and intercepts into the CALDB. These fitted cable delays can be plotted as a function of time to monitor the long-term stability of the instrument. If the system is stable, the slope should be approximately constant over long periods of time (timescales of weeks or even months). Figures 6 and 8 show fitted delays (in nanoseconds) as a function of time for selected 16 MWA tiles in the extended and compact configurations, respectively. It can be seen that the instrument remains very stable over many weeks. A compilation of such plots for all tiles is shown in Figures 7 and 9 where standard deviation of fitted delays is plotted against the antenna index enabling aggregation of the system stability in a single plot.
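A per-tile stability summary of this kind can be sketched as follows, assuming that the fitted delay lengths (in metres) have already been read from the database and grouped by tile. The dictionary layout and the function name are illustrative, and the population standard deviation is used here as one possible choice.

```python
import statistics

SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def delay_scatter_ns(delays_m_by_tile):
    """Standard deviation of the fitted delays per tile, in nanoseconds.

    delays_m_by_tile -- dict mapping a tile name to a list of fitted delay
                        lengths in metres, one per calibration observation
    """
    return {
        tile: statistics.pstdev(delays_m) / SPEED_OF_LIGHT_M_S * 1e9
        for tile, delays_m in delays_m_by_tile.items()
    }
```

Plotting the returned values against the tile index gives a single-figure view of the system stability, similar in spirit to Figures 7 and 9.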
Routine monitoring of these plots enabled the identification of problems which can remain undetected in real-time plots showing power spectra of all the MWA tiles. In particular, it enables the monitoring of clock signals connected to the MWA receivers and in a few cases it identified the ‘drift’ of a receiver clock due to a failure at the initialisation process, which was fixed by rebooting the receiver. It was also noticed that the phases of the calibration solutions can abruptly change when an MWA receiver is power-cycled and the clock latches with an accuracy of 10 ns, resulting in a step-like change of slope (corresponding to less than 3 m of length using speed of light in vacuum). Since this is not a large delay, with insignificant impact on data quality, cable delays below 13 ns (4 m) typically remain uncorrected, which results in less than 145 degrees of phase difference over the $30.72$ MHz band. However, if the delay exceeds 13 ns, the cable length in the instrument set-up database is updated based on the fitted value. Usually, after re-configuration between the compact and extended configurations, several tiles need cable length adjustments in the instrument description database in order to avoid large, uncorrected cable delays (fast ‘phase wraps’) that reduce the MWA sensitivity. The calibration system also helped to identify situations when coaxial cables from two tiles were accidentally swapped at a receiver input during a re-configuration between the compact and extended MWA configurations, causing large delays (due to cable length from a different tile being used to correct the phase).
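The numbers quoted above follow from simple arithmetic, which can be checked with the short sketch below (the function names are illustrative).

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum
BANDWIDTH_HZ = 30.72e6              # MWA instantaneous bandwidth


def delay_to_length_m(delay_ns):
    """Length travelled by light in vacuum during delay_ns nanoseconds."""
    return delay_ns * 1e-9 * SPEED_OF_LIGHT_M_S


def phase_across_band_deg(delay_ns, bandwidth_hz=BANDWIDTH_HZ):
    """Phase difference between the band edges caused by a pure delay."""
    return 360.0 * bandwidth_hz * delay_ns * 1e-9


print(delay_to_length_m(10.0))      # about 3.0 m (clock latch accuracy)
print(delay_to_length_m(13.0))      # about 3.9 m (correction threshold)
print(phase_across_band_deg(13.0))  # about 143.8 degrees over 30.72 MHz
```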
The MWA instantaneous observing bandwidth is $30.72$ MHz, which is typically placed between 50 and 350 MHz. Thus, we could not perform the fit of a straight line over the full band (starting from zero frequency). Moreover, the combination of sky and beam models used in the calibration is usually not a perfect representation of the sky and instrument. Therefore, we allowed the intercept to be a free parameter of the fit. We verified that the fitted values of the intercept are also very stable over time (excluding times when receivers are rebooted) and they are often close to zero or a multiple of 360°. Hence, with the future improvements in the sky model based on the recent extensions of the GaLactic Extragalactic All-sky MWA (GLEAM) catalog (Hurley-Walker et al. Reference Hurley-Walker2019), we will consider constraining the intercept value to be either zero or a multiple of 360°.
4.2. Providing calibration solutions to the new MWA correlator
The near-real-time pipeline will be used to provide calibration solutions for the new MWA ‘fringe-stopping’ correlator, which is currently in development (Morrison et al. in preparation). The MWA telescope is very stable (Section 4.1) and hence one or two sets of calibrations per 24 h interval should be sufficient. However, if it turns out to be insufficiently accurate, the calibration solutions for the new correlator will be updated more often.
A dedicated calibration server will be deployed at the MRO which will enable immediate direct access to visibility files generated by the MWA correlator. This will significantly speed up the calibration process by eliminating time required to transfer data from the MRO to the PSC archive making the pipeline a truly real-time one.
4.3. Monitoring of calibrator field images for transients and ionospheric quality
For the last 3 yr of the MWA operation, the near-real-time calibration pipeline produced control images of the standard MWA calibrators: Pictor A (537 images); Centaurus A (342 images); Hydra A (323 images); Hercules A (197 images); and 3C444 (214 images). There are even more archival images (before 2016) reduced when populating the MWA ASVO database with the archival data, which opens a possibility of radio-transient searches in these fields over a long-time baseline similar to those performed by Bower & Saul (Reference Bower and Saul2011) with the VLA. Roughly, 1/3–1/2 of these images are at the MWA optimal frequency of 154.88 MHz. We have executed the Aegean source finderFootnote n on these images in order to find transient candidates and catalog sources to a PostgreSQL database. Analysis is ongoing. We are also planning to extend the existing near-real-time pipeline and look for transient candidates on a daily basis. The results of these searches will be reported in a separate publication. Finally, such a database populated soon after the calibrator field data are collected provides an excellent opportunity to calculate the mean offset of the sources from their nominal positions in the GLEAM catalog (Hurley-Walker et al. Reference Hurley-Walker2017; Wayth et al. Reference Wayth2015) or other catalogs and provide early information of the given night’s data quality. However, in such a case, the database should be populated more densely with at least one observation every hour (or more if possible) as the ionosphere can change on timescales of hours during the night.
The MWA ASVO calibration component opens a new avenue for researchers worldwide to download calibrated MWA data, create sky images using standard radio astronomy software packages, and analyse these images for multiple purposes. The development of the calibration component of the MWA ASVO interface is a very important contribution to the astronomical community in Australia and beyond, providing access to the MWA data archive to every researcher without requiring a deep knowledge of the instrument. Using the recent sky models obtained from the MWA data (Hurley-Walker et al. Reference Hurley-Walker2019; Hurley-Walker et al. Reference Hurley-Walker2017), it will be possible to further improve calibration solutions in the database and consequently improve the quality of the resulting images. We expect that this endeavour will facilitate greater use of the system by researchers from outside the MWA Collaboration using MWA data.
The development of the CALDB triggered the establishment of an automated data reduction pipeline, which has been used in near-real time for daily monitoring of the quality of the MWA calibration solutions and hence the interferometric performance of the telescope. The pipeline also produces sky images which can be used for monitoring the quality of the ionosphere and looking for transient objects on a daily basis.
Finally, this work has been a starting point to develop a database of calibration solutions for the upcoming low-frequency component of the Square Kilometre Array telescope. Based on the MWA experience, a similar CALDB was developed to store calibration solutions from the SKA-Low prototype stations Aperture Array Verification Systems (AAVS-1 and AAVS-2) and Engineering Development Array 2 (Wayth et al. in preparation) already deployed at the MRO. This database will be further extended in order to handle more SKA-Low stations as they will soon be built at the MRO.
We would like to thank the anonymous referee for the prompt review of our manuscript.
This scientific work makes use of the Murchison Radio-astronomy Observatory (MRO), operated by CSIRO. We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site.
Support for the operation of the MWA is provided by the Australian Government (NCRIS), under a contract to Curtin University administered by Astronomy Australia Limited. Development of the MWA ASVO was funded via the Australian Research Data Commons (ARDC), administered by Astronomy Australia Limited. Parts of this research were supported by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013. This work was further supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. DLK was supported by NSF grant AST-1816492.
We acknowledge the work and support of the developers of the following Python packages: Astropy (Astropy Collaboration et al. 2013; Price-Whelan et al. Reference Price-Whelan2018), Numpy (van der Walt, Colbert, & Varoquaux Reference van der Walt, Colbert and Varoquaux2011), Scipy (Virtanen et al. 2020), Matplotlib (Hunter Reference Hunter2007), and AegeanTools (Hancock, Trott, & Hurley-Walker Reference Hancock, Trott and Hurley-Walker2018). We acknowledge the developers of the MWA_Tools library. This research has made use of NASA’s Astrophysics Data System.