## 1 Introduction

Laser wakefield accelerators (LWFAs) generate multi-GeV electron beams from cm-scale plasma channels using approximately 100 TW laser pulses^{[1–6]}. The extreme acceleration gradients of LWFAs, coupled with their relative accessibility, have led to widespread research and pursuit of several applications, such as compact light sources^{[7–10]}, generation of bright $\gamma$-rays^{[11]} and ultra-relativistic positron beams^{[12]}, and future particle colliders^{[13]}. In addition, the combination of GeV electron beams and high-intensity laser pulses allows for the study of fundamental physics such as strong-field quantum electrodynamics^{[14–17]}.

In LWFAs, the non-linear laser pulse evolution^{[18,19]} and its effect on the injection and acceleration processes^{[20–23]} are highly sensitive to the initial conditions and can lead to significant shot-to-shot variation of the electron beam properties^{[24,25]}. Recent work on high-stability laser systems and plasma sources has demonstrated improved stability, with the observation of few-percent variation in the electron beam energy and charge over 24 hours of continuous operation^{[26]}. Long-term high-repetition-rate operation has opened up the possibility of using machine learning techniques to model the sources of electron beam variation and to use closed-loop algorithms to optimise performance^{[26–31]}.

For applications such as the study of the radiation reaction, knowledge of the pre-interaction electron beam properties is required to make precise measurements of any changes of these properties and thereby infer the validity of theoretical models^{[32–34]}. The destructive nature of the measurements necessitates predictable LWFA performance through one of the following: improved stability; preserving part of the spectrum as a reference^{[33]}; or developing models capable of predicting the electron beam properties for a given shot. In general, the ability to make predictions of the outputs from plasma accelerators will be advantageous to many of their applications.

Previous work in developing machine learning models for LWFAs has demonstrated the prediction of scalar metrics of the electron beam, such as total charge or peak energy^{[29–31,35]}. However, many applications will require the prediction of vector properties, such as the spectrum or the longitudinal phase space, for which neural networks provide a convenient framework. A densely connected neural network (DNN) is made of densely connected layers, in which the input to every node is the weighted sum of all of the outputs of the previous layer, with the individual weights as free parameters of the model. A non-linear activation function (e.g., a sigmoid function) then takes the weighted sum plus a bias value (another free model parameter) as its argument and returns an output value. An alternative to densely connected layers is a convolutional layer, which performs convolutions between the input vector and a set of kernels. Networks using these layers, known as convolutional neural networks (CNNs), have been shown to be better suited to learning meaningful features from natural signals^{[36]}. Further improvement to the predictive power of neural networks has been seen when including stochasticity in the outputs of individual nodes, in an architecture known as a variational neural network (VNN)^{[37]}.
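The two layer types described above can be illustrated with a minimal NumPy sketch. This is not the networks used in this work; the function names (`dense_layer`, `conv_layer`) and all numerical values are hypothetical and chosen only for illustration. Note that `np.convolve` flips the kernel, whereas many deep learning libraries implement cross-correlation under the same name.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def dense_layer(inputs, weights, bias, activation=sigmoid):
    """Forward pass of one densely connected layer.

    inputs:  shape (n_in,), outputs of the previous layer
    weights: shape (n_out, n_in), free parameters
    bias:    shape (n_out,), free parameters
    """
    weighted_sum = weights @ inputs + bias   # one weighted sum per node
    return activation(weighted_sum)

def conv_layer(signal, kernels):
    """Convolve a 1D input with each kernel ('valid' mode, no padding)."""
    return np.stack([np.convolve(signal, k, mode="valid") for k in kernels])

# Illustrative values only
x = np.array([1.0, -2.0, 0.5])
W = np.zeros((2, 3))
b = np.zeros(2)
dense_out = dense_layer(x, W, b)   # zero weights and biases give sigmoid(0)

kernels = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]
conv_out = conv_layer(np.arange(6.0), kernels)   # shape (n_kernels, 5)
```

The dense layer has one weight per input-output pair, whereas the convolutional layer reuses the same small kernel at every position of the input, so it has far fewer free parameters for long signals.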

In conventional accelerators, Emma *et al.*^{[38]} demonstrated training of a DNN to produce synthetic diagnostic outputs that matched the measured outputs for a new unseen dataset. CNNs have been used to predict X-ray properties from the post-undulator electron beam spectrum^{[39]}, while ensembles of DNNs have also been used to predict the electron beam longitudinal phase space and current profile from non-destructive bending radiation measurements^{[40]}. In this work, we report on the training of an ensemble of VNNs to model the LWFA-generated electron spectrum using secondary diagnostics of the laser and plasma conditions. The LWFA ensemble was trained using a subset of experimental measurements of the electron spectrum with the remainder used for model validation. Each individual VNN in the ensemble was trained with a different subset of the training data, so that the ensemble provided both a mean prediction and an estimate of its uncertainty. The model also reveals the extent to which the measurements obtained from the available diagnostics are predictive of the accelerator performance, and which parameters have the strongest influence.

## 2 Experimental methods and results

The experiment was performed using the Gemini laser system at the Central Laser Facility in the UK (see Figure 1 for details). Laser pulses with an energy of ${E}_{\rm L} = \left(6.6\pm 0.5\right)$ J and a pulse duration of approximately 50 fs were used to drive a GeV-scale LWFA. The pulses were focused with an $f/40$ off-axis parabolic mirror to a spot size of $\left(50\pm 2\right)\times \left(45\pm 2\right)$ μm in the horizontal (polarisation) and vertical planes, respectively, giving a peak intensity of $\left(5.5\pm 0.5\right)\times {10}^{18}$ W cm^{−2}. The focus was aligned to a gas jet that was composed of a mixture of 2% nitrogen and 98% helium, enabling ionisation injection^{[41–44]}. The gas jet had an average electron density of $\left(1.00\pm 0.07\right)\times {10}^{18}$ cm^{−3} over a 17 mm length.

The LWFA-generated electron energy spectrum $\mathrm{d}W/\mathrm{d}E$ was measured using the spectrometer scintillator screen images, which were energy-calibrated by numerical tracking of electron trajectories in the magnetic field. The interferometry and top view cameras were used to extract the electron density profile, ${n}_{\rm e}(z)$, and the laser scattering profile, ${S}_{\rm L}(z)$, respectively, where $z$ is the laser propagation axis. A 2D Gaussian fit was performed on the far-field image to obtain six parameters: the peak fluence ${I}_0$; the centroids ${x}_0$ and ${y}_0$; the major and minor root-mean-square (RMS) spot widths ${\sigma}_a$ and ${\sigma}_b$; and the angle $\theta$ between the major axis of the ellipse and the $x$-axis. Due to the aberrations and clipping caused by this beam-line, this far-field is not an exact replica of the main laser focus, but is representative of the shot-to-shot focal spot fluctuations.
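To make the six far-field parameters concrete, the sketch below estimates them from image moments on a synthetic elliptical Gaussian. This is a stand-in technique, not the least-squares 2D Gaussian fit described above, and the function names and test values are hypothetical. Moment estimates are sensitive to background and clipping, so a real far-field image would need background subtraction first.

```python
import numpy as np

def elliptical_gaussian(shape, peak, x0, y0, sigma_a, sigma_b, theta):
    """Synthetic 2D Gaussian with major axis at angle theta to the x-axis."""
    ny, nx = shape
    y, x = np.mgrid[0:ny, 0:nx]
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = -(x - x0) * np.sin(theta) + (y - y0) * np.cos(theta)
    return peak * np.exp(-0.5 * ((xr / sigma_a) ** 2 + (yr / sigma_b) ** 2))

def gaussian_moments(image):
    """Estimate (peak, x0, y0, sigma_a, sigma_b, theta) from image moments.

    theta is in radians, folded into (-pi/2, pi/2].
    """
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    total = image.sum()
    x0 = (x * image).sum() / total
    y0 = (y * image).sum() / total
    dx, dy = x - x0, y - y0
    cxx = (dx * dx * image).sum() / total
    cyy = (dy * dy * image).sum() / total
    cxy = (dx * dy * image).sum() / total
    # Principal axes of the intensity-weighted covariance matrix
    evals, evecs = np.linalg.eigh(np.array([[cxx, cxy], [cxy, cyy]]))
    sigma_b, sigma_a = np.sqrt(evals)    # eigh returns ascending eigenvalues
    vx, vy = evecs[:, 1]                 # eigenvector of the major axis
    theta = np.arctan2(vy, vx)
    # An axis has no direction, so fold the angle into (-pi/2, pi/2]
    if theta > np.pi / 2:
        theta -= np.pi
    elif theta <= -np.pi / 2:
        theta += np.pi
    return image.max(), x0, y0, sigma_a, sigma_b, theta

img = elliptical_gaussian((128, 128), 100.0, 60.0, 70.0, 8.0, 4.0, 0.3)
params = gaussian_moments(img)
```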

The experimental results for this analysis were taken from an investigation of the radiation reaction, in which a second counter-propagating laser pulse is used to collide with the LWFA electron beam. For training and validating our predictive tool, we wish to use only shots where the colliding laser pulse did not significantly overlap with the electron beam, so that the electron spectrum was not affected. For successful collisions, a gamma-beam was generated via the inverse Compton scattering interaction and was diagnosed spatially with a CsI scintillator array^{[16]} imaged onto a $1024\times 1024$ pixel charge-coupled device (CCD).

Due to the shot-to-shot variation in the electron beam position, most shots did not result in a significant collision, providing a large number of null shots for model training and testing. The brightness of the signal on the gamma detector was used to provide an approximate metric of the collision intensity. The 99.99th percentile pixel value of the background-subtracted CCD image was taken as the peak of the gamma signal ${C}_{\gamma }$. The highest value of this metric was ${C}_{\gamma} = 4380$, whereas the median value was ${C}_{\gamma} = 12$. From analysis of the collision statistics, a value of ${C}_{\gamma}\le 100$ was estimated to result from collisions with a peak normalised vector potential of ${a}_0<1.4$. For 1 GeV electrons, this would result in a less than 1% energy loss^{[14]}, approximately equal to the resolution of the spectrometer. Therefore, this value was taken as a threshold for null shots, for which the electron beam is unaffected by the collision. The experimental data were taken during a 5-hour period with a total of 779 shots. Model training and validation datasets were taken from shots for which ${C}_{\gamma}\le 100$, with 90% (570 shots) used for training and 10% (75 shots) reserved for model validation.
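The null-shot selection described above can be sketched as follows. The function names are hypothetical, the data are synthetic, and the random assignment of shots to the validation set is an assumption, since the text does not state how the split was drawn.

```python
import numpy as np

def gamma_peak(ccd_image, background):
    """C_gamma: 99.99th percentile pixel value of the background-subtracted image."""
    return np.percentile(ccd_image - background, 99.99)

def split_null_shots(c_gamma, threshold=100.0, val_fraction=0.1, seed=0):
    """Return (train_idx, val_idx) for shots with C_gamma <= threshold.

    The random shuffle is an assumption made for this sketch.
    """
    c_gamma = np.asarray(c_gamma)
    null_idx = np.flatnonzero(c_gamma <= threshold)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(null_idx)
    n_val = int(round(val_fraction * shuffled.size))
    return np.sort(shuffled[n_val:]), np.sort(shuffled[:n_val])

# Synthetic detector frame: uniform signal of 7 counts above background
ccd = np.full((64, 64), 107.0)
bg = np.full((64, 64), 100.0)
peak_signal = gamma_peak(ccd, bg)

# Synthetic C_gamma values for 13 shots, 10 of which are null shots
c_gamma = np.array([12.0, 4380.0, 99.0, 100.0, 101.0, 5.0, 30.0,
                    250.0, 8.0, 60.0, 15.0, 2.0, 40.0])
train_idx, val_idx = split_null_shots(c_gamma)
```

Using a high percentile rather than the maximum pixel value makes the metric less sensitive to isolated hot pixels.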

## 3 Neural network architecture and training

The measurements of ${n}_{\rm e}(z)$, ${S}_{\rm L}(z)$ and $\mathrm{d}W/\mathrm{d}E$ were stored as 1D vectors of lengths 310, 100 and 200, respectively. Although each of these signals is composed of at least 100 values, the variations over the full dataset are limited, and so in principle only a few parameters are required for each to encode these variations. An appropriate decoder would be able to generate a good approximation to the measured signals from this reduced set of parameters, which are called latent space variables. In this work, variational autoencoders (VAEs)^{[45,46]} incorporating convolutional and densely connected layers were trained, as illustrated in Figure 2. By using a bottleneck of only a few nodes, the VAEs were trained to find an optimal latent space representation of the data, which allowed the decoder to reconstruct the measured signals.

The trained encoders for ${n}_{\rm e}(z)$ and ${S}_{\rm L}(z)$ were used to encode their respective measurements to their latent space representations, which were then combined with measurements of the laser far-field and the laser energy to create the inputs for the predictive model. A VNN, which we call the translator network, takes those inputs and returns values that are passed to the trained electron spectra decoder to generate the predicted spectrum. The translator was trained to learn the correlation between the reduced input set and the latent variables of the electron spectra decoder, as illustrated in Figure 3.

For the variational layers, two parameters are calculated for each node, representing the expectation value ${\mu}_m$ and standard deviation ${\sigma}_m$. During training, values were sampled from Gaussian distributions given by these parameters, $\mathcal{N}({\mu}_m,{\sigma}_m)$, such that the latent values for a given input set, ${x}_m$, would vary according to ${\sigma}_m$.
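A minimal sketch of the sampling step for a variational layer is given below, assuming the common formulation in which a standard normal draw is scaled by the standard deviation and shifted by the mean. The function name `sample_latent` and the numerical values are hypothetical; outside training, only the mean is returned.

```python
import numpy as np

def sample_latent(mu, sigma, rng, training=True):
    """Draw latent values from N(mu, sigma), with sigma a standard deviation.

    When training is False, the mean is returned without sampling.
    """
    if not training:
        return mu
    return mu + sigma * rng.standard_normal(mu.shape)

rng = np.random.default_rng(1)
mu = np.full(100_000, 2.0)      # illustrative expectation values
sigma = np.full(100_000, 0.5)   # illustrative standard deviations
samples = sample_latent(mu, sigma, rng)
```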

^{a} For the LWFA translator models, the value of $\beta$ varied from high to low during the training, with the final value given in the table. The training time for each autoencoder was 10 minutes and training of the 100 translator networks took a total of 3 hours, using an Intel Xeon Gold 6130 CPU at 2.1 GHz with 32 GB of RAM. The analysis and model training were performed on CLF Data Analysis as a Service (CDAS)^{[49]}. The neural networks were built using the Keras API (https://keras.io).

The training loss function used was as follows^{[45]}:

$$\mathcal{L} = {\mathrm{\mathcal{L}}}_{\mathrm{MSE}}-\beta {D}_{\mathrm{KL}}, \tag{1}$$

where ${D}_{\mathrm{KL}} = {\sum}_{m = 0}^M\left[1+\log \left({\sigma}_m\right)-{\mu}_m^2-{\sigma}_m\right]/(2M)$ is the Kullback–Leibler (KL) divergence, ${\mathrm{\mathcal{L}}}_{\mathrm{MSE}}$ is the mean squared error (MSE) and $M$ is the total number of input sets in a given training iteration. The same loss function was used to train each VAE and also the final translator VNN, with the MSE taken between the predicted and measured diagnostic output (${n}_{\rm e}(z)$, ${S}_{\rm L}(z)$ or $\mathrm{d}W/\mathrm{d}E$). The $\beta$ parameter was used to scale the relative importance of the regularisation, following the beta-VAE approach^{[45]}. During model validation, only the mean weights for the variational layers were used and the ${D}_{\rm KL}$ term from Equation (1) was omitted. Every node of the neural networks used the leaky rectified linear unit (leaky-ReLU)^{[47]} activation function with $\alpha = 0.3$, which exhibited superior learning performance in comparison to sigmoid and hyperbolic tangent functions, as well as leaky-ReLU with other values of $\alpha$.
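The loss terms and activation function can be sketched in NumPy as follows. The ${D}_{\rm KL}$ expression is coded exactly as written in the text, summed over all entries of the supplied arrays. The way the two terms are combined is an assumption: because the bracketed expression is never positive, it is subtracted here so that it acts as a penalty. The function names and example values are hypothetical.

```python
import numpy as np

def leaky_relu(x, alpha=0.3):
    """Leaky-ReLU: identity for positive inputs, slope alpha otherwise."""
    return np.where(x > 0, x, alpha * x)

def kl_term(mu, sigma):
    """D_KL as written in the text: sum[1 + log(sigma) - mu^2 - sigma] / (2M).

    M is taken as the number of input sets (first array dimension).
    """
    m = mu.shape[0]
    return np.sum(1.0 + np.log(sigma) - mu ** 2 - sigma) / (2.0 * m)

def training_loss(pred, target, mu, sigma, beta):
    """MSE plus beta-weighted regularisation (sign convention assumed)."""
    mse = np.mean((pred - target) ** 2)
    return mse - beta * kl_term(mu, sigma)

# Illustrative values: 4 input sets, perfect reconstruction, offset means
target = np.array([0.2, 0.4, 0.6, 0.8])
loss = training_loss(target, target, mu=np.ones(4), sigma=np.ones(4), beta=2.0)
activated = leaky_relu(np.array([-1.0, 2.0]))
```

With ${\mu}_m = 0$ and ${\sigma}_m = 1$ the regularisation term vanishes, which is why it pulls the latent distributions towards $\mathcal{N}(0,1)$.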

For the diagnostic VAEs, the number of latent parameters was chosen to be the minimum that gave high-fidelity reconstructions, with the $\beta$ parameter manually tuned to ensure that the distribution of each latent parameter for the training datasets was close to a standard normal distribution ( $\mathcal{N}\left(0,1\right)$ ). One latent space parameter was directly set as the average of the input signal (normalised by the training dataset). This parameter was then used to scale the decoder output and ensured that one of the latent space variables represents the amplitude of the signal, aiding interpretation of the trained networks. Once the VAEs were trained, the weights were frozen during the translator training process.

The translator is a DNN with a variational last layer. The translator VNN architecture (number of nodes and number of layers) and the value of $\beta$ were optimised using a genetic algorithm. During this process the training data were divided into two parts, with 50% of the data used to train each network and the other 50% used to calculate the test loss. This ensured that the validation dataset was kept purely for validation of the final model performance and not used in any tuning of the predictive model. The optimal architecture for the translator network, shown in Figure 3, comprises three densely connected layers, with a final variational layer with five outputs.

In order to quantify the uncertainty in the model predictions, 100 translator VNNs were trained, each using a randomly selected 50% sample of the training dataset. The predictions of these models can then be combined to obtain an average prediction, while the variation between model predictions is indicative of the uncertainty arising from the stochastic training process and the finite size of the training data. In particular, the random sub-sampling affects the predictive quality in regions where the training data are sparse, typically at the extremes of the input parameters, resulting in a larger uncertainty in those regions.
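The sub-sampled ensemble procedure can be sketched as follows. A straight-line least-squares fit is used as a stand-in for the translator VNN, so this illustrates only the ensemble logic, not the network itself. All names and the synthetic data are hypothetical.

```python
import numpy as np

def train_ensemble(x, y, fit, n_models=100, fraction=0.5, seed=0):
    """Train n_models models, each on a random subset of the training data."""
    rng = np.random.default_rng(seed)
    n_sub = int(fraction * len(x))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(x), size=n_sub, replace=False)
        models.append(fit(x[idx], y[idx]))
    return models

def ensemble_predict(models, predict, x_new):
    """Return the ensemble mean and the spread between model predictions."""
    preds = np.stack([predict(m, x_new) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

# Stand-in model: straight-line fit (NOT a variational neural network)
def fit_line(x, y):
    return np.polyfit(x, y, 1)

def predict_line(coeffs, x):
    return np.polyval(coeffs, x)

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 200)
models = train_ensemble(x, y, fit_line)
# Query one point inside the training range and one far outside it
mean, spread = ensemble_predict(models, predict_line, np.array([0.5, 3.0]))
```

In this toy case the spread is larger for the query point outside the range of the training data, mirroring the behaviour described above for sparsely sampled regions.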

The parameters for the trained VAEs and translator networks are summarised in Table 1. Each autoencoder was trained for 1000 iterations with a batch size of 64. The translator network was trained in three stages with 200, 400 and 300 iterations performed at 10, 4 and 1 times the final $\beta$ value to balance reconstruction fidelity with latent space smoothness^{[46]}. The training processes were all performed using the Adam optimiser^{[48]}, with a learning rate of ${10}^{-3}$, which was found to converge well.
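The three-stage $\beta$ schedule can be written as a short generator. The stage lengths and multipliers are those quoted above; the final $\beta$ value used in the example is a placeholder, as the values in Table 1 are not reproduced here.

```python
def beta_schedule(final_beta, stages=((200, 10.0), (400, 4.0), (300, 1.0))):
    """Yield the beta value for each training iteration, stage by stage.

    stages is a sequence of (number of iterations, multiplier of final beta).
    """
    for n_iterations, multiplier in stages:
        for _ in range(n_iterations):
            yield multiplier * final_beta

# final_beta = 1e-3 is a placeholder value for illustration only
betas = list(beta_schedule(final_beta=1e-3))
```

Starting with a large $\beta$ emphasises the regularisation early in training, before the reconstruction error is given more weight in the later stages.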

## 4 LWFA prediction results

The measured electron spectra from the validation dataset are shown in Figure 4(a), along with the reconstructions by the electron spectra VAE (Figure 4(b)) and the average of the LWFA model ensemble predictions (Figure 4(c)). The electron spectra VAE had an MSE of 0.011, and shows good qualitative and quantitative reproduction of the measured electron spectra. This indicates that the five parameters of the latent space, in combination with the structures learnt by the decoder, are sufficient to accurately generate the set of observations from the validation dataset. In other words, the five latent parameters are sufficient to generate the full variability of electron beams for this experimental setup. The question is then whether the secondary diagnostics are sufficient to determine the correct latent variables for each shot and thereby give an accurate prediction of the electron spectrum. The mean prediction of the LWFA model ensemble had an MSE of 0.057 and shows a similar trend in cut-off energy to the data, except for the few high- and low-energy outliers. By comparison, a naive prediction that all measured spectra are equal to the average spectrum from the training dataset gives an MSE value of 0.11, indicating that the LWFA model has a significant predictive capability.
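The naive baseline used for comparison can be sketched as below, with hypothetical function names and toy arrays in place of the measured spectra. For the values quoted above, the ensemble MSE of 0.057 is roughly half of the naive value of 0.11.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all shots and all spectral bins."""
    return float(np.mean((pred - target) ** 2))

def naive_baseline_mse(train_spectra, val_spectra):
    """MSE when every validation spectrum is predicted to be the training mean."""
    mean_spectrum = train_spectra.mean(axis=0)
    naive_pred = np.broadcast_to(mean_spectrum, val_spectra.shape)
    return mse(naive_pred, val_spectra)

# Toy spectra with two bins each (rows are shots)
train_spectra = np.array([[0.0, 0.0], [2.0, 2.0]])
val_spectra = np.array([[1.0, 1.0], [3.0, 1.0]])
baseline = naive_baseline_mse(train_spectra, val_spectra)
```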

Individual predictions of each model of the LWFA ensemble, along with the corresponding measured electron spectra, are shown in Figure 5. The variation in model predictions for a given shot is indicative of the uncertainty, due to the random sub-sampling of the training data and the stochastic training process. For a large region of the parameter space, the LWFA model predictions show a good agreement with the measurements, with large discrepancies occurring for the outliers in terms of cut-off energy. These shots also exhibit the largest variation in predictions between individual models within the ensemble. The total electron beam energy is reasonably accurately predicted, with relative RMS error of 12% for the entire validation dataset, compared to the relative beam energy RMS variation of 30%.

The relative influence of each input parameter on the LWFA model can be seen by varying each one in turn and measuring the effect on the resultant spectrum, as shown in Figure 6. The plasma density parameters have a relatively modest effect on the electron spectrum, indicating that the shot-to-shot variation of the plasma density profile is not the dominant contributor to the electron spectrum variation. Variations of the laser energy and the scattering profile are more significant, having the greatest effect on the generated electron spectra. The spatio-temporal distribution of the laser pulse is only indirectly diagnosed from the far-field diagnostic and the effect on the scattering profile, and is known to have a large influence on the accelerated electrons^{[26,28,29]}. Including additional laser diagnostics, such as measurement of the spatial phase profile^{[26,30]}, should enable higher fidelity predictions.
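A one-at-a-time input scan of this kind can be sketched as follows. The toy linear model, the step size and the use of the RMS change in the output as the influence metric are all assumptions made for illustration; they are not the trained LWFA model or the metric used for Figure 6.

```python
import numpy as np

def sensitivity_scan(model, baseline, delta=1.0):
    """Shift each input in turn by delta and return the RMS output change."""
    base_out = model(baseline)
    scores = np.empty(baseline.size)
    for i in range(baseline.size):
        shifted = baseline.copy()
        shifted[i] += delta
        scores[i] = np.sqrt(np.mean((model(shifted) - base_out) ** 2))
    return scores

# Toy stand-in for the trained model: a fixed linear map of three inputs
W = np.array([[3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0]])

def toy_model(x):
    return W @ x

scores = sensitivity_scan(toy_model, np.zeros(3))
most_influential = int(np.argmax(scores))
```

A scan of this type treats the inputs as independent, so it does not account for correlations between them.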

Although many of the input parameters, namely those that form the latent spaces of the autoencoders, are not straightforward to interpret physically, the laser energy is a physically important parameter in LWFAs. In practice, the inputs for the LWFA models are not independent of one another, as characterised by calculating the Pearson correlation coefficients for the training dataset. This reveals relatively strong correlations between the laser energy and several other parameters, especially ${S}_{\rm L}(3)$, ${S}_{\rm L}(4)$, ${S}_{\rm L}(6)$, ${n}_{\rm e}(4)$, ${n}_{\rm e}(5)$ and ${I}_0$, which had correlation coefficients ranging from $r = 0.31$ to $r = 0.55$. The trained LWFA model is then able to show what effect laser energy fluctuations have on the electron spectrum by varying each parameter in proportion to its correlation coefficient with the laser energy ${E}_{\rm L}$, as shown in Figure 7(a). As the laser energy increases, the peak electron energy is relatively constant, while the overall charge increases. The total electron beam charge ${Q}_{\rm B}$ is plotted as a function of laser energy in Figure 7(b), for both the raw data and the LWFA model predictions. The model prediction shows an approximately linear increase with laser energy, described by ${Q}_{\rm B}\left[\mathrm{nC}\right] = 0.48{E}_{\rm L}\left[\mathrm{J}\right]-2.1$.
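As a worked example, the linear fit gives ${Q}_{\rm B}\approx 1.07$ nC at the mean laser energy of 6.6 J, and the quoted $\pm 0.5$ J energy variation corresponds to about $\pm 0.24$ nC. The sketch below evaluates the fit and illustrates a correlated input shift, assuming that all inputs are standardised so that each one moves by its Pearson coefficient times the energy shift. The function names and synthetic inputs are hypothetical.

```python
import numpy as np

def charge_from_energy(e_l_joules):
    """Linear fit quoted in the text: Q_B [nC] = 0.48 E_L [J] - 2.1."""
    return 0.48 * e_l_joules - 2.1

def correlated_shift(inputs, energy_index, delta):
    """Shift of each standardised input for an energy shift of delta.

    inputs has shape (n_shots, n_inputs); each input is shifted by its
    Pearson correlation coefficient with the energy column times delta.
    """
    r = np.corrcoef(inputs, rowvar=False)[energy_index]
    return r * delta

q_mean = charge_from_energy(6.6)

# Synthetic inputs: column 0 is the energy, columns 1 and 2 are fully
# correlated and anti-correlated with it, respectively
rng = np.random.default_rng(0)
e = rng.normal(size=50)
inputs = np.column_stack([e, 2.0 * e + 3.0, -e])
shift = correlated_shift(inputs, energy_index=0, delta=0.5)
```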

The scaling parameters ${S}_{\rm L}(6)$ and ${n}_{\rm e}(5)$ are also easy to interpret, as they are the average scattering signal and electron density, respectively (normalised to the mean and variance over the training dataset). The effect of ${n}_{\rm e}(5)$ on the electron density profile and the predicted electron spectrum is shown in Figure 8. The average plasma electron density varied by 4% over the training dataset, as illustrated by the small perturbations to the density profile observed in Figure 8(a). A more significant effect is seen on the electron spectra in Figure 8(b), with the peak energy shifting higher as the average density drops, as expected for a dephasing-limited LWFA^{[50,51]}. The effect on the spectrum is much smaller than that seen to be caused by the laser energy variation in Figure 7. This indicates that the level of natural variations of the plasma electron density in this dataset was sufficiently low that it was not a dominant contributor to the shot-to-shot variations in the electron spectra.

The other latent parameters generated by the VAEs do not have straightforward physical interpretations and only have meaning in combination with the trained encoders. In order to gain some insight into their physical meaning, the effect of changing each parameter can be observed on the corresponding diagnostic output, as well as on the predicted electron spectrum. An example is shown in Figure 9, where the effect of varying ${S}_{\rm L}(3)$ , the most dominant input parameter to the translator VNN, is shown.

Figure 9(a) shows that positive ${S}_{\rm L}(3)$ correlates with an increased laser scattering peak at the entrance to the gas jet ($z = 0$) and for the last half of the plasma, while suppressing the signal for $1<z<7$ mm. This also results in an increased predicted total charge as well as an increased predicted maximum electron energy (see Figure 9(b)), a clearly beneficial effect for many applications. The scattered laser intensity is associated with Raman side-scattering and wavebreaking radiation, generated as the laser self-guides and self-compresses to a high peak intensity in the plasma channel^{[52,53]}. Therefore, the increase of this scattering signal seen in Figure 9(a) indicates an increased possibility for the injection of electrons into the plasma wakefield at $z = 0$ mm, while maintaining a high amplitude plasma wave for $z>7$ mm, resulting in the enhanced electron spectrum predicted in Figure 9(b).

## 5 Conclusion

In conclusion, we have constructed and trained a predictive model for an LWFA that is capable of predicting the electron spectrum for a given shot, based on secondary diagnostics of the laser and plasma conditions. The model is constructed from separately trained variational convolutional autoencoders, with a VNN used to map a reduced parameter set to the latent space of an electron spectra decoder. An ensemble of models was trained on subsets of the training data, with the range of model predictions providing an estimate of the uncertainty. The predictive model ensemble performs better than the naive assumption that the electron spectrum is constant, and so has utility in estimating the electron spectrum in the case of destructive processes, such as radiation reaction studies. The model fidelity is most likely limited by the lack of on-shot spatio-temporal information about the laser pulse, which is known to have a strong influence on the accelerated electron beam^{[26]}. It is expected that this technique can be improved by including additional diagnostics of the laser spatial and spectral phase, and by increasing the size of the training dataset, especially for reducing the prediction error for the outliers. Further diagnostics of the laser–plasma interaction, such as spectrally resolving the scattering signal, may also provide additional information to improve the prediction accuracy. Neural networks of this kind could be an important tool for understanding the performance sensitivities of plasma accelerators, and also in providing synthetic diagnostics for applications of their electron beams and secondary sources.

## Data availability

The data and code for this publication are available from the online repository zenodo.org at https://zenodo.org/record/7510352#.Y9K2XezP30o .

## Acknowledgements

This work was supported by UK STFC ST/V001639/1, UK EPSRC EP/V049577/1 and EP/V044397/1 and Horizon 2020 funding under European Research Council (ERC) Grant Agreement No. 682399. M.J.V.S. acknowledges support from the Royal Society URF-R1221874. A.G.R.T. and A.S.J. acknowledge support from US DOE grant DE-SC0016804.