## Impact Statement

Tree mortality is a complex phenomenon involving multiple processes at different temporal scales. Here, we rely on very long simulations of a forest model and develop a method based on generative deep learning to find the relationship between complex weather patterns and tree mortality. The generative nature of the method allows for the generation of new realistic weather conditions outside of the provided samples, which are associated with high biomass loss in a forest. Furthermore, the method can be applied to different weather-driven impacts, adding to the growing pool of explainable/interpretable data-driven methods for Earth system sciences.

## 1. Introduction

Forests are the predominant atmospheric carbon sink on land that play a crucial role in mitigating anthropogenic climate change. Forests sequestered 7.6 GtCO
$ {}_2 $
e/yr on average between 2000 and 2019 (Harris et al., Reference Harris, Gibbs, Baccini, Birdsey, de Bruin, Farina, Fatoyinbo, Hansen, Herold, Houghton, Potapov, Suarez, Roman-Cuesta, Saatchi, Slay, Turubanova and Tyukavina2021), which amounted to 22% of the total fossil CO_{2} emissions during 2020 (Friedlingstein et al., Reference Friedlingstein, Jones, O’Sullivan, Andrew, Bakker, Hauck, Le Quéré, Peters, Peters, Pongratz, Sitch, Canadell, Ciais, Jackson, Alin, Anthoni, Bates, Becker, Bellouin, Bopp, Chau, Chevallier, Chini, Cronin, Currie, Decharme, Djeutchouang, Dou, Evans, Feely, Feng, Gasser, Gilfillan, Gkritzalis, Grassi, Gregor, Gruber, Gürses, Harris, Houghton, Hurtt, Iida, Ilyina, Luijkx, Jain, Jones, Kato, Kennedy, Klein Goldewijk, Knauer, Korsbakken, Körtzinger, Landschützer, Lauvset, Lefèvre, Lienert, Liu, Marland, McGuire, Melton, Munro, Nabel, Nakaoka, Niwa, Ono, Pierrot, Poulter, Rehder, Resplandy, Robertson, Rödenbeck, Rosan, Schwinger, Schwingshackl, Séférian, Sutton, Sweeney, Tanhua, Tans, Tian, Tilbrook, Tubiello, van der Werf, Vuichard, Wada, Wanninkhof, Watson, Willis, Wiltshire, Yuan, Yue, Yue, Zaehle and Zeng2022). Additionally, forests regulate microclimates, sustain biodiversity, and provide ecosystem services such as the production of timber, edible wild plants and fungi, pest control, and increasing nitrogen availability (Felipe-Lucia et al., Reference Felipe-Lucia, Soliveres, Penone, Manning, van der Plas, Boch, Prati, Ammer, Schall, Gossner, Bauhus, Buscot, Blaser, Blüthgen, de Frutos, Ehbrecht, Frank, Goldmann, Hänsel, Jung, Kahl, Nauss, Oelmann, Pena, Polle, Renner, Schloter, Schöning, Schrumpf, Schulze, Solly, Sorkau, Stempfhuber, Tschapka, Weisser, Wubet, Fischer and Allan2018). However, sudden and unexpected weather-related forest mortality events have been observed in recent years across the globe, including in ecosystems that were not previously considered at risk (Hartmann et al., Reference Hartmann, Bastos, Das, Esquivel-Muelbert, Hammond, Martínez-Vilalta, McDowell, Powers, Pugh, Ruthrof and Allen2022). In the European Union, the net carbon sink from forest land decreased by 12% between 2010 and 2018 (Pilli et al., Reference Pilli, Alkama, Cescatti, Kurz and Grassi2022). Weather conditions often interact with disturbances such as insect outbreaks, diseases and forest fires to cause forest mortality (Sturrock et al., Reference Sturrock, Frankel, Brown, Hennon, Kliejunas, Lewis, Worrall and Woods2011; Millar and Stephenson, Reference Millar and Stephenson2015; Jain et al., Reference Jain, Castellanos-Acuna, Coogan, Abatzoglou and Flannigan2022). With weather extremes expected to increase in intensity and frequency in the future (Seneviratne et al., Reference Seneviratne, Zhang, Adnan, Badi, Dereczynski, Di Luca, Ghosh, Iskandar, Kossin, Lewis, Otto, Pinto, Satoh, Vicente-Serrano, Wehner, Zhou, Masson-Delmotte, Zhai, Pirani, Connors, Pean, Berger, Caud, Chen, Goldfarb, Gomis, Huang, Leitzell, Lonnoy, Matthews, Maycock, Waterfield, Yelekci, Yu and Zhou2021), an improved understanding of the weather drivers of forest mortality is necessary for the development of effective forest conservation and management strategies.

Understanding the influence of weather drivers on large-scale tree mortality is challenging because forest mortality depends on complex interactions between weather events, the presence of pests and diseases, and forest characteristics such as canopy height, tree cover, biomass, and diversity (Bastos et al., Reference Bastos, Sippel, Frank, Mahecha, Zaehle, Zscheischler and Reichstein2023). The forest structure and diversity determine forest resistance to climatic or biotic stressors (for example, a forest with less biomass requires less water for its growth), and this structure, in turn, depends on previous weather conditions (López-Serrano et al., Reference López-Serrano, García-Morote, Andrés-Abellán, Tendero and del Cerro2005; Bequet et al., Reference Bequet, Campioli, Kint, Vansteenkiste, Muys and Ceulemans2011; Ježík et al., Reference Ježík, Blaženec, Střelcová and Ditmarová2011; Rais et al., Reference Rais, Uhl, van de Kuilen and Pretzsch2021; Mahecha et al., Reference Mahecha, Bastos, Bohn, Eisenhauer, Feilhauer, Hartmann, Hickler, Kalesse-Los, Migliavacca, Otto, Peng, Quaas, Tegen, Weigelt, Wendisch and Wirth2022; Li et al., Reference Li, Guo, Pasgaard, Niu, Wang, Chen, Qin and Svenning2023). This means that present-day tree mortality can be influenced by weather conditions from previous years. Extreme climate events such as droughts and heatwaves are common drivers of tree mortality (Allen et al., Reference Allen, Macalady, Chenchouni, Bachelet, McDowell, Vennetier, Kitzberger, Rigling, Breshears, Hogg, Gonzalez, Fensham, Zhang, Castro, Demidova, Lim, Allard, Running, Semerci and Cobb2010; Anderegg et al., Reference Anderegg, Kane and Anderegg2013; McDowell et al., Reference McDowell, Ryan, Zeppel and Tissue2013; Park Williams et al., Reference Park Williams, Allen, Macalady, Griffin, Woodhouse, Meko, Swetnam, Rauscher, Seager, Grissino-Mayer, Dean, Cook, Gangodagamage, Cai and McDowell2013; Choat et al., Reference Choat, Brodribb, Brodersen, Duursma, López and Medlyn2018; Senf et al., Reference Senf, Pflugmacher, Zhiqiang, Sebald, Knorn, Neumann, Hostert and Seidl2018; Brodribb et al., Reference Brodribb, Powers, Cochard and Choat2020). However, due to the complex interactions between climatic, biotic, and forest state factors, forest mortality events can also be driven by the temporally compounding effects of multiple weather events, such as observed in sequential droughts (Zscheischler et al., Reference Zscheischler, Westra, van den Hurk, Seneviratne, Ward, Pitman, AghaKouchak, Bresch, Leonard, Wahl and Zhang2018, Reference Zscheischler, Martius, Westra, Bevacqua, Raymond, Horton, van den Hurk, AghaKouchak, Jézéquel, Mahecha, Maraun, Ramos, Ridder, Thiery and Vignotto2020; Sánchez-Pinillos et al., Reference Sánchez-Pinillos, D’Orangeville, Boulanger, Comeau, Wang, Taylor and Kneeshaw2022; Bastos et al., Reference Bastos, Sippel, Frank, Mahecha, Zaehle, Zscheischler and Reichstein2023).

Modeling forests with process-based models can aid in improving the scientific understanding of forest dynamics. There is a long tradition in ecology of using individual-based forest models to answer a wide range of scientific questions (Shugart et al., Reference Shugart, Wang, Fischer, Ma, Fang, Yan, Huth and Armstrong2018), which simulate each tree’s growth within a forest stand. These forest gap models were first developed for North America and have become one of the most widely used model types in ecology (Bugmann, Reference Bugmann2001). Various process-based models are currently employed, each with different strengths and weaknesses (Mahnken et al., Reference Mahnken, Cailleret, Collalti, Trotta, Biondo, D’Andrea, Dalmonech, Marano, Mäkelä, Minunno, Peltoniemi, Trotsiuk, Nadal-Sala, Sabaté, Vallet, Aussenac, Cameron, Bohn, Grote, Augustynczik, Yousefpour, Huber, Bugmann, Merganičová, Merganic, Valent, Lasch-Born, Hartig, Vega del Valle, Volkholz, Gutsch, Matteucci, Krejza, Ibrom, Meesenburg, Rötzer, van der Maaten-Theunissen, van der Maaten and Reyer2022).

Due to the complexity of the simulated processes and their convolution through time, it is challenging to disentangle weather conditions associated with high forest mortality, even in a model environment. Hence, a better understanding of such processes is needed (Bugmann et al., Reference Bugmann, Seidl, Hartig, Bohn, Brůna, Cailleret, François, Heinke, Henrot, Hickler, Hülsmann, Huth, Jacquemin, Kollas, Lasch-Born, Lexer, Merganič, Merganičová, Mette, Miranda, Nadal-Sala, Rammer, Rammig, Reineking, Roedig, Sabaté, Steinkamp, Suckow, Vacchiano, Wild, Xu and Reyer2019). Currently, two widely used approaches exist to identify the drivers of tree mortality in forests. First, one can gain insight through forward simulations from models (Taccoen et al., Reference Taccoen, Piedallu, Seynave, Perez, Gégout-Petit, Nageleisen, Bontemps and Gégout2019; Scheel et al., Reference Scheel, Lindeskog, Smith, Suvanto and Pugh2022). Forcing the model with perturbed weather data and analyzing the output in a conventional sensitivity analysis setting has two main challenges. Due to the long-range temporal dependency of weather variables, the forcing data have many input features. The large number of features translates into an even larger number of combinations of weather, making a comprehensive analysis computationally infeasible. Furthermore, defining ways to modify multivariate weather data so that the modified data are physically realistic is not trivial due to structured correlation within the variables. Nevertheless, forward simulations are helpful in analyzing individual events. Second, statistical methods such as linear regression, generalized additive models, principal component analysis, and random forests can be used to understand the impact of weather drivers (Park Williams et al., Reference Park Williams, Allen, Macalady, Griffin, Woodhouse, Meko, Swetnam, Rauscher, Seager, Grissino-Mayer, Dean, Cook, Gangodagamage, Cai and McDowell2013; Matusick et al., Reference Matusick, Ruthrof, Kala, Brouwers, Breshears and Hardy2018; Xu et al., Reference Xu, Zhou, Yi, Fang, Hendrey and Zhao2018; Crouchet et al., Reference Crouchet, Jensen, Schwartz and Schwinning2019; Senf et al., Reference Senf, Buras, Zang, Rammig and Seidl2020; Forzieri et al., Reference Forzieri, Dakos, McDowell, Ramdane and Cescatti2022; Jaime et al., Reference Jaime, Batllori, Ferretti and Lloret2022; Le Grix et al., Reference Le Grix, Cheung, Reygondeau, Zscheischler and Frölicher2023). Linear methods typically fail to capture complex nonlinear forest dynamics, and conventional machine learning methods like random forests or support vector machines require predefining predictors from an often extensive set of potential candidates.

Deep learning (DL) methods can capture the complex temporal dynamics in high-dimensional data and have been extensively used for time series forecasting and classification problems (Wang et al., Reference Wang, Yan and Oates2017; Zhao et al., Reference Zhao, Lu, Chen, Liu and Wu2017; Wu et al., Reference Wu, Pan, Long, Jiang, Chang and Zhang2020; Lim and Zohren, Reference Lim and Zohren2021; Marcolongo et al., Reference Marcolongo, Vladymyrov, Lienert, Peleg, Haug and Zscheischler2022). Despite high predictive accuracy, DL methods are often criticized for their lack of explainability. Most current weather-induced impact research utilizes post hoc methods, which interpret an existing, trained model (for example SHapley Additive exPlanations (SHAP) values, Layer-wise Relevance Propagation (LRP), feature importance analysis, and regression activation maps (Wolanin et al., Reference Wolanin, Mateo-García, Camps-Valls, Gómez-Chova, Meroni, Duveiller, Liangzhi and Guanter2020; Labe and Barnes, Reference Labe and Barnes2021; Yan et al., Reference Yan, Lei, Jiang, Yan, Ren, Liu and Liu2022; Liu et al., Reference Liu, Duffy, Dy and Ganguly2023; Mamalakis et al., Reference Mamalakis, Barnes and Ebert-Uphoff2023; Sweet et al., Reference Sweet, Müller, Anand and Zscheischler2023; Wadoux et al., Reference Wadoux, Saby and Martin2023). The advantage of post hoc methods is that they can be used with any model architecture and, therefore, do not require a trade-off of predictive model performance. However, such approaches explain model decisions on specified data points and do not provide a holistic view of the entire decision function (Bordt et al., Reference Bordt, Finck, Raidl and von Luxburg2022). An alternative approach is to use intrinsically explainable machine learning models (Rudin, Reference Rudin2019). A method in this approach is to learn a lower-dimensional representation of the input and relate it to the output with a more explainable function (Alvarez-Melis and Jaakkola, Reference Alvarez-Melis, Jaakkola, Bengio, Wallach, Larochelle, Grauman, Cesa-Bianchi and Garnett2018; Kim et al., Reference Kim, Wattenberg, Gilmer, Cai, Wexler, Viegas, Sayres, Dy and Krause2018; Chen et al., Reference Chen, Li, Tao, Barnett, Su, Rudin, Wallach, Larochelle, Beygelzimer, dAlché-Buc, Fox and Garnett2019; Varando et al., Reference Varando, Fernández-Torres and Camps-Valls2021; Yang et al., Reference Yang, Liu, Chen, Shen, Hao and Wang2021; Gautam et al., Reference Gautam, Boubekki, Hansen, Salahuddin, Jenssen, Höhne, Kampffmeyer, Koyejo, Mohamed, Agarwal, Belgrave, Cho and Oh2022). A generative DL architecture like the variational autoencoder (VAE) is one way to learn a lower-dimensional explainable data representation. VAEs learn a lower-dimensional representation (latent variables) of a very high-dimensional dataset by minimizing a reconstruction error and constraining the latent distribution.

In this study, we develop a VAE-based methodology to generate lower-dimensional representations of complex and correlated weather conditions. We apply and evaluate its performance on simulated data as a proof-of-concept, making use of long simulations of multiple forest types driven by synthetic weather data from a weather generator. By using modeled data, we are able to compare our results to the known processes of the model and thereby validate our novel, data-driven approach. With the help of the developed method, we can search for representations (weather prototypes) that are associated with high biomass loss events and thereby identify compound weather drivers of tree mortality. We also explore the relationship between compounding weather prototypes and biomass loss. Our developed methodology could, in future, be applied to real-world data (such as inventory datasets from national forest inventories), for which no ground truth is typically available for method validation.

## 2. Data and methods

An overview of the methodology is shown in Figure 1. We use simulated weather from a weather generator to run a parameterized forest model for the Hainich beech forest in Germany at 51.08° N and 10.43° E. We use composites and logistic regression analysis to obtain a general overview of the weather and biomass loss relationships. Next, a VAE is used to learn a low-dimensional representation of weather conditions. These representations are then used to identify weather prototypes associated with high levels of biomass loss based on the forest simulation for three different types of forests, namely beech, pine, and spruce.

### 2.1. Weather simulation

AWE-GEN is an advanced stochastic weather generator that can generate realistic high-resolution (hourly) weather variables (Fatichi et al., Reference Fatichi, Ivanov and Caporali2011). It can reproduce low- and high-frequency characteristics of weather variables, such as the interannual variability of precipitation and temperature, and has been widely used in hydrology and ecological modeling (Mastrotheodoros et al., Reference Mastrotheodoros, Pappas, Molnar, Burlando, Manoli, Parajka, Rigon, Szeles, Bottazzi, Hadjidoukas and Fatichi2020; Meili et al., Reference Meili, Manoli, Burlando, Bou-Zeid, Chow, Coutts, Daly, Nice, Roth, Tapper, Velasco, Vivoni and Fatichi2020; Hirschberg et al., Reference Hirschberg, Fatichi, Bennett, McArdell, Peleg, Lane, Schlunegger and Molnar2021; Marcolongo et al., Reference Marcolongo, Vladymyrov, Lienert, Peleg, Haug and Zscheischler2022). For this study, AWE-GEN is calibrated on 40 years (1979–2019) of bias-adjusted ERA5 reanalysis data designed explicitly for impact studies (Cucchi et al., Reference Cucchi, Weedon, Amici, Bellouin, Lange, Müller Schmied, Hersbach and Buontempo2020) for the location of the Hainich forest. AWE-GEN requires precipitation, cloud cover, air temperature, shortwave radiation, relative humidity, wind speed, and atmospheric pressure at hourly time step for calibration. For the cloud cover, a single level of hourly ERA5 data is used (Hersbach et al., Reference Hersbach, Bell, Berrisford, Hirahara, Horányi, Muñoz-Sabater, Nicolas, Peubey, Radu, Schepers, Simmons, Soci, Abdalla, Abellan, Balsamo, Bechtold, Biavati, Bidlot, Bonavita, De Chiara, Dahlgren, Dee, Diamantakis, Dragani, Flemming, Forbes, Fuentes, Geer, Haimberger, Healy, Hogan, Hólm, Janisková, Keeley, Laloyaux, Lopez, Lupu, Radnoti, de Rosnay, Rozum, Vamborg, Villaume and Thépaut2020). We compute relative humidity and specific humidity from pressure and temperature. Ultimately, 200,000 years of hourly stationary weather is generated after calibration.

The simulation’s mean annual temperature and precipitation are 8.5 C and 693 mm, respectively. The monthly seasonal cycle of temperature and precipitation is well captured by AWE-GEN, with slightly too warm temperatures in the weather generator (Figure 2a). Reanalysis data show strong negative correlations between monthly radiation and precipitation throughout the year and positive correlations between radiation and temperature during the summer (Figure 2b). The weather generator generally captures these correlations, albeit with weaker magnitude (Figure 2c).

### 2.2. Simulation of forest dynamics

We simulate forest dynamics with the forest gap model FORMIND, a process-based, individual-oriented, grid-based forest growth model (Fischer et al., Reference Fischer, Ensslin, Rutten, Fischer, Costa, Kleyer, Hemp, Paulick and Huth2015, Reference Fischer, Bohn, Dantas de Paula, Dislich, Groeneveld, Gutiérrez, Kazmierczak, Knapp, Lehmann, Paulick, Pütz, Rödig, Taubert, Köhler and Huth2016). It describes forest succession in small-scale forest patches (20 m × 20 m). The main processes considered are tree growth, mortality, recruitment, and competition; the trees within a forest patch compete for space, light, and water.

Carbon balance-based ecophysiological processes, such as photosynthesis, respiration, and carbon allocation, are used to calculate individual tree growth. Photosynthesis is related to temperature with a bell-shaped curve, whereas water shortage linearly reduces photosynthesis. In addition, deciduous species only sequester CO_{2} during the temperature-dependent growing season. Maintenance respiration is also modified by temperature following the well-established
$ {Q}_{10} $
approach (full model description in Bohn et al., Reference Bohn, Frank and Huth2014).

In the FORMIND model, mortality processes occur due to competition for space and reduced growth rate. Space–competition mortality occurs under optimal growing conditions when crowns occupy more space than is available. The model then randomly removes trees until the remaining crowns have sufficient space. In addition, temperature also affects Net Primary Productivity (NPP) through maintenance respiration. NPP is used to calculate the diameter increment of each tree. The increment in diameter affects the biomass loss as there is stochastic diameter increment-dependent mortality included in FORMIND (Bohn et al., Reference Bohn, Frank and Huth2014, www.formind.org). FORMIND has been frequently used to simulate temperate forest dynamics (Bohn et al., Reference Bohn, Frank and Huth2014; Rödig et al., Reference Rödig, Huth, Bohn, Rebmann and Cuntz2017; Bohn et al., Reference Bohn, May and Huth2018; Bugmann et al., Reference Bugmann, Seidl, Hartig, Bohn, Brůna, Cailleret, François, Heinke, Henrot, Hickler, Hülsmann, Huth, Jacquemin, Kollas, Lasch-Born, Lexer, Merganič, Merganičová, Mette, Miranda, Nadal-Sala, Rammer, Rammig, Reineking, Roedig, Sabaté, Steinkamp, Suckow, Vacchiano, Wild, Xu and Reyer2019; Holtmann et al., Reference Holtmann, Huth, Pohl, Rebmann and Fischer2021).

Here, we conduct three simulations, each time simulating only beech, pine, or spruce trees. Beech trees are late-successional, shade-tolerant, and deciduous, whereas pine is a fast-growing, light-demanding, evergreen needle-leaf tree. Spruce is a moderately fast-growing, shade-tolerant, evergreen, needle-leaf tree. The 200,000 years of AWE-GEN-generated precipitation and temperature data, together with solar irradiance (computed as a fraction of shortwave radiation), are used as an input to FORMIND. The specific parameters of the model are derived from various sources. For instance, we use yield tables to fit allometric and growth functions. Other parameters are obtained by taking the average over several field measurements, like wood density or leaf area index. For more details, see Bohn et al. (Reference Bohn, Frank and Huth2014). The simulations are conducted for 200 hectares of land to reduce the internal stochasticity of the model. The forest model starts simulating the growth of trees from barren land and thus requires a spin-up period of roughly 2,000 years to reach a quasi-equilibrium. The simulations are conducted in chunks of 10,000 years for computational efficiency. Therefore, we do not analyze the first 2,000 years for each of these parts and are finally left with 160,000 (8,000 × 20) years of forest data. Only the 160,000 years of weather is then used for further analysis. The simulations, even though potentially strongly differing from real-world behavior, have enough complexity to validate our proof-of-concept.

FORMIND simulates multiple variables at a daily and yearly time scale. For this analysis, we use biomass loss (BL) on a yearly scale. Instead of the fraction of trees dying yearly, BL refers to the fraction of biomass lost yearly. For example, the BL may be low when some young (therefore small) trees die but high when the same number of old (therefore giant) trees die. BL combines different types of mortality, namely background mortality, which depends on the genus; size-dependent mortality, which accounts for the mortality of young trees; and diameter increment-dependent mortality, which accounts for the fact that older trees have a higher mortality rate. We also consider five forest structure variables at a yearly time scale: age, stem volume, cumulative leaf area index, height, and diameter of each tree. The information on the state variables is condensed as histograms with 51 bins.

As this study focuses on extreme biomass loss, we convert BL to a binary value by thresholding at the 90th percentile. To understand extreme BL ( $ {y}_t $ ), the weather conditions in the current year and the two previous years ( $ {\mathbf{x}}_{t-2}^d,{\mathbf{x}}_{t-1}^d,{\mathbf{x}}_t^d $ ) are considered in the analysis. By definition, $ {\mathbf{x}}_t^d $ has a dimension of $ 12\times 3 $ for 12 months and three variables (temperature, precipitation, and radiation). The forest structure variables at the end of the third year before the considered BL year ( $ {\mathbf{x}}_{t-3}^s $ ) are also used in some of our analyses. $ {\mathbf{x}}_t^s $ thus refers to the five forest structure variables described by 51 binned distributions each. When the subscript is not specified, $ {\mathbf{x}}^d $ ( $ 36\times 3 $ ) refers to $ \left[{\mathbf{x}}_{t-2}^d,{\mathbf{x}}_{t-1}^d,{\mathbf{x}}_t^d\right] $ , $ {\mathbf{x}}^s $ ( $ 51\times 5 $ ) refers to $ {\mathbf{x}}_{t-3}^s $ and $ y $ ( $ 1\times 1 $ ) refers to $ {y}_t $ .

### 2.3. Composites and logistic regression

Composites are the difference of the mean weather for years with extremely high BL with respect to the mean weather conditions for all the years. We first construct three years of extreme BL weather composites (BL $ \ge $ 90th percentile) to identify general weather conditions associated with high BL. We also train a logistic regression model, which can be used to identify meteorological drivers of large impacts (Bevacqua et al., Reference Bevacqua, De Michele, Manning, Couasnon, Ribeiro, Ramos, Vignotto, Bastos, Blesić, Durante, Hillier, Oliveira, Pinto, Ragno, Rivoire, Saunders, van der Wiel, Wu, Zhang and Zscheischler2021; Vogel et al., Reference Vogel, Rivoire, Deidda, Rahimi, Sauter, Tschumi, van der Wiel, Zhang and Zscheischler2021):

where $ \mathrm{\mathbb{P}}\left( BL\ge 90 th\; percentile\right) $ is the probability of extremely high BL and $ \mathbf{x}=\left[ vectorize\left({\mathbf{x}}^d\right)\right] $ or $ \mathbf{x}=\left[ vectorize\left({\mathbf{x}}^d\right), vectorize\left({\mathbf{x}}^s\right)\right] $ . The logistic regression model is used to understand the relationship between three years of monthly weather ( $ {\mathbf{x}}^d $ ) and BL $ y $ at the end of the third year. In addition to dynamic weather conditions ( $ {\mathbf{x}}^d $ ), the BL also depends on the forest structure. Therefore, we also train a logistic regression model with the forest structure variables ( $ {\mathbf{x}}^s $ ) to quantify their importance. We assess the goodness of fit of the regression models using the Critical Success Index (CSI, Donaldson et al., Reference Donaldson, Dyer and Kraus1975), $ {F}_1 $ -Score and Average Precision (with definitions provided in Appendix B).

### 2.4. Variational autoencoder as a generative model

A variational autoencoder (VAE) is a form of DL model that can extract nonlinear feature representations from high-dimensional data, which has been widely used in Earth sciences, weather applications and remote sensing (Camps-Valls et al., Reference Camps-Valls, Tuia, Zhu and Reichstein2021; Tibau et al., Reference Tibau, Reimers, Requena-Mesa and Runge2021). A VAE is a generative model that can learn low-dimensional representations of complex data in an unsupervised way. It comprises an encoder, with the parameters $ \phi $ , that estimates the latent representation and a decoder part, with parameters $ \theta $ , that reconstructs the input samples from the latent dimensions. Overall, the aim is to estimate the log-likelihood of some observed data $ \mathbf{x} $ .

This can be done by using variational Bayesian inference. We are interested in the posterior distribution $ p\left(\mathbf{z}|\mathbf{x}\right) $ , where $ \mathbf{x} $ are our observations and $ \mathbf{z} $ are latent variables. However, this posterior is often intractable because we cannot compute the evidence or denominator of Bayes’ theorem, $ p\left(\mathbf{x}\right) $ , because the latent variables, $ \mathbf{z} $ , must be marginalized out. The central concept of variational inference (VI) is to use optimization to find a more tractable distribution $ q\left(\mathbf{z}\right) $ from a family of distributions $ Q $ such that it is close to the desired posterior distribution $ p\left(\mathbf{z}|\mathbf{x}\right) $ . In VI, proximity is defined using the Kullback–Leibler Divergence ( $ {D}_{\mathrm{KL}}\left[\Big\Vert \right] $ ), so the goal is to retrieve an optimal $ q\left(\mathbf{z}\right) $ that minimizes $ {D}_{\mathrm{KL}}\left[q\left(\mathbf{z}\right)\parallel p\left(\mathbf{z}|\mathbf{x}\right)\right] $ , which can be interpreted as minimizing the relative entropy between the two distributions. Mathematically, this is equivalent to maximizing the evidence lower bound (ELBO) (also sometimes called the variational lower bound or the negative variational free energy) (Kingma and Welling, Reference Kingma and Welling2013):

where $ \mathbf{x} $ and $ \mathbf{z} $ represent the input data and latent variables respectively. The ELBO has two components. The KL-Divergence term enforces how close the latent dimensions $ \mathbf{z} $ distribution should be to a specified prior distribution $ {p}_{\theta}\left(\mathbf{z}\right) $ , while the other term aims to minimize reconstruction error, $ \log {p}_{\theta}\left(\mathbf{x}|\mathbf{z}\right) $ . In the case of $ \beta $ -VAE, the hyperparameter $ \beta $ weighs the KL-Divergence term of the loss function:

Therefore, a value of $ \beta >1 $ pushes the model to learn a latent distribution closer to the specified distribution at the cost of a higher reconstruction error (Higgins et al., Reference Higgins, Matthey, Pal, Burgess, Glorot, Botvinick, Mohamed and Lerchner2017) and vice versa for $ \beta <1 $ . Different latent distributions can be learned through VAE (Bond-Taylor et al., Reference Bond-Taylor, Leach, Long and Willcocks2022), and here we choose a normally distributed prior with a diagonal covariance matrix with zero mean and unit variance. This particular choice of prior helps compute the KL-Divergence analytically. The input data ( $ \mathbf{x}={\mathbf{x}}^d $ ) consists of three weather variables, each a monthly time series of three years (Section 2.2).

We have modified the standard $ \beta $ -VAE by making a shared decoder for each weather variable (Figure 3). This is because each weather variable is a time series, and we expect the decoder to learn the general features of time series data. The encoder outputs two 32-dimensional vectors, one for the mean and the other for the log variance. Then, 32-dimensional latent values (following a multivariate normal distribution) can be sampled from these parameters. The decoder always takes in 16-dimensional latent variables to reconstruct one of the weather variables. The decoder takes eight weather-specific and eight shared dimensions and returns one reconstructed weather variable. This is done for each weather variable, and to obtain the full reconstructed output, three passes of the decoder are required. The eight shared dimensions capture the correlation between the weather variables (Figure 2b). As a design choice, the first eight dimensions are selected to represent radiation, the next eight for precipitation, the next eight for temperature, and the last eight are shared between all variables.

The encoders and decoders can have different architectures. In the encoder, a convolution layer is used to learn the different patterns in the data, increasing the number of dimensions, followed by max-pooling (Scherer et al., Reference Scherer, Muller, Behnke, Diamantaras, Duch and Iliadis2010), which reduces the number of dimensions. Batch normalization (Ioffe and Szegedy, Reference Ioffe and Szegedy2015) is used, followed by LeakyReLU (Maas et al., Reference Maas, Hannun and Ng2013) as the nonlinear function. This step is repeated twice for 256 and 512 feature kernels. After reshaping the output, two dense layers generate the mean and the log variance vector. Samples from the latent dimensions are then fed to the decoder. First, a dense layer increases the dimensions to reshape it into $ 9\times 1\times 32 $ . This is followed by upsampling, transposed convolution (Zeiler et al., Reference Zeiler, Krishnan, Taylor and Fergus2010), and batch normalization with Leaky-ReLU as an activation function. Like in the encoder, this is repeated twice, leading to an output of dimensions $ 36\times 1\times 1 $ . Finally, a convolutional layer is used without any activation function to get the output (see Figure 3 for more details).

The entire network is trained through backpropagation (Rumelhart et al., Reference Rumelhart, Hinton and Williams1986). The inputs are normalized based on the statistics of the training set. Adam optimizer is used with a learning rate of 0.0001, the upsampling and downsampling size is $ \left(2,1\right) $ , and the kernel size is $ \left(3,1\right) $ with stride as 1. LeakyReLU is used with the value $ \alpha =0.3 $ . The train, validation, and test split is 80%, 10%, and 10%, respectively. The total sample size available for training is 126,976. The model is trained for 150 epochs with a $ \beta =1.5 $ (cf. Equation 1), and the batch size is 256. The computation was done on NVIDIA A100 GPU using TensorFlow v2.9 (TensorFlow Developer (2022)). The model training time is less than 20 minutes.

### 2.5. Identifying climate prototypes associated with extremely high BL

To find the latent dimensions that are associated with extremely high BL, for each latent dimension, we compute the mean over samples that are associated with high BL and the mean over the remaining samples. We consider latent dimensions relevant for increasing BL when the difference between the two sample means exceeds half of the standard deviation (0.5).

We create weather prototypes associated with high BL based on the identification step above. For each identified latent dimension of interest, we change its value by a fixed amount
$ s $
in the direction that increases the BL while setting all other dimensions to 0. If the absolute magnitude of the correlation between two dimensions is greater than 0.3. In that case, we also change the value of the correlated dimension, assuming a bivariate Gaussian distribution with a covariance that results in this correlation. Finally, we use the decoder to generate the climate prototype
$ {P}_{i\mid s} $
for a given change
$ s $
in dimensions
$ i $
of the latent space. Compound climate prototypes are generated similarly by selecting the two latent dimensions of interest and changing both in the direction that increases BL, thus generating prototypes
$ {P}_{i\wedge j} $
. We generate univariate prototypes with
$ s=2\sigma $
and compound prototypes with
$ s=2\sigma $
/number of dimensions, denoted by
$ {P}_{i\mid 2\sigma } $
and
$ {P}_{i\wedge j\mid \sigma } $
_{,} respectively.

We further test whether samples corresponding to the prototypes are associated with generally higher levels of BL. To this end, we assign weather conditions in the original input space to a univariate prototype if, in the latent space, the value of the corresponding latent dimension exceeds $ s $ , i.e., two standard deviations. Similarly, a sample corresponds to a compound prototype if, in latent space, values in the two corresponding latent dimensions concurrently exceed $ s $ , i.e., one standard deviation.

## 3. Results

### 3.1. Composites and logistic regression

The median BL is 1.8%, 2.4%, and 3.3% per year for beech, pine, and spruce, respectively (Figure 4). The BL distribution is highly right-skewed, with a few years of experiencing extremely high BL values. Excess kurtosis for BL values is 24.8, 17.4, and 8.4 for beech, pine, and spruce forests. Generally, a lower BL is associated with lower radiation and temperature averaged over the preceding three years but higher precipitation, and vice versa for higher BL.

When we increase the temporal resolution of the composites to monthly values, we find that extremely high BL is associated with dry conditions in all forests, especially over the previous 18 months (Figure 5a,c,e). Another prominent feature is the strong positive temperature anomaly from May to September of the BL year. For beech, composites show no temperature anomalies in October $ {}_t $ but positive anomalies in November $ {}_t $ and December $ {}_t $ . For pine and spruce, the signal is reversed for these three months; anomalies are strongly positive in October $ {}_t $ but negative in November $ {}_t $ and December $ {}_t $ . Radiation has an overall weaker signal that is negatively correlated with precipitation.

Predicting BL based on monthly weather predictors reaches average precision values between 0.64 (beech) and 0.70 (pine) when no structure variables are included (Table 1). Including structure variables as predictors increases accuracy substantially, reaching values between 0.76 and 0.85. Generally, the performance of the linear model is similar for pine and spruce forests and higher than for the beech forest. The coefficients of the logistic regression (without structure variables) generally mirror the behavior of the composites (Figure 5b,d,e). Coefficients associated with precipitation for high BL years are negative, increasing in magnitude until May $ {}_{t-1} $ and staying constant until July $ {}_t $ , finally decreasing until the end of the time series. The coefficients for temperature are relatively small, except between April $ {}_t $ and September $ {}_t $ . The coefficients associated with radiation are close to zero most of the time. The logistic regression coefficients and composite plots show similar trends in all the variables except for radiation in year $ {}_t $ .

### 3.2. Weather representations via $ \beta $ -VAE

Initial tests with vanilla $ \beta $ -VAE with similar layers showed better reconstruction for the correlated variables (precipitation and radiation) under-representing temperature. The modified $ \beta $ -VAE achieves similar reconstruction errors for all three weather variables (Table 2) by introducing a shared decoder (Section 2.4). Losses and mean standard errors are similar between training, validation, and test datasets, suggesting that the model does not overfit.

We explore the latent dimensions $ \left({z}_i\right) $ of the trained $ \beta $ -VAE. Each three-year input sample is associated with one point in the 32-dimensional latent space. We test whether certain dimensions of this space are associated with weather conditions that lead to higher BL. Generally, we find high variability in the shift in distribution between the samples associated with extremely high BL and the remaining samples across the 32 dimensions (Figure 6, left). For all forests, $ {z}_{14} $ exhibits the largest absolute difference in the means (Figure 6, right) followed by $ {z}_{22} $ and $ {z}_{12} $ . Here, $ {z}_{12} $ shows a higher absolute mean difference than $ {z}_{14} $ for beech and vice versa for pine and spruce. All three dimensions show significant ( $ p<0.01 $ ) mean absolute differences between years with extreme and nonextreme BL $ >0.5 $ for all forests and are selected to construct the prototypes of weather patterns associated with the high BL in the following. We also test whether extreme BL is predictable when only relying on these three dimensions using a logistic regression model. Prediction accuracy is much lower than when monthly weather predictors are used, reaching average precision between 0.30 and 0.34, but is substantially higher than random (compare Table 3 with Table 1). When relying only on these three dimensions, beech and pine BL are better explained by the latent dimensions than spruce BL.

Our VAE architecture allows us to generate precipitation, temperature, and radiation independently from a sample in the latent space. Still, correlations exist between weather variables in the input space (Section 2.1). These correlations propagate to correlations between latent dimensions. We observe that the latent dimensions responsible solely for radiation ( $ {z}_{1:8} $ ) are negatively correlated (correlation $ <-0.3 $ ) with the latent dimensions responsible solely for precipitation ( $ {z}_{9:16} $ ), respectively (Figure A1 in Appendix A). All other dimensions are not correlated.

### 3.3. Weather prototypes and link to BL

The three selected dimensions $ {z}_{14} $ , $ {z}_{12} $ and $ {z}_{22} $ are used to generate weather prototypes $ {P}_{14\mid 2\sigma } $ , $ {P}_{12\mid 2\sigma } $ , and $ {P}_{22\mid 2\sigma } $ . By construction, $ {z}_{14} $ and $ {z}_{12} $ only affect the reconstruction of precipitation (Section 2.4). However, because of the correlation between $ {z}_{14} $ and $ {z}_6 $ , as well as $ {z}_{12} $ and $ {z}_4 $ (Figure A1), changing $ {z}_{14} $ and $ {z}_{12} $ also requires changing the values of $ {z}_6 $ and $ {z}_4 $ to obtain realistic prototypes. The prototype based on $ {z}_{22} $ is generated by only changing $ {z}_{22} $ .

Prototypes generated from the selected latent dimensions represent basic weather conditions associated with increased BL (Figure 7). Prototype $ {P}_{14\mid 2\sigma } $ and $ {P}_{12\mid 2\sigma } $ , by design, only show changes in precipitation and radiation. Both prototypes feature negative precipitation anomalies and positive radiation anomalies that are correlated. $ {P}_{14\mid 2\sigma } $ is characterized by negative precipitation anomalies in all three summers, coincident with higher radiation. Precipitation is also lower in the autumn of the second year. The lowest precipitation values occur in the year $ {}_t $ followed by the year $ {}_{t-2} $ and the year $ {}_{t-1} $ . $ {P}_{12\mid 2\sigma } $ has a different pattern and is characterized by two periods of negative precipitation anomalies. The first one occurs between March $ {}_{t-2} $ and July $ {}_{t-1} $ and is less pronounced than the negative anomaly in the second period, which occurs in the first five months of the last year. We only see slightly positive precipitation anomalies in the last half of year $ {}_t $ . In both prototypes, maximum precipitation anomalies are around 0.5 mm/d. Prototype $ {P}_{22\mid 2\sigma } $ only features temperature variations, with positive temperature anomalies in three periods. The first period starts from March $ {}_{t-2} $ to August $ {}_{t-2} $ , the second from the March $ {}_{t-1} $ and continues for the entire year. The last high-temperature period starts from March $ {}_t $ and ends in September $ {}_t $ . There is a short period at the beginning of the last year where we see negative temperature anomalies. Based on their anomaly patterns, we call prototypes $ {P}_{14\mid 2\sigma } $ Dry Summers, $ {P}_{12\mid 2\sigma } $ Dry Winters and $ {P}_{22\mid 2\sigma } $ Hot Summers.

We generate two compound prototypes based on the selected three mortality-influencing latent dimensions, namely $ {P}_{12\wedge 22\mid \sigma } $ and $ {P}_{14\wedge 22\mid \sigma } $ . While the univariate prototypes are created by moving two standard deviations along a single latent dimension in the direction of increasing BL, compound prototypes are created by moving one standard deviation in two latent dimensions (Section 2.5). $ {P}_{14\wedge 22\mid \sigma } $ corresponds to dry and hot summers, while $ {P}_{12\wedge 22\mid \sigma } $ corresponds to dry winters and hot summers. Weather conditions associated with compound prototypes occur more frequently in the training dataset (2.5–3.4% of all samples) than those associated with univariate prototypes (1.7–2.2%).

Weather conditions that are associated with any of the prototypes (Section 2.5) are generally associated with a higher BL in all forest types when compared to all weather conditions (Figure 8). The effect is particularly strong in the tail of the distribution; the 90th percentile in the BL distribution is about twice as large for the prototypes compared to all samples (Table 4). From the univariate prototypes, $ {P}_{14\mid 2\sigma } $ (Dry Summers) is associated with the highest increase in both median and 90th-percentile BL across all forest types (10.4–21.5% and 29.8–107.3%, respectively). Weather conditions associated with the compound prototypes $ {P}_{14\wedge 22\mid \sigma } $ and $ {P}_{12\wedge 22\mid \sigma } $ are generally associated with higher BL compared to weather conditions associated with any of the univariate prototypes, with $ {P}_{14\wedge 22\mid \sigma } $ (Dry and Hot Summers) leading to slightly higher median BL (9.7–24.0%) compared to $ {P}_{12\wedge 22\mid \sigma } $ (dry winters and hot summers, 10.8–24.6%).

## 4. Discussion

This study illustrates how novel VAE-based architectures can learn weather prototypes associated with high BL in a simulated forest setting based on the weather for a given time period. Working with simulated data allows us to avoid the complexities of real-world data arising from nonstationarity and confounders’ influence. Applicability of these methods beyond stationary weather requires careful validation and potential modifications to the developed methodology, which is beyond the scope of this paper. Lastly, the reconstruction errors, though small and similar for all the weather variables, impact generating realistic weather prototypes. Very high reconstruction errors would lead to reconstructions different from the input. Within these limits, we test the applicability of our approach. The method could further be adapted for use with observational data in future research.

Here, we learn the weather representations in a completely unsupervised setting and then try to find the latent dimensions associated with high BL. Our generative model can produce realistic weather prototypes. Of the three univariate prototypes generated, dry summers and dry winters feature negative precipitation anomalies and positive radiation anomalies, while hot summers feature positive temperature anomalies in summers. All identified weather prototypes are associated with higher BL overall, particularly in the tail of the BL distribution. Dry Summers lead to the highest increase in BL within the univariate prototypes. However, when combined, dry and hot summers lead to the highest increase in BL overall. Gazol and Camarero (Reference Gazol and Camarero2022) also found that European forest mortality frequently cooccurs with dry and hot summers. Vegetation models and Earth system models also generally support this finding (Zscheischler et al., Reference Zscheischler, Michalak, Schwalm, Mahecha, Huntzinger, Reichstein, Berthier, Ciais, Cook, El-Masri, Huang, Ito, Jain, King, Lei, Lu, Mao, Peng, Poulter, Ricciuto, Shi, Tao, Tian, Viovy, Wang, Wei, Yang and Zeng2014a, Reference Zscheischler, Reichstein, von Buttlar, Mu, Randerson and Mahecha2014b; Bastos et al., Reference Bastos, Orth, Reichstein, Ciais, Viovy, Zaehle, Anthoni, Arneth, Gentine, Joetzjer, Lienert, Loughran, McGuire, Pongratz and Sitch2021; Tschumi et al., Reference Tschumi, Lienert, van der Wiel, Joos and Zscheischler2022, Reference Tschumi, Lienert, Bastos, Ciais, Gregor, Joos, Knauer, Papastefanou, Rammig, van der Wiel, Williams, Xu, Zaehle and Zscheischler2023). Our analysis shows that compound weather prototypes lead to higher BL than univariate weather prototypes despite occurring more frequently. This is consistent with real-world observations, where concurrent hot and dry conditions have often been identified as primary drivers of forest mortality (Allen et al., Reference Allen, Macalady, Chenchouni, Bachelet, McDowell, Vennetier, Kitzberger, Rigling, Breshears, Hogg, Gonzalez, Fensham, Zhang, Castro, Demidova, Lim, Allard, Running, Semerci and Cobb2010; Anderegg et al., Reference Anderegg, Kane and Anderegg2013; Crouchet et al., Reference Crouchet, Jensen, Schwartz and Schwinning2019; Hammond et al., Reference Hammond, Williams, Abatzoglou, Adams, Klein, López, Sáenz-Romero, Hartmann, Breshears and Allen2022; Yi et al., Reference Yi, Hendrey, Niu, McDowell and Allen2022).

Some weather variables used are correlated (Figure 2), so we modify the traditional $ \beta $ -VAE architecture to work well with correlated time series. In particular, we design our encoder to take multivariate, high-dimensional weather data and learn three sets of eight weather-specific latent dimensions, each corresponding to one of precipitation, radiation, and temperature. The final eight latent dimensions contain information about all three weather variables. The decoder takes 16 (eight weather-specific and eight shared) latent dimensions and outputs one of the weather variables. The rationale is that our decoder only learns the structure of the weather time series, and the latent representation has all the information about the time series. While working with correlated weather variables, our architecture helps us learn unbiased weather representations (Table 2). In earlier tests with the standard $ \beta $ -VAE architecture, we found that the two more correlated weather variables (precipitation and radiation) were better represented than the less correlated ones (temperature). Though we expected all the correlations between weather variables to be captured in the eight shared latent dimensions, this was not the case. We find a one-to-one correlation in all latent dimensions related to solar radiation and precipitation ( $ {z}_{1:8} $ and $ {z}_{9:16} $ , Figure A1). The correlations in latent dimensions highlight that radiation and precipitation are correlated for most months of the year. Our method respects this correlation when generating prototypes (Section 2.5). Autoencoders have been used to learn representations of weather variables (Guevara et al., Reference Guevara, Borges, Watson and Zadrozny2021; Heinze-Deml et al., Reference Heinze-Deml, Sippel, Pendergrass, Lehner and Meinshausen2021; Oliveira et al., Reference Oliveira, Diaz, Zadrozny, Watson and Zhu2022; Ahn et al., Reference Ahn, Lee, Ko, Kim, Han and Seok2023). However, so far, most work focuses on learning spatial representations of weather variables or univariate spatiotemporal representations instead of a multivariate temporal latent representation of variables. Huamin et al. (Reference Huamin, Qiuqun and Shanzhu2020) try to learn time series representation, but only after transforming the time series to a recurrence plot, learning the latent representation of 2-D images for a univariate time series. The method we develop in this study can learn representations of multivariate time series and separate the representations specific to weather variables while considering the correlation between the variables.

The latent variables can capture the patterns in the weather variables, but they have no inherent information about forest structure or BL. Thus, they do not learn representations specific to high BL or given forest structures. In unsupervised settings, VAEs cannot learn the disentangled representation without inductive biases (Locatello et al., Reference Locatello, Bauer, Lucic, Rätsch, Gelly, Schölkopf, Bachem, Chaudhuri and Salakhutdinov2019). Including BL and forest structure information would introduce an inductive bias and possibly improve representation learning. In contrast, by not including forest structure, we identify weather prototypes leading to higher overall BL rather than conditioned to specific forest structures. The use of $ \beta $ -VAE may also be a limitation, as the reconstruction error is higher than that of standard VAE for $ \beta >1 $ (in Equation 1), possibly due to a trade-off of reconstruction error against disentanglement aligned with human intuition (Burgess et al., Reference Burgess, Higgins, Pal, Matthey, Watters, Desjardins and Lerchner2018). Fil et al. (Reference Fil, Mesinovic, Morris and Wildberger2021) show how $ \beta >1 $ does not always yield the best qualitative disentanglement for complex datasets. There is no clear consensus on the correct choice of $ \beta $ , and one must subjectively balance the trade-off between disentanglement and reconstruction error.

Compared to simple composite analysis and logistic regression, the VAE allows us to understand better the diverse weather patterns associated with high BL. In particular, we identify three weather prototypes and their possible combinations, which would not be possible using simple analysis. Nevertheless, the composite plots and logistic regression coefficients also give interesting insights. The composites of anomalies (Figure 5) show drier weather on average before BL years alongside higher temperatures in the last year. Unlike pine and spruce, we see negative temperature anomalies for beech forests in October, which rise in November and December of the previous year. This may be explained by the difference in vegetation period between the forest types. In the FORMIND beech forest model, the dynamic vegetation period ends when the ten-day mean air temperature falls below 9 °C. Therefore, a negative temperature in October would lead to an early end of the vegetation period. Higher temperatures after this may lead to high BL by increasing maintenance respiration and inducing plant water stress. For needle-leaf forests like pine and spruce, the simulated vegetation period is year-round, so the effect of lower temperature leading to higher BL dominates. In general, temperature has both positive and negative effects on plant productivity. Increasing temperature has been reported to increase phenology-driven carbon uptake for temperate forests (Keenan et al., Reference Keenan, Gray, Friedl, Toomey, Bohrer, Hollinger, Munger, O’Keefe, Schmid, Wing, Yang and Richardson2014), and the higher the carbon uptake, the healthier the forest. At the same time, high temperatures increase maintenance respiration for plants and can increase the atmosphere’s evaporative demand (Millar and Stephenson, Reference Millar and Stephenson2015), leading to water stress.

Using a DL model that can capture high-dimensional relationships enables us to study the relationship between BL and the preceding years of weather in more detail. There is strong evidence that it is essential to consider multiple years of weather when analyzing tree mortality in the real world. For instance, an assessment of many different forest mortality events across the globe showed that those are typically associated with negative precipitation anomalies two years before the mortality year (Hammond et al., Reference Hammond, Williams, Abatzoglou, Adams, Klein, López, Sáenz-Romero, Hartmann, Breshears and Allen2022). In response to the Europe-wide 2003 drought, mortality peaked two years after the driest year (Senf et al., Reference Senf, Buras, Zang, Rammig and Seidl2020; George et al., Reference George, Neumann, Vogt, Cammalleri and Lang2021). On the other hand, the drought in 2018 led to peak forest mortality in the same year (George et al., Reference George, Neumann, Vogt, Cammalleri and Lang2021). We chose the weather of the previous two years along with the current year to analyze the BL in the current year. Accounting for three years of weather emerged as a good trade-off between model complexity and accuracy in our model simulations.

Adding information about the forest structure ( $ {\mathbf{x}}^s $ ) alongside the weather variables ( $ {\mathbf{x}}^d $ ) increases the predictive accuracy for all forest types (Table 1). It is well known that forest structures play an essential role in forest mortality (van Mantgem et al., Reference van Mantgem, Stephenson, Byrne, Daniels, Franklin, Fulé, Harmon, Larson, Smith, Taylor and Veblen2009; Lines et al., Reference Lines, Coomes and Purves2010; Young et al., Reference Young, Stevens, Earles, Moore, Ellis, Jirka and Latimer2017; Restaino et al., Reference Restaino, Young, Estes, Gross, Wuenschel, Meyer and Safford2019). Our VAE does not use any information about the forest structure or forest mortality in the current setting, which may limit its usability. Adding this information to the input or the latent space might lead to a better representation learning for characterizing the extreme BL years.

In our study, we run FORMIND for 200 ha, for which we then see a somewhat deterministic response of forest dynamics to weather conditions. For smaller areas, stochasticity dominates forest mortality (Eid and Øyen, Reference Eid and Øyen2003; Vanderwel et al., Reference Vanderwel, Coomes and Purves2013). It is important to highlight that in this study, each simulated forest contains only one variety of trees. Most forests are more diverse and include multiple plant functional types. Thus, our interpretations of weather drivers apply to extensive homogeneous forests with a particular kind of tree.

## 5. Conclusion

In this study, we develop an interpretable machine learning-based method to identify compound weather drivers of BL in forests using generative DL on simulated forest dynamics. The proposed method is designed to work well with correlated time series variables, which are often present in climate-related data. In our application, where we use simulations of beech, spruce, and pine forests in a temperate climate, we find that sequential hot summers, sequential dry summers, and dry winters are the weather patterns chiefly associated with high BL. Furthermore, we demonstrate that combinations of these weather patterns are associated with even higher BL. The approach thus enables us to generate hypotheses for compound drivers of extreme impacts, which can then be verified by using, for instance, real-world observations. In principle, the method is able to generate new realistic weather conditions associated with high BL. Overall, the method is a step toward an improved understanding of the weather drivers of significant impacts by using deep generative models.

## Acknowledgments

We thank Nadav Peleg for the help with setting up the weather generator (AWE-GEN). We also thank Miguel Mahecha for the fruitful discussions on the manuscript.

## Author contribution

Conceptualization: M.A., J.Z.; Data curation: M.A.; Data visualization: M.A.; Methodology: M.A., J.Z.; Writing original draft: M.A.; All authors contributed to Writing – Review & Editing and approve the final submitted draft.

## Competing interest

The authors declare none.

## Data availability statement

The AWE-GEN and FORMIND simulations are available at zenodo https://doi.org/10.5281/zenodo.7708544. Deep Learning architecture and weights are available at https://github.com/AIM4ES/weather-vae-fomo.

## Funding statement

M.A., L-b.S. and J.Z. acknowledge funding from the Helmholtz Initiative and Networking Fund (Young Investigator Group COMPOUNDX, Grant Agreement VH-NG-1537). G.C-V. was supported by the European Research Council (ERC) under the ERC Synergy Grant USMILE (grant agreement 855187). G.C-V. and J.Z. acknowledge the European Union’s Horizon 2020 research and innovation program within the project ‘XAIDA: Extreme Events – Artificial Intelligence for Detection and Attribution’ under grant agreement No 101003469. FB was supported by the Collaborative Research Centre AquaDiva 559 of the Friedrich Schiller University Jena, funded by the Deutsche Forschungsgemeinschaft 560 (DFG, German Research Foundation) – SFB 1076 – “Project Number 218627073".

## A. Figures

## B. Metrics

Three metrics are used for measuring the performance of the logistic regression models. CSI is equal to the total number of correct events predicted (true-positives) divided by the total number of events predicted (true-positives + false-positives) plus the number of misses (false-negatives).

$ {F}_1 $ -score is the harmonic mean of precision and recall, which can also be written as a function of true-positives, false-positives, and false-negatives.

Average Precision is the area under the precision and recall curve. It is typically used for imbalanced datasets. CSI, $ {F}_1 $ -Score and Average Precision range between 0 and 1, with higher values indicating a better model.