An inclined cylindrical jet in crossflow is studied at two velocity ratios, $r=1$ and $r=2$, using highly resolved large-eddy simulations. First, an investigation of turbulent scalar mixing sheds light on the previously observed but unexplained phenomenon of negative turbulent diffusivity. We identify two distinct types of counter-gradient transport, prevalent in different regions: the first, throughout the windward shear layer, is caused by cross-gradient transport; the second, close to the wall right after injection, is caused by non-local effects. Then, we propose a deep learning approach for modelling the turbulent scalar flux by adapting the tensor basis neural network previously developed to model Reynolds stresses (Ling et al., J. Fluid Mech., vol. 807, 2016a, pp. 155–166). This approach uses a deep neural network with embedded coordinate frame invariance to predict a tensorial turbulent diffusivity that is not explicitly available in the high-fidelity data used for training. After ensuring analytically that the matrix diffusivity leads to a stable solution of the advection-diffusion equation, we apply this approach to the inclined jets in crossflow under study. The results show significant improvement compared to a simple model, particularly where cross-gradient effects play an important role in turbulent mixing. The model proposed herein is not limited to jets in crossflow; it can be used in any turbulent flow where the Reynolds-averaged transport of a scalar is considered.
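To make the cross-gradient mechanism concrete, the following minimal sketch assumes a gradient-diffusion closure with a matrix diffusivity, $\overline{u_i' c'} = -D_{ij}\,\partial \bar{c}/\partial x_j$. It shows how off-diagonal entries of $D$ can produce a negative apparent scalar diffusivity in one direction even though the symmetric part of $D$ is positive definite. The numerical values of `D` and `grad_c` and all function names are hypothetical and chosen only for illustration; the positive-definiteness check is one common sufficient condition for a non-growing scalar variance and is not necessarily the exact stability criterion derived in the paper.

```python
import numpy as np


def scalar_flux(D, grad_c):
    """Turbulent scalar flux from a matrix diffusivity: flux_i = -D_ij * dc/dx_j."""
    return -D @ grad_c


def apparent_diffusivity(flux, grad_c):
    """Per-direction scalar diffusivity implied by flux_i = -alpha_i * dc/dx_i."""
    return -flux / grad_c


def symmetric_part_is_positive_definite(D):
    """Check that the symmetric part of D has strictly positive eigenvalues."""
    sym = 0.5 * (D + D.T)
    return bool(np.all(np.linalg.eigvalsh(sym) > 0.0))


# Hypothetical 2-D example (illustrative values, not taken from the simulations).
D = np.array([[1.0, -1.5],
              [0.5, 1.0]])
grad_c = np.array([1.0, 1.0])  # mean scalar gradient

flux = scalar_flux(D, grad_c)
alpha = apparent_diffusivity(flux, grad_c)

print("flux:", flux)
print("apparent diffusivity per direction:", alpha)
print("symmetric part positive definite:", symmetric_part_is_positive_definite(D))
```

In this example the first flux component points up the mean gradient in that direction, so a scalar-diffusivity fit would report a negative value there, while the matrix form remains dissipative overall.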