In numerical linear algebra, a well-established practice is to choose a norm that exploits the structure of the problem at hand to optimise accuracy or computational complexity. In numerical polynomial algebra, a single norm (attributed to Weyl) dominates the literature. This article initiates the use of $L_p$ norms for numerical algebraic geometry, with an emphasis on $L_{\infty}$. This classical idea yields strong improvements in the analysis of the number of steps performed by numerous iterative algorithms. In particular, we exhibit three algorithms where, despite the complexity of computing $L_{\infty}$-norm, the use of $L_p$-norms substantially reduces computational complexity: a subdivision-based algorithm in real algebraic geometry for computing the homology of semialgebraic sets, a well-known meshing algorithm in computational geometry and the computation of zeros of systems of complex quadratic polynomials (a particular case of Smale’s 17th problem).
We present a JASMIN-based two-dimensional parallel implementation of an adaptive combined preconditioner for the solution of linear problems arising in the finite volume discretisation of one-group and multi-group radiation diffusion equations. We first propose the attribute of patch-correlation for cells of a two-dimensional monolayer piecewise rectangular structured grid without any suspensions based on the patch hierarchy of JASMIN, classify and reorder these cells via their attributes, and derive the conversion of cell-permutations. Using two cell-permutations, we then construct some parallel incomplete LU factorisation and substitution algorithms, to provide our parallel -GMRES solver with the help of the default BoomerAMG in the HYPRE library. Numerical results demonstrate that our proposed parallel incomplete LU preconditioner (ILU) is of higher efficiency than the counterpart in the Euclid library, and that the proposed parallel -GMRES solver is more robust and more efficient than the default BoomerAMG-GMRES solver.
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from the Intel MKL on the Intel Xeon (E5-2680 v3, 12 threads) central processing unit (CPU) executed in parallel mode. Compared to the CPU reference implementation based on the Intel MKL functions, the proposed GPU-based LOBPCG method with inexact nullspace filtering allowed us to achieve up to 2.9-fold acceleration.
In this paper, we study an exponential time differencing method for solving the gauge system of incompressible viscous flows governed by Stokes or Navier-Stokes equations. The momentum equation is decoupled from the kinematic equation at a discrete level and is then solved by exponential time stepping multistep schemes in our approach. We analyze the stability of the proposed method and rigorously prove that the first order exponential time differencing scheme is unconditionally stable for the Stokes problem. We also present a compact representation of the algorithm for problems on rectangular domains, which makes FFT-based solvers available for the resulting fully discretized system. Various numerical experiments in two and three dimensional spaces are carried out to demonstrate the accuracy and stability of the proposed method.
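To illustrate the first-order exponential time differencing idea on a far simpler problem than the gauge system, the following minimal sketch applies it to the stiff scalar model $u' = -\lambda u + N(u)$ and compares it with forward Euler. The model equation, the function names and the parameter values are assumptions chosen for this illustration; none of them are taken from the paper.

```python
import math

def etd1_step(u, h, lam, nonlinear):
    """One first-order ETD step for the hypothetical model u' = -lam*u + N(u).

    The linear part is integrated exactly; N is frozen at the current value.
    """
    decay = math.exp(-lam * h)
    weight = -math.expm1(-lam * h) / lam  # (1 - exp(-lam*h)) / lam
    return decay * u + weight * nonlinear(u)

def euler_step(u, h, lam, nonlinear):
    """One forward Euler step for the same model, for comparison."""
    return u + h * (-lam * u + nonlinear(u))

def integrate(step, u0, h, lam, nonlinear, n_steps):
    """Apply a one-step method n_steps times starting from u0."""
    u = u0
    for _ in range(n_steps):
        u = step(u, h, lam, nonlinear)
    return u

# Stiff test case (assumed values): lam*h = 100, far outside Euler's stability range.
forcing = lambda u: 1.0
u_etd = integrate(etd1_step, 0.0, 0.1, 1000.0, forcing, 10)
u_euler = integrate(euler_step, 0.0, 0.1, 1000.0, forcing, 10)
```

With this step size the exponential scheme stays at the steady state $1/\lambda$, while forward Euler grows without bound. This only illustrates stability on a linear scalar model and says nothing about the Navier-Stokes case.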
In many applications, the Gaussian convolution is approximately computed by means of recursive filters, with a significant improvement in computational efficiency. We are interested in theoretical and numerical issues related to such a use of recursive filters in a three-dimensional variational data assimilation (3Dvar) scheme as it appears in the software OceanVar. In that context, the main numerical problem consists of solving large linear systems with high efficiency, so that an iterative solver, namely the conjugate gradient method, is equipped with a recursive filter in order to compute matrix-vector multiplications that are in fact Gaussian convolutions. Here we present an error analysis that gives effective bounds for the perturbation on the solution of such linear systems, when is computed by means of recursive filters. We first prove that such a solution can be seen as the exact solution of a perturbed linear system. Then we study the related perturbation on the solution and we demonstrate that it can be bounded in terms of the difference between the two linear operators associated with the Gaussian convolution and the recursive filter, respectively. Moreover, we show through numerical experiments that the error on the solution exhibits a kind of edge effect, i.e. most of the error is localized in the first and last few entries of the computed solution, and that this effect is due to the structure of the difference of the two linear operators.
We construct modulus-based synchronous multisplitting iteration methods to solve a large implicit complementarity problem on parallel multiprocessor systems, and prove their convergence. Numerical results confirm our theoretical analysis and show that these new methods are efficient.
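As a rough illustration of the modulus-based idea, the sketch below solves a small standard linear complementarity problem (find $z \geq 0$ with $w = Mz + q \geq 0$ and $z^{T}w = 0$) by a modulus-based Jacobi iteration. This is a deliberately simplified, sequential stand-in: the paper treats implicit complementarity problems with synchronous multisplittings. The substitution $z = |x| + x$, $w = \Omega(|x| - x)$ with $\Omega$ equal to the diagonal of $M$, the test matrix and the function name are all assumptions made here.

```python
def modulus_jacobi_lcp(M, q, iterations=200):
    """Modulus-based Jacobi iteration for LCP(q, M) (illustrative sketch).

    Uses z = |x| + x and w = Omega * (|x| - x) with Omega = diag(M), so each
    sweep only divides by 2 * M[i][i]. Assumes M has a positive diagonal and
    is strictly diagonally dominant.
    """
    n = len(q)
    x = [0.0] * n
    for _ in range(iterations):
        abs_x = [abs(v) for v in x]
        new_x = []
        for i in range(n):
            # Off-diagonal contribution of M applied to (x + |x|).
            off = sum(M[i][j] * (x[j] + abs_x[j]) for j in range(n) if j != i)
            new_x.append((-off - q[i]) / (2.0 * M[i][i]))
        x = new_x
    z = [abs(v) + v for v in x]
    w = [M[i][i] * (abs(x[i]) - x[i]) for i in range(n)]
    return z, w

# Small assumed test problem.
M = [[4.0, -1.0], [-1.0, 4.0]]
q = [-3.0, 2.0]
z, w = modulus_jacobi_lcp(M, q)
```

Every component of a sweep depends only on the previous iterate, so the components can be updated independently on separate processors.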
An idea of designing oscillation-less and high-resolution hybrid schemes is proposed and several types of hybrid schemes based on this idea are presented on block-structured grids. The general framework, for designing various types of hybrid schemes, is established using a Multi-dimensional Optimal Order Detection (MOOD) method proposed by Clain, Diot and Loubère [1]. The methodology utilizes low dissipation or dispersion but less robust schemes to update the solution and then implements robust and high resolution schemes to deal with problematic situations. A wide range of computational methods including central scheme, MUSCL scheme, linear upwind scheme and Weighted Essentially Non Oscillatory (WENO) scheme have been applied in the current hybrid schemes framework. Detailed numerical studies on classical test cases for the Euler system are performed, addressing the issues of the resolution and non-oscillatory property around the discontinuities.
In this study an explicit Finite Difference Method (FDM) based scheme is developed to solve Maxwell's equations in the time domain for a lossless medium. This manuscript focuses on two unique aspects: the three-dimensional time-accurate discretization of the hyperbolic system of Maxwell equations on a three-point non-staggered grid stencil, and its application to parallel computing through the use of Graphics Processing Units (GPU). The proposed temporal scheme is symplectic, thus permitting conservation of all Hamiltonians in the Maxwell equations. Moreover, to enable accurate predictions over large time frames, a phase velocity preserving scheme is developed for treatment of the spatial derivative terms. As a result, the chosen time increment and grid spacing can be optimally coupled. An additional theoretical investigation into this pairing is also shown. Finally, the application of the proposed scheme to parallel computing using one Nvidia K20 Tesla GPU card is demonstrated. For the benchmarks performed, the parallel speedup when compared to a single core of an Intel i7-4820K CPU is approximately 190x.
In this paper we develop explicit fast exponential Runge-Kutta methods for the numerical solution of a class of parabolic equations. By incorporating the linear splitting technique into the explicit exponential Runge-Kutta schemes, we are able to greatly improve the numerical stability. The proposed methods can be implemented efficiently by using decompositions of compact spatial difference operators on a regular mesh together with discrete fast Fourier transform techniques. The exponential Runge-Kutta schemes are easy to adopt in adaptive temporal approximations with variable time step sizes, and can be applied to stiff nonlinearities and boundary conditions of different types. The linear stability of the proposed schemes and their comparison with other schemes are presented. We also numerically demonstrate the accuracy, stability and robustness of the proposed method through some typical model problems.
As an exploratory study of the structural deformation and thermodynamic response induced by the aerodynamic forces and thermal environment of spacecraft reentry, a finite element algorithm is presented on the basis of the classical Fourier heat conduction law to simulate the dynamic thermoelastic coupling behavior of the material. The Newmark method and the Crank-Nicolson scheme are used to discretize the dynamic thermoelasticity equation and the heat conduction equation in the time domain, respectively, and an unconditionally stable implicit algorithm is constructed. Four types of finite-element computing schemes are devised and discussed to solve the thermodynamic coupling equation, all of which are implemented and compared in computational examples including one-dimensional transient heat conduction with and without vibration, the transient heat flow for an infinite cylinder, and the dynamic coupled thermoelasticity around a re-entry flat plate in a hypersonic aerothermodynamic environment. The computational results show that the transient responses of the temperature and displacement fields exhibit a lag when the effect of deformation on the temperature field is taken into account. Propagation, rebounding, attenuation and stabilization of the elastic wave are also observed in the finite-element calculation of the thermodynamic coupling problem with vibration and damping, and an oscillation of the temperature field is induced simultaneously. As a result, a computational method and an application research platform have been established to solve the transient thermodynamic coupling response problem of a structure in a strong aerodynamic heating and force environment. By comparing various coupling calculations, it is demonstrated that the present algorithm gives a correct and reliable description of the transient thermodynamic responses of the structure, the rationality of the sequentially coupled method in engineering calculation is discussed, and the bending deformation mechanism produced by the thermodynamic coupling response on the windward and leeward sides of the flying body is revealed. This lays the foundation for developing numerical methods to solve for the material internal temperature distribution, structural deformation, and thermal damage induced by the dynamic thermoelastic coupling response of spacecraft under uncontrolled reentry aerothermodynamic conditions.
Computational scientists generally seek more accurate results in shorter times, and to achieve this a knowledge of evolving programming paradigms and hardware is important. In particular, optimising solvers for linear systems is a major challenge in scientific computation, and numerical algorithms must be modified or new ones created to fully use the parallel architecture of new computers. Parallel space discretisation solvers for Partial Differential Equations (PDE) such as Domain Decomposition Methods (DDM) are efficient and well documented. At first glance, parallelisation seems to be inconsistent with inherently sequential time evolution, but parallelisation is not limited to space directions. In this article, we present a new and simple method for time parallelisation, based on partial fraction decomposition of the inverse of some special matrices. We discuss its application to the heat equation and some limitations, in associated numerical experiments.
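One way to picture time parallelisation by partial fractions is the following minimal sketch. It assumes backward Euler steps with distinct step sizes $\tau_k$ for the 1D heat equation, so that the propagator $\prod_k (I + \tau_k A)^{-1}$ can be rewritten as $\sum_k c_k (I + \tau_k A)^{-1}$ with $c_k = \prod_{j \neq k} \tau_k / (\tau_k - \tau_j)$. The abstract does not specify which matrices are decomposed, so this setting, the grid, the step sizes and all function names are assumptions for illustration.

```python
import math

def solve_implicit_step(u, tau, h):
    """Solve (I + tau*A) x = u by the Thomas algorithm, where A is the 1D
    Dirichlet Laplacian tridiag(-1, 2, -1) / h^2 (assumed model problem)."""
    n = len(u)
    off = -tau / h**2
    diag = 1.0 + 2.0 * tau / h**2
    c_prime, d_prime = [0.0] * n, [0.0] * n
    c_prime[0] = off / diag
    d_prime[0] = u[0] / diag
    for i in range(1, n):
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (u[i] - off * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

def partial_fraction_weights(taus):
    """Weights c_k of 1/prod(1 + tau_j*s) = sum_k c_k/(1 + tau_k*s); taus must be distinct."""
    weights = []
    for k, tk in enumerate(taus):
        c = 1.0
        for j, tj in enumerate(taus):
            if j != k:
                c *= tk / (tk - tj)
        weights.append(c)
    return weights

n, h = 20, 1.0 / 21
u0 = [math.sin(math.pi * (i + 1) * h) for i in range(n)]
taus = [0.01, 0.02, 0.03]

# Sequential time stepping: each solve needs the previous result.
u_seq = u0
for tau in taus:
    u_seq = solve_implicit_step(u_seq, tau, h)

# Partial-fraction form: every solve starts from u0 and could run concurrently.
weights = partial_fraction_weights(taus)
solves = [solve_implicit_step(u0, tau, h) for tau in taus]
u_par = [sum(c * s[i] for c, s in zip(weights, solves)) for i in range(n)]
max_diff = max(abs(a - b) for a, b in zip(u_seq, u_par))
```

In this sketch the weights grow quickly when step sizes are close together or numerous, which causes cancellation error in the weighted sum; this is one possible source of the limitations mentioned in the abstract.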
In this work, two approaches, based on the certified Reduced Basis method, have been developed for simulating the movement of nuclear reactor control rods, in time-dependent non-coercive settings featuring a 3D geometrical framework. In particular, in a first approach, a piece-wise affine transformation based on subdomains division has been implemented for modelling the movement of one control rod. In the second approach, a “staircase” strategy has been adopted for simulating the movement of all the three rods featured by the nuclear reactor chosen as case study. The neutron kinetics has been modelled according to the so-called multi-group neutron diffusion, which, in the present case, is a set of ten coupled parametrized parabolic equations (two energy groups for the neutron flux, and eight for the precursors). Both the reduced order models, developed according to the two approaches, provided a very good accuracy compared with high-fidelity results, assumed as “truth” solutions. At the same time, the computational speed-up in the Online phase, with respect to the fine “truth” finite element discretization, achievable by both the proposed approaches is at least of three orders of magnitude, allowing a real-time simulation of the rod movement and control.
The unified lattice Boltzmann model is extended to the quadtree grids for simulation of fluid flow through porous media. The unified lattice Boltzmann model is capable of simulating flow in porous media at various scales or in systems where multiple length scales coexist. The quadtree grid is able to provide a high-resolution approximation to complex geometries, with great flexibility to control local grid density. The combination of the unified lattice Boltzmann model and the quadtree grids results in an efficient numerical model for calculating permeability of multi-scale porous media. The model is used for permeability calculation for three systems, including a fractured system used in a previous study, a Voronoi tessellation system, and a computationally-generated pore structure of fractured shale. The results are compared with those obtained using the conventional lattice Boltzmann model or the unified lattice Boltzmann model on rectangular or uniform square grid. It is shown that the proposed model is an accurate and efficient tool for flow simulation in multi-scale porous media. In addition, for the fractured shale, the contribution of flow in matrix and fractures to the overall permeability of the fractured shale is studied systematically.
Mesh generation is a bottleneck for finite element simulations of biomolecules. A robust and efficient approach, based on the immersed boundary method proposed in [8], has been developed and implemented to generate large-scale mesh body-fitted to molecular shape for general parallel finite element simulations. The molecular Gaussian surface is adopted to represent the molecular surface, and is finally approximated by piecewise planes via the tool phgSurfaceCut in PHG [43], which is improved and can reliably handle complicated molecular surfaces, through mesh refinement steps. A coarse background mesh is imported first and then is distributed into each process using a mesh partitioning algorithm such as space filling curve [5] or METIS [22]. A bisection method is used for the mesh refinements according to the molecular PDB or PQR file which describes the biomolecular region. After mesh refinements, the mesh is optionally repartitioned and redistributed for load balancing. For finite element simulations, the modification of region mark and boundary types is done in parallel. Our parallel mesh generation method has been successfully applied to a sphere cavity model, a DNA fragment, a gramicidin A channel and a huge Dengue virus system. The results of numerical experiments show good parallel efficiency. Computations of electrostatic potential and solvation energy also validate the method. Moreover, the meshing process and adaptive finite element computation can be integrated as one PHG project to avoid the mesh importing and exporting costs, and improve the convenience of application as well.
Simulation of turbulent flows with shocks employing subgrid-scale (SGS) filtering may encounter a loss of accuracy in the vicinity of a shock. This paper addresses the accuracy improvement of large eddy simulation (LES) of turbulent flows in two ways: (a) from the SGS model standpoint and (b) from the numerical method improvement standpoint. In an internal report, Kotov et al. (“High Order Numerical Methods for large eddy simulation (LES) of Turbulent Flows with Shocks”, CTR Tech Brief, Oct. 2014, Stanford University), we performed a preliminary comparative study of different approaches to reduce the loss of accuracy within the framework of the dynamic Germano SGS model. The high order low dissipative method of Yee & Sjögreen (2009), which uses local flow sensors to control the amount of numerical dissipation where needed, is used for the LES simulation. The improved dynamic model approaches considered include applying the one-sided SGS test filter of Sagaut & Germano (2005) and/or disabling the SGS terms at the shock location. For Mach 1.5 and 3 canonical shock-turbulence interaction problems, both of these approaches show a similar accuracy improvement to that of the full use of the SGS terms. The present study focuses on a five-level grid refinement study to obtain the reference direct numerical simulation (DNS) solution for additional LES SGS comparison and approaches. One of the numerical accuracy improvements included here applies Harten's subcell resolution procedure to locate and sharpen the shock, and uses a one-sided test filter at the grid points adjacent to the exact shock location.
We present a parallel algorithm to calculate a numerical approximation of a single, isolated root $\alpha$ of a function $f:\mathbb{R}\rightarrow \mathbb{R}$ which is sufficiently regular at and around $\alpha$. The algorithm is derivative free and performs one function evaluation on each processor per iteration. It requires at least three processors and can be scaled up to any number of these. The order with which the generated sequence of approximants converges to $\alpha$ is equal to $(n+\sqrt{n^{2}+4})/2$ for $n+1$ processors with $n\geqslant 2$. This assumes that particular combinations of the derivatives of $f$ do not vanish at $\alpha$.
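The stated convergence order is easy to tabulate. The short sketch below evaluates $(n+\sqrt{n^{2}+4})/2$ for a few processor counts; the function and variable names are chosen here for illustration only.

```python
import math

def convergence_order(n):
    """Order (n + sqrt(n^2 + 4)) / 2 quoted for n + 1 processors, n >= 2."""
    if n < 2:
        raise ValueError("the stated result assumes n >= 2")
    return (n + math.sqrt(n * n + 4)) / 2

# Map processor count (n + 1) to the corresponding convergence order.
orders = {n + 1: convergence_order(n) for n in range(2, 6)}
```

The order $p$ is the positive root of $p^{2} = np + 1$, so it lies slightly above $n$: three processors give $1+\sqrt{2} \approx 2.41$, and each further processor adds roughly one to the order.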
This paper presents a parallel algorithm for finding the smallest eigenvalue of a family of Hankel matrices that are ill-conditioned. Such matrices arise in random matrix theory and require the use of extremely high precision arithmetic. Surprisingly, we find that a group of commonly-used approaches that are designed for high efficiency are actually less efficient than a direct approach for this class of matrices. We then develop a parallel implementation of the algorithm that takes into account the unusually high cost of individual arithmetic operations. Our approach combines message passing and shared memory, achieving near-perfect scalability and high tolerance for network latency. We are thus able to find solutions for much larger matrices than previously possible, with the potential for extending this work to systems with greater levels of parallelism. The contributions of this work are in three areas: determination that a direct algorithm based on the secant method is more effective when extreme fixed-point precision is required than are the algorithms more typically used in parallel floating-point computations; the particular mix of optimizations required for extreme precision large matrix operations on a modern multi-core cluster, and the numerical results themselves.
We construct a wavelet-based almost-sure uniform approximation of fractional Brownian motion (FBM) $(B_t^{(H)})_{t\in[0,1]}$ of Hurst index $H \in (0, 1)$. Our results show that, by Haar wavelets which merely have one vanishing moment, an almost-sure uniform expansion of FBM for $H \in (0, 1)$ can be established. The convergence rate of our approximation is derived. We also describe a parallel algorithm that generates sample paths of an FBM efficiently.
A second-order in time finite-difference scheme using a modified predictor–corrector method is proposed for the numerical solution of the generalized Burgers–Fisher equation. The method introduced, which, in contrast to the classical predictor–corrector method is direct and uses updated values for the evaluation of the components of the unknown vector, is also analysed for stability. Its efficiency is tested for a single-kink wave by comparing experimental results with others selected from the available literature. Moreover, comparisons with the classical method and relevant analogous modified methods are given. Finally, the behaviour and physical meaning of the two-kink wave arising from the collision of two single-kink waves are examined.
Computing a zero of a continuous function is an old and extensively researched problem in numerical computation. In this paper, we present an efficient subdivision algorithm for finding all real roots of a function in multiple variables. This algorithm is based on a simple computationally verifiable necessity test for the existence of a root in any compact set. Both theoretical analysis and numerical simulations demonstrate that the algorithm is very efficient and reliable. Convergence is shown and numerical examples are presented.
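The general shape of a subdivision root finder driven by a necessity test can be sketched in one variable as follows. The abstract does not state the paper's test, so this sketch substitutes a simple Lipschitz bound: an interval with midpoint $m$ and radius $r$ can contain a root only if $|f(m)| \leq Lr$. The test, the example function, its Lipschitz constant and the clustering step are all assumptions made for illustration, and the paper itself treats functions of several variables.

```python
def subdivide_roots(f, lipschitz, a, b, tol=1e-6):
    """Return midpoints of small intervals in [a, b] that may contain a root.

    An interval is discarded when |f(mid)| > L * radius, since a root inside
    would contradict the Lipschitz bound (assumed necessity test).
    """
    stack = [(a, b)]
    survivors = []
    while stack:
        lo, hi = stack.pop()
        mid = 0.5 * (lo + hi)
        radius = 0.5 * (hi - lo)
        if abs(f(mid)) > lipschitz * radius:
            continue  # necessary condition fails: no root here
        if radius < tol:
            survivors.append(mid)
        else:
            stack.append((lo, mid))
            stack.append((mid, hi))
    return sorted(survivors)

def cluster(points, gap):
    """Merge sorted points closer than `gap` and return each cluster's mean."""
    clusters = []
    for p in points:
        if clusters and p - clusters[-1][-1] <= gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return [sum(c) / len(c) for c in clusters]

# Example: f(x) = x^3 - x on [-2, 2], where |f'(x)| = |3x^2 - 1| <= 11.
f = lambda x: x**3 - x
midpoints = subdivide_roots(f, 11.0, -2.0, 2.0, tol=1e-6)
roots = cluster(midpoints, gap=4e-6)
```

Because the test is only a necessary condition, several neighbouring intervals survive around each root; the clustering step collapses them into one estimate per root.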