Homogeneity Testing at the Micrometer Scale

Dennis Harries

doi:10.1017/S1551929516001115

Published online by Cambridge University Press: 04 January 2017

Dennis Harries*
Affiliation: Department of Analytical Mineralogy, Institute of Geosciences, Friedrich Schiller University, Jena, Germany
*dennis.harries@uni-jena.de

Type: Microanalysis
Information: Microscopy Today, Volume 25, Issue 1, January 2017, pp. 28-35

DOI: https://doi.org/10.1017/S1551929516001115
Copyright: Copyright © Microscopy Society of America 2017

Introduction

Homogeneous reference materials (standards) are a key to accurate and reproducible elemental microanalysis by energy- and wavelength-dispersive x-ray spectrometry in the scanning electron microscope (SEM) and the electron probe microanalyzer (EPMA). Standards are needed for quantification (converting x-ray counts to mass fractions) and for quality control of analysis results (testing on known compositions). Because the spatial resolution of the electron beam allows analysis of volumes down to a few µm³, any reference material with a known composition used as a microanalysis standard ideally must have the same composition regardless of where on the standard the measurements are made: it must be homogeneous.

In principle, perfect homogeneity in materials does not exist. Modern aberration-corrected scanning transmission electron microscopes equipped with energy-dispersive x-ray detectors and electron energy loss spectrometers are capable of detecting elemental differences at atomic levels. For the more common use of x-ray microanalysis of bulk specimens, the scale and level of heterogeneity that can be accepted for a given purpose must be defined. Tests for homogeneity are then employed to answer the question of whether a material is “fit for the purpose.” This question can be answered positively when significant heterogeneity cannot be detected with the measurement method under its chosen conditions of operation.

While standards are often pure elements, homogeneous multi-element standards are relatively easy to produce if stoichiometric compounds with a limited number of components are used (for example, pure oxides). However, as soon as components substitute for each other, as in glasses or solid solutions, chemical heterogeneity becomes a concern. This is the case for natural minerals that may have formed under changing physicochemical conditions, but it also concerns synthetic materials where homogeneous doping or alloying may be difficult. For these reasons, testing for homogeneity is an important step in qualifying a material as an analysis standard.

Homogeneity testing of a potential standard requires the investigator to discriminate between the elemental variations due to true compositional differences in the sample and those variations that are related to the instrument and the measurement process itself. The latter variations are the “precision” of the analysis, which represents the statistically random scatter in the results. Whether or not heterogeneity among different locations on the sample can be detected depends on the magnitude of the chemical variations relative to this omnipresent scatter; hence, the instrumental precision has to be known.

The easiest approach to determining the instrumental precision is to repeatedly analyze the same spot on the sample. Then the comparison of different locations on the sample by “analysis of variance” (ANOVA) allows the statistical separation of instrumental scatter from compositional variations with a given degree of confidence [1, 2]. However, in practical microanalysis this approach is often problematic because repeated analyses of a single spot may result in progressive degradation (or even loss) of the analyzed volume. In electron probe microanalysis this degradation may result from the build-up of surface contamination, induced diffusion, or structural damage and decomposition. In consequence, the apparent instrumental variances from repeated measurements on single spots are often not suitable for a sound statistical analysis. This was recognized early in the history of electron probe microanalysis, and counting statistics traditionally have been used to obtain the instrumental precision [3]. The method outlined here follows this approach: the instrumental precision of a single measurement is the standard deviation derived solely from measured x-ray counts; the variance of the measurement is the standard deviation squared. Calculations are usually more conveniently done with variances, while the standard deviations are usually stated because they preserve the units of the measured quantity. Both are expressions of the uncertainty of the measurement.

Beginning with Boyd et al. [3], geological and technological reference materials have been characterized by an index derived as the ratio of the observed standard deviation to the standard deviation (instrumental precision) predicted by counting statistics [4, 5, 6]. The interpretation of this “sigma ratio,” or “homogeneity index,” has been based on subjective experience: usually materials with a homogeneity index of H < 3 have been accepted as suitable standards. However, this is statistically not sound, and a revised interpretation of the homogeneity index has been given only recently [7]. Based on this new evaluation, this article describes an improved statistical method for determining the degree of homogeneity in a bulk specimen at the micrometer level of spatial resolution, given the instrumental precision of the analysis based on counting statistics.

Materials and Methods

As a candidate reference material for use as a microprobe standard, natural crystals of the mineral fluorapatite from Imilchil in Morocco were investigated. Fluorapatite is a halogen-bearing calcium phosphate, ideally Ca₅(PO₄)₃F, and can incorporate a wide range of elements into its structure. It is also one of the more beam-sensitive minerals and is known for loss of fluorine during electron beam analysis [8]. It was analyzed using wavelength-dispersive electron probe x-ray spectrometry on a JEOL JXA-8900 (15 kV, 30 nA beam current, 14 µm spot diameter). A worked example of a different material is found in the electronic annex of [7].

Instrumental precision

In the case that only one measurement per analysis spot is practically feasible in order to avoid specimen degradation, significance criteria for detectable heterogeneity can be derived in a statistically sound fashion if the counting statistics of x-ray photons are the only significant contribution to the instrumental precision. The instrumental precision is then the standard deviation derived from the number of counting events for a particular element x-ray peak. This limitation of instrumental precision to counting statistics is only valid if other sources of instrumental random noise are insignificant. Such a source may be, for example, the alignment of the analyzer crystal in a mechanical wavelength-dispersive spectrometer or variations in the beam current measurement; basically any parameter that is set and reset before and after the acquisition of each data point. Because technological development has achieved highly reproducible mechanics and electronics, these sources of scatter in the results are usually negligible.

Drift

A much bigger problem is instrumental drift. In this case instrumental variations occur not randomly but in a time-correlated manner, for example in relation to changing temperature. Instrumental drift must be avoided or properly corrected, and the time series of analyses acquired must be screened meticulously for the presence of variations that correlate with time. Statistical methods for assessing this have been suggested [9]. The technique presented here assumes that drift is absent and any instrumental contribution to measurement uncertainty, other than the statistical noise of x-ray counting, is insignificant.
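As a rough first screen for variations that correlate with time, one could fit a straight line to the results in their order of acquisition and inspect the slope. The sketch below is only an illustration under that assumption; it is not the statistical method of [9], and the function name `drift_slope` and the example values are hypothetical.

```python
def drift_slope(values):
    """Least-squares slope of the results against acquisition order (0, 1, 2, ...).

    A slope far from zero (relative to the scatter) hints at a time trend.
    This is a simple illustrative check, not a formal drift test.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = (n - 1) / 2.0
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


# Hypothetical mass fractions (wt%) listed in the order they were acquired.
series = [0.210, 0.212, 0.214, 0.216, 0.218]
print(drift_slope(series))  # change per measurement step
```

A trend in such a series may also reflect real compositional zoning along the profile, so a non-zero slope calls for closer inspection rather than an automatic correction.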

Homogeneity index

A traditional approach to test for homogeneity is the recently refined homogeneity index (H) [3, 7]:

(Equation 1)

$$H^{2} \,\equiv\,{{s_{c}^{2} } \over {s_{{Pois}}^{2} }}$$

The homogeneity index compares the combined variance actually observed ($s_{c}^{2}$) to the variance expected from Poisson counting statistics ($s_{Pois}^{2}$). The combined variance $s_{c}^{2}$ is simply the variance (standard deviation squared) of the mass fraction values obtained from a number of analysis points (designated N) across the sample and includes the variance due to Poisson noise and any compositional variance. For each of the N analysis points an individual variance is calculated based on counting statistics, and these are then combined into $s_{Pois}^{2}$ as the average of the N individual values. Equations for this are given in [7].

If the observed variance is larger than the expected variance from counting statistics (H > 1), compositional heterogeneity may be present. The significance of this heterogeneity can be tested based on F or chi-squared statistics because $H^{2}$ is a ratio of variances. The answer is strictly valid only for the specific set of measurement conditions that were used for the data acquisition. Under the null hypothesis that no heterogeneity can be detected, H = 1: the combined variance observed is equal to the variance due to counting statistical noise.
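Equation 1 can be sketched in a few lines of code. The example assumes that the individual Poisson variances of the N analysis points have already been computed from the counting data (the equations for that step are in [7] and are not reproduced here) and that the combined variance is the usual sample variance with N - 1 in the denominator. The function name and the data are hypothetical.

```python
import math
import statistics


def homogeneity_index(mass_fractions, poisson_variances):
    """Return H = sqrt(s_c^2 / s_Pois^2) following Equation 1.

    mass_fractions    : one measured mass fraction per analysis point
    poisson_variances : one counting-statistics variance per analysis point
                        (assumed to be supplied by the caller)
    """
    if len(mass_fractions) != len(poisson_variances):
        raise ValueError("need one Poisson variance per analysis point")
    if len(mass_fractions) < 2:
        raise ValueError("need at least two analysis points")
    # Combined variance: sample variance (N - 1 denominator, an assumption).
    s_c_sq = statistics.variance(mass_fractions)
    # Poisson variance: average of the N individual variances.
    s_pois_sq = statistics.fmean(poisson_variances)
    return math.sqrt(s_c_sq / s_pois_sq)


# Hypothetical data: five analysis points, mass fractions in wt%.
ce_wt_pct = [0.21, 0.23, 0.19, 0.22, 0.20]
var_pois = [0.0001] * 5  # (wt%)^2, made-up values
H = homogeneity_index(ce_wt_pct, var_pois)
print(round(H, 3))
```

Whether a value of H above unity is significant cannot be read from H alone; it has to be compared with a critical value that depends on N.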

The homogeneity index on its own is not suitable for deciding whether heterogeneity is detected or not, because H itself has an inherent uncertainty. This is because the variances used to calculate H are statistical estimates and will scatter around true but unknown values. This scatter decreases as N increases. Hence, H > 1 may indicate significant heterogeneity, or it may just be larger than unity because of statistical scatter, even when there is no heterogeneity present. In order to decide which case applies, a critical homogeneity index ($H_{crit}$) can be calculated based on the chi-squared distribution [7, 9].

(Equation 2)

$$H_{crit}^{2} = \chi_{(\alpha ,N-1)}^{2} \,/\,(N-1)$$

If H > $H_{crit}$, the null hypothesis has to be rejected: significant heterogeneity was detected. However, because statistical testing cannot provide a definitive answer, a level of significance α has to be chosen before $H_{crit}$ is computed; this level is usually 0.05 (5%). At this level, a false rejection of the null hypothesis may occur in 5% of all tests when there is actually no detectable heterogeneity present, a type I error (false detection of heterogeneity) shown in Figure 1. Note that $\chi_{(\alpha ,N-1)}^{2}$ is available in Excel as function CHISQ.INV where the significance level is stated as 1 - α, that is, 0.95 for a 5% level.

Figure 1 Schematic probability density functions of the homogeneity index for a perfectly homogeneous material (maximum probability density at H = 1). A significance criterion (critical homogeneity index $H_{crit}$) is chosen by accepting that a certain number (usually 5%) of all tests fail even though the null hypothesis is true (type I error). (a) A curve assuming N = 30 measurements. The width of the curve indicates the inherent uncertainty of the homogeneity index. (b) A curve assuming N = 300 measurements. The uncertainty of the homogeneity index is strongly reduced and $H_{crit}$ is reduced as well.

Because the uncertainty of the homogeneity index depends on the number of measurements N, the critical homogeneity index $H_{crit}$ depends on N. At N = 30 measurements and α = 0.05, the critical homogeneity index is 1.21. At N = 300, it is reduced to 1.07 (Figure 1). To be considered homogeneous, the determined homogeneity index must be below this critical value. Therefore, increasing the number of measurements increases the power to detect heterogeneity. Increasing N also decreases the probability of erroneously accepting the null hypothesis, the type II error (false non-detection of heterogeneity) illustrated in Figure 2. However, this probability cannot be quantified because the true heterogeneity present in the sample is not known; this is what the test is trying to infer statistically. Moreover, increasing N has practical limitations because it requires time and increases the risk that instrumental drift may become a significant contribution to the variations observed. Thus, choosing a reasonable N is an important task.
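The critical values quoted above can be reproduced with a short script. To stay within the Python standard library, this sketch replaces the exact chi-squared quantile with the Wilson-Hilferty approximation, which is an assumption of the example and not part of the method; an exact quantile is available, for instance, from `scipy.stats.chi2.ppf(1 - alpha, N - 1)` or from the Excel function mentioned earlier. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def critical_homogeneity_index(n, alpha=0.05):
    """H_crit from Equation 2 for n analysis points at significance alpha."""
    dof = n - 1
    # The upper-tail level alpha corresponds to cumulative probability 1 - alpha.
    return math.sqrt(chi2_quantile_wh(1.0 - alpha, dof) / dof)


for n in (30, 300):
    print(n, round(critical_homogeneity_index(n), 2))
# prints:
# 30 1.21
# 300 1.07
```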

Figure 2 Schematic probability density functions of the homogeneity index for a heterogeneous material (maximum probability density at H > 1). Most of the tests fail the homogeneity criterion, but some tests are below the critical homogeneity index $H_{crit}$. This erroneous non-detection of heterogeneity is a type II error. Increasing the number of measurements N decreases the width of the curve (its center stays fixed) and decreases $H_{crit}$. This decreases the probability of a type II error.

Uncertainty budget

A useful concept in homogeneity testing, and one that helps in choosing a reasonable N, is the “uncertainty budget.” In its simplest form this relates the combined variance ($s_{c}^{2}$) to its components: the variance due to heterogeneity of the sample ($s_{h}^{2}$) and the variance due to the measurement process, which, in this case, is reduced to the Poisson variance:

(Equation 3)

$$s_{c}^{2} = s_{h}^{2} + s_{Pois}^{2}$$

Based on this simple relationship, it is possible to define the relative contribution of heterogeneity to the total uncertainty of a reference value ($s_{h,rel}$), a parameter that can be related to the homogeneity index:

(Equation 4)

$$s_{h,rel} = {{s_{h}} \over {s_{c}}} = \sqrt{1 - {1 \over {H^{2}}}}$$

The parameter $s_{h}$ provides information on the extent of the compositional variations of the sample. The interval ±2$s_{h}$ around the measured mass fraction of an element covers about 95% (2 sigma) of the sample’s compositional variations. Because $s_{h}$ and $s_{Pois}$ cannot be separated with absolute certainty, $s_{h}$ contains some uncertainty that depends on the number of measurements N (this is shown in the Results section). This approach is limited by the non-linear relationship between H and $s_{h,rel}$, which results in a bias that yields apparently large contributions of compositional heterogeneity even if materials are almost perfectly homogeneous [7]. In such a case only an upper limit of possibly present heterogeneity can be stated: a detection limit of heterogeneity. At N = 10 a homogeneous sample that passes the homogeneity test (H < $H_{crit}$) may show an apparent contribution of heterogeneity to the uncertainty budget of up to 68%. Similar to stating the detection limit for a non-detected element, this $s_{h,rel}$ indicates the upper limit of relative heterogeneity that may be present. To obtain this number, $H_{crit}$ is calculated as a scaled quantile of the chi-squared distribution with a significance level α = 0.05 and N - 1 degrees of freedom via Equation 2. If $H_{crit}$ replaces H in Equation 4, then the upper limit of $s_{h,rel}$ results, and the percentage can be calculated by multiplying it by 100%. Because the uncertainty of the homogeneity index is primarily determined by the number of measurements N, this limit can be reduced by increasing N. In order to state that the contribution of compositional heterogeneity to the total uncertainty budget is less than 30%, a homogeneity test with N = 577 measurements has to be passed ($H_{crit}$ = 1.048). For 20% this number increases to more than 3,000 measurements.
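As a worked instance of this procedure, take N = 30 and α = 0.05. The chi-squared quantile for 29 degrees of freedom (about 42.557) is a tabulated value supplied here for illustration. Equation 2 then gives:

$$H_{crit}^{2} = {{\chi_{(0.05,29)}^{2}} \over {29}} \approx {{42.557} \over {29}} \approx 1.4675, \qquad H_{crit} \approx 1.21$$

Replacing H by this critical value in Equation 4 yields the upper limit of the relative heterogeneity contribution:

$$s_{h,rel} < \sqrt{1 - {1 \over {H_{crit}^{2}}}} \approx \sqrt{1 - 0.6814} \approx 0.564$$

That is, a test with 30 points that is passed only supports the statement that heterogeneity contributes less than about 56% to the total uncertainty.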

The homogeneity index and the uncertainty budget not only allow decisions of whether or not heterogeneity has been detected (in the sense of a detection limit for elemental variations instead of the mere presence of a chemical element); they also allow the quantification of heterogeneity if it is detected. Because $s_{c}$ and $s_{Pois}$ are obtained from the measurements, $s_{h}$ can be calculated easily from Equation 3.
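The calculation from Equations 3 and 4 might look as follows. This is a minimal sketch with a hypothetical function name and made-up input values; it returns `None` when the observed variance is smaller than the Poisson variance, a case that can occur through statistical scatter alone. The result is only meaningful as a quantified heterogeneity when the homogeneity test has detected heterogeneity.

```python
import math


def heterogeneity_contribution(s_c, s_pois):
    """Return (s_h, s_h_rel) from Equations 3 and 4, or None if s_h^2 < 0.

    s_c    : observed (combined) standard deviation of the mass fractions
    s_pois : standard deviation expected from counting statistics
    """
    s_h_sq = s_c ** 2 - s_pois ** 2  # Equation 3 solved for s_h^2
    if s_h_sq < 0.0:
        return None  # observed scatter is below the Poisson expectation
    s_h = math.sqrt(s_h_sq)
    return s_h, s_h / s_c  # absolute and relative contribution


# Made-up example values in wt%.
result = heterogeneity_contribution(0.05, 0.03)
print(result)
```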

Results

Figure 3 shows silicon and cerium x-ray intensity maps of one crystal of the fluorapatite candidate reference material. In this example the distribution of Ce was studied, which replaces Ca in a coupled substitution with Si (replacing phosphorus) or Na (replacing calcium) for charge balance. While the variation in Si is clearly visible, the significance of heterogeneity in the content of Ce is difficult to judge by the eye.

Figure 3 Electron microprobe x-ray distribution maps by wavelength-dispersive spectrometry obtained on a polished apatite crystal investigated for its suitability as reference material. (a) Distribution of silicon (Si Kα) shows chemical zoning within the crystal. The material is obviously heterogeneous and, in its present form, not suitable as a reference material. (b) Distribution of cerium (Ce Lα) shows barely visible heterogeneity. In this case, statistical testing based on quantitative analyses has to be conducted.

Figure 4 displays the wavelength-dispersive electron probe results of the Ce content of the apatite crystals. Each location on the polished crystal section was analyzed only once. Out of 36 crystals, each measured at N = 30 locations, 16 showed detectable heterogeneity of Ce under the measurement conditions used. The critical homogeneity index and the corresponding values of $s_{h}$ and $s_{h,rel}$ define the level at which heterogeneity becomes significant at the α = 0.05 level of significance. Below $H_{crit}$, the uncertainty of the heterogeneity contribution $s_{h}$ to the total combined uncertainty strongly increases. A minimum of uncertainty at any given N is obtained if $s_{h}$ equals $s_{Pois}$, and the overall uncertainty can be reduced by increasing N.

Figure 4 Results of the homogeneity studies on 36 apatite crystals. The composition of each crystal (shown as dots) was measured 30 times along a diagonal profile across the crystal. Shown is the contribution of heterogeneity to the total combined uncertainty of the average Ce mass fraction, either as absolute value ($s_{h}$) or relative value ($s_{h,rel}$). In 16 crystals significant heterogeneity was detected; they plot to the right of the critical values of $s_{h}$ and $s_{h,rel}$ (red solid vertical line), which can be computed from the critical homogeneity index $H_{crit}$. The ordinate shows the uncertainty of the uncertainty due to heterogeneity. It becomes very large below the critical values, which serve as a detection limit for heterogeneity. Increasing the number of measurements N decreases the expected uncertainty of $s_{h}$ (dashed curves).

One apatite crystal marked in Figure 4 showed an average Ce content of 0.215 wt% with a standard deviation ($s_{c}$) of 0.020 wt% (crystal s6.3 in Table 1). In this case, heterogeneity among the 30 measurement locations could not be detected: the data point is to the left of the critical homogeneity index (red vertical line), which was derived from Equation 2 (N = 30 and α = 0.05) and converted to $s_{h,rel}$ using Equation 4. Therefore, it is possible to state that the uncertainty due to heterogeneity is less than 56% of the total uncertainty (corresponding to $s_{h}$ < 0.013 wt%). Table 2 shows the detailed analysis of another crystal (crystal s7.5 in Table 1) that showed an average Ce content of 0.378 wt% with a total uncertainty of 0.038 wt%. In this case heterogeneity was significant (in Figure 4 to the right of the critical value). The contribution of heterogeneity to the total uncertainty was determined to be 84% (corresponding to $s_{h}$ = 0.032 wt%). The dashed curves in Figure 4 display the standard deviation of $s_{h}$, which can be approximated as:

(Equation 5)

$$s\left( s_{h} \right) = {{s_{c}^{2}} \over {s_{h}}}{1 \over {\sqrt{2(N-1)}}}$$
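Equation 5 is straightforward to evaluate numerically. The sketch below uses a hypothetical function name and plugs in the rounded values quoted in the text for crystal s7.5, so the result is only indicative of the order of magnitude.

```python
import math


def std_of_s_h(s_c, s_h, n):
    """Approximate standard deviation of s_h according to Equation 5."""
    if s_h <= 0.0:
        raise ValueError("s_h must be positive")
    return (s_c ** 2 / s_h) / math.sqrt(2.0 * (n - 1))


# Rounded values from the text for crystal s7.5 (wt%), N = 30.
u = std_of_s_h(s_c=0.038, s_h=0.032, n=30)
print(u)  # roughly 0.006 wt%
```

Because $s_{h}$ appears in the denominator, the estimate grows without bound as $s_{h}$ approaches zero, which is the behavior of the dashed curves to the left of the critical value.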

Table 1 Results of the EPMA homogeneity studies of 36 fluorapatite crystals (N = 30 for each crystal).

* In these cases $s_{Pois}^{2}$ is larger than $s_{c}^{2}$ due to statistical scatter, and the resulting $s_{h}^{2}$ is negative (and the homogeneity index smaller than unity).

** At N = 30 the critical homogeneity index is 1.21; a measured homogeneity index larger than this indicates detected heterogeneity.

*** Values in parentheses are below the detection limit of heterogeneity of $s_{h,rel}$ = 56.4% (calculated from the critical homogeneity index at N = 30) and should not be stated because they are statistically not significant. Instead the upper limit of heterogeneity (i.e. “<56.4%”) and the corresponding value of $s_{h}$ (i.e. “<56.4% of $s_{c}$”) should be stated. They are shown here to illustrate the principle.

Table 2 EPMA homogeneity study of a single fluorapatite crystal (crystal s7.5).

* Individual standard deviations calculated from counting statistics (see [7]).

** Mean of all individual $s_{Pois,i}^{2}$.

These curves illustrate that below the critical value (left of the red line) the uncertainty strongly increases. Hence, the critical homogeneity index and the parameters derived from it, as explained above, serve as a detection limit for heterogeneity at a chosen level of significance given by α.

Discussion

The example given in Figure 4 shows that the quantification of heterogeneity needs to consider a detection limit. Obtaining the uncertainty due to heterogeneity $s_{h}$ from the uncertainty budget offers a quantitative parameter, but the significance of $s_{h}$ and its uncertainty strongly depend on the number of analysis spots and the analytical precision. For example, a short counting time leads to a poor precision and to a large contribution of counting statistics to the uncertainty budget. In this case the relative contribution of heterogeneity to the uncertainty budget is low, and heterogeneity may be below the detection limit discussed above. Because compositional heterogeneity is a fixed characteristic of the sample, an increase in counting time or beam current will decrease the counting statistical uncertainty and increase the relative contribution of heterogeneity to a value above the detection limit. Hence, perception of homogeneity or heterogeneity depends on the analytical method of investigation and its conditions of operation. An EPMA routine tailored for a more precise determination of Ce in fluorapatite might detect heterogeneity even in those crystals that passed the homogeneity test in the example above. The statistical evaluation discussed is potentially applicable to a wide range of analytical techniques. A standard suitably homogeneous for electron probe x-ray analysis would be adequate for methods with poorer spatial resolution and similar precision but may require re-testing for use with a method with better spatial resolution (for example, thin specimen analysis in STEM) or better precision at low concentrations (for example, SIMS).
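The dependence on counting precision can be read directly from the equations given in the Methods section. Substituting Equation 3 into Equation 1 gives:

$$H^{2} = {{s_{h}^{2} + s_{Pois}^{2}} \over {s_{Pois}^{2}}} = 1 + {{s_{h}^{2}} \over {s_{Pois}^{2}}}$$

For a fixed $s_{h}$, any reduction of $s_{Pois}$ raises H, whereas the critical value from Equation 2 depends only on N and α. The same compositional variation can therefore fall below the detection limit under one set of conditions and above it under another.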

From a practical perspective the correct determination of $s_{Pois}$ can be a challenge. In the case of electron probe microanalysis it is not sufficient to just consider the counts above the background because the counts of the background measurements also contribute to $s_{Pois}$ [7]. At high precision, matrix effects may produce apparent heterogeneity of a homogeneously distributed element A if a heterogeneously distributed element B affects the measured x-ray intensity of A. Matrix correction procedures should account for this.

Conclusion

In electron probe x-ray microanalysis, homogeneity testing using the homogeneity index based on counting statistics offers the advantage that only one analysis per sample spot is required. Methods requiring multiple measurements on the same spot are prone to errors from degradation of beam-sensitive samples such as many geological materials and glasses. The sound statistical evaluation shown here is applicable to all kinds of reference materials (alloys, glasses, natural and synthetic crystals, etc.) and may be applied to other microanalytical techniques in which photon or ion counting takes place. The level at which heterogeneity may be detected and quantified depends strongly on the number of measurements N.

References

[1] Linsinger, TPJ et al., Accredit Qual Assur 6 (2001) 20–25.

[2] Marinenko, RB et al., National Bureau of Standards Special Publication 260–6265 (1979).

[3] Boyd, FR et al., Carnegie Institution of Washington Year Book 67 (1967) 210–215.

[4] Jarosewich, E, Nelen, JA and Norberg, JA, Geostandard Newslett 4 (1980) 43–47.

[5] Jarosewich, E and MacIntyre, IG, J Sediment Petrol 53 (1983) 677–678.

[6] Carpenter, P et al., J Res Natl Inst Stan 107 (2002) 703–718.

[7] Harries, D, Chemie der Erde 74 (2014) 375–384.

[8] Stormer, JC et al., Am Mineral 78 (1993) 641–648.

[9] Liebich, V et al., Fresen Z Anal Chem 335 (1989) 945–953.
