Wheat grain yield is often associated with grain number/m². Spike fertility (SF), i.e. the quotient between grain number and spike chaff dry weight, is a major component of grain number/m² determination. Several methodologies have been proposed in the literature for field determination of SF, but they are tedious and expensive, and no comparison between them has been made. The feasibility of using wheat SF as a selection criterion in a breeding programme or as a variable of interest in crop physiology studies depends largely upon the availability of a simpler and faster method for collecting and processing samples. Thus, the objective of the present study was to determine: (1) the association between SF calculated with the non-grain spike dry weight at anthesis (reference method) or at crop maturity, (2) the association between SF evaluated at the plot level (i.e. both non-grain spike dry weight and grain number determined per unit area) and at the individual spike level and (3) the minimum number of individual spikes that should be sampled for the development of a screening method that can be applied in wheat breeding programmes or in crop physiology studies. Associations between variables were determined by correlation analysis of treatment means, and by a test of agreement for categorical rating (low, medium and high SF) between individual data of each variable. Four experiments (BY95, BC96, BC97 and ML07) were performed with five, ten, eight and eight wheat cultivars, respectively, under no environmental limitations, except for experiment ML07, which was not irrigated. In the first three experiments, SF was determined both at the beginning of grain filling and at maturity, in plot-size samples (0·8 m²/plot). In experiments BC96 and BC97, SF was determined both in plot-size samples and in individual spikes (five spikes per plot), at the beginning of grain filling.
In experiment ML07, increasing numbers of individual spikes were sampled at maturity to assess SF. As a result: (1) a significant association (R²=0·78; P<0·001; d.f.=20) was detected between SF determined at the beginning of grain filling and at maturity, and the test of agreement for categorical rating showed that the classification of data into categories of SF was equivalent between methods (P>0·05); (2) when comparing SF determined in large plot-size samples v. in small samples of individual spikes, a good fit (R²=0·77; P<0·001; d.f.=6) was also observed, with no significant cultivar×experiment interaction and good agreement between methods in the classification of data into categories of SF (P>0·05); and (3) increasing sample size from 5 to 40 spikes gradually decreased the average relative standard error of the mean (from 0·034 to 0·012, respectively). In conclusion, wheat SF can be determined with reasonable accuracy by sampling a small group of individual spikes at crop maturity, thereby allowing the evaluation of a large number of treatments in a timely fashion and the screening of breeding material from early generations.
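
The two quantities used above, SF as grain number divided by chaff dry weight and the relative standard error of the mean of a spike sample, can be illustrated with a minimal Python sketch. The function names, the choice of grams as the weight unit and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration; the study itself may have computed these differently.

```python
import math
import statistics


def spike_fertility(grain_number, chaff_dry_weight_g):
    """Grains per unit of non-grain spike (chaff) dry weight.

    The unit (grams) is an assumption; the abstract does not state it.
    """
    if chaff_dry_weight_g <= 0:
        raise ValueError("chaff dry weight must be positive")
    return grain_number / chaff_dry_weight_g


def relative_sem(values):
    """Standard error of the mean divided by the mean.

    Uses the sample standard deviation (n - 1), which is an assumption.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two spikes")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return sem / mean


# Hypothetical spikes as (grain number, chaff dry weight in g); not study data.
spikes = [(40, 0.50), (45, 0.45), (36, 0.30)]
sf_values = [spike_fertility(g, w) for g, w in spikes]
print(sf_values, relative_sem(sf_values))
```

Applying `relative_sem` to samples of increasing size (for example 5, 10, 20 and 40 spikes) would show how precision improves as more spikes are collected.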