Nutrient profiling can be defined as ‘the science of categorising foods according to their nutritional composition’ [1]. It can be used in a range of different circumstances including the regulation of food labelling, food advertising and vending of foods.
Nutrient profile models can categorise foods in a variety of ways. One way is to divide foods into two or more groups and to categorise some groups as healthier than others on the basis of one or more nutritional characteristics of the foods. This ‘categorical’ approach has been used by food retailers, food manufacturers and others for designating ranges of products as, for example, ‘low-fat’ or ‘healthy’.
Another, ‘continuous’, approach is to rank foods from the most healthy to least healthy, again using one or more nutritional characteristics of the food. Each food is given a score based on those characteristics. We have used this approach to develop various nutrient profile models for the UK Food Standards Agency (FSA) [2–4]. Continuous models can be converted to categorical models by designating foods below a certain score as, for example, ‘low-fat’ or ‘healthy’.
Nutrient profiling has, however, been hampered by a lack of validity testing. The method most commonly used is to examine different foods to assess whether the nutrient profile model classifies them appropriately. This assessment is generally done purely subjectively by the team developing the model. Generally the approach taken is to identify ‘anomalies’ that are generated by the model, and if there are too many of these then the model is rejected or modified. This method of assessment is open to bias. Accordingly we have sought a more systematic and transparent method of assessing and comparing the validity of different models.
One method that has been proposed [2] is to use the subjective judgements of qualified experts about a panel of representative foods, but to collect these in a standardised and repeatable way. Accordingly we have carried out a survey of nutritionists and dietitians in the UK to create a standard ranking of the ‘healthiness’ of 120 foods representative of the British diet, which can be used to compare the ranking or categorisation of foods by a nutrient profile model. The methods for this survey and its results are described elsewhere [5].
The present paper compares the results of applying eight nutrient profile models to the 120 foods in the survey with the judgements of the nutrition professionals. The aim of this comparison was to assess which of the models categorised foods most closely in line with the views of nutrition professionals.
Two of the models (Models SSCg3d [2] and WXYfm [4]) were developed for the FSA for the purpose of identifying ‘less healthy’ foods for Ofcom (the broadcast regulator in the UK), which is currently considering further regulation of the broadcast advertising of foods to children (see Appendix for details of these models). Models SSCg3d and WXYfm produce scores for foods which can be used as a ranking, but they also categorise foods scoring below a certain threshold as ‘healthier’ and over another threshold as ‘less healthy’. They can therefore be used as both categorical and continuous models.
Of the other six models, three were continuous models: the Nutritious Food Index (NFI), which has three variants – a, b and c [6], the Ratio of Recommended to Restricted food components (RRR) [7] and the Naturally Nutrient Rich (NNR) score [8]; and three were categorical models: the Netherlands tripartite classification scheme for food [9], the Australian Heart Foundation (AHF) Tick scheme [10] and the American Heart Association (AHA) heart-check mark [11]. The first four of these models have been proposed as mechanisms for comparing the nutritional quality of different foods; the last two have been used for labelling foods as ‘healthy’.
Survey of nutrition professionals
An online questionnaire was used to assess nutrition professionals' perception of the relative healthiness of individual foods. This questionnaire asked respondents to place 40 foods, randomly selected from a master list of 120 foods, in one of six positions labelled at one end as ‘less healthy’ and at the other ‘more healthy’. Respondents were asked to rate different foods compared with all foods, rather than foods from a similar category. To assist with categorisation, the energy (kcal), protein, carbohydrate, total sugar, fat, saturated fat, fibre (NSP), sodium, calcium and iron contents per 100 g of the foods were provided.
The questionnaire was administered by sending a password-protected link for the questionnaire to selected members of the British Dietetic Association and the (British) Nutrition Society. Seven hundred and two responses were suitable for inclusion in an analysis to generate a standard ranking of the foods. The 120 foods from the questionnaire were ranked according to the average score awarded by the nutrition professionals (the least healthy position was allocated a score of 1 and the healthiest position a score of 6). A complete description of the development, administration and results of the questionnaire is provided elsewhere [5].
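The ranking step described above can be sketched in a few lines. This is a minimal illustration only: the function names and the sample responses are hypothetical, and ties in the mean score are broken alphabetically purely for reproducibility, which is an assumption rather than something the survey specifies.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(responses):
    """Average the 1-6 position scores given to each food.

    `responses` is an iterable of (food, score) pairs, where 1 is the
    least healthy position and 6 the healthiest.
    """
    by_food = defaultdict(list)
    for food, score in responses:
        if not 1 <= score <= 6:
            raise ValueError(f"score out of range for {food!r}: {score}")
        by_food[food].append(score)
    return {food: mean(scores) for food, scores in by_food.items()}


def standard_ranking(responses):
    """Order foods from healthiest to least healthy by mean score.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    means = mean_scores(responses)
    return sorted(means, key=lambda food: (-means[food], food))


# Hypothetical responses, not survey data.
example = [("apple", 6), ("apple", 5), ("crisps", 1), ("crisps", 2), ("bread", 4)]
ranking = standard_ranking(example)
```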
Identification of nutrient profile models
The nutrient profile models selected for testing were identified from lists of nutrient profile models in two sources: (1) a background paper for a recent international conference on nutrient profiling [12], which identified 22 different nutrient profile models by searching MedLine and Google; and (2) a brief review of nutrient profile model design [8] that identified four different nutrient profile models, of which three – NNR, the Padberg Nutrition Quality Index and the Calories For Nutrients (CFN) model – were not in the previous list. Of the total of 25 different models identified, we considered eight suitable for comparison with the results of the survey of nutrition professionals (for inclusion criteria for testing, see Box 1). For one model – the CFN model – we were unable to find precise details of the criteria.
Box 1 – Inclusion criteria
1 Model must use data on more than one nutrient or food component to produce a single score or categorisation for the ‘healthiness’ of a food.
2 Model must be (a) published in a peer-reviewed journal, (b) currently in use, or (c) recommended for use by a government body.
3 Model design must allow for application to all foods for adults*.
4 Model must be based on absolute nutrient values (as opposed to values relative to other products within a food category, e.g. 25% lower fat than standard pizza).
5 Model must have clear guidelines for application, published in English.
*For the purposes of this paper, it was assumed that the Australian Heart Foundation Tick scheme would not award a logo to all crisps and confectionery.
Testing of nutrient profile models
In order to test the nutrient profile models against the results of the questionnaire, a compositional database was constructed for the 120 foods in the questionnaire using data from the McCance and Widdowson database for the following nutrients: energy, fat, saturated fat, monounsaturated fat, long-chain n–3 polyunsaturated fatty acids (defined as being 20:5, 22:5 and 22:6 fatty acids), cholesterol, protein, total sugars, lactose, non-starch polysaccharides (NSP), sodium, calcium, iron, magnesium, phosphorus, zinc, potassium, vitamin A (defined as being retinol+carotene/6), thiamin, riboflavin, niacin, vitamin B6, vitamin B12, folate, vitamin C, vitamin D and vitamin E [13]. Some nutritional information was unavailable. In some instances it was noted in the McCance and Widdowson database that the nutrient was present in significant quantities but that there was no reliable information on the amount. Where this was the case, the amount was assumed to be the same as a similar product from the McCance and Widdowson database (e.g. ‘Minestrone soup, dried, as served’ was assumed to have the same level of folate as ‘Minestrone soup, canned’). This was necessary for 64 nutrient levels. Where no data were available for a similar product – as was the case for 13 nutrient levels – the food was assumed to have none of the nutrient present. Assumptions of these kinds were made for 2% of the nutrient levels in the database.
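The 2% figure can be checked with simple arithmetic. The sketch below assumes that the database holds one value for each of the 27 nutrients listed above for each of the 120 foods; the count of 27 is taken from that list and is not stated explicitly in the text.

```python
N_FOODS = 120
N_NUTRIENTS = 27  # counted from the nutrient list above (an assumption)

BORROWED_FROM_SIMILAR_PRODUCT = 64
ASSUMED_ZERO = 13


def assumed_share(n_foods=N_FOODS, n_nutrients=N_NUTRIENTS):
    """Fraction of nutrient levels that rest on an assumption."""
    total_levels = n_foods * n_nutrients
    assumed_levels = BORROWED_FROM_SIMILAR_PRODUCT + ASSUMED_ZERO
    return assumed_levels / total_levels


share = assumed_share()  # 77 of 3240 levels, a little over 2%
```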
Additionally, the non-milk extrinsic sugar (NMES) levels for the 120 foods were calculated from the total sugars and lactose levels using a method described elsewhere [2]. The fruit and vegetable content of each food was estimated by a nutritionist, again using methods described elsewhere [3]. Some of the models require dietary fibre to be measured by the AOAC (Association of Official Analytical Chemists) method; however, the McCance and Widdowson database provides AOAC fibre levels for only 26 of the 120 foods. So for the remaining 94 foods AOAC fibre levels were estimated by multiplying NSP levels by 1.33, for reasons described elsewhere [3]. Serving sizes for the foods were taken from the standard serving size guide used in the UK [14].
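The fibre estimate described above amounts to a one-line fallback rule. In this sketch the function name and arguments are hypothetical; only the factor of 1.33 and the preference for a measured AOAC value come from the text.

```python
NSP_TO_AOAC_FACTOR = 1.33  # conversion factor stated in the text


def aoac_fibre(nsp_g, measured_aoac_g=None):
    """Return AOAC fibre (g per 100 g) for a food.

    Uses the measured AOAC value when the database provides one,
    otherwise estimates it from the NSP level.
    """
    if measured_aoac_g is not None:
        return measured_aoac_g
    return nsp_g * NSP_TO_AOAC_FACTOR


estimated = aoac_fibre(3.0)                        # no measured value available
measured = aoac_fibre(3.0, measured_aoac_g=4.5)    # measured value takes priority
```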
Nutrient profiling models that are continuous models were compared by calculating Spearman's ρ for the rank correlation between the scores awarded by the models and the ‘standard’ ranking derived from the results of the survey of nutrition professionals. Two methods were used to calculate the 95% confidence intervals around Spearman's ρ. First, the standard method was used [15]. Using this method the width of the confidence intervals is dependent on the size of the sample of foods. Second, a method was developed which takes account of the size of the sample of nutrition professionals and their level of agreement about the healthiness of the foods. This method involved using the confidence intervals for the average score for each of the foods derived from the survey of nutrition professionals. The foods were ranked on the basis of these average scores, and this ranking was allowed to vary to the extent allowed by the upper and lower limits of the confidence intervals around each average score. For each model, the Spearman's ρ was then calculated for all possible rankings, and the 95% confidence intervals were interpreted as the highest and lowest scores for the correlation between the ranking of foods produced by the model and the rankings of foods on the basis of the average scores.
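The first of these calculations can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: it assumes that the 'standard method' is the Fisher z-transformation with standard error 1/√(n − 3), and it computes Spearman's ρ as the Pearson correlation of average ranks so that tied scores are handled. The second method (varying the ranking within the confidence limits of each average score) is not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation, with ties given average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_ci(rho, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z (requires |rho| < 1, n > 3)."""
    z = math.atanh(rho)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical illustration: a correlation of 0.8 across 120 foods.
low, high = fisher_ci(0.8, 120)
```

Because the interval depends only on ρ and the number of foods, it reflects the point made above that its width is driven by the size of the food sample rather than by the number of respondents.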
For categorical models, χ2 scores were calculated using a cross-tabulation of data. All of these models, in effect, categorise some foods as ‘healthier’ and some as ‘less healthy’, so this categorisation was compared with the quintiles of the scores of nutrition professionals. Where the models categorised foods into three groups on the basis of their healthiness (Models SSCg3d and WXYfm and the Netherlands tripartite model), these categorisations were also compared with the quintiles of the scores of nutrition professionals.
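The cross-tabulation test can be sketched as a Pearson χ2 statistic computed from a table of counts, with one row per model category and one column per quintile of the professionals' scores. The function name and the example tables are hypothetical, and the sketch assumes that every row and column total is non-zero.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts.

    `table` is a list of rows, e.g. one row per model category and one
    column per quintile. Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical tables, not data from the study.
independent = chi_squared([[10, 10], [10, 10]])
dependent = chi_squared([[20, 0], [0, 20]])
```

A two-category model cross-tabulated against quintiles gives a 2 × 5 table with 4 degrees of freedom; a three-category model gives 8.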
As well as tests of the level of agreement between the judgements of the nutrition professionals and the nutrient profile models, the distribution of the 120 foods between ‘healthier’ and ‘less healthy’ categories – as categorised by both the nutrition professionals and the nutrient profile models – was assessed.
A sensitivity analysis was performed to assess the effect of assumptions made about nutrient content levels on the outcome of the tests of the nutrient profile models. The results of this sensitivity analysis are not reported here because they showed that the assumptions made little difference to the outcome of the tests.
Table 1 shows the results and the 95% confidence intervals when Spearman's rank correlation test was used to compare the average scores awarded to the 120 foods by the nutrition professionals with the scores awarded by nutrient profile models. The significance of all of these correlations was high (P < 0.001). The test identified only two significant differences between the models: when the 95% confidence intervals around Spearman's ρ were calculated in the standard way, Model SSCg3d achieved a significantly higher result than the NNR and the NFI(c) (P < 0.05).
CI – confidence interval; NFI – Nutritious Food Index; RRR – Ratio of Recommended to Restricted nutrients; NNR – Naturally Nutrient Rich score.
* Model WXYfm categorises foods and drinks on different scales so for this analysis a variant of the model was used, i.e. WXYfm(2), where the nutrient levels in drinks are scored per 200 g rather than per 100 g in order to be able to rank the scores of foods and drinks on the same scale (see Appendix).
Table 2 shows the number of foods that are categorised as ‘healthier’ by the different categorical models. It suggests that the categorical schemes are similar in the proportion of foods that they categorise as healthier, i.e. they all categorise about 20–40% of foods as healthier. The table also shows that about this proportion of foods was given an average score of 4.34 or more by the nutrition professionals.
AHF – Australian Heart Foundation; AHA – American Heart Association.
Table 3 shows that the strongest relationship between the nutrient profile models and the nutrition professionals' categorisations was for Model WXYfm followed by Model SSCg3d, and the weakest relationship was for the Netherlands tripartite model. However, all of the models showed a high degree of dependence with the nutrition professionals' categorisations (P < 0.001).
Table 4 shows the differences in the way nutrition professionals and the continuous models ranked individual foods (only 60 of the 120 foods are shown). For 53 of the 120 foods from the questionnaire, the standard ranking and the ranking provided by one of the continuous models differed by at least 40 positions. The NNR produced the most differences from the standard ranking and Model SSCg3d produced the fewest.
NFI – Nutritious Food Index; RRR – Ratio of Recommended to Restricted nutrients; NNR – Naturally Nutrient Rich score.
Negative numbers indicate that the model ranked the food as healthier than the standard ranking, and positive numbers indicate that the model ranked the food as less healthy than the standard ranking. Differences in ranking of more than 40 are given in bold.
* See footnote to Table 1.
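The rank differences reported in Table 4 can be sketched as a simple subtraction. This illustration assumes that rank 1 is the healthiest food in both rankings, so that a negative difference means the model ranked the food as healthier than the standard ranking did. The function name and example ranks are hypothetical. The flag uses 'at least 40 positions', following the wording of the text; the table footnote says 'more than 40', so the comparison may need adjusting.

```python
def rank_differences(standard_rank, model_rank, threshold=40):
    """Return {food: (difference, is_large)} for foods in the standard ranking.

    Both arguments map food name to rank, with 1 assumed to be the healthiest.
    difference = model rank - standard rank, so negative values mean the
    model placed the food nearer the healthy end than the professionals did.
    """
    result = {}
    for food, standard in standard_rank.items():
        difference = model_rank[food] - standard
        result[food] = (difference, abs(difference) >= threshold)
    return result


# Hypothetical ranks, not values from Table 4.
standard = {"food A": 1, "food B": 50, "food C": 100}
model = {"food A": 45, "food B": 50, "food C": 70}
differences = rank_differences(standard, model)
```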
Even the models which gave the greatest correlation with the standard ranking produced some large differences in some cases. For example, Model SSCg3d ranked products with high fibre content, such as wholemeal bread and wholemeal fruit cake, as much less healthy than did the nutrition professionals. For this reason a fibre criterion was added when Model SSCg3d was converted to Model WXYfm [4]. The effect of this can be seen in the change in ranking for wholemeal bread, wholemeal fruit cake, etc. Similarly, following the public consultation on Model WXY, it was agreed that the model ranked nuts as much less healthy than would a nutritionist [4]; so a modification was incorporated into Model WXYfm to ensure a higher score for nuts and nut-based products. The effects of this modification can be seen in the changes in ranking for pistachio nuts from Model SSCg3d to WXYfm.
For some of the foods, such as plain omelette, there was a large difference between the standard ranking and the ranking provided by the majority of the nutrient profile models. Conversely for some of the foods such as raw green peppers and milk chocolate there was a high level of agreement between the nutrition professionals and most of the nutrient profile models.
There were also differences in the way categorical models classified foods and the views of the nutrition professionals. Granary bread and unsweetened soya milk would not be awarded a logo under the AHF Tick scheme, despite being categorised in the healthiest quintile by the nutrition professionals. Similarly the AHA mark would not be awarded to apples, lettuce, steamed haddock, grilled rainbow trout, granary bread and semi-skimmed milk – all foods categorised in the healthiest quintile by the nutrition professionals. The Netherlands tripartite model also categorised granary and wholemeal bread as ‘exceptional’, along with semi-skimmed milk and reduced-salt and -sugar baked beans.
The aim of this study was to compare nutrient profile models with a standard ranking of the ‘healthiness’ of foods derived from the results of a survey of the views of nutrition professionals. Comparison with such a standard ranking provides one way of validating nutrient profile models. Comparison of one measure with another is called testing for criterion validity. Ideally this should involve comparison with a ‘gold standard’ measure, but it need not [16].
All five continuous models tested here achieved good agreement with the standard ranking derived from the survey, with the highest correlation achieved by Models SSCg3d and WXYfm. Although these were the only models to achieve a correlation above 0.75, the wide confidence intervals around the correlation coefficients mean that the differences between these and the other models were generally not statistically significant. Nor were significant differences between models observed when the level of agreement amongst the nutrition professionals over the categorisation of foods was taken into account.
For categorical models, the categorisation of foods by Models WXYfm and SSCg3d was more strongly related to the views of the nutrition professionals than for the three other categorical models – the AHF Tick scheme, the AHA mark and the Netherlands tripartite model. However, comparisons between χ2 statistics should be made with caution since the χ2 test is intended to show dependence between two categorical variables and this is achieved for all of the models to a high degree (P < 0.001).
Nutrient profile models can be of two types: across-the-board and category-specific. An across-the-board model ranks or categorises foods with respect to all other foods. A category-specific model ranks or categorises foods with respect to foods in the same category [2]. The questionnaire for the survey of nutrition professionals asked respondents to categorise foods relative to all foods, rather than foods from a similar category. So the standard ranking the survey generates could only be used to test nutrient profile models which are ‘across-the-board’. Models SSCg3d and WXYfm, NFI, RRR, NNR and the AHA mark (which separates ‘whole grain’ foods from other foods but applies very similar nutritional criteria to both) are all across-the-board models.
The AHF Tick scheme and the Netherlands tripartite model set different criteria for different categories of foods but are a mixture of across-the-board and category-specific in their purpose. This is shown by the fact that the criteria for food categories that the developers clearly considered to be ‘less healthy’ are more restrictive than for categories of foods that the developers considered ‘healthier’. For example, the AHF Tick scheme does not allow the logo to be displayed on savoury snacks or confectionery, and the Netherlands tripartite model identifies several categories of foods (savoury snacks, sauces, cakes, confectionery, etc.) that are not considered ‘basic’ foods – effectively categorising all foods from these groups as ‘exceptional’. A pure category-specific model would enable as many foods in these categories to be categorised as ‘healthier’ as in any other category. Because the two models are partially ‘across-the-board’ in their purpose we consider that it is legitimate to compare them against the standard ranking.
Nutrient profile models also differ in the nutrients and other food components they consider. The eight models tested here considered between seven and 17 nutrients from a total of 34. The survey of nutrition professionals involved providing them with information about 10 nutrients in the foods. Of these 10 nutrient levels, six were used by Model WXYfm, RRR and the AHA mark, five by Model SSCg3d, NFI and the AHF Tick scheme, and three by NNR and the Netherlands tripartite model. The accompanying paper suggests that of these 10 nutrient levels only total fat information (used by NFI, RRR, the AHF Tick scheme and the AHA mark) and sugars information (used by all models except NFI, NNR and the AHA mark) were used by the nutrition professionals to classify foods [5]. However, it is possible that had different information been provided to the nutrition professionals then they would have scored the foods in a different way. This would of course have affected the results of comparisons between the standard ranking and the way nutrient profile models categorise foods.
A standard ranking of foods derived from a survey of the views of nutrition professionals should therefore be used with caution when assessing the validity of different nutrient profile models. Care should be taken to ensure that the way that the survey is carried out does not favour particular types of nutrient profile model. Nevertheless, we consider that the method is more systematic and transparent than the methods generally used.
It should not be used as the only method of validity testing. This is because nutrition professionals are not entirely logical or consistent in the way they categorise foods (see accompanying paper [5]). Indeed it cannot be the only way, because this paper shows that even when the categorisations of a relatively large sample of foods (120) by a large sample of nutrition professionals (over 700) are used, this is not sufficient to discriminate between a range of different nutrient profile models. Accordingly, we and others are looking for ways of validating and comparing nutrient profile models with reference to healthy and unhealthy diets.
Sources of funding: P.S., A.B. and M.R. are supported by funding provided by the British Heart Foundation. The survey of nutrition professionals was funded by the Food Standards Agency.
Conflict of interest declaration: There are no further conflicts of interest for any author.
Authorship responsibilities: Each author was involved in the analysis of data, and drafting of the final manuscript.
Acknowledgements: The authors are grateful to Rosemary Hignett, Mark Browne, Robin Clifford, Jennifer Burns and Claire Boville from the Food Standards Agency for advice and help.
Appendix – Details of Models SSCg3d and WXYfm
Nutrient profile Model SSCg3d (see also Rayner et al. [2])
Model SSCg3d allocates points on the basis of the nutritional content in 100 g of a food or 200 g of a drink. The overall score for the food or drink is calculated in three steps as follows.
A maximum of 10 points can be scored for each nutrient. The following table indicates the points scored for ‘A’ nutrients depending on the content of each nutrient in 100 g of the food or 200 g of the drink:
Total ‘C’ points = (points for calcium) + (points for iron) + (points for n–3 fatty acids) + (points for fruit and vegetable content)
A maximum of 10 points can be scored for each nutrient. The following table indicates the points scored for ‘C’ nutrients, depending on the content of each nutrient/food component in 100 g of a food or 200 g of a drink:
Calculate overall score:
Overall score = (total ‘A’ points) − (total ‘C’ points)
● A food or drink is classified as ‘less healthy’ where it scores 9 points or more.
● A food or drink is classified as ‘healthier’ where it scores 2 points or less.
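The final scoring and classification steps of Model SSCg3d can be sketched as below. The points tables are not reproduced in this text, so the sketch takes the points already awarded as inputs. The function names and the label 'intermediate' for scores between the two thresholds are assumptions made for illustration.

```python
def sscg3d_total_c(calcium_pts, iron_pts, n3_pts, fruit_veg_pts):
    """Sum the 'C' points; each component can contribute at most 10 points."""
    components = (calcium_pts, iron_pts, n3_pts, fruit_veg_pts)
    for points in components:
        if not 0 <= points <= 10:
            raise ValueError("each 'C' component must score between 0 and 10")
    return sum(components)


def sscg3d_classify(total_a, total_c):
    """Return (overall score, category) for a food or drink.

    Overall score = total 'A' points - total 'C' points.
    9 or more is 'less healthy'; 2 or less is 'healthier'.
    'intermediate' is a hypothetical label for scores in between.
    """
    score = total_a - total_c
    if score >= 9:
        return score, "less healthy"
    if score <= 2:
        return score, "healthier"
    return score, "intermediate"


# Hypothetical points, not taken from the model's tables.
total_c = sscg3d_total_c(calcium_pts=1, iron_pts=0, n3_pts=0, fruit_veg_pts=3)
result = sscg3d_classify(total_a=12, total_c=total_c)
```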
Nutrient profile Model WXYfm (see also Rayner et al. [4])
Model WXYfm allocates points on the basis of the nutritional content in 100 g of the food or drink. The overall score for the food or drink is calculated in three steps as follows.
A maximum of 10 points can be scored for each nutrient. The following table indicates the points scored, depending on the content of each nutrient in 100 g of the food or drink:
A maximum of 5 points can be scored for each nutrient/food component. The following table indicates the points scored, depending on the content of each nutrient/food component in 100 g of the food or drink:
Calculate overall score as follows:
1 If a food or drink scores less than 11 ‘A’ points then the overall score is calculated as follows:
2 If a food or drink scores 11 or more ‘A’ points but scores 5 points for fruit, vegetables and nuts, then the overall score is calculated as follows:
3 If a food scores 11 or more ‘A’ points but scores less than 5 points for fruit, vegetables and nuts, then the overall score is calculated as follows:
● A food is classified as ‘less healthy’ when it scores 4 points or more.
● A food is classified as ‘healthier’ when it scores 0 points or less.
● A drink is classified as ‘less healthy’ when it scores 1 point or more.
● A drink is classified as ‘healthier’ when it scores 0 points or less.
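The classification thresholds for Model WXYfm can be sketched as a small function that takes an overall score that has already been calculated. The function name and the label 'intermediate' (for foods scoring 1 to 3) are assumptions, as is the use of 'healthier' as the label for both foods and drinks. Scores are assumed to be whole numbers.

```python
def wxyfm_classify(overall_score, is_drink):
    """Categorise a food or drink from its overall Model WXYfm score.

    Foods: 4 or more is 'less healthy'; 0 or less is 'healthier'.
    Drinks: 1 or more is 'less healthy'; 0 or less is 'healthier'.
    'intermediate' is a hypothetical label for foods scoring 1 to 3.
    """
    less_healthy_from = 1 if is_drink else 4
    if overall_score >= less_healthy_from:
        return "less healthy"
    if overall_score <= 0:
        return "healthier"
    return "intermediate"


# Hypothetical scores for illustration.
food_category = wxyfm_classify(3, is_drink=False)
drink_category = wxyfm_classify(1, is_drink=True)
```

With these thresholds a drink has no intermediate band: any whole-number score is either 0 or less, or 1 or more.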
In this paper a variant of Model WXYfm, i.e. Model WXYfm(2), was used when comparing how this model ranks food and drinks compared with the standard ranking. For Model WXYfm(2) the nutrient levels in drinks are scored per 200 g rather than per 100 g. Otherwise it scores foods and drinks in exactly the same way as Model WXYfm.