
Inter-observer reliability of animal-based welfare indicators included in the Animal Welfare Indicators welfare assessment protocol for dairy goats

  • A. Vieira (a1), M. Battini (a2), E. Can (a3) (a4), S. Mattiello (a2) and G. Stilwell (a3)...


This study was conducted within the Animal Welfare Indicators (AWIN) project. It was motivated by the scarcity of data on the inter-observer reliability (IOR) of welfare indicators, given the importance of reliability as a further step in developing on-farm welfare assessment protocols. The objective was therefore to evaluate the IOR of the animal-based indicators (at group and individual level) of the prototype AWIN welfare assessment protocol for dairy goats. Two pairs of observers, one in Portugal and one in Italy, applied the prototype protocol, each pair visiting 10 farms. Farms in both countries were visited between January and March 2014, and all observers received the same training before the visits began. The data collected and analysed comprise group-level and individual-level observations. Most group-level indicators reached the highest IOR level (‘substantial’, 0.85 to 0.99) in both field studies, pointing to a usable set of animal-based welfare indicators, which were therefore included in the first level of the final AWIN welfare assessment protocol for dairy goats. The IOR of individual-level indicators was lower, but the majority still reached ‘fair to good’ (0.41 to 0.75) or ‘excellent’ (0.76 to 1) levels. We explore reasons for the difference in IOR between group-level and individual-level indicators, including how the number of individual-level indicators assessed on each animal and the restraining method may have affected the results. We also discuss the difference between countries in the IOR of individual-level indicators: the Portuguese pair of observers reached a higher IOR than the Italian pair. We argue that this difference may stem from the restraining method applied or from the observers’ different backgrounds and experience. Finally, the discussion emphasizes that reliability is not an absolute attribute of an indicator but derives from an interaction between the indicators, the observers and the situation in which the assessment takes place. This highlights the importance of considering indicator reliability when developing welfare assessment protocols.
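To make the reported bands concrete, the sketch below shows one way an agreement coefficient for a pair of observers could be computed and mapped onto the ‘fair to good’ and ‘excellent’ ranges quoted above. The abstract does not state which statistic was used for each indicator, so the choice of unweighted Cohen's kappa is an assumption made purely for illustration, as are the function names, the example scores and the label used for values below 0.41.

```python
from collections import Counter


def cohen_kappa(obs_a, obs_b):
    """Unweighted Cohen's kappa for two observers scoring the same animals.

    Assumption: the indicator is categorical (e.g. 0 = absent, 1 = present).
    """
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(obs_a)
    # Proportion of animals on which both observers gave the same score.
    p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    # Agreement expected by chance, from each observer's category frequencies.
    freq_a, freq_b = Counter(obs_a), Counter(obs_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa undefined: both observers used one category")
    return (p_observed - p_chance) / (1 - p_chance)


def ior_band(value):
    """Map a coefficient onto the individual-level bands quoted in the abstract."""
    v = round(value, 2)  # the bands are stated to two decimal places
    if v >= 0.76:
        return "excellent"
    if v >= 0.41:
        return "fair to good"
    return "below fair to good"  # hypothetical label, not named in the abstract


# Hypothetical scores from two observers for one indicator on ten goats.
observer_1 = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
observer_2 = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0]
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f} ({ior_band(kappa)})")
```

Because kappa corrects raw agreement for chance, the same percentage agreement can give quite different coefficients depending on how common the scored condition is, which is one reason reliability depends on the assessment situation as well as on the indicator.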





