A scientific literature review and consensus of expert opinion used the welfare definitions provided by the Farm Animal Welfare Council (FAWC) Five Freedoms as the framework for selecting a set of animal-based indicators that were sensitive to the current on-farm welfare issues of young lambs (aged ⩽6 weeks). Ten animal-based indicators assessed by observation – demeanour, response to stimulation, shivering, standing ability, posture, abdominal fill, body condition, lameness, eye condition and salivation were tested as part of the objective of developing valid, reliable and feasible animal-based measures of lamb welfare The indicators were independently tested on 966 young lambs from 17 sheep flocks across Northwest England and Wales during December 2008 to April 2009 by four trained observers. Inter-observer reliability was assessed using Fleiss's kappa (κ), and the pair-wise agreement with an experienced, observer designated as the ‘test standard observer’ (TSO) was examined using Cohen's κ. Latent class analysis (LCA) estimated the sensitivity (Se) and specificity (Sp) of each observer without assuming a gold standard and predicted the Se and Sp of randomly selected observers who may apply the indicators in the future. Overall, good levels of inter-observer reliability, and high levels of Sp were identified for demeanour (κ = 0.54, Se ⩾ 0.70, Sp ⩾ 0.98), stimulation (κ = 0.57, Se = 0.30 to 0.77, Sp ⩾ 0.98), shivering (κ = 0.55, Se = 0.37 to 0.85, Sp ⩾ 0.99), standing ability (0.54, Se ⩾ 0.80, Sp ⩾ 0.99), posture (κ = 0.45, Se ⩾ 0.56, Sp = 0.99), abdominal fill (κ = 0.44, Se = 0.39 to 0.98, Sp = 0.99), body condition (κ = 0.72, Se ⩾ 0.38 to 0.90, Sp = 0.99), lameness (κ = 0.68, Se > 0.73, Sp = 1.00), and eye condition (κ = 0.72, Se ⩾ 0.86, Sp = 0.99). LCA predicted that randomly selected observers had Se > 0.77 (acceptable), and Sp ⩾ 0.98 (high) for assessments of demeanour, lameness, abdominal fill posture, body condition and eye condition. The diagnostic performance of some indicators was influenced by the composition of the study population, and it would be useful to test the indicators on lambs with a greater level of outcomes associated with poor welfare. The findings presented in this paper could be applied in the selection of valid, reliable and feasible indicators used for the purposes of on-farm assessments of lamb welfare.