One of the greatest challenges of assessment in the professions is the high cost of errors. In classrooms, assessment errors carry consequences for the individual learner, but they rarely have large-scale effects. However, when we certify trainees, whether in the military, in medicine, or in the professions generally, inadequate measures of proficiency can have serious consequences. Valid measures take on new meaning when improper assessments could endanger people's well-being. This paper addresses the techniques used in the design and evaluation of Sherlock, an avionics tutor used by the U.S. Air Force to train technicians to troubleshoot problems pertinent to the F-15 aircraft (Lajoie & Lesgold, 1992a; Lesgold, Lajoie, Bunzo, & Eggan, 1992; Lesgold, Lajoie, Logan, & Eggan, 1990). Sherlock presented airmen with realistic fault-isolation problems similar to those they encounter when troubleshooting avionics equipment. Sherlock trainees demonstrated improvements in their troubleshooting skills on a variety of measures, including measures collected by Sherlock as training proceeded and a post-training performance test (Lesgold et al., 1990; Lesgold et al., 1992; Nichols, Pokorny, Jones, Gott, et al., in press). Sherlock is analyzed in terms of "what worked," or "lessons learned." Applications and extensions of these techniques are also described for another domain, medical problem solving (Lajoie, 2007).