Innovation Concept: The intended outcome of emergency medicine training is to produce physicians who can competently run an emergency department (ED) shift. While many workplace-based ED assessments focus on discrete tasks of the discipline, others emphasize assessment of performance across the entire shift. However, the quality of these assessments is generally poor, and the tools often lack validity evidence. The use of entrustment scale anchors may help to address these psychometric issues. The aim of this study was to develop and gather validity evidence for a novel tool to assess a resident's ability to independently run an ED shift. Methods: Through a nominal group technique, local and national stakeholders identified dimensions of performance reflective of a competent ED physician. These dimensions were included in a new tool that was piloted in the Department of Emergency Medicine at the University of Ottawa during a 4-month period. Psychometric characteristics of the items were calculated, and a generalizability analysis was used to determine the reliability of scores. An ANOVA was conducted to determine whether scores increased as a function of training level (junior = PGY1-2, intermediate = PGY3, senior = PGY4-5) and whether they varied by ED treatment area. Safety for independent practice was analyzed with a dichotomous score. Curriculum, Tool or Material: The developed Ottawa Emergency Department Shift Observation Tool (O-EDShOT) includes 12 items rated on a 5-point entrustment scale, with a global assessment item and 2 short-answer questions. Eight hundred and thirty-three assessments were completed by 78 physicians for 45 residents. Mean scores differed significantly by training level (p < .001): junior residents received lower ratings (3.48 ± 0.69) than intermediate residents (3.98 ± 0.48), who in turn received lower ratings than senior residents (4.54 ± 0.42). Scores did not vary by ED treatment area (p > .05).
Residents judged to be safe to independently run the shift had significantly higher mean scores than those judged not to be safe (4.74 ± 0.31 vs 3.75 ± 0.66; p < .001). Fourteen observations per resident, the typical number recorded during a 1-month rotation, were required to achieve a reliability of 0.80. Conclusion: The O-EDShOT successfully discriminated between junior, intermediate and senior-level residents regardless of ED treatment area. Multiple sources of evidence support the O-EDShOT producing valid scores for assessing a resident's ability to independently run an ED shift.
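The relationship between the number of observations and score reliability can be illustrated with a minimal sketch. It assumes a simple single-facet design in which the decision-study projection reduces to the Spearman-Brown formula; the study's actual generalizability design may differ. The single-observation reliability of 0.23 is a hypothetical value chosen only because it is consistent with 14 observations reaching 0.80; it is not reported in the abstract, and the function names are illustrative.

```python
import math


def projected_reliability(single_obs_reliability: float, n_obs: int) -> float:
    """Spearman-Brown projection: reliability of the mean of n_obs observations."""
    r = single_obs_reliability
    return n_obs * r / (1 + (n_obs - 1) * r)


def observations_needed(single_obs_reliability: float, target: float = 0.80) -> int:
    """Smallest number of observations whose mean reaches the target reliability."""
    r = single_obs_reliability
    # Solve n*r / (1 + (n - 1)*r) >= target for n.
    n = target * (1 - r) / (r * (1 - target))
    # Small tolerance guards against floating-point noise at exact integers.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # ASSUMPTION: 0.23 is a hypothetical single-observation reliability,
    # not a value taken from the study.
    hypothetical_r = 0.23
    print(observations_needed(hypothetical_r))
    print(round(projected_reliability(hypothetical_r, 14), 3))
```

Under this simplified model, any single-observation reliability from roughly 0.222 up to (but not including) 0.235 yields a requirement of exactly 14 observations for a target of 0.80.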