In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because inferential tests support stronger generalizations when researchers attend to them during sample construction. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, and only one conducted an a priori power analysis referencing vocabulary-specific effect sizes from previous research. Random assignment was observed in 56% of the reports, whereas no report involved random sampling. Approximately 15% of the samples observed were constructed from multiple sites, and none of these empirically investigated the effect of site clustering. Drawing on the synthesized findings, we conclude by suggesting that future L2 instructed vocabulary researchers consider a priori effect size–driven sample planning, randomization, and multisite usage when constructing samples.