Studies on gaze allocation during sentence production have recently begun to implement cross-linguistic analyses in the investigation of visual and linguistic processing. The underlying assumption is that the aspects of a scene that attract attention prior to articulation are, in part, linked to the specific linguistic system and the means used for expression. The present study concerns naturalistic, dynamic scenes (video clips) showing causative events (an agent acting on an object). It exploits grammatical differences in the domain of verbal aspect and in the way in which the status of an event (a specific vs. habitual instance) is encoded in English and German. Fixations in agent and action areas of interest were time-locked to utterance onset, and we focused on the pre-articulatory time span to shed light on sentence planning processes involving message generation and scene conceptualization. Findings are threefold: (i) English speakers mark the status of an event as specific in relation to the action, with progressive aspect marking on the verb in each utterance, whereas German speakers do so by elaborating specific characteristics of the agent; (ii) participants display significantly different gaze allocation patterns to agent and action regions, although the sentences produced in both languages follow the same subject-verb word order; and (iii) the analysis of gaze patterns during sentence production with dynamic scenes provides results from a more naturalistic paradigm that complement those obtained in studies with still images.