Interactive language use inherently involves a process of coordination, which often leads to matching behaviour between interlocutors across different semiotic channels. We study this process of interactive alignment from a multimodal perspective: using data from head-mounted eye-trackers in a corpus of face-to-face conversations, we measure what effect speakers' gaze fixations on their own gestures (condition 1) and interlocutors' fixations on those speakers' gestures (condition 2) have on the interlocutors' subsequent gesture production. The results show a significant effect of interlocutor gaze (condition 2), but not of speaker gaze (condition 1), on the amount of gestural alignment, along with an interaction between the two conditions.