In this paper, we discuss a natural language interface to a database of structured textual
descriptions in the form of annotations of video objects. The interface maps the natural
language query input onto the annotation structures. The language processing is done in
three phases: generating expectations and implications from the input words; disambiguating noun
implications and filling the slots of prepositional expectations; and, finally, disambiguating verbal
expectations. The system has been tested with different types of user input, including ill-formed
sentences, and we have examined how it handles erroneous inputs and several kinds of portability issues.