The meeting setting poses many challenges just for recognizing the words and identifying who speaks them. Yet even once the words are available, much remains to be done if the goal is to understand the conversation: we need to understand both the language being used and its structure.
The structure of language is multilayered. At a fine-grained level, we can look at the structure of the spoken utterances themselves. Dialogue acts, which segment the utterances into units that each carry a single core intention and then label those units, are one type of structure at this level. Another way to understand language at this level is to focus on subjective language, which expresses internal mental states such as opinions, (dis)agreement, sentiment, and uncertainty.
At a coarser level, language can be structured by the topic of conversation. Within a given topic, moreover, the language used to make decisions has its own structure. For specific phenomena such as decisions, language understanding is now advanced enough to capture the content of the conversation using elaborate domain models. This makes it possible to index and summarize meetings on the basis of a deep understanding of what was said.
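As a concrete illustration of topic-level structure, the following minimal sketch hypothesizes topic boundaries from lexical cohesion, in the spirit of TextTiling: when adjacent windows of utterances share few words, a topic shift is likely. This is a deliberately simple, swapped-in technique, not necessarily the approach used in the work surveyed here. The function names, the window size, and the similarity threshold are hypothetical choices, and a realistic system would at least remove stopwords and smooth the similarity curve.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_boundaries(utterances, window=2, threshold=0.1):
    """Hypothetical helper: return indices i where a topic boundary is
    hypothesized just before utterances[i].

    The words in the `window` utterances before each candidate gap are
    compared with the words in the `window` utterances after it; low
    lexical overlap suggests a change of topic.
    """
    bags = [Counter(u.lower().split()) for u in utterances]
    boundaries = []
    for i in range(window, len(bags) - window + 1):
        left = sum(bags[i - window:i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    meeting = [
        "the budget for the remote control",
        "budget costs are too high",
        "we need a cheaper budget",
        "the design should be yellow",
        "yellow design with rubber buttons",
        "rubber buttons feel nice",
    ]
    print(topic_boundaries(meeting))  # → [3]
```

On these toy utterances, the only boundary proposed falls before the fourth utterance, where the talk moves from the budget to the design.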
Finally, the language of spoken conversation differs significantly from written language. Frequent types of speech disfluency, such as filled pauses, repetitions, and self-repairs, can be detected and removed with techniques similar to those used for the structural analyses described above.
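To make disfluency removal concrete, here is a toy rule-based sketch that drops filled pauses and collapses immediate word repetitions. It is a stand-in for illustration only: the techniques referred to above are learned detectors rather than hand-written rules, and the filler list and function name here are hypothetical.

```python
# Hypothetical, illustrative list of filled pauses; a real system would
# learn which tokens are disfluent from annotated data.
FILLED_PAUSES = {"uh", "um", "er", "erm", "hmm"}


def remove_disfluencies(tokens):
    """Return tokens with filled pauses removed and immediate repetitions
    collapsed, e.g. "we we need" -> "we need"."""
    cleaned = []
    for tok in tokens:
        low = tok.lower()
        if low in FILLED_PAUSES:
            continue  # drop filled pauses such as "uh" and "um"
        if cleaned and cleaned[-1].lower() == low:
            continue  # collapse an immediate repetition of the previous word
        cleaned.append(tok)
    return cleaned


if __name__ == "__main__":
    words = "so um we we need uh the the remote".split()
    print(" ".join(remove_disfluencies(words)))  # → so we need the remote
```

The repetition rule also shows why such rules fall short: it would wrongly collapse fluent sequences like "that that" in "I know that that is true", which is one reason disfluency detection is usually treated as a classification problem over context.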