Human-generated summaries are a blend of content and style, bound by the task restrictions, but are ‘subject to subjectiveness’ of the individuals summarising the documents. We study the impact of various facets that cause subjectivity, such as brevity, information content and information coverage, on human-authored summaries. The scale of subjectivity across summaries is quantitatively measured using a question–answer-based cross-comprehension test. The test evaluates summaries for meaning rather than exact wording, using questions framed by each summary’s author and derived from that summary. The number of questions that cannot be answered after reading a candidate summary reflects its subjectivity. A qualitative analysis of the outcome of the cross-comprehension test shows the relationship between the length of a summary, its information content and the nature of the questions framed by the summary author.
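The scoring rule the abstract describes can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function names and the `answerable` judgement (in practice a human reader's decision) are assumptions for the sake of the example.

```python
def subjectivity_score(questions, answerable):
    """Count questions that cannot be answered from a candidate summary.

    questions  -- questions framed by another summary's author
    answerable -- callable(question) -> bool, a human (or proxy) judgement
                  of whether the candidate summary answers the question
    """
    return sum(1 for q in questions if not answerable(q))

# Toy usage: suppose the candidate summary only covers questions Q1 and Q3.
questions = ["Q1", "Q2", "Q3", "Q4"]
covered = {"Q1", "Q3"}
score = subjectivity_score(questions, lambda q: q in covered)
print(score)  # 2 questions are unanswerable, so the subjectivity score is 2
```

A higher score means the candidate summary leaves more of the other author's questions unanswered, i.e. the two summaries diverge more in what they chose to cover.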
This paper describes an approach for constructing a mixture of language models based on
simple statistical notions of semantics using probabilistic models developed for information
retrieval. The approach encapsulates corpus-derived semantic information and is able to model
varying styles of text. Using such information, the corpus texts are clustered in an unsupervised
manner and a mixture of topic-specific language models is automatically created. The principal
contribution of this work is to characterise the document space resulting from information
retrieval techniques and to demonstrate the approach for mixture language modelling. A
comparison is made between manual and automatic clustering in order to elucidate how the
global content information is expressed in the space. We also compare (in terms of association
with manual clustering and language modelling accuracy) alternative term-weighting schemes
and the effect of singular value decomposition dimension reduction (latent semantic analysis).
Test set perplexity results using the British National Corpus indicate that the approach can
improve the potential of statistical language modelling. Using an adaptive procedure, the
conventional model may be tuned to track text data with a slight increase in computational cost.
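The core idea of the approach, a mixture of topic-specific language models evaluated by test-set perplexity, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it uses smoothed unigram models with fixed mixture weights and hand-given clusters, whereas the paper clusters documents in an unsupervised manner from information-retrieval term weighting and latent semantic analysis.

```python
import math
from collections import Counter

def unigram_lm(docs, alpha=1.0):
    """Laplace-smoothed unigram model estimated from tokenised documents."""
    counts = Counter(tok for d in docs for tok in d)
    total = sum(counts.values())
    vocab_size = len(counts)
    return lambda w: (counts[w] + alpha) / (total + alpha * (vocab_size + 1))

def mixture_prob(w, models, weights):
    """P(w) under the mixture: sum over topics k of lambda_k * P_k(w)."""
    return sum(lam * m(w) for lam, m in zip(weights, models))

def perplexity(tokens, prob):
    """Test-set perplexity: exp of the average negative log-probability."""
    return math.exp(-sum(math.log(prob(t)) for t in tokens) / len(tokens))

# Toy corpus, already split into two "topic" clusters for brevity
# (the paper derives such clusters automatically).
cluster_a = [["stocks", "fell", "sharply"], ["markets", "fell", "today"]]
cluster_b = [["the", "striker", "scored"], ["the", "team", "won"]]
models = [unigram_lm(cluster_a), unigram_lm(cluster_b)]
weights = [0.5, 0.5]

test_tokens = ["markets", "fell"]
ppl = perplexity(test_tokens, lambda w: mixture_prob(w, models, weights))
print(ppl)
```

In the paper's setting the per-topic models are richer and the mixture weights can be adapted to track the incoming text, which is where the reported perplexity improvements come from.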