
A Strategy for Compressed Storage and Retrieval of Documents

Published online by Cambridge University Press:  05 May 2010


Summary

ABSTRACT

Document storage and retrieval systems should possess fast string search capabilities. The access paths needed to reduce search times require substantial amounts of storage, in addition to the very large storage requirements of the documents themselves. In this paper we investigate a technique that supports access paths on compressed documents, so that the total storage required for the access paths and the compressed documents together is less than that required for the original documents.

Introduction

Advances in hardware technology are unlikely to keep pace with the growth of on-line document storage. In an environment where the trend is towards local and wide area networks (there is the promise of an interconnected society around the corner), a large number of documents will be transmitted between nodes. If a satisfactory service is to be provided at reasonable cost, the storage of documents and their communication along network paths and between peripherals and processors require that the documents be held more compactly than at present. Because natural language is highly redundant, a suitable encoding scheme could be used, and the resulting compression would reduce both storage and communication costs. In an on-line environment the compression and decompression schemes must not involve excessive overheads in either time or space; however, since a document is compressed only once for storage but decompressed (or retrieved) many times, higher overheads can be tolerated during the compression stage.

Document retrieval requires fast string search capabilities, and it is usual to provide additional access paths to reduce search times, e.g. by providing inverted lists on words. In [Goyal83] a scheme was proposed that made use of inverted indexes associated with compressed documents.
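The idea of pairing inverted lists on words with compressed documents can be illustrated with a minimal sketch. This is not the scheme of [Goyal83] or of this chapter, whose details are not given in the text above; it is a generic illustration that assumes Python's standard `zlib` compressor in place of the authors' encoding, and the names `build_store` and `search` and the sample documents are hypothetical.

```python
import re
import zlib
from collections import defaultdict


def build_store(documents):
    """Compress each document and build an inverted list on its words.

    Assumption: zlib stands in for whatever encoding scheme the paper uses.
    """
    compressed = {}
    inverted = defaultdict(set)  # word -> set of document ids
    for doc_id, text in documents.items():
        compressed[doc_id] = zlib.compress(text.encode("utf-8"))
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            inverted[word].add(doc_id)
    return compressed, inverted


def search(word, compressed, inverted):
    """Look the word up in the inverted list, then decompress only the hits."""
    hits = sorted(inverted.get(word.lower(), ()))
    return {
        doc_id: zlib.decompress(compressed[doc_id]).decode("utf-8")
        for doc_id in hits
    }


# Hypothetical sample documents, for illustration only.
documents = {
    1: "Document storage and retrieval systems need fast string search.",
    2: "Natural language is highly redundant, so text compresses well.",
    3: "Inverted lists on words reduce search times for retrieval.",
}
compressed, inverted = build_store(documents)
results = search("retrieval", compressed, inverted)
```

The sketch reflects the asymmetry noted in the introduction: each document is compressed once when stored, and a query decompresses only the documents that the inverted list identifies. It does not demonstrate the storage saving claimed in the abstract; on documents this short a general-purpose compressor may not reduce size at all, and here the index is held uncompressed.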

Type: Chapter
Information: Text Processing and Document Manipulation, Proceedings of the International Conference, University of Nottingham, 14-16 April 1986, pp. 224-232
Publisher: Cambridge University Press
Print publication year: 1986
