The paper reviews the state of the art of natural language engineering (NLE) around 1995, when this journal first appeared, and makes a critical comparison with the current state of the art in 2018, as we prepare the 25th Volume. Specifically, the then state of the art in parsing, information extraction, chatbots and dialogue systems, speech processing, and machine translation is briefly reviewed. The emergence in the 1980s and 1990s of machine learning (ML) and statistical methods (SM) is noted. Important trends and areas of progress in the subsequent years are identified. In particular, the move to the use of n-grams or skip-grams and/or chunking with part-of-speech tagging and away from whole sentence parsing is noted, as is the increasing dominance of SM and ML. Some outstanding issues which merit further research are briefly pointed out, including metaphor processing and the ethical implications of NLE.
Language and communication are considered as relevant to artificial intelligence. Linguists are not the only scientists wishing to test theories of language functioning: so do psychologists and neurophysiologists. This chapter briefly looks at samples of important and prescient early work, and shows two contrasting, slightly later, approaches to the extraction of content, evaluation, representation, and the role of knowledge. It considers a range of systems embodying natural language processing (NLP)/computational linguistics (CL) aspects since the early seventies, and divides them by their relationships to linguistic systems and in relation to concepts normally taken as central to AI, namely logic, knowledge, and semantics. Broadly, statistical methods imply the use of only numerical, quantitatively based, methods for NLP/CL, rather than methods based on representations, whether those are assigned by humans or by computers. The chapter discusses the role of annotations to texts and the interpretability of core AI representations.
Nothing in McKay & Dennett's (M&D's) target article deals with the issue of how the adaptivity, or some other aspect, of beliefs might become a biological adaptation; which is to say, how the functions discussed might be coded in such a way in the brain that their development was also coded in gametes or sex transmission cells.
This chapter gives an estimate of the research value of word-for-word translation into a pidgin language, rather than into the full normal form of an output language.
The basic problem in machine translation is that of multiple meaning, or polysemy. There are two lines of research that highlight this problem in that both set a low value on the information-carrying value of grammar and syntax, and a high one on the resolution of semantic ambiguity. These are:
matching the main content-bearing words and phrases with a semantic thesaurus that determines their meanings in context;
word-for-word matching translation into a pidgin language using a very large bilingual word-and-phrase dictionary.
This chapter examines the second.
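The second line of research can be illustrated with a minimal sketch, assuming a greedy longest-match lookup in a bilingual word-and-phrase dictionary. The dictionary entries, the function name `pidgin_translate` and the bracket convention for unknown words are hypothetical and are not taken from the chapter.

```python
# Toy bilingual word-and-phrase dictionary.
# All entries are hypothetical and serve only as an illustration.
TOY_DICTIONARY = {
    ("pomme", "de", "terre"): "potato",
    ("pomme",): "apple",
    ("de",): "of",
    ("terre",): "earth",
    ("la",): "the",
    ("mange",): "eat",
    ("il",): "he",
}


def pidgin_translate(sentence, dictionary=TOY_DICTIONARY):
    """Translate word for word, preferring the longest matching phrase.

    No grammar or syntax is applied: the output keeps the input word
    order and uses uninflected dictionary forms, which is what makes it
    a pidgin rather than the normal form of the output language.
    """
    words = sentence.lower().split()
    longest = max(len(key) for key in dictionary)
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in dictionary:
                output.append(dictionary[key])
                i += n
                break
        else:
            # Unknown word: pass it through, marked with brackets.
            output.append("[" + words[i] + "]")
            i += 1
    return " ".join(output)


print(pidgin_translate("il mange la pomme de terre"))
```

The phrase entry for "pomme de terre" takes precedence over the single-word entries, so the sketch resolves one kind of multiple meaning purely by dictionary lookup.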
The phrase ‘Mechanical Pidgin’ was first used by R. H. Richens to describe the output given at the beginning of Section 2 of this chapter (below), which, he said, was not English at all but a special language, with the vocabulary of English and a structure reminiscent of Chinese. Machine translation output always is a pidgin, whose characteristics per se are never investigated. Either the samples of this pidgin are post-edited into fuller English, or the nature of the output is explained away as low-level machine translation, or rough machine translation, or some vague remark is made to the effect that pidgin machine translation is all right for most purposes.
To the question ‘What is a word?’ philosophers usually give, in succession (as the discussion proceeds), three replies:
‘Everybody knows what a word is.’
‘Nobody knows what a word is.’
‘From the point of view of logic and philosophy, it doesn't matter anyway what a word is, since the statement is what matters, not the word.’
In this paper I shall discuss these three reactions in turn, and dispute the last. Since it is part of my argument that the ways of thinking of several different disciplines must be correlated if we are to progress in our thinking as to what a word is, I shall try to exemplify as many differing contentions as possible by the use of the word word, since this word is a word which can be used in all senses of ‘word’, which many words cannot.
Two preliminary points about terminology need to be made clear. I am using the word ‘word’ here in the type sense as used by logicians, rather than in the token sense, as synonymous with ‘record of single occurrence of pattern of sound-waves issuing from the mouth’. Thus, when I write here ‘mouth’, ‘mouth’, ‘mouth’, I write only one word.
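The type/token distinction can be made concrete with a small sketch. The function names are illustrative only, and treating a type as a distinct spelling is a simplifying assumption.

```python
def count_tokens(words):
    """Number of occurrences written down (tokens)."""
    return len(words)


def count_types(words):
    """Number of distinct words (types), here approximated by spelling."""
    return len(set(words))


written = ["mouth", "mouth", "mouth"]
print(count_tokens(written), count_types(written))
```

On this reading, the three written occurrences are three tokens of a single type.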
The second point is that I use in this paper, in different senses, the terms ‘Use’, ‘usage’ and ‘use’. The question as to how the words ‘usage’ and ‘use’ should be used is, as philosophers know, a very thorny one.
1. Current relativist conceptions of science depend widely, though vaguely, upon the insights of T. S. Kuhn (1962), and, in particular, upon his notion of a paradigm. This notion is being used by relativists to support the contention that, since scientific theory is paradigm-founded, and therefore context-based, there can be no one discernible process of scientific verification. However, as I have shown in an earlier paper (1970a), there is another, more exact conception of a Kuhnian paradigm to be considered: namely, that conception of it which says that it is either an analogically used artefact, or even sometimes an actual ‘crude analogy’, that is, an analogical figure of speech expressed in a string of words.
This alternative conception of paradigm, far from supporting a verification-deprived conception of science (which, for those of us philosophers who are also trying to do technological science, just seems a conception of science totally divorced from scientific reality) can, on the contrary, be used to enrich and amplify the most strictly verification-based philosophy of science that is known, namely the Braithwaitean conception of it as a verifiable hypothetico-deductive (H-D) system. For such a paradigm, even though, in unselfconscious scientific thinking, it is usually a crude and concrete conceptual structure, can yet be shown to yield a set of abstract attributes.
The purpose of this chapter is to present a philosophical model of real translation. ‘Translation’ is here used in its ordinary sense: in the sense, that is, in which we say that passages of Burke can be translated into Ciceronian Latin prose, or that the sentence ‘He shot the wrong woman’ is untranslatable into good French. The term ‘philosophical’, however, needs some explaining, since, so far as I know, no one has made a philosophical model of translation as yet. I shall call a model of translation ‘philosophical’ if it has the following characteristics:
It must not only throw some light on the problem of transformation within a language, but must deal also with the problem of reference to something. That is to say, it must relate the strings of language units in the various languages with which it deals to public and recognisable situations in everyday life. It is characteristic of philosophers that, unlike most linguists, they do not regard a text in language as self-contained.
It must deal in concepts, not only in words or terms. All philosophers believe in concepts, though they sometimes pretend not to.
It must face, and not evade, the problem of constructing a universal grammar, while yet recognising fully how greatly languages differ, and how peripheral is the whole problem of determining the nature of language.
The study of language, like the study of mathematical systems, has always been thought to be relevant to the study of forms of argument in science. Language as the scientist uses it, however, is assumed to be potentially interlingual, conceptual and classificatory. This fact makes current philosophical methods of studying language irrelevant to the philosophy of science.
An alternative method of analysing language is proposed. This is that we should take as a model for language the classification system of a great library. Such a classification system is described.
Classification systems of this kind, however, tend to break down because of the phenomena of profusion of meaning, extension of meaning and overlap of meaning in actual languages. The librarian finds that empirically based semantic aggregates (overlapping clusters of meanings) are forming within the system. These are defined as concepts. By taking these aggregates as units, the system can still be used to classify.
An outline sketch is given of a mathematical model of language, language being taken as a totality of semantic aggregates. Language, thus considered, forms a finite lattice. A procedure for retrieving information within the system is described.
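The abstract does not spell out the lattice or the retrieval procedure, so the following is only a rough sketch under stated assumptions: each word is described by the set of semantic aggregates it falls under, sets are ordered by inclusion, meet is intersection and join is union. The head names, the word list and the function names are all hypothetical.

```python
# Hypothetical assignment of words to semantic aggregates.
WORD_HEADS = {
    "plough": frozenset({"agriculture", "tool"}),
    "farmer": frozenset({"agriculture", "person"}),
    "hammer": frozenset({"tool"}),
}


def meet(a, b):
    """Greatest lower bound under set inclusion: the shared aggregates."""
    return a & b


def join(a, b):
    """Least upper bound under set inclusion: all aggregates of either."""
    return a | b


def retrieve(query, word_heads=WORD_HEADS):
    """Return the words whose aggregate set contains every queried one."""
    wanted = frozenset(query)
    return sorted(w for w, heads in word_heads.items() if wanted <= heads)


print(retrieve({"tool"}))
print(retrieve({"agriculture", "tool"}))
print(sorted(meet(WORD_HEADS["plough"], WORD_HEADS["farmer"])))
```

Subsets of a finite collection of aggregates, ordered by inclusion, form a finite lattice, which is why intersection and union are natural choices here. The paper's own construction may differ.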
The scientific procedures of phrase-coining, classifying and analogy-finding are described in terms of the model.
The point of relevance of the study of language to the philosophy of science
Two very general disciplines have always been thought especially relevant to our understanding of the nature of science.
Faced with the necessity of saying, in a finite space and in an extremely finite time, what I believe the thesaurus theory of language to be, I have decided on the following procedure.
First, I give, in logical and mathematical terms, what I believe to be the abstract outlines of the theory. This account may sound abstract, but it is being currently put to practical use. That is to say, with its help an actual thesaurus to be used for medium-scale mechanical translation (MT) tests, and consisting of specifications in terms of archeheads, heads and syntax markers, made upon words, is being constructed straight on to punched cards. The cards are multiply punched; a nuisance, but they have to be, since the thesaurus in question has 800 heads. There is also an engineering bottleneck about interpreting them; at present, if we wish to reproduce the pack, every reproduced card has to be written on by hand, which makes the reproduction an arduous business; a business also that will become more and more arduous as the pack grows larger. If this interpreting difficulty can be overcome, however, we hope to be able to offer to reproduce this punched-card thesaurus mechanically, as we finish it, for any other MT group that is interested, so that, at last, repeatable, thesauric translations (or mistranslations) can be obtained.
This chapter examines a first-stage translation from Latin into English with the aid of Roget's Thesaurus of a passage from Virgil's Georgics.
The essential feature of this program is the use of a thesaurus as an interlingua: the translation operations are carried out on a head language into which the input text is transformed and from which an output is obtained. The notion of ‘heads’ is taken from the concepts or topics under which Roget classified words in his thesaurus. These operations are of three kinds: semantic, syntactic and grammatical.
The general arrangement of the program is as follows:
1. Dictionary matching: the chunks of the input language are matched with the entries in a Latin interlingual dictionary giving the raw material of the head language; this consists of heads representing the semantic, syntactic and grammatical elements of the input.
2. Operations on the semantic heads: these give a first-stage translation.
3. Operations on the syntactic heads: giving a syntactically complete, though unparsed, translation.
4. Operations on the grammatical heads: giving a parsed and correctly ordered output.
5. Cleaning up operations: the output is ‘trimmed’ by, e.g., insertion of capital letters, removal of repetitions like ‘farmer-er’.
Only Stage 2 of the procedure is given in detail here.
Information obtained from stage 1
The Latin sentence to be translated was chunked as follows:
AGRI-COL-A IN-CURV-O TERR-AM DI-MOV-IT AR-ATRO
A number of these generated syntactic heads only. Those with semantic head entries are AGRI-, COL-, IN-, CURV-, TERR-, DI-, MOV-, AR-.
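The chunking and the selection of chunks with semantic head entries can be sketched as follows. This assumes that chunks are delimited by hyphens and spaces, and that the run of chunks listed above names eight separate chunks. The set stands in for a lookup in the Latin interlingual dictionary, whose actual head entries are not given here, and the function names are illustrative.

```python
SENTENCE = "AGRI-COL-A IN-CURV-O TERR-AM DI-MOV-IT AR-ATRO"

# Chunks reported in the text as having semantic head entries.
# A real system would look these up in the interlingual dictionary.
SEMANTIC_CHUNKS = {"AGRI", "COL", "IN", "CURV", "TERR", "DI", "MOV", "AR"}


def chunk(sentence):
    """Split a pre-chunked sentence into its chunks, in order."""
    return [c for word in sentence.split() for c in word.split("-")]


def semantic_chunks(sentence, semantic=SEMANTIC_CHUNKS):
    """Keep only the chunks that carry semantic heads."""
    return [c for c in chunk(sentence) if c in semantic]


print(chunk(SENTENCE))
print(semantic_chunks(SENTENCE))
```

The remaining chunks (A, O, AM, IT, ATRO) are the ones that, on this reading, generate syntactic heads only and are left to the later stages.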