Anniversary article: Then and now: 25 years of progress in natural language engineering

John Tait; Yorick Wilks

doi:10.1017/S1351324919000081

Published online by Cambridge University Press: 15 May 2019

John Tait*: Johntait.net Ltd, Thorpe Thewles, Stockton-on-Tees, UK
Yorick Wilks: Florida Institute of Human and Machine Cognition, 15 SE Osceola, Ocala, FL 34471, USA
*Corresponding author. Email: john@johntait.net

Abstract

This paper reviews the state of the art of natural language engineering (NLE) around 1995, when this journal first appeared, and compares it critically with the state of the art in 2018, as we prepare the 25th volume. Specifically, the then state of the art in parsing, information extraction, chatbots and dialogue systems, speech processing, and machine translation is briefly reviewed. The emergence in the 1980s and 1990s of machine learning (ML) and statistical methods (SM) is noted. Important trends and areas of progress in the subsequent years are identified. In particular, the move away from whole-sentence parsing and towards the use of n-grams or skip-grams and/or chunking with part-of-speech tagging is noted, as is the increasing dominance of SM and ML. Some outstanding issues that merit further research are briefly pointed out, including metaphor processing and the ethical implications of NLE.

Keywords

Information extraction; Machine learning; Machine translation; Parsing; Statistical methods

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 3, May 2019, pp. 405–418

DOI: https://doi.org/10.1017/S1351324919000081
Copyright: © Cambridge University Press 2019
