Data mining for building knowledge bases: techniques, architectures and applications

Alfred Krzywicki; Wayne Wobcke; Michael Bain; John Calvo Martinez; Paul Compton

doi:10.1017/S0269888916000047

Published online by Cambridge University Press: 31 March 2016

Affiliation (all authors): School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
e-mail: alfredk@cse.unsw.edu.au, wobcke@cse.unsw.edu.au, mike@cse.unsw.edu.au, jcalvo@cse.unsw.edu.au, compton@cse.unsw.edu.au

Abstract

Data mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.
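To make the notion of knowledge mining more concrete, the following is a minimal Python sketch of the loop the abstract describes. Documents arrive one at a time, relation triples are extracted from each, their arguments are linked to knowledge-base entities, and a new triple is added only if it is consistent with what the knowledge base already holds. Everything here is an illustrative assumption rather than a method from the article: the regular-expression patterns stand in for a trained relation extractor, entity linking is reduced to exact alias lookup (unseen mentions become new entities), and "consistency" is modelled only as a check on single-valued (functional) relations. The names `PATTERNS`, `FUNCTIONAL`, `KnowledgeBase`, `link`, `add_fact` and `mine` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical toy extractor: capitalised word sequences stand in for named entities.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
PATTERNS = {
    "born_in": re.compile(NAME + r" was born in " + NAME),
    "capital_of": re.compile(NAME + r" is the capital of " + NAME),
}
# Assumption: these relations allow at most one object per subject.
FUNCTIONAL = {"born_in"}


@dataclass
class KnowledgeBase:
    aliases: dict[str, str] = field(default_factory=dict)   # lowercased mention -> entity id
    facts: set[tuple[str, str, str]] = field(default_factory=set)
    conflicts: list[tuple[str, str, str]] = field(default_factory=list)  # rejected triples

    def link(self, mention: str) -> str:
        """Toy entity linking: case-insensitive exact alias lookup;
        an unseen mention is assigned a fresh entity id."""
        key = mention.lower()
        if key not in self.aliases:
            self.aliases[key] = f"E{len(set(self.aliases.values())) + 1}"
        return self.aliases[key]

    def add_fact(self, subj: str, rel: str, obj: str) -> bool:
        """Add a triple unless it contradicts an existing functional-relation fact."""
        if rel in FUNCTIONAL:
            existing = {o for s, r, o in self.facts if s == subj and r == rel}
            if existing and obj not in existing:
                self.conflicts.append((subj, rel, obj))  # keep for later review
                return False
        self.facts.add((subj, rel, obj))
        return True


def mine(kb: KnowledgeBase, stream) -> int:
    """Process documents in arrival order; return the number of new triples added."""
    before = len(kb.facts)
    for doc in stream:
        for rel, pattern in PATTERNS.items():
            for m in pattern.finditer(doc):
                subj, obj = kb.link(m.group(1)), kb.link(m.group(2))
                kb.add_fact(subj, rel, obj)
    return len(kb.facts) - before


kb = KnowledgeBase(aliases={"alan turing": "E1", "turing": "E1", "london": "E2"})
mine(kb, [
    "Turing was born in London.",
    "Canberra is the capital of Australia.",
    "Alan Turing was born in Manchester.",  # contradicts the first claim
])
print(sorted(kb.facts))  # [('E1', 'born_in', 'E2'), ('E3', 'capital_of', 'E4')]
print(kb.conflicts)      # [('E1', 'born_in', 'E5')]
```

In this toy run the third sentence contradicts an existing `born_in` fact, so it is set aside in `conflicts` rather than silently overwriting the knowledge base; a realistic system would more likely weigh the evidence (for example, source reliability or extraction confidence) before deciding which claim to keep. Note also that "Manchester" still becomes a new entity, because linking happens before the consistency check.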

Type: Articles
Information: The Knowledge Engineering Review, Volume 31, Issue 2, March 2016, pp. 97–123

DOI: https://doi.org/10.1017/S0269888916000047
Copyright: © Cambridge University Press, 2016
