Storing massive Resource Description Framework (RDF) data: a survey

Zongmin Ma, Miriam A. M. Capretz and Li Yan

Abstract

The Resource Description Framework (RDF) is a flexible model for representing information about resources on the Web. As a W3C (World Wide Web Consortium) Recommendation, RDF has rapidly gained popularity. With the widespread acceptance of RDF on the Web and in the enterprise, a huge amount of RDF data is being produced and made available. Efficient and scalable management of RDF data is therefore of increasing importance. RDF data management has attracted attention in the database and Semantic Web communities, and much work has been devoted to proposing solutions for storing RDF data efficiently. This paper focusses on the use of relational databases and NoSQL (for ‘not only SQL (Structured Query Language)’) databases to store massive RDF data. It provides a full, up-to-date overview of the current state of the art in RDF data storage.
