Federated query processing on linked data: a qualitative survey and open challenges

  • Damla Oguz (a1) (a2) (a3), Belgin Ergenc (a1), Shaoyi Yin (a3), Oguz Dikenelli (a2) and Abdelkader Hameurlain (a3)...

Abstract

A large number of data providers publish and connect their structured data on the Web as linked data, turning the Web of data into a global data space. In this paper, we first give an overview of the query processing approaches used in this interlinked and distributed environment, and then focus on federated query processing on linked data. We examine in detail the data source selection, join methods and query optimization methods of existing query federation engines. Furthermore, we present a qualitative comparison of these engines, complemented by a comparison of the metrics measured for each engine, in order to point out the major strengths of each one. Finally, we discuss the major challenges of federated query processing on linked data.
