Push versus pull-based loop fusion in query engines

AMIR SHAIKHHA; MOHAMMAD DASHTI; CHRISTOPH KOCH

doi:10.1017/S0956796818000102

Part of: Big Data Special Collection

Published online by Cambridge University Press: 10 April 2018

Affiliation (all authors): École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland (e-mails: amir.shaikhha@epfl.ch, mohammad.dashti@epfl.ch, christoph.koch@epfl.ch)

Abstract

Database query engines use pull-based or push-based approaches to avoid the materialization of data across query operators. In this paper, we study these two types of query engines in depth and present the limitations and advantages of each engine. Similarly, the programming languages community has developed loop fusion techniques to remove intermediate collections in the context of collection programming. We draw parallels between databases (DB) and programming language (PL) research by demonstrating the connection between pipelined query engines and loop fusion techniques. Based on this connection, we propose a new type of pull-based engine, inspired by a loop fusion technique, which combines the benefits of both approaches. Then, we experimentally evaluate the various engines, in the context of query compilation, for the first time in a fair environment, eliminating the biasing impact of ancillary optimizations that have traditionally only been used with one of the approaches. We show that for realistic analytical workloads, there is no considerable advantage for either form of pipelined query engine, as opposed to what recent research suggests. Also, by using micro-benchmarks, which demonstrate certain edge cases on which one approach or the other performs better, we show that our proposed engine dominates the existing engines by combining the benefits of both.
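To make the two pipelining styles named in the abstract concrete, here is a minimal sketch and not the authors' implementation. It runs the same query (keep even numbers, square them, sum the result) through a pull-based pipeline, where each operator asks its child for the next tuple, and through a push-based pipeline, where the producer hands each tuple to a consumer callback. Neither version materializes an intermediate collection. All names (`Scan`, `Select`, `Project`, `pull_sum`, `push_scan`, `push_select`, `push_project`, `push_sum`) are hypothetical, and using `None` as the end-of-input marker is a simplifying assumption.

```python
# Illustrative sketch only: operator names and the None end marker are assumptions.

# --- Pull-based (iterator style): the consumer drives by calling next(). ---
class Scan:
    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        # Return the next row, or None when the input is exhausted.
        return next(self._it, None)


class Select:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def next(self):
        # Keep pulling from the child until a row passes the predicate.
        while True:
            row = self.child.next()
            if row is None or self.pred(row):
                return row


class Project:
    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)


def pull_sum(op):
    total = 0
    while (row := op.next()) is not None:
        total += row
    return total


# --- Push-based (callback style): the producer drives by calling consume(). ---
def push_scan(rows, consume):
    for row in rows:
        consume(row)


def push_select(pred, consume):
    def on_row(row):
        if pred(row):
            consume(row)
    return on_row


def push_project(fn, consume):
    return lambda row: consume(fn(row))


def push_sum(rows, build_pipeline):
    total = [0]  # mutable cell updated by the sink callback

    def sink(value):
        total[0] += value

    push_scan(rows, build_pipeline(sink))
    return total[0]


rows = list(range(10))
is_even = lambda x: x % 2 == 0
square = lambda x: x * x

pull_result = pull_sum(Project(Select(Scan(rows), is_even), square))
push_result = push_sum(
    rows, lambda sink: push_select(is_even, push_project(square, sink))
)
```

In the pull version the selection operator needs its own inner loop to skip rejected rows, whereas in the push version the only loop lives in the scan. This difference in control flow is the kind of structural contrast the abstract alludes to when it compares the two engine types.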

Type: Research Article
Information: Journal of Functional Programming, Volume 28, 2018, e10

DOI: https://doi.org/10.1017/S0956796818000102
Copyright: © Cambridge University Press 2018
