Generic top-down discrimination for sorting and partitioning in linear time*

  • FRITZ HENGLEIN

Abstract

We introduce the notion of discrimination as a generalization of both sorting and partitioning, and show that discriminators (discrimination functions) can be defined generically, by structural recursion on representations of ordering and equivalence relations. Discriminators improve the asymptotic performance of generic comparison-based sorting and partitioning, and can be implemented so that they expose no more information than the underlying ordering or equivalence relation, respectively. For a large class of order and equivalence representations, including all standard orders for regular recursive first-order types, the discriminators execute in worst-case linear time. The generic discriminators can be coded compactly using list comprehensions, with order and equivalence representations specified using Generalized Algebraic Data Types. We give some examples of the uses of discriminators, including most-significant-digit lexicographic sorting, type isomorphism with an associative-commutative operator, and database joins. Source code of the discriminators and their applications in Haskell is included. We argue that built-in primitive types, notably pointers (references), should come with efficient discriminators, not just equality tests, since they facilitate the construction of discriminators for abstract types that are both highly efficient and representation-independent.
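To make the idea in the abstract concrete, the following is a minimal sketch of an order-based discriminator in Haskell. It is not the source code distributed with the article: the names `Order`, `Nat`, `Triv`, `SumL`, `ProdL`, `Map`, `disc`, `discNat`, `listL`, `char8` and `dsort` are assumptions chosen for illustration. The sketch shows an order representation as a Generalized Algebraic Data Type and a discriminator defined by structural recursion on it with list comprehensions. A discriminator takes key-value pairs and returns the values grouped by key, with the groups listed in ascending key order.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Array (accumArray, elems)
import Data.Char (ord)

-- Order representations (illustrative names, assumed for this sketch).
data Order k where
  Nat   :: Int -> Order Int                         -- standard order on 0..n
  Triv  :: Order k                                  -- all keys are equivalent
  SumL  :: Order a -> Order b -> Order (Either a b) -- every Left before every Right
  ProdL :: Order a -> Order b -> Order (a, b)       -- lexicographic order on pairs
  Map   :: (k -> j) -> Order j -> Order k           -- order induced by a key function

-- Group the values by key; groups appear in ascending key order.
disc :: Order k -> [(k, v)] -> [[v]]
disc _ []             = []
disc _ [(_, v)]       = [[v]]
disc (Nat n) xs       = discNat n xs
disc Triv xs          = [map snd xs]
disc (SumL r1 r2) xs  = disc r1 [ (k, v) | (Left k, v)  <- xs ]
                     ++ disc r2 [ (k, v) | (Right k, v) <- xs ]
disc (ProdL r1 r2) xs = [ vs | ys <- disc r1 [ (k1, (k2, v)) | ((k1, k2), v) <- xs ]
                             , vs <- disc r2 ys ]
disc (Map f r) xs     = disc r [ (f k, v) | (k, v) <- xs ]

-- Bucket-based base case: one pass over the input plus one pass over
-- the n + 1 buckets. Keys are assumed to lie in 0..n; a key outside
-- that range makes accumArray fail at run time.
discNat :: Int -> [(Int, v)] -> [[v]]
discNat n xs =
  filter (not . null)
         (elems (accumArray (\vs v -> v : vs) [] (0, n) (reverse xs)))

-- Lexicographic order on lists, obtained by unfolding a list into a sum
-- of the empty case and a head-tail pair.
listL :: Order a -> Order [a]
listL r = Map unfold (SumL Triv (ProdL r (listL r)))
  where
    unfold :: [a] -> Either () (a, [a])
    unfold []       = Left ()
    unfold (x : xs) = Right (x, xs)

-- Characters with code points 0..255, ordered by code point.
char8 :: Order Char
char8 = Map ord (Nat 255)

-- Sorting as a special case: discriminate each element under itself.
dsort :: Order k -> [k] -> [k]
dsort r xs = [ y | ys <- disc r [ (x, x) | x <- xs ], y <- ys ]
```

With these definitions, `dsort (listL char8) ["bca", "ab", "abc", "ab", ""]` evaluates to `["", "ab", "ab", "abc", "bca"]`, processing the strings most significant character first. The key function applied in `Map` is the only place where keys are inspected, and no pairwise comparison of keys takes place.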

