
Estimating the overlap between dependent computations for automatic parallelization

Published online by Cambridge University Press: 06 July 2011

PAUL BONE
Affiliation:
Department of Computer Science and Software Engineering, The University of Melbourne and National ICT Australia (NICTA), Australia (e-mail: pbone@csse.unimelb.edu.au, zs@csse.unimelb.edu.au)
ZOLTAN SOMOGYI
Affiliation:
Department of Computer Science and Software Engineering, The University of Melbourne and National ICT Australia (NICTA), Australia (e-mail: pbone@csse.unimelb.edu.au, zs@csse.unimelb.edu.au)
PETER SCHACHTE
Affiliation:
Department of Computer Science and Software Engineering, The University of Melbourne, Australia (e-mail: schachte@unimelb.edu.au)

Abstract

Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads. Autoparallelizing compilers have therefore long tried to use granularity analysis to ensure that they only spawn off computations whose cost will probably exceed the spawn-off cost by a comfortable margin. However, this is not enough to yield good results, because data dependencies may also limit the usefulness of running computations in parallel. If one computation blocks almost immediately and can resume only after another has completed its work, then the cost of parallelization again exceeds the benefit. We present a set of algorithms for recognizing places in a program where it is worthwhile to execute two or more computations in parallel that pay attention to the second of these issues as well as the first. Our system uses profiling information to compute the times at which a procedure call consumes the values of its input arguments and the times at which it produces the values of its output arguments. Given two calls that may be executed in parallel, our system uses the times of production and consumption of the variables they share to determine how much their executions would overlap if they were run in parallel, and therefore whether executing them in parallel is a good idea or not. We have implemented this technique for Mercury in the form of a tool that uses profiling data to generate recommendations about what to parallelize, for the Mercury compiler to apply on the next compilation of the program. We present preliminary results that show that this technique can yield useful parallelization speedups, while requiring nothing more from the programmer than representative input data for the profiling run.
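The sketch below is an illustrative model of the idea described in the abstract, not the authors' implementation: given profiled production times of shared variables in one call and consumption times in another, it estimates how long the two calls would take if run in parallel, and compares that against sequential execution plus a margin. All function names, the margin parameter, and the numbers in the example are hypothetical.

    # A simplified overlap model (assumed, for illustration only).
    # prod[v]  = time, from the start of p, at which p produces shared variable v
    # cons[v]  = time, from the start of q (counting only q's own work), at which
    #            q first consumes v
    # cost_p, cost_q = total sequential costs of p and q
    # spawn_cost     = overhead of spawning q as a separate task

    def estimate_parallel_time(cost_p, cost_q, prod, cons, spawn_cost):
        work_done = 0.0      # amount of q's own work executed so far
        wall = spawn_cost    # wall-clock time at which q has done that work
        # Visit the shared variables in the order q consumes them.
        for v in sorted(cons, key=cons.get):
            wall += cons[v] - work_done   # q runs until it needs v ...
            work_done = cons[v]
            wall = max(wall, prod[v])     # ... then blocks until p has produced v
        wall += cost_q - work_done        # q finishes its remaining work
        return max(cost_p, wall)          # p and q run side by side

    def worthwhile(cost_p, cost_q, prod, cons, spawn_cost, margin=1.1):
        sequential = cost_p + cost_q
        parallel = estimate_parallel_time(cost_p, cost_q, prod, cons, spawn_cost)
        return sequential > margin * parallel

    # Example: q needs p's output almost immediately, so the overlap is poor
    # and the estimated speedup does not justify the spawn overhead.
    print(worthwhile(100.0, 80.0, prod={'x': 95.0}, cons={'x': 5.0}, spawn_cost=2.0))

In this hypothetical case the estimated parallel time (170 units) barely improves on the sequential time (180 units), so the model would recommend against parallelizing the two calls.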

Type
Regular Papers
Copyright
Copyright © Cambridge University Press 2011

