The Tensor Contraction Engine

Most software for scientific computation is written in the low-level languages FORTRAN and C. Some of this software, however, has enough regular computational structure that it could benefit from special-purpose software engineering tools or domain-specific programming languages. For example, electronic structure calculations in quantum chemistry and physics involve large collections of tensor contractions (generalized matrix multiplications). Currently, chemists spend weeks or months manipulating formulas containing dozens or hundreds of terms with Mathematica, hand-optimizing the computation, and writing FORTRAN code by hand. The computation can take on the order of 1 TFLOP week or more and can require multiple terabytes of storage.
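To make "generalized matrix multiplication" concrete, the following is a minimal Python sketch, not TCE input or output. It contracts a three-index array with another three-index array over two shared indices, then checks that flattening the two summed indices into one turns the contraction into an ordinary matrix product. The function names `contract` and `matmul`, the index names, and the sample data are illustrative assumptions.

```python
from itertools import product


def contract(a, b, n):
    """Return r[i][j] = sum over k, l of a[i][k][l] * b[k][l][j].

    All four indices range over 0..n-1 (a hypothetical toy size).
    """
    r = [[0] * n for _ in range(n)]
    for i, j, k, l in product(range(n), repeat=4):
        r[i][j] += a[i][k][l] * b[k][l][j]
    return r


def matmul(x, y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


n = 2
# Arbitrary integer test data, so the comparison below is exact.
a = [[[i + 2 * k + l for l in range(n)] for k in range(n)] for i in range(n)]
b = [[[k - l + 3 * j for j in range(n)] for l in range(n)] for k in range(n)]

# Flatten the summed index pair (k, l) into a single index m = k * n + l.
a_mat = [[a[i][k][l] for k in range(n) for l in range(n)] for i in range(n)]
b_mat = [[b[k][l][j] for j in range(n)] for k in range(n) for l in range(n)]

result = contract(a, b, n)
same_as_matmul = result == matmul(a_mat, b_mat)
```

In real electronic structure codes the arrays have more indices and far larger extents, which is why operation count and storage dominate the cost.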

We have developed a domain-specific language that allows chemists to specify the computation in a high-level Mathematica-style language. The compiler for this language, the Tensor Contraction Engine (TCE), searches for an optimal implementation and generates FORTRAN code. First, algebraic transformations are used to reduce the number of operations. We then fuse loops to minimize the storage requirements so that the computation fits within the disk limits. We have designed an algorithm that finds the optimal evaluation order when intermediate arrays are allocated dynamically, and we are working on combining loop fusion with dynamic memory allocation. If the computation still does not fit within the disk limits, recomputation must be traded off for a reduction in storage requirements. If the target machine is a multi-processor machine, we optimize the communication cost together with finding a fusion configuration that minimizes storage. Finally, we minimize the data access times by minimizing disk-to-memory and memory-to-cache traffic and generate FORTRAN code. We have completed a first prototype of the TCE and are working on implementing the communication minimization and data access optimization algorithms. In future research, we will extend this approach to handle common subexpressions, symmetric matrices, and sparse matrices.
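The first two optimization steps can be illustrated on a toy expression. The sketch below is a hand-written Python illustration under stated assumptions, not code produced by the TCE: the expression S[i] = sum over j, k of A[i][j] * B[j][k] * C[k], the function names, and the multiplication counters are all hypothetical. `naive` evaluates the triple sum directly, `factored` applies an algebraic transformation that introduces an intermediate array T, and `fused` fuses the loops over j so that T shrinks from an array to a single scalar.

```python
def naive(A, B, C):
    """Direct evaluation: two multiplications per (i, j, k) triple."""
    n = len(C)
    S = [0] * n
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                S[i] += A[i][j] * B[j][k] * C[k]
                mults += 2
    return S, mults


def factored(A, B, C):
    """Operation minimization: form T[j] = sum_k B[j][k] * C[k] once."""
    n = len(C)
    T = [0] * n  # intermediate array with n elements
    mults = 0
    for j in range(n):
        for k in range(n):
            T[j] += B[j][k] * C[k]
            mults += 1
    S = [0] * n
    for i in range(n):
        for j in range(n):
            S[i] += A[i][j] * T[j]
            mults += 1
    return S, mults


def fused(A, B, C):
    """Loop fusion: producer and consumer share the j loop, so the
    intermediate needs only one scalar t instead of the array T."""
    n = len(C)
    S = [0] * n
    mults = 0
    for j in range(n):
        t = 0  # scalar replaces T[j]
        for k in range(n):
            t += B[j][k] * C[k]
            mults += 1
        for i in range(n):
            S[i] += A[i][j] * t
            mults += 1
    return S, mults


n = 4
# Arbitrary integer test data, so all three variants agree exactly.
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
C = [k + 1 for k in range(n)]

s_naive, m_naive = naive(A, B, C)
s_fact, m_fact = factored(A, B, C)
s_fused, m_fused = fused(A, B, C)
```

For index range n, the direct form performs 2n^3 multiplications and the factored form 2n^2; fusion keeps the lower operation count while removing the intermediate array. In realistic many-body expressions the intermediates are themselves multi-dimensional, so fusion cannot always reduce them this far, and that is where trading recomputation for storage comes in.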

The Tensor Contraction Engine (TCE) is the application of compiler optimization and source-to-source translation technology to craft a domain-specific language for many-body theories in chemistry and physics. The underlying equations of these theories are all expressed as contractions of many-dimensional arrays, or tensors. There may be many thousands of such terms in any one problem, but their regularity means that they can be translated into efficient massively parallel code. This code respects the bounds of each level of the memory hierarchy and minimizes overall runtime by trading increased computation for reduced memory consumption where that pays off. The approach has been overwhelmingly successful: NWChem now contains about 1M lines of human-generated code and over 2M lines of machine-generated code. The resulting scientific capabilities would have taken many man-decades of effort to write by hand, and new theories and models can be tested in a morning on physically relevant systems instead of on small test systems after months of effort. In combination with the OCE (operator contraction engine), which turns Feynman-like diagrams into tensor expressions, the TCE represents perhaps the first end-to-end, production-quality example of a solution to the semantic gap.

For an overview of the project, see our Proceedings of the IEEE paper.

For more information about version 1.0 of the TCE (the "prototype" TCE), please see our Getting and Using the TCE page.

Investigators

P. Sadayappan (Overall PI), Dept. of Computer Sci. and Engr., Ohio State
J. Ramanujam (PI from LSU), Dept. of Electrical and Computer Eng., Louisiana State
Gerald Baumgartner, Dept. of Computer Sci., Louisiana State
David E. Bernholdt, Oak Ridge National Lab.
Robert J. Harrison, Oak Ridge National Lab.
Russell M. Pitzer, Dept. of Chemistry, Ohio State
So Hirata, University of Florida
Marcel Nooijen, Dept. of Chemistry, University of Waterloo

Senior Personnel

Alexander Auer, Dept. of Chemistry, University of Waterloo
Daniel Cociorva, Dept. of Computer Sci. & Engr., Ohio State
Venkatesh Choppella, Indian Institute for Information Technology and Management, Kerala Technopark, India
Chi-Chung Lam, Dept. of Computer Sci. & Engr., Ohio State

Students

Ankush Agarwal
Atef Allam
Alina Bibireata
Xiaoyang Gao
Albert Hartono
Jinpyo Hong
Sandhya Krishnan
Sriram Krishnamoorthy
Qingda Lu
Hassan Salamy
Alexander Sibiriakov
Vaidyanathan Sivaraman
Tong Wang

Publications

Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam and P. Sadayappan, "Empirical Performance Model-Driven Data Layout Optimization and Library Call Selection for Tensor Contraction Expressions," in Journal of Parallel and Distributed Computing, 72(3):338-352, March 2012.

A. Hartono, Q. Lu, T. Henretty, S. Krishnamoorthy, H. Zhang, G. Baumgartner, D. E. Bernholdt, M. Nooijen, R. Pitzer, J. Ramanujam, and P. Sadayappan, "Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry," The Journal of Physical Chemistry A, Vol. 113 (45), pp. 12715-12723, 2009.

X. Gao, S. Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan, "Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations," Concurrency and Computation: Practice and Experience, 2007. [pdf]

S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan, and V. Choppella, "Efficient Synthesis of Out-of-Core Algorithms Using a Nonlinear Optimization Solver," Journal of Parallel and Distributed Computing, vol. 66, no. 5, pp. 659-673, May 2006. [pdf]

A. Auer, G. Baumgartner, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Automatic Code Generation for Many-Body Electronic Structure Methods: The Tensor Contraction Engine," Molecular Physics, vol. 104, no. 2, pp. 211-228, January 2006. [pdf]

A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen, G. Baumgartner, D. Bernholdt, R. Pitzer, J. Ramanujam, A. Rountev, and P. Sadayappan, "Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations," in Proc. International Conference on Computational Science 2006 (ICCS 2006), Reading, UK, Lecture Notes in Computer Science, Springer-Verlag, 2006. [pdf]

G. Baumgartner, A. Auer, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Synthesis of High-Performance Parallel Programs for a Class of ab initio Quantum Chemistry Models," Proceedings of the IEEE, vol. 93, no. 2, pp. 276-292, February 2005. [pdf]

X. Gao, S. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J. Ramanujam, and P. Sadayappan, "Performance Modeling and Optimization of Parallel Out-of-Core Tensor Contractions," in Proc. ACM SIGPLAN 2005 Symposium on Principles and Practice of Parallel Programming, Chicago, IL, June 2005. [pdf]

A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner, D.E. Bernholdt, S. Hirata, C. Lam, R. Pitzer, J. Ramanujam, and P. Sadayappan, "Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations," in Proc. International Conference on Computational Science 2005 (ICCS 2005), Atlanta, GA, May 2005. [pdf]

Efficient Synthesis of Out-of-core Algorithms Using a Nonlinear Optimization Solver, Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Chi-Chung Lam, J. Ramanujam, and P. Sadayappan. In Proceedings of the 18th International Parallel and Distributed Processing Symposium (2004 IPDPS Conference), Santa Fe, April 2004. (Best Paper Award) [pdf]

A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry: The Tensor Contraction Engine
G. Baumgartner, D.E. Bernholdt, V. Choppella, J. Ramanujam, P. Sadayappan. In Proceedings of the 11th Workshop on Compilers for Parallel Computers (CPC 2004), Chiemsee, Germany, 7-9 July 2004, pp. 281-290.

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella. In Proceedings of the International Conference on High-Performance Computing (HiPC '03), Hyderabad, India, 17-20 December 2003. In Lecture Notes in Computer Science, Vol. 2913, Springer-Verlag, pp. 406-417. Best paper award.

Memory-Constrained Data Locality Optimization for Tensor Contractions
A. Bibireata, S. Krishnan, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella. In L. Rauchwerger (ed.), Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC '03), College Station, Texas, 2-4 October 2003, Springer-Verlag, Lecture Notes in Computer Science, Vol. 2958, 2004, pp. 93-108.

Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints
D. Cociorva, X. Gao, S. Krishnan, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '03), Nice, France, 22-26 April 2003, Abstract p. 37b, 8 pages.

Compile-Time Optimizations for Tensor Contraction Expressions
G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam. In Proceedings of Compilers for Parallel Computers (CPC '03), Amsterdam, The Netherlands, 8-10 January, 2003.

A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, S. Hirata, C. Lam, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan. In Proceedings of Supercomputing 2002, Baltimore, Maryland, 16-22 November 2002. IEEE Computer Society Press, Abstract p. 5, 10 pages.

Memory-Constrained Communication Minimization for a Class of Array Computations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam. To appear in B. Pugh, C. Tseng (eds.), Proceedings of the 15th International Workshop on Languages and Compilers for Parallel Computing (LCPC '02), College Park, Maryland, 25-27 July 2002.

Automatic Synthesis of High-Performance Codes for Quantum Chemistry Applications
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan. To appear in Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-02), New York, New York, 22 June 2002.

G. Baumgartner, J. Ramanujam, and P. Sadayappan (editors), Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-02), held in conjunction with the 16th Annual ACM International Conference on Supercomputing (ICS'02), June 2002, New York, NY.

Space-Time Trade-Off Optimization for a Class of Electronic Structure Calculations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI '02), Berlin, Germany, 17-19 June 2002, pp. 177-186.

A Performance Optimization Framework for Compilation of Tensor Contraction Expressions into Parallel Programs
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan. 7th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS '02), In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '02), Fort Lauderdale, Florida, 15 April 2002, IEEE Computer Society, pp. 106-114.

Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization
D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison. In Proceedings of the International Conference on High-Performance Computing (HiPC '01), Hyderabad, India, 17-21 December 2001, Springer-Verlag, Lecture Notes in Computer Science, Vol. 2228, pp. 237-248.

Loop Optimizations for a Class of Memory-Constrained Computations
D. Cociorva, J. Wilkins, C. Lam, G. Baumgartner, P. Sadayappan, J. Ramanujam. In Proceedings of the 15th ACM International Conference on Supercomputing (ICS '01), Sorrento, Italy, 16-21 June 2001, pp. 103-113.

Software

The Tensor Contraction Engine, Version 1.0
The "prototype" TCE.

Memory Minimization and Space-Time Trade-Offs
A prototype implementation in ML.

Funding

NSF Information Technology Research Program, Award #0121706, Sept. 2001 - Aug. 2007.

J. Ramanujam

Last modified: Wed Mar 5 06:30:00 CST 2008