The Tensor Contraction Engine
Most software for scientific computation is written in the
low-level languages FORTRAN and C. Some of this software, however,
has enough underlying structure that it could benefit from
special-purpose software engineering tools or domain-specific
programming languages. For example, electronic structure
calculations in quantum chemistry and in physics involve large
collections of tensor contractions (generalized matrix
multiplications). Currently, chemists spend weeks or months
manipulating formulas containing dozens or hundreds of terms with
Mathematica, hand-optimizing the computation, and writing FORTRAN code
by hand. The computation can take on the order of 1 TFLOP week or
more and can require multiple TBs of storage.
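To make "generalized matrix multiplication" concrete, the following is a minimal sketch in plain Python of one tensor contraction, S(a,b) = sum over c and d of A(a,c,d) * B(c,d,b). The function name `contract`, the nested-list storage, and the index layout are assumptions chosen for illustration; they are not the TCE's input notation or its generated code.

```python
from itertools import product


def contract(A, B):
    """Return S with S[a][b] = sum over c, d of A[a][c][d] * B[c][d][b].

    A is a nested list of shape (n_a, n_c, n_d) and B one of shape
    (n_c, n_d, n_b). Both layouts are hypothetical, for illustration only.
    """
    n_a, n_c, n_d = len(A), len(A[0]), len(A[0][0])
    n_b = len(B[0][0])
    S = [[0.0] * n_b for _ in range(n_a)]
    for a, b in product(range(n_a), range(n_b)):
        total = 0.0
        # Summing over two shared indices instead of one is what
        # generalizes an ordinary matrix product.
        for c, d in product(range(n_c), range(n_d)):
            total += A[a][c][d] * B[c][d][b]
        S[a][b] = total
    return S


A = [[[1, 2], [3, 4]]]            # shape (1, 2, 2)
B = [[[1], [0]], [[2], [1]]]      # shape (2, 2, 1)
print(contract(A, B))             # [[11.0]]
```

With a single summation index this reduces to an ordinary matrix product; the formulas described above combine many such contractions over tensors with more indices.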
We are developing a domain-specific language that allows chemists to
specify the computation in a high-level Mathematica-style language.
The compiler for this language, the Tensor Contraction Engine (TCE),
searches for an optimal implementation and generates FORTRAN code.
First, algebraic transformations are used to reduce the number
of operations. We then minimize the storage requirements to fit the
computation within the disk limits by fusing loops. We have
designed an algorithm that finds the optimal evaluation order when
intermediate arrays are allocated dynamically, and we are
working on combining loop fusion with dynamic memory allocation. If
the computation does not fit within the disk limits, recomputation
must be traded off for a reduction in storage requirements. If
the target machine is a multi-processor machine, we optimize the
communication cost together with finding a fusion configuration for
minimizing storage. Finally, we minimize the data access times by
minimizing disk-to-memory and memory-to-cache traffic and
generate FORTRAN code. We have completed a first prototype of the TCE
and are working on implementing the communication minimization and
data access optimization algorithms. In future research, we will
extend this approach to handle common subexpressions, symmetric
matrices, and sparse matrices.
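The first two steps described above, operation minimization and loop fusion, can be sketched on a toy expression, S(i) = sum over j and k of A(i,j) * B(j,k) * v(k). This is an illustrative example in plain Python, not output of the TCE; the function names and the multiplication counters are assumptions added for the demonstration.

```python
def direct(A, B, v):
    """Evaluate S[i] = sum_{j,k} A[i][j] * B[j][k] * v[k] as one loop nest."""
    n = len(v)
    S = [0.0] * n
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                S[i] += A[i][j] * B[j][k] * v[k]
                mults += 2  # two multiplications per innermost iteration
    return S, mults


def factored(A, B, v):
    """Algebraic rewrite: T[j] = sum_k B[j][k] * v[k], then S[i] = sum_j A[i][j] * T[j]."""
    n = len(v)
    T = [0.0] * n  # intermediate array of n elements
    mults = 0
    for j in range(n):
        for k in range(n):
            T[j] += B[j][k] * v[k]
            mults += 1
    S = [0.0] * n
    for i in range(n):
        for j in range(n):
            S[i] += A[i][j] * T[j]
            mults += 1
    return S, mults


def fused(A, B, v):
    """Same operations as factored(), with the two j loops fused.

    Each T[j] is consumed right after it is produced, so the
    intermediate array shrinks to the scalar t.
    """
    n = len(v)
    S = [0.0] * n
    mults = 0
    for j in range(n):
        t = 0.0  # replaces the whole array T
        for k in range(n):
            t += B[j][k] * v[k]
            mults += 1
        for i in range(n):
            S[i] += A[i][j] * t
            mults += 1
    return S, mults


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
v = [1, 2, 3]
for evaluate in (direct, factored, fused):
    print(evaluate.__name__, evaluate(A, B, v))
```

For n = 3 the direct form performs 54 multiplications and the factored form 18, with identical results. Fusion leaves the operation count unchanged but removes the intermediate array, which is the kind of storage reduction the paragraph above refers to. Real inputs have many terms and higher-dimensional intermediates, so choosing among the possible factorizations and fusions becomes a search problem.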
For an overview of the project, see our
Proceedings of the IEEE paper.
For more information about version 1.0 of the TCE (the "prototype"
TCE), please see our Getting and Using the
TCE page.
Investigators
- P. Sadayappan (Overall PI),
Dept. of Computer Sci. and Engr., Ohio State
- J. Ramanujam (PI from LSU),
Dept. of Electrical and Computer Eng., Louisiana State
- Gerald Baumgartner,
Dept. of Computer Sci., Louisiana State
- David E. Bernholdt,
Oak Ridge National Lab.
- Robert J. Harrison,
Oak Ridge National Lab.
- Russell M. Pitzer,
Dept. of Chemistry, Ohio State
- So Hirata,
University of Florida
- Marcel Nooijen,
Dept. of Chemistry, University of Waterloo
Senior Personnel
- Alexander Auer,
Dept. of Chemistry, University of Waterloo
- Daniel Cociorva,
Dept. of Computer Sci. & Engr., Ohio State
- Venkatesh Choppella,
Indian Institute for Information Technology and Management,
Kerala Technopark, India
- Chi-Chung Lam,
Dept. of Computer Sci. & Engr., Ohio State
Students
- Ankush Agarwal
- Atef Allam
- Alina Bibireata
- Xiaoyang Gao
- Albert Hartono
- Jinpyo Hong
- Sandhya Krishnan
- Sriram Krishnamoorthy
- Qingda Lu
- Hassan Salamy
- Alexander Sibiriakov
- Vaidyanathan Sivaraman
- Tong Wang
Publications
- U. Bondhugula, A. Hartono, J. Ramanujam,
and P. Sadayappan,
"PLUTO: A Practical and Fully Automatic Polyhedral Program
Optimization System,"
Proc. ACM SIGPLAN 2008 Conference
on Programming Language Design and Implementation (PLDI 08),
Tucson, June 2008.
[pdf] 
[Extended version]
- U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam,
A. Rountev, and P. Sadayappan,
"Automatic Transformations for Communication-Minimized
Parallelization and Locality Optimization in the
Polyhedral Model,"
in Proc. CC 2008 - International Conference on
Compiler Construction, Budapest, Hungary, March-April 2008.
[pdf] 
[Extended version]
- M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam,
A. Rountev and P. Sadayappan, "Automatic Data Movement and Computation
Mapping for Multi-level Parallel Architectures with Explicitly Managed
Memories," in Proc. 13th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, (PPoPP 2008), Salt Lake City,
UT, February 2008.
[pdf]
[Extended version]
- X. Gao, S. Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J.
Ramanujam, and P. Sadayappan, "Efficient Search-Space Pruning for
Integrated Fusion and Tiling Transformations," Concurrency and
Computation: Practice and Experience, 2007.
[pdf]
- S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam,
A. Rountev and P. Sadayappan, "Effective Automatic Parallelization
of Stencil Computations," in Proc. ACM SIGPLAN 2007 Conference
on Programming Language Design and Implementation (PLDI 07),
San Diego, June 2007.
[pdf]
- U. Bondhugula, J. Ramanujam, and P. Sadayappan, "Automatic
Mapping of Nested Loops to FPGAs," in Proc. ACM SIGPLAN 2007
Symposium on Principles and Practice of Parallel Programming
(PPoPP 07), San Jose, CA, March 2007.
[pdf]
- U. Bondhugula, J. Ramanujam, and P. Sadayappan.
PLUTO: A Practical and Fully Automatic Polyhedral Program
Optimization System.
Technical Report OSU-CISRC-11/07-TR70, Department of Computer
Science and Engineering, Ohio State University, November 2007.
[pdf]
- U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam,
A. Rountev, and P. Sadayappan.
Affine Transformations for Communication Minimal Parallelization
and Locality Optimization of Arbitrarily Nested Loop Sequences.
Technical Report OSU-CISRC-5/07-TR43, Department of Computer
Science and Engineering, Ohio State University, May 2007.
[pdf]
- S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam,
J. Ramanujam, P. Sadayappan, and V. Choppella, "Efficient Synthesis of
Out-of-Core Algorithms Using a Nonlinear Optimization Solver,"
Journal of Parallel and Distributed Computing, vol. 66, no. 5,
pp. 659-673, May 2006.
[pdf]
- A. Auer, G. Baumgartner, D. Bernholdt,
A. Bibireata, V. Choppella, D. Cociorva,
X. Gao, R. Harrison, S. Krishnamoorthy,
S. Krishnan, C. Lam, Q. Lu, M. Nooijen,
R. Pitzer, J. Ramanujam, P. Sadayappan, and
A. Sibiryakov, "Automatic Code Generation for Many-Body
Electronic Structure Methods: The Tensor Contraction Engine,"
Molecular Physics, vol. 104, no. 2,
pp. 211-228, January 2006.
[pdf]
- A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen,
G. Baumgartner, D. Bernholdt, R. Pitzer, J. Ramanujam, A. Rountev,
and P. Sadayappan, "Identifying Cost-Effective Common
Subexpressions to Reduce Operation Count in Tensor Contraction
Evaluations," in Proc. International Conference on
Computational Science 2006 (ICCS 2006), Reading, UK, Lecture
Notes in Computer Science, Springer-Verlag, 2006.
[pdf]
- G. Baumgartner, A. Auer, D. Bernholdt, A. Bibireata,
V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata,
S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer,
J. Ramanujam, P. Sadayappan, and A. Sibiryakov,
"Synthesis of High-Performance Parallel Programs for a Class of ab
initio Quantum Chemistry Models,"
Proceedings of the IEEE, vol. 93, no. 2, pp. 276-292, February 2005.
[pdf]
- X. Gao, S. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J. Ramanujam, and
P. Sadayappan, "Performance Modeling and Optimization of
Parallel Out-of-Core Tensor Contractions," in Proc. ACM
SIGPLAN 2005 Symposium on Principles and Practice of Parallel
Programming, Chicago, IL, June 2005.
[pdf]
- A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner,
D.E. Bernholdt, S. Hirata, C. Lam, R. Pitzer, J. Ramanujam, and
P. Sadayappan, "Automated Operation Minimization of Tensor
Contraction Expressions in Electronic Structure Calculations," in
Proc. International Conference on Computational Science 2005
(ICCS 2005), Atlanta, GA, May 2005.
[pdf]
- Efficient Synthesis of Out-of-core Algorithms Using a
Nonlinear Optimization Solver, Sandhya Krishnan, Sriram
Krishnamoorthy, Gerald Baumgartner, Chi-Chung Lam, J. Ramanujam, and
P. Sadayappan. In Proceedings of the 18th International
Parallel and Distributed Processing Symposium (2004 IPDPS
Conference), Santa Fe, April 2004. (Best Paper
Award)
[pdf]
- A High-Level Approach to Synthesis of High-Performance Codes
for Quantum Chemistry: The Tensor Contraction Engine
G. Baumgartner, D.E. Bernholdt, V. Choppella, J. Ramanujam,
P. Sadayappan.
In Proceedings of the 11th Workshop
on Compilers for Parallel Computers (CPC 2004),
Chiemsee, Germany, 7-9 July 2004, pp. 281-290.
- Data Locality Optimization for Synthesis of Efficient
Out-of-Core Algorithms
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, D. Cociorva,
C. Lam, P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella.
In Proceedings of the International Conference on
High-Performance Computing (HiPC '03),
Hyderabad, India, 17-20 December 2003.
In Lecture Notes in Computer Science, Vol. 2913, Springer-Verlag,
pp. 406-417. Best paper award.
- Memory-Constrained Data Locality Optimization for
Tensor Contractions
A. Bibireata, S. Krishnan, G. Baumgartner, D. Cociorva, C. Lam,
P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella.
In L. Rauchwerger (ed.),
Proceedings of the 16th International Workshop on
Languages and Compilers for Parallel Computing (LCPC '03),
College Station, Texas, 2-4 October 2003, Springer-Verlag,
Lecture Notes in Computer Science, Vol. 2958, 2004, pp. 93-108.
- Global Communication Optimization for Tensor Contraction
Expressions under Memory Constraints
D. Cociorva, X. Gao, S. Krishnan, G. Baumgartner, C. Lam,
P. Sadayappan, J. Ramanujam.
In Proceedings of the International Parallel and
Distributed Processing Symposium (IPDPS '03),
Nice, France, 22-26 April 2003, Abstract p. 37b, 8 pages.
- Compile-Time Optimizations for Tensor Contraction Expressions
G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam.
In Proceedings of Compilers for Parallel Computers
(CPC '03), Amsterdam, The Netherlands, 8-10 January, 2003.
- A High-Level Approach to Synthesis of High-Performance Codes for
Quantum Chemistry
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, S. Hirata,
C. Lam, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan.
In Proceedings of Supercomputing 2002,
Baltimore, Maryland, 16-22 November 2002.
IEEE Computer Society Press, Abstract p. 5, 10 pages.
- Memory-Constrained Communication Minimization for a Class of
Array Computations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam.
To appear in B. Pugh, C. Tseng (eds.),
Proceedings of the 15th International Workshop
on Languages and Compilers for Parallel Computing (LCPC '02),
College Park, Maryland, 25-27 July 2002.
- Automatic Synthesis of High-Performance Codes for Quantum
Chemistry Applications
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison,
C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan.
To appear in Proceedings of the Workshop on Performance
Optimization for High-Level Languages and Libraries
(POHLL-02), New York, New York, 22 June 2002.
- G. Baumgartner, J. Ramanujam, and P. Sadayappan (editors),
Proceedings of the Workshop on Performance Optimization for High-Level
Languages and Libraries (POHLL-02), held in conjunction with the
16th Annual ACM International Conference on Supercomputing (ICS'02),
June 2002, New York, NY.
- Space-Time Trade-Off Optimization for a Class of Electronic
Structure Calculations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan,
J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison.
In Proceedings of the ACM SIGPLAN 2002 Conference on
Programming Language Design and Implementation (PLDI '02),
Berlin, Germany, 17-19 June 2002, pp. 177-186.
- A Performance Optimization Framework for Compilation of Tensor
Contraction Expressions into Parallel Programs.
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison,
C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan.
7th International Workshop on High-Level Parallel
Programming Models and Supportive Environments (HIPS '02),
In Proceedings of the International Parallel and Distributed
Processing Symposium (IPDPS '02),
Fort Lauderdale, Florida, 15 April 2002,
IEEE Computer Society, pp. 106-114.
- Towards Automatic Synthesis of High-Performance Codes for
Electronic Structure Calculations: Data Locality Optimization
D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam,
M. Nooijen, D.E. Bernholdt, R.J. Harrison.
In Proceedings of the International Conference on
High-Performance Computing (HiPC '01),
Hyderabad, India, 17-21 December 2001, Springer-Verlag, Lecture Notes
in Computer Science, Vol. 2228, pp. 237-248.
- Loop Optimizations for a Class of Memory-Constrained
Computations
D. Cociorva, J. Wilkins, C. Lam, G. Baumgartner, P. Sadayappan,
J. Ramanujam.
In Proceedings of the 15th ACM International Conference on
Supercomputing (ICS '01),
Sorrento, Italy, 16-21 June 2001, pp. 103-113.
Software
Funding
J. Ramanujam
Last modified: Wed Mar 5 06:30:00 CST 2008