M. Schulz, S. Midkiff, J.
Ramanujam, and P. Sadayappan
(editors), Proceedings of the Joint Workshop on High-Level
Parallel Programming Models and Supportive Environments and
Performance Optimization for High-Level Languages
and Libraries (HIPS-POHLL 2008), held in conjunction with
the 22nd IEEE International Parallel & Distributed
Processing Symposium (IPDPS 2008), Miami, FL, April 2008.
G. Baumgartner, J. Ramanujam, A. Rountev, and P.
Sadayappan
(editors), Proceedings of the Workshop on Performance Optimization for
High-Level Languages and Libraries (POHLL-07), held in conjunction
with the 21st IEEE International Parallel & Distributed
Processing Symposium (IPDPS 2007), Long Beach, CA, March 2007.
G. Baumgartner, J. Ramanujam, and P. Sadayappan
(editors),
Proceedings of the Workshop on Performance Optimization for
High-Level Languages and Libraries (POHLL-06), held in conjunction
with the 20th IEEE International Parallel & Distributed
Processing Symposium (IPDPS 2006), Rhodes, Greece, April 2006.
D. Marculescu and J. Ramanujam (editors), Proceedings
of
the Workshop on Compilers and Operating Systems for Low Power
(COLP'03), held in conjunction with the International
Conference on Parallel Architectures and Compilation Techniques (PACT 2003),
September 2003, New Orleans, LA, USA.
D. Marculescu and J. Ramanujam (editors), Proceedings
of
the Workshop on Compilers and Operating Systems for Low Power
(COLP'02) held in conjunction with the International
Conference on Parallel Architectures and Compilation Techniques (PACT 2002),
September 2002, Charlottesville, VA, USA.
G. Baumgartner, J. Ramanujam, and P. Sadayappan
(editors),
Proceedings of the Workshop on Performance Optimization for High-Level
Languages and Libraries (POHLL-02), held in conjunction with the
16th Annual ACM International Conference on Supercomputing (ICS'02),
June 2002, New York, NY.
L. Benini, M. Kandemir, and J. Ramanujam (editors),
Proceedings of the Workshop on Compilers and Operating Systems for Low
Power (COLP'01), held in conjunction with the International
Conference on Parallel Architectures and Compilation Techniques (PACT 2001),
October 2001, Barcelona, Spain.
M. Kandemir and J. Ramanujam (editors), Proceedings
of the
Workshop on Compilers and Operating Systems for Low Power
(COLP'00), in conjunction with the International Conference on
Parallel Architectures and Compilation Techniques (PACT 2000), October 15-19,
2000, Philadelphia, PA.
S. Pande, J. Ramanujam, and Y. Robert
(editors),
Proceedings of the Workshop on Challenges in Compiling for Scalable
Parallel Systems, in conjunction with the 8th IEEE Symposium
on Parallel and Distributed Processing, New Orleans, LA, October
1996.
J. Ramanujam, "Automatic Data
Distribution," in The Compiler Design Handbook:
Optimizations and Machine Code Generation,
(Y. N. Srikant and P. Shankar, Eds.), Chapter
12, pp. 409-459, CRC Press, Boca Raton, FL, 2002.
J. Ramanujam, "Integer lattice
based
methods for local address generation for block-cyclic
distributions," in Compiler Optimizations for Scalable
Parallel Systems - Languages, Compilation Techniques, and Run
Time Systems, S. Pande and D. P. Agrawal
(Eds.), Lecture Notes in Computer Science, vol. 1808,
pp. 597-645, Springer-Verlag, 2001.
[pdf]
A. Thirumalai,
J. Ramanujam and
A. Venkatachar, "Communication generation and
optimization for HPF," in Languages, Compilers, and
Run-Time Systems for Scalable Computers,
B. Szymanski and B. Sinharoy, (Eds.), Chapter 29,
pp. 311-316, Kluwer Academic Publishers, Norwell, MA,
1995.
J. Ramanujam and
P. Sadayappan,
"Iteration space tiling for distributed memory machines," in
Languages, Compilers and Environments for Distributed
Memory Machines, J. Saltz and P. Mehrotra, (Eds.),
North-Holland, Amsterdam, The Netherlands, pp. 255-270, 1992.
H. Salamy and J. Ramanujam,
"Storage Optimization through Offset Assignment with
Variable Coalescing,"
ACM Transactions on Embedded Computing Systems (TECS),
2010.
A. Hartono, Q. Lu, T. Henretty, S. Krishnamoorthy, H. Zhang,
G. Baumgartner, D. E. Bernholdt, M. Nooijen, R. Pitzer,
J. Ramanujam, and P. Sadayappan,
"Performance Optimization of Tensor Contraction Expressions
for Many-Body Methods in Quantum Chemistry,"
The Journal of Physical Chemistry A,
Vol. 113 (45), pp. 12715-12723, 2009.
X. Gao, S.
Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J.
Ramanujam, and P. Sadayappan, "Efficient Search-Space Pruning for
Integrated Fusion and Tiling Transformations," Concurrency and
Computation: Practice and Experience, 2007.
[pdf]
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam,
J. Ramanujam, P. Sadayappan, and V. Choppella, "Efficient Synthesis of
Out-of-Core Algorithms Using a Nonlinear Optimization Solver,"
Journal of Parallel and Distributed Computing, vol. 66, no. 5,
pp. 659-673, May 2006.
[pdf]
G. Chen, M. Kandemir, M. J. Irwin, and J. Ramanujam,
"Reducing
code size through address register assignment,"
ACM Transactions on Embedded Computing Systems (TECS), vol. 5, no. 1,
pp. 225-258, February 2006.
[pdf]
M. Kandemir, J. Ramanujam, and U. Sezer, "Improving the
Energy
Behavior of Block Buffering Using Compiler Optimizations," ACM
Transactions on Design Automation of Electronic Systems, vol. 11,
no. 1, pp. 228-250, January 2006.
[pdf]
J. Ramanujam, J. Hong, M. Kandemir, A. Narayan, and A.
Agarwal,
"Estimating and Reducing the Memory Requirements of Signal
Processing Codes for Embedded Processor Systems," IEEE
Transactions on Signal Processing, vol. 54, no. 1, pp. 286-294,
January 2006.
[pdf]
A. Auer, G. Baumgartner,
D. Bernholdt, A. Bibireata, V. Choppella,
D. Cociorva, X. Gao, R. Harrison,
S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu,
M. Nooijen, R. Pitzer, J. Ramanujam,
P. Sadayappan, and A. Sibiryakov, "Automatic Code Generation
for Many-Body Electronic Structure Methods: The Tensor Contraction
Engine," Molecular Physics, vol. 104, no. 2, pp. 211-228,
January 2006.
[pdf]
G. Baumgartner, A. Auer, D. Bernholdt,
A. Bibireata, V. Choppella, D. Cociorva, X. Gao,
R. Harrison, S. Hirata, S. Krishnamoorthy,
S. Krishnan, C. Lam, Q. Lu, M. Nooijen,
R. Pitzer, J. Ramanujam, P. Sadayappan, and
A. Sibiryakov, "Synthesis of High-Performance Parallel Programs
for a Class of ab initio Quantum Chemistry Models," Proceedings
of the IEEE, vol. 93, no. 2, pp. 276-292, February 2005.
[pdf]
M. Kandemir, I. Kadayif, A. Choudhary,
J. Ramanujam, and I. Kolcu, "Compiler-directed scratch pad memory
optimization for embedded multiprocessors," IEEE Transactions on
VLSI (TVLSI), vol. 12, no. 3, pp. 281-287, March
2004.
[pdf]
M. Kandemir, J. Ramanujam,
M. Irwin, V. Narayanan, I. Kadayif, and A. Parikh,
"A Compiler Based Approach for Dynamically Managing Scratch-Pad
Memories in Embedded Systems," IEEE Transactions on
Computer-Aided Design, vol. 23, no. 2,
pp. 243-260, February 2004.
[pdf]
M. Kandemir, A. Choudhary, J.
Ramanujam, and
P. Banerjee, "Reducing false sharing and improving spatial locality in
a unified compilation framework," IEEE Transactions on Parallel
and Distributed Systems, vol. 14, no. 4,
pp. 337-354, April 2003.
[pdf]
M. Kandemir, A. Choudhary, and J.
Ramanujam,
"An I/O conscious tiling strategy for disk-resident data sets,"
The Journal of Supercomputing, vol. 21, no. 3,
pp. 257-284, 2002.
[pdf]
M. Kandemir, J. Ramanujam, A.
Choudhary, and
P. Banerjee, "A layout-conscious iteration space transformation
technique," IEEE Transactions on Computers, vol. 50,
no. 12, pp. 1321-1336, December 2001.
M. Narasimhan and
J. Ramanujam, "A
fast approach to computing exact solutions to the resource-constrained
scheduling problem," ACM Transactions on Design Automation of
Electronic Systems, vol. 6, no. 4, pp. 490-500,
December 2001.
M. Kandemir, P. Banerjee, A.
Choudhary,
J. Ramanujam, and E. Ayguade, "Static and dynamic locality
optimizations using integer linear programming," IEEE
Transactions on Parallel and Distributed Systems, vol. 12,
no. 9, pp. 922-941, September 2001.
M. Kandemir and
J. Ramanujam, "Data
relation vectors: A new abstraction for data optimizations," IEEE
Transactions on Computers, vol. 50, no. 8,
pp. 798-810, August 2001.
V. Jain, S. Rele,
S. Pande,
and J. Ramanujam, "Compact and efficient code generation through
program restructuring on limited memory embedded DSPs," IEEE
Transactions on Computer-Aided Design, vol. 20, no. 4,
pp. 477-494, April 2001.
M. Kandemir, A. Choudhary,
P. Banerjee, J. Ramanujam, and N. Shenoy, "Minimizing
data and synchronization costs in one-way communication," IEEE
Transactions on Parallel and Distributed Systems, vol. 11,
no. 12, pp. 1232-1251, December 2000.
M. Kandemir, J. Ramanujam,
and
A. Choudhary, "Compiler algorithms for optimizing locality and
parallelism on shared and distributed memory machines," Journal
of Parallel and Distributed Computing, vol. 60, no. 8,
pp. 924-965, August 2000.
M. Kandemir, A. Choudhary,
J. Ramanujam, and M. Kandaswamy, "A unified framework for
optimizing locality, parallelism, and communication in out-of-core
computations," IEEE Transactions on Parallel and Distributed
Systems, vol. 11, no. 7, pp. 648-668, July 2000.
M. Kandemir, P. Banerjee,
A. Choudhary, J. Ramanujam, and N. Shenoy, "A global
communication optimization technique based on data flow analysis and
linear algebra," ACM Transactions on Programming Languages and
Systems (TOPLAS), vol. 21, no. 6, pp. 1251-1297,
November 1999.
M. Kandemir, A. Choudhary,
J. Ramanujam, and P. Banerjee, "A
matrix-based approach to global locality optimization,"
Journal of Parallel and Distributed Computing, vol. 58,
no. 2, pp. 190-235, September 1999.
M. Kandemir, J. Ramanujam,
and A. Choudhary, "Improving cache
locality by a combination of loop and data transformations,"
IEEE Transactions on Computers, vol. 48, no. 2,
pp. 159-167, February 1999.
M. Kandemir, A. Choudhary,
N. Shenoy, P. Banerjee, and
J. Ramanujam, "A linear algebra framework for automatic
determination of optimal data layouts," IEEE
Transactions on Parallel and Distributed Systems,
vol. 10,
no. 2, pp. 115-135, February 1999.
P. Sadayappan, F. Ercal and
J. Ramanujam, "Partitioning graphs
on message-passing machines by pairwise mincut,"
Information Sciences, vol. 111, no. 1-4,
pp. 223-237,
October 1998.
M. Kandemir, A. Choudhary,
J. Ramanujam and R. Bordawekar,
"Compilation techniques for out-of-core parallel
computations," Parallel Computing, vol. 24, no. 3-4,
pp. 597-628, June 1998.
M. Kandemir, A. Choudhary,
J. Ramanujam and M. Kandaswamy,
"Locality optimization algorithms for compilation of
out-of-core codes," Journal of Information Science and
Engineering, vol. 14, no. 1, pp. 107-138, March
1998.
A. Venkatachar,
J. Ramanujam, and A. Thirumalai,
"Communication generation for block-cyclic distributions,"
Parallel Processing Letters,
vol. 7, no. 2, pp. 195-202, 1997.
A. Goel and J. Ramanujam, "A
neural architecture for a class
of abduction problems," IEEE Transactions on Systems, Man,
and Cybernetics, vol. 26, no. 6, pp. 854-860,
December
1996.
A. Thirumalai and
J. Ramanujam,
"Efficient computation of address sequences in data parallel
programs using closed forms for basis vectors,"
Journal of Parallel and Distributed Computing,
vol. 38, no. 2, pp. 188-203, November 1996.
R. Bordawekar, A. Choudhary,
and J. Ramanujam,
"Compilation and communication strategies for out-of-core
programs on distributed memory machines,"
Journal of Parallel and Distributed Computing,
vol. 38, no. 2, pp. 277-288, November 1996.
R. Thakur, A. Choudhary and
J. Ramanujam, "Efficient
algorithms for array redistribution," IEEE Transactions
on Parallel and Distributed Systems, vol. 7, no. 6,
pp. 587-594, June 1996.
J. Ramanujam, "Beyond unimodular
transformations," The
Journal of Supercomputing, vol. 9, no. 4,
pp. 365-389,
1995.
J. Ramanujam and
P. Sadayappan, "Mapping combinatorial
optimization problems onto neural networks," Information
Sciences, vol. 82, no. 3-4, pp. 239-255, January
1995.
J. Ramanujam and
P. Sadayappan, "Tiling multidimensional
iteration spaces for multicomputers," Journal of
Parallel and Distributed Computing, vol. 16, no. 2,
pp. 108-120, October 1992.
J. Ramanujam and
P. Sadayappan, "Compile-time techniques for
data distribution in distributed memory machines," IEEE
Transactions on Parallel and Distributed Systems, vol. 2,
no. 4, pp. 472-482, October 1991.
F. Ercal, J. Ramanujam and
P. Sadayappan, "Task allocation by
recursive mincut bipartitioning onto a hypercube,"
Journal of Parallel and Distributed Computing, vol. 10,
no. 1, pp. 35-44, September 1990.
F. Ercal, P. Sadayappan and
J. Ramanujam, "Cluster
partitioning approaches to mapping parallel programs onto a
hypercube," Parallel Computing, vol. 13, no. 1,
pp. 1-16, March 1990.
L.-N. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen,
J. Ramanujam and P. Sadayappan,
"Combined Iterative and Model-driven Optimization
in an Automatic Parallelization Framework,"
in Proc. ACM/IEEE Conference on High Performance Computing SC10,
New Orleans, LA, November 2010.
S. Tavarageri, A. Hartono, M. Baskaran, L.-N. Pouchet,
J. Ramanujam, and P. Sadayappan,
"Parametric Tiling of Affine Loop Nests,"
in Proc. 15th Workshop on Compilers for Parallel Computers (CPC 2010),
Vienna, Austria, July 2010.
M. Baskaran, A. Hartono, S. Tavarageri, T. Henretty,
J. Ramanujam, and P. Sadayappan, "Parameterized Tiling Revisited,"
International Symposium on Code Generation and Optimization (CGO), Toronto,
Canada, April 2010.
A. Hartono, M. Baskaran, J. Ramanujam, and P.
Sadayappan, "DynTile: Parametric Tiled Loop Generation for Parallel Execution on
Multicore Processors," 24th International Parallel and Distributed
Processing Symposium (2010 IPDPS Conference), Atlanta, April 2010.
M. Baskaran, J. Ramanujam, and P. Sadayappan,
"Automatic C-to-CUDA Code Generation for Affine Programs,"
International
Conference on Compiler Construction (CC), Paphos, Cyprus, March 2010.
Q. Lu, C. Alias, U. Bondhugula, S. Krishnamoorthy,
J. Ramanujam, A. Rountev, P. Sadayappan,
Y. Chen, H. Lin and T. Ngai,
"Data Layout Transformation for Enhancing Locality
on NUCA Chip Multiprocessors,"
in Proc. 18th International Conference on
Parallel Architectures and Compilation Techniques (PACT 09),
Raleigh, NC, September 2009.
Z. Yun, Z. Lei, G. Allen, D. S. Katz, T. Kosar, S. Jha, and J. Ramanujam,
"An innovative application execution toolkit for
multicluster grids,"
in Proc. CLUSTER 2009,
New Orleans, LA, September 2009, pp. 1-4.
A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy,
B. Norris, J. Ramanujam, and P. Sadayappan,
"Parametric Multi-Level Tiling of Imperfectly Nested Loops,"
in Proc. 23rd ACM International Conference on
Supercomputing,
Yorktown Heights, New York, June 2009.
M. Baskaran, N. Vydhyanathan, U. Bondhugula, J. Ramanujam,
A. Rountev, and P. Sadayappan,
"Compiler-Assisted Dynamic Scheduling for Effective
Parallelization of Loop Nests on Multicore Processors,"
in Proc. 14th ACM SIGPLAN Symposium on Principles
and Practice of Parallel Programming (PPoPP 2009),
Raleigh, NC, February 2009.
R. Sankaran, B. Ullmer, K. Kallakuri, S. Jandhyala, C. Toole,
J. Ramanujam, and C. Laan, "Decoupling Interaction Hardware
Design Using Libraries of Reusable Electronics,"
in Proc. 3rd International Conference on Tangible and
Embedded Interaction (TEI'09),
Cambridge, UK, February 2009.
Hassan Salamy and J. Ramanujam,
"A Framework for Task Scheduling and Memory Partitioning for
Multi-Processor System-on-Chip,"
in Proc. 4th International Conference on High Performance
and Embedded Architectures and Compilers (HiPEAC 2009),
Paphos, Cyprus, January 2009.
U. Bondhugula, M. Baskaran, A. Hartono,
S. Krishnamoorthy, J. Ramanujam,
A. Rountev, and P. Sadayappan,
"A Polyhedral Framework for Automatic Parallelization
and Locality Optimization,"
in Proc. 14th Workshop
on Compilers for Parallel Computers (CPC 2009),
Zurich, Switzerland, January 2009.
Hassan Salamy and J. Ramanujam,
"Optimal Address Register Allocation for Arrays in DSP
Applications,"
in Proc. 6th IEEE Workshop on Embedded Systems for
Real-Time Multimedia (ESTIMedia 2008),
Atlanta, GA, October 2008.
Hassan Salamy and J. Ramanujam,
"Storage Optimization through Code Size Reduction for
Digital Signal Processors,"
in Proc. 6th IEEE Workshop on Embedded Systems for
Real-Time Multimedia (ESTIMedia 2008),
Atlanta, GA, October 2008.
Jinpyo Hong and J. Ramanujam,
"Address Register Allocation in Digital Signal Processors,"
in Proc. 2008 International Conference on Embedded Systems
and Software (ICESS-08),
Chengdu, China, July 2008.
Jinpyo Hong and J. Ramanujam,
"Scheduling DAGs for Fixed-point DSP Processors
by Using Worm Partitions,"
in Proc. 2008 International Conference on Embedded Systems
and Software (ICESS-08),
Chengdu, China, July 2008.
U. Bondhugula, A. Hartono, J. Ramanujam,
and P. Sadayappan,
"A Practical and Automatic Polyhedral Program Optimization System," Proc. ACM SIGPLAN 2008 Conference
on Programming Language Design and Implementation (PLDI 08),
Tucson, June 2008.
[pdf]
[Extended version]
M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam,
A. Rountev, and P. Sadayappan,
"A Compiler Framework for Optimization of Affine Loop
Nests for General Purpose Computations on GPUs,"
in Proc. 22nd ACM International Conference on
Supercomputing,
Island of Kos, Greece, June 2008.
[pdf]
[Extended version]
U. Bondhugula, M. Baskaran, A. Hartono, S.
Krishnamoorthy,
J. Ramanujam, A. Rountev, and P. Sadayappan,
"Towards Effective Automatic Parallelization for Multicore
Systems,"
in Proc. Workshop on Next Generation Software (NGS 2008),
held in conjunction with the
22nd IEEE International Parallel and Distributed Processing
Symposium
(IPDPS 2008), Miami, FL, April 2008.
[pdf]
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam,
A. Rountev, and P. Sadayappan,
"Automatic Transformations for Communication-Minimized
Parallelization and Locality Optimization in the
Polyhedral Model,"
in Proc. CC 2008 - International Conference on
Compiler Construction,
Budapest, Hungary, March-April 2008.
[pdf]
[Extended version]
M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J.
Ramanujam,
A. Rountev and P. Sadayappan, "Automatic Data Movement and Computation
Mapping for Multi-level Parallel Architectures with Explicitly Managed
Memories," in Proc. 13th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, (PPoPP 2008), Salt Lake City,
UT, February 2008.
[pdf]
[Extended version]
S. Pinnepalli, Jinpyo Hong, J. Ramanujam, and
Doris Carver, "Code Size Optimization for Embedded Processors
using Commutative Transformations," in Proc. 13th IEEE
International Conference on Embedded and Real-Time Computing Systems
and Applications (RTCSA-07), Daegu, Korea, August 2007.
[pdf]
S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J.
Ramanujam,
A. Rountev and P. Sadayappan, "Effective Automatic Parallelization
of Stencil Computations," in Proc. ACM SIGPLAN 2007 Conference
on Programming Language Design and Implementation (PLDI 07),
San Diego, June 2007.
[pdf]
Jinpyo Hong and J. Ramanujam, "Memory Offset
Assignment for DSPs," in Proc. 2007 International Conference
on Embedded Systems and Software (ICESS-07), Daegu, Korea, May
2007.
[pdf]
U. Bondhugula, J. Ramanujam, and P. Sadayappan,
"Automatic
Mapping of Nested Loops to FPGAs," in Proc. ACM SIGPLAN 2007
Symposium on Principles and Practice of Parallel Programming
(PPoPP 07), San Jose, CA, March 2007.
[pdf]
Hassan Salamy and J. Ramanujam, "An Effective Heuristic
for
Simple Offset Assignment with Variable Coalescing," Languages
and Compilers for Parallel Computing, (C. Cascaval et al. Eds.),
Lecture Notes in Computer Science, Springer-Verlag, 2007.
[pdf]
Atef Allam and J. Ramanujam, "Dynamic Memory Usage
Optimization
using ILP," Proc. 2nd International Computer Engineering
Conference: Engineering the Information Society (ICENCO 2006),
Cairo, Egypt, December 2006.
[pdf]
Atef Allam and J. Ramanujam, "ILP and Iterative LP
Solutions
for Peak and Average Power Optimization in HLS," Proc. 2nd
International Computer Engineering Conference: Engineering the
Information Society (ICENCO 2006), Cairo, Egypt, December 2006.
[pdf]
Atef Allam and J. Ramanujam, "Modified Force-Directed
Scheduling for Peak and Average Power Optimization using Multiple
Supply-Voltages," in Proc. International Conference on Integrated
Circuit Design and Technology (ICICDT), Padova, Italy, May
2006.
[pdf]
Atef Allam and J. Ramanujam, "Simultaneous Peak and
Average Power Optimization in Synchronous Sequential Designs Using
Retiming and Multiple Supply Voltages," in Proc. International
Conference on Integrated Circuit Design and Technology (ICICDT),
Padova, Italy, May 2006.
[pdf]
A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M.
Nooijen,
G. Baumgartner, D. Bernholdt, R. Pitzer, J. Ramanujam, A. Rountev, and
P. Sadayappan, "Identifying Cost-Effective Common Subexpressions
to Reduce Operation Count in Tensor Contraction Evaluations," in
Proc. International Conference on Computational Science 2006 (ICCS
2006), Reading, UK, Lecture Notes in Computer Science,
Springer-Verlag, 2006.
[pdf]
A. Allam, J. Ramanujam, G. Baumgartner, and P.
Sadayappan,
"Memory Minimization for Tensor Contractions using Integer Linear
Programming," Proc. Workshop on Performance Optimization for
High-Level Languages and Libraries (POHLL-06), held in conjunction
with the 20th IEEE International Parallel & Distributed
Processing Symposium (IPDPS 2006), Rhodes, Greece, April
2006.
[pdf]
X. Gao, S. Krishnamoorthy, Q. Lu, V. Choppella, G.
Baumgartner, J. Ramanujam, and P. Sadayappan, "Search-Based
Performance-Model Driven Optimization for Compilation of Tensor
Contraction Expressions," in Proc. 12th Workshop on Compilers for
Parallel Computers (CPC 2006), A Coruna, Spain, January 2006.
X. Gao, S.
Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J.
Ramanujam, and P. Sadayappan, "Efficient Search-Space Pruning for
Integrated Fusion and Tiling Transformations," in Languages and
Compilers for Parallel Computing, (E. Ayguade et al. Eds.),
Lecture Notes in Computer Science, Springer-Verlag, 2006.
[pdf]
X. Gao, S. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J.
Ramanujam,
and P. Sadayappan, "Performance Modeling and Optimization of
Parallel Out-of-Core Tensor Contractions," in Proc. ACM
SIGPLAN 2005 Symposium on Principles and Practice of Parallel
Programming, pp. 266-276, Chicago, IL, June 2005.
[pdf]
A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner,
D.E. Bernholdt, S. Hirata, C. Lam, R. Pitzer, J. Ramanujam, and
P. Sadayappan, "Automated Operation Minimization of Tensor
Contraction Expressions in Electronic Structure Calculations," in
Proc. International Conference on Computational Science 2005 (ICCS
2005), Atlanta, GA, May 2005.
[pdf]
Q. Lu, X. Gao, S. Krishnamoorthy,
G. Baumgartner, J. Ramanujam, and P. Sadayappan, "Empirical
Performance-Model Driven Data Layout Optimization," Languages and
Compilers for Parallel Computing, (R. Eigenmann et
al. Eds.), Lecture Notes in Computer Science, Springer-Verlag,
2005.
[pdf]
D. Cociorva, G. Baumgartner,
C. Lam, P. Sadayappan, and J. Ramanujam, "Memory-Constrained
Communication Minimization for a Class of Array Computations," in
Languages and Compilers for Parallel Computing, (W. Pugh
et al. Eds.), Springer-Verlag, 2005.
[pdf]
G. Baumgartner,
D. Bernholdt,
V. Choppella, J. Ramanujam, and P. Sadayappan, "A
High-Level Approach to Synthesis of High-Performance Codes for Quantum
Chemistry: The Tensor Contraction Engine," in Proc. 11th Workshop
on Compilers for Parallel Computers (CPC 2004), Chiemsee,
Germany, July 2004.
S. Krishnan,
S. Krishnamoorthy, G. Baumgartner, C. Lam,
J. Ramanujam, and P. Sadayappan,
"Efficient Synthesis of Out-of-core Algorithms Using a
Nonlinear Optimization Solver," in Proc. 18th
International Parallel and Distributed Processing Symposium
(IPDPS 2004),
Santa Fe, New Mexico, April 2004.
(Best Paper Award)
[pdf]
A. Bibireata, S. Krishnan,
G. Baumgartner,
D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam,
D. Bernholdt, and V. Choppella,
"Memory-Constrained Data Locality Optimization
for Tensor Contractions," in
Languages and Compilers for Parallel Computing,
(L. Rauchwerger et al. Eds.),
Lecture Notes in Computer Science,
Vol. 2958, pp. 93-108, Springer-Verlag, 2004.
[pdf]
S. Krishnan,
S. Krishnamoorthy, G. Baumgartner,
D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam,
D. Bernholdt, and V. Choppella,
"Data Locality Optimization for Synthesis of Efficient
Out-of-Core Algorithms,"
in Proc. of the Intl. Conf. on High
Performance Computing (HiPC 03), 2003.
(Best Paper Award)
[pdf]
D. Cociorva, X. Gao,
S. Krishnan, Gerald
Baumgartner, C. Lam, P. Sadayappan, and J. Ramanujam,
"Global Communication Optimization for Tensor Contraction
Expressions under Memory Constraints," in Proc. 17th
International Parallel and Distributed Processing Symposium
(IPDPS 2003),
Nice, France, April 2003.
[pdf]
M. Kandemir, M. J. Irwin, G. Chen, and J.
Ramanujam,
"Address Register Assignment for Reducing Code Size,"
in Proc. 12th International Conference on Compiler
Construction (CC 2003), Warsaw, Poland,
Lecture Notes in Computer Science, Vol. 2622, pp. 273-289,
Springer-Verlag, 2003.
[pdf]
D. Cociorva, G. Baumgartner,
C. Lam,
P. Sadayappan, and J. Ramanujam, "Compile-Time Optimizations for
Tensor Contraction Expressions," in Proc. 10th Workshop
on Compilers for Parallel Computers (CPC 2003),
Leiden, The Netherlands, January 2003.
D. Cociorva, G. Baumgartner, C. Lam, P.
Sadayappan,
J. Ramanujam, M. Nooijen, D. Bernholdt, R. Harrison and
R. Pitzer, "A High-Level Approach to Synthesis of
High-Performance Codes for Quantum Chemistry," in Proceedings of Supercomputing 2002 (SC2002), November 2002.
[pdf]
D. Cociorva, G. Baumgartner, C. Lam,
P. Sadayappan,
J. Ramanujam, M. Nooijen, D. Bernholdt, and R. Harrison,
"Space-time trade-off optimization for a class of electronic
structure calculations," in Proc. ACM SIGPLAN 2002
Conference on Programming Language Design and Implementation
(PLDI), pp. 177-186, Berlin, Germany, June 2002.
[pdf]
D. Cociorva, G. Baumgartner,
C. Lam,
J. Ramanujam, and P. Sadayappan, "Compiler Support for
Optimizing Tensor Contraction Expressions in Quantum Chemistry
Computations," in Proc. Workshop on Performance
Optimization for High-Level Languages and Libraries
(POHLL-02), New York, NY, June 2002.
[pdf]
M. Kandemir, J. Ramanujam, and A. Choudhary, "Exploiting
shared scratch pad memory space in embedded multiprocessor
systems," in Proc. 39th Design Automation Conference,
pp. 219-224, New Orleans, LA, June 2002.
[pdf]
G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison,
C. Lam,
M. Nooijen, J. Ramanujam, and P. Sadayappan, "A performance
optimization framework for compilation of tensor contraction
expressions into parallel programs," in Proc. 7th
International Workshop on High-Level Parallel Programming
Models and Supportive Environments (HIPS 02), (part of IPDPS
2002) Ft. Lauderdale, FL, IEEE Computer Society Press, April
2002.
[pdf]
J. Ramanujam, S. Deshpande, J. Hong,
and M. Kandemir, "A
heuristic for clock selection in high-level synthesis," in
Proc. ASP-DAC/VLSI Design 2002, pp. 414-419,
Bangalore, India, January 2002.
[pdf]
J. Ramanujam, S. Krishnamoorthy, J.
Hong, and M. Kandemir,
"Address code and arithmetic optimizations for embedded
systems," in Proc. ASP-DAC/VLSI Design 2002,
pp. 619-624, Bangalore, India, January 2002.
[pdf]
N. Crosbie, M. Kandemir, I. Kolcu, J.
Ramanujam, and A. Choudhary,
"Strategies for improving data locality in embedded
applications," in Proc. ASP-DAC/VLSI Design 2002,
pp. 631-636, Bangalore, India, January 2002.
[pdf]
D. Cociorva, J. Wilkins, G.
Baumgartner, P. Sadayappan,
J. Ramanujam, M. Nooijen, D. Bernholdt, and
R. Harrison, "Towards Automatic Synthesis of High-Performance
Codes for Electronic Structure Calculations: Data Locality
Optimization," in Proc. of the Intl. Conf. on High
Performance Computing, Lecture Notes in Computer Science,
Vol. 2228, pp. 237-248, Springer-Verlag, 2001.
[pdf]
Sunil Atri, J. Ramanujam, and
M. Kandemir, "Improving variable
placement for embedded processors," in Languages and
Compilers for Parallel Computing, (S. Midkiff et
al.
Eds.), Lecture Notes in Computer Science, vol. 2017,
pp. 158-172, Springer-Verlag, 2001.
[pdf]
M. Kandemir,
J. Ramanujam, and U. Sezer, "Compiler support for
block buffering," in Proc. ACM/IEEE International
Symposium on Low Power Electronics and Design (ISLPED'01),
pp. 76-79, Huntington Beach, CA, August 2001.
I. Kadayif, M. Kandemir, N.
Vijaykrishnan, M. J. Irwin, and
J. Ramanujam, "Morphable cache architectures: potential
benefits," in Proc. ACM SIGPLAN 2001 Workshop on
Languages, Compilers, and Tools for Embedded Systems
(LCTES'2001), Snowbird, UT, June 2001. Also appears in
ACM SIGPLAN Notices,
vol. 36, no. 8, pp. 128-137, August 2001.
J. Ramanujam, J. Hong,
M. Kandemir, and A. Narayan, "Reducing
memory requirements of nested loops for embedded systems," in
Proc. 38th Design Automation Conference,
pp. 359-364, Las Vegas, NV, June 2001.
M. Kandemir, J. Ramanujam,
M. Irwin, V. Narayanan, I. Kadayif,
and A. Parikh, "Dynamic management of scratch-pad memory
space," in Proc. 38th Design Automation Conference,
pp. 690-695, Las Vegas, NV, June 2001.
D. Cociorva, J. Wilkins,
C.-C. Lam, G. Baumgartner,
P. Sadayappan, and J. Ramanujam, "Loop optimization for
a
class of memory-constrained computations," in Proc. 15th
ACM International Conference on Supercomputing (ICS'01),
pp. 103-113, Sorrento, Italy, June 2001.
J. Ramanujam, J. Hong,
M. Kandemir, and S. Atri, "Address
register-oriented optimizations for embedded processors," in
Proc. 9th Workshop on Compilers for Parallel Computers
(CPC 2001), pp. 281-290, Edinburgh, Scotland, June 2001.
S. Atri, J. Ramanujam, and M.
Kandemir, "Improving offset
assignment on embedded processors using transformations," in
Proc. High Performance Computing-HiPC 2000,
pp. 367-374, December 2000.
[pdf]
M. Kandemir and
J. Ramanujam, "Data relation vectors: A new
abstraction for data optimizations," in Proc. International Conference on Parallel Architectures
and
Compilation Techniques (PACT 00),
pp. 227-236, Philadelphia, PA, October 2000.
M. Narasimhan and
J. Ramanujam, "On lower bounds for
scheduling problems in high-level synthesis," in Proc. 37th Design Automation Conference, pp. 546-551, Los
Angeles, CA, June 2000.
V. Jain, S. Rele,
S. Pande, and J. Ramanujam, "Code
restructuring for improving real-time response through code
size, speed trade-offs on limited memory embedded DSPs," in
Languages and Compilers for Parallel Computing,
L. Carter and J. Ferrante (Eds.), Lecture Notes
in Computer Science, vol. 1863, pp. 459-463,
Springer-Verlag,
2000.
M. Kandemir, A. Choudhary, J.
Ramanujam, and P. Banerjee, "On
reducing false sharing while improving locality on shared
memory multiprocessors," in Proc. International
Conference
on Parallel Architectures and Compilation Techniques
(PACT 99), pp. 203-211, Newport Beach, CA,
October 1999.
M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A
framework for interprocedural locality optimization using both
loop and data layout transformations," in Proc. 1999 International Conference on Parallel Processing,
pp. 95-102, Aizu, Japan, September 1999.
M. Kandemir, A. Choudhary, and J. Ramanujam, "Compiler
optimizations for I/O intensive computations," in
Proc. 1999 International Conference on Parallel
Processing, pp. 164-171, Aizu, Japan, September 1999.
M. Kandemir, A. Choudhary, and J. Ramanujam, "I/O conscious
tiling for disk-resident data sets," in Proc. Euro-Par'99,
pp. 430-439, Toulouse, France, September 1999.
M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and
E. Ayguade, "An integer linear programming approach to
optimizing cache locality," in Proc. 13th ACM International
Conference on Supercomputing (ICS 99),
pp. 500-509, Rhodes, Greece, June 1999.
M. Kandemir, A. Choudhary, J.
Ramanujam, and P. Banerjee, "A
graph based framework to detect optimal memory layouts for
improving data locality," in Proc. International Parallel
Processing Symposium (IPPS/SPDP 1999),
pp. 738-743, San Juan, Puerto Rico, April 1999.
M. Kandemir, A. Choudhary, and J.
Ramanujam, "Restructuring
I/O-intensive computations for locality," in Proc. Workshop
on High Performance Computation on Very Large Data Sets,
part of HPCN Europe 99,
pp. 1097-1106, Amsterdam, The Netherlands, April 1999.
M. Kandemir, A. Choudhary, J.
Ramanujam, and P. Banerjee,
"Improving locality using a graph-based technique for
detecting memory layouts of arrays," in Proc. 9th SIAM Conference on Parallel Processing for
Scientific Computing, San Antonio, TX,
March 1999 (proceedings only available in CD-ROM format).
M. Kandemir, J. Ramanujam,
A. Choudhary, and P. Banerjee, "An
iteration space transformation algorithm based on explicit
data layout representation for optimizing locality,"
in Languages and Compilers for Parallel
Computing, S. Chatterjee et al., (Eds.), Lecture Notes in
Computer Science, vol. 1656, pp. 34-50, Springer-Verlag,
1999.
M. Kandemir, A. Choudhary, J.
Ramanujam, and P. Banerjee,
"Improving locality using loop and data transformations in an
integrated framework," in Proc. 31st Annual ACM/IEEE
International Symposium on Microarchitecture (MICRO-31),
pp. 285-296, Dallas, TX, December 1998.
J. Ramanujam, A. Venkatachar, and
S. Dutta, "Efficient address
sequence generation for two-level mappings in High Performance
Fortran," in Proc. 1998
International Conference on High Performance Computing,
pp. 132-139, Chennai, India, December 1998.
M. Narasimhan and
J. Ramanujam, "Improving the computational
efficiency of ILP-based problems," in Proc. International Conference on Computer Aided Design
(ICCAD 98), pp. 593-596, San Jose, CA, November 1998.
M. Kandemir, A. Choudhary, J.
Ramanujam, and P. Banerjee, "A
matrix-based approach to the global locality optimization
problem," in Proc. International Conference on Parallel
Architectures and Compilation Techniques (PACT 98),
pp. 306-315, Paris, France, October 1998.
M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee,
"Data and loop transformations for optimizing locality," in
Proc. 3rd Workshop on Interaction Between Compilers and
Computer Architectures (INTERACT-3), co-located with ASPLOS'98, San Jose, CA, October 1998.
M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy,
and
P. Banerjee, "Enhancing spatial locality via data layout
optimizations," in Proc. Euro-Par'98 Parallel Processing,
pp. 422-434, Southampton, UK, September 1998.
M. Kandemir, N. Shenoy, P. Banerjee, J. Ramanujam,
and
A. Choudhary, "Minimizing data and synchronization costs in
one-way communication," in Proc. 1998 International
Conference on Parallel Processing,
pp. 180-188, Minneapolis, MN, August 1998.
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee,
and
J. Ramanujam, "A hyperplane based approach for optimizing
spatial locality in loop nests," in Proc. 12th ACM
International Conference on Supercomputing (ICS'98),
pp. 69-76, Melbourne, Australia, July 1998.
J. Ramanujam, S. Dutta,
A. Venkatachar, and A. Thirumalai,
"Advanced compilation techniques for HPF," in Proc. 7th
International Workshop on Compilers for Parallel Computers,
P. Fritzson (Ed.), Linkoping, Sweden, pp. 57-68, June 1998.
M. Kandemir, A. Choudhary, J.
Ramanujam, and P. Banerjee,
"Optimizing spatial locality in loop nests using linear
algebra," in Proc. 7th International Workshop on
Compilers for Parallel Computers, P. Fritzson (Ed.),
Linkoping, Sweden, pp. 195-206, June 1998.
M. Kandemir, A. Choudhary, and J. Ramanujam,
"Improving locality in out-of-core computations
using data layout transformations,"
in Proc. 4th Workshop on Languages, Compilers, and
Run-Time Systems for Scalable Computers,
pp. 359-366, Pittsburgh, PA, May 1998.
M. Kandemir, P. Banerjee, A. Choudhary,
J. Ramanujam, and
N. Shenoy, "A generalized framework for global communication
optimization," in Proc. International Parallel
Processing Symposium (IPPS/SPDP 1998), pp. 69-73,
Orlando, FL, March-April 1998.
J. Ramanujam, S. Dutta, and
A. Venkatachar, "Code generation
for complex subscripts in data-parallel programs," in Languages and Compilers for Parallel Computing, Z. Li et
al., (Eds.), Lecture Notes in Computer Science, vol. 1366,
pp. 49-63, Springer-Verlag, 1998.
M. Kandemir, J. Ramanujam, and A. Choudhary, "Compiler
algorithms for optimizing locality and parallelism on shared
and distributed memory machines," in Proc. 1997
International Conference on Parallel Architectures and
Compilation Techniques (PACT 97), pp. 236-247, San
Francisco, CA, November 1997.
M. Kandemir, A. Choudhary, J. Ramanujam and M. Kandaswamy, "A
unified compiler algorithm for optimizing locality,
parallelism and communication in out-of-core computations,"
in Proc. Workshop on I/O in Parallel and
Distributed Systems (IOPADS'97),
pp. 79-92,
San Jose, CA, November 1997.
M. Kandemir, J. Ramanujam,
and A. Choudhary, "Optimizing
out-of-core computations using chain vectors," in Proc. Euro-Par'97 Parallel Processing, pp. 601-608,
Passau,
Germany, August 1997.
M. Kandemir, J. Ramanujam, and A. Choudhary, "Improving
the
performance of out-of-core computations," in Proc. 1997
International Conference on Parallel Processing,
pp. 128-136, Bloomingdale, IL, August 1997.
M. Kandemir, J. Ramanujam, and A. Choudhary, "A compiler
algorithm for optimizing locality in loop nests," in Proc. 11th ACM International Conference on Supercomputing,
pp. 269-278, Vienna, Austria, July 1997.
A. Venkatachar, J. Ramanujam, and A. Thirumalai,
"Generalized overlap regions for communication optimization in
data-parallel programs,"
in Languages and Compilers for Parallel Computing,
D. Sehr et al., (Eds.), Lecture Notes in Computer Science,
vol. 1239, pp. 404-419,
Springer-Verlag, 1997.
M. Kandemir, R. Bordawekar,
A. Choudhary, and J. Ramanujam, "A
unified tiling approach for out-of-core computations," in
Proc. 6th Workshop on Compilers for Parallel
Computers, M. Gerndt (Ed.), Aachen, Germany,
pp. 323-334,
December 1996.
R. Bordawekar, A. Choudhary,
and J. Ramanujam, "A framework
for integrated communication and I/O placement," in Proc. Euro-Par'96 Parallel Processing,
pp. 541-552, Lyon, France,
August 1996.
R. Bordawekar, A. Choudhary, and J. Ramanujam,
"Automatic
optimization of communication in compiling out-of-core stencil
codes," in Proc. 10th ACM International Conference on
Supercomputing, pp. 366-373, Philadelphia, PA, May 1996.
A. Thirumalai and J. Ramanujam,
"Fast address sequence generation for data-parallel programs
using integer lattices,"
in Languages and Compilers for Parallel Computing,
C. Huang et al., (Eds.), Lecture Notes in Computer
Science, vol. 1033, pp. 191-208, Springer-Verlag, 1996.
A. Thirumalai and J. Ramanujam, "An efficient compile-time
approach to compute address sequences in data parallel
programs," in Proc. 5th International Workshop on
Compilers for Parallel Computers,
Malaga, Spain, pp. 581-605, June 1995.
J. Ramanujam and A. Narayan,
"Automatic data mapping and
program transformations," in Proc. Workshop on Automatic
Data Layout and Performance Prediction, sponsored by the
Center for Research on Parallel Computation, Rice University,
Houston, TX, April 1995 (only informal proceedings distributed
at the workshop).
J. Ramanujam and S. Vasanthakumar, "Statement-level
independent partitioning of uniform recurrences," in Proc. 9th International Parallel Processing Symposium,
pp. 229-233,
Santa Barbara, CA, April 1995.
S. D. Kaushik, C.-H. Huang, J. Ramanujam, and P. Sadayappan,
"Multi-phase array redistribution: modeling and evaluation,"
in Proc. 9th International Parallel Processing
Symposium, pp. 441-445,
Santa Barbara, CA, April 1995.
J. Ramanujam and A. Narayan, "Integrating data distribution
and loop transformations for distributed memory machines," in
Proc. 7th SIAM Conference on Parallel Processing for
Scientific Computing, D. Bailey et al., Eds., SIAM
Press, pp. 668-673, San Francisco, CA, February 1995.
J. Ramanujam and A. Mathew,
"Analysis of event synchronization in parallel programs,"
in Languages and Compilers for Parallel Computing,
K. Pingali et al., (Eds.), Lecture Notes in Computer
Science, vol. 892, pp. 300-315, Springer-Verlag, 1995.
J. Ramanujam, "Optimal software pipelining of nested loops,"
in Proc. 8th International Parallel Processing
Symposium, pp. 335-342, Cancun, Mexico, April 1994.
J. Ramanujam, "Non-unimodular transformations of nested
loops," in Proc. Supercomputing 92,
pp. 214-223, Minneapolis, MN, November 1992.
J. Ramanujam, "A linear algebraic view of loop
transformations and their interaction," in Proc. 5th
SIAM Conference on Parallel Processing for Scientific
Computing, D. Sorensen, Ed., SIAM Press, pp. 543-548,
1992.
J. Ramanujam and P. Sadayappan, "Multidimensional iteration
space tiling for nonshared memory machines," in Proc. Supercomputing 91, pp. 111-120,
Albuquerque, NM, November 1991.
J. Ramanujam and P. Sadayappan, "Access based data
decomposition in distributed memory machines," in Proc. 6th Distributed Memory Computing Conference,
pp. 196-199, Portland, OR, April 1991.
J. Ramanujam and P. Sadayappan, "Tiling of iteration spaces
for multicomputers," in Proc. 1990 International
Conference on Parallel Processing, vol. II,
pp. 179-186, St. Charles, IL, August 1990.
J. Ramanujam and P. Sadayappan, "Nested loop tiling for
distributed memory machines," in Proc. 5th Distributed
Memory Computing Conference,
pp. 1088-1096, Charleston, SC, April 1990.
P. Sadayappan, F. Ercal and J. Ramanujam,
"Distributed generation
of pairwise combinations on a hypercube,"
in Parallel Computing 89,
D. Evans, G. Joubert and F. Peters (Eds.),
Amsterdam, The Netherlands: North-Holland, pp. 299-304, 1990.
J. Ramanujam and P. Sadayappan, "A methodology for
parallelizing programs for multicomputers and complex memory
multiprocessors," in Proc. Supercomputing
89, pp. 637-646, Reno, NV, November 1989.
F. Ercal, P. Sadayappan and J. Ramanujam, "Parallel
graph
partitioning on a hypercube," in Proc. 4th
Hypercube Concurrent Computers and Applications Conference,
vol. 1, pp. 67-70, Monterey, CA, March 1989.
J. Ramanujam, F. Ercal and P. Sadayappan, "Task
allocation by
simulated annealing," in Proc. 3rd
International Conference on Supercomputing,
vol. 3, pp. 471-480, Boston, MA, May 1988.
F. Ercal, J. Ramanujam and P. Sadayappan, "Task allocation
onto a hypercube by recursive mincut bipartitioning," in Proc. 3rd Hypercube Concurrent Computers and
Applications Conference,
pp. 210-221, Pasadena, CA, January 1988.
J. Ramanujam and P. Sadayappan, "Optimization using neural
networks," in Proceedings of the 2nd IEEE International
Conference on Neural Networks,
vol. 2, pp. 325-332, San Diego, CA, July 1988.
A. Goel, J. Ramanujam and P. Sadayappan, "Towards
a neural
architecture for abductive reasoning," in Proceedings of the
2nd IEEE International Conference on Neural Networks, vol.
1, pp. 681-688, San Diego, CA, July 1988.
J. Ramanujam and P. Sadayappan, "Parameter identification for
constrained optimization using neural networks," in Proc. Connectionist Models Summer School,
pp. 154-161, Carnegie Mellon University, Pittsburgh, PA, June
1988,
Morgan Kaufmann, San Mateo, CA.
A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy,
B. Norris, J. Ramanujam, and P. Sadayappan.
PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests.
Technical Report
OSU-CISRC-2/09-TR04.
Department of Computer Science and Engineering,
The Ohio State University, February 2009.
Q. Lu, U. Bondhugula, S. Krishnamoorthy, P. Sadayappan, J. Ramanujam,
Y. Chen, H. Lin, and T.-F. Ngai.
A Compile-Time Data Locality Optimization Framework
for NUCA Chip Multiprocessors.
Technical Report
OSU-CISRC-6/08-TR29.
Department of Computer Science and Engineering,
The Ohio State University, June 2008.
M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam,
A. Rountev, and P. Sadayappan. Automatic Data Movement and Computation
Mapping for Multi-level Parallel Architectures with Explicitly Managed
Memories.
Technical Report
OSU-CISRC-2/08-TR05.
Department of Computer Science and Engineering,
The Ohio State University, February 2008.
M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
A Compiler Framework for Optimization of Affine Loop Nests
for General Purpose Computations on GPUs.
Technical Report
OSU-CISRC-12/07-TR78.
Department of Computer Science and Engineering,
The Ohio State University, December 2007.
U. Bondhugula, J. Ramanujam, and P. Sadayappan.
PLUTO: A Practical and Fully Automatic Polyhedral Program
Optimization System.
Technical Report OSU-CISRC-11/07-TR70,
Department of Computer
Science and Engineering, Ohio State University, November 2007.
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J.
Ramanujam,
A. Rountev, and P. Sadayappan.
Affine Transformations for Communication Minimal Parallelization
and Locality Optimization of Arbitrarily Nested Loop Sequences.
Technical Report OSU-CISRC-5/07-TR43, Department of Computer
Science and Engineering, Ohio State University, May 2007.
A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner,
D. Bernholdt, S.
Hirata, C. Lam, R. Pitzer, J. Ramanujam and P. Sadayappan. Automated
Operation
Minimization of Tensor Contraction Expressions in Electronic Structure
Calculations. Technical Report OSU-CISRC-2/05-TR10, Dept. of Comp. and
Info. Sci.,
The Ohio State University, 2005.
X. Gao, S. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J.
Ramanujam,
and P. Sadayappan. Compiler Techniques for Efficient
Parallelization of Out-of-Core Tensor Contractions.
Technical Report OSU-CISRC-12/04-TR67,
Dept. of Comp. and Info. Sci., The Ohio State University, 2004.
D. Cociorva, J. Wilkins, G.
Baumgartner, P. Sadayappan,
J. Ramanujam, M. Nooijen, D. Bernholdt, and
R. Harrison. Space-Time Trade-Off Optimization for a Class of
Electronic Structure Calculations. Technical Report
OSU-CISRC-11/01-TR24, Dept. of Comp. and Info. Sci., The Ohio
State University, 2001.
M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, "A
locality optimization algorithm based on explicit representation
of data layouts," Technical Report CSE-00-008, Department of
Computer Science and Engineering, The Pennsylvania State
University, May 2000.
M. Kandemir, J. Ramanujam, and A. Choudhary,
"A compiler algorithm for optimizing locality in loop nests,"
Technical Report
CPDC-TR-9802-010,
Center for Parallel and Distributed Computing, Northwestern
University,
February 1998.
M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy and P. Banerjee,
"Enhancing spatial locality using data layout optimizations,"
Technical Report
CPDC-TR-97-07,
Center for Parallel and Distributed Computing, Northwestern
University,
December 1997.
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J.
Ramanujam,
"Experiments with data layouts,"
Technical Report
CPDC-TR-97-06,
Center for Parallel and Distributed Computing, Northwestern
University,
October 1997.
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J.
Ramanujam,
"A hyperplane based approach for optimizing spatial
locality in loop nests,"
Technical Report
CPDC-TR-97-04,
Center for Parallel and Distributed Computing, Northwestern
University,
October 1997.
M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam and N. Shenoy,
"A combined communication and synchronization optimization
algorithm for one-way communication,"
Technical Report
CPDC-TR-97-03,
Center for Parallel and Distributed Computing, Northwestern
University,
October 1997.
M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam and N. Shenoy,
"Optimizing communication using global dataflow analysis,"
Technical Report
CPDC-TR-97-02,
Center for Parallel and Distributed Computing, Northwestern
University,
October 1997.
M. Kandemir, R. Bordawekar, A. Choudhary, and J. Ramanujam,
"A unified tiling approach for out-of-core computations,"
Technical Report CACR-131,
Center for Advanced Computing Research,
California Institute of Technology,
November 1996.
R. Bordawekar, A. Choudhary, and J. Ramanujam,
"A framework for integrated communication and I/O placement,"
Technical Report CACR-118,
Center for Advanced Computing Research,
California Institute of Technology,
February 1996.
R. Bordawekar, A. Choudhary, and J. Ramanujam,
"Automatic optimization of communication in compiling
out-of-core stencil codes,"
Technical Report CACR-114,
Center for Advanced Computing Research,
California Institute of Technology,
November 1995.
R. Bordawekar, A. Choudhary, and J. Ramanujam,
"Compilation and communication strategies for out-of-core programs
on distributed-memory machines,"
Technical Report CACR-113,
Center for Advanced Computing Research,
California Institute of Technology,
November 1995.
S. Kaushik, C. Huang, J. Ramanujam and P. Sadayappan,
"Multiphase array redistribution: A communication efficient
approach
to array redistribution,"
Technical Report OSU-CISRC-9/94-TR52,
The Ohio State University, September 1994.
S. Pande,
J. Ramanujam, and Y. Robert, "Compiling for
scalable parallel systems," Editorial Note, Parallel
Processing Letters, Vol. 7, No. 4, 1997.
M. Kandemir, A. Choudhary,
J. Ramanujam and R. Bordawekar,
"Optimizing out-of-core computations in uniprocessors,"
Newsletter of the Technical Committee on Computer Architecture
(TCCA), IEEE Computer Society, Special Issue on Interaction
Between Compilers and Computer Architecture, pp. 25-27,
June
1997.
P. Sadayappan, Atanas Rountev, Robert Harrison, J. Ramanujam,
Gerald Baumgartner, and Jarek Nieplocha,
"Runtime Support for Multi-scale Applications on High-end Systems,"
talk,
NSF Next Generation Software (NGS) 2007 Workshop,
held in conjunction with
IEEE International Parallel and Distributed Processing Symposium,
Long Beach, California, USA, March 25-26, 2007.
P. Sadayappan, A. Auer, G. Baumgartner, D. Bernholdt, R. Harrison, S.
Hirata,
C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, A. Bibireata, X. Gao,
S. Krishnamoorthy, S. Krishnan, Q. Lu, and A. Sibiryakov,
"Performance optimization issues in automatic synthesis of
high-performance codes for correlated electronic structure methods,"
talk, 228th ACS National Meeting, Philadelphia, PA, August
2004.
P. Sadayappan, A. Auer, G. Baumgartner, D.
Bernholdt, A. Bibireata,
V. Choppella, D. Cociorva, X. Gao, R. Harrison,
S. Hirata,
S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu,
M. Nooijen, R. Pitzer,
J. Ramanujam, and A. Sibiryakov, "A High-Level Approach to the
Synthesis of High-Performance Codes for Quantum Chemistry," poster,
44th Sanibel Symposium, University of Florida Quantum Theory
Project, Sanibel, FL, March 2004.
D. Bernholdt, A. Auer, G. Baumgartner, A.
Bibireata, V. Choppella,
D. Cociorva, X. Gao, R. Harrison, S. Hirata,
S. Krishnamoorthy,
S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.
Pitzer, J. Ramanujam,
P. Sadayappan, A. Sibiryakov, and J. White, "A High-Level
Approach to
the Synthesis of High-Performance Codes for Quantum Chemistry,"
poster, Los Alamos Computer Science Institute Symposium
(LACSI), Los Alamos, NM, October 2003.
D. Bernholdt, A. Auer, G. Baumgartner, A.
Bibireata, V. Choppella,
D. Cociorva, X. Gao, R. Harrison, S. Hirata,
S. Krishnamoorthy,
S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.
Pitzer, J. Ramanujam,
P. Sadayappan, and A. Sibiryakov, "Synthesizing Highly Optimized
Code
for Correlated Electronic Structure Calculations," talk, 226th
ACS National Meeting, New York, NY, September 2003.
P. Sadayappan, A. Auer, G. Baumgartner, D.
Bernholdt, A. Bibireata,
V. Choppella, D. Cociorva, X. Gao, R. Harrison,
S. Hirata,
S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu,
M. Nooijen, R. Pitzer,
J. Ramanujam, and A. Sibiryakov, "Automatic Synthesis of
High-Performance Parallel Programs for Electronic Structure Methods,"
poster, 226th ACS National Meeting, New York, NY, September
2003.
G. Baumgartner, D. Cociorva, C. Lam, P.
Sadayappan, R. Pitzer,
A. Bibireata, X. Gao, Q. Lu, S. Krishnamoorthy,
S. Krishnan,
A. Sibiryakov, D. Bernholdt, R. Harrison, V.
Choppella, S. Hirata,
M. Nooijen, A. Auer, and J. Ramanujam, "Synthesis of
High-Performance
Algorithms for Electronic Structure Calculations," poster, 225th
ACS National Meeting, New Orleans, LA, March 2003.
M. Nooijen, A. Auer, D. Bernholdt, V. Choppella,
D. Dean, R. Harrison,
T. Papenbrock, M. Strayer, T. White, S. Hirata,
G. Baumgartner,
D. Cociorva, P. Sadayappan, R. Pitzer, A.
Bibireata, X. Gao, Q. Lu,
S. Krishnamoorthy, S. Krishnan, A. Sibiryakov, and J.
Ramanujam,
"Computer-Aided Implementation of Many Body Methods: The Tensor
Contraction Engine," talk, 225th ACS National Meeting, New
Orleans, LA, March 2003.
G. Baumgartner, D. Bernholdt, D. Cociorva, R.
Harrison, S. Hirata,
C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, P.
Sadayappan, and
V. Choppella, "A High-Level Approach to the Synthesis of
High-Performance Codes for Quantum Chemistry," poster, 43rd
Sanibel Symposium, University of Florida Quantum Theory Project,
Sanibel, FL, February 2003.
D. Bernholdt, V. Choppella, D. Dean, R.
Harrison, T. Papenbrock,
M. Strayer, T. White, S. Hirata, G. Baumgartner,
D. Cociorva,
R. Pitzer, P. Sadayappan, J. Ramanujam, M. Nooijen, and
A. Auer, "A
High-Level Approach to the Synthesis of High-Performance Codes for
Quantum Chemistry," invited talk, University of Tennessee
Chemical Physics Workshop, Knoxville, Tennessee, February 2003.
D. Bernholdt, V. Choppella, D. Dean, R.
Harrison,
T. Papenbrock, M. Strayer, T. White, S. Hirata,
G. Baumgartner, D. Cociorva, R. Pitzer,
P. Sadayappan, J. Ramanujam, M. Nooijen, and A. Auer,
"A High-Level Approach to the Synthesis of High-Performance
Codes for Quantum Chemistry," talk,
SIAM Computational Science and Engineering '03,
San Diego, California, February 2003.
S. Hirata, G. Baumgartner, D. Bernholdt, D.
Cociorva, R. Harrison,
C. Lam, M. Nooijen, J. Ramanujam, and P. Sadayappan,
"Operator and
Tensor Contraction Engines -- Computer-Aided Synthesis of
Coupled-Cluster
Programs of any given Excitation Order,"
talk, American Conference of Theoretical Chemistry,
Pittsburgh, PA, July 2002.
G. Baumgartner, D. Bernholdt, D. Cociorva, R.
Harrison, S. Hirata,
C. Lam, M. Nooijen, J. Ramanujam, and P. Sadayappan,
"Compilation of
a High-Level Quantum Chemistry Language into Efficient Parallel
Code," talk, Spring 2002 Workshop of the Midwest Society for
Programming Languages and Systems, Indiana University, Bloomington,
IN, April 2002.