Publications of
Jagannathan (Ram) Ramanujam

Department of Electrical and Computer Engineering
Louisiana State University
Elec. Engr. Building, South Campus Drive
Baton Rouge, LA 70803-5901, USA

Phone: +1 (225) 578-5628
Fax: +1 (225) 578-5200
Email: j x r {@} ece.lsu.edu
Web: http://www.ece.lsu.edu/jxr/jxr.html


Links

  - Books
  - Edited Proceedings
  - Book Chapters
  - Refereed Journal Articles
  - Refereed Articles in Conference Proceedings
  - Technical Reports
  - Other Articles
  - Presentations at Symposia (no associated publications)


Books

  1. E. Ayguade, G. Baumgartner, J. Ramanujam, and P. Sadayappan (editors), Languages and Compilers for Parallel Computing, Springer-Verlag, 2007.
  2. L. Benini, M. Kandemir, and J. Ramanujam (editors), Compilers and Operating Systems for Low Power, Kluwer Academic Publishers, Boston, MA, January 2004. ISBN: 1-4020-7573-1.

  3. Chua-Huang Huang and J. Ramanujam (editors), Proceedings of The 2003 International Conference on Parallel Processing Workshops. IEEE Computer Society Press, October 2003. ISBN: 9780769520186 (10-digit ISBN: 0769520189).


Edited Proceedings

  1. M. Schulz, S. Midkiff, J. Ramanujam, and P. Sadayappan (editors), Proceedings of the Joint Workshop on High-Level Parallel Programming Models and Supportive Environments and Performance Optimization for High-Level Languages and Libraries (HIPS-POHLL 2008), held in conjunction with the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008), Miami, FL, April 2008.

  2. G. Baumgartner, J. Ramanujam, A. Rountev, and P. Sadayappan (editors), Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-07), held in conjunction with the 21st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, March 2007.

  3. G. Baumgartner, J. Ramanujam, and P. Sadayappan (editors), Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-06), held in conjunction with the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2006), Rhodes, Greece, April 2006.

  4. D. Marculescu and J. Ramanujam (editors), Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP'03), held in conjunction with the International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), September 2003, New Orleans, LA, USA.

  5. D. Marculescu and J. Ramanujam (editors), Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP'02), held in conjunction with the International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), September 2002, Charlottesville, VA, USA.

  6. G. Baumgartner, J. Ramanujam, and P. Sadayappan (editors), Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-02), held in conjunction with the 16th Annual ACM International Conference on Supercomputing (ICS'02), June 2002, New York, NY.

  7. L. Benini, M. Kandemir, and J. Ramanujam (editors), Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP'01), held in conjunction with the International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), October 2001, Barcelona, Spain.

  8. M. Kandemir and J. Ramanujam (editors), Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP'00), held in conjunction with the International Conference on Parallel Architectures and Compilation Techniques (PACT 2000), October 15-19, 2000, Philadelphia, PA.

  9. S. Pande, J. Ramanujam, and Y. Robert (editors), Proceedings of the Workshop on Challenges in Compiling for Scalable Parallel Systems, in conjunction with the 8th IEEE Symposium on Parallel and Distributed Processing, New Orleans, LA, October 1996.


Book Chapters

  1. J. Ramanujam, "Automatic Data Distribution," in The Compiler Design Handbook: Optimizations and Machine Code Generation, (Y. N. Srikant and P. Shankar: Eds.), Chapter 12, pp. 409-459, CRC Press, Boca Raton, FL, 2002.

  2. J. Ramanujam, "Integer lattice based methods for local address generation for block-cyclic distributions," in Compiler Optimizations for Scalable Parallel Systems - Languages, Compilation Techniques, and Run Time Systems, S. Pande and D. P. Agrawal (Eds.), Lecture Notes in Computer Science, vol. 1808, pp. 597-645, Springer-Verlag, 2001. [pdf]

  3. A. Thirumalai, J. Ramanujam and A. Venkatachar, "Communication generation and optimization for HPF," in Languages, Compilers, and Run-Time Systems for Scalable Computers, B. Szymanski and B. Sinharoy, (Eds.), Chapter 29, pp. 311-316, Kluwer Academic Publishers, Norwell, MA, 1995.

  4. J. Ramanujam and P. Sadayappan, "Iteration space tiling for distributed memory machines," in Languages, Compilers and Environments for Distributed Memory Machines, J. Saltz and P. Mehrotra, (Eds.), North-Holland, Amsterdam, The Netherlands, pp. 255-270, 1992.


Refereed Journal Articles

  1. H. Salamy and J. Ramanujam, "Storage Optimization through Offset Assignment with Variable Coalescing," ACM Transactions on Embedded Computing Systems (TECS), 2010.
  2. A. Hartono, Q. Lu, T. Henretty, S. Krishnamoorthy, H. Zhang, G. Baumgartner, D. E. Bernholdt, M. Nooijen, R. Pitzer, J. Ramanujam, and P. Sadayappan, "Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry," The Journal of Physical Chemistry A, vol. 113, no. 45, pp. 12715-12723, 2009.
  3. X. Gao, S. Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan, "Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations," Concurrency and Computation: Practice and Experience, 2007. [pdf]

  4. S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan, and V. Choppella, "Efficient Synthesis of Out-of-Core Algorithms Using a Nonlinear Optimization Solver," Journal of Parallel and Distributed Computing, vol. 66, no. 5, pp. 659-673, May 2006. [pdf]

  5. G. Chen, M. Kandemir, M. J. Irwin, and J. Ramanujam, "Reducing code size through address register assignment," ACM Transactions on Embedded Computing Systems (TECS), vol. 5, no. 1, pp. 225-258, February 2006. [pdf]

  6. M. Kandemir, J. Ramanujam, and U. Sezer, "Improving the Energy Behavior of Block Buffering Using Compiler Optimizations," ACM Transactions on Design Automation of Electronic Systems, vol. 11, no. 1, pp. 228-250, January 2006. [pdf]

  7. J. Ramanujam, J. Hong, M. Kandemir, A. Narayan, and A. Agarwal, "Estimating and Reducing the Memory Requirements of Signal Processing Codes for Embedded Processor Systems," IEEE Transactions on Signal Processing, vol. 54, no. 1, pp. 286-294, January 2006. [pdf]

  8. A. Auer, G. Baumgartner, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Automatic Code Generation for Many-Body Electronic Structure Methods: The Tensor Contraction Engine," Molecular Physics, vol. 104, no. 2, pp. 211-228, January 2006. [pdf]

  9. G. Baumgartner, A. Auer, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Synthesis of High-Performance Parallel Programs for a Class of ab initio Quantum Chemistry Models," Proceedings of the IEEE, vol. 93, no. 2, pp. 276-292, February 2005. [pdf]

  10. M. Kandemir, I. Kadayif, A. Choudhary, J. Ramanujam, and I. Kolcu, "Compiler-directed scratch pad memory optimization for embedded multiprocessors," IEEE Transactions on VLSI (TVLSI), vol. 12, no. 3, pp. 281-287, March 2004. [pdf]

  11. M. Kandemir, J. Ramanujam, M. Irwin, V. Narayanan, I. Kadayif, and A. Parikh, "A Compiler Based Approach for Dynamically Managing Scratch-Pad Memories in Embedded Systems," IEEE Transactions on Computer-Aided Design, vol. 23, no. 2, pp. 243-260, February 2004. [pdf]

  12. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Reducing false sharing and improving spatial locality in a unified compilation framework," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 4, pp. 337-354, April 2003. [pdf]

  13. M. Kandemir, A. Choudhary, and J. Ramanujam, "An I/O conscious tiling strategy for disk-resident data sets," The Journal of Supercomputing, vol. 21, no. 3, pp. 257-284, 2002. [pdf]

  14. M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, "A layout-conscious iteration space transformation technique," IEEE Transactions on Computers, vol. 50, no. 12, pp. 1321-1336, December 2001.

  15. M. Narasimhan and J. Ramanujam, "A fast approach to computing exact solutions to the resource-constrained scheduling problem," ACM Transactions on Design Automation of Electronic Systems, vol. 6, no. 4, pp. 490-500, December 2001.

  16. M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, "Static and dynamic locality optimizations using integer linear programming," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 9, pp. 922-941, September 2001.

  17. M. Kandemir and J. Ramanujam, "Data relation vectors: A new abstraction for data optimizations," IEEE Transactions on Computers, vol. 50, no. 8, pp. 798-810, August 2001.

  18. V. Jain, S. Rele, S. Pande, and J. Ramanujam, "Compact and efficient code generation through program restructuring on limited memory embedded DSPs," IEEE Transactions on Computer-Aided Design, vol. 20, no. 4, pp. 477-494, April 2001.

  19. M. Kandemir, A. Choudhary, P. Banerjee, J. Ramanujam, and N. Shenoy, "Minimizing data and synchronization costs in one-way communication," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 12, pp. 1232-1251, December 2000.

  20. M. Kandemir, J. Ramanujam, and A. Choudhary, "Compiler algorithms for optimizing locality and parallelism on shared and distributed memory machines," Journal of Parallel and Distributed Computing, vol. 60, no. 8, pp. 924-965, August 2000.

  21. M. Kandemir, A. Choudhary, J. Ramanujam, and M. Kandaswamy, "A unified framework for optimizing locality, parallelism, and communication in out-of-core computations," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 7, pp. 648-668, July 2000.

  22. M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and N. Shenoy, "A global communication optimization technique based on data flow analysis and linear algebra," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 21, no. 6, pp. 1251-1297, November 1999.

  23. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A matrix-based approach to global locality optimization," Journal of Parallel and Distributed Computing, vol. 58, no. 2, pp. 190-235, September 1999.

  24. M. Kandemir, J. Ramanujam, and A. Choudhary, "Improving cache locality by a combination of loop and data transformations," IEEE Transactions on Computers, vol. 48, no. 2, pp. 159-167, February 1999.

  25. M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "A linear algebra framework for automatic determination of optimal data layouts," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 2, pp. 115-135, February 1999.

  26. P. Sadayappan, F. Ercal and J. Ramanujam, "Partitioning graphs on message-passing machines by pairwise mincut," Information Sciences, vol. 111, no. 1-4, pp. 223-237, October 1998.

  27. M. Kandemir, A. Choudhary, J. Ramanujam and R. Bordawekar, "Compilation techniques for out-of-core parallel computations," Parallel Computing, vol. 24, no. 3-4, pp. 597-628, June 1998.

  28. M. Kandemir, A. Choudhary, J. Ramanujam and M. Kandaswamy, "Locality optimization algorithms for compilation of out-of-core codes," Journal of Information Science and Engineering, vol. 14, no. 1, pp. 107-138, March 1998.

  29. A. Venkatachar, J. Ramanujam, and A. Thirumalai, "Communication generation for block-cyclic distributions," Parallel Processing Letters, vol. 7, no. 2, pp. 195-202, 1997.

  30. A. Goel and J. Ramanujam, "A neural architecture for a class of abduction problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. 26, no. 6, pp. 854-860, December 1996.

  31. A. Thirumalai and J. Ramanujam, "Efficient computation of address sequences in data parallel programs using closed forms for basis vectors," Journal of Parallel and Distributed Computing, vol. 38, no. 2, pp. 188-203, November 1996.

  32. R. Bordawekar, A. Choudhary, and J. Ramanujam, "Compilation and communication strategies for out-of-core programs on distributed memory machines," Journal of Parallel and Distributed Computing, vol. 38, no. 2, pp. 277-288, November 1996.

  33. R. Thakur, A. Choudhary and J. Ramanujam, "Efficient algorithms for array redistribution," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 6, pp. 587-594, June 1996.

  34. J. Ramanujam, "Beyond unimodular transformations," The Journal of Supercomputing, vol. 9, no. 4, pp. 365-389, 1995.

  35. J. Ramanujam and P. Sadayappan, "Mapping combinatorial optimization problems onto neural networks," Information Sciences, vol. 82, no. 3-4, pp. 239-255, January 1995.

  36. J. Ramanujam and P. Sadayappan, "Tiling multidimensional iteration spaces for multicomputers," Journal of Parallel and Distributed Computing, vol. 16, no. 2, pp. 108-120, October 1992.

  37. J. Ramanujam and P. Sadayappan, "Compile-time techniques for data distribution in distributed memory machines," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 472-482, October 1991.

  38. F. Ercal, J. Ramanujam and P. Sadayappan, "Task allocation by recursive mincut bipartitioning onto a hypercube," Journal of Parallel and Distributed Computing, vol. 10, no. 1, pp. 35-44, September 1990.

  39. F. Ercal, P. Sadayappan and J. Ramanujam, "Cluster partitioning approaches to mapping parallel programs onto a hypercube," Parallel Computing, vol. 13, no. 1, pp. 1-16, March 1990.


Refereed Articles in Conference Proceedings

  1. L.-N. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam and P. Sadayappan, "Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework," in Proc. ACM/IEEE Conference on High Performance Computing SC10, New Orleans, LA, November 2010.
  2. S. Tavarageri, A. Hartono, M. Baskaran, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan, "Parametric Tiling of Affine Loop Nests," in Proc. 15th Workshop on Compilers for Parallel Computers (CPC 2010), Vienna, Austria, July 2010.
  3. M. Baskaran, A. Hartono, S. Tavarageri, T. Henretty, J. Ramanujam, and P. Sadayappan, "Parameterized Tiling Revisited," International Symposium on Code Generation and Optimization (CGO), Toronto, Canada, April 2010.
  4. A. Hartono, M. Baskaran, J. Ramanujam, and P. Sadayappan, "DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors," 24th International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, April 2010.
  5. M. Baskaran, J. Ramanujam, and P. Sadayappan, "Automatic C-to-CUDA Code Generation for Affine Programs," International Conference on Compiler Construction (CC), Paphos, Cyprus, March 2010.
  6. Q. Lu, C. Alias, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, P. Sadayappan, Y. Chen, H. Lin and T. Ngai, "Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors," in Proc. 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 09), Raleigh, NC, September 2009.  

  7. Z. Yun, Z. Lei, G. Allen, D. S. Katz, T. Kosar, S. Jha, and J. Ramanujam, "An innovative application execution toolkit for multicluster grids," in Proc. CLUSTER 2009, pp. 1-4, New Orleans, LA, September 2009.

  8. A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, and P. Sadayappan, "Parametric Multi-Level Tiling of Imperfectly Nested Loops," in Proc. 23rd ACM International Conference on Supercomputing, Yorktown Heights, New York, June 2009.

  9. M. Baskaran, N. Vydhyanathan, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan, "Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors," in Proc. 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2009), Raleigh, NC, February 2009.  

  10. R. Sankaran, B. Ullmer, K. Kallakuri, S. Jandhyala, C. Toole, J. Ramanujam, and C. Laan, "Decoupling Interaction Hardware Design Using Libraries of Reusable Electronics," in Proc. 3rd International Conference on Tangible and Embedded Interaction (TEI'09), Cambridge, UK, February 2009.  

  11. Hassan Salamy and J. Ramanujam, "A Framework for Task Scheduling and Memory Partitioning for Multi-Processor System-on-Chip," in Proc. 4th International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009.  

  12. U. Bondhugula, M. Baskaran, A. Hartono, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "A Polyhedral Framework for Automatic Parallelization and Locality Optimization," in Proc. 14th Workshop on Compilers for Parallel Computers (CPC 2009), Zurich, Switzerland, January 2009.  

  13. Hassan Salamy and J. Ramanujam, "Optimal Address Register Allocation for Arrays in DSP Applications," in Proc. 6th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia 2008), Atlanta, GA, October 2008. 

  14. Hassan Salamy and J. Ramanujam, "Storage Optimization through Code Size Reduction for Digital Signal Processors," in Proc. 6th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia 2008), Atlanta, GA, October 2008.  

  15. Jinpyo Hong and J. Ramanujam, "Address Register Allocation in Digital Signal Processors," in Proc. 2008 International Conference on Embedded Systems and Software (ICESS-08), Chengdu, China, July 2008.  

  16. Jinpyo Hong and J. Ramanujam, "Scheduling DAGs for Fixed-point DSP Processors by Using Worm Partitions," in Proc. 2008 International Conference on Embedded Systems and Software (ICESS-08), Chengdu, China, July 2008.  

  17. U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A Practical and Automatic Polyhedral Program Optimization System," Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), Tucson, June 2008. [pdf] [Extended version]

  18. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs," in Proc. 22nd ACM International Conference on Supercomputing, Island of Kos, Greece, June 2008. [pdf] [Extended version]

  19. U. Bondhugula, M. Baskaran, A. Hartono, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "Towards Effective Automatic Parallelization for Multicore Systems," in Proc. Workshop on Next Generation Software (NGS 2008), held in conjunction with the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, FL, April 2008. [pdf]

  20. U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model," in Proc. CC 2008 - International Conference on Compiler Construction, Budapest, Hungary, March-April 2008. [pdf] [Extended version]

  21. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev and P. Sadayappan, "Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories," in Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (PPoPP 2008), Salt Lake City, UT, February 2008. [pdf] [Extended version]

  22. S. Pinnepalli, Jinpyo Hong, J. Ramanujam, and Doris Carver, "Code Size Optimization for Embedded Processors using Commutative Transformations," in Proc. 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA-07), Daegu, Korea, August 2007. [pdf]

  23. S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev and P. Sadayappan, "Effective Automatic Parallelization of Stencil Computations," in Proc. ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 07), San Diego, June 2007. [pdf]

  24. Jinpyo Hong and J. Ramanujam, "Memory Offset Assignment for DSPs," in Proc. 2007 International Conference on Embedded Systems and Software (ICESS-07), Daegu, Korea, May 2007. [pdf]

  25. U. Bondhugula, J. Ramanujam, and P. Sadayappan, "Automatic Mapping of Nested Loops to FPGAs," in Proc. ACM SIGPLAN 2007 Symposium on Principles and Practice of Parallel Programming (PPoPP 07), San Jose, CA, March 2007. [pdf]

  26. Hassan Salamy and J. Ramanujam, "An Effective Heuristic for Simple Offset Assignment with Variable Coalescing," Languages and Compilers for Parallel Computing, (C. Cascaval et al. Eds.), Lecture Notes in Computer Science, Springer-Verlag, 2007. [pdf]

  27. Atef Allam and J. Ramanujam, "Dynamic Memory Usage Optimization using ILP," Proc. 2nd International Computer Engineering Conference: Engineering the Information Society (ICENCO 2006), Cairo, Egypt, December 2006. [pdf]

  28. Atef Allam and J. Ramanujam, "ILP and Iterative LP Solutions for Peak and Average Power Optimization in HLS," Proc. 2nd International Computer Engineering Conference: Engineering the Information Society (ICENCO 2006), Cairo, Egypt, December 2006. [pdf]

  29. Atef Allam and J. Ramanujam, "Modified Force-Directed Scheduling for Peak and Average Power Optimization using Multiple Supply-Voltages," in Proc. International Conference on Integrated Circuit Design and Technology (ICICDT), Padova, Italy, May 2006. [pdf]

  30. Atef Allam and J. Ramanujam, "Simultaneous Peak and Average Power Optimization in Synchronous Sequential Designs Using Retiming and Multiple Supply Voltages," in Proc. International Conference on Integrated Circuit Design and Technology (ICICDT), Padova, Italy, May 2006. [pdf]

  31. A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen, G. Baumgartner, D. Bernholdt, R. Pitzer, J. Ramanujam, A. Rountev, and P. Sadayappan, "Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations," in Proc. International Conference on Computational Science 2006 (ICCS 2006), Reading, UK, Lecture Notes in Computer Science, Springer-Verlag, 2006. [pdf]

  32. A. Allam, J. Ramanujam, G. Baumgartner, and P. Sadayappan, "Memory Minimization for Tensor Contractions using Integer Linear Programming," Proc. Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-06), held in conjunction with the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2006), Rhodes, Greece, April 2006. [pdf]

  33. X. Gao, S. Krishnamoorthy, Q. Lu, V. Choppella, G. Baumgartner, J. Ramanujam, and P. Sadayappan, "Search-Based Performance-Model Driven Optimization for Compilation of Tensor Contraction Expressions," in Proc. 12th Workshop on Compilers for Parallel Computers (CPC 2006), A Coruna, Spain, January 2006.
  34. X. Gao, S. Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan, "Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations," in Languages and Compilers for Parallel Computing, (E. Ayguade et al. Eds.), Lecture Notes in Computer Science, Springer-Verlag, 2006. [pdf]

  35. X. Gao, S. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J. Ramanujam, and P. Sadayappan, "Performance Modeling and Optimization of Parallel Out-of-Core Tensor Contractions," in Proc. ACM SIGPLAN 2005 Symposium on Principles and Practice of Parallel Programming, pp. 266-276, Chicago, IL, June 2005. [pdf]

  36. A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner, D.E. Bernholdt, S. Hirata, C. Lam, R. Pitzer, J. Ramanujam, and P. Sadayappan, "Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations," in Proc. International Conference on Computational Science 2005 (ICCS 2005), Atlanta, GA, May 2005. [pdf]

  37. Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam, and P. Sadayappan, "Empirical Performance-Model Driven Data Layout Optimization," Languages and Compilers for Parallel Computing, (R. Eigenmann et al. Eds.), Lecture Notes in Computer Science, Springer-Verlag, 2005. [pdf]

  38. D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, and J. Ramanujam, "Memory-Constrained Communication Minimization for a Class of Array Computations," in Languages and Compilers for Parallel Computing, (W. Pugh et al. Eds.), Springer-Verlag, 2005. [pdf]

  39. G. Baumgartner, D. Bernholdt, V. Choppella, J. Ramanujam, and P. Sadayappan, "A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry: The Tensor Contraction Engine," in Proc. 11th Workshop on Compilers for Parallel Computers (CPC 2004), Chiemsee, Germany, July 2004.

  40. S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Ramanujam, and P. Sadayappan, "Efficient Synthesis of Out-of-core Algorithms Using a Nonlinear Optimization Solver," in Proc. 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Santa Fe, New Mexico, April 2004. (Best Paper Award) [pdf]

  41. A. Bibireata, S. Krishnan, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, D. Bernholdt, and V. Choppella, "Memory-Constrained Data Locality Optimization for Tensor Contractions," in Languages and Compilers for Parallel Computing, (L. Rauchwerger et al. Eds.), Lecture Notes in Computer Science, Vol. 2958, pp. 93-108, Springer-Verlag, 2004. [pdf]

  42. S. Krishnan, S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, D. Bernholdt, and V. Choppella, "Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms," in Proc. of the Intl. Conf. on High Performance Computing (HiPC 03), 2003. (Best Paper Award) [pdf]

  43. D. Cociorva, X. Gao, S. Krishnan, Gerald Baumgartner, C. Lam, P. Sadayappan, and J. Ramanujam, "Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints," in Proc. 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, April 2003. [pdf]

  44. M. Kandemir, M. J. Irwin, G. Chen, and J. Ramanujam, "Address Register Assignment for Reducing Code Size," in Proc. 12th International Conference on Compiler Construction (CC 2003), Warsaw, Poland, Lecture Notes in Computer Science, Vol. 2622, pp. 273-289, Springer-Verlag, 2003. [pdf]

  45. D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, and J. Ramanujam, "Compile-Time Optimizations for Tensor Contraction Expressions," in Proc. 10th Workshop on Compilers for Parallel Computers (CPC 2003), Leiden, The Netherlands, January 2003.

  46. D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D. Bernholdt, R. Harrison and R. Pitzer, "A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry," in Proceedings of Supercomputing 2002 (SC2002), November 2002. [pdf]

  47. D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D. Bernholdt, and R. Harrison, "Space-time trade-off optimization for a class of electronic structure calculations," in Proc. ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI), pp. 177-186, Berlin, Germany, June 2002. [pdf]

  48. D. Cociorva, G. Baumgartner, C. Lam, J. Ramanujam, and P. Sadayappan, "Compiler Support for Optimizing Tensor Contraction Expressions in Quantum Chemistry Computations," in Proc. Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL-02), New York, NY, June 2002. [pdf]

  49. M. Kandemir, J. Ramanujam, and A. Choudhary, "Exploiting shared scratch pad memory space in embedded multiprocessor systems," in Proc. 39th Design Automation Conference, pp. 219-224, New Orleans, LA, June 2002. [pdf]

  50. G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, C. Lam, M. Nooijen, J. Ramanujam, and P. Sadayappan, "A performance optimization framework for compilation of tensor contraction expressions into parallel programs," in Proc. 7th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 02), (part of IPDPS 2002) Ft. Lauderdale, FL, IEEE Computer Society Press, April 2002. [pdf]

  51. J. Ramanujam, S. Deshpande, J. Hong, and M. Kandemir, "A heuristic for clock selection in high-level synthesis," in Proc. ASP-DAC/VLSI Design 2002, pp. 414-419, Bangalore, India, January 2002. [pdf]

  52. J. Ramanujam, S. Krishnamoorthy, J. Hong, and M. Kandemir, "Address code and arithmetic optimizations for embedded systems," in Proc. ASP-DAC/VLSI Design 2002, pp. 619-624, Bangalore, India, January 2002. [pdf]

  53. N. Crosbie, M. Kandemir, I. Kolcu, J. Ramanujam, and A. Choudhary, "Strategies for improving data locality in embedded applications," in Proc. ASP-DAC/VLSI Design 2002, pp. 631-636, Bangalore, India, January 2002. [pdf]

  54. D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam, M. Nooijen, D. Bernholdt, and R. Harrison, "Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization," in Proc. of the Intl. Conf. on High Performance Computing, Lecture Notes in Computer Science, Vol. 2228, pp. 237-248, Springer-Verlag, 2001. [pdf]

  55. Sunil Atri, J. Ramanujam, and M. Kandemir, "Improving variable placement for embedded processors," in Languages and Compilers for Parallel Computing, (S. Midkiff et al.  Eds.), Lecture Notes in Computer Science, vol. 2017, pp. 158-172, Springer-Verlag, 2001. [pdf]

  56. M. Kandemir, J. Ramanujam, and U. Sezer, "Compiler support for block buffering," in Proc. ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'01), pp. 76-79, Huntington Beach, CA, August 2001.

  57. I. Kadayif, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, and J. Ramanujam, "Morphable cache architectures: potential benefits," in Proc. ACM SIGPLAN 2001 Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'2001), Snowbird, UT, June 2001. Also appears in ACM SIGPLAN Notices, vol. 36, no. 8, pp. 128-137, August 2001.

  58. J. Ramanujam, J. Hong, M. Kandemir, and A. Narayan, "Reducing memory requirements of nested loops for embedded systems," in Proc. 38th Design Automation Conference, pp. 359-364, Las Vegas, NV, June 2001.

  59. M. Kandemir, J. Ramanujam, M. Irwin, V. Narayanan, I. Kadayif, and A. Parikh, "Dynamic management of scratch-pad memory space," in Proc. 38th Design Automation Conference, pp. 690-695, Las Vegas, NV, June 2001.

  60. D. Cociorva, J. Wilkins, C.-C. Lam, G. Baumgartner, P. Sadayappan, and J. Ramanujam, "Loop optimization for a class of memory-constrained computations," in Proc. 15th ACM International Conference on Supercomputing (ICS'01), pp. 103-113, Sorrento, Italy, June 2001.

  61. J. Ramanujam, J. Hong, M. Kandemir, and S. Atri, "Address register-oriented optimizations for embedded processors," in Proc. 9th Workshop on Compilers for Parallel Computers (CPC 2001), pp. 281-290, Edinburgh, Scotland, June 2001.

  62. S. Atri, J. Ramanujam, and M. Kandemir, "Improving offset assignment on embedded processors using transformations," in Proc. High Performance Computing-HiPC 2000, pp. 367-374, December 2000. [pdf]

  63. M. Kandemir and J. Ramanujam, "Data relation vectors: A new abstraction for data optimizations," in Proc. International Conference on Parallel Architectures and Compilation Techniques (PACT 00), pp. 227-236, Philadelphia, PA, October 2000.

  64. M. Narasimhan and J. Ramanujam, "On lower bounds for scheduling problems in high-level synthesis," in Proc. 37th Design Automation Conference, pp. 546-551, Los Angeles, CA, June 2000.

  65. V. Jain, S. Rele, S. Pande, and J. Ramanujam, "Code restructuring for improving real-time response through code size, speed trade-offs on limited memory embedded DSPs," in Languages and Compilers for Parallel Computing, L. Carter and J. Ferrante (Eds.), Lecture Notes in Computer Science, vol. 1863, pp. 459-463, Springer-Verlag, 2000.

  66. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "On reducing false sharing while improving locality on shared memory multiprocessors," in Proc. International Conference on Parallel Architectures and Compilation Techniques (PACT 99), pp. 203-211, Newport Beach, CA, October 1999.

  67. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A framework for interprocedural locality optimization using both loop and data layout transformations," in Proc. 1999 International Conference on Parallel Processing, pp. 95-102, Aizu, Japan, September 1999.

  68. M. Kandemir, A. Choudhary, and J. Ramanujam, "Compiler optimizations for I/O intensive computations," in Proc. 1999 International Conference on Parallel Processing, pp. 164-171, Aizu, Japan, September 1999.

  69. M. Kandemir, A. Choudhary, and J. Ramanujam, "I/O conscious tiling for disk-resident data sets," in Proc. Euro-Par'99, pp. 430-439, Toulouse, France, September 1999.

  70. M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, "An integer linear programming approach to optimizing cache locality," in Proc. 13th ACM International Conference on Supercomputing (ICS 99), pp. 500-509, Rhodes, Greece, June 1999.

  71. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A graph based framework to detect optimal memory layouts for improving data locality," in Proc. International Parallel Processing Symposium (IPPS/SPDP 1999), pp. 738-743, San Juan, Puerto Rico, April 1999.

  72. M. Kandemir, A. Choudhary, and J. Ramanujam, "Restructuring I/O-intensive computations for locality," in Proc. Workshop on High Performance Computation on Very Large Data Sets, part of HPCN Europe 99, pp. 1097-1106, Amsterdam, The Netherlands, April 1999.

  73. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using a graph-based technique for detecting memory layouts of arrays," in Proc. 9th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, March 1999 (proceedings only available in CD-ROM format).

  74. M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, "An iteration space transformation algorithm based on explicit data layout representation for optimizing locality," in Languages and Compilers for Parallel Computing, S. Chatterjee et al., (Eds.), Lecture Notes in Computer Science, vol. 1656, pp. 34-50, Springer-Verlag, 1999.

  75. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework," in Proc. 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-31), pp. 285-296, Dallas, TX, December 1998.

  76. J. Ramanujam, A. Venkatachar, and S. Dutta, "Efficient address sequence generation for two-level mappings in High Performance Fortran," in Proc. 1998 International Conference on High Performance Computing, pp. 132-139, Chennai, India, December 1998.

  77. M. Narasimhan and J. Ramanujam, "Improving the computational efficiency of ILP-based problems," in Proc. International Conference on Computer Aided Design (ICCAD 98), pp. 593-596, San Jose, CA, November 1998.

  78. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A matrix-based approach to the global locality optimization problem," in Proc. International Conference on Parallel Architectures and Compilation Techniques (PACT 98), pp. 306-315, Paris, France, October 1998.

  79. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Data and loop transformations for optimizing locality," in Proc. 3rd Workshop on Interaction Between Compilers and Computer Architectures (INTERACT-3), co-located with ASPLOS'98, San Jose, CA, October 1998.

  80. M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy, and P. Banerjee, "Enhancing spatial locality via data layout optimizations," in Proc. Euro-Par'98 Parallel Processing, pp. 422-434, Southampton, UK, September 1998.

  81. M. Kandemir, N. Shenoy, P. Banerjee, J. Ramanujam, and A. Choudhary, "Minimizing data and synchronization costs in one-way communication," in Proc. 1998 International Conference on Parallel Processing, pp. 180-188, Minneapolis, MN, August 1998.

  82. M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "A hyperplane based approach for optimizing spatial locality in loop nests," in Proc. 12th ACM International Conference on Supercomputing (ICS'98), pp. 69-76, Melbourne, Australia, July 1998.

  83. J. Ramanujam, S. Dutta, A. Venkatachar, and A. Thirumalai, "Advanced compilation techniques for HPF," in Proc. 7th International Workshop on Compilers for Parallel Computers, P. Fritzson (Ed.), Linkoping, Sweden, pp. 57-68, June 1998.

  84. M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Optimizing spatial locality in loop nests using linear algebra," in Proc. 7th International Workshop on Compilers for Parallel Computers, P. Fritzson (Ed.), Linkoping, Sweden, pp. 195-206, June 1998.

  85. M. Kandemir, A. Choudhary, and J. Ramanujam, "Improving locality in out-of-core computations using data layout transformations," in Proc. 4th Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 359-366, Pittsburgh, PA, May 1998.

  86. M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and N. Shenoy, "A generalized framework for global communication optimization," in Proc. International Parallel Processing Symposium (IPPS/SPDP 1998), pp. 69-73, Orlando, FL, March-April 1998.

  87. J. Ramanujam, S. Dutta, and A. Venkatachar, "Code generation for complex subscripts in data-parallel programs," in Languages and Compilers for Parallel Computing, Z. Li et al., (Eds.), Lecture Notes in Computer Science, vol. 1366, pp. 49-63, Springer-Verlag, 1998.

  88. M. Kandemir, J. Ramanujam, and A. Choudhary, "Compiler algorithms for optimizing locality and parallelism on shared and distributed memory machines," in Proc. 1997 International Conference on Parallel Architectures and Compilation Techniques (PACT 97), pp. 236-247, San Francisco, CA, November 1997.

  89. M. Kandemir, A. Choudhary, J. Ramanujam and M. Kandaswamy, "A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations," in Proc. Workshop on I/O in Parallel and Distributed Systems (IOPADS'97), pp. 79-92, San Jose, CA, November 1997.

  90. M. Kandemir, J. Ramanujam, and A. Choudhary, "Optimizing out-of-core computations using chain vectors," in Proc. EuroPar-97 Parallel Processing, pp. 601-608, Passau, Germany, August 1997.

  91. M. Kandemir, J. Ramanujam, and A. Choudhary, "Improving the performance of out-of-core computations," in Proc. 1997 International Conference on Parallel Processing, pp. 128-136, Bloomingdale, IL, August 1997.

  92. M. Kandemir, J. Ramanujam, and A. Choudhary, "A compiler algorithm for optimizing locality in loop nests," in Proc. 11th ACM International Conference on Supercomputing, pp. 269-278, Vienna, Austria, July 1997.

  93. A. Venkatachar, J. Ramanujam, and A. Thirumalai, "Generalized overlap regions for communication optimization in data-parallel programs," in Languages and Compilers for Parallel Computing, D. Sehr et al., (Eds.), Lecture Notes in Computer Science, vol. 1239, pp. 404-419, Springer-Verlag, 1997.

  94. M. Kandemir, R. Bordawekar, A. Choudhary, and J. Ramanujam, "A unified tiling approach for out-of-core computations," in Proc. 6th Workshop on Compilers for Parallel Computers, M. Gerndt (Ed.), Aachen, Germany, pp. 323-334, December 1996.

  95. R. Bordawekar, A. Choudhary, and J. Ramanujam, "A framework for integrated communication and I/O placement," in Proc. Euro-Par'96 Parallel Processing, pp. 541-552, Lyon, France, August 1996.

  96. R. Bordawekar, A. Choudhary, and J. Ramanujam, "Automatic optimization of communication in compiling out-of-core stencil codes," in Proc. 10th ACM International Conference on Supercomputing, pp. 366-373, Philadelphia, PA, May 1996.

  97. A. Thirumalai and J. Ramanujam, "Fast address sequence generation for data-parallel programs using integer lattices," in Languages and Compilers for Parallel Computing, C. Huang et al., (Eds.), Lecture Notes in Computer Science, vol. 1033, pp. 191-208, Springer-Verlag, 1996.

  98. A. Thirumalai and J. Ramanujam, "An efficient compile-time approach to compute address sequences in data parallel programs," in Proc. 5th International Workshop on Compilers for Parallel Computers, Malaga, Spain, pp. 581-605, June 1995.

  99. J. Ramanujam and A. Narayan, "Automatic data mapping and program transformations," in Proc. Workshop on Automatic Data Layout and Performance Prediction, sponsored by the Center for Research on Parallel Computation, Rice University, Houston, TX, April 1995 (only informal proceedings distributed at the workshop).

  100. J. Ramanujam and S. Vasanthakumar, "Statement-level independent partitioning of uniform recurrences," in Proc. 9th International Parallel Processing Symposium, pp. 229-233, Santa Barbara, CA, April 1995.

  101. S. D. Kaushik, C.-H. Huang, J. Ramanujam, and P. Sadayappan, "Multi-phase array redistribution: modeling and evaluation," in Proc. 9th International Parallel Processing Symposium, pp. 441-445, Santa Barbara, CA, April 1995.

  102. J. Ramanujam and A. Narayan, "Integrating data distribution and loop transformations for distributed memory machines," in Proc. 7th SIAM Conference on Parallel Processing for Scientific Computing, D. Bailey et al., Eds., SIAM Press, pp. 668-673, San Francisco, CA, February 1995.

  103. J. Ramanujam and A. Mathew, "Analysis of event synchronization in parallel programs," in Languages and Compilers for Parallel Computing, K. Pingali et al., (Eds.), Lecture Notes in Computer Science, vol. 892, pp. 300-315, Springer-Verlag, 1995.

  104. J. Ramanujam, "Optimal software pipelining of nested loops," in Proc. 8th International Parallel Processing Symposium, pp. 335-342, Cancun, Mexico, April 1994.

  105. J. Ramanujam, "Non-unimodular transformations of nested loops," in Proc. Supercomputing 92, pp. 214-223, Minneapolis, MN, November 1992.

  106. J. Ramanujam, "A linear algebraic view of loop transformations and their interaction," in Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, D. Sorensen, Ed., SIAM Press, pp. 543-548, 1992.

  107. J. Ramanujam and P. Sadayappan, "Multidimensional iteration space tiling for nonshared memory machines," in Proc. Supercomputing 91, pp. 111-120, Albuquerque, NM, November 1991.

  108. J. Ramanujam and P. Sadayappan, "Access based data decomposition in distributed memory machines," in Proc. 6th Distributed Memory Computing Conference, pp. 196-199, Portland, OR, April 1991.

  109. J. Ramanujam and P. Sadayappan, "Tiling of iteration spaces for multicomputers," in Proc. 1990 International Conference on Parallel Processing, vol. II, pp. 179-186, St. Charles, IL, August 1990.

  110. J. Ramanujam and P. Sadayappan, "Nested loop tiling for distributed memory machines," in Proc. 5th Distributed Memory Computing Conference, pp. 1088-1096, Charleston, SC, April 1990.

  111. P. Sadayappan, F. Ercal and J. Ramanujam, "Distributed generation of pairwise combinations on a hypercube," in Parallel Computing 89, D. Evans, G. Joubert and F. Peters (Eds.), Amsterdam, The Netherlands: North-Holland, pp. 299-304, 1990.

  112. J. Ramanujam and P. Sadayappan, "A methodology for parallelizing programs for multicomputers and complex memory multiprocessors," in Proc. Supercomputing 89, pp. 637-646, Reno, NV, November 1989.

  113. F. Ercal, P. Sadayappan and J. Ramanujam, "Parallel graph partitioning on a hypercube," in Proc. 4th Hypercube Concurrent Computers and Applications Conference, vol. 1, pp. 67-70, Monterey, CA, March 1989.

  114. J. Ramanujam, F. Ercal and P. Sadayappan, "Task allocation by simulated annealing," in Proc. 3rd International Conference on Supercomputing, vol. 3, pp. 471-480, Boston, MA, May 1988.

  115. F. Ercal, J. Ramanujam and P. Sadayappan, "Task allocation onto a hypercube by recursive mincut bipartitioning," in Proc. 3rd Hypercube Concurrent Computers and Applications Conference, pp. 210-221, Pasadena, CA, January 1988.

  116. J. Ramanujam and P. Sadayappan, "Optimization using neural networks," in Proceedings of the 2nd IEEE International Conference on Neural Networks, vol. 2, pp. 325-332, San Diego, CA, July 1988.

  117. A. Goel, J. Ramanujam and P. Sadayappan, "Towards a neural architecture for abductive reasoning," in Proceedings of the 2nd IEEE International Conference on Neural Networks, vol. 1, pp. 681-688, San Diego, CA, July 1988.

  118. J. Ramanujam and P. Sadayappan, "Parameter identification for constrained optimization using neural networks," in Proc. Connectionist Models Summer School, pp. 154-161, Carnegie Mellon University, Pittsburgh, PA, June 1988, Morgan Kaufmann, San Mateo, CA.

Technical Reports

  1. A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, and P. Sadayappan. PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests. Technical Report OSU-CISRC-2/09-TR04. Department of Computer Science and Engineering, The Ohio State University, February 2009. [pdf]

  2. Q. Lu, U. Bondhugula, S. Krishnamoorthy, P. Sadayappan, J. Ramanujam, Y. Chen, H. Lin, and T.-F. Ngai. A Compile-Time Data Locality Optimization Framework for NUCA Chip Multiprocessors. Technical Report OSU-CISRC-6/08-TR29. Department of Computer Science and Engineering, The Ohio State University, June 2008. [pdf]

  3. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories. Technical Report OSU-CISRC-2/08-TR05. Department of Computer Science and Engineering, The Ohio State University, February 2008. [pdf]

  4. M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs. Technical Report OSU-CISRC-12/07-TR78. Department of Computer Science and Engineering, The Ohio State University, December 2007. [pdf]

  5. U. Bondhugula, J. Ramanujam, and P. Sadayappan. PLUTO: A Practical and Fully Automatic Polyhedral Program Optimization System. Technical Report OSU-CISRC-11/07-TR70, Department of Computer Science and Engineering, Ohio State University, November 2007. [pdf]

  6. U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences. Technical Report OSU-CISRC-5/07-TR43, Department of Computer Science and Engineering, Ohio State University, May 2007. [pdf]

  7. A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner, D. Bernholdt, S. Hirata, C. Lam, R. Pitzer, J. Ramanujam and P. Sadayappan. Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations. Technical Report OSU-CISRC-2/05-TR10, Dept. of Comp. and Info. Sci., The Ohio State University, 2005. [pdf]

  8. X. Gao, S. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J. Ramanujam, and P. Sadayappan. Compiler Techniques for Efficient Parallelization of Out-of-Core Tensor Contractions. Technical Report OSU-CISRC-12/04-TR67, Dept. of Comp. and Info. Sci., The Ohio State University, 2004. [pdf]

  9. D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam, M. Nooijen, D. Bernholdt, and R. Harrison. Space-Time Trade-Off Optimization for a Class of Electronic Structure Calculations. Technical Report OSU-CISRC-11/01-TR24, Dept. of Comp. and Info. Sci., The Ohio State University, 2001. [pdf]

  10. M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, "A locality optimization algorithm based on explicit representation of data layouts," Technical Report CSE-00-008, Department of Computer Science and Engineering, The Pennsylvania State University, May 2000.

  11. M. Kandemir, J. Ramanujam, and A. Choudhary, "A compiler algorithm for optimizing locality in loop nests," Technical Report CPDC-TR-9802-010, Center for Parallel and Distributed Computing, Northwestern University, February 1998.

  12. M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy and P. Banerjee, "Enhancing spatial locality using data layout optimizations," Technical Report CPDC-TR-97-07, Center for Parallel and Distributed Computing, Northwestern University, December 1997.

  13. M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "Experiments with data layouts," Technical Report CPDC-TR-97-06, Center for Parallel and Distributed Computing, Northwestern University, October 1997.

  14. M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "A hyperplane based approach for optimizing spatial locality in loop nests," Technical Report CPDC-TR-97-04, Center for Parallel and Distributed Computing, Northwestern University, October 1997.

  15. M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam and N. Shenoy, "A combined communication and synchronization optimization algorithm for one-way communication," Technical Report CPDC-TR-97-03, Center for Parallel and Distributed Computing, Northwestern University, October 1997.

  16. M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam and N. Shenoy, "Optimizing communication using global dataflow analysis," Technical Report CPDC-TR-97-02, Center for Parallel and Distributed Computing, Northwestern University, October 1997.

  17. M. Kandemir, R. Bordawekar, A. Choudhary, and J. Ramanujam, "A unified tiling approach for out-of-core computations," Technical Report CACR-131, Center for Advanced Computing Research, California Institute of Technology, November 1996.

  18. R. Bordawekar, A. Choudhary, and J. Ramanujam, "A framework for integrated communication and I/O placement," Technical Report CACR-118, Center for Advanced Computing Research, California Institute of Technology, February 1996.

  19. R. Bordawekar, A. Choudhary, and J. Ramanujam, "Automatic optimization of communication in compiling out-of-core stencil codes," Technical Report CACR-114, Center for Advanced Computing Research, California Institute of Technology, November 1995.

  20. R. Bordawekar, A. Choudhary, and J. Ramanujam, "Compilation and communication strategies for out-of-core programs on distributed-memory machines," Technical Report CACR-113, Center for Advanced Computing Research, California Institute of Technology, November 1995.

  21. S. Kaushik, C. Huang, J. Ramanujam and P. Sadayappan, "Multiphase array redistribution: A communication efficient approach to array redistribution," Technical Report OSU-CISRC-9/94-TR52, The Ohio State University, September 1994.

Other Articles

  1. S. Pande, J. Ramanujam, and Y. Robert, "Compiling for scalable parallel systems," Editorial Note, Parallel Processing Letters, Vol. 7, No. 4, 1997.

  2. M. Kandemir, A. Choudhary, J. Ramanujam and R. Bordawekar, "Optimizing out-of-core computations in uniprocessors," Newsletter of the Technical Committee on Computer Architecture (TCCA), IEEE Computer Society, Special Issue on Interaction Between Compilers and Computer Architecture, pp. 25-27, June 1997.

Presentations at Symposia (no associated publications)

  1. P. Sadayappan, Atanas Rountev, Robert Harrison, J. Ramanujam, Gerald Baumgartner, and Jarek Nieplocha, "Runtime Support for Multi-scale Applications on High-end Systems," talk, NSF Next Generation Software (NGS) 2007 Workshop, held in conjunction with IEEE International Parallel and Distributed Processing Symposium, Long Beach, California, USA, March 25-26, 2007.

  2. P. Sadayappan, A. Auer, G. Baumgartner, D. Bernholdt, R. Harrison, S. Hirata, C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, A. Bibireata, X. Gao, S. Krishnamoorthy, S. Krishnan, Q. Lu, and A. Sibiryakov, "Performance optimization issues in automatic synthesis of high-performance codes for correlated electronic structure methods," talk, 228th ACS National Meeting, Philadelphia, PA, August 2004.

  3. P. Sadayappan, A. Auer, G. Baumgartner, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, and A. Sibiryakov, "A High-Level Approach to the Synthesis of High-Performance Codes for Quantum Chemistry," poster, 44th Sanibel Symposium, University of Florida Quantum Theory Project, Sanibel, FL, March 2004.

  4. D. Bernholdt, A. Auer, G. Baumgartner, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, A. Sibiryakov, and J. White, "A High-Level Approach to the Synthesis of High-Performance Codes for Quantum Chemistry," poster, Los Alamos Computer Science Institute Symposium (LACSI), Los Alamos, NM, October 2003.

  5. D. Bernholdt, A. Auer, G. Baumgartner, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Synthesizing Highly Optimized Code for Correlated Electronic Structure Calculations," talk, 226th ACS National Meeting, New York, NY, September 2003.

  6. P. Sadayappan, A. Auer, G. Baumgartner, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, and A. Sibiryakov, "Automatic Synthesis of High-Performance Parallel Programs for Electronic Structure Methods," poster, 226th ACS National Meeting, New York, NY, September 2003.

  7. G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, R. Pitzer, A. Bibireata, X. Gao, Q. Lu, S. Krishnamoorthy, S. Krishnan, A. Sibiryakov, D. Bernholdt, R. Harrison, V. Choppella, S. Hirata, M. Nooijen, A. Auer, and J. Ramanujam, "Synthesis of High-Performance Algorithms for Electronic Structure Calculations," poster, 225th ACS National Meeting, New Orleans, LA, March 2003.

  8. M. Nooijen, A. Auer, D. Bernholdt, V. Choppella, D. Dean, R. Harrison, T. Papenbrock, M. Strayer, T. White, S. Hirata, G. Baumgartner, D. Cociorva, P. Sadayappan, R. Pitzer, A. Bibireata, X. Gao, Q. Lu, S. Krishnamoorthy, S. Krishnan, A. Sibiryakov, and J. Ramanujam, "Computer-Aided Implementation of Many Body Methods: The Tensor Contraction Engine," talk, 225th ACS National Meeting, New Orleans, LA, March 2003.

  9. G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, S. Hirata, C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and V. Choppella, "A High-Level Approach to the Synthesis of High-Performance Codes for Quantum Chemistry," poster, 43rd Sanibel Symposium, University of Florida Quantum Theory Project, Sanibel, FL, February 2003.

  10. D. Bernholdt, V. Choppella, D. Dean, R. Harrison, T. Papenbrock, M. Strayer, T. White, S. Hirata, G. Baumgartner, D. Cociorva, R. Pitzer, P. Sadayappan, J. Ramanujam, M. Nooijen, and A. Auer, "A High-Level Approach to the Synthesis of High-Performance Codes for Quantum Chemistry," invited talk, University of Tennessee Chemical Physics Workshop, Knoxville, Tennessee, February 2003.

  11. D. Bernholdt, V. Choppella, D. Dean, R. Harrison, T. Papenbrock, M. Strayer, T. White, S. Hirata, G. Baumgartner, D. Cociorva, R. Pitzer, P. Sadayappan, J. Ramanujam, M. Nooijen, and A. Auer, "A High-Level Approach to the Synthesis of High-Performance Codes for Quantum Chemistry," talk, SIAM Computational Science and Engineering '03, San Diego, California, February 2003.

  12. S. Hirata, G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, C. Lam, M. Nooijen, J. Ramanujam, and P. Sadayappan, "Operator and Tensor Contraction Engines -- Computer-Aided Synthesis of Coupled-Cluster Programs of any given Excitation Order," talk, American Conference of Theoretical Chemistry, Pittsburgh, PA, July 2002.

  13. G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, S. Hirata, C. Lam, M. Nooijen, J. Ramanujam, and P. Sadayappan, "Compilation of a High-Level Quantum Chemistry Language into Efficient Parallel Code," talk, Spring 2002 Workshop of the Midwest Society for Programming Languages and Systems, Indiana University, Bloomington, IN, April 2002.
