Workshop on Compilers and Operating Systems for Low Power (COLP'03)

In conjunction with
PACT'03: International Conference on Parallel Architectures and Compilation Techniques
New Orleans, Louisiana, USA.
September 27, 2003


Accepted Papers

Evaluating the Potential of an Energy Reduction Technique Based on Timing Constraint Speculation

Asami Tanino and Toshinori Sato

Keywords: timing constraints, speculative execution, fault tolerance


Recently, we proposed a technique for improving energy efficiency named Constructive Timing Violation (CTV) [17]. In this paper, to evaluate the practicality of CTV, we design a Carry Select Adder (CSLA) in Verilog HDL and investigate how the distribution of timing failures depends on the timing constraints. We find that CTV improves energy efficiency by over 40%, while the fault probability is 50%, when the clock frequency is twice the frequency that satisfies the timing constraints imposed by the critical paths.

Energy-Constrained Performance Optimizations For Real-Time Operating Systems

Tarek A. AlEnawy and Hakan Aydin

Keywords: Real-time scheduling, real-time operating systems, energy management, power-aware scheduling, power-aware systems.


In energy-constrained settings, most real-time operating systems take the approach of minimizing the energy consumption while meeting all the task deadlines. However, it is possible that the available energy budget is not sufficient to meet all deadlines and some deadlines will inevitably have to be missed. In this paper, we present a framework through which the operating system can select jobs for execution in order to achieve two alternative performance objectives: 1) maximizing the number of deadlines met, and 2) maximizing the total reward (utility) of jobs that meet their deadlines during the operation. We present an optimal algorithm that achieves the first objective. We prove that achieving the latter objective is NP-hard and propose some fast heuristics for this problem. We evaluate the performance of the heuristics through simulation studies.
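The two objectives can be illustrated with a deliberately simplified sketch. It assumes each job has a fixed energy cost and a reward, and it ignores timing feasibility entirely, so it is not the algorithm or the heuristics of the paper. The job tuples and function names are hypothetical. Under these assumptions, picking the cheapest jobs first maximizes the number of jobs that fit in the budget, while the reward objective reduces to a knapsack problem, for which a greedy reward-per-energy ordering is one common heuristic.

```python
def select_max_count(jobs, budget):
    """Pick the cheapest jobs first until the energy budget is exhausted.

    jobs: list of (name, energy, reward) tuples with energy > 0 (hypothetical format).
    """
    chosen, used = [], 0.0
    for name, energy, _reward in sorted(jobs, key=lambda j: j[1]):
        if used + energy > budget:
            break  # every remaining job costs at least as much
        chosen.append(name)
        used += energy
    return chosen


def select_by_reward_density(jobs, budget):
    """Greedy heuristic: take jobs in order of reward per unit of energy."""
    chosen, used = [], 0.0
    for name, energy, _reward in sorted(jobs, key=lambda j: j[2] / j[1], reverse=True):
        if used + energy <= budget:
            chosen.append(name)
            used += energy
    return chosen


if __name__ == "__main__":
    jobs = [("a", 4, 10), ("b", 2, 3), ("c", 3, 9), ("d", 5, 5)]
    print(select_max_count(jobs, budget=7))
    print(select_by_reward_density(jobs, budget=7))
```

With the sample jobs and a budget of 7, the two policies pick different sets of equal size: the density heuristic trades the cheapest job for a more rewarding one.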

Event-Driven Energy Accounting of Dynamic Thermal Management

Frank Bellosa, Simon Kellner, Martin Waitz, and Andreas Weissel

Keywords: Operating System, Power Management, Energy Estimation, Thermal Management, Scheduling


With increasing clock speed and level of integration in today's processors, memories, and I/O-controllers, power dissipation is becoming a definitive concern of system design. Control-theoretic techniques have proven able to manage heat dissipation and temperature, from the level of functional blocks within the processor up to the level of complete systems, so that a thermal emergency is never reached. However, application-, user- or service-specific requirements had to be neglected. In this work we investigate dynamic thermal management with respect to the demands of individual applications, users or services. We present an event-driven approach to determine on-the-fly the energy consumption on a fine-grained level and describe a model to estimate the temperature without the need for measurement. With this power and thermal model--combined with the well-known facility of resource containers--it is possible to throttle the execution of individual tasks according to their energy-specific characteristics and the thermal requirements of the system. In addition to throttling we investigate a modified process scheduler which allots CPU time according to the power contribution of each task to the current temperature level of the processor. Experiments using a Pentium 4 architecture running a modified Linux show that a given temperature limit for the CPU will not be exceeded while tasks are scheduled according to their energy consumption.
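A minimal sketch of how counter-based energy estimation and a temperature estimate could fit together is shown below. It assumes a weighted sum of event counts for energy and a first-order RC thermal model; the abstract does not specify either model, and every event name, weight, and thermal constant here is a hypothetical placeholder.

```python
def estimate_energy_j(event_counts, energy_per_event_j):
    """Estimate energy as a weighted sum of event-counter deltas (assumed model)."""
    return sum(count * energy_per_event_j[name] for name, count in event_counts.items())


def step_temperature(temp_c, power_w, dt_s, ambient_c=25.0, r_k_per_w=0.5, c_j_per_k=20.0):
    """Advance an assumed first-order RC thermal model by one explicit Euler step.

    dT/dt = P / C - (T - T_ambient) / (R * C)
    """
    d_temp = (power_w / c_j_per_k - (temp_c - ambient_c) / (r_k_per_w * c_j_per_k)) * dt_s
    return temp_c + d_temp


def should_throttle(temp_c, limit_c):
    """Throttle once the estimated temperature reaches the configured limit."""
    return temp_c >= limit_c


if __name__ == "__main__":
    temp = 25.0
    for _ in range(2000):  # 200 s of constant 50 W load in 0.1 s steps
        temp = step_temperature(temp, power_w=50.0, dt_s=0.1)
    print(round(temp, 2), should_throttle(temp, limit_c=45.0))
```

In this model the temperature under constant power settles at ambient plus power times thermal resistance, which gives a simple way to sanity-check the chosen constants.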

Parametric Intra-Task Dynamic Voltage Scheduling

Burt Walsh, Robert van Engelen, Kyle Gallivan, Johnnie Birch, and Yixin Shou

Keywords: DVS, WCET, parametric, loops, IntraVS


This paper presents a parametric intra-task dynamic voltage scheduling (IntraVS) method that scales voltage/frequency based upon the parameterization of the remaining worst case execution cycles (RWEC) of a task. The parametric RWEC of the task is determined by static analysis of code. The parameterization of the RWEC of loops with symbolic bounds, whose iteration space sizes are runtime dependent, allows the strategic placement of voltage scaling operations early on the execution path before the loop actually executes. This allows for greater energy reduction than with methods that scale voltage after loops are executed.
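The idea of scaling before a loop runs can be sketched as follows, assuming a task made of a setup block, one loop with a runtime bound n, and an exit block. The cycle counts, the frequency levels, and the function names are hypothetical and are not taken from the paper; the sketch only shows how an RWEC expression in n can be evaluated once n is known and turned into a frequency choice.

```python
def rwec_cycles(n, setup_cycles=1_000, cycles_per_iter=200, exit_cycles=500):
    """Parametric RWEC at task entry, as an expression in the loop bound n."""
    return setup_cycles + n * cycles_per_iter + exit_cycles


def required_frequency_hz(remaining_cycles, time_left_s):
    """Lowest frequency that completes the remaining worst-case cycles in time."""
    if time_left_s <= 0:
        raise ValueError("no time left before the deadline")
    return remaining_cycles / time_left_s


def pick_frequency(n, deadline_s, available_hz):
    """Choose the slowest available frequency that still meets the deadline."""
    needed = required_frequency_hz(rwec_cycles(n), deadline_s)
    for freq in sorted(available_hz):
        if freq >= needed:
            return freq
    raise ValueError("deadline cannot be met at any available frequency")


if __name__ == "__main__":
    levels = [100e6, 200e6, 400e6]
    for n in (100, 900, 1000):
        print(n, pick_frequency(n, deadline_s=1e-3, available_hz=levels))
```

Because the bound is evaluated before the loop starts, a small n lets the whole loop run at the lowest level instead of starting fast and slowing down only afterwards.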

Performance and Energy Optimization of Multimedia Applications using DMA Combined with Prefetch

M. Dasygenis, E. Brockmeyer, D. Soudris, F. Catthoor, A. Thanailakis, and G. Papakostas

Keywords: DMA, software prefetching, pipeline, embedded, memory access latency


Memory access latency is becoming an increasing performance problem in modern embedded systems. While throughput can be addressed by simply widening data paths and by using several memory banks (at the cost of area and energy), no such simple solution strategy is available for reducing latency. To counter this bottleneck, numerous techniques, methodologies or performance enhancements, involving hardware and/or software means, have been proposed to hide memory access latency. This paper presents a novel systematic approach to hide this latency using the Direct Memory Access (DMA) mode, which is present in all modern memories, combined with a software prefetch mechanism. It is shown that in applications that make use of large block transfers, the off-chip memory accesses can be hidden from the processor, resulting in efficient designs in terms of performance (while still reducing energy for the same task execution). Experimental results on six well-known multimedia and imaging applications, measured using the TI C6201 Device Simulator, illustrate that the memory latency can be masked efficiently, improving the performance by more than 45%.
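Why overlapping block transfers with computation hides latency can be seen from a small timing model. The sketch below assumes a generic double-buffered pipeline with a fixed load time and compute time per block; it is not the paper's methodology, and the function names and numbers are illustrative only.

```python
def serial_time(n_blocks, load, compute):
    """Total time when each block is loaded, then processed, with no overlap."""
    return n_blocks * (load + compute)


def overlapped_time(n_blocks, load, compute):
    """Total time when the load of block i+1 overlaps the processing of block i.

    Only the first load and the last compute are fully exposed; in between,
    each step costs the slower of the two activities.
    """
    if n_blocks == 0:
        return 0
    return load + (n_blocks - 1) * max(load, compute) + compute


if __name__ == "__main__":
    n, load, compute = 10, 4, 6
    print(serial_time(n, load, compute), overlapped_time(n, load, compute))
```

When computation per block takes at least as long as the transfer, almost all of the transfer time disappears from the critical path.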

Fast System-Level Prototyping of Power-Aware Dynamic Memory Managers for Embedded Systems

David Atienza, Stylianos Mamagkakis, Marc Leeman, Francky Catthoor, Jose M. Mendias, Dimitrios Soudris, Geert Deconinck

Keywords: dynamic memory management, code-level power consumption profiling


The rapid evolution in sub-micron process technology now allows more complex systems to be implemented in embedded devices. In the near future, portable consumer devices must run multimedia and wireless network applications that require enormous computational performance (1-40GOPS) at low energy consumption (0.1-2W). In these multimedia and wireless network applications, the dynamic memory subsystem is currently one of the main sources of power consumption, and its inattentive management can severely affect the performance and power consumption of the whole system. Within this context, the construction and system-level power evaluation of custom dynamic memory managers is one of the most important and difficult parts of efficiently mapping such dynamic applications onto low-power and high-speed embedded systems. Moreover, they are subjected to design-time constraints due to market competition. As a result, current design technologies fall behind these requirements, and consistent high-level design methodologies that can handle such complexity and enable a short time-to-market are greatly needed. In this paper, we present a new system-level approach to model complex custom dynamic memory managers integrating a detailed power profiling method. This approach enables the developer to obtain power consumption estimates, memory usage and memory access values to refine the dynamic memory management of the system in a very early stage of the design flow and to explore the large search space of dynamic memory manager implementations without a time-consuming programming effort.

Background Data Format Optimization for Efficient Sub-Word Parallel Program Generation

P. Op de Beeck, M. Miranda, F. Catthoor and G. Deconinck

Keywords: subword parallelism, simd, low power, data layout, memory


This paper illustrates the strong interaction between background data format organization and foreground data in the context of speed and power efficient Sub-Word level Parallel (SWP) program generation. Such interaction, if not considered well, results in an excess of (un)packing operations that are typically required to match the format of data stored in memory and the one required by the sub-word parallel processing units. To avoid this overhead, a careful selection of the data format for background storage is needed. We propose a conceptual methodology for efficiently exploiting this degree of freedom. The approach is demonstrated on two real-life application kernels, where source code level data format transformations are applied to the data in background memory such that a substantially better exploitation of the sub-word level functional units can be achieved by the compiler. A significant reduction in packing instruction overhead, data (up to a factor of 4) and instruction memory accesses (up to 50%) is obtained. This results in a substantial reduction of power consumed by the register file (fewer (un)packing operations), the data bus (fewer transactions), the data memory hierarchy (fewer accesses) and the instruction memory hierarchy (fewer accesses and more compact code). Also, the execution time for the optimized application running on a TriMedia TM1300 evaluation board shows substantial gains.

Self-Configuring Heterogeneous Server Clusters

Taliver Heath, Bruno Diniz, Enrique V. Carrera, Wagner Meira Jr., and Ricardo Bianchini

Keywords: energy conservation, heterogeneous clusters, server clusters


Previous research on cluster-based servers has focused on homogeneous systems. However, real-life clusters are almost invariably heterogeneous in terms of the performance, capacity, and power consumption of their hardware components. In this paper, we describe a self-configuring Web server for a heterogeneous cluster. The self-configuration is guided by analytical models of throughput and power consumption. Our experimental results for a cluster comprised of traditional and blade nodes show that the model-based server can consume 29% less energy than an energy-oblivious server, with only a negligible loss in throughput. The results also show that our server conserves more than twice as much energy as an energy-conscious server that we previously proposed for homogeneous clusters.

Dual Mode Algorithm for Energy Aware Fixed Priority Scheduling with Task Synchronization

Ravindra Jejurikar, Rajesh Gupta

Keywords: low power, synchronization, rate monotonic scheduling, frequency inheritance, slowdown factor.


Slowdown factors determine the extent of slowdown a computing system can experience based on functional and performance requirements. Dynamic Voltage Scaling (DVS) of a processor based on slowdown factors can lead to considerable energy savings. We address the problem of computing static and dynamic slowdown factors in the presence of task synchronization. In this work, tasks are allowed to have different slowdown factors based on the task characteristics. We present the Dual Mode (DM) algorithm for the Rate Monotonic (RM) scheduling policy. We introduce the notion of frequency inheritance, which is required to guarantee the task deadlines. Sufficient constraints have been identified for the feasibility of the task set under synchronization. We formulate the problem of computing the static slowdown factors for tasks as an optimization problem to minimize the total energy consumption of the system. Our simulation experiments show energy gains of 10% on average over the known slowdown techniques.
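To give a feel for what a static slowdown factor is, the sketch below computes a single uniform factor for a rate-monotonic task set using the classical utilization-bound test with a blocking term. This is a simplification chosen for illustration: it is not the Dual Mode algorithm, it does not model frequency inheritance or per-task factors, and it assumes blocking time scales with the same factor as execution time. The task tuple format and function name are hypothetical.

```python
def uniform_slowdown_factor(tasks):
    """Smallest uniform speed factor in (0, 1] that passes a sufficient RM test.

    tasks: list of (wcet, period, blocking) measured at full speed.
    For each priority level i (shortest period first) the test requires
        (sum_{k<=i} C_k / T_k + B_i / T_i) / eta <= i * (2**(1/i) - 1).
    Returns None if the test fails even at full speed.
    """
    eta = 0.0
    utilization = 0.0
    ordered = sorted(tasks, key=lambda t: t[1])  # rate-monotonic priority order
    for i, (wcet, period, blocking) in enumerate(ordered, start=1):
        utilization += wcet / period
        bound = i * (2 ** (1 / i) - 1)
        eta = max(eta, (utilization + blocking / period) / bound)
    return eta if eta <= 1.0 else None


if __name__ == "__main__":
    print(uniform_slowdown_factor([(1, 10, 0), (2, 20, 0)]))
    print(uniform_slowdown_factor([(1, 10, 2), (2, 20, 0)]))
```

Adding blocking time to the highest-priority task raises the factor, which shows in a small way why synchronization limits how far the processor can be slowed down.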

Workshop Co-Chairs

Diana Marculescu, Carnegie Mellon University  
J. Ramanujam, Louisiana State University