The MERIT Summer Research Program
The A. James Clark School of Engineering, University of Maryland

ICE Project Descriptions: Summer 2001

1. Automated Synthesis of Embedded Multiprocessors
Prof. Shuvra Bhattacharyya

Unlike general-purpose multiprocessors, multiprocessor systems for embedded applications (such as cellular phones, videoconferencing systems, radar devices, etc.) can be streamlined to support a specific set of high-level functions. Furthermore, issues of backward compatibility and fast compilation times are not of major concern because these systems are rarely, if ever, modified after production. This dramatically increases the design space that can be considered when implementing embedded computing systems.

Due to the vast and complex nature of the design space for an embedded application, the development of automated tools for system-level synthesis is of increasing importance. Such a tool takes as input a high-level language specification of an embedded application; a library of hardware components (such as different types of microprocessors, application-specific integrated circuits, memories, and buses); and a set of optimization constraints and objectives (for example, "find an implementation that minimizes the overall rate"). Given these inputs, the tool attempts to derive an efficient hardware architecture (a collection of processing components and their interconnections), and a mapping of the specified application onto this architecture.
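To make these inputs concrete, the minimal sketch below shows the kind of decision such a tool automates: a handful of tasks, a two-entry processor library with per-processor execution times, and a greedy rule that places each task on whichever processor finishes it earliest. The task names, the cost figures, and the greedy rule itself are invented for illustration; they are not part of the actual synthesis tool.

    /* Toy illustration of system-level synthesis inputs: a set of tasks with
     * execution times per processor type, and a greedy mapper that assigns
     * each task to whichever processor finishes it earliest.
     * All names and numbers are invented for illustration. */
    #include <stdio.h>

    #define NTASKS 4
    #define NPROCS 2

    int main(void) {
        const char *task[NTASKS] = { "filter", "fft", "quantize", "encode" };
        /* exec_time[t][p]: cycles task t needs on processor p (e.g., DSP vs. ASIC) */
        int exec_time[NTASKS][NPROCS] = {
            { 40, 25 }, { 80, 30 }, { 20, 35 }, { 60, 50 }
        };
        int ready[NPROCS] = { 0, 0 };   /* time at which each processor becomes free */
        for (int t = 0; t < NTASKS; t++) {
            int best = 0, best_finish = ready[0] + exec_time[t][0];
            for (int p = 1; p < NPROCS; p++) {
                int finish = ready[p] + exec_time[t][p];
                if (finish < best_finish) { best = p; best_finish = finish; }
            }
            ready[best] = best_finish;
            printf("%-8s -> processor %d (finishes at cycle %d)\n",
                   task[t], best, best_finish);
        }
        int makespan = ready[0] > ready[1] ? ready[0] : ready[1];
        printf("estimated makespan: %d cycles\n", makespan);
        return 0;
    }

A real synthesis tool would, of course, also choose which components to instantiate and would search the design space with far stronger optimization methods than this one-pass greedy rule.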

This project will involve the design, implementation, and evaluation of algorithms for system-level synthesis. Students will gain experience designing and implementing complex software that involves the following concepts:

  • Technology- and implementation-dependent modeling of application functionality using fundamentals of graph theory;
  • Deterministic heuristics vs. stochastic optimization techniques, such as genetic algorithms and simulated annealing, for exploring complex design spaces.

Additionally, students will work with the interaction of these two optimization methodologies, as well as with the following topics:

  • Multi-objective (Pareto) optimization. In a design space involving several implementation metrics (such as throughput, power consumption, and cost), a Pareto point is a design that is not dominated by any other design; that is, no other design is at least as good in every metric and strictly better in at least one (see the sketch after this list);
  • Modeling and simulation of embedded multiprocessor architectures;
  • Hardware/software co-design.

Students will gain practical research experience related to the fields of electronic design automation (representative companies: Cadence and Synopsys), compiler technology (Hewlett-Packard, Microsoft), and digital signal processing (Texas Instruments, Motorola).
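As a concrete illustration of Pareto points, the following sketch filters a handful of invented design points, keeping only those that no other design dominates on throughput, power, and cost. The design names and metric values are assumptions made purely for illustration.

    /* Illustrative Pareto filter over invented design points.
     * Design a dominates design b if a is at least as good in every metric
     * (higher throughput, lower power, lower cost) and strictly better in one. */
    #include <stdio.h>

    typedef struct { const char *name; double throughput, power, cost; } Design;

    static int dominates(const Design *a, const Design *b) {
        int ge = a->throughput >= b->throughput && a->power <= b->power && a->cost <= b->cost;
        int gt = a->throughput >  b->throughput || a->power <  b->power || a->cost <  b->cost;
        return ge && gt;
    }

    int main(void) {
        Design d[] = {
            { "A", 100.0, 2.0, 10.0 },
            { "B", 120.0, 3.5, 12.0 },
            { "C",  90.0, 2.5, 11.0 },   /* dominated by A */
            { "D", 120.0, 3.0,  9.0 },   /* dominates B */
        };
        int n = sizeof d / sizeof d[0];
        for (int i = 0; i < n; i++) {
            int pareto = 1;
            for (int j = 0; j < n; j++)
                if (j != i && dominates(&d[j], &d[i])) { pareto = 0; break; }
            if (pareto)
                printf("%s is a Pareto point\n", d[i].name);
        }
        return 0;
    }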

2. Multi-Threaded Processor Architectures
Prof. Manoj Franklin

The primary objective of the project is to investigate techniques to carry out different aspects of multi-threading so as to exploit instruction-level and thread-level parallelism from ordinary programs. The techniques to be investigated could be software (compiler-based) or hardware, depending on the student's interest. In either case, the project would involve working with existing software tools and simulators, enhancing them, and conducting simulation experiments with the modified simulators. This project should give undergraduate students both first-hand experience of how research is done in the computer architecture field and a good sense of the latest research in this area.
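As a purely conceptual illustration of thread-level parallelism (the project itself works with architectural simulators, not pthreads code), the sketch below splits an ordinary summation loop across several worker threads and combines their partial sums.

    /* Illustration of thread-level parallelism: an ordinary summation loop
     * split across worker threads. This is only a conceptual example, not
     * part of the project's simulator infrastructure. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4

    static double data[N];

    typedef struct { int lo, hi; double partial; } Chunk;

    static void *sum_chunk(void *arg) {
        Chunk *c = arg;
        c->partial = 0.0;
        for (int i = c->lo; i < c->hi; i++)
            c->partial += data[i];
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            data[i] = 1.0;                      /* expected total: N */

        pthread_t tid[NTHREADS];
        Chunk chunk[NTHREADS];
        for (int t = 0; t < NTHREADS; t++) {    /* carve the loop into chunks */
            chunk[t].lo = t * (N / NTHREADS);
            chunk[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
            pthread_create(&tid[t], NULL, sum_chunk, &chunk[t]);
        }
        double total = 0.0;
        for (int t = 0; t < NTHREADS; t++) {    /* join and combine partial sums */
            pthread_join(tid[t], NULL);
            total += chunk[t].partial;
        }
        printf("total = %.0f\n", total);
        return 0;
    }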

3. Dynamic Memory-Management in Embedded Real-Time Systems
Prof. Bruce Jacob

Memory management has recently made the transition from general-purpose systems to embedded systems, in part to facilitate the rapid development of embedded applications.

It is playing an increasingly significant role in embedded systems as more designers take advantage of low-overhead embedded operating systems that provide virtual memory (for example, Windows CE or Inferno), and as more designers choose object-oriented software platforms in which run-time garbage collection is pervasive (for example, Sun's Java Virtual Machine or Hewlett-Packard's HP runtime environment). However, the MMUs in today's embedded processors are virtually identical to those in high-performance processors, despite the fact that embedded systems have significantly different goals than high-performance systems. Most embedded processors either have a full MMU or none at all; Windows CE compliance requires a full MMU. A few exceptions exist, such as the rudimentary MMUs of the ARM740T and ARM940T (these are simple protection units rather than full address-translation units: they support some but not all of the features of virtual memory); the design is worth exploring, but it cannot be used to support Windows CE.

This project explores the design space for embedded-system memory management and characterizes the issues on both the hardware and software sides of the interface (Jacob & Mudge, 1997; Jacob & Mudge, 1998). We are also developing a combined hardware-software approach to real-time memory management that achieves the following goals: (1) the performance of the memory-management software is deterministic and lends itself to simple timing analysis; (2) the memory-management code is extremely small; and (3) the memory-management hardware is smaller and less power hungry than in present designs (MMUs often use structures that are relatively large and consume lots of power).
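One common way to make memory-management timing deterministic, sketched below for illustration, is a fixed-size block pool allocator whose allocate and free operations each take constant time. This is a generic textbook technique, not the specific hardware/software design being developed in the project.

    /* Sketch of a fixed-size block pool allocator, one common way to make
     * allocation and deallocation run in constant, easily analyzable time.
     * A generic illustration, not the project's proposed design. */
    #include <stdio.h>
    #include <stddef.h>

    #define BLOCK_SIZE 64
    #define NUM_BLOCKS 32

    static unsigned char pool[NUM_BLOCKS][BLOCK_SIZE];
    static void *free_list[NUM_BLOCKS];
    static int   free_top;

    static void pool_init(void) {
        for (int i = 0; i < NUM_BLOCKS; i++)
            free_list[i] = pool[i];
        free_top = NUM_BLOCKS;
    }

    /* O(1): pop a block off the free stack (NULL if the pool is exhausted). */
    static void *pool_alloc(void) {
        return free_top > 0 ? free_list[--free_top] : NULL;
    }

    /* O(1): push the block back onto the free stack. */
    static void pool_free(void *p) {
        free_list[free_top++] = p;
    }

    int main(void) {
        pool_init();
        void *a = pool_alloc();
        void *b = pool_alloc();
        printf("allocated %p and %p in constant time\n", a, b);
        pool_free(a);
        pool_free(b);
        return 0;
    }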

4. Embedded DRAM Organizations
Prof. Bruce Jacob

The growing gap between memory access time and processor speed in recent years has led processor architects, DRAM architects, and memory-system designers to rely heavily on high-performance mechanisms such as lockup-free caches, out-of-order execution, hardware and software prefetching mechanisms, and multi-threading. These mechanisms are quite effective at reducing, hiding, or tolerating large memory latencies; however, they do so at the expense of exacerbating the memory bandwidth problem (Burger, 1996).

One trend that is helping to solve the bandwidth problem is the development of new DRAM architectures, such as Synchronous DRAM, Enhanced Synchronous DRAM, Synchronous Link, Virtual Channel, and Rambus. All of these architectures are improvements over the traditional DRAM architecture; our studies show that the newest members of the set reduce bandwidth overhead by a factor of four compared to the oldest members (Cuppu, 1999). Another trend that can help solve the bandwidth problem is the use of embedded DRAM-processor organizations that incorporate the DRAM array onto the same die as the processor core (Kozyrakis, 1997; Sase, 1997; Nunomura, 1997). This provides several benefits, including a wider memory bus, a faster memory bus, and drastically reduced energy consumption (Fromm, 1997) due largely to the reduced number of off-chip memory requests.
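To see why a wider, faster on-chip bus matters, the back-of-the-envelope calculation below compares the peak bandwidth of a hypothetical off-chip interface with that of a hypothetical embedded-DRAM interface. The bus widths and clock rates are invented round numbers, not measurements from the cited studies.

    /* Back-of-the-envelope peak-bandwidth comparison. The bus widths and clock
     * rates are illustrative round numbers, not figures from the cited studies. */
    #include <stdio.h>

    static double peak_bw_mb_per_s(int bus_bits, double clock_mhz) {
        /* peak bandwidth = bus width (bytes) x transfer rate (MHz) */
        return (bus_bits / 8.0) * clock_mhz;
    }

    int main(void) {
        double off_chip = peak_bw_mb_per_s(64, 100.0);    /* 64-bit bus, 100 MHz  */
        double on_chip  = peak_bw_mb_per_s(256, 200.0);   /* 256-bit bus, 200 MHz */
        printf("off-chip DRAM interface: %.0f MB/s\n", off_chip);  /* 800 MB/s  */
        printf("embedded DRAM interface: %.0f MB/s\n", on_chip);   /* 6400 MB/s */
        return 0;
    }

With these invented numbers, the on-chip interface offers eight times the peak bandwidth (6400 MB/s versus 800 MB/s), before even counting the energy saved by avoiding off-chip transfers.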

This project investigates future issues in memory-system design, processor organization, and execution models. To date, we have performed a thorough performance evaluation of DRAM architectures (Cuppu, 1999) and are currently investigating their real-time behavior. The embedded-DRAM organization, much as its sibling system-on-a-chip, is well positioned to serve as a foundation for a host of microprocessor-based execution models that can exploit tremendous memory bandwidth. Likely models include vector processing, single-chip parallel processing, and DSP.

5. Memory System Support for Pointer-Based Applications
Prof. Donald Yeung

The growing gap between memory access time and processor speed in recent years has led processor architects, DRAM architects, and memory-system designers to rely heavily on high-performance mechanisms such as lockup-free caches, out-of-order execution, hardware and software prefetching mechanisms, and multi-threading. These mechanisms are quite effective at reducing, hiding, or tolerating large memory latencies; however, they do so at the expense of exacerbating the memory bandwidth problem (Burger, 1996).

One trend that is helping to solve the bandwidth problem is the development of new DRAM architectures, such as Synchronous DRAM, Enhanced Synchronous DRAM, Synchronous Link, Virtual Channel, and Rambus. All of these architectures are improvements over the traditional DRAM architecture; our studies show that the newest members of the set reduce bandwidth overhead by a factor of four compared to the oldest members (Cuppu, 1999). Another trend that can help solve the bandwidth problem is the use of embedded DRAM-processor organizations that incorporate the DRAM array onto the same die as the processor core (Kozyrakis, 1997; Sase, 1997; Nunomura, 1997). This provides several benefits, including a wider memory bus, a faster memory bus, and drastically reduced energy consumption (Fromm, 1997) due largely to the reduced number of off-chip memory requests.

This project investigates future issues in memory-system design, processor organization, and execution models. To date, we have performed a thorough performance evaluation of DRAM architectures (Cuppu, 1999) and are currently investigating their real-time behavior. The embedded-DRAM organization, much as its sibling system-on-a-chip, is well positioned to serve as a foundation for a host of microprocessor-based execution models that can exploit tremendous memory bandwidth. Likely models include vector processing, single-chip parallel processing, and DSP; we have been investigating the appropriateness of the DSP model in particular.
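As an illustration of why pointer-based applications stress the memory system, the sketch below traverses a linked list: each iteration's load depends on the pointer fetched by the previous one, so the accesses serialize and the latency-hiding mechanisms described above have little to work with. This is a generic example, not code from the project.

    /* Illustration of why pointer-based code stresses the memory system:
     * traversing a linked list issues a chain of dependent loads, so each
     * access must wait for the one before it. Generic example only. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node { int value; struct Node *next; } Node;

    int main(void) {
        enum { N = 1000 };
        Node *head = NULL;
        for (int i = 0; i < N; i++) {          /* build a small list */
            Node *n = malloc(sizeof *n);
            n->value = i;
            n->next = head;
            head = n;
        }
        long sum = 0;
        for (Node *p = head; p != NULL; p = p->next)
            sum += p->value;                   /* each iteration depends on the
                                                  pointer loaded the iteration before */
        printf("sum = %ld\n", sum);
        while (head) { Node *next = head->next; free(head); head = next; }
        return 0;
    }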

6. Techniques for Minimizing Code Size in Compilation Targeting Embedded Systems
Prof. Rajeev Barua

Embedded systems refer to the class of application-specific computer systems that are used as controllers and monitors in a variety of consumer and business applications. Such embedded systems are ubiquitous today in household appliances, consumer electronics, communication systems, remote sensing, and vehicle control. While many similarities exist between general-purpose computer systems and embedded systems, many of the design criteria differ. For embedded systems, low cost, low power, and small code size are often far more important than performance at any cost.

An interesting project in this space is compilation of high-level code targeting embedded systems, with the objective of minimizing code size. A low code size is desirable when the entire machine code program is stored on-chip, contributing to low silicon area and power dissipation. Of course, the compiler must simultaneously optimize for the best performance possible at that code size. Note that code size is increased by several compiler transformations commonly employed to improve performance, such as loop unrolling and procedure inlining.

Given a certain code-size budget, an interesting question is where these code-size-increasing transformations can be employed most profitably so that the resulting code stays within budget. Such research would likely profit from profiling information coupled with intelligent heuristics (a toy version of such a heuristic is sketched below). The work will involve implementing the compiler algorithms and simulating the results. The evaluation would compare performance with both un-optimized code and code optimized without regard to code size. Time permitting, a comparison with hand-optimized code will also be done.
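The sketch below gives the flavor of such a heuristic: each candidate transformation (an inlining or unrolling site) carries a profile-derived speedup estimate and a code-size increase, and a greedy pass picks the candidates with the best speedup per byte until the budget is exhausted. The site names, the numbers, and the greedy rule are invented for illustration; a real compiler pass would be considerably more sophisticated.

    /* Sketch of a profile-guided heuristic for spending a code-size budget:
     * greedily apply the candidate transformations with the best estimated
     * speedup per byte of code growth. All names and numbers are invented. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        const char *site;      /* candidate call site or loop */
        double speedup;        /* estimated cycles saved (from profile) */
        int size_increase;     /* bytes of extra code */
    } Candidate;

    static int by_benefit_density(const void *a, const void *b) {
        const Candidate *x = a, *y = b;
        double dx = x->speedup / x->size_increase;
        double dy = y->speedup / y->size_increase;
        return (dy > dx) - (dy < dx);          /* sort in descending order */
    }

    int main(void) {
        Candidate c[] = {
            { "inline foo() at hot loop", 5000.0, 120 },
            { "unroll loop in dct()",     9000.0, 400 },
            { "inline bar() in main()",    300.0,  60 },
            { "unroll loop in filter()",  4000.0, 150 },
        };
        int n = sizeof c / sizeof c[0];
        int budget = 500;                       /* bytes of code growth allowed */

        qsort(c, n, sizeof c[0], by_benefit_density);
        for (int i = 0; i < n; i++) {
            if (c[i].size_increase <= budget) {
                budget -= c[i].size_increase;
                printf("apply: %-28s (+%d bytes)\n", c[i].site, c[i].size_increase);
            }
        }
        printf("unused budget: %d bytes\n", budget);
        return 0;
    }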

7. Protection Techniques for Intellectual Property Based System Design
Prof. Gang Qu

Advances in VLSI semiconductor technology and the system-on-a-chip design paradigm, coupled with the shrinking time-to-market window, have changed the traditional system design methodology. Design reuse and intellectual property (IP) based design have become more and more important. The key challenge for system designers today is to find suitable IP blocks and modify them as little as necessary to meet the customer's requirements in a timely fashion. Led by the Virtual Socket Interface Alliance (VSIA), system houses, semiconductor vendors, electronic design automation companies, and IP providers are working hard toward standards for IP reuse. This opens a new business model for the system-on-a-chip industry. However, the potential for IP infringement is growing fast: the American Society for Industrial Security estimates that in the US alone, trade secret theft exceeds $2 billion per month.

There have been various proposals for intellectual property protection, highlighted by the recently released IP protection white paper from VSIA. In this project, students will see the big picture of IP protection in the context of system design. In particular, they will learn state-of-the-art IP protection techniques, such as digital signatures, watermarking, and fingerprinting. More importantly, undergraduate students will gain experience implementing such techniques to embed information into a couple of real-life system design problems.
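To give the flavor of constraint-based watermarking (one published family of IP-protection techniques), the sketch below encodes signature bits as extra constraints added to a toy graph-coloring instance; any legal coloring of the augmented graph then carries the owner's mark. The graph, the bit-to-edge encoding rule, and the greedy coloring are all invented for illustration and are not taken from the project.

    /* Flavor of constraint-based watermarking on a toy graph-coloring instance:
     * signature bits become extra edges (constraints), so any legal coloring of
     * the augmented graph carries the watermark. Everything here is invented. */
    #include <stdio.h>

    #define NV 6

    static int adj[NV][NV];                      /* adjacency matrix */

    static void add_edge(int u, int v) { adj[u][v] = adj[v][u] = 1; }

    /* Simple greedy coloring: give each vertex the smallest color not used
     * by an already-colored neighbor. */
    static void greedy_color(int color[NV]) {
        for (int v = 0; v < NV; v++) {
            int used[NV] = { 0 };
            for (int u = 0; u < v; u++)
                if (adj[v][u]) used[color[u]] = 1;
            int c = 0;
            while (used[c]) c++;
            color[v] = c;
        }
    }

    int main(void) {
        /* Original problem: a 6-cycle 0-1-2-3-4-5-0. */
        for (int v = 0; v < NV; v++) add_edge(v, (v + 1) % NV);

        /* Watermark: each '1' bit of the signature adds one extra edge between
         * a predetermined vertex pair (an invented encoding rule). */
        int signature[3] = { 1, 0, 1 };
        int pair[3][2]   = { { 0, 2 }, { 1, 4 }, { 2, 5 } };
        for (int i = 0; i < 3; i++)
            if (signature[i]) add_edge(pair[i][0], pair[i][1]);

        int color[NV];
        greedy_color(color);
        for (int v = 0; v < NV; v++)
            printf("vertex %d -> color %d\n", v, color[v]);

        /* Verification: each embedded constraint must be satisfied, i.e., the
         * endpoints of every signature edge received different colors. */
        for (int i = 0; i < 3; i++)
            if (signature[i])
                printf("watermark bit %d %s\n", i,
                       color[pair[i][0]] != color[pair[i][1]] ? "verified" : "violated");
        return 0;
    }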

8. Low Power Embedded System Design via Voltage Scaling
Prof. Gang Qu

Modern embedded systems require, among other things, high mobility, low power consumption, and high performance to support applications with heavy workloads, such as communication and digital signal processing applications. The dominant source of a system's power dissipation is dynamic power, which is consumed whenever a signal switches from low to high or vice versa. Early simulations show that voltage scaling is one of the most powerful tools for reducing a system's power consumption. In this technique, power is saved by using a lower supply voltage when the system does not need peak performance. The trade-off is that the system's speed drops with the lower voltage (see the sketch below). The problem is how to build a voltage profile that minimizes the system's power consumption without violating the application's timing constraints. Undergraduate students will learn (1) how multiple voltages are integrated on a system, (2) how to do task scheduling on a multiple-voltage processor, and (3) how to develop on-line algorithms for real-time applications.
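The first-order model below illustrates the trade-off: dynamic energy per operation scales roughly with the square of the supply voltage, while gate delay grows as the supply approaches the threshold voltage, so a task with timing slack can run at a lower voltage for much less energy. The voltages, the threshold value, and the cycle count are invented illustrative numbers, not data from the project.

    /* First-order illustration of why voltage scaling saves energy: dynamic
     * energy per cycle scales roughly with V^2, while gate delay grows as
     * roughly V / (V - Vt)^2 when the supply voltage V is lowered.
     * All parameter values are invented for illustration. */
    #include <stdio.h>

    #define VT 0.5                              /* threshold voltage (volts) */

    /* Relative delay of one cycle at supply voltage v. */
    static double rel_delay(double v) { return v / ((v - VT) * (v - VT)); }

    /* Relative dynamic energy of one cycle at supply voltage v. */
    static double rel_energy(double v) { return v * v; }

    int main(void) {
        double v_nom = 3.3, v_low = 2.0;        /* nominal vs. scaled supply */
        double cycles = 1e6;                    /* work in the task          */

        double t_nom = cycles * rel_delay(v_nom), e_nom = cycles * rel_energy(v_nom);
        double t_low = cycles * rel_delay(v_low), e_low = cycles * rel_energy(v_low);

        printf("at %.1f V: time %.2e, energy %.2e (relative units)\n", v_nom, t_nom, e_nom);
        printf("at %.1f V: time %.2e, energy %.2e (relative units)\n", v_low, t_low, e_low);
        printf("energy saved: %.0f%%, slowdown: %.1fx\n",
               100.0 * (1.0 - e_low / e_nom), t_low / t_nom);
        return 0;
    }

With these particular numbers, dropping the supply from 3.3 V to 2.0 V roughly doubles the execution time but cuts dynamic energy by about 63%, a trade worth making whenever the deadline leaves enough slack.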

9. Satisfiability Problem and its Application in VLSI CAD
Prof. Gang Qu

In the (boolean) satisfiability problem, we are given a formula over a set of boolean variables, and we are asked to assign each variable either 0 or 1 so as to make the formula true. The formula consists of variables and three types of basic operations: (i) '+' (OR): x+y is true if at least one of the variables x and y gets the value '1'; (ii) '*' (AND): x*y is true if and only if both x and y get the value '1'; (iii) ''' (NOT): x' is true if and only if x gets the value '0'. For example, any of the following assignments will make the formula x'+y*z true: {x=0, y=0, z=1}, {x=1, y=1, z=1}, {x=0, y=1, z=1}.
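Since the formula above is tiny, a brute-force check makes the problem statement concrete: the sketch below simply enumerates all eight assignments of x, y, and z and prints the satisfying ones. Real solvers, of course, use much more sophisticated search.

    /* Brute-force satisfiability check for the example formula x' + y*z:
     * enumerate all 2^3 assignments and print the satisfying ones. */
    #include <stdio.h>

    int main(void) {
        for (int x = 0; x <= 1; x++)
            for (int y = 0; y <= 1; y++)
                for (int z = 0; z <= 1; z++) {
                    int value = (!x) || (y && z);   /* x' + y*z */
                    if (value)
                        printf("x=%d y=%d z=%d satisfies the formula\n", x, y, z);
                }
        return 0;
    }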

The satisfiability problem has numerous applications in computer science, complexity theory, and very large scale integration (VLSI) computer-aided design (CAD). The problem is computationally hard, and many heuristics have been proposed to solve it. Because these problem solvers come from very different fields and target very different types of formulas, it is difficult to compare their performance. Our goals in this project include: (1) understanding the problem and the basic ideas of different solvers, (2) building testbeds for different solvers, (3) developing new algorithms to solve the problem, and (4) improving C/C++/Java programming skills.