The MERIT Summer Research Program
 
The A. James Clark School of Engineering, University of Maryland

ICE '02 Project Descriptions

ICE Project Descriptions: Summer 2002

1. System Synthesis of Embedded Multiprocessors
Prof. Shuvra Bhattacharyya

Unlike general-purpose multiprocessors, multiprocessor systems for embedded applications (such as cellular phones, videoconferencing systems, and radar devices) can be streamlined to support a specific set of high-level functions. Furthermore, backward compatibility and fast compilation times are not major concerns, because these systems are rarely, if ever, modified after production. This dramatically increases the design space that can be considered when implementing embedded computing systems.

Due to the vast and complex nature of the design space for an embedded application, the development of automated tools for system-level synthesis is of increasing importance. Such a tool takes as input a high-level language specification of an embedded application; a library of hardware components (such as different types of microprocessors, application-specific integrated circuits, memories, and buses); and a set of optimization constraints and objectives (for example, "find an implementation that minimizes the overall cost"). Given these inputs, the tool attempts to derive an efficient hardware architecture (a collection of processing components and their interconnections) and a mapping of the specified application onto this architecture.

This project will involve the design, implementation, and evaluation of algorithms for system-level synthesis. Students will gain experience designing and implementing complex software that involves the following concepts:

- Technology- and implementation-dependent modeling of application functionality using fundamentals of graph theory.
- Deterministic heuristics vs. stochastic optimization techniques, such as genetic algorithms and simulated annealing, for exploring complex design spaces, as well as the interaction of these two optimization methodologies.
- Multi-objective, Pareto optimization. In a design space involving implementation metrics (such as throughput, power consumption, and cost), a Pareto point is a design that is not dominated by any other design: no other design is at least as good in every dimension and strictly better in at least one.
- Modeling and simulation of embedded multiprocessor architectures.
- Hardware/software co-design.

Students will gain practical research experience related to the fields of electronic design automation (representative companies: Cadence and Synopsys), compiler technology (Hewlett-Packard, Microsoft), and digital signal processing (Texas Instruments, Motorola).
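To make the Pareto-point definition concrete, here is a minimal Python sketch that filters a set of candidate designs down to the non-dominated ones. The design names, the metric values, and the convention that throughput is maximized while power and cost are minimized are illustrative assumptions, not data or code from the project.

```python
from typing import Dict, List, Tuple

# (throughput, power, cost): throughput is maximized, power and cost are minimized.
Design = Tuple[float, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is at least as good as b in every metric and strictly better in at least one."""
    at_least_as_good = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return at_least_as_good and strictly_better


def pareto_front(designs: Dict[str, Design]) -> List[str]:
    """Names of the designs that no other design dominates (a design never dominates itself)."""
    return [
        name
        for name, d in designs.items()
        if not any(dominates(other, d) for other in designs.values())
    ]


# Hypothetical candidate architectures.
candidates = {
    "A": (100.0, 2.0, 10.0),  # fastest, but draws the most power
    "B": (80.0, 1.0, 8.0),    # slower, lower power, cheaper
    "C": (70.0, 1.5, 9.0),    # dominated by B
    "D": (100.0, 2.5, 12.0),  # dominated by A
}
print(pareto_front(candidates))  # ['A', 'B']
```

Neither A nor B is better in every dimension, so both survive; a synthesis tool would present such a front to the designer rather than a single "best" design.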

2. Multi-Threaded Processor Architectures
Prof. Manoj Franklin

The primary objective of the project is to investigate techniques to carry out different aspects of multi-threading so as to exploit instruction-level and thread-level parallelism from ordinary programs. The techniques to be investigated could be software (compiler-based) or hardware, depending on the student's interest. In either case, the project would involve working with existing software tools and simulators, enhancing them, and conducting simulation experiments with the modified simulators. This project should give undergraduate students not only a good sense of how research is done in the computer architecture field, but also a good idea of the latest research in this area.

3. Embedded DRAM Organizations
Prof. Bruce Jacob

The growing gap between memory access time and processor speed in recent years has led processor architects, DRAM architects, and memory-system designers to rely heavily on high-performance mechanisms such as lockup-free caches, out-of-order execution, hardware and software prefetching mechanisms, and multi-threading. These mechanisms are quite effective at reducing, hiding, or tolerating large memory latencies; however, they do so at the expense of exacerbating the memory bandwidth problem (Burger, 1996).

One trend that is helping to solve the bandwidth problem is the development of new DRAM architectures, such as Synchronous DRAM, Enhanced Synchronous DRAM, Synchronous Link, Virtual Channel, and Rambus. All of these architectures are improvements over the traditional DRAM architecture; our studies show that the newest members of the set reduce bandwidth overhead by a factor of four compared to the oldest members (Cuppu, 1999). Another trend that can help solve the bandwidth problem is the use of embedded DRAM-processor organizations that incorporate the DRAM array onto the same die as the processor core (Kozyrakis, 1997; Sase, 1997; Nunomura, 1997). This provides several benefits, including a wider memory bus, a faster memory bus, and drastically reduced energy consumption (Fromm, 1997), due largely to the reduced number of off-chip memory requests.

This project investigates future issues in memory-system design, processor organization, and execution models. To date, we have performed a thorough performance evaluation of DRAM architectures (Cuppu, 1999) and are currently investigating their real-time behavior. The embedded-DRAM organization, much like its sibling the system-on-a-chip, is well positioned to serve as a foundation for a host of microprocessor-based execution models that can exploit tremendous memory bandwidth. Likely models include vector processing, single-chip parallel processing, and DSP.

4. Memory Management in Embedded Systems
Prof. Bruce Jacob

Architectures for High-Performance, Low-Power Embedded Systems
We have conceptualized a hardware/software co-designed processor architecture and real-time operating system (RTOS) framework that together eliminate most high-overhead operating system functions in an embedded system, thus maximizing the performance and predictability of real-time applications. This long-term project aims to design and build a simulator, a prototype processor, and an experimental RTOS to demonstrate these claims.

We are building a complete embedded-system simulator that simulates both the embedded microcontroller and the RTOS. This will enable us to gather a large amount of information on the behavior of real-time systems and will allow us to measure the effect of changes to the system architecture that require modifications to both hardware and software. Current measurement techniques do not allow such flexibility; software simulators that execute applications directly on an emulated processor neglect operating system activity, and systems that attach logic probes to real hardware obtain accurate measurements but do not allow modifications to the processor architecture.

Over the past two summers, MERIT students, along with our graduate students, have developed and refined SimBed, an execution-driven simulation testbed that measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. We are currently working with three RTOSs: uC/OS-II, a popular public-domain embedded real-time operating system; Echidna, a sophisticated, industrial-strength (commercial) RTOS; and NOS, a bare-bones multi-rate task scheduler reminiscent of the typical "roll-your-own" RTOSs found in many commercial embedded systems. We are also modeling three different microprocessors: the Motorola M-CORE microcontroller, a low-power, 32-bit CPU core with 16-bit instructions; the Texas Instruments 'C6000 DSP, a high-performance VLIW digital signal processor; and the Digital/Intel StrongARM, one of the most popular low-power/high-performance microprocessors available today.

Tasks within this project include modeling hardware architectures in C and/or Verilog, building new hardware constructs for real-time processing, developing enhancements for real-time operating systems, and developing real-time embedded applications.

5. Memory System Support for Unstructured Applications
Prof. Donald Yeung

The performance of commercial microprocessors continues to improve at a staggering pace. However, our ability to feed these processors with data fast enough to keep them constantly busy is falling far behind. In the time it takes industry to double the performance of processors, memory system performance improves by only 7%. Consequently, application performance becomes increasingly limited by the memory system.

At the same time, application trends are causing a shift from scientific codes to non-numeric applications as the dominant workload for future high-performance computers. Non-numeric workloads span a wide range of applications, including databases, search engines, media and signal processing, 3D virtual environments, and internet applications written in C++ or Java. These emerging workloads are memory intensive, thus exposing the memory bottleneck caused by technology trends. More importantly, non-numeric applications perform unstructured computations that create a myriad of memory system problems for which there are currently very few effective solutions.

The goal of this project is to develop architectural techniques to increase the memory performance of unstructured applications. Several techniques are currently under investigation. First, we are developing novel prefetching techniques that can tolerate the memory latency of the irregular memory references commonly found in unstructured applications. Second, we are investigating techniques that enable software to control the memory fetch size to increase the effective memory bandwidth of sparse memory accesses. Third, we are studying runtime optimization techniques that acquire on-line profile information and perform memory optimizations while the application is running. Finally, we are building compiler support to exploit these architectural techniques without requiring programmer intervention.

6. Techniques for Minimizing Code Size in Compilation Targeting Embedded Systems
Prof. Rajeev Barua

Embedded systems are the class of application-specific computer systems used as controllers and monitors in a variety of consumer and business applications. Such systems are ubiquitous today in household appliances, consumer electronics, communication systems, remote sensing, and vehicle control. While there are many similarities between general-purpose computer systems and embedded systems, many of the design criteria differ. For embedded systems, low cost, low power, and small code size are often more important than performance at any cost.

An interesting project in this space is compiling high-level code for embedded systems with the objective of minimizing code size. Small code size is desirable when the entire machine-code program is stored on-chip, since it reduces silicon area and power dissipation. Of course, the compiler must simultaneously deliver the best performance possible at that code size. Note that several compiler transformations commonly employed to improve performance, such as loop unrolling and procedure inlining, increase code size.

Given a certain code size budget, an interesting question is where to apply these code-size-increasing transformations most profitably while keeping the code size within budget. Such research would likely benefit from profiling information coupled with intelligent heuristics. The work will involve implementing the compiler algorithms and simulating the results. The evaluation would compare performance against both unoptimized code and code optimized without regard to code size. Time permitting, a comparison with hand-optimized code will also be done.
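One simple baseline for this budgeting question, sketched below in Python under stated assumptions, is to rank candidate transformation sites by profiled benefit per byte of added code and apply them greedily until the budget runs out. The `Candidate` fields, the site names, and the numbers are hypothetical; this greedy ratio rule is a standard knapsack-style approximation, not the project's algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A hypothetical code-size-increasing transformation site (inlining, unrolling, ...)."""
    name: str
    size_increase: int  # estimated extra bytes of machine code (assumed > 0)
    cycles_saved: int   # estimated from profiling: execution count x cycles saved per execution


def select_transformations(cands: List[Candidate], budget: int) -> List[str]:
    """Greedily apply the transformations with the best cycles-saved-per-byte ratio
    while the total added code size stays within the budget."""
    chosen: List[str] = []
    remaining = budget
    by_ratio = sorted(cands, key=lambda c: c.cycles_saved / c.size_increase, reverse=True)
    for c in by_ratio:
        if c.size_increase <= remaining:
            chosen.append(c.name)
            remaining -= c.size_increase
    return chosen


cands = [
    Candidate("inline_foo", size_increase=40, cycles_saved=4000),       # 100 cycles/byte
    Candidate("unroll_loop_x4", size_increase=120, cycles_saved=6000),  # 50 cycles/byte
    Candidate("inline_bar", size_increase=60, cycles_saved=300),        # 5 cycles/byte
]
print(select_transformations(cands, budget=200))  # ['inline_foo', 'unroll_loop_x4']
```

The greedy rule is not optimal: with `budget=150` it returns `['inline_foo', 'inline_bar']` (4,300 cycles saved), although applying `unroll_loop_x4` alone would save 6,000. Gaps like this are where the "intelligent heuristics" mentioned above would earn their keep.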

7. Protection Techniques for Intellectual Property Based System Design
Prof. Gang Qu

Advances in VLSI semiconductor technology and the system-on-a-chip design paradigm, coupled with a shrinking time-to-market window, have changed the traditional system design methodology. Design reuse and intellectual property (IP) based design have become increasingly important. The key challenge for system designers today is to find suitable IPs and make the necessary modifications, as few as possible, to meet customers' requirements in a timely fashion. Led by the Virtual Socket Interface Alliance (VSIA), system houses, semiconductor vendors, electronic design automation companies, and IP providers are working toward standards for IP reuse. This opens a new business model for the system-on-chip industry. However, the potential for IP infringement is growing fast. The American Society for Industrial Secrets estimates that in the US alone, trade secret theft is in excess of $2 billion per month.

There have been various proposals for intellectual property protection, highlighted by the newly released IP protection white paper from VSIA. In this project, students will see the big picture of IP protection in the context of system design. In particular, they will learn state-of-the-art IP protection techniques such as digital signatures, watermarking, and fingerprinting. More importantly, undergraduate students will gain experience implementing such techniques to embed information into a couple of real-life system design problems.
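One family of watermarking techniques, constraint-based watermarking, hides a signature as extra constraints on a design problem: the solution found under those constraints is still valid for the original problem, but it also satisfies constraints that only the author can re-derive from the signature. The toy Python sketch below uses graph coloring as a stand-in design problem. The signature string, the number of pairs, and the greedy coloring heuristic are illustrative assumptions; a real scheme would also quantify how unlikely an unmarked solution is to satisfy the same constraints by chance.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]


def watermark_pairs(g: Graph, signature: str, k: int) -> List[Tuple[int, int]]:
    """Pick k non-adjacent node pairs, chosen pseudo-randomly from the author's signature."""
    nodes = sorted(g)
    free = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:] if v not in g[u]]
    rng = random.Random(signature)  # same signature -> same pairs, so the mark can be re-derived
    return rng.sample(free, min(k, len(free)))


def greedy_coloring(g: Graph) -> Dict[int, int]:
    """Assign each node the smallest color not used by an already-colored neighbor."""
    color: Dict[int, int] = {}
    for u in sorted(g):
        used = {color[v] for v in g[u] if v in color}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    return color


def embed_watermark(g: Graph, signature: str, k: int) -> Tuple[Dict[int, int], List[Tuple[int, int]]]:
    """Color g with signature-derived 'must differ' pairs added as extra edges."""
    pairs = watermark_pairs(g, signature, k)
    marked = {u: set(nbrs) for u, nbrs in g.items()}
    for u, v in pairs:
        marked[u].add(v)
        marked[v].add(u)
    return greedy_coloring(marked), pairs


def verify_watermark(color: Dict[int, int], pairs: List[Tuple[int, int]]) -> bool:
    """The coloring carries the mark if every signature-derived pair got different colors."""
    return all(color[u] != color[v] for u, v in pairs)


# A 6-node cycle: 0-1-2-3-4-5-0
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
coloring, pairs = embed_watermark(cycle, "author: J. Doe", k=3)
print(verify_watermark(coloring, pairs))  # True
```

Because the marked graph contains every original edge, the resulting coloring is also a proper coloring of the original cycle; the watermark costs at most some extra colors, never correctness.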

8. Low Power Embedded System Design via Voltage Scaling
Prof. Gang Qu

Modern embedded systems require high mobility, low power consumption, and high performance, among other properties, to support applications with heavy workloads such as communication and digital signal processing. The dominant source of a system's power dissipation is dynamic power, which is consumed whenever a signal switches from low to high or vice versa. Early simulations show that voltage scaling is one of the most powerful tools for reducing a system's power consumption. In this technique, power is saved by using a lower supply voltage when the system does not need to run at peak performance. The trade-off is that the system's speed drops at the lower voltage. The problem is how to build a voltage profile that minimizes the system's power consumption without violating the application's timing constraints. Undergraduate students will learn (1) how multiple voltages are integrated on a system, (2) how to do task scheduling on a multiple-voltage processor, and (3) how to develop on-line algorithms for real-time applications.
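The trade-off can be illustrated with the standard first-order model of dynamic energy, E = C_eff * V^2 * N for a task of N clock cycles, where C_eff folds in switched capacitance and activity. The Python sketch below picks, for a single task, the lowest supply voltage whose clock rate still meets the deadline. The voltage/frequency table, the value of C_eff, and the simplification that maximum clock frequency scales linearly with voltage are all assumptions for illustration; real processors expose a few discrete operating points and a nonlinear voltage/delay relationship.

```python
from typing import List, Tuple

# Hypothetical operating points: (supply voltage in volts, max clock in MHz).
# Assumes max clock scales linearly with voltage, a first-order simplification.
LEVELS: List[Tuple[float, float]] = [(1.0, 100.0), (1.5, 150.0), (2.0, 200.0), (3.3, 330.0)]


def dynamic_energy(cycles: float, voltage: float, c_eff: float = 1e-9) -> float:
    """Dynamic switching energy in joules: E = C_eff * V^2 * cycles (C_eff in farads, assumed)."""
    return c_eff * voltage ** 2 * cycles


def pick_level(cycles: float, deadline_s: float) -> Tuple[float, float]:
    """Lowest-voltage operating point that still finishes `cycles` within the deadline."""
    for volts, mhz in sorted(LEVELS):
        if cycles / (mhz * 1e6) <= deadline_s:
            return volts, mhz
    raise ValueError("deadline cannot be met even at the highest voltage")


task_cycles = 1_000_000
volts, mhz = pick_level(task_cycles, deadline_s=0.008)  # 8 ms deadline
print(volts, mhz)  # 1.5 150.0
ratio = dynamic_energy(task_cycles, volts) / dynamic_energy(task_cycles, 3.3)
print(round(ratio, 3))  # 0.207
```

At 100 MHz the task would need 10 ms and miss the deadline, so 1.5 V is the lowest usable level; running there instead of at 3.3 V cuts dynamic energy to (1.5/3.3)^2, roughly 21%. Building a voltage profile for many tasks with different deadlines is the harder scheduling problem the project targets.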

9. Satisfiability Problem and its Application in VLSI CAD
Prof. Gang Qu

In the (Boolean) satisfiability problem, we are given a formula over a set of (Boolean) variables, and we are asked to assign each variable either 0 or 1 so that the formula is true. The formula consists of variables and three basic operations: (i) OR, written '+': x+y is true if and only if at least one of the variables x and y has the value 1; (ii) AND, written '*': x*y is true if and only if both x and y have the value 1; (iii) NOT, written as a prime ('): x' is true if and only if x has the value 0. For example, each of the following assignments makes the formula x'+y*z true: {x=0, y=0, z=1}, {x=1, y=1, z=1}, {x=0, y=1, z=1}.
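The definition can be checked mechanically. The Python sketch below is a brute-force enumerator written for illustration, not one of the solvers the project would study: it encodes x'+y*z as a function and lists every satisfying assignment, confirming the three assignments above and finding two more. Because it tries all 2^n assignments, it only scales to a handful of variables, which is why practical solvers rely on heuristics.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, int]


def example_formula(a: Assignment) -> bool:
    """x' + y*z, reading '+' as OR, '*' as AND, and the prime (') as NOT."""
    return a["x"] == 0 or (a["y"] == 1 and a["z"] == 1)


def satisfying_assignments(f: Callable[[Assignment], bool], variables: List[str]) -> List[Assignment]:
    """Try all 2^n assignments of 0/1 to the variables and keep those that make f true.
    Exponential in n, so this is only a reference checker for tiny formulas."""
    result: List[Assignment] = []
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if f(assignment):
            result.append(assignment)
    return result


sols = satisfying_assignments(example_formula, ["x", "y", "z"])
print(len(sols))  # 5: every assignment with x=0, plus x=1, y=1, z=1
```

A checker like this is also handy in a solver testbed as ground truth for small formulas.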

The satisfiability problem has numerous applications in computer science, complexity theory, and computer-aided design (CAD) of very large scale integrated (VLSI) circuits. The problem is NP-complete, and many heuristics have been proposed to solve it. Because these solvers come from very different fields and target very different types of formulas, it is difficult to compare their performance. Our goals in this project include: (1) understanding the problem and the basic ideas behind different solvers, (2) building testbeds for different solvers, (3) developing new algorithms to solve the problem, and (4) improving C/C++/Java programming skills.

10. Visual Programming Environment for Building Real-Time Embedded Systems
Prof. David Stewart

Visual programming environments have been used in desktop systems for years, and modeling languages such as UML have been developed to meet the need for them. This trend is slowly making itself known in real-time embedded systems; tools such as Rational Rose Real-Time and Ptolemy are examples of visual programming tools for real-time embedded systems. The proposed work is to create a tool that generates code for a component-based embedded real-time system. Most programming will be done in Java, while the generated code will be in C.

Responsibilities include developing a visual programming interface for the Echidna RTOS. This involves UML software design of class hierarchies, implementation of the design in Java, testing and debugging the code, and demonstrating the project to potential users. Required experience includes Java and C programming. Prior exposure to graphical user interface programming, data structures, and algorithms is desirable.

11. Building a Real-Time Embedded System Testbed Using the LEGO Mindstorms RCX 2.0
Prof. David Stewart

The LEGO Mindstorms Robotics Invention System is designed as a computer engineering learning platform for junior high and high school students. However, its simplified programming interface is not sufficient for the more sophisticated research and training performed at universities. The objective of the project is to assemble a LEGO Mindstorms RCX 2.0 hardware testbed and enhance the software development environment to create a sophisticated testbed for embedded real-time system training and research.

Responsibilities include installing sensors and actuators on the LEGO Mindstorms hardware platform, developing device drivers for each sensor and actuator using a reconfigurable software model, and developing sample applications that demonstrate the features. Significant experience with C or C++ is required. A background in assembly language, microprocessors, real-time operating systems, or sensor-based control systems is helpful.