The MERIT Summer Research Program
The A. James Clark School of Engineering
University of Maryland

ICE Project Descriptions: Summer 2003


1. Architectures for High-Performance, Low-Power Embedded Systems
Prof. Bruce Jacob

We have conceptualized a hardware/software co-designed processor architecture and real-time operating system (RTOS) framework that together eliminate most high-overhead operating system functions in an embedded system, thus maximizing the performance and predictability of real-time applications. This long-term project aims to design and build a simulator, a prototype processor, and an experimental RTOS to demonstrate these claims.

We are building a complete embedded-system simulator that simulates both the embedded microcontroller and the RTOS. This will enable us to gather a large amount of information on the behavior of real-time systems and will allow us to measure the effect of changes to the system architecture that require modifications to both hardware and software. Current measurement techniques do not allow such flexibility; software simulators that execute applications directly on an emulated processor neglect operating system activity, and systems that attach logic probes to real hardware obtain accurate measurements but do not allow modifications to the processor architecture.

Over the past three summers, MERIT students, along with our graduate students, have developed and refined SimBed, an execution-driven simulation testbed that measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. We are currently working with three RTOSs: uC/OS-II, a popular public-domain embedded real-time operating system; Echidna, a sophisticated, industrial-strength (commercial) RTOS; and NOS, a bare-bones multi-rate task scheduler reminiscent of the typical "roll-your-own" RTOSs found in many commercial embedded systems. We are currently modeling three different microprocessors: the Motorola M-CORE microcontroller, a low-power, 32-bit CPU core with 16-bit instructions; the Texas Instruments 'C6000 DSP, a high-performance VLIW digital signal processor; and the Digital/Intel StrongARM, one of the most popular low-power/high-performance microprocessors available today.

Tasks within this project include modeling hardware architectures in C and/or Verilog, building new hardware constructs for real-time processing, developing enhancements for real-time operating systems, and developing real-time embedded applications.

2. Robust Computer Architectures
Prof. Bruce Jacob

The devices from which computer chips are built are rapidly shrinking in feature size and operating voltage, making them more vulnerable to electrical upset, whether from external sources or internal interference. Previous work in the area of circuit-level fault tolerance has focused on surviving small numbers of random transient or stuck-at errors. The solution in the memory system (both the DRAM system and the cache system) has been to provide ECC (error-correcting code) bits that detect and correct such errors introduced into the memory system. With enough redundant bits, any number of errors can be caught this way.
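As an illustrative sketch of the ECC idea, the classic Hamming(7,4) code adds three parity bits to four data bits so that any single flipped bit can be located and corrected; the ECC bits in DRAM and caches use wider codes built on the same principle. The function names below are illustrative, not from any particular memory controller:

```c
#include <stdint.h>

/* Hamming(7,4): encode 4 data bits into 7 bits so that any single-bit
 * error can be located and corrected.
 * Bit layout (1-indexed positions): p1 p2 d1 p3 d2 d3 d4. */
uint8_t hamming74_encode(uint8_t data)   /* data: low 4 bits */
{
    uint8_t d1 = (data >> 3) & 1, d2 = (data >> 2) & 1;
    uint8_t d3 = (data >> 1) & 1, d4 = data & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */
    return (uint8_t)((p1 << 6) | (p2 << 5) | (d1 << 4) |
                     (p3 << 3) | (d2 << 2) | (d3 << 1) | d4);
}

/* Recompute the parity checks; a nonzero syndrome is the 1-indexed
 * position of the flipped bit.  Returns the corrected 4 data bits. */
uint8_t hamming74_decode(uint8_t code)
{
    uint8_t b[8];                       /* b[1..7], 1-indexed */
    for (int i = 1; i <= 7; i++)
        b[i] = (code >> (7 - i)) & 1;
    uint8_t s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    uint8_t s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    uint8_t s3 = b[4] ^ b[5] ^ b[6] ^ b[7];
    int syndrome = (s3 << 2) | (s2 << 1) | s1;
    if (syndrome)                        /* flip the erroneous bit */
        b[syndrome] ^= 1;
    return (uint8_t)((b[3] << 3) | (b[5] << 2) | (b[6] << 1) | b[7]);
}
```

Flipping any one of the seven code bits leaves the original four data bits recoverable, which is exactly the single-error-correct guarantee ECC memory provides per word.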

The solution on the processor side has been to replicate resources at different levels of granularity. For example, some systems have multiple identical processors performing the same task at the same time and use a voting algorithm to ignore any erroneous results. Other systems replicate components within the architecture -- for example, by having multiple adders that perform identical computations with a similar voting method to choose the correct result. These approaches are intended to catch transient errors that occur in one (but not usually more than one) of the processors or components involved. As the susceptibility of circuits increases with smaller, higher-performance parts, the likelihood that errors are transient and random decreases rapidly.
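The voting approach can be sketched in a few lines: a bitwise majority vote over three replicated results masks a transient fault in any single replica. Hardware voters operate per-bit in exactly this way, so two agreeing replicas always outvote one faulty replica:

```c
#include <stdint.h>

/* Triple modular redundancy voter: each output bit is 1 iff at least
 * two of the three replicated results have that bit set, so a
 * transient error confined to one replica is outvoted by the others. */
uint32_t majority_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}
```

For example, if one of three replicated adders is upset and returns 7 while the other two return 42, the voter still yields 42.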

There is a solution. Rollback recovery has long been used in the fields of distributed systems and transaction processing to provide high degrees of reliability in the face of occasional catastrophic failures (e.g., disk crashes). A reliable software system using this technique periodically saves to reliable storage just enough state that the system can be successfully restarted using only that saved state. When a catastrophic failure occurs and is detected, the system is restarted from that saved state. We propose to use rollback recovery not at the system-software level but at the microarchitecture level (i.e., chip level) to provide a high degree of reliability. Periodically, the microprocessor will dump consistent system state (the contents of its internal storage and any recent changes to external storage) to a safe location. Upon detection of an error, this state will be restored to the processor, and the processor will begin executing from this "known good" state.
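A minimal software sketch of the checkpoint-and-rollback cycle follows; the state layout and function names here are illustrative, not part of any real processor interface:

```c
#include <string.h>

/* Stand-in for the processor's internal architectural state. */
typedef struct {
    int regs[8];
    int pc;
} state_t;

static state_t checkpoint;   /* the "safe location" for known-good state */

/* Periodically save just enough state to restart from. */
void take_checkpoint(const state_t *s) { memcpy(&checkpoint, s, sizeof *s); }

/* On a detected error, restore the saved state and resume from it. */
void rollback(state_t *s) { memcpy(s, &checkpoint, sizeof *s); }

/* One step of execution with recovery: checkpoint, run, and roll back
 * if the error detector fires afterwards. */
void step_with_recovery(state_t *s, void (*step)(state_t *),
                        int (*error_detected)(const state_t *))
{
    take_checkpoint(s);
    step(s);
    if (error_detected(s))
        rollback(s);         /* resume from the known-good state */
}
```

The microarchitectural proposal applies this same checkpoint/restore loop to the chip's internal storage rather than to software-visible state on disk.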

Tasks within this project include modeling hardware architectures in Verilog, designing and fabricating (and testing) new hardware prototypes, and developing testbed applications.

3. Satisfiability Problem and its Application in VLSI CAD
Prof. Gang Qu

In the (boolean) satisfiability problem, we are given a formula over a set of (boolean) variables, and we are asked to assign each variable either 0 or 1 so as to make the formula true. The formula consists of variables and three types of basic operations: (i) '+' (OR): x+y is true if at least one of the variables x and y has the value 1; (ii) '*' (AND): x*y is true if and only if both x and y have the value 1; (iii) ' (NOT): x' is true if and only if x has the value 0. For example, any of the following assignments will make the formula x'+y*z true: {x=0, y=0, z=1}, {x=1, y=1, z=1}, {x=0, y=1, z=1}.
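A brute-force check of the example formula simply enumerates all 2^3 assignments; this exponential search is exactly what practical SAT solvers try to sidestep with clever heuristics:

```c
/* Evaluate the example formula x' + y*z for one 0/1 assignment. */
int formula(int x, int y, int z)
{
    return (!x) || (y && z);     /* x' + y*z */
}

/* Try all 2^3 assignments and count the satisfying ones. */
int count_satisfying(void)
{
    int count = 0;
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            for (int z = 0; z <= 1; z++)
                if (formula(x, y, z))
                    count++;
    return count;
}
```

Besides the three assignments listed above, {x=0, y=0, z=0} and {x=0, y=1, z=0} also satisfy the formula, for five satisfying assignments in all; for n variables the search space grows as 2^n.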

The satisfiability problem has numerous applications in computer science, complexity theory, and computer-aided design (CAD) of very large scale integrated (VLSI) circuits. The problem is hard, and many heuristics have been proposed to solve it. Because these solvers come from very different fields and target very different types of formulas, it is difficult to compare their performance. Our goals in this project include: (1) understanding the problem and the basic ideas behind different solvers, (2) building testbeds for different solvers, (3) developing new algorithms to solve the problem, and (4) improving C/C++/JAVA programming skills.

4. Subordinate Multithreading Architectures and Applications
Prof. Donald Yeung

Today, multithreading is beginning to penetrate the mass computing market, and is already available in production volumes (e.g., Intel's Pentium 4 with Hyperthreading). As multithreaded processors gain widespread acceptance, it becomes critical for workloads to effectively exploit the available thread-level parallelism. One obvious source of thread-level parallelism is multiprogrammed workloads (executing multiple applications together); unfortunately, many multiprogrammed workloads cannot provide a sustained source of thread-level parallelism. Another source of thread-level parallelism is parallel workloads. However, this approach requires explicit parallelism, which is usually too challenging for compilers and too labor-intensive for humans.

Given spare execution resources in future under-utilized multithreaded processors, an extremely promising approach is subordinate multithreading. In addition to running workload (or "main") threads, subordinate multithreading also runs subordinate threads to perform computations on behalf of the main threads. These helper threads can assist or extend the functionality of the main thread in some fashion, or attempt to directly improve application performance. Currently, we are investigating the use of such helper threads to perform data prefetching. We have recently built the first compiler to automatically generate prefetching code that runs as subordinate threads to improve the performance of the main thread. In ongoing research, we are also investigating new uses of helper threads, including traditional runtime- or operating-system-level optimizations and functions such as dynamic compilation, garbage collection, and on-line performance feedback.
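A minimal sketch of the helper-thread idea, assuming a POSIX threads environment: while the main thread sums an array, a subordinate thread runs over the same data, pulling it into the cache so the main thread's loads tend to hit. Real compiler-generated prefetching threads synchronize with the main thread far more carefully; this only shows the division of labor, and all names are illustrative:

```c
#include <pthread.h>
#include <stddef.h>

#define N 4096
static int data[N];
static volatile int prefetch_sink;   /* keeps the touch loop from being optimized away */

/* Subordinate thread: a read-only pass that warms the cache. */
static void *prefetch_worker(void *arg)
{
    (void)arg;
    int sink = 0;
    for (size_t i = 0; i < N; i++)
        sink += data[i];
    prefetch_sink = sink;
    return NULL;
}

/* Main thread does the real computation while the helper runs. */
long sum_with_helper(void)
{
    pthread_t helper;
    pthread_create(&helper, NULL, prefetch_worker, NULL);
    long total = 0;
    for (size_t i = 0; i < N; i++)
        total += data[i];
    pthread_join(helper, NULL);
    return total;
}
```

Note that the helper affects only timing, never correctness: the main thread's result is the same whether or not the prefetch pass runs, which is what makes subordinate threads attractive as an automatic optimization.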

Students participating in this project will investigate new hardware techniques to support subordinate multithreading, as well as study novel applications of subordinate threads. Tasks will include developing simulation models in the context of multithreaded processor simulators, porting applications to simulation infrastructure, and running experiments.

5. Synthesis-assistance and Compilation Software for Embedded Systems
Prof. Rajeev Barua

Embedded systems are the application-specific computer systems used today as controllers and monitors in a variety of consumer and business applications. Such systems are ubiquitous: cell phones, DVD players, PDAs, household appliances, consumer electronics, communication systems, and remote sensing and vehicle control, to name just a few. Since 1999, the dollar volume (total sales) of embedded CPUs has exceeded that of desktop CPUs such as the Pentium, and it is growing much more rapidly. Over $50 billion in embedded CPUs were sold in 2001.

Embedded systems promise to revolutionize our day-to-day lives with ever-increasing intelligence and connectivity at decreasing cost. Yet, many of the software technologies for embedded systems remain antiquated, from compilers that produce code whose performance and power consumption is substantially inferior to assembly language programs, to synthesis software that provides little guidance to the designer on what decisions to make. These shortcomings decrease system performance and increase time-to-market and software and hardware development cost.

This project focuses on developing fundamental technologies to propel the software for embedded systems to the next level of automation. Opportunities along two directions are being explored: increased automation of the synthesis of embedded soft cores, and new compiler strategies for the management of heterogeneous memories in embedded systems. When deployed, these innovations will lead to a quantum leap in the time to market, cost and performance of embedded designs. Both directions rely on improved compiler analysis of application domains.

MERIT interns on this project will function as full-fledged group members and will work with Dr. Barua and his graduate students on delivering key infrastructure components or technologies. Only work that is critical to the project will be assigned to interns; consequently, if an intern is able to complete the assigned work, there is a good chance of co-authorship on a conference or journal publication.

Prerequisites: Programming experience in C and/or C++ is a requirement -- the more the better. Courses in Data Structures (CMSC 420) and Computer Organization (ENEE 350) will be a significant plus.