
The Hydra chip multiprocessor (CMP) integrates four MIPS-based processors and their primary caches on a single chip, together with a shared secondary cache. A standard CMP offers implementation and performance advantages compared to wide-issue superscalar designs. However, it must be programmed with a more complicated parallel programming model to obtain maximum performance. To simplify parallel programming, the Hydra CMP supports thread-level speculation and memory renaming, a paradigm that allows performance similar to that of a uniprocessor of comparable die area on integer programs. This article motivates the design of a CMP, describes the architecture of the Hydra design with a focus on its speculative thread support, and describes our prototype implementation.
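To make the speculation model concrete, the following is a minimal, purely sequential C sketch of the core TLS idea: loop iterations run as speculative threads with buffered writes, commits happen in program order, and a committed write to data that a later thread has already read squashes and restarts that thread. The data structures and names here are illustrative assumptions, not the Hydra hardware interface, and the toy commits one buffered write at a time.

    /* Toy simulation of thread-level speculation (TLS) on a loop.
     * Illustrative only; not the Hydra design's actual mechanism. */
    #include <stdio.h>
    #include <string.h>

    #define ITERS 4
    #define MEM   8

    static int mem[MEM];                     /* committed architectural state */

    typedef struct {
        int reads[MEM];                      /* read set                      */
        int writes[MEM], wrote[MEM];         /* buffered speculative writes   */
    } spec_t;

    static int spec_load(spec_t *t, int a) {
        t->reads[a] = 1;
        return t->wrote[a] ? t->writes[a] : mem[a];
    }

    static void spec_store(spec_t *t, int a, int v) {
        t->writes[a] = v;
        t->wrote[a] = 1;
    }

    /* one iteration with a loop-carried dependence: mem[i+1] = mem[i] + 1 */
    static void body(spec_t *t, int i) {
        spec_store(t, i + 1, spec_load(t, i) + 1);
    }

    int main(void) {
        spec_t th[ITERS] = {0};
        mem[0] = 1;

        for (int i = 0; i < ITERS; i++)      /* all iterations run "at once" */
            body(&th[i], i);

        for (int i = 0; i < ITERS; i++) {    /* commit in program order      */
            for (int a = 0; a < MEM; a++) {
                if (!th[i].wrote[a]) continue;
                mem[a] = th[i].writes[a];    /* commit the buffered write    */
                for (int j = i + 1; j < ITERS; j++) {
                    if (th[j].reads[a]) {    /* later thread read it early:  */
                        memset(&th[j], 0, sizeof th[j]);   /* squash...      */
                        body(&th[j], j);                   /* ...and restart */
                    }
                }
            }
        }

        for (int a = 0; a < MEM; a++)
            printf("%d ", mem[a]);           /* prints: 1 2 3 4 5 0 0 0      */
        printf("\n");
        return 0;
    }

Despite the loop-carried dependence, every iteration is launched speculatively; the in-order commit phase catches each read-after-write violation and re-executes only the offending iteration, which is exactly the property that lets TLS parallelize nominally sequential code.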
Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no longer scaling in performance, because only a limited amount of parallelism can be extracted from a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become prohibitive in all but water-cooled systems. Compounding these problems is the simple fact that, with the immense numbers of transistors available on today's microprocessor chips, it is too costly to design and debug ever-larger processors every year or two.

CMPs avoid these problems by filling up a processor die with multiple, relatively simple processor cores instead of just one huge core. The exact size of a CMP's cores can vary from very simple pipelines to moderately complex superscalar processors, but once a core has been selected, the CMP's performance can easily scale across silicon process generations simply by stamping down more copies of the hard-to-design, high-speed processor core in each successive chip generation. In addition, parallel code execution, obtained by spreading multiple threads of execution across the various cores, can achieve significantly higher performance than would be possible using only a single core. While parallel threads are already common in many useful workloads, there are still important workloads that are hard to divide into parallel threads. The low inter-processor communication latency between the cores in a CMP helps make a much wider range of applications viable candidates for parallel execution than was possible with conventional, multi-chip multiprocessors; nevertheless, limited parallelism in key applications is the main factor limiting acceptance of CMPs in some types of systems.

After a discussion of the basic pros and cons of CMPs when they are compared with conventional uniprocessors, this book examines how CMPs can best be designed to handle two radically different kinds of workloads that are likely to be used with a CMP: highly parallel, throughput-sensitive applications at one end of the spectrum, and less parallel, latency-sensitive applications at the other. Throughput-sensitive applications, such as server workloads that handle many independent transactions at once, require careful balancing of all parts of a CMP that can limit throughput, such as the individual cores, on-chip cache memory, and off-chip memory interfaces. Several studies and example systems, such as the Sun Niagara, that examine the necessary tradeoffs are presented here. In contrast, latency-sensitive applications - many desktop applications fall into this category - require a focus on reducing inter-core communication latency and applying techniques to help programmers divide their programs into multiple threads as easily as possible.

This book discusses many techniques that can be used in CMPs to simplify parallel programming, with an emphasis on research directions proposed at Stanford University. To illustrate the advantages possible with a CMP using a couple of solid examples, extra focus is given to thread-level speculation (TLS), a way to automatically break up nominally sequential applications into parallel threads on a CMP, and to transactional memory.
This model can greatly simplify manual parallel programming by using hardware - instead of conventional software locks - to enforce atomic code execution of blocks of instructions, a technique that makes parallel coding much less error-prone. Contents: The Case for CMPs / Improving Throughput / Improving Latency Automatically / Improving Latency using Manual Parallel Programming / A Multicore World: The Future of CMPs
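As a sketch of why atomic blocks are less error-prone than locks, the C fragment below contrasts a conventional lock-based critical section with a transactional atomic block. The __transaction_atomic syntax is GCC's experimental software TM extension (compile with -fgnu-tm); it is used here only as a stand-in for the programming model a hardware-TM CMP would expose, and is not drawn from this book.

    #include <pthread.h>
    #include <stdio.h>

    static long balance_a = 100, balance_b = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Lock-based version: correctness depends on every caller taking
     * the right lock(s) in the right order -- the error-prone part. */
    static void transfer_locked(long amount) {
        pthread_mutex_lock(&lock);
        balance_a -= amount;
        balance_b += amount;
        pthread_mutex_unlock(&lock);
    }

    /* Transactional version: the block executes atomically, and
     * conflicting transactions are detected and re-executed by the TM
     * runtime (or, in a hardware-TM CMP, by the coherence hardware). */
    static void transfer_tm(long amount) {
        __transaction_atomic {
            balance_a -= amount;
            balance_b += amount;
        }
    }

    int main(void) {
        transfer_locked(25);
        transfer_tm(25);
        printf("a=%ld b=%ld\n", balance_a, balance_b);  /* a=50 b=50 */
        return 0;
    }

The programmer only declares what must be atomic; choosing, acquiring, and ordering locks -- the usual source of races and deadlocks -- is delegated to the system.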
This book presents new concepts, techniques and promising programming models for designing software for chips with "many" (hundreds to thousands of) processor cores. Given the scale of parallelism inherent in these chips, software designers face new challenges in terms of operating systems, middleware and applications. The book serves as an invaluable, single-source reference to the state of the art in programming many-core chips. Coverage includes many-core architectures, operating systems, middleware, and programming models.
Multithreaded architectures now appear across the entire range of computing devices, from the highest-performing general purpose devices to low-end embedded processors. Multithreading enables a processor core to more effectively utilize its computational resources, as a stall in one thread need not cause execution resources to be idle. This enables the computer architect to maximize performance within area constraints, power constraints, or energy constraints. However, the architectural options for the processor designer or architect looking to implement multithreading are quite extensive and varied, as evidenced not only by the research literature but also by the variety of commercial implementations. This book introduces the basic concepts of multithreading, describes a number of models of multithreading, and then develops the three classic models (coarse-grain, fine-grain, and simultaneous multithreading) in greater detail. It describes a wide variety of architectural and software design tradeoffs, as well as opportunities specific to multithreading architectures. Finally, it details a number of important commercial and academic hardware implementations of multithreading.
Microprocessors will soon have more than a billion logic transistors on a single chip. Several alternatives have been suggested for obtaining the highest performance from billion-transistor chips. An on-chip multiprocessor is one promising alternative to the current superscalar microprocessor: if the application program is properly parallelized, it can execute multiple threads effectively on multiple processors in parallel, increasing processor utilization and tolerating the latency caused by data dependences and cache misses. The Electronics and Telecommunications Research Institute (ETRI) in South Korea developed "RapSim", an on-chip multiprocessor simulator for RAPTOR that contains four SPARC microprocessor cores. To support this 4-way multiprocessor simulator, the Multithreaded Mini Operating System (MMOS) was developed by the OSU MMOS group. RapSim runs multiple threads on multiple processor cores concurrently; POSIX threads were used to build MMOS, a symmetric multiprocessor (SMP)-safe Pthreads package. Benchmarks must be properly parallelized by the programmer to run multiple threads across the processors simultaneously. Performance simulation results show that RAPTOR can exploit thread-level parallelism effectively and offers a promising architecture for future on-chip multiprocessor designs.
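The kind of hand parallelization the abstract refers to looks roughly like the Pthreads sketch below: a benchmark loop is split across four worker threads, matching the simulator's four cores. The benchmark itself (a sum of squares) and all names are illustrative; this is not code from RapSim or MMOS.

    /* Compile with: cc -O2 -pthread sketch.c */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4                     /* one thread per simulated core */
    #define N 1000000

    static double data[N];
    static double partial[NTHREADS];

    static void *worker(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
        double sum = 0.0;
        for (long i = lo; i < hi; i++)
            sum += data[i] * data[i];      /* independent work: no locks  */
        partial[id] = sum;                 /* private result slot         */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;

        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);

        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("sum of squares = %f\n", total);
        return 0;
    }

Because each worker touches a disjoint slice of the array and writes only its own result slot, no synchronization is needed inside the loop, which is what lets a 4-way on-chip multiprocessor approach a 4x speedup on such a kernel.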
Multiscalar Processors presents a comprehensive treatment of the basic principles of Multiscalar execution, and advanced techniques for implementing the Multiscalar concepts. Special emphasis is placed on highlighting the major challenges involved in Multiscalar processing. This book is organized into nine chapters, and provides an excellent synopsis of a large body of research carried out on multiscalar processors in the last decade. It starts with technology trends that provide an impetus to the development of multiscalar processors and shape the development of future processors. The work ends with a review of the recent developments related to multiscalar processors.
A Multi-Processor System-on-Chip (MPSoC) is the key component for complex applications. These applications put huge pressure on memory, communication devices and computing units. This book, presented in two volumes – Architectures and Applications – therefore celebrates the 20th anniversary of MPSoC, an interdisciplinary forum that focuses on multi-core and multi-processor hardware and software systems. It is this interdisciplinarity which has led to MPSoC bringing together experts in these fields from around the world, over the last two decades. Multi-Processor System-on-Chip 1 covers the key components of MPSoC: processors, memory, interconnect and interfaces. It describes advance features of these components and technologies to build efficient MPSoC architectures. All the main components are detailed: use of memory and their technology, communication support and consistency, and specific processor architectures for general purposes or for dedicated applications.
Thread-level speculation (TLS) makes it possible to parallelize general-purpose C programs. This paper proposes software and hardware mechanisms that support speculative thread-level execution on a single-chip multiprocessor. A detailed analysis of programs using the TLS execution model shows a promising bound on the performance of a TLS machine. In particular, TLS makes it feasible to find speculative doacross parallelism in outer loops that can greatly improve the performance of general-purpose applications. Exploiting speculative thread-level parallelism on a multiprocessor requires the compiler to determine where to speculate and to generate SPMD (single program, multiple data) code. We have developed a fully automatic compiler system that uses profile information to determine the best loops to execute speculatively and to generate the synchronization code that improves the performance of speculative execution. The hardware mechanisms required to support speculation are simple extensions to the cache hierarchy of a single-chip multiprocessor. We show that with our proposed mechanisms, thread-level speculation provides significant performance benefits.
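The synchronization code such a compiler generates for a frequently occurring dependence can be sketched in C as a post/wait pair on the loop-carried value: each iteration waits only for the value it actually needs and runs the rest of its body in parallel. The function names and the trivial loop body are assumptions for illustration, not the paper's compiler output.

    /* Compile with: cc -O2 -pthread doacross.c */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 4
    static int  ready[ITERS + 1];      /* ready[i]: value for iter i posted */
    static long carried[ITERS + 1];    /* forwarded loop-carried value      */
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

    static long wait_for(int i) {      /* compiler-inserted "wait" */
        pthread_mutex_lock(&m);
        while (!ready[i]) pthread_cond_wait(&c, &m);
        long v = carried[i];
        pthread_mutex_unlock(&m);
        return v;
    }

    static void post(int i, long v) {  /* compiler-inserted "post" */
        pthread_mutex_lock(&m);
        carried[i] = v;
        ready[i] = 1;
        pthread_cond_broadcast(&c);
        pthread_mutex_unlock(&m);
    }

    static void *iteration(void *arg) {  /* SPMD code: one iteration  */
        long i = (long)arg;
        long in = wait_for(i);           /* sync only the likely dep  */
        long out = in + i;               /* rest of the body runs in  */
        post(i + 1, out);                /* parallel with other iters */
        return NULL;
    }

    int main(void) {
        pthread_t tid[ITERS];
        post(0, 0);                                   /* seed iteration 0 */
        for (long i = 0; i < ITERS; i++)
            pthread_create(&tid[i], NULL, iteration, (void *)i);
        for (long i = 0; i < ITERS; i++) pthread_join(tid[i], NULL);
        printf("final = %ld\n", wait_for(ITERS));     /* 0+1+2+3 = 6 */
        return 0;
    }

Profile information decides which dependences are frequent enough to deserve this explicit forwarding; rare dependences are instead left to the speculation hardware to detect.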
Thread-level speculation is a technique that enables parallel execution of sequential applications on a multiprocessor. This paper describes the complete implementation of the support for thread-level speculation on the Hydra chip multiprocessor (CMP). The support consists of a number of software speculation control handlers and modifications to the shared secondary cache memory system of the CMP. This support is evaluated using five representative integer applications. Our results show that the speculative support is only able to improve performance when there is a substantial amount of medium-grained loop-level parallelism in the application. When the granularity of parallelism is too small or there is little inherent parallelism in the application, the overhead of the software handlers overwhelms any potential performance benefits from speculative-thread parallelism. Overall, thread-level speculation still appears to be a promising approach for expanding the class of applications that can be automatically parallelized, but more hardware-intensive implementations for managing speculation control are required to achieve performance improvements on a wide class of integer applications.
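A toy C sketch of what a software speculation control handler does: when a dependence violation is flagged (here simply simulated), the handler rolls the thread back to the start of its speculative region and re-executes it. setjmp/longjmp stand in for the real checkpoint/restart mechanism; none of this is the actual Hydra handler code, and it is exactly this per-violation software overhead that the abstract identifies as the bottleneck for fine-grained parallelism.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf checkpoint;           /* state saved at region entry */
    static int attempts;

    static void violation_handler(void) {
        /* on a dependence violation: discard buffered state, restart */
        longjmp(checkpoint, 1);
    }

    static int run_speculative_region(void) {
        setjmp(checkpoint);              /* checkpoint: restart target  */
        attempts++;
        int violated = (attempts < 3);   /* simulate two violations     */
        if (violated) violation_handler();
        return attempts;                 /* region commits on success   */
    }

    int main(void) {
        printf("committed after %d attempts\n", run_speculative_region());
        return 0;
    }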