My Science Talent Search Paper

I was wondering if the AppleInsider community would take a look at my paper. I am a high school student, and I don't know anyone else who would be able to read and discuss this except you guys here.



Here is a draft of the paper that I am entering into the Intel Science Talent Search. There are probably tons of grammar and spelling errors; it is a draft, so there are bound to be many problems. Any comments and/or suggestions would be greatly appreciated.

---------------------------------



Threaded-Parallel RISC:

Breaking the Speed Barriers of Today's Computer Architectures



Introduction and Purpose of Research

With more and more transistors packed onto ever-shrinking microprocessor dies, the price-to-performance ratio of consumer microprocessors has improved phenomenally each year. As always, one of the top goals of today's computer engineers is a cost-effective design that can be easily marketed. With current manufacturing processes, die space is not as large an issue as it once was. Much of a product's cost, and much of the risk of project failure, lies in the design phase of the respective microprocessor, which can take many years. Additional design complexity often hurts the cost-effectiveness of the microprocessor; by the time the product is finished, it may be three or more years behind current technological trends. Designers also have a bad tendency to add more and more complex instructions and to spend large amounts of time on sections of the processor that are rarely used.

The purpose of my research is to improve the performance of a new breed of microprocessors while reducing development time and cost. The microprocessors of today are not very good at multitasking and working with parallel threads. This paper's purpose is to make a case for simplifying various existing techniques while promoting innovation through new design approaches.





Are Today's RISC Microprocessors Becoming Too Complex?

In a way, it seems that the industry does not remember a chapter of its own past. Conditions similar to those being experienced today led to the fall of the CISC model and the proliferation of RISC. The "complex" RISC processors of today are reverting to some of the traits that plagued the CISC designs of yesteryear.

Today's RISC microprocessors are generally a hodge-podge of separate, specialized functional units with very poor utilization rates. Even under best-case scenarios, it has been shown that only 10 to 30 percent of the functional units are in use in full-load environments. It seems that much attention has been placed on functionality that is rarely used. Designers should instead focus on making full-load situations as manageable as possible. Implementing features from parallel processing architectures, and packing them into standard, modular functional units, would yield tremendous performance increases while reducing design complexity.



Parallel Processing: Successes and Failures

Have you ever heard the saying that two heads are better than one? Parallel processing holds true to that statement. Parallel processing has been somewhat neglected in the field of computer science; much of the field has revolved around the one-at-a-time, sequential way of doing things. Parallel processing allows more than one task to be completed inside a single microprocessor; some implementations include issuing multiple instructions on varying instruction and data streams, and extreme uses of pipelining. Today, as multitasking has become an important part of the user's computing experience, many wonder why most computer systems are still designed with very little thought toward parallel processing. Some implementations, such as pipelining and instruction-level parallelism, have only recently reached the desktop. Many people feel that the industry can do better.

There are a few roadblocks that stand in the way of more mainstream uses of parallel processing. Software inertia is probably the largest obstacle to overcome, as many programs would need to be rewritten to fully take advantage of a parallel architecture. Another challenge derives from Amdahl's law: there is a limited return from adding more processors (or simultaneous instruction threads). Of the two, Amdahl's law is probably the larger obstacle; sometimes the diminishing return is simply not enough to make the purchase of additional processing units cost effective. Software inertia is not as big a problem, because many new programs that are simply not possible today would have to be developed from the ground up anyway, and compiler technology dealing with automatic parallelization is rapidly improving.
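
To make the effect of Amdahl's law concrete, here is a small C sketch that computes the theoretical speedup for a range of processor counts (the 80 percent parallel fraction is only an illustrative assumption):

    #include <stdio.h>

    /* Amdahl's law: speedup = 1 / ((1 - p) + p / n),
       where p is the fraction of the work that can run in parallel
       and n is the number of processors (or hardware threads). */
    static double amdahl_speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        const double p = 0.8;                 /* illustrative: 80% of the program parallelizes */
        const int counts[] = { 1, 2, 4, 8, 16, 64 };

        for (int i = 0; i < 6; i++)
            printf("%2d processors -> %.2fx speedup\n", counts[i], amdahl_speedup(p, counts[i]));
        /* Even with 64 processors the speedup never passes 1/(1-p) = 5x,
           which is the diminishing return discussed above. */
        return 0;
    }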

Although Amdahl's law will not go away, we can lessen its effects and still get a nice speed increase. On-die transistor counts have doubled roughly every 18 months. In an effort to keep pace with Moore's law, semiconductor companies have developed smaller and smaller manufacturing processes, leading to large increases in the number of transistors that can be packed into an individual chip. Even though transistor counts have grown greatly, the prices of the microprocessors produced have fallen! This allows more innovative and ambitious designs to prosper.



A Parallel Technique: Simultaneous Multithreading

A recent interest that has been ignited in the world of computer architecture is Simultaneous Multithreading, or SMT. Simultaneous Multithreading is a parallel processing technique that allows more than one thread to be worked on inside a microprocessor at a time. Combined with instruction-level parallelism, this allows the functional units of the microprocessor to be used far more than usual (around 80% in high-load situations) and makes for a more efficient design. Intel's name for SMT is Hyperthreading, a technology it is now starting to include on its microprocessors. Even though the current implementations are not as extensive as they could be, and are built on top of a quickly aging instruction set, they still offer around a 30 percent performance increase while adding only a small performance overhead and around 5 percent to the die space. Most of the current shortcomings of SMT, such as a fixed thread count, a cumbersome implementation, thread resource balancing issues, and few innovations, will be addressed by my own extension of SMT, included as a cornerstone of my threaded-parallel RISC (TPRISC) architecture.
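
As a rough illustration of the idea (and not of any real processor's implementation), the following C sketch models a simplified SMT issue stage that, each cycle, fills a shared pool of functional units with ready instructions drawn round-robin from two thread queues; the thread counts and instruction counts are made up:

    #include <stdio.h>

    #define THREADS   2   /* hardware thread contexts (illustrative)           */
    #define FU_COUNT  4   /* identical functional units shared by both threads */

    int main(void)
    {
        /* Made-up counts of ready instructions per thread for three cycles,
           just to show how one thread's idle issue slots get filled by the other. */
        int ready[3][THREADS] = { { 3, 2 }, { 1, 4 }, { 0, 2 } };

        for (int cycle = 0; cycle < 3; cycle++) {
            int issued = 0;
            int t = cycle % THREADS;                    /* rotate the starting thread */
            for (int tries = 0; tries < THREADS * FU_COUNT && issued < FU_COUNT; tries++) {
                if (ready[cycle][t] > 0) {              /* this thread has work ready */
                    ready[cycle][t]--;
                    issued++;
                }
                t = (t + 1) % THREADS;                  /* interleave the next thread */
            }
            printf("cycle %d: issued %d of %d slots (%d%% utilization)\n",
                   cycle, issued, FU_COUNT, 100 * issued / FU_COUNT);
        }
        return 0;
    }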



Dynamic Simultaneous Multithreading for TPRISC: Introduction

My implementation of Simultaneous Multithreading, Dynamic Simultaneous Multithreading (or DSMT), adds many new features to increase efficiency, improve performance, improve reliability, and reduce costs. If this effort were applied to a design that has its roots in parallel processing, some of the limitations of today's sequential microprocessors would be overcome. As its name suggests, DSMT is integrated into the design of just about every part of TPRISC.





Dynamic Simultaneous Multithreading for TPRISC: Implementation

One of the most important aspects of DSMT is its high price-to-performance ratio, gained through the reuse of functional unit resources combined with intelligent, thread-aware support systems. DSMT will feature a support register that allows the number of active threads to be changed dynamically. The register will be designated for use by the operating system's process scheduling routines. Although the operating system's thread scheduler could simply stop dispatching instructions to a thread, that implementation would not be very complete: if a thread context that is not needed remains powered, it wastes resources, increases overhead, and increases the amount of power required. A dynamic thread count also adds performance advantages in situations where either a high or a low thread count is more desirable. This would decrease overhead in many situations and add more flexibility to handle high-load demands in a more even and smooth fashion.
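
The following C sketch shows how the operating-system side of this could look; the register, its behavior, and the scheduling policy are entirely my own illustrative assumptions, not an existing interface:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a hypothetical memory-mapped DSMT control register.
       On real hardware this would be a fixed address; a plain variable is
       used here so the sketch can actually run. */
    static volatile uint32_t dsmt_thread_ctrl;

    /* Request that the processor keep exactly 'n' hardware thread contexts
       powered and schedulable; contexts above 'n' are assumed to be
       clock-gated so they stop consuming power and issue slots. */
    static void dsmt_set_active_threads(uint32_t n)
    {
        dsmt_thread_ctrl = n;
    }

    /* Example policy the operating system's scheduler might use: match the
       number of powered thread contexts to the number of runnable processes,
       up to the hardware maximum, but never drop below one. */
    static void scheduler_tick(uint32_t runnable, uint32_t hw_max_threads)
    {
        uint32_t want = runnable < hw_max_threads ? runnable : hw_max_threads;
        if (want == 0)
            want = 1;
        dsmt_set_active_threads(want);
        printf("runnable=%u -> active hardware threads=%u\n",
               (unsigned)runnable, (unsigned)dsmt_thread_ctrl);
    }

    int main(void)
    {
        scheduler_tick(1, 8);   /* light load: power down most contexts    */
        scheduler_tick(12, 8);  /* heavy load: cap at the hardware maximum */
        return 0;
    }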

DSMT could also offer additional benefits, such as allowing error checking to happen in parallel. Although the workload would be significantly increased, reliability in intensive environments could be greatly improved. This feature would be enabled through the machine's BIOS, as it would not be a needed (or wanted) feature in every environment. However, the gain in reliability would be very welcome in mission-critical environments such as corporate database servers and various military applications.

The cost of implementing DSMT would be similar to the cost of Intel's implementation in the Pentium 4 (3 GHz+) and Pentium 4 Xeon. Certain things would have to be duplicated, including various registers (obviously including the instruction pointer register) and various internal buses. Although the die space cost of Hyperthreading was only around 5 percent, my implementation will consume slightly more die space; the price-to-performance improvements will be well worth it.



TPRISC: Modularity Overview

One of the most important aspects of TPRISC is its modular design. All major functional units will be created in a modular manner so that the microprocessor can be developed more rapidly, thus reducing costs. By taking this approach, the wheel would not have to be reinvented every time a new microprocessor needed to be designed. There are also additional advantages to this approach that may not be as apparent.

A modular approach would encourage a more power-efficient design, as individual units could be designed to be turned off when not in use. Also, if a separate clock were used for each unit, the individual unit's clock speed could be adjusted depending on demand. This would allow a good balance between processing speed and power usage, which is quickly becoming a major problem.
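
Here is a simplified C sketch of the kind of per-unit clock policy I have in mind; the unit names, demand thresholds, and clock-divider values are illustrative assumptions only:

    #include <stdio.h>

    /* One modular functional unit with its own clock control (illustrative). */
    struct func_unit {
        const char *name;
        int demand_pct;   /* recent utilization of this unit, 0..100      */
        int clock_div;    /* 0 = off, 1 = full speed, 2 = half speed, ... */
    };

    /* Adjust each unit's clock based only on its own demand, so busy units
       run at full speed while idle ones are slowed or gated off entirely. */
    static void adjust_clocks(struct func_unit *u, int count)
    {
        for (int i = 0; i < count; i++) {
            if (u[i].demand_pct == 0)       u[i].clock_div = 0;  /* power off  */
            else if (u[i].demand_pct < 40)  u[i].clock_div = 2;  /* half speed */
            else                            u[i].clock_div = 1;  /* full speed */
        }
    }

    int main(void)
    {
        struct func_unit units[] = {
            { "integer-0", 95, 1 }, { "integer-1", 30, 1 },
            { "float-0",    0, 1 }, { "simd-0",   60, 1 },
        };
        adjust_clocks(units, 4);
        for (int i = 0; i < 4; i++)
            printf("%-10s demand %3d%% -> clock divider %d\n",
                   units[i].name, units[i].demand_pct, units[i].clock_div);
        return 0;
    }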

This could also be applied to the problem of manufacturing defects: as long as the defect was inside a functional unit that has duplicates, individual units could be disabled and the microprocessor as a whole would still be able to function. The microprocessors that have functional units disabled could then be used in situations that require lower power usage, or as spares in a multiprocessing environment. These reduced-functionality microprocessors would make the product line more profitable; demand for them would be great in laptops, low-end machines, and power-conscious computer clusters. In the past, such a product would have to be either repaired, if the defect was simple, or just discarded.

Modularity would also make the product as a whole better. If modularity were a key factor in the design phase of the microprocessor, then individual teams would be formed to focus on a single module each. Individual modules could be designed with standards in mind, so that they could later be reused in newer, more specialized microprocessors.



TPRISC: Specialization

A huge advantage gained from the modular nature of the TPRISC concept is the ability to easily design microprocessors for a target use. The modular nature of the functional units allows the design team to rapidly choose the number of units to be included, letting the designers fine-tune the final product. For example, database software uses the floating-point unit very rarely, while the integer units are always in demand. A TPRISC microprocessor could be customized to contain double the integer units while neglecting the floating-point units. This concept would only be useful in situations where there will be high demand and high-volume purchases.
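
To show how simple the specialization step could become with modular units, here is a small C sketch of a hypothetical design-time configuration table; the unit counts are examples, not measured requirements:

    #include <stdio.h>

    /* Design-time description of one TPRISC variant built from the same
       modular units (the numbers are illustrative, not real designs). */
    struct tprisc_config {
        const char *target;
        int integer_units;
        int float_units;
        int simd_units;
    };

    int main(void)
    {
        struct tprisc_config variants[] = {
            { "general purpose", 2, 2, 1 },
            { "database server", 4, 1, 0 },  /* double the integer units, drop SIMD */
            { "scientific node", 1, 4, 2 },  /* floating-point heavy                */
        };

        for (int i = 0; i < 3; i++)
            printf("%-16s int=%d fp=%d simd=%d\n", variants[i].target,
                   variants[i].integer_units, variants[i].float_units,
                   variants[i].simd_units);
        return 0;
    }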

Specialized SIMD instruction sets could also be developed for a target customer. The Gekko microprocessor contained in the Nintendo GameCube has a special SIMD unit developed by IBM; this unit specializes in routines that are useful in artificial intelligence applications and 3D particle effects. If the Gekko had been designed with a modular design philosophy, the only part that would have needed to be developed would be the SIMD unit, as the Gekko is derived from the PowerPC 750 series microprocessor. If designed to be modular, the Gekko SIMD unit could then be included in later PowerPC processors, allowing other lines to benefit from the research and development time allocated to the Gekko. This does add complexity to the design, but at least the research and development spent on the unit could be reused in another product.



TPRISC: Problems with Sequential-Based Memory Hierarchies

One of the largest problems with Intel's version of SMT, Hyperthreading, is that the memory hierarchy is greatly strained when handling large workloads from its two threads. The memory interfaces are not SMT-aware, which allows one thread to monopolize the respective levels of cache. If SMT-aware memory interfaces were created, they could be designed to throttle memory bandwidth based on what the operating system deems necessary. This is a very important feature, as memory bandwidth is quickly becoming a bottleneck in demanding environments.
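
A minimal C sketch of what thread-aware bandwidth throttling could look like at the memory interface; the per-thread shares and the windowed arbitration scheme are my own illustrative assumptions (a real arbiter would also redistribute any unused quota):

    #include <stdio.h>

    #define THREADS 2

    /* Per-thread share of memory bandwidth, set by the operating system
       (for example 75% for a foreground thread, 25% for a background one). */
    static const int os_share_pct[THREADS] = { 75, 25 };

    /* Grant requests out of a window of 'slots' memory accesses so that no
       thread can monopolize the interface beyond its assigned share. */
    static void arbitrate(int slots, const int pending[THREADS])
    {
        for (int t = 0; t < THREADS; t++) {
            int quota   = slots * os_share_pct[t] / 100;
            int granted = pending[t] < quota ? pending[t] : quota;
            printf("thread %d: %2d requests pending, quota %d, granted %d\n",
                   t, pending[t], quota, granted);
        }
    }

    int main(void)
    {
        int pending[THREADS] = { 9, 12 };   /* outstanding cache misses (made up) */
        arbitrate(8, pending);              /* 8 access slots in this window      */
        return 0;
    }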

It would also be easier to implement multiple memory interfaces for increased memory bandwidth. Although there are several roadblocks to implementing a parallel on-chip memory architecture in a consumer-level microprocessor, the potential benefits far outweigh the problems. For example, three-dimensional visualization demands more and more processing power coupled with an even greater thirst for memory bandwidth, in both consumer and research settings. Most of the complexity associated with three-dimensional graphics has shifted off the central processing unit (CPU) to the graphics processing unit (GPU), tunneled through an AGP port. The associated memory bandwidth strain on the CPU makes running other processes on the same CPU difficult and slow.

One way around this problem is to design a dual-channel memory architecture. This is an effective way to increase bandwidth, but it is not very easy to implement. Newer, more complex load/store units must be created to solve problems that, under the TPRISC architectural description, could be handled simply by adding multiple copies of existing units. Current sequential microprocessors must also switch between channels, which increases latency and limits real bandwidth. The solution to this problem is the addition of a parallel memory bus.



TPRISC: The Inclusion of Multiple Processors

A central feature of most parallel processing implementations is the ability to use multiple microprocessors. The most common type of multiprocessor technology used today is Symmetric MultiProcessing (or SMP). Although SMP is a great implementation, it can be improved. SMP requires that all the microprocessors connected to the system bus be identical in order to function properly, a requirement that holds back SMP adoption rates in cost-sensitive machines.

A common bus designed with DSMT in mind could allow threaded applications to span multiple processors, even if those processors specialize in different things. Microprocessors designed for different target audiences could sit on the same bus. This would promote the development of "slave" processors that specialize in operations such as floating-point calculations while having little or no support for integer calculations. Processors like this would be ideal in clusters that deal with scientific data, as they do not need high integer performance but want all the floating-point performance possible. As long as a general-purpose processor existed on the bus, the cluster would still function properly.
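
As a sketch of how an operating system might place threads on such a mixed bus, here is a small C example that matches a thread's instruction mix to the best-suited processor; the capability numbers and the scoring rule are illustrative assumptions:

    #include <stdio.h>

    /* Capability description of each processor on the shared bus (illustrative). */
    struct cpu {
        const char *name;
        int int_units;    /* integer execution units */
        int fp_units;     /* floating-point units    */
    };

    /* Pick the processor whose unit mix best matches the thread's measured
       instruction mix (fp_ratio is the percentage of FP instructions, 0..100). */
    static int pick_cpu(const struct cpu *cpus, int n, int fp_ratio)
    {
        int best = 0, best_score = -1;
        for (int i = 0; i < n; i++) {
            int score = cpus[i].fp_units * fp_ratio
                      + cpus[i].int_units * (100 - fp_ratio);
            if (score > best_score) { best_score = score; best = i; }
        }
        return best;
    }

    int main(void)
    {
        struct cpu bus[] = {
            { "general TPRISC", 4, 2 },
            { "FP slave",       1, 8 },   /* specialized floating-point helper */
        };
        printf("integer-heavy thread -> %s\n", bus[pick_cpu(bus, 2, 10)].name);
        printf("scientific thread    -> %s\n", bus[pick_cpu(bus, 2, 90)].name);
        return 0;
    }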



TPRISC: The Inclusion of Vector Processing

A current trend in the realm of computer architecture is the inclusion of an on-die vector processing (SIMD) unit to enhance multimedia performance. That trend may begin to backfire as these SIMD units become more and more complex and specialized. It is my belief that these units should be designed to handle generic kinds of data and not become extremely specialized. The TPRISC architecture should invest a conservative amount of resources in SIMD units. A smaller, simpler SIMD unit could be designed to perform the most common vector tasks and then be reused in later iterations of the microprocessor. The smaller vector units would also be easier to place on the bus in parallel. This would satisfy the needs of those who require vector processing while allowing a more balanced architecture to mature.
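
To show what I mean by handling generic kinds of data, here is a small C sketch of a four-lane vector add written against a plain struct rather than any vendor-specific extension; real hardware would, of course, execute all four lanes in a single instruction:

    #include <stdio.h>

    /* A generic 4-lane vector type: the same unit handles any 32-bit data,
       instead of a long list of specialized multimedia instructions. */
    typedef struct { float lane[4]; } vec4;

    /* Element-wise add: in hardware all four lanes would execute at once. */
    static vec4 vec4_add(vec4 a, vec4 b)
    {
        vec4 r;
        for (int i = 0; i < 4; i++)
            r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }

    int main(void)
    {
        vec4 a = { { 1.0f, 2.0f, 3.0f, 4.0f } };
        vec4 b = { { 0.5f, 0.5f, 0.5f, 0.5f } };
        vec4 c = vec4_add(a, b);
        printf("%.1f %.1f %.1f %.1f\n", c.lane[0], c.lane[1], c.lane[2], c.lane[3]);
        return 0;
    }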



TPRISC: Conclusion and Future Simulations

I am currently finishing a TPRISC processor simulator that implements some of the ideas discussed here. The simulator will be able to validate my research and will give a better view of how parallel processing can be implemented. A more advanced design prototype could then be produced. Maybe my design will lead to a future where general-purpose parallel microprocessors become as common as sequential microprocessors are today.
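
As a very rough sketch (not the actual simulator source, and with placeholder structures and numbers), the kind of cycle-driven loop such a simulator can be built around looks like this:

    #include <stdio.h>

    #define MAX_THREADS 4
    #define FU_COUNT    4

    /* Heavily simplified model of one simulated TPRISC core. */
    struct core {
        int  active_threads;      /* would be set through the DSMT control register */
        long pc[MAX_THREADS];     /* one program counter per hardware thread        */
        long instructions_retired;
    };

    /* One simulated clock cycle: "execute" up to FU_COUNT instructions,
       drawn round-robin from the active threads. */
    static void step(struct core *c)
    {
        for (int slot = 0; slot < FU_COUNT; slot++) {
            int t = slot % c->active_threads;
            c->pc[t]++;           /* stand-in for fetch, decode, and execute */
            c->instructions_retired++;
        }
    }

    int main(void)
    {
        struct core c = { .active_threads = 2 };
        for (long cycle = 0; cycle < 1000; cycle++)
            step(&c);
        printf("retired %ld instructions in 1000 cycles (IPC %.1f)\n",
               c.instructions_retired, c.instructions_retired / 1000.0);
        return 0;
    }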



----------------------



I will probably rewrite some of the sections, BTW; I just want to receive some comments on my work.



Thanks!



[ 11-17-2002: Message edited by: imacman287 ]

Comments

  • Reply 1 of 9
    Anyone care to comment on my paper?
  • Reply 2 of 9
    scott
    Where's HTH and others when we need them?
  • Reply 3 of 9
    I'd love to comment, but I can't right now (too busy). You can email me at splinemodel at psmug.princeton dot edu.



    Here's a very general comment about part one: Yes, RISCs are getting more complex. Back in the '80s, when RISC was the great new thing, memory was much more expensive, so packing more into each instruction was a good idea; plus, compiler tech was good enough that writing C was the main way to go, and the compiler could easily optimize the assembly for a RISC. The new 970, I argue, is not a RISC chip. What RISC chip has a separate RISC execution core? Weird. But anyway, RISC is not a perfect paradigm and not a goal in itself. Performance is, power usage is, die size is. So whatever suits the need best will be adopted.



    The next thing, yes, is vector chips. The Itanium is one of these. Sometime in the future are quantum chips, which will be a giant leap and a completely different type of architecture and understanding.



    I suppose a bump to the top won't hurt.
  • Reply 4 of 9
    applenut
    Not for nothing, man, but this is more like a research paper than an actual project.



    The research is good, but I think you need a lot more than what you have to actually have a chance at all.
  • Reply 5 of 9
    [quote]Originally posted by applenut:

    Not for nothing, man, but this is more like a research paper than an actual project.

    The research is good, but I think you need a lot more than what you have to actually have a chance at all.[/quote]



    This is supposed to be just a research paper. I will go more in-depth when the simulator for my architecture is created.
  • Reply 6 of 9
    applenut
    [quote]Originally posted by imacman287:

    This is supposed to be just a research paper. I will go more in-depth when the simulator for my architecture is created.[/quote]



    I'm confused... so there's more to your submission to Intel, right? Not just this?
  • Reply 7 of 9
    [quote]Originally posted by applenut:

    I'm confused... so there's more to your submission to Intel, right? Not just this?[/quote]



    Yeah, many essays are required, and this will be used as background research for another science fair project.



    <a href="http://www.sciserv.org/sts/"; target="_blank">Here is the Science Talent Search Website</a>
  • Reply 8 of 9
    applenut
    [quote]Originally posted by imacman287:

    Yeah, many essays are required, and this will be used as background research for another science fair project.

    Here is the Science Talent Search website: http://www.sciserv.org/sts/[/quote]



    oh ok... I stupid [Chilling]
  • Reply 9 of 9
    [quote]Originally posted by applenut:

    oh ok... I stupid [Chilling][/quote]



    Not really. I did leave out some important things in my essay, as you pointed out, and I am rewriting parts of it now. I was just saying that the research paper is not the only aspect.