Suppose you have two single-threaded computers, a and b,
and computer a runs at twice the clock frequency of computer b.
Does this mean that computer a can do twice as many operations as computer b in a given time?
To increase the hertz of a computer (i.e., its clock frequency, and hence the number of operations it can perform within a given time frame), what has to change in the architecture?
Asked By : Mark Ramotowski
Answered By : Paul A. Clayton
For most programs, a computer's performance cannot simply be derived from its capacity to perform operations (i.e., its peak performance). As Shitikanth's answer states, the memory system and other bottlenecks can keep performance below the peak.
In addition, processor frequency is not necessarily proportional to processor performance. It is possible to increase performance without increasing frequency and to reduce performance by increasing frequency.
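To make this concrete, here is a minimal back-of-the-envelope model. It assumes (hypothetically) that core work scales with clock frequency while each memory miss costs a fixed amount of time that does not, and that misses are not overlapped with computation. The function name and all numbers are illustrative, not measurements.

```python
def execution_time(instructions, cpi_core, freq_hz, misses, miss_latency_s):
    """Seconds to run a program under a simple, non-overlapping model."""
    core_time = instructions * cpi_core / freq_hz   # shrinks as the clock frequency rises
    stall_time = misses * miss_latency_s            # fixed: memory latency does not scale with the core clock
    return core_time + stall_time

# Hypothetical program: 1e9 instructions at 1 cycle each, plus 1e7 misses of 100 ns each
t_b = execution_time(1e9, 1.0, 1e9, 1e7, 100e-9)   # computer b at 1 GHz
t_a = execution_time(1e9, 1.0, 2e9, 1e7, 100e-9)   # computer a at 2 GHz
print(t_b, t_a, t_b / t_a)
```

Under these assumptions, doubling the clock gives roughly a 1.33x speedup (about 2.0 s versus 1.5 s) rather than 2x, because half of computer b's time was spent waiting on memory.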
A substantial number of techniques can be used to increase performance.
Increasing the execution width of the processor (how many operations it can perform in parallel) can also increase performance. Although increasing width tends to require a lower frequency, and programs rarely make full use of the extra width at all times, every modern high-performance processor can begin executing more than one instruction per cycle. (Supporting vector operations is somewhat similar, but it requires more support from software. Likewise, using multiple processing cores, or multiple threads within a single processing core, can provide more parallel operation while requiring more software effort.)
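Why extra width is rarely fully used can be illustrated with a toy in-order issue model. It assumes (hypothetically) that every instruction has a one-cycle latency, that up to `width` instructions may begin per cycle in program order, and that an instruction cannot start until its inputs are ready; `cycles_needed` and the `deps` encoding are invented for this sketch.

```python
def cycles_needed(deps, width, latency=1):
    """deps[i] lists the indices (< i) of earlier instructions whose results instruction i uses."""
    issue = []            # cycle in which each instruction begins executing
    cycle = 0             # current issue cycle (instructions issue in program order)
    used = 0              # issue slots already used in the current cycle
    for srcs in deps:
        ready = max((issue[s] + latency for s in srcs), default=0)
        if ready > cycle:         # stall until all inputs are available
            cycle, used = ready, 0
        if used == width:         # every issue slot in this cycle is taken
            cycle, used = cycle + 1, 0
        issue.append(cycle)
        used += 1
    return max(issue) + latency if issue else 0

independent = [[] for _ in range(8)]              # eight instructions with no dependencies
chain = [[]] + [[i - 1] for i in range(1, 8)]     # each instruction needs the previous result
for w in (1, 4):
    print(w, cycles_needed(independent, w), cycles_needed(chain, w))
```

Widening from 1 to 4 cuts the independent sequence from 8 cycles to 2, but the dependency chain still takes 8 cycles, so the extra width sits idle.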
Speculation can also be used to increase performance. By doing work that may turn out to be unnecessary, the processor can increase the amount of parallelism available and reduce execution delay. In current processors this includes branch prediction, out-of-order execution, and hardware prefetching. By predicting the path a branch will take, the processor does not have to wait until the branch condition and target address are computed before fetching (and even executing) instructions after the branch.
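The answer does not say which prediction scheme current processors use (real designs are considerably more elaborate), but the classic two-bit saturating counter is a simple textbook example of the idea. This sketch assumes a single counter for a single branch and a hypothetical outcome trace.

```python
def two_bit_accuracy(outcomes, counter=2):
    """Fraction of branch outcomes predicted correctly by one 2-bit saturating counter.

    Counter states 0-1 predict not taken, 2-3 predict taken; the default start is weakly taken.
    """
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # move one step toward the actual outcome, saturating at 0 and 3
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)

# Hypothetical loop branch: taken 9 times, then falls through once, repeated 10 times
trace = ([True] * 9 + [False]) * 10
print(two_bit_accuracy(trace))
```

For this loop-like pattern the counter mispredicts only the last outcome of each pass (90% accuracy): a single not-taken outcome is not enough to flip its prediction, so the next pass starts out predicted correctly.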
(Increasing the accuracy of branch prediction can be very helpful for some workloads. Fewer branch mispredictions means less useless speculative work done by the processor, which can allow more of the electrical power/thermal budget to be applied to useful work. Better prediction can also let a deeper pipeline keep the performance loss from the misprediction penalty to an acceptable level. A deeper pipeline allows a higher frequency but increases the cost of a branch misprediction: simplistically, N pipeline stages of work must be discarded on a misprediction, so increasing the number of pipeline stages increases the cost of each misprediction.)
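This trade-off can be put into rough numbers. The sketch assumes a hypothetical design whose total logic delay (`logic_ns`) is split evenly over `depth` stages, with a fixed per-stage latch overhead (`latch_ns`), a base CPI of 1, and, following the simplistic description above, a misprediction penalty of about `depth` cycles. All parameter values are invented for illustration.

```python
def ns_per_instruction(depth, branch_frac, mispredict_rate, logic_ns=10.0, latch_ns=0.5):
    cycle_ns = logic_ns / depth + latch_ns               # more stages -> shorter cycle (higher frequency)
    cpi = 1.0 + branch_frac * mispredict_rate * depth    # ~depth cycles discarded per misprediction
    return cpi * cycle_ns

def best_depth(branch_frac, mispredict_rate, depths=range(1, 101)):
    """Candidate pipeline depth with the lowest average time per instruction."""
    return min(depths, key=lambda d: ns_per_instruction(d, branch_frac, mispredict_rate))

# Hypothetical workload: 20% of instructions are branches
print(best_depth(0.2, 0.20))   # poorer prediction: shallower optimum
print(best_depth(0.2, 0.05))   # better prediction: a deeper pipeline pays off
```

Under these assumptions, cutting the misprediction rate from 20% to 5% roughly doubles the best pipeline depth, which is the sense in which better prediction lets a deeper, higher-frequency pipeline keep its misprediction cost acceptable.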
Out-of-order execution allows (usually useful) work to be done while one or more instructions wait for a result from a previous instruction. Increasing the number of "future" instructions that can be considered as candidates for execution increases the processor's ability to tolerate delayed results. Out-of-order execution also allows more operations to execute in parallel by relaxing name dependencies (where an instruction reuses an address/name but does not have a value dependency).
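The answer does not name the mechanism, but out-of-order processors typically remove name dependencies with register renaming: each result is written to a fresh physical register, so a later instruction that reuses an architectural register name no longer has to wait for earlier readers or writers of that name. This is a simplified sketch, assuming an unlimited pool of hypothetical physical registers (`p0`, `p1`, ...) that are never freed, which real hardware must do.

```python
from itertools import count

def rename(program):
    """program: list of (dest, [sources]) using architectural register names like 'r1'."""
    mapping = {}        # architectural register -> physical register holding its latest value
    fresh = count()     # hypothetical unlimited pool: p0, p1, p2, ...
    renamed = []
    for dest, srcs in program:
        # sources read the newest mapping; never-written registers keep their original name
        phys_srcs = [mapping.get(s, s) for s in srcs]
        phys_dest = f"p{next(fresh)}"
        mapping[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed

program = [
    ("r1", ["r2", "r3"]),   # r1 = r2 + r3
    ("r4", ["r1"]),         # r4 = f(r1): true dependency on the first r1
    ("r1", ["r5", "r6"]),   # r1 = r5 + r6: reuses the name r1 but needs no value from above
    ("r7", ["r1"]),         # r7 = g(r1): true dependency on the second r1
]
for insn in rename(program):
    print(insn)
```

After renaming, the third instruction writes `p2` instead of overwriting `r1`, so it can execute before or alongside the second instruction, which still reads `p0`.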
Hardware prefetching speculates that the prefetched value will be used in the near future, which can reduce the effective latency of memory accesses. In theory, prefetching could even be used to spread memory bandwidth use over time (by scheduling useful prefetches during periods of low demand-access activity), though prefetching is generally a trade-off between latency and bandwidth use. In addition, it is often more bandwidth-efficient to fetch multiple cache blocks that lie within a single DRAM page. (Scheduling writebacks to memory can also increase effective bandwidth.)
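A common form of hardware prefetching is stride prefetching: if successive accesses are a constant distance apart, fetch the next predicted address early. This sketch counts demand misses for a single access stream, assuming a hypothetical unbounded cache of 64-byte blocks and prefetches that complete instantly, so it ignores the timeliness and bandwidth costs mentioned above.

```python
def count_misses(addresses, block_size=64, prefetch=False):
    """Demand misses for one access stream, optionally with a simple stride prefetcher."""
    cache = set()           # hypothetical unbounded cache, holding block numbers
    last_addr = None
    last_stride = None
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if block not in cache:
            misses += 1
            cache.add(block)
        if prefetch and last_addr is not None:
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                # same stride twice in a row: speculatively fetch the predicted next address
                cache.add((addr + stride) // block_size)
            last_stride = stride
        last_addr = addr
    return misses

stream = list(range(0, 64 * 10, 64))   # ten accesses, each to a new block
print(count_misses(stream), count_misses(stream, prefetch=True))
```

Without prefetching every access misses; with it, only the first three miss, while the prefetcher is still learning the stride.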
Cache performance is another area where computer performance can be improved. Placement policies (where a block is allocated within the physical cache; e.g., Non-Uniform Cache Architectures have been proposed and even implemented) and replacement policies (which block to evict from the cache when a new block is brought in) can improve cache performance by reducing latency in the more common or more timing-critical cases.
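The effect of a replacement policy is easy to see in a small simulation. This sketch assumes a hypothetical fully associative cache holding `capacity` blocks and compares least-recently-used (LRU) replacement with first-in-first-out (FIFO); the function name and access trace are invented for illustration.

```python
from collections import OrderedDict

def replacement_misses(blocks, capacity, policy="lru"):
    """Misses for a fully associative cache using 'lru' or 'fifo' replacement."""
    cache = OrderedDict()    # keys ordered from next-to-evict to most recently inserted/used
    misses = 0
    for b in blocks:
        if b in cache:
            if policy == "lru":
                cache.move_to_end(b)        # a hit makes the block most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict the block at the front of the order
            cache[b] = None
    return misses

trace = ["A", "B", "A", "C", "A", "B"]
print(replacement_misses(trace, 2, "lru"), replacement_misses(trace, 2, "fifo"))
```

With two blocks of capacity, LRU keeps the frequently reused block `A` and misses 4 times, while FIFO evicts `A` simply because it arrived first and misses 5 times.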
Greater integration of the components of a system can also increase performance by reducing communication delay (and power use, which is a significant limiter of computer performance).
In addition, specialized hardware can perform operations faster and/or more efficiently than more general-purpose hardware. (E.g., for decades performance-oriented processors have implemented specialized hardware to perform multiplication rather than reusing the hardware provided for addition. More recently, cryptographic and other specialized hardware has been added.)
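The multiplication example can be illustrated with the classic shift-and-add method, which is roughly what reusing an adder amounts to: a simple sequential design handles one multiplier bit per step, so a 32-bit multiply takes on the order of 32 steps, whereas dedicated multiplier hardware forms and sums the partial products in parallel. The sketch assumes unsigned operands, and its step count is a rough proxy for cycles, not a model of any particular processor.

```python
def shift_add_multiply(a, b, width=32):
    """Multiply unsigned width-bit integers a and b using only shifts and adds.

    Returns (product, steps), where steps counts one iteration per multiplier bit.
    """
    product = 0
    steps = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit    # add the shifted multiplicand: one pass through the adder
        steps += 1                 # every bit costs an iteration, even when nothing is added
    return product, steps

print(shift_add_multiply(123, 456))   # (56088, 32)
```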
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/13894