Can you pipeline a pure Von Neumann architecture based CPU, or do you need separate data and instruction caches for this? If you include separate instruction and data caches (in which case it isn't a Von Neumann CPU anymore, it's a modified Harvard), how do you unify the data of these caches so that it gets stored in a single memory?
Asked By : gilianzz
Answered By : Paul A. Clayton
A processor that can only support one memory access per cycle can still be pipelined. Such a memory interface would represent a structural hazard for load and store operations. In addition, store operations would introduce the equivalent of a control hazard since they might change instructions that have already been fetched.
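To make the structural hazard concrete, here is a minimal sketch (my own illustration, assuming a classic five-stage pipeline with a single-ported unified memory) of the port arbitration that forces instruction fetch to insert a bubble whenever the memory stage needs the port:

```python
# Minimal sketch of port arbitration in a 5-stage pipeline with one
# unified, single-ported memory. The stage model and Instr class are
# illustrative, not taken from any particular ISA.

from dataclasses import dataclass

@dataclass
class Instr:
    op: str  # e.g. "add", "load", "store"

def fetch_must_stall(mem_stage_instr):
    """IF loses the memory port whenever MEM holds a load or store,
    creating a one-cycle bubble (the structural hazard)."""
    return mem_stage_instr is not None and mem_stage_instr.op in ("load", "store")

# Example: a load in MEM blocks the fetch of the next instruction.
print(fetch_must_stall(Instr("load")))  # True  -> insert a bubble at IF
print(fetch_must_stall(Instr("add")))   # False -> fetch proceeds
```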
The structural hazard can be detected when the instruction is decoded, and the pipeline stalled at instruction fetch at the appropriate point. This allows a load or store to execute at the cost of a pipeline bubble. For stores, the processor can speculate that a store does not address memory that holds instructions which will be in the pipeline (or at least that the store does not change the semantics of any such instructions). This is comparable to predicting a branch. Just as branch prediction requires checking the condition and target, store-conflict prediction requires comparing the store address with the addresses of all instructions that will have been fetched by the time the store data becomes visible to the instruction fetch stage. And just as a branch misprediction requires flushing the pipeline, so would a store-conflict misprediction.
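The store-conflict check itself can be sketched the same way: compare the store's address with the fetch addresses of the younger instructions already in (or about to enter) the pipeline, and flush on a match, exactly as after a mispredicted branch. A minimal sketch, assuming fixed-size instructions for simplicity:

```python
def store_conflicts(store_addr, inflight_fetch_addrs, instr_size=4):
    """Return True if the store writes into a location from which a
    younger, already-fetched instruction was taken. Assumes fixed-size
    instructions; a real check would compare the actual byte ranges
    of the store and of each fetched instruction."""
    return any(pc <= store_addr < pc + instr_size for pc in inflight_fetch_addrs)

# Younger instructions fetched from 0x100, 0x104 and 0x108.
inflight = [0x100, 0x104, 0x108]
if store_conflicts(0x104, inflight):
    # The prediction "this store does not touch fetched code" was wrong:
    # flush the younger instructions and refetch, as after a
    # mispredicted branch.
    pass
```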
If one allows a wider front end and buffering, the impact of the structural hazard can be reduced. Such buffering, even if it is after instruction decode, might be considered a cache.
A processor that breaks some instructions into multiple micro-ops and executes one micro-op per cycle could benefit from such a buffer even if only one instruction is fetched per cycle. Variable-length instructions would naturally benefit from a buffer of fetched instructions, since a fixed-width fetch would often either fetch more than one instruction or fetch only part of an instruction.
If the data access width is greater than the typical instruction length, it also becomes natural to use this greater width for fetching instructions, buffering any excess.
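A minimal sketch of such a fetch buffer, assuming a hypothetical variable-length encoding in which the first byte of each instruction gives its length: a wide (here 8-byte) fetch refills the buffer only when it cannot supply a whole instruction, leaving the memory port free for data accesses in the other cycles.

```python
class FetchBuffer:
    """Sketch of a fetch buffer in front of decode. Memory is read
    FETCH_WIDTH bytes at a time; leftover bytes stay buffered so later
    instructions can be issued without touching memory again."""
    FETCH_WIDTH = 8  # illustrative wide fetch, wider than most instructions

    def __init__(self, memory):
        self.memory = memory    # bytes-like object holding the code
        self.pc = 0             # next fetch address
        self.buf = bytearray()  # bytes fetched but not yet decoded

    def next_instruction(self):
        # Hypothetical encoding: first byte of each instruction is its length.
        while len(self.buf) == 0 or len(self.buf) < self.buf[0]:
            chunk = self.memory[self.pc:self.pc + self.FETCH_WIDTH]
            self.buf += chunk           # buffer the excess bytes
            self.pc += self.FETCH_WIDTH
        length = self.buf[0]
        instr, self.buf = bytes(self.buf[:length]), self.buf[length:]
        return instr

# Two 3-byte instructions and one 2-byte instruction packed into one 8-byte line:
code = bytes([3, 0xAA, 0xBB, 3, 0xCC, 0xDD, 2, 0xEE])
fb = FetchBuffer(code)
print(fb.next_instruction())  # b'\x03\xaa\xbb' -- only this call reads memory
print(fb.next_instruction())  # b'\x03\xcc\xdd' -- served from the buffer
print(fb.next_instruction())  # b'\x02\xee'     -- served from the buffer
```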
The structural hazard related to loads and stores can be avoided by multiporting the memory (or accessing the cache twice within a single cycle) or by pseudo-multiporting it (banking). The processor would still have the quasi-control hazard for stores. The main benefits of such an implementation over having separate caches would be somewhat simpler handling of stores and faster operation when running self-modifying code. The processor could still handle the store issue with separate coherent caches, but implementing coherence and handling the delay before stores become visible in the instruction cache would add complexity.
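One way to picture pseudo-multiporting: split the memory into banks selected by low-order address bits, so an instruction fetch and a data access can proceed in the same cycle as long as they fall in different banks. A minimal sketch, assuming two banks interleaved at a 4-byte granularity (both numbers are illustrative; real designs use more banks):

```python
NUM_BANKS = 2  # illustrative; real designs typically use more banks

def bank_of(addr, line_size=4):
    """Bank is chosen by low-order bits of the line address (interleaving)."""
    return (addr // line_size) % NUM_BANKS

def bank_conflict(fetch_addr, data_addr):
    """With banking, IF and a load/store collide only when both
    accesses target the same bank in the same cycle."""
    return data_addr is not None and bank_of(fetch_addr) == bank_of(data_addr)

# Fetch from 0x100 and a load from 0x104 hit different banks: no stall.
print(bank_conflict(0x100, 0x104))  # False
# Fetch from 0x100 and a store to 0x108 hit the same bank: one access must wait.
print(bank_conflict(0x100, 0x108))  # True
```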
I would argue that a modified Harvard architecture that implements a Von Neumann interface (i.e., is indistinguishable to software except with respect to performance) is Von Neumann. (Recent ISAs generally avoid such guarantees. The benefit of a simpler interface is generally not considered worth the complexity/performance tradeoffs.)
Best Answer from Stack Exchange
Question Source : http://cs.stackexchange.com/questions/40107