When implementing a cache coherence protocol, designers must handle non-atomic operations properly, since coherence transactions cannot complete instantaneously. We start from the well-known snooping-based MSI protocol as a base model (ignoring atomicity for now), then discuss how atomicity is handled in the real world.
Base Model
Given the following assumptions:
- All caches implement write-back + write-allocate policy
- Coherence requests are atomic, i.e., a coherence request is ordered in the same cycle as it is issued by a cache controller
- Zero latency from a coherence request to its response
The cache controller state transitions for the snooping-based MSI protocol are shown below:
The first three event columns (Load, Store, Eviction) are processor core events; the Other-GetS, Other-GetM and Other-PutM columns are coherence requests from other cores observed on the system bus.

| State | Load | Store | Eviction | Other-GetS | Other-GetM | Other-PutM |
| --- | --- | --- | --- | --- | --- | --- |
| I | Issue GetS / S | Issue GetM / M | – | – | – | – |
| S | Load hit | Issue GetM / M | – / I | – | – / I | – |
| M | Load hit | Store hit | Issue PutM, send data to memory / I | Send data to requestor & memory / S | Send data to requestor / I | – |
Each cell of the table above lists the actions the cache controller takes in response to processor core and system bus events, followed by the next state of the cache line. For example, if the processor core reads / loads a line in State I, the cache controller has to issue a coherence request “GetS” on the system bus, and update the cache line state from I to S.
Separately, the coherence request “PutM” means a line in State M is being evicted / replaced by one of the caches in the system.
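To make the table concrete, here is a minimal sketch of the base-model cache controller as a lookup table. The state and event names come from the table above; the dictionary and function names are illustrative, not from any real implementation.

```python
# Base-model MSI cache controller as a lookup table, assuming atomic
# coherence transactions. Maps (state, event) -> (action, next_state);
# a next_state of None means the line keeps its current state, and
# missing entries correspond to the "-" (no-op) cells in the table.
CACHE_FSM = {
    ("I", "Load"):       ("Issue GetS", "S"),
    ("I", "Store"):      ("Issue GetM", "M"),
    ("S", "Load"):       ("Load hit", None),
    ("S", "Store"):      ("Issue GetM", "M"),
    ("S", "Eviction"):   (None, "I"),
    ("S", "Other-GetM"): (None, "I"),
    ("M", "Load"):       ("Load hit", None),
    ("M", "Store"):      ("Store hit", None),
    ("M", "Eviction"):   ("Issue PutM, send data to memory", "I"),
    ("M", "Other-GetS"): ("Send data to requestor & memory", "S"),
    ("M", "Other-GetM"): ("Send data to requestor", "I"),
}

def cache_step(state, event):
    """Apply one event to a cache line; returns (action, next_state)."""
    action, nxt = CACHE_FSM.get((state, event), (None, None))
    return action, (nxt if nxt is not None else state)
```

For example, `cache_step("I", "Load")` returns `("Issue GetS", "S")`, matching the first row of the table.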
The memory controller has to track cache line status as well. We denote memory states from caches’ perspective:
- If no cache in the system has the cache line, then memory state for this line shall be State I
- If one or more caches in the system have the cache line in State S, then the memory state for this line shall be State S
- If any cache in the system has the cache line in State M, then the memory state for this line shall be State M
Since memory responds identically in States I and S, they are merged into a single State IorS below. Memory state transitions for the protocol are shown accordingly:
| State | GetS | GetM | PutM |
| --- | --- | --- | --- |
| IorS | Send data to requestor / IorS | Send data to requestor / M | – |
| M | Update data in memory / IorS | – | Update data in memory / IorS |
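The memory-side table can be sketched the same way (again, the names are illustrative):

```python
# Base-model memory controller, assuming atomic coherence transactions.
# Maps (state, bus_event) -> (action, next_state); missing entries are
# the "-" cells, where memory takes no action and keeps its state.
MEM_FSM = {
    ("IorS", "GetS"): ("Send data to requestor", "IorS"),
    ("IorS", "GetM"): ("Send data to requestor", "M"),
    ("M", "GetS"):    ("Update data in memory", "IorS"),
    ("M", "PutM"):    ("Update data in memory", "IorS"),
}

def mem_step(state, event):
    """Apply one snooped bus event to a memory block."""
    return MEM_FSM.get((state, event), (None, state))
```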
Handling Non-Zero Response Latency
In the real world, there is some delay between a coherence request and its response. For example, if the processor core reads / loads a line in State I, the cache controller has to issue a coherence request “GetS” on the system bus; instead of updating the cache line state from I to S instantly, a transient state is required to denote that the data is still unavailable until the data response returns, and loads from the processor must be stalled. We use “IS-D” to represent the transient state in this example.
Given the following assumptions:
- All caches implement write-back + write-allocate policy
- Coherence requests are atomic, i.e., a coherence request is ordered in the same cycle as it is issued
- Coherence transactions are atomic, i.e., a subsequent coherence request for the same cache line shall be stalled on the system bus until after the first coherence transaction for that line completes
The cache controller state transitions for the snooping-based MSI protocol are shown below:
Load, Store and Eviction are processor core events; Data Response belongs to the cache's own in-flight transaction; the Other-GetS, Other-GetM and Other-PutM columns are coherence requests from other cores.

| State | Load | Store | Eviction | Data Response | Other-GetS | Other-GetM | Other-PutM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I | Issue GetS / IS-D | Issue GetM / IM-D | – | – | – | – | – |
| IS-D | Stall | Stall | Stall | Copy data into cache, load hit / S | – | – | – |
| IM-D | Stall | Stall | Stall | Copy data into cache, store hit / M | – | – | – |
| S | Load hit | Issue GetM / SM-D | – / I | – | – | – / I | – |
| SM-D | Load hit | Stall | Stall | Copy data into cache, store hit / M | – | – | – |
| M | Load hit | Store hit | Issue PutM, send data to memory / I | – | Send data to requestor & memory / S | Send data to requestor / I | – |
Note that the “IS-D” and “IM-D” states are logically the same as States S and M respectively, since the “GetS” and “GetM” requests from the cache controller have already been ordered. However, since the data response has not yet arrived, loads, stores and evictions from the processor core are stalled in these states.
Similarly, the “SM-D” state is logically the same as State M, since the “GetM” request from the cache controller has already been ordered. While the data response is still pending, loads can proceed, but stores and evictions are stalled.
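The transient-state handling can be sketched by extending the lookup-table idea: processor events that arrive while the line's own transaction is in flight map to a “Stall” action that leaves the state unchanged (state and event names follow the table; everything else is illustrative).

```python
# MSI cache controller with transient states, assuming atomic requests
# but non-atomic transactions. "Stall" keeps the line in its current
# state until the Data Response for its own transaction arrives.
FSM = {
    ("I", "Load"):        ("Issue GetS", "IS-D"),
    ("I", "Store"):       ("Issue GetM", "IM-D"),
    ("IS-D", "Load"):     ("Stall", "IS-D"),
    ("IS-D", "Store"):    ("Stall", "IS-D"),
    ("IS-D", "Eviction"): ("Stall", "IS-D"),
    ("IS-D", "Data"):     ("Copy data into cache, load hit", "S"),
    ("IM-D", "Load"):     ("Stall", "IM-D"),
    ("IM-D", "Store"):    ("Stall", "IM-D"),
    ("IM-D", "Eviction"): ("Stall", "IM-D"),
    ("IM-D", "Data"):     ("Copy data into cache, store hit", "M"),
    ("S", "Store"):       ("Issue GetM", "SM-D"),
    ("SM-D", "Load"):     ("Load hit", "SM-D"),   # loads may proceed
    ("SM-D", "Store"):    ("Stall", "SM-D"),
    ("SM-D", "Eviction"): ("Stall", "SM-D"),
    ("SM-D", "Data"):     ("Copy data into cache, store hit", "M"),
}

# Walk a load through its transaction: I -> IS-D -> (stall) -> S
state = "I"
trace = []
for event in ("Load", "Load", "Data"):
    action, state = FSM[(state, event)]
    trace.append((action, state))
print(trace)
```

The second load in the trace stalls in IS-D; only the data response completes the transaction and moves the line to S.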
Memory state transitions for the protocol are shown accordingly:
| State | GetS | GetM | PutM | Data from Owner |
| --- | --- | --- | --- | --- |
| IorS | Send data as Data Response to requestor / IorS | Send data as Data Response to requestor / M | – | – |
| IorS-D | – | – | – | Update data in memory / IorS |
| M | – / IorS-D | – | – / IorS-D | – |
Note that the memory side also requires a transient state, “IorS-D”, to denote that the data response from the owner to memory is pending, in the cases of modified data eviction (“PutM” received in State M) and sharing modified data across caches (“GetS” received in State M).
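As a rough sketch, the memory side with the transient state might look like (names illustrative):

```python
# Memory controller with transient state IorS-D, assuming non-atomic
# transactions: after snooping GetS or PutM in State M, memory waits in
# IorS-D for the owner's data before it can update itself.
MEM_FSM_T = {
    ("IorS", "GetS"):            ("Send Data Response to requestor", "IorS"),
    ("IorS", "GetM"):            ("Send Data Response to requestor", "M"),
    ("M", "GetS"):               (None, "IorS-D"),  # owner supplies the data
    ("M", "PutM"):               (None, "IorS-D"),
    ("IorS-D", "DataFromOwner"): ("Update data in memory", "IorS"),
}

def mem_step_t(state, event):
    """Apply one snooped bus event to a memory block."""
    return MEM_FSM_T.get((state, event), (None, state))
```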