In the previous post, we discussed non-zero delay from coherence requests to responses. However, coherence requests may also be non-atomic: a coherence request may not be instantly ordered when it is issued by a cache controller. For example, if there is a request queue between a cache controller and the system bus, coherence request atomicity is no longer guaranteed, and this is a fairly common implementation.
Accounting for non-atomic coherence requests adds even more transient states to the protocol. For instance, suppose a cache controller wants to move a line from State I to State S and issues a “GetS” request to the system bus. Until the cache controller sees its “GetS” ordered on the system bus, the line is in state “IS-AD”, which is logically the same as State I. Once the cache controller sees its own “GetS” request, the line transitions to “IS-D” (logically the same as State S) and waits there for the data response.
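The load-miss path can be sketched as a small lookup table. This is only an illustrative Python model, not code from any real simulator; the names `LOAD_MISS_TRANSITIONS` and `run` are made up for this example.

```python
# Hypothetical model of the load-miss path: I -> IS-AD -> IS-D -> S.
# Keys are (current state, event); values are the next state.
LOAD_MISS_TRANSITIONS = {
    ("I", "Load"): "IS-AD",          # controller issues GetS; not yet ordered
    ("IS-AD", "Own-GetS"): "IS-D",   # controller sees its own GetS on the bus
    ("IS-D", "Data Rsp"): "S",       # data arrives and the load completes
}


def run(state, events):
    """Apply the events in order and return the list of visited states."""
    trace = [state]
    for event in events:
        state = LOAD_MISS_TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


print(run("I", ["Load", "Own-GetS", "Data Rsp"]))
# prints ['I', 'IS-AD', 'IS-D', 'S']
```

The two transient states split the miss into "request issued but not ordered" and "request ordered but data not yet received", which is exactly the distinction that a request queue in front of the bus introduces.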
Given the following assumptions:
- All caches implement write-back + write-allocate policy
- Coherence transactions are atomic, i.e., a subsequent coherence request for the same cache line shall be stalled on the system bus until after the first coherence transaction for that line completes
The cache controller state transitions for the snooping-based MSI protocol are shown below. Each cell reads “action / next state”:
| State | Core: Load | Core: Store | Core: Eviction | Bus: Own-GetS | Bus: Own-GetM | Bus: Own-PutM | Bus: Data Rsp | Bus: Other-GetS | Bus: Other-GetM | Bus: Other-PutM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I | Issue GetS / IS-AD | Issue GetM / IM-AD | – | – | – | – | – | – | – | – |
| IS-AD | Stall | Stall | Stall | – / IS-D | – | – | – | – | – | – |
| IS-D | Stall | Stall | Stall | – | – | – | Copy data into cache, load hit / S | – | – | – |
| IM-AD | Stall | Stall | Stall | – | – / IM-D | – | – | – | – | – |
| IM-D | Stall | Stall | Stall | – | – | – | Copy data into cache, store hit / M | – | – | – |
| S | Hit | Issue GetM / SM-AD | – / I | – | – | – | – | – | – / I | – |
| SM-AD | Hit | Stall | Stall | – | – / SM-D | – | – | – | – / IM-AD | – |
| SM-D | Hit | Stall | Stall | – | – | – | Copy data into cache, store hit / M | – | – | – |
| M | Hit | Hit | Issue PutM / MI-A | – | – | – | – | Send data to req & memory / S | Send data to req / I | – |
| MI-A | Hit | Hit | Stall | – | – | Send data to memory / I | – | Send data to req & memory / II-A | Send data to req / II-A | – |
| II-A | Stall | Stall | Stall | – | – | Send NoData to memory / I | – | – | – | – |
Note that if a core stores to a line in State S, the cache controller issues a “GetM” request and transitions to “SM-AD”, where it stays until it sees its own “GetM” ordered on the bus. “SM-AD” is logically the same as State S, so loads can still hit and the cache controller ignores “Other-GetS” requests. However, if an “Other-GetM” is ordered first, the cache controller must move the line to “IM-AD” to prevent further load hits.
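The two possible orderings of this upgrade race can be replayed with a short sketch. It is a simplified Python model that covers only the transitions relevant to a store on a line in State S; `UPGRADE_TRANSITIONS`, `LOAD_HIT_STATES` and `replay` are hypothetical names.

```python
# Hypothetical sketch of the S -> M upgrade race.
# Keys are (current state, event); values are the next state.
UPGRADE_TRANSITIONS = {
    ("S", "Store"): "SM-AD",            # issue GetM, wait for it to be ordered
    ("SM-AD", "Other-GetS"): "SM-AD",   # still logically S, nothing to do
    ("SM-AD", "Other-GetM"): "IM-AD",   # lost the race: line is logically I
    ("SM-AD", "Own-GetM"): "SM-D",
    ("IM-AD", "Own-GetM"): "IM-D",
    ("SM-D", "Data Rsp"): "M",
    ("IM-D", "Data Rsp"): "M",
}

# States in which a load may hit on the locally cached copy.
LOAD_HIT_STATES = {"S", "SM-AD", "SM-D", "M"}


def replay(events, state="S"):
    """Return (state, load_can_hit) after each event."""
    history = []
    for event in events:
        state = UPGRADE_TRANSITIONS[(state, event)]
        history.append((state, state in LOAD_HIT_STATES))
    return history


# Own GetM is ordered first: loads keep hitting throughout.
won = replay(["Store", "Own-GetM", "Data Rsp"])
# Another core's GetM is ordered first: loads stall until data arrives.
lost = replay(["Store", "Other-GetM", "Own-GetM", "Data Rsp"])
print(won)
print(lost)
```

In the second trace the line passes through “IM-AD” and “IM-D”, where loads stall, before it reaches State M.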
The downgrade from State M to State I requires special care as well. When a cache line in State M is evicted, the cache controller issues a “PutM” request and changes the cache line state to “MI-A”. If another core sends “GetS” or “GetM” for that line while it is still in “MI-A”, the cache controller must respond as if the line were still in State M, then transition to “II-A” to wait for its own “PutM” to be ordered. Once the cache controller recognizes its own “PutM”, it cannot simply transition to State I; doing so would leave the memory stuck in a transient state, because the memory has already seen the “PutM” request. The cache controller cannot send the data to memory either, since the data may already have been modified by another core. The solution is to send a special “NoData” message to the memory, which signals that the message comes from a non-owner and lets the memory exit the transient state.
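The cache-side half of this eviction path can be sketched as follows. This is a minimal Python illustration with made-up names (`EVICTION_TRANSITIONS`, `evict`), and it models only the eviction-related states.

```python
# Hypothetical sketch of the cache-side eviction path from State M.
# Each entry maps (state, event) to (action, next state).
EVICTION_TRANSITIONS = {
    ("M", "Eviction"): ("issue PutM", "MI-A"),
    ("MI-A", "Own-PutM"): ("send data to memory", "I"),
    ("MI-A", "Other-GetS"): ("send data to requestor and memory", "II-A"),
    ("MI-A", "Other-GetM"): ("send data to requestor", "II-A"),
    ("II-A", "Own-PutM"): ("send NoData to memory", "I"),
}


def evict(bus_events):
    """Evict a line in State M, then replay the bus events the controller sees."""
    state = "M"
    actions = []
    for event in ["Eviction", *bus_events]:
        action, state = EVICTION_TRANSITIONS[(state, event)]
        actions.append(action)
    return actions, state


# No race: the PutM is ordered before any other request for the line.
print(evict(["Own-PutM"]))
# Race: another core's GetM is ordered before the PutM.
print(evict(["Other-GetM", "Own-PutM"]))
```

In the racing case the last action is “send NoData to memory” rather than a data write-back, because the evicting cache is no longer the owner by the time its “PutM” is ordered.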
The corresponding memory state transitions, in response to system bus events, are shown below:
| State | GetS | GetM | PutM | Data from Owner | NoData |
| --- | --- | --- | --- | --- | --- |
| IorS | Send data as Data Rsp to req / IorS | Send data as Data Rsp to req / M | – / IorS-D | – | – |
| IorS-D | – | – | – | Update data in memory / IorS | – / IorS |
| M | – / IorS-D | – | – / M-D | – | – |
| M-D | – | – | – | Update data in memory / IorS | – / M |
To walk through every scenario for the downgrade from State M to State I while the cache line is in state “MI-A”:
- If another core sends “GetS” before the “Own-PutM” request is ordered, the cache controller transitions the cache line to “II-A” and serves the “GetS” by sending a data response to the requesting core and to the memory. By the time the “Own-PutM” request is ordered, the memory must already be in state “IorS” (the data response has updated the memory); it then transitions to “IorS-D” in response to the “PutM” request. After receiving the “NoData” message, the memory transitions back to “IorS”.
- If another core sends “GetM” before the “Own-PutM” request is ordered, the cache controller transitions the cache line to “II-A” and serves the “GetM” by sending a data response to the requesting core. By the time the “Own-PutM” request is ordered, the memory must be in State M (the owner is now another core); it then transitions to “M-D” in response to the “PutM” request, where it expects a “NoData” message from the cache controller. After receiving “NoData”, the memory returns to State M.
- If no other core sends “GetS” or “GetM” before the “Own-PutM” request is ordered, the memory transitions to “M-D” upon recognizing the “PutM” request; after the data response arrives at the memory, the memory transitions to “IorS”.
We therefore need to distinguish the memory states “IorS-D” and “M-D”, because they handle a “NoData” message differently: “IorS-D” returns to “IorS”, whereas “M-D” returns to State M.
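The memory side of the three scenarios can be traced with a small sketch of the memory transition table. This is an illustrative Python model; `MEMORY_TRANSITIONS` and `memory_trace` are hypothetical names, and the entry for “GetM” in State M assumes that “–” in the table means the memory stays in State M while ownership moves to the requesting core.

```python
# Hypothetical sketch of the memory controller state machine.
# Keys are (current state, bus event); values are the next state.
MEMORY_TRANSITIONS = {
    ("IorS", "GetS"): "IorS",
    ("IorS", "GetM"): "M",
    ("IorS", "PutM"): "IorS-D",             # PutM from a non-owner
    ("IorS-D", "Data from Owner"): "IorS",
    ("IorS-D", "NoData"): "IorS",
    ("M", "GetS"): "IorS-D",                # owner will supply the data
    ("M", "GetM"): "M",                     # assumption: owner changes, state stays M
    ("M", "PutM"): "M-D",
    ("M-D", "Data from Owner"): "IorS",
    ("M-D", "NoData"): "M",                 # another core still owns the line
}


def memory_trace(events, state="M"):
    """Return the memory states visited, starting with the line owned (State M)."""
    trace = [state]
    for event in events:
        state = MEMORY_TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


# Scenario 1: another core's GetS is ordered before the PutM.
print(memory_trace(["GetS", "Data from Owner", "PutM", "NoData"]))
# Scenario 2: another core's GetM is ordered before the PutM.
print(memory_trace(["GetM", "PutM", "NoData"]))
# Scenario 3: the PutM is ordered with no intervening request.
print(memory_trace(["PutM", "Data from Owner"]))
```

Scenarios 1 and 2 both end with a “NoData” message, but the memory lands in “IorS” in the first case and in State M in the second, which is why a single transient state would not be enough.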