Handling non-atomic operations in snooping-based MSI protocol (I)

When implementing a cache coherence protocol, designers must properly handle non-atomic operations, since coherence transactions cannot complete instantly. We start from the well-known snooping-based MSI protocol as a base model (ignoring atomicity for now), and then discuss how atomicity is handled in the real world.

Base Model

Given the following assumptions:

  1. All caches implement write-back + write-allocate policy
  2. Coherence requests are atomic, i.e., a coherence request is ordered in the same cycle as it is issued by a cache controller
  3. Zero latency from a coherence request to its response

The cache controller state transitions for the snooping-based MSI protocol are shown below. The Load, Store, and Eviction columns are processor core events; the Other-GetS, Other-GetM, and Other-PutM columns are coherence requests from other cores, observed as system bus events.

| State | Load | Store | Eviction | Other-GetS | Other-GetM | Other-PutM |
|---|---|---|---|---|---|---|
| I | Issue GetS / S | Issue GetM / M | | | | |
| S | Load hit | Issue GetM / M | – / I | | – / I | |
| M | Load hit | Store hit | Issue PutM, send data to memory / I | Send data to requestor & memory / S | Send data to requestor / I | |

In each cell of the table above, we list the actions that the cache controller takes in response to processor core and system bus events, followed by the next state of the cache line. For example, if the processor core loads a line in State I, the cache controller has to issue a coherence request “GetS” to the system bus and update the cache line state from I to S.

Separately, the coherence request “PutM” means that a line in State M is being evicted (replaced) by one of the caches in the system.
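
To make the table concrete, here is a minimal sketch of the cache controller's transition function in C++, under the atomic-transaction assumptions above. The type and function names (State, CoreEvent, onCoreEvent, onBusEvent) are illustrative, not taken from any real simulator; the printed strings simply restate the actions from the table.

```cpp
#include <cstdio>

enum class State { I, S, M };
enum class CoreEvent { Load, Store, Eviction };
enum class BusEvent { OtherGetS, OtherGetM, OtherPutM };

// Processor core events: perform the action from the table and return the next state.
State onCoreEvent(State s, CoreEvent e) {
    switch (s) {
    case State::I:
        if (e == CoreEvent::Load)  { std::puts("Issue GetS"); return State::S; }
        if (e == CoreEvent::Store) { std::puts("Issue GetM"); return State::M; }
        return s;                                   // nothing to evict in State I
    case State::S:
        if (e == CoreEvent::Load)  { std::puts("Load hit");   return State::S; }
        if (e == CoreEvent::Store) { std::puts("Issue GetM"); return State::M; }
        return State::I;                            // silent eviction, S -> I
    case State::M:
        if (e == CoreEvent::Load)  { std::puts("Load hit");   return State::M; }
        if (e == CoreEvent::Store) { std::puts("Store hit");  return State::M; }
        std::puts("Issue PutM, send data to memory");
        return State::I;                            // eviction, M -> I
    }
    return s;
}

// Snooped coherence requests issued by other cores.
State onBusEvent(State s, BusEvent e) {
    if (s == State::S && e == BusEvent::OtherGetM) return State::I;
    if (s == State::M && e == BusEvent::OtherGetS) {
        std::puts("Send data to requestor & memory");
        return State::S;
    }
    if (s == State::M && e == BusEvent::OtherGetM) {
        std::puts("Send data to requestor");
        return State::I;
    }
    return s;                                       // all other cases: no action, same state
}
```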

The memory controller has to track cache line status as well. We define the memory states from the caches’ perspective:

  1. If no cache in the system has the cache line, then memory state for this line shall be State I
  2. If one or more caches in the system have the cache line in State S, then the memory state for this line shall be State S
  3. If any cache in the system has the cache line in State M, then the memory state for this line shall be State M

The memory state transitions for the protocol are shown below:

| State | GetS | GetM | PutM |
|---|---|---|---|
| IorS | Send data to requestor / IorS | Send data to requestor / M | |
| M | Update data in memory / IorS | | Update data in memory / IorS |
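
For symmetry, here is the same kind of sketch for the memory controller, again assuming atomic transactions. MemState and MemEvent are illustrative names; the comments restate the actions from the table.

```cpp
enum class MemState { IorS, M };
enum class MemEvent { GetS, GetM, PutM };

MemState onMemEvent(MemState s, MemEvent e) {
    if (s == MemState::IorS) {
        if (e == MemEvent::GetS) { /* send data to requestor */ return MemState::IorS; }
        if (e == MemEvent::GetM) { /* send data to requestor */ return MemState::M;    }
    } else {  // MemState::M: the owning cache supplies the data on GetS/PutM
        if (e == MemEvent::GetS) { /* update data in memory  */ return MemState::IorS; }
        if (e == MemEvent::PutM) { /* update data in memory  */ return MemState::IorS; }
    }
    return s;   // remaining combinations: no action, state unchanged
}
```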

Handling Non-Zero Response Latency

In the real world, there is some delay between a coherence request and its response. For example, if the processor core loads a line in State I, the cache controller has to issue a coherence request “GetS” to the system bus; instead of updating the cache line state from I to S instantly, a transient state is required to denote that the data is unavailable until the data response returns, and loads from the processor must be stalled in the meantime. We use “IS-D” to represent the transient state in this example.

Given the following assumptions:

  1. All caches implement write-back + write-allocate policy
  2. Coherence requests are atomic, i.e., a coherence request is ordered in the same cycle as it is issued
  3. Coherence transactions are atomic, i.e., a subsequent coherence request for the same cache line shall be stalled on the system bus until after the first coherence transaction for that line completes

The cache controller state transitions for the snooping-based MSI protocol are shown below. The Load, Store, and Eviction columns are processor core events; the Data Response column belongs to the cache’s own transaction; the Other-GetS, Other-GetM, and Other-PutM columns are coherence requests from other cores, observed as system bus events.

| State | Load | Store | Eviction | Data Response | Other-GetS | Other-GetM | Other-PutM |
|---|---|---|---|---|---|---|---|
| I | Issue GetS / IS-D | Issue GetM / IM-D | | | | | |
| IS-D | Stall | Stall | Stall | Copy data into cache, load hit / S | | | |
| IM-D | Stall | Stall | Stall | Copy data into cache, store hit / M | | | |
| S | Load hit | Issue GetM / SM-D | – / I | | | – / I | |
| SM-D | Load hit | Stall | Stall | Copy data into cache, store hit / M | | | |
| M | Load hit | Store hit | Issue PutM, send data to memory / I | | Send data to requestor & memory / S | Send data to requestor / I | |

Note that the “IS-D” and “IM-D” states are logically the same as States S and M respectively, since the “GetS” and “GetM” requests from the cache controller have already been ordered. However, because the data response has not yet arrived, loads, stores, and evictions from the processor core are stalled in these states.

Similarly, the “SM-D” state is logically the same as State M, since the “GetM” request from the cache controller has already been ordered. While the data response is still pending, loads can proceed, but stores and evictions are stalled.
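
The transient states can be expressed in the same style. Below is a minimal sketch, assuming atomic requests but non-zero data-response latency; std::nullopt is used here to model a stalled core event that must be retried later, and the names remain illustrative.

```cpp
#include <optional>

enum class State { I, IS_D, IM_D, S, SM_D, M };
enum class CoreEvent { Load, Store, Eviction };

// Returns the next state, or std::nullopt if the core event must stall.
std::optional<State> onCoreEvent(State s, CoreEvent e) {
    switch (s) {
    case State::I:
        if (e == CoreEvent::Load)  return State::IS_D;   // issue GetS, wait for data
        if (e == CoreEvent::Store) return State::IM_D;   // issue GetM, wait for data
        return State::I;                                 // nothing to evict
    case State::IS_D:
    case State::IM_D:
        return std::nullopt;                             // stall until the data response
    case State::S:
        if (e == CoreEvent::Load)  return State::S;      // load hit
        if (e == CoreEvent::Store) return State::SM_D;   // issue GetM, wait for data
        return State::I;                                 // silent eviction
    case State::SM_D:
        if (e == CoreEvent::Load)  return State::SM_D;   // loads still hit
        return std::nullopt;                             // stores and evictions stall
    case State::M:
        if (e == CoreEvent::Eviction) return State::I;   // issue PutM, write data back
        return State::M;                                 // load hit / store hit
    }
    return std::nullopt;
}

// The pending data response completes the transaction and yields a stable state.
State onDataResponse(State s) {
    switch (s) {
    case State::IS_D: return State::S;   // copy data into cache, load hit
    case State::IM_D:                    // copy data into cache, store hit
    case State::SM_D: return State::M;
    default:          return s;          // no transaction in flight for this line
    }
}
```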

The memory state transitions for the protocol are shown below:

| State | GetS | GetM | PutM | Data from Owner |
|---|---|---|---|---|
| IorS | Send data as Data Response to requestor / IorS | Send data as Data Response to requestor / M | | |
| IorS-D | | | | Update data in memory / IorS |
| M | – / IorS-D | | – / IorS-D | |

Note that the memory side also requires a transient state, “IorS-D”, to denote that the data response from the owner to memory is pending. This happens when modified data is evicted (“PutM” received in State M) and when modified data is shared across caches (“GetS” received in State M).
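
Here is a matching sketch of the memory side with the transient state, under the same assumptions. DataFromOwner stands for the writeback data arriving from the previous owner; as before, the names are illustrative.

```cpp
enum class MemState { IorS, IorS_D, M };
enum class MemEvent { GetS, GetM, PutM, DataFromOwner };

MemState onMemEvent(MemState s, MemEvent e) {
    switch (s) {
    case MemState::IorS:
        if (e == MemEvent::GetS) { /* send Data Response to requestor */ return MemState::IorS; }
        if (e == MemEvent::GetM) { /* send Data Response to requestor */ return MemState::M;    }
        return s;
    case MemState::IorS_D:
        if (e == MemEvent::DataFromOwner) { /* update data in memory */ return MemState::IorS; }
        return s;      // later requests for this line stall on the bus (assumption 3)
    case MemState::M:
        // The owner will send the modified data back to memory on GetS or PutM.
        if (e == MemEvent::GetS || e == MemEvent::PutM) return MemState::IorS_D;
        return s;      // GetM: ownership moves to the requestor, memory stays in M
    }
    return s;
}
```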

Reference

A Primer on Memory Consistency and Cache Coherence (Second Edition), by Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, David A. Wood
