When implementing a cache coherence protocol, designers must handle non-atomic operations properly, since coherence transactions cannot complete instantaneously. We start from the well-known snooping-based MSI protocol as a base model (ignoring atomicity for now), then discuss how atomicity is handled in the real world.
Base Model
Given the following assumptions:
- All caches implement write-back + write-allocate policy
- Coherence requests are atomic, i.e., a coherence request is ordered in the same cycle as it is issued by a cache controller
- Zero latency from a coherence request to its response
The cache controller state transitions for the snooping-based MSI protocol are shown below:
The first three event columns (Load, Store, Eviction) are processor core events; the Other-GetS, Other-GetM and Other-PutM columns are coherence requests from other cores observed on the system bus.

| State | Load | Store | Eviction | Other-GetS | Other-GetM | Other-PutM |
| --- | --- | --- | --- | --- | --- | --- |
| I | Issue GetS / S | Issue GetM / M | – | – | – | – |
| S | Load hit | Issue GetM / M | – / I | – | – / I | – |
| M | Load hit | Store hit | Issue PutM, send data to memory / I | Send data to requestor & memory / S | Send data to requestor / I | – |
Each cell of the table above lists the actions the cache controller takes in response to processor core and system bus events, followed by the next state of the cache line. For example, if the processor core reads / loads a line in State I, the cache controller has to issue a coherence request “GetS” on the system bus, and update the cache line state from I to S.
Separately, the coherence request “PutM” means a line in State M is being evicted / replaced by one of the caches in the system.
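To make the table concrete, here is a minimal sketch of the base-model cache controller as a lookup table. The state and event names come from the table above; the dictionary and function names are illustrative, not from any real implementation.

```python
# Base-model MSI cache controller as a lookup table, assuming atomic
# coherence transactions. Maps (state, event) -> (action, next_state);
# a next_state of None means the line keeps its current state, and
# missing entries correspond to the "-" (no-op) cells in the table.
CACHE_FSM = {
    ("I", "Load"):       ("Issue GetS", "S"),
    ("I", "Store"):      ("Issue GetM", "M"),
    ("S", "Load"):       ("Load hit", None),
    ("S", "Store"):      ("Issue GetM", "M"),
    ("S", "Eviction"):   (None, "I"),
    ("S", "Other-GetM"): (None, "I"),
    ("M", "Load"):       ("Load hit", None),
    ("M", "Store"):      ("Store hit", None),
    ("M", "Eviction"):   ("Issue PutM, send data to memory", "I"),
    ("M", "Other-GetS"): ("Send data to requestor & memory", "S"),
    ("M", "Other-GetM"): ("Send data to requestor", "I"),
}

def cache_step(state, event):
    """Apply one event to a cache line; returns (action, next_state)."""
    action, nxt = CACHE_FSM.get((state, event), (None, None))
    return action, (nxt if nxt is not None else state)
```

For example, `cache_step("I", "Load")` returns `("Issue GetS", "S")`, matching the first row of the table.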
The memory controller has to track cache line status as well. We denote memory states from caches’ perspective:
- If no cache in the system has the cache line, then memory state for this line shall be State I
- If one or more caches in the system have the cache line in State S, then the memory state for this line shall be State S
- If any cache in the system has the cache line in State M, then the memory state for this line shall be State M
Since memory responds identically in States I and S, they are merged into a single State IorS below. Memory state transitions for the protocol are shown accordingly:
| State | GetS | GetM | PutM |
| --- | --- | --- | --- |
| IorS | Send data to requestor / IorS | Send data to requestor / M | – |
| M | Update data in memory / IorS | – | Update data in memory / IorS |
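The memory-side table can be sketched the same way (again, the names are illustrative):

```python
# Base-model memory controller, assuming atomic coherence transactions.
# Maps (state, bus_event) -> (action, next_state); missing entries are
# the "-" cells, where memory takes no action and keeps its state.
MEM_FSM = {
    ("IorS", "GetS"): ("Send data to requestor", "IorS"),
    ("IorS", "GetM"): ("Send data to requestor", "M"),
    ("M", "GetS"):    ("Update data in memory", "IorS"),
    ("M", "PutM"):    ("Update data in memory", "IorS"),
}

def mem_step(state, event):
    """Apply one snooped bus event to a memory block."""
    return MEM_FSM.get((state, event), (None, state))
```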
Handling Non-Zero Response Latency
In the real world, there is some delay between a coherence request and its response. For example, if the processor core reads / loads a line in State I, the cache controller has to issue a coherence request “GetS” on the system bus; instead of updating the cache line state from I to S instantly, a transient state is required to denote that the data is still unavailable until the data response returns, and loads from the processor must be stalled. We use “IS-D” to represent the transient state in this example.
Given the following assumptions:
- All caches implement write-back + write-allocate policy
- Coherence requests are atomic, i.e., a coherence request is ordered in the same cycle as it is issued
- Coherence transactions are atomic, i.e., a subsequent coherence request for the same cache line shall be stalled on the system bus until after the first coherence transaction for that line completes
The cache controller state transitions for the snooping-based MSI protocol are shown below:
Load, Store and Eviction are processor core events; Data Response belongs to the cache's own in-flight transaction; the Other-GetS, Other-GetM and Other-PutM columns are coherence requests from other cores.

| State | Load | Store | Eviction | Data Response | Other-GetS | Other-GetM | Other-PutM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I | Issue GetS / IS-D | Issue GetM / IM-D | – | – | – | – | – |
| IS-D | Stall | Stall | Stall | Copy data into cache, load hit / S | – | – | – |
| IM-D | Stall | Stall | Stall | Copy data into cache, store hit / M | – | – | – |
| S | Load hit | Issue GetM / SM-D | – / I | – | – | – / I | – |
| SM-D | Load hit | Stall | Stall | Copy data into cache, store hit / M | – | – | – |
| M | Load hit | Store hit | Issue PutM, send data to memory / I | – | Send data to requestor & memory / S | Send data to requestor / I | – |
Note that the “IS-D” and “IM-D” states are logically the same as States S and M respectively, since the “GetS” and “GetM” requests from the cache controller have already been ordered. However, since the data response has not yet arrived, loads, stores and evictions from the processor core are stalled in these states.
Similarly, the “SM-D” state is logically the same as State M, since the “GetM” request from the cache controller has already been ordered. While the data response is still pending, loads can proceed, but stores and evictions are stalled.
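The transient-state handling can be sketched by extending the lookup-table idea: processor events that arrive while the line's own transaction is in flight map to a “Stall” action that leaves the state unchanged (state and event names follow the table; everything else is illustrative).

```python
# MSI cache controller with transient states, assuming atomic requests
# but non-atomic transactions. "Stall" keeps the line in its current
# state until the Data Response for its own transaction arrives.
FSM = {
    ("I", "Load"):        ("Issue GetS", "IS-D"),
    ("I", "Store"):       ("Issue GetM", "IM-D"),
    ("IS-D", "Load"):     ("Stall", "IS-D"),
    ("IS-D", "Store"):    ("Stall", "IS-D"),
    ("IS-D", "Eviction"): ("Stall", "IS-D"),
    ("IS-D", "Data"):     ("Copy data into cache, load hit", "S"),
    ("IM-D", "Load"):     ("Stall", "IM-D"),
    ("IM-D", "Store"):    ("Stall", "IM-D"),
    ("IM-D", "Eviction"): ("Stall", "IM-D"),
    ("IM-D", "Data"):     ("Copy data into cache, store hit", "M"),
    ("S", "Store"):       ("Issue GetM", "SM-D"),
    ("SM-D", "Load"):     ("Load hit", "SM-D"),   # loads may proceed
    ("SM-D", "Store"):    ("Stall", "SM-D"),
    ("SM-D", "Eviction"): ("Stall", "SM-D"),
    ("SM-D", "Data"):     ("Copy data into cache, store hit", "M"),
}

# Walk a load through its transaction: I -> IS-D -> (stall) -> S
state = "I"
trace = []
for event in ("Load", "Load", "Data"):
    action, state = FSM[(state, event)]
    trace.append((action, state))
print(trace)
```

The second load in the trace stalls in IS-D; only the data response completes the transaction and moves the line to S.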
Memory state transitions for the protocol are shown accordingly:
| State | GetS | GetM | PutM | Data from Owner |
| --- | --- | --- | --- | --- |
| IorS | Send data as Data Response to requestor / IorS | Send data as Data Response to requestor / M | – | – |
| IorS-D | – | – | – | Update data in memory / IorS |
| M | – / IorS-D | – | – / IorS-D | – |
Note that the memory side also requires a transient state, “IorS-D”, to denote that the data response from the owner to memory is pending, in the cases of modified data eviction (“PutM” received in State M) and sharing modified data across caches (“GetS” received in State M).
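As a rough sketch, the memory side with the transient state might look like (names illustrative):

```python
# Memory controller with transient state IorS-D, assuming non-atomic
# transactions: after snooping GetS or PutM in State M, memory waits in
# IorS-D for the owner's data before it can update itself.
MEM_FSM_T = {
    ("IorS", "GetS"):            ("Send Data Response to requestor", "IorS"),
    ("IorS", "GetM"):            ("Send Data Response to requestor", "M"),
    ("M", "GetS"):               (None, "IorS-D"),  # owner supplies the data
    ("M", "PutM"):               (None, "IorS-D"),
    ("IorS-D", "DataFromOwner"): ("Update data in memory", "IorS"),
}

def mem_step_t(state, event):
    """Apply one snooped bus event to a memory block."""
    return MEM_FSM_T.get((state, event), (None, state))
```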