In the previous post, we discussed handling non-atomic requests in a directory-based MSI protocol by stalling. In cache controller transient states such as “IS-D”, “IM-A” and “SM-A”, we can instead allow forwarded request messages to make progress without stalling, at the expense of adding more transient states.
For example, when a cache controller has a line in state “IS-D” and receives an Inv message, it processes the request and changes the line’s state to “IS-D-I”, indicating that the cache controller should change the line state to I after the “GetS” transaction completes. By not stalling on the Inv message, the cache controller can keep processing the other forwarded request messages queued behind it, which improves performance.
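The “IS-D” example above can be sketched in a few lines of Python. This is a minimal illustration, not a full controller: the `Line` class, its method names, and the `sent` log are hypothetical, and only the Load, Inv and Data events are modeled.

```python
class Line:
    """One cache line in a hypothetical cache controller (illustration only)."""

    def __init__(self):
        self.state = "I"
        self.sent = []  # (message, destination) pairs, kept for inspection

    def load_miss(self):
        # I --Load--> IS-D: issue GetS to the directory and wait for data.
        assert self.state == "I"
        self.sent.append(("GetS", "Dir"))
        self.state = "IS-D"

    def on_inv(self, requestor):
        # Non-stalling handling: acknowledge immediately. In IS-D the
        # invalidation itself is deferred by moving to IS-D-I.
        if self.state == "IS-D":
            self.sent.append(("Inv-Ack", requestor))
            self.state = "IS-D-I"
        elif self.state == "S":
            self.sent.append(("Inv-Ack", requestor))
            self.state = "I"
        else:
            raise ValueError(f"Inv not modeled in state {self.state}")

    def on_data(self):
        # The GetS transaction completes: IS-D ends in S, IS-D-I ends in I.
        if self.state == "IS-D":
            self.state = "S"
        elif self.state == "IS-D-I":
            self.state = "I"
        else:
            raise ValueError(f"Data not expected in state {self.state}")
```

For instance, calling `load_miss()`, then `on_inv("C2")`, then `on_data()` leaves the line in I, with the Inv-Ack already sent before the data arrived.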
We make the following assumptions:
- All caches implement a write-back, write-allocate policy
- Each message type travels on a separate network to prevent deadlock
- The interconnection network / fabric enforces point-to-point ordering for forwarded request messages, i.e., if a directory sends two forwarded request messages to a cache controller, the messages arrive at that cache controller in order
- A complete directory accurately tracks the status and sharers of each cache line
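The point-to-point ordering assumption can be pictured as one FIFO queue per (source, destination) pair on the forwarded request network. The sketch below is only a mental model under that assumption; `ForwardNetwork` and its methods are hypothetical names, not part of any real interconnect.

```python
from collections import deque


class ForwardNetwork:
    """Hypothetical forwarded-request network: one FIFO per (src, dst) pair."""

    def __init__(self):
        self.channels = {}

    def send(self, src, dst, msg):
        # Messages between the same pair of endpoints are queued in send order.
        self.channels.setdefault((src, dst), deque()).append(msg)

    def deliver(self, src, dst):
        # Delivery pops from the front, so the order matches the send order.
        return self.channels[(src, dst)].popleft()


net = ForwardNetwork()
net.send("Dir", "C1", "Fwd-GetS")
net.send("Dir", "C1", "Inv")
first = net.deliver("Dir", "C1")
second = net.deliver("Dir", "C1")
```

Ordering is only guaranteed within one channel; nothing here constrains messages sent to different cache controllers.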
Under these assumptions, the cache controller state transitions for the directory-based MSI protocol are shown below:
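Columns Load, Store and Eviction are processor core events; Fwd-GetS, Fwd-GetM, Inv and Put-Ack are forwarded request messages; Data from Dir, Data from Owner and Inv-Ack are response messages.

| State | Load | Store | Eviction | Fwd-GetS | Fwd-GetM | Inv | Put-Ack | Data from Dir | Data from Owner | Inv-Ack |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |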
| I | Issue GetS / IS-D | Issue GetM / IM-AD | – | – | – | – | – | – | – | – |
| IS-D | Stall | Stall | Stall | – | – | Send Inv-Ack to Req / IS-D-I | – | Data[ack=0] / S | Data[ack=0] / S | – |
| IS-D-I | Stall | Stall | Stall | – | – | – | – | Data[ack=0] / I | Data[ack=0] / I | – |
| IM-AD | Stall | Stall | Stall | Stall | Stall | – | – | Data[ack=0] / M; Data[ack>0] / IM-A | Data[ack=0] / M | ack-- |
| IM-A | Stall | Stall | Stall | – / IM-A-S | – / IM-A-I | – | – | – | – | if (last Inv-Ack) – / M, else ack-- |
| IM-A-S | Stall | Stall | Stall | – | – | Send Inv-Ack to Req / IM-A-SI | – | – | – | if (last Inv-Ack) Send data to Req & Dir / S, else ack-- |
| IM-A-SI | Stall | Stall | Stall | – | – | – | – | – | – | if (last Inv-Ack) Send data to Req & Dir / I, else ack-- |
| IM-A-I | Stall | Stall | Stall | – | – | – | – | – | – | if (last Inv-Ack) Send data to Req / I, else ack-- |
| S | Load hit | Issue GetM / SM-AD | Issue PutS / SI-A | – | – | Send Inv-Ack to Req / I | – | – | – | – |
| SM-AD | Hit | Stall | Stall | Stall | Stall | Send Inv-Ack to Req / IM-AD | – | Data[ack=0] / M; Data[ack>0] / SM-A | – | ack-- |
| SM-A | Hit | Stall | Stall | – / SM-A-S | – / SM-A-I | – | – | – | – | if (last Inv-Ack) – / M, else ack-- |
| SM-A-S | Stall | Stall | Stall | – | – | Send Inv-Ack to Req / SM-A-SI | – | – | – | if (last Inv-Ack) Send data to Req & Dir / S, else ack-- |
| SM-A-SI | Stall | Stall | Stall | – | – | – | – | – | – | if (last Inv-Ack) Send data to Req & Dir / I, else ack-- |
| SM-A-I | Stall | Stall | Stall | – | – | – | – | – | – | if (last Inv-Ack) Send data to Req / I, else ack-- |
| M | Load hit | Store hit | Issue PutM, send data to Dir / MI-A | Send Data[ack=0] to Req & Dir / S | Send Data[ack=0] to Req / I | – | – | – | – | – |
| MI-A | Stall | Stall | Stall | Send Data[ack=0] to Req & Dir / SI-A | Send Data[ack=0] to Req / II-A | – | – / I | – | – | – |
| SI-A | Stall | Stall | Stall | – | – | Send Inv-Ack to Req / II-A | – / I | – | – | – |
| II-A | Stall | Stall | Stall | – | – | – | – / I | – | – | – |
In the above table, states like “IM-A-*” and “SM-A-*” enable forward progress while the cache controller is still gathering “Inv-Ack” messages from other caches after issuing a “GetM” request to the directory.
Note that for states like “IM-A-S*” and “SM-A-S*” (reached via a “Fwd-GetS” request), the cache controller sends the data response to both the requesting cache and the directory once it has received all expected “Inv-Ack” messages. For states like “IM-A-I” and “SM-A-I” (reached via a “Fwd-GetM” request), it sends the data response only to the requesting cache, not to the directory.
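The difference between the two families can be sketched as follows. This is a rough model of the “IM-A” rows only, under the assumption that the controller remembers which cache sent the deferred forwarded request; `PendingGetM`, `deferred_req` and the `sent` log are hypothetical names introduced for illustration.

```python
class PendingGetM:
    """Hypothetical model of a line in IM-A that is still collecting Inv-Acks."""

    def __init__(self, acks_expected):
        self.state = "IM-A"
        self.acks_left = acks_expected
        self.deferred_req = None  # assumed bookkeeping: who to send data to later
        self.sent = []            # (message, destination) pairs, for inspection

    def on_fwd_gets(self, req):
        assert self.state == "IM-A"
        self.deferred_req = req
        self.state = "IM-A-S"

    def on_fwd_getm(self, req):
        assert self.state == "IM-A"
        self.deferred_req = req
        self.state = "IM-A-I"

    def on_inv(self, req):
        # Only IM-A-S handles Inv in the table: acknowledge, then IM-A-SI.
        assert self.state == "IM-A-S"
        self.sent.append(("Inv-Ack", req))
        self.state = "IM-A-SI"

    def on_inv_ack(self):
        assert self.acks_left > 0
        self.acks_left -= 1
        if self.acks_left > 0:
            return  # not the last Inv-Ack yet: just "ack--"
        if self.state == "IM-A":
            self.state = "M"
        elif self.state in ("IM-A-S", "IM-A-SI"):
            # Deferred Fwd-GetS: data goes to the requestor and the directory.
            self.sent.append(("Data", self.deferred_req))
            self.sent.append(("Data", "Dir"))
            self.state = "S" if self.state == "IM-A-S" else "I"
        elif self.state == "IM-A-I":
            # Deferred Fwd-GetM: data goes to the requestor only.
            self.sent.append(("Data", self.deferred_req))
            self.state = "I"
```

The “SM-A” family would follow the same shape with the state names swapped.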
Denoting directory states from the caches’ perspective, the directory state transitions for the protocol are shown below:
Columns GetS, GetM, PutS and the two PutM columns are request messages; Data is a response message.

| State | GetS | GetM | PutS | PutM + Data from Owner | PutM + Data from Non-Owner | Data |
| --- | --- | --- | --- | --- | --- | --- |
| I | Send data to Req, add Req to sharers / S | Send data to Req, set Owner to Req / M | Send Put-Ack to Req | – | Send Put-Ack to Req | – |
| S | Send data to Req, add Req to sharers / S | Send data to Req, send Inv to sharers, clear sharers, set Owner to Req / M | Remove Req from sharers, send Put-Ack to Req / S (not the last PutS) or I (the last PutS) | – | Remove Req from sharers, send Put-Ack to Req | – |
| M | Send Fwd-GetS to Owner, add Req and Owner to sharers, clear Owner / S-D | Send Fwd-GetM to Owner, set Owner to Req | Send Put-Ack to Req | Copy data to memory, clear Owner, send Put-Ack to Req / I | Send Put-Ack to Req | – |
| S-D | Stall | Stall | Remove Req from sharers, send Put-Ack to Req | – | Remove Req from sharers, send Put-Ack to Req | Copy data to memory / S |
Note that we still require stalls in the transient state “S-D”; otherwise, we would need to add an impractically large number of states to avoid stalling in all possible cases.
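To make the “S-D” stall concrete, here is a small sketch of a directory entry that handles only GetS, GetM and Data. It is an illustration under stated assumptions rather than a complete directory: `DirEntry`, the `stalled` queue and the `sent` log are hypothetical, the PutS and PutM columns and the copy to memory are not modeled, and the sketch assumes the directory does not send an Inv to the requestor itself.

```python
from collections import deque


class DirEntry:
    """Hypothetical directory entry for one line (GetS, GetM and Data only)."""

    def __init__(self):
        self.state = "I"
        self.owner = None
        self.sharers = set()
        self.stalled = deque()  # requests parked while in S-D
        self.sent = []          # outgoing messages, for inspection

    def on_request(self, kind, req):
        if self.state == "S-D":
            # Stall: park the request until the owner's data arrives.
            self.stalled.append((kind, req))
        elif kind == "GetS":
            self._gets(req)
        elif kind == "GetM":
            self._getm(req)
        else:
            raise ValueError(f"unmodeled request {kind}")

    def _gets(self, req):
        if self.state == "M":
            # Forward to the owner and wait for its data in S-D.
            self.sent.append(("Fwd-GetS", self.owner, req))
            self.sharers.update({req, self.owner})
            self.owner = None
            self.state = "S-D"
        else:  # I or S
            self.sent.append(("Data", req))
            self.sharers.add(req)
            self.state = "S"

    def _getm(self, req):
        if self.state == "M":
            self.sent.append(("Fwd-GetM", self.owner, req))
        else:  # I or S
            self.sent.append(("Data", req))
            # Assumption: the requestor itself is not sent an Inv.
            for sharer in sorted(self.sharers - {req}):
                self.sent.append(("Inv", sharer, req))
            self.sharers.clear()
        self.owner = req
        self.state = "M"

    def on_data(self):
        # S-D --Data--> S, then replay whatever was stalled, in arrival order.
        assert self.state == "S-D"
        self.state = "S"
        pending, self.stalled = self.stalled, deque()
        for kind, req in pending:
            self.on_request(kind, req)
```

A GetM that arrives while the entry is in “S-D” simply waits in `stalled`; once the owner's data arrives it is replayed against state S, which sends Inv messages to the current sharers and moves the entry to M.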