A coherent DMA operation that reads memory should get the most recent version of data, even if the data resides in a cache in state M or O. Similarly, a coherent DMA operation that writes memory must invalidate stale copies in all caches.
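The two requirements above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `State` enum, the `Cache` class, and the `dma_read` and `dma_write` helpers are hypothetical names, each cache is a dictionary keyed by line address, and a real protocol would use snoop or directory messages rather than a loop over caches.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class Cache:
    """Hypothetical per-core cache: maps line address -> (state, data)."""

    def __init__(self):
        self.lines = {}

    def state(self, addr):
        return self.lines.get(addr, (State.I, None))[0]


def dma_read(addr, memory, caches):
    """Coherent DMA read: a dirty copy (M or O) takes priority over memory."""
    for cache in caches:
        state, data = cache.lines.get(addr, (State.I, None))
        if state in (State.M, State.O):
            return data  # memory is stale; the owning cache supplies the data
    return memory[addr]


def dma_write(addr, data, memory, caches):
    """Coherent DMA write: invalidate every cached copy, then update memory."""
    for cache in caches:
        cache.lines.pop(addr, None)  # a missing entry already means state I
    memory[addr] = data
```

In this model a read of a line held in state M by some core returns the cached value rather than the stale memory value, and after a DMA write every cache reports state I for that line.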
Although it is straightforward to handle coherent DMA by adding a coherent cache to the DMA controller, this is not desirable for two reasons:
- DMA operations have very different locality patterns from CPU cores: they stream through memory with very little temporal reuse, so a cache offers little benefit
- A DMA write generally overwrites an entire cache line. Obtaining state M with a data response before the write is therefore wasteful, since all of the fetched data will be overwritten
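The cost of the second point can be estimated with some simple accounting. The sketch below assumes a 64-byte cache line and a line-aligned buffer; `LINE_BYTES`, `dma_write_traffic`, and the `with_data_response` flag are hypothetical names chosen for illustration, not part of any real protocol.

```python
LINE_BYTES = 64  # assumed cache line size


def dma_write_traffic(buffer_bytes, with_data_response):
    """Return (bytes_fetched, bytes_written) for a streaming DMA write.

    Assumes the buffer is line-aligned and every line is fully overwritten.
    """
    lines = buffer_bytes // LINE_BYTES
    # A request for state M that returns data fetches the whole line first,
    # even though the DMA write is about to overwrite all of it.
    fetched = lines * LINE_BYTES if with_data_response else 0
    written = lines * LINE_BYTES
    return fetched, written
```

Under these assumptions, writing a 4 KiB buffer with data responses fetches 4096 bytes that are never used, in addition to the 4096 bytes actually written.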
Possible optimizations include:
- Add support for obtaining state M without a data response
- Make DMA work without hardware cache coherence support by requiring the OS to selectively flush caches. However, explicit OS control is typically implemented at page granularity rather than cache line granularity, which makes this approach inefficient: the OS must conservatively flush a page even if none of its cache lines are in any cache. For this reason, the approach is typically seen only in embedded systems
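The inefficiency of page-granularity flushing can be illustrated with a small sketch. It assumes 4 KiB pages and 64-byte lines, and models the cache as a set of resident line addresses; `flush_page` and its return values are hypothetical and do not correspond to any particular OS interface.

```python
PAGE_BYTES = 4096  # assumed page size
LINE_BYTES = 64    # assumed cache line size


def flush_page(page_base, cached_lines):
    """Flush one page from a cache modeled as a set of line addresses.

    Returns (lines_probed, lines_evicted). Every line in the page is probed,
    whether or not it is actually resident in the cache.
    """
    probed = 0
    evicted = 0
    for addr in range(page_base, page_base + PAGE_BYTES, LINE_BYTES):
        probed += 1
        if addr in cached_lines:
            cached_lines.discard(addr)
            evicted += 1
    return probed, evicted
```

With these sizes, flushing a page that has no resident lines still probes all 64 line addresses and evicts nothing, which is the conservative overhead described above.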