# Lecture 19: Cache Coherency

#### Vivek Kumar Computer Science and Engineering IIIT Delhi vivekk@iiitd.ac.in



CSE513: Parallel Runtimes for Modern Processors

# **Today's Class**

- Cache coherency
- MSI protocol
- MESI protocol



#### **Memory Hierarchy**



#### How Bad is Latency as we go Down?



- Another analogy
  - Normalizing to L1 latency, with 4 cycles taken to be one second
    - L1 = one second
    - L2 = 7.5 seconds
    - Main memory = 2.5 minutes
    - Hard drive = several days!

Animation source: https://overbyte.com.au/misc/Lesson3/CacheFun.html
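The analogy is a linear rescaling of cycle counts. Below is a minimal Python sketch that assumes the slide's factor of 4 cycles per second. The cycle counts it derives are back-computed from the slide's numbers for illustration. They are not measurements of any particular processor, and the function names are hypothetical.

```python
CYCLES_PER_SECOND = 4  # slide's normalization: 4 cycles (the L1 latency) = 1 second


def cycles_to_seconds(cycles: float) -> float:
    """Rescale a latency in cycles to the analogy's seconds."""
    return cycles / CYCLES_PER_SECOND


def seconds_to_cycles(seconds: float) -> float:
    """Recover the cycle count implied by a time in the analogy."""
    return seconds * CYCLES_PER_SECOND


# Cycle counts implied by the slide's analogy (back-computed, illustrative only)
implied_cycles = {
    "L1": seconds_to_cycles(1),             # 4 cycles
    "L2": seconds_to_cycles(7.5),           # 30 cycles
    "Main memory": seconds_to_cycles(150),  # 2.5 minutes = 150 s -> 600 cycles
}

for level, cycles in implied_cycles.items():
    print(f"{level}: {cycles:g} cycles -> {cycles_to_seconds(cycles):g} s")
```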




#### Which is Better & Why?




### **General Cache Concepts**



- Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device
  - Temporal and spatial locality

Credits: Bryant and O'Hallaron, Lecture 9, CMU 15-213/18-243


#### **General Cache Concepts**




- Cache line is the smallest granularity of load/store of any memory block from DRAM
- Cache line size is 64 bytes on x86 processors
- Data that is already in the cache is held in some cache line
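Because the line is the unit of transfer, a byte address can be split into a line number and an offset within the line. Below is a minimal Python sketch that assumes byte-addressed memory and the 64-byte line size above. The helper names are hypothetical. The sketch also shows spatial locality at work: a 64-byte access fits in a single line only if it starts on a line boundary.

```python
LINE_SIZE = 64  # bytes per cache line (x86, per the slide); must be a power of two


def line_number(addr: int) -> int:
    """Index of the cache line (memory block) containing byte address addr."""
    return addr // LINE_SIZE


def line_base(addr: int) -> int:
    """Address of the first byte of the line containing addr."""
    return addr & ~(LINE_SIZE - 1)


def line_offset(addr: int) -> int:
    """Position of addr within its cache line."""
    return addr % LINE_SIZE


def lines_touched(start: int, nbytes: int) -> int:
    """Number of distinct cache lines covered by nbytes starting at start."""
    return line_number(start + nbytes - 1) - line_number(start) + 1


# 16 consecutive 4-byte ints starting on a line boundary fill exactly one line...
print(lines_touched(0x1000, 16 * 4))
# ...but the same 64 bytes starting 8 bytes later straddle two lines.
print(lines_touched(0x1008, 16 * 4))
```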

#### **General Cache Concepts**



Data in line b is needed

*Line b is not in cache: Miss!* 

Line b is fetched from memory

*Line b is stored in cache* By evicting some old line

Credits: Bryant and O'Hallaron, Lecture 9, CMU 15-213/18-243
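The hit/miss sequence above can be sketched as a tiny cache model. This Python sketch assumes a fully associative cache with least-recently-used (LRU) replacement. The slide only says that some old line is evicted, so both LRU and the class name `TinyCache` are illustrative choices.

```python
from collections import OrderedDict


class TinyCache:
    """Fully associative cache holding `capacity` lines, with LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # line id -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, line: str) -> bool:
        """Request a line; return True on a hit, False on a miss."""
        if line in self.lines:
            self.lines.move_to_end(line)    # hit: mark as most recently used
            self.hits += 1
            return True
        self.misses += 1                    # miss: line is fetched from memory
        if len(self.lines) == self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = None             # store the fetched line in the cache
        return False


cache = TinyCache(capacity=4)
for line in ["a", "c", "d", "e"]:  # warm up: four compulsory misses
    cache.access(line)
print(cache.access("d"))   # hit
print(cache.access("b"))   # miss: "a" is the least recently used line and is evicted
print(list(cache.lines))
```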


#### **The Cache Coherence Problem**




#### Defining Cache Coherence Differently (Now That We Have Studied Memory Consistency)

- **Program order** must be maintained at a single processor
  - A read by processor P to address X that follows a write by P to address X, should return the value of the write by P
    - Assuming no other processor wrote to X in between
- Write propagation to other processors
  - A read by processor P1 to address X that follows a write by processor P2 to X returns the written value... if the read and write are "sufficiently separated" in time (store buffers!)
    - Assuming no other writes to X occur in between

- Write serialization
  - Writes to the same address are serialized: two writes to address X by any two processors are observed in the same order by all processors
    - E.g., if values 1 and then 2 are written to address X, no processor observes X having value 2 before value 1

Credits: Fatahalian and Bryant, CMU 15-418/618
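Write serialization can be phrased as a check on what processors observe: every observation of X must be explainable by one global order of writes. Below is a minimal Python sketch that assumes each write to X stores a distinct value (so a value identifies a write) and that each processor reports the values it saw, in order. `writes_serialized` is a hypothetical helper for reasoning about traces, not a description of how hardware enforces the property. Comparing processors pair by pair is not enough, so the sketch looks for a cycle in the combined ordering constraints.

```python
from graphlib import CycleError, TopologicalSorter


def writes_serialized(observations: dict[str, list[int]]) -> bool:
    """Return True if one global order of writes to X explains every observation.

    observations maps a processor name to the values it saw X take, in the
    order it saw them. Assumes every write stores a distinct value.
    """
    graph: dict[int, set[int]] = {}  # value -> values that must be written before it
    for seq in observations.values():
        for v in seq:
            graph.setdefault(v, set())
        for earlier, later in zip(seq, seq[1:]):
            graph[later].add(earlier)
    try:
        # static_order() raises CycleError if the constraints contradict each other
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


# Slide example: 1 and then 2 are written to X.
print(writes_serialized({"P1": [1, 2], "P2": [1, 2]}))
print(writes_serialized({"P1": [1, 2], "P2": [2, 1]}))
# No two processors disagree on any pair, yet no single write order exists.
print(writes_serialized({"P1": [1, 2], "P2": [2, 3], "P3": [3, 1]}))
```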




#### **Coherence using Shared Cache Only**



While it is easy to implement, it would be very costly and inefficient. Why?

# **Coherence using Private Caches**



- Snooping based coherency protocol
  - Each core's cache controller broadcasts any memory operation it wishes to perform before actually carrying out that operation
  - The other cores' cache controllers that hold the affected line act as good citizens and help the requester carry out its intended operation

# **Private v/s Shared Cache Coherency**

- Coherency using shared cache
  - The cache only has to serve the load/store instructions issued by the processors
- Coherency using private caches
  - Each cache looks after its own core, but it also pays attention to what is going on in other caches, i.e., to the traffic over the interconnect
    - They are snooping!




#### **Snooping with Write-through Caches**




# **Snooping using Write-back Caches**



- MSI write-back invalidation protocol
  - <u>I</u>nvalid
    - Line not present in the cache
  - <u>S</u>hared
    - Line in read-only mode
  - <u>M</u>odified
    - Line in modified (dirty) state


### **MSI Protocol Summary**



- A line in the **M** state can be modified without notifying other caches
- Processor can only write to lines in the M state
  - If the line is not already exclusive in the cache, the cache controller must first broadcast a **read-exclusive** transaction to move the line into that state
    - Required even if the line is in Shared state
- When another processor's cache controller snoops a read-exclusive for a line it contains
  - It must invalidate the line in its cache
  - Otherwise, multiple caches would hold copies of a line that one of them is modifying, and the others could read stale data

Credits: Fatahalian and Bryant, CMU 15-418/618
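The summary above can be played out with a toy simulator. Below is a minimal Python sketch that assumes a single cache line and an atomic bus on which every transaction is seen by all other cache controllers. It also assumes that a cache holding the line in M flushes it when it snoops another cache's request. The class `MSIBus` and its methods are hypothetical, and data values and timing are not modeled.

```python
class MSIBus:
    """Snooping MSI write-back invalidation protocol for one cache line."""

    def __init__(self, n: int):
        self.state = ["I"] * n  # state of the line in each of the n caches
        self.bus_log = []       # (requesting cache, transaction) in bus order
        self.flushes = 0        # write-backs of dirty (M) data

    def _broadcast(self, requester: int, txn: str) -> None:
        """Put txn on the bus; every other controller snoops it."""
        self.bus_log.append((requester, txn))
        for c, st in enumerate(self.state):
            if c == requester or st == "I":
                continue
            if st == "M":
                self.flushes += 1  # dirty copy is written back before giving it up
            # Snooped BusRd: keep a read-only copy. Snooped BusRdX: invalidate.
            self.state[c] = "S" if txn == "BusRd" else "I"

    def read(self, c: int) -> None:
        # S or M: read hit, no bus traffic. I: read miss.
        if self.state[c] == "I":
            self._broadcast(c, "BusRd")
            self.state[c] = "S"
        self._check()

    def write(self, c: int) -> None:
        # M: write hit, no bus traffic. I or S: must get exclusive ownership first.
        if self.state[c] != "M":
            self._broadcast(c, "BusRdX")
            self.state[c] = "M"
        self._check()

    def _check(self) -> None:
        """Single-writer invariant: an M copy excludes every other copy."""
        if "M" in self.state:
            assert self.state.count("I") == len(self.state) - 1


bus = MSIBus(2)
bus.read(0)   # C0: I -> S via BusRd
bus.read(1)   # C1: I -> S via BusRd
bus.write(0)  # C0: S -> M via BusRdX; C1 snoops it and goes S -> I
bus.read(1)   # C1: I -> S via BusRd; C0 flushes and goes M -> S
print(bus.state)
print(bus.bus_log)
```

In this sketch the write by cache 0 issues a BusRdX even though it already holds the line in S, as the summary requires.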




### What is Wrong with MSI?



- Core-1 reads some data and then wishes to modify it
  - The line is only in Core-1's cache, but its cache controller still has to perform a BusRdX operation to move the line from the "S" state to the "M" state
    - Redundant traffic over the interconnect

# **MESI Protocol (1/3)**



- An additional state E
  - Exclusive clean
  - Implies no other cache has a copy of this line

# **MESI Protocol (2/3)**



- Moving from E to M state
  - No action is required on the interconnect
  - The line being in the E state implies it is not present in any other cache

### **MESI Protocol (3/3)**



- C-1 wants to read a line (B) which C-2 also has (in E state)
  - C-1 cannot get the line in E state, as C-2 also holds it and wants to keep it for reading
  - C-2 drops the line from E to S state
  - C-1 has the line in S state




Credits: Fatahalian and Bryant, CMU 15-418/618
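The effect of the extra E state can be compared with MSI in a toy simulator. Below is a minimal Python sketch that assumes a single cache line and an atomic bus. It also assumes a "shared" response by which other caches report holding a copy during a BusRd; the sketch computes this directly by scanning the other caches. `SnoopingBus` and `bus_transactions` are hypothetical names, and dirty-data flushes are not modeled.

```python
class SnoopingBus:
    """MSI (mesi=False) or MESI (mesi=True) snooping protocol for one cache line."""

    def __init__(self, n: int, mesi: bool = True):
        self.mesi = mesi
        self.state = ["I"] * n  # state of the line in each cache
        self.bus_log = []       # (requesting cache, transaction) in bus order

    def _broadcast(self, requester: int, txn: str) -> bool:
        """Put txn on the bus; return True if another cache held a copy."""
        self.bus_log.append((requester, txn))
        shared = False
        for c, st in enumerate(self.state):
            if c == requester or st == "I":
                continue
            shared = True
            # Snooped BusRd: any M/E/S holder drops to S. Snooped BusRdX: invalidate.
            self.state[c] = "S" if txn == "BusRd" else "I"
        return shared

    def read(self, c: int) -> None:
        if self.state[c] == "I":  # read miss
            shared = self._broadcast(c, "BusRd")
            # MESI: nobody else has the line, so load it exclusive clean (E)
            self.state[c] = "E" if self.mesi and not shared else "S"

    def write(self, c: int) -> None:
        if self.state[c] == "E":
            self.state[c] = "M"   # silent upgrade: no bus transaction needed
        elif self.state[c] != "M":  # I or S: need exclusive ownership first
            self._broadcast(c, "BusRdX")
            self.state[c] = "M"


def bus_transactions(mesi: bool) -> int:
    """Core 0 reads a line no other core has, then modifies it."""
    bus = SnoopingBus(2, mesi=mesi)
    bus.read(0)
    bus.write(0)
    return len(bus.bus_log)


print(bus_transactions(mesi=False))  # MSI: BusRd, then BusRdX to move S -> M
print(bus_transactions(mesi=True))   # MESI: BusRd loads E, then E -> M is silent

# Slide scenario: C-2 holds line B in E, then C-1 reads it.
bus = SnoopingBus(2)
bus.read(1)  # C-2 (index 1): I -> E, no other copies exist
bus.read(0)  # C-1 (index 0): BusRd; C-2 drops E -> S and C-1 gets S
print(bus.state)
```

The comparison isolates the redundant BusRdX that MSI issues for data only one core uses.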



#### **How Many Cache Misses Below?**




# **Next Lecture**

- False sharing

