Computer Architecture · Multi-core · Cache

Understanding Cache Coherence Protocols: MOESI vs. MESI

January 15, 2025 · 8 min read

A deep dive into how modern multi-core processors keep their caches consistent, exploring the tradeoffs between MOESI and MESI protocols and when adaptive approaches shine.

The Problem: Shared Memory in Multi-Core Systems

When multiple processor cores share the same memory, each core typically has its own private cache for performance. But what happens when Core 0 writes to a memory address that Core 1 has cached? Without a coherence protocol, Core 1 would read stale data — a correctness nightmare.
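
To make the stale-read problem concrete, here is a minimal toy model, assuming two write-back caches that never observe each other's writes. The `PrivateCache` class and the address `0x40` are made up for illustration and do not correspond to any real hardware interface.

```python
# Toy model: two private caches over one shared memory, with NO coherence.
memory = {0x40: 1}

class PrivateCache:
    """Hypothetical write-back cache that never hears about other cores' writes."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                      # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]              # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value             # write-back: only the local copy changes

core0 = PrivateCache(memory)
core1 = PrivateCache(memory)

core1.read(0x40)            # Core 1 caches the value 1
core0.write(0x40, 2)        # Core 0 writes 2 into its own cache
stale = core1.read(0x40)    # Core 1 hits in its own cache
print(stale)                # prints 1, not the 2 that Core 0 wrote
```

Nothing in this model tells Core 1 that its copy is out of date, which is exactly the gap a coherence protocol fills.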

Cache coherence protocols solve this by defining a set of states each cache line can be in, and rules for state transitions when reads, writes, and bus messages occur.

MESI: The Classic Four-State Protocol

MESI defines four states:

Modified (M): The line is dirty and exclusive — only this cache has it, and it differs from main memory.
Exclusive (E): The line is clean and exclusive — only this cache has it, matching main memory.
Shared (S): The line is clean and potentially held by other caches too.
Invalid (I): The line is not valid in this cache.

The key insight is the Exclusive state: if a core has exclusive access to a clean line, it can transition to Modified on a write *without any bus traffic*. This silent upgrade is a huge performance win for data that's only accessed by one core.
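
The four states and their transitions can be sketched as a small state machine. This is a simplified textbook-style model rather than any particular processor's controller. The bus message names `BusRd`, `BusRdX`, and `BusUpgr` follow common textbook convention and are assumptions here, as are the function names.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def on_local_write(state):
    """Return (next_state, bus_message) for a write by the owning core."""
    if state in (State.M, State.E):
        return State.M, None               # E -> M is the silent upgrade
    if state is State.S:
        return State.M, "BusUpgr"          # other sharers must be invalidated
    return State.M, "BusRdX"               # Invalid: fetch with intent to modify

def on_local_read(state, others_have_copy):
    """Return (next_state, bus_message) for a read by the owning core."""
    if state is not State.I:
        return state, None                 # hit in M, E, or S needs no bus traffic
    return (State.S if others_have_copy else State.E), "BusRd"

def on_snoop(state, message):
    """Return (next_state, writes_back) when another core's message is observed."""
    if state is State.I:
        return State.I, False
    if message == "BusRd":
        return State.S, state is State.M   # a Modified line is written back first
    # BusRdX or BusUpgr: another core is about to write, so drop our copy.
    return State.I, state is State.M
```

The first branch of `on_local_write` is the point of the Exclusive state: a write to an E line returns no bus message at all, while the same write to an S line has to broadcast an invalidation.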

MOESI: Adding the Owned State

MOESI extends MESI with a fifth state:

Owned (O): The line is dirty and may be shared with other caches. *This* cache, not main memory, is responsible for supplying the data on requests and for eventually writing it back.

The Owned state enables dirty sharing: when a core with a Modified line receives a read request from another core, instead of writing the line back to memory and transitioning to Shared, it can transition to Owned while the requesting core gets a Shared copy. This keeps the expensive memory write-back off the read path. Memory stays stale until the owner eventually evicts the line.
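
The difference shows up when a holder of a dirty line observes another core's read. The sketch below is a simplified model, assuming a single line that one core repeatedly writes and another core then reads. The function names are hypothetical, and the model also counts the one write-back that happens when a dirty line is finally evicted.

```python
def snoop_read(state, protocol):
    """Holder's reaction to another core's read.

    Returns (next_state, memory_writeback_now). States are single letters.
    """
    if state == "M":
        if protocol == "MOESI":
            return "O", False      # stay dirty, supply the data cache-to-cache
        return "S", True           # MESI: write back, then share a clean line
    if state == "O":
        return "O", False          # the owner keeps supplying the dirty line
    if state == "E":
        return "S", False          # clean line, nothing to write back
    return state, False            # S and I are unchanged by a remote read

def count_writebacks(protocol, rounds):
    """Count memory write-backs for `rounds` write-then-remote-read cycles."""
    state, writebacks = "I", 0
    for _ in range(rounds):
        state = "M"                                        # producer writes the line
        state, wrote_back = snoop_read(state, protocol)    # consumer reads it
        writebacks += int(wrote_back)
    if state in ("M", "O"):
        writebacks += 1            # a dirty line reaches memory on final eviction
    return writebacks

print(count_writebacks("MESI", 4), count_writebacks("MOESI", 4))  # prints 4 1
```

Under these assumptions MESI pays one write-back per handoff, while MOESI pays a single write-back at the end regardless of how many handoffs happened.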

When Does Adaptive Make Sense?

In my implementation of an adaptive MOESI variant, I found that the optimal strategy depends heavily on the workload:

Producer-consumer patterns: MOESI wins because dirty sharing avoids write-backs on the critical path.
Migratory data: Both protocols perform similarly, as the data keeps moving exclusively between cores.
Widely-shared read-mostly data: The overhead of tracking the Owned state can hurt — simpler MESI-like behavior is better.

The adaptive protocol monitors sharing patterns and switches between invalidation-based and update-based coherence actions at runtime. On a 16-core SMP simulation, this reduced bus traffic by up to 18% on mixed workloads compared to static MOESI.
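
One way such a runtime switch could be driven is a small saturating counter per cache line. The sketch below is a hypothetical heuristic written for illustration, not the mechanism used in the implementation described above. The class name, the threshold, and the counter width are all assumptions.

```python
class AdaptiveLinePolicy:
    """Hypothetical per-line heuristic that picks 'update' or 'invalidate' on writes."""

    def __init__(self, threshold=2, max_count=3):
        self.threshold = threshold
        self.max_count = max_count
        self.counter = 0                  # saturating; high means sharers reuse the data
        self.read_since_write = False

    def on_remote_read(self):
        """A sharer read the line after the last write: evidence of reuse."""
        self.read_since_write = True
        self.counter = min(self.max_count, self.counter + 1)

    def on_write(self):
        """Return the coherence action to use for this write."""
        if not self.read_since_write:
            # Nobody consumed the previous value, so pushing updates would be wasted.
            self.counter = max(0, self.counter - 1)
        self.read_since_write = False
        return "update" if self.counter >= self.threshold else "invalidate"

policy = AdaptiveLinePolicy()
actions = []
for _ in range(3):                        # producer-consumer: write, then remote read
    actions.append(policy.on_write())
    policy.on_remote_read()
print(actions)                            # prints ['invalidate', 'invalidate', 'update']
```

With these settings a line starts out invalidation-based, moves to updates once sharers are seen re-reading it after writes, and falls back to invalidation after a run of writes that nobody reads.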

Key Takeaways

Coherence protocols are fundamentally about trading off bus traffic, memory bandwidth, and cache utilization.
The Owned state in MOESI is most valuable when dirty sharing is common.
Adaptive approaches can outperform static protocols, but add complexity to the controller state machine.
When evaluating protocols, sweep across cache sizes, associativity, and block sizes — the optimal choice changes with the memory hierarchy configuration.
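
A sweep like this can be set up by enumerating every valid cache geometry first. The sketch below only generates the configurations and derives the number of sets as size / (ways × block size). The function name and the example parameter values are assumptions, and running a simulator on each configuration is left out.

```python
from itertools import product

def sweep_configs(cache_sizes, associativities, block_sizes):
    """Yield every valid (size, ways, block, sets) combination, sizes in bytes."""
    for size, ways, block in product(cache_sizes, associativities, block_sizes):
        if size % (ways * block) != 0:
            continue                       # skip geometries that do not divide evenly
        yield size, ways, block, size // (ways * block)

configs = list(sweep_configs([32 * 1024, 64 * 1024], [2, 4, 8], [32, 64]))
for size, ways, block, sets in configs:
    print(f"{size // 1024} KiB, {ways}-way, {block} B blocks, {sets} sets")
```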
