In concurrent programming, managing shared resources efficiently and safely is paramount. One fundamental technique that underpins many lock-free algorithms is CAS, or compare-and-swap. A compare-and-swap is an atomic operation that checks whether a variable's value matches a given expected value; if it does, it swaps the variable's value for a new, provided value. While CAS might initially seem intricate, it is surprisingly straightforward once you grasp its mechanics. Let's delve deeper into compare-and-swap and its significance in concurrent systems.
You will encounter the term CAS frequently in articles and videos about concurrency; it is the common abbreviation for compare-and-swap operations.
Video Explanation: Compare and Swap in Detail
For those who prefer visual learning, a video tutorial can be highly beneficial in understanding compare-and-swap. Here’s a video that explains the concept in detail: Compare and Swap Tutorial Video
The “Check Then Act” Pattern and Concurrency Issues
A prevalent pattern in concurrent algorithm design is the “check then act” sequence. This pattern involves a program first inspecting the state of a variable and then performing an action based on that observed state. Consider this simplified example of a problematic lock implementation to illustrate the concept:
public class ProblematicLock {

    private volatile boolean locked = false;

    public void lock() {
        while(this.locked) {
            // busy wait - until this.locked == false
        }
        this.locked = true;
    }

    public void unlock() {
        this.locked = false;
    }
}
This ProblematicLock class, intentionally flawed for demonstration purposes, highlights the “check then act” pattern. The lock() method first checks whether the locked variable is false within the while loop. If it is, the loop terminates, and the method proceeds to act by setting locked to true. This sequence of checking and then acting is central to the “check then act” pattern.
However, in a multithreaded environment where multiple threads might access the same ProblematicLock instance, this lock() method is susceptible to race conditions. Imagine this scenario:

Thread A checks the value of locked and finds it to be false (the expected value). Before Thread A can set locked to true, Thread B also checks locked and also observes false. Both threads, acting on their independent checks, proceed to set locked to true. Consequently, both threads believe they have acquired the lock, violating the fundamental principle of mutual exclusion in locking mechanisms. This is a classic race condition, arising from the non-atomic nature of the “check then act” pattern in this context.
Atomicity: Ensuring Thread Safety in “Check Then Act”
To guarantee correctness in multithreaded applications and prevent race conditions, “check then act” operations must be atomic. Atomicity, in this context, signifies that the entire “check” and “act” sequence executes as a single, indivisible unit. Once a thread begins executing an atomic block of code, no other thread can interrupt or interfere until the block’s execution is complete. Essentially, atomic operations provide mutual exclusion at the code block level.
In Java, a straightforward way to achieve atomicity for a code block is the synchronized keyword. For a comprehensive understanding of synchronized, refer to the Java synchronized tutorial. Let’s revisit the ProblematicLock example and mark its lock() method synchronized:
public class ProblematicLock {

    private volatile boolean locked = false;

    public synchronized void lock() {
        while(this.locked) {
            // busy wait - until this.locked == false
        }
        this.locked = true;
    }

    public void unlock() {
        this.locked = false;
    }
}
By declaring the lock() method synchronized, we ensure that only one thread can execute it at any given time on the same ProblematicLock instance. The synchronized keyword effectively makes the lock() method atomic, resolving the race condition.
The Performance Implications of Blocking Threads
While synchronized blocks provide atomicity, they introduce thread blocking. When multiple threads attempt to enter a synchronized block concurrently, only one thread gains access while the others are blocked, waiting their turn. Once the currently executing thread exits the synchronized block, one of the waiting threads is allowed to proceed.

Entering a synchronized block is not inherently expensive if the thread is granted immediate access. The cost arises when contention causes threads to block: blocked threads become inactive, and the operating system or execution platform manages their suspension and subsequent resumption.

Furthermore, there is no precise guarantee of when a blocked thread will be unblocked once the synchronized block becomes available. Unblocking is typically managed by the OS scheduler, and although the delay is usually small (not on the scale of seconds or minutes), some time is inevitably spent in thread blocking and context switching. During this waiting period, the blocked thread remains idle, wasting CPU cycles that could have been used for productive work.
Hardware-Level Atomic Compare and Swap Operations: A Non-Blocking Alternative
Modern CPUs offer built-in support for atomic compare-and-swap operations at the hardware level. These CAS operations provide an alternative to synchronized blocks and other blocking synchronization mechanisms in certain situations. The CPU itself guarantees the atomicity of a CAS instruction: even across multiple cores, only one thread's CAS on a given memory location can succeed at a time. We will explore code examples demonstrating this shortly.

Using hardware CAS functionality instead of OS- or platform-level synchronization primitives (such as locks and mutexes) eliminates the need for the OS to block and unblock threads. This significantly reduces the latency of waiting to access shared resources, leading to decreased congestion and improved overall throughput in concurrent applications.

Threads attempting to access a shared data structure via CAS are not blocked. Instead, they repeatedly attempt the CAS operation until it succeeds, granting them access to the shared resource. This “spin waiting” approach minimizes the delay before a thread can access the shared data structure.

However, prolonged spin waiting, where a thread repeatedly executes CAS in a loop, consumes CPU cycles that could otherwise be used for other tasks. How effective CAS is at minimizing delay depends on how long shared data structures are held by other threads. In practice, shared resources are often held for short periods, making prolonged spin waiting infrequent. Nonetheless, contention levels, the complexity of the shared data structure, and system load all influence overall performance. Blocked threads, by contrast, consume little CPU while blocked, but incur the overhead of context switching and scheduling when they are unblocked.
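Spin waiting can also be made friendlier to the hardware. Since Java 9, Thread.onSpinWait() lets a busy-waiting loop hint the processor that it is spinning, which can reduce power consumption and improve performance on some CPUs. The sketch below shows a CAS-based spin lock with that hint; the class name and the tryLock() helper are illustrative additions, not part of the original example:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitLock {

    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Spin until the CAS succeeds, hinting the CPU on each failed attempt.
    public void lock() {
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // Java 9+: signals a busy-wait loop to the CPU
        }
    }

    // Single non-blocking attempt; returns true only if the lock was free.
    public boolean tryLock() {
        return locked.compareAndSet(false, true);
    }

    public void unlock() {
        locked.set(false);
    }
}
```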
Compare and Swap in Java: Leveraging Atomic Classes
Since Java 5, developers have had access to hardware-level CAS operations through the atomic classes in the java.util.concurrent.atomic package. Key classes in this package include:
AtomicBoolean
AtomicInteger
AtomicLong
AtomicReference
The primary advantage of using these atomic classes is that they directly leverage the hardware CAS instructions of the CPU the Java application is running on, which makes CAS-based concurrent code significantly faster than software-based implementations.
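As a quick illustration of the CAS semantics these classes expose, the sketch below uses AtomicInteger: compareAndSet() succeeds only while the caller's expected value is still current. The class and variable names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasSemanticsDemo {

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(5);

        // Expected value matches the current value: the swap happens.
        boolean swapped = counter.compareAndSet(5, 6);
        System.out.println(swapped + ", counter = " + counter.get()); // true, counter = 6

        // Expected value is stale: the swap is refused, the value is untouched.
        boolean stale = counter.compareAndSet(5, 7);
        System.out.println(stale + ", counter = " + counter.get()); // false, counter = 6
    }
}
```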
CAS as a Guard: Implementing Lock-Free Synchronization
CAS functionality can be effectively employed as a guard mechanism to protect a critical section, preventing simultaneous execution by multiple threads. This allows for the creation of lock-free synchronization primitives.

Consider this reimplementation of the lock() method from the earlier ProblematicLock class, this time utilizing AtomicBoolean and CAS to create a lock that functions as a guard:
import java.util.concurrent.atomic.AtomicBoolean;

public class CompareAndSwapLock {

    private AtomicBoolean locked = new AtomicBoolean(false);

    public void unlock() {
        this.locked.set(false);
    }

    public void lock() {
        while(!this.locked.compareAndSet(false, true)) {
            // busy wait - until compareAndSet() succeeds
        }
    }
}
Notice that the locked variable is now an AtomicBoolean instead of a plain boolean. The AtomicBoolean class provides the crucial compareAndSet() method, which atomically compares the current value of the AtomicBoolean instance to an expected value (false in this case) and, if they match, atomically sets the AtomicBoolean to a new value (true). The compareAndSet() method returns true if the swap was successful (meaning the value was indeed false and was changed to true), and false otherwise.
In the CompareAndSwapLock example, the compareAndSet(false, true) call attempts to atomically transition the locked AtomicBoolean from false to true. Because CAS operations are atomic at the hardware level, only one thread can successfully execute compareAndSet() at a time while the value is initially false. Consequently, only one thread will observe the AtomicBoolean as false and successfully swap it to true, thereby exiting the while loop and acquiring the “lock”. Subsequent attempts by other threads to acquire the lock (by calling lock()) will find compareAndSet() returning false (as locked is now true), causing them to remain in the while loop, spin waiting until the lock is released via unlock(), which sets locked back to false.
CAS as an Optimistic Locking Mechanism: Enabling Concurrent Operations with Conflict Resolution
CAS can also be leveraged to implement optimistic locking strategies. Optimistic locking allows multiple threads to proceed into a critical section concurrently, but only a thread whose work encounters no conflict is permitted to commit its changes at the end; the others must detect the conflict and retry.

Here’s an example of a concurrent counter class that employs an optimistic locking approach using CAS:
import java.util.concurrent.atomic.AtomicLong;

public class OptimisticLockCounter {

    private AtomicLong count = new AtomicLong();

    public void inc() {
        boolean incSuccessful = false;
        while(!incSuccessful) {
            long value = this.count.get();
            long newValue = value + 1;
            incSuccessful = this.count.compareAndSet(value, newValue);
        }
    }

    public long getCount() {
        return this.count.get();
    }
}
In the inc() method of OptimisticLockCounter, the current value of count (an AtomicLong instance) is retrieved using this.count.get(). A new, incremented value (newValue) is then calculated from the retrieved value. Finally, this.count.compareAndSet(value, newValue) is invoked to attempt to atomically update the AtomicLong to newValue.
The crucial aspect of optimistic locking lies in the compareAndSet() operation. It checks whether the current value of the AtomicLong is still equal to the value that was initially read. If it is unchanged (meaning no other thread has modified it in the interim), compareAndSet() succeeds, atomically updating the AtomicLong to newValue; incSuccessful becomes true and the while loop exits. However, if another thread has concurrently incremented the AtomicLong in the meantime, the compareAndSet() call fails because the expected value (value) no longer matches the value actually stored in the AtomicLong. In that case compareAndSet() returns false, incSuccessful remains false, and the inc() method iterates again within the while loop: it re-reads the current value of count, recalculates the incremented value, and attempts the compareAndSet() again. This retry mechanism ensures that increments are eventually applied correctly even under concurrent access, embodying the optimistic approach of assuming no conflicts and retrying when a conflict is detected.