Reading Notes | Rust Atomics and Locks

Rust Concurrency Basics

Multithreading

Spawning a thread:

use std::thread;

fn main() {
    // Spawn a thread
    let t = thread::spawn(|| {
        // do something
    });

    // Wait for the thread to finish
    t.join().unwrap();
}
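As a minimal sketch of how a result gets back from a spawned thread: `join` hands back whatever the closure returned. The helper name `sum_on_thread` is made up for this illustration.

```rust
use std::thread;

// Hypothetical helper: computes a sum on another thread and returns it.
fn sum_on_thread(numbers: Vec<u64>) -> u64 {
    // `move` transfers ownership of `numbers` into the new thread.
    let handle = thread::spawn(move || numbers.iter().sum::<u64>());
    // `join` returns `Err` if the thread panicked, otherwise the closure's value.
    handle.join().unwrap()
}

fn main() {
    println!("{}", sum_on_thread(vec![1, 2, 3]));
}
```

The `move` keyword is needed here because a thread created with `thread::spawn` may outlive the function that created it, so it cannot borrow local data.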

Controlling thread lifetimes with scoped threads:

use std::thread;

fn main() {
    thread::scope(|s| {
        s.spawn(|| {
            // do something
        });
        s.spawn(|| {
            // do something
        });
    });
    // By this point, every thread spawned in the scope has finished and been joined.
}
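Because the scope guarantees that its threads finish before it returns, those threads can borrow local data directly. A small sketch of that, with a hypothetical helper `parallel_sum` that splits a slice across two scoped threads:

```rust
use std::thread;

// Hypothetical helper: sums a slice in two halves on two scoped threads.
fn parallel_sum(numbers: &[i32]) -> i32 {
    let (left, right) = numbers.split_at(numbers.len() / 2);
    thread::scope(|s| {
        // Both closures borrow from `numbers`; no `move` or `Arc` is needed.
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let numbers = vec![1, 2, 3, 4];
    println!("sum = {}", parallel_sum(&numbers));
    // `numbers` is still usable here because the scope has ended.
    println!("len = {}", numbers.len());
}
```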

Interior Mutability

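By default, Rust's memory safety rests on one rule: at any given time, a piece of data may either be shared through any number of immutable references (&T) or be held exclusively through a single mutable reference (&mut T). This compile-time check rules out data races, but it is inflexible. If several threads hold a reference to the same data, that reference has to be an immutable one, so the rule on its own cannot express multiple threads concurrently modifying the same data.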

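Interior mutability allows data to be modified through a shared reference (&T), even when the binding itself is not declared mut. The core idea is an interface that looks immutable from the outside while being mutable on the inside. RefCell, for example, hands out mutable borrows that are checked dynamically at runtime instead of statically at compile time.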
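A short sketch of what the runtime check looks like in practice. The helper names `push_through_shared` and `second_borrow_is_rejected` are invented for this example; `borrow_mut` and `try_borrow_mut` are the standard `RefCell` methods.

```rust
use std::cell::RefCell;

// Hypothetical helper: appends to a vector through a shared reference.
fn push_through_shared(log: &RefCell<Vec<String>>, entry: &str) {
    log.borrow_mut().push(entry.to_string());
}

// Returns true if a second mutable borrow is rejected while one is active.
fn second_borrow_is_rejected(log: &RefCell<Vec<String>>) -> bool {
    let _first = log.borrow_mut();
    // `try_borrow_mut` reports the conflict as an `Err` instead of panicking.
    log.try_borrow_mut().is_err()
}

fn main() {
    let log = RefCell::new(Vec::new()); // note: `log` is not declared `mut`
    push_through_shared(&log, "started");
    push_through_shared(&log, "finished");
    println!("{:?}", log.borrow());
    println!("rejected: {}", second_borrow_is_rejected(&log));
}
```

The borrowing rule is the same as at compile time (many readers or one writer); it is only the moment of enforcement that moves to runtime.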

UnsafeCell

UnsafeCell<T> is the foundation of interior mutability in Rust: every type that provides interior mutability (such as Cell and RefCell) is built on top of it.

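Type	Thread-safe	Runtime check	Supported types	Typical use
`Cell<T>`	❌	❌	Any type (`get` requires `Copy`)	Fast updates of simple values
`RefCell<T>`	❌	✔️	Any type	Complex data structures on a single thread
`RwLock<T>`	✔️	✔️ (blocking)	Any type	Data shared between threads
`UnsafeCell<T>`	❌	❌	Any type	Building low-level safe abstractions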
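To make the one thread-safe row of the table concrete, here is a rough sketch of several threads updating one value through an `RwLock`. The helper `count_with_rwlock` is hypothetical.

```rust
use std::sync::RwLock;
use std::thread;

// Hypothetical helper: `workers` threads each add 1 to a shared counter.
fn count_with_rwlock(workers: usize) -> usize {
    let counter = RwLock::new(0usize);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                // `write()` blocks until exclusive access is available.
                *counter.write().unwrap() += 1;
            });
        }
    });
    // All scoped threads have finished, so the lock can be consumed.
    counter.into_inner().unwrap()
}

fn main() {
    println!("{}", count_with_rwlock(4));
}
```

Swapping `RwLock` for `Cell` or `RefCell` in this sketch would fail to compile, since neither of them is `Sync`.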

UnsafeCell<T> is defined as follows:

/// The core primitive for interior mutability in Rust.
///
/// If you have a reference `&T`, then normally in Rust the compiler performs optimizations based on
/// the knowledge that `&T` points to immutable data. Mutating that data, for example through an
/// alias or by transmuting a `&T` into a `&mut T`, is considered undefined behavior.
/// `UnsafeCell<T>` opts-out of the immutability guarantee for `&T`: a shared reference
/// `&UnsafeCell<T>` may point to data that is being mutated. This is called "interior mutability".
#[repr(transparent)]
pub struct UnsafeCell<T: ?Sized> {
    value: T,
}

Here #[repr(transparent)] guarantees that UnsafeCell<T> has the same memory layout as T, which is what makes the pointer casts shown later sound.
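One observable consequence of the shared layout is that the wrapper and the inner type agree on size and alignment. The sketch below checks just that (it does not prove the full layout guarantee); `same_layout` is a made-up helper.

```rust
use std::cell::UnsafeCell;
use std::mem::{align_of, size_of};

// Hypothetical helper: true if the wrapper and the inner type agree on
// size and alignment.
fn same_layout<T>() -> bool {
    size_of::<UnsafeCell<T>>() == size_of::<T>()
        && align_of::<UnsafeCell<T>>() == align_of::<T>()
}

fn main() {
    println!("u8: {}", same_layout::<u8>());
    println!("u64: {}", same_layout::<u64>());
    println!("[u32; 3]: {}", same_layout::<[u32; 3]>());
}
```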

It has two core methods, get_mut and get. get_mut is the simpler of the two:

/// Returns a mutable reference to the underlying data.
///
/// This call borrows the `UnsafeCell` mutably (at compile-time) which
/// guarantees that we possess the only reference.
pub const fn get_mut(&mut self) -> &mut T {
    &mut self.value
}

The parameter is already &mut, so returning &mut self.value is all it takes. This satisfies the borrow checker naturally, with no extra tricks.
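A minimal usage sketch of `get_mut`, showing that no `unsafe` block is involved. The function `bump` is a hypothetical name.

```rust
use std::cell::UnsafeCell;

// Hypothetical helper: increments the wrapped value through `get_mut`.
fn bump(cell: &mut UnsafeCell<i32>) {
    // No `unsafe` needed: `&mut` already proves exclusive access.
    *cell.get_mut() += 1;
}

fn main() {
    let mut cell = UnsafeCell::new(41);
    bump(&mut cell);
    println!("{}", cell.into_inner());
}
```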

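get is a different story, because its job is to produce a mutable raw pointer from a shared reference! That plainly sidesteps the compiler's reference checks, so it takes a few tricks. Its implementation is: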

/// Gets a mutable pointer to the wrapped value.
pub const fn get(&self) -> *mut T {
    self as *const UnsafeCell<T> as *const T as *mut T
}

Whoever designed Rust must be a big fan of nesting dolls...

The three-step conversion in self as *const UnsafeCell<T> as *const T as *mut T breaks down as follows:

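Obtaining the raw pointer, self as *const Self: converts the shared reference (&self) into an immutable raw pointer (*const Self). This step only takes an address and touches no memory, so it is safe.
Casting the pointee type, *const Self as *const T: changes the pointer from pointing at Self (that is, UnsafeCell<T>) to pointing at the T stored inside. Because UnsafeCell has exactly the same memory layout as T (#[repr(transparent)]), this cast carries no size or alignment risk.
Reinterpreting mutability, *const T as *mut T: casts the immutable pointer to a mutable one. This is the key step that gets around Rust's default shared-reference rules and allows the inner data to be modified through a shared reference. Creating the pointer is still safe code; dereferencing it requires unsafe, and the developer is responsible for guaranteeing that the access is sound.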
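To show where the `*mut T` returned by `get` ends up being used, here is a rough sketch of a tiny cell type built on `UnsafeCell`. `TinyCell` is a hypothetical type for illustration, not how the standard library's `Cell` is written.

```rust
use std::cell::UnsafeCell;

// Hypothetical single-threaded cell, for illustration only.
// Because it contains an `UnsafeCell`, it is automatically `!Sync`.
struct TinyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> TinyCell<T> {
    fn new(value: T) -> Self {
        TinyCell { value: UnsafeCell::new(value) }
    }

    fn get(&self) -> T {
        // SAFETY: the type is `!Sync`, and no reference to the inner value
        // ever escapes, so nothing else can be accessing it right now.
        unsafe { *self.value.get() }
    }

    fn set(&self, new: T) {
        // SAFETY: same argument as in `get`.
        unsafe { *self.value.get() = new; }
    }
}

fn main() {
    let cell = TinyCell::new(1);
    let shared = &cell;
    shared.set(shared.get() + 1);
    println!("{}", cell.get());
}
```

The design choice that keeps this sound is that values are only copied in and out; a reference into the cell is never handed to the caller.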

Answering two questions gives a better feel for Rust's design philosophy and safety boundaries:

① Why are multiple conversion steps needed?

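Safety isolation: Rust forbids modifying data through a shared reference (&T) by default, but UnsafeCell is the low-level primitive for interior mutability and has to use pointer casts to get around the compiler's checks. Doing this in several explicit steps confines the dangerous operation to a small, controlled spot.
Type-system constraints: UnsafeCell's get() method returns *mut T, yet it takes &self (a shared reference). Converting step by step satisfies the method signature while still providing interior mutability.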

② Why not convert directly?

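Invariance: UnsafeCell<T> is invariant over T, which prevents covariance from producing dangling pointers. A direct conversion could break lifetime constraints.
Raw pointer semantics: *const T and *mut T represent different memory access permissions in Rust. A shared reference cannot be cast directly to *mut T; it first has to become a *const pointer, and the final cast to *mut T must be written out explicitly, which also reminds the developer of the potential for data races.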
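The cast chain can be reproduced outside the standard library to see that it yields the same address as `get`. This is only a sketch; `manual_get` is a hypothetical function name.

```rust
use std::cell::UnsafeCell;

// Hypothetical re-creation of the cast chain used by `UnsafeCell::get`.
fn manual_get(cell: &UnsafeCell<i32>) -> *mut i32 {
    // Writing `cell as *mut i32` directly is rejected by the compiler;
    // each step has to be spelled out.
    cell as *const UnsafeCell<i32> as *const i32 as *mut i32
}

fn main() {
    let cell = UnsafeCell::new(5);
    let p = manual_get(&cell);
    println!("same address: {}", p == cell.get());
    // SAFETY: single thread, and no reference to the inner value exists.
    unsafe { *p += 1 };
    println!("{}", cell.into_inner());
}
```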

Cell

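Cell<T> allows data to be modified through a shared reference. Its design builds on UnsafeCell and, by restricting how the data can be accessed and constraining the types involved, works around Rust's default borrowing rules while preserving memory safety.

It is defined as follows: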
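As a quick illustration of that access pattern, the sketch below updates a counter from a method that only takes `&self`. The `Sensor` struct and its fields are invented for the example; `get`, `set`, and `replace` are the standard `Cell` methods.

```rust
use std::cell::Cell;

// Hypothetical struct: a read counter updated through `&self`.
struct Sensor {
    reading: i32,
    reads: Cell<u32>,
}

impl Sensor {
    fn read(&self) -> i32 {
        // Values are copied out and in; no reference to the inside is created.
        self.reads.set(self.reads.get() + 1);
        self.reading
    }
}

fn main() {
    let sensor = Sensor { reading: 7, reads: Cell::new(0) };
    sensor.read();
    sensor.read();
    println!("reads = {}", sensor.reads.get());
    // `replace` swaps in a new value and hands back the old one.
    let old = sensor.reads.replace(0);
    println!("old = {old}");
}
```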

/// A mutable memory location.
///
/// # Memory layout
///
/// `Cell<T>` has the same [memory layout and caveats as
/// `UnsafeCell<T>`](UnsafeCell#memory-layout). In particular, this means that
/// `Cell<T>` has the same in-memory representation as its inner type `T`.
#[repr(transparent)]
pub struct Cell<T: ?Sized> {
    value: UnsafeCell<T>,
}
unsafe impl<T: ?Sized> Send for Cell<T> where T: Send {}
impl<T: ?Sized> !Sync for Cell<T> {}

Let's focus on the last two lines, the Send and Sync implementations. First, a quick review of the relevant traits:

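Send: ownership of the type can be safely transferred across threads.
Sync: a shared reference to the type (&T) can be safely shared across threads. !Sync explicitly forbids this.
Sized: the type's size is known at compile time. !Sized means the size is not known at compile time. Generic type parameters carry an implicit T: Sized bound by default; ?Sized relaxes it so that dynamically sized types are also accepted.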

So:

unsafe impl<T: ?Sized> Send for Cell<T> where T: Send {} means that, as long as T is Send, ownership of a Cell<T> may be moved to another thread.
impl<T: ?Sized> !Sync for Cell<T> {} means that sharing &Cell<T> across threads is forbidden.
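Both statements can be checked with a small sketch. `assert_send` and `move_cell_to_thread` are hypothetical helpers; the commented-out lines show the kind of code the `!Sync` implementation rejects.

```rust
use std::cell::Cell;
use std::thread;

// Compile-time check: this only type-checks when `T: Send`.
fn assert_send<T: Send>() {}

// Hypothetical helper: moves a `Cell` into another thread and mutates it there.
fn move_cell_to_thread(start: i32) -> i32 {
    let cell = Cell::new(start);
    let handle = thread::spawn(move || {
        // The new thread now owns `cell`.
        cell.set(cell.get() + 1);
        cell.get()
    });
    handle.join().unwrap()
}

fn main() {
    assert_send::<Cell<i32>>();
    println!("{}", move_cell_to_thread(1));

    // Sharing a reference instead is rejected by the compiler, because
    // `Cell<i32>` is not `Sync`:
    // let cell = Cell::new(0);
    // thread::scope(|s| { s.spawn(|| cell.set(1)); });
}
```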


This article is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please credit the source when reposting.