008 - Data Layout and Cache-Friendly Rust
Why this matters
Memory access patterns often dominate performance. Fast code is not only about algorithms—it is also about data layout and locality.
Learning goals
By the end, you should be able to:
- Explain why contiguous memory often improves performance.
- Recognize Array of Structs (AoS) vs Struct of Arrays (SoA) tradeoffs.
- Apply simple layout choices to improve cache friendliness.
Mental model
CPU caches prefer predictable, contiguous reads.
- Good locality: sequential access in a `Vec<T>`.
- Poor locality: pointer chasing across many heap allocations.
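To make the contrast concrete, here is a minimal sketch (the function names `sum_contiguous` and `sum_boxed` are illustrative, not from the lesson). The first sum streams through one flat buffer; the second follows a pointer for every element because each `Box<f32>` is its own heap allocation:

```rust
// Sketch: contiguous elements vs. one heap allocation per element.
fn sum_contiguous(data: &[f32]) -> f32 {
    // Elements sit side by side; the hardware prefetcher can stream them.
    data.iter().sum()
}

fn sum_boxed(data: &[Box<f32>]) -> f32 {
    // Each element lives behind its own pointer; every access is an
    // extra indirection, which defeats predictable prefetching.
    data.iter().map(|b| **b).sum()
}

fn main() {
    let flat: Vec<f32> = (0..1_000).map(|i| i as f32).collect();
    let boxed: Vec<Box<f32>> = (0..1_000).map(|i| Box::new(i as f32)).collect();

    // Same values, same result; only the memory layout differs.
    assert_eq!(sum_contiguous(&flat), sum_boxed(&boxed));
    println!("both sums agree: {}", sum_contiguous(&flat));
}
```

Both functions compute the same answer; the difference only shows up in how the CPU fetches the data.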
Python baseline
Python lists store references to Python objects, which can reduce data locality compared to tightly packed primitive arrays.
AoS vs SoA (concept)
- AoS: a `Vec<Point>` where `Point` has fields `x` and `y`. Nice ergonomics for per-object operations.
- SoA: `xs: Vec<f32>, ys: Vec<f32>`. Often better for column-wise numeric loops.
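In practice, SoA data is often wrapped in a small container struct so the parallel vectors stay in sync. The following `PointsSoA` type is a hypothetical sketch of that pattern, not part of the lesson's example:

```rust
// Hypothetical SoA container: one Vec per field instead of Vec<Point>.
struct PointsSoA {
    xs: Vec<f32>,
    ys: Vec<f32>,
}

impl PointsSoA {
    fn new() -> Self {
        PointsSoA { xs: Vec::new(), ys: Vec::new() }
    }

    // push keeps the parallel vectors in lockstep.
    fn push(&mut self, x: f32, y: f32) {
        self.xs.push(x);
        self.ys.push(y);
    }

    // A column-wise loop touches only the xs buffer: one tight,
    // contiguous stream instead of strided reads through whole points.
    fn sum_x(&self) -> f32 {
        self.xs.iter().sum()
    }
}

fn main() {
    let mut pts = PointsSoA::new();
    for i in 0..4 {
        pts.push(i as f32, (i * 2) as f32);
    }
    println!("sum_x = {}", pts.sum_x()); // 0 + 1 + 2 + 3
}
```

Encapsulating the vectors behind `push` prevents the classic SoA bug of the columns drifting out of length with each other.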
Runnable end-to-end example
#[derive(Clone, Copy)]
struct Point {
    x: f32,
    y: f32,
}

fn sum_aos(points: &[Point]) -> f32 {
    points.iter().map(|p| p.x + p.y).sum()
}

fn sum_soa(xs: &[f32], ys: &[f32]) -> f32 {
    xs.iter().zip(ys.iter()).map(|(x, y)| x + y).sum()
}

fn main() {
    let n = 10_000;

    // AoS layout
    let points: Vec<Point> = (0..n)
        .map(|i| Point {
            x: i as f32,
            y: (i * 2) as f32,
        })
        .collect();

    // SoA layout
    let xs: Vec<f32> = (0..n).map(|i| i as f32).collect();
    let ys: Vec<f32> = (0..n).map(|i| (i * 2) as f32).collect();

    let a = sum_aos(&points);
    let b = sum_soa(&xs, &ys);
    println!("sum_aos = {}", a);
    println!("sum_soa = {}", b);
}
Practical guidance
- Keep hot-path data contiguous when possible.
- Benchmark with realistic workloads before redesigning structure.
- Choose AoS for ergonomics, SoA for heavy column-wise numeric processing.
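As a starting point for "benchmark before redesigning", here is a rough wall-clock timing sketch using `std::time::Instant`. The `time_it` helper is an illustrative name introduced here; for trustworthy numbers, prefer a dedicated harness such as the criterion crate, since a single `Instant` measurement is noisy and the optimizer may reorder or elide work:

```rust
use std::time::Instant;

// Rough timing helper: runs `f` once and reports elapsed wall-clock time.
// This is a sketch only; real benchmarking needs warmup and repetition.
fn time_it<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    println!("{label}: {:?}", start.elapsed());
    result
}

fn main() {
    let n = 1_000_000u32;
    // AoS as tuples for brevity; SoA as two parallel vectors.
    let points: Vec<(f32, f32)> = (0..n).map(|i| (i as f32, (i * 2) as f32)).collect();
    let xs: Vec<f32> = (0..n).map(|i| i as f32).collect();
    let ys: Vec<f32> = (0..n).map(|i| (i * 2) as f32).collect();

    let a = time_it("AoS sum", || points.iter().map(|(x, y)| x + y).sum::<f32>());
    let b = time_it("SoA sum", || xs.iter().zip(&ys).map(|(x, y)| x + y).sum::<f32>());

    // Same arithmetic in the same order, so the results match exactly.
    assert_eq!(a, b);
}
```

Remember to run with `cargo run --release`; debug builds distort layout comparisons badly.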
Common pitfalls
- Optimizing layout before measuring.
- Overcomplicating code for tiny gains in non-critical paths.
- Confusing algorithmic improvements with locality improvements.
Quick practice
- Extend the example with a `z` field and compare the AoS/SoA functions.
- Measure both versions with `cargo run --release` and compare timings manually.
Recap
Cache-friendly layout can unlock significant speedups. Rust gives you control over data representation so you can choose the right tradeoff for each workload.
Next lesson: 009 - Concurrency Memory Safety (Send, Sync).
Join the Journey Ahead!
If you're eager to continue this learning journey and stay updated with the latest insights, consider subscribing. By joining our mailing list, you'll receive notifications about new articles, tips, and resources to help you seamlessly pick up Rust by leveraging your Python skills.
Other articles in the series
- 000 - Memory Mastery Roadmap
- 001 - Memory Mastery: Zero-Cost Abstractions
- 002 - Stack vs Heap, Moves, and Clones
- 003 - Borrowing Rules in Real Functions
- 004 - Lifetimes Without Fear
- 005 - String, &str, and Allocation Patterns
- 006 - Smart Pointers (`Box`, `Rc`, `Arc`, `RefCell`)
- 007 - Interior Mutability and Tradeoffs
- 009 - Concurrency Memory Safety (`Send`, `Sync`)
- 010 - Profiling and Benchmarking Rust vs Python