Research note / structural placeholder

CUDA/HPC implementation notes

Placeholder page for performance-aware scientific computing notes.

Kernel structure

Numerical kernels should document memory access patterns, thread-block geometry, synchronization requirements, and reductions.

Performance metric

\( T_\mathrm{step} = T_\mathrm{load} + T_\mathrm{compute} + T_\mathrm{store} \)

Cluster workflow

Production notes should include compiler versions, Slurm scripts, GPU model, problem size, and reproducible profiling commands.