Research note / structural placeholder
CUDA/HPC implementation notes
Placeholder page for performance-aware scientific computing notes.
Kernel structure
Numerical kernels should document memory access patterns, thread-block geometry, synchronization requirements, and reductions.
Performance metric
\( T_\mathrm{step} = T_\mathrm{load} + T_\mathrm{compute} + T_\mathrm{store} \)
Cluster workflow
Production notes should include compiler versions, Slurm scripts, GPU model, problem size, and reproducible profiling commands.