CUDA SASS: Learning to Read NVIDIA's Native GPU ISA
TL;DR
SASS is the instruction stream that NVIDIA GPUs actually execute. PTX is not the final hardware ISA. It is a virtual ISA that ptxas (or the driver's JIT compiler, when only PTX is shipped) lowers into architecture-specific SASS. The first things to learn are opcodes, registers, predicates, loads/stores, special registers, and modifiers. SASS exposes performance-critical details that source code hides: final opcode selection, register usage, spills, and memory instructions. If PTX tells you the compiler’s intent, SASS tells you what the GPU will actually issue.

Why Learn SASS at All?

If you only write CUDA C++, it is tempting to stop at source code and trust the compiler. That works until performance becomes mysterious.