
CUDA Learning Path

Start with this blog post for a simple introduction: Introductory Blog

Then read this: Casual Discussion on High-Performance Computing and Performance Optimization: Memory Access
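As a minimal sketch of why memory access patterns matter (assuming the CUDA runtime API and the default stream; `copyCoalesced`, `copyStrided`, and `runCopy` are hypothetical names), the two kernels below copy the same array. In `copyCoalesced`, adjacent threads touch adjacent floats, so each warp's loads combine into a few wide memory transactions. In `copyStrided`, adjacent threads touch floats `stride` elements apart, so each warp scatters across many memory segments. Both produce identical output; the difference only shows up in timing or in a profiler's memory-throughput metrics. Error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Coalesced: consecutive threads in a warp read and write consecutive floats.
__global__ void copyCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread t accesses index (t % cols) * stride + t / cols, so
// adjacent threads touch floats `stride` elements apart. The mapping is a
// permutation of [0, n) only when n is a multiple of stride.
__global__ void copyStrided(const float* in, float* out, int n, int stride) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int cols = n / stride;
    if (t < n) {
        int i = (t % cols) * stride + t / cols;
        if (i < n) out[i] = in[i];
    }
}

// Host helper: copies `h` through the chosen kernel and returns the result.
std::vector<float> runCopy(const std::vector<float>& h, bool strided, int stride) {
    int n = static_cast<int>(h.size());
    if (strided) assert(stride > 0 && n % stride == 0);
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (strided) copyStrided<<<blocks, threads>>>(dIn, dOut, n, stride);
    else         copyCoalesced<<<blocks, threads>>>(dIn, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);  // also syncs
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

For a 1024-element `h`, `runCopy(h, true, 32)` and `runCopy(h, false, 1)` return the same data; profiling both launches is where the gap in achieved memory throughput should become visible.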

After that, read this translation of the official CUDA documentation: Official Documentation Translation

While reading the official docs, you can start hands-on experiments with simple matrix multiplication:

CUDA Matrix Multiplication Ultimate Optimization Guide
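A minimal sketch of the naive starting point such a guide optimizes from, assuming row-major `float` matrices with C (M×N) = A (M×K) · B (K×N). `matmulNaive` and `matmulOnGpu` are hypothetical names, and error checking on the CUDA calls is omitted for brevity. Each thread computes one element of C.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Naive SGEMM: one thread per output element C[row][col].
__global__ void matmulNaive(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];  // row of A dot column of B
        C[row * N + col] = acc;
    }
}

// Host helper: copies A and B to the GPU, launches the kernel, copies C back.
std::vector<float> matmulOnGpu(const std::vector<float>& A,
                               const std::vector<float>& B,
                               int M, int N, int K) {
    size_t bytesA = static_cast<size_t>(M) * K * sizeof(float);
    size_t bytesB = static_cast<size_t>(K) * N * sizeof(float);
    size_t bytesC = static_cast<size_t>(M) * N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytesB, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // x indexes columns, y indexes rows
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    matmulNaive<<<grid, block>>>(dA, dB, dC, M, N, K);

    std::vector<float> C(static_cast<size_t>(M) * N);
    cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost);  // also syncs
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

For A = {1, 2, 3, 4, 5, 6} (2×3) and B = {7, 8, 9, 10, 11, 12} (3×2), `matmulOnGpu(A, B, 2, 2, 3)` should return {58, 64, 139, 154}. Threads adjacent in x read adjacent elements of B, so those loads coalesce, but every thread still re-reads a full row of A and column of B from global memory; that redundant traffic is what shared-memory tiling later reduces.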

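Also, learn CUDA's two performance analysis tools: NVIDIA Nsight Systems, which shows a system-wide timeline of CPU and GPU activity, and NVIDIA Nsight Compute, which analyzes individual kernels in detail.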

There isn't much material on these tools: a few simple tutorials on Bilibili, plus the official documentation, which is available only in English.

NVIDIA Performance Analysis Tool nsight-compute Getting Started
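Before opening a profiler, a quick baseline is to time a kernel with CUDA events. A minimal sketch, assuming the CUDA runtime API; the `saxpy` kernel and `timeSaxpyMs` helper are hypothetical placeholders for your own kernel. It returns the GPU-side elapsed time between two events recorded on the default stream, which should roughly track the kernel duration the profilers report (Nsight Compute locks clocks to base frequency by default, so expect some difference).

```cuda
#include <cuda_runtime.h>

// Placeholder kernel to time: y = a * x + y.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Returns the elapsed GPU time in milliseconds for one saxpy launch.
float timeSaxpyMs(int n) {
    size_t bytes = n * sizeof(float);
    float *x = nullptr, *y = nullptr;
    cudaMalloc(&x, bytes);
    cudaMalloc(&y, bytes);
    cudaMemset(x, 0, bytes);
    cudaMemset(y, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);  // warm-up: keeps one-time setup out of the measurement

    cudaEventRecord(start);                     // default stream
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                 // wait until the kernel and stop event complete

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return ms;
}
```

Events measure only elapsed time; for why a kernel is slow (memory throughput, occupancy, stalls), you still need Nsight Compute.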

Here is my matrix multiplication implementation, with some notes on what I learned; you can use it as a reference. To practice matrix multiplication yourself, start from the template in folder "0" inside this archive.
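When practicing, the first optimization most matmul guides introduce is shared-memory tiling. The sketch below is not the author's implementation or the contents of folder "0"; it is a hypothetical, self-contained example (`TILE = 16` is an assumed tile width, and `matmulTiled`, `matmulCpu`, `maxErrorVsCpu` are illustrative names) that stages TILE×TILE tiles of A and B in shared memory and checks the GPU result against a CPU reference. The inputs are small integers, so every partial sum is exact in `float` and the check can demand zero error; with random inputs, compare against a tolerance instead.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 16;  // assumed tile width; tune for your GPU

// Tiled SGEMM (row-major): each block computes one TILE x TILE tile of C.
__global__ void matmulTiled(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Zero-pad out-of-range elements so edge tiles stay correct.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // tile fully loaded before any thread reads it
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // all reads done before the next tile overwrites it
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// CPU reference used to validate the GPU result.
void matmulCpu(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, int M, int N, int K) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Runs the tiled kernel on integer-valued inputs and returns the
// maximum absolute difference from the CPU reference.
float maxErrorVsCpu(int M, int N, int K) {
    std::vector<float> A(M * K), B(K * N), ref(M * N), out(M * N);
    for (int i = 0; i < M * K; ++i) A[i] = static_cast<float>(i % 7) - 3.0f;
    for (int i = 0; i < K * N; ++i) B[i] = static_cast<float>(i % 5) - 2.0f;
    matmulCpu(A, B, ref, M, N, K);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, out.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, M, N, K);
    cudaMemcpy(out.data(), dC, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    float err = 0.0f;
    for (size_t i = 0; i < out.size(); ++i)
        err = std::fmax(err, std::fabs(out[i] - ref[i]));
    return err;
}
```

`maxErrorVsCpu(37, 53, 70)` deliberately uses sizes that are not multiples of 16 to exercise the zero-padded edge tiles; it should return 0. Each element of A and B is now read from global memory once per tile instead of once per output element, which is the main saving over the naive kernel.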