CUDA Basics
Learn the fundamental concepts of CUDA programming.
CUDA Program Structure
A typical CUDA program consists of code that runs on both the CPU (host) and GPU (device). The host code manages memory and launches kernels, while the device code runs in parallel on the GPU.
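A minimal sketch of this split, assuming a hypothetical `scale` kernel: the host allocates device memory, copies data over, launches the kernel, and copies the result back.

```cuda
#include <cuda_runtime.h>

// Device code: each of the 256 threads scales one element.
__global__ void scale(float *data, float factor) {
    data[threadIdx.x] *= factor;
}

// Host code: manages memory and launches the kernel.
int main() {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev;
    cudaMalloc(&dev, n * sizeof(float));                              // allocate on the GPU
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); // host -> device
    scale<<<1, n>>>(dev, 2.0f);                                       // run on the GPU
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost); // device -> host
    cudaFree(dev);
    return 0;
}
```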
Your First CUDA Program
Let's look at a simple CUDA program that prints "Hello World" from multiple threads:
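A minimal sketch of such a program, assuming the 2-block, 4-thread launch described in the key components below:

```cuda
#include <cstdio>

// __global__ marks a function that runs on the GPU (a "kernel").
__global__ void helloKernel() {
    printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    helloKernel<<<2, 4>>>();   // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();   // wait for all GPU work (and printing) to finish
    return 0;
}
```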
Key Components:
- The `__global__` keyword indicates a function that runs on the GPU
- `blockIdx.x` and `threadIdx.x` are built-in variables for accessing block and thread indices
- The `<<<2, 4>>>` syntax launches the kernel with 2 blocks of 4 threads each
- `cudaDeviceSynchronize()` waits for all GPU operations to complete
Thread Hierarchy
CUDA organizes threads in a hierarchical structure:
- Threads are grouped into blocks
- Blocks are organized into a grid
- This hierarchy allows CUDA to scale across different GPU architectures
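In practice, a kernel flattens this hierarchy into one global index per thread using the built-in `blockDim` variable. A common pattern (a sketch with hypothetical vector-add arguments):

```cuda
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    // blockIdx.x selects the block, threadIdx.x the thread within it,
    // and blockDim.x (threads per block) combines them into a global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard threads that fall past the end
}
```

Launched as `vectorAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n)`, the grid grows with `n` while each block stays a fixed size, which is what lets the same code scale across GPUs with different numbers of multiprocessors.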
Memory Model
CUDA provides different types of memory:
- Global memory - accessible by all threads
- Shared memory - shared between threads in a block
- Local memory - private to each thread
- Constant memory - read-only memory accessible by all threads
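A sketch showing where each kind of memory appears in a kernel (the names are hypothetical, and it assumes 256-thread blocks with a grid that exactly covers the array):

```cuda
__constant__ float coeff;                          // constant memory: read-only for all threads

__global__ void memoryDemo(float *global_data) {   // global_data lives in global memory
    __shared__ float tile[256];                    // shared memory: one copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = global_data[i];                      // x is private to this thread

    tile[threadIdx.x] = x * coeff;                 // stage a value in shared memory
    __syncthreads();                               // make it visible to the whole block

    global_data[i] = tile[threadIdx.x];            // write back to global memory
}
```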
Try modifying and running the example code in our playground to better understand how CUDA threads work.