5.0 Shared Memory and Constant Memory
Published on 2018-06-01 | Category: CUDA, Freshman
Abstract: This article provides an overview of Chapter 5 on CUDA shared memory and constant memory. Keywords: Shared Memory, Constant Memory
Shared Memory and Constant Memory
This article introduces CUDA Chapter 5 and outlines the overall approach for the chapter. It is deliberately brief.
In this chapter, we will learn:
- Arranging data in shared memory
- Index conversion from 2D shared memory to linear global memory
- Resolving bank conflicts in different access patterns
- Caching data in shared memory to reduce global memory accesses
- Using shared memory to avoid uncoalesced global memory accesses
- Differences between constant cache and read-only cache
- Warp shuffle instruction programming
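Before diving into these topics, it helps to see what shared memory looks like in code. The sketch below shows the two ways a kernel can obtain shared memory: statically, with a compile-time size, and dynamically, with the size passed as the third launch-configuration argument. The kernel names are illustrative, not from the book.

```cuda
#include <cstdio>

// Static shared memory: the size is fixed at compile time.
__global__ void staticSmem(float *out)
{
    __shared__ float tile[64];
    int tid = threadIdx.x;
    tile[tid] = static_cast<float>(tid);
    __syncthreads();            // make every thread's write visible to the block
    out[tid] = tile[63 - tid];  // read back in reversed order
}

// Dynamic shared memory: the size is supplied at launch time.
extern __shared__ float dynTile[];
__global__ void dynamicSmem(float *out)
{
    int tid = threadIdx.x;
    dynTile[tid] = static_cast<float>(tid);
    __syncthreads();
    out[tid] = dynTile[blockDim.x - 1 - tid];
}
```

A dynamic launch passes the byte count as the third argument, e.g. `dynamicSmem<<<1, 64, 64 * sizeof(float)>>>(d_out);`, which lets the same kernel run with different tile sizes without recompiling.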
Previously, we focused primarily on global memory: how it is used and how to improve the efficiency of global memory accesses. Unoptimized access patterns are not always fatal, because modern GPUs have L1 caches; however, uncoalesced global memory accesses still lead to poor bandwidth utilization. Since uncoalesced accesses cannot always be avoided in real applications, shared memory becomes the key to improving efficiency.
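A classic case is matrix transpose: reading rows and writing columns (or vice versa) forces one side to stride through global memory. The sketch below, a common textbook pattern rather than code from this book, stages a tile in shared memory so that both the global read and the global write are coalesced; the `+1` padding on the tile row is explained in the bank-conflict discussion later in the chapter.

```cuda
#define TILE 32

// Transpose 'in' (rows x cols) into 'out' (cols x rows) through a
// shared-memory tile, so both global accesses are row-wise (coalesced).
__global__ void transposeSmem(float *out, const float *in, int rows, int cols)
{
    __shared__ float tile[TILE][TILE + 1]; // +1 pad avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x; // column index into 'in'
    int y = blockIdx.y * TILE + threadIdx.y; // row index into 'in'
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x]; // coalesced read

    __syncthreads(); // tile must be fully populated before reuse

    // Swap the roles of the block indices so the write is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x; // column index into 'out'
    y = blockIdx.x * TILE + threadIdx.y; // row index into 'out'
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}
```

The uncoalesced traversal does not disappear; it is moved into shared memory, where strided access is cheap as long as bank conflicts are avoided.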
Summary
In this chapter, we focus on how to program with shared memory: how data is laid out in shared memory, how data elements map onto the hardware memory banks under different access patterns, and how shared memory can be used to improve kernel performance.
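As a preview of the bank-mapping discussion, the sketch below (my own illustration, not code from the book) contrasts a conflict-free access pattern with one that serializes an entire warp. On current GPUs, shared memory is divided into 32 banks, with successive 32-bit words assigned to successive banks.

```cuda
// Row access: the 32 threads of a warp touch 32 consecutive words,
// one per bank, so the request is serviced in a single transaction.
// Column access: every thread hits bank 0, causing a 32-way conflict.
__global__ void bankPatterns(float *out)
{
    __shared__ float tile[32][32];
    int tid = threadIdx.x;

    tile[0][tid] = static_cast<float>(tid);
    __syncthreads();

    float rowRead = tile[0][tid]; // conflict-free: one word per bank
    float colRead = tile[tid][0]; // 32-way conflict: all words in bank 0
    // Padding the row to 33 floats (tile[32][33]) would shift each row
    // into a different bank and remove the column-access conflict.
    out[tid] = rowRead + colRead;
}
```

The padding trick shown in the comment is the standard remedy and reappears in the transpose example earlier in this chapter.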