
5.0 Shared Memory and Constant Memory

Published on 2018-06-01 | Category: CUDAFreshman

Abstract: This article provides an overview of Chapter 5 on CUDA shared memory and constant memory. Keywords: Shared Memory, Constant Memory


This article introduces CUDA Chapter 5 and outlines the overall approach for the chapter. It is deliberately brief.

Shared Memory and Constant Memory

In this chapter, we will learn:

  • Arranging data in shared memory
  • Index conversion from 2D shared memory to linear global memory
  • Resolving bank conflicts in different access patterns
  • Caching data in shared memory to reduce global memory accesses
  • Using shared memory to avoid uncoalesced global memory accesses
  • Differences between constant cache and read-only cache
  • Warp shuffle instruction programming

Previously, we focused primarily on global memory: how to use it and how to improve the efficiency of accessing it. Unoptimized memory access is not always a problem, because modern GPUs have L1 caches; however, uncoalesced access to global memory still leads to poor bandwidth utilization. Since uncoalesced access patterns cannot always be avoided in practical applications, shared memory becomes the key to improving efficiency.
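The classic illustration of this idea is a matrix transpose: a naive transpose must read or write global memory column-wise (uncoalesced), but routing the data through shared memory lets both global accesses stay coalesced. A sketch, with illustrative names and a hypothetical row-major `ny x nx` input:

```cuda
#define TILE 32

// Both the read from 'in' and the write to 'out' touch consecutive
// global addresses within a warp; the transposition happens inside
// shared memory instead. The "+ 1" padding column shifts each tile row
// into a different memory bank, so the column-wise shared-memory read
// below is free of bank conflicts.
__global__ void transposeSmem(float *out, const float *in, int nx, int ny)
{
    __shared__ float tile[TILE][TILE + 1];

    int ix = blockIdx.x * TILE + threadIdx.x;
    int iy = blockIdx.y * TILE + threadIdx.y;
    if (ix < nx && iy < ny)
        tile[threadIdx.y][threadIdx.x] = in[iy * nx + ix];   // coalesced read

    __syncthreads();

    // Swap the block coordinates for the transposed output position.
    int ox = blockIdx.y * TILE + threadIdx.x;
    int oy = blockIdx.x * TILE + threadIdx.y;
    if (ox < ny && oy < nx)
        out[oy * ny + ox] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}
```

Note that each warp writes a row of the output, so the uncoalesced column access is confined to shared memory, where it costs far less than in global memory.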

Summary

In this chapter, we focus on how to program with shared memory, how data is stored in shared memory, how data elements are mapped to memory banks (hardware) using different access patterns, and methods for improving kernel performance using shared memory.
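Beyond shared memory proper, the chapter's topic list also includes warp shuffle instructions, which move data between threads of a warp without touching shared memory at all. A minimal sketch of a warp-level sum reduction (CUDA 9+ sync-shuffle intrinsics; the helper name is illustrative):

```cuda
// Butterfly-style warp reduction: after log2(warpSize) steps,
// lane 0 holds the sum of 'val' across the whole warp.
__inline__ __device__ int warpReduceSum(int val)
{
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);  // pull from lane + offset
    return val;
}
```

Because the exchange happens in registers, this avoids both shared-memory traffic and `__syncthreads()` within the warp.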