6.5 Stream Callbacks

Published on 2018-06-20 | Category: CUDAFreshman

Abstract: This article introduces stream callbacks.

Keywords: Stream Callbacks

Stream Callbacks

A stream callback is a host function that is queued into a stream, somewhat like an event. Once all work queued in that stream ahead of the callback has completed, the CUDA runtime calls the function on the host, and work queued after the callback does not start until the callback returns. However, callbacks come with two restrictions:

  • The callback function cannot call CUDA APIs
  • It cannot perform synchronization

A stream callback must have a fixed parameter list: the stream it was queued on, the status of the preceding work, and a user-supplied data pointer. The function name and body are up to you:

void CUDART_CB my_callback(cudaStream_t stream, cudaError_t status, void *data) {
    printf("callback from stream %d\n", *((int *)data));
}
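To make the three parameters concrete, here is a minimal sketch of a callback that receives a typed payload instead of a bare int. The names StreamTag and report_tag are hypothetical and do not come from the article's repository. The sketch assumes only that the runtime hands the user pointer through unchanged and reports the status of the preceding work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical payload; any host-side struct can travel through the void pointer.
struct StreamTag {
    int id;    // label chosen by the caller
    int seen;  // set by the callback: id on success, -1 on error
};

// Matches cudaStreamCallback_t: (cudaStream_t, cudaError_t, void *).
void CUDART_CB report_tag(cudaStream_t stream, cudaError_t status, void *data) {
    (void)stream;  // the stream handle is available but unused in this sketch
    StreamTag *tag = static_cast<StreamTag *>(data);
    tag->seen = (status == cudaSuccess) ? tag->id : -1;
    printf("callback for tag %d, status code %d\n", tag->id, (int)status);
}

// Compile-time check that the signature is the one the runtime expects.
static cudaStreamCallback_t report_tag_ptr = report_tag;
```

Because the callback is an ordinary host function, it can also be called directly with a hand-built StreamTag for a quick check before it is attached to any stream.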

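Then use:

cudaError_t cudaStreamAddCallback(cudaStream_t stream, cudaStreamCallback_t callback, void *userData, unsigned int flags);

to add it to a stream. The userData pointer is passed to the callback unchanged, and flags is reserved for future use and must be 0.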
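The following sketch puts registration and the two restrictions together. It is one possible arrangement, not the article's code: the names CopyCheck, write_value, check_copy, and run_ordering_demo are hypothetical. The callback only reads host memory and sets a flag. All CUDA calls, including synchronization and cleanup, stay on the main thread.

```cuda
#include <atomic>
#include <cuda_runtime.h>

// Hypothetical payload shared between the main thread and the callback.
struct CopyCheck {
    const int *host_result;   // pinned host buffer filled by the preceding copy
    int expected;             // value the kernel is supposed to write
    std::atomic<int> ok{-1};  // -1: not run yet, 0: mismatch or error, 1: success
};

__global__ void write_value(int *out, int value) { *out = value; }

// Runs on the host after the kernel and the copy queued before it have finished.
// It makes no CUDA API calls and does not synchronize.
void CUDART_CB check_copy(cudaStream_t, cudaError_t status, void *data) {
    CopyCheck *c = static_cast<CopyCheck *>(data);
    bool good = (status == cudaSuccess) && (*c->host_result == c->expected);
    c->ok.store(good ? 1 : 0);
}

// Returns 1 if the callback saw the finished result, 0 if not, -1 on setup failure.
int run_ordering_demo() {
    cudaStream_t s;
    int *dev = nullptr, *host = nullptr;
    if (cudaStreamCreate(&s) != cudaSuccess) return -1;
    if (cudaMalloc(&dev, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMallocHost(&host, sizeof(int)) != cudaSuccess) return -1;
    *host = 0;

    CopyCheck check;
    check.host_result = host;
    check.expected = 42;

    // Queue order in the stream: kernel, device-to-host copy, then the callback.
    write_value<<<1, 1, 0, s>>>(dev, 42);
    cudaMemcpyAsync(host, dev, sizeof(int), cudaMemcpyDeviceToHost, s);
    cudaError_t err = cudaStreamAddCallback(s, check_copy, &check, 0);

    // Waiting and cleanup happen here, outside the callback.
    cudaStreamSynchronize(s);
    cudaFreeHost(host);
    cudaFree(dev);
    cudaStreamDestroy(s);
    return (err == cudaSuccess) ? check.ok.load() : -1;
}
```

Calling run_ordering_demo() from main should return 1 on a working device, since the callback cannot run before the copy ahead of it in the stream has completed.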

Complete code on GitHub: https://github.com/Tony-Tan/CUDA_Freshman (Stars are welcome!)

Partial code:

//
//
//
void CUDART_CB my_callback(cudaStream_t stream,cudaError_t status,void * data)
{
    printf("call back from stream:%d\n",*((int *)data));
}
//
//
//
//
//
int main(int argc,char **argv)
{
//
//
//

    //asynchronous calculation
    int iElem=nElem/N_SEGMENT;
    cudaStream_t stream[N_SEGMENT];
    //one id per stream: the callback reads an int through its data pointer,
    //so it must be given a pointer to an int rather than to a cudaStream_t
    int stream_id[N_SEGMENT];
    for(int i=0;i<N_SEGMENT;i++)
    {
        CHECK(cudaStreamCreate(&stream[i]));
        stream_id[i]=i;
    }
    cudaEvent_t start,stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start,0));
    for(int i=0;i<N_SEGMENT;i++)
    {
        int ioffset=i*iElem;
        //copy in, compute, copy out, all queued on the same stream
        CHECK(cudaMemcpyAsync(&a_d[ioffset],&a_h[ioffset],nByte/N_SEGMENT,cudaMemcpyHostToDevice,stream[i]));
        CHECK(cudaMemcpyAsync(&b_d[ioffset],&b_h[ioffset],nByte/N_SEGMENT,cudaMemcpyHostToDevice,stream[i]));
        sumArraysGPU<<<grid,block,0,stream[i]>>>(&a_d[ioffset],&b_d[ioffset],&res_d[ioffset],iElem);
        CHECK(cudaMemcpyAsync(&res_from_gpu_h[ioffset],&res_d[ioffset],nByte/N_SEGMENT,cudaMemcpyDeviceToHost,stream[i]));
        //the callback fires once everything queued above on stream[i] is done
        CHECK(cudaStreamAddCallback(stream[i],my_callback,(void *)&stream_id[i],0));
    }
    //timer
    CHECK(cudaEventRecord(stop, 0));
    //poll the stop event on the host instead of blocking
    int counter=0;
    while (cudaEventQuery(stop)==cudaErrorNotReady)
    {
        counter++;
    }

//
//
//
//
//
//
}

The result is:

[Figure: re-6-5-2, the program's output]
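Since the listing above is only an excerpt, a reduced, self-contained version of the same multi-stream pattern may be easier to experiment with. This is a sketch under stated assumptions, not the repository's code: N_STREAMS stands in for N_SEGMENT, and StreamInfo, scale, mark_done, and run_multi_stream_demo are hypothetical names. Each stream gets its own payload, so every callback can report which stream finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_STREAMS 4  // hypothetical; plays the role of N_SEGMENT

// Hypothetical per-stream payload; one instance per stream, owned by the caller.
struct StreamInfo {
    int id;     // index of the stream this payload belongs to
    int fired;  // set to 1 by the callback when the stream's work succeeded
};

__global__ void scale(float *x, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= f;
}

// Host callback: touches only host memory, makes no CUDA calls.
void CUDART_CB mark_done(cudaStream_t, cudaError_t status, void *data) {
    StreamInfo *info = static_cast<StreamInfo *>(data);
    info->fired = (status == cudaSuccess) ? 1 : 0;
    printf("callback from stream %d\n", info->id);
}

// Returns the number of callbacks that fired successfully, or -1 on setup failure.
int run_multi_stream_demo() {
    const int n = 1 << 10;  // elements handled by each stream
    float *d = nullptr;
    if (cudaMalloc(&d, N_STREAMS * n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d, 0, N_STREAMS * n * sizeof(float));

    cudaStream_t streams[N_STREAMS];
    StreamInfo info[N_STREAMS];
    for (int i = 0; i < N_STREAMS; i++) {
        if (cudaStreamCreate(&streams[i]) != cudaSuccess) return -1;
        info[i].id = i;
        info[i].fired = 0;
    }

    // Each stream gets one kernel followed by its own callback.
    for (int i = 0; i < N_STREAMS; i++) {
        scale<<<(n + 255) / 256, 256, 0, streams[i]>>>(d + i * n, n, 2.0f);
        cudaStreamAddCallback(streams[i], mark_done, &info[i], 0);
    }

    // Wait for all streams, including their callbacks, before reading the flags.
    cudaDeviceSynchronize();

    int fired = 0;
    for (int i = 0; i < N_STREAMS; i++) {
        fired += info[i].fired;
        cudaStreamDestroy(streams[i]);
    }
    cudaFree(d);
    return fired;
}
```

The order in which the four messages are printed is not fixed, because the streams run independently. Only the ordering within each stream is guaranteed.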

Summary

This article introduced stream callbacks, the last small feature in this series. The following sections cover intermediate-level topics, so make sure to practice the earlier material first, or you will be lost later!