6.5 Stream Callbacks
Published on 2018-06-20 | Category: CUDA, Freshman
Abstract: This article introduces stream callbacks.
Keywords: Stream Callbacks
Stream Callbacks
A stream callback is a host function that you place in a CUDA stream, somewhat like recording an event. Once all the tasks queued in that stream before the callback have completed, the CUDA runtime calls the function on the host. There are, however, two restrictions that the callback function must respect:
- It must not call any CUDA API function
- It must not perform any synchronization
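In practice, the two rules above mean that a callback should only do light host-side bookkeeping and leave anything that touches the CUDA runtime to an ordinary host thread. The following is a minimal sketch of that pattern, not the author's code: the `CallbackResult` struct and the `notify_callback` and `wait_for_callback` names are hypothetical and chosen only for illustration.

```cuda
#include <atomic>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical mailbox shared between the callback and the host thread.
struct CallbackResult {
    std::atomic<bool> done{false};
    cudaError_t status = cudaSuccess;
};

// The callback only records what happened. It makes no CUDA API calls
// and performs no synchronization, so it respects both rules.
void CUDART_CB notify_callback(cudaStream_t stream, cudaError_t status, void *data) {
    (void)stream;  // unused in this sketch
    CallbackResult *result = static_cast<CallbackResult *>(data);
    result->status = status;
    result->done.store(true, std::memory_order_release);
}

// Enqueue the callback, wait on the host until it has fired, then do the
// CUDA follow-up work from the host thread instead of from the callback.
cudaError_t wait_for_callback(cudaStream_t stream) {
    CallbackResult result;
    cudaError_t err = cudaStreamAddCallback(stream, notify_callback, &result, 0);
    if (err != cudaSuccess) return err;
    while (!result.done.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // plain host-side waiting, no CUDA calls
    }
    // Calling the CUDA API is fine here: this is a normal host thread.
    err = cudaStreamSynchronize(stream);
    return err != cudaSuccess ? err : result.status;
}
```

Calling `wait_for_callback(stream)` on any valid stream returns the status that the callback observed, so the caller can react to it with normal CUDA calls.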
A stream callback function must have the following signature (return type and parameter list); only the function name and body are up to you:
void CUDART_CB my_callback(cudaStream_t stream, cudaError_t status, void *data) {
    // data is the user pointer given at registration; here it must point to an int
    printf("callback from stream %d\n", *((int *)data));
}
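Then register it on a stream with:
cudaError_t cudaStreamAddCallback(cudaStream_t stream, cudaStreamCallback_t callback, void *userData, unsigned int flags);
The userData pointer is handed to the callback as its data argument, and flags is reserved for future use and must be 0.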
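To make the registration step concrete, here is a small self-contained sketch that gives every stream its own integer ID through `userData`. It is an illustration under assumptions rather than the article's program: `kStreams`, `g_fired`, `empty_kernel`, `id_callback`, and `run_callback_demo` are all hypothetical names, and the kernel does no real work.

```cuda
#include <atomic>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kStreams = 4;           // hypothetical stream count for this sketch
static std::atomic<int> g_fired{0};   // number of callbacks that have run

__global__ void empty_kernel() {}     // stand-in for real work

// Prints the integer ID that was passed through userData.
void CUDART_CB id_callback(cudaStream_t stream, cudaError_t status, void *data) {
    (void)stream;
    int id = *static_cast<int *>(data);
    printf("callback from stream %d (status code %d)\n", id, static_cast<int>(status));
    g_fired.fetch_add(1);
}

// Returns how many callbacks fired, or -1 if a CUDA call failed.
// Error paths return early without cleanup to keep the sketch short.
int run_callback_demo() {
    cudaStream_t streams[kStreams];
    int ids[kStreams];                // must stay alive until the callbacks have run
    g_fired = 0;
    for (int i = 0; i < kStreams; i++) {
        ids[i] = i;
        if (cudaStreamCreate(&streams[i]) != cudaSuccess) return -1;
    }
    for (int i = 0; i < kStreams; i++) {
        empty_kernel<<<1, 1, 0, streams[i]>>>();
        // flags must be 0; &ids[i] arrives in the callback as data
        if (cudaStreamAddCallback(streams[i], id_callback, &ids[i], 0) != cudaSuccess) return -1;
    }
    // Waits for all queued work, including the callbacks themselves.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    for (int i = 0; i < kStreams; i++) {
        if (cudaStreamDestroy(streams[i]) != cudaSuccess) return -1;
    }
    return g_fired.load();
}
```

The order in which the lines are printed can differ from run to run, because the streams are independent of one another. Passing a pointer to an `int` keeps the cast inside the callback honest, which is why this sketch uses a separate `ids` array instead of the stream handles.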
Complete code on GitHub: https://github.com/Tony-Tan/CUDA_Freshman (Stars are welcome!)
Partial code:
//
//
//
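void CUDART_CB my_callback(cudaStream_t stream,cudaError_t status,void * data)
{
    // data points at the cudaStream_t passed to cudaStreamAddCallback, so print the handle itself
    printf("call back from stream:%p\n",(void *)*((cudaStream_t *)data));
}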
//
//
//
//
//
int main(int argc,char **argv)
{
    //
    //
    //
    // asynchronous calculation: split the arrays into N_SEGMENT chunks, one stream per chunk
    int iElem=nElem/N_SEGMENT;
    cudaStream_t stream[N_SEGMENT];
    for(int i=0;i<N_SEGMENT;i++)
    {
        CHECK(cudaStreamCreate(&stream[i]));
    }
    cudaEvent_t start,stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start,0));
    for(int i=0;i<N_SEGMENT;i++)
    {
        int ioffset=i*iElem;
        // copy in, compute, copy out: all queued on stream[i]
        CHECK(cudaMemcpyAsync(&a_d[ioffset],&a_h[ioffset],nByte/N_SEGMENT,cudaMemcpyHostToDevice,stream[i]));
        CHECK(cudaMemcpyAsync(&b_d[ioffset],&b_h[ioffset],nByte/N_SEGMENT,cudaMemcpyHostToDevice,stream[i]));
        sumArraysGPU<<<grid,block,0,stream[i]>>>(&a_d[ioffset],&b_d[ioffset],&res_d[ioffset],iElem);
        CHECK(cudaMemcpyAsync(&res_from_gpu_h[ioffset],&res_d[ioffset],nByte/N_SEGMENT,cudaMemcpyDeviceToHost,stream[i]));
        // called on the host once everything queued above on stream[i] has finished
        CHECK(cudaStreamAddCallback(stream[i],my_callback,(void *)(stream+i),0));
    }
    // timer
    CHECK(cudaEventRecord(stop, 0));
    // the host is not blocked: it keeps counting until the stop event completes
    int counter=0;
    while (cudaEventQuery(stop)==cudaErrorNotReady)
    {
        counter++;
    }
    //
    //
    //
    //
    //
    //
}
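The partial code relies on a `CHECK` macro whose definition is in the omitted part of the file. As an assumption about what such a macro typically looks like (this is not copied from the repository), a minimal version could be sketched as follows, together with a hypothetical `launch_checked` helper showing how a kernel launch, which returns no error code itself, can be checked too:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed definition: report where the failing call was made, then abort.
#define CHECK(call)                                                        \
    do {                                                                   \
        const cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error %s:%d: %s\n", __FILE__, __LINE__,  \
                    cudaGetErrorString(err_));                             \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

__global__ void noop_kernel() {}  // hypothetical placeholder kernel

// Launch a kernel on a stream and check both the launch and its completion.
void launch_checked(cudaStream_t stream) {
    noop_kernel<<<1, 1, 0, stream>>>();
    CHECK(cudaGetLastError());             // reports launch errors such as a bad configuration
    CHECK(cudaStreamSynchronize(stream));  // reports errors raised while the kernel ran
}
```

Because the macro calls `cudaGetErrorString`, it belongs in normal host code such as `main`, not inside a stream callback.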
The result is: [figure: program output, one callback line printed per stream]

Summary
This article introduced the last small feature in this series -- stream callbacks. The following sections will cover intermediate-level topics, so make sure to practice the material from earlier, or you will be lost later!