
6.4 Overlapping GPU and CPU Execution

Published on 2018-06-20 | Category: CUDAFreshman

Abstract: This article introduces how to overlap GPU and CPU execution.

Keywords: Overlapping GPU and CPU Execution

Overlapping GPU and CPU Execution

In addition to overlapping data transfers with kernel execution, discussed previously, another major concern is keeping the CPU busy with useful work while the GPU computes. This is the focus of this article.

The example in this article proceeds as follows:

  • Kernels and asynchronous copies are dispatched to their respective streams
  • The CPU performs its own computation while waiting for the stop event to complete

The specific code is as follows:

cudaEvent_t start, stop;
CHECK(cudaEventCreate(&start));
CHECK(cudaEventCreate(&stop));
CHECK(cudaEventRecord(start, 0));
// Dispatch each segment's copies and kernel to its own stream
for (int i = 0; i < N_SEGMENT; i++)
{
    int ioffset = i * iElem;
    CHECK(cudaMemcpyAsync(&a_d[ioffset], &a_h[ioffset], nByte / N_SEGMENT, cudaMemcpyHostToDevice, stream[i]));
    CHECK(cudaMemcpyAsync(&b_d[ioffset], &b_h[ioffset], nByte / N_SEGMENT, cudaMemcpyHostToDevice, stream[i]));
    sumArraysGPU<<<grid, block, 0, stream[i]>>>(&a_d[ioffset], &b_d[ioffset], &res_d[ioffset], iElem);
    CHECK(cudaMemcpyAsync(&res_from_gpu_h[ioffset], &res_d[ioffset], nByte / N_SEGMENT, cudaMemcpyDeviceToHost, stream[i]));
}
// Record the stop event; it completes only after the preceding GPU work finishes
CHECK(cudaEventRecord(stop, 0));
// Poll the event without blocking: the CPU keeps incrementing the counter
// until the event has completed on the GPU
int counter = 0;
while (cudaEventQuery(stop) == cudaErrorNotReady)
{
    counter++;
}
printf("cpu counter:%d\n", counter);
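For context, the loop above assumes the host buffers were allocated as pinned (page-locked) memory and the streams were created beforehand; cudaMemcpyAsync needs pinned host memory to actually overlap with kernel execution. A minimal setup sketch, with names matching the snippet (the sizes and the CHECK macro definition here are assumptions, not taken from the article's full code):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N_SEGMENT 4                          // assumed segment count for this sketch

// Simple error-checking macro, as commonly defined in this series
#define CHECK(call)                                                   \
{                                                                     \
    cudaError_t err = (call);                                         \
    if (err != cudaSuccess)                                           \
    {                                                                 \
        printf("CUDA error: %s\n", cudaGetErrorString(err));          \
        exit(1);                                                      \
    }                                                                 \
}

int main(void)
{
    int nElem = 1 << 24;                     // assumed total element count
    int iElem = nElem / N_SEGMENT;           // elements per segment
    size_t nByte = nElem * sizeof(float);
    (void)iElem;

    // Pinned host buffers: required for truly asynchronous copies
    float *a_h, *b_h, *res_from_gpu_h;
    CHECK(cudaHostAlloc(&a_h, nByte, cudaHostAllocDefault));
    CHECK(cudaHostAlloc(&b_h, nByte, cudaHostAllocDefault));
    CHECK(cudaHostAlloc(&res_from_gpu_h, nByte, cudaHostAllocDefault));

    // Device buffers
    float *a_d, *b_d, *res_d;
    CHECK(cudaMalloc(&a_d, nByte));
    CHECK(cudaMalloc(&b_d, nByte));
    CHECK(cudaMalloc(&res_d, nByte));

    // One stream per segment
    cudaStream_t stream[N_SEGMENT];
    for (int i = 0; i < N_SEGMENT; i++)
        CHECK(cudaStreamCreate(&stream[i]));

    // ... dispatch loop and event polling go here, as shown in the article ...

    // Cleanup
    for (int i = 0; i < N_SEGMENT; i++)
        CHECK(cudaStreamDestroy(stream[i]));
    CHECK(cudaFree(a_d));
    CHECK(cudaFree(b_d));
    CHECK(cudaFree(res_d));
    CHECK(cudaFreeHost(a_h));
    CHECK(cudaFreeHost(b_h));
    CHECK(cudaFreeHost(res_from_gpu_h));
    return 0;
}
```

If the host buffers were allocated with plain malloc instead, the runtime would stage the copies through an internal pinned buffer and the "asynchronous" transfers could serialize against the host.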

Complete code on GitHub: https://github.com/Tony-Tan/CUDA_Freshman (Stars are welcome!)

The result is:

[Figure: program output showing the final cpu counter value]

As we can see, the CPU keeps working (incrementing the counter) right up until the stop event completes -- the CPU and GPU are genuinely running in parallel.

The key point in the code is the call:

cudaEventQuery(stop)

which is non-blocking: it returns cudaErrorNotReady immediately if the event has not yet completed, so the CPU is free to continue its own computation. A blocking call such as cudaEventSynchronize would stall the CPU instead.
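In a real application the polling loop would do useful host-side work rather than spin on a counter. A sketch of that pattern, assuming the stop event from the code above (cpu_partial_sum, N_CHUNKS, and the chunking scheme are illustrative assumptions, not part of the original code):

```cuda
// Interleave bounded pieces of CPU work with non-blocking polls of the event.
double cpu_result = 0.0;
int chunk = 0;
while (cudaEventQuery(stop) == cudaErrorNotReady)
{
    if (chunk < N_CHUNKS)                        // CPU work still remaining?
        cpu_result += cpu_partial_sum(chunk++);  // do one small piece of host work
}
// The GPU may finish before the CPU does: drain any remaining chunks
while (chunk < N_CHUNKS)
    cpu_result += cpu_partial_sum(chunk++);
```

Keeping each chunk small matters: the CPU only notices that the GPU has finished between chunks, so coarse chunks delay whatever depends on the event.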

Summary

This article is short, but it is a very good example of achieving parallelism between the GPU and the CPU.