多次元Block sizeのWarp idについて

目次

概要

CUDAのカーネル関数の呼び出しで,
const dim3 block_size(4, 4, 4);
kernel<<<1, block_size>>>(...);
としたときに起動されるスレッドのWarp idがどうなっているか,という話です.

調べ方

素直にWarp idを調べます.

結果

dim(4, 4, 4);
の場合,
$ ./a.out | sort
[ 0, 0, 0] : warpid =  0
[ 0, 0, 1] : warpid =  0
[ 0, 0, 2] : warpid =  1
[ 0, 0, 3] : warpid =  1
[ 0, 1, 0] : warpid =  0
[ 0, 1, 1] : warpid =  0
[ 0, 1, 2] : warpid =  1
[ 0, 1, 3] : warpid =  1
[ 0, 2, 0] : warpid =  0
[ 0, 2, 1] : warpid =  0
[ 0, 2, 2] : warpid =  1
[ 0, 2, 3] : warpid =  1
[ 0, 3, 0] : warpid =  0
[ 0, 3, 1] : warpid =  0
[ 0, 3, 2] : warpid =  1
[ 0, 3, 3] : warpid =  1
[ 1, 0, 0] : warpid =  0
[ 1, 0, 1] : warpid =  0
[ 1, 0, 2] : warpid =  1
[ 1, 0, 3] : warpid =  1
[ 1, 1, 0] : warpid =  0
[ 1, 1, 1] : warpid =  0
[ 1, 1, 2] : warpid =  1
[ 1, 1, 3] : warpid =  1
[ 1, 2, 0] : warpid =  0
[ 1, 2, 1] : warpid =  0
[ 1, 2, 2] : warpid =  1
[ 1, 2, 3] : warpid =  1
[ 1, 3, 0] : warpid =  0
[ 1, 3, 1] : warpid =  0
[ 1, 3, 2] : warpid =  1
[ 1, 3, 3] : warpid =  1
[ 2, 0, 0] : warpid =  0
[ 2, 0, 1] : warpid =  0
[ 2, 0, 2] : warpid =  1
[ 2, 0, 3] : warpid =  1
[ 2, 1, 0] : warpid =  0
[ 2, 1, 1] : warpid =  0
[ 2, 1, 2] : warpid =  1
[ 2, 1, 3] : warpid =  1
[ 2, 2, 0] : warpid =  0
[ 2, 2, 1] : warpid =  0
[ 2, 2, 2] : warpid =  1
[ 2, 2, 3] : warpid =  1
[ 2, 3, 0] : warpid =  0
[ 2, 3, 1] : warpid =  0
[ 2, 3, 2] : warpid =  1
[ 2, 3, 3] : warpid =  1
[ 3, 0, 0] : warpid =  0
[ 3, 0, 1] : warpid =  0
[ 3, 0, 2] : warpid =  1
[ 3, 0, 3] : warpid =  1
[ 3, 1, 0] : warpid =  0
[ 3, 1, 1] : warpid =  0
[ 3, 1, 2] : warpid =  1
[ 3, 1, 3] : warpid =  1
[ 3, 2, 0] : warpid =  0
[ 3, 2, 1] : warpid =  0
[ 3, 2, 2] : warpid =  1
[ 3, 2, 3] : warpid =  1
[ 3, 3, 0] : warpid =  0
[ 3, 3, 1] : warpid =  0
[ 3, 3, 2] : warpid =  1
[ 3, 3, 3] : warpid =  1
となり,threadIdx.z = 0, 1の32スレッドがWarp 0, threadIdx.z = 2, 3の32スレッドがWarp 1に割り当てられていることが分かります.
dim(8, 4, 4);
など,きれいな数字の場合きれいにWarp idが割り振られるようです.

少し気持ち悪目に

dim(3, 3, 5);
の場合,
$ ./a.out | sort 
[ 0, 0, 0] : warpid =  0
[ 0, 0, 1] : warpid =  0
[ 0, 0, 2] : warpid =  0
[ 0, 0, 3] : warpid =  1
[ 0, 0, 4] : warpid =  1
[ 0, 1, 0] : warpid =  0
[ 0, 1, 1] : warpid =  0
[ 0, 1, 2] : warpid =  1
[ 0, 1, 3] : warpid =  1
[ 0, 1, 4] : warpid =  1
[ 0, 2, 0] : warpid =  0
[ 0, 2, 1] : warpid =  0
[ 0, 2, 2] : warpid =  1
[ 0, 2, 3] : warpid =  1
[ 0, 2, 4] : warpid =  2
[ 0, 3, 0] : warpid =  0
[ 0, 3, 1] : warpid =  0
...
:thinking_face:

カテゴリー:CUDA
記事作成日:2019-06-16