概要
CUDAのカーネル関数の呼び出しで,const dim3 block_size(4, 4, 4);
kernel<<<1, block_size>>>(...);
としたときに起動されるスレッドのWarp idがどうなっているか,という話です.
調べ方
素直にWarp idを調べます.結果
dim(4, 4, 4);
の場合,
$ ./a.out | sort
[ 0, 0, 0] : warpid = 0
[ 0, 0, 1] : warpid = 0
[ 0, 0, 2] : warpid = 1
[ 0, 0, 3] : warpid = 1
[ 0, 1, 0] : warpid = 0
[ 0, 1, 1] : warpid = 0
[ 0, 1, 2] : warpid = 1
[ 0, 1, 3] : warpid = 1
[ 0, 2, 0] : warpid = 0
[ 0, 2, 1] : warpid = 0
[ 0, 2, 2] : warpid = 1
[ 0, 2, 3] : warpid = 1
[ 0, 3, 0] : warpid = 0
[ 0, 3, 1] : warpid = 0
[ 0, 3, 2] : warpid = 1
[ 0, 3, 3] : warpid = 1
[ 1, 0, 0] : warpid = 0
[ 1, 0, 1] : warpid = 0
[ 1, 0, 2] : warpid = 1
[ 1, 0, 3] : warpid = 1
[ 1, 1, 0] : warpid = 0
[ 1, 1, 1] : warpid = 0
[ 1, 1, 2] : warpid = 1
[ 1, 1, 3] : warpid = 1
[ 1, 2, 0] : warpid = 0
[ 1, 2, 1] : warpid = 0
[ 1, 2, 2] : warpid = 1
[ 1, 2, 3] : warpid = 1
[ 1, 3, 0] : warpid = 0
[ 1, 3, 1] : warpid = 0
[ 1, 3, 2] : warpid = 1
[ 1, 3, 3] : warpid = 1
[ 2, 0, 0] : warpid = 0
[ 2, 0, 1] : warpid = 0
[ 2, 0, 2] : warpid = 1
[ 2, 0, 3] : warpid = 1
[ 2, 1, 0] : warpid = 0
[ 2, 1, 1] : warpid = 0
[ 2, 1, 2] : warpid = 1
[ 2, 1, 3] : warpid = 1
[ 2, 2, 0] : warpid = 0
[ 2, 2, 1] : warpid = 0
[ 2, 2, 2] : warpid = 1
[ 2, 2, 3] : warpid = 1
[ 2, 3, 0] : warpid = 0
[ 2, 3, 1] : warpid = 0
[ 2, 3, 2] : warpid = 1
[ 2, 3, 3] : warpid = 1
[ 3, 0, 0] : warpid = 0
[ 3, 0, 1] : warpid = 0
[ 3, 0, 2] : warpid = 1
[ 3, 0, 3] : warpid = 1
[ 3, 1, 0] : warpid = 0
[ 3, 1, 1] : warpid = 0
[ 3, 1, 2] : warpid = 1
[ 3, 1, 3] : warpid = 1
[ 3, 2, 0] : warpid = 0
[ 3, 2, 1] : warpid = 0
[ 3, 2, 2] : warpid = 1
[ 3, 2, 3] : warpid = 1
[ 3, 3, 0] : warpid = 0
[ 3, 3, 1] : warpid = 0
[ 3, 3, 2] : warpid = 1
[ 3, 3, 3] : warpid = 1
となり,threadIdx.z = 0, 1の32スレッドがWarp 0, threadIdx.z = 2, 3の32スレッドがWarp 1に割り当てられていることが分かります.
dim(8, 4, 4);
など,きれいな数字の場合きれいにWarp idが割り振られるようです.
少し気持ち悪目に
dim(3, 3, 5);
の場合,
$ ./a.out | sort
[ 0, 0, 0] : warpid = 0
[ 0, 0, 1] : warpid = 0
[ 0, 0, 2] : warpid = 0
[ 0, 0, 3] : warpid = 1
[ 0, 0, 4] : warpid = 1
[ 0, 1, 0] : warpid = 0
[ 0, 1, 1] : warpid = 0
[ 0, 1, 2] : warpid = 1
[ 0, 1, 3] : warpid = 1
[ 0, 1, 4] : warpid = 1
[ 0, 2, 0] : warpid = 0
[ 0, 2, 1] : warpid = 0
[ 0, 2, 2] : warpid = 1
[ 0, 2, 3] : warpid = 1
[ 0, 2, 4] : warpid = 2
[ 0, 3, 0] : warpid = 0
[ 0, 3, 1] : warpid = 0
...
:thinking_face:
カテゴリー:CUDA
記事作成日:2019-06-16