I have some questions about how NVIDIA GPUs schedule work.
(1) If a warp of threads in a block (CTA) has finished but other warps of the same block are still running, will that warp wait for the others to finish? In other words, do the threads in a block (CTA) release their resources only once all of them have finished? I think this should be right, since the threads in a block share shared memory and other resources, and these resources are allocated per CTA.
(2) What if all threads in a block (CTA) stall for a long time, for example on a global memory access? Will threads from a new CTA take over the resources, as would happen on a CPU? In other words, once a block (CTA) has been dispatched to an SM (streaming multiprocessor), does it hold its resources until it has finished?
I would be grateful if anyone could recommend a book or articles on GPU architecture. Thanks!
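The Compute Work Distributor schedules a thread block (CTA) on an SM only if the SM has sufficient resources for the thread block (shared memory, warps, registers, barriers, ...). Block-level resources such as shared memory are allocated, and enough warps are created for all threads in the thread block. The resource manager distributes the warps round-robin across the SM sub-partitions. Each SM sub-partition contains a warp scheduler, a register file, and execution units. Once a warp is allocated to a sub-partition, it stays there until it completes or is pre-empted by a context switch (Pascal architecture). When the context is restored, the warp is restored to the same SM with the same warp-id.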
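When all threads in a warp have completed, the warp scheduler waits for all outstanding instructions issued by the warp to complete, and then the resource manager releases the warp-level resources, which include the warp-id and the register file.
When all warps in a thread block have completed, the block-level resources are released and the SM notifies the Compute Work Distributor that the block has completed.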
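The allocate/release life cycle described here can be sketched with a toy Python model. This is only an illustration under stated assumptions, not how the hardware is implemented: `ToySM`, `try_launch`, and `warp_finished` are hypothetical names, the capacities are made-up round numbers, and barriers and sub-partitions are ignored. It shows a block being admitted only when everything it needs fits, per-warp resources coming back as each warp finishes, and shared memory coming back only when the last warp of the block finishes.

```python
from dataclasses import dataclass, field

WARP_SIZE = 32  # threads per warp


@dataclass
class ToySM:
    # Illustrative capacities, not those of any particular GPU.
    free_warp_slots: int = 64
    free_registers: int = 65536
    free_shared_mem: int = 49152
    blocks: dict = field(default_factory=dict)  # block_id -> bookkeeping

    def try_launch(self, block_id, threads, regs_per_thread, shared_mem):
        """Admit the block only if all of its resources fit at once."""
        warps = -(-threads // WARP_SIZE)  # ceiling division
        regs_per_warp = regs_per_thread * WARP_SIZE
        if (warps > self.free_warp_slots
                or warps * regs_per_warp > self.free_registers
                or shared_mem > self.free_shared_mem):
            return False
        self.free_warp_slots -= warps
        self.free_registers -= warps * regs_per_warp
        self.free_shared_mem -= shared_mem
        self.blocks[block_id] = {
            "live_warps": warps,
            "regs_per_warp": regs_per_warp,
            "shared_mem": shared_mem,
        }
        return True

    def warp_finished(self, block_id):
        """Release warp-level resources; block-level ones wait for the last warp."""
        block = self.blocks[block_id]
        self.free_warp_slots += 1
        self.free_registers += block["regs_per_warp"]
        block["live_warps"] -= 1
        if block["live_warps"] == 0:
            self.free_shared_mem += block["shared_mem"]
            del self.blocks[block_id]


sm = ToySM()
print(sm.try_launch("A", threads=256, regs_per_thread=32, shared_mem=40960))
print(sm.try_launch("B", threads=256, regs_per_thread=32, shared_mem=16384))
for _ in range(8):  # block A has 8 warps
    sm.warp_finished("A")
print(sm.try_launch("B", threads=256, regs_per_thread=32, shared_mem=16384))
```

In this sketch block B is rejected while A is resident because A still holds most of the shared memory, and it is admitted once A's last warp has finished.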
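Once a warp is allocated to a sub-partition and all of its resources are allocated, the warp is considered active, meaning that the warp scheduler is actively tracking the state of the warp. On each cycle the warp scheduler determines which active warps are stalled and which are eligible to issue an instruction. It picks the highest-priority eligible warp and issues 1-2 consecutive instructions from it. The rules for dual issue are specific to each architecture. If a warp issues a memory load, it can continue to execute independent instructions until it reaches a dependent instruction. The warp then reports as stalled until the load completes. The same is true for dependent math instructions. The SM architecture is designed to hide both ALU and memory latency by switching between warps on every cycle.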
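A small Python simulation can illustrate why more active warps hide latency. It is a rough sketch under simplifying assumptions that are mine, not the hardware's: one scheduler, single issue per cycle, fixed priority by warp index, and every warp running the same loop of "load, then one instruction that depends on the load". The function name `simulate` and its parameters are hypothetical.

```python
def simulate(num_warps, loads_per_warp, load_latency):
    """Return (total_cycles, issue_utilization) for a toy warp scheduler."""
    ready_at = [0] * num_warps                    # first cycle each warp may issue
    remaining = [2 * loads_per_warp] * num_warps  # load + dependent op per iteration
    cycle = 0
    issued = 0
    while any(remaining):
        # Pick the first eligible warp (lowest index = highest priority).
        for w in range(num_warps):
            if remaining[w] and ready_at[w] <= cycle:
                is_load = remaining[w] % 2 == 0
                # The dependent instruction cannot issue until the load returns.
                ready_at[w] = cycle + (load_latency if is_load else 1)
                remaining[w] -= 1
                issued += 1
                break
        cycle += 1  # a cycle with no eligible warp is an idle issue slot
    return cycle, issued / cycle


for warps in (1, 2, 4, 8):
    cycles, utilization = simulate(warps, loads_per_warp=4, load_latency=10)
    print(f"{warps} warps: {cycles} cycles, issue utilization {utilization:.2f}")
```

With a single warp the scheduler sits idle for most of each load. With several warps, other warps issue during those cycles, so the fraction of cycles that issue an instruction goes up.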
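This answer does not use the term CUDA core, because it introduces an incorrect mental model. CUDA cores are pipelined single-precision floating-point/integer execution units. The issue rate and dependency latency are specific to each architecture. Each SM sub-partition and SM has other execution units as well, including load/store units, double-precision floating-point units, half-precision floating-point units, branch units, etc.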
. , , . , , Pascal, .
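Answers to your questions:
Q1. If a warp in a block has finished but other warps are still running, will it wait for the others to finish?
No. When a warp finishes, its warp-level resources are released, but the block-level resources, such as shared memory, stay allocated until all warps in the block have completed.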
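Q2. If all threads in a block stall on a long-latency operation, will a new block take over its resources? Does a block hold its resources until it has finished?
No, a new block does not take over. A resident block keeps its resources until it completes.
On compute capability < 3.2 a thread block is never pre-empted. On compute capability 3.2+ a thread block can be pre-empted in specific cases, such as dynamic parallelism and a GPU context switch.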
, - . , , , CUDA, , . , , .
Source: https://habr.com/ru/post/1677900/