The CUDA API provides stateful library functions: two consecutive calls relate to each other through shared state. In short, the context is that state.
The runtime API is a wrapper / helper around the driver API. In the driver API the context is explicitly visible, and for convenience you can manage a per-thread stack of contexts. There is one specific context that is shared between the driver and the runtime API: the primary context.
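As a minimal sketch of the explicit context handling in the driver API (error checking omitted; this requires a CUDA-capable system and linking against the driver library, e.g. `-lcuda`):

```c
#include <cuda.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;

    cuInit(0);            /* must precede any other driver API call */
    cuDeviceGet(&dev, 0); /* first GPU */

    /* Explicitly create a context; it is pushed onto this
     * thread's context stack and becomes current. */
    cuCtxCreate(&ctx, 0, dev);

    /* Subsequent driver calls on this thread operate on the
     * context at the top of the stack. */
    CUdeviceptr p;
    cuMemAlloc(&p, 1024); /* this allocation is owned by ctx */
    cuMemFree(p);

    cuCtxDestroy(ctx);    /* pops and destroys the context */
    return 0;
}
```

By contrast, the runtime API never shows you a `CUcontext`: its calls implicitly initialize the device's primary context, and a later `cuCtxGetCurrent` from the driver API will return that same primary context.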
The context holds all of the management data needed to use the device. For example, it contains the list of allocated memory regions, the loaded modules that contain device code, the CPU-to-GPU memory mappings used for zero-copy access, and so on.
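One consequence of the context owning these resources is that destroying it releases everything it tracks. A small sketch (error checking omitted; requires a CUDA-capable system):

```c
#include <cuda.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* The allocation below is recorded in ctx's bookkeeping. */
    CUdeviceptr buf;
    cuMemAlloc(&buf, 4096);

    /* Destroying the context tears down all resources it owns,
     * including buf, even though we never called cuMemFree. */
    cuCtxDestroy(ctx);
    return 0;
}
```

Relying on context teardown for cleanup works, but explicit `cuMemFree` / `cuModuleUnload` calls are the clearer style in real code.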
Finally, note that this answer is based more on experience than on documentation.