There is no direct public guideline on this issue. I usually just let TensorFlow allocate this memory, along the lines of

template <typename Device, typename Dtype>
class MyOp : public OpKernel {
 public:
  explicit MyOp(OpKernelConstruction *context) : OpKernel(context) {
    // ... attribute handling goes here ...
  }
  // ...
};
- whatever memory is required should be allocated by the TensorFlow context, not by custom calls to cudaMalloc or new type[num]
- the context provides access to the Allocator
- see the sketch below
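To make that rule concrete, here is a minimal sketch of a Compute method that requests all memory from the context; the MyOp name and the shapes are assumptions for illustration, not part of the linked example:

#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"

using namespace tensorflow;

template <typename Device, typename Dtype>
class MyOp : public OpKernel {
 public:
  explicit MyOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* ctx) override {
    const Tensor& input = ctx->input(0);

    // Ask TensorFlow for scratch memory instead of cudaMalloc or new:
    // the context's allocator picks the right device (host or GPU).
    Tensor scratch;
    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DataTypeToEnum<Dtype>::value,
                                           input.shape(), &scratch));

    // The output buffer is likewise owned by the context, not the kernel.
    Tensor* output = nullptr;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input.shape(), &output));
  }
};

Whether a buffer lands in host or GPU memory is then decided by the device the kernel is registered for, not by the kernel code itself.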
For simplicity, consider just adding two matrices (full example). TensorFlow operations typically have the following structure:
- an Op description via REGISTER_OP, responsible for validating the input shapes and setting the output shape (example; see the sketch after the code below)
- an OpKernel responsible for allocating memory, getting pointers to the inputs, and doing the setup (see above or this)
- a functor for the implementation itself, for example
Tensor* output = nullptr;
Tensor tmp_var;
// Outputs are allocated by output index; temporaries need an explicit dtype.
OP_REQUIRES_OK(ctx, ctx->allocate_output(0, output_shape, &output));
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DataTypeToEnum<Dtype>::value, some_shape, &tmp_var));
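For the first item in the list above, a REGISTER_OP description for such a matrix add could look roughly like this sketch; the op name MatrixAdd and the exact shape function are assumptions, not taken from the linked example:

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"

using ::tensorflow::shape_inference::InferenceContext;
using ::tensorflow::shape_inference::ShapeHandle;

REGISTER_OP("MatrixAdd")
    .Input("a: T")
    .Input("b: T")
    .Output("sum: T")
    .Attr("T: {float, double}")
    .SetShapeFn([](InferenceContext* c) {
      // Validate that both inputs are rank-2 and have matching shapes,
      // then propagate that shape to the output.
      ShapeHandle a, b, merged;
      TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 2, &a));
      TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 2, &b));
      TF_RETURN_IF_ERROR(c->Merge(a, b, &merged));
      c->set_output(0, merged);
      return ::tensorflow::Status::OK();
    });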
Only the functor itself remains to be implemented.
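A sketch of what such a functor might look like for the matrix add, assuming the hypothetical MatrixAddFunctor name and an Eigen-based element-wise implementation (the linked full example structures this differently in detail):

#define EIGEN_USE_THREADS

#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"

typedef Eigen::ThreadPoolDevice CPUDevice;

// Hypothetical functor: element-wise sum of two equally shaped tensors.
template <typename Device, typename Dtype>
struct MatrixAddFunctor {
  void operator()(::tensorflow::OpKernelContext* ctx,
                  const ::tensorflow::Tensor& a,
                  const ::tensorflow::Tensor& b,
                  ::tensorflow::Tensor* out) {
    // Run the Eigen expression on the device owned by the context, so the
    // same call site works for the CPU kernel (and, with a second
    // specialization compiled by nvcc, for the GPU kernel).
    out->flat<Dtype>().device(ctx->eigen_device<Device>()) =
        a.flat<Dtype>() + b.flat<Dtype>();
  }
};

The Compute method would then call MatrixAddFunctor<Device, Dtype>()(ctx, a, b, output) after allocating output as shown above.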
Edit:
- allocate_persistent: use this if you need the data between Op invocations, for example one-time index structures. [example] (a sketch follows this list)
- allocate_temp: just temporary memory that will not be retained beyond the end of the Compute method's lifetime. [example]
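As a sketch of the allocate_persistent pattern, assuming an older TensorFlow API where persistent buffers go through PersistentTensor (newer versions deprecate this type) and with all names and the 1024-element shape hypothetical:

#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"

using namespace tensorflow;

template <typename Device, typename Dtype>
class MyIndexedOp : public OpKernel {
 public:
  explicit MyIndexedOp(OpKernelConstruction* context) : OpKernel(context) {
    // Allocate once in the constructor; the buffer survives across
    // Compute invocations (hypothetical one-time index structure).
    Tensor* init = nullptr;
    OP_REQUIRES_OK(context, context->allocate_persistent(
                                DataTypeToEnum<Dtype>::value,
                                TensorShape({1024}), &index_, &init));
    // ... fill *init with the one-time index data here ...
  }

  void Compute(OpKernelContext* ctx) override {
    // Re-obtain the persistent buffer on every call.
    Tensor* index = index_.AccessTensor(ctx);
    // ... use index ...
  }

 private:
  PersistentTensor index_;  // retained between Op invocations
};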
But I highly recommend reading the comments in the source code here, and then deciding depending on your use case.