@BernardoGO's solution does not work when the CUDA build is enabled:
$ bazel build --copt=-O -c opt --config cuda -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package -s
/usr/include/c++/6/bits/stl_pair.h(327): error: calling a __host__ function("std::_Rb_tree_const_iterator< ::tensorflow::NcclManager::NcclStream *> ::_Rb_tree_const_iterator") from a __device__ function("std::pair< ::std::_Rb_tree_const_iterator< ::tensorflow::NcclManager::NcclStream *> , bool> ::pair< ::std::_Rb_tree_iterator< ::tensorflow::NcclManager::NcclStream *> &, bool &, (bool)1> ") is not allowed
/usr/include/c++/6/bits/stl_pair.h(327): error: identifier "std::_Rb_tree_const_iterator< ::tensorflow::NcclManager::NcclStream *> ::_Rb_tree_const_iterator" is undefined in device code
/usr/include/c++/6/bits/stl_algobase.h(1009): error: calling a __host__ function("__builtin_clzl") from a __device__ function("std::__lg") is not allowed
3 errors detected in the compilation of "/tmp/tmpxft_00007abb_00000000-6_nccl_manager.cpp1.ii".
It works only if --copt=-O is replaced with --copt=-O1, but -O1 applies too much optimization for convenient debugging.