When creating a guided session for distributed learning with this line:
with sv.managed_session(server.target, config=config) as sess, sess.as_default():
I get this error (full stack trace below) to the main worker:
tensorflow.python.framework.errors_impl.UnavailableError: attempt to connect http1.x server
It still seems that everything is fine on the parameter server, reports:
E1106 11:26:32.844686639 5543 ev_epoll1_linux.c:1051] grpc epoll fd: 8
2017-11-06 11:26:32.851773: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:12222}
2017-11-06 11:26:32.851863: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12223}
2017-11-06 11:26:32.856802: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:12222
I get this error only when using the new v1.4 tensor stream built from the source (found the same problem when installing from pip). Everything works fine in version 1.3. Does anyone know if there was a perfect change, I assume that with tensor it works with grpc?
, http2 vs http1? , GRPC, , protobuf http2, , , http1, , v1.3 v1.4
- ,
UnavailableError: http1.x
?
RedHat Linux ... . , .
:
E1106 11:28:24.383745692 5787 ev_epoll1_linux.c:1051] grpc epoll fd: 8
2017-11-06 11:28:24.391084: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize
GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12222}
2017-11-06 11:28:24.391185: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize
GrpcChannelCache for job worker -> {0 -> localhost:12223}
2017-11-06 11:28:24.392285: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server
with target: grpc://localhost:12223
2017-11-06 11:28:37.875632: E tensorflow/core/distributed_runtime/master.cc:269] Master init: Unavailable:
Trying to connect an http1.x server
Traceback (most recent call last):
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in
_do_call
return fn(*args)
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1293, in
_run_fn
self._extend_graph()
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1354, in
_extend_graph
self._session, graph_def.SerializeToString(), status)
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473,
in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnavailableError: Trying to connect an http1.x server
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/pycharm-community-2017.2.3/helpers/pydev/pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/opt/pycharm-community-2017.2.3/helpers/pydev/pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals)
File "/opt/pycharm-community-2017.2.3/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "worker.py", line 426, in <module>
main()
File "worker.py", line 418, in main
run(args, server)
File "worker.py", line 174, in run
sess.run(trainer.sync)
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in
_run
feed_dict_tensor, options, run_metadata)
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in
_do_run
options, run_metadata)
File "/app/sbtt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in
_do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnavailableError: Trying to connect an http1.x server