I want to use Python to copy a local file to multiple remote hosts in parallel. I'm trying to do this with asyncio and Paramiko, since I already use these libraries for other purposes in my program.
I use BaseEventLoop.run_in_executor() with the default ThreadPoolExecutor, which is essentially a new interface to the old threading library, along with Paramiko's SFTP functionality to do the copying.
Here is a simplified example.
import sys
import asyncio
import paramiko
import functools
def copy_file_node(
        *,
        user: str,
        host: str,
        identity_file: str,
        local_path: str,
        remote_path: str):
    ssh_client = paramiko.client.SSHClient()
    ssh_client.load_system_host_keys()
    ssh_client.set_missing_host_key_policy(paramiko.client.AutoAddPolicy())
    ssh_client.connect(
        username=user,
        hostname=host,
        key_filename=identity_file,
        timeout=3)

    with ssh_client:
        with ssh_client.open_sftp() as sftp:
            print("[{h}] Copying file...".format(h=host))
            sftp.put(localpath=local_path, remotepath=remote_path)
            print("[{h}] Copy complete.".format(h=host))


loop = asyncio.get_event_loop()

tasks = []
for host in ['10.0.0.1', '10.0.0.2']:
    task = loop.run_in_executor(
        None,
        functools.partial(
            copy_file_node,
            user='user',
            host=host,
            identity_file='/path/to/identity_file',
            local_path='/path/to/local/file',
            remote_path='/path/to/remote/file'))
    tasks.append(task)

try:
    loop.run_until_complete(asyncio.gather(*tasks))
except Exception as e:
    print("At least one node raised an error:", e, file=sys.stderr)
    sys.exit(1)

loop.close()
The problem I see is that the file is copied to the hosts serially rather than in parallel. So if a copy takes 5 seconds for one host, it takes 10 seconds for two hosts, and so on.
I've also tried other approaches, including dropping SFTP and piping the file to dd on each remote host via exec_command(), but the copies still happen serially.
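To make it concrete, the dd variant replaced the SFTP call with something roughly like the function below (a minimal sketch; the block size and the exact dd command line are placeholders, not the precise command I used):

def copy_file_dd(*, user, host, identity_file, local_path, remote_path):
    ssh_client = paramiko.client.SSHClient()
    ssh_client.load_system_host_keys()
    ssh_client.set_missing_host_key_policy(paramiko.client.AutoAddPolicy())
    ssh_client.connect(
        username=user,
        hostname=host,
        key_filename=identity_file,
        timeout=3)

    with ssh_client:
        # Run dd on the remote host and stream the local file into its stdin.
        stdin, stdout, stderr = ssh_client.exec_command(
            'dd of={p} bs=1M'.format(p=remote_path))
        with open(local_path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                stdin.write(chunk)
        stdin.close()                        # signal EOF to the remote dd
        stdout.channel.recv_exit_status()    # wait for dd to finish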
I'm probably misunderstanding some basic idea here. What is keeping the different threads from copying the file in parallel? Am I misusing asyncio, Paramiko, or the thread pool, and if so, what should I be doing differently?