If you are using mrjob, I have had some success by simply placing pip calls directly in my .mrjob.conf file as bootstrap commands. This is not as elegant as using a requirements.txt file, because the same modules get installed for all your tasks. For example, my conf file looks like this:
    runners:
      emr:
        aws_access_key_id: xx
        aws_secret_access_key: xx
        ec2_key_pair: xx
        ec2_key_pair_file: xx
        ssh_tunnel_to_job_tracker: true
        bootstrap_cmds:
          - sudo apt-get install -y python-pip
          - sudo pip install pgnparser
          - sudo pip install boto
With this config, pgnparser and boto are installed on the cluster and available for use in my mrjob job.
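If you want to confirm that the bootstrap commands actually installed what you expect, something like the following might help. It is only a minimal sketch using the standard library: the helper name `missing_modules` is made up for illustration, and I am assuming that pgnparser is imported as `pgn`, so adjust the names to whatever your packages really expose.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is not installed
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "pgn" is an assumed import name for the pgnparser package
    print(missing_modules(["pgn", "boto"]))
```

Running this on a node (or calling it at the top of your job) prints the modules that are still missing; an empty list means everything listed can be imported.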