Values move between MapReduce steps using protocols, usually Raw, JSON, or Pickle.
You must make sure that the values being passed around can be handled by the protocol you choose. My guess is that there is no default JSON representation of a set, and perhaps no Raw representation either.
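That guess is easy to check outside mrjob. The snippet below is a minimal sketch that uses only Python's standard `json` and `pickle` modules as stand-ins for the protocols. mrjob's own protocol classes may use a different JSON library internally, so treat this as an illustration of the underlying limitation rather than of mrjob's exact behavior. The variable names are made up for the example.

```python
import json
import pickle

# A value of the kind a mapper might want to yield.
value = {"spam", "eggs"}

# JSON has no set type, so the standard encoder rejects it.
try:
    json.dumps(value)
    json_ok = True
except TypeError:
    json_ok = False

# Pickle can serialize a set and restore it unchanged.
restored = pickle.loads(pickle.dumps(value))

print("JSON can encode the set:", json_ok)
print("Pickle round-trip preserved the set:", restored == value)
```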
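Try setting INTERNAL_PROTOCOL to PickleProtocol, like this:

    from mrjob.job import MRJob
    from mrjob.protocol import PickleProtocol

    class yourMR(MRJob):
        # Pickle the values passed between steps
        INTERNAL_PROTOCOL = PickleProtocol

        def mapper(self, key, value):
            ...  # can now yield values that JSON cannot encode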
Note: MRJob handles the pickling and unpickling for you, so you don't need to worry about that part. You can also set INPUT_PROTOCOL and OUTPUT_PROTOCOL if necessary (for multi-step jobs, or to output sets from the reducer).