Can mrjob mappers output sets?

I tried to yield a Python set from a mapper in mrjob. I changed the function signatures of my combiners and reducers accordingly.

However, I get this error:

Counters From Step 1 Unencodable output: TypeError: 172804 

When I change the sets to lists, this error disappears. Are there certain Python types that mappers cannot output in mrjob?

1 answer

Values move between MapReduce steps using protocols, usually Raw, JSON, or Pickle.

You must make sure that the values being passed along can be handled correctly by the protocol you choose. I would guess that there is no standard JSON representation of a set, and perhaps there is no raw representation either?
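A quick way to see the difference outside mrjob is to try the standard-library `json` and `pickle` modules directly. This is only a sketch: the sample value is made up, and it assumes the JSON and Pickle protocols behave like these modules for this case.

```python
import json
import pickle

value = {"a", "b", "c"}  # made-up sample value, standing in for a mapper output

# JSON has no set type, so encoding a Python set fails with TypeError.
try:
    json.dumps(value)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("JSON cannot encode it:", exc)

# pickle can serialize a set and restore it unchanged.
restored = pickle.loads(pickle.dumps(value))
print("pickle round trip preserved the set:", restored == value)
```

The `TypeError` here is the same kind of failure that the "Unencodable output" counter in the question reports.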

Try setting INTERNAL_PROTOCOL to PickleProtocol, like this:

    from mrjob.job import MRJob
    from mrjob.protocol import PickleProtocol

    class yourMR(MRJob):
        # values passed between steps are pickled instead of JSON-encoded
        INTERNAL_PROTOCOL = PickleProtocol

        def mapper(self, key, value):
            pass  # mapper

        def reducer(self, key, values):
            pass  # reducer

Note: mrjob handles the pickling and unpickling for you, so you don't need to worry about it. You can also set INPUT_PROTOCOL and OUTPUT_PROTOCOL if necessary (for multiple steps, or to control the output of the reducer).
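If you would rather keep the default JSON protocol, the workaround mentioned in the question (lists instead of sets) can be made explicit by converting at the boundaries. A minimal sketch, where `encode_set` and `decode_set` are hypothetical helper names and not part of mrjob:

```python
import json

def encode_set(s):
    """Turn a set into a sorted list so JSON can encode it (hypothetical helper)."""
    return sorted(s)

def decode_set(items):
    """Rebuild the set from the decoded JSON list (hypothetical helper)."""
    return set(items)

# Simulate what happens between a mapper and a reducer under a JSON protocol.
wire = json.dumps(encode_set({3, 1, 2}))
back = decode_set(json.loads(wire))
print(wire, back == {1, 2, 3})
```

In a job, you would call something like `encode_set` just before yielding from the mapper and `decode_set` on each value in the reducer. This only works when the set elements are themselves JSON-encodable and sortable.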


Source: https://habr.com/ru/post/1435811/

