To my understanding, you would not be using MrJob unless you wanted to run on a Hadoop cluster or on Amazon's Hadoop service, even though this example runs against local files.
MrJob mainly uses "Hadoop streaming" to submit the job.
This means that all input specified as files or folders is streamed from Hadoop to the mapper, and the mapper's results are streamed on to the reducer. All a mapper gets is a slice of the input, and it assumes every line of that slice follows the same schema, so it parses each line into a key and a value in the same uniform way.
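To make that concrete, here is a minimal MrJob sketch; the class name and the assumed "key,value" line layout are placeholders of mine, not anything from your question. It only shows that each mapper call receives one raw line and has to carve the key and value out of it itself:

```python
from mrjob.job import MRJob


class MRLineParser(MRJob):
    """Minimal sketch: every mapper call gets one raw line from the
    streamed input and must decide for itself what the key and value are."""

    def mapper(self, _, line):
        # Assumed layout "key,value" with a single comma; this is just a
        # placeholder format, not anything from the question.
        key, value = line.split(',', 1)
        yield key.strip(), value.strip()

    def reducer(self, key, values):
        # All values that share a key arrive together at one reducer call.
        yield key, list(values)


if __name__ == '__main__':
    MRLineParser.run()
```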
Given that, the inputs all look schematically the same to the mapper. The only way to feed it two different kinds of data is to interleave them in a single file, in such a way that the mapper can tell which lines are vector data and which are matrix data.
You are actually doing it already.
You can simply improve on this by adding a qualifier to each line saying whether it is matrix data or vector data. When the mapper sees a vector line, it applies the preceding matrix data to it, as in the example and the sketch below.
matrix, 1, 2, ...
matrix, 2, 4, ...
vector, 3, 4, ...
matrix, 1, 2, ...
.....
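A mapper along those lines might look roughly like the sketch below. The class name, the comma-separated layout, and the choice to dot each vector line against the matrix rows seen so far are all assumptions on my part, not part of your setup:

```python
from mrjob.job import MRJob


class MRMatrixVector(MRJob):
    """Sketch only: assumes lines like 'matrix, 1, 2' or 'vector, 3, 4',
    and that each vector line should be combined with the matrix lines
    that precede it (both assumptions are mine, not from the question)."""

    def mapper_init(self):
        # Matrix rows seen so far in this mapper's slice of the input.
        self.current_matrix = []

    def mapper(self, _, line):
        fields = [f.strip() for f in line.split(',')]
        kind, values = fields[0], [float(x) for x in fields[1:]]

        if kind == 'matrix':
            self.current_matrix.append(values)
        elif kind == 'vector':
            # Apply the preceding matrix data to this vector: emit one
            # (row index, partial dot product) pair per stored row.
            for i, row in enumerate(self.current_matrix):
                yield i, sum(a * b for a, b in zip(row, values))
            self.current_matrix = []

    def reducer(self, row_index, partials):
        # Combine partial results that landed on different mappers.
        yield row_index, sum(partials)


if __name__ == '__main__':
    MRMatrixVector.run()
```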
That said, the approach you described works fine. You do have to keep all the data, whatever its schema, in a single file, though.
It still has issues: key/value MapReduce works best when each line holds a complete record of a single schema and represents one complete unit of processing.
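For example, one hypothetical way to get a self-contained record per line would be to put the row index, the matrix row and the vector all on the same line, so no state has to survive between map calls. The '|'-separated layout below is purely illustrative:

```python
# Hypothetical self-contained record: everything one map call needs sits on
# a single line, so nothing has to be remembered between lines.
line = "0 | 1, 2 | 3, 4"   # row index | matrix row | vector (assumed layout)

row_index, row_part, vec_part = (part.strip() for part in line.split('|'))
matrix_row = [float(x) for x in row_part.split(',')]
vector = [float(x) for x in vec_part.split(',')]

# The whole unit of work can be finished from this one line.
print(int(row_index), sum(a * b for a, b in zip(matrix_row, vector)))
```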
To my understanding you are already doing it right, but I suspect MapReduce is simply not a great fit for this kind of data. I hope someone can clarify this further than I could.