I am using AWS EMR version 5.2.1 as a data processing environment. When dealing with a huge JSON file that has a complex schema with many nested fields, Hive cannot handle it and fails, because the generated column type definition exceeds the current limit of 4000 characters:
Error processing statement: FAILED: Runtime error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException (message: Invalid column type name: [...]
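To give an idea of the shape of the data, here is a minimal, made-up sketch of the kind of table definition involved (the table name, field names, SerDe and location are only placeholders, and the real schema has far more nesting). The point is that the whole struct<...> string is stored in the Metastore as the column's type name, so it grows very quickly:

    -- Hypothetical DDL sketch: the serialized type of `payload` is the entire
    -- struct<...> string; with enough nesting it easily exceeds 4000 characters.
    -- The real schema contains many more nested fields than shown here.
    CREATE EXTERNAL TABLE events (
      id      string,
      payload struct<
                profile:struct<id:bigint, name:string,
                               address:struct<street:string, city:string, country:string>>,
                items:array<struct<sku:string, qty:int, attrs:map<string,string>>>
              >
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION 's3://my-bucket/events/';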
Searching around, there are already many issues reported for this or similar problems, although all of them appear to be unresolved [1, 2]. In those threads, the recommendation is to change several Metastore fields to a different type in order to allow longer structure definitions:
- COLUMNS_V2.TYPE_NAME
- TABLE_PARAMS.PARAM_VALUE
- SERDE_PARAMS.PARAM_VALUE
- SD_PARAMS.PARAM_VALUE
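Concretely, on a MySQL-backed Metastore the suggested change looks roughly like this (a sketch only; the Metastore database name `hive` is an assumption, so adjust it to whatever your hive-site.xml points at):

    -- Run directly against the Hive Metastore database in MySQL.
    -- Widens the offending varchar(4000) columns to MEDIUMTEXT.
    USE hive;
    ALTER TABLE COLUMNS_V2   MODIFY TYPE_NAME   MEDIUMTEXT;
    ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
    ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
    ALTER TABLE SD_PARAMS    MODIFY PARAM_VALUE MEDIUMTEXT;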
As indicated in the first issue, the proposed solution mentions:
"[...] after setting the values, the Metastore must also be configured and restarted."
However, it is nowhere stated what exactly has to be configured, beyond changing the DB values.
Thus, even after updating those fields in the current local Metastore (MySQL in this case) from string to mediumtext and restarting the Metastore process, I still cannot get past the limit: the attempt to load the JSON keeps failing with the same error.
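As a sanity check, a query along these lines against the MySQL side (again assuming the Metastore schema is named `hive`) should confirm whether the columns were actually widened:

    -- Verify the current type of the four Metastore columns.
    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA = 'hive'
      AND ((TABLE_NAME = 'COLUMNS_V2' AND COLUMN_NAME = 'TYPE_NAME')
        OR (TABLE_NAME IN ('TABLE_PARAMS', 'SERDE_PARAMS', 'SD_PARAMS')
            AND COLUMN_NAME = 'PARAM_VALUE'));

Even with the columns reported as mediumtext there, the load still fails as described above.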
Am I missing something, or has someone found an alternative way to work around this problem?