Parquet ClassCastException: parquet.io.MessageColumnIO cannot be cast to parquet.io.PrimitiveColumnIO

I am trying to write a simple Scala program that dumps data into Parquet files in HDFS.

I create an Avro schema, initialize a ParquetWriter with this schema, map my records to GenericRecords conforming to that schema, and then try to write them with the ParquetWriter.

But unfortunately, I get the following exception when starting my program:

java.lang.ClassCastException: parquet.io.MessageColumnIO cannot be cast to parquet.io.PrimitiveColumnIO
    at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.getColumnWriter(MessageColumnIO.java:339)
    at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:376)
    at parquet.io.ValidatingRecordConsumer.addBinary(ValidatingRecordConsumer.java:211)
    at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:260)
    at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
    at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)

Schema Definition:

val avroSchema: Schema = SchemaBuilder.record("event_snapshots").fields()
  .requiredString("userid")
  .requiredString("event")
  .requiredString("firstevent")
  .requiredString("lastevent")
  .requiredInt("count")
  .endRecord()

val parquetSchema = new AvroSchemaConverter().convert(avroSchema)
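
For reference, `AvroSchemaConverter` should turn that Avro record into a Parquet message type along these lines (a sketch of the expected conversion, not captured output):

```
message event_snapshots {
  required binary userid (UTF8);
  required binary event (UTF8);
  required binary firstevent (UTF8);
  required binary lastevent (UTF8);
  required int32 count;
}
```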

Writer setup:

val writeSupport = new AvroWriteSupport[GenericRecord](parquetSchema, avroSchema, null)

val blockSize = 256 * 1024 * 1024
val pageSize = 64 * 1024

val writer = new ParquetWriter[GenericRecord](outputDir, writeSupport,
  CompressionCodecName.SNAPPY, blockSize,
  pageSize, pageSize, false, true, configuration)

Building and writing a record:

val recordBuilder = new GenericRecordBuilder(avroSchema)

recordBuilder.set(avroSchema.getField("userid"), userKey)
recordBuilder.set(avroSchema.getField("event"), eventKey)
recordBuilder.set(avroSchema.getField("firstevent"), 
  dateTimeDateFormat.format(firstEvent))
recordBuilder.set(avroSchema.getField("lastevent"),
  dateTimeDateFormat.format(lastEvent))
recordBuilder.set(avroSchema.getField("count"), event.count)

val record = recordBuilder.build()
writer.write(record)
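
`dateTimeDateFormat` is not defined in the snippets above; for completeness, here is a minimal sketch of what it might be. The format pattern is an assumption, not taken from the original code:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical formatter standing in for the undefined dateTimeDateFormat;
// the actual pattern used in the original program is unknown.
val dateTimeDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
dateTimeDateFormat.setTimeZone(TimeZone.getTimeZone("UTC"))

// Formatting the Unix epoch in UTC yields "1970-01-01 00:00:00".
val formatted = dateTimeDateFormat.format(new Date(0L))
```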

Any ideas?


Source: https://habr.com/ru/post/1681675/

