I am creating a demo landscape for Cassandra, Apache Spark and Flume on my Mac (Mac OS X Yosemite with Oracle jdk1.7.0_55). The landscape is meant to serve as a proof of concept for the new analytics platform, so I need some test data in my Cassandra database. I am using Cassandra 2.0.8.
I created some demo data in Excel and exported it as a CSV file. The structure looks like this:
ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType
0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKl‰rfall;1;Abmeldung Kl‰rfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal
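To make the structure concrete, the following minimal Python sketch splits the header and the sample record above on the semicolon delimiter and lists which fields are empty. The two strings are copied from the export; the sketch uses only the standard csv module and does not touch Cassandra.

```python
import csv
import io

# Header line and first data record, copied from the Excel export above
SAMPLE = (
    "ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;"
    "ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;"
    "Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType\n"
    "0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKl‰rfall;1;"
    "Abmeldung Kl‰rfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;"
    "Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal\n"
)


def empty_fields(text, delimiter=";"):
    """Return (field count, names of empty fields) for the first data record."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    record = next(reader)
    empty = [name for name, value in zip(header, record) if value == ""]
    return len(record), empty


count, empty = empty_fields(SAMPLE)
print(count)  # 15
print(empty)  # ['ProcessStartTimeUUID', 'ProcessEndTimeUUID']
```

So each record has 15 fields, and the two timeuuid columns are present but empty.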
Then I created a keyspace and a column family in cqlsh using:
CREATE KEYSPACE dadcargate WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
use dadcargate;
CREATE COLUMNFAMILY Process (
  ProcessUUID uuid,
  ProcessID varchar,
  ProcessNumber bigint,
  ProcessName varchar,
  ProcessStartTime timestamp,
  ProcessStartTimeUUID timeuuid,
  ProcessEndTime timestamp,
  ProcessEndTimeUUID timeuuid,
  ProcessStatus varchar,
  Orderer varchar,
  VorgangsNummer varchar,
  VehicleID varchar,
  FIN varchar,
  Reference varchar,
  ReferenceType varchar,
  PRIMARY KEY (ProcessUUID)
) WITH COMMENT='A process is like a bracket around multiple process steps';
The names of the column family and of all its columns are created in all lowercase letters. I will have to look into that some day, but it is not so relevant at the moment.
Now I want to import a CSV file with about 1600 entries into my table named process, as follows:
cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname, processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid, fin, reference, referencetype) FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
It produces the following error:
Record
This is essentially true, since my CSV export has no values for the timeuuid fields.
If I try the COPY command without explicit column names, like this (even though two of the fields really are missing):
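Counting both sides shows the mismatch in the first command: its column list names 13 columns, whereas every record in the export still has 15 semicolon-separated fields, because the two timeuuid fields are empty rather than absent. A minimal Python sketch using the column list from that COPY command and the sample record from the export (it only counts fields; it does not reproduce the cqlsh error, whose text is cut off above):

```python
import csv
import io

# Column list exactly as given in the first COPY ... FROM command
COPY_COLUMNS = [
    "processuuid", "processid", "processnumber", "processname",
    "processstarttime", "processendtime", "processstatus", "orderer",
    "vorgangsnummer", "vehicleid", "fin", "reference", "referencetype",
]

# First data record of Process_BulkData.csv, copied from the export
RECORD = (
    "0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKl‰rfall;1;"
    "Abmeldung Kl‰rfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;"
    "Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal"
)

# csv.reader keeps empty fields, so ";;" still yields an (empty) field
fields = next(csv.reader(io.StringIO(RECORD), delimiter=";"))

print(len(COPY_COLUMNS), len(fields))  # 13 15
```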
cqlsh:dadcargate> COPY process from 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
I get another error:
Bad Request: Input length = 1
Aborting import at record
Hm. This is strange, because as far as I can tell the data is in order. Perhaps the COPY command does not like the fact that two fields are missing. I still find that strange, because the missing fields are of course present (structurally), just empty.
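One thing that may be worth ruling out: "Input length = 1" is the wording of Java's MalformedInputException, which is raised when bytes cannot be decoded in the expected character set, and the sample record shows Kl‰rfall where Klärfall was presumably intended. Assuming (not verified here) that Cassandra expects UTF-8 for varchar values and that Excel wrote the file in a single-byte encoding such as cp1252, a minimal Python sketch can locate the first byte that is not valid UTF-8 and write a re-encoded copy. The cp1252 source encoding and the helper names are assumptions for illustration.

```python
# Assumption: the export was written in cp1252 (Windows Latin-1);
# change SOURCE_ENCODING if Excel used a different code page.
SOURCE_ENCODING = "cp1252"


def first_bad_utf8_offset(data):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def to_utf8(data, source_encoding=SOURCE_ENCODING):
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src_path, dst_path, source_encoding=SOURCE_ENCODING):
    """Write a UTF-8 copy of src_path, converting only if it is not UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    offset = first_bad_utf8_offset(raw)
    if offset is None:
        data = raw  # already valid UTF-8: copy unchanged
    else:
        print(f"{src_path}: first non-UTF-8 byte at offset {offset}")
        data = to_utf8(raw, source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(data)


# In-memory check: "ä" encoded as cp1252 is a single byte that is invalid UTF-8
sample = "Abmeldung Klärfall".encode("cp1252")
print(first_bad_utf8_offset(sample))           # 12
print(first_bad_utf8_offset(to_utf8(sample)))  # None
```

On the real file this would be called as convert_file("Process_BulkData.csv", "Process_BulkData-utf8.csv") (the output name is hypothetical), and COPY ... FROM would then point at the new file. If convert_file prints nothing, the original file is already valid UTF-8 and this explanation can be ruled out.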
I tried something else: I deleted the missing columns in Excel, exported the file again as CSV, and tried to import it without the header line, using explicit column names, like this:
cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname, processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid, fin, reference, referencetype) FROM 'Process_BulkData-2.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
I get this error:
Bad Request: Input length = 1
Aborting import at record
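The column removal done by hand in Excel can also be scripted, which makes it repeatable for the full 1600-row file. A minimal Python sketch, assuming the header names of the original export; drop_columns is a hypothetical helper that removes the two timeuuid columns by name and, unless keep_header=True, leaves out the header line.

```python
import csv
import io

# The two columns that were deleted by hand in Excel
DROP = {"ProcessStartTimeUUID", "ProcessEndTimeUUID"}


def drop_columns(text, drop=DROP, delimiter=";", keep_header=False):
    """Return CSV text without the named columns (and, by default, no header)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, records = rows[0], rows[1:]
    keep = [i for i, name in enumerate(header) if name not in drop]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if keep_header:
        writer.writerow([header[i] for i in keep])
    for record in records:
        writer.writerow([record[i] for i in keep])
    return out.getvalue()


# Shortened sample using column names from the export
sample = (
    "ProcessUUID;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus\n"
    "0F0D1498-D149-4FCC-87C9-F12783FDF769;;2011-02-17 04:05+0000;;Finished\n"
)
print(drop_columns(sample), end="")
# 0F0D1498-D149-4FCC-87C9-F12783FDF769;2011-02-17 04:05+0000;Finished
```

If the file is written without a header line, it would be imported with HEADER = FALSE (the cqlsh default); with HEADER = TRUE, cqlsh treats the first line as a header and skips it, which here would be a data record.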
Can someone tell me what I'm doing wrong here? According to the COPY command documentation, at least two of my commands should work, or so I would have thought.
But no, I have obviously missed something important here.