COPY cassandra table from csv file

I am creating a demo landscape for Cassandra, Apache Spark and Flume on my Mac (Mac OS X Yosemite with Oracle jdk1.7.0_55). The landscape should serve as a proof of concept for a new analytics platform, so I need some test data in my Cassandra database. I am using Cassandra 2.0.8.

I created some demo data in Excel and exported it as a CSV file. The structure looks like this:

 ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType
 0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKlärfall;1;Abmeldung Klärfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal

Then I created a keyspace and a column family in cqlsh using:

 CREATE KEYSPACE dadcargate
   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };

 use dadcargate;

 CREATE COLUMNFAMILY Process (
   ProcessUUID uuid, ProcessID varchar, ProcessNumber bigint, ProcessName varchar,
   ProcessStartTime timestamp, ProcessStartTimeUUID timeuuid,
   ProcessEndTime timestamp, ProcessEndTimeUUID timeuuid,
   ProcessStatus varchar, Orderer varchar, VorgangsNummer varchar,
   VehicleID varchar, FIN varchar, Reference varchar, ReferenceType varchar,
   PRIMARY KEY (ProcessUUID)
 ) WITH COMMENT = 'A process is like a bracket around multiple process steps';

The column family name and all of its columns were created in all lowercase letters. I will have to investigate that some day, but it is not so relevant at the moment.
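As far as I understand, this is standard CQL behaviour: unquoted identifiers are case-insensitive and get folded to lowercase, and only double-quoted identifiers keep their case. A minimal sketch (the table and column names here are just illustrative, not part of my schema):

 -- Unquoted names are folded to lowercase by Cassandra:
 CREATE TABLE demo_lowercase (MyColumn varchar PRIMARY KEY);  -- stored as mycolumn
 -- Double-quoted names keep their exact case, but must then always be quoted:
 CREATE TABLE "DemoCase" ("MyColumn" varchar PRIMARY KEY);
 SELECT "MyColumn" FROM "DemoCase";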

Now I take the CSV file, which has about 1600 entries, and want to import it into my table named process, like this:

 cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname,
   processstarttime, processendtime, processstatus, orderer, vorgangsnummer,
   vehicleid, fin, reference, referencetype)
   FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

It produces the following error:

 Record #0 (line 1) has the wrong number of fields (15 instead of 13).
 0 rows imported in 0.050 seconds.

Which is essentially true, since I have no timeUUID fields in my CSV export.

If I try the COPY command without explicit column names (given that I really am missing two fields), like this:

 cqlsh:dadcargate> COPY process from 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE; 

I get another error:

 Bad Request: Input length = 1
 Aborting import at record #0 (line 1). Previously-inserted values still present.
 0 rows imported in 0.009 seconds.

Hmm. This is strange, but fair enough; perhaps the COPY command does not like the fact that two fields are missing. I still find it odd, because the missing fields are of course there from a structural point of view, they are just empty.

So I took another shot: I deleted the missing columns in Excel, exported the file again as CSV, and tried to import it, without the header line, against my explicit column names, like this:

 cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname,
   processstarttime, processendtime, processstatus, orderer, vorgangsnummer,
   vehicleid, fin, reference, referencetype)
   FROM 'Process_BulkData-2.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

I get this error:

 Bad Request: Input length = 1
 Aborting import at record #0 (line 1). Previously-inserted values still present.
 0 rows imported in 0.034 seconds.

Can someone tell me what I am doing wrong here? According to the documentation of the COPY command, the way I set up my commands should work for at least two of them. Or so I would have thought.

But no, I obviously missed something important here.

2 answers
The cqlsh COPY command can be touchy. However, the COPY documentation contains this line:

The number of columns in the CSV input is the same as the number of columns in the metadata of the Cassandra table.

With this in mind, I was able to import your data using COPY FROM by naming the empty fields as well (processstarttimeuuid and processendtimeuuid):

 aploetz@cqlsh:stackoverflow> COPY process (processuuid, processid, processnumber, processname,
   processstarttime, processstarttimeuuid, processendtime, processendtimeuuid, processstatus,
   orderer, vorgangsnummer, vehicleid, fin, reference, referencetype)
   FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

 1 rows imported in 0.018 seconds.

 aploetz@cqlsh:stackoverflow> SELECT * FROM process;

  processuuid                          | fin               | orderer | processendtime            | processendtimeuuid | processid         | processname        | processnumber | processstarttime          | processstarttimeuuid | processstatus | reference  | referencetype | vehicleid | vorgangsnummer
 --------------------------------------+-------------------+---------+---------------------------+--------------------+-------------------+--------------------+---------------+---------------------------+----------------------+---------------+------------+---------------+-----------+----------------
  0f0d1498-d149-4fcc-87c9-f12783fdf769 | WAU2345CX67890876 | SIXT    | 2011-02-16 22:05:00+-0600 | null               | AbmeldungKl‰rfall | Abmeldung Kl‰rfall | 1             | 2011-02-02 22:05:00+-0600 | null                 | Finished      | KLA-BR4278 | internal      | A-XA 1    | 4278

 (1 rows)
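Note that the CSV itself still has to carry all 15 fields per row; the two timeuuid values are simply left empty, which with a semicolon delimiter just means two consecutive delimiters. A simplified sketch of one row (abbreviated with "...", not your exact data):

 ...;ProcessStartTime;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;...
 ...;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;...

Those empty fields come in as null, which is exactly what the SELECT above shows for processstarttimeuuid and processendtimeuuid.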

Loading a CSV file into a Cassandra table:

Step 1) Download cassandra-loader with: sudo wget https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.23/cassandra-loader

Step 2) Make it executable: sudo chmod +x cassandra-loader

a) CSV file name: "pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv"

b) Keyspace name: "bms_test"

c) Table name: "pt_bms_tkt_success_record_details_new"

d) Columns: "trx_id ...... trx_day"

Step 3) Put the CSV file and cassandra-loader in "cassandra3.7/bin/".

Step 4) Run the loader from that directory:

 [stp@ril-srv-sp3 bin]$ ./cassandra-loader -f pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv -host 192.168.1.29 -schema "bms_test.pt_bms_tkt_success_record_details_new(trx_id, trx_record_type, trx_date, trx_show_date, cinema_str_id, session_id, ttype_code, item_id, item_var_sequence, trx_booking_id, venue_name, screen_by_tnum, price_group_code, area_cat_str_code, area_by_tnum, venue_capacity, amount_currentprice, venue_class, trx_booking_status_committed, booking_status, amount_paymentstatus, event_application, venue_cinema_companyname, venue_cinema_name, venue_cinema_type, venue_cinema_application, region_str_code, venue_city_name, sub_region_str_code, sub_region_str_name, event_code, event_type, event_name, event_language, event_genre, event_censor_rating, event_release_date, event_producer_code, ..., amount_final, amount_tax, offer_isapplied, offer_type, offer_name, offer_amount, payment_lastmode, payment_lastamount, payment_reference1, payment_reference2, payment_bank, customer_loginid, customer_loginstring, offer_referral, customer_mailid, campaign_mobile, ..., venue_multiplex, venue_state, mobile_type, transaction_range, life_cyclestate_from, transactions_after_offer, is_premium_transaction, city_type, holiday_season, week_type, event_popularity, transactionrange_after_discount, showminusbooking, input_source_name, channel, time_stamp, life_cyclestate_to, record_status, week_name, number_of_active_customers, event_genre1, event_genre2, event_genre3, event_genre4, event_language1, event_language2, event_language3, event_language4, event_release_date_range, showminusbooking_range, reserve1, reserve2, reserve3, reserve4, reserve5, payment_mode, payment_type, date_of_first_transaction, transaction_time_in_hours, showtime_in_hours, trx_day)"
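In general, the invocation pattern is the following (a minimal sketch using only the flags shown above; the host, file and table names here are purely illustrative):

 # load mydata.csv into my_keyspace.my_table; columns are listed in the same order as the CSV fields
 ./cassandra-loader -f mydata.csv -host 127.0.0.1 -schema "my_keyspace.my_table(col1, col2, col3)"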



