How to read a huge CSV file in Mule

I am using Mule Studio 3.4.0 Community Edition. I have a problem parsing a large CSV file that arrives on a file inbound endpoint. The scenario is this: I have 3 CSV files and I need to insert their contents into a database. But when I try to load a huge file (about 144 MB), I get an OutOfMemory exception. I thought the solution might be to split my large CSV into smaller CSVs (I don't know if this is the best approach), or to find some other way to process the CSV without hitting the exception.

<file:connector name="File" autoDelete="true" streaming="true" validateConnections="true" doc:name="File"/> <flow name="CsvToFile" doc:name="CsvToFile"> <file:inbound-endpoint path="src/main/resources/inbox" moveToDirectory="src/main/resources/processed" responseTimeout="10000" doc:name="CSV" connector-ref="File"> <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true"/> </file:inbound-endpoint> <component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property"/> <choice doc:name="Choice"> <when expression="INVOCATION:nome_file=azienda" evaluator="header"> <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/companies-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Azienda"/> <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertAziende" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Azienda"> <jdbc-ee:query key="InsertAziende" value="INSERT INTO aw006_azienda VALUES (#[map-payload:AW006_ID], #[map-payload:AW006_ID_CLIENTE], #[map-payload:AW006_RAGIONE_SOCIALE])"/> </jdbc-ee:outbound-endpoint> </when> <when expression="INVOCATION:nome_file=servizi" evaluator="header"> <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/services-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Servizi"/> <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertServizi" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Servizi"> <jdbc-ee:query key="InsertServizi" value="INSERT INTO ctrl_aemd_unb_servizi VALUES (#[map-payload:CTRL_ID_TIPO_OPERAZIONE], #[map-payload:CTRL_DESCRIZIONE], #[map-payload:CTRL_COD_SERVIZIO])"/> </jdbc-ee:outbound-endpoint> </when> <when expression="INVOCATION:nome_file=richiesta" evaluator="header"> <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/requests-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Richiesta"/> <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertRichieste" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Richiesta"> <jdbc-ee:query key="InsertRichieste" value="INSERT INTO ctrl_aemd_unb_richiesta VALUES (#[map-payload:CTRL_ID_CONTROLLER], #[map-payload:CTRL_NUM_RICH_VENDITORE], #[map-payload:CTRL_VENDITORE], #[map-payload:CTRL_CANALE_VENDITORE], #[map-payload:CTRL_CODICE_SERVIZIO], #[map-payload:CTRL_STATO_AVANZ_SERVIZIO], #[map-payload:CTRL_DATA_INSERIMENTO])"/> </jdbc-ee:outbound-endpoint> </when> </choice> </flow> 

Please help; I do not know how to solve this problem. Thanks in advance for any help.

2 answers

As Steve said, the csv-to-maps-transformer may try to load the entire file into memory before processing it. What you can try instead is to split the CSV file into smaller parts and send those parts to a VM endpoint to be processed separately. First, create a component to perform that splitting step:

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.mule.api.MuleEventContext;
import org.mule.api.client.MuleClient;
import org.mule.api.lifecycle.Callable;

public class CSVReader implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        // The file endpoint streams the file, so the payload is an InputStream
        InputStream fileStream = (InputStream) eventContext.getMessage().getPayload();
        DataInputStream ds = new DataInputStream(fileStream);
        BufferedReader br = new BufferedReader(new InputStreamReader(ds));

        MuleClient muleClient = eventContext.getMuleContext().getClient();

        // Dispatch each line as a separate message to the VM queue
        String line;
        while ((line = br.readLine()) != null) {
            muleClient.dispatch("vm://in", line, null);
        }

        fileStream.close();
        return null;
    }
}

Then split the main flow in two:

 <file:connector name="File" workDirectory="yourWorkDirPath" autoDelete="false" streaming="true"/> <flow name="CsvToFile" doc:name="Split and dispatch"> <file:inbound-endpoint path="inboxPath" moveToDirectory="processedPath" pollingFrequency="60000" doc:name="CSV" connector-ref="File"> <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true" /> </file:inbound-endpoint> <component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property" /> <component class="com.dgonza.CSVReader" doc:name="Split the file and dispatch every line to VM" /> </flow> <flow name="storeInDatabase" doc:name="receive lines and store in database"> <vm:inbound-endpoint exchange-pattern="one-way" path="in" doc:name="VM" /> <Choice> . . Your JDBC Stuff . . <Choice /> </flow> 

Keep your current file-connector configuration so that streaming stays enabled. With this solution, the CSV data can be processed without having to load the entire file into memory first. HTH
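Note that the messages arriving on vm://in are plain Strings, one CSV row each, so the second flow still has to split each row into fields before the JDBC insert can reference them. Below is a minimal sketch of a component that could do that; the class name, the naive comma split, and the assumption that the row belongs to the aw006_azienda file from the question are my own illustration, not part of the original answer.

import java.util.HashMap;
import java.util.Map;

import org.mule.api.MuleEventContext;
import org.mule.api.lifecycle.Callable;

/**
 * Hypothetical component for the storeInDatabase flow: turns one CSV line
 * into a Map so that expressions like #[map-payload:AW006_ID] keep working.
 * Column order is assumed to match the question's aw006_azienda INSERT.
 */
public class LineToMap implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        String line = eventContext.getMessage().getPayloadAsString();
        String[] fields = line.split(",", -1); // naive split: does not handle quoted commas

        Map<String, Object> row = new HashMap<String, Object>();
        row.put("AW006_ID", fields[0]);
        row.put("AW006_ID_CLIENTE", fields[1]);
        row.put("AW006_RAGIONE_SOCIALE", fields[2]);
        return row;
    }
}

Such a component would sit between the vm:inbound-endpoint and the choice router; a real implementation would use a CSV parser instead of split(",") to cope with quoted fields.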


I believe the csv-to-maps-transformer will try to load the whole file into memory. Since you are dealing with one large file, personally I would just write a Java class to handle it. The file endpoint will pass a stream to your custom transformer, and from there you can open a JDBC connection and push the information out row by row without ever loading the complete file. I have used OpenCSV to parse the CSV for me. So your Java class would contain something like the following:

protected Object doTransform(Object src, String enc) throws TransformerException {
    try {
        // Make a JDBC connection here

        // Now read and parse the CSV
        FileReader csvFileData = (FileReader) src;
        BufferedReader br = new BufferedReader(csvFileData);
        CSVReader reader = new CSVReader(br);

        // Read the CSV file and add the row to the appropriate List(s)
        String[] nextLine;
        while ((nextLine = reader.readNext()) != null) {
            // Push your data into the database through your JDBC connection
        }

        // Close the reader and the JDBC connection.
        reader.close();
    } catch (Exception e) {
        // Handle or rethrow the exception appropriately
    }
    return null;
}
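The "push your data into the database" comment above is where the per-row insert would go. Here is a minimal sketch of that step, assuming a java.sql.Connection has already been opened (the connection setup is not shown in the answer) and reusing the aw006_azienda INSERT from the question; the helper name insertRow is mine.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: insert one parsed CSV row through an open JDBC Connection.
// Column order mirrors the question's INSERT INTO aw006_azienda statement.
private void insertRow(Connection connection, String[] row) throws SQLException {
    String sql = "INSERT INTO aw006_azienda VALUES (?, ?, ?)";
    PreparedStatement stmt = connection.prepareStatement(sql);
    try {
        stmt.setString(1, row[0]); // AW006_ID
        stmt.setString(2, row[1]); // AW006_ID_CLIENTE
        stmt.setString(3, row[2]); // AW006_RAGIONE_SOCIALE
        stmt.executeUpdate();
    } finally {
        stmt.close();
    }
}

For 144 MB of data it would usually be worth batching the inserts (PreparedStatement.addBatch() and executeBatch() every few thousand rows) rather than calling executeUpdate() once per line.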
