Java code runs out of heap space on AWS but not on Mac OS X

I need another pair of eyes on this.

I wrote out a zip file of hundreds of gigabytes using this exact code without any changes locally on MacOSX.

With 100% unchanged code, just deployed to an AWS instance running Ubuntu, the same run fails with memory problems (out of heap space).

Here is the code that runs, streaming results from MyBatis to a CSV file on disk:

    File directory = new File(feedDirectory);
    File file;
    try {
        file = File.createTempFile(("feed-" + providerCode + "-"), ".csv", directory);
    } catch (IOException e) {
        throw new RuntimeException("Unable to create file to write feed to disk: " + e.getMessage(), e);
    }

    String filePath = file.getAbsolutePath();
    log.info(String.format("File name for %s feed is %s", providerCode, filePath));

    // output file
    try (FileOutputStream out = new FileOutputStream(file)) {
        streamData(out, providerCode, startDate, endDate);
    } catch (IOException e) {
        throw new RuntimeException("Unable to write feed to file: " + e.getMessage(), e);
    }

public void streamData(OutputStream outputStream, String providerCode, Date startDate, Date endDate) throws IOException {
    try (CSVPrinter printer = CsvUtil.openPrinter(outputStream)) {
        StreamingHandler<FStay> handler = stayPrintingHandler(printer);
        warehouse.doForAllStaysByProvider(providerCode, startDate, endDate, handler);
    }
}

private StreamingHandler<FStay> stayPrintingHandler(CSVPrinter printer) {
    StreamingHandler<FStay> handler = new StreamingHandler<>();
    handler.setHandler((stay) -> {
        try {
            EXPORTER.writeStay(printer, stay);
        } catch (IOException e) {
            log.error("Issue with writing output: " + e.getMessage(), e);
        }
    });
    return handler;
}

// The EXPORTER method
import org.apache.commons.csv.CSVPrinter;

public void writeStay(CSVPrinter printer, FStay stay) throws IOException {
    List<Object> list = asList(stay);
    printer.printRecord(list);
}

List<Object> asList(FStay stay) {
    List<Object> list = new ArrayList<>(46);
    list.add(stay.getUid());
    list.add(stay.getProviderCode());
    //....
    return list;
}
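To make the intent of the code above concrete, here is a minimal, standard-library-only sketch of the same pattern: a per-row callback that writes each row immediately and keeps no reference to it, so writer-side memory should stay flat regardless of row count. It is not the Apache Commons CSV CSVPrinter used in the question; StreamingCsvSketch, rowWriter and escape are hypothetical names introduced for illustration. One deliberate difference: the callback rethrows write failures instead of logging them, so a failing disk stops the export rather than being logged once per row.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical illustration; not the CsvUtil/EXPORTER classes from the question.
final class StreamingCsvSketch {

    /** Returns a per-row callback that writes the row at once and retains nothing. */
    static Consumer<List<Object>> rowWriter(Writer out) {
        return row -> {
            try {
                out.write(row.stream()
                        .map(StreamingCsvSketch::escape)
                        .collect(Collectors.joining(",")));
                out.write("\r\n");
            } catch (IOException e) {
                // Propagate instead of swallowing, so the caller can abort the export.
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Quotes a value only when it contains a comma, quote, or line break. */
    static String escape(Object value) {
        String s = value == null ? "" : value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        Consumer<List<Object>> handler = rowWriter(out);
        handler.accept(Arrays.<Object>asList("uid-1", "ACME, Inc."));
        handler.accept(Arrays.<Object>asList("uid-2", null));
        System.out.print(out);
    }
}
```

If a writer like this shows flat memory on both machines, the growth is more likely on the side that produces the rows than on the side that writes them.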

Here is a JVM heap graph (using jvisualvm) from a local run. I have run this consistently with Java 8 (jdk1.8.0_51 and 1.8.0_112) locally and get great results. I even wrote out a terabyte of data.

Note that the heap looks healthy

^ 4 , 1,5 , 500 , CSV , .

However, on Ubuntu with jdk 1.8.0_111, the same run fails (java.lang.OutOfMemoryError: Java heap space)

Xmx 8 16 25 . ... 10 ... .

Here is the JVisualVM graph on Ubuntu:

Same code, same operation

, , , , ( , )

, , :

  • OS - Ubuntu vs Mac OS X
  • VM AWS
  • AWS Ubuntu.
  • JDK version - 1.8.0_111 on Ubuntu, 1.8.0_51 and 1.8.0_112 on Mac OS X
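When comparing two hosts like this, it can help to print the same facts from inside the JVM on each machine rather than relying on what was configured. The following is a minimal sketch using only standard JDK APIs; the class name JvmEnvironmentReport is hypothetical and not part of the original code.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for diffing two environments; run it on both hosts.
final class JvmEnvironmentReport {

    /** Collects settings that commonly differ between a laptop and a server. */
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("os.name", System.getProperty("os.name"));
        report.put("os.arch", System.getProperty("os.arch"));
        report.put("java.version", System.getProperty("java.version"));
        report.put("java.vm.name", System.getProperty("java.vm.name"));
        // Effective maximum heap as seen by the running JVM (reflects -Xmx or the default).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        report.put("maxHeapMb", Long.toString(maxHeapMb));
        // Flags actually passed to this JVM, which may differ from what a launch script intends.
        report.put("jvmArgs",
                ManagementFactory.getRuntimeMXBean().getInputArguments().toString());
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Diffing the two outputs shows whether the heap limit and JVM flags really match. It does not cover differences outside the JVM, such as drivers or the network path to the database.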

- ?

try-with-resources flush/close .

, Ubuntu, , , - -, Ubuntu... OS X .

2

, , , , AWS , , ... , 10 , JVM 20 .

, Ubuntu/Java ?

3

CSVPrinter (OpenCSV CSVWriter Apache CSV) .

, , ... Ubuntu. OS X , .

, .

4

, . , InboundDataHandler redshift driver.

myBatis . , , ( ResultHandler < > () {//, }}, , - .

InboundDataHandler AWS/Redshift..., , myBatis... :

  • SqlSessionFactory.
  • Redshift, Ubuntu/AWS
  • ,

: heap dump screenshot

, SqlSessionFactoryBean:

@Bean
public javax.sql.DataSource redshiftDataSource() throws ClassNotFoundException {
    log.info("Got to datasource config");
    // Dynamically load driver at runtime.
    Class.forName(dataWarehouseDriver);
    DataSource dataSource = new DataSource();
    dataSource.setURL(dataWarehouseUrl);
    dataSource.setUserID(dataWarehouseUsername);
    dataSource.setPassword(dataWarehousePassword);
    return dataSource;
}

@Bean
public SqlSessionFactoryBean sqlSessionFactory() throws ClassNotFoundException {
    SqlSessionFactoryBean factoryBean = new SqlSessionFactoryBean();
    factoryBean.setDataSource(redshiftDataSource());
    return factoryBean;
}

myBatis, , , ResultHandler:

warehouse.doForAllStaysByProvider(providerCode, startDate, endDate, new ResultHandler<FStay>() {
    @Override
    public void handleResult(ResultContext<? extends FStay> resultContext) {
        // do nothing
    }
});

SQL- - ? , ... AWS. .

6 , . , . .

AWS, , . , myBatis:

<select id="doForAllStaysByProvider" fetchSize="1000" resultMap="FStayResultMap">        
    select distinct
        f_stay.uid,

.

, , AWS ( AWS, , ), , myBatis ResultHandler < > , .

, - jdbc redshift AWS , AWS ( aws, AWS), InboundDataHandler , fetchSize.

, , , - , AWS, 500 , , "force gc" jvisualvm, "" 100 :

it works
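On a remote host where attaching jvisualvm is inconvenient, heap usage can also be sampled from inside the export itself. The following is a minimal sketch using the standard MemoryMXBean; HeapSampler and its methods are hypothetical names, and calling onRow() from the per-row handler is an assumption about where it would be wired in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: call onRow() once per exported row.
final class HeapSampler {
    private final long everyNRows;
    private long rowsSeen;
    private long peakUsedBytes;

    HeapSampler(long everyNRows) {
        if (everyNRows <= 0) {
            throw new IllegalArgumentException("everyNRows must be positive");
        }
        this.everyNRows = everyNRows;
    }

    /** Samples heap usage on every N-th row; returns true when a sample was taken. */
    boolean onRow() {
        rowsSeen++;
        if (rowsSeen % everyNRows != 0) {
            return false;
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        peakUsedBytes = Math.max(peakUsedBytes, heap.getUsed());
        // getMax() is -1 when the maximum is undefined, which prints as 0 here.
        System.out.printf("rows=%d heapUsedMb=%d heapMaxMb=%d%n",
                rowsSeen, heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));
        return true;
    }

    /** Highest used-heap value observed across all samples so far. */
    long peakUsedBytes() {
        return peakUsedBytes;
    }
}
```

A log where used heap grows steadily with the row count points to something retaining rows, while a sawtooth that keeps returning to a low baseline matches the healthy local graph.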

, !

+4
1
, .

- , InboundDataHandler Amazon RedShift/postgres JDBC .

SqlSession , Amazon :

- JDBC, JDBC.

, ResultHandlers MyBatis... , , - , AWS Redshift JDBC AWS AWS-.

, 'fetchSize' MyBatis:

<select id="doForAllStaysByProvider" fetchSize="1000" resultMap="FStayResultMap">
    select distinct
        f_stay.uid,

! . , , .
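For readers not using MyBatis, the fetchSize attribute shown above corresponds to the standard JDBC call Statement.setFetchSize. The following is a minimal sketch of the plain-JDBC form; FetchSizeSketch, RowCallback and streamQuery are hypothetical names. Note that setFetchSize is only a hint to the driver, so whether and how it is honored (and what the default is when it is not set) depends on the specific driver and should be checked in its documentation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical illustration of the plain-JDBC counterpart to MyBatis fetchSize.
final class FetchSizeSketch {

    /** Invoked once per row; plays the role a ResultHandler plays in MyBatis. */
    interface RowCallback {
        void onRow(ResultSet row) throws SQLException;
    }

    /** Runs the query with a fetch-size hint and hands rows to the callback one by one. */
    static long streamQuery(Connection connection, String sql, int fetchSize, RowCallback callback)
            throws SQLException {
        long rows = 0;
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            // Hint: ask the driver to fetch rows in batches of this size
            // rather than leaving the batch size at the driver's default.
            statement.setFetchSize(fetchSize);
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    callback.onRow(resultSet);
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

A call such as streamQuery(connection, sql, 1000, row -> { /* write the row */ }) mirrors fetchSize="1000" in the mapper. A row-by-row handler bounds memory only in the application code; it cannot help if the driver has already buffered the whole result set before the first row is delivered.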

, , Amazon, .

, JDBC Redshift - , Amazon - ..., , , .

, , , . , "" , , , .

, .


Source: https://habr.com/ru/post/1659779/

