I need another set of eyes on this.
Using this exact code, without any changes, I have written files of hundreds of gigabytes locally on Mac OS X.
With 100% unchanged code simply deployed to an AWS instance running Ubuntu, the same code runs into memory problems (heap space).
Here is the code that runs, streaming MyBatis results to a CSV file on disk:
```java
File directory = new File(feedDirectory);
File file;
try {
    file = File.createTempFile("feed-" + providerCode + "-", ".csv", directory);
} catch (IOException e) {
    throw new RuntimeException("Unable to create file to write feed to disk: " + e.getMessage(), e);
}
String filePath = file.getAbsolutePath();
log.info(String.format("File name for %s feed is %s", providerCode, filePath));
try (FileOutputStream out = new FileOutputStream(file)) {
    streamData(out, providerCode, startDate, endDate);
} catch (IOException e) {
    // keep the original exception as the cause, as in the createTempFile handler above
    throw new RuntimeException("Unable to write feed to file: " + e.getMessage(), e);
}
```
```java
public void streamData(OutputStream outputStream, String providerCode, Date startDate, Date endDate) throws IOException {
    try (CSVPrinter printer = CsvUtil.openPrinter(outputStream)) {
        StreamingHandler<FStay> handler = stayPrintingHandler(printer);
        warehouse.doForAllStaysByProvider(providerCode, startDate, endDate, handler);
    }
}

private StreamingHandler<FStay> stayPrintingHandler(CSVPrinter printer) {
    StreamingHandler<FStay> handler = new StreamingHandler<>();
    // Each stay is written to the printer as soon as MyBatis hands it over.
    handler.setHandler((stay) -> {
        try {
            EXPORTER.writeStay(printer, stay);
        } catch (IOException e) {
            // Write failures are only logged here, so streaming continues after an I/O error.
            log.error("Issue with writing output: " + e.getMessage(), e);
        }
    });
    return handler;
}
```
```java
import org.apache.commons.csv.CSVPrinter;

public void writeStay(CSVPrinter printer, FStay stay) throws IOException {
    List<Object> list = asList(stay);
    printer.printRecord(list);
}

List<Object> asList(FStay stay) {
    List<Object> list = new ArrayList<>(46);
    list.add(stay.getUid());
    list.add(stay.getProviderCode());
    // ... remaining columns omitted (the list is presized for 46)
    return list;
}
```
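This is a stdlib-only sketch of the same streaming pattern, which shows the intended memory profile. Each row is formatted and written the moment the callback receives it, so the writing side never holds more than one row. CsvStreamSketch, escape, writeRow and rowWriter are hypothetical names. The hand-rolled RFC 4180 quoting is a stand-in for Apache Commons CSV's CSVPrinter and is not its implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for CSVPrinter: each call writes one record, no rows are retained.
class CsvStreamSketch {

    // Quote a field only if it contains a comma, a quote, CR or LF (RFC 4180 style).
    static String escape(Object value) {
        String s = value == null ? "" : value.toString();
        boolean needsQuotes = s.indexOf(',') >= 0 || s.indexOf('"') >= 0
                || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + s.replace("\"", "\"\"") + "\"" : s;
    }

    // Write a single record terminated by CRLF.
    static void writeRow(Writer out, List<?> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n");
    }

    // Callback in the spirit of the StreamingHandler above: each row goes straight to `out`.
    static Consumer<List<?>> rowWriter(Writer out) {
        return row -> {
            try {
                writeRow(out, row);
            } catch (IOException e) {
                // Rethrow rather than only log, so a failed write stops the stream.
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        Consumer<List<?>> handler = rowWriter(sw);
        handler.accept(Arrays.asList("uid-1", "ABC"));
        handler.accept(Arrays.asList("uid-2", "has,comma"));
        System.out.print(sw);
    }
}
```

One deliberate difference from the handler above: this callback rethrows I/O errors instead of logging them. A full disk therefore stops the job rather than producing one log line per row.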
Here is a JVM heap graph (from jvisualvm) of a local run. I have run this repeatedly with Java 8 (jdk1.8.0_51 and 1.8.0_112) locally with great results, even writing a terabyte of data.

[Local jvisualvm heap graph. The surviving caption figures are 4, 1.5 and 500, in connection with the CSV output.]
On Ubuntu with jdk 1.8.0_111, however, the same job fails with java.lang.OutOfMemoryError: Java heap space.
I have tried -Xmx values of 8, 16 and 25 GB, and it still runs out of heap.
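Screenshots are hard to compare, so one option is to log heap usage from inside the job every N rows and compare the numbers from both machines. Below is a minimal sketch using the standard java.lang.management API. HeapLogger, describeHeap and shouldLog are hypothetical names, and the one-million-row interval is arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for logging heap usage periodically from inside the export job.
class HeapLogger {
    private static final long MB = 1024L * 1024L;

    // Current heap usage as reported by the JVM's memory MXBean.
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / MB) + " MB";
        return String.format("heap used=%d MB, committed=%d MB, max=%s",
                heap.getUsed() / MB, heap.getCommitted() / MB, max);
    }

    // True every `interval` rows; call it from the row handler with a running count.
    static boolean shouldLog(long rowCount, long interval) {
        return interval > 0 && rowCount > 0 && rowCount % interval == 0;
    }

    public static void main(String[] args) {
        for (long row = 1; row <= 3_000_000L; row++) {
            if (shouldLog(row, 1_000_000L)) {
                System.out.println(row + " rows, " + describeHeap());
            }
        }
    }
}
```

You can also start the JVM with -XX:+HeapDumpOnOutOfMemoryError (and optionally -XX:HeapDumpPath=...). The JVM then writes a heap dump when the error occurs, and the dump shows which objects are actually holding the memory.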
Here is the JVisualVM graph on Ubuntu:

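The differences between the two environments:
- OS: Ubuntu vs Mac OS X
- Running in a VM on AWS
- The AWS Ubuntu image itself
- JDK: 1.8.0_111 on Ubuntu; 1.8.0_51 and 1.8.0_112 locally
- Something else?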
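The list above is about environment differences, so it may help to print the JVM's own view of each host at startup and diff the two outputs. Below is a sketch using standard system properties and Runtime. EnvReport is a hypothetical name.

```java
// Hypothetical startup report: what the JVM itself sees on a given host.
class EnvReport {
    private static final String[] KEYS = {
        "java.version", "java.vendor", "java.vm.name", "os.name", "os.version", "os.arch"
    };

    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String key : KEYS) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will try to use (reflects -Xmx when it is set).
        sb.append("maxMemoryMB=").append(rt.maxMemory() / (1024 * 1024)).append('\n');
        sb.append("availableProcessors=").append(rt.availableProcessors()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```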
The try-with-resources blocks should take care of flushing and closing the streams.
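This sketch illustrates the try-with-resources point: resources are closed in reverse order, and closing a buffering writer flushes it before closing the stream underneath. It uses a BufferedWriter over a ByteArrayOutputStream as a hypothetical stand-in for the CSVPrinter over the FileOutputStream. CloseOrderDemo is a hypothetical name, and the sketch does not show how CSVPrinter itself closes.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Mirrors the question's nesting: an outer stream owned by one try-with-resources,
// and a buffering writer (standing in for CSVPrinter) owned by an inner one.
class CloseOrderDemo {

    // Returns {bytes in the sink before the inner close, bytes after both closes}.
    static int[] sizesBeforeAndAfterClose(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int before = -1;
        // The outer close is a second close of `out`; closing an already-closed stream has no effect.
        try (ByteArrayOutputStream out = sink) {
            try (Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                w.write(text);
                // For short text, everything is still in BufferedWriter's buffer here.
                before = sink.size();
            } // closing w flushes its buffer into `out` and then closes `out`
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {before, sink.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterClose("hello");
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```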
Update 2
Could this be an Ubuntu/Java issue?
Update 3
I tried swapping the CSV writer (OpenCSV's CSVWriter in place of Apache Commons CSV's CSVPrinter). The behavior is the same: it fails on Ubuntu, while OS X is fine.
Update 4
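I now think the problem is InboundDataHandler in the Redshift JDBC driver.
To take the CSV writing out of the picture, I gave myBatis an empty ResultHandler (an anonymous class whose handleResult does nothing), and the memory problem persisted.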
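So the memory seems to be held by InboundDataHandler in the AWS/Redshift driver rather than by myBatis. Possible causes:
- The SqlSessionFactory configuration
- Redshift behaving differently when queried from Ubuntu/AWS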
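The InboundDataHandler theory is essentially a producer/consumer problem. If rows are read off the network into an unbounded buffer faster than the handler consumes them, the buffer grows with the gap between the two rates. A faster network path could widen that gap, which might explain why the problem only shows up on AWS. The sketch below is a generic model of that effect and is not the Redshift driver's code. BufferModel, the row counts and the simulated 50-microsecond handler cost are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Generic model of read-ahead buffering between a fast producer and a slow consumer.
// This is NOT the Redshift driver's code; it only illustrates the effect.
class BufferModel {
    private static final Integer END = -1; // sentinel marking the end of the stream

    // capacity > 0 gives a bounded queue; capacity <= 0 gives an unbounded one.
    // Returns the largest number of rows observed waiting in the queue.
    static int peakBuffered(int rows, int capacity) {
        BlockingQueue<Integer> queue = capacity > 0
                ? new LinkedBlockingQueue<Integer>(capacity) // put() blocks when full: backpressure
                : new LinkedBlockingQueue<Integer>();        // put() never blocks
        AtomicInteger peak = new AtomicInteger();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < rows; i++) {
                    queue.put(i); // the "network" side delivers rows as fast as it can
                    peak.accumulateAndGet(queue.size(), Math::max);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            while (!END.equals(queue.take())) {
                LockSupport.parkNanos(50_000); // the "handler" side: about 50 microseconds per row
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("bounded(100) peak: " + peakBuffered(10_000, 100));
        System.out.println("unbounded peak:    " + peakBuffered(10_000, 0));
    }
}
```

With a bounded queue, the producer blocks once capacity rows are waiting, so memory stays capped however fast rows arrive. With the unbounded queue, the peak is roughly however far ahead the producer gets.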

Here is the data source and SqlSessionFactoryBean configuration:
```java
@Bean
public javax.sql.DataSource redshiftDataSource() throws ClassNotFoundException {
    log.info("Got to datasource config");
    Class.forName(dataWarehouseDriver);
    // DataSource here is presumably the Redshift JDBC driver's DataSource class (import not shown)
    DataSource dataSource = new DataSource();
    dataSource.setURL(dataWarehouseUrl);
    dataSource.setUserID(dataWarehouseUsername);
    dataSource.setPassword(dataWarehousePassword);
    return dataSource;
}

@Bean
public SqlSessionFactoryBean sqlSessionFactory() throws ClassNotFoundException {
    SqlSessionFactoryBean factoryBean = new SqlSessionFactoryBean();
    factoryBean.setDataSource(redshiftDataSource());
    return factoryBean;
}
```
And here is the myBatis call with an empty ResultHandler:
```java
warehouse.doForAllStaysByProvider(providerCode, startDate, endDate, new ResultHandler<FStay>() {
    @Override
    public void handleResult(ResultContext<? extends FStay> resultContext) {
        // intentionally empty: nothing is written, so the CSV code is out of the picture
    }
});
```
Could it be something in the SQL query? So far it only happens on AWS.
Update 6
Here is the relevant part of the myBatis mapper:
```xml
<select id="doForAllStaysByProvider" fetchSize="1000" resultMap="FStayResultMap">
    select distinct
        f_stay.uid,
        ...
</select>
```
My current theory: on AWS, the Redshift JDBC driver's InboundDataHandler reads rows in faster than the myBatis ResultHandler consumes them, and that buffering does not seem to respect fetchSize.
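For reference, this is the plain JDBC shape of a streaming query. With PostgreSQL-derived drivers such as PgJDBC, setFetchSize is typically honoured only on a forward-only result set with autocommit turned off. Otherwise the driver may read the whole result into memory. Whether the Redshift driver version in use behaves the same way is an assumption to check against its documentation. StreamingQuery, forEachRow and the SQL passed in are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

// Sketch of cursor-style streaming over plain JDBC. Assumes a driver that, like PgJDBC,
// only honours the fetch size on a forward-only result set with autocommit off.
class StreamingQuery {

    // Runs `sql`, passes each row's first column to `rowHandler`, and returns the row count.
    static long forEachRow(Connection conn, String sql, int fetchSize, Consumer<Object> rowHandler)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // with PgJDBC-style drivers this allows a server-side cursor
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize); // a hint: roughly how many rows to fetch per round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rowHandler.accept(rs.getObject(1)); // handle the row, keep nothing
                    count++;
                }
            }
            conn.commit();
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
        return count;
    }
}
```

In MyBatis terms, the fetchSize and resultSetType (for example FORWARD_ONLY) attributes on the select element correspond to the two statement-level settings here. The autocommit part depends on how the SqlSession and its transaction are opened, which the configuration above does not show.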
I also watched the heap on AWS: at around 500 ..., I triggered "force gc" in jvisualvm, and the used heap dropped to about 100 ...:

Thanks!