It sounds like you have a few questions in there, so let's break them down.
Import to HDFS
You seem to be looking for Sqoop. Sqoop is a tool that lets you easily transfer data to/from HDFS, and it can connect to various databases, including Oracle, natively. Sqoop is compatible with the Oracle thin JDBC driver. Here's how you would move data from Oracle to HDFS:
sqoop import --connect jdbc:oracle:thin:@myhost:1521/db --username xxx --password yyy --table tbl --target-dir /path/to/dir
For more information: here and here. Note that you can also import directly into a Hive table with Sqoop, which may be convenient for your analysis.
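As a rough sketch, a direct-to-Hive import just adds the --hive-import flag (and optionally --hive-table) to the same command; the host, credentials and table names below are placeholders, so adjust them for your setup:

sqoop import --connect jdbc:oracle:thin:@myhost:1521/db --username xxx --password yyy --table tbl --hive-import --hive-table tbl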
Processing
As you noted, since your data is relational to begin with, Hive is a good fit for the analysis, because you will probably be more comfortable with its SQL-like syntax. Pig is closer to pure relational algebra and its syntax is not SQL-like; it is more a matter of preference, but both approaches should work fine.
Since you can import data into Hive directly with Sqoop, your data should be ready for processing right after the import.
In Hive, you can run your query and tell it to write the results to HDFS:
hive -e "insert overwrite directory '/path/to/output' select * from mytable ..."
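To make that more concrete, here is a sketch of what such a query could look like; the table and column names (sales, customer_id, amount) are hypothetical, not from your schema:

hive -e "insert overwrite directory '/path/to/output' select customer_id, sum(amount) from sales group by customer_id"

The results land in HDFS under /path/to/output as plain files, which is exactly what the export step below reads from.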
Export to Teradata
Cloudera released a Sqoop connector for Teradata last year, as described here, so it should do exactly what you want. Here's how you would do it:
sqoop export --connect jdbc:teradata://localhost/DATABASE=MY_BASE --username sqooptest --password xxxxx --table MY_DATA --export-dir /path/to/hive/output
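One caveat, assuming your export directory was produced by Hive as above: Hive's insert overwrite directory writes files with the \001 (Ctrl-A) field delimiter by default, so you may need to tell Sqoop about it, for example:

sqoop export --connect jdbc:teradata://localhost/DATABASE=MY_BASE --username sqooptest --password xxxxx --table MY_DATA --export-dir /path/to/hive/output --input-fields-terminated-by '\001'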
All of this can of course run on whatever schedule you want; in the end, what matters is the size of your cluster, and if you need things to go faster you can scale the cluster accordingly. The nice thing about Hive and Sqoop is that the processing is distributed across your cluster, so you have full control over the schedule.