Splitting data by column value into an arbitrary number of tables using an ETL tool

I am trying to split a table into several tables based on the value of one column, using Talend Open Studio. Say this column can contain any integer value: 1, 2, 3, and so on. Depending on this value, the rows should go to table_1, table_2, table_3, and so on.

Ideally I would solve this without knowing the number of distinct values in this column in advance, but for now we can assume that all the output tables already exist. The point is that the number of distinct values, and therefore the number of tables, is large enough that configuring individual filters manually is not an option.

Can this be solved using Talend Open Studio or a similar open-source ETL tool such as Pentaho Kettle?

Of course, I could just write a simple script myself, but I would prefer to use the right ETL tool, as the complete ETL process is quite complicated.

+4
3 answers

In PDI (Pentaho Kettle), you can do this with partitioning (it is a right-click option on the step, IIRC). Partitioning in PDI is designed specifically for this kind of problem.
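
As a rough illustration of the idea behind partitioning (this is plain Python, not the PDI API), rows are grouped by the value of the key column and each group is handed to its own target. The column name `category` and the sample rows are made up for the example.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows into partitions keyed by the value found in one column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


# Hypothetical sample data; "category" stands in for the splitting column.
rows = [
    {"id": 10, "category": 1},
    {"id": 11, "category": 2},
    {"id": 12, "category": 1},
    {"id": 13, "category": 3},
]

partitions = partition_rows(rows, "category")
for value, part in sorted(partitions.items()):
    # Each partition would be written to its own table, e.g. table_1.
    print(f"table_{value}: {[r['id'] for r in part]}")
```

The number of partitions is not fixed up front: a new value in the column simply produces a new group.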

+2

Yes, you can split the data into different tables based on one column, but for this you need to create the tables dynamically:

tFileInputDelimited -> tFlowToIterate -> tFixedFlowInput. You can use globalMap to get the column values and use them to split the data into different tables. You can also use the globalMap value for the column that separates the data in the table name.
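
The same iterate-and-name pattern can be sketched outside Talend. The following is a minimal sketch using Python's built-in `sqlite3` module, not the Talend job itself: it loops over the distinct values of the key column, builds the table name from each value, creates the table if it is missing, and copies the matching rows. The table name `source`, the column `category`, and the helper `split_into_tables` are assumptions made for the example.

```python
import sqlite3


def split_into_tables(conn, source_table, key_column):
    """Copy rows of source_table into table_<value>, one table per distinct key."""
    cur = conn.cursor()
    # Counterpart of the iterate step: collect the distinct key values first.
    values = [v for (v,) in cur.execute(
        f"SELECT DISTINCT {key_column} FROM {source_table} ORDER BY {key_column}"
    )]
    created = []
    for value in values:
        # Table names cannot be bound as SQL parameters, so the value is
        # validated before it is interpolated into the identifier.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"unexpected key value: {value!r}")
        target = f"table_{value}"
        # Create an empty table with the same columns if it does not exist yet.
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS {target} AS "
            f"SELECT * FROM {source_table} WHERE 0"
        )
        cur.execute(
            f"INSERT INTO {target} SELECT * FROM {source_table} "
            f"WHERE {key_column} = ?",
            (value,),
        )
        created.append(target)
    conn.commit()
    return created


# Hypothetical in-memory data set for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, category INTEGER)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 1), (13, 3)],
)
created = split_into_tables(conn, "source", "category")
print(created)
```

Because the set of values is read from the data, no per-value configuration is needed, which is the property the question asks for.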

+1

The first solution that came to my mind was to use a replicate component to pass the current row to three filters that act as guards and only let through rows with 1, 2 or 3 in this column. pic: http://i.imgur.com/FmvwU.png

But you can also build the table name dynamically, if that is what you want, pic: http://i.imgur.com/8LR7Q.png
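
To make the replicate-and-filter variant concrete, here is a small plain-Python sketch (not Talend code): every row is offered to every filter, and each filter lets through only the rows with its own value. The column name `category`, the sample rows, and the helper names are hypothetical.

```python
def make_filter(expected):
    """Build a guard that accepts only rows whose category equals expected."""
    return lambda row: row["category"] == expected


def replicate_and_filter(rows, filters):
    """Send every row to every filter and collect what each filter accepts."""
    outputs = {name: [] for name in filters}
    for row in rows:
        for name, accepts in filters.items():
            if accepts(row):
                outputs[name].append(row)
    return outputs


# One hand-written filter per expected value, as in the first picture.
filters = {f"table_{v}": make_filter(v) for v in (1, 2, 3)}

# Hypothetical sample rows; the last one has a value no filter expects.
rows = [
    {"id": 10, "category": 1},
    {"id": 11, "category": 2},
    {"id": 12, "category": 3},
    {"id": 13, "category": 4},
]

outputs = replicate_and_filter(rows, filters)
for name, accepted in outputs.items():
    print(name, [r["id"] for r in accepted])
```

The row with value 4 ends up in no output, which shows why a fixed set of filters only works when the values are known in advance; building the table name from the value avoids that limitation.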

0

Source: https://habr.com/ru/post/1441320/

