I am working on a project with the following scenario.
I have a table: superMerge (id, name, salary)
and I have two more tables: table1 and table2
All three tables (table1, table2 and superMerge) have the same structure.
Now my task is to insert into / update the superMerge table from table1 and table2. table1 is updated every 10 minutes and table2 every 20 minutes, so at time t = 20 minutes I have two jobs trying to update the same table (superMerge, in this case).
I want to understand how I can perform this parallel insert / update / merge into the superMerge table using Spark or any other Hadoop tool.
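For context, here is a minimal sketch of the kind of merge I have in mind (assuming Spark with Hive tables; `superMerge_staged` is just a placeholder name for a staging output):

```scala
import org.apache.spark.sql.SparkSession

object SuperMergeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("superMerge upsert sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Assumption: table1, table2 and superMerge are Hive tables with columns (id, name, salary).
    val incoming = spark.table("table1").unionByName(spark.table("table2"))

    // Rows arriving from table1/table2 should replace existing superMerge rows with the same id.
    val existing = spark.table("superMerge")
    val merged = existing
      .join(incoming, Seq("id"), "left_anti") // keep only existing rows whose id is not being updated
      .unionByName(incoming)                  // then append the fresh rows

    // Write the merged result out. Writing back to the same table that was read in this job
    // can fail for plain Hive tables, so this sketch stages the result under a different name.
    merged.write.mode("overwrite").saveAsTable("superMerge_staged")
  }
}
```

This sketch does not address the part I am actually unsure about: what happens when two such jobs (the 10-minute and the 20-minute one) try to write to superMerge at the same time.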