I am working with Pentaho Data Integration (aka Kettle) and I have several Transformations, let's call them A, B, C, D, E.
B depends on A, D depends on C, and E depends on both B and D. In a job I'd like to run the chains A -> B and C -> D in parallel:
-> A -> B _
-> C -> D----> E
where A and C run in parallel. Is there any way to execute E only if both B AND D were successful? Right now, looking at the Job metrics, E gets executed as soon as either B OR D is finished.
I just found http://forums.pentaho.org/showthread.php?t=75425 and it seems like it's not easily possible to achieve what I want.
I believe this can be done, but I don't have jobs big enough to really test this well, and it's awkward. Basically, you'll need 4 separate jobs in addition to your A, B, C, D, and E transformations. Let's call them Control Job, Job A_B, Job C_D, and Parallel Jobs.
You set them up like this:
Control Job:   start -> Parallel Jobs -> E

Parallel Jobs: start -> Job A_B
               start -> Job C_D
               (Set Start step to run next jobs in parallel)

Job A_B:       start -> A -> B

Job C_D:       start -> C -> D
The key is that A -> B and C -> D need to be in their own job step to retain the dependency. Then Parallel Jobs makes sure both parallel paths have completed before allowing control to proceed to E.
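
To make the control flow concrete, here is a minimal Python sketch of the same fork/join idea. It is not Kettle code and does not use any PDI API: `run_transformation`, `run_chain`, and `control_job` are hypothetical stand-ins I made up for illustration, and the `failing` parameter only exists to simulate a transformation that fails.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

log = []                    # order in which the (simulated) transformations ran
_lock = threading.Lock()    # protects log, since two chains run concurrently


def run_transformation(name, succeed=True):
    """Hypothetical stand-in for running one transformation; True means success."""
    with _lock:
        log.append(name)
    return succeed


def run_chain(names, failing=()):
    """Like Job A_B or Job C_D: run steps in order, stop at the first failure."""
    for name in names:
        if not run_transformation(name, name not in failing):
            return False
    return True


def control_job(failing=()):
    """Like Control Job: run both chains in parallel, then E only if both succeeded."""
    # "Parallel Jobs": start both sub-jobs and wait until both have finished.
    with ThreadPoolExecutor(max_workers=2) as pool:
        ab = pool.submit(run_chain, ["A", "B"], failing)
        cd = pool.submit(run_chain, ["C", "D"], failing)
        results = [ab.result(), cd.result()]
    # Control only reaches E after the join above.
    if all(results):
        return run_transformation("E")
    return False
```

Calling `control_job()` runs E last, after both chains have finished, while `control_job(failing={"D"})` never runs E. The point is the same as in the job layout: each chain is a single unit, and E sits behind the unit that waits for both.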