I'm facing the following problem in two of my Spark jobs. The first is an algorithm that processes 1M records and performs a reduceByKey over 20000 partitions on a large number of keys.
The second is Spark's random forest on 300000 records over 10000 partitions.
In both cases I hit the same problem: Spark kills and creates hundreds of executors without making any progress...
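For reference, the per-key aggregation that reduceByKey performs can be illustrated on plain Scala collections (the key/value data here is made up; in Spark the same operation is distributed across partitions, which is where the shuffle over 20000 partitions comes in):

```scala
// Plain-Scala analogue of a reduceByKey summing values per key.
// In Spark this would be rdd.reduceByKey(_ + _, numPartitions).
val pairs = Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4))
val reduced: Map[String, Int] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
// reduced: Map("a" -> 4, "b" -> 6)
```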
INFO AppClient$ClientEndpoint: Executor updated: app-20170111091905-0328/16 is now RUNNING
INFO AppClient$ClientEndpoint: Executor updated: app-20170111091905-0328/15 is now EXITED (Command exited with code 1)
INFO SparkDeploySchedulerBackend: Executor app-20170111091905-0328/15 removed: Command exited with code 1
INFO BlockManagerMasterEndpoint: Trying to remove executor 15 from BlockManagerMaster.
In the first case it happens at the reduceByKey step.
In the second case it happens at: reduce at randomForest.scala:94
My configuration is as follows: Spark 1.6.2, 3 slaves with one worker each (4 cores, 6 GB memory per worker), and 2 GB memory for the driver.
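To make the setup concrete, this is roughly the spark-submit invocation I use (the master host, class name, and jar path are placeholders, not my real values):

```shell
# Standalone-mode submit matching the configuration above:
# 3 workers x 4 cores = 12 total cores, 6g per executor, 2g driver.
spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 6g \
  --total-executor-cores 12 \
  --driver-memory 2g \
  --class com.example.MyJob \
  my-job.jar
```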
I tried to find an answer, but all I found was to increase akka.frameSize, which I did without any change, and to increase the memory on the workers...
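Concretely, this is how I set the frame size (the Spark 1.6 property name; the value 256 is just the one I tried, not a recommendation):

```
# spark-defaults.conf (or pass via --conf spark.akka.frameSize=256)
# Value is the maximum message size in MB.
spark.akka.frameSize  256
```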
I also read that, in the second case, it can come from too large a number of trees in the forest.
Moreover, I didn't find any clear explanation that justifies those answers.
Thank you in advance.