I run a Spark Streaming job 24x7 and use the updateStateByKey function to keep the computed historical state, as in the NetworkWordCount example.
I am feeding it a file with 300,000 (3 lakh) records, sleeping 1 second after every 1,500 records, and I run it on 3 workers.
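For context, my setup is essentially the stateful word count pattern. A minimal sketch of what I mean (assuming the Spark 1.x Scala API; the checkpoint path and socket source are just placeholders, not my real configuration):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    // updateStateByKey requires a checkpoint directory (hypothetical path)
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")

    // Merge the counts arriving in this batch into the running total for each key
    val updateFunc = (newValues: Seq[Int], state: Option[Int]) => {
      Some(newValues.sum + state.getOrElse(0))
    }

    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey[Int](updateFunc)

    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}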
- As the state kept by updateStateByKey grows over time, the program throws the following exception:
ERROR Executor: Exception in task ID 1635
java.lang.ArrayIndexOutOfBoundsException: 3
14/10/23 21:20:43 ERROR TaskSetManager: Task 29170.0:2 failed 1 times; aborting job
14/10/23 21:20:43 ERROR DiskBlockManager: Exception while deleting local spark dir: /var/folders/3j/9hjkw0890sx_qg9yvzlvg64cf5626b/T/spark-local-20141023204346-b232
java.io.IOException: Failed to delete: /var/folders/3j/9hjkw0890sx_qg9yvzlvg64cf5626b/T/spark-local-20141023204346-b232/24
14/10/23 21:20:43 ERROR Executor: Exception in task ID 8037
java.io.FileNotFoundException: /var/folders/3j/9hjkw0890sx_qg9yvzlvg64cf5626b/T/spark-local-20141023204346-b232/22/shuffle_81_0_1 (No such file or directory)
    at java.io.FileOutputStream.open(Native Method)
How should I handle this? I think the state kept by updateStateByKey needs to be reset or trimmed periodically, since it grows rapidly. Please share some examples of when and how to reset updateStateByKey state, along the lines of the sketch below, or do I have some other problem? Please shed some light.
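One idea I had is to drop idle keys by returning None from the update function, which removes that key from the state. A rough sketch of what I mean, reusing the lines DStream and ssc from the sketch above (the (count, lastSeen) state shape and the 1-hour TTL are just assumptions for illustration):

// State holds the running count plus the last time the key was seen,
// so idle keys can be dropped instead of accumulating forever.
val ttlMs = 60 * 60 * 1000L  // hypothetical 1-hour TTL

val updateWithExpiry = (newValues: Seq[Int], state: Option[(Int, Long)]) => {
  val now = System.currentTimeMillis()
  val (oldCount, lastSeen) = state.getOrElse((0, now))
  if (newValues.isEmpty && now - lastSeen > ttlMs) {
    None  // returning None removes this key from the state RDD
  } else {
    val newLastSeen = if (newValues.nonEmpty) now else lastSeen
    Some((oldCount + newValues.sum, newLastSeen))
  }
}

val expiringCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .updateStateByKey[(Int, Long)](updateWithExpiry)

Is this the right way to keep the state bounded, or is there a better pattern?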
Any help is greatly appreciated. Thank you for your time.