I have batch jobs that I want to run on Kubernetes. This is how I understand Jobs:
If I choose restartPolicy: Never, then when the Job fails, the Pod is destroyed and the work is rescheduled onto (possibly) another node. If I choose restartPolicy: OnFailure, the container is restarted inside the existing Pod. I'd consider a Job that has failed a certain number of times to be unrecoverable. Is there a way to stop it from being rescheduled or restarted after a certain period of time, and to clean up those unrecoverable Jobs?
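To make the two policies concrete, here is a minimal sketch that builds a batch/v1 Job manifest as a plain Python dict. The helper name make_job_manifest and the example image are hypothetical; the fields (apiVersion, kind, spec.template.spec.restartPolicy, containers) are the standard Job fields. A Job's Pod template only accepts Never or OnFailure, so the sketch rejects anything else.

```python
from typing import Any, Dict

# A Job's Pod template accepts only these two policies; "Always" is rejected for Jobs.
ALLOWED_JOB_RESTART_POLICIES = {"Never", "OnFailure"}


def make_job_manifest(name: str, image: str, restart_policy: str) -> Dict[str, Any]:
    """Hypothetical helper: return a minimal batch/v1 Job manifest as a dict."""
    if restart_policy not in ALLOWED_JOB_RESTART_POLICIES:
        raise ValueError(
            f"Jobs require restartPolicy Never or OnFailure, got {restart_policy!r}"
        )
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Never: a failed container leaves its Pod in the Failed phase and
                    # the Job controller creates a replacement Pod, possibly on another node.
                    # OnFailure: the container is restarted inside the same Pod.
                    "restartPolicy": restart_policy,
                    "containers": [{"name": name, "image": image}],
                }
            }
        },
    }
```

The dict can be serialized to YAML or passed to a Kubernetes client; switching between the two behaviours described above is a one-field change.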
My current idea for a workaround is a watchdog process that looks at retryTimes and cleans up Jobs after a specified number of retries.
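A minimal sketch of that watchdog idea follows. It assumes restartPolicy: Never, so that each failed attempt appears as a failed Pod in the Job's status.failed count, which stands in here for the retryTimes mentioned above. The JobStatus type, the function names and the max_failures threshold are hypothetical. The selection logic is kept pure so it can be tested without a cluster; clean_up_namespace wires it to the official kubernetes Python client (BatchV1Api.list_namespaced_job and delete_namespaced_job) and is not exercised here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class JobStatus:
    """Hypothetical minimal view of a Job: its name and its failed-Pod count."""
    name: str
    failed: Optional[int]  # mirrors Job .status.failed, which is None until a Pod fails


def jobs_to_clean_up(jobs: Iterable[JobStatus], max_failures: int) -> List[str]:
    """Return the names of Jobs whose failed-Pod count has reached max_failures."""
    if max_failures < 1:
        raise ValueError("max_failures must be at least 1")
    return [j.name for j in jobs if (j.failed or 0) >= max_failures]


def clean_up_namespace(namespace: str, max_failures: int) -> List[str]:
    """Delete unrecoverable Jobs in one namespace; returns the deleted names."""
    # Imported lazily so the pure logic above works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    api = client.BatchV1Api()
    jobs = [
        JobStatus(j.metadata.name, j.status.failed)
        for j in api.list_namespaced_job(namespace).items
    ]
    doomed = jobs_to_clean_up(jobs, max_failures)
    for name in doomed:
        # "Background" propagation also garbage-collects the Job's Pods.
        api.delete_namespaced_job(
            name=name, namespace=namespace, propagation_policy="Background"
        )
    return doomed
```

A watchdog would call clean_up_namespace periodically (for example from a loop or a CronJob). Separating the decision from the API calls keeps the threshold logic easy to test and change.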