This question was partially asked here and here with no follow-up, so maybe this is not the right place to ask it, but I have found a little more information and hope it is enough to get an answer.
I am trying to train object_detection on my own dataset of about 1k photos. I used the provided pipeline configuration file "ssd_inception_v2_pets.config", and I believe I set up the training data correctly. The program appears to start training just fine; earlier it produced a warning and an error when it could not read the data, and I fixed that.
My train_config settings are as follows, although I changed a few numbers to try to run it with fewer resources.
train_config: {
  batch_size: 1000  # also tried 1, 10, and 100
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.04  # also tried 0.004
          decay_steps: 800  # also tried 800720, 80072
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "~/Downloads/ssd_inception_v2_coco_11_06_2017/model.ckpt"  # using the Inception checkpoint
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
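For reference, this is roughly how the same settings could be lowered programmatically instead of editing the file by hand. It is only a sketch: it assumes the 2017-era object_detection protos are compiled and importable, and batch_queue_capacity / prefetch_queue_capacity are field names I have seen suggested for that era's train.proto but have not verified against my release.

import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2  # assumes compiled protos

def shrink_train_config(in_path, out_path):
    """Rewrite a pipeline config with lower-memory training settings."""
    config = pipeline_pb2.TrainEvalPipelineConfig()
    with tf.gfile.GFile(in_path, "r") as f:
        text_format.Merge(f.read(), config)

    config.train_config.batch_size = 1  # smallest possible batch
    # The two fields below come from the 2017-era train.proto and are an
    # assumption on my part; drop them if they are absent from your protos.
    config.train_config.batch_queue_capacity = 2
    config.train_config.prefetch_queue_capacity = 2

    with tf.gfile.GFile(out_path, "w") as f:
        f.write(text_format.MessageToString(config))

shrink_train_config("ssd_inception_v2_pets.config",
                    "ssd_inception_v2_pets_lowmem.config")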
Basically, I think what happens is that the machine quickly runs out of resources (memory), and I wonder whether anyone has a configuration that trades longer training time for lower resource usage?
OR am I mistaken about why the process is being killed, and is there a way to get more information about it from the kernel?
This is the dmesg output I get after the process has been killed.
[711708.975215] Out of memory: Kill process 22087 (python) score 517 or sacrifice child
[711708.975221] Killed process 22087 (python) total-vm:9086536kB, anon-rss:6114136kB, file-rss:24kB, shmem-rss:0kB
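To check whether it really is steady memory growth that triggers the kill, the training process could be watched from a second terminal with something like the sketch below (it assumes psutil is installed; the PID is just the one reported in the dmesg line and would differ on each run).

import time
import psutil  # third-party: pip install psutil

def watch_rss(pid, interval_s=5):
    """Print the resident set size of a process until it exits."""
    proc = psutil.Process(pid)
    try:
        while proc.is_running():
            rss_mb = proc.memory_info().rss / (1024.0 * 1024.0)
            print("RSS: %.0f MiB" % rss_mb)
            time.sleep(interval_s)
    except psutil.NoSuchProcess:
        pass  # the process has already exited (or been killed)

watch_rss(22087)  # example PID taken from the dmesg output above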