TensorFlow object detection training killed, resource starvation?

This question was partially asked here and here with no resolution, so maybe this is not the right place to ask, but I have since found out a little more information and I am hoping to get an answer to these questions.

I am trying to train object_detection on my own library of about 1k photos. I used the provided pipeline configuration file "ssd_inception_v2_pets.config", and I believe I set up the training data correctly. The program appears to start training just fine; earlier, when it could not read the data, it failed with a warning and an error, and I fixed that.

My train_config settings are as follows, although I have changed a few of the numbers to try to get it to run with fewer resources.

train_config: {
  batch_size: 1000 #also tried 1, 10, and 100
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.04  # also tried .004
          decay_steps: 800 # also tried 800720 and 80072
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "~/Downloads/ssd_inception_v2_coco_11_06_2017/model.ckpt" #using inception checkpoint
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
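For a sense of scale, here is a quick back-of-the-envelope estimate (my own sketch, assuming SSD-Inception-v2's fixed 300x300 float32 RGB input) of what the raw input batch alone costs at the batch sizes I tried:

# Rough input-batch memory estimate. Assumes 300x300x3 float32 input tensors;
# real usage is far higher once activations and gradients are counted.

def input_batch_bytes(batch_size, height=300, width=300, channels=3, dtype_bytes=4):
    return batch_size * height * width * channels * dtype_bytes

for bs in (1, 10, 100, 1000):
    gib = input_batch_bytes(bs) / 2**30
    print(f"batch_size={bs:>4}: ~{gib:.3f} GiB for the input tensor alone")

At batch_size 1000 that is about 1 GiB before a single activation or gradient has been allocated.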

Basically, I think what is happening is that the computer quickly runs out of some resource, and I wonder whether anyone knows of settings that trade longer training time for lower resource use?

Or am I mistaken about why the process is being killed, and is there a way to get more information about that from the kernel?

This is the dmesg output I get after the process has been killed:

[711708.975215] Out of memory: Kill process 22087 (python) score 517 or sacrifice child
[711708.975221] Killed process 22087 (python) total-vm:9086536kB, anon-rss:6114136kB, file-rss:24kB, shmem-rss:0kB
3 answers

Okay, so after more digging around, it turns out the dmesg output was telling me exactly what happened: the process was killed by the kernel because it was starved for memory.

8 GB of RAM was simply not enough as I had it configured; after cutting the batch size down and shrinking the input queues as suggested in the other answers, training was able to run.
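If you want to watch the memory climb before the OOM killer strikes, here is a minimal watchdog sketch (my own, using the third-party psutil package; pass the training process's PID on the command line):

# Prints the resident set size (RSS) of a process every few seconds so you can
# see memory growth before the kernel's OOM killer acts.
# Requires the third-party psutil package (pip install psutil).
import sys
import time

import psutil

def watch(pid, interval_s=5):
    proc = psutil.Process(pid)
    try:
        while True:
            rss_gib = proc.memory_info().rss / 2**30
            avail_gib = psutil.virtual_memory().available / 2**30
            print(f"rss={rss_gib:.2f} GiB, system available={avail_gib:.2f} GiB")
            time.sleep(interval_s)
    except psutil.NoSuchProcess:
        print("process exited (or was killed)")

if __name__ == "__main__":
    watch(int(sys.argv[1]))  # PID of the training process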


I ran into the same problem. In my case the memory kept filling up because of the data_augmentation_options ssd_random_crop option; even with a batch size of 1 usage kept growing, so I removed that option and the problem went away.

Also note that epsilon should be a very small number, such as 1e-6, since it exists only to avoid division by zero. Your config sets epsilon to 1, and it definitely should not be 1.
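To illustrate the effect, here is a toy version of an RMSProp-style update (my own sketch with made-up numbers, not anything from the Object Detection API):

import math

# One RMSProp-style parameter update: step = lr * grad / (sqrt(avg_sq_grad) + eps).
# epsilon only keeps the denominator from hitting zero, so it should be tiny
# (e.g. 1e-6); a large epsilon swamps sqrt(avg_sq_grad) and disables the
# adaptive scaling the optimizer is built around.

def rmsprop_step(grad, avg_sq_grad, lr=0.004, eps=1e-6):
    return lr * grad / (math.sqrt(avg_sq_grad) + eps)

grad, avg_sq_grad = 0.01, 1e-4  # a typical small gradient and its running average
print(rmsprop_step(grad, avg_sq_grad, eps=1e-6))  # ~0.004   (adaptive scaling works)
print(rmsprop_step(grad, avg_sq_grad, eps=1.0))   # ~0.00004 (epsilon dominates)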


I ran into this too. Here is my understanding of what is going on:

  • The OOM killer - when the machine runs completely out of memory, the kernel picks the highest-scoring process and kills it (that is exactly the dmesg message you posted).
  • Swap - before it comes to that, the OS starts paging memory out to swap on disk. Disk is roughly 10-100 times slower than RAM, so training grinds to a crawl long before the process is actually killed.
  • The input queues: the input reader pre-loads examples into in-memory queues so the trainer never has to wait on disk. With the default sizes these queues hold thousands of examples, and they account for a large share of the memory use.

All of this concerns ordinary host RAM, since you are training on the CPU (no GPU).

So, here is what worked for me:

For the train input reader:

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/pet_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
  queue_capacity: 500    # change this number
  min_after_dequeue: 250 # change this number (strictly less than the above)
}

Do the same for eval_input_reader. I use 20 and 10 there, and 400 and 200 for train, and it works well for me. With these settings training fits in 8 GB of RAM.
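As a rough sketch of why this helps (my assumptions: each queued example carries roughly one decoded 600x1024 RGB uint8 image, and the proto default queue_capacity is 2000):

# Back-of-the-envelope queue memory: queue_capacity examples held in RAM at once.
# Encoded records would be smaller, decoded float tensors far larger.

def queue_gib(queue_capacity, bytes_per_example=600 * 1024 * 3):
    return queue_capacity * bytes_per_example / 2**30

print(f"queue_capacity 2000 (default): ~{queue_gib(2000):.2f} GiB")
print(f"queue_capacity  400 (reduced): ~{queue_gib(400):.2f} GiB")

Shrinking the queue from 2000 to a few hundred examples frees several gigabytes at the cost of the reader occasionally having to wait on disk.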

