TensorFlow object detection training killed, resource starvation?

This question was partially asked here and here with no resolution, so maybe this is not the right place to ask, but I have since found out a little more information and I am hoping to get an answer to these questions.

I am trying to train object_detection on my own library of about 1k photos. I used the provided pipeline configuration file "ssd_inception_v2_pets.config", and I believe I set up the training data correctly. The program appears to start training just fine; earlier, when it could not read the data, it failed with a warning and an error, and I fixed that.

My train_config settings are as follows, although I have changed a few of the numbers to try to get it to run with fewer resources.

train_config: {
  batch_size: 1000 #also tried 1, 10, and 100
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.04  # also tried .004
          decay_steps: 800 # also tried 800720 and 80072
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "~/Downloads/ssd_inception_v2_coco_11_06_2017/model.ckpt" #using inception checkpoint
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
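For a sense of scale, here is a quick back-of-the-envelope estimate (my own sketch, assuming SSD-Inception-v2's fixed 300x300 float32 RGB input) of what the raw input batch alone costs at the batch sizes I tried:

# Rough input-batch memory estimate. Assumes 300x300x3 float32 input tensors;
# real usage is far higher once activations and gradients are counted.

def input_batch_bytes(batch_size, height=300, width=300, channels=3, dtype_bytes=4):
    return batch_size * height * width * channels * dtype_bytes

for bs in (1, 10, 100, 1000):
    gib = input_batch_bytes(bs) / 2**30
    print(f"batch_size={bs:>4}: ~{gib:.3f} GiB for the input tensor alone")

At batch_size 1000 that is about 1 GiB before a single activation or gradient has been allocated.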

Basically, I think what is happening is that the computer quickly runs out of some resource, and I wonder whether anyone knows of settings that trade longer training time for lower resource use?

Or am I mistaken about why the process is being killed, and is there a way to get more information about that from the kernel?

This is the dmesg output I get after the process has been killed:

[711708.975215] Out of memory: Kill process 22087 (python) score 517 or sacrifice child
[711708.975221] Killed process 22087 (python) total-vm:9086536kB, anon-rss:6114136kB, file-rss:24kB, shmem-rss:0kB
3 answers

Okay, so after more digging around, it turns out the dmesg output was telling me exactly what happened: the process was killed by the kernel because it was starved for memory.

8 GB of RAM was simply not enough as I had it configured; after cutting the batch size down and shrinking the input queues as suggested in the other answers, training was able to run.
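If you want to watch the memory climb before the OOM killer strikes, here is a minimal watchdog sketch (my own, using the third-party psutil package; pass the training process's PID on the command line):

# Prints the resident set size (RSS) of a process every few seconds so you can
# see memory growth before the kernel's OOM killer acts.
# Requires the third-party psutil package (pip install psutil).
import sys
import time

import psutil

def watch(pid, interval_s=5):
    proc = psutil.Process(pid)
    try:
        while True:
            rss_gib = proc.memory_info().rss / 2**30
            avail_gib = psutil.virtual_memory().available / 2**30
            print(f"rss={rss_gib:.2f} GiB, system available={avail_gib:.2f} GiB")
            time.sleep(interval_s)
    except psutil.NoSuchProcess:
        print("process exited (or was killed)")

if __name__ == "__main__":
    watch(int(sys.argv[1]))  # PID of the training process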


I ran into the same problem. In my case the memory kept filling up because of the data_augmentation_options ssd_random_crop option; even with a batch size of 1 usage kept growing, so I removed that option and the problem went away.

Also note that epsilon should be a very small number, such as 1e-6, since it exists only to avoid division by zero. Your config sets epsilon to 1, and it definitely should not be 1.
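To illustrate the effect, here is a toy version of an RMSProp-style update (my own sketch with made-up numbers, not anything from the Object Detection API):

import math

# One RMSProp-style parameter update: step = lr * grad / (sqrt(avg_sq_grad) + eps).
# epsilon only keeps the denominator from hitting zero, so it should be tiny
# (e.g. 1e-6); a large epsilon swamps sqrt(avg_sq_grad) and disables the
# adaptive scaling the optimizer is built around.

def rmsprop_step(grad, avg_sq_grad, lr=0.004, eps=1e-6):
    return lr * grad / (math.sqrt(avg_sq_grad) + eps)

grad, avg_sq_grad = 0.01, 1e-4  # a typical small gradient and its running average
print(rmsprop_step(grad, avg_sq_grad, eps=1e-6))  # ~0.004   (adaptive scaling works)
print(rmsprop_step(grad, avg_sq_grad, eps=1.0))   # ~0.00004 (epsilon dominates)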


I ran into this too. Here is my understanding of what is going on:

  • The OOM killer - when the machine runs completely out of memory, the kernel picks the highest-scoring process and kills it (that is exactly the dmesg message you posted).
  • Swap - before it comes to that, the OS starts paging memory out to swap on disk. Disk is roughly 10-100 times slower than RAM, so training grinds to a crawl long before the process is actually killed.
  • The input queues: the input reader pre-loads examples into in-memory queues so the trainer never has to wait on disk. With the default sizes these queues hold thousands of examples, and they account for a large share of the memory use.

All of this concerns ordinary host RAM, since you are training on the CPU (no GPU).

So, here is what worked for me:

For the train input reader:

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/pet_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
  queue_capacity: 500    # change this number
  min_after_dequeue: 250 # change this number (strictly less than the above)
}

Do the same for eval_input_reader. I use 20 and 10 there, and 400 and 200 for train, and it works well for me. With these settings training fits in 8 GB of RAM.
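As a rough sketch of why this helps (my assumptions: each queued example carries roughly one decoded 600x1024 RGB uint8 image, and the proto default queue_capacity is 2000):

# Back-of-the-envelope queue memory: queue_capacity examples held in RAM at once.
# Encoded records would be smaller, decoded float tensors far larger.

def queue_gib(queue_capacity, bytes_per_example=600 * 1024 * 3):
    return queue_capacity * bytes_per_example / 2**30

print(f"queue_capacity 2000 (default): ~{queue_gib(2000):.2f} GiB")
print(f"queue_capacity  400 (reduced): ~{queue_gib(400):.2f} GiB")

Shrinking the queue from 2000 to a few hundred examples frees several gigabytes at the cost of the reader occasionally having to wait on disk.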

