Limit the number of cores used

I am trying to limit the number of cores that a tf session uses, but it does not work. This is how I initialize the session:

    sess = tf.Session(config=tf.ConfigProto(
        inter_op_parallelism_threads=1,
        intra_op_parallelism_threads=1,
        use_per_session_threads=True))

The system has 12 cores / 24 threads, and I see that 40-60% of them are in use at any given time. The system also has 8 GPUs, but I built the entire graph under tf.device('/cpu:0').

UPDATE. To clarify, the graph itself is a simple LSTM-RNN, very close to the examples in the TF source code. For completeness, here is the full graph definition:

    # LSTMCell and rnn come from TensorFlow's bundled RNN modules of this era
    # (tensorflow.models.rnn); the exact import paths depend on the TF version.
    import tensorflow as tf

    node_input = tf.placeholder(tf.float32, [n_steps, batch_size, input_size], name='input')
    list_input = [tf.reshape(i, (batch_size, input_size))
                  for i in tf.split(0, n_steps, node_input)]
    node_target = tf.placeholder(tf.float32, [n_steps, batch_size, output_size], name='target')
    node_target_flattened = tf.reshape(tf.transpose(node_target, perm=[1, 0, 2]),
                                       [-1, output_size])
    node_max_length = tf.placeholder(tf.int32, name='batch_max_length')
    node_cell_initializer = tf.random_uniform_initializer(-0.1, 0.1)
    node_cell = LSTMCell(state_size, input_size, initializer=node_cell_initializer)
    node_initial_state = node_cell.zero_state(batch_size, tf.float32)
    nodes_output, nodes_state = rnn(node_cell, list_input,
                                    initial_state=node_initial_state,
                                    sequence_length=node_max_length)
    node_output_flattened = tf.reshape(tf.concat(1, nodes_output), [-1, state_size])
    node_softmax_w = tf.Variable(tf.random_uniform([state_size, output_size]), name='softmax_w')
    node_softmax_b = tf.Variable(tf.zeros([output_size]), name='softmax_b')
    node_logit = tf.matmul(node_output_flattened, node_softmax_w) + node_softmax_b
    node_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(node_logit, node_target_flattened,
                                                                 name='cross_entropy')
    node_loss = tf.reduce_mean(node_cross_entropy, name='loss')
    node_optimizer = tf.train.AdamOptimizer().minimize(node_loss)
    node_op_initializer = tf.initialize_all_variables()

It is important to note that if I pass the appropriate parameters the first time I call tf.Session, the session does run on a single core. The problem is that I cannot change this behavior on subsequent runs, even though I use use_per_session_threads, which is supposed to make these settings session-specific. That is, even after I close the session with sess.close() and start a new one with different parameters, the original behavior remains unchanged unless I restart the Python kernel (which is very expensive, because it takes almost an hour).
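To make this concrete, here is a minimal sketch of the pattern that fails for me (the thread counts are illustrative):

    import tensorflow as tf

    config1 = tf.ConfigProto(inter_op_parallelism_threads=1,
                             intra_op_parallelism_threads=1,
                             use_per_session_threads=True)
    sess = tf.Session(config=config1)  # runs on one core, as requested
    # ... training ...
    sess.close()

    config2 = tf.ConfigProto(inter_op_parallelism_threads=4,
                             intra_op_parallelism_threads=4,
                             use_per_session_threads=True)
    sess = tf.Session(config=config2)  # still behaves as if config1 were in effect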

2 answers

As an optimization, TensorFlow creates static thread pools when the first DirectSession is created, and these pools are then reused by all later sessions. If you want to change this, specify several different thread pools in session_inter_op_thread_pool and select which one to use for each run.
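A minimal sketch of this approach, assuming the TF 1.x API (the pool sizes are illustrative; session_inter_op_thread_pool in ConfigProto defines the pools, and the inter_op_thread_pool field of RunOptions selects one by index per run):

    import tensorflow as tf

    config = tf.ConfigProto()
    # Define two inter-op pools up front; they are created once per process.
    config.session_inter_op_thread_pool.add().num_threads = 1
    config.session_inter_op_thread_pool.add().num_threads = 8

    sess = tf.Session(config=config)
    # Pick a pool by index for each call to sess.run.
    single_core = tf.RunOptions(inter_op_thread_pool=0)
    # sess.run(fetches, options=single_core)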


use_per_session_threads affects only inter_op_parallelism_threads, not intra_op_parallelism_threads. intra_op_parallelism_threads is used for the native thread pool (see here), which is always global, so subsequent sessions can no longer change it.

Note that other TF functions can also trigger the initialization of the native thread pool, so it may already be initialized before you create your first tf.Session. One example is tensorflow.python.client.device_lib.list_local_devices().

I work around this by creating a dummy session with the appropriate values very early in my Python script.
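A minimal sketch of this workaround (the thread counts are illustrative); the point is that it must run before any other TF call that might touch the thread pools:

    import tensorflow as tf

    # Create (and immediately discard) a session as the very first TF call
    # in the process, so the global thread pools are initialized with the
    # desired sizes before anything else can initialize them with defaults.
    tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
                                     intra_op_parallelism_threads=1)).close()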


Source: https://habr.com/ru/post/1238995/
