Does creating multiple shards of your data and using multiple threads minimize training time?

My main problem: I have a 204 GB TFRecord training file with 2 million images, and a 28 GB validation TFRecord file with 302,900 images. It takes 8 hours to run one epoch, so training would take 33 days. I want to speed it up by using multiple threads and shards, but I am a little confused about a few things.
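For reference, this is roughly what my current single-file input pipeline looks like. The file name, feature keys, and parser below are simplified placeholders for my real decoding code:

import tensorflow as tf

def parse_fn(serialized_example):
    # Placeholder parser: decode one serialized tf.train.Example
    # (my real feature keys and preprocessing go here).
    features = tf.parse_single_example(
        serialized_example,
        features={"image": tf.FixedLenFeature([], tf.string),
                  "label": tf.FixedLenFeature([], tf.int64)})
    image = tf.image.decode_jpeg(features["image"], channels=3)
    return image, features["label"]

dataset = tf.data.TFRecordDataset("train.tfrecord")   # single 204 GB file
dataset = dataset.map(parse_fn)                       # currently no num_parallel_calls
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)
dataset = dataset.repeat()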

The tf.data.Dataset API has a shard function, and the documentation says the following about it:

Creates a Dataset that includes only 1/num_shards of this dataset.

This dataset operator is very useful when running distributed training, as it allows each worker to read a unique subset.

When reading a single input file, you can skip items as follows:

d = tf.data.TFRecordDataset(FLAGS.input_file)
d = d.shard(FLAGS.num_workers, FLAGS.worker_index)
d = d.repeat(FLAGS.num_epochs)
d = d.shuffle(FLAGS.shuffle_buffer_size)
d = d.map(parser_fn, num_parallel_calls=FLAGS.num_map_threads)

Important caveats:

Be sure to shard before you use any randomizing operator (such as shuffle). Generally, it is best if the shard operator is used early in the dataset pipeline. For example, when reading from a set of TFRecord files, shard before converting the dataset to input samples. This avoids reading every file on every worker. The following is an example of an efficient sharding strategy within a complete pipeline:

d = Dataset.list_files(FLAGS.pattern)
d = d.shard(FLAGS.num_workers, FLAGS.worker_index)
d = d.repeat(FLAGS.num_epochs)
d = d.shuffle(FLAGS.shuffle_buffer_size)
d = d.interleave(tf.data.TFRecordDataset,
                 cycle_length=FLAGS.num_readers, block_length=1)
d = d.map(parser_fn, num_parallel_calls=FLAGS.num_map_threads)

And here are my questions:

1- Is there any relationship between the number of TFRecord files and the number of shards? Does the number of shards (workers) depend on the number of CPUs you have, or on the number of TFRecord files you have? And how do I create the shards: by simply setting num_shards to a specific number, or by splitting the data into several files first and then setting the number of shards? Note that the number of workers is tied to the number of shards.
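To make question 1 concrete, here is how I picture the two options; the train-*.tfrecord pattern and the numbers are hypothetical:

import tensorflow as tf

NUM_WORKERS = 4    # hypothetical number of shards / workers
WORKER_INDEX = 0   # index of this particular worker

# Option A: keep the single 204 GB file and shard by record.
# Every worker still reads (and then discards) every record.
d = tf.data.TFRecordDataset("train.tfrecord")
d = d.shard(NUM_WORKERS, WORKER_INDEX)

# Option B: split the data into several files first and shard by file,
# so each worker only opens its own subset of the files.
files = tf.data.Dataset.list_files("train-*.tfrecord")
files = files.shard(NUM_WORKERS, WORKER_INDEX)
d = files.interleave(tf.data.TFRecordDataset, cycle_length=2, block_length=1)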

2- What is the advantage of creating multiple TFRecord files? Some people say it matters when you want better shuffling of the records, but since a shuffle method already exists in the tf.data.Dataset API, we should not need to do that ourselves; other people say it is simply about splitting your data into smaller files. My question: do I need to split the TFRecord file into several files as a first step?
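To be concrete about the "first step" I mean, this is the kind of splitting I have in mind; the file names and the shard count are made up:

import tensorflow as tf

NUM_FILES = 100   # hypothetical: split the one big file into 100 smaller ones

writers = [tf.python_io.TFRecordWriter("train-%05d-of-%05d.tfrecord" % (i, NUM_FILES))
           for i in range(NUM_FILES)]
# Round-robin the serialized examples from the original file into the new files.
for i, record in enumerate(tf.python_io.tf_record_iterator("train.tfrecord")):
    writers[i % NUM_FILES].write(record)
for w in writers:
    w.close()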

3- Now to num_threads in the map function (num_parallel_calls in the newer TensorFlow versions): should it be the same as the number of shards you have? When I searched, I found some people saying that if you have 10 shards and 2 threads, each thread gets 5 shards.
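For example, I am not sure whether something like this is intended, where the two numbers are set independently of each other (all values are made up, parse_fn as in my pipeline above):

import tensorflow as tf

NUM_SHARDS = 10          # e.g. 10 shards / workers
NUM_PARALLEL_CALLS = 2   # e.g. 2 CPU threads just for the parsing map

d = tf.data.Dataset.list_files("train-*.tfrecord")
d = d.shard(NUM_SHARDS, 0)   # this process only reads shard 0
d = d.interleave(tf.data.TFRecordDataset, cycle_length=2, block_length=1)
d = d.map(parse_fn, num_parallel_calls=NUM_PARALLEL_CALLS)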

4- What about the interleave function? I know how it works, as mentioned in this example, but again I am missing the connection between num_threads and cycle_length, for example.
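As an illustration of what I mean about cycle_length, a toy example with made-up file names:

import tensorflow as tf

files = tf.data.Dataset.list_files("train-*.tfrecord")
# cycle_length=4: records are pulled from 4 files concurrently;
# block_length=16: 16 consecutive records are taken from each file
# before moving on to the next file in the cycle.
d = files.interleave(tf.data.TFRecordDataset, cycle_length=4, block_length=16)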

5- If I want to use multiple GPUs, should I use shards, as mentioned in the accepted answer here?
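I am not sure whether this is the right approach, but this is the kind of per-GPU sharding I have in mind (a rough sketch, everything made up, parse_fn as above):

import tensorflow as tf

NUM_GPUS = 2   # hypothetical number of GPUs / shards

def make_input(gpu_index):
    files = tf.data.Dataset.list_files("train-*.tfrecord")
    files = files.shard(NUM_GPUS, gpu_index)   # one subset of files per GPU
    d = files.interleave(tf.data.TFRecordDataset, cycle_length=2, block_length=1)
    d = d.map(parse_fn, num_parallel_calls=2)
    d = d.shuffle(10000).repeat().batch(32)
    return d

# one input pipeline per GPU tower
inputs = [make_input(i) for i in range(NUM_GPUS)]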

In summary, I am confused about the relationship between the number of TFRecord files, num_shards (workers), cycle_length, and num_threads (num_parallel_calls). What is the best setup to minimize training time in both cases (using multiple GPUs and using a single GPU)?

