Torch/Lua: what should the structure of my neural network be for mini-batch training?

I am still working on implementing a mini-batch gradient update for my Siamese neural network. Previously I had an implementation problem that was correctly resolved here.

Now I have realized that there is a mistake in the architecture of my neural network, due to my incomplete understanding of the correct implementation.

Until now I have always used a plain, non-mini-batch gradient descent approach, in which I passed the training elements one by one to the gradient update. Now I want to implement a mini-batch gradient update, starting with mini-batches of N = 2 elements.

My question is: how can I change the architecture of my Siamese neural network so that it can process a mini-batch of N = 2 elements instead of a single element?

This is the (simplified) architecture of my Siamese neural network:

    nn.Sequential {
      [input -> (1) -> (2) -> output]
      (1): nn.ParallelTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> output]
          |      (1): nn.Linear(6 -> 3)
          |      (2): nn.Linear(3 -> 2)
          |    }
          |`-> (2): nn.Sequential {
          |      [input -> (1) -> (2) -> output]
          |      (1): nn.Linear(6 -> 3)
          |      (2): nn.Linear(3 -> 2)
          |    }
           ... -> output
      }
      (2): nn.CosineDistance
    }

I have:

  • 2 identical Siamese neural networks (upper and lower)
  • 6 input units
  • 3 hidden units
  • 2 output units
  • a cosine distance function that compares the outputs of the two parallel networks

Here is my code:

    perceptronUpper = nn.Sequential()
    perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
    perceptronUpper:add(nn.Linear(hiddenUnits, output_number))
    -- share weights, biases and their gradient buffers between the two branches
    perceptronLower = perceptronUpper:clone('weight', 'bias', 'gradWeight', 'gradBias')

    parallel_table = nn.ParallelTable()
    parallel_table:add(perceptronUpper)
    parallel_table:add(perceptronLower)

    perceptron = nn.Sequential()
    perceptron:add(parallel_table)
    perceptron:add(nn.CosineDistance())

This architecture works very well with a gradient update function that processes one element at a time; how should I change it so that it can handle a mini-batch?
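
For reference, my current single-element training step looks roughly like this (a simplified sketch; the MSE criterion and the learningRate value are just placeholders for illustration):

    criterion = nn.MSECriterion()
    learningRate = 0.01  -- placeholder value

    input = {torch.rand(input_number), torch.rand(input_number)}  -- one pair of inputs
    target = torch.Tensor({1})                                    -- desired cosine similarity

    output = perceptron:forward(input)
    err = criterion:forward(output, target)
    gradCriterion = criterion:backward(output, target)
    perceptron:zeroGradParameters()
    perceptron:backward(input, gradCriterion)
    perceptron:updateParameters(learningRate)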

EDIT: Perhaps I should use the nn.Sequencer() class, changing the last two lines of my code to:

    perceptron:add(nn.Sequencer(parallel_table))
    perceptron:add(nn.Sequencer(nn.CosineDistance()))

What do you guys think?

1 answer

Every nn module can work with mini-batches. Some work only with mini-batches, e.g. (Spatial)BatchNormalization. A module knows how many dimensions its input is supposed to have (let's call it D), and if it receives a (D+1)-dimensional tensor, it assumes the first dimension to be the batch dimension. See, for example, the nn.Linear documentation:

The input tensor given in forward(input) must be either a vector (1D tensor) or a matrix (2D tensor). If the input is a matrix, then each row is assumed to be an input sample of the given batch.

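For example, here is how a table of 1D tensors can be stacked into a single 2D batch tensor and passed through nn.Linear: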
    function table_of_tensors_to_batch(tbl)
        local batch = torch.Tensor(#tbl, unpack(tbl[1]:size():totable()))
        for i = 1, #tbl do
            batch[i] = tbl[i]
        end
        return batch
    end

    inputs = {
        torch.Tensor(5):fill(1),
        torch.Tensor(5):fill(2),
        torch.Tensor(5):fill(3),
    }
    input_batch = table_of_tensors_to_batch(inputs)
    linear = nn.Linear(5, 2)
    output_batch = linear:forward(input_batch)

    print(input_batch)
     1  1  1  1  1
     2  2  2  2  2
     3  3  3  3  3
    [torch.DoubleTensor of size 3x5]

    print(output_batch)
     0.3128 -1.1384
     0.7382 -2.1815
     1.1637 -3.2247
    [torch.DoubleTensor of size 3x2]

OK, but what about containers (nn.Sequential, nn.Parallel, nn.ParallelTable and others)? A container does not process the input itself; it simply routes the input (or its corresponding part) to the module(s) it contains. ParallelTable, for example, simply applies the i-th member module to the i-th element of the input table. Thus, if you want it to process a batch, each input[i] (the input is a table) must be a tensor with a batch dimension, as described above.

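In your case that means building one batch tensor per branch and feeding the pair of batches to the Siamese perceptron: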
    input_number = 5
    output_number = 2

    inputs1 = {
        torch.Tensor(5):fill(1),
        torch.Tensor(5):fill(2),
        torch.Tensor(5):fill(3),
    }
    inputs2 = {
        torch.Tensor(5):fill(4),
        torch.Tensor(5):fill(5),
        torch.Tensor(5):fill(6),
    }
    input1_batch = table_of_tensors_to_batch(inputs1)
    input2_batch = table_of_tensors_to_batch(inputs2)

    input_batch = {input1_batch, input2_batch}
    output_batch = perceptron:forward(input_batch)

    print(input_batch)
    {
      1 : DoubleTensor - size: 3x5
      2 : DoubleTensor - size: 3x5
    }

    print(output_batch)
     0.6490
     0.9757
     0.9947
    [torch.DoubleTensor of size 3]

    target_batch = torch.Tensor({1, 0, 1})
    criterion = nn.MSECriterion()
    err = criterion:forward(output_batch, target_batch)
    gradCriterion = criterion:backward(output_batch, target_batch)
    perceptron:zeroGradParameters()
    perceptron:backward(input_batch, gradCriterion)
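
To actually update the weights after this backward pass, one option is a plain SGD step on the accumulated gradients (the learningRate value below is just an example):

    learningRate = 0.01  -- example value
    -- backward() above accumulated the gradients over the whole mini-batch,
    -- so a single parameter update completes the training step:
    perceptron:updateParameters(learningRate)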

Why does nn.Sequencer exist, then? Can it be used instead? Yes, but it is not recommended. The Sequencer takes a table representing a sequence and applies the module to each element of the table, providing no speedup. Moreover, it has to make copies of that module, so such a "batch mode" is considerably less efficient than online (non-batch) training. The Sequencer was designed as a part of recurrent networks; there is no point in using it in your case.
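
For illustration only, a Sequencer call looks roughly like this (it lives in the rnn package, and again, it is not the right tool here):

    require 'rnn'  -- nn.Sequencer is provided by the rnn package

    -- The Sequencer applies the wrapped module to each element of the input
    -- table in turn (cloning the module internally), with no batching speedup.
    seq = nn.Sequencer(nn.Linear(5, 2))
    outputs = seq:forward({
        torch.Tensor(5):fill(1),
        torch.Tensor(5):fill(2),
    })
    print(outputs)  -- a table of two 1D tensors of size 2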
