How to reduce a fully connected (`InnerProduct`) layer using truncated SVD

In Girshick, R., "Fast R-CNN" (ICCV 2015), section "3.1 Truncated SVD for faster detection," the author suggests using SVD to reduce the size and computation time of a fully connected layer.

Given the trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?


Some linear algebra background
Singular value decomposition (SVD) is a factorization of any matrix W into three matrices:

 W = USV* 

where U and V are orthonormal matrices, and S is diagonal, with entries of decreasing magnitude along the diagonal. One of the useful properties of SVD is that it lets you easily approximate W with a lower-rank matrix: if you truncate S to keep only its k leading entries (instead of all entries on the diagonal), then

 W_app = U S_trunc V* 

is a rank-k approximation of W.
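For example, here is a minimal numpy sketch (the matrix shape and k = 20 are arbitrary choices for illustration):

    import numpy as np

    W = np.random.randn(1000, 4096).astype(np.float32)  # a stand-in weight matrix
    k = 20
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    W_app = np.dot(U[:, :k] * s[:k], Vh[:k, :])  # rank-k approximation of W
    print(np.linalg.matrix_rank(W_app))          # 20
    print(np.linalg.norm(W - W_app) / np.linalg.norm(W))  # relative error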

Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer:

    # ... some layers here
    layer {
      name: "fc_orig"
      type: "InnerProduct"
      bottom: "in"
      top: "out"
      inner_product_param {
        num_output: 1000
        # more params...
      }
      # some more...
    }
    # more layers...

In addition, we have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.

  1. Copy deploy_full.prototxt to deploy_svd.prototxt and open it in the editor of your choice. Replace the fully connected layer with these two layers:

      layer {
        name: "fc_svd_U"
        type: "InnerProduct"
        bottom: "in"          # same input
        top: "svd_interim"
        inner_product_param {
          num_output: 20      # approximate with a rank k = 20 matrix
          bias_term: false
          # more params...
        }
        # some more...
      }
      # NO activation layer here!
      layer {
        name: "fc_svd_V"
        type: "InnerProduct"
        bottom: "svd_interim"
        top: "out"            # same output
        inner_product_param {
          num_output: 1000    # original number of outputs
          # more params...
        }
        # some more...
      }
  2. Now for a bit of net surgery in Python (a sanity check follows the listing):

      import caffe
      import numpy as np

      orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
      svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)

      # get the original weight matrix; caffe stores InnerProduct weights
      # with shape (num_output, num_input)
      W = np.array(orig_net.params['fc_orig'][0].data)

      # SVD decomposition
      k = 20  # same as num_output of fc_svd_U
      U, s, V = np.linalg.svd(W, full_matrices=False)

      # assign weights to the svd net, taking only the leading k singular values:
      # fc_svd_U (k outputs) gets S*V with shape (k, num_input), and
      # fc_svd_V gets U with shape (num_output, k), so that their product
      # is the rank-k approximation of W
      svd_net.params['fc_svd_U'][0].data[...] = np.dot(np.diag(s[:k]), V[:k, :])
      svd_net.params['fc_svd_V'][0].data[...] = U[:, :k]
      svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias

      # save the new weights
      svd_net.save('trained_weights_svd.caffemodel')
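As a sanity check, you can feed both nets the same input and compare outputs; here is a minimal sketch, assuming the net's input blob is named "in" as in the prototxt above:

    x = np.random.randn(*orig_net.blobs['in'].data.shape).astype(np.float32)
    orig_net.blobs['in'].data[...] = x
    svd_net.blobs['in'].data[...] = x
    orig_out = orig_net.forward()['out'].copy()
    svd_out = svd_net.forward()['out']
    # the gap shrinks as k grows toward the full rank of W
    print(np.abs(orig_out - svd_out).max())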

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximates the original network with far fewer multiplications and weights.
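To put numbers on the savings: a fully connected layer with n inputs and m outputs costs n*m weights (and multiply-adds per sample), while the truncated pair costs k*(n+m). For hypothetical sizes like the example above:

    n_in, n_out, k = 4096, 1000, 20     # hypothetical layer sizes
    print(n_in * n_out)                 # 4096000 weights / multiply-adds
    print(k * (n_in + n_out))           # 101920, about 40x fewer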


Actually, Ross Girshick's py-faster-rcnn repo includes an implementation of the SVD step: compress_net.py.

By the way, you usually need to fine-tune the compressed model in order to restore accuracy (or compress it in a more sophisticated way; see, for example, "Accelerating Very Deep Convolutional Networks for Classification and Detection," Zhang et al.).

Also, for me, scipy.linalg.svd was faster than numpy's svd.
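For this use it is a drop-in replacement (a sketch, same call shape as the numpy version above):

    from scipy.linalg import svd
    U, s, V = svd(W, full_matrices=False)  # same factors as np.linalg.svd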

