Some linear algebra background
Singular value decomposition (SVD) is a decomposition of any matrix W into three matrices:
W = USV*
where U and V are orthonormal matrices and S is diagonal, with entries of decreasing magnitude along the diagonal. One useful property of SVD is that it lets you easily approximate W with a matrix of lower rank: suppose you truncate S to keep only its k leading elements (instead of all the elements on the diagonal); then
W_app = U S_trunc V*
is a rank-k approximation of W.
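As a quick numerical illustration, here is a minimal NumPy sketch. The matrix size, the random seed, and k are arbitrary choices made for this example, not values from the answer. Truncating to the k leading singular values gives a matrix of rank k, and the spectral-norm error of the approximation equals the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 30))   # arbitrary example matrix
k = 5                               # rank to keep

# full_matrices=False returns U: (50, 30), s: (30,), Vh: (30, 30)
U, s, Vh = np.linalg.svd(W, full_matrices=False)

# keep the k leading singular values and the matching columns of U / rows of Vh
W_app = (U[:, :k] * s[:k]) @ Vh[:k, :]

rank = np.linalg.matrix_rank(W_app)
err = np.linalg.norm(W - W_app, 2)  # spectral norm of the residual
print(rank, np.isclose(err, s[k]))
```

Note that `np.linalg.svd` returns the singular values as a 1-D array and the third factor already transposed (V* rather than V).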
Using SVD to approximate a fully connected layer
Suppose we have a model, deploy_full.prototxt, with a fully connected layer:
    # ... some layers here
    layer {
      name: "fc_orig"
      type: "InnerProduct"
      bottom: "in"
      top: "out"
      inner_product_param {
        num_output: 1000
        # more params...
      }
      # some more...
    }
In addition, we have trained_weights_full.caffemodel, which holds the trained parameters for the deploy_full.prototxt model.
Copy deploy_full.prototxt to deploy_svd.prototxt and open it in the editor of your choice. Replace the fully connected layer with these two layers:
    layer {
      name: "fc_svd_U"
      type: "InnerProduct"
      bottom: "in"  # same input
      top: "svd_interim"
      inner_product_param {
        num_output: 20  # approximate with k = 20 rank matrix
        bias_term: false
        # more params...
      }
      # some more...
    }
    # NO activation layer here!
    layer {
      name: "fc_svd_V"
      type: "InnerProduct"
      bottom: "svd_interim"
      top: "out"  # same output
      inner_product_param {
        num_output: 1000  # original number of outputs
        # more params...
      }
      # some more...
    }
Then do a bit of net surgery in Python:
    import caffe
    import numpy as np

    orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)

    # get the original weight matrix, shape (num_output, input_dim)
    W = np.array(orig_net.params['fc_orig'][0].data)

    # SVD decomposition: W = U * diag(s) * Vh
    k = 20  # same as num_output of fc_svd_U
    U, s, Vh = np.linalg.svd(W, full_matrices=False)

    # assign weights to the SVD net
    # fc_svd_U is applied first (num_output = k), so it takes the k x input_dim factor
    svd_net.params['fc_svd_U'][0].data[...] = Vh[:k, :]
    # fc_svd_V is applied second (num_output = 1000), so it takes U * diag(s),
    # keeping only the k leading singular values
    svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] * s[:k]
    svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias

    # save the new weights
    svd_net.save('trained_weights_svd.caffemodel')
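The shapes involved can be sanity-checked without Caffe. The sketch below assumes Caffe's InnerProduct convention y = W x + b, with W of shape (num_output, input_dim). The input size D = 256 and the random weights are made-up stand-ins for the trained layer. It shows that the layer applied first must hold the k x D factor, the second layer the 1000 x k factor, and that chaining them with a single bias reproduces the rank-k approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, k = 256, 1000, 20            # D is a hypothetical input size
W = rng.standard_normal((N, D))    # stand-in for the fc_orig weights
b = rng.standard_normal(N)         # stand-in for the fc_orig bias
x = rng.standard_normal(D)         # one input vector

U, s, Vh = np.linalg.svd(W, full_matrices=False)
first = Vh[:k, :]                  # (k, D): layer with num_output = k
second = U[:, :k] * s[:k]          # (N, k): layer with num_output = N

interim = first @ x                # no bias and no activation in between
y_svd = second @ interim + b       # bias applied once, in the second layer

W_app = second @ first             # the rank-k approximation of W
y_app = W_app @ x + b
print(first.shape, second.shape, np.allclose(y_svd, y_app))
```

If the two factors were swapped, the shapes would not match the weight blobs of the corresponding layers, so a check like this is a cheap way to catch the mistake before touching the network.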
Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights.
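To put a number on the saving: a fully connected layer with D inputs and N outputs stores D * N weights, while the two factorized layers together store k * (D + N). The input size D = 4096 below is an assumed value for illustration only (the answer does not state it); N = 1000 and k = 20 follow the example above. The helper names are hypothetical.

```python
def fc_weight_count(d_in, n_out):
    """Weights in a single fully connected layer (bias excluded)."""
    return d_in * n_out


def svd_weight_count(d_in, n_out, k):
    """Weights in the two factorized layers: (k x d_in) and (n_out x k)."""
    return k * d_in + n_out * k


D, N, k = 4096, 1000, 20   # D is hypothetical; N and k follow the example
full = fc_weight_count(D, N)
reduced = svd_weight_count(D, N, k)
print(full, reduced, round(full / reduced, 1))
```

Under these assumptions the factorized version has 101,920 weights instead of 4,096,000, roughly 40 times fewer. The number of multiply-adds per input vector shrinks by the same factor, since each weight is used once per forward pass.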