Gaussian RBM fails on a trivial example

I want to get a detailed understanding of restricted Boltzmann machines with continuous input variables. I am trying to construct the most trivial example possible so that its behavior can be easily tracked. Here it is.

The input data is two-dimensional. Each data point is drawn from one of two symmetric normal distributions (sigma = 0.03) whose centers are well separated (15 times sigma). The RBM has a two-dimensional hidden layer.
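For anyone who wants to reproduce the setup, here is a minimal sketch of the data described above, using only the Python standard library. The values sigma = 0.03 and the 15 * sigma separation come from the question; the placement of the centers on the x-axis, the equal mixing proportions, the sample size, and the names `CENTERS` and `sample_two_blobs` are my assumptions for illustration.

```python
import random

SIGMA = 0.03             # standard deviation, from the question
SEPARATION = 15 * SIGMA  # distance between the two centers, from the question

# Assumption: the question does not say where the centers lie,
# so they are placed symmetrically on the x-axis here.
CENTERS = [(-SEPARATION / 2, 0.0), (SEPARATION / 2, 0.0)]


def sample_two_blobs(n, seed=0):
    """Draw n 2-D points, each from one of two isotropic Gaussians.

    Assumption: both Gaussians are picked with equal probability.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        k = rng.randrange(2)  # which Gaussian this point comes from
        cx, cy = CENTERS[k]
        points.append((rng.gauss(cx, SIGMA), rng.gauss(cy, SIGMA)))
        labels.append(k)
    return points, labels


points, labels = sample_two_blobs(1000)
mean_x = sum(p[0] for p in points) / len(points)
mean_y = sum(p[1] for p in points) / len(points)
print("overall mean:", (mean_x, mean_y))
```

The overall mean printed at the end lies between the two clouds, far from either of them. That is the point the trivial solution described in the question collapses to.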

I expected to obtain an RBM that would generate two point clouds with the same means as in my training data. I even thought that, after adding some sparsity constraints, the hidden layer would equal (0,1) for data drawn from one distribution and (1,0) for the other.

I wrote the MATLAB code myself and also tried some online solutions (e.g. DeepMat: https://github.com/kyunghyuncho/deepmat ), but no matter how small my step size is, the RBM converges to a trivial solution in which the predicted visible layer equals the mean of the entire data set. I tried increasing the dimension of the hidden layer, but it changes nothing. I also tried normalizing the data to zero mean and unit variance: no change. I also tried sigma = 1 instead of 0.03, keeping the separation at 15 * sigma: again no change.

Since this problem appears not only in my code but also in other implementations, I suspect that I am doing something fundamentally wrong and trying to use an RBM in a way it cannot be used. I would appreciate comments or suggestions, or if someone could reproduce my problem.

1 answer

Look here for an explanation of which probability density functions over the visible variables can be expressed by a Gaussian-Bernoulli RBM. The following figure shows an illustration, where b is the visible bias, and w1 and w2 are the weight vectors associated with the hidden units.

(The figure is linked as an image in the original answer.)

You can see that the RBM models a Gaussian mixture with 2^H components, where the mean of each component is the superposition of the visible bias and the weight vectors associated with a subset of the hidden units. The weight of each component is related to the biases of the hidden units in that subset.
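To make the 2^H components concrete, the sketch below enumerates them for a small RBM. It assumes one common parametrization of the energy, E(v, h) = ||v - b||^2 / (2 sigma^2) - c.h - v.(W h) / sigma^2, with b the visible bias, c the hidden biases, and a single shared sigma; the explanation linked above may use a different convention. Under that assumption the component for hidden state h has mean b + W h and a mixing weight proportional to exp(c.h + (||b + W h||^2 - ||b||^2) / (2 sigma^2)). The function name `rbm_mixture_components` and the example numbers are hypothetical.

```python
import itertools
import math


def rbm_mixture_components(b, W, c, sigma=1.0):
    """Return (h, mean, weight) for all 2^H hidden configurations.

    b: visible bias (length D)
    W: list of H weight vectors, one per hidden unit (each length D)
    c: hidden biases (length H)
    Assumes the energy function stated in the text above.
    """
    b_sq = sum(x * x for x in b)
    raw = []
    for h in itertools.product((0, 1), repeat=len(W)):
        # Mean = visible bias plus the weight vectors of the active units.
        mean = list(b)
        for hj, wj in zip(h, W):
            if hj:
                mean = [m + w for m, w in zip(mean, wj)]
        mean_sq = sum(m * m for m in mean)
        # Unnormalized log mixing weight of this component.
        log_w = sum(cj * hj for cj, hj in zip(c, h))
        log_w += (mean_sq - b_sq) / (2 * sigma ** 2)
        raw.append((h, mean, log_w))
    # Normalize in a numerically stable way (log-sum-exp).
    top = max(lw for _, _, lw in raw)
    z = sum(math.exp(lw - top) for _, _, lw in raw)
    return [(h, mean, math.exp(lw - top) / z) for h, mean, lw in raw]


# Hypothetical example with two hidden units (H = 2), so 4 components.
components = rbm_mixture_components(
    b=[0.0, 0.0], W=[[1.0, 0.0], [0.0, 1.0]], c=[-0.5, -0.5], sigma=1.0
)
for h, mean, weight in components:
    print(h, mean, round(weight, 3))
```

With these example values the four means are the corners of the unit square and the hidden biases happen to make all four mixing weights equal. Changing c shifts probability mass between the components without moving their means.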

However, your problem of modeling a mixture of two Gaussians is best represented by an RBM with just one hidden unit, where the visible bias equals the mean of one component, and the sum of the visible bias and the hidden unit's weight vector equals the mean of the second mixture component. When your RBM has two hidden units, things get more complicated, as this RBM models a Gaussian mixture with 4 components.
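As a rough illustration of the one-hidden-unit construction, the sketch below builds the parameters by hand instead of training them. It assumes the energy E(v, h) = ||v - b||^2 / (2 sigma^2) - c h - v.w h / sigma^2 with b the visible bias, w the weight vector of the single hidden unit, and c its bias; it also assumes the two mixture components should get equal weight. The function names and the example centers are hypothetical, since the question does not state where the centers lie.

```python
import math


def one_hidden_unit_rbm(mu1, mu2, sigma):
    """Hand-built parameters for a two-component mixture.

    The visible bias b is the first mean, and b + w is the second mean.
    The hidden bias c is chosen so both components have equal weight
    (an assumption; it is derived from the energy stated above).
    """
    b = list(mu1)
    w = [m2 - m1 for m1, m2 in zip(mu1, mu2)]
    sq1 = sum(m * m for m in mu1)
    sq2 = sum(m * m for m in mu2)
    c = -(sq2 - sq1) / (2 * sigma ** 2)
    return b, w, c


def component_weights(b, w, c, sigma):
    """Mixing weights (p(h=0), p(h=1)) under the assumed energy."""
    mean1 = [bi + wi for bi, wi in zip(b, w)]
    b_sq = sum(x * x for x in b)
    mean1_sq = sum(x * x for x in mean1)
    log_ratio = c + (mean1_sq - b_sq) / (2 * sigma ** 2)  # log p(h=1)/p(h=0)
    p1 = 1.0 / (1.0 + math.exp(-log_ratio))
    return 1.0 - p1, p1


sigma = 0.03
mu1, mu2 = (-0.225, 0.0), (0.225, 0.0)  # assumed centers, 15 * sigma apart
b, w, c = one_hidden_unit_rbm(mu1, mu2, sigma)
p0, p1 = component_weights(b, w, c, sigma)
print("b =", b, "w =", w, "c =", c)
print("mixing weights:", p0, p1)
```

For symmetric centers the hidden bias comes out as zero. For centers that are not symmetric about the origin, c has to compensate for the difference in squared norms of the two means, which scales with 1 / sigma^2 and so becomes large when sigma is small.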

And even if your RBM has only one hidden unit, learning a Gaussian mixture whose two components are far apart will most likely fail when using training strategies such as contrastive divergence together with poorly initialized weights and biases.


Source: https://habr.com/ru/post/1201712/

