scikit-learn Gaussian process - Exception

Question


I want to use Gaussian processes to solve a regression problem. My data is as follows: each X vector has length 37, and each Y vector has length 8.

I use the sklearn package in Python, but trying to use Gaussian processes raises an Exception:

    from sklearn import gaussian_process

    print "x :", x__
    print "y :", y__

    gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1)
    gp.fit(x__, y__)

x: [[136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139. 1019. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 70. 24. 55. 0. 9. 0. 0.]
 [136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139. 1019. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 70. 24. 55. 0. 9. 0. 0.]
 [82. 76. 80. 103. 135. 155. 159. 156. 145. 138. 130. 122. 122. 689. 569. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [156. 145. 138. 130. 122. 118. 113. 111. 105. 101. 98. 95. 95. 759. 639. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [112. 111. 111. 114. 114. 113. 114. 114. 112. 111. 109. 109. 109. 1109. 989. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [133. 130. 125. 124. 124. 123. 103. 87. 96. 121. 122. 123. 123. 399. 279. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [104. 109. 111. 106. 91. 86. 117. 123. 123. 120. 121. 115. 115. 549. 429. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [144. 138. 126. 122. 119. 118. 116. 114. 107. 105. 106. 119. 119. 479. 359. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
y: [[7. 9. 13. 30. 34. 37. 36. 41.]
 [7. 9. 13. 30. 34. 37. 36. 41.]
 [-4. -9. -17. -21. -27. -28. -28. -20.]
 [-1. -1. -4. -5. 20. 28. 31. 23.]
 [-1. -2. -3. -1. -4. -7. 8. 58.]
 [-1. -2. -14.33333333 -14. -13.66666667 -32. -26.66666667 -1.]
 [1. 3.33333333 0. -0.66666667 3. 6. 22. 54.]
 [-2. -8. -11. -17. -17. -16. -16. -23.]]
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
in ()
     11 gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1)
     12
---> 13 gp.fit(x__, y__)

/usr/local/lib/python2.7/site-packages/sklearn/gaussian_process/gaussian_process.pyc in fit(self, X, y)
    300         if (np.min(np.sum(D, axis=1)) == 0.
    301                 and self.corr != correlation.pure_nugget):
--> 302             raise Exception("Multiple input features cannot have the same"
    303                             " target value.")
    304

Exception: Multiple input features cannot have the same target value.

I found several threads about this scikit-learn problem, but my version is up to date.


python scikit-learn regression gaussian forecasting

Julian Jan 11 '16 at 14:17


1 answer

Farseer · Accepted Answer · 2016-01-11T14:27:07+0000

This is a known issue, and it has not been resolved yet.

It happens because, when the same point appears more than once in your inputs, the covariance matrix has identical rows and is therefore not invertible (singular). That means A^-1, which is part of the GP solution, cannot be computed. In your data, the first two rows of x are identical.
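To see why duplicates are a problem, here is a minimal sketch using NumPy only. The helper name rbf_kernel_matrix, its parameters, and the toy inputs are illustrative assumptions and not part of scikit-learn; the point is only that two identical input rows produce two identical rows in the kernel matrix, so it loses rank.

```python
import numpy as np

def rbf_kernel_matrix(X, variance=1.0, inv_length=1.0):
    """Squared-exponential covariance between all pairs of rows of X (hypothetical helper)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * inv_length * sq_dists)

# Toy inputs (made up): rows 0 and 1 are identical, as in the question's x.
X = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 5.0], [4.0, 1.0]])
K = rbf_kernel_matrix(X)

print(K.shape[0], np.linalg.matrix_rank(K))  # number of points vs. rank of K
print(np.allclose(K[0], K[1]))               # duplicate inputs give duplicate rows
```

A rank lower than the number of points means the matrix has no inverse, which is the condition the library guards against.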

To work around this, add a small amount of Gaussian noise to your samples, or use another GP library.
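A rough sketch of the noise workaround, assuming NumPy 1.17 or newer. The function names, the noise scale, and the toy arrays are assumptions chosen for illustration; the scale should be small relative to your real feature values. Dropping repeated rows is shown as an alternative that the answer does not mention.

```python
import numpy as np

def add_input_noise(X, scale=1e-3, seed=0):
    """Return a copy of X with small Gaussian noise added to every entry."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, scale, size=X.shape)

def drop_duplicate_rows(X, Y):
    """Alternative (assumption): keep the first occurrence of each distinct row of X."""
    _, first_idx = np.unique(X, axis=0, return_index=True)
    keep = np.sort(first_idx)  # preserve the original sample order
    return X[keep], Y[keep]

# Toy data (made up): the first two samples are identical.
X = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 5.0]])
Y = np.array([[7.0, 9.0], [7.0, 9.0], [-4.0, -9.0]])

X_noisy = add_input_noise(X)
X_unique, Y_unique = drop_duplicate_rows(X, Y)

print(len(np.unique(X_noisy, axis=0)))  # number of distinct rows after adding noise
print(X_unique.shape, Y_unique.shape)   # shapes after dropping the repeated sample
```

Either X_noisy or X_unique can then be passed to fit in place of the original inputs.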

You can also implement it yourself; it is not that hard. The most important part of a GP is the kernel function, for example a Gaussian kernel:

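    import numpy as np

    # params[0] scales the kernel; params[1] controls how fast it decays with distance
    exponential_kernel = lambda x, y, params: params[0] * \
        np.exp(-0.5 * params[1] * np.sum((x - y)**2))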

Now we need to build a covariance matrix, for example:

    # Entry [i][j] is kernel(x[j], y[i])
    covariance = lambda kernel, x, y, params: \
        np.array([[kernel(xi, yi, params) for xi in x] for yi in y])

So, to predict at a new point, first compute the covariance matrix of the training inputs x (theta holds the kernel parameters):

 sigma1 = covariance(exponential_kernel, x, x, theta)

and apply the following:

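    def predict(x, data, kernel, params, sigma, t):
        # Covariance between the new point x and every training point
        k = [kernel(x, y, params) for y in data]
        Sinv = np.linalg.inv(sigma)
        # Posterior mean: k^T * sigma^-1 * t, where t holds the training targets
        y_pred = np.dot(k, Sinv).dot(t)
        # Posterior variance: kernel(x, x) - k^T * sigma^-1 * k
        sigma_new = kernel(x, x, params) - np.dot(k, Sinv).dot(k)
        return y_pred, sigma_new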

This is a very naive implementation, and for large datasets the running time will be high. The most expensive step is Sinv = np.linalg.inv(sigma), which takes O(N^3) time.
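One common way to avoid forming the explicit inverse is a Cholesky factorization. The sketch below is an assumption about how that could look, not the answer's own method: predict_cholesky, gaussian_kernel, the jitter value, and the toy data are all made up for illustration. The factorization is still O(N^3), but it is usually faster and numerically more stable than np.linalg.inv, and the small diagonal jitter also guards against the singular matrix caused by repeated points.

```python
import numpy as np

def gaussian_kernel(a, b, params):
    # Same form as the answer's exponential_kernel.
    return params[0] * np.exp(-0.5 * params[1] * np.sum((a - b) ** 2))

def predict_cholesky(x_new, data, params, t, jitter=1e-8):
    """GP posterior mean and variance at x_new without computing an explicit inverse."""
    n = len(data)
    sigma = np.array([[gaussian_kernel(a, b, params) for a in data] for b in data])
    # Assumed small diagonal jitter keeps the matrix positive definite.
    L = np.linalg.cholesky(sigma + jitter * np.eye(n))
    k = np.array([gaussian_kernel(x_new, d, params) for d in data])
    # Solve (sigma + jitter*I) @ alpha = t through the two factors L and L.T.
    # np.linalg.solve is a general solver; scipy.linalg.solve_triangular
    # would exploit the triangular structure if SciPy is available.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    v = np.linalg.solve(L, k)
    y_pred = k @ alpha
    var = gaussian_kernel(x_new, x_new, params) - v @ v
    return y_pred, var

# Toy 1-D data (made up): predicting at a training point should return its target.
data = np.array([[0.0], [1.0], [2.0]])
t = np.array([0.0, 1.0, 4.0])
mean, var = predict_cholesky(np.array([1.0]), data, [1.0, 1.0], t)
print(mean, var)
```

If t has several columns, as with the 8-dimensional targets in the question, the same code returns one predicted value per column.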
