You do not have "unknown features"; you have incomplete data points.
This is actually a well-known problem in kNN, and there is a thoroughly tested template for dealing with it.
Although this is really an "incomplete data" problem, in the kNN context it is often (usually?) called the *sparsity* problem.
In practice, handling sparsity is the key problem in building kNN models, with the possible exception of efficient storage/retrieval of the data that make up the model.
For example, consider Amazon.com's recommendation engine, in which product ratings are the features (columns) and users are the rows. For this matrix to be 100% complete, every Amazon customer would have to buy and review every product Amazon sells. The actual sparsity of this matrix is likely above 95%.
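To make the sparsity figure concrete, here is a minimal sketch of how it would be measured; the toy ratings matrix below is entirely hypothetical (rows are users, columns are products, and a 0 marks a missing rating):

```python
import numpy as np

# Hypothetical toy ratings matrix: rows are users, columns are products;
# 0 marks a product the user never rated.
ratings = np.array([
    [5, 0, 0, 3, 0],
    [0, 4, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [1, 0, 0, 0, 0],
])

# Sparsity = fraction of cells with no observed rating.
sparsity = np.count_nonzero(ratings == 0) / ratings.size
print(f"sparsity: {sparsity:.0%}")   # 15 of the 20 cells are empty -> 75%
```

At Amazon's scale the same ratio, computed over millions of users and products, is what pushes the sparsity past 95%.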
The most common method (and still the state of the art, as far as I know) is known as NNMA, or non-negative matrix approximation. This method is also often referred to, incorrectly, as NNMF, in which the F stands for factorization. (NNMA is based on a factorization technique, but the result is not a factorization of the original data matrix.) I mention this because the alternative term, although incorrect, is widely used, so I would include it in my search queries.
In essence, this technique can be used to remove sparsity from a matrix, or, put another way, to fill in the missing cells (i.e., the customer in row R has not rated the product in column C).
You can find a complete implementation of NNMA, along with an accompanying tutorial (in Python + NumPy), on Albert Au Yeung's blog.
In addition, there are several Python packages (available through PyPI) that contain packaged code for NNMA. I have used only one of them, PyMF, which you can find on Google Code.
So you can see how NNMA works its magic, here is my simple but complete implementation of NNMA in Python + NumPy:
    import numpy as NP

    def cf(q, v):
        """ the cost function (squared Frobenius norm of the residual) """
        qv = (q - v)**2
        return NP.sum(NP.sum(qv, axis=0))

    def nnma(d, max_iter=100):
        x, y = d.shape
        z = y
        w = NP.random.rand(x, y)
        h = NP.random.rand(y, z)
        for i in range(max_iter):
            wh = NP.dot(w, h)
            cost = cf(d, wh)
            if cost == 0:
                break
            # multiplicative update for h
            hn = NP.dot(w.T, d)
            hd = NP.dot(NP.dot(w.T, w), h)
            h *= hn/hd
            # multiplicative update for w
            wn = NP.dot(d, h.T)
            wd = NP.dot(NP.dot(w, h), h.T)
            w *= wn/wd
        return NP.dot(w, h)
To use this nnma function, simply pass in a 2D array (matrix) with a "0" for each missing cell (in other words, your data matrix with a "0" inserted for each missing value):
>>> d
So, as you can see, the results are not bad, especially given how simple the implementation is. All missing elements are filled in, and the remaining values are close to the corresponding values in the original data matrix; for example, column 0, row 0 is 7.0 in the original data matrix and 6.998 in the approximation.
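Since the interpreter session above is cut off before the matrix is shown, here is a hedged, end-to-end sketch of the same workflow. The matrix d below is illustrative only, not the one from the original session, and the cf/nnma definitions are repeated so the snippet runs standalone:

```python
import numpy as NP

def cf(q, v):
    """ the cost function (squared Frobenius norm of the residual) """
    return NP.sum((q - v)**2)

def nnma(d, max_iter=100):
    x, y = d.shape
    w = NP.random.rand(x, y)
    h = NP.random.rand(y, y)
    for i in range(max_iter):
        wh = NP.dot(w, h)
        if cf(d, wh) == 0:
            break
        # multiplicative updates for h, then w
        h *= NP.dot(w.T, d) / NP.dot(NP.dot(w.T, w), h)
        w *= NP.dot(d, h.T) / NP.dot(NP.dot(w, h), h.T)
    return NP.dot(w, h)

NP.random.seed(0)  # seed only so the demo is repeatable

# Illustrative data matrix; 0 marks a missing value. Every row and
# column has at least one observed entry, so the updates stay well-defined.
d = NP.array([
    [7.0, 0.0, 2.0, 0.0],
    [0.0, 5.0, 0.0, 1.0],
    [3.0, 0.0, 4.0, 0.0],
    [0.0, 2.0, 0.0, 6.0],
])

approx = nnma(d)
print(NP.round(approx, 3))  # every cell filled, all entries non-negative
```

Because w and h start from random non-negative values and the updates only multiply by non-negative ratios, the returned approximation is guaranteed non-negative, which is exactly the property that makes it usable as an imputed ratings matrix.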