I have a pipeline that accepts a pandas DataFrame, df, with several text columns, combines them into a document, and vectorizes the document. The result is a scipy.sparse.csr_matrix, call it X.
Later, I run nearest-neighbor queries using the rows of X (which correspond to the rows of my original DataFrame), and when I want to, say, display the text name of a vector's nearest neighbor, I use the vector's integer position in X, like this:
>>> print("Nearest neighbor name is", df.iloc[position_in_x]['my_name'])
Is this a bad idea, or can integer positions in a DataFrame be considered stable as long as I don't add rows to or remove rows from the DataFrame?
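To make the concern concrete, here is a minimal sketch with toy data. The column names, the `vectorize` helper, and the hand-rolled bag-of-words are hypothetical placeholders, not my real pipeline. It shows the positions lining up right after vectorization, then pointing at a different row once the DataFrame is reordered. The last lines show one possible safeguard, assuming the index labels are unique: remember the labels at vectorization time and look rows up by label.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for the real data (hypothetical column names and values).
df = pd.DataFrame(
    {"my_name": ["alpha", "beta", "gamma"],
     "text_a": ["red apple", "green pear", "red cherry"],
     "text_b": ["sweet", "sour", "sweet"]},
    index=[10, 20, 30],
)

def vectorize(frame):
    """Hypothetical bag-of-words vectorizer: one row of X per row of frame."""
    docs = (frame["text_a"] + " " + frame["text_b"]).tolist()
    words = sorted({w for d in docs for w in d.split()})
    vocab = {w: j for j, w in enumerate(words)}
    rows, cols = [], []
    for i, doc in enumerate(docs):
        for word in doc.split():
            rows.append(i)
            cols.append(vocab[word])
    data = np.ones(len(rows))
    return csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocab)))

X = vectorize(df)

# Row i of X was built from row i of df, so .iloc lines up here.
position_in_x = 2
name_before = df.iloc[position_in_x]["my_name"]

# After reordering, the same position points at a different row.
df_sorted = df.sort_values("my_name", ascending=False)
name_after = df_sorted.iloc[position_in_x]["my_name"]

# Index labels captured at vectorization time survive the reordering.
labels = df.index.to_numpy()
name_by_label = df_sorted.loc[labels[position_in_x], "my_name"]
```

So in this sketch the positions are only trustworthy while df keeps the exact row order it had when X was built.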
I wonder how others have dealt with this. One solution that occurs to me is to store the row vectors of X as a new column in df.
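A rough sketch of what I mean by that, again with toy stand-ins (the `x_row` column name and the values are made up for illustration). Each row of X is stored as a 1 x n_features sparse matrix in an object-dtype column, so the vector travels with its row. I assume the matrix would have to be restacked, here with `scipy.sparse.vstack`, before running queries again:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, vstack

# Toy stand-ins (hypothetical values) for df and the vectorized matrix X.
df = pd.DataFrame({"my_name": ["alpha", "beta", "gamma"]})
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 0.0]]))

# One sparse row per DataFrame row, held in an object-dtype array.
row_vectors = np.empty(X.shape[0], dtype=object)
for i in range(X.shape[0]):
    row_vectors[i] = X[i]
df["x_row"] = row_vectors

# The vector now stays attached to its row through sorting or filtering.
shuffled = df.sort_values("my_name", ascending=False)
gamma_vector = shuffled.loc[shuffled["my_name"] == "gamma", "x_row"].iloc[0]

# Rebuild a matrix in the new row order before querying again.
X_again = vstack(shuffled["x_row"].tolist(), format="csr")
```

The obvious cost is that the column holds Python objects, so anything that needs the full matrix pays for the restacking step.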
Thanks!