To keep only the important features of a sparse matrix, I want to remove the columns whose mean score is below a given threshold. Consider the following example:
import numpy as np
counts = [[3, 0, 1],
[2, 0, 0],
[3, 0, 0],
[4, 0, 0],
[3, 2, 0],
[3, 0, 2]]
from sklearn.feature_extraction.text import TfidfTransformer
transformer = TfidfTransformer(smooth_idf=False)
tfidf = transformer.fit_transform(counts)
print(tfidf.toarray())
Now I calculate the mean score of each feature:
summarizer_mean = lambda x: np.mean(x, axis=0)
print(summarizer_mean(tfidf))
The resulting means:
[[ 0.81236766 0.14681658 0.2311266 ]]
How can I remove those columns whose average score is less than the threshold, say 0.23 in my case?
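One possible approach, as a minimal sketch rather than a definitive answer: compute the column means, turn them into a flat array, and index the matrix with the positions of the columns to keep. The helper name `drop_low_mean_columns` and its `threshold` parameter are made up for illustration; the sketch assumes the input is a NumPy array or a SciPy CSR/CSC matrix such as the one `fit_transform` returns.

```python
import numpy as np


def drop_low_mean_columns(matrix, threshold):
    """Return (filtered_matrix, kept_indices) for columns with mean >= threshold.

    Hypothetical helper: assumes `matrix` is a 2-D NumPy array or a SciPy
    sparse matrix that supports column indexing (CSR or CSC).
    """
    # For sparse input, mean(axis=0) may come back as a 1 x n_columns matrix,
    # so convert to a plain 1-D array before comparing.
    col_means = np.asarray(matrix.mean(axis=0)).ravel()

    # Integer positions of the columns whose mean reaches the threshold.
    kept_indices = np.flatnonzero(col_means >= threshold)

    # Column selection with an index array keeps sparse input sparse.
    return matrix[:, kept_indices], kept_indices
```

With the `tfidf` matrix above, `drop_low_mean_columns(tfidf, 0.23)` should keep columns 0 and 2 (means of roughly 0.812 and 0.231) and drop column 1 (roughly 0.147). The comparison uses `>=`, so a column whose mean equals the threshold is kept; switch to `>` if that column should be removed as well.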