scikit-learn: loading data by reading multiple files in a directory

I can easily load test data from a single file. However, whenever I try to load data from multiple files in a directory, I get the following error: AttributeError: 'NoneType' object has no attribute 'lower'. Please see my code below; I would appreciate any help. Thank you.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from nltk.corpus import stopwords
    import numpy as np
    import numpy.linalg as LA
    import os

    path = "C:\zircon"

    def radfil():
        for file in os.listdir(path):
            current = os.path.join(path, file)
            if os.path.isfile(current):
                data = open(current, "rb").read()
                print data

    train_set = [radfil()]
    test_set = ["The sun in the sky is bright."]
    stopWords = stopwords.words('english')

    vectorizer = CountVectorizer(stop_words=stopWords, min_df=1)
    #print vectorizer
    transformer = TfidfTransformer()
    #print transformer

    trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
    testVectorizerArray = vectorizer.transform(test_set).toarray()
    print 'Fit Vectorizer to train set', trainVectorizerArray
    print 'Transform Vectorizer to test set', testVectorizerArray
1 answer

I assume the error comes from calling lower() on a value of type None. It most likely happens in

    trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()

because radfil() only prints each file's contents and has no return statement, so it returns None and train_set becomes [None]. Try collecting the contents of the files in a list and adding a return statement to radfil(). This is all I can say without a full stack trace.
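
The suggestion above could look like the following minimal sketch. It is not the asker's code: it assumes Python 3, assumes the files are UTF-8 text (the encoding and errors arguments are my assumptions), and keeps the name radfil but takes the directory as a parameter instead of reading a global.

```python
import os


def radfil(path):
    """Return a list with the text of every regular file directly inside path."""
    documents = []
    # sorted() gives a deterministic order; os.listdir() order is arbitrary.
    for name in sorted(os.listdir(path)):
        current = os.path.join(path, name)
        # Skip subdirectories, as the original isfile() check does.
        if os.path.isfile(current):
            # Assumption: files are UTF-8 text; undecodable bytes are replaced.
            with open(current, encoding="utf-8", errors="replace") as handle:
                documents.append(handle.read())
    # Returning the list is the key change: the original returned None.
    return documents
```

With a version like this, train_set = radfil(path) (without the extra square brackets) hands the vectorizer one string per file, so fit_transform no longer receives None.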


Source: https://habr.com/ru/post/1489273/

