Here is one option. It took 3 minutes on my machine (I really need to get a new one :P).
MacBook (2006), 2 GHz Intel Core 2 Duo, 2 GB DDR2 SDRAM
Accuracy Achieved: 0.3555421686747
I am sure you can get better results if you tune the support vector machine.
First, I changed the CSV file format to make it easier to import. I just replaced the first space on each line with a comma, which can then serve as the delimiter during import.
sed 's/ /,/' testing.csv > test.csv
sed 's/ /,/' training.csv > train.csv
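If you don't have sed at hand, the same preprocessing can be sketched in Python. This is only an illustration, not what I actually ran: the helper names `first_space_to_comma` and `convert_file` are made up, and it assumes every line looks like `<label> <tweet text>`.

```python
def first_space_to_comma(line):
    """Replace only the first space, so spaces inside the tweet survive."""
    return line.replace(' ', ',', 1)


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path, putting a comma after the first field."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(first_space_to_comma(line))


# Hypothetical sample line, not taken from the real data set
print(first_space_to_comma('1 obama visits the usa'))  # 1,obama visits the usa
```

Calling `convert_file('testing.csv', 'test.csv')` would then do the same job as the sed command. Keep in mind that, with either approach, a tweet that itself contains a comma adds extra fields to that row, so the import may need additional care.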
In Python, I used pandas to read the CSV files and list comprehensions to extract the features, which is much faster than explicit for loops. Then I used sklearn to train a support vector machine.
import pandas
from sklearn import svm
from sklearn.metrics import accuracy_score

# Keywords used as binary features
featureList = ['obama', 'usa', 'bieber']

train_df = pandas.read_csv('train.csv', sep=',', dtype={'label': int, 'tweet': str})
test_df = pandas.read_csv('test.csv', sep=',', dtype={'label': int, 'tweet': str})

# One boolean per keyword: is it a substring of the tweet?
train_features = [[w in str(tweet) for w in featureList] for tweet in train_df.values[:, 1]]
test_features = [[w in str(tweet) for w in featureList] for tweet in test_df.values[:, 1]]

# Column 0 holds the labels; cast to int because .values on a mixed frame yields objects
train_labels = train_df.values[:, 0].astype(int)
test_labels = test_df.values[:, 0].astype(int)

clf = svm.SVC(max_iter=1000)
clf.fit(train_features, train_labels)
prediction = clf.predict(test_features)

print('accuracy: ', accuracy_score(test_labels.tolist(), prediction.tolist()))
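To make the feature step more concrete, here is a small sketch of what that list comprehension produces. The sample tweets are invented for illustration; `featureList` is the same as in the code above. Each tweet becomes a list of booleans, one per keyword, and the check is a plain case-sensitive substring match.

```python
featureList = ['obama', 'usa', 'bieber']

# Invented sample tweets, not from the real data set
tweets = [
    'obama speaks in the usa',
    'bieber on tour',
    'Obama again',        # capital O: the lowercase keyword does not match
    'a thousand fans',    # 'usa' matches inside 'thousand'
]

# Same comprehension as in the main script
features = [[w in str(tweet) for w in featureList] for tweet in tweets]

for tweet, row in zip(tweets, features):
    print(row, tweet)
# [True, True, False] obama speaks in the usa
# [False, False, True] bieber on tour
# [False, False, False] Obama again
# [False, True, False] a thousand fans
```

The last two rows show where such simple features can go wrong. Lowercasing each tweet first (`str(tweet).lower()`) or matching whole words might help, though I have not measured the effect on accuracy.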