A few tips: keep only the 10,000 most frequent features by passing max_features=10000 to CountVectorizer, then convert the result to a dense numpy array with the toarray method:
X_train_array = X_train.toarray()
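A minimal sketch of the vectorize-and-densify step, assuming a tiny hypothetical corpus (docs_train) in place of the real training documents:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the real training documents
docs_train = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang on the wire",
]

# Keep only the (up to) 10000 most frequent terms across the corpus;
# this toy corpus has far fewer distinct terms, so all of them are kept
vectorizer = CountVectorizer(max_features=10000)
X_train = vectorizer.fit_transform(docs_train)  # scipy.sparse CSR matrix

# Densify: one row per document, one column per kept term
X_train_array = X_train.toarray()
print(X_train_array.shape)
```

Keep in mind that the dense array stores every entry: with 10,000 features and CountVectorizer's default int64 counts, that is 80 KB per document, so a large corpus can easily exhaust memory.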
Alternatively, reduce the dimensionality to 100 or 300 components with:
pca = TruncatedSVD(n_components=300)
X_reduced_train = pca.fit_transform(X_train)
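A runnable sketch of the reduction, assuming random sparse matrices as hypothetical stand-ins for the vectorized training and test data, and using 100 components to keep the toy example small. TruncatedSVD does not center the data (unlike PCA), which is why it can be applied to the sparse matrix directly; the projection learned with fit_transform on the training set is reused with transform on new data:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Hypothetical stand-ins for vectorized train/test matrices:
# random non-negative values with roughly 1% non-zeros, stored sparse
rng = np.random.default_rng(0)

def random_sparse(n_rows, n_cols, density=0.01):
    dense = rng.random((n_rows, n_cols))
    dense[dense > density] = 0.0  # zero out ~99% of entries
    return sp.csr_matrix(dense)

X_train = random_sparse(200, 1000)
X_test = random_sparse(50, 1000)

# Works on sparse input because TruncatedSVD does not center the data;
# 100 components here to keep this toy example small
svd = TruncatedSVD(n_components=100, random_state=0)
X_reduced_train = svd.fit_transform(X_train)  # learns the projection
X_reduced_test = svd.transform(X_test)        # reuses it on new data

print(X_reduced_train.shape, X_reduced_test.shape)
print(svd.explained_variance_ratio_.sum())
```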
However, in my experience, I have never managed to get a random forest to outperform a well-tuned linear model (for example, logistic regression with a grid-searched regularization parameter) trained on the original sparse data (possibly with TF-IDF normalization).
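A sketch of how such a comparison could be set up with scikit-learn, assuming a hypothetical two-topic toy corpus in place of real data. Logistic regression is tuned over a small grid of C values on sparse TF-IDF features, and a random forest is cross-validated on the same features. The toy data is trivially separable, so the scores it produces say nothing about which model wins on real text:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical two-topic toy corpus; substitute the real documents/labels
rng = np.random.default_rng(0)
fruit = ["apple", "banana", "cherry", "grape", "melon"]
motor = ["engine", "wheel", "brake", "gear", "piston"]
filler = ["the", "and", "with", "from", "very"]
docs, labels = [], []
for i in range(200):
    label = i % 2
    topic = motor if label else fruit
    words = list(rng.choice(topic, size=5)) + list(rng.choice(filler, size=5))
    docs.append(" ".join(words))
    labels.append(label)

# Linear model on the sparse TF-IDF matrix, with the regularization
# strength C chosen by 5-fold grid search
linear = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(
    linear,
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)
grid.fit(docs, labels)

# Random forest on the same sparse TF-IDF features, for comparison
forest = make_pipeline(
    TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=0)
)
rf_scores = cross_val_score(forest, docs, labels, cv=5)

print("LR best C:", grid.best_params_, "CV accuracy:", grid.best_score_)
print("RF CV accuracy:", rf_scores.mean())
```

grid.best_score_ is slightly optimistic because C is selected on the same folds it is scored on; nested cross-validation would give a fairer head-to-head number.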