Keras - Text Classification - LSTM - How to enter text?

Question


I am trying to understand how to use an LSTM to classify a dataset of my own.

While researching, I found this Keras IMDB example: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py

However, I am confused about how the dataset should be preprocessed before it is fed to the model.

I know Keras has text preprocessing utilities, but I'm not sure which ones to use.
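The Tokenizer class that the code below already uses (keras.preprocessing.text.Tokenizer) ranks words by frequency and replaces each word with its integer rank, starting at 1 so that 0 stays free for padding. The stdlib-only sketch below approximates that behaviour so the transformation is visible without Keras installed. The helpers tokenize, build_word_index and to_index_sequences are hypothetical, and the regex filter is a simplification of the Tokenizer's punctuation filters, not its exact implementation. The num_words cut-off mirrors, as far as I know, the documented rule of keeping only the num_words - 1 most frequent words (the argument is called nb_words in the Keras version used in the question).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes (simplified filter)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_index(texts):
    """Map each word to its frequency rank: 1 for the most frequent word, 2 for the next, ...

    Index 0 is left unused so it can serve as the padding value.
    Ties keep first-seen order because sorted() is stable, even with reverse=True.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    ranked = sorted(counts, key=lambda word: counts[word], reverse=True)
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def to_index_sequences(texts, word_index, num_words=None):
    """Replace each word by its index, dropping unknown words and, if num_words is set,
    any word whose index is >= num_words."""
    sequences = []
    for text in texts:
        seq = []
        for word in tokenize(text):
            idx = word_index.get(word)
            if idx is None or (num_words is not None and idx >= num_words):
                continue
            seq.append(idx)
        sequences.append(seq)
    return sequences


texts = ["I am so happy today", "so sad, so very sad", "happy happy day"]
word_index = build_word_index(texts)
print(to_index_sequences(texts, word_index))               # [[4, 5, 1, 2, 6], [1, 3, 1, 7, 3], [2, 2, 8]]
print(to_index_sequences(texts, word_index, num_words=4))  # [[1, 2], [1, 3, 1, 3], [2, 2]]
```

Each text becomes a list of integers of varying length; those lists still have to be padded to a common length before they can form a single input matrix.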

x contains n lines of text, and y rates each text on a happiness/sadness scale: 1.0 means completely happy, 0.0 means completely sad, and intermediate values such as 0.25 are also possible.
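Since y holds values anywhere between 0.0 and 1.0 rather than hard 0/1 classes, it is worth checking that a sigmoid output trained with binary_crossentropy (as in the code below) can handle fractional targets. Binary cross-entropy, -(y·log p + (1 - y)·log(1 - p)), is defined for any target y in [0, 1], and setting its derivative with respect to p to zero gives p = y, so a label of 0.25 pulls the prediction toward 0.25. The plain-Python sketch below only evaluates that formula; binary_crossentropy here is a hypothetical stand-in rather than Keras's own function, and the clipping constant eps is an assumption added to avoid log(0).

```python
import math


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Binary cross-entropy for one target y_true in [0, 1] and one prediction p_pred."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip so math.log never receives 0
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


y = 0.25  # a "mostly sad" label, as in the dataset description
for p in (0.1, 0.25, 0.5, 0.9):
    # The loss is smallest at p == 0.25 and grows as p moves away from the label.
    print(f"p={p:.2f}  loss={binary_crossentropy(y, p):.4f}")
```

Whether to keep the fractional labels or round them into classes is a modelling choice; the loss itself accepts both.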

So my question is: how should I prepare x and y as input? Should I use a bag of words? Any feedback is appreciated!
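On the bag-of-words question: an LSTM consumes an ordered sequence, one token per time step, so it is normally fed integer word indices (turned into vectors by an Embedding layer, as in the IMDB example) rather than a bag of words, which keeps only counts and discards word order. The stdlib sketch below contrasts the two representations for two sentences that contain the same words in a different order and mean opposite things; the vocabulary and both helper functions are hypothetical and exist only for illustration.

```python
def bag_of_words(tokens, vocab):
    """Count vector: position i holds how often vocab[i] occurs. Word order is lost."""
    return [tokens.count(word) for word in vocab]


def index_sequence(tokens, vocab):
    """Ordered list of 1-based vocabulary indices, the kind of input an Embedding + LSTM expects."""
    index = {word: i for i, word in enumerate(vocab, start=1)}
    return [index[word] for word in tokens]


vocab = ["not", "happy", "sad", "i", "am"]
a = "i am not sad i am happy".split()
b = "i am not happy i am sad".split()

print(bag_of_words(a, vocab), bag_of_words(b, vocab))      # [1, 1, 1, 2, 2] [1, 1, 1, 2, 2]
print(index_sequence(a, vocab), index_sequence(b, vocab))  # [4, 5, 1, 3, 4, 5, 2] [4, 5, 1, 2, 4, 5, 3]
```

A bag-of-words vector is a reasonable input for a plain dense classifier, but it gives an LSTM no word order to exploit.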

I wrote the code below, but I keep getting the same error: ('Bad input argument to theano function with name ... at index 1(0-based)', 'could not convert string to float: negative')
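The message "could not convert string to float: negative" suggests that at least one target value reaching the model is the string "negative", i.e. the sentiment column was read as word labels (or the columns were split differently than intended) instead of as the numbers 0.0 to 1.0 described above. One way to fail early with a clearer message is to convert the labels explicitly before calling fit. The sketch below assumes the column may contain either numeric strings or the words "negative"/"positive"; the LABEL_MAP that maps those words to 0.0/1.0 and the to_float_labels helper are hypothetical choices, not something stated in the question.

```python
# Hypothetical mapping for word labels; adjust it to whatever the CSV actually contains.
LABEL_MAP = {"negative": 0.0, "positive": 1.0}


def to_float_labels(raw_labels):
    """Convert raw sentiment values to floats in [0, 1], reporting the offending row on failure."""
    labels = []
    for row, value in enumerate(raw_labels):
        text = str(value).strip().lower()
        if text in LABEL_MAP:
            labels.append(LABEL_MAP[text])
            continue
        try:
            number = float(text)
        except ValueError:
            raise ValueError(f"row {row}: cannot interpret sentiment {value!r}") from None
        if not 0.0 <= number <= 1.0:  # also rejects NaN, since NaN comparisons are False
            raise ValueError(f"row {row}: sentiment {number} is outside [0, 1]")
        labels.append(number)
    return labels


print(to_float_labels(["0.25", "negative", " Positive ", 1.0]))  # [0.25, 0.0, 1.0, 1.0]
```

The resulting list can be wrapped in numpy.asarray(labels, dtype="float32") before being passed as y.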

    import keras.preprocessing.text
    import numpy as np
    np.random.seed(1337)  # for reproducibility

    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation
    from keras.layers.embeddings import Embedding
    from keras.layers.recurrent import LSTM

    print('Loading data...')
    import pandas
    thedata = pandas.read_csv("dataset/text.csv", sep=', ', delimiter=',', header='infer', names=None)

    x = thedata['text']
    y = thedata['sentiment']

    x = x.iloc[:].values
    y = y.iloc[:].values

    ###################################
    tk = keras.preprocessing.text.Tokenizer(nb_words=2000, filters=keras.preprocessing.text.base_filter(), lower=True, split=" ")
    tk.fit_on_texts(x)

    x = tk.texts_to_sequences(x)

    ###################################
    max_len = 80
    print "max_len ", max_len
    print('Pad sequences (samples x time)')

    x = sequence.pad_sequences(x, maxlen=max_len)

    #########################
    max_features = 20000
    model = Sequential()
    print('Build model...')

    model = Sequential()
    model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
    model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='rmsprop')

    model.fit(x, y=y, batch_size=200, nb_epoch=1, verbose=1, validation_split=0.2, show_accuracy=True, shuffle=True)
    # at index 1(0-based)', 'could not convert string to float: negative')
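In the code above, sequence.pad_sequences(x, maxlen=max_len) turns the variable-length index lists into one matrix with 80 columns. By default Keras pads with 0 and both pads and truncates at the start of each sequence ('pre'), so a long text keeps its last max_len words. The stdlib sketch below reproduces those defaults for illustration only; pad_pre is a hypothetical helper, not the Keras function.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value` and keep only its last `maxlen` items."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:] if maxlen > 0 else []  # 'pre' truncation drops the start
        padded.append([value] * (maxlen - len(kept)) + kept)  # 'pre' padding fills the start
    return padded


print(pad_pre([[4, 5, 1, 2, 6], [2, 2, 8]], maxlen=4))  # [[5, 1, 2, 6], [0, 2, 2, 8]]
```

Every row ends up exactly maxlen long, which is what input_length=max_len in the Embedding layer expects.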


theano keras lasagne lstm

KenobiShan Apr 18 '16 at 17:40


1 answer

Amw 5g · Accepted Answer · 2016-04-18T23:24:25+0000

Check how you use the CSV parser to read the text. Make sure the fields are in text, sentiment format if you want to use the parser as written in your code.
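Following that advice, the file should start with a header row naming the text and sentiment columns, and any text that contains a comma should be wrapped in double quotes so the parser does not split it. The question's read_csv call also passes both sep=', ' and delimiter=','; in pandas, delimiter is an alias for sep, so it is safer to give only one of them (recent pandas versions reject the combination). The sketch below uses Python's standard csv module rather than pandas to show the expected layout and a validation step; the sample rows are invented, read_text_sentiment is a hypothetical helper, and skipinitialspace=True is used to tolerate a space after each comma.

```python
import csv
import io

# Invented sample in the "text, sentiment" layout; the quoted field contains a comma.
SAMPLE = '''text, sentiment
"I love this, really", 1.0
this is terrible, 0.0
it was okay, 0.5
'''


def read_text_sentiment(handle):
    """Return (texts, labels) from a CSV with 'text' and 'sentiment' columns."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    texts, labels = [], []
    for row_no, row in enumerate(reader, start=1):
        try:
            labels.append(float(row["sentiment"]))
        except (TypeError, ValueError):  # TypeError covers a missing field (None)
            raise ValueError(f"data row {row_no}: bad sentiment {row.get('sentiment')!r}") from None
        texts.append(row["text"])
    return texts, labels


texts, labels = read_text_sentiment(io.StringIO(SAMPLE))
print(texts)   # ['I love this, really', 'this is terrible', 'it was okay']
print(labels)  # [1.0, 0.0, 0.5]
```

With pandas, a rough equivalent is pandas.read_csv("dataset/text.csv", skipinitialspace=True) followed by a look at thedata.dtypes: if the sentiment column shows up as object instead of float64, it contains non-numeric values such as "negative".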

