Keras - Text Classification - LSTM - How to enter text?

I am trying to understand how to use an LSTM to classify a specific dataset that I have.

I did some research and found this Keras IMDB example: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py

However, I am confused about how the dataset should be processed for input.

I know Keras has text pre-processing methods, but I'm not sure which one to use.

x contains n lines of text, and y scores each text on a happiness/sadness scale: 1.0 means 100% happy and 0.0 means completely sad. The values can fall anywhere in between, e.g. 0.25.

So my question is: how should I feed x and y to the model? Should I use a bag of words? Any feedback is appreciated!

I wrote the code below, but I keep getting the same error: ('Bad input argument to theano function with name ... at index 1(0-based)', 'could not convert string to float: negative')

    import keras.preprocessing.text
    import numpy as np
    np.random.seed(1337)  # for reproducibility

    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation
    from keras.layers.embeddings import Embedding
    from keras.layers.recurrent import LSTM

    print('Loading data...')
    import pandas

    thedata = pandas.read_csv("dataset/text.csv", sep=', ', delimiter=',', header='infer', names=None)

    x = thedata['text']
    y = thedata['sentiment']

    x = x.iloc[:].values
    y = y.iloc[:].values

    ###################################
    tk = keras.preprocessing.text.Tokenizer(nb_words=2000, filters=keras.preprocessing.text.base_filter(), lower=True, split=" ")
    tk.fit_on_texts(x)

    x = tk.texts_to_sequences(x)

    ###################################
    max_len = 80
    print "max_len ", max_len
    print('Pad sequences (samples x time)')

    x = sequence.pad_sequences(x, maxlen=max_len)

    #########################
    max_features = 20000
    model = Sequential()
    print('Build model...')

    model = Sequential()
    model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
    model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='rmsprop')

    model.fit(x, y=y, batch_size=200, nb_epoch=1, verbose=1, validation_split=0.2, show_accuracy=True, shuffle=True)

    # at index 1(0-based)', 'could not convert string to float: negative')
1 answer

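Check how you are using the CSV parser to read the text. The error message shows the string "negative" arriving where a float is expected, which suggests that at least one value in the sentiment column is a string label rather than a number. Make sure the fields are in text, sentiment format if you want to use the parser the way you wrote it in your code.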
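As a rough illustration of that check, here is a minimal sketch that reads a two-column CSV and turns every sentiment cell into a float before it gets anywhere near the model. It uses only the standard-library csv module, not pandas, so that it runs on its own. The sample rows, the helper names to_score and load_text_and_scores, and the mapping of "negative"/"positive" to 0.0/1.0 are all assumptions made for illustration; only the string "negative" is known from the error message, and the real contents of dataset/text.csv are not shown in the question.

```python
import csv
import io

# Hypothetical mapping: only "negative" is known from the error message;
# "positive" and the numeric values chosen here are assumptions.
LABEL_TO_SCORE = {"negative": 0.0, "positive": 1.0}


def to_score(raw):
    """Convert a raw sentiment cell to a float in [0.0, 1.0]."""
    value = raw.strip().lower()
    if value in LABEL_TO_SCORE:
        return LABEL_TO_SCORE[value]
    score = float(value)  # raises ValueError for any other non-numeric cell
    if not 0.0 <= score <= 1.0:
        raise ValueError("score out of range: %r" % raw)
    return score


def load_text_and_scores(handle):
    """Read rows with 'text' and 'sentiment' columns from an open CSV file."""
    # skipinitialspace drops the blank that follows a ", " separator.
    reader = csv.DictReader(handle, skipinitialspace=True)
    texts, scores = [], []
    for row in reader:
        texts.append(row["text"])
        scores.append(to_score(row["sentiment"]))
    return texts, scores


# Made-up sample data standing in for dataset/text.csv.
sample = io.StringIO(
    'text,sentiment\n'
    '"I love this, really",0.9\n'
    'so sad today,negative\n'
    'it was fine, 0.25\n'
)
texts, scores = load_text_and_scores(sample)
print(texts)
print(scores)
```

With the labels validated this way, y can be converted with np.asarray(scores, dtype="float32") before being passed to model.fit, and any cell that is neither a number nor a known label fails early with a ValueError instead of deep inside Theano.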


Source: https://habr.com/ru/post/1247379/

