I am struggling with the bag-of-words / vocabulary approach for representing my input as one-hot vectors for a neural network model in Keras.
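As a minimal, library-free sketch of how the vocabulary idea works, each sentence can be turned into a binary (multi-hot) vector with one position per vocabulary entry. This assumes a hypothetical regex tokenizer that treats every punctuation mark or emoji as its own token; the function names and sample sentences are illustrative only.

```python
import re
from typing import Dict, List

# Hypothetical tokenizer: a run of word characters, OR any single character
# that is neither a word character nor whitespace (punctuation, most emoji).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text: str) -> List[str]:
    """Split lower-cased text into word, punctuation and emoji tokens."""
    return TOKEN_RE.findall(text.lower())

def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign each distinct token seen in the training texts a column index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def bag_of_words(text: str, vocab: Dict[str, int]) -> List[int]:
    """Return 1 at a token's index if it occurs in text, else 0."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)  # tokens not seen during training are ignored
        if idx is not None:
            vec[idx] = 1
    return vec

train = ["I love this! 😀", "This is bad :("]
vocab = build_vocab(train)
print(tokenize("I love this! 😀"))  # ['i', 'love', 'this', '!', '😀']
print(bag_of_words("love it 😀!", vocab))
```

Emoji made of several code points (skin-tone modifiers, zero-width-joiner sequences) are split into separate tokens by this regex; handling them as single units would need a grapheme-aware tokenizer.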
I would like to create a simple three-layer network, but I need help understanding and developing an approach for converting my tagged text data into a form the network can use, with 7 target labels ranging from 0 to 1 in steps of 0.2.
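One possible reading of "a simple three-layer network" is two hidden `Dense` layers plus a softmax output over the 7 labels. The sketch below assumes the inputs are binary bag-of-words vectors (here a hypothetical vocabulary of 9 entries) and that each label has already been mapped to an integer class index from 0 to 6; the layer sizes, epochs and toy data are arbitrary placeholders, not a tuned setup.

```python
import numpy as np
from tensorflow import keras

VOCAB_SIZE = 9   # hypothetical: length of the bag-of-words vectors
NUM_CLASSES = 7  # the 7 labels, assumed already mapped to indices 0..6

model = keras.Sequential([
    keras.Input(shape=(VOCAB_SIZE,)),               # one multi-hot vector per sentence
    keras.layers.Dense(64, activation="relu"),      # hidden layer 1
    keras.layers.Dense(32, activation="relu"),      # hidden layer 2
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per label
])
# Sparse categorical cross-entropy takes integer class indices as targets.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical toy data: two multi-hot inputs and their class indices.
x = np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 1, 1, 1, 1]], dtype="float32")
y = np.array([6, 0])

model.fit(x, y, epochs=5, verbose=0)
probs = model.predict(x, verbose=0)  # shape (2, 7), each row sums to 1
```

If the ordering of the labels (0, 0.2, ..., 1) matters, a single sigmoid output unit trained with a regression loss such as mean squared error is an alternative worth comparing against treating them as unrelated classes.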
I tried the scikit-learn vectorizers, but they are too rigid: they tokenize either by words or by characters, whereas I need each sentence to be compared against a vocabulary that includes words, characters, punctuation and emoji. When I apply TF-IDF to a test sentence, it only considers words and ignores everything else. I also need guidance on the one-hot approach and how to implement it in Keras.
I'd really appreciate any help.
Regards