Fill in missing sequence values with neural networks

Question


I want to create a small project using neural networks with Python, and PyBrain looks like the best fit. But so far, none of the examples and questions I have found have helped me.

I have a sequence of numbers. Hundreds of lines. Some values are missing, and instead of a number there is an "x".

For instance:

1425234838636**x**40543485435097**x**43953458345345430843967067045764607457607645067045**x**04376037654067458674506704567408576405

etc. This is just an example. Not my sequence.

My idea is to read the values one by one and train the neural network on them; when I reach an x, I predict the number and then continue training with the numbers that follow.

What I have found so far is training like this:

 trainSet.addSample([0,0,0,0],[1])

with some inputs and some outputs.

Any tips on how I can continue with this?

Edit: I have figured something out, and I would like feedback, because I do not know whether it is correct.

I still have the string above. I split it into a list, so that each element of the list is one number.

    for ind in range(len(myList)):
        if not myList[ind] == "x" and not myList[ind+1]=="x":
            ds.addSample(myList[ind],myList[ind+1])
        else:
            break

    net = FeedForwardNetwork()
    inp = LinearLayer(1)
    h1 = SigmoidLayer(1)
    outp = LinearLayer(1)

    net.addOutputModule(outp)
    net.addInputModule(inp)
    net.addModule(h1)

    net.addConnection(FullConnection(inp, h1))
    net.addConnection(FullConnection(h1, outp))

    net.sortModules()

    trainer = BackpropTrainer(net, ds)
    trainer.trainOnDataset(ds,1000)
    trainer.testOnData(verbose=True)

    lis[ind+1] = net.activate((ind,))

    # Go to the beginning and continue from the last "x",
    # which has been replaced by the result of net.activate()

What do you think? Do you believe something like this will work?

+4

python artificial-intelligence time-series neural-network forecasting

Tasos May 20, '13 at 19:41


3 answers

I can't give you an answer specific to this Python library, but as I see it, you have a neural network and you give it samples of the form

    [i0 i1 ... in]  ->  [o0 o1 ... on]
    (input vector)      (output vector)

Right now you train the network with sample vectors of length 1. The network knows nothing about the order in which the numbers are presented to it; that order only affects the result of the trained network.

To get a network that knows about the sequence, you can present vectors of consecutive numbers as input and the one number you want as output. You leave out any sequence that contains an X. For example:

    Sequence: 1 2 3 4 X 2 3 4 5 6 7 8
    Training with input length 3, output length 1:
    [1 2 3] -> 4
    [2 3 4] -> 5 (the second occurrence of 2 3 4, since the first is followed by X)
    [3 4 5] -> 6
    [4 5 6] -> 7
    [5 6 7] -> 8

I think that with this approach your network can adapt somewhat to the input sequence. How to extract the right training sequences as input is something I have to leave to the domain expert (you).
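The windowing described above can be sketched in plain Python. This is only an illustration under assumptions: the helper name `make_training_pairs` and its parameters are made up for this example and are not part of PyBrain, and the sequence is assumed to be a list of string tokens with "x" marking a missing value.

```python
def make_training_pairs(tokens, n, missing="x"):
    """Return (inputs, target) pairs from every run of n + 1 known values."""
    pairs = []
    for i in range(len(tokens) - n):
        window = tokens[i:i + n + 1]  # n inputs followed by one target
        if missing in window:
            continue  # skip any window that touches a missing value
        values = [float(t) for t in window]
        pairs.append((values[:-1], values[-1]))
    return pairs


sequence = "1 2 3 4 x 2 3 4 5 6 7 8".split()
pairs = make_training_pairs(sequence, n=3)
for inputs, target in pairs:
    print(inputs, "->", target)
```

Each pair could then be passed to a PyBrain dataset with something like `ds.addSample(inputs, [target])`, assuming `ds` was created with 3 inputs and 1 target.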

+1

Alpha one Jun 01 '13 at 13:40


What you are describing is a statistical technique called imputation: replacing missing values in your data. The traditional approaches do not involve neural networks, but there has certainly been some research in that direction. This is not my area, but I recommend that you check the literature.
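For comparison, here is a minimal sketch of one very simple non-neural imputation method, linear interpolation between the nearest known neighbours. It is only one of many possible approaches and may not suit a sequence without a smooth trend; the function name `interpolate_missing` is made up for this example.

```python
def interpolate_missing(tokens, missing="x"):
    """Fill missing values by linear interpolation between known neighbours.

    Gaps at either end of the sequence copy the nearest known value.
    """
    values = [None if t == missing else float(t) for t in tokens]
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


result = interpolate_missing("1 2 x 4 x x 7".split())
print(result)
```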

+1

David marx Jun 01 '13 at 15:38


Engineero · Accepted Answer · 2013-06-07T20:03:30+0000

In general, if you train your ANN using backpropagation, you are basically training an input-to-output map. This means that your training set must consist of known input-output relationships (none of your unknown values may be included in the training set). The ANN then becomes an approximation of the actual relationship between your inputs and outputs.

Then you can call x = net.activate([seq]), where seq is the input sequence associated with the unknown value x.

If x is an unknown input sequence for a known result, you would need to invert the ANN. I don't think there is a simple way to invert an ANN in PyBrain, but you could just train an ANN on the reversed training data. In other words, use your known results as training inputs and their associated sequences as training outputs.

The main thing is to consider whether your tool and training data are appropriate for what you are trying to do. If you just want to predict x as a function of the previous number, I think you are training correctly. I suspect, though, that x will be a function of the previous n numbers, in which case you want to build your dataset like this:
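Training on the reversed data amounts to swapping the roles of inputs and targets. A minimal sketch, assuming the training pairs are held as plain Python tuples (the names `forward_pairs` and `inverse_pairs` and the sample values are illustrative, not PyBrain API):

```python
# Hypothetical forward data: (input sequence, known result) pairs.
forward_pairs = [
    ([1.0, 2.0, 3.0], [4.0]),
    ([2.0, 3.0, 4.0], [5.0]),
]

# Swap roles: the known result becomes the input, the sequence the target.
inverse_pairs = [(result, seq) for seq, result in forward_pairs]

for inp, target in inverse_pairs:
    print(inp, "->", target)
```

The reversed network would then need 1 input and 3 outputs. If several different sequences lead to the same result, the reversed mapping is ambiguous, so the predictions should be treated with caution.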

    n = 10  # number of previous values used as input

    # ds is assumed to be a SupervisedDataSet(n, 1)
    for ind in range(n, len(myList)):
        # n inputs followed by the value to predict
        window = myList[ind-n:ind+1]
        # Skip any window that contains a missing value
        if "x" in window:
            continue
        # Add valid training sequence to data set
        ds.addSample([float(v) for v in window[:-1]], [float(window[-1])])
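Once a model has been trained on such windows, the missing entries can be filled in one pass from left to right. The sketch below is an illustration under assumptions: `fill_missing` and `mean_predictor` are hypothetical names, and the mean of the window is only a stand-in for a trained network.

```python
def fill_missing(tokens, predict, n, missing="x"):
    """Replace each missing entry with predict(previous n values).

    Earlier predictions are reused as inputs for later ones. A missing
    entry with fewer than n preceding values, or with an unfilled gap
    inside its window, is left untouched.
    """
    filled = [t if t == missing else float(t) for t in tokens]
    for i in range(n, len(filled)):
        if filled[i] != missing:
            continue
        window = filled[i - n:i]
        if missing in window:
            continue
        filled[i] = predict(window)
    return filled


def mean_predictor(window):
    # Placeholder model: a trained network would be called here instead.
    return sum(window) / len(window)


filled = fill_missing("1 2 3 x 5 6 x".split(), mean_predictor, n=3)
print(filled)
```

With PyBrain, `predict` could be a small wrapper such as `lambda w: net.activate(w)[0]`, assuming `net` has n inputs and one output.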
