How can I find the following value?

Given an array of 0 and 1, for example. array[] = {0, 1, 0, 0, 0, 1, ...}, how can I predict which next value will be as accurate as possible?

What methods are best suited for this kind of task?

+4
source share
4 answers

The forecasting method will depend on the interpretation of the data.

However, it seems that in this particular case we can make some general assumptions that could justify the use of certain machine learning methods.

  • Values ​​are generated one after another in chronological order
  • The values ​​depend on some (possibly not observable) external state. If the state repeats, then the values.

. , .

, , . , k. , k=1, - Markov chain.

k- . , k=3,

0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1...

:

(0,0,1) -> 1
(0,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 1
(1,0,1) -> 1
(0,1,1) -> 1
(1,1,1) -> 1
(1,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 0
(1,0,0) -> 1

, . 3 0,0,1, (0,0,1) .

k- . , , .

, k .

+3

. :

  • p
  • p

Python :

#!/usr/bin/env python

from __future__ import division

signal = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]

def maximum_likelihood(s, last=None):
    """
    The maximum likelihood estimator selects the parameter value which gives
    the observed data the largest possible probability.

    http://mathworld.wolfram.com/MaximumLikelihood.html

    If `last` is given, only use the last `n` values.
    """
    if not last:
        return sum(s) / len(s)
    return sum(s[:-last]) / last

if __name__ == '__main__':
    hits = []

    print('p\tpredicted\tcorrect\tsignal')
    print('-\t---------\t-------\t------')

    for i in range(1, len(signal) - 1):
        p = maximum_likelihood(signal[:i]) # p = maximum_likelihood(signal[:i], last=2)
        prediction = int(p >= 0.5)
        hits.append(prediction == signal[i])
        print('%0.3f\t%s\t\t%s\t%s' % (
            p, prediction, prediction == signal[i], signal[:i]))

    print('accuracy: %0.3f' % (sum(hits) / len(hits)))

:

# p       predicted  correct signal
# -       ---------  ------- ------
# 1.000   1          False   [1]
# 0.500   1          True    [1, 0]
# 0.667   1          True    [1, 0, 1]
# 0.750   1          False   [1, 0, 1, 1]
# 0.600   1          False   [1, 0, 1, 1, 0]
# 0.500   1          True    [1, 0, 1, 1, 0, 0]
# 0.571   1          False   [1, 0, 1, 1, 0, 0, 1]
# 0.500   1          True    [1, 0, 1, 1, 0, 0, 1, 0]
# 0.556   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1]
# 0.600   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# 0.545   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# 0.583   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
# 0.615   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
# 0.643   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
# 0.667   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
# 0.688   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
# 0.647   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# 0.667   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# 0.632   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# 0.650   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
# accuracy: 0.650

.

, , 3 , 0,7.

: gist.

+3

, 0s 1s , 0 1, .....

+2

, reset, - , ( node ) , (, ) , .

, , node, . , root node .

Then, when you load a new sequence, you can see which branch is “hot” (it would make a nice visualization like a heatmap / tree by the way), especially if the sequence is long enough. That is, if the order of the elements in the sequence plays a role in what comes next.

+1
source

Source: https://habr.com/ru/post/1650411/


All Articles