String kernels in R

I am using the stringdot function from the R package kernlab. Here is my code:

library(kernlab)
x <- c("1","2","3")
y <- c("3","2","1")
lst <- list(x, y)
sk <- stringdot(length = 2, lambda = 1.2, type = "exponential", normalized = TRUE)
q <- kernelMatrix(sk,lst)

As far as I understand, the exponential kernel builds substrings of length 2. For example, here they would be 1-2, 1-3, 2-3 from the first vector and 3-2, 3-1, 2-1 from the second vector. It then tries to match the inputs using all substrings of the given length, down-weighting them according to the given lambda.
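Note that 1-2, 1-3, 2-3 are length-2 subsequences (characters kept in order, gaps allowed), not contiguous substrings: "123" has only two contiguous substrings of length 2. A minimal base-R sketch of the difference; substrings_k and subsequences_k are hypothetical helpers written for illustration, not kernlab functions:

```r
# Contiguous substrings of exactly length k
substrings_k <- function(s, k) {
  starts <- seq_len(max(nchar(s) - k + 1, 0))
  substring(s, starts, starts + k - 1)
}

# Length-k subsequences: characters kept in order, gaps allowed
subsequences_k <- function(s, k) {
  chars <- strsplit(s, "")[[1]]
  apply(combn(length(chars), k), 2, function(idx) paste(chars[idx], collapse = ""))
}

substrings_k("123", 2)
#[1] "12" "23"
subsequences_k("123", 2)
#[1] "12" "13" "23"
subsequences_k("321", 2)
#[1] "32" "31" "21"
```

Either way, "123" and "321" share nothing of length 2, which is why the expected off-diagonal value was 0.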

I therefore expect the output to contain 1 for (x, x) and (y, y) and 0 for (x, y), since these inputs have no substrings in common, but the output gives 0.4723 for the pair (x, y).

I do not understand why the similarity between x and y is 0.4723.


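kernelMatrix does not work with stringdot the way you expect, because the string kernel compares whole strings, not vectors of characters.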

In the body of kernelMatrix, if x is a list, it is converted like this:

if (is(x, "list")) 
  x <- sapply(x, paste, collapse = "")

so lst becomes c("123", "321").
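This conversion step can be reproduced on its own with base R, no kernlab needed; it shows that the two character vectors from the question reach the kernel as two 3-character strings:

```r
# The list passed to kernelMatrix in the question
lst <- list(c("1", "2", "3"), c("3", "2", "1"))

# Same conversion kernelMatrix applies to a list input
collapsed <- sapply(lst, paste, collapse = "")
collapsed
#[1] "123" "321"
```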

kernelMatrix then computes (where sk is the string kernel):

sk("123", "123")    sk("123", "321")
                    sk("321", "321")

which, I assume, is not what you intended.

Indeed, calling the kernel directly on the two strings gives the same value:

stringdot(type = "exponential", lambda = 1.2)("123", "321")
#[1] 0.4723893

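Note that length is ignored when type = "exponential". Judging by the stringkernel documentation, the exponential kernel depends only on lambda: it uses all common substrings, whatever their length, and gives each one a weight that decays exponentially with its length, with lambda as the decay factor.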
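To see where 0.4723893 comes from, here is a brute-force sketch in base R. It is not kernlab's code: exp_string_kernel and all_substrings are hypothetical helpers, and two assumptions are baked in: a newline is appended to each string, and every common substring s contributes count_x(s) * count_y(s) * lambda^(-nchar(s)). Under those assumptions it reproduces every number quoted in this answer, which suggests (but does not prove) that kernlab computes the same thing.

```r
# ASSUMPTIONS (inferred from the outputs, not from kernlab's source):
# "\n" is appended to each string; a common substring s adds
# count_x(s) * count_y(s) * lambda^(-nchar(s)).

# All contiguous substrings of s (every start/end pair, duplicates kept)
all_substrings <- function(s) {
  n <- nchar(s)
  starts <- rep(seq_len(n), times = n:1)
  ends <- unlist(lapply(seq_len(n), function(i) i:n))
  substring(s, starts, ends)
}

exp_string_kernel <- function(x, y = x, lambda = 1.2, normalized = TRUE) {
  raw <- function(a, b) {
    ca <- table(all_substrings(paste0(a, "\n")))
    cb <- table(all_substrings(paste0(b, "\n")))
    common <- intersect(names(ca), names(cb))
    # counts multiply; longer substrings are damped by lambda^(-length)
    sum(as.vector(ca[common]) * as.vector(cb[common]) * lambda^(-nchar(common)))
  }
  k <- raw(x, y)
  # normalized = TRUE divides by the geometric mean of the self-similarities
  if (normalized) k <- k / sqrt(raw(x, x) * raw(y, y))
  k
}

exp_string_kernel("123", "321")
#[1] 0.4723893
```

The only common substrings of "123\n" and "321\n" are the four single characters, so the unnormalized value is 4 / 1.2; nothing of length 2 is shared, yet the result is far from 0 because single characters carry the largest weight.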

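If you use stringdot(type = "spectrum"), which does take length into account, you get the result you expected: "123" and "321" share no substrings of length 2 or more, so their similarity is 0.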
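A k-spectrum kernel counts only contiguous substrings of exactly length k and takes the dot product of the two count vectors. A minimal unnormalized sketch in base R; spectrum_kernel is a hypothetical helper, not kernlab's implementation:

```r
# Hypothetical k-spectrum kernel (unnormalized): dot product of k-mer counts
spectrum_kernel <- function(x, y, k = 2) {
  kmers <- function(s) {
    starts <- seq_len(max(nchar(s) - k + 1, 0))
    substring(s, starts, starts + k - 1)
  }
  kx <- kmers(x)
  ky <- kmers(y)
  feats <- unique(c(kx, ky))
  sum(vapply(feats, function(f) as.numeric(sum(kx == f) * sum(ky == f)), numeric(1)))
}

spectrum_kernel("123", "321")  # "12","23" vs "32","21": nothing shared
#[1] 0
spectrum_kernel("123", "123")
#[1] 2
```

Normalizing by sqrt(k(x, x) * k(y, y)) leaves the cross value at 0 and turns each diagonal value into 1, which is the matrix the question expected.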

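Also, it appears that a newline ("\n") is appended to every string, so any two strings share at least that one character and always get a value > 0 with type = "exponential", even when they have nothing else in common: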

stringdot(type = "exponential", lambda = 1.2)("blowfish", "mage")
#[1] 0.05274495
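Assuming the lambda^(-length) weighting and the appended newline, this value can be checked by hand. "blowfish" and "mage" share no letters, so the only common substring is "\n" (weight 1/1.2). Both strings have all-distinct characters, so a string of n characters (newline included) has n - k + 1 distinct substrings of length k, each matching itself once. self_sim_distinct is a hypothetical helper for this special case only:

```r
lambda <- 1.2

# ASSUMPTION: weight lambda^(-k) per matching substring of length k.
# Self-similarity of an n-character string whose characters are all distinct.
self_sim_distinct <- function(n, lambda) {
  k <- seq_len(n)
  sum((n - k + 1) * lambda^(-k))
}

shared <- lambda^(-1)                               # only "\n" is shared
a <- self_sim_distinct(nchar("blowfish") + 1, lambda)  # +1 for "\n"
b <- self_sim_distinct(nchar("mage") + 1, lambda)
shared / sqrt(a * b)
#[1] 0.05274495
```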

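For the theory behind these kernels, @Rahul R pointed to Lodhi et al. (2002). There is also a Python implementation on GitHub; someone familiar with both Python and R could compare it with kernlab's.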

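Update:

With normalized = TRUE, stringkernel rescales each value, so pairs that have the same unnormalized similarity can end up with different normalized values: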

sk_u <- stringdot(type = "exponential", lambda = 1.2, normalized = FALSE)
sk_n <- stringdot(type = "exponential", lambda = 1.2, normalized = TRUE)

lapply(list(unnormalised = sk_u, normalised = sk_n), function(f) {
  c(
    "ab,xyzabqr" = f("ab", "xyzabqr"),
    "ab,abpmnop" = f("ab", "abpmnop"),
    "ab,ab" = f("ab"),
    "xyzabqr,xyzabqr" = f("xyzabqr"),
    "abpmnop,abpmnop" = f("abpmnop")
  )
})

#$unnormalised
#     ab,xyzabqr      ab,abpmnop           ab,ab xyzabqr,xyzabqr abpmnop,abpmnop 
#       3.194444        3.194444        4.467593       20.814201       22.480868 

#$normalised
#     ab,xyzabqr      ab,abpmnop           ab,ab xyzabqr,xyzabqr abpmnop,abpmnop 
#      0.3312674       0.3187513       1.0000000       1.0000000       1.0000000 

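The unnormalized similarities of "ab" to "xyzabqr" and to "abpmnop" are identical, but the normalized ones differ. The normalized value is sk_u("ab", "xyzabqr") / sqrt(sk_u("ab") * sk_u("xyzabqr")), that is, the unnormalized value divided by the geometric mean of the two self-similarities. So sk_n("ab", "xyzabqr") is larger than sk_n("ab", "abpmnop"), because the repeated "p" in "abpmnop" raises its self-similarity.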
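The normalization can be checked with plain arithmetic on the unnormalized values printed above. The last two lines assume the lambda^(-length) weighting: the two "p"s give 2 * 2 = 4 single-character self-matches instead of 1 + 1 = 2, which should add exactly 2 / 1.2 to the self-similarity of "abpmnop" compared with "xyzabqr" (both are 8 characters with the newline).

```r
# Unnormalized values from the output above (lambda = 1.2)
k_ab_xyz <- 3.194444   # sk_u("ab", "xyzabqr")
k_ab_abp <- 3.194444   # sk_u("ab", "abpmnop")
k_ab     <- 4.467593   # sk_u("ab")
k_xyz    <- 20.814201  # sk_u("xyzabqr")
k_abp    <- 22.480868  # sk_u("abpmnop")

# Divide by the geometric mean of the two self-similarities
round(c(k_ab_xyz / sqrt(k_ab * k_xyz), k_ab_abp / sqrt(k_ab * k_abp)), 6)
#[1] 0.331267 0.318751

# Gap in self-similarity vs. two extra length-1 matches at weight 1/1.2
k_abp - k_xyz
#[1] 1.666667
2 / 1.2
#[1] 1.666667
```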


Source: https://habr.com/ru/post/1015827/

