Gmail-style advanced search syntax parsing?

I want to parse a search string similar to the ones Gmail accepts, using Perl. An example input would be "tag:thing by:{user1 user2} {-tag:a by:user3}". I want to put it into a tree structure like:

{and => [
    "tag:thing",
    {or => [
       "by:user1",
       "by:user2",
    ]},
    {or => [
       {not => "tag:a"},
       "by:user3",
    ]},
]}

General rules:

  • Tokens separated by spaces are combined with AND by default.
  • Tokens in braces are alternatives (OR). Braces can go before or after the field specifier; that is, "by:{user1 user2}" and "{by:user1 by:user2}" are equivalent.
  • Tokens prefixed with a hyphen are excluded.

These elements can also be combined and nested, for example "{by:user5 -{tag:k by:user3}}", etc.


(DBIx::Class.)


Regex (, ). , , , , , CFG. CFG , . - Perl CFG-, cathartic.



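my $search = "tag:thing by:{user1 user2} {-tag:a by:user3}";
# Split on whitespace that is not inside a pair of braces (nested braces are not handled).
my @tokens = split /(?![^{]*})\s+/, $search;
foreach (@tokens) {
    # s///g returns the number of braces removed; non-zero means OR mode
    my $or = s/[{}]//g;
    # the first "field:" prefix in the token, if there is one
    my ($default_field_specifier) = /(\w+):/;
}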


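$_ = "by:{user1 z:{user2 3} } x {-tag:a by:user3} zz";
pos($_) = 0;
scan_query("");

# Scan $_ recursively from pos(); $default_specifier is prepended to every token.
sub scan_query {
    my $default_specifier = shift;
    # $1 is a token (empty before a bare "{"); $2 is set when a "{" follows it.
    while (/\G\s*((?:[-\w:]+)|(?=\{))(\{)?/gc) {
        # "{" seen: descend, using the prefix (e.g. "by:") as the new default
        scan_query($1), next if $2;
        my $query_token = $default_specifier . $1;    # e.g. "by:" . "user1"
    }
    /\G\s*\}/gc;    # consume the closing brace of this group, if present
}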
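
Building on the recursive scan above, here is a minimal sketch of how the and/or/not tree from the question could be assembled. The names parse_query and parse_items are hypothetical. The sketch assumes tokens are written without inner whitespace ("by:user1", "by:{...}", "-tag:a"). As a guess at fault-tolerant behaviour, it skips characters it does not recognise and treats an unclosed brace as closed at the end of input.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical entry point: top-level items are combined with AND.
sub parse_query {
    my ($query) = @_;
    pos($query) = 0;
    return { and => parse_items(\$query, '', 0) };
}

# Hypothetical helper: parse items up to the closing brace of the current
# group (or end of input). $field is the default "field:" prefix for bare words.
sub parse_items {
    my ($s, $field, $depth) = @_;
    my @items;
    while (1) {
        $$s =~ /\G\s+/gc;                             # skip whitespace
        last if (pos($$s) // 0) >= length $$s;        # end of input
        if ($$s =~ /\G\}/gc) {
            last if $depth > 0;                       # closes the current group
            next;                                     # stray top-level "}" is ignored
        }
        my $negated = $$s =~ /\G-/gc ? 1 : 0;         # hyphen prefix means NOT
        my $node;
        if ($$s =~ /\G(\w+:)?\{/gc) {                 # group, optionally prefixed "by:{"
            my $inner = defined $1 ? $1 : $field;
            $node = { or => parse_items($s, $inner, $depth + 1) };
        }
        elsif ($$s =~ /\G(\w+:)(\w+)/gc) {            # explicit field:value
            $node = "$1$2";
        }
        elsif ($$s =~ /\G(\w+)/gc) {                  # bare value takes the default field
            $node = "$field$1";
        }
        else {
            $$s =~ /\G./gcs;                          # skip one unrecognised character
            next;
        }
        push @items, $negated ? { not => $node } : $node;
    }
    return \@items;
}

my $tree = parse_query('tag:thing by:{user1 user2} {-tag:a by:user3}');
print Dumper($tree);
```

For the example input from the question this should produce the tree shown there: "tag:thing", an OR of "by:user1" and "by:user2", and an OR of {not => "tag:a"} and "by:user3", all under a top-level AND. The end-of-input check compares pos() with the string length rather than matching /\G\z/, which avoids repeated zero-length matches at the same position.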

Regexes :)!


Parse::Yapp may be an option here. It generates an LALR(1) parsing automaton.


Parse::RecDescent can generate parsers for this kind of thing. You will probably need some experience with parsers to use it effectively.


Source: https://habr.com/ru/post/1775270/

