Where can I start this optimization problem?

I have a simple program that reads a bunch of things from the file system, filters the results and prints them. This simple program implements a domain-specific language to facilitate selection. This DSL is "compiled" into an execution plan that looks like this (Input was C:\Windows\System32\* OR -md5"ABCDEFG" OR -tf):

Index  Success  Failure  Description
    0        S        1  File Matches C:\Windows\System32\*
    1        S        2  File MD5 Matches ABCDEFG
    2        S        F  File is file. (Not directory)

The filter is applied to this file, and if it succeeds, the index pointer goes to the pointer indicated in the success field, and if it does not work, the index pointer goes to the number specified in the failure field. "S" means that the file passes the filter, F means that the file should be rejected.

Of course, a filter based on checking a simple file (! FILE_ATTRIBUTE_DIRECTORY) is much faster than checking based on an MD5 file, which requires opening and executing the actual file hash. Each β€œopcode” filter has a time class associated with it; MD5 receives a high time interval, ISFILE receives a low time interval.

I would like to change the execution order of this execution plan so that codes of operations that take a lot of time are executed as rarely as possible. For the above plan, this would mean that it should be:

Index  Success  Failure  Description
    0        S        1  File is file. (Not directory)
    1        S        2  File Matches C:\Windows\System32\*
    2        S        F  File MD5 Matches ABCDEFG

" " - NP-Complete ( , 511 ), . " " . , , IL .

:
{C:\Windows\Inf * AND -tp} {-tf C:\Windows\System32\Drivers *}

:

Index  Success  Failure  Description
    0        1        2  File Matches C:\Windows\Inf\*
    1        S        2  File is a Portable Executable
    2        3        F  File is file. (Not directory)
    3        F        S  File Matches C:\Windows\System32\Drivers\*

:

Index  Success  Failure  Description
    0        1        2  File is file. (Not directory)
    1        2        S  File Matches C:\Windows\System32\Drivers\*
    2        3        F  File Matches C:\Windows\Inf\*
    3        S        F  File is a Portable Executable
+3
2

, . , "", node, node .

:

{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }
         1             2          3                 4

      OR
     /  \
  AND    AND
 /  \   /   \
1    2 3     4

AND (1, 2) (3, 4) , node. OR node .

AND OR , .

+3

@Greg Hewgill , AST, Intermediate. , - ( AST/shrug).

- , , , NOT.

Index  Success  Failure  Description
0        1        2  File Matches C:\Windows\Inf\*
1        S        2  File is a Portable Executable
2        3        F  File is file. (Not directory)
3        F        S  File Matches C:\Windows\System32\Drivers\*

(, S, F Node; NOT, ; node )

Index  Success  Failure  Description
0        1        2  File Matches C:\Windows\Inf\*
1        S        2  File is a Portable Executable
2        L1        F  File is file. (Not directory)

L1=NOT(cost(child))
    |
Pred(cost(PATH))

node ( Success Extracted node ; Failure ; node , Node).

Index  Success  Failure  Description
0        1        L3  File Matches C:\Windows\Inf\*
1        S        L3  File is a Portable Executable


L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

Node

Index  Success  Failure  Description
0        L5       L3  File Matches C:\Windows\Inf\*

L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
                    /                          \
                    |                       L4=IS(cost(child))
                    |                           | 
                    |                       1=Pred(cost(PORT_EXE))
                    |
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

node ( , Success and Failure , node , , Node).

  • root OR, , , , , Failure.

  • root AND, , , Failure , Success.

:

L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
                    /                          \
                    |                       L4=AND(cost(as for L3))
                    |                             /               \
                    |                       L6=IS(cost(child))   L7=IS(cost(child))
                    |                           |                       |
                    |                       1=Pred(cost(PORT_EXE))   0=Pred(cost(PATH))
                    |
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))
+1

Source: https://habr.com/ru/post/1757579/


All Articles