Check if a string is a substring in Prolog

Is there a way to check if a string is a substring of another string in Prolog? I tried converting the string to a list of characters and then checking if the first set is the second subset, which is apparently not restrictive enough. This is my current code:

isSubstring(X,Y):- stringToLower(X,XLower), stringToLower(Y,YLower), isSubset(XLower,YLower). isSubset([],_). isSubset([H|T],Y):- member(H,Y), select(H,Y,Z), isSubset(T,Z). stringToLower([],[]). stringToLower([Char1|Rest1],[Char2|Rest2]):- char_type(Char2,to_lower(Char1)), stringToLower(Rest1,Rest2). 

If I test this with

isSubstring ("test", "tesZting").

he returns yes, but should not return.

+6
source share
4 answers

Prolog lines are lists where each element of the list is an integer value representing the code point of the corresponding character. The string "abc" is exactly equivalent to the list [97,98,99] (if your implementation of the prologue is used in Unicode or ASCII, otherwise the values ​​may differ). This leads to this (probably suboptimal from the perspective of the Big-O perspective), which basically says that X is a substring of S if

  • S has a suffix T such that and
  • X is the prefix T

Here is the code:

 substring(X,S) :- append(_,T,S) , append(X,_,T) , X \= [] . 

We restrict X to something other than an empty list (also like the string nil "" ), since one could conceptually find a huge number of substrings of zero length in any string: a string of length n has 2+ (n-1) Nil substrings, one between each character in the string, one preceding the first character and one following the last character.

+1
source

It is unclear what you mean by string. But since you say that you convert it to a list, you can mean atoms. For this purpose, ISO Prolog offers atom_concat/3 and sub_atom/5 .

 | ?- atom_concat(X,Y,'abc'). X = '', Y = abc ; X = a, Y = bc ; X = ab, Y = c ; X = abc, Y = ''. | ?- sub_atom('abcbcbe',Before,Length,After,'bcb'). Before = 1, Length = 3, After = 3 ; Before = 3, Length = 3, After = 1. 

Otherwise use DCG! Here is how

 seq([]) --> []. seq([E|Es]) --> [E], seq(Es). ... --> [] | [_], ... . subseq([]) --> []. subseq(Es) --> [_], subseq(Es). subseq([E|Es]) --> [E], subseq(Es). seq_substring(S, Sub) :- phrase((...,seq(Sub),...),S). seq_subseq(S, Sub) :- phrase(subseq(Sub),S). 

Confirmations

The first appearance of the above definition ... on p. 205, note 1 to

David B. Searls, "Learning DNA Linguistics with a Specific Sentence Grammar." NAKLP 1989, Volume 1.

+4
source

The problem is with your isSubset/2 .
There are two different situations that you tried to fix in one predicate. Either you are looking for the first position to try to match your substring, or you have already found this point, and check that the lines are built.

 isSubset([], _). isSubSet(Substring, String) :- findStart(Substring, String, RestString), line_up(Substring, RestString). findStart([], String, String). findStart([H|T], [H|T1], [H|T1]). findStart(Substring, [_|T], RestString) :- findStart(Substring, T, RestString). line_up([], _). line_up([H|T], [H|T1]) :- line_up(T, T1). 

You can combine them into a single predicate as follows:

 isSublist([], L, L). isSublist([H|T], [H|T1], [H|T1]) :- isSublist(T, T1, T1). isSublist(L, [_|T], Rest) :- isSublist(L, T, Rest). 
+1
source

Using DCG, you can do the following: (SWI)

 % anything substring anything substr(String) --> ([_|_];[]), String, ([_|_];[]). % is X a substring of Y ? substring(X,Y) :- phrase(substr(X),Y). 
+1
source

Source: https://habr.com/ru/post/958941/


All Articles