Search for a subsequence in a longer sequence

I need to find a sequence in another large sequence, for example, {1,3,2,3} present in {1,3,2,3,4,3} and {5,1,3,2,3} . Is there a way to do this fast with IEnumerable or with something else?

+6
source share
7 answers

This method will find a subsequence in the parent sequence of any type, which can be compared using Equals() :

 public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target) { bool foundOneMatch = false; using (IEnumerator<T> parentEnum = parent.GetEnumerator()) { using (IEnumerator<T> targetEnum = target.GetEnumerator()) { // Get the first target instance; empty sequences are trivially contained if (!targetEnum.MoveNext()) return true; while (parentEnum.MoveNext()) { if (targetEnum.Current.Equals(parentEnum.Current)) { // Match, so move the target enum forward foundOneMatch = true; if (!targetEnum.MoveNext()) { // We went through the entire target, so we have a match return true; } } else if (foundOneMatch) { return false; } } return false; } } } 

You can use it as follows:

 bool match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 2}); // match == true match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 3}); // match == false 

Note that it assumes that the target sequence has no null elements.

Refresh . Thanks for the support, that's all, but there is an error in the code above! If a partial match is found, but then does not go into full compliance, the process ends and not reset (which is clearly not true when applied to something like {1, 2, 1, 2, 3}.ContainsSubsequence({1, 2, 3}) ).

The above code works very well for a more general definition of a subsequence (i.e., adjacency is not required), but for reset processing (which most IEnumerators do not support) the target sequence should be listed in front. This leads to the following code:

 public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target) { bool foundOneMatch = false; var enumeratedTarget = target.ToList(); int enumPos = 0; using (IEnumerator<T> parentEnum = parent.GetEnumerator()) { while (parentEnum.MoveNext()) { if (enumeratedTarget[enumPos].Equals(parentEnum.Current)) { // Match, so move the target enum forward foundOneMatch = true; if (enumPos == enumeratedTarget.Count - 1) { // We went through the entire target, so we have a match return true; } enumPos++; } else if (foundOneMatch) { foundOneMatch = false; enumPos = 0; if (enumeratedTarget[enumPos].Equals(parentEnum.Current)) { foundOneMatch = true; enumPos++; } } } return false; } } 

This code is error free but will not work well for large (or infinite) sequences.

+5
source

Like @dlev, but it also handles {1,1,1,2}.ContainsSubsequence({1,1,2})

 public static bool ContainsSubsequence<T>(this IEnumerable<T> parent, IEnumerable<T> target) { var pattern = target.ToArray(); var source = new LinkedList<T>(); foreach (var element in parent) { source.AddLast(element); if(source.Count == pattern.Length) { if(source.SequenceEqual(pattern)) return true; source.RemoveFirst(); } } return false; } 
+4
source

You can try something like this to get you started. After converting this list to a string, you can find the sequence using a substring:

 if (String.Join(",", numericList.ConvertAll<string>(x => x.ToString()).ToArray()) { //get sequence } 
+1
source

If you are handling simple serializable types, you can do this quite easily by converting arrays to strings:

 public static bool ContainsList<T>(this List<T> containingList, List<T> containedList) { string strContaining = "," + string.Join(",", containingList) + ","; string strContained = "," + string.Join(",", containedList) + ","; return strContaining.Contains(strContained); } 

Please note that the extension method, so you can call it like this:

 if (bigList.ContainsList(smallList)) { ... } 
+1
source

It works for me

 var a1 = new List<int> { 1, 2, 3, 4, 5 }; var a2 = new List<int> { 2, 3, 4 }; int index = -1; bool res = a2.All( x => index != -1 ? (++index == a1.IndexOf(x)) : ((index = a1.IndexOf(x)) != -1) ); 
0
source

This function checks if the List contains a parent List target with some LINQ:

  public static bool ContainsSequence<T>(this List<T> parent, List<T> target) { for (int fromElement = parent.IndexOf(target.First()); (fromElement != -1) && (fromElement <= parent.Count - target.Count); fromElement = parent.FindIndex(fromElement + 1, p => p.Equals(target.First()))) { var comparedSequence = parent.Skip(fromElement).Take(target.Count); if (comparedSequence.SequenceEqual(target)) return true; } return false; } 
0
source

This is a pretty well-studied problem, and, according to my research, there are two algorithms that are optimal for this work, depending on your data .

Namely, this is the K nuth- M orris- P ratt algorithm and the Boyer-Moore algorithm.

Here I present my implementation of the KMP algorithm, originally discussed here .

It is written to handle the source or parent sequence of an indefinite length up to Int64.MaxValue .

As we can see, the internal implementation returns a sequence of indices at which a substring or target pattern is encountered. You can present these results through your choice of facade.

You can use it just like that

 var contains = new[] { 1, 3, 2, 3, 4, 3 }.Contains(new[] { 1, 3, 2, 3 }); 

Here is a working fiddle that shows the code in action .

Following is the fully commented code of my answer.

 namespace Code { using System; using System.Collections.Generic; using System.Linq; /// <summary> /// A generic implementation of the Knuth-Morris-Pratt algorithm that searches, /// in a memory efficient way, over a given <see cref="IEnumerable{T}"/>. /// </summary> public static class KMP { /// <summary> /// Determines whether a sequence contains the search string. /// </summary> /// <typeparam name="T"> /// The type of elements of <paramref name="source"/> /// </typeparam> /// <param name="source"> /// A sequence of elements /// </param> /// <param name="pattern">The search string.</param> /// <param name="equalityComparer"> /// Determines whether the sequence contains a specified element. /// If <c>null</c> /// <see cref="EqualityComparer{T}.Default"/> will be used. /// </param> /// <returns> /// <c>true</c> if the source contains the specified pattern; /// otherwise, <c>false</c>. /// </returns> /// <exception cref="ArgumentNullException">pattern</exception> public static bool Contains<T>( this IEnumerable<T> source, IEnumerable<T> pattern, IEqualityComparer<T> equalityComparer = null) { if (pattern == null) { throw new ArgumentNullException(nameof(pattern)); } equalityComparer = equalityComparer ?? EqualityComparer<T>.Default; return SearchImplementation(source, pattern, equalityComparer).Any(); } public static IEnumerable<long> IndicesOf<T>( this IEnumerable<T> source, IEnumerable<T> pattern, IEqualityComparer<T> equalityComparer = null) { if (pattern == null) { throw new ArgumentNullException(nameof(pattern)); } equalityComparer = equalityComparer ?? EqualityComparer<T>.Default; return SearchImplementation(source, pattern, equalityComparer); } /// <summary> /// Identifies indices of a pattern string in a given sequence. /// </summary> /// <typeparam name="T"> /// The type of elements of <paramref name="source"/> /// </typeparam> /// <param name="source"> /// The sequence to search. /// </param> /// <param name="patternString"> /// The string to find in the sequence. /// </param> /// <param name="equalityComparer"> /// Determines whether the sequence contains a specified element. /// </param> /// <returns> /// A sequence of indices where the pattern can be found /// in the source. /// </returns> /// <exception cref="ArgumentOutOfRangeException"> /// patternSequence - The pattern must contain 1 or more elements. /// </exception> private static IEnumerable<long> SearchImplementation<T>( IEnumerable<T> source, IEnumerable<T> patternString, IEqualityComparer<T> equalityComparer) { // Pre-process the pattern (var slide, var pattern) = GetSlide(patternString, equalityComparer); var patternLength = pattern.Count; if (patternLength == 0) { throw new ArgumentOutOfRangeException( nameof(patternString), "The pattern must contain 1 or more elements."); } var buffer = new Dictionary<long, T>(patternLength); var more = true; long sourceIndex = 0; // index for source int patternIndex = 0; // index for pattern using(var sourceEnumerator = source.GetEnumerator()) while (more) { more = FillBuffer( buffer, sourceEnumerator, sourceIndex, patternLength, out T t); if (equalityComparer.Equals(pattern[patternIndex], t)) { patternIndex++; sourceIndex++; more = FillBuffer( buffer, sourceEnumerator, sourceIndex, patternLength, out t); } if (patternIndex == patternLength) { yield return sourceIndex - patternIndex; patternIndex = slide[patternIndex - 1]; } else if (more && !equalityComparer.Equals(pattern[patternIndex], t)) { if (patternIndex != 0) { patternIndex = slide[patternIndex - 1]; } else { sourceIndex = sourceIndex + 1; } } } } /// <summary> /// Services the buffer and retrieves the value. /// </summary> /// <remarks> /// The buffer is used so that it is not necessary to hold the /// entire source in memory. /// </remarks> /// <typeparam name="T"> /// The type of elements of <paramref name="source"/>. /// </typeparam> /// <param name="buffer">The buffer.</param> /// <param name="source">The source enumerator.</param> /// <param name="sourceIndex">The element index to retrieve.</param> /// <param name="patternLength">Length of the search string.</param> /// <param name="value">The element value retrieved from the source.</param> /// <returns> /// <c>true</c> if there is potentially more data to process; /// otherwise <c>false</c>. /// </returns> private static bool FillBuffer<T>( IDictionary<long, T> buffer, IEnumerator<T> source, long sourceIndex, int patternLength, out T value) { bool more = true; if (!buffer.TryGetValue(sourceIndex, out value)) { more = source.MoveNext(); if (more) { value = source.Current; buffer.Remove(sourceIndex - patternLength); buffer.Add(sourceIndex, value); } } return more; } /// <summary> /// Gets the offset array which acts as a slide rule for the KMP algorithm. /// </summary> /// <typeparam name="T"> /// The type of elements of <paramref name="source"/>. /// </typeparam> /// <param name="pattern">The search string.</param> /// <param name="equalityComparer"> /// Determines whether the sequence contains a specified element. /// If <c>null</c> /// <see cref="EqualityComparer{T}.Default"/> will be used. /// </param> /// <returns>A tuple of the offsets and the enumerated pattern.</returns> private static (IReadOnlyList<int> Slide, IReadOnlyList<T> Pattern) GetSlide<T>( IEnumerable<T> pattern, IEqualityComparer<T> equalityComparer) { var patternList = pattern.ToList(); var slide = new int[patternList.Count]; int length = 0; int patternIndex = 1; while (patternIndex < patternList.Count) { if (equalityComparer.Equals( patternList[patternIndex], patternList[length])) { length++; slide[patternIndex] = length; patternIndex++; } else { if (length != 0) { length = slide[length - 1]; } else { slide[patternIndex] = length; patternIndex++; } } } return (slide, patternList); } } } 
0
source

Source: https://habr.com/ru/post/896735/


All Articles