Find string inside byte buffer

I am switching from C to Java and wondering how to find a string inside a byte buffer. Is there something like memchr in Java? Only part of the byte buffer is a string; the rest is raw bytes, so any Java method needs to work with a mix of bytes and characters.

I am also looking for something like strsep in Java to split strings.

+4
5 answers

You can convert the ByteBuffer to a String and use indexOf, which will work.

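    // Requires java.nio.charset.StandardCharsets; bb must be array-backed (non-direct).
    ByteBuffer bb = /* non-direct byte buffer */;
    // ISO-8859-1 maps every byte to exactly one char, so index is a byte offset from bb.position().
    String text = new String(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining(), StandardCharsets.ISO_8859_1);
    int index = text.indexOf(searchText);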

This has non-trivial overhead because it creates a String. An alternative is a brute-force search over the bytes, which will be faster but takes more time to write.
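A minimal sketch of the brute-force alternative, assuming the search string has already been encoded to bytes. The class and method names (ByteBufferSearch, indexOf) are hypothetical, not part of any library. It uses absolute get(), so it works for direct buffers too and does not move the buffer's position.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferSearch {
    /**
     * Returns the absolute index of the first occurrence of needle between
     * the buffer's position and limit, or -1 if it does not occur.
     */
    static int indexOf(ByteBuffer haystack, byte[] needle) {
        int last = haystack.limit() - needle.length;
        outer:
        for (int i = haystack.position(); i <= last; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack.get(i + j) != needle[j]) {
                    continue outer; // mismatch: try the next start index
                }
            }
            return i; // every byte of needle matched
        }
        return -1;
    }

    public static void main(String[] args) {
        // Raw bytes followed by some ASCII text, as in the question.
        byte[] data = {0x00, (byte) 0xFF, 'k', 'e', 'y', '=', 'v', 0x01};
        ByteBuffer bb = ByteBuffer.wrap(data);
        byte[] needle = "key=".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOf(bb, needle)); // prints 2
    }
}
```

The worst case is O(n * m) comparisons, which is usually acceptable for small buffers or short search strings.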

+5

You will need to encode the character string into bytes using the correct character encoding for your application. Then use a string search algorithm such as Rabin-Karp or Boyer-Moore to find the resulting byte sequence in the buffer. Or, if your buffers are small, you can simply use a brute-force search.

I do not know of any open source implementations of these search algorithms, and they are not part of the core Java API.
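As an illustration of the approach described above, here is a sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that uses only the bad-character table. The class name HorspoolSearch and its indexOf signature are hypothetical, and UTF-8 is only an assumed encoding for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HorspoolSearch {
    /** Returns the index of the first occurrence of needle in haystack[from, to), or -1. */
    static int indexOf(byte[] haystack, int from, int to, byte[] needle) {
        int m = needle.length;
        if (m == 0) {
            return from;
        }
        // shift[b] = how far the window may move when the haystack byte
        // aligned with the needle's last position is b.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[needle[i] & 0xFF] = m - 1 - i;
        }
        int pos = from;
        while (pos <= to - m) {
            int j = m - 1;
            // Compare right to left within the current window.
            while (j >= 0 && haystack[pos + j] == needle[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[haystack[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] haystack = "xxabcabdxx".getBytes(StandardCharsets.UTF_8);
        byte[] needle = "abd".getBytes(StandardCharsets.UTF_8); // encode once, search many times
        System.out.println(indexOf(haystack, 0, haystack.length, needle)); // prints 5
    }
}
```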

+4

From "Fastest way to find a string in a text file with Java":

The best implementation I found is in MIMEParser: https://github.com/samskivert/ikvm-openjdk/blob/master/build/linux-amd64/impsrc/com/sun/xml/internal/org/jvnet/mimepull/MIMEParser.java

 /**
  * Finds the boundary in the given buffer using Boyer-Moore algo.
  * Copied from java.util.regex.Pattern.java
  *
  * @param mybuf boundary to be searched in this mybuf
  * @param off start index in mybuf
  * @param len number of bytes in mybuf
  *
  * @return -1 if there is no match or index where the match starts
  */
 private int match(byte[] mybuf, int off, int len) {

You will also need:

  private void compileBoundaryPattern(); 
+1

The String class has a nice split method.
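A short sketch of how String.split can stand in for strsep. Note that split takes a regular expression, not a plain delimiter list, and that passing a negative limit keeps trailing empty fields. The class name SplitExample and the helper fields are hypothetical.

```java
import java.util.Arrays;

final class SplitExample {
    /** Splits on ':', '-' or '|', keeping empty fields (including trailing ones). */
    static String[] fields(String s) {
        // "[:|-]" is a regex character class; '-' is literal because it comes last.
        return s.split("[:|-]", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("abc:def-ghi|jkl"))); // [abc, def, ghi, jkl]
        System.out.println(Arrays.toString(fields("a::b:")));           // [a, , b, ]
    }
}
```

Without the -1 limit, split drops trailing empty strings, so "a::b:" would yield only three fields.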

0

One option is to use a StringTokenizer, which splits a string into a sequence of tokens according to the specified delimiter(s). If necessary, the delimiters themselves can also be returned as tokens. Example:

 String s = "abc:def-ghi|jkl";
 StringTokenizer tokenizer = new StringTokenizer(s, ":-|");
 while (tokenizer.hasMoreTokens()) {
     System.out.print(tokenizer.nextToken());
 }

Expected output:

abcdefghijkl
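To see the delimiters as tokens as well, the three-argument StringTokenizer constructor can be used with returnDelims set to true. A small sketch, where the class name TokenizerExample and the helper tokensWithDelimiters are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerExample {
    /** Returns all tokens of s, with each delimiter character as its own token. */
    static List<String> tokensWithDelimiters(String s, String delimiters) {
        List<String> tokens = new ArrayList<>();
        // Third argument true: delimiters are returned as one-character tokens.
        StringTokenizer tokenizer = new StringTokenizer(s, delimiters, true);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [abc, :, def, -, ghi, |, jkl]
        System.out.println(tokensWithDelimiters("abc:def-ghi|jkl", ":-|"));
    }
}
```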

0

Source: https://habr.com/ru/post/1388255/

