Efficient string splitting and manipulation in java

What is the most efficient way to split a string into a very simple delimiter?

Some background:

I am porting a function that I wrote in C with a bunch of java pointer arithmetic and it is incredibly slow (after some optimization it’s still 5 * slower). By profiling it, it turns out that the overhead in String.split

This function takes a host name or IP address and makes it common:

123.123.123.123 -. > * 123.123.123

abcexample.com -. > * Example.com

This can be done on several million items on a regular basis, so performance is a problem.

Edit: rules for conversion:

  • If it is an ip address, replace the first part
  • Otherwise, find the main domain name and make the previous part common.

foo.bar.com-> * .bar.com foo.bar.co.uk-> * .bar.co.uk

Now I have rewritten the use of lastIndexOf and substrings to work from the back, and performance has improved not by the day, but by the hour.

I will leave the question open for another 24 hours before relying on the best answer for future reference

Here is what I just came up with (the ip part is a minor check before calling this function)

private static String hostConvert(String in) {
    final String [] subs = { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };

    int dotPos = in.lastIndexOf('.');
    if(dotPos == -1)
        return in;
    int prevDotPos = in.lastIndexOf('.', dotPos-1);
    if(prevDotPos == -1)
        return in;
    CharSequence cs = in.subSequence(prevDotPos+1, dotPos);
    for(String cur : subs) {
        if(cur.contentEquals(cs)) {
            int start = in.lastIndexOf('.', prevDotPos-1);
            if(start == -1 || start == 0)
                return in;
            return "*" + in.substring(start);
        }
    }

    return "*" + in.substring(prevDotPos);
}

If there is room for further improvement, it would be good to hear.

+3
source share
3 answers

Something like this is about as fast as you can do it:

static String starOutFirst(String s) {
    final int K = s.indexOf('.');
    return "*" + s.substring(K);
}
static String starOutButLastTwo(String s) {
    final int K = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
    return "*" + s.substring(K);
}

Then you can do:

    System.out.println(starOutFirst("123.123.123.123"));
    // prints "*.123.123.123"

    System.out.println(starOutButLastTwo("a.b.c.example.com"));
    // prints "*.example.com"

You may need to use a regular expression to find out which of the two methods applies to any given string.

+6

. indexOf ( "." ) . ()

, , split(), , (1 ).

+2

, , . ".". "*"? - ? , n- ". '*'?

, - Boyer-Moore. , , .

, String Java . , . CharSequence, , , . StringBuffer CharBuffer. concurrency , StringBuilder .

CharSequence String, . , , (.. {'*'}), , , , . , , String .

UPDATE

. , , CharSequence, ( , ), Strings -. StringBuffer StringBuilder, , , , . CharBuffer ; , nio : .

, , , , , "*" '*' .

, . : '.' "*" , , StringBuilder .

, gc() , - -op, , VM 1M String s. - , .

import java.util.ArrayList;
import java.util.Arrays;

public class StringSplitters {
    private static final String PREFIX = "*";
    private static final char DOT = '.';

    public static String starOutFirst(String s) {
        final int k = s.indexOf(DOT);
        return PREFIX + s.substring(k);
    }

    public static String starOutFirstSb(String s) {
        StringBuilder sb = new StringBuilder();
        final int k = s.indexOf(DOT);
        return sb.append(PREFIX).append(s.substring(k)).toString();
    }

    public static void main(String[] args) throws InterruptedException {
        double[] firstRates = new double[10];
        double[] firstSbRates = new double[10];
        double firstAvg = 0;
        double firstSbAvg = 0;
        double firstMin = Double.POSITIVE_INFINITY;
        double firstMax = Double.NEGATIVE_INFINITY;

        double firstSbMin = Double.POSITIVE_INFINITY;
        double firstSbMax = Double.NEGATIVE_INFINITY;

        for (int i = 0; i < 10; i++) {
            firstRates[i] = testFirst();

            firstAvg += firstRates[i];

            if (firstRates[i] < firstMin)
                firstMin = firstRates[i];
            if (firstRates[i] > firstMax)
                firstMax = firstRates[i];

            Thread.sleep(100);
            System.gc();
            Thread.sleep(100);
        }

        firstAvg /= 10.0d;

        for (int i = 0; i < 10; i++) {
            firstSbRates[i] = testFirstSb();

            firstSbAvg += firstSbRates[i];

            if (firstSbRates[i] < firstSbMin)
                firstSbMin = firstSbRates[i];
            if (firstSbRates[i] > firstSbMax)
                firstSbMax = firstSbRates[i];

            Thread.sleep(100);
            System.gc();
            Thread.sleep(100);
        }

        firstSbAvg /= 10.0d;

        System.out.printf("First:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstMin, firstMax,
                firstAvg, Arrays.toString(firstRates));
        System.out.printf("FirstSb:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstSbMin,
                firstSbMax, firstSbAvg, Arrays.toString(firstSbRates));

    }

    private static double testFirst() {
        ArrayList<String> strings = new ArrayList<String>(1000000);

        for (int i = 0; i < 1000000; i++) {
            int first = (int) (Math.random() * 128);
            int second = (int) (Math.random() * 128);
            int third = (int) (Math.random() * 128);
            int fourth = (int) (Math.random() * 128);

            strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
        }

        long before = System.currentTimeMillis();
        for (String s : strings) {
            starOutFirst(s);
        }
        long after = System.currentTimeMillis();

        return 1000000000.0d / (after - before);
    }

    private static double testFirstSb() {
        ArrayList<String> strings = new ArrayList<String>(1000000);

        for (int i = 0; i < 1000000; i++) {
            int first = (int) (Math.random() * 128);
            int second = (int) (Math.random() * 128);
            int third = (int) (Math.random() * 128);
            int fourth = (int) (Math.random() * 128);

            strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
        }

        long before = System.currentTimeMillis();
        for (String s : strings) {
            starOutFirstSb(s);
        }
        long after = System.currentTimeMillis();

        return 1000000000.0d / (after - before);
    }
}

First:
    Min:    3802281.369 Max:    5434782.609 Avg:    5185796.131
    Rates:  [3802281.3688212926, 5181347.150259067, 5291005.291005291, 5376344.086021505, 5291005.291005291, 5235602.094240838, 5434782.608695652, 5405405.405405405, 5434782.608695652, 5405405.405405405]

FirstSb:
    Min:    4587155.963 Max:    5747126.437 Avg:    5462087.511
    Rates:  [4587155.963302752, 5747126.436781609, 5617977.528089887, 5208333.333333333, 5681818.181818182, 5586592.17877095, 5586592.17877095, 5524861.878453039, 5524861.878453039, 5555555.555555556]
+2

Source: https://habr.com/ru/post/1747797/


All Articles