How to quickly encode and then compress a short string containing numbers in C#

I have strings that look like this:

000101456890 348324000433 888000033380 

They are all the same length and contain only digits.

I would like to find a way to encode and then compress (reduce the length of) these strings. The result should contain only ASCII characters, because the strings will be used as links to a web page.

So for example:

 www.stackoverflow.com/000101456890 goes to www.stackoverflow.com/aJks 

Is there a way to do this, some method that would do the compression quickly?

Thanks,

2 answers

To keep this simple, you can treat each value as a long (there is plenty of room) and hex-encode it, which gives you:

 60c1bfa 5119ba72b1 cec0ed3264 

Base-64 will be shorter, but you need to treat the value as big-endian (note that most .NET platforms are little-endian) and drop the leading zero bytes. This gives you:

 Bgwb+g== URm6crE= zsDtMmQ= 

For instance:

    using System;
    using System.Collections.Generic;

    static class Program
    {
        static void Main()
        {
            long x = 000101456890L, y = 348324000433L, z = 888000033380L;

            // Hex-encode each value.
            Console.WriteLine(Convert.ToString(x, 16));
            Console.WriteLine(Convert.ToString(y, 16));
            Console.WriteLine(Convert.ToString(z, 16));

            // Base-64 encode each value.
            Console.WriteLine(Pack(x));
            Console.WriteLine(Pack(y));
            Console.WriteLine(Pack(z));

            // Decode the hex strings and restore the leading zeros.
            Console.WriteLine(Convert.ToInt64("60c1bfa", 16).ToString().PadLeft(12, '0'));
            Console.WriteLine(Convert.ToInt64("5119ba72b1", 16).ToString().PadLeft(12, '0'));
            Console.WriteLine(Convert.ToInt64("cec0ed3264", 16).ToString().PadLeft(12, '0'));

            // Decode the base-64 strings and restore the leading zeros.
            Console.WriteLine(Unpack("Bgwb+g==").ToString().PadLeft(12, '0'));
            Console.WriteLine(Unpack("URm6crE=").ToString().PadLeft(12, '0'));
            Console.WriteLine(Unpack("zsDtMmQ=").ToString().PadLeft(12, '0'));
        }

        static string Pack(long value)
        {
            ulong a = (ulong)value; // make shift easy
            List<byte> bytes = new List<byte>(8);
            while (a != 0)
            {
                bytes.Add((byte)a); // collect the low byte first
                a >>= 8;
            }
            bytes.Reverse(); // big-endian, leading zero bytes already skipped
            var chunk = bytes.ToArray();
            return Convert.ToBase64String(chunk);
        }

        static long Unpack(string value)
        {
            var chunk = Convert.FromBase64String(value);
            ulong a = 0;
            for (int i = 0; i < chunk.Length; i++)
            {
                a <<= 8;
                a |= chunk[i];
            }
            return (long)a;
        }
    }

I'm not sure Base64 is URL-safe, because its alphabet contains '/' (the Pack function in the accepted answer can return strings that are not valid in a URL).

You might consider replacing the '/' character with something more URL-friendly, or using a different base. Base 62, for example, would work here.
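As a rough illustration of the first option, here is a minimal sketch that swaps the URL-unfriendly symbols after encoding, following the "base64url" alphabet from RFC 4648 ('-' for '+', '_' for '/', padding dropped). The class name UrlSafeBase64 and its method names are made up for this example; they are not part of the accepted answer or of .NET.

```csharp
using System;

public static class UrlSafeBase64
{
    // Encode bytes as Base64, drop the '=' padding, then replace the two
    // symbols that are awkward in URLs: '+' becomes '-', '/' becomes '_'.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Undo the substitutions and restore the padding before decoding.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return Convert.FromBase64String(s);
    }
}
```

With the bytes that Pack produces for 000101456890 (06 0C 1B FA), this should turn "Bgwb+g==" into "Bgwb-g", and Decode should give the same four bytes back.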

Here is generic code that converts back and forth between decimal and any number base <= 64 (this is probably faster than converting to bytes and then calling Convert.ToBase64String()):

    using System;

    static class Program
    {
        static void Main()
        {
            Console.WriteLine(Decode("101456890", 10));
            Console.WriteLine(Encode(101456890, 62));
            Console.WriteLine(Decode("6rhZS", 62));
            //Result:
            //101456890
            //6rhZS
            //101456890
        }

        public static long Decode(string str, int baze)
        {
            long result = 0;
            long place = 1; // long, not int: 62^6 already overflows an int
            for (int i = 0; i < str.Length; ++i)
            {
                result += Value(str[str.Length - 1 - i]) * place;
                place *= baze;
            }
            return result;
        }

        public static string Encode(long val, int baze)
        {
            var buffer = new char[64];
            int place = 0;
            long q = val;
            do
            {
                buffer[place++] = Symbol(q % baze); // least significant digit first
                q = q / baze;
            } while (q > 0);
            Array.Reverse(buffer, 0, place);
            return new string(buffer, 0, place);
        }

        // Maps a symbol to its digit value: 0-9, A-Z, a-z, then '+' and '/'.
        public static long Value(char c)
        {
            if (c == '+') return 62;
            if (c == '/') return 63;
            if (c < '0') throw new ArgumentOutOfRangeException("c");
            if (c < ':') return c - '0';
            if (c < 'A') throw new ArgumentOutOfRangeException("c");
            if (c < '[') return c - 'A' + 10;
            if (c < 'a') throw new ArgumentOutOfRangeException("c");
            if (c < '{') return c - 'a' + 36;
            throw new ArgumentOutOfRangeException("c");
        }

        // Maps a digit value back to its symbol.
        public static char Symbol(long i)
        {
            if (i < 0) throw new ArgumentOutOfRangeException("i");
            if (i < 10) return (char)('0' + i);
            if (i < 36) return (char)('A' + i - 10);
            if (i < 62) return (char)('a' + i - 36);
            if (i == 62) return '+';
            if (i == 63) return '/';
            throw new ArgumentOutOfRangeException("i");
        }
    }
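To tie this back to the 12-digit strings in the question, here is a small self-contained sketch of a fixed-width base-62 round trip. Since 62^6 < 10^12 < 62^7, seven symbols are always enough for twelve decimal digits, so a four-character result like "aJks" is only reachable for values below 62^4 (about 14.8 million). The class name ShortLink, the method names and the fixed width of 7 are assumptions made for this example.

```csharp
using System;

public static class ShortLink
{
    // Same symbol order as above: 0-9, A-Z, a-z.
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    const int Digits = 12; // length of the original numeric strings
    const int Width = 7;   // 62^7 > 10^12, so 7 symbols always suffice

    // Turn a 12-digit string into a fixed-width base-62 code.
    public static string Shorten(string digits)
    {
        if (digits.Length != Digits)
            throw new ArgumentException("Expected exactly 12 digits.", "digits");
        foreach (char c in digits)
        {
            if (c < '0' || c > '9')
                throw new ArgumentException("Only digits are allowed.", "digits");
        }

        ulong value = ulong.Parse(digits);
        var buffer = new char[Width];
        // Fill from the right; unused leading positions become '0'.
        for (int i = Width - 1; i >= 0; i--)
        {
            buffer[i] = Alphabet[(int)(value % 62)];
            value /= 62;
        }
        return new string(buffer);
    }

    // Turn a base-62 code back into the 12-digit string.
    public static string Expand(string code)
    {
        ulong value = 0;
        foreach (char c in code)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0)
                throw new FormatException("Not a base-62 symbol: " + c);
            value = value * 62 + (ulong)digit;
        }
        // Restore the leading zeros that the numeric value lost.
        return value.ToString().PadLeft(Digits, '0');
    }
}
```

For example, ShortLink.Shorten("000101456890") should give "006rhZS", and ShortLink.Expand("006rhZS") should return "000101456890". Using a fixed width keeps every link the same length; dropping the leading '0' symbols instead would give "6rhZS" for this value.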
