EDIT: Since the requirements changed, I'll stay in the spirit of regular expressions:
Regex.Replace(original, string.Format(@"(\p{{L}}{{{0}}})\p{{L}}+", maxLength), "$1...");
Output with maxLength = 6:
Here a text with tooooo... long words
This is sweeee...! And someth... more.
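For reference, here is a minimal self-contained sketch of that one-liner wrapped in a helper. The class and method names (WordShortener, Ellipsize) are made up for illustration; only the Regex.Replace call comes from the answer itself. The doubled braces are string.Format escapes, so with maxLength = 6 the pattern that actually reaches the regex engine is (\p{L}{6})\p{L}+.

```csharp
using System.Text.RegularExpressions;

public static class WordShortener // hypothetical name, not part of the original answer
{
    // Keeps the first maxLength letters of every longer run of letters and appends "...".
    public static string Ellipsize(string original, int maxLength)
    {
        // "{{" and "}}" are literal braces for string.Format; "{0}" becomes maxLength.
        string pattern = string.Format(@"(\p{{L}}{{{0}}})\p{{L}}+", maxLength);

        // "$1" re-inserts the captured prefix; the rest of the word is dropped.
        return Regex.Replace(original, pattern, "$1...");
    }
}
```

One thing to keep in mind with this approach: a word that is only one or two letters over the limit ends up longer than before, because the three dots replace fewer characters than they add.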
The old answer is kept below because I liked the approach, even though it is a bit ... dirty :-).
I put together a small regex replacement to do this. It is in PowerShell for now (for prototyping; I will convert it to C# later):
'Here a text with tooooooooooooo long words','This is sweeeeeeeeeeeeeeeet! And something more.' |
  % {
    [Regex]::Replace($_, '(\w*?)(\w)\2{2,}(\w*)', {
      $m = $args[0]
      if ($m.Value.Length -gt 6) {
        # Room left for the repeated character once prefix and suffix are kept
        $l = 6 - $m.Groups[1].Length - $m.Groups[3].Length
        $m.Groups[1].Value + $m.Groups[2].Value * $l + $m.Groups[3].Value
      } else {
        # Leave matches that are already short enough untouched
        $m.Value
      }
    })
  }
Output:
Here a text with tooooo long words
This is sweeet! And something more.
What this does is search for runs of characters (\w for now; that should be changed to something more sensible) that follow the pattern (something)(a character repeated more than two times)(something else). For the replacement it uses a function that checks whether the match exceeds the maximum length, then calculates how long the repeated part may be so that the whole match still fits within that length, and shortens only the repeated part accordingly.
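To make the three groups concrete, here is a small sketch that reports how the pattern splits a word and how much room is left for the repeated character. RunInspector and Describe are hypothetical names used only for illustration; the pattern is the one from the answer.

```csharp
using System.Text.RegularExpressions;

public static class RunInspector // hypothetical helper, for illustration only
{
    // Returns "prefix|repeatedChar|suffix|allowedRun" for the first match in the word,
    // or an empty string when no character is repeated three or more times.
    public static string Describe(string word, int maxLength)
    {
        Match m = Regex.Match(word, @"(\w*?)(\w)\2{2,}(\w*)");
        if (!m.Success) return string.Empty;

        string prefix = m.Groups[1].Value;   // lazy: everything before the run
        string repeated = m.Groups[2].Value; // the character that repeats
        string suffix = m.Groups[3].Value;   // greedy: everything after the run

        // Length the run may keep so that prefix + run + suffix fits maxLength.
        int allowedRun = maxLength - prefix.Length - suffix.Length;
        return prefix + "|" + repeated + "|" + suffix + "|" + allowedRun;
    }
}
```

Describe("sweeeeeeet", 6) gives sw|e|t|3: two characters of prefix and one of suffix leave room for three e's, which is how the word becomes sweeet.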
This approach is messy. It fails to truncate words that are simply very long (for example, "something" in the second test sentence), and the set of characters that make up a word still needs to be changed. Treat it as a starting point if you want to go this route, not as a finished solution.
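C# code:
using System.Text.RegularExpressions;

public static class StringExtensions // extension methods need a static class; the name is arbitrary
{
    public static string TrimLongWords(this string original, int maxCount)
    {
        return Regex.Replace(original, @"(\w*?)(\w)\2{2,}(\w*)", delegate(Match m)
        {
            // Groups[0] is the whole match; the captures start at index 1.
            var first = m.Groups[1].Value;
            var rep = m.Groups[2].Value;
            var last = m.Groups[3].Value;
            if (m.Value.Length > maxCount)
            {
                // Room left for the repeated character. This goes negative (and
                // new string throws) if first + last alone exceed maxCount.
                var l = maxCount - first.Length - last.Length;
                return first + new string(rep[0], l) + last;
            }
            return m.Value;
        });
    }
}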
A nicer choice for the character class is probably something like \p{L}, depending on your needs.
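As a sketch of what that swap could look like, the variant below replaces \w with \p{L}, so digits and underscores no longer count as part of a word. The names LetterRunTrimmer and TrimLongLetterRuns are made up here, and the guard that leaves a word alone when the prefix and suffix already fill the limit is my own assumption, not something the answer specifies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterRunTrimmer // hypothetical name
{
    // Shortens runs of a repeated letter so that the whole word fits into maxCount.
    public static string TrimLongLetterRuns(this string original, int maxCount)
    {
        // Same shape as the \w pattern, but only Unicode letters form a word.
        return Regex.Replace(original, @"(\p{L}*?)(\p{L})\2{2,}(\p{L}*)", m =>
        {
            string first = m.Groups[1].Value;
            string rep = m.Groups[2].Value;
            string last = m.Groups[3].Value;

            int room = maxCount - first.Length - last.Length;

            // Assumption: if the word is short enough, or there is no room
            // left for even one repeated letter, keep the word as it is.
            if (m.Value.Length <= maxCount || room < 1) return m.Value;

            return first + new string(rep[0], room) + last;
        });
    }
}
```

With this variant, "loooooooong_name".TrimLongLetterRuns(6) shortens only the letters before the underscore and returns looong_name, whereas the \w pattern would treat ng_name as the suffix.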