Remove all problematic characters in an intelligent way in C#

Is there any .NET library that removes all problematic characters from a string and leaves only alphanumeric characters, hyphens and underscores (or a similar subset) in an intelligent way? The result is meant for use in URLs, file names, etc.

I am looking for something similar to stringex that can do the following:

A simple prelude

"simple English".to_url => "simple-english"

"it's nothing at all".to_url => "its-nothing-at-all"

"rock & roll".to_url => "rock-and-roll"

Let's show off

"$12 worth of Ruby power".to_url => "12-dollars-worth-of-ruby-power"

"10% off if you act now".to_url => "10-percent-off-if-you-act-now"

You don't even wanna trust Iconv for this next part

"kick it en Français".to_url => "kick-it-en-francais"

"rock it Español style".to_url => "rock-it-espanol-style"

"tell your readers 你好".to_url => "tell-your-readers-ni-hao"

+3
9 answers

I could not find any library that does this the way the Ruby one does, so I ended up writing my own method. Here it is, in case anyone finds it useful:

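// Needs: using System; using System.Text.RegularExpressions;
// Extension methods must be declared inside a static class.
/// <summary>
/// Turns a string into something that is URL and Google friendly.
/// </summary>
/// <param name="str">The text to convert.</param>
/// <returns>A string made only of letters, digits and dashes.</returns>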
public static string ForUrl(this string str) {
  return str.ForUrl(true);
}
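public static string ForUrl(this string str, bool makeLowerCase) {
  // Lowercase with the invariant culture so the result does not depend on
  // the current locale (for example, the Turkish dotless i).
  if (makeLowerCase) {
    str = str.ToLowerInvariant();
  }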

  // Replace accented characters with their closest unaccented equivalents:
  char[] from = "ÂÃÄÀÁÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöøùúûüýÿ".ToCharArray();
  char[] to = "AAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaceeeeiiiidnoooooouuuuyy".ToCharArray();
  for (int i = 0; i < from.Length; i++) {
    str = str.Replace(from[i], to[i]);
  }

  // Thorn http://en.wikipedia.org/wiki/%C3%9E
  str = str.Replace("Þ", "TH");
  str = str.Replace("þ", "th");

  // Eszett http://en.wikipedia.org/wiki/%C3%9F
  str = str.Replace("ß", "ss");

  // AE http://en.wikipedia.org/wiki/%C3%86
  str = str.Replace("Æ", "AE");
  str = str.Replace("æ", "ae");

  // Esperanto http://en.wikipedia.org/wiki/Esperanto_orthography
  from = "ĈĜĤĴŜŬĉĝĥĵŝŭ".ToCharArray();
  to = "CXGXHXJXSXUXcxgxhxjxsxux".ToCharArray();
  for (int i = 0; i < from.Length; i++) {
    // Each Esperanto letter maps to a two-character x-system digraph.
    str = str.Replace(from[i].ToString(), new string(to, i * 2, 2));
  }

  // Currencies.
  str = new Regex(@"([¢€£\$])([0-9\.,]+)").Replace(str, @"$2 $1");
  str = str.Replace("¢", "cents");
  str = str.Replace("€", "euros");
  str = str.Replace("£", "pounds");
  str = str.Replace("$", "dollars");

  // Ands
  str = str.Replace("&", " and ");

  // More aesthetically pleasing contractions
  str = str.Replace("'", "");
  str = str.Replace("’", "");

  // Everything except alphanumerics and dashes becomes a dash.
  str = new Regex(@"[^A-Za-z0-9-]").Replace(str, "-");

  // Remove dashes at the beginning or end.
  str = str.Trim('-');

  // Compact duplicated dashes.
  str = new Regex("-+").Replace(str, "-");

  // URL-encode just in case (a no-op here, since only letters, digits and dashes remain).
  return System.Uri.EscapeDataString(str);
}
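
As an alternative to the hand-written lookup tables for accented characters, Unicode normalization can strip most diacritics generically. The sketch below is an assumption-laden illustration, not part of the answer above: the class name DiacriticsSketch and the method RemoveDiacritics are hypothetical. It decomposes each character (FormD) and drops the combining marks.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the answer above.
public static class DiacriticsSketch
{
    public static string RemoveDiacritics(string text)
    {
        // FormD splits a letter such as "é" into "e" plus a combining accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Skip the combining marks, keep everything else untouched.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, DiacriticsSketch.RemoveDiacritics("kick it en Français") should give "kick it en Francais". Letters that have no Unicode decomposition, such as ß, Æ, Ø, Þ and Ð, pass through unchanged, so explicit replacements like the ones in the answer are still needed for them.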
0

You can try this

string str = phrase.ToLowerInvariant(); // needed: the next regex keeps only a-z, so uppercase letters would be dropped
str = str.Trim();
str = Regex.Replace(str, @"[^a-z0-9\s_]", ""); // remove invalid chars
str = Regex.Replace(str, @"\s+", " ").Trim(); // collapse runs of whitespace into one space
str = str.Substring(0, Math.Min(str.Length, 400)).Trim(); // cut to 400 chars and trim
str = Regex.Replace(str, @"\s", "-"); // spaces become hyphens
+2

See how Stackoverflow generates its URLs (more specifically, how question titles are turned into URL-friendly strings).


+2

[Most of this answer was lost in extraction. The surviving fragments point to three resources: ÜberUtils (part 3), the Microsoft Anti-Cross Site Scripting Library (AntiXSS), and a C# routine for generating a URL "slug", as on Qaru.]

+1

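You could use HttpUtility.UrlEncode, but it only encodes characters; it does not replace or remove the problematic ones. Spaces, for example, become +, and ' is encoded as well.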

0
0

Handle the special cases with explicit rules, for example $x => x-dollars and x% => x-percent, and use Regex for the replacements:

public static string ToUrl(this string text)
{
    // Fill in the pattern and replacement for each rule.
    return Regex.Replace(text.Trim(), ..., ...);
}
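
One way the elided pattern and replacement could be filled in is sketched below. Everything here is an assumption made for illustration: the class name SymbolWordsSketch, the helper ExpandSymbols, and the specific patterns are not from the answer. It spells out $x and x% first, then reduces everything else to dashes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch, not the answer author's code.
public static class SymbolWordsSketch
{
    // Spells out "$x" and "x%" so the symbols survive as words.
    public static string ExpandSymbols(string text)
    {
        // "$12" -> "12 dollars" (digits, optionally with . or , separators)
        text = Regex.Replace(text, @"\$(\d[\d.,]*)", "$1 dollars");
        // "10%" -> "10 percent"
        text = Regex.Replace(text, @"(\d[\d.,]*)%", "$1 percent");
        return text;
    }

    public static string ToUrl(this string text)
    {
        text = ExpandSymbols(text.Trim().ToLowerInvariant());
        // Any run of characters other than a-z and 0-9 becomes a single dash.
        text = Regex.Replace(text, @"[^a-z0-9]+", "-");
        // Drop dashes left over at either end.
        return text.Trim('-');
    }
}
```

With these rules, "$12 worth of Ruby power".ToUrl() should give "12-dollars-worth-of-ruby-power" and "10% off if you act now".ToUrl() should give "10-percent-off-if-you-act-now". Accented and non-Latin characters are simply turned into dashes here; transliterating them is a separate step.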
0

[The text of this answer was lost in extraction. The surviving fragments mention Ruby (and Perl) and ASCII.]

0

I use something similar on my blog.

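// Needs: using System.Text.RegularExpressions;
public class Post
{
    public string Subject { get; set; }

    public string ResolveSubjectForUrl()
    {
        // Replace every non-word character with a dash, then collapse runs of dashes.
        // Note: \w also matches underscores and non-ASCII letters, so those are kept.
        string dashed = Regex.Replace(this.Subject.ToLower(), @"\W", "-");
        return Regex.Replace(dashed, "-{2,}", "-");
    }
}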
0

Source: https://habr.com/ru/post/1727956/

