C# - How to replace accented characters, i.e. "-É" with "- É"

I am making a very simple Windows application using Visual Studio and C# that edits subtitle files for movies. I need a program that adds a space after the dash in dialogue lines when it is missing. For instance:

-Hey what?

-Nothing much.

to

- Hey what?

- Nothing much.

I used the Toolbox to create an interface with a single button to select the desired file. This is the code I have for this button:

  private void button1_Click(object sender, EventArgs e)
  {
      if (openFileDialog1.ShowDialog() == DialogResult.OK)
      {
          string text = File.ReadAllText(openFileDialog1.FileName, Encoding.GetEncoding("iso-8859-1"));
          text = text.Replace("-A", "- A");
          File.WriteAllText(openFileDialog1.FileName, text, Encoding.GetEncoding("iso-8859-1"));
      }
  }

What this does is basically replace "-A" with "- A", thus adding the space. This is the solution I came up with, and I planned to do the same for each letter, including accented letters such as À, Á, È, É, etc.

This does not work. If I put text = text.Replace("-É", "- É"); the program does nothing.

I want to know how to fix this.

Thanks for reading, and if you have a better alternative for my application, please feel free to let me know.

2 answers

As suggested in the comments, use Regex.

  var rx = new System.Text.RegularExpressions.Regex("^-([^ ])");
  // ... in your loop
  text = rx.Replace(text, "- $1");

Basically, it searches for a dash at the beginning of a line, but only when it is followed by something that is NOT a space. The parentheses mean that the character following the dash is captured ("saved"). Replace searches the provided string and replaces (doh!) the matched text with a dash, a space, and the same character that was captured earlier. Whatever it is.

Source: https://xkcd.com/208/

Edit: you do not have a loop; you have a single string containing the full content of the file, in which each line should hold one line of subtitles (right?). If so, you can configure the regex to treat the string as a list of lines, like this:

  var rx = new Regex("^-([^ ])", RegexOptions.Multiline); 

See this fiddle for an example: https://dotnetfiddle.net/ciFlAu
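To make the multiline idea concrete, here is a minimal sketch rather than a definitive implementation. The class name SubtitleFixer, the method name AddSpaceAfterDash, and the sample text are hypothetical and only serve the illustration. It assumes every dialogue line starts with a plain hyphen-minus.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtitleFixer
{
    // With RegexOptions.Multiline, ^ matches at the start of every line,
    // not only at the start of the whole string.
    private static readonly Regex DashWithoutSpace =
        new Regex("^-([^ ])", RegexOptions.Multiline);

    // Hypothetical helper: inserts a space after a leading dash
    // when the next character is not already a space.
    public static string AddSpaceAfterDash(string text)
    {
        // $1 re-inserts the captured character, accented or not.
        return DashWithoutSpace.Replace(text, "- $1");
    }

    public static void Main()
    {
        string sample = "-Hey what?\n-\u00C9coute.\n- Already fine.";
        Console.WriteLine(AddSpaceAfterDash(sample));
    }
}
```

Because the pattern captures whatever character follows the dash, no per-letter Replace call is needed, and lines that already have the space are left untouched.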


For an accented character, consider using the Unicode escape sequence:

 string text = "-\u00C9"; // -É
 text = text.Replace("-\u00C9", "- \u00C9");

And you can also use a non-breaking space in place of the regular space, just in case:

 string text = "-\u00C9";
 text = text.Replace("-\u00C9", "-\u00A0\u00C9");

Then you can encode the output as UTF-8 / UTF-16:

 File.WriteAllText(openFileDialog1.FileName, text, Encoding.GetEncoding("UTF-8")); 
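Putting these pieces together, a rough sketch could look like the following. The class name SubtitleFile and the method name FixAndSaveAsUtf8 are hypothetical. It also assumes the input file really is ISO-8859-1, as in the question. If the file is in another encoding, the accented characters will not match.

```csharp
using System.IO;
using System.Text;

public static class SubtitleFile
{
    // Hypothetical helper: reads a Latin-1 subtitle file, adds the space
    // after "-É", and writes the result back to the same path as UTF-8.
    public static void FixAndSaveAsUtf8(string path)
    {
        // Assumption: the file on disk is ISO-8859-1 encoded.
        string text = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));

        // \u00C9 is É; the escape keeps the source file's own encoding out of the picture.
        text = text.Replace("-\u00C9", "- \u00C9");

        File.WriteAllText(path, text, Encoding.GetEncoding("UTF-8"));
    }
}
```

Calling SubtitleFile.FixAndSaveAsUtf8(openFileDialog1.FileName) from the button handler would then change the file's encoding, so whatever plays the subtitles needs to accept UTF-8.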

Source: https://habr.com/ru/post/1244702/