I have a huge folder full of XML documents, some of which may be broken because they contain curly quotes, i.e. the "smart quotes" that Microsoft Word produces. I just want a quick check to see what I am up against. Does anyone know how to grep for them so that I can easily find the culprits?
Edit
Here is a simplified example.
<?xml version="1.0" encoding="UTF-8"?>
<items>
  <item>Pretend this is a curly quote: '</item>
</items>
Curly quotes have the following Unicode code points and UTF-8 sequence:
Name                                   CodePoint  UTF-8 sequence
----                                   ---------  --------------
LEFT SINGLE QUOTATION MARK             U+2018     0xE2 0x80 0x98
RIGHT SINGLE QUOTATION MARK            U+2019     0xE2 0x80 0x99
SINGLE LOW-9 QUOTATION MARK            U+201A     0xE2 0x80 0x9A
SINGLE HIGH-REVERSED-9 QUOTATION MARK  U+201B     0xE2 0x80 0x9B
LEFT DOUBLE QUOTATION MARK             U+201C     0xE2 0x80 0x9C
RIGHT DOUBLE QUOTATION MARK            U+201D     0xE2 0x80 0x9D
DOUBLE LOW-9 QUOTATION MARK            U+201E     0xE2 0x80 0x9E
DOUBLE HIGH-REVERSED-9 QUOTATION MARK  U+201F     0xE2 0x80 0x9F
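As a sanity check on the table, the byte sequences can be derived by encoding each code point as UTF-8. This is a minimal Python sketch; the names `CURLY_QUOTES` and `utf8_hex` are illustrative and not from the thread.

```python
import unicodedata

# Code points U+2018 through U+201F: the eight quotation marks in the table.
CURLY_QUOTES = [chr(cp) for cp in range(0x2018, 0x2020)]


def utf8_hex(ch):
    """Return the UTF-8 bytes of a character as '0xE2 0x80 0x98'-style text."""
    return " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))


# Print name, code point and UTF-8 byte sequence for each quotation mark.
for ch in CURLY_QUOTES:
    print(f"{unicodedata.name(ch):40} U+{ord(ch):04X}  {utf8_hex(ch)}")
```

All eight characters share the prefix 0xE2 0x80 and differ only in the last byte (0x98 to 0x9F), which is what makes a single byte-range search possible.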
If your XML files are encoded as UTF-8, you can search for these byte sequences.
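Building on dalle's answer, you can search for one of the UTF-8 sequences, for example the left double quotation mark: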
grep -r -P "\xE2\x80\x9C" .
The -r option makes grep search recursively, and -P makes it interpret the pattern as a Perl regular expression.
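Where grep -P is not available, the same recursive search can be sketched in Python. The function below is a hypothetical helper, not something from the thread; it assumes the files are UTF-8 and end in `.xml`, and it matches all eight quotation marks at once by searching the raw bytes for 0xE2 0x80 followed by 0x98 to 0x9F.

```python
import re
from pathlib import Path

# UTF-8 bytes 0xE2 0x80 0x98 through 0xE2 0x80 0x9F, i.e. U+2018..U+201F.
CURLY_QUOTE_BYTES = re.compile(rb"\xE2\x80[\x98-\x9F]")


def find_curly_quotes(root, pattern="*.xml"):
    """Yield (path, line_number, count) for each line containing curly quotes."""
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # Read raw bytes so a badly encoded file cannot raise a decode error.
        with path.open("rb") as handle:
            for line_number, line in enumerate(handle, start=1):
                count = len(CURLY_QUOTE_BYTES.findall(line))
                if count:
                    yield path, line_number, count
```

For example, `for path, line, count in find_curly_quotes("."): print(path, line, count)` lists every offending line under the current folder, which gives a quick picture of how many files are affected.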
xml, , , , , , XML ( , ).
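If you want to replace the quotes, for example „ and “ with a plain ", you can use sed (on Linux/OSX, or cygwin on Windows). The command below edits the files in place, keeps a .bak backup of each one, and uses the g flag so that every occurrence on a line is replaced:
sed -i.bak 's/[“„]/"/g' file1 file2 ...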
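The same clean-up can be sketched in Python when sed is not at hand. This is only an illustration under assumptions: the files are UTF-8, every curly quote should become a straight ASCII quote, and the names `STRAIGHT` and `straighten_quotes` are made up for this example.

```python
import shutil
from pathlib import Path

# Map each curly quote to a straight ASCII quote (an illustrative choice).
STRAIGHT = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201A": "'", "\u201B": "'",
    "\u201C": '"', "\u201D": '"', "\u201E": '"', "\u201F": '"',
})


def straighten_quotes(path, backup_suffix=".bak"):
    """Rewrite a UTF-8 file with straight quotes; return True if it changed."""
    path = Path(path)
    # Work on bytes so that line endings are preserved exactly.
    original = path.read_bytes().decode("utf-8")
    fixed = original.translate(STRAIGHT)
    if fixed == original:
        return False
    # Keep a copy of the untouched file, like sed's -i option with a suffix.
    shutil.copy2(path, path.with_name(path.name + backup_suffix))
    path.write_bytes(fixed.encode("utf-8"))
    return True
```

One caveat applies to any blind replacement, sed included: a straight quote that lands inside an XML attribute value delimited by the same kind of quote makes the document ill-formed, so such values would need `&quot;` or `&apos;` instead.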
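The check below flags text that contains special characters but accepts straight and curly apostrophes, hyphens and spaces. For example: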
Text     | Error
----------------
O*Connor | Yes
O'Connor | No
O’Connor | No
ColdFusion code:
<cfif REFind("[[:punct:][:digit:]]", textName) GT 0>
  <!--- Remove every character that is not a letter, apostrophe, prime, hyphen or space --->
  <cfset temp_name = textName.ReplaceAll(JavaCast("string", "[^A-Za-z\u2018\u2019\u201A\u201B\u2032\u2035\'\-\ ]"), JavaCast("string", ""))>
  <cfif len(temp_name) EQ len(textName)>
    <!--- Only single quotes or hyphens were found: do nothing --->
  <cfelse>
    <cfset errormsg = "The text contains a special character">
  </cfif>
</cfif>
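For comparison, the core of that check can be sketched in Python with the same character class. This is a simplified illustration, not a port: it skips the initial punctuation/digit test, so anything outside A-Z and a-z (accented letters included) is flagged, and the names `DISALLOWED` and `has_special_character` are hypothetical.

```python
import re

# Anything other than ASCII letters, the listed apostrophes and primes,
# a hyphen or a space counts as a special character.
DISALLOWED = re.compile("[^A-Za-z\u2018\u2019\u201A\u201B\u2032\u2035'\\- ]")


def has_special_character(text):
    """Return True if text contains a character outside the allowed set."""
    return DISALLOWED.search(text) is not None


# The three sample names from the table above.
for name in ["O*Connor", "O'Connor", "O\u2019Connor"]:
    print(name, "->", "error" if has_special_character(name) else "ok")
```

Searching for the first disallowed character avoids building a stripped copy and comparing lengths, but the accept/reject decision for the sample names is the same as in the table.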
Reference: http://axonflux.com/handy-regexes-for-smart-quotes
On a Mac, the built-in grep does not support -P. You can install GNU grep with Homebrew:
brew tap homebrew/dupes
brew install homebrew/dupes/grep
Then use ggrep in place of grep, e.g.:
ggrep -r -P "\xE2\x80\x9C" .
dalle neubert script, , , .