I'm not 100% sure, but I think this can be seen from the Unicode Character Database: http://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt
For example, the entry for "à" is
00E0;LATIN SMALL LETTER A WITH GRAVE;Ll;0;L;0061 0300;;;;N;LATIN SMALL LETTER A GRAVE;;00C0;;00C0
where field #6 is the "decomposition mapping" into "a" + U+0300 (COMBINING GRAVE ACCENT). Therefore
CFStringTransform(..., kCFStringTransformStripCombiningMarks, ...)
converts "à" to "a".
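To make it concrete which field is meant, the entry can be split on semicolons. This is only a minimal Python sketch: the helper name `decomposition_field` is made up for illustration, and the input is the "à" entry in the file's semicolon-separated form.

```python
def decomposition_field(line):
    """Return the code points in field #6 (counting from 1) of a UnicodeData.txt line."""
    fields = line.split(";")
    # Field #6 counting from 1 is index 5. Compatibility mappings start with
    # a <tag> such as <compat>, which is skipped here.
    return [part for part in fields[5].split() if not part.startswith("<")]

entry = ("00E0;LATIN SMALL LETTER A WITH GRAVE;Ll;0;L;0061 0300;;;;N;"
         "LATIN SMALL LETTER A GRAVE;;00C0;;00C0")
mapping = decomposition_field(entry)
print(mapping)  # ['0061', '0300']
```

The two code points are U+0061 ("a") and U+0300, so removing the combining mark leaves the plain "a".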
The entries for "Đ" and "đ" are
0110;LATIN CAPITAL LETTER D WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER D BAR;;;0111;
0111;LATIN SMALL LETTER D WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER D BAR;;0110;;0110
where field #6 is empty, so these characters are not decomposed into “base character” and “combining character”.
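The difference can be checked against the same data with Python's standard `unicodedata` module. This is a sketch, not the Core Foundation implementation: the helper `strip_combining_marks` is a hypothetical stand-in that decomposes to NFD and drops combining marks, which is what a purely decomposition-based approach would do.

```python
import unicodedata

def strip_combining_marks(text):
    """Decompose to NFD, then drop characters whose general category is a mark (M*)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))

# U+00E0 (a with grave), U+0111 (small d with stroke), U+0110 (capital D with stroke)
for ch in "\u00e0\u0111\u0110":
    print(f"U+{ord(ch):04X}",
          repr(unicodedata.decomposition(ch)),
          strip_combining_marks(ch))
```

Under this approach "à" becomes "a", while "đ" and "Đ" come back unchanged because their decomposition field is empty.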
So the question remains: what standard defines that the "normalized form" of "đ / Đ" is equal to "d / D"?