How to find "paired characters" in Perl?

Is it possible to programmatically display all the "paired" characters?

E.g., given a character such as <, how do I find its counterpart >?

The following code snippet prints each mirrored ASCII character:

use 5.018;
use warnings;
use charnames qw(:full);
for my $n (0..127) {
    my $c = chr $n;
    printf "%02x: [%s] - %s\n", $n, $c, charnames::viacode($n) if $c =~ /\p{Bidi_Mirrored=Y}/;
}

prints:

28: [(] - LEFT PARENTHESIS
29: [)] - RIGHT PARENTHESIS
3c: [<] - LESS-THAN SIGN
3e: [>] - GREATER-THAN SIGN
5b: [[] - LEFT SQUARE BRACKET
5d: []] - RIGHT SQUARE BRACKET
7b: [{] - LEFT CURLY BRACKET
7d: [}] - RIGHT CURLY BRACKET

But AFAIK the Bidi_Mirrored property is not the same thing as "paired" (i.e. left/right pairs). For example, the following character has the Bidi_Mirrored property but probably has no "pair":

∰  U+02230 VOLUME INTEGRAL
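
The claim is easy to check outside the ASCII range. Here is a minimal sketch that tests a few code points, including the volume integral, against the same property; the helper name is_bidi_mirrored and the choice of code points are just for illustration:

```perl
use 5.018;
use warnings;
use charnames ();

# Returns 1 if the code point has Bidi_Mirrored=Y, otherwise 0.
sub is_bidi_mirrored {
    my ($cp) = @_;
    return chr($cp) =~ /\p{Bidi_Mirrored=Y}/ ? 1 : 0;
}

# <, «, ≤ and ∰ from the question, plus the letter A for contrast.
for my $cp (0x3C, 0xAB, 0x2264, 0x2230, 0x41) {
    printf "U+%05X %-45s Bidi_Mirrored=%s\n",
        $cp, charnames::viacode($cp), is_bidi_mirrored($cp) ? 'Y' : 'N';
}
```

This only says whether a character is mirrored at all, not which character (if any) it mirrors to.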

And even if Bidi_Mirrored is the right property for "paired" characters, the question remains the same: how do I find the code point (or name) of the "pair"?

In short: I want to print all Unicode "paired" characters, for example pairs like:

«  U+000AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
»  U+000BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

or

≤  U+02264 LESS-THAN OR EQUAL TO
≥  U+02265 GREATER-THAN OR EQUAL TO

....

Edit: the header of the Unicode Character Database file BidiBrackets.txt says:

# Bidi_Paired_Bracket is a normative property of type Miscellaneous,
# which establishes a mapping between characters that are treated as
# bracket pairs by the Unicode Bidirectional Algorithm.
#
# Bidi_Paired_Bracket_Type is a normative property of type Enumeration,
# which classifies characters into opening and closing paired brackets
# for the purposes of the Unicode Bidirectional Algorithm.
#
# This file lists the set of code points with Bidi_Paired_Bracket_Type
# property values Open and Close. The set is derived from the character
# properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
# and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
# form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
# Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
# vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
# Open (o) and Close (c), respectively.
#
# For legacy reasons, the characters U+FD3E ORNATE LEFT PARENTHESIS and
# U+FD3F ORNATE RIGHT PARENTHESIS do not mirror in bidirectional display
# and therefore do not form a bracket pair.
#
# The Unicode property value stability policy guarantees that characters
# which have bpt=o or bpt=c also have bc=ON and Bidi_M=Y. As a result, an
# implementation can optimize the lookup of the Bidi_Paired_Bracket_Type
# property values Open and Close by restricting the processing to characters
# with bc=ON
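
The data lines in BidiBrackets.txt (the Unicode Character Database file this header comes from) have the form "code point; paired bracket; type". Here is a minimal sketch that turns such lines into a lookup table. It runs on a small embedded excerpt rather than the full file, and the names $sample and parse_bidi_brackets are made up for this example:

```perl
use 5.018;
use warnings;

# A few lines in the BidiBrackets.txt layout: code point; paired bracket; type.
my $sample = <<'END';
# sample excerpt, not the full file
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
007B; 007D; o # LEFT CURLY BRACKET
007D; 007B; c # RIGHT CURLY BRACKET
END

# Parse text in that layout into { code point => [ paired code point, type ] }.
sub parse_bidi_brackets {
    my ($text) = @_;
    my %pair;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;              # drop comments
        $line =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
        next unless length $line;      # skip blank and comment-only lines
        my ($cp, $bpb, $bpt) = split /\s*;\s*/, $line;
        $pair{ hex $cp } = [ hex $bpb, $bpt ];
    }
    return \%pair;
}

my $pair = parse_bidi_brackets($sample);
for my $cp (sort { $a <=> $b } keys %$pair) {
    my ($other, $type) = @{ $pair->{$cp} };
    printf "%s (U+%04X) pairs with %s (U+%04X), type %s\n",
        chr $cp, $cp, chr $other, $other, $type;
}
```

Note that this covers brackets only: by the definition in the header, < and > (which are not gc=Ps/Pe) are not bracket pairs.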

So the question is how to get at the Bidi_Mirroring_Glyph (bmg) or Bidi_Paired_Bracket (bpb) properties from Perl. AFAIK Unicode::UCD is the module that would expose them.
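
One approach that does not depend on what any module exposes is to read the Bidi_Mirroring_Glyph data straight from BidiMirroring.txt in the Unicode Character Database, whose data lines have the form "code point; mirrored code point". The sketch below assumes that layout and uses a short embedded excerpt instead of the real file; $excerpt and parse_bidi_mirroring are hypothetical names:

```perl
use 5.018;
use warnings;
use charnames ();

# Excerpt in the BidiMirroring.txt layout: code point; mirrored code point.
my $excerpt = <<'END';
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
00AB; 00BB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00BB; 00AB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
2264; 2265 # LESS-THAN OR EQUAL TO
2265; 2264 # GREATER-THAN OR EQUAL TO
END

# Build { code point => mirrored code point } from lines in that layout.
sub parse_bidi_mirroring {
    my ($text) = @_;
    my %mirror;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;    # drop comments
        next unless $line =~ /^\s*([0-9A-Fa-f]+)\s*;\s*([0-9A-Fa-f]+)/;
        $mirror{ hex $1 } = hex $2;
    }
    return \%mirror;
}

my $mirror_of = parse_bidi_mirroring($excerpt);

# Look up <, «, ≤ and ∰; the last one has no entry in the excerpt.
for my $cp (0x3C, 0xAB, 0x2264, 0x2230) {
    my $other = $mirror_of->{$cp};
    printf "U+%05X %s => %s\n", $cp, charnames::viacode($cp),
        defined $other
            ? sprintf("U+%05X %s", $other, charnames::viacode($other))
            : "(no entry in the excerpt)";
}
```

To run this against the real data, replace $excerpt with the contents of a downloaded copy of the file. As far as I know, characters like ∰ that are mirrored but have no counterpart are simply absent from the data lines there.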

Perhaps with Perl 5.024 and Unicode 8.0? :) :)

Source: https://habr.com/ru/post/1612937/