Getting a boost::filesystem::path as a UTF-8 encoded std::string on Windows

We represent paths as boost::filesystem::path, but in some cases other APIs expect them as const char * (for example, to open a database file with SQLite).

According to the documentation, path::value_type is wchar_t under Windows. As far as I know, a Windows wchar_t is 2 bytes wide and encoded as UTF-16.

There is a native observer, string(), that returns a std::string. Its documentation states:

If string_type is a different type than String, conversion is performed by cvt.

cvt is initialized to the default built-in codecvt. What is the behavior of this default built-in codecvt?

There is this forum entry that recommends passing an instance of utf8_codecvt_facet as the cvt value to convert portably to UTF-8. But it looks like this facet is actually designed to convert between UTF-8 and UCS-4, not UTF-16.
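The difference between UCS-4 and UTF-16 only shows up for code points above U+FFFF, which UTF-16 stores as a surrogate pair. The following is a minimal sketch of that encoding step; the helper name to_utf16 is hypothetical and not part of Boost or the standard library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Precondition: cp is a valid scalar value (not a surrogate, at most U+10FFFF).
std::u16string to_utf16( char32_t cp )
{
    assert( cp <= 0x10FFFF && ( cp < 0xD800 || cp > 0xDFFF ) );

    std::u16string out;
    if( cp < 0x10000 )
    {
        // Basic Multilingual Plane: one code unit, same value as in UCS-4.
        out.push_back( static_cast<char16_t>( cp ) );
    }
    else
    {
        // Supplementary plane: split the 20-bit offset into a surrogate pair.
        cp -= 0x10000;
        out.push_back( static_cast<char16_t>( 0xD800 + ( cp >> 10 ) ) );
        out.push_back( static_cast<char16_t>( 0xDC00 + ( cp & 0x3FF ) ) );
    }
    return out;
}
```

For characters such as U+00C4 the two encodings hold the same value, so a UCS-4 oriented facet can appear to work until a path contains a character outside the Basic Multilingual Plane.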

What would be the best (and, if possible, portable) way to get a std::string representation of a path, making sure to convert from the correct wchar_t encoding where necessary?

1 answer

cvt is initialized to the default built-in codecvt. What is the behavior of this default built-in codecvt?

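It uses the default locale to convert to a locale-specific multi-byte encoding. On Windows, this locale normally follows the system's regional settings.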

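What would be the best (and, if possible, portable) way to get a std::string representation of a path, making sure to convert from the correct wchar_t encoding where necessary?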

C++11 introduced std::codecvt_utf8_utf16. It is deprecated as of C++17, but it is still available there.

To use it globally, call the static function path::imbue:

boost::filesystem::path::imbue( 
    std::locale( std::locale(), new std::codecvt_utf8_utf16<wchar_t>() ) );

After that, every call to path::string() converts from UTF-16 to UTF-8.

Another option is to use std::wstring_convert< std::codecvt_utf8_utf16<wchar_t> > to convert only in the places where it is needed.

Complete example:

#include <boost/filesystem.hpp>
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

void print_hex( std::string const& path );

int main()
{
    // Create UTF-16 path (on Windows) that contains the characters "ÄÖÜ".
    boost::filesystem::path path( L"\u00c4\u00d6\u00dc" );

    // Convert path using the default locale and print result.
    // On a system with a German default locale, this prints "0xc4 0xd6 0xdc".
    // On a system with a different locale, this might fail.
    print_hex( path.string() );

    // Set locale for conversion from UTF-16 to UTF-8.
    boost::filesystem::path::imbue( 
        std::locale( std::locale(), new std::codecvt_utf8_utf16<wchar_t>() ) );

    // Because we changed the locale, path::string() now converts the path to UTF-8.
    // This always prints the UTF-8 bytes "0xc3 0x84 0xc3 0x96 0xc3 0x9c".
    print_hex( path.string() );

    // Another option is to convert only case-by-case, by explicitly using a code converter.
    // This always prints the UTF-8 bytes "0xc3 0x84 0xc3 0x96 0xc3 0x9c".
    std::wstring_convert< std::codecvt_utf8_utf16<wchar_t> > cvt;
    print_hex( cvt.to_bytes( path.wstring() ) );
}

void print_hex( std::string const& path )
{
    for( char c : path )
    {
        std::cout << std::hex << "0x" << static_cast<unsigned>(static_cast<unsigned char>( c )) << ' ';
    }
    std::cout << '\n';
}
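Because std::codecvt_utf8_utf16 is deprecated, it may be worth having a fallback that does not depend on <codecvt>. The following is a rough sketch of a hand-written UTF-16 to UTF-8 conversion, not something Boost provides; the function name utf16_to_utf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to UTF-8 bytes without <codecvt>.
// Unpaired surrogates are replaced with U+FFFD (an assumed policy).
std::string utf16_to_utf8( std::u16string const& in )
{
    std::string out;
    for( std::size_t i = 0; i < in.size(); ++i )
    {
        char32_t cp = in[ i ];
        if( cp >= 0xD800 && cp <= 0xDBFF )
        {
            // High surrogate: combine with a following low surrogate if present.
            if( i + 1 < in.size() && in[ i + 1 ] >= 0xDC00 && in[ i + 1 ] <= 0xDFFF )
            {
                cp = 0x10000 + ( ( cp - 0xD800 ) << 10 ) + ( in[ i + 1 ] - 0xDC00 );
                ++i;
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if( cp >= 0xDC00 && cp <= 0xDFFF )
        {
            // Low surrogate without a preceding high surrogate.
            cp = 0xFFFD;
        }

        // Emit 1 to 4 UTF-8 bytes depending on the code point's magnitude.
        if( cp < 0x80 )
        {
            out += static_cast<char>( cp );
        }
        else if( cp < 0x800 )
        {
            out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else if( cp < 0x10000 )
        {
            out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else
        {
            out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
    }
    return out;
}
```

On Windows, where wchar_t holds UTF-16 code units, the result of path.wstring() could be copied into a std::u16string (for example via its iterator-range constructor) before calling this helper. That copy is only meaningful on platforms where wchar_t is 16 bits wide.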

Source: https://habr.com/ru/post/1683265/

