Boost  v1.57.0
doxygen for www.boost.org
boost::locale::util::base_converter Class Reference

This class represents a simple stateless converter to and from UCS-4, one code point at a time.

#include <util.hpp>


Public Member Functions

virtual ~base_converter ()
 
virtual int max_len () const
 Return the maximum number of bytes that one Unicode code point can be converted to; for example, 4 for UTF-8, 2 for Shift-JIS, and 1 for ISO-8859-1.
 
virtual bool is_thread_safe () const
 Returns true if calling from_unicode, to_unicode, and max_len is thread safe.
 
virtual base_converter * clone () const
 Create a polymorphic copy of this object; usually called only if is_thread_safe() returns false.
 
virtual uint32_t to_unicode (char const *&begin, char const *end)
 Convert a single character starting at begin and ending at most at end to a Unicode code point.
 
virtual uint32_t from_unicode (uint32_t u, char *begin, char const *end)
 Convert a single code point u into the target encoding and store it in the range [begin,end).
 

Static Public Attributes

static const uint32_t illegal = utf::illegal
 This value should be returned when an illegal input sequence or code point is observed; for example, if a UCS-4 code point is in the range reserved for UTF-16 surrogates, or an invalid UTF-8 sequence is found.
 
static const uint32_t incomplete = utf::incomplete
 This value is returned in the following cases: an incomplete input sequence was found, or the output buffer was too small for the complete output to be written.
 

Detailed Description

This class represents a simple stateless converter to and from UCS-4, one code point at a time.

This class is used to create a std::codecvt facet that converts between UTF-16/UTF-32 and the encoding supported by this converter.

Please note that this converter should be fully stateless: it should never assume that it is called in any specific order on the text. Even if the encoding itself seems to be stateless, like windows-1255 or Shift-JIS, some encoders (most notably iconv) can compose several code points into one, or decompose them, when composite characters are found. So be very careful when implementing these converters for certain character sets.
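As an illustration of the interface described above, here is a minimal sketch of a stateless ISO-8859-1 converter. To keep it compilable without Boost, it derives from a local stand-in, converter_iface, which is hypothetical and only mirrors the member signatures listed on this page; real code would derive from boost::locale::util::base_converter. The sentinel values are assumptions standing in for utf::illegal and utf::incomplete, and the name latin1_converter is also hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the interface documented on this page.
// In real code, derive from boost::locale::util::base_converter instead.
struct converter_iface {
    // Assumed values; the real class takes them from utf::illegal / utf::incomplete.
    static const std::uint32_t illegal = 0xFFFFFFFFu;
    static const std::uint32_t incomplete = 0xFFFFFFFEu;

    virtual ~converter_iface() {}
    virtual int max_len() const = 0;
    virtual bool is_thread_safe() const = 0;
    virtual converter_iface* clone() const = 0;
    virtual std::uint32_t to_unicode(char const*& begin, char const* end) = 0;
    virtual std::uint32_t from_unicode(std::uint32_t u, char* begin, char const* end) = 0;
};

// ISO-8859-1: every byte maps directly to U+0000..U+00FF, so no state is needed.
class latin1_converter : public converter_iface {
public:
    int max_len() const override { return 1; }
    bool is_thread_safe() const override { return true; }  // no mutable members
    converter_iface* clone() const override { return new latin1_converter(*this); }

    std::uint32_t to_unicode(char const*& begin, char const* end) override
    {
        assert(begin <= end);
        if (begin == end)
            return incomplete;               // nothing to read yet
        unsigned char c = static_cast<unsigned char>(*begin);
        ++begin;                             // advance past the consumed byte
        return c;
    }

    std::uint32_t from_unicode(std::uint32_t u, char* begin, char const* end) override
    {
        assert(begin <= end);
        if (u > 0xFF)
            return illegal;                  // not representable in ISO-8859-1
        if (begin == end)
            return incomplete;               // no room for the single output byte
        *begin = static_cast<char>(static_cast<unsigned char>(u));
        return 1;                            // one byte written
    }
};
```

Because each call depends only on its arguments, the calls can be made in any order and on any part of the text, which is the property the note above asks for.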

Constructor & Destructor Documentation

virtual boost::locale::util::base_converter::~base_converter ( )
inline virtual

Member Function Documentation

virtual base_converter* boost::locale::util::base_converter::clone ( ) const
inline virtual

Create a polymorphic copy of this object; usually called only if is_thread_safe() returns false.

References BOOST_ASSERT.

virtual uint32_t boost::locale::util::base_converter::from_unicode ( uint32_t  u,
char *  begin,
char const *  end 
)
inline virtual

Convert a single code point u into the target encoding and store it in the range [begin,end).

If u is an invalid Unicode code point, or cannot be mapped to the represented character set, illegal should be returned.

If u can be converted to a sequence of bytes c1, ..., cN (1 <= N <= max_len()), then:

  1. If end - begin >= N, c1, ..., cN are written starting at begin and N is returned.
  2. If end - begin < N, incomplete is returned; it is unspecified what is stored in the bytes of the range [begin,end).
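The rules above can be sketched for UTF-8 as a free function. This is an illustrative sketch, not the library's implementation: utf8_from_unicode is a hypothetical name, and the two constants are assumed stand-ins for base_converter::illegal and base_converter::incomplete.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for base_converter::illegal and base_converter::incomplete.
constexpr std::uint32_t illegal = 0xFFFFFFFFu;
constexpr std::uint32_t incomplete = 0xFFFFFFFEu;

// Encode one code point as UTF-8 into [begin, end).
// Returns the number of bytes written, or illegal / incomplete.
std::uint32_t utf8_from_unicode(std::uint32_t u, char* begin, char const* end)
{
    assert(begin <= end);

    // Surrogates and values above U+10FFFF are not valid code points.
    if (u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
        return illegal;

    // N: the number of bytes this code point needs in UTF-8.
    int n = u < 0x80 ? 1 : u < 0x800 ? 2 : u < 0x10000 ? 3 : 4;

    // Rule 2: not enough room, so nothing useful can be written.
    if (end - begin < n)
        return incomplete;

    // Rule 1: write c1..cN starting at begin and return N.
    switch (n) {
    case 1:
        begin[0] = static_cast<char>(u);
        break;
    case 2:
        begin[0] = static_cast<char>(0xC0 | (u >> 6));
        begin[1] = static_cast<char>(0x80 | (u & 0x3F));
        break;
    case 3:
        begin[0] = static_cast<char>(0xE0 | (u >> 12));
        begin[1] = static_cast<char>(0x80 | ((u >> 6) & 0x3F));
        begin[2] = static_cast<char>(0x80 | (u & 0x3F));
        break;
    default:
        begin[0] = static_cast<char>(0xF0 | (u >> 18));
        begin[1] = static_cast<char>(0x80 | ((u >> 12) & 0x3F));
        begin[2] = static_cast<char>(0x80 | ((u >> 6) & 0x3F));
        begin[3] = static_cast<char>(0x80 | (u & 0x3F));
        break;
    }
    return static_cast<std::uint32_t>(n);
}
```

A caller that sizes its buffer to at least max_len() bytes per code point never sees incomplete from a function written this way.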

References illegal, and incomplete.

virtual bool boost::locale::util::base_converter::is_thread_safe ( ) const
inline virtual

Returns true if calling from_unicode, to_unicode, and max_len is thread safe.

Rule of thumb: if this class's implementation uses simple tables that never change, or is purely algorithmic like UTF-8, so that independent to_unicode and from_unicode calls share no mutable state, you may return true. Otherwise, for example if you use an iconv_t descriptor or a UConverter as the conversion object, return false, and this object will be cloned for each use.
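The rule of thumb can be sketched from the caller's side as follows. Everything here is hypothetical and only illustrates the contract: converter_iface is a reduced stand-in for base_converter, table_converter and handle_converter are example implementations, and private_copy_if_needed shows one way a user of the converter might decide between sharing and cloning. It is not how Boost.Locale itself is written.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, reduced stand-in for base_converter (only the two members needed here).
struct converter_iface {
    virtual ~converter_iface() {}
    virtual bool is_thread_safe() const = 0;
    virtual converter_iface* clone() const = 0;
};

// Uses only immutable lookup data, so concurrent calls cannot interfere.
class table_converter : public converter_iface {
public:
    bool is_thread_safe() const override { return true; }
    converter_iface* clone() const override { return new table_converter(*this); }
};

// Stands in for a converter wrapping a stateful handle (such as an iconv_t
// descriptor); the member below represents state mutated during conversion.
class handle_converter : public converter_iface {
    std::uint32_t scratch_ = 0;
public:
    bool is_thread_safe() const override { return false; }
    converter_iface* clone() const override { return new handle_converter(*this); }
};

// Returns a private copy when the shared object must not be used concurrently,
// or an empty pointer when the shared object can be used directly.
std::unique_ptr<converter_iface> private_copy_if_needed(converter_iface const& shared)
{
    if (shared.is_thread_safe())
        return nullptr;
    converter_iface* copy = shared.clone();
    assert(copy != nullptr);  // clone() is expected to return a valid object
    return std::unique_ptr<converter_iface>(copy);
}
```

Returning true from is_thread_safe() therefore saves one allocation per use, which is why it is worth doing whenever the implementation really has no shared mutable state.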

virtual int boost::locale::util::base_converter::max_len ( ) const
inline virtual

Return the maximum number of bytes that one Unicode code point can be converted to; for example, 4 for UTF-8, 2 for Shift-JIS, and 1 for ISO-8859-1.

virtual uint32_t boost::locale::util::base_converter::to_unicode ( char const *&  begin,
char const *  end 
)
inline virtual

Convert a single character starting at begin and ending at most at end to a Unicode code point.

If a valid input sequence is found in [begin,code_point_end) such that begin < code_point_end && code_point_end <= end, it is converted to its Unicode code point equivalent and begin is set to code_point_end.

If an incomplete input sequence is found in [begin,end), i.e. there may be a code_point_end such that code_point_end > end and [begin,code_point_end) would be a valid input sequence, then incomplete is returned and begin stays unchanged. For example, for UTF-8 conversion, *begin = 0xc2 with begin + 1 = end is such a situation.

If an invalid input sequence is found, i.e. there is a sequence [begin,code_point_end) with code_point_end <= end that is illegal for this encoding, illegal is returned and begin stays unchanged. For example, if *begin = 0xFF and begin < end for UTF-8, then illegal is returned.
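The three cases above can be sketched for UTF-8 as a free function. This is an illustrative sketch rather than the library's code: utf8_to_unicode is a hypothetical name, and the two constants are assumed stand-ins for base_converter::illegal and base_converter::incomplete. It reads through a local copy of begin and commits the new position only on success, so begin stays unchanged for both error results.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for base_converter::illegal and base_converter::incomplete.
constexpr std::uint32_t illegal = 0xFFFFFFFFu;
constexpr std::uint32_t incomplete = 0xFFFFFFFEu;

// Decode one UTF-8 sequence from [begin, end).
// On success returns the code point and advances begin past the sequence;
// otherwise returns illegal or incomplete and leaves begin untouched.
std::uint32_t utf8_to_unicode(char const*& begin, char const* end)
{
    assert(begin <= end);
    if (begin == end)
        return incomplete;

    char const* p = begin;  // local cursor; begin is updated only on success
    unsigned char lead = static_cast<unsigned char>(*p++);

    int trail;              // number of continuation bytes expected
    std::uint32_t u;        // code point accumulated so far
    if (lead < 0x80)      { trail = 0; u = lead; }
    else if (lead < 0xC2) { return illegal; }  // stray continuation or overlong lead
    else if (lead < 0xE0) { trail = 1; u = lead & 0x1F; }
    else if (lead < 0xF0) { trail = 2; u = lead & 0x0F; }
    else if (lead < 0xF5) { trail = 3; u = lead & 0x07; }
    else                  { return illegal; }  // 0xF5..0xFF never occur in UTF-8

    for (int i = 0; i < trail; ++i) {
        if (p == end)
            return incomplete;                 // more input could complete the sequence
        unsigned char c = static_cast<unsigned char>(*p++);
        if ((c & 0xC0) != 0x80)
            return illegal;                    // not a continuation byte
        u = (u << 6) | (c & 0x3F);
    }

    // Reject overlong forms, surrogates, and values past U+10FFFF.
    static const std::uint32_t min_for_trail[] = {0, 0x80, 0x800, 0x10000};
    if (u < min_for_trail[trail] || u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
        return illegal;

    begin = p;              // commit: begin now equals code_point_end
    return u;
}
```

With this sketch, the two examples from the text behave as described: a lone 0xC2 at the end of the buffer yields incomplete, and a leading 0xFF yields illegal.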

References illegal, and incomplete.

Member Data Documentation

const uint32_t boost::locale::util::base_converter::illegal = utf::illegal
static

This value should be returned when an illegal input sequence or code point is observed; for example, if a UCS-4 code point is in the range reserved for UTF-16 surrogates, or an invalid UTF-8 sequence is found.

Referenced by from_unicode(), and to_unicode().

const uint32_t boost::locale::util::base_converter::incomplete = utf::incomplete
static

This value is returned in the following cases: an incomplete input sequence was found, or the output buffer was too small for the complete output to be written.

Referenced by from_unicode(), and to_unicode().


The documentation for this class was generated from the following file: