Libparserutils
Data Structures | |
union | parserutils_charset_codec_optparams |
Charset codec option parameters. |
Macros | |
#define | PARSERUTILS_CHARSET_CODEC_NULL (0xffffffffU) |
Typedefs | |
typedef struct parserutils_charset_codec | parserutils_charset_codec |
typedef enum parserutils_charset_codec_errormode | parserutils_charset_codec_errormode |
Charset codec error mode. | |
typedef enum parserutils_charset_codec_opttype | parserutils_charset_codec_opttype |
Charset codec option types. | |
typedef union parserutils_charset_codec_optparams | parserutils_charset_codec_optparams |
Charset codec option parameters. |
Enumerations | |
enum | parserutils_charset_codec_errormode { PARSERUTILS_CHARSET_CODEC_ERROR_STRICT = 0 , PARSERUTILS_CHARSET_CODEC_ERROR_LOOSE = 1 , PARSERUTILS_CHARSET_CODEC_ERROR_TRANSLIT = 2 } |
Charset codec error mode. | |
enum | parserutils_charset_codec_opttype { PARSERUTILS_CHARSET_CODEC_ERROR_MODE = 1 } |
Charset codec option types. |
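typedef struct parserutils_charset_codec parserutils_charset_codec
typedef enum parserutils_charset_codec_errormode parserutils_charset_codec_errormode
Charset codec error mode.
A codec's error mode determines its behaviour in the face of:
- characters which are unrepresentable in the destination charset (if encoding) or which cannot be converted to UCS-4 (if decoding)
- invalid byte sequences (both encoding and decoding)
The options provide a choice between the following approaches:
- "strict": stop processing
- "loose": replace the unrepresentable character with something else
- "translit": attempt to transliterate, or replace if unable
The default error mode is "loose".
In the "loose" case, the replacement character will depend upon:
- whether the operation is encoding or decoding
- if encoding, what the destination charset is
If decoding, the replacement character will be:
U+FFFD (REPLACEMENT CHARACTER)
If encoding, the replacement character will be:
U+003F (QUESTION MARK) if the destination charset is not UTF-(8|16|32)
U+FFFD (REPLACEMENT CHARACTER) otherwise.
In the "translit" case, the codec will attempt to transliterate into the destination charset, if encoding. If decoding, or if transliteration fails, this option is identical to "loose".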
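The "loose" replacement rules described above can be sketched as a small selection function. This is an illustration only, not the library's code: `names_equal`, `is_utf_charset` and `loose_replacement` are hypothetical names, and identifying the destination charset by its literal name is an assumption made to keep the sketch self-contained.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: case-insensitive string equality. */
static bool names_equal(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0') {
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}

	return *a == '\0' && *b == '\0';
}

/* Hypothetical helper: true for the three UTF charsets named in the text. */
static bool is_utf_charset(const char *name)
{
	return names_equal(name, "UTF-8") || names_equal(name, "UTF-16") ||
			names_equal(name, "UTF-32");
}

/* Pick the "loose" replacement character as a UCS-4 code point. */
static uint32_t loose_replacement(bool encoding, const char *dest_charset)
{
	if (!encoding)
		return 0xFFFD; /* decoding: always U+FFFD */

	/* encoding: U+FFFD for UTF-8/16/32 destinations, U+003F otherwise */
	return is_utf_charset(dest_charset) ? 0xFFFD : 0x003F;
}
```

Under these assumptions, `loose_replacement(true, "ISO-8859-1")` yields U+003F, while any decoding call yields U+FFFD regardless of the charset argument.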
typedef union parserutils_charset_codec_optparams parserutils_charset_codec_optparams
Charset codec option parameters.
typedef enum parserutils_charset_codec_opttype parserutils_charset_codec_opttype
Charset codec option types.
parserutils_error parserutils_charset_codec_create(const char *charset, parserutils_charset_codec **codec)
Create a charset codec.
Parameters
charset | Target charset
codec | Pointer to location to receive codec instance
Definition at line 38 of file codec.c.
References parserutils_charset_codec::errormode, handler_table, parserutils_charset_aliases_canon::mib_enum, parserutils_charset_codec::mibenum, parserutils_charset_aliases_canon::name, parserutils__charset_alias_canonicalise(), PARSERUTILS_BADENCODING, PARSERUTILS_BADPARM, PARSERUTILS_CHARSET_CODEC_ERROR_LOOSE, and PARSERUTILS_OK.
Referenced by filter_set_encoding(), and parserutils__filter_create().
parserutils_error parserutils_charset_codec_decode(parserutils_charset_codec *codec, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
Decode a chunk of data in a codec's charset into UCS-4.
Parameters
codec | The codec to use
source | Pointer to pointer to source data
sourcelen | Pointer to length (in bytes) of source data
dest | Pointer to pointer to output buffer
destlen | Pointer to length (in bytes) of output buffer
source, sourcelen, dest and destlen will be updated appropriately on exit.
Call this with a source length of 0 to flush any buffers.
Definition at line 163 of file codec.c.
References parserutils_charset_codec::decode, parserutils_charset_codec::handler, and PARSERUTILS_BADPARM.
Referenced by parserutils__filter_process_chunk().
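The in/out parameter convention used by the decode function (all four of source, sourcelen, dest and destlen are updated on exit) can be illustrated with a toy decoder that has the same parameter shape. Everything here is hypothetical: `toy_latin1_decode` and the `toy_status` codes are not part of the library, and storing code points in host byte order is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes for this sketch only. */
enum toy_status { TOY_OK = 0, TOY_NOMEM = 1, TOY_BADPARM = 2 };

/*
 * Toy ISO-8859-1 to UCS-4 decoder using the same in/out parameter shape.
 * Each input byte becomes one 4-byte code point, stored in host byte order
 * (an assumption of this sketch).
 */
static enum toy_status toy_latin1_decode(const uint8_t **source,
		size_t *sourcelen, uint8_t **dest, size_t *destlen)
{
	if (source == NULL || *source == NULL || sourcelen == NULL ||
			dest == NULL || *dest == NULL || destlen == NULL)
		return TOY_BADPARM;

	while (*sourcelen > 0) {
		uint32_t ucs4 = **source;

		if (*destlen < sizeof(ucs4))
			return TOY_NOMEM; /* caller can retry with more space */

		memcpy(*dest, &ucs4, sizeof(ucs4));

		/* Advance both cursors and shrink both remaining lengths */
		*source += 1;
		*sourcelen -= 1;
		*dest += sizeof(ucs4);
		*destlen -= sizeof(ucs4);
	}

	return TOY_OK;
}
```

After a call, the caller can work out how much output was produced by subtracting the updated `*destlen` from the buffer size it started with, and can resume from the updated `*source` if the output buffer filled up first.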
parserutils_error parserutils_charset_codec_destroy(parserutils_charset_codec *codec)
Destroy a charset codec.
Parameters
codec | The codec to destroy
Definition at line 86 of file codec.c.
References parserutils_charset_codec::destroy, parserutils_charset_codec::handler, PARSERUTILS_BADPARM, and PARSERUTILS_OK.
Referenced by filter_set_encoding(), parserutils__filter_create(), and parserutils__filter_destroy().
parserutils_error parserutils_charset_codec_encode(parserutils_charset_codec *codec, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
Encode a chunk of UCS-4 data into a codec's charset.
Parameters
codec | The codec to use
source | Pointer to pointer to source data
sourcelen | Pointer to length (in bytes) of source data
dest | Pointer to pointer to output buffer
destlen | Pointer to length (in bytes) of output buffer
source, sourcelen, dest and destlen will be updated appropriately on exit.
Definition at line 136 of file codec.c.
References parserutils_charset_codec::encode, parserutils_charset_codec::handler, and PARSERUTILS_BADPARM.
Referenced by parserutils__filter_process_chunk().
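To show how an error mode might interact with encoding, the following toy encoder converts UCS-4 to US-ASCII and either stops ("strict") or substitutes U+003F ("loose") when a code point cannot be represented. It is a sketch under stated assumptions, not the library's encoder: `toy_ascii_encode` and its status codes are hypothetical, input is assumed to be in host byte order, and where the cursors are left after a strict failure is a choice made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes for this sketch only. */
enum toy_enc_status { TOY_ENC_OK = 0, TOY_ENC_NOMEM, TOY_ENC_INVALID };

/*
 * Toy UCS-4 (host byte order, an assumption) to US-ASCII encoder.
 * strict == true  : stop at the first unrepresentable code point
 * strict == false : write U+003F (QUESTION MARK) instead and carry on
 */
static enum toy_enc_status toy_ascii_encode(bool strict,
		const uint8_t **source, size_t *sourcelen,
		uint8_t **dest, size_t *destlen)
{
	while (*sourcelen >= sizeof(uint32_t)) {
		uint32_t ucs4;

		memcpy(&ucs4, *source, sizeof(ucs4));

		if (ucs4 > 0x7F) {
			if (strict)
				return TOY_ENC_INVALID; /* cursors stay at the bad input */
			ucs4 = 0x3F;
		}

		if (*destlen < 1)
			return TOY_ENC_NOMEM;

		**dest = (uint8_t) ucs4;

		/* Consume one code point, produce one byte */
		*source += sizeof(ucs4);
		*sourcelen -= sizeof(ucs4);
		*dest += 1;
		*destlen -= 1;
	}

	return TOY_ENC_OK;
}
```

In this sketch a loose-mode call always consumes the whole input if the output buffer is large enough, whereas a strict-mode call leaves `*source` pointing at the offending code point so the caller can decide what to do with it.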
parserutils_error parserutils_charset_codec_reset(parserutils_charset_codec *codec)
Clear a charset codec's encoding state.
Parameters
codec | The codec to reset
Definition at line 182 of file codec.c.
References parserutils_charset_codec::handler, PARSERUTILS_BADPARM, and parserutils_charset_codec::reset.
Referenced by parserutils__filter_reset().
parserutils_error parserutils_charset_codec_setopt(parserutils_charset_codec *codec, parserutils_charset_codec_opttype type, parserutils_charset_codec_optparams *params)
Configure a charset codec.
Parameters
codec | The codec to configure
type | The codec option type to configure
params | Option-specific parameters
Definition at line 107 of file codec.c.
References parserutils_charset_codec_optparams::error_mode, parserutils_charset_codec::errormode, parserutils_charset_codec_optparams::mode, PARSERUTILS_BADPARM, PARSERUTILS_CHARSET_CODEC_ERROR_MODE, and PARSERUTILS_OK.
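As a rough illustration of how the option parameters might be filled in before calling the setopt function, the sketch below builds an error-mode parameter block. The enum values are copied from this page; the layout of the union (a `mode` member nested inside `error_mode`) is inferred from the member names in the reference list above and is an assumption. The declarations are stand-ins so the sketch compiles without the library, and `make_error_mode_params` is a hypothetical helper.

```c
/*
 * Stand-in declarations based on the names on this page so that the
 * sketch compiles without the library. Real code would include the
 * library's codec header instead of redeclaring them.
 */
typedef enum parserutils_charset_codec_errormode {
	PARSERUTILS_CHARSET_CODEC_ERROR_STRICT = 0,
	PARSERUTILS_CHARSET_CODEC_ERROR_LOOSE = 1,
	PARSERUTILS_CHARSET_CODEC_ERROR_TRANSLIT = 2
} parserutils_charset_codec_errormode;

/* Assumed layout: one parameter struct per option type. */
typedef union parserutils_charset_codec_optparams {
	struct {
		parserutils_charset_codec_errormode mode;
	} error_mode;
} parserutils_charset_codec_optparams;

/* Hypothetical helper: build the parameters for an error-mode change. */
static parserutils_charset_codec_optparams make_error_mode_params(
		parserutils_charset_codec_errormode mode)
{
	parserutils_charset_codec_optparams params;

	params.error_mode.mode = mode;

	return params;
}

/*
 * With the real library, the result would then be passed on, e.g.:
 *
 *   parserutils_charset_codec_optparams params =
 *       make_error_mode_params(PARSERUTILS_CHARSET_CODEC_ERROR_STRICT);
 *   parserutils_charset_codec_setopt(codec,
 *       PARSERUTILS_CHARSET_CODEC_ERROR_MODE, &params);
 */
```

Since PARSERUTILS_CHARSET_CODEC_ERROR_MODE is the only option type listed on this page, the error mode is the only setting this call can change here.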