substr_ctl {fansi} | R Documentation |
substr_ctl is a drop-in replacement for substr. Performance is slightly slower than substr. ANSI CSI SGR sequences will be included in the substrings to reflect the format of the substring when it was embedded in the source string. Additionally, other Control Sequences specified in ctl are treated as zero-width.
substr_ctl(
  x, start, stop,
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"),
  ctl = "all"
)

substr2_ctl(
  x, start, stop, type = "chars", round = "start",
  tabs.as.spaces = getOption("fansi.tabs.as.spaces"),
  tab.stops = getOption("fansi.tab.stops"),
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"),
  ctl = "all"
)

substr_sgr(
  x, start, stop,
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap")
)

substr2_sgr(
  x, start, stop, type = "chars", round = "start",
  tabs.as.spaces = getOption("fansi.tabs.as.spaces"),
  tab.stops = getOption("fansi.tab.stops"),
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap")
)
x |
a character vector or object that can be coerced to character. |
start |
integer. The first element to be extracted. |
stop |
integer. The last element to be extracted. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character, a vector of the capabilities of the terminal; can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting with
"38;2" or "48;2"). Changing this parameter changes how |
ctl |
character, which Control Sequences should be treated specially. See the "_ctl vs. _sgr" section for details.
|
type |
character(1L) partial matching |
round |
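character(1L), partial matching "start", "stop", "both", or "neither"; controls how start and stop values that fall in the middle of a multi-byte or wide character are resolved.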
|
tabs.as.spaces |
FALSE (default) or TRUE, whether to convert tabs to
spaces. This can only be set to TRUE if |
tab.stops |
integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line. |
substr2_ctl and substr2_sgr add the ability to retrieve substrings based on display width and byte width, in addition to the normal character width. substr2_ctl also provides the option to convert tabs to spaces with tabs_as_spaces prior to taking substrings.
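The three ways of measuring a string mentioned above can be compared with base R alone. This is a small illustration using nchar, not a fansi call; the display width reported by R may vary with R version and locale, so no value is assumed for it here.

```r
# Base R only: the same string measured in characters, bytes, and columns.
x <- "\u4E00\u4E01\u4E03"          # three CJK ideographs, stored as UTF-8

n_chars <- nchar(x, type = "chars")  # number of characters
n_bytes <- nchar(x, type = "bytes")  # number of bytes in the UTF-8 encoding
n_width <- nchar(x, type = "width")  # terminal columns (R version/locale dependent)

c(chars = n_chars, bytes = n_bytes, width = n_width)
```

Because each of these characters occupies several bytes and typically more than one display column, a start or stop value given in bytes or columns can land inside a character, which is the situation the round parameter addresses.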
Because exact substrings on anything other than character width cannot be guaranteed (e.g. as a result of multi-byte encodings, or double display-width characters), substr2_ctl must make assumptions on how to resolve provided start/stop values that are infeasible, and does so via the round parameter.
If we use "start" as the round value, then any time the start value corresponds to the middle of a multi-byte or a wide character, that character is included in the substring, while any character similarly cut by the stop value is left out. The converse is true if we use "stop" as the round value. "neither" causes all partial characters to be dropped irrespective of whether they correspond to start or stop, and "both" causes all of them to be included.
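The rounding rule can be sketched in base R as follows. round_selection is a hypothetical helper written for this illustration, not a fansi function and not fansi's implementation; it takes the display width of each character and returns the indices of the characters that would be kept for positions measured in display columns.

```r
# Hypothetical illustration of the rounding rule described above.
# `widths`: display width of each character; `first`/`last`: column positions.
round_selection <- function(widths, first, last,
                            round = c("start", "stop", "both", "neither")) {
  round <- match.arg(round)
  col_end <- cumsum(widths)             # last column occupied by each character
  col_start <- col_end - widths + 1L    # first column occupied by each character

  fully_inside <- col_start >= first & col_end <= last
  cut_by_first <- col_start < first & col_end >= first  # `first` lands mid-character
  cut_by_last <- col_start <= last & col_end > last     # `last` lands mid-character

  keep <- fully_inside |
    (cut_by_first & round %in% c("start", "both")) |
    (cut_by_last & round %in% c("stop", "both"))
  which(keep)
}

# Three characters of width 2; columns 2 and 3 both fall mid-character
w <- c(2L, 2L, 2L)
round_selection(w, 2, 3, "start")    # keeps character 1
round_selection(w, 2, 3, "stop")     # keeps character 2
round_selection(w, 2, 3, "both")     # keeps characters 1 and 2
round_selection(w, 2, 3, "neither")  # keeps nothing
```

The sketch assumes a character cut at both ends is kept whenever either applicable rule keeps it; the text above does not specify that case.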
These functions map string lengths accounting for ANSI CSI SGR sequence semantics to the naive length calculations, and then use the mapping in conjunction with base::substr() to extract the string. This concept is borrowed directly from Gábor Csárdi's crayon package, although the implementation of the calculation is different.
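The mapping idea can be sketched in base R. visible_substr is a hypothetical helper for a single string, not fansi's or crayon's code: it marks which raw characters belong to SGR sequences, maps visible positions to raw positions, and hands the raw positions to base::substr(). Unlike the fansi functions, it does not re-apply styles that were active before the start position.

```r
# Hypothetical single-string sketch of the position mapping; SGR sequences only.
visible_substr <- function(x, first, last) {
  sgr <- "\033\\[[0-9;]*m"                  # ESC [ parameters m
  m <- gregexpr(sgr, x, perl = TRUE)[[1]]
  is_ctl <- logical(nchar(x))
  if (m[1] != -1) {
    len <- attr(m, "match.length")
    for (i in seq_along(m)) {
      is_ctl[seq(m[i], length.out = len[i])] <- TRUE
    }
  }
  raw_pos <- which(!is_ctl)                 # raw_pos[k]: raw index of k-th visible char
  first <- max(first, 1L)
  last <- min(last, length(raw_pos))
  if (first > last) return("")
  base::substr(x, raw_pos[first], raw_pos[last])
}

x <- "\033[42mhello\033[m world"
res <- visible_substr(x, 3, 9)
# Removing the SGR sequences gives the same text as substr on the plain string
plain <- gsub("\033\\[[0-9;]*m", "", res, perl = TRUE)
plain == substr("hello world", 3, 9)
```

Here the result keeps the closing sequence that sits between the visible characters but loses the opening green background, which is the gap the style re-application described in this page is meant to fill.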
The *_ctl versions of the functions treat all Control Sequences specially by default. Special treatment is context dependent, and may include detecting them and/or computing their display/character width as zero. For the SGR subset of the ANSI CSI sequences, fansi will also parse, interpret, and reapply the text styles they encode if needed. You can modify whether a Control Sequence is treated specially with the ctl parameter. You can exclude a type of Control Sequence from special treatment by combining "all" with that type of sequence (e.g. ctl=c("all", "nl") for special treatment of all Control Sequences but newlines). The *_sgr versions only treat ANSI CSI SGR sequences specially, and are equivalent to the *_ctl versions with the ctl parameter set to "sgr".
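A short check of the equivalence stated above, assuming the fansi package is installed. Only functions and arguments shown in the usage on this page are used; no particular output is assumed beyond the two calls agreeing.

```r
library(fansi)

x <- "\033[31mhello\tworld"

# Per the text above, these two calls are expected to give the same result
via_sgr <- substr_sgr(x, 1, 6)
via_ctl <- substr_ctl(x, 1, 6, ctl = "sgr")
identical(via_sgr, via_ctl)

# Treat every Control Sequence specially except newlines
substr_ctl("\033[31mhello\nworld", 1, 6, ctl = c("all", "nl"))
```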
Non-ASCII strings are converted to and returned in UTF-8 encoding.
See fansi for details on how Control Sequences are interpreted, particularly if you are getting unexpected results.
substr_ctl("\033[42mhello\033[m world", 1, 9)
substr_ctl("\033[42mhello\033[m world", 3, 9)

## Width 2 and 3 are in the middle of an ideogram as
## start and stop positions respectively, so we control
## what we get with `round`

cn.string <- paste0("\033[42m", "\u4E00\u4E01\u4E03", "\033[m")

substr2_ctl(cn.string, 2, 3, type='width')
substr2_ctl(cn.string, 2, 3, type='width', round='both')
substr2_ctl(cn.string, 2, 3, type='width', round='start')
substr2_ctl(cn.string, 2, 3, type='width', round='stop')

## the _sgr variety only treat as special CSI SGR,
## compare the following:

substr_sgr("\033[31mhello\tworld", 1, 6)
substr_ctl("\033[31mhello\tworld", 1, 6)
substr_ctl("\033[31mhello\tworld", 1, 6, ctl=c('all', 'c0'))