The Zarr "vlen-utf8" codec encodes an R character object to a raw byte string, and decodes a raw byte string to an R character object. The character object, typically a vector but possibly a matrix or array, should use UTF-8 encoding, which is the standard on modern platforms running R.
This codec is not part of the Zarr v.3 core specification, but it is
commonly used to serialize character strings into Zarr chunks. It is defined
for Zarr v.2. This implementation enables the use of the Zarr v.3
registered "string" data type, as well as the Zarr v.2 "|O" data type.
The codec does not handle NA values. On encoding, NA values become
empty strings (""); on decoding, empty strings are preserved (not set to
NA). This behaviour is adopted from Python, making it the most
interoperable arrangement. If support for NA values is needed at the
application level, use a sentinel character string (like "NO_DATA")
which the application then sets to NA. This will not be interoperable
outside of the application ecosystem.
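The sentinel approach can be sketched in base R as follows. This is only an illustration of application-level handling, independent of the codec itself; the variable names and the choice of "NO_DATA" as sentinel are assumptions, and any string that cannot occur in the real data would do.

```r
# Hypothetical sentinel; it must never occur as a genuine value in the data.
sentinel <- "NO_DATA"

x <- c("alpha", NA, "", "gamma")

# Before writing: replace NA with the sentinel so it survives the round trip.
to_store <- ifelse(is.na(x), sentinel, x)

# After reading: map the sentinel back to NA. Genuine empty strings are kept.
restored <- to_store
restored[restored == sentinel] <- NA_character_
```

Because the sentinel is an ordinary string in the stored data, other readers of the Zarr array will see "NO_DATA" rather than a missing value.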
Super classes
zarr::zarr_extension -> zarr::zarr_codec -> zarr_codec_vlenutf8
Methods
Inherited methods
Method encode()
This method writes an R character object to a raw vector.
Prior to writing, any NA values are converted to an empty string.
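As a rough illustration of what such an encoding produces, the sketch below builds the byte layout used by the Python numcodecs "vlen-utf8" codec, as far as it is understood here: a 4-byte little-endian item count, followed for each string by a 4-byte little-endian byte length and the UTF-8 bytes. The function name `sketch_vlen_utf8_encode` is hypothetical and this is not the package's `encode()` method, which may differ in detail.

```r
# Illustrative helper, not the package's encode() method.
sketch_vlen_utf8_encode <- function(x) {
  x <- as.character(x)     # drops any dim attribute of a matrix or array
  x[is.na(x)] <- ""        # NA becomes the empty string
  x <- enc2utf8(x)

  # Write a count as 4 little-endian bytes (identical to unsigned below 2^31).
  le32 <- function(n) {
    writeBin(as.integer(n), raw(), size = 4L, endian = "little")
  }

  # Each item: 4-byte length prefix followed by its UTF-8 bytes.
  items <- lapply(x, function(s) {
    bytes <- charToRaw(s)
    c(le32(length(bytes)), bytes)
  })

  # Header with the number of items, then all items in order.
  c(le32(length(x)), unlist(items))
}

out <- sketch_vlen_utf8_encode(c("a", NA))
```

Here `out` holds 13 bytes: the count 2, then the length 1 with the byte 0x61 for "a", then the length 0 for the NA that was turned into an empty string.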