The NumPy UCS-4 format is a fixed-length character string format in which
shorter strings are padded on the right with zeros. It is not a Zarr
codec but is specific to NumPy. It is included here because many Zarr v.2
stores have been written with this formatting of character strings. Zarr
v.2 does not use a codec for this format, which makes it an invalid
configuration in Zarr v.3 and in this package, where an array -> bytes step
is mandatory. This mock codec is included to embody that step. This
"codec" encodes an R character object to a raw byte string, and decodes a
raw byte string to an R character object. The character object, typically a
vector but possibly a matrix or array, should use UTF-8 encoding, which is
the standard on modern platforms running R.
This codec is not part of the Zarr v.3 core specification; it reproduces a
process commonly used in Zarr v.2 on Python to serialize character strings
into Zarr chunks. This implementation exists to enable reading the Zarr v.2
"<U*" data type. As a consequence, this codec can only decode data; new
data is not written in this format.
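The byte layout described above can be sketched in base R. This is an illustration only, assuming one 4-byte integer per Unicode code point and zero padding up to the fixed width; the helper names `ucs4_encode()` and `ucs4_decode()` are hypothetical and are not the functions this package uses internally.

```r
# Hypothetical helpers for illustration only; not the zarr package's internals.

# Encode: one 4-byte integer per code point, zero-padded on the right to `width`.
ucs4_encode <- function(x, width, endian = "little") {
  codes <- lapply(enc2utf8(x), utf8ToInt)
  stopifnot(all(lengths(codes) <= width))
  padded <- vapply(codes, function(cp) c(cp, integer(width - length(cp))),
                   integer(width))
  writeBin(as.integer(padded), raw(), size = 4L, endian = endian)
}

# Decode: read 4-byte integers, split into strings of `width` code points,
# and drop the zero padding (this sketch drops every zero, not only trailing ones).
ucs4_decode <- function(bytes, width, endian = "little") {
  codes <- readBin(bytes, what = "integer", n = length(bytes) %/% 4L,
                   size = 4L, endian = endian)
  codes <- matrix(codes, nrow = width)
  apply(codes, 2, function(cp) intToUtf8(cp[cp != 0L]))
}

x <- c("zarr", "caf\u00e9", "")
bytes <- ucs4_encode(x, width = 5L)
length(bytes)   # 3 strings * 5 code points * 4 bytes
decoded <- ucs4_decode(bytes, width = 5L)
```

Every string occupies the same number of bytes regardless of its length, which is what allows a chunk to be split back into elements using only the width.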
Super classes
zarr::zarr_extension -> zarr::zarr_codec -> zarr_codec_ucs4
Active bindings
endian (read-only) Retrieve the endianness of the storage of the data with this codec. A string with value of "big" or "little".
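As a small illustration of what the endianness means for this format, the base R snippet below writes a single code point as a 4-byte integer in both byte orders. It uses only `writeBin()` and is independent of this package.

```r
cp <- utf8ToInt("A")  # code point 65 (0x41)

# Least significant byte first
le <- writeBin(cp, raw(), size = 4L, endian = "little")
# Most significant byte first
be <- writeBin(cp, raw(), size = 4L, endian = "big")

print(le)  # 41 00 00 00
print(be)  # 00 00 00 41
```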
Methods
Inherited methods
Method new()
Create a new UCS-4 codec object.
Usage
zarr_codec_ucs4$new(chunk_shape, configuration)
Arguments
chunk_shape The shape of a chunk of data of the array, an integer vector.
configuration A list with the configuration parameters for this codec. This list is created by this package; it does not exist in the Zarr store because the NumPy UCS-4 method is not a real codec. The element endian specifies the byte ordering of the data type of the Zarr array, a string with value "big" or "little". The element width gives the fixed padded string width.
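The two configuration elements correspond to the parts of a NumPy data type string such as "<U12", where "<" or ">" marks little or big endian and the number is the string width in characters. The sketch below derives such a list in base R; the helper `ucs4_config_from_dtype()` is hypothetical, and storing the width as an integer is an assumption rather than something this page specifies.

```r
# Hypothetical helper, not part of the zarr package.
ucs4_config_from_dtype <- function(dtype) {
  m <- regmatches(dtype, regexec("^([<>])U([0-9]+)$", dtype))[[1]]
  if (length(m) != 3L) stop("not a fixed-width Unicode dtype: ", dtype)
  list(endian = if (m[2] == "<") "little" else "big",
       width  = as.integer(m[3]))  # integer width is an assumption
}

cfg <- ucs4_config_from_dtype("<U12")
str(cfg)

# With the package loaded, the constructor documented above would then be
# called along these lines (the chunk shape of 100 is made up):
# codec <- zarr_codec_ucs4$new(chunk_shape = 100L, configuration = cfg)
```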