Creating a new CF data set
Creating a new CF data set is as easy as
my_ds <- create_ncdf(). This will create an empty data
set (duh!) but it will have a root group and some basic attributes:
library(ncdfCF)
(my_ds <- create_ncdf())
#> <Dataset> CF_dataset
#> Resource : (virtual)
#> Format : netcdf4
#> Collection : Generic netCDF data
#> Conventions: CF-1.13
#> Has groups : FALSE
#>
#> Attributes:
#> name type length value
#> Conventions NC_CHAR 7 CF-1.13
#> history NC_CHAR 62 Created with R package ncdfCF 0.8.0 on 2026-01-...
The data set is created in memory (Resource : (virtual))
and you can save it to disk when you are done adding data variables to
it with
my_ds$save(fn = "~/path/to/netcdf/data/my_data.nc"). If you
prefer, you can also provide the file name to
create_ncdf(fn = "~/path/to/netcdf/data/my_data.nc") and
then save your edits with my_ds$save(). This alternative
pattern is easier and less error-prone when you add many objects to the
data set and want to do intermediate saving of your work to the netCDF
file.
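The file-backed pattern described above might look like the following sketch. It uses only calls that appear in this article (create_ncdf(fn = ), as_CF(), add_variable(), save()); the temporary file path, the object name file_ds and the variable name "demo" are placeholders chosen for illustration.

```r
library(ncdfCF)

# Hypothetical target file; tempfile() keeps the sketch away from real data
fn <- tempfile(fileext = ".nc")

# Bind the data set to its file when it is created
file_ds <- create_ncdf(fn = fn)

# Add a small made-up variable, then save without repeating the file name
file_ds$add_variable(as_CF("demo", array(1:24, dim = c(4, 3, 2))))
file_ds$save()
```

Because the file name is stored with the data set, every later call to file_ds$save() writes to the same file, which is what makes intermediate saving convenient.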
Creating a new CF data variable
There are different ways that you can create a CFVariable:
- Convert a suitable R object such as an array or a matrix.
- Process data from existing netCDF resources into a new CFVariable.
Vector, matrix, array
Any vector, matrix or array of a suitable type (numeric, integer,
logical, character) can be converted to a CFVariable with
the as_CF() generic S3 method.
arr <- array(rnorm(120), dim = c(6, 5, 4))
as_CF("my_first_CF_object", arr)
#> <Variable> my_first_CF_object
#>
#> Values: [-2.612334 ... 2.755418]
#> NA: 0 (0.0%)
#>
#> Axes:
#> name length values
#> axis_1 6 [1 ... 6]
#> axis_2 5 [1 ... 5]
#> axis_3 4 [1 ... 4]
#>
#> Attributes:
#> name type length value
#> actual_range NC_DOUBLE 2 -2.612334, 2.755418
Usable but not very impressive. The axes have dull names without any meaning and the coordinates are just a sequence along the axis.
If the R object has dimnames set, these will be used to
create more informative axes. More interestingly, if your array represents
some spatial data you can give your dimnames appropriate
names (“lat”, “lon”, “latitude”, “longitude”, case-insensitive) and the
corresponding axis type will be created (provided the coordinate values in the
dimnames are within the domain of the axis type).
“Time” coordinates are detected automatically, irrespective of the
name.
# Note the use of named dimnames here - these will become the names of the axes
dimnames(arr) <- list(lat = c(45, 44, 43, 42, 41, 40), lon = c(0, 1, 2, 3, 4),
time = c("2025-07-01", "2025-07-02", "2025-07-03", "2025-07-04"))
(obj <- as_CF("a_better_CF_object", arr))
#> <Variable> a_better_CF_object
#>
#> Values: [-2.612334 ... 2.755418]
#> NA: 0 (0.0%)
#>
#> Axes:
#> axis name length values unit
#> Y lat 6 [45 ... 40] degrees_north
#> X lon 5 [0 ... 4] degrees_east
#> T time 4 [2025-07-01 ... 2025-07-04] days since 1970-01-01T00:00:00
#>
#> Attributes:
#> name type length value
#> actual_range NC_DOUBLE 2 -2.612334, 2.755418
# Axes are of a specific type and have basic attributes set
obj$axes[["lat"]]
#> <Latitude axis> [-8] lat
#> Length : 6
#> Axis : Y
#> Coordinates: 45, 44, 43, 42, 41, 40 (degrees_north)
#> Bounds : (not set)
#>
#> Attributes:
#> name type length value
#> actual_range NC_DOUBLE 2 40, 45
#> axis NC_CHAR 1 Y
#> standard_name NC_CHAR 8 latitude
#> units NC_CHAR 13 degrees_north
obj$axes[["time"]]
#> <Time axis> [-10] time
#> Length : 4
#> Axis : T
#> Calendar : standard
#> Range : 2025-07-01 ... 2025-07-04 (days)
#> Bounds : (not set)
#>
#> Attributes:
#> name type length value
#> actual_range NC_DOUBLE 2 20270, 20273
#> axis NC_CHAR 1 T
#> standard_name NC_CHAR 4 time
#> units NC_CHAR 30 days since 1970-01-01T00:00:00
#> calendar NC_CHAR 8 standard
You can use the as_CF() generic method also to convert a
terra::SpatRaster into a CFVariable. Keep in
mind, though, that terra has limited support for
multi-dimensional data and very limited support specifically for
vertical dimensions (depth in terra) and
calendars other than standard or
proleptic_gregorian. You are therefore advised to carefully
review the properties of the data variable derived from a
terra::SpatRaster.
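A review of that kind could be sketched as follows. This assumes that the terra package is installed and that as_CF() takes the name first and the SpatRaster second, as in the array examples above; the raster dimensions, extent and values are made up for illustration.

```r
library(ncdfCF)
library(terra)

# A small illustrative raster: 4 x 5 cells, 3 layers, random values
r <- rast(nrows = 4, ncols = 5, nlyrs = 3,
          xmin = 0, xmax = 5, ymin = 40, ymax = 44)
values(r) <- rnorm(ncell(r) * nlyr(r))

# Convert, then inspect the result before relying on it
v <- as_CF("from_terra", r)
v        # value range, axes and attributes of the new variable
v$axes   # each axis in detail: type, coordinates, units
```

Checking the printed axes against what you know about the source raster (axis type, units, and how the layers were interpreted) is the quickest way to spot properties that did not carry over as intended.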
Processing data
The ncdfCF package supports processing of data through
extraction of subsets or profiles of the data in the netCDF resource, or
summarising data over the “time” axis of a data variable. All of these
operations return a CFVariable instance to the caller. You
can also perform arithmetical and mathematical operations on a
CFVariable, which likewise return a new CFVariable
to the caller.
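As a minimal sketch of the arithmetic case, the snippet below works on an in-memory variable rather than a netCDF resource. The variable name "tas" and the temperature values are made up for illustration; only as_CF(), the arithmetic operator and set_attribute() are taken from this article.

```r
library(ncdfCF)

# Illustrative temperatures in kelvin on a small 3 x 2 grid
kelvin <- as_CF("tas", matrix(c(273.15, 283.15, 293.15,
                                278.15, 288.15, 298.15), nrow = 3))

# Arithmetic on a CFVariable returns a new CFVariable
celsius <- kelvin - 273.15

# Set the units attribute to match the converted values
celsius$set_attribute("units", "NC_CHAR", "degC")
celsius
```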
# Open an existing netCDF resource for reading
fn <- system.file("extdata", "pr_day_EC-Earth3-CC_ssp245_r1i1p1f1_gr_20230101-20231231_vncdfCF.nc", package = "ncdfCF")
ds <- open_ncdf(fn)
ds$var_names
#> [1] "pr"
# The precipitation data variable is linked to the existing dataset but after an
# arithmetical operation, such as unit conversion to mm/day, it is no longer
# linked.
pr <- ds[["pr"]] * 86400
pr$set_attribute("units", "NC_CHAR", "mm")
pr$group
#> <CF Group> [-13] / (virtual)
#> Path : /
# Can operate on "virtual" CFVariables. Summarise the daily precipitation data
# to monthly totals (using sum): the result is a new CFVariable in a new group.
pr_mon <- pr$summarise("pr_month", sum, "month")
pr_mon$group
#> <CF Group> [-14] / (virtual)
#> Path : /
# Add pr_mon to the new data set
my_ds$add_variable(pr_mon)
my_ds$variables()
#> $pr_month
#> <Variable> pr_month
#>
#> Values: [8.9e-05 ... 513.1605] mm
#> NA: 0 (0.0%)
#>
#> Axes:
#> axis name long_name length values
#> T time 12 [2023-01-16T12:00:00 ... 2023-12-16T12:00:00]
#> X lon Longitude 14 [5.625 ... 14.765625]
#> Y lat Latitude 14 [40.35078 ... 49.47356]
#> unit
#> days since 1850-01-01
#> degrees_east
#> degrees_north
#>
#> Attributes:
#> name type length value
#> actual_range NC_DOUBLE 2 8.9e-05, 513.160452
#> units NC_CHAR 2 mm
Managing your new CFVariables in a single CFDataset
If you have only one or a few related data variables, then adding them all to the root group of the data set is the easiest solution. The keyword here is “related”: the data variables have the same axes (name, type, length, values). If, on the other hand, you want to place many disparate data variables in a single netCDF file, you should consider organising the netCDF file such that “like” data variables are placed together in a group, while other data variables are placed in different groups.
A CF data set can have multiple groups, organised in a hierarchy to
match the characteristics of the data variables. A
CFDataset has a root group named “/” and a
subgroup can be added with the
create_subgroup("new_subgroup") method. That returns the
new subgroup, to which further subgroups can be added, and so on. As
with a CFDataset, CFVariable objects can be
added to any group. That provides for a very flexible arrangement.
complex_ds <- create_ncdf()
subgroup <- complex_ds$root$create_subgroup("sub1")
subsubgroup1 <- subgroup$create_subgroup("subsub1")
subsubgroup2 <- subgroup$create_subgroup("subsub2")
subsubgroup1$add_variable(pr_mon)
complex_ds$hierarchy()
#> <NetCDF objects> CF_dataset
#> * /
#> * sub1
#> * subsub1
#> | Axes : [time (12): 2023-01-16T12:00:00 ... 2023-12-16T12:00:00], [lon (14): 5.625 ... 14.765625], [lat (14): 40.35078 ... 49.47356]
#> | Variables: [-17: pr_month]
#> * subsub2
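The idea of keeping “like” data variables together can be sketched with two unrelated variables placed in separate groups. The group names "observations" and "model", the variable names and the random values are all hypothetical; the calls themselves (create_ncdf(), create_subgroup(), add_variable(), as_CF(), hierarchy()) are the ones used above.

```r
library(ncdfCF)

grouped_ds <- create_ncdf()

# One group per family of variables with matching axes
obs <- grouped_ds$root$create_subgroup("observations")
mdl <- grouped_ds$root$create_subgroup("model")

# Two made-up variables whose axes differ in number and length
obs$add_variable(as_CF("station_temp", array(rnorm(20), dim = c(5, 4))))
mdl$add_variable(as_CF("grid_temp", array(rnorm(120), dim = c(6, 5, 4))))

# Each group should list its own axes and variables
grouped_ds$hierarchy()
```

Placing each variable in its own group keeps its axes alongside it, so axes that happen to share a default name in the two variables do not have to coexist in the root group.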