There are several ways to concatenate NetCDF files into a time series:

Way1 - CDO:

cdo cat b.e13.Bi1850C5.f19_g16.TS.0001-0999.ANN.nc b.e13.Bi1850C5.f19_g16.TS.1000-1999.ANN.nc b.e13.Bi1850C5.f19_g16.TS.2000-2999.ANN.nc ... b.e13.Bi1850C5.f19_g16.TS.8000-8999.ANN.nc TS.test.cdo.merge.nc

Way2 - pycdo:

To install this package with conda run one of the following:

conda install -c conda-forge python-cdo 
conda install -c conda-forge/label/gcc7 python-cdo 
conda install -c conda-forge/label/cf201901 python-cdo 
conda install -c conda-forge/label/cf202003 python-cdo 

code:

from cdo import Cdo
cdo = Cdo()
...
# fname is a list of input file paths; fout_name is the output file path
cdo.cat(input=fname, output=fout_name)

Unfortunately, pycdo does not work for variables such as MOC in POP output.
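
The snippets in this post assume that fname already holds the input files in chronological order. A minimal sketch of one way to build such a list is given below. The helper name list_segments and the "*.ANN.nc" pattern are assumptions for illustration, and the sketch relies on the year ranges in the file names being zero-padded so that sorting by name is also sorting by time.

```python
import glob
import os
import tempfile


def list_segments(directory, pattern="*.ANN.nc"):
    """Return the paths in `directory` that match `pattern`, sorted by name."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo with empty placeholder files in a temporary directory (not real NetCDF).
with tempfile.TemporaryDirectory() as tmp:
    for years in ("1000-1999", "0001-0999", "2000-2999"):
        placeholder = os.path.join(tmp, f"b.e13.Bi1850C5.f19_g16.TS.{years}.ANN.nc")
        open(placeholder, "w").close()
    fname = list_segments(tmp)
    names = [os.path.basename(p) for p in fname]

print(names)
```

If the year ranges were not zero-padded, a plain name sort would no longer be chronological and a custom sort key would be needed.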

Way3 - NCO:

ncrcat -h b.e13.Bi1850C5.f19_g16.TS.0001-0999.ANN.nc b.e13.Bi1850C5.f19_g16.TS.1000-1999.ANN.nc b.e13.Bi1850C5.f19_g16.TS.2000-2999.ANN.nc ... b.e13.Bi1850C5.f19_g16.TS.8000-8999.ANN.nc TS.test.cdo.merge.nc

Way4 - pynco:

from nco import Nco
nco = Nco()
...
# fname is a list of input file paths; fout_name is the output file path
nco.ncrcat(input=fname, output=fout_name)

Unfortunately, pynco does not work for variables such as MOC in POP output either.

Way5 - xarray:

open_mfdataset opens multiple files as a single dataset.

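    import numpy as np
    import xarray as xr
    ...
    # Alternative time axis; its length must match the merged 'time' dimension.
    time = np.arange(20000-5, 11000-5, -10)
    # combine='nested' concatenates the files in the order given in fname.
    # Recent xarray versions reject concat_dim together with combine='by_coords'.
    merged_ds = xr.open_mfdataset(fname, combine='nested', concat_dim='time')
    merged_ds["time_bp"] = (['time'], time)
    print(merged_ds)

    # Store TS as float32 in the output file.
    merged_ds.to_netcdf(path=fout_name, encoding={'TS': {'dtype': 'float32'}})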

If all the data have already been loaded into the workspace from the input file list, how do we concatenate them into a single array?
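
One possible answer, as a minimal sketch: keep each loaded chunk as an xarray DataArray and join the chunks with xr.concat along the time dimension. The synthetic chunks, the helper make_chunk, and the grid sizes below are assumptions for illustration only; they stand in for data read from the real files.

```python
import numpy as np
import xarray as xr


def make_chunk(start_year, n_years):
    """Build a small synthetic (time, lat, lon) chunk named TS."""
    time = np.arange(start_year, start_year + n_years)
    data = np.random.default_rng(start_year).normal(size=(n_years, 3, 4))
    return xr.DataArray(
        data,
        dims=("time", "lat", "lon"),
        coords={
            "time": time,
            "lat": [-30.0, 0.0, 30.0],
            "lon": [0.0, 90.0, 180.0, 270.0],
        },
        name="TS",
    )


# Two chunks that are already in memory, in chronological order.
chunks = [make_chunk(1, 5), make_chunk(6, 5)]

# Join the chunks end to end along the existing "time" dimension.
merged = xr.concat(chunks, dim="time")
print(merged.sizes)
```

The chunks are joined in the order in which they appear in the list, so the list should be sorted by time first.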

Reposted from a blog:
From xarray to NetCDF processing: merging and computation
Reference: http://www.meteoai.cn/post/%E4%BB%8Exarray%E8%B5%B0%E5%90%91netcdf%E5%A4%84%E7%90%86%E5%90%88%E5%B9%B6%E4%B8%8E%E8%AE%A1%E7%AE%97/

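There are two main forms of data merging:

Concatenation along a dimension: for example, combining daily data into a yearly dataset is a merge along the time dimension.

Merging of variables: for example, putting several physical quantities into the same Dataset.

For these two forms of merging, xarray offers four approaches: concatenate, merge, combine, and update. Only the most commonly used ones are covered here.

Concatenating along a dimension

The concat() method concatenates along a dimension.

The demo data below are ERA-Interim monthly means for 2018 and for the first three months of 2019.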

>>> ds2018
<xarray.Dataset>
Dimensions:    (latitude: 241, longitude: 480, time: 12)
Coordinates:
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 ... 357.75 358.5 359.25
  * latitude   (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0
  * time       (time) datetime64[ns] 2018-01-01 2018-02-01 ... 2018-12-01
Data variables:
    u10        (time, latitude, longitude) float32 ...
    v10        (time, latitude, longitude) float32 ...
    t2m        (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6

>>> ds2019
<xarray.Dataset>
Dimensions:    (latitude: 241, longitude: 480, time: 3)
Coordinates:
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 ... 357.75 358.5 359.25
  * latitude   (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0
  * time       (time) datetime64[ns] 2019-01-01 2019-02-01 2019-03-01
Data variables:
    u10        (time, latitude, longitude) float32 ...
    v10        (time, latitude, longitude) float32 ...
    t2m        (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6

ds2018 has a time dimension of length 12 and ds2019 has one of length 3. After merging with concat(), the time dimension has length 15.

>>> xr.concat([ds2018, ds2019], dim='time')
<xarray.Dataset>
Dimensions:    (latitude: 241, longitude: 480, time: 15)
Coordinates:
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 ... 357.75 358.5 359.25
  * latitude   (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0
  * time       (time) datetime64[ns] 2018-01-01 2018-02-01 ... 2019-03-01
Data variables:
    u10        (time, latitude, longitude) float32 -0.9599868 ... 4.5229325
    v10        (time, latitude, longitude) float32 3.1737509 ... -2.289166
    t2m        (time, latitude, longitude) float32 248.46857 ... 225.19632
Attributes:
    Conventions:  CF-1.6

Merging variables

The merge() method can combine u10 from ds2018 with t2m from ds2019. Values that are missing along the time dimension are automatically set to NaN.

>>> xr.merge([ds2018.u10, ds2019.t2m])
<xarray.Dataset>
Dimensions:    (latitude: 241, longitude: 480, time: 15)
Coordinates:
  * time       (time) datetime64[ns] 2018-01-01 2018-02-01 ... 2019-03-01
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 ... 357.75 358.5 359.25
  * latitude   (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0
Data variables:
    u10        (time, latitude, longitude) float32 -0.9599868 -0.9599868 ... nan
    t2m        (time, latitude, longitude) float32 nan nan ... 225.19632
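
The NaN filling can be reproduced on a small scale. The sketch below uses two synthetic one-dimensional series whose time ranges only partly overlap; the values and dates are made up for illustration. The join="outer" argument is passed explicitly so that the union of the two time axes is used.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two monthly series with partly overlapping time ranges (synthetic values).
t_a = pd.date_range("2018-01-01", periods=3, freq="MS")  # Jan to Mar
t_b = pd.date_range("2018-03-01", periods=3, freq="MS")  # Mar to May

u10 = xr.DataArray(np.ones(3), dims=("time",), coords={"time": t_a}, name="u10")
t2m = xr.DataArray(np.full(3, 280.0), dims=("time",), coords={"time": t_b}, name="t2m")

# An outer join keeps every time step found in either input;
# steps missing from one variable are filled with NaN.
merged = xr.merge([u10, t2m], join="outer")
print(merged)
```

Here u10 has no values for April and May, and t2m has none for January and February, so each variable ends up with two NaN entries.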

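Computation

The most basic computations are addition, subtraction, multiplication, and division. Any DataArray or Dataset can be used directly in these arithmetic operations.

Beyond that, xarray can quickly compute the mean, variance, minimum, maximum, and so on. You can specify which dimension to reduce over; if no dimension is given, the reduction runs over all dimensions.

For example, averaging over the longitude and latitude dimensions leaves only the 12 values along the time dimension.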
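
A minimal sketch of the arithmetic and the spatial average described above, using a small synthetic t2m field (the 3 x 3 grid and the constant value are assumptions for illustration). Note that this is a plain, unweighted mean; it does not account for grid-cell area.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly 2 m temperature in kelvin on a tiny grid.
time = pd.date_range("2018-01-01", periods=12, freq="MS")
t2m = xr.DataArray(
    np.full((12, 3, 3), 283.15),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": time,
        "latitude": [-45.0, 0.0, 45.0],
        "longitude": [0.0, 120.0, 240.0],
    },
    name="t2m",
)

# Element-wise arithmetic: kelvin to degrees Celsius.
t2m_c = t2m - 273.15

# Average over both horizontal dimensions; only "time" remains.
spatial_mean = t2m_c.mean(dim=["latitude", "longitude"])
print(spatial_mean.sizes)
```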

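xarray also has many convenient features for computation along the time dimension, such as converting monthly data to annual data or to seasonal data.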

>>> temp = (ds2018['t2m'] - 273.15).groupby('time.season')

>>> ds2018.std(dim='time')
<xarray.Dataset>
Dimensions:    (latitude: 241, longitude: 480)
Coordinates:
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 ... 357.75 358.5 359.25
  * latitude   (latitude) float32 90.0 89.25 88.5 87.75 ... -88.5 -89.25 -90.0
Data variables:
    u10        (latitude, longitude) float32 1.9192954 1.9192954 ... 1.2133
    v10        (latitude, longitude) float32 1.3066719 1.3066719 ... 1.577495
    t2m        (latitude, longitude) float32 9.5681305 9.5681305 ... 11.313364
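
A groupby() call on its own only returns a grouped object; a reduction such as mean() is still needed to get seasonal or annual values. The sketch below shows this on a synthetic monthly series whose value is simply the month number, which is an assumption made so that the results are easy to check by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic monthly data; each value equals its month number.
time = pd.date_range("2018-01-01", periods=24, freq="MS")
t2m = xr.DataArray(
    time.month.values.astype(float),
    dims=("time",),
    coords={"time": time},
    name="t2m",
)

# Monthly to seasonal: one value per season (DJF, MAM, JJA, SON).
seasonal = t2m.groupby("time.season").mean()

# Monthly to annual: one value per calendar year.
annual = t2m.groupby("time.year").mean()

print(seasonal)
print(annual)
```

Grouping by "time.season" pools the same season from all years, so this gives a seasonal climatology rather than one value per season per year.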

Add a variable to a NetCDF dataset

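import pandas as pd
import xarray as xr

# lon and lat are 2-D arrays with dims (x, y), defined elsewhere
data_set = xr.Dataset(coords={'lon': (['x', 'y'], lon),
                              'lat': (['x', 'y'], lat),
                              'time': pd.date_range('2021-01-01', periods=3)})
temp = ...  # array with shape (x, y, time)
data_set["Temperature"] = (['x', 'y', 'time'], temp)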
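
For a version that runs end to end, the sketch below fills in the undefined inputs with synthetic arrays. The 4 x 3 grid, the coordinate ranges, and the random temperature values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 2-D longitude/latitude fields on a 4 x 3 (x, y) grid.
lon, lat = np.meshgrid(
    np.linspace(100.0, 110.0, 4),
    np.linspace(20.0, 25.0, 3),
    indexing="ij",
)

data_set = xr.Dataset(
    coords={
        "lon": (["x", "y"], lon),
        "lat": (["x", "y"], lat),
        "time": pd.date_range("2021-01-01", periods=3),
    }
)

# Synthetic temperature with shape (x, y, time) = (4, 3, 3).
temp = 15.0 + 8.0 * np.random.default_rng(0).standard_normal((4, 3, 3))

# Assigning a (dims, data) tuple adds a new data variable to the Dataset.
data_set["Temperature"] = (["x", "y", "time"], temp)
print(data_set)
```

The resulting Dataset can then be written to disk with to_netcdf(), as in the xarray example earlier in this post.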

Last update: 01/07/2021