Saved file formats

Three types of files can currently be exported: configuration files (.conf), NumPy files (.npz) and NetCDF files (.nc).

.conf file format (Providentia)

Users can download their configuration as a .conf file and use it to launch Providentia again. Configuration files can be passed through the command line in the dashboard, report, interpolation and download modes, or loaded with the load button.

Numeric file formats

The following tables summarize the variables that are available after exporting data to NumPy (.npz) and NetCDF (.nc) files.

As Providentia can load multiple networks and species simultaneously, some of these variables are provided per [network]-[species].

Numpy (.npz file format)

Variables

| Variable | Description |
| --- | --- |
| [network]-[species]_data | Values of the desired species for both observations and models |
| [network]-[species]_ghost_data | GHOST data variables used for additional filtering |
| [network]-[species]_metadata | Metadata of the observations, which varies per month, given as a multidimensional array |
| time | Time in the given resolution from the start date |
| data_labels | Labels associated with each data array, e.g. observations, model_1, etc. |
| ghost_data_variables | The names of the GHOST data variables used for additional filtering |
| resolution | Temporal resolution of the data |
| start_date | Start date of the data |
| end_date | End date of the data |
| temporal_colocation | Boolean stating if observations and models have been temporally colocated |
| spatial_colocation | Boolean stating if data has been spatially colocated across [network]-[species] |
| filter_species | Data ranges per species used to filter the read data |
| ghost_version | Version of GHOST |

Loading the data

A .npz file can be loaded in Python with:

In [1]: import numpy as np
In [2]: obs = np.load("/home/bsc32/bsc32099/PRV_sconco3_20160101_20160601.npz", allow_pickle=True)

The allow_pickle argument must be set to True; NumPy's default of False refuses to load arrays that contain Python objects.

To list the variables stored in the loaded .npz file, use the "files" attribute:

In [3]: obs.files                                                                                    
Out[3]: ['EBAS-sconco3_ghost_data', 'EBAS-sconco3_data', 'EBAS-sconco3_metadata'...]

Values for a data variable are returned by:

In [4]: data = obs['EBAS-sconco3_data']                                                                    
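The same access pattern can be tried without an exported file. The sketch below writes a tiny synthetic archive and reads it back; the variable names follow the table above, but the array shape, the values and the ordering of the first axis are assumptions made for illustration, not a description of what Providentia writes.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for an exported file. Names follow the variable table;
# shapes and values are illustrative assumptions only.
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(
    path,
    **{
        "EBAS-sconco3_data": np.arange(12, dtype=float).reshape(2, 2, 3),
        "data_labels": np.array(["observations", "model_1"]),
        "resolution": "hourly",
        "temporal_colocation": True,
    },
)

obs = np.load(path, allow_pickle=True)

data = obs["EBAS-sconco3_data"]
labels = [str(label) for label in obs["data_labels"]]

# Scalars saved into an .npz come back as 0-dimensional arrays;
# .item() converts them to plain Python values.
resolution = obs["resolution"].item()
temporally_colocated = obs["temporal_colocation"].item()

# Assumption: the first axis of the data array follows the order of data_labels.
observations = data[labels.index("observations")]

print(data.shape, labels, resolution, temporally_colocated)
```

Checking the shape of the data array against data_labels and time in a real export is a quick way to confirm which axis is which before indexing.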

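Metadata is stored differently from the other variables in the .npz files: each [network]-[species]_metadata variable is a structured array whose fields are the metadata variables. The metadata variable names can be returned by:

In [5]: metadata_vars = obs['EBAS-sconco3_metadata'].dtype.names

Any specific metadata field can be accessed by using one of the metadata variable names:

In [6]: latitude = obs['EBAS-sconco3_metadata']['latitude']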
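Field access on a structured array can be sketched with a synthetic archive. The field names, the (stations, months) shape and the values below are assumptions chosen for illustration; a real export will contain many more fields.

```python
import os
import tempfile

import numpy as np

# Synthetic structured array standing in for the exported metadata.
# Field names, shape (stations, months) and values are illustrative assumptions.
metadata_dtype = np.dtype(
    [("latitude", "f8"), ("longitude", "f8"), ("station_name", "U20")]
)
metadata = np.zeros((2, 3), dtype=metadata_dtype)
metadata["latitude"] = [[41.4], [52.5]]  # broadcast across the month axis
metadata["longitude"] = [[2.1], [13.4]]
metadata["station_name"] = [["station_a"], ["station_b"]]

path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(path, **{"EBAS-sconco3_metadata": metadata})

obs = np.load(path, allow_pickle=True)
meta = obs["EBAS-sconco3_metadata"]

# Names of the available metadata fields
metadata_vars = meta.dtype.names

# Indexing by field name returns a regular array with the same shape
latitude = meta["latitude"]

# One value per station, taken from the first month
station_latitude = latitude[:, 0]

print(metadata_vars, latitude.shape, station_latitude)
```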

NetCDF (.nc file format)

Variables

| Variable | Description |
| --- | --- |
| [network]-[species]_data | Values of the desired species for both observations and models |
| [network]-[species]_ghost_data | GHOST data variables used for additional filtering |
| [network]-[species]_[metadata_var] | Metadata of the observations, which varies per month, given per variable |
| [network]-[species]_qa | Quality assurance flags from the quality control checks performed by GHOST |
| [network]-[species]_flags | Data flags: standardised flags taken from the data provider |
| time | Time in the given resolution from the start date |
| data_labels | Labels associated with each data array, e.g. observations, model_1, etc. |
| ghost_data_variables | The names of the GHOST data variables used for additional filtering |

resolution, start_date, end_date, temporal_colocation, spatial_colocation, filter_species and ghost_version are stored as attributes of [network]-[species]_data.

Loading the data

These files can be read like any other NetCDF file, typically with the netCDF4 library:

from netCDF4 import Dataset 
dataset = Dataset("PRV_sconco3_20160101_20170101.nc")

Or xarray:

import xarray as xr 
dataset = xr.open_dataset("PRV_sconco3_20160101_20170101.nc")