Configuration fields

Configuration fields, set in a configuration file, determine how Providentia behaves during a run. Below is a full list of all available parameters, organised by mode.

These fields can also be set via command-line arguments; for more details, see the Command line configuration page.

Shared parameters

Some of these fields are required depending on the mode. If a parameter required by a given mode is missing, Providentia will fail.

Parameter	Required in	Description	Default
`network`, `observation`, `framework`	Dashboard, Report, Interpolation	Network(s) to load observations from. Multiple values are allowed (e.g. `CAPMoN, EBAS`). Wildcards (`*`) expand to multiple variables (e.g. `vconcaerobin*` → `vconcaerobin1`, `vconcaerobin2`, etc.). For GHOST networks, the selection is dictated by GHOST. For non-GHOST networks, available options are defined under the `nonghost_available_networks` key in `available_inputs.yaml` and can be modified by the user.	—
`model`, `models`, `experiments`, `experiment`	Dashboard, Report, Interpolation	Model(s). Model IDs can optionally include domain and/or ensemble information in the following formats: `modelID`, `modelID-ensemble`, `modelID-domain` or `modelID-domain-ensemble`. Also, `model`, `domain` and/or `ensemble` can be specified separately. Model IDs can also be mapped to alternative names (aliases) by appending them in parentheses after the ID (e.g. `mod1-dom-ens, mod2-dom-ens (altmod1, altmod2)`).	—
`species`	All modes	Species to load. Can be multiple (e.g. `sconco3, sconcno2`). Dictated by GHOST. See the Available Species page for options.	—
`resolution`	All modes	Temporal resolution of the observations to load (e.g. `hourly`, `daily`). For GHOST networks, the resolution is dictated by GHOST. For non-GHOST networks, available options are defined under the `nonghost_available_resolutions` key in `available_inputs.yaml` and can be modified by the user.	—
`start_date`	All modes	Comparison start date in `YYYYMMDD` format or `YYYYMM` when interpolation is enabled (e.g. `20170101`).	—
`end_date`	All modes	Comparison end date in `YYYYMMDD` format or `YYYYMM` when interpolation is enabled (e.g. `20180601`).	—
`ghost_version`	Optional	GHOST version used when a GHOST network is selected.	`1.5`
`ghost_features`	Optional	Level of GHOST features to utilise: `max`, `med` or `min`. `max` means all GHOST filter variables and metadata are read, `min` means no GHOST filter variables and very limited metadata are read, and `med` means GHOST native coverage filter variables are not read and a curated selection is read.	`med`
`domain`	Optional	Domain of the model (e.g. `regional`, `global`). When multiple model IDs and multiple ensembles/domains are provided, all possible combinations of model, domain and ensemble will be used. Options are defined under the `available_domains` key in `available_inputs.yaml` and can be modified by the user.	All available
`ensemble`	Optional	Ensemble member number (e.g. `000`, `001`), or ensemble statistic of the model (e.g. `av` for ensemble average, or `av_an` for ensemble analysis average). When multiple model IDs and multiple ensembles/domains are provided, all possible combinations of model, domain and ensemble will be used.	All available, except in interpolation mode where the default is `000`.
`forecast`	Optional	Controls how forecast data is handled: `day`, `daily`, `combined`, `dayN`, `dailyN`, `combinedN`. Multiple values of the same type can be provided (e.g. `day`, `day2`, `day3`), but different types cannot be mixed (i.e. `day` options cannot be combined with `daily` or `combined`). To limit to specific forecast days, append the day number to the option (e.g. `day1`, `daily2`, `combined3`). This variable must be set to a valid value when performing interpolation for forecast data.	—
`filter_species`	Optional	Filter the read species using the data of other species within a given range. The first value is the lower bound to filter by, and the second value the upper bound. Prefix each bound with a sign to indicate whether the filter is inclusive or exclusive of that bound (e.g. `>` or `>=`). If no sign is set, the bound is assumed to be inclusive (i.e. `>=` for the lower bound and `<=` for the upper bound). To leave the lower or upper bound unset, use `:` in its place. Optionally, a fill value can be given as a third value to set what the filtered data is replaced with; by default this is `NaN`. Multiple filters can be set together, separated by commas (e.g. `network1:species1 (>lowerlim, <=upperlim, fillvalue), network2:species2 (:, <upperlim)`).	—
`ghost_root`	Optional	Root directory for GHOST observations; overrides the path in `data_paths.yaml`.	From `data_paths.yaml`
`nonghost_root`	Optional	Root directory for non-GHOST observations; overrides the path in `data_paths.yaml`.	From `data_paths.yaml`
`mod_root`	Optional	Root directory for interpolated model data; overrides the path in `data_paths.yaml`.	From `data_paths.yaml`
`mod_to_interp_root`	Optional	Root directory for non-interpolated model data; overrides the path in `data_paths.yaml`.	From `data_paths.yaml`
`config_dir`	Optional	Path to all configuration files.	`configurations/`
`cartopy_data_dir`	Optional	Cartopy data directory.	On HPC: `/gpfs/projects/bsc32/software/rhel/9.2/software/Cartopy/0.23.0-foss-2023b-Python-3.11.5/lib/python3.11/site-packages/cartopy/data`. Locally: downloaded from the internet on the fly.
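
The `filter_species` syntax can be illustrated with a small parser. This is a minimal sketch, not Providentia's own parsing code: the names `parse_bound` and `parse_filter_species` are hypothetical, and the sketch assumes network and species names contain no spaces, colons, commas or parentheses. It only reads the syntax into a structure; it does not apply the filter to any data.

```python
import math
import re

# Hypothetical sketch of the filter_species syntax; not Providentia's own code.
# One entry looks like: network:species (lower, upper[, fill])
ENTRY_RE = re.compile(r"\s*([^:,()\s]+):([^:,()\s]+)\s*\(([^)]*)\)\s*")
# A bound is an optional comparison sign followed by a number.
BOUND_RE = re.compile(r"^(>=|<=|>|<)?\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)$")


def parse_bound(text, default_op):
    """Return (operator, value) for one bound, or None when it is ':'."""
    text = text.strip()
    if text == ":":
        return None
    match = BOUND_RE.match(text)
    if match is None:
        raise ValueError(f"invalid bound: {text!r}")
    # No sign means the bound is inclusive.
    return match.group(1) or default_op, float(match.group(2))


def parse_filter_species(value):
    """Parse a filter_species string into a list of filter dictionaries."""
    filters = []
    pos = 0
    while pos < len(value):
        match = ENTRY_RE.match(value, pos)
        if match is None:
            raise ValueError(f"cannot parse filter at: {value[pos:]!r}")
        network, species, args = match.groups()
        parts = [part.strip() for part in args.split(",")]
        if len(parts) not in (2, 3):
            raise ValueError("expected a lower bound, an upper bound and an optional fill value")
        filters.append({
            "network": network,
            "species": species,
            "lower": parse_bound(parts[0], ">="),
            "upper": parse_bound(parts[1], "<="),
            # Filtered data is set to NaN unless a fill value is given.
            "fill": float(parts[2]) if len(parts) == 3 else math.nan,
        })
        pos = match.end()
        # Entries are separated by commas outside the parentheses.
        if pos < len(value):
            if value[pos] != ",":
                raise ValueError(f"expected ',' at position {pos}")
            pos += 1
    return filters
```

For example, `parse_filter_species("EBAS:sconcno2 (>5, <=50, 0), EEA_AQ_eReporting:sconco3 (:, <100)")` yields two filters, the second with no lower bound and a `NaN` fill value.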

If multiple networks and multiple species are given but their numbers differ, Providentia will throw the error `Error: The number of "network" and "species" fields is not the same.`, and the user will need to list each network and species pair explicitly. For example, this would not be accepted:

network = EBAS, EEA_AQ_eReporting
species = sconco3, sconcno2, sconcso2

But this would:

network = EBAS, EBAS, EBAS, EEA_AQ_eReporting, EEA_AQ_eReporting, EEA_AQ_eReporting
species = sconco3, sconcno2, sconcso2, sconco3, sconcno2, sconcso2
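
The pairing rule above can be sketched as follows. The helpers `split_values` and `pair_networks_species` are hypothetical and are not Providentia's implementation. The sketch also assumes that when only one of the two fields has a single value, that value is combined with every entry of the other field; the text above implies this but does not state it.

```python
def split_values(value):
    """Split a comma-separated configuration value into stripped items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pair_networks_species(networks, species):
    """Return (network, species) pairs following the rule described above."""
    if len(networks) > 1 and len(species) > 1:
        if len(networks) != len(species):
            raise ValueError('Error: The number of "network" and "species" fields is not the same.')
        # Equal-length lists are matched position by position.
        return list(zip(networks, species))
    # Assumption: a single network (or species) is combined with every entry of the other list.
    return [(net, spec) for net in networks for spec in species]
```

With the accepted example, the six networks and six species produce six pairs, starting with `("EBAS", "sconco3")`; with the rejected example, the function raises the error quoted above.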

Parameters for analysis and visualisation modes (Dashboard, Report, Library)

In addition to the shared parameters, these are the fields used by all analysis and visualisation modes (Dashboard, Report, Library). All parameters in this section are optional.

Parameter	Description	Default
`statistic_mode`	Statistic mode: `Temporal|Spatial`, `Spatial|Temporal`, `Flattened`.	`Temporal|Spatial`
`statistic_aggregation`	Aggregation statistic: `Median`, `Mean`, `p1`, `p5`, `p10`, `p25`, `p75`, `p90`, `p95`, `p99`.	Depends on `statistic_mode`: `Median` if `Temporal|Spatial` or `Spatial|Temporal`; no aggregation if `Flattened`
`timeseries_statistic_aggregation`	Timeseries aggregation statistic: `Median`, `Mean`, `p1`, `p5`, `p10`, `p25`, `p75`, `p90`, `p95`, `p99`.	`Median`
`periodic_statistic_mode`	Periodic statistic mode: `Independent`, `Cycle`.	`Independent`
`periodic_statistic_aggregation`	Periodic aggregation statistic: `Median`, `Mean`, `p1`, `p5`, `p10`, `p25`, `p75`, `p90`, `p95`, `p99`.	`Median`
`temporal_colocation`	Boolean variable to set if you want to temporally colocate the observation and model data.	`True`
`temporal_colocation_active`	Boolean variable	`False`
`spatial_colocation`	Boolean variable to set if you want to spatially colocate the observation and model data across multiple species.	`True`
`spatial_colocation_tolerance`	Spatial colocation tolerance to match stations by `longitudes`/`latitudes` and/or `measurement_altitudes` (in metres)	`19.053`
`spatial_colocation_validation`	Boolean variable to validate spatial colocation intersections via position using `spatial_colocation_tolerance`	`True`
`spatial_colocation_validation_tolerance`	Tolerance (in metres) used to validate `station_reference`/`station_name` matches between stations by their longitude/latitude position	`10000.0`
`spatial_colocation_station_reference`	Boolean variable to indicate the use of `station_reference` variable for spatial colocation	`True`
`spatial_colocation_station_name`	Boolean variable to indicate the use of `station_name` variable for spatial colocation	`True`
`spatial_colocation_longitude_latitude`	Boolean variable to indicate the use of `longitude` and `latitude` variables for spatial colocation	`True`
`spatial_colocation_measurement_altitude`	Boolean variable to indicate the use of `measurement_altitude` variable for spatial colocation	`True`
`plot_characteristics_filename`	The path to the file containing the plot characteristics.	—
`observations_data_label`	Alias for observational data	`observations`
`lower_bound`	Filter out data lower than this limit. If multiple species are being read, this can either be one value, setting the same limit for all species, or one value per species (e.g. `3, 4, 5`).	—
`upper_bound`	Filter out data higher than this limit. If multiple species are being read, this can either be one value, setting the same limit for all species, or one value per species (e.g. `3, 4, 5`).	—
`map_extent`	Set the map plot extents with the syntax: minimum longitude, maximum longitude, minimum latitude, maximum latitude (e.g. `-30, 50, 20, 90`).	`[-180, 180, -90, 90]` in Dashboard, adapted to selected stations in Report and Library
`remove_extreme_stations`	Type of extreme stations removal, from the options given in `remove_extreme_stations.yaml`.	—
`resampling_resolution`	Resolution you want to resample your data to: `hourly`, `3hourly`, `6hourly`, `daily`, `monthly`, `annual`.	—
`multispecies_units`	Units of data in multispecies plots, given as a single string. For example, if the units of each species are `{'sconco3': 'ug m-3', 'sconcno2': 'ug m-3', 'sconcco': 'mg m-3', 'sconcso2': 'ug m-3'}`, choose one of `ug m-3` or `mg m-3` (e.g. `ug m-3`); the data of any species not already in the chosen units will be converted.	—
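
The conversion behind `multispecies_units` amounts to rescaling each species to the chosen unit. The sketch below is hypothetical (the names `FACTORS_TO_UG_M3` and `convert_to_common_units` are not Providentia's) and covers only the `ug m-3`/`mg m-3` pair from the example, using 1 mg = 1000 ug.

```python
# Hypothetical sketch of the multispecies_units conversion; not Providentia's code.
# Factors to ug m-3 for the two units in the example above (1 mg = 1000 ug).
FACTORS_TO_UG_M3 = {"ug m-3": 1.0, "mg m-3": 1000.0}


def convert_to_common_units(data, units, target):
    """Convert each species' values from its own units to the target units."""
    if target not in FACTORS_TO_UG_M3:
        raise ValueError(f"unsupported target units: {target!r}")
    converted = {}
    for species, values in data.items():
        factor = FACTORS_TO_UG_M3[units[species]] / FACTORS_TO_UG_M3[target]
        converted[species] = [value * factor for value in values]
    return converted
```

Choosing `ug m-3` leaves `sconco3`, `sconcno2` and `sconcso2` unchanged and multiplies the `sconcco` values by 1000.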

Dashboard parameters

This parameter is used only in the Dashboard mode. It is optional.

Parameter	Description	Default
`active_dashboard_plots`	Plots that will be active in the dashboard once it is launched (e.g. `timeseries, periodic-violin, scatter, distribution`).	`timeseries, statsummary, distribution, periodic`

Report parameters

These parameters are used only in the Report mode. All of them are optional.

Parameter	Description	Default
`report_type`	Type of report to generate that defines which plots the report will contain, from the options given in `report_plots.yaml`.	`standard`
`report_summary`	Boolean variable to set if you wish to make summary plots across the stations in each subsection.	`True`
`report_stations`	Boolean variable to set if you wish to make specific plots for each station in each subsection.	`False`
`report_title`	The title shown on the first page of the report PDF.	`Providentia Report`
`report_filename`	The filename of the report PDF, or the path at which to create it.	`PROVIDENTIA_Report`
`harmonise_stations`	Boolean variable to set if you wish to harmonise axis limits across stations in the stations report.	`True`
`harmonise_summary`	Boolean variable to set if you wish to harmonise axis limits across subsections in the summary report.	`True`

Interpolation parameters

These parameters are used only in the Interpolation mode. All of them are optional.

Parameter	Description	Default
`interp_n_neighbours`	Number of nearest neighbours used for interpolation	`4`
`interp_reverse_vertical_orientation`	Reverse vertical order of model levels	`False`
`interp_chunk_size`	Minimum number of jobs per interpolation chunk	`16`
`interp_job_array_limit`	Maximum number of chunks in the job array	`100`
`interp_multiprocessing`	Use multiprocessing instead of Greasy on HPC systems	`False`
`interp_spinup_timesteps`	Number of initial timesteps skipped for model spin-up	`0`
`interp_model_downsampling`	Statistic for the downsampling of the model resolution to the observational resolution: `mean`, `median`.	`mean`
`interp_model_upsampling`	Method for the upsampling of the model resolution to the observational resolution: `fill`, `gaps`. `fill` linearly fills between measurements, and `gaps` sets NaN values for times that the model does not have.	`fill`
`network_type`	Determines whether to use all GHOST or all non-GHOST networks when the `observation` field uses the `*` wildcard: `ghost`, `non-ghost`.	—
`model_resolution`	Model resolution if different from observations.	Same as `resolution`
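
The difference between the `interp_model_downsampling` statistics and the `interp_model_upsampling` methods can be sketched on an evenly spaced model series. The helpers `downsample` and `upsample` are hypothetical illustrations and not Providentia's implementation, which works with actual timestamps; `factor` is an assumed integer ratio between the two resolutions (e.g. 24 for hourly to daily, 3 for 3-hourly to hourly).

```python
import math
import statistics

# Hypothetical sketch of interp_model_downsampling / interp_model_upsampling on an
# evenly spaced model series; not Providentia's code, which works with timestamps.


def downsample(values, factor, method="mean"):
    """Aggregate blocks of `factor` model values (e.g. factor=24: hourly to daily)."""
    stat = {"mean": statistics.fmean, "median": statistics.median}[method]
    return [stat(values[i:i + factor]) for i in range(0, len(values), factor)]


def upsample(values, factor, method="fill"):
    """Insert factor - 1 new timesteps between consecutive model values."""
    result = []
    for current, following in zip(values, values[1:]):
        result.append(current)
        for step in range(1, factor):
            if method == "fill":
                # Linear fill between the two neighbouring model values.
                result.append(current + (following - current) * step / factor)
            else:
                # "gaps": times the model does not have are left as NaN.
                result.append(math.nan)
    if values:
        result.append(values[-1])
    return result
```

For instance, upsampling the 3-hourly values `[0.0, 3.0]` to hourly gives `[0.0, 1.0, 2.0, 3.0]` with `fill`, while `gaps` leaves the two intermediate hours as `NaN`.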

Download parameters

These parameters are used only in the Download mode. All of them are optional.

Parameter	Description	Default
`dl_overwrite`	Indicates whether previously downloaded files should be overwritten: `True`, `False`.	—
`dl_ghost_source`	Determines where GHOST observations are downloaded from: `bsc`, `zenodo`.	—
`dl_interpolated`	Specifies whether the interpolated versions of the model output should be downloaded: `True`, `False`.	—
`dl_mode`	Selects what to download when both observations and model output are present in the configuration file: `obs`, `mod`, `both`.	—
`dl_thredds_update`	Specifies whether the dataset information from THREDDS should be updated per species every time data is downloaded: `True`, `False`. The information is stored here.	—
`network_type`	Determines whether to use all GHOST or all non-GHOST networks when the observation field uses the `*` wildcard: `ghost`, `non-ghost`.	—
`dl_timeout`	Sets the timeout (in seconds) for downloads from HPC systems, covering interpolated and non-interpolated model data as well as GHOST and non-GHOST observations.	`180`
`model_resolution`	Model resolution if different from observations.	Same as `resolution`

Models

In Providentia, models can be set in different ways depending on how the model, domain and ensemble are defined.

1. Define model, domain and ensemble independently

You can specify each field separately:

model = cams61_monarch_ph3
domain = eu
ensemble = allmembers

You can also define only some of them:

model = cams61_monarch_ph3
domain = eu

model = cams61_monarch_ph3
ensemble = allmembers

Or only the model:

model = cams61_monarch_ph3
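
When `model`, `domain` and `ensemble` are defined independently with several values, the shared parameters table states that all possible combinations are used. A minimal sketch with `itertools.product`, assuming each combination is written in the `modelID-domain-ensemble` format; the function name `expand_models` and the model names in the usage note are hypothetical, and unset fields (which default to all available values) are not handled here.

```python
from itertools import product


def expand_models(models, domains, ensembles):
    """Return every model/domain/ensemble combination as modelID-domain-ensemble."""
    return [
        f"{model}-{domain}-{ensemble}"
        for model, domain, ensemble in product(models, domains, ensembles)
    ]
```

For example, two models, one domain and two ensemble members give four combinations: `expand_models(["modelA", "modelB"], ["eu"], ["000", "001"])`.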

2. Combine model and domain

The domain can be included directly in the model name:

model = cams61_monarch_ph3-eu
ensemble = allmembers

Or:

model = cams61_monarch_ph3-eu

3. Combine model and ensemble

model = cams61_monarch_ph3-allmembers
domain = eu

4. Combine model, domain and ensemble

model = cams61_monarch_ph3-eu-allmembers
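
The four forms above can be told apart by splitting on `-`. The sketch below is a hypothetical illustration, not Providentia's parser: it assumes model IDs contain no hyphens (the examples use underscores), and it assumes a two-part value such as `cams61_monarch_ph3-allmembers` is disambiguated by checking the suffix against a list of known domains, here a hypothetical `KNOWN_DOMAINS` set standing in for the `available_domains` key in `available_inputs.yaml`.

```python
# Hypothetical set of domain names; in Providentia the options come from the
# `available_domains` key in available_inputs.yaml.
KNOWN_DOMAINS = {"eu", "regional", "global"}


def split_model_id(text):
    """Split 'modelID[-domain][-ensemble]' into (model, domain, ensemble)."""
    # Assumes the model ID itself contains no hyphens.
    model, *rest = text.strip().split("-")
    if len(rest) > 2:
        raise ValueError(f"too many '-' separated parts in {text!r}")
    domain = ensemble = None
    if len(rest) == 2:
        domain, ensemble = rest
    elif len(rest) == 1:
        # A single suffix is read as a domain if it is a known one, otherwise as an ensemble.
        if rest[0] in KNOWN_DOMAINS:
            domain = rest[0]
        else:
            ensemble = rest[0]
    return model, domain, ensemble
```

Fields returned as `None` would then be taken from the separate `domain` and `ensemble` fields, or from their defaults.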

Aliases

Aliases can simplify long model names.

They work in two cases:

Combined model, domain and ensemble

model = cams61_monarch_ph3-eu-allmembers, cams_reanalysis_ensemble_validated-regional-000 (MONARCH, CAMS)

Independent fields with only one value each

model = cams61_monarch_ph3 (MONARCH)
domain = eu
ensemble = allmembers
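
Reading aliases means separating the trailing parenthesised list from the model IDs and pairing them in order. This is a minimal sketch with a hypothetical `parse_models_with_aliases` helper, not Providentia's own code; it assumes exactly one alias per model ID, as in the examples above, and maps each model to itself when no aliases are given.

```python
import re

# Model IDs, optionally followed by a parenthesised, comma-separated alias list.
ALIAS_RE = re.compile(r"^(?P<models>[^()]*?)\s*(?:\((?P<aliases>[^()]*)\))?\s*$")


def parse_models_with_aliases(value):
    """Return a dict mapping each model ID to its alias (or to itself)."""
    match = ALIAS_RE.match(value)
    if match is None:
        raise ValueError(f"cannot parse model field: {value!r}")
    models = [m.strip() for m in match.group("models").split(",") if m.strip()]
    aliases_text = match.group("aliases")
    if aliases_text is None:
        return {model: model for model in models}
    aliases = [a.strip() for a in aliases_text.split(",")]
    # Assumption: aliases are matched to model IDs one-to-one, in order.
    if len(aliases) != len(models):
        raise ValueError("expected one alias per model ID")
    return dict(zip(models, aliases))
```

Applied to the first example, this maps `cams61_monarch_ph3-eu-allmembers` to `MONARCH` and `cams_reanalysis_ensemble_validated-regional-000` to `CAMS`.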