Overview

Providentia’s download mode retrieves modelled and observational data from BSC systems and external sources (Zenodo, ACTRIS, CAMS and ERA5) for local use.

Getting started

To start a download, pass --download (or its short form --dl) on the command line together with the mandatory configuration file:

./bin/providentia --config=/path/to/file/example.conf --download
./bin/providentia --config=/path/to/file/example.conf --dl

Providentia reads the configuration file to determine which data to download and saves it in the directories specified for local in settings/data_paths.yaml.

By default, download mode fetches the content specified in every section of your configuration file. To download only one section, add the --section parameter to the command.

Types of downloads

Providentia supports five types of downloads. For detailed instructions, please visit the respective pages:

  1. Download from BSC HPC Machines

    • Downloads GHOST and non-GHOST data and model outputs from BSC HPC machines. You must have a BSC account to access this feature.

    • How to get this type of download:

      • For GHOST networks, answer y to the prompt:
        Do you want to download observational data from the BSC remote machine? (Otherwise, GHOST observational data will be retrieved from Zenodo)
        or set dl_ghost_source to bsc.

      • For non-GHOST networks and interpolated/non-interpolated model data, no special action is required.

    • To see more information, check the BSC download page.

  2. Download of GHOST network data from Zenodo

    • Downloads observational data from GHOST networks from the GHOST Zenodo webpage.

    • How to get this type of download: answer n to the HPC prompt: Do you want to download observational data from the BSC remote machine? (Otherwise, GHOST observational data will be retrieved from Zenodo) or set dl_ghost_source to zenodo.

    • To see more information, check the Zenodo download page.

  3. Download of ACTRIS network data from Thredds

    • Downloads observational data from ACTRIS through ACTRIS Thredds.

    • How to get this type of download: set the network field to actris/actris in your configuration file.

    • To see more information, check the ACTRIS download page.

  4. Download of CAMS non-interpolated model data from the Atmosphere Data Store (ADS)

    • Downloads CAMS model output from the Atmosphere Data Store. You must have an ECMWF account to access this feature.

    • How to get this type of download: specify the model as cams_analysis, cams_forecast or cams_reanalysis in your configuration, and set dl_interpolated to False.

    • To see more information, check the CAMS download page.

  5. Download of ERA5 non-interpolated model data from the Climate Data Store (CDS) or the Simulation and Data Laboratory ‘Climate Science’ (SDL)

    • Downloads ERA5 model output from the Climate Data Store. An ECMWF account is required to access this feature. Alternatively, data can be downloaded from the Simulation and Data Laboratory ‘Climate Science’, which does not require an account.

    • How to get this type of download: specify the model as era5_reanalysis or era5_tropopause in your configuration, and set dl_interpolated to False.

    • To see more information, check the ERA5 download page.
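
As an illustration of how configuration fields select the download type, the sketch below requests non-interpolated CAMS model output, which the list above says is retrieved from the ADS. The section name is hypothetical, and the network, species, resolution and dates are placeholders borrowed from the other examples on this page, so this particular combination is not guaranteed to exist. The # lines are annotations and assume that comment lines are accepted in the configuration file.

```ini
# Hypothetical section name; any label can be used.
[CAMS_NONINTERPOLATED_EXAMPLE]
# Placeholder values reused from the other examples on this page.
network = EBAS
species = sconco3
resolution = hourly
start_date = 20180101
end_date = 20180601
# A CAMS model together with dl_interpolated = False selects download type 4.
model = cams_reanalysis
dl_interpolated = False
```

Swapping the model for era5_reanalysis or era5_tropopause would, following the same pattern, select the ERA5 download instead.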

Download configuration fields

All parameters that can be used in the download configuration files can be found in the Shared Parameters or Download Parameters sections of the Configuration Fields page.

Automation of the download

The questions asked during a download can be skipped by setting the corresponding variables in advance, which allows downloads to run fully automated, without any user input.

Each of these variables corresponds directly to one of the questions asked during a manual download.

| Variable | Original question | Expected values |
|---|---|---|
| dl_overwrite | There are some files that were already downloaded in a previous download, do you want to overwrite them ([y]/n)? | True (overwrite existing files) or False (keep existing files) |
| dl_ghost_source | Do you want to download observational data from the BSC remote machine? (Otherwise, GHOST observational data will be retrieved from Zenodo) ([y]/n) | bsc (download from BSC remote machine) or zenodo (retrieve from Zenodo) |
| dl_interpolated | Model data was detected in the configuration file. Do you want to download the interpolated version? (Otherwise, the non-interpolated model data will be downloaded) ([y]/n) | True (download interpolated) or False (download non-interpolated) |
| dl_mode | Which type of data do you want to download? Observational, modelled or both? ([both]/obs/mod) | obs (download observations), mod (download models) or both (download both) |
| dl_thredds_update | File containing information of the files available in Thredds for {actris_parameter} ({info_path}) already exists. Do you want to update it (y/[n])? | True or False |
| network_type | Do you want to download all the GHOST networks? (Otherwise all the non-GHOST networks will be downloaded) ([y]/n) | ghost (use all GHOST networks) or non-ghost (use all non-GHOST networks) |
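
A minimal sketch of a section that answers the download questions in advance is shown below. It assumes that the dl_ variables are written as ordinary fields alongside the other settings of a section; the section name is hypothetical and the data selection reuses values from the other examples on this page. The # lines are annotations and assume that comment lines are accepted in the configuration file.

```ini
# Hypothetical section name.
[AUTOMATED_DOWNLOAD]
network = EBAS
species = sconco3
resolution = hourly
start_date = 20180101
end_date = 20180601
model = cams61_chimere_ph2-eu-000
# Answers to the interactive questions, given up front.
dl_mode = both
dl_ghost_source = zenodo
dl_interpolated = True
dl_overwrite = False
```

dl_thredds_update and network_type are left out on the assumption that their questions do not arise for this request, since the first concerns ACTRIS Thredds files and the second concerns choosing between GHOST and non-GHOST networks. The file is then passed to ./bin/providentia with --config and --download as described in Getting started.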

Using wildcards

You can use the * wildcard in the following fields to automatically select all available values:

  • network, observation, framework

  • model, models, experiments, experiment

  • species

  • resolution

  • start_date

  • end_date

Example configuration file with wildcards

[WILDCARD]
network = EBAS
species = sconco3, sconcno2
resolution = hourly
start_date = *
end_date = *
model = cams61_emep_ph2-eu-000

Note: Using wildcards may result in large downloads, so use with caution.

Model resolution

Models and observations may have different native temporal resolutions. In Providentia, interpolation between model data and observations includes:

  • Upsampling → duplicates temporal steps

  • Downsampling → cuts or aggregates temporal steps

The download and interpolation processes can be run from the same configuration file. In the following example, resolution is set to daily while model_resolution is set to hourly:

[PRV_sconco3_CHIMERE]
network = EBAS
species = sconco3
resolution = daily
start_date = 20180101
end_date = 20180601
model = cams61_chimere_ph2-eu-000 (CHIMERE)
model_resolution = hourly