Getting started

If you have access to the HPC machines at Barcelona Supercomputing Center (BSC), the first thing you need to decide is whether you want to use Providentia on a supercomputer (MN5 or Nord4) or on your local computer.

We recommend that everyone, including users at BSC, work on a local machine: the interactive features of the dashboard are faster, and you do not need to wait in a queue for resources before you can use the software. The only disadvantage is that the data (models and observations) stored on HPC cannot be accessed directly and must be downloaded to your local machine in advance using the download mode. If you would rather not download the data and prefer to run your analysis on an HPC machine, we recommend reading the Wiki section Connection setup.

If you do not have access to these machines, you will only be able to use the download mode to get CAMS model data and observations from a limited set of sources, i.e. Zenodo for GHOST and NILU Thredds for ACTRIS. If you want to use your own data, check the tutorial on how to format model data, and read the section Create your own data network to learn how to process and create observational netCDF files that Providentia can read.

Prerequisites

Providentia works best on Linux and macOS. The prerequisites on these systems are:

1. Git

Install Git by following the instructions here: https://git-scm.com/install/linux

2. Conda

Install conda. We recommend Miniconda because it is lightweight. You can download the .sh installer for Linux from the official website after creating an account. If the download fails while your VPN is active, deactivate the VPN and try again.

From a terminal (inside WSL if you are on Windows), run:

cd Downloads
bash Miniconda3-latest-Linux-x86_64.sh
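
Once both prerequisites are installed, a quick check like the one below can confirm that they are reachable from your terminal. This is only a sketch: the helper name missing_tools is hypothetical and not part of Providentia, and conda may only be visible in a new terminal session after installation.

```shell
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is an illustrative helper, not a Providentia command.)
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Report anything that still needs installing.
missing=$(missing_tools git conda)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "git and conda are both available."
fi
```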

The software has not been designed to work on Windows. If you are a Windows user you have three options:

Run it from a Windows Subsystem for Linux (WSL) if you have Windows Pro. Read the instructions here. (Recommended)
Use a virtual machine software like Oracle VirtualBox to access Linux. Read the instructions here.
Run it in Git Bash. Read the instructions here.

Cloning the project

Use the following command to get a copy of the repository on your machine:

git clone https://github.com/bsc-es/providentia

Once you have finished cloning the repository from GitHub, you are automatically on the master branch. We recommend using that branch, as it contains the latest features and bug fixes.

Running the tool the first time

Once cloned, you should be able to open the dashboard by running this command from your terminal:

cd providentia
./bin/providentia

The first time the software runs on a local machine, it creates a conda environment called providentia-env_v[version] with all the modules needed. If you encounter any problems, feel free to contact us.
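
To confirm that the environment was created, you can filter the output of conda env list for names that start with providentia-env_v. The snippet below is a minimal sketch; the helper name providentia_envs is hypothetical, and it assumes the usual conda env list layout in which the environment name is the first column.

```shell
#!/usr/bin/env bash
# Read `conda env list` output on stdin and print the names of environments
# whose name starts with "providentia-env_v".
# (providentia_envs is an illustrative helper, not a Providentia command.)
providentia_envs() {
    awk '$1 ~ /^providentia-env_v/ { print $1 }'
}

# Usage (requires conda to be installed):
#   conda env list | providentia_envs
```

If nothing is printed, the environment has not been created yet.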

On HPC, the environment is not created by the user because it is stored in a shared folder. Every time Providentia is run on HPC, it requests a wall time of 2 hours, 12 CPUs and 30 GB of total memory. These values can be modified as desired using the bash options. You can check the available options with:

./bin/providentia --usage

Accessing the data

When you open the dashboard on a local machine for the first time, the dropdown menus are empty because no data is available yet; you first need to place the data in a local directory. By default, the data is read from /home/{user}/data/providentia. If you want to store it elsewhere, you can edit the paths in settings/data_paths.yaml.

Data directory tree and filename conventions

The datasets need to be saved following a very specific directory tree. The download mode takes care of this when saving the files; more details can be found in the download section. If you are using your own data, however, you will need to build the tree yourself.

By default, the folder /home/{user}/data/providentia (or your preferred location) should contain three folders:

mod: Interpolated model data as in: {GHOST version} -> {model}-{domain}-{ensemble} -> {resolution} -> {species} -> {network} -> {species}_{year}{month}.nc.
mod_to_interp: Model data to interpolate as in: {model} -> {domain} -> {resolution} -> {species} -> {species}_{year}{month}.nc.
obs: Observation datasets. For GHOST as in: ghost -> {network} -> {GHOST version} -> {resolution} -> {species} -> {species}_{year}{month}.nc. For non-GHOST as in: nonghost -> {provider} -> {network} -> {resolution} -> {species} -> {species}_{year}{month}.nc.
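
If you are preparing your own data, the top-level folders can be created ahead of time. The function below is a minimal sketch: the name init_data_root is hypothetical, and the root directory you pass should match the path configured in settings/data_paths.yaml.

```shell
#!/usr/bin/env bash
# Create the top-level data folders (mod, mod_to_interp, obs/ghost and
# obs/nonghost) under the given root and print the root.
# (init_data_root is an illustrative helper, not a Providentia command.)
init_data_root() {
    local root="${1:?usage: init_data_root <root>}"
    mkdir -p "$root/mod" "$root/mod_to_interp" \
             "$root/obs/ghost" "$root/obs/nonghost"
    printf '%s\n' "$root"
}

# Usage, with the default location mentioned in this section:
#   init_data_root "/home/$USER/data/providentia"
```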

As the filename patterns indicate, datasets must be saved per month, regardless of their temporal resolution. An example of a working directory tree is the following:

├── mod
│   └── 1.5
│       └── cams61_emep_ph2-eu-000
│           └── hourly
│               └── sconcno2
│                   └── eea-eionet
│                       ├── sconcno2_201801.nc
│                       ├── sconcno2_201802.nc
│                       ├── sconcno2_201803.nc
│                       ├── sconcno2_201804.nc
│                       ├── sconcno2_201805.nc
│                       ├── sconcno2_201806.nc
│                       ├── sconcno2_201807.nc
│                       ├── sconcno2_201808.nc
│                       ├── sconcno2_201809.nc
│                       ├── sconcno2_201810.nc
│                       ├── sconcno2_201811.nc
│                       └── sconcno2_201812.nc
├── mod_to_interp
│   └── cams61_emep_ph2
│       └── eu
│           └── hourly
│               └── sconcno2
│                   ├── sconcno2_201801.nc
│                   ├── sconcno2_201802.nc
│                   ├── sconcno2_201803.nc
│                   ├── sconcno2_201804.nc
│                   ├── sconcno2_201805.nc
│                   ├── sconcno2_201806.nc
│                   ├── sconcno2_201807.nc
│                   ├── sconcno2_201808.nc
│                   ├── sconcno2_201809.nc
│                   ├── sconcno2_201810.nc
│                   ├── sconcno2_201811.nc
│                   └── sconcno2_201812.nc
└── obs
    ├── ghost
    │   └── EBAS
    │       └── 1.5
    │           └── hourly
    │               └── sconcno2
    │                   ├── sconcno2_201801.nc
    │                   ├── sconcno2_201802.nc
    │                   ├── sconcno2_201803.nc
    │                   ├── sconcno2_201804.nc
    │                   ├── sconcno2_201805.nc
    │                   ├── sconcno2_201806.nc
    │                   ├── sconcno2_201807.nc
    │                   ├── sconcno2_201808.nc
    │                   ├── sconcno2_201809.nc
    │                   ├── sconcno2_201810.nc
    │                   ├── sconcno2_201811.nc
    │                   └── sconcno2_201812.nc
    └── nonghost
        └── eea
            └── eionet
                └── hourly
                    └── sconcno2
                        ├── sconcno2_201801.nc
                        ├── sconcno2_201802.nc
                        ├── sconcno2_201803.nc
                        ├── sconcno2_201804.nc
                        ├── sconcno2_201805.nc
                        ├── sconcno2_201806.nc
                        ├── sconcno2_201807.nc
                        ├── sconcno2_201808.nc
                        ├── sconcno2_201809.nc
                        ├── sconcno2_201810.nc
                        ├── sconcno2_201811.nc
                        └── sconcno2_201812.nc
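
When preparing your own files, it can help to build the expected paths programmatically and to check that no month is missing. The functions below are a sketch under stated assumptions: their names are hypothetical, and the hyphen-joined {model}-{domain}-{ensemble} folder name is taken from the example tree above (cams61_emep_ph2-eu-000).

```shell
#!/usr/bin/env bash
# Illustrative helpers, not part of Providentia.

# Path of one interpolated model file.
# Arguments: root ghost_version model domain ensemble resolution species network yyyymm
mod_file() {
    printf '%s/mod/%s/%s-%s-%s/%s/%s/%s/%s_%s.nc\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$7" "$9"
}

# Path of one GHOST observation file.
# Arguments: root network ghost_version resolution species yyyymm
ghost_obs_file() {
    printf '%s/obs/ghost/%s/%s/%s/%s/%s_%s.nc\n' \
        "$1" "$2" "$3" "$4" "$5" "$5" "$6"
}

# Print every month (as yyyymm) of a year that has no file in a directory.
# Arguments: directory species year
missing_months() {
    local dir="$1" species="$2" year="$3" month
    for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
        [ -f "$dir/${species}_${year}${month}.nc" ] || printf '%s%s\n' "$year" "$month"
    done
}
```

For example, missing_months /home/{user}/data/providentia/obs/ghost/EBAS/1.5/hourly/sconcno2 sconcno2 2018 prints nothing when all twelve monthly files of 2018 are present.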

If you are running Providentia on HPC, the menus at the top of the dashboard will already offer options to choose from. The data is read from the paths specified in settings/data_paths.yaml.

Providentia internal directories

When cloning the Providentia repository from GitHub, the project is automatically created with a predefined directory structure. These internal directories are required for the correct execution of Providentia and are used to store configuration files, intermediate outputs, results and visualizations.

Providentia uses the following directories to store essential files during execution:

├── configurations/     Configuration files required to run all Providentia modes.
├── logs/               Output files generated during interpolation.
├── notebooks/          Template notebooks.
├── plots/              Saved plots when Providentia is used as a module.
├── reports/            Generated reports.
├── saved_data/         Configuration, NetCDF and NumPy files when used as a module.
└── settings/           Files that configure various aspects of Providentia.

The settings/ directory contains the main configuration files that control Providentia’s behavior. These files can be modified by the user to adapt and configure Providentia’s behavior according to their needs.

├── basic_stats.yaml                 Defines the stats properties.
├── color_palettes.yaml              Defines the possible color palettes.
├── data_paths.yaml                  Defines dataset paths categorized by machine.
├── experiment_bias_stats.yaml       Defines the experiment stats properties.
├── init_prov.yaml                   Stores initialization settings, including non-ghost available networks and resolutions.
├── interp_experiments.yaml          Specifies locations for non-interpolated experiments.
├── remove_extreme_stations.yaml     Defines criteria for filtering stations if you want to automatically remove them.
├── exceedances.yaml                 Stores threshold values for exceedance statistics.
├── fairmode.yaml                    Stores configurations for the Fairmode plots.
├── plot_characteristics.yaml        Configures plot appearance settings.
└── report_plots.yaml                Defines plot types per report type.

Statistics

Before explaining how to use each mode, it is important to note that statistics are computed in numerous ways, depending on the user’s needs. A thorough explanation can be found in the Statistics section.

Launching the dashboard

As explained, you can launch the dashboard by simply running:

./bin/providentia

An initial set of plots will be displayed, including the timeseries, distribution, statistics summary, and periodic plots. To take full advantage of Providentia, you can explore the wide range of plotting options described in Plot types and options. We also recommend reading the Plot customisation section.

More details can be found in the dashboard section.

Using a configuration file

If you want to define in advance which data is loaded, you can use a configuration file. Some examples can be found in the configurations folder. For more details, read the section Configuration files.

Once you have a configuration file, you can specify it from the command line using the --config argument. If the file is located inside the configurations/ folder, only the file name is required:

./bin/providentia --config=example.conf

If the file is stored in a different location, you can provide the full path:

./bin/providentia --config=/path/to/file/example.conf
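
The lookup rule described above can be summarised with a small function. This is purely illustrative: resolve_config is a hypothetical name, and the function only mimics the behaviour described in this section rather than reproducing Providentia's own implementation.

```shell
#!/usr/bin/env bash
# Mimic the lookup rule described above (illustration only):
# a bare file name is looked up in configurations/, while anything that
# contains a slash is treated as a path and returned unchanged.
resolve_config() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   printf 'configurations/%s\n' "$1" ;;
    esac
}

# Usage: check that the file exists before launching the dashboard.
#   [ -f "$(resolve_config example.conf)" ] && ./bin/providentia --config=example.conf
```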

If you have multiple sections or subsections, a pop-up window will immediately appear where you can choose the section or subsection of interest. After that, the graphical window of Providentia will appear and you can begin using the tool.

Generating a report

With a configuration file you can also generate PDF reports. To do this, use the --report argument:

./bin/providentia --config=example.conf --report

You can launch the dashboard or generate a report for a single section by using the --section option. To select a subsection, write the section name followed by an interpunct (·) and the subsection name:

./bin/providentia --config=example.conf --report --section=All·France
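
Because the interpunct is not available on every keyboard layout, it can be convenient to build the argument in a script. The helper below is a sketch with a hypothetical name (section_arg); it writes the interpunct (U+00B7) as its UTF-8 bytes in octal notation.

```shell
#!/usr/bin/env bash
# Build a --section argument from a section name and an optional subsection.
# The interpunct (U+00B7) is emitted as its UTF-8 bytes, \302\267.
# (section_arg is an illustrative helper, not a Providentia command.)
section_arg() {
    local section="$1" subsection="${2:-}"
    if [ -n "$subsection" ]; then
        printf -- '--section=%s\302\267%s\n' "$section" "$subsection"
    else
        printf -- '--section=%s\n' "$section"
    fi
}

# Usage:
#   ./bin/providentia --config=example.conf --report "$(section_arg All France)"
```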

Reports are saved in the reports folder. To save them elsewhere, include a path in the report_filename of the configuration file.

More details can be found in the report section.

Using Providentia backend functions

Providentia can be imported and used in your own Python scripts. Some examples of how to use Providentia’s backend functions can be found in the tutorials folder.

Also, a Jupyter notebook with an active conda environment can be launched with the following command:

./bin/providentia --notebook

More details can be found in the library section.

Interpolating your model data to observations

If you want to visualise data from your own model, you first need to interpolate it to the observational network. Using a configuration file, you can start interpolating your model data to your desired observational network:

./bin/providentia --config=example.conf --interpolate

More details can be found in the interpolation section.

Enjoy!