Tips and tricks for developers

This page helps developers understand certain parts of the code and carry out common maintenance tasks:

Migrate repository from Gitlab to Github

Things to do in advance:

  • Create a file that maps every GitLab username to its GitHub username. For example, avilanov in GitLab is albavilanova in GitHub. You can use this file as a reference: https://docs.google.com/spreadsheets/d/1O7pKzNNfRjM8O-iP_fB2HM5AvJiBXq31FO40BLUuu_M/edit?gid=0#gid=0.

  • Ask Francesco or Albert for permission to become an owner in https://github.com/BSC-ES.

  • Invite the users who created issues in GitLab to the BSC-ES group in GitHub as team members.

  • Create an empty repository in GitHub.
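
As a minimal sketch of how the mapping file can be reused later, the snippet below reads a CSV export of the spreadsheet and prints entries in the 'gitlab': 'github' form used by the usermap block of settings.ts. The column names gitlab_user and github_user are assumptions; adapt them to the headers of your own export.

```python
import csv
import io


def usermap_entries(csv_text):
    """Return "'gitlab': 'github'," lines from CSV text with a header row.

    Assumes the hypothetical column names gitlab_user and github_user.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        gitlab = row["gitlab_user"].strip()
        github = row["github_user"].strip()
        if gitlab and github:  # skip users without a GitHub account yet
            lines.append(f"    '{gitlab}': '{github}',")
    return lines


example = "gitlab_user,github_user\navilanov,albavilanova\ndbowdalo,denebowdalo\n"
print("\n".join(usermap_entries(example)))
```

The printed lines can be pasted into settings.ts, which avoids typing the mapping by hand.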

Once you have completed these steps:

  • Mirror the Git repository (commits, branches and tags) from GitLab to GitHub. Issues and milestones are transferred afterwards with node-gitlab-2-github:

git clone --mirror https://earth.bsc.es/gitlab/ac/providentia.git
cd providentia.git
git push --no-verify --mirror git@github.com:BSC-ES/providentia.git
git remote set-url --push origin git@github.com:BSC-ES/providentia.git
git fetch -p origin
git push --no-verify --mirror
  • Install node-gitlab-2-github: You can try the original repository, but I don’t recommend it because several things failed for us. They are fixed in a forked version at https://github.com/albavilanova/node-gitlab-2-github.

git clone https://github.com/albavilanova/node-gitlab-2-github
cd node-gitlab-2-github
  • Edit settings.ts in the repository you’ve just cloned. Leave projectId as 0 the first time you run npm run start: it will throw an error listing all the available project ids and their corresponding project names. Then update settings.ts with your id and run it again. Our settings.ts was:

import Settings from './src/settings';
export default {
  gitlab: {
    url: 'https://earth.bsc.es/gitlab',
    token: 'YOUR-GITLAB-ACCOUNT-TOKEN',
    projectId: 574,
    listArchivedProjects: true,
    sessionCookie: "***",
  },
  github: {
    owner: 'BSC-ES',
    ownerIsOrg: true,
    token: 'YOUR-GITHUB-ACCOUNT-TOKEN',
    token_owner: 'albavilanova',
    repo: 'providentia',
    recreateRepo: false,
  },
  s3: {
    accessKeyId: null,
    secretAccessKey: null,
    bucket: null,
    region: null,
  },
  usermap: {
    'dbowdalo': 'denebowdalo',
    'avilanov': 'albavilanova',
    'pserrano': 'PaulaSerranoSierra',
    'avilamir': 'albertvilabsc',
    'avradi': 'amaliavr',
    'cmeikle': 'cmeikle',
    'ctena': 'charlio86',
    'cgile': 'carlottagile',
    'cferruz': 'cferruz',
    'eemili': 'emanueleemili',
    'fmacchia': 'f-macchia',
    'fbeninca': 'fbeninca',
    'hpetetin': 'hervepetetin',
    'jescriba': 'jeronimoescribano',
    'jmassagu': 'jomassa',
    'jyun': 'JayoungYun',
    'kdeoliv': 'kdeoliv',
    'kserrade': 'kserradell',
    'lilic': 'atmosphericdust',
    'mguevara': 'mguevarabsc',
    'molid': 'molidg',
    'ojorba': 'orioljorba',
    'pcamps': 'paulacamps',
    'rcruzalv': 'Rafaelaalves15',
    'rgrodofz': 'raphaelgrzg',
    'rgaratac': 'rgaratac1',
    'rsousse': 'rsousse',
    'tvintimi': 'tito-vintimilla',
    'yyousef': 'yarayo',
    'hnavarro': 'hectornav',
  },
  projectmap: {},
  conversion: {
    useLowerCaseLabels: true,
    addIssueInformation: true,
  },
  transfer: {
    description: true,
    milestones: true,
    labels: true,
    issues: true,
    mergeRequests: true,
    releases: true,
  },
  dryRun: false,
  exportUsers: true,
  useIssueImportAPI: true,
  usePlaceholderMilestonesForMissingMilestones: true,
  usePlaceholderIssuesForMissingIssues: true,
  useReplacementIssuesForCreationFails: true,
  useIssuesForAllMergeRequests: false,
  filterByLabel: undefined,
  trimOversizedLabelDescriptions: false,
  skipMergeRequestStates: [],
  skipMatchingComments: [],
  mergeRequests: {
    logFile: './merge-requests.json',
    log: false,
  },
  commitMap: {
  }
} as Settings;

The session cookie is obtained from Developer tools -> Application -> Cookies -> _gitlab_session while on the GitLab page. The GitLab and GitHub tokens need to be generated from your accounts.

  • Run the migration:

npm run start
Transferring Description
Transferring Milestones
Transferring Labels
Transferring Releases
Transferring Issues
...
  • Once it has finished, go to GitHub and check that everything is correct. You might see that the commits are not linked to your accounts. This happens because the commits are associated with your BSC emails and not with the emails of your GitHub accounts. To solve this, create a .mailmap file in your repository. Our .mailmap-gitlab contained:

# FORMAT: <replace-with-name>  <replace-with-email>  <commit-name>  <commit-email>
# Omit commit-name or commit-email if same as replace-with.
# git log --pretty="%aN <%aE>%n%cN <%cE>" | sort | uniq

Alba Vilanova               <alba.vilanova@outlook.com>          Alba Vilanova             <alba.vilanova@bsc.es>
Dene Bowdalo                <denebowdalo@googlemail.com>         Dene Bowdalo              <dene.bowdalo@bsc.es>
Paula Serrano Sierra        <paulaserranosierra@gmail.com>       Paula Serrano Sierra      <paula.serrano@bsc.es>
Amalia Vradi                <amalia.vradi@bsc.es>                Amalia Vradi              <amalia.vradi@bsc.es>
Francesco Benincasa         <fbeninca@gmail.com>                 Francesco Benincasa       <francesco.benincasa@bsc.es>

Then:

git clone https://github.com/BSC-ES/providentia.git
cd providentia
git filter-repo --mailmap .mailmap --force
git remote add origin https://github.com/BSC-ES/providentia.git
git push --set-upstream origin master  --force
git push --mirror --force origin

It is also possible that the default branch is not set to master. You can change that from Settings -> General.
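
The .mailmap entries shown above follow the pattern "proper name, proper email, commit name, commit email". As a small sketch, they could be generated from a list of people instead of being typed by hand. The helper name mailmap_line and the input tuple layout are assumptions made for illustration.

```python
def mailmap_line(name, github_email, commit_email, commit_name=None):
    """Format one .mailmap entry: proper name/email, then commit name/email."""
    commit_name = commit_name or name
    return f"{name} <{github_email}> {commit_name} <{commit_email}>"


# Hypothetical input: (name, email used on GitHub, email used in the commits)
people = [
    ("Alba Vilanova", "alba.vilanova@outlook.com", "alba.vilanova@bsc.es"),
]
for person in people:
    print(mailmap_line(*person))
```

Git ignores the amount of whitespace between the fields, so the column alignment used in our file is only cosmetic.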

Run tests

To run all the pipeline tests on your local machine and read the data from the folder tests/data, you will need to add return "github" at the start of the function get_machine() in auxiliar.py and then run:

conda activate providentia-env_v[version]
pytest tests

If you want to run a specific set of tests (from: test_apply_filter, test_make_plot, test_read_data, test_save, test_unit_converter), you can do so by specifying the set:

pytest tests/test_read_data.py

To run a specific test, you will need to edit these files and comment out the functions you are not interested in testing.

If you want to recreate the expected data in the tests, set the variable GENERATE_OUTPUT in tests/aux_functions.py to True temporarily and run the tests once before reverting it back to False.
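
The actual logic lives in tests/aux_functions.py and may differ from what follows. This sketch only illustrates the generate-or-compare pattern that a flag like GENERATE_OUTPUT typically controls. The function name check_or_generate and the JSON storage format are assumptions.

```python
import json
import tempfile
from pathlib import Path


def check_or_generate(result, expected_path, generate_output=False):
    """Write `result` as the new expected data, or compare against it."""
    expected_path = Path(expected_path)
    if generate_output:
        expected_path.write_text(json.dumps(result, sort_keys=True))
        return True
    expected = json.loads(expected_path.read_text())
    return result == expected


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "expected.json"
    # first run: regenerate the expected data
    check_or_generate({"mean": 1.5}, path, generate_output=True)
    # later runs: compare new results against the stored data
    same = check_or_generate({"mean": 1.5}, path)
    changed = check_or_generate({"mean": 2.0}, path)
    print(same, changed)
```

This is why the flag must be reverted to False afterwards: while it is True, every run overwrites the reference data and no test can fail on a regression.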

If you want to see the coverage report use:

coverage report -i -m

Upload to PyPI

Every time a version is released, we should update Providentia in PyPI. To do so, we will first create a source distribution in the folder dist:

python setup.py sdist

We can check that it can be installed by running:

pip install dist/providentia-X.X.X.tar.gz

If everything is correct, we can proceed to upload our distribution to the website.

To upload the package we need to install twine:

pip install twine

We also need an API token from PyPI. I recommend creating a file called .pypirc in your home directory, with the following content:

[pypi]
  username = __token__
  password = pypi-[key]

That way you won’t be asked for your credentials anymore. Make sure the version has been updated in __init__.py and that no further changes are needed. Be very sure that everything is working at this point: an uploaded version cannot be overwritten, only deleted, and the same version number cannot be uploaded again even after deleting it.

To upload it, you can do:

twine upload dist/providentia-X.X.X.tar.gz
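
Before uploading, it can help to confirm that the version in __init__.py matches the file in dist. The sketch below assumes the version is stored as a line of the form __version__ = "X.X.X", which may not match how Providentia defines it. The helper names are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def read_version(init_path):
    """Return the string assigned to __version__ in the given file."""
    text = Path(init_path).read_text()
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"no __version__ found in {init_path}")
    return match.group(1)


def expected_sdist_name(package, version):
    """Name of the source distribution expected under dist/."""
    return f"{package}-{version}.tar.gz"


with tempfile.TemporaryDirectory() as tmp:
    # stand-in for the real __init__.py, so the sketch runs anywhere
    init_file = Path(tmp) / "__init__.py"
    init_file.write_text('__version__ = "3.0.1"\n')
    version = read_version(init_file)
    sdist = expected_sdist_name("providentia", version)
    print(sdist)
```

If the printed name does not exist under dist, the distribution was built before the version was bumped and should be rebuilt.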

Check that all reports are created

To generate reports for all the configuration files under the configurations folder and log the ones that fail to error_log2.txt:

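#!/bin/bash
# Generate a report for every configuration file and log the failures.
folder_path="configurations"
error_log="error_log2.txt"
> "$error_log"  # empty the log left by previous runs
for file in "$folder_path"/*; do
  printf '\n\n\n'
  echo "Running: ./bin/providentia --report --config=$file"
  output=$(./bin/providentia --report --config="$file" 2>&1)
  status=$?  # save the exit code before other commands overwrite $?
  if [ "$status" -ne 0 ]; then
    # extract the exception names (e.g. ValueError) from the output
    error_type=$(echo "$output" | grep -o -E '[a-zA-Z]*Error')
    echo "Command failed with exit code $status."
    echo "File: $file Error: $error_type" >> "$error_log"
  else
    echo "Command executed successfully."
  fi
done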
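
The same idea can be sketched in Python, which makes it easier to post-process the failures. This is an illustrative equivalent, not part of Providentia: the function names are made up, and a small Python one-liner stands in for the ./bin/providentia call so the example runs anywhere.

```python
import re
import subprocess
import sys


def error_types(output):
    """Return the exception names (words ending in 'Error') found in the output."""
    return re.findall(r"[A-Za-z]*Error", output)


def run_and_collect(command):
    """Run a command and return (exit_code, error names found in its output)."""
    proc = subprocess.run(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, error_types(proc.stdout)


# Stand-in for ["./bin/providentia", "--report", f"--config={file}"]
code, errors = run_and_collect(
    [sys.executable, "-c", "raise ValueError('bad config')"]
)
print(code, errors)
```

Passing the command as a list avoids shell quoting problems with configuration file names that contain spaces.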

Run Providentia inside Docker container

We need to first build our image:

docker build -t "providentia-image" .

To be able to display the dashboard we will also run these commands:

export DISPLAY=:1.0
xhost +local:*

We can then start the service using compose.yml:

docker compose run providentia bash

This will open a “session” inside our container, from which we can run Providentia.

Before that, update the volume paths (/data, /home/avilanov/software/Providentia) in compose.yml so that they point to the data and repositories on your local machine (outside the Docker container). The changes you make to your local files will be automatically reflected in the container.

Create the docs for the first time and upload to readthedocs

Install myst-parser, Sphinx and Sphinx-rtd-theme in your environment and add the modules to your requirements.txt file.

conda activate providentia-env_v[version]
pip install myst-parser Sphinx Sphinx-rtd-theme

Create a folder called docs inside your repository and do:

cd docs
sphinx-quickstart

This will ask you a few questions (we answered yes when asked whether to separate the source and build folders) and generate a conf.py file, together with other files, in the docs/source folder. You can then edit the configuration file. In our case, we added some extensions and plugins:

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ["myst_parser", "sphinx.ext.todo", "sphinx.ext.viewcode", "sphinx.ext.autodoc", "sphinx_rtd_theme"]
templates_path = ['_templates']
exclude_patterns = []
myst_heading_anchors = 2

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
html_css_files = [
    'css/custom.css',
]

Add your markdown files to docs/source and add the .md filenames to index.rst in order to show their contents. Then run:

make clean
make html

You might get some errors here related to the format of your .md files; fix them and run both commands again. This generates the HTML files under docs/build, which you can open in a browser to make sure they look correct.
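
A page that is missing from index.rst will not appear in the built documentation. As a rough sketch, the check below lists the .md files whose name is not mentioned in the index. It uses a simple substring match rather than parsing the toctree, and the file names in the example are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_from_index(source_dir, index_name="index.rst"):
    """Return the .md files in source_dir whose name is not mentioned in the index."""
    source_dir = Path(source_dir)
    index_text = (source_dir / index_name).read_text()
    missing = []
    for md_file in sorted(source_dir.glob("*.md")):
        # a toctree entry may be written with or without the .md extension
        if md_file.stem not in index_text:
            missing.append(md_file.name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # small stand-in for docs/source
    source = Path(tmp)
    (source / "index.rst").write_text(".. toctree::\n\n   installation.md\n")
    (source / "installation.md").write_text("# Installation\n")
    (source / "developers.md").write_text("# Developers\n")
    missing = missing_from_index(source)
    print(missing)
```

For the real documentation, call missing_from_index("docs/source") from the repository root.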

Now create an account and project in https://app.readthedocs.org/, and associate it with your project in Github, which must be open source. To show the documentation, you will need to create the file .readthedocs.yaml in your repository. Ours looks like this:

# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version, and other tools you might need
build:
  os: ubuntu-24.04
  tools:
    python: "mambaforge-22.9"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
   configuration: docs/source/conf.py

# Create environment
conda:
  environment: environment.yaml

Trigger a build from your latest commit containing this file and you should be ready to go.

Generate the docs

Edit the files under docs/source, then navigate to docs and simply run:

conda activate providentia-env_v3.0.1
cd docs
make clean
make html

Do not edit anything under docs/build, as it gets deleted every time you run make clean.

Create conda environments

Create local environment

If for some reason you want to create the environment from scratch, you can use:

conda env create -f environment.yaml

You might get a warning like:

WARNING conda.models.version:get_matcher(556): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0

This can be removed by updating conda:

conda update conda
conda install -n base conda=24.4.0 conda-build=24.3.0

Check what the latest versions of conda and conda-build are and adjust the version numbers in the second command accordingly.

The conda env create command creates an environment called providentia-env_v3.0.1 with Python 3.11.5, installs cartopy, jupyterlab, ghostscript and dask with conda, and installs the Python packages from requirements.txt with pip. The equivalent commands would be:

conda create -n providentia-env_v3.0.1 python=3.11.5 -c conda-forge --override-channels
conda activate providentia-env_v3.0.1
conda install -c conda-forge cartopy jupyterlab ghostscript dask --override-channels
pip install -r requirements.txt

Create providentia-env_v3.0.1-nord4 in Nord4

module unload intel
module load GCC/10.2.0
conda create -p /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1-nord4 -y python=3.11.5 -c conda-forge --override-channels
conda activate /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1-nord4
conda install -c conda-forge cartopy jupyterlab ghostscript dask --override-channels
pip install -r requirements.txt

Create providentia-env_v3.0.1 in MN5

conda create -p /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1 -y python=3.11.5 -c conda-forge --override-channels
conda activate /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1
conda install -c conda-forge cartopy jupyterlab ghostscript dask --override-channels
pip install -r requirements.txt

Memory Profiling Code

To memory-profile a function, you can use the Tracker context manager from the memray Python module.

import memray

def function_to_profile1():
    pass

with memray.Tracker("output_file"):
    function_to_profile1()

Use the memray command in the terminal to create a human-readable output.

Setup

  • If working locally, install Memray with:

python3 -m pip install memray
  • If working on MN5, load the module with:

module load memray/1.17.1-foss-2023b-Python-3.11.5

Generate Outputs

To create an interactive flamegraph, use:

memray flamegraph output_file 

To create a text summary, use:

memray table output_file 

To get the detailed stats, use:

memray stats output_file 

Obtain Zenodo Filetrees

This section contains the code used to obtain the Zenodo filetrees for the release of the next GHOST versions in Zenodo.

In order to create individual filetrees, run a modified version of the Providentia Zenodo download.

First of all, the download_ghost_network_zenodo function only has to be run once, with initial_check = False. To do so, comment out these lines in configuration.py:

# download GHOST network
#initial_check_nc_files = download_fun(network, initial_check=True)
#files_to_download = self.select_files_to_download(initial_check_nc_files)
#if not initial_check_nc_files or files_to_download:
    download_fun(network, initial_check=False, files_to_download=files_to_download)

This makes download_ghost_network_zenodo run the following code:

# get the GHOST artifact value for the corresponding network
artifact_network = self.artifact_mapping[network]    

# create temporary directory to store the zip file and its tar components
self.temp_dir = os.path.join(self.download_instance.ghost_root, ".temp")
os.makedirs(self.temp_dir, exist_ok=True)

# download zip to the temporary directory
zip_path = self.download_zip(network, artifact_network)

# extract zip in the temporary directory
valid_files_info = self.extract_zip(files_to_download, zip_path, initial_check)

# extract tar in the temporary directory
self.extract_tar(valid_files_info)

Before running it, modify the extract_zip and extract_tar functions in the zenodo.py module as follows:

def extract_zip(self, network, zip_path):        
  # open the ZIP file
  with ZipFile(zip_path, "r") as zipf:            
      zipf.extractall(self.temp_dir)

      # list all files inside the ZIP
      self.file_paths = [join(self.temp_dir,f) for f in zipf.namelist() if not f.endswith("/")]

def extract_tar(self, network):
  import tarfile
  import json

  json_dict = {}
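  for tar_path in tqdm(self.file_paths, desc=f"    Checking {network} tars"):
      with tarfile.open(tar_path, "r:*") as tar:
          # the last three path components are <network>/<resolution>/<species tar file>,
          # so this works regardless of how deep the temporary directory is
          network, resolution, species_tar = tar_path.split('/')[-3:]
          # drop the 7-character archive extension (e.g. ".tar.gz")
          species = species_tar[:-7]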

          if network not in json_dict:
              json_dict[network] = {}
          
          if resolution not in json_dict[network]:
              json_dict[network][resolution] = {}

          if species not in json_dict[network][resolution]:
              json_dict[network][resolution][species] = []

          for member in tar.getmembers():
              file_name = member.name.split('/')[-1]
              if file_name.endswith('.nc'):
                  json_dict[network][resolution][species].append(file_name)

          json_dict[network][resolution][species] = list(sorted(json_dict[network][resolution][species]))
      
  final_path = f"/home/pserrano/providentia/settings/internal/zenodo/zenodo_networks/{self.download_instance.ghost_version}/{network}.json"
  with open(final_path, "w", encoding="utf-8") as f:
      json.dump(json_dict, f, indent=2)

  with open(final_path, "r", encoding="utf-8") as f:
      data = json.load(f)
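
To make the resulting structure easier to picture, here is a standalone sketch of how one archive contributes to the nested network -> resolution -> species -> files dictionary. The helper add_to_filetree, the archive path and the member names are all hypothetical and only serve as an illustration.

```python
def add_to_filetree(filetree, tar_path, member_names):
    """Add the .nc files of one tar archive to the nested filetree dict."""
    # last three path components: <network>/<resolution>/<species tar file>
    network, resolution, species_tar = tar_path.split("/")[-3:]
    species = species_tar.split(".tar")[0]
    files = (
        filetree.setdefault(network, {})
        .setdefault(resolution, {})
        .setdefault(species, [])
    )
    # keep only the NetCDF files, without their folder prefix
    files.extend(name.split("/")[-1] for name in member_names if name.endswith(".nc"))
    files.sort()
    return filetree


# Hypothetical archive path and member names, for illustration only
tree = add_to_filetree(
    {},
    "/tmp/.temp/EBAS/hourly/sconco3.tar.xz",
    ["sconco3/sconco3_201802.nc", "sconco3/sconco3_201801.nc", "sconco3/README"],
)
print(tree)
# {'EBAS': {'hourly': {'sconco3': ['sconco3_201801.nc', 'sconco3_201802.nc']}}}
```

Each per-network JSON file written by extract_tar has this shape, with the network name as its single top-level key.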

After all the networks have generated individual filetrees, join all of them by running this script:

import os
import json
import yaml
import sys

ghost_version = '1.5.1'

path = f"/home/pserrano/providentia/settings/internal/zenodo/zenodo_networks/{ghost_version}"
ld = os.listdir(path)

yaml_path = f"/home/pserrano/providentia/settings/internal/zenodo/zenodo_{ghost_version}.yaml"
yaml_obj = yaml.safe_load(open(yaml_path))

valid_networks = yaml_obj.keys()
actual_networks = [i[:-5] for i in ld]

if set(valid_networks)-set(actual_networks) or set(actual_networks)-set(valid_networks):
    print(set(valid_networks)-set(actual_networks))
    print()
    print(set(actual_networks)-set(valid_networks))
    sys.exit()

data = {}

for json_file in sorted(ld):
    full_path = os.path.join(path, json_file)
    with open(full_path) as f:
        d = json.load(f)
    n = list(d.keys())[0]
    data[n] = d[n]

final_path = os.path.join(os.path.dirname(os.path.dirname(path)), f'zenodo_ghost_filetree_{ghost_version}.json')
with open(final_path, 'w') as f:
    json.dump(data, f,indent=4)

For more useful scripts regarding the filetrees, see this issue.