# Tips and tricks for developers

This page can be used by developers to better understand certain parts of the code and perform certain actions:

- [Migrate repository from Gitlab to Github](#migrate-repository-from-gitlab-to-github)
- [Run tests](#run-tests)
- [Upload to PyPI](#upload-to-pypi)
- [Check that all reports are created](#check-that-all-reports-are-created)
- [Run Providentia inside Docker container](#run-providentia-inside-docker-container)
- [Create the docs for the first time and upload to readthedocs](#create-the-docs-for-the-first-time-and-upload-to-readthedocs)
- [Generate the docs](#generate-the-docs)
- [Create conda environments](#create-conda-environments)
- [Memory Profiling Code](#memory-profiling-code)
- [Obtain Zenodo Filetrees](#obtain-zenodo-filetrees)

## Migrate repository from Gitlab to Github

Things to do in advance:

- Create a file that maps all users from Gitlab to Github. For example, avilanov in Gitlab is albavilanova in Github. You can use this file as a reference: https://docs.google.com/spreadsheets/d/1O7pKzNNfRjM8O-iP_fB2HM5AvJiBXq31FO40BLUuu_M/edit?gid=0#gid=0.
- Ask Francesco or Albert for permission to become an owner in https://github.com/BSC-ES.
- Invite the users who created issues in Gitlab as team members of the BSC-ES group in Github.
- Create an empty repository in Github.

Once you have completed these steps:

- Mirror the repository to Github before using node-gitlab-2-github, which will be used to transfer all issues and milestones:

```bash
git clone --mirror https://earth.bsc.es/gitlab/ac/providentia.git
cd providentia.git
git push --no-verify --mirror git@github.com:BSC-ES/providentia.git
git remote set-url --push origin git@github.com:BSC-ES/providentia.git
git fetch -p origin
git push --no-verify --mirror
```

- Install node-gitlab-2-github: You can try to install the original repository, but I don't recommend it since many things fail. I fixed them in a forked version at https://github.com/albavilanova/node-gitlab-2-github.
```bash
git clone https://github.com/albavilanova/node-gitlab-2-github
cd node-gitlab-2-github
```

- Edit the `settings.ts` file within the repository that you've just cloned: Leave `projectId` as 0 the first time you run `npm run start`. That way it will throw an error showing you all the available project ids and their corresponding project names. Then update `settings.ts` with your id and run it again. Our `settings.ts` was:

```
import Settings from './src/settings';

export default {
  gitlab: {
    url: 'https://earth.bsc.es/gitlab',
    token: 'YOUR-GITLAB-ACCOUNT-TOKEN',
    projectId: 574,
    listArchivedProjects: true,
    sessionCookie: "***",
  },
  github: {
    owner: 'BSC-ES',
    ownerIsOrg: true,
    token: 'YOUR-GITHUB-ACCOUNT-TOKEN',
    token_owner: 'albavilanova',
    repo: 'providentia',
    recreateRepo: false,
  },
  s3: {
    accessKeyId: null,
    secretAccessKey: null,
    bucket: null,
    region: null,
  },
  usermap: {
    'dbowdalo': 'denebowdalo',
    'avilanov': 'albavilanova',
    'pserrano': 'PaulaSerranoSierra',
    'avilamir': 'albertvilabsc',
    'avradi': 'amaliavr',
    'cmeikle': 'cmeikle',
    'ctena': 'charlio86',
    'cgile': 'carlottagile',
    'cferruz': 'cferruz',
    'eemili': 'emanueleemili',
    'fmacchia': 'f-macchia',
    'fbeninca': 'fbeninca',
    'hpetetin': 'hervepetetin',
    'jescriba': 'jeronimoescribano',
    'jmassagu': 'jomassa',
    'jyun': 'JayoungYun',
    'kdeoliv': 'kdeoliv',
    'kserrade': 'kserradell',
    'lilic': 'atmosphericdust',
    'mguevara': 'mguevarabsc',
    'molid': 'molidg',
    'ojorba': 'orioljorba',
    'pcamps': 'paulacamps',
    'rcruzalv': 'Rafaelaalves15',
    'rgrodofz': 'raphaelgrzg',
    'rgaratac': 'rgaratac1',
    'rsousse': 'rsousse',
    'tvintimi': 'tito-vintimilla',
    'yyousef': 'yarayo',
    'hnavarro': 'hectornav',
  },
  projectmap: {},
  conversion: {
    useLowerCaseLabels: true,
    addIssueInformation: true,
  },
  transfer: {
    description: true,
    milestones: true,
    labels: true,
    issues: true,
    mergeRequests: true,
    releases: true,
  },
  dryRun: false,
  exportUsers: true,
  useIssueImportAPI: true,
  usePlaceholderMilestonesForMissingMilestones: true,
  usePlaceholderIssuesForMissingIssues: true,
  useReplacementIssuesForCreationFails: true,
  useIssuesForAllMergeRequests: false,
  filterByLabel: undefined,
  trimOversizedLabelDescriptions: false,
  skipMergeRequestStates: [],
  skipMatchingComments: [],
  mergeRequests: {
    logFile: './merge-requests.json',
    log: false,
  },
  commitMap: {
  }
} as Settings;
```

The session cookie can be obtained from the browser's Developer tools -> Application -> Cookies -> `_gitlab_session` while you are on the Gitlab page. The tokens for Gitlab and Github need to be obtained from your accounts.

- Run the code:

```bash
npm run start
Transferring Description
Transferring Milestones
Transferring Labels
Transferring Releases
Transferring Issues
...
```

- Once it has finished, go to Github and check that everything is correct. You might see that the commits are not linked to your accounts. That's because they are associated to your BSC emails and not your Gitlab emails. To solve this, create a `.mailmap` file in your repository. Our `.mailmap-gitlab` contained:

```
# FORMAT:
# Omit commit-name or commit-email if same as replace-with.
# git log --pretty="%aN <%aE>%n%cN <%cE>" | sort | uniq
Alba Vilanova Alba Vilanova
Dene Bowdalo Dene Bowdalo
Paula Serrano Sierra Paula Serrano Sierra
Amalia Vradi Amalia Vradi
Francesco Benincasa Francesco Benincasa
```

Then:

```bash
git clone https://github.com/BSC-ES/providentia.git
cd providentia
git filter-repo --mailmap .mailmap --force
git remote add origin https://github.com/BSC-ES/providentia.git
git push --set-upstream origin master --force
git push --mirror --force origin
```

It is also possible that the default branch is not set to master; you can change that from Settings -> General.
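Entries for a `.mailmap` file can also be generated with a small script. The sketch below is illustrative only: `mailmap_line` is a hypothetical helper that is not part of Providentia or node-gitlab-2-github, and the names and addresses are placeholders rather than real accounts. It writes the two git mailmap forms `Proper Name <proper@email> <commit@email>` and `Proper Name <proper@email> Commit Name <commit@email>`.

```python
def mailmap_line(name, email, commit_name, commit_email):
    """Format one .mailmap entry (hypothetical helper).

    name / email: the identity that should appear after rewriting.
    commit_name / commit_email: the identity recorded in the old commits.
    """
    if commit_name == name:
        # Same name: only the email needs to be replaced
        return f"{name} <{email}> <{commit_email}>"
    # Name and email are both replaced
    return f"{name} <{email}> {commit_name} <{commit_email}>"


# Placeholder identities, not real users
entries = [
    ("Jane Doe", "jane@github.example", "Jane Doe", "jane@work.example"),
    ("John Roe", "john@github.example", "jroe", "jroe@work.example"),
]

for entry in entries:
    print(mailmap_line(*entry))
```

The printed lines can be pasted into `.mailmap` before running `git filter-repo --mailmap .mailmap`.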
## Run tests

To run all the pipeline tests on your local machine and read the data from the folder `tests/data`, you will need to add `return "github"` in the function `get_machine()` in `auxiliar.py` and then run:

```bash
conda activate providentia-env_v[version]
pytest tests
```

If you want to run a specific set of tests (from: test_apply_filter, test_make_plot, test_read_data, test_save, test_unit_converter), you can do so by specifying the set:

```bash
pytest tests/test_read_data.py
```

To run a specific test, you will need to edit these files and comment out the functions you are not interested in testing.

If you want to recreate the expected data in the tests, temporarily set the variable `GENERATE_OUTPUT` in `tests/aux_functions.py` to True, run the tests once and then revert it back to False.

If you want to see the coverage report, use:

```bash
coverage report -i -m
```

## Upload to PyPI

Every time a version is released, we should update Providentia in PyPI. To do so, we will first create a source distribution in the folder `dist`:

```bash
python setup.py sdist
```

We can check if it can be installed by doing:

```bash
pip install dist/providentia-X.X.X.tar.gz
```

If everything is correct, we can proceed to upload our distribution to the website. To upload the package we need to install twine:

```bash
pip install twine
```

We also need an API key from PyPI. I recommend creating a file in your home directory called `.pypirc`, with the following content:

```
[pypi]
username = __token__
password = pypi-[key]
```

That way you won't be asked for your credentials anymore.

Make sure the version name has been updated in `__init__.py` and that no more changes are needed, since **it is not allowed to update the repository for that version once it is uploaded even if deleted**. Be very sure that everything is working fine at this point, as the uploaded version cannot be overwritten, only deleted.
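As a sanity check before uploading, the match between the version and the distribution file can be scripted. The sketch below is only illustrative: it assumes the version is stored as a `__version__ = "..."` assignment in `__init__.py`, which this page does not confirm, and the helper names `read_version` and `sdist_matches_version` are hypothetical.

```python
import re


def read_version(init_text):
    """Return the version string found in the text of an __init__.py file.

    Assumes a line such as __version__ = "3.0.1" (hypothetical layout).
    """
    match = re.search(r'''^__version__\s*=\s*["']([^"']+)["']''', init_text, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def sdist_matches_version(sdist_name, version, package="providentia"):
    """Check that the sdist file name follows <package>-<version>.tar.gz."""
    return sdist_name == f"{package}-{version}.tar.gz"


# Example with an in-memory string instead of a real file
example_init = '__version__ = "3.0.1"\n'
version = read_version(example_init)
print(sdist_matches_version(f"providentia-{version}.tar.gz", version))
```

In practice the text would be read from the package's `__init__.py` and the file name taken from the `dist` folder.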
To upload it, you can do:

```
twine upload dist/providentia-X.X.X.tar.gz
```

## Check that all reports are created

To generate reports for all configuration files under the `configurations` folder:

```bash
#!/bin/bash

folder_path="configurations"
error_log="error_log2.txt"

# empty the error log
> "$error_log"

for file in "$folder_path"/*; do
    command="./bin/providentia --report --config=$file"
    echo
    echo
    echo
    echo "Running: $command"
    output=$(eval "$command" 2>&1)
    # store the exit code before other commands overwrite $?
    exit_code=$?

    if [ "$exit_code" -ne 0 ]; then
        error_type=$(echo "$output" | grep -o -E '[a-zA-Z]*Error')
        echo "Command failed with exit code $exit_code."
        echo "File: $file Error: $error_type" >> "$error_log"
    else
        echo "Command executed successfully."
    fi
done
```

## Run Providentia inside Docker container

We first need to build our image:

```bash
docker build -t "providentia-image" .
```

To be able to display the dashboard we will also run these commands:

```bash
export DISPLAY=:1.0
xhost +local:*
```

We can then start the service using `compose.yml`:

```bash
docker compose run providentia bash
```

This will open a "session" inside our container, from which we can run Providentia. Before that, you should update the paths to the volumes (/data, /home/avilanov/software/Providentia) in `compose.yml`, which are used to access the data and repositories on your local machine (outside the Docker container). The changes you make to your local files will be automatically reflected in the container.

## Create the docs for the first time and upload to readthedocs

Install `myst-parser`, `Sphinx` and `Sphinx-rtd-theme` in your environment and add the modules to your `requirements.txt` file.
```bash
conda activate providentia-env_v[version]
pip install myst-parser Sphinx Sphinx-rtd-theme
```

Create a folder called `docs` inside your repository and do:

```bash
cd docs
sphinx-quickstart
```

This will ask you a few questions (we answered yes to the question of whether we want to separate source and build folders). A `conf.py` file, together with other files, will be generated in the `docs/source` folder. You can edit the configuration file. In our case, we added some extensions and plugins:

```
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ["myst_parser", "sphinx.ext.todo", "sphinx.ext.viewcode", "sphinx.ext.autodoc", "sphinx_rtd_theme"]

templates_path = ['_templates']
exclude_patterns = []
myst_heading_anchors = 2

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
html_css_files = [
    'css/custom.css',
]
```

Add your markdown files to `docs/source` and add the .md filenames to `index.rst` in order to show their contents. Then run:

```bash
make clean
make html
```

Here you might get some errors related to the format of your .md files; fix them and run both commands again. This will generate the HTML files under `docs/build`. You can open them in the browser to make sure they look correct.

Now create an account and a project in https://app.readthedocs.org/, and associate it with your project in Github, which must be open source. To show the documentation, you will need to create the file `.readthedocs.yaml` in your repository.
Ours looks like this:

```
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version, and other tools you might need
build:
  os: ubuntu-24.04
  tools:
    python: "mambaforge-22.9"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Create environment
conda:
  environment: environment.yaml
```

Trigger a build from your latest commit containing this file and you should be ready to go.

## Generate the docs

Edit the files under `docs/source`, then navigate to `docs` and simply run:

```bash
conda activate providentia-env_v3.0.1
cd docs
make clean
make html
```

Do not edit anything under `docs/build`, as it gets deleted every time you run `make clean`.

## Create conda environments

### Create local environment

If for some reason you want to create the environment from scratch, you can use:

```
conda env create -f environment.yaml
```

You might get a warning like:

```
WARNING conda.models.version:get_matcher(556): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0
```

This can be removed by updating conda:

```
conda update conda
conda install -n base conda=24.4.0 conda-build=24.3.0
```

Check what the latest versions of [conda](https://github.com/conda/conda/releases) and [conda-build](https://github.com/conda/conda-build/releases) are.

The `conda env create` command creates an environment called `providentia-env_v3.0.1` with Python 3.11.5, installs `cartopy`, `jupyterlab`, `ghostscript` and `dask` with conda, and installs the Python packages from `requirements.txt` using pip.
The equivalent would be:

```bash
conda create -n providentia-env_v3.0.1 python=3.11.5 -c conda-forge --override-channels
conda activate providentia-env_v3.0.1
conda install -c conda-forge cartopy jupyterlab ghostscript dask --override-channels
pip install -r requirements.txt
```

### Create providentia-env_v3.0.1-nord4 in Nord4

```bash
module unload intel
module load GCC/10.2.0
conda create -p /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1-nord4 -y python=3.11.5 -c conda-forge --override-channels
conda activate /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1-nord4
conda install -c conda-forge cartopy jupyterlab ghostscript dask --override-channels
pip install -r requirements.txt
```

### Create providentia-env_v3.0.1 in MN5

```bash
conda create -p /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1 -y python=3.11.5 -c conda-forge --override-channels
conda activate /gpfs/projects/bsc32/repository/apps/conda_envs/providentia-env_v3.0.1
conda install -c conda-forge cartopy jupyterlab ghostscript dask --override-channels
pip install -r requirements.txt
```

## Memory Profiling Code

To memory-profile a function, you can use the `Tracker` context manager from the `memray` Python module.

```python
import memray

def function_to_profile1():
    pass

# everything executed inside the block is recorded in "output_file"
with memray.Tracker("output_file"):
    function_to_profile1()
```

Use the `memray` command in the terminal to create a human-readable output.
### Setup

- If working **locally**, install Memray with:

```bash
python3 -m pip install memray
```

- If working on **MN5**, load the module with:

```bash
module load memray/1.17.1-foss-2023b-Python-3.11.5
```

### Generate Outputs

To create an interactive flamegraph, use:

```bash
memray flamegraph output_file
```

To create a text summary, use:

```bash
memray table output_file
```

To get the detailed stats, use:

```bash
memray stats output_file
```

## Obtain Zenodo Filetrees

Here is the code used to obtain the Zenodo filetrees for the release of the next GHOST versions in Zenodo.

In order to create individual filetrees, run a modified version of the Providentia Zenodo download. First of all, the `download_ghost_network_zenodo` function only has to be run once with `initial_check = False`. To do so, comment out these lines in `configuration.py`:

```python
# download GHOST network
#initial_check_nc_files = download_fun(network, initial_check=True)
#files_to_download = self.select_files_to_download(initial_check_nc_files)
#if not initial_check_nc_files or files_to_download:
download_fun(network, initial_check=False, files_to_download=files_to_download)
```

This makes `download_ghost_network_zenodo` run the following code:

```python
# get the GHOST artifact value for the corresponding network
artifact_network = self.artifact_mapping[network]

# create temporal dir to store the zip file and its tar components
self.temp_dir = os.path.join(self.download_instance.ghost_root, ".temp")
os.makedirs(self.temp_dir, exist_ok=True)

# download zip on the temporal directory
zip_path = self.download_zip(network, artifact_network)

# extract zip on the temporal directory
valid_files_info = self.extract_zip(files_to_download, zip_path, initial_check)

# extract tar on the temporal directory
self.extract_tar(valid_files_info)
```

Before that, modify the `extract_zip` and `extract_tar` functions from the `zenodo.py` module.
```python
def extract_zip(self, network, zip_path):
    # open the ZIP file
    with ZipFile(zip_path, "r") as zipf:
        zipf.extractall(self.temp_dir)
        # list all files inside the ZIP
        self.file_paths = [join(self.temp_dir, f) for f in zipf.namelist() if not f.endswith("/")]

def extract_tar(self, network):
    import tarfile
    import json

    json_dict = {}
    for tar_path in tqdm(self.file_paths, desc=f" Checking {network} tars"):
        with tarfile.open(tar_path, "r:*") as tar:
            # the last three path components are network, resolution and tar name
            _, _, _, _, _, _, _, _, network, resolution, species_tar = tar_path.split('/')
            # remove the 7-character extension from the tar name
            species = species_tar[:-7]
            if network not in json_dict:
                json_dict[network] = {}
            if resolution not in json_dict[network]:
                json_dict[network][resolution] = {}
            if species not in json_dict[network][resolution]:
                json_dict[network][resolution][species] = []

            # keep only the names of the netCDF files
            for member in tar.getmembers():
                file_name = member.name.split('/')[-1]
                if file_name.endswith('.nc'):
                    json_dict[network][resolution][species].append(file_name)
            json_dict[network][resolution][species] = sorted(json_dict[network][resolution][species])

    # save the filetree of this network
    final_path = f"/home/pserrano/providentia/settings/internal/zenodo/zenodo_networks/{self.download_instance.ghost_version}/{network}.json"
    with open(final_path, "w", encoding="utf-8") as f:
        json.dump(json_dict, f, indent=2)

    # read the file back to check that it is valid JSON
    with open(final_path, "r", encoding="utf-8") as f:
        data = json.load(f)
```

After all the networks have generated individual filetrees, join all of them by running this script:

```python
import os
import json
import yaml
import sys

ghost_version = '1.5.1'
path = f"/home/pserrano/providentia/settings/internal/zenodo/zenodo_networks/{ghost_version}"
ld = os.listdir(path)

yaml_path = f"/home/pserrano/providentia/settings/internal/zenodo/zenodo_{ghost_version}.yaml"
with open(yaml_path) as f:
    yaml_obj = yaml.safe_load(f)
valid_networks = yaml_obj.keys()
# remove the ".json" extension from the file names
actual_networks = [i[:-5] for i in ld]

# stop if the networks in the yaml and the generated filetrees do not match
if set(valid_networks)-set(actual_networks) or set(actual_networks)-set(valid_networks):
    print(set(valid_networks)-set(actual_networks))
    print()
    print(set(actual_networks)-set(valid_networks))
    sys.exit()

# join the individual filetrees, one key per network
data = {}
for json_file in sorted(ld):
    full_path = os.path.join(path, json_file)
    with open(full_path) as f:
        d = json.load(f)
    n = list(d.keys())[0]
    data[n] = d[n]

final_path = os.path.join(os.path.dirname(os.path.dirname(path)), f'zenodo_ghost_filetree_{ghost_version}.json')
with open(final_path, 'w') as f:
    json.dump(data, f, indent=4)
```

To see more useful scripts regarding the filetrees, check [this](https://github.com/BSC-ES/providentia/issues/812) issue.
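The core of the modified `extract_tar` is listing the `.nc` members of each tar and storing them under network, resolution and species. The sketch below isolates that step so it can be tried without downloading anything. It is not the Providentia implementation: `list_nc_files`, `add_to_filetree` and `make_demo_tar` are hypothetical helpers, and the network, resolution, species and file names are made up for illustration.

```python
import io
import tarfile


def list_nc_files(tar_bytes):
    """Return the sorted base names of the .nc members of a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        return sorted(
            member.name.split("/")[-1]
            for member in tar.getmembers()
            if member.name.endswith(".nc")
        )


def add_to_filetree(filetree, network, resolution, species, files):
    """Store files under filetree[network][resolution][species]."""
    filetree.setdefault(network, {}).setdefault(resolution, {})[species] = files
    return filetree


def make_demo_tar(names):
    """Build an in-memory tar with empty files (for the example only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name=name), io.BytesIO(b""))
    return buffer.getvalue()


# Made-up archive content, not real GHOST data
demo_tar = make_demo_tar([
    "species_a/species_a_201802.nc",
    "species_a/species_a_201801.nc",
    "species_a/README.txt",
])
filetree = add_to_filetree({}, "demo_network", "hourly", "species_a", list_nc_files(demo_tar))
print(filetree)
```

Using `setdefault` replaces the three `if ... not in ...` checks, and the non-netCDF member is skipped in the same way as in the function above.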