Paper with code

Best Practices in Modern Software Development: 23.11.23

Henrik Finsberg

To produce a paper we write code#

We write code to pre-process data
We write code to run simulations
We write code to create figures and tables (post-processing)

You want to publish code along with your paper#

The first time, you can use one of two options:

A cookiecutter
- Will prompt you with some questions
- scientificcomputing/generate-paper
A template repository
- Will copy the template repo
- scientificcomputing/example-paper

The second time, it is OK to copy files from an old project

Write a README file#

The README file should contain info about

Short description of the code / paper
Which paper and how to cite
How to install the dependencies
How to reproduce the results / run the code

Setting up a reproducible environment#

Write a pyproject.toml with the dependencies you need

Use pip-compile (from pip-tools) to compile a requirements.txt with the exact versions you used when creating the results
```
pip-compile --output-file=requirements.txt pyproject.toml
```

Compile extra dependencies

```
pip-compile --extra=docs --output-file=requirements-docs.txt pyproject.toml
```
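Once the versions are pinned, it can be useful to confirm that the environment you are running in actually matches them. The following is a minimal sketch, not part of pip-tools: the helper names `parse_pins` and `find_mismatches` are hypothetical, and the parser only handles plain `name==version` lines of the kind pip-compile writes.

```python
from __future__ import annotations

from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        # Drop comments, environment markers and trailing line continuations
        line = line.split("#", 1)[0].split(";", 1)[0].strip().rstrip("\\").strip()
        # Skip blank lines and options such as --hash=...
        if "==" not in line or line.startswith("-"):
            continue
        name, version = line.split("==", 1)
        # Strip extras such as "package[extra]"
        pins[name.split("[", 1)[0].strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned, installed)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `find_mismatches(parse_pins(Path("requirements.txt").read_text()))` (with `Path` imported from `pathlib`) at the top of a script would then give an empty dictionary when the environment matches the pins.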

Publish docker image with exact dependencies#

Write a Dockerfile which clones the repo and installs the dependencies
Build and push the image to some public registry
- For example, you can set up a GitHub action to push a new image when you create a new release

Add the code for reproducing the results#

Several ways to do this:

Add scripts for reproducing figures and tables
- Add asserts that will raise an error if results have changed
- Example: scientificcomputing/example-paper or scientificcomputing/example-paper-fenics
Add notebooks and execute them as part of building docs
- Also here you can add asserts
- Example: RangamaniLabUCSD/smart
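As an illustration of the assert idea, here is a minimal sketch of a script that fails loudly when a computed result drifts from the value recorded for the paper. The function names, the data, and the reference value `EXPECTED_MEAN` are all hypothetical stand-ins; a real script would run the actual analysis and compare against the numbers reported in the paper.

```python
import math

# Hypothetical reference value, recorded when the paper's results were generated
EXPECTED_MEAN = 2.5


def compute_mean(values):
    """Stand-in for the real analysis step."""
    return sum(values) / len(values)


def check_result(value, expected, rel_tol=1e-9):
    """Raise AssertionError if `value` has drifted from `expected`."""
    assert math.isclose(value, expected, rel_tol=rel_tol), (
        f"Result changed: got {value!r}, expected {expected!r}"
    )


result = compute_mean([1.0, 2.0, 3.0, 4.0])
check_result(result, EXPECTED_MEAN)
```

Comparing with a tolerance rather than exact equality is a design choice: floating-point results can differ slightly between machines and library versions, so the tolerance should be chosen to match what counts as "the same result" for the paper.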

Tips and tricks#

Make it possible to pass command line arguments to the scripts so that you can, for example, change the path to the results or data
- This also makes it easier to run the scripts on a cluster, where the data may live at a different path
Set up CI to run the scripts
- Upload the artifacts after the run
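Command line arguments for the data and results paths could be handled with `argparse` from the standard library. This is a minimal sketch; the option names `--data-dir` and `--results-dir` and their defaults are assumptions chosen for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the paths used by the script; `argv=None` means use sys.argv."""
    parser = argparse.ArgumentParser(description="Reproduce the figures for the paper")
    parser.add_argument(
        "--data-dir", type=Path, default=Path("data"),
        help="Directory containing the input data",
    )
    parser.add_argument(
        "--results-dir", type=Path, default=Path("results"),
        help="Directory where figures and tables are written",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The real post-processing would go here
    print(f"Reading data from {args.data_dir}")
    print(f"Writing results to {args.results_dir}")
    return 0
```

In a script you would call `main()` under an `if __name__ == "__main__":` guard. On a cluster the same script can then be run with, for example, `--data-dir` pointing at a scratch directory, and the CI job can pass a `--results-dir` that it later uploads as an artifact.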

Handling data#

Large datasets should not be stored in git
Data can be stored locally, on Dropbox, or on Google Drive during development
Ideally you should share the data on Zenodo (https://zenodo.org) when publishing the paper. This will make sure you get a DOI for the data.
- It is also possible to upload data with restricted access on Zenodo
Create a script for downloading data
- scientificcomputing/example-paper-fenics
- Dropbox: Use ?dl=1 at the end of the share link to get a direct download
- Google drive: https://drive.google.com/uc?export=download&id=DRIVE_FILE_ID
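A download script built on the two URL patterns above might look like the following sketch, which uses only the standard library. The helper names are hypothetical, and the share link or file ID passed in is whatever your own data uses.

```python
from pathlib import Path
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlretrieve


def dropbox_direct_url(share_url: str) -> str:
    """Force dl=1 on a Dropbox share link so it downloads the file directly."""
    parts = urlsplit(share_url)
    # Keep every query parameter except an existing dl=... value
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def google_drive_direct_url(file_id: str) -> str:
    """Build the Google Drive download URL for a given file ID."""
    return "https://drive.google.com/uc?" + urlencode(
        {"export": "download", "id": file_id}
    )


def download(url: str, target: Path) -> Path:
    """Download `url` to `target` unless the file already exists."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, target)
    return target
```

Skipping the download when the target file already exists keeps repeated runs of the scripts cheap, which helps both locally and in CI.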

Make sure to create a tag / release#

A tag marks a specific snapshot of your repository, and creating one makes it easy to check out that version of the code.

You should create a tag (and a release) of the code

when you submit the paper
when the paper is published (if there are any changes from submission)
if there are bug fixes

Remember to write a changelog when you make a new release, describing what has changed since the previous version.

License and Citation#

Make sure that people can use your code and provide proper attribution

License
- Without a license, others cannot use your code without asking you first
Citation information
- Write a CITATION.cff

Next steps#

Check out the material at scientificcomputing/seminar-23-11-2023
Ask us for help by email, on Slack, or in the office