Best Practices in Modern Software Development: Paper with code

Paper with code

Best Practices in Modern Software Development: 23.11.23

Henrik Finsberg

23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

To produce a paper we write code

  • We write code to pre-process data
  • We write code to run simulations
  • We write code to create figures and tables (post-processing)
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

You want to publish code along with your paper

The first time you can use

  • The second time, it is OK to copy files from an old project
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Write a README file

The README file should contain info about

  • Short description of the code / paper
  • Which paper and how to cite
  • How to install the dependencies
  • How to reproduce the results / run the code
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Setting up a reproducible environment

Write a pyproject.toml with the dependencies you need

  • Compile a requirements.txt with the exact versions you used when creating the results with pip-compile (from pip-tools)
    pip-compile --output-file=requirements.txt pyproject.toml
    
  • Compile extra dependencies
    pip-compile --extra=docs --output-file=requirements-docs.txt pyproject.toml
    
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Publish docker image with exact dependencies

  • Write a Dockerfile which clones the repo and installs the dependencies
  • Build and push the image to some public registry
    • For example, you can set up a GitHub action to push a new image when you create a new release
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Add the code for reproducing the results

Several ways to do this:

23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Tips and tricks

  • Make it possible to pass command line arguments to the scripts so that you can e.g change the path to the results or data
    • This will also make it easier if you e.g need to run the scripts on a cluster where you need to get the data from a different path
  • Set up CI to run the scripts
    • Upload the artifacts after the run
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Handling data

  • Large datasets should not be stored in git
  • Data can be stored locally, dropbox or google drive during development
  • Ideally you should share the data on Zenodo (https://zenodo.org) when publishing the paper. This will make sure you get a DOI for the data.
    • It is also possible to upload data with restricted data on Zenodo
  • Create a script for downloading data
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Make sure to create a tag / release

A tag is a specific snapshot of your repository, and by creating a tag it makes it easy to check out that version of the code.

You should create a tag (and a release) of the code

  • when you submit the paper
  • when the paper is published (if there are any changes from submission)
  • if there are bug fixes
  • Remember to write a changelog if you make a new release with info of what has changed since the previous version.
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

License and Citation

Make sure that people can use your code and provide proper attribution

  • License

    • Without a license, other cannot use your code without asking you first
  • Citation information

    • Write a CITATION.cff
23.11.23 - Henrik Finsberg
Best Practices in Modern Software Development: Paper with code

Next steps

23.11.23 - Henrik Finsberg