Code and Data repositories#
Best Practices in Modern Software Development: 23.11.23
Henrik Finsberg
Why version control systems?#
To keep a history of what has been changed and why
To make it easy to go back to a previous version
To make changes while maintaining a working version
Image is used under a CC-BY 4.0 license. DOI: 10.5281/zenodo.3332807.
Setting up your first project#
Templates
Copy all files from an existing repository
Cookiecutters
Script that generate the files based on a few questions
What is GitHub?#
A place to host your remote repositories
To make it easier to collaborate with others
To keep a backup (on GitHub)
Create repositories under your group’s GitHub organization (this will make it easier if you are unavailable)
Code review process#
Open issue describing the problem. Use the issue tracker to discuss whether this needs to be solved.
Clone (or fork) the repository
Create a new branch (e.g
git checkout -b finsberg/fix-typo-in-readme
)Push your changes to this new branch
Create a pull request, refer to the issue and continue the discussion
Do you need to use branches?#
If you work alone and don’t work on multiple things at the same time, the it is probably OK to just work on the main
branch
Demo / Exercise#
Create a new repo on GitHub (ex:
example-paper
)Run the cookiecutter and add all the files in a single commit
git init # Initialize git remote add origin <url> # Add a new remote git add . # Add all files git commit -m "Add files" # Commit
Push the code to the main branch
git push -u origin main
Create a new brach and make some changes to the README file
git checkout -b edit-readme
Push the branch and open a PR
git push origin edit-readme
Versioning#
When you think that your code is ready for external users, it is time to create your first release
Your code should get a version number.
Create a release when you submit your paper.
MAJOR.MINOR.MICRO
Specify the version number in
pyproject.toml
Semantic or Calendar based versioning
Calendar based versioning#
YEAR.MONTH.DAY
YEAR.MONTH.NUMBER
YEAR.NUMBER
…
e.g 2023.11.4
Semantic versioning#
major.minor.micro
e.g0.1.2
Bump micro / patch: Bug fixes not affecting the API
Bump minor: Backward compatible API additions/changes
Bump major: Backward incompatible API changes
Typically start with
0
major version and bump to1
when ready for users.
Publish a new release#
Bump version in
pyproject.toml
Create a git tag once you have bumped the version
git tag v0.1.2 git push --tags
Create a release on GitHub and write a changelog
It is also possible to create a tag during this step
Write a changelog#
List the notable changes since the previous release
For the first release you don’t need a changelog
Information about changes are important for the users
Demo / Exercise: bump the version#
Change the version of your paper
Create a new tag and release
Licenses#
What can other users do with the material in your repository?
No license means the nobody can use, copy, distribute, or modify the work without consent from the author
Add a file called LICENSE to your repository. Go to GitHub, click “Add file” and type the name
LICENSE
and GitHub will provide you with some options
What license to choose?#
MIT: Permissive - Others can use your code in any way, and you will not be sued if the software doesn’t work (recommended in most cases)
GPL: Copyleft - derivative work must use the same license - good way to embrace open source but often problematic for commercial companies
LGPL: Similar to GPL but software can be used under different license
CC-BY-4.0 - Typically used for creative work (most journals use this)
Exercise:#
You want to use dolfinx: FEniCS/dolfinx
What license do you need to use?
You can choose if you want to use GPL or LGPL.
If you choose GPL, you need to use GPL license
If you choose LGPL, you can choose another license
Exercise:#
You would like to implement a new feature in dolfinx and use it in your code: FEniCS/dolfinx
What license do you need to use?
GPL or LGPL
Are you allowed to copy code from a repo with MIT license into your own repo?
MIT License
Copyright (c) 2023 Henrik Finsberg
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Yes, but you need to copy the license in to your repo.
Example: ComputationalPhysiology/mps
Data repositories and data sharing#
Large datasets (more than 50MB) should not be stored in your git repository
Git does not work well with binary files
Instead you should store large files in a data repository
Use Google Drive / Dropbox / Other while developing
Publish Data on Zenodo when ready (Zenodo and GitHub integrates well)