class: center, middle, inverse, title-slide # Tools for reproducible science ### Marie-Pierre Etienne ###
https://github.com/MarieEtienne/reproductibilite
### Septembre 2023 --- name: motiv # Motivations --- template: motiv ## Reproducible science In their [paper](../resources/bes2.1801.pdf) Alston and Rick [AR21] addresses the question of science reproducibility in ecology The [key aspects](../resources/bes2.1801_crop.pdf) It's a growing trend, more and more journal requires some elements of reproducibility -- ## Practically Where is the correct version of the report? .centering[ <img src="../resources/figs/rapports.png" width="450" height = "300" alt = "versionning" class="center"></img> ] --- template: motiv ## Reproducible science In their [paper](../resources/bes2.1801.pdf) Alston and Rick [AR21] addresses the question of science reproducibility in ecology The [key aspects](../resources/bes2.1801_crop.pdf) It's a growing trend, more and more journal requires some elements of reproducibility -- ## Practically A explanation in images [vidéo](https://youtu.be/s3JldKoA0zw) de [Ignasi Bartomeus](https://bartomeuslab.com/) -- ## Course objective Introducing tools to facilitate step 2. --- # Overview .pull-left[ ## Code / paper writing - .care[ [git](https://git-scm.com/)] for - keeping tracks of the different versions of the code - collective work - review and improve the quality of the code ## Around the code - .care[ [Rmarkdown](https://rmarkdown.rstudio.com/)], .care[ [Jupyter notebooks](https://jupyter.org/)], .care[ [quarto documents](https://quarto.org/)] to bring together code, results of the code, analysis of the code * vignettes in R packages, * R help pages, * some scientific papers, ([Computo](https://computo.sfds.asso.fr/) asks for submissions throuhg Jupyter Notebook or Rmarkdown document.) * dashboards .... ] -- .pull-right[ ## Control version software Reproducibility also relies on stabilizing the programm / libraries versions. The container system .care[ [Docker](https://www.docker.com/) ] allows to freeze a working software environment. ] --- # The winning trio of reproducible research with R - .care[git] - .care[Rmarkdown] - .care[Docker] --- name: gitintro # Introduction to git --- template: gitintro ## What is it ? * Developped by Linus Torvalds: *I'm an egotistical bastard, and I name all my projects after myself. First Linux, now [git](https://www.wordreference.com/enfr/git).* -- * Git is a version control system (the most widely used today). It allows tracking all changes made to code since its creation. * Git is incredibly efficient. --- template: gitintro ## Graphical representation of code development <img src="../resources/figs/git1.svg" width="900" height = "300" alt = "simple branche"></img> Cf example git_example git example ```r git log ``` --- # Configuration The purpose of Git is to keep tracks of different versions of a project and their authors. To do so, you need to be authenticated. Before using, specify your name ## The first use after installation ```r git config --global user.email "your email" git config --global user.name "your name"` git config -l # check ``` --- # Starting from the existing: cloning a remote repository * Clone the repository that will serve as our example `git@github.com:MarieEtienne/MODE_reproductible_example2023.git` in a Git console, type the following command: ```r cd # to navigate to your main directory ls # command to list the contents of a directory git clone repositoryname # command to actually clone the remote repository ls # we're checking what's now in the repository ``` -- * Type the command git log. * Or for a more visual representation, gitk. --- name: contribute # Your first commit * Open the file `penguins.Rmd` and add your name and the list of authors, then save your file. -- * Type the command git log in the console. -- * Enter the following commands: ```r git status # project status git diff ``` -- *Type the command `git add penguins.Rmd`, then repeat the previous commands. * Type the command `git commit -m "Added an author"`, and once again check the project status. -- * Draw the commit tree. -- <img src="../resources/figs/git2.svg" width="800" height = "250" alt = "simple branche"></img> --- template: contirbute ## Process overview <img src="../resources/figs/Vision_globale_git_step1.png" width="800" height = "300" alt = "bilan1"></img> -- ### The snapshot analogy - In the working directory, let's try different poses (the version of the project in the working directory) - When satisfied by the pose of one guy, ask him stop moving (Staging area) - When satisfied with the whole scene, take the photo (Commit in the local repo) --- template: contribute ## Share your progress with your co workers The current version of the project is indeed recorded in the version control system, but you are the only ones who know it. You may want to sharethese changes to the remote repository from which we started. * To view the differences with the remote repository, use `git diff origin/master`. * To send our changes, , use `git push`. * Is the content of the local repository now pushed to the server? * Are the local repository and the remote copy identical? -- `git diff master origin/master` -- * Modify the `penguins.Rmd` file. What will the `git diff master origin/master` command show, and why? * What will the `git diff origin/master` command show? --- template: contribute ## The graphical tree summary .pull-left[ * Before the push <img src="../resources/figs/git3.svg" width="600" height = "200" alt = "bilan1"></img> ] -- .pull-right[ * After the push <img src="../resources/figs/git3_bis.svg" width="600" height = "200x" alt = "bilan1"></img> ] --- template: contribute .pull-left[ ## The graphical tree summary <img src="../resources/figs/Vision_globale_git_step2.png" width="800" height = "300" alt = "bilan1"></img> ] -- .pull-right[ ### The snapshot analogy - In the working directory, let's try different poses (the version of the project in the working directory) - When satisfied by the pose of one guy, ask him stop moving (Staging area) - When satisfied with the whole scene, take the photo (Commit in the local repo) - Push the photo on Insta to share it (remote repo -- push) ] --- template: contribute ## Share your progress with your co workers The remote repo might have changed while we were working on our local version. Before bein allowed to push your modification, you have to accounts for the change made by others ```r git pull ``` -- If no one has modified the same portion of the file as you, Git will smoothly incorporate the changes made by others into your code. --- template: contribute ## Working together: graphical tree for a merge Before merging <img src="../resources/figs/git4.svg" width="680" height = "500" alt = "bilan1"></img> --- template: contribute ## Working together: graphical tree for a merge After merging <img src="../resources/figs/git5.svg" width="680" height = "500" alt = "bilan1"></img> --- template: contribute ## Working together: graphical tree for a merge After the push <img src="../resources/figs/git6.svg" width="680" height = "500" alt = "bilan1"></img> --- template: contribute ## Working together: handling conflicts If the same portion of the file has been modified on the server, Git panics; this is the -- .care[CONFLICT] ```r CONFLICT (content): Merge conflict in penguins.Rmd Automatic merge failed; fix conflicts and then commit the result. ``` -- You must resolve the conflict ```r <<<<<<< HEAD L'objectif de cette classe est de proposer des représentations graphiques de ces données en collaborant à l'aide de l'outil git. Il faut bien une raison d'utiliser git. ======= L'objectif de cette classe est de proposer des représentations graphiques de ces données en collaborant à l'aide de l'outil git. C'est un prétexte pour utiliser git. >>>>>>> conflit ``` * We edit the file to choose the correct version, then `git add, commit, push` ** Please take some time to produce conflicts and solve it ** --- template: contribute ## Working together: Graphical tree overview <img src="../resources/figs/Vision_globale_git_step3.png" width="800" height = "300" alt = "bilan1"></img> --- template: contribute ## Working together: less share, less pain To limit conflicts, conduct tests without interfering with others, there is the concept of *branching*. You work independantly for a while and merge from time to time. -- <img src="../resources/figs/git7.svg" width="800" height = "600" alt = "bilan1"></img> --- template: contribute ## Working together: branches practically speaking * Create your branch ```r git branch -a git branch test git branch -a ``` * To jump on a branch ```r git switch test git branch -a ``` Let's make whatever changes you like and then commit. when you return to the master branch, you will find your project just as you left it before branching. ** Take a moment to try** --- template: contribute ## Working together: branches practically speaking * Let's create your own branch and jump on it `git branch` (`git branch -a` to have the remotes branches also). The stared branch is the one we currently sit on. * Please remove `penguins.Rmd ` then register this change ```r rm penguins.Rmd # to remove the file git status git add penguins.Rmd # add the changes aka it's deletion -- ok it's weird git commit -m "Removing penguins.Rmd" ls #list the content of the working directory ``` * Jump back on the master branch `git switch master` and list the contents of your working directory. -- You may want to suppress an old unuseful branch with ```r git branch -d test ``` -- As the changes stored in this branch have not been pushed on the remote repo, you will lose every changes if you delete the branch, The vigilant Git system is worried about us. Are we not on the verge of losing important work? We reassure it by letting it know that we know exactly what we are doing by using the `-D` option ```r git branch -D test ``` --- template: contribute ## Working together: incorporate the changes made in another branch -- merge * Create a branch dev and switch to this branch. * Add a nice chart to penguins.Rmd and then incorporate your changes into the project on the dev branch. * Push your work to the remote repository (look at Git's instructions on git push options) (optional for now but generally useful). * Switch to the master branch and type the command git merge dev. You have integrated the changes made in dev into the master branch. * Draw the commit tree in this case. --- name: navigate # Navigate from one version to another --- template: navigate ## Go back to a previous version -- checkout To go back to a previous specific version, use the `checkout` command Open the files `penguins.Rmd` and execute ```r git log --oneline git checkout A_specific_hash_version_number ``` The current working version of `penguins.Rmd` is in the state of commit `A_specific_hash_version_number` Go back to the lates version on master by -- ```r git checkout master ``` Everything is back to normal! ** That's the Git Magic" --- template: navigate ## To suppress a local commit (not present on the remote repo) * Remove the file `penguins.Rmd`, and commit this change ```r git log --oneline git reset --hard The_previous_commit_number git log --oneline ``` -- Be very very very careful, it is a destructive process. All information regarding your previous commit is lost. It's a very dangerous way of living with Git, an unfortunate removal of a commit present on the remote server can break the commit trees for all your coworkers and you won't have friends anymore. --- template: navigate ## How to suppress a commit shared on the remote repo * Remove the file `penguins.Rmd` and commit this modification. You can "suppress" this commit by applying the opposit! That's what you do by ```r git log --oneline git revert commit_number_to_revert git log --oneline ``` --- template: gitintro ## Practice within your team, divide and conquer to produce * a short text to present the Palmer Penguins dataset, * insigthful plots of the Palmer penguins dataset, * a statistical analysis to deceide wheter or not the average body mass in the same whatever species you consider, Good Practice : Every one works on its own develoment branch and shae with others when he's ready --- # If we have time `git rebase` `git cherry pick` --- # Last few words * Git is a standard control version system in industry AND research for version control (wheter you use it with Github or not), * Being proficient with Git requires to use it as much as possible: The more problems you have, the more you learn * Go back to the documentation as much as needed * By no means, Git is an alternative to a proper documented code (in R you might want to use package like `formatR`, `styleR` and document it with `roxygen`) --- # Last few words ## Resources * [The Git Book](https://git-scm.com/book/en/v2) * [Atlassian tutorial](https://www.atlassian.com/git/tutorials/what-is-version-control) ## Cheat sheet [Git Cheat sheet](https://education.github.com/git-cheat-sheet-education.pdf) ## Tobe used as a Sandbox [Learning Git Demo](https://learngitbranching.js.org/?locale=fr_FR&NODEMO=) ## Références Alston, J. M. and J. A. Rick (2021). "A Beginner’s Guide to Conducting Reproducible Research". In: _Bulletin of the Ecological Society of America_ 102.2, pp. 1-14.