GitHub is well known as a platform where software developers host their code and collaborate with their teams. In this blog post, we'll show you how to apply the same GitHub model to collaborate seamlessly on your research papers.
This blog post is co-authored with Ornella Altunyan, because we believe that GitHub is a great technology and tool to be used beyond pure software development.
The first thing you'll want to do is set up Git. Git is the version control system that runs behind the scenes of any GitHub project: it is what allows you to collaborate with others, go back to previous versions of your project, and view changes made by different members of your team. You may want to use Git from the command line, but in the beginning it might be easier to use the GitHub Desktop client.
Projects on GitHub are organized in repositories. You'll create a new repository for your research paper and choose who you want to have access. All your files, whether you're using Markdown, LaTeX, or another typesetting or markup language (more on that later!), will live in this repository. You'll want to clone the repository so that you have a working copy of your files on your local machine.
The source of truth for your paper will live on the main branch of your repository, which is initialized when you create the repository. You can create separate branches for different sections of your paper, edit them independently, and merge them into your main branch when you're finished. A commit is a snapshot of your repository at a given moment; each commit records the set of changes you've made on a specific branch since the previous commit.
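For readers who prefer the command line, the branch-and-merge cycle described above can be sketched as a short sequence of Git commands. The Python snippet below only assembles and prints that sequence; the branch name `methods-section` and the commit message are hypothetical, and it assumes your default branch is called `main` and that your Git version provides `git switch`.

```python
import shlex


def branch_workflow_commands(branch, message):
    """Return the Git commands for drafting one section on its own branch."""
    return [
        ["git", "switch", "-c", branch],    # create the branch and switch to it
        ["git", "add", "."],                # stage the edits made on that branch
        ["git", "commit", "-m", message],   # record a snapshot of those edits
        ["git", "switch", "main"],          # go back to the main branch
        ["git", "merge", branch],           # bring the section into main
    ]


if __name__ == "__main__":
    # Hypothetical branch name and commit message, for illustration only.
    for command in branch_workflow_commands("methods-section", "Draft the methods section"):
        print(shlex.join(command))
```

Printing the commands rather than running them keeps the sketch safe to try outside a repository; you would edit your files between the first and second command.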
This is just a short introduction to all the features you can take advantage of when you use GitHub to collaborate on your research papers. Keep reading for more information, and a sample workflow that you can use to get started.
It is important to understand that GitHub is not a replacement for file storage, and it is not well suited to binary files. It was originally designed as a source code repository, so what it does well is track changes between versions of text documents. If you are planning to collaborate on Word documents, setting up a shared OneDrive location is a much better choice. For this reason, many people don't consider GitHub a convenient platform for editing documents together. However, scientists often write their papers in a plain-text format, most commonly TeX or LaTeX, and that is exactly the kind of content Git tracks well. This is one of the reasons we believe GitHub is a very useful collaboration platform for scientists.
Using Git will give you many advantages:
Tracking changes between different editions of a document. Text documents can be easily compared to each other using the GitHub interface. This is useful even when you are working on a paper alone, because all changes are tracked, and you can always roll back to any previous state.
Working on different branches of the document and merging branches together. There are a few different styles of using Git for collaboration, so-called Git workflows. With branches, you and your collaborators can each work on specific parts of the project for prolonged periods without interfering with one another, and reconcile the changes when you merge.
Accepting contributions to your paper or code from outside. GitHub has a convenient mechanism called pull requests: suggestions from other users that you can review, approve, and merge into the main content. For example, the Web Development for Beginners course was originally developed and hosted on GitHub by a group of around 10 people, and now it has more than 50 contributors, including people who are translating the course into different languages.
Building your paper automatically. If you are very advanced (or have some friends who are into DevOps), you can set up GitHub Actions to create a new PDF version of your paper every time changes are made to the repository.
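The first advantage in this list, comparing editions of a text document, is easy to try on a small scale. The sketch below uses Python's standard `difflib` module to print a unified diff between two hypothetical versions of a paragraph. It illustrates the kind of line-by-line comparison that Git and GitHub display, not how Git itself is implemented; the file labels and sentences are made up for the example.

```python
import difflib

# Two hypothetical editions of the same paragraph, one sentence per line.
old_version = [
    "We evaluate the model on three datasets.\n",
    "The results are shown in Table 1.\n",
]
new_version = [
    "We evaluate the model on four datasets.\n",
    "The results are shown in Table 1.\n",
]


def diff_versions(old_lines, new_lines):
    """Return the unified diff between two editions as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="paper.md (old)",
            tofile="paper.md (new)",
        )
    )


if __name__ == "__main__":
    # Removed lines start with "-", added lines with "+", unchanged lines with a space.
    print("".join(diff_versions(old_version, new_version)), end="")
```

Keeping one sentence per line in your source files, as in this example, tends to make such diffs much easier to read than diffs of long paragraph-length lines.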
Many scientists write their papers in LaTeX, largely because so much of the academic workflow is built around it, such as the paper templates provided by journals and conferences. There are also some good collaboration platforms specific to TeX, for example, Overleaf. However, Overleaf won't give you the same degree of control over versioning and collaboration that Git does.
LaTeX markup is also quite verbose, even for simple layout features such as a heading followed by a list, for example:
\subsection{Section 1}
\begin{itemize}
\item Item 1
\item Item 2
\end{itemize}
In the world of software development, there is a language for writing formatted text documents – Markdown. Markdown looks just like a plain text document. For example, the text above would be formatted like this:
## Section 1
* Item 1
* Item 2
This document is much easier to read as plain text, and Markdown processors can still turn it into a nicely formatted document. There are also ways to include TeX formulae in Markdown using specific syntax, which depends on the processor you use.
In fact, I've been writing all of my blog posts and most text content in Markdown for a few years, including posts with formulae. For scientific writing, a great Markdown processor (as well as live editing environment) integrated with TeX is Madoko, and I highly recommend you check it out. You can use it from the web interface (which has GitHub integration), and there's also an open-source command-line tool to convert your Markdown writing either to LaTeX or directly to PDF.
While you may continue using LaTeX with Git, I encourage you to look into Markdown-based writing options. By the way, if you have some writing in other formats, such as Microsoft Word documents, it can be converted to Markdown using a tool called Pandoc.
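To make the correspondence between the two snippets concrete, here is a toy converter written in Python. It is only a sketch that handles the two constructs shown above (`## ` headings and `* ` bullets), passes every other line through unchanged, and does not escape LaTeX special characters; it is not how any real Markdown processor works.

```python
def markdown_to_latex(markdown_text):
    """Convert '## ' headings and '* ' bullets to LaTeX; pass other lines through."""
    output = []
    in_list = False
    for line in markdown_text.splitlines():
        is_item = line.startswith("* ")
        if is_item and not in_list:
            output.append(r"\begin{itemize}")   # first bullet opens the list
            in_list = True
        elif not is_item and in_list:
            output.append(r"\end{itemize}")     # first non-bullet closes it
            in_list = False
        if is_item:
            output.append(r"\item " + line[2:])
        elif line.startswith("## "):
            output.append(r"\subsection{" + line[3:] + "}")
        else:
            output.append(line)
    if in_list:
        output.append(r"\end{itemize}")         # close a list that ends the text
    return "\n".join(output)


if __name__ == "__main__":
    print(markdown_to_latex("## Section 1\n* Item 1\n* Item 2"))
```

Run on the three-line Markdown example, this reproduces the five-line LaTeX example, which shows how much of the LaTeX version is markup rather than content.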
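As a rough illustration of that conversion step, the sketch below wraps a Pandoc call in Python. It assumes Pandoc is installed and on your PATH, and it relies on Pandoc inferring the input and output formats from the file extensions; the file names `draft.docx` and `draft.md` are hypothetical.

```python
import shutil
import subprocess


def pandoc_command(source, target):
    """Build a Pandoc call; formats are inferred from the file extensions."""
    return ["pandoc", source, "-o", target]


def convert(source, target):
    """Run the conversion, failing early if Pandoc is not available."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command without running it.
    print(" ".join(pandoc_command("draft.docx", "draft.md")))
```

Converted documents usually need some manual cleanup, so it is worth reviewing the Markdown output before committing it to your repository.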
The main thing Git does is let you structure your work (whether it is code or a scientific paper) into chunks called commits. Your files are tracked in the local repository, and once you have made some changes, you need to explicitly commit them. You can then synchronize your commits with others through a shared remote repository, often referred to as the upstream.
Sounds complicated? When you use GitHub Desktop, most of these tasks are automated for you. Below we describe the simplest way you can collaborate on a paper with your colleagues.
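The edit, commit, and synchronize cycle can be sketched as four Git commands. The Python helper below assembles them and, by default, only prints them (a dry run); the commit message and the file name `paper.md` are hypothetical, and actually running the commands assumes you are inside a cloned repository whose remote is already configured.

```python
import shlex
import subprocess


def commit_and_sync_commands(message, paths=(".",)):
    """Return the Git commands for one edit-commit-sync cycle, in order."""
    return [
        ["git", "add", *paths],             # stage the edited files
        ["git", "commit", "-m", message],   # record a snapshot locally
        ["git", "pull"],                    # fetch and merge collaborators' commits
        ["git", "push"],                    # publish your commits to the remote
    ]


def run_all(commands, repo_dir=".", dry_run=True):
    """Print each command; execute it inside repo_dir unless dry_run is set."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical message and file name; dry_run=True means nothing is executed.
    run_all(commit_and_sync_commands("Revise the introduction", ["paper.md"]))
```

Pulling before pushing is a common habit: it brings in your collaborators' commits first, so any conflicts surface on your machine rather than as a rejected push.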
Create a new repository on GitHub. I set the visibility to Private so I can decide which collaborators I’d like to invite to contribute later.
Select Set up in Desktop to quickly set up your repository in GitHub Desktop.
Next, you’ll need to create a local clone of the repository on your machine. You may be prompted to reauthenticate to GitHub during this step.
Now, after adding my paper's files to the local repository folder, I can go back to GitHub Desktop and see that those files show up in my repository. I want to commit those files to the main branch. I can also publish my branch to push those changes to GitHub, which makes them accessible to the people I'll collaborate with.
I can open my repository in GitHub to check out all of the files and information. This is the link I’d send to a colleague if I wanted them to be able to clone the code onto their local machine, and help me out with some sections.
Since I’ve made my repository private, I’ll need to add collaborators in the Settings pane.
This is one example of a Git workflow you can use in conjunction with GitHub Desktop to collaborate on a research paper with your colleagues. There are several other ways that may serve your needs better—you may want to use the command line with VS Code, or edit your files on GitHub in the browser. Whatever method works for you is the best method, as long as you’re able to accomplish your goals.