Educator Developer Blog

Collaborate on research papers with GitHub

shwars
Microsoft
Apr 27, 2021

GitHub is well-known as a platform where software developers host their code and collaborate with their teams on a project. In this blog post, we'll show you how you can use the GitHub model to do the same thing and collaborate seamlessly on your research papers.

 

This blog post is co-authored with Ornella Altunyan, because we believe that GitHub is a great technology and tool to be used beyond pure software development.

Git, GitHub, and how it all works

 

The first thing you’ll want to do is set up Git. Git is the version control system that runs behind the scenes of any GitHub project. It is what allows you to collaborate with others, go back to previous versions of your project, and view changes made by different members of your team. You may want to use Git from the command line, but in the beginning, it might be easier to use the GitHub Desktop client.

 

Projects on GitHub are organized in repositories. You’ll create a new repository for your research paper, and choose who you want to have access. All your files, whether you’re using Markdown, LaTeX, or another typesetting or markup language (more on that later!), will live in this repository. You’ll want to clone the repository so that you have a working copy of your files on your local machine.

 

The source of truth for your paper will live on the main branch of your repository – this branch is initialized when you create your repository. You can create multiple branches for different sections of your paper, and edit and merge them into your main branch when you’re finished. A commit is a snapshot of your repository at a given moment, and it might contain a set of changes that you’ve made to the information on a specific branch.

 

This is just a short introduction to all the features you can take advantage of when you use GitHub to collaborate on your research papers. Keep reading for more information, and a sample workflow that you can use to get started.

 

What should and should not be stored in Git

 

It is important to understand that GitHub is not a replacement for general file storage, and it is not a convenient place to keep binary files. It was originally designed to be used as a source code repository, so what it does well is track changes between text documents. If you are planning to collaborate on Word documents, setting up a shared OneDrive location is a much better choice. For this reason, many people don’t consider GitHub a convenient platform for editing documents. However, scientists often write their papers in a text format, most often TeX or LaTeX, and that is exactly what makes GitHub such a useful collaboration platform for them.
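To make the point about text formats concrete, here is a minimal Python sketch that compares two drafts of a LaTeX fragment line by line. It uses Python's standard difflib module purely as a stand-in: Git computes its diffs with its own algorithms, and the draft text and file names below are made up for illustration.

```python
import difflib

# Two hypothetical drafts of the same LaTeX fragment (made-up content).
draft_v1 = """\\section{Introduction}
We study collaboration on research papers.
Our results are preliminary.
"""

draft_v2 = """\\section{Introduction}
We study collaboration on research papers.
Our results are confirmed by two experiments.
"""


def text_diff(old: str, new: str) -> list:
    """Return a unified diff between two plain-text documents as a list of lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="paper_v1.tex",  # hypothetical file names, used only as labels
            tofile="paper_v2.tex",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print("\n".join(text_diff(draft_v1, draft_v2)))
```

Lines starting with "-" were removed and lines starting with "+" were added. A binary format such as a Word document cannot be compared meaningfully in this line-by-line way, which is why plain-text sources suit this kind of change tracking so well.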

 

Why GitHub?

 

Using Git will give you many advantages:

  • Tracking changes between different editions of a document. Text documents can be easily compared to each other using the GitHub interface. This is useful even when you are working on a paper alone, because all changes are tracked, and you can always roll back to any previous state.

  • Working on different branches of the document and merging branches together. There are a few different styles of using Git for collaboration, so-called Git workflows. With branches, you and your collaborators can all work on specific parts of your project without conflicts, for prolonged periods of time.

  • Accepting contributions to your paper/code from outside. GitHub has a convenient mechanism called pull requests: suggestions from other users that you can review, approve, and merge into the main content. For example, the Web Development for Beginners course was developed and hosted on GitHub originally by a group of around 10 people, and now it has more than 50 contributors, including people who are translating the course into different languages.

  • If you are very advanced (or have some friends who are into DevOps), you can set up GitHub Actions to automatically build a new PDF version of your paper every time changes are made to the repository.

 

LaTeX or Markdown?

 

Most scientists write their papers in LaTeX, mostly because it fits well into academic workflows and gives easy access to things like paper templates. There are also some good collaboration platforms specific to TeX, for example, Overleaf. However, such a platform won't give you the same full control over versioning and collaboration that Git does.

 

Writing in LaTeX also requires quite a bit of overhead, meaning that many layout features are quite verbose, for example:

\subsection{Section 1}
\begin{itemize}
  \item Item 1
  \item Item 2
\end{itemize}

In the world of software development, there is a language for writing formatted text documents – Markdown. Markdown looks just like a plain text document. For example, the text above would be formatted like this:

## Section 1

* Item 1
* Item 2

This document is much easier to read as plain text, but Markdown processors can also turn it into a nice-looking formatted document. There are also ways to include TeX formulae in Markdown using specific syntax.
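To show that the two snippets above describe the same structure, here is a toy Python converter that maps the Markdown example onto the LaTeX one. It is only a sketch that handles "## " headings and "* " bullets. The function name markdown_to_latex is made up for this example, and a real Markdown processor covers far more syntax.

```python
def markdown_to_latex(markdown: str) -> str:
    """Convert a tiny Markdown subset ('## ' headings, '* ' bullets) to LaTeX."""
    output = []
    in_list = False
    for line in markdown.splitlines():
        stripped = line.strip()
        is_item = stripped.startswith("* ")
        # Close an open itemize environment when the list ends; blank lines
        # between items do not end the list.
        if in_list and stripped and not is_item:
            output.append("\\end{itemize}")
            in_list = False
        if stripped.startswith("## "):
            output.append("\\subsection{" + stripped[3:] + "}")
        elif is_item:
            if not in_list:
                output.append("\\begin{itemize}")
                in_list = True
            output.append("  \\item " + stripped[2:])
        elif stripped:
            output.append(stripped)  # plain text passes through unchanged
    if in_list:
        output.append("\\end{itemize}")
    return "\n".join(output)


sample = """## Section 1

* Item 1
* Item 2
"""

if __name__ == "__main__":
    print(markdown_to_latex(sample))
```

Run on the Markdown example, the function reproduces the LaTeX snippet shown earlier, which illustrates how much of the LaTeX markup is layout overhead rather than content.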

 

In fact, I've been writing all of my blog posts and most text content in Markdown for a few years, including posts with formulae. For scientific writing, a great Markdown processor (and live editing environment) with TeX integration is Madoko, and I highly recommend you check it out. You can use it from the web interface (which has GitHub integration), and there's also an open-source command-line tool to convert your Markdown writing either into LaTeX or directly to PDF.

 

While you may continue using LaTeX with Git, I encourage you to look into Markdown-based writing options. By the way, if you have some writing in other formats, such as Microsoft Word documents, you can convert it to Markdown using a tool called Pandoc.
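As a rough illustration of that conversion, the sketch below wraps a Pandoc call in Python. It assumes Pandoc is installed and available on your PATH. The file names draft.docx and draft.md and both helper functions are hypothetical, and the sketch relies on Pandoc guessing the input and output formats from the file extensions.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: str, target: str) -> list:
    """Build the argument list for converting `source` into `target`.

    Pandoc infers the input and output formats from the file extensions.
    """
    return ["pandoc", source, "-o", target]


def convert(source: str, target: str) -> None:
    """Run Pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own documents.
    docx = Path("draft.docx")
    if docx.exists():
        convert(str(docx), "draft.md")
```

Converted files usually need some manual cleanup, so it is worth committing the raw conversion first and making your edits in a separate commit.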

 

Sample workflow

 

The main thing Git does is let you structure your writing (whether it is code or a scientific paper) into chunks called commits. Your files are tracked in the local repository, and once you have made some changes, you need to explicitly commit them. You can then synchronize your commits with others through a shared remote repository, called the upstream.

Sounds complicated? When using GitHub Desktop, most of these tasks are automated for you. Below we describe the simplest way you can collaborate on a paper with your colleagues.

  1. Create a new repository on GitHub. I set the visibility to Private so I can decide which collaborators I’d like to invite to contribute later.
     

     

  2. Select Set up in Desktop to quickly set up your repository in GitHub Desktop. 

  3. Next, you’ll need to create a local clone of the repository on your machine. You may be prompted to reauthenticate to GitHub during this step. 



  4. I already have a couple of Markdown files that I’ve started working on saved to my computer. I can select View the files of your repository in Finder to open the folder where my local copy of the repository is stored, and drag in the files for my Table of Contents, Section 1, and Bibliography from my computer.

     

     

  5. Now, when I go back to GitHub Desktop, I can see those files have been added to my repository. I want to commit those files to the main branch. I can also publish my branch to push those changes to GitHub, and make them accessible to others who I’ll collaborate with. 



  6. Next, I’m going to create a new branch so I can go off and work on Section 2 of my paper. I’ll automatically end up on that branch after it has been created. There are a couple of options you’ll be able to select from for making changes to your file in this branch:
    • You can create a Pull Request from your current branch – if I wanted my colleague to be able to review the changes I’ve made in this branch, I’d use this option and send her the PR for review.
    • You can also open the repository in your external editor. I use VS Code to edit my files, so I can add section 2 of my paper there, and then commit it to my section2 branch.
    • If I already have section 2 of my paper saved somewhere on my computer, or if my colleague has sent me something they’ve worked on, I can follow the same workflow as above and check out the files in my repository on my machine, and add/remove files that way.
    • If I just need to make a small change, I’d open my repository in the browser and edit from there.

       

       

  7. I can open my repository in GitHub to check out all of the files and information. This is the link I’d send to a colleague if I wanted them to be able to clone the code onto their local machine, and help me out with some sections. 


    Since I’ve made my repository private, I’ll need to add collaborators in the Settings pane. 

     

  8. Once I’m happy with Section 2 of my paper, I can go ahead and merge it into the main branch of my repository. I switch over to the main branch, then choose a branch to merge into main, and choose section2. Then, I’ll want to push my changes back up to GitHub so that the main branch is updated with the newest changes for any future collaborators. 


This is one example of a Git workflow you can use in conjunction with GitHub Desktop to collaborate on a research paper with your colleagues. There are several other ways that may serve your needs better—you may want to use the command line with VS Code, or edit your files on GitHub in the browser. Whatever method works for you is the best method, as long as you’re able to accomplish your goals.
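If you prefer the command line, the branch-and-merge steps from the workflow above correspond roughly to the Git commands assembled in this Python sketch. The branch name section2, the file name section2.md, and the remote name origin (the name Git gives the remote by default when you clone) are assumptions for illustration, and the helper functions are hypothetical.

```python
import shlex
import subprocess


def section_workflow(branch: str, files: list, message: str, main: str = "main") -> list:
    """Return the Git commands for writing one section on its own branch."""
    return [
        ["git", "checkout", "-b", branch],  # create the branch and switch to it
        ["git", "add", *files],             # stage the new or edited files
        ["git", "commit", "-m", message],   # record a snapshot on the branch
        ["git", "checkout", main],          # return to the main branch
        ["git", "merge", branch],           # bring the section into main
        ["git", "push", "origin", main],    # publish main to the remote
    ]


def run_all(commands: list, repo_dir: str) -> None:
    """Execute each command in the cloned repository, stopping on the first error."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for command in section_workflow("section2", ["section2.md"], "Add Section 2"):
        print(shlex.join(command))
```

The sketch only prints the commands. To actually run them, you would call run_all with the path to your local clone, after the files for the new section have been written.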

Updated Apr 27, 2021
  • Reading this article took me back to the mid-80s, when I used vi/nroff/troff to produce beautiful documents. Funny how things have gone full circle. Is this the end of WYSIWYG and the start of a move back to character terminals? 😉

  • tweenturbo
    Copper Contributor

    How to Communicate with the API
    Determine the Request Parameters
    You interact with the Closure Compiler service by making HTTP POST requests to the Closure Compiler server. With every request you must send at least the following parameters:

    js_code or code_url
    The value of this parameter indicates the JavaScript that you want to compile. You must include at least one of these parameters, and you can include both. The js_code parameter must be a string that contains JavaScript, such as alert('hello'). The code_url parameter must contain the URL of a JavaScript .js file that's available via HTTP.

    You can also include named source parameters in the form js_code:path/to/filename.js. Each file will be created in a virtual filesystem, enabling standardized modules via the import and export statements supported in ECMASCRIPT6.

    compilation_level
    The value of this parameter indicates the degree of compression and optimization to apply to your JavaScript. There are three possible compilation levels: WHITESPACE_ONLY, SIMPLE_OPTIMIZATIONS, and ADVANCED_OPTIMIZATIONS. This example uses WHITESPACE_ONLY compilation, which just strips comments and whitespace.

    The compilation_level parameter defaults to a value of SIMPLE_OPTIMIZATIONS.

    output_info
    The value of this parameter indicates the kind of information that you want from the compiler. There are four possible kinds of output: compiled_code, warnings, errors, and statistics. This example uses the value compiled_code, which tells the Closure Compiler service to output the compressed version of the JavaScript it receives in the request.

    output_format
    The format for the Closure Compiler service's output. There are three possible output formats: text, json, or xml. This example uses the value text, which outputs raw text.

    The output_format parameter defaults to a value of text.

    For more information about these required parameters and additional optional parameters, see the API Reference.

    The example in this introductory tutorial just sends one line of raw JavaScript to the Closure Compiler service, so it uses js_code instead of code_url. It uses a compilation_level of WHITESPACE_ONLY, asks for raw text output with an output_format of text, and asks for an output_info type of compiled_code.

    Make a Post Request to the Closure Compiler Service
    To get output from the Closure Compiler service, send the parameters you chose in Step 1 in a POST request to the Closure Compiler service API URL. One way to do this is with a simple HTML form like the one in the Hello World of the Closure Compiler Service API.

    To use a form like this during development, however, you would have to copy the output out of the browser and paste it into a .js file. If, instead, you write a small program to send the request to the Closure Compiler service, you can write the Closure Compiler output straight to a file. For example, the following python script sends the request to the Closure Compiler service and writes out the response:


    #!/usr/bin/env python3

    import http.client
    import sys
    import urllib.parse

    # Define the parameters for the POST request and encode them in
    # a URL-safe format.
    params = urllib.parse.urlencode([
        ('js_code', sys.argv[1]),
        ('compilation_level', 'WHITESPACE_ONLY'),
        ('output_format', 'text'),
        ('output_info', 'compiled_code'),
    ])

    # Always use the following value for the Content-type header.
    headers = {"Content-type": "application/x-www-form-urlencoded"}
    conn = http.client.HTTPSConnection('closure-compiler.appspot.com')
    conn.request('POST', '/compile', params, headers)
    response = conn.getresponse()
    # The response body arrives as bytes; decode it before printing.
    data = response.read()
    print(data.decode('utf-8'))
    conn.close()
    Note: To reproduce this example, Windows users may need to install Python. See the Python Windows FAQ for instructions on installing and using Python under Windows.

    This script optimizes JavaScript passed to it as a command line argument. Paste the above code into a file called compile.py, change the permissions of the file to make it executable, and execute the following command:


    $ python compile.py 'alert("hello");// This comment should be stripped'
    This command prints out the compressed code from the Closure Compiler response:


    alert("hello");
    Because this example uses basic compilation, the compiler doesn't do anything other than strip off the comment.

    Here are a few things to note about this script:

    The parameters are passed to the request method of the HTTPSConnection as a URL-encoded string. After the call to urlencode, the params variable contains the following string:

    js_code=alert%28%22hello%22%29%3B%2F%2F+This+comment+should+be+stripped&compilation_level=WHITESPACE_ONLY&output_format=text&output_info=compiled_code

    If you write your own script, the script should post URL-encoded content like this.
    The request must always have a Content-type header of application/x-www-form-urlencoded