Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Introduction to Git

What is Git?

Git is a ‘free and open source distributed version control system’.

PhD comics

Many of us will have had an experience similar to this. We are working on a dataset and we attempt to maintain multiple versions of this data, all at different stages of completeness, many of which may have comments, suggestions or other forms of input from other people.

Version control addresses this problem by recording the changes made to a file as we add to it, edit it, and improve it. Each ‘commit’ we make (‘commit’ is version control terminology for ‘save’) is recorded. We can go back to earlier versions of that file, or of other files we have committed, and look at the changes have been made.

Distribution of Management

True version control allow us to take this one step further. It can not only record changes to a file one person is working on but lets multiple people work on that same file and record the changes they make to it. It is then possible to merge these multiple files (‘versions’) into one.

This method came about to help developers work collaboratively on coding projects. With version control, groups did not have to wait for someone else to finish working, nor did they have compare changes manually (and laboriously).

When we make ‘commits’, we record a range of metadata about that change. As librarians, we might already know creating metadata is useful but an example of what information is recorded will illustrate why the information recorded by version control systems in particular is useful. ‘Commits’ record the time and date a commit was made. Although we can often see information about when we last edited or saved say, a Word document, the document itself reflects only the most recent version - all earlier versions are lost, unless we have saved them under different names.

When we make commits in a version control system, we can record a message explaining, in as much detail as we like, what changes we have made to a document or file. This makes it especially useful for collaborating. Rather than sending an email with a document with track changes and some comments, we can include all that information with the document itself. This makes it easy to get an overview of changes that have been made to a document by looking at a log of all the changes that have been made over time. And all earlier versions of the documents still remain in their original form, should we ever wish to ‘roll back’ to them.

Git vs. GitHub

We often hear the terms Git and GitHub used interchangeably but they are slightly different things. Git refers to the software and principles used for a particular flavour of version control system, also called VCS in short (there are other systems such as Mercurial and SVN). GitHub is a popular Web site which hosts Git repositories. The majority of the content that GitHub hosts is open source software, though increasingly it is being used for other projects such as open access journals, blogs, and constantly updated text books. GitHub is a great place to learn how to use Git but once you have learned the ideas and processes behind GitHub you can use Git on other storage systems. You can even host repositories on your own server or computer if you want to keep your files private or if you wanted to encrypt your repository. You can get private repositories on GitHub for a fee. Gitlab is an open source Git repository management and hosting software. You can host it on your own server and configure with unlimited private repositories.

Using Git

One of the main barriers to getting started with Git is the language. Although some of the language used in Git is fairly self-explanatory, other terms are not so clear. The best way to get to learn Git language - which consists of a number of verbs, or commands, e.g. add, commit, preceded by the word Git - is by using it but having an overview of the language and the way Git is used will provide a good starting point.

Exercise for our first git repository

We will try to do this session as a group, but those who prefer to go at a slower pace can follow the instructions on the github page.

Some Git commands/language - what they mean and how to use them

Repository: A repository is the place where projects and associated changes are stored. Repositories can contain one single readme file or hundreds of different folders making up the source code for extensive projects. We can create repositories in a number of different ways; we can make our own from scratch, we can fork (copy) an existing repository or we can create a Git repository from an existing folder we have been working on.

init: Create a repository

Whenever we use Git on the command line, we need to preface our command with git. This is so the computer knows we are trying to get Git to do something rather than another program.

To try out some of the Git commands, we can make a repository for today’s session. We can either delete this directory later or keep it as a place to test out new Git commands.

 


Code Block
languagebash
$ mkdir git_test 
$ cd git_test 
$ git init


 

This initiates git_test as a git repository.

If you do an ls now, the repository might seem empty. However, an ls -a will show the hidden files, which includes the new file .git.

This signifies that the directory is now a Git repository. Were the .git file ever to be deleted after you have begun committing files, all versioning of the data would be lost. We now need to configure Git locally - this need only be done once, i.e. the first time we are setting up Git. However, each of the settings can be changed at any time if we decide we want to use a different email address, say, or switch to a different text editor.

 


Code Block
languagebash
$ git config --global user.name "Mary Citizen" 
$ git config --global user.email "mary_citizen@gitmail.com" 
$ git config --global color.ui "auto" 
$ git config --global core.editor "nano -w"


 

Now we have set the directory up as a repository, and configured it locally, we can see how we are going.

Git status

we can use git status at any time to let us know what Git is up to.

If we try it now, we should get something like this

 


Code Block
languagebash
On branch master
Initial commit
nothing to commit (create/copy files and use "git add" to track)


 

This is telling us that we are on the master branch (more on this later) and that we have nothing to commit (nothing to save changes from). If we use list directory we can see currently we don’t have any files to track. Let’s change that by adding a txt file. Touch allows us to create an empty file.

 


Code Block
languagebash
$ ls 
$ touch git_test.txt


 

We now have a txt file. If we try git status again, we will get the following

 

Code Block
languagebash
On branch master
Initial commit
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    git_test.txt

nothing added to commit but untracked files present (use "git add" to track)

This status is Git telling us that it has noticed a new file in our directory that we are not yet tracking. With colorised output, the filename will appear in red.

To change this, and to tell Git we want to track any changes we make to git_test.txt, we use git add

Code Block
languagebash
$ git add git_test.txt

This adds our txt file to the staging area (the area where Git checks for file changes). Because we are not alerted that this has happened, we might want to use git status again.

 

Code Block
languagebash
$ git status
On branch master
Initial commit
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   git_test.txt

 

If we opted for colourised output, we can see that the file has changed colour (from red to green) and Git also tells us that it has got a new file.

Let’s make some changes to this file before we commit it:

Code Block
languagebash
$ nano git_test.txt

Open the file in nano or another text editor.

We should now be able to add some text to our text file. For now let’s just write ‘hello world’. If we try git status again. We should get the following message

Code Block
languagebash
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   git_test.txt

This lets us know that Git has spotted changes to our txt file but that it hasn’t yet ‘staged’ them. This means Git won’t currently record the changes we made. We can add the file to the staging area again

 

Code Block
languagebash
$ git add git_test.txt

 

We can now commit our first changes. Commit is similar to ‘saving’ a file to Git. However compared to saving, a lot more information about the changes we made is recorded and visible to us later.

Code Block
languagebash
$ git commit -m 'hello world'

[master (root-commit) 27c3f19] hello world
 1 file changed, 1 insertion(+)
 create mode 100644 git_test.txt

We can see that one file changed and we made one insertion which was our ‘hello world’. We have now recorded our changes and we can later go back and see when we made changes to our file and decided to add ‘hello world’. We do have a problem now though. At the moment our changes are only recorded on our computer. At the moment, if we wanted to work with someone else, they would have no way of seeing what we’ve done. Let’s fix that. Let’s jump to the GitHub website where we could decide to host some of our work. Hosting here will allow us to share our work with our friends and colleagues but will also allow other people to use or build on our work.

 

Sharing your work

  When we have logged in to GitHub, we will see an option to create a new repository. Let’s make one for the GitHub experiments we are going to do today. 

  • new repository

  • give it a name

 

GitHub will ask you to create README.md, add a license and a .gitignore file. Do not do any of that for now. Once we have created the new repository, we still need to link the repository we have on our computer with the one we’ve just set up on GitHub.

 

$ git remote add origin <web address of your repo.git>

 

This will add a repository on GitHub for our changes to be committed to.

 

Check that it is set up correctly with the command:

 

$ git remote -v

 

Then enter:

 

$ git push -u origin master

 

git push will add the changes made in our local repository to the repository on GitHub. The nickname of our remote is origin and the default local branch name is master. The -u flag tells Git to remember the parameters, so that next time we can simply run git push and Git will know what to do. Go ahead and push it!

 

You will be prompted to enter your GitHub username and password to complete the command.

 

When we do a git push, we will see Git ‘pushing’ changes upstream to GitHub. Because our file is very small, this won’t take long but if we had made a lot of changes or were adding a very large repository, we might have to wait a little longer. We can get to see where we’re at with git status.

 

On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working directory clean

 

This is letting us know where we are working (the master branch). We can also see that we have no changes to commit and everything is in order.

 

We can use git diff to see changes we have made before making a commit. Let’s add another line to our text file.

 

$ open git_test.txt
$ git diff

 

diff --git a/git_test.txt b/git_test.txt index 70c379b..1d55e1a 100644 --- a/git_test.txt +++ b/git_test.txt @@ -1 +1,2 @@ -Hello world \ No newline at end of file
+Hello world +More changes to my first github file \ No newline at end of file

 

We can see the changes we have made.

 

  1. The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
  2. The second line tells exactly which versions of the file Git is comparing; df0654a and 315bf3a are unique computer-generated identifiers for those versions.
  3. The third and fourth lines once again show the name of the file being changed.
  4. The remaining lines are the most interesting; they show us the actual differences and the lines on which they occur. In particular, the + markers in the first column show where we have added lines.

 

We can now commit these changes again:

 

$ git add git_test.txt
$ git commit -m 'second line of changes'

 

Say we are very forgetful and have already forgotten what we changes we have made. git log allows us to look at what we have been doing to our Git repository (in reverse chronological order, with the very latest changes first).

 

$ git status

 

commit 40d3f7af25e19c06fa839a570d51e38fdb374e80
Author: davanstrien <email@gmail.com>
Date:   Sun Oct 18 15:25:19 2015 +0100

    second line of changes

commit 27c3f19c3340d930234d592928b11b63403d0696
Author: davanstrien <email@gmail.com>
Date:   Sun Oct 18 13:27:31 2015 +0100

    hello world

 

This shows us the two commits we have made and shows the messages we wrote. It is important to try to use meaningful commit messages when we make changes. This is especially important when we are working with other people who might not be able to guess as easily what our short cryptic messages might mean.

 

We might get a lit bit lonely working away on our own and want to work with other people. Before we get to that, it is worth learning one more command.

 

git pull

 

We can try to see how this works by making changes on the GitHub website and then ‘pulling’ them back to our computer.

 

Let’s go to our repository. We can see our txt file and make changes. However, you may have noticed only our first change is there. This is because we didn’t push our last commit yet. This might seem like a mistake in design but it is often useful to make a lot of commits for small changes so you are able to make careful revisions later and you don’t necessarily want to push all these changes one by one.

 

Let’s go back and push our changes

 

$ git push

 

Now if we go back to our GitHub repo, we can see all our changes. Let’s add an extra line of text there, and commit these changes. When we commit changes on GitHub itself, we don’t have to push these.

 

Now let’s get the third line on to our computer.

 

$ git pull

 

remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/davanstrien/git_lesson_example
   40d3f7a..ef95e51  master     -> origin/master
Updating 40d3f7a..ef95e51
Fast-forward
 git_test.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

 

If we open our text file again

 

$ open git_test.txt

 

we can see our new lines.

 

When we begin collaborating on more complex projects, we may have to consider more aspects of Git functionality, but this should be a good start. In the second hour we can look more closely at collaborating and using GitHub pages.

 Review

 

Let’s review

 

It is likely that some things won’t have stuck from the last hour. To try to reinforce how things work we can work in groups to develop diagrams to illustrate Git functions and language. This should make carrying out more complicated aspects of Git clearer in our heads.

 

In groups:

 

  • illustrate the concepts discussed in the first hour
  • try to ‘draw’ what different commands mean
  • try to come up with synonyms for what the commands are doing.

 

Exercise - visualising git

 

In group work, spend some time trying to illustrate some of the commands we’ve used with Git:

 

  • try to express git commands in a non ‘git’ way
  • try to visualise what commits are doing to your repository

 

If you want to practise more feel free to keep practicising making changes to your file and committing the changes. If you want to explore more git commands, search for some more online or follow one of the suggested links below.

Group Exercise