Getting Started With Git

In this article I want to talk about Git and how to get started with it. For those new to software development practices it can seem a little daunting, and definitely confusing. I hope to break it down into some simple steps that will help you get started.

Amongst the many themes that run through software development, DevOps and the other connected disciplines are those of working in teams and having a shared understanding of the codebase, ideally with a single source and a history of changes made over time. Git is one of a number of tools that offer this functionality, and is one of the most popular and widely used.

Others such as Mercurial and Subversion are also used, but Git has become the most widely used of them. In this article we’ll only be discussing Git, but the principles are broadly applicable to other tools.

Finally, through most of this article I’ll be using commands for Linux/macOS. If you’re on Windows you can either use the, in my opinion, awesome Windows Subsystem for Linux (WSL) or use PowerShell, where most of the commands will work. The exception is touch, which you can replace with New-Item -ItemType File -Name <file_name>.

What is Git?

Right up front I want to split Git from tools such as GitHub and GitLab. Sure, they have “git” in the name and under the hood they do use Git, but Git is a tool and GitHub and GitLab are services that use Git. You do not need to sign up for GitHub, GitLab, or any other similar service to use Git. We’ll come back to these services in a future post, but for now let’s focus on Git.

So, what is Git? The answer many give is “Git is a version control system”. Let’s unpack that for a moment; what is a version control system? A version control system is a tool that allows you to track changes to a set of files over time. It allows you to see who made changes, when they were made, and what those changes were. It also allows you to revert to previous versions of the files if you need to.

If you’re like me and started out writing a few simple system administration scripts you may have simply saved your files in a tool such as Google Drive or OneDrive and relied on the built-in features of those products to maintain a version history of your files. You might also have developed a naming convention such as my_script_v1.sh, my_script_v2.sh, my_script_v3.sh and so on. These are both examples of version control, but they are very simplistic. The file naming approach relies on manual work: you have to remember to increment the version number, decide when a change warrants a new version, and so on. The cloud sync approach is a little better with modern tools, but they often limit how long, or how many, previous versions of a file are kept, and they struggle when different people make different changes to the same file.

So what do we want, or need from our version control?

We want to be able to track changes to files over time
It would be nice to be able to see who made changes and when
It would be nice to see exact changes made
Reverse changes if needed could be useful
Perhaps it would be handy to be able to work on the same files as others
- What about differing changes to the same file, maybe an in-production fix and a new feature being developed at the same time?

Hopefully you can see that a simple file sync tool is not going to cut it. We need something more powerful. This is where Git comes in. Git gives us all these features and more!

Installing Git

Installing Git is pretty simple. You can download the latest version from the Git website and install it on your machine.

You can also use a package manager to install Git. For example, on a Debian based system you can use apt to install Git:

$ sudo apt install git

All major platforms, including Windows (with winget and Chocolatey) and macOS (with Homebrew) have package managers that can be used to install Git.
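
Whichever route you take, a quick way to confirm that the install worked is to ask Git for its version. This is only a sanity check, and the version number printed on your machine will differ from anyone else's.

```shell
# Print the installed Git version. A "command not found" error means
# Git is not installed or is not on your PATH.
git --version
```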

Getting Started

Before getting into the weeds I want to point you to my Git Knowledge Article, which gives practical tips and tricks for using Git to achieve specific goals. I’ll be updating that article over time with new hints and techniques.

One last thing before we get started: this tutorial, like so many others you’ll find across the internet, covers a few basic commands and examples. Don’t mistake it for a comprehensive guide to everything you can do. Pop a brave pill and experiment as you go through this tutorial. You’ll learn a lot more that way. The worst case scenario is that something doesn’t work quite as intended; if so, delete the repository and start again. You’ll be fine!

Now, let’s get started with Git! The first thing we need to do is create a repository. A repository is a collection of files that Git will track. We can create a repository in a number of ways, but the simplest is to create a new directory and then initialise a new repository in that directory.

mkdir my_repo  # Create a new directory for our repository
cd my_repo   # Change into the new directory
git init  # Initialise a new repository
Initialized empty Git repository in /home/gwatts/Temp/my_repo/.git/  # Returned output

Before running git init all we have is a pretty normal, empty directory. After running git init a hidden .git directory is created. This is where Git stores all the information about the repository. You should never need to edit anything in this directory, but this is where all the magic happens and your data is stored. If for some crazy reason you decide to stop using Git, you can simply delete this directory: all the history will be gone, and just the files currently in the directory will remain.
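
If you'd like to see the hidden directory appear for yourself, here is a minimal sketch. It assumes a throwaway directory created with mktemp (the demo_dir name is just for illustration), so it won't touch my_repo.

```shell
# Work in a throwaway directory so nothing else is affected
demo_dir="$(mktemp -d)"
cd "$demo_dir"

ls -A          # The directory starts out empty
git init -q    # -q suppresses the "Initialized empty Git repository" message
ls -A          # Now the hidden .git directory is listed

# Ask Git where it keeps its data for this repository
git rev-parse --git-dir
```

Run from the top of the repository, the last command prints .git, the same directory described above.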

One other thing that we should do while getting started is to tell git who we are by setting our name and email address. These tasks can be done using the git config command.

git config --global user.name "Graham Watts"  # Set the user name
git config --global user.email "some_email@spoof.com"  # Set the user email

Setting these values globally means that they will be used for all repositories on the machine. If you want to set them for a specific repository you can omit the --global flag when running the command inside that repository. We’ll take a look at git config in more detail another time. If you forget to set these values, Git will either warn you or refuse to commit, depending on your setup, until you set them.
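
As a rough sketch of the per-repository option, the commands below set a name and email without --global and then read them back. The identity and the throwaway directory are placeholders I've made up for the example.

```shell
# Throwaway repository for the demonstration
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Without --global the values are stored in this repository's .git/config only
git config user.name "Example User"
git config user.email "user@example.com"

# Read the values back
git config user.name
git config user.email

# List only the settings stored in this repository
git config --local --list
```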

Understanding the deep mechanisms of git is a story for another time, but for now we can think of the .git directory as a database that stores all the information about the repository.

Adding Files

Now that we have a repository we can start adding files to it. Let’s create a simple file and add it to the repository.

touch my_file.txt  # Create a new file
echo "Hello World" > my_file.txt  # Add some content to the file
git add my_file.txt  # Add the file to the repository file tracking
git commit -m "Added my_file.txt"  # Commit the changes to the file to the repository

Now, let’s unpack what just happened. We created a new file, added some content to it, and then added it to the repository. We then committed the changes to the repository. Let’s look at each of these steps in turn.

We created a new file using the touch command. This created an empty file called my_file.txt.
We added some content to the file using the echo command. The > symbol tells the shell to redirect the output of the echo command to the file my_file.txt. This overwrites the contents of the file with the new content.
By using the git add command we added the file to the list of files that the repository is tracking. This means that Git will now track changes to the file. If we make changes to the file Git will be able to see those changes and track them.
Finally, we committed the changes to the repository with git commit. This means that Git has taken a snapshot of the repository at this point in time. This snapshot is called a commit. We can think of a commit as a version of the repository. We can go back to this commit at any time and revert the repository to this point in time. We can also create new commits from this point in time and continue to develop the repository.

Note: When making a commit git expects us to provide a commit message. This is a short description of the changes made in the commit. It is good practice to provide a meaningful commit message. This is especially important when working on a team as it allows other developers to understand what changes were made in the commit. You can provide a commit message using the -m flag as we did above.

Note: If we don’t pass a commit message using -m then git will prompt us to do so, typically by opening our default text editor. This can be confusing for a new engineer as they may not know what to do. It is best to get into the habit of providing a commit message using the -m flag. That said, be brave, this is our little test sandbox, so try it out and see what happens if you forget to provide a commit message. Maybe add another new file and commit it without a message.
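
If one line isn't enough to describe a change, one option that avoids the editor is to pass -m more than once: Git treats each one as a separate paragraph of the message. The sketch below uses a throwaway repository and a placeholder identity, and the message text is only an example.

```shell
# Throwaway repository with a placeholder identity
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello World" > my_file.txt
git add my_file.txt

# The first -m is the short summary; the second -m adds a longer description
git commit -q -m "Add my_file.txt" -m "Start the sandbox with a simple greeting file."

# Print the full message of the most recent commit
git log -1 --format=%B
```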

Updating a File

Now that we have a file in our repository let’s make some changes to it. We can do this by simply editing the file. Let’s add another line to the file.

echo "This is a new line" >> my_file.txt  # Add another line to the file
git add my_file.txt  # Add the file changes to the repository file tracking
git commit -m "Added another line to my_file.txt"  # Commit the changes to the file to the repository

We have now made some changes to the file and committed them to the repository. We can see the changes we made by using the git diff command.

git diff HEAD~1 HEAD  # Show the changes between the last two commits
# returned output
diff --git a/my_file.txt b/my_file.txt
index 557db03..d87f1d6 100644
--- a/my_file.txt
+++ b/my_file.txt
@@ -1 +1,2 @@
 Hello World
+This is a new line

Don’t get too bogged down in the git diff command itself right now. What we have done here is compare the most recent commit (called HEAD) with the commit before it (HEAD~1). The + symbol indicates a line that has been added, and a - symbol would indicate a line that has been removed, although there are none in our case. We can see that we have added a new line to the file.
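
git diff is also handy before you commit. As a small sketch, again in a throwaway repository with a placeholder identity, running it with no arguments shows edits that have not yet been added with git add.

```shell
# Throwaway repository with one commit
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > my_file.txt
git add my_file.txt
git commit -q -m "Added my_file.txt"

# Change the file but do not add or commit it
echo "An uncommitted line" >> my_file.txt

# With no arguments, git diff shows changes not yet added with git add
git diff

# --stat prints a per-file summary instead of the full diff
git diff --stat
```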

Checking the Status of the Repository

Often as we’re working and we’re adding, removing, and changing files we want to know what the current status of the repository is. We can do this by using the git status command.

git status  # Show the current status of the repository

Let’s give it a quick test. Add a new file to the repository and then run git status. You should see something like this:

touch my_new_file.txt  # Create a new file
echo "This is my new file" > my_new_file.txt  # Add some content to the file
git status  # Show the current status of the repository
# returned output
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        my_new_file.txt
nothing added to commit but untracked files present (use "git add" to track)

What we can see here is that we have a new file in the repository that is not being tracked by Git. We can add this file to the repository by using the git add command and then re-run git status to see the changes.

git add my_new_file.txt  # Add the file to the repository file tracking
git status  # Show the current status of the repository
# returned output
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   my_new_file.txt

We can see that the file is now being tracked by Git and is ready to be committed. We do see something new with the references to --staged and unstage; we’ll cover this below in Working Area, Staged Changes, and Commits.

For now, let’s commit the changes to the repository.

git commit -m "Added my_new_file.txt"  # Commit the changes to the file to the repository

Our file and its changes are now safely stored in the repository. Carry on yourself and try out making changes, adding files and checking the status. Maybe see what happens if you delete a file!

Top tip: When learning like this I like to use the git status command a lot as I step through each change I make. It’s a great way to see what is going on and to check that I’m doing what I think I’m doing.
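
If you run git status as often as I do, the --short flag gives a more compact view. Here is a quick sketch in a throwaway repository (the identity is a placeholder):

```shell
# Throwaway repository with one committed file
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > my_file.txt
git add my_file.txt
git commit -q -m "Added my_file.txt"

echo "Another line" >> my_file.txt   # Modify a tracked file
touch my_new_file.txt                # Create a new, untracked file

# One line per file: M marks a modified file, ?? marks an untracked one
git status --short
```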

Working Area, Staged Changes, and Commits

Now, let’s unpack what we saw earlier with the references to --staged and unstage.

Git essentially has 3 logical areas in which our files and their changes can exist. These are:

Working Area
Staged Changes
Commits

Let’s step through each of these areas and see how they relate to each other.

Working Area

The working area is where we make changes to our files. This is where we add, remove, and change the contents of our files. This is the area that we are most familiar with as this is where we spend most of our time when working with files. This is pretty much our normal experience of working with files, even outside of using git.

For example:

gwatts@my-computer:~/Temp/my_repo$ touch some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        some_other_file.txt

nothing added to commit but untracked files present (use "git add" to track)

Staged Changes

The staged changes area (often called the staging area or index) is where we collect the changes we want to include in our next commit. We add changes to it using the git add command and remove them from it using the git restore --staged command.

The staged area acts as an intermediate step and allows us to have files in our working area that we don’t want to add to the repository yet. Maybe it’s a new file we’re working on and we’re not ready to commit it yet. Or maybe it’s a file that we never want to commit as it’s something local to us. Hint: we can also use something called .gitignore for this, but we’ll get to that later. For now, just know that files in the staged area are ready to be committed to the repository but have not yet been committed, and files in the working area are not yet ready to be added. I hope that makes sense? Whether it does, or not, have a play around with adding (git add) and removing (git restore --staged <filename>) files from the staged area and see what happens.

gwatts@my-computer:~/Temp/my_repo$ git add some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   some_other_file.txt

gwatts@my-computer:~/Temp/my_repo$ git restore --staged some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        some_other_file.txt

nothing added to commit but untracked files present (use "git add" to track)

As we just said, but it bears repeating for clarity, changes in the staged changes area are not yet committed to the repository. We can see what files are in the staged changes area by using the git status command. If we somehow lost these files and their changes we couldn’t get them back from the repository, as they were never committed.
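
To make the difference concrete, here is a small sketch in which only one of two new files is staged, so only that file ends up in the commit. The file names ready.txt and draft.txt are made up for the example, as are the throwaway repository and identity.

```shell
# Throwaway repository with a placeholder identity
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "ready" > ready.txt
echo "still in progress" > draft.txt

git add ready.txt                  # Stage only the finished file
git commit -q -m "Add ready.txt"   # The commit contains only what was staged

git status --short                 # draft.txt is still listed as untracked (??)
git ls-files                       # Lists the files Git is tracking: only ready.txt
```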

Top tip: Don’t leave files in the staged changes area for too long. If you do, you may forget what changes you made and why you made them. It’s best to commit your changes as soon as you can.
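
As a brief preview of the .gitignore hint from earlier in this section, the sketch below tells Git to ignore a local-only file. The file name local_notes.txt is hypothetical, and this is only the simplest possible pattern.

```shell
# Throwaway repository
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

echo "local_notes.txt" > .gitignore          # One pattern per line
echo "private scratch notes" > local_notes.txt

git status --short                    # Lists .gitignore but not local_notes.txt
git check-ignore -v local_notes.txt   # Shows which rule causes the file to be ignored
```

The .gitignore file itself is usually added and committed like any other file so that the same rules apply wherever the repository is used.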

Commits

The commits area is where our changes are finally stored into the repository. Once files, and their changes, are added here we can inspect their history over time, revert back to previous versions, and generally feel safe that as long as the repo itself is safe changes committed here are safe too.

gwatts@my-computer:~/Temp/my_repo$ git add some_other_file.txt
gwatts@my-computer:~/Temp/my_repo$ git commit -m "Adding some_other_file.txt"
[master b0a110c] Adding some_other_file.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some_other_file.txt

Making Commits

While I’m here, I’ll make a note that different individuals and teams have different views when it comes to commits. Some advocate for fewer, larger commits as it keeps the commit history cleaner and there are fewer commits to look back through for changes. Others prefer small commits made more often, as this ensures changes are captured, reduces the risk of losing changes if something goes wrong, and reduces the risk of merge conflicts when working with others.

There are arguments either way on this one. My advice is to not get caught up on it too much until you start working with a team and to follow their collective preference once you do. I personally fall on the side of smaller commits, more often, but I’m not going to get into a fight over it.

Viewing the Commit History

Once we have some changes to our repository we can start to see the history of our changes. We can do this by using the git log command.

git log  # Show the commit history of the repository

For the first two commits we made the output should look something like this (your log will also list any later commits, newest first):

commit b5259ef09423f32a0b84d5cc1a6713b0346733ab (HEAD -> master)
Author: Graham Watts <user_email>
Date:   Wed Jan 18 10:40:42 2023 +0000

    Added another line to my_file.txt

commit 99b4f82b3360a17f64c3554304ac2020e7e160a9
Author: Graham Watts <user_email>
Date:   Wed Jan 18 10:39:28 2023 +0000

    Added my_file.txt

With this tool we can see the history of our changes. We can see the commit message, the author, the date, and the commit hash. The commit hash is a unique identifier for the commit. We can use this to reference a specific commit in the future. We’ll see this in action later.
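
As a taste of what referencing a commit looks like, here is a sketch that prints a compact history and then shows a file as it was in an earlier commit. It rebuilds the first two commits in a throwaway repository with a placeholder identity. The old_commit variable is just a name I've chosen.

```shell
# Throwaway repository with two commits to my_file.txt
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > my_file.txt
git add my_file.txt
git commit -q -m "Added my_file.txt"
echo "This is a new line" >> my_file.txt
git add my_file.txt
git commit -q -m "Added another line to my_file.txt"

# Compact history: a shortened hash and the message, one line per commit
git log --oneline

# Look up the full hash of the commit before the most recent one
old_commit="$(git rev-parse HEAD~1)"

# Print my_file.txt as it was in that commit
git show "$old_commit:my_file.txt"
```

You can paste a hash copied from git log in place of the variable. A shortened hash like the ones shown by --oneline normally works too.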

Getting Started with Branches

The last subject I want to cover in this getting started guide is branches. Branches are a really powerful tool in Git and are a great way to work on new features or changes without affecting the main codebase. We can create a branch using the git branch command.

git branch my_new_branch  # Create a new branch called my_new_branch

What this command does is create a new branch called my_new_branch that starts from the current commit. Under the hood Git does not copy our files; a branch is just a lightweight pointer to a commit. Still, it is fine to think of it as a named copy of the current state of the repository. This new branch is also part of our repository, and it allows us to make changes to files without affecting the versions on the previous branch.
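
One way to check what git branch actually produced is to compare the commit that each branch name resolves to. In this sketch (throwaway repository, placeholder identity) both names resolve to the same commit right after the branch is created.

```shell
# Throwaway repository with one commit
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > my_file.txt
git add my_file.txt
git commit -q -m "Added my_file.txt"

git branch my_new_branch   # Create the branch without switching to it

# Both commands print the same hash until one branch gets a new commit
git rev-parse HEAD
git rev-parse my_new_branch
```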

We can see our new branch, and any others using the git branch command again with the -a switch.

git branch -a  # Show all branches in the repository
# returned output
* master
  my_new_branch

The * indicates the branch we are currently on. We can switch between branches using the git checkout command.

git checkout my_new_branch  # Switch to the my_new_branch branch
# returned output
Switched to branch 'my_new_branch'

Running the git branch -a command again will show us that we are now on the my_new_branch branch.

git branch -a  # Show all branches in the repository
# returned output
  master
* my_new_branch

We can also see our branch in the output of the git status command.

On branch my_new_branch
nothing to commit, working tree clean
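
Creating a branch and switching to it are so often done together that git checkout can do both at once with the -b flag. Here is a minimal sketch in a throwaway repository. The branch name another_branch and the identity are placeholders.

```shell
# Throwaway repository with one commit
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > my_file.txt
git add my_file.txt
git commit -q -m "Added my_file.txt"

# Create another_branch and switch to it in one step
git checkout -q -b another_branch

# Print the name of the branch we are now on
git rev-parse --abbrev-ref HEAD
```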

Now that we have a new branch let’s add one more new file to our repository but on this new branch.

# First make sure we're on the my_new_branch branch
git checkout my_new_branch
# Should return the following output if we're on the correct branch already
Already on 'my_new_branch'
# Now let's add a new file to the repository
touch another_new_file.txt
# Let's add some text for good measure
echo "This is another new file" >> another_new_file.txt
# Now let's add the file to the staged changes area
git add another_new_file.txt
# And commit the changes
git commit -m "Added another_new_file.txt"

If we inspect our repository now using ls we should see the new file.

ls -la
# returned output - something like this
total 24
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:46 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:46 .git
-rw-r--r-- 1 gwatts gwatts   25 Jan 18 11:46 another_new_file.txt
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt

Now, if we switch back to the master branch with the git checkout master command we should see that the new file is not there.

git checkout master
# returned output
Switched to branch 'master'
# Now let's check the files in the repository
ls -la
# returned output - something like this
total 20
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:48 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:48 .git
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt

It’s not that our file has been lost or magically disappeared. It’s just that it’s not on the master branch. If we switch back to the my_new_branch branch we can see that the file is still there.

git checkout my_new_branch
# returned output
Switched to branch 'my_new_branch'
# Now let's check the files in the repository
ls -la
# returned output - something like this
total 24
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:46 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:46 .git
-rw-r--r-- 1 gwatts gwatts   25 Jan 18 11:46 another_new_file.txt
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt

If we’re happy with the changes we’ve made in our new branch we can merge them back into the master branch. We can do this using the git merge command.

git checkout master  # Switch to the master branch
git merge my_new_branch  # Merge the my_new_branch branch into the master branch
# returned output
Updating df25842..3583577
Fast-forward
 another_new_file.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 another_new_file.txt

If we inspect our repository now we should see that the new file is now in the master branch.

ls -la
# returned output - something like this
total 24
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 11:52 .
drwxr-xr-x 3 gwatts gwatts 4096 Jan 18 10:36 ..
drwxr-xr-x 8 gwatts gwatts 4096 Jan 18 11:52 .git
-rw-r--r-- 1 gwatts gwatts   25 Jan 18 11:52 another_new_file.txt
-rw-r--r-- 1 gwatts gwatts   31 Jan 18 10:39 my_file.txt
-rw-r--r-- 1 gwatts gwatts   20 Jan 18 10:47 my_new_file.txt

As with everything, there are many schools of thought and approaches to branches and merging. Typically you’ll want to keep your branches as small as possible and merge them back into the main branch as soon as possible. This will help to avoid merge conflicts and keep your codebase clean. But, again, when you start working with a team they may employ another approach. It’s important to understand the basics of branches and merging so that you can work with your team and understand what they’re doing.
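
Once a branch has been merged you may want to tidy it away. The sketch below repeats the merge from above in a throwaway repository and then deletes the merged branch. The main_branch variable is my own addition so the example works whether your default branch is called master or main.

```shell
# Throwaway repository with one commit
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > my_file.txt
git add my_file.txt
git commit -q -m "Added my_file.txt"

# Remember the starting branch name (master or main, depending on configuration)
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Do some work on a new branch
git checkout -q -b my_new_branch
echo "This is another new file" > another_new_file.txt
git add another_new_file.txt
git commit -q -m "Added another_new_file.txt"

# Merge it back into the starting branch
git checkout -q "$main_branch"
git merge -q my_new_branch

git branch --merged           # Lists branches already merged into the current one
git branch -d my_new_branch   # -d deletes a branch only if its commits are merged
git branch                    # Only the starting branch remains
```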

Closing Thoughts

In this post we’ve covered the basics of working with Git: creating a repository, adding files, committing changes, checking the status and history of the repository, and branching and merging. In a future post I’ll cover some of the more advanced features of Git, along with services such as GitHub, including working with remotes, pull requests, and more.

Remember, check out my Git Knowledge Article for tips and tricks when working with Git.

Other Resources

Scott Hanselman has done some great videos on the basics of working in IT and software development including a really nice Git 101 video. I highly recommend checking it out.

And, of course, the official Git documentation is always a great resource. They even have a Git Book that you can read online or download a PDF copy of.

If this article helped inspire you please consider sharing this article with your friends and colleagues, or let me know via LinkedIn or X / Twitter. If you have any ideas for further content you might like to see please let me know too.
