Data Science Tools: Introduction to git-extras

Additional plugins for Git that make repositories more manageable for Data Science.

Scope

An introduction to the open source git-extras package, its installation in an Ubuntu (GNU/Linux) environment and its application in general and for Data Science.

Introduction

Most Data Scientists interact with Version Control Systems (VCS) such as Git as a matter of course. Beyond basic commands like , , , , and , what else might you need? When using , it’s common to evolve an analysis across multiple notebooks, so there is often little involvement with beyond an insurance policy against errors or file corruption.

The motivation here is that, whilst Exploratory Data Analysis (EDA) is often a solo effort, there are occasions where collaboration with others is required, such as Machine Learning (ML) model deployment and, crucially, documentation.

With the context set, the git-extras package provides additional utilities to support most users of Git.

Git-Extras

Installation

The installation instructions cover Linux, macOS and Windows; on Ubuntu, use the package manager to check that the package is available:

Having confirmed the package is available in the repository, it can be installed:
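A minimal sketch of these two steps, assuming an `apt`-based Ubuntu system and `sudo` rights. The wrapper function `ensure_git_extras` is hypothetical; only `apt-cache policy` and `apt install` do the real work.

```shell
# Hypothetical helper: install git-extras from the Ubuntu repositories if missing
ensure_git_extras() {
  if command -v git-extras >/dev/null 2>&1; then
    echo "git-extras is already installed"
    return 0
  fi
  # Show which version the configured repositories would install
  apt-cache policy git-extras
  # Install the package (requires root privileges)
  sudo apt install -y git-extras
}
```

Calling `ensure_git_extras` once is enough; on a machine that already has the package it only prints a message.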

The following terminal cast captures the installation process:

Terminal cast for git-extras installation on Ubuntu | Cast by Author

Most GNU/Linux distributions are likely to ship an older version of the package (in this case version 5.1), so not all commands will work, such as .

Commands

The full list of commands is documented on the GitHub repository, but a few key examples are summarised below (in no particular order):

Example Commands

Git Summary

As you can see, it provides a quick snapshot of the project; in this case, a relatively new one. It’s an easy way to get a feel for the repo, analogous to using  in .
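For a sense of what sits behind such a snapshot, a few comparable figures can be pulled with plain Git. This is a rough sketch, not how git-extras computes its summary; the throwaway repository, author and file name are placeholders.

```shell
# Throwaway repository so the commands below have something to report on
summary_repo=$(mktemp -d)
git -C "$summary_repo" init -q
git -C "$summary_repo" config user.name "Example Author"
git -C "$summary_repo" config user.email "author@example.com"
echo "print('hello')" > "$summary_repo/analysis.py"
git -C "$summary_repo" add analysis.py
git -C "$summary_repo" commit -q -m "Add analysis script"

# Total number of commits reachable from HEAD
commit_count=$(git -C "$summary_repo" rev-list --count HEAD)
# Number of tracked files
file_count=$(git -C "$summary_repo" ls-files | wc -l)
echo "commits: $commit_count"
echo "files:   $file_count"
# Commits per author, most active first (HEAD is given explicitly so
# shortlog does not try to read a log from stdin)
git -C "$summary_repo" shortlog -sn HEAD
```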

Git Create-branch

The command reduces the traditional and approach into a single command:

The code above lists both local and remote branches () and shows only a single branch called . The traditional approach to creating and syncing a branch requires two steps; the with the flag does the same in a single step. The final listing of branches shows that the new branch (02-git-extras) is available both locally and remotely ().
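For comparison, the traditional two-step approach can be reproduced with plain Git. The sketch below uses a local bare repository as a stand-in for the shared `origin` remote so that it runs without network access; all paths are placeholders. The single-step git-extras form is shown only as a comment, assuming the `-r`/`--remote` option described in the git-extras documentation for `git create-branch`.

```shell
# Throwaway setup: a bare repository stands in for the shared remote
cb_dir=$(mktemp -d)
git init -q --bare "$cb_dir/remote.git"
git init -q "$cb_dir/repo"
cb_repo="$cb_dir/repo"
git -C "$cb_repo" config user.name "Example Author"
git -C "$cb_repo" config user.email "author@example.com"
git -C "$cb_repo" commit -q --allow-empty -m "Initial commit"
git -C "$cb_repo" remote add origin "$cb_dir/remote.git"

# Traditional approach: two steps
git -C "$cb_repo" checkout -q -b 02-git-extras      # 1. create and switch to the local branch
git -C "$cb_repo" push -q -u origin 02-git-extras   # 2. publish it and set the upstream

# git-extras equivalent in a single step (not run here):
#   git create-branch -r 02-git-extras

# List local and remote-tracking branches
git -C "$cb_repo" branch -a
```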

It’s important to “reserve” the remote branch name as soon as possible to prevent clashes with fellow contributors, in particular for documentation branches.

Git Rename-branch

Whilst the time saved by may not look significant, renaming a branch both locally and remotely is not easy, so the following command makes it much easier:

The first form of the command renames an arbitrary branch, given the old and new names. The list of branches () shows that the new name () is visible both locally and remotely. That branch is then checked out for the second form of the command, which renames the current branch. Finally, the branches are listed again to show that the local and remote branches align.
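Done by hand with plain Git, a rename takes three steps. A sketch using the same kind of throwaway setup (a local bare repository as the remote; `old-name` and `new-name` are placeholder branch names). The two git-extras forms described above are shown as comments.

```shell
# Throwaway setup: bare "remote" plus a clone-like working repository
rb_dir=$(mktemp -d)
git init -q --bare "$rb_dir/remote.git"
git init -q "$rb_dir/repo"
rb_repo="$rb_dir/repo"
git -C "$rb_repo" config user.name "Example Author"
git -C "$rb_repo" config user.email "author@example.com"
git -C "$rb_repo" commit -q --allow-empty -m "Initial commit"
git -C "$rb_repo" remote add origin "$rb_dir/remote.git"
git -C "$rb_repo" checkout -q -b old-name
git -C "$rb_repo" push -q -u origin old-name

# Manual rename, local and remote
git -C "$rb_repo" branch -m old-name new-name        # 1. rename the local branch
git -C "$rb_repo" push -q origin --delete old-name   # 2. delete the old remote branch
git -C "$rb_repo" push -q -u origin new-name         # 3. push the new name and track it

# git-extras equivalents (not run here):
#   git rename-branch old-name new-name   # rename an arbitrary branch
#   git rename-branch new-name            # rename the current branch

git -C "$rb_repo" branch -a
```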

Git Ignore and Ignore-io

The first thing to do in a Data Science repo after is, of course, to manage the  file. An unusual feature of  is that, if you don’t want to sync the file itself, you add its own filename to the list. The command adds a number of useful features to manage  files seamlessly:

By default, the command lists the contents of both the local and global ignore files. Providing the flag shows only the local patterns; new patterns can be added simply by providing a pattern after the command, as is the case with compiled files: .
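At its core, adding a local pattern means appending a line to the repository's ignore file. A minimal sketch of that idea in plain shell; the function `add_ignore_pattern` is hypothetical and is not how git-extras implements its command. It skips patterns that are already listed, so repeated calls are harmless, and `'*.pyc'` is used here as an example pattern for compiled Python files.

```shell
# Hypothetical helper (not part of git-extras): append a pattern to an ignore
# file unless that exact line is already present
add_ignore_pattern() {
  local ignore_file=$1 pattern=$2
  touch "$ignore_file"
  # Make sure an existing last line is newline-terminated before appending
  if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
    printf '\n' >> "$ignore_file"
  fi
  # -x: match the whole line, -F: fixed string, so '*.pyc' is not treated as a regex
  if ! grep -qxF -- "$pattern" "$ignore_file"; then
    printf '%s\n' "$pattern" >> "$ignore_file"
  fi
}

ignore_dir=$(mktemp -d)
add_ignore_pattern "$ignore_dir/.gitignore" '*.pyc'
add_ignore_pattern "$ignore_dir/.gitignore" '*.pyc'   # second call changes nothing
cat "$ignore_dir/.gitignore"
```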

Rather than remembering common patterns for different Integrated Development Environments (IDEs), text editors (such as ) and so on, community-provided patterns can be obtained from gitignore.io:

gitignore.io | Screenshot by Author | Content and Artwork by Toptal

The service allows users to combine different ignore patterns, for example  with python. The output is in the form of a text file:

The same output can be obtained from the command and displayed on the screen:

To add the ignore patterns to the local file, use . The flag appends the patterns to the local file.
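The service can also be queried directly over HTTP, with template names joined by commas in the URL. A sketch assuming the endpoint `https://www.toptal.com/developers/gitignore/api/` (the service's home at the time of writing; treat the URL as an assumption and check it before relying on it). Both function names are hypothetical.

```shell
# Hypothetical helper: build the gitignore.io API URL for a list of templates
gitignore_io_url() {
  local IFS=,   # inside double quotes, "$*" joins the arguments with the first character of IFS
  printf 'https://www.toptal.com/developers/gitignore/api/%s\n' "$*"
}

# Hypothetical helper: fetch the combined patterns and append them to ./.gitignore
# (needs network access and curl)
append_gitignore_io() {
  curl -fsSL "$(gitignore_io_url "$@")" >> .gitignore
}

gitignore_io_url vim python
```

Running `append_gitignore_io vim python` inside a repository would then append the combined template to its `.gitignore`.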

Git Show-tree, undo & setup

The last three commands are summarised briefly. The command replicates popular one-liners from Stack Overflow for showing the commit graph:

The command steps back to the previous commit, or back a given number of commits:
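In plain Git, stepping back while keeping the undone changes staged is a soft reset. A sketch on a throwaway repository, assuming a soft reset is the behaviour wanted; check the git-extras documentation for the exact options of the installed version.

```shell
undo_repo=$(mktemp -d)
git -C "$undo_repo" init -q
git -C "$undo_repo" config user.name "Example Author"
git -C "$undo_repo" config user.email "author@example.com"
echo "a" > "$undo_repo/notes.txt"
git -C "$undo_repo" add notes.txt
git -C "$undo_repo" commit -q -m "First commit"
echo "b" >> "$undo_repo/notes.txt"
git -C "$undo_repo" commit -q -am "Second commit"

# Step back one commit; the undone changes stay staged in the index
git -C "$undo_repo" reset --soft HEAD~1
# HEAD~3 would step back three commits; --hard instead of --soft discards the changes

git -C "$undo_repo" log --oneline
git -C "$undo_repo" status --short
```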

Finally, the command initialises  (by default in the current working directory), adds all the files and makes an initial commit. It also accepts an alternative directory as an argument.
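By hand, the same result takes three plain Git commands. A sketch on a throwaway directory; the file names and the commit message are placeholders and may differ from what git-extras uses.

```shell
setup_dir=$(mktemp -d)
echo "x,y" > "$setup_dir/data.csv"
echo "# Project" > "$setup_dir/README.md"

git -C "$setup_dir" init -q   # initialise the repository
git -C "$setup_dir" add .     # stage every file in the directory
# Make the first commit (identity passed inline for this throwaway repo)
git -C "$setup_dir" -c user.name="Example Author" -c user.email="author@example.com" \
  commit -q -m "Initial commit"

git -C "$setup_dir" ls-files
```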

Conclusion

The open source git-extras package adds commands for Git users that can reduce the friction associated with using the tool. The basic installation in an Ubuntu environment was illustrated and some of the key commands were showcased.

Addendum

Attribution

All , notebooks and terminal casts are by the author. All of the artwork is based on assets explicitly licensed as CC0, Public Domain or SIL OFL and is therefore non-infringing. The theme is inspired by and based on my favourite theme: Gruvbox.

Changelog

2021-03-07: Added updated artwork, attribution info and addendum section.

Data Scientist and Chartered Aeronautical Engineer (MEng CEng EUR ING MRAeS) with over 15 years experience in the Aerospace, Defence and Rail Industry.
