Have you ever felt that software seems to have a mind of its own, is sometimes stubborn, and works a bit against you? Unfortunately, this behaviour happens - and can lead to losing intermediate steps of our work, or even losing the entire results that we have computed so far.
To prevent this situation from happening we need to set up mechanisms that store the data properly, that let us go back to a previous step in our work process, and also let us compare differences between individual steps. The way to achieve this is to combine reliable hardware equipped with a file system like Ext3/Ext4, BTRFS or ZFS/OpenZFS, with a revision control system like Git on top, and a compatible remote backup service such as BackHub on GitHub Marketplace.
Let us assume that the issues of the file system and the remote backup service are answered. What’s left to consider is the revision control system in which we keep our data. An experienced user with technical background knows how to work with Git, how to commit the data, and how to see and revert the changes. In contrast, a less experienced user without technical background may concentrate on the individual project steps or the results themselves, and may have never have even heard the term revision or version control system. His greatest fear is losing the work on which he spent lots of time and energy, or the research results – whether by accident, software errors, or a malicious act.
The task of a system administrator for a work/research group is to set up and to maintain a reliable working environment, and to provide technology that supports the work/research group. The question still remains as to how to set up a mechanism that automatically saves file changes – either as soon as a change of file content is noticed (data is written to storage), or periodically, such as every ten minutes. Furthermore, no interaction of the end user should be required. It must be possible to run the service as a background job without any user attention required.
Technically, there are a few options available. For example:
See [7,8,9] for more details about the other solutions and tools mentioned above. The tool etckeeper  is not part of the list because it is meant to solely focus on configuration files.
Let’s now examine gitwatch, since it provides a more general solution.
Gitwatch describes itself as “[a] Bash script to watch a file or folder and commit changes to a git repo”. It is released under GPLv3, and uses the Inotify kernel subsystem in order to see content changes written to storage.
Prior to using Gitwatch make sure that the corresponding inotify software package is installed on your system – for Debian GNU/Linux the package is named ‘inotify-tools’ , and available for the Debian releases Jessie, Stretch, and Buster.
Gitwatch also requires that the data to be versioned is held in a local Git repository in which it can store observed changes to files. See the according Git commands to setup a Git repository .
At this time, gitwatch has not yet been packaged for Debian or Ubuntu. The software is available from Github, and does not require any compilation to use. Therefore, the initial step is retrieval of the script from the project website on Github (see listing 1):
$ git clone https://github.com/gitwatch/gitwatch.git
The package contains the bash script
gitwatch.sh as well as the file
gitwatch.service to set up an appropriate service for systemd. The
remaining files are of no big interest to end-users; besides a
machine-readable package description in the JSON format there are a
number of tests scripts included. To use Gitwatch as a regular
user it suffices to simply call the script
gitwatch.sh from the
command line (see the listings 3 and 4 below).
In order to make the script available system-wide, run the following command as a prior step from inside the retrieved git archive:
# install -b gitwatch.sh /usr/local/bin/gitwatch
As an administrative action, the command
installcopies the file
gitwatch.sh</code>to the directory <code>/usr/local/bin. Using the switch
--backup`) a potentially existing file with exactly the same
name is automatically archived. Thereafter, the command
available system-wide, and can be run by regular users as well.
You can use Gitwatch in two ways – to observe either a single file, or an
entire directory. Listing 3 shows how to take care of a single file that
resides inside a Git repository. The
&at the end of the command is
required, and sends the Gitwatch process to the background.
$ gitwatch.sh dataset1 &
As soon as you have saved changes in the referred file
dataset1 Gitwatch is notified about it, and it commits the changes to your local Git repository. Luckily, tracking an entire directory works in the same way.
To take care of the current directory the command is as follows.
$ gitwatch.sh . &
Figure 1 shows the output of Gitwatch that runs in the background while I am writing this article. The commit messages from Git are sent to stdout as soon as the content of a file changes, or when a file is added, deleted, or moved to a different place.
Gitwatch comes with a number of useful switches:
-s sec:: amount of time to wait for a commit after a change was
noticed. Value in seconds. The default value is two seconds.
-d format:: the format string used for the timestamp in the commit.
The default value is +%Y-%m-%d %H:%M:%S. See the manual page of the
datecommand for other formats [13,14].
-r name:: name of a remote directory the content is pushed to. The
default value is ‘no push’. Deposit your SSH keys to prevent being asked
for your password every time you do a push.
-b branchname:: the name of the branch. The default value is ‘no
branch’, and refers to the current one.
-g path:: the path to the
.gitdirectory. If not specified any
further, the local directory is used.
-l lines:: the number of lines being logged. The default value is set
to 0, and means ‘no limit’.
-L lines:: the same as
-lbut without coloured formatting.
-m message:: the commit message. The default message is: “Scripted
auto-commit on change (%d) by gitwatch.sh”. “%d” refers to the date (see
-e events:: specifies the events to be watched. This includes
‘close_write’, ‘move’. ‘delete’, and ‘create’ for closing, moving,
deleting, and creating files. ‘events’ contains a comma-separated list
of these values, and the default case consists of all four events.
Figure 2 shows Gitk  – a graphical tool to inspect the changes of
the local Git repository. The commit messages are similar, and the only
difference between them is the logging date. As stated above, you an
change the commit message using the switch
The default behaviour of Git covers all the files from your directory.
Many programs create temporary files, for example Vim, which uses filenames
that end with
.swp. Usually, it does not make sense to record them. In
order to exclude such files add the
.gitignorefile to your
Listing 5: An example .gitignore file
Be aware that Gitwatch may not be the right tool for programmers. Programmers typically want to commit only code that works, i.e.,compiles and runs properly, and hence needs to be tested before committing.
Git commit hooks might help in this case to abort an automatic commit if the code doesn’t compile or if it fails the test suite, but this scenario is more involved, and therefore outside the scope of this article.
Gitwatch is configured with a delay of two seconds to wait for a commit. This works well for SSDs and smaller files. For conventional storage media and distributed file systems this value might be a bit too small. Changing large files may lead to the same effect, and therefore we recommend increasing the value. These cases need further experimentation in order to give correct advice on which delay works best.
Gitwatch is quite useful to automatically keep track of changes in your repository. It works well for pure data files, preferably plain text documents and configuration files. In order to effectively store binary data, as well as entire operating systems, the two Git extensions git-annex  and git-lfs  are quite useful.
Gitwatch helps by providing notification as soon as a change happens – whether planned, accidental, or malicious. This mechanism allows you to see what has changed and the exact time of the change, as well as to initiate a rollback. You can use gitwatch to set up your own document platform combined with a revision control system.
The author would like to thank Axel Beckert, Veit Schiele, Gerold Rupprecht, and Zoleka Hofmann for their help and critical remarks while preparing this article.