Aswin Rajeev: Git: Clone, Fork, Pull-request and Merge-request explained

Working with Git often involves a Git platform such as GitHub, GitLab or Bitbucket. These platforms use Git as the underlying utility and work with the same concepts and conventions that I explained in an earlier article. However, most of these service providers offer additional functionality that is not inherent to Git, to facilitate better collaboration across projects and teams.

Many times a project has collaborators from outside the team that owns it. These could be the developer community, as in the case of open-source projects, or different teams working on different aspects or use cases of an original project. In such scenarios, open write access to the original repository would not be appropriate. To address this, a forking workflow is commonly practised.

But before we discuss forking, let us look at the conventional collaboration flow.

Cloning a remote repository

Cloning is the process of copying a Git repository, along with its history, to a local computer. Cloning creates a working copy of the project on the machine to which the project is cloned. The working copy allows the user to make changes to the project files on this machine, commit those changes and push them to the original remote repository. This is explained in detail in my article "Understanding Git: basic concepts of Git and recommendations for use".

[ The clone project popup in GitHub ]

The remote repository typically doesn't contain a working directory and is called a bare repository. On the local machines to which the repository is cloned, the original remote repository is referred to as origin by default, although this name can be changed if the user wishes.

The user can push changes to the origin only if they have write access to the remote repository. As I mentioned earlier, there may be cases where the owner of a repository allows others to read the code but not to write to the repository. This is commonly the case with the thousands of open-source projects found on GitHub.
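As a rough sketch of the points above, the commands below clone a repository and inspect its remote. To keep it runnable without a network, a local bare repository in a temporary directory stands in for the hosted remote; the paths and the replacement remote name (company) are assumptions made for illustration.

```shell
set -eu

# Hypothetical sandbox: a local bare repository stands in for the hosted remote.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# Clone it (Git warns that the repository is empty, which is fine here).
git clone -q "$work/remote.git" "$work/project" 2>/dev/null
cd "$work/project"

# Git names the remote "origin" by default.
git remote -v

# A bare repository holds only the Git data, with no working directory.
is_bare=$(git --git-dir="$work/remote.git" rev-parse --is-bare-repository)
echo "remote is bare: $is_bare"

# The name "origin" is only a default and can be changed.
git remote rename origin company
remote_name=$(git remote)
echo "remote is now called: $remote_name"
```

With a real project, the path given to git clone would be the URL shown in the platform's clone popup.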

Forking a repository

In scenarios where collaboration is to be allowed without granting write access to the original repository, a fork workflow is used. In this case, a developer (or a different team) forks the project into a new bare repository on the platform. Forking is the process of creating a replica of an original repository while maintaining a pointer to the original repository.

[ The GitHub UI for forking a project ]

Once a fork is created, the forked repository is completely owned by the person/team that has forked the repository. This means that they can make changes to this replica without any explicit permission from the creators or owners of the original repository.

The forked repository still maintains a reference to the original repository, which is commonly referred to as upstream. In a local clone of the fork, the original is usually registered as a remote named upstream by convention. The upstream is used when the fork needs to be updated with the latest changes in the original repository through a pull or fetch operation.
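A minimal sketch of keeping a fork in sync could look like this. The fork itself is created on the platform, so here a bare clone in a temporary directory stands in for it; all paths, the demo identity and the file contents are assumptions for illustration only.

```shell
set -eu

work=$(mktemp -d)
# Throwaway identity so the commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the original project, with one commit on master.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/original.git"

# Stand-in for the platform's fork action: a server-side copy.
git clone -q --bare "$work/original.git" "$work/fork.git"

# Clone the fork (it becomes origin) and register the original as upstream.
git clone -q "$work/fork.git" "$work/local"
cd "$work/local"
git remote add upstream "$work/original.git"

# Meanwhile, the original project moves on.
cd "$work/seed"
echo "v2" > README
git commit -q -am "Upstream change"
git push -q "$work/original.git" master

# Update the fork: fetch from upstream, merge, then push to the fork.
cd "$work/local"
git fetch -q upstream
git merge -q upstream/master
git push -q origin master
readme=$(cat README)
echo "README after sync: $readme"
```

The fetch followed by a merge could also be written as a single git pull upstream master; the two-step form just makes each stage visible.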

Pull request

One can work on an open-to-read repository by forking the repository and then cloning the forked repository to the local machine. Further changes to the original repository can be pulled, and the changes that we make in the source code can be committed and pushed to the forked repository that we own. But how does this enable contributions to the original project?

This is where the reference to the original repository (namely upstream) comes into use. Most Git platforms, such as GitHub, provide an interface to request that the owner of the original repository pull the changes made in the forked repository. Such requests are called pull requests and usually point to a specific branch in the forked repository (this is one case in which feature branches are commonly used, as described in my previous article). The owner(s) of the original repository are notified by the platform and can review the changes and accept them into the original project.
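The Git side of preparing a pull request might be sketched as follows. The request itself is raised in the platform's interface, so this only shows publishing a feature branch to the fork and listing what it would propose. The local bare repositories, the demo identity and the branch name feature/greeting are hypothetical stand-ins.

```shell
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins: original.git is the upstream project, fork.git is our fork.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/original.git"
git clone -q --bare "$work/original.git" "$work/fork.git"

# Local clone of the fork, with the original registered as upstream.
git clone -q "$work/fork.git" "$work/local"
cd "$work/local"
git remote add upstream "$work/original.git"
git fetch -q upstream

# Do the work on a feature branch (the name is hypothetical).
git checkout -q -b feature/greeting
echo "hello, world" > README
git commit -q -am "Improve greeting"

# Publish the branch to the fork; we have no write access to upstream.
git push -q origin feature/greeting

# These are the commits a pull request from this branch would propose.
proposed=$(git log --oneline upstream/master..feature/greeting | wc -l | tr -d ' ')
echo "commits proposed to upstream: $proposed"
```

After the push, the platform would typically offer to open a pull request from feature/greeting in the fork against master in the original repository.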

[ A pull request awaiting approval by the upstream project maintainers ]

Thus seamless collaboration is facilitated without explicitly granting any write privileges on the original repository. This is how thousands of open-source projects thrive on the internet, utilising the millions of community developers from various parts of the world.

Is this technique limited to open-source environments? The answer is no. It applies equally well to closed ecosystems such as corporations.

Applications of the fork workflow in commercial projects

In many commercial projects, the code is hosted in private repositories, either within a Git hosting provider's ecosystem or on an independent platform hosted within the company's infrastructure. In either case, there is usually a closed ecosystem within the organisation.

That doesn't mean that all projects within the company are accessible to all employees/teams in the company. There could be various segments within the organisation dedicated to different accounts. When collaboration is required across such teams working in silos, the fork workflow is useful.

Consider an IT services company which has a team working on a generic product that is used in various other projects of the company by customising it to project-specific needs. Meanwhile, the original project may continue to evolve by adding generic features.

In such a scenario, it may not be appropriate to give the external project teams write access to the product's code repository, yet those teams may want to keep taking updates from the original project. Also, in certain cases, changes made for a customised version of the product (by an external team) might be worth integrating into the original generic product.

If the fork workflow is applied here, each of the customised projects could be a fork of the original project. Access control is thus handled appropriately, while the external projects can still take specific updates from the original project as they see fit. Also, if certain features developed in a customised project turn out to be good enough to promote to the original project, a pull request may be raised. The pull request could then be accepted and the changes integrated into the original project by the project owners.
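One possible way for a customised fork to take a specific update, rather than everything from the original project, is git cherry-pick. The sketch below assumes a plain local clone as a stand-in for the fork, and the file names and commit messages are invented for the example.

```shell
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the generic product.
git init -q "$work/product"
cd "$work/product"
git symbolic-ref HEAD refs/heads/master
echo "core" > core.txt
git add core.txt
git commit -q -m "Core product"

# Stand-in for a customised fork taken at this point.
git clone -q "$work/product" "$work/custom"

# The product gains two independent updates afterwards.
echo "reporting" > reporting.txt
git add reporting.txt
git commit -q -m "Add reporting"
wanted=$(git rev-parse HEAD)
echo "billing" > billing.txt
git add billing.txt
git commit -q -m "Add billing"

# From the fork's point of view, the product is the upstream remote.
cd "$work/custom"
git remote rename origin upstream
git fetch -q upstream

# Take only the reporting update, not everything on upstream/master.
git cherry-pick "$wanted" >/dev/null
files=$(ls | tr '\n' ' ')
echo "files in the customised fork: $files"
```

A team that wants every update would merge upstream/master instead; cherry-picking trades a simpler history for finer control over what is taken.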

Merge requests

From what has been explained so far, pull requests turn out to be really useful - particularly when it comes to access control. So why can't the same philosophy be adopted within a single project to restrict access to specific branches?

Various Git platforms let the owners of a repository protect certain branches. This way, only certain team members have write access to those branches, or in some cases no one has direct write access. Many teams have processes in place that require a review to be completed before a feature branch is merged into the master branch (the main branch). To cater to such scenarios, we have the concept of merge requests.

When a specific milestone is achieved in a feature branch, a merge request could be raised to merge the changes in the feature branch with a different branch - often the master branch or another protected branch. The merge-request could be reviewed and accepted by team members who have privileges to do so.
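In plain Git terms, accepting a merge request amounts to roughly the following, assuming the platform is set to create a merge commit (many platforms can also squash or rebase instead). The repository, the demo identity and the branch name feature/login are hypothetical.

```shell
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Feature work happens on its own branch (hypothetical name).
git checkout -q -b feature/login
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login"

# What a team member with merge rights effectively does on acceptance.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature/login'" feature/login

# A merge commit has two parents: the old master tip and the feature tip.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of the merge commit: $parent_count"
```

The --no-ff flag forces a merge commit even when a fast-forward would be possible, which keeps a visible record of where the feature branch was integrated.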

[ A merge-request (pull request from the same repository) ready to be merged in GitHub ]

Most Git platforms allow the reviewer to compare the changes against the previous state of the target branch through a dedicated user interface that aids the review process. This way, the age-old problem of maintaining source-code integrity with multiple developers involved is at least partly solved.
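A reviewer can approximate this comparison on the command line with a three-dot diff, which shows only what the feature branch changed since it diverged from the target branch. This is a sketch under assumed names (feature/login, app.txt, login.txt), not a description of any particular platform's internals.

```shell
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The feature branch adds one file.
git checkout -q -b feature/login
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login"

# Meanwhile, master receives an unrelated change.
git checkout -q master
echo "v2" > app.txt
git commit -q -am "Unrelated change on master"

# Three dots: diff from the common ancestor to the feature branch tip,
# so the unrelated change on master does not show up.
changed=$(git diff --name-only master...feature/login)
echo "files changed by the feature branch: $changed"
```

A two-dot diff (master..feature/login) would instead compare the two branch tips directly and would also report app.txt.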

Conclusion

While Git itself is really powerful, the additional features offered by Git platforms such as GitHub and GitLab are game-changers when it comes to collaboration in software development. Features such as the forking workflow and merge requests have been instrumental in the success of various open-source projects that thrive on GitHub and the like.

While getting familiar with these concepts is essential in open-source development, mastering them is also useful in large or complex commercial products. In fact, I have successfully adapted the Git fork workflow to transform how various teams work on variants of an original project through forked repositories.
