Dependency management using Git
This is the transcript of the talk by Manaswini Das presented at Git Commit Show 2020.
About the speaker
Manaswini Das is an open-source contributor and Associate Software Engineer at Red Hat.
Transcript
[Host]
We have with us Manaswini Das, who is an open-source developer and an engineer at Red Hat. She's going to talk about dependency management using Git.
[Manaswini]
Hi, everyone, I'm Manaswini Das and I'm an associate software engineer at Red Hat, as Shreya stated, and I'll be speaking about dependency management using Git.
So first of all, before moving on to the topic, let us quickly discuss the requirements: why do we need dependency management? Let me walk you through a real-life scenario. Suppose you have a project that you want to feed into multiple projects as a child project, and you want the parent project to build along with the child project.
So what do you normally do? You keep it in a folder, and then you move that folder into the parent project manually. I used to do this as well.
That led to a lot of redundancy and a lot of duplication. If you feed the same child project into different parent projects, you have to go and edit it in every parent project whenever you want to change the child project. So it was becoming difficult to edit as well.
The next thing is the ability to build and test from bleeding-edge checkouts. Suppose only some iteration of the child project can build with some iteration of the parent project. It was difficult to remember which particular iteration of the child project was suitable for which particular iteration of the parent project, and so on.
We all know that Git helps us manage source code, and it can help us manage dependencies too. There is a code-sharing development workflow and an entire set of tools that were created just for this. They can make dependency management easier and bridge the gap between Git repos and the sub-repos within them.
So moving on, without much ado, let's jump into the solution. What if we could put one project into another? What if those child projects were Git projects themselves, and you could feed them in with the help of some simple commands on the CLI? And then you were able to push or pull changes on the go, directly from the parent project.
For this, we have a set of tools available, provided by Git itself, known as Git submodules and Git subtrees. Let's go by the terms. This is a real-life picture of what we are going to talk about: a cherry tree growing on top of a mulberry tree. If you are into biology, you must know how this works in real life. And if you already know what a subtree is, it is a tree that is just a part of another, bigger tree.
You can also understand it from the adjoining figure. But if you're not able to imagine it, think of a cherry tree growing on top of a mulberry tree. That will help you understand it. The first tool that I'm going to walk you through is Git submodules.
Think of submodules as Git repositories within another Git repository. One more thing that you have to keep in mind while working with Git submodules is that a submodule points to a specific commit reference of its child repository. We will come across this when I show you the demo that I have prepared. The next thing is the ability to build and test from bleeding-edge checkouts. I already discussed that lacking it was one of the disadvantages, so if this tool didn't provide that, what would be its use? One more thing is that we can update specific submodules if we wish to. You can update all the submodules in one go, or you can choose to update specific submodules, whatever you wish.
So, demo time. Let me show you one little project that I built using Git submodules. This is the project. To know whether a project contains a submodule or not, the first thing that you can look for is the config file, that is, the .gitmodules config file. Let me quickly go there. You can see there are three submodules right here: there is a submodule for the website wiki, then there is the path, then there is a URL. Similarly, we have two other submodules. We also have three folders right here. But you can see that these are not regular folders; each looks like some sort of symbolic link. So we are here in the docs test repo. You can also see that there is this @ followed by the commit hash. As mentioned earlier, this shows that it points to a specific commit reference of the child repository.
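As a rough illustration of the config file described above, the sketch below writes a small .gitmodules file and reads it back with git config. The submodule names and example.com URLs are hypothetical placeholders, not the entries from the speaker's demo repository.

```shell
#!/bin/sh
set -eu

# Scratch directory so nothing outside it is touched.
demo_dir=$(mktemp -d)

# A hypothetical .gitmodules file: one section per submodule, each with
# the folder path inside the parent and the URL of the child repository.
cat > "$demo_dir/.gitmodules" <<'EOF'
[submodule "docs-test"]
    path = docs-test
    url = https://example.com/hypothetical/docs-test.git
[submodule "wiki"]
    path = wiki
    url = https://example.com/hypothetical/wiki.git
EOF

# .gitmodules uses git's config syntax, so git config can list its entries.
submodule_paths=$(git config --file "$demo_dir/.gitmodules" \
    --get-regexp '^submodule\..*\.path$')
printf '%s\n' "$submodule_paths"
```

Each printed line pairs a config key with the folder that the submodule occupies in the parent project.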
So if I click on this, you can see that it leads to a different repo altogether. Just reiterating what I said earlier, it is just a link. So yeah, this is how submodules look on GitHub. Let me quickly walk you through some commands that are used to do that. If you want to add a submodule, you can do a simple git submodule add followed by the URL.
There's this submodule getting created. I had set up the same repo, so let me quickly go through it. Yep, you can see that there is a new entry right here in the config file: a submodule entry for the docs test repo, with its path and its URL, and there is this new folder that got created. If you click on it, you can see the folder structure. To see whether these are submodules or not, you can look for these blue S symbols right here in VS Code. So yeah, this is how it looks, and this is how you can add a submodule to your project. If you want to update a specific submodule, you can use a simple command that goes like git submodule update --remote, followed by the path or the folder name. So if I want to update just the dm path, I can do it using this. If you want to update all the submodules in one go, you can do a git submodule update, and there too you can specify --remote. And if you want to delete a submodule, you can do it using this command, because if you delete the folder manually, the submodule will still be registered. So you have to do a simple git rm followed by the path name or folder name. This is how you can work with submodules.
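The add, update and remove steps just described can be sketched end to end with two throwaway local repositories. Everything here is an assumption made for illustration: the folder name docs-test, the demo identity, and the use of local paths instead of a hosted URL. The protocol.file.allow override is only needed because the child is a local path.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Throwaway identity so commits work even without a configured git user.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Recent git versions refuse local-path submodule clones by default;
# this demo-only helper switches that check off for a single command.
git_local() { git -c protocol.file.allow=always "$@"; }

# Child repository with one commit.
git init -q "$work/child"
echo "v1" > "$work/child/README"
git -C "$work/child" add README
git -C "$work/child" commit -q -m "child: first commit"

# Parent repository.
git init -q "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: first commit"

# Add: clones the child into docs-test/ and records it in .gitmodules.
git_local submodule --quiet add "$work/child" docs-test
git commit -q -m "Add docs-test submodule"

# The parent stores a commit hash for the folder (mode 160000), not its files.
git ls-tree HEAD docs-test

# Update one submodule to the newest commit on its remote branch.
echo "v2" > "$work/child/README"
git -C "$work/child" commit -q -am "child: second commit"
git_local submodule update --remote docs-test
git commit -q -am "Bump docs-test to the latest child commit"

# Remove: git rm deletes the folder and its .gitmodules entry together.
git rm -q docs-test
git commit -q -m "Remove docs-test submodule"
```

Leaving the path off git submodule update --remote updates every registered submodule instead of just one.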
The next tool that we are going to discuss is Git subtree. Git subtree was introduced in Git 1.7.11. It's written that it's an alternative to Git submodules, but I don't consider it so, because it has its own specific use case. It also helps us achieve the same thing, that is, to get one Git repository into another. It offers clean integration points: the whole child repository gets cloned into a subfolder of the parent repository. That makes it very easy to understand when you are a beginner. The next thing is that it helps us inject a dependency, and it also assists in extracting a project. This is what Git subtree is.
Let me quickly walk you through one project I created using Git subtrees. It is the same as with Git submodules: if you want to know whether a project contains a Git subtree or not, you can go to the .gittrees config file. But I would like to state here that the Git subtree version with the .gittrees config file is not something that's directly supported by the default Git package. To get that, you have to go to its Git repository and install git subtree from there.
Otherwise, you can use Git subtrees without that, but then it will be mostly a local thing. And since these subtrees look like regular folders, it becomes really difficult to recognize whether something is a subtree or a normal subfolder. So there's this Cogito wiki entry right here. There's a URL and a path, similar to what we had in the .gitmodules config file, and then we have this branch. Next, we are in the subtree docs test repo, so we'll click on that. You can see that it isn't a link. It doesn't redirect to some other repo; it is just a subfolder. If you want to get the Git subtree version with the .gittrees config file, you can do it. I have attached the links in the upcoming slides, and you can get your hands dirty using that. So, let me walk you through how you can add a Git subtree. I have set up the same project right here. This is the command: git subtree add --prefix. The prefix is the path where you want to have that subtree. I have given this path, and if the path doesn't exist, it creates the path for you, so you needn't worry about that. The next thing is the URL, followed by the branch. It is very necessary to provide the branch name, because the subtree is also a tree in itself. If you don't give the branch, it will get confused about which tree of the child it has to merge into the parent tree. So this is how you can add a subtree. Okay, sorry, but this is pretty much how it should be working. Since the working tree has modifications, it is not adding it right now, but you can add a subtree using this. And you needn't worry about committing it, because git subtree add creates a commit for you itself.
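A minimal sketch of the add step, using the stock git subtree command and two throwaway local repositories. The prefix vendor/wiki, the file name and the demo identity are hypothetical; the command shape (prefix, then repository, then branch) follows the description above.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Throwaway identity so commits work even without a configured git user.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Child repository with one commit; remember whatever its branch is called.
git init -q "$work/child"
echo "wiki page" > "$work/child/Home.md"
git -C "$work/child" add Home.md
git -C "$work/child" commit -q -m "child: first commit"
child_branch=$(git -C "$work/child" symbolic-ref --short HEAD)

# Parent repository. It needs at least one commit and a clean working
# tree, otherwise git subtree add refuses to run.
git init -q "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: first commit"

# git subtree add --prefix=<folder> <repository> <branch>
# The folder is created if it does not exist, and the command commits
# the result itself, so no separate git commit is needed.
git subtree add --prefix=vendor/wiki "$work/child" "$child_branch"

# The child's files are now ordinary files inside the parent.
ls vendor/wiki
```

Unlike a submodule, nothing in the resulting folder marks it as a subtree; only the commit message of the add records where it came from.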
The next thing is pulling from subtrees. For that, you can use a single command, git subtree pull-all. And if you have changes to your child repositories and you want to push to all of them in one go from the parent project itself, then you can use git subtree push-all. So yeah, this is how you can work with Git subtrees. I have the editor right here, and I have this repo. You can see that it looks like a regular subfolder. There is no symbol or anything. So you need to have the Git subtree version with the .gittrees config file.
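The pull-all and push-all commands named above belong to the .gittrees variant of the tool. As a hedged sketch of the same idea with the stock git subtree command, the script below pulls and pushes a single subtree by naming its prefix, repository and branch explicitly. The repositories, the vendor/wiki prefix and the from-parent branch are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a child repo and a parent that already embeds it as a subtree.
git init -q "$work/child"
echo "v1" > "$work/child/Home.md"
git -C "$work/child" add Home.md
git -C "$work/child" commit -q -m "child: first commit"
child_branch=$(git -C "$work/child" symbolic-ref --short HEAD)

git init -q "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: first commit"
git subtree add --prefix=vendor/wiki "$work/child" "$child_branch"

# Pull: the child moves ahead, and the parent merges the new commit
# into the subtree folder.
echo "v2" > "$work/child/Home.md"
git -C "$work/child" commit -q -am "child: second commit"
git subtree pull --prefix=vendor/wiki -m "Merge wiki updates" \
    "$work/child" "$child_branch"

# Push: edit the embedded copy from the parent, then send only that
# folder's history back to the child, here onto a separate branch.
echo "v3" > vendor/wiki/Home.md
git commit -q -am "wiki: edit made from the parent"
git subtree push --prefix=vendor/wiki "$work/child" from-parent
```

Pushing to a separate branch keeps the sketch safe to run against a non-bare child repository; with a hosted remote you would name whichever branch should receive the changes.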
Moving on. As stated earlier, these are some links if you want to get your hands dirty. The world is yours, so you can go ahead with that. The next thing is the applications. In all the examples that I showed you, I was using this for single-sourcing documentation. So I tried the same thing with another, bigger project, for managing project dependencies. You can see there is this .gittrees config file, and there are three subtrees. I added these three subtrees, then I ran a small Maven build, and it worked perfectly fine. So yeah, you can use this for dependency management. Walking you through the other applications: it eliminates redundancy, of course, which was the shortcoming that we talked about before.
The next thing is publishing to various platforms like GitBook. Suppose you want to single-source documentation: you have one source repo, but you want it to be rendered on GitBook or on GitHub Pages. You can feed it to the respective repos and get your stuff working.
The next thing is that it makes maintenance easier, since everything is done with a few words on the CLI, and that makes your job easier. The next thing is the comparison. The very essential thing I can talk about is Git submodules versus Git subtrees. It's very important to understand this, because every tool has its pros and cons, and you need to know which particular use case your project should serve. I can say that Git submodule was harder for me when I was beginning. If you don't understand that it is a link to a commit reference and not an entire repo altogether, it becomes really difficult to understand when you are a beginner. When it comes to Git subtree, it entirely clones the child repository into a subfolder of the parent repository, so it's very easy to understand. One more thing that you have to take care of is that, since it clones the entire repo, it also merges the entire commit history of the child project with the parent project.
So I can show you this. You can see that I have a lot of contributors right here, but they haven't contributed to my project. They have contributed to this particular subfolder, that's the Cogito wiki. So yeah, this shows that it merges the entire commit history of the child project with the parent project.
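The effect on the contributor list can be reproduced in a small sketch: after a plain git subtree add, the child's author shows up in the parent's log. The author names, the wiki prefix and the local repositories are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Default identity for everything done in the parent project.
export GIT_AUTHOR_NAME="Parent Author" GIT_AUTHOR_EMAIL=parent@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Child repository whose only commit is written by someone else.
git init -q "$work/child"
echo "page" > "$work/child/Home.md"
git -C "$work/child" add Home.md
GIT_AUTHOR_NAME="Child Author" GIT_AUTHOR_EMAIL=child@example.com \
    git -C "$work/child" commit -q -m "child: first commit"
child_branch=$(git -C "$work/child" symbolic-ref --short HEAD)

# Parent repository that embeds the child as a subtree.
git init -q "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: first commit"
git subtree add --prefix=wiki "$work/child" "$child_branch"

# The child's commit is now part of the parent's history, so its
# author is listed next to the parent's own author.
authors=$(git log --format='%an' | sort -u)
printf '%s\n' "$authors"
```

If the imported history is unwanted, git subtree add and git subtree pull also accept a --squash option that records the child's state as a single commit; that option is not covered in the talk.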
The next thing is that a Git submodule requires the submodule to be accessible via a link, like via a server. Because it is just a link, it requires the submodule to be accessible on a server. When it comes to Git subtree, it is pretty much decentralized. It works independently, although you can push and pull changes from the parent project. The next thing, which you might have already guessed, is that Git submodule offers a smaller repository size, because it is just a link. Git subtree leads to a bigger repository size, because it entirely clones the child repository, so you have to take memory into consideration if you have hard requirements. The next thing is that Git submodule can be used for component-based development. That is, you can update submodules one at a time, or you can do it all at once, whatever you like. Git subtree, on the other hand, can be used for system-based development. If you want to have a look at how submodules and subtrees differ when it comes to commands, you can click on this cheat sheet right here.
So yeah, we know that everything has its pros and cons, and everything has its use cases. This is something that I observed, and that is why I've summarized it right here. I would not consider Git subtree a direct replacement for Git submodule, because it has a specific use case. But I noticed certain things that may be helpful to you.
If you have an external repository that you own, and you are likely to push code back to it, then use Git submodule, because I noticed that it's easier to push there. And if you have some third-party code that you're unlikely to push code back to, then use Git subtree, because it's much easier to pull; it's just a simple pull command. Other alternatives are available. There is gitslave. They might want to change the name, owing to the Black Lives Matter discussion that's going on on the internet. It is a script for coordinated version control of large projects that combine code from various child repositories, or slave repositories as they call them. It's a wrapper around Git that manages and merges multiple source code repositories in one go, meaning that from the command line you will be able to control all child repositories from the parent repository itself. The next thing is git-repo. It is a Python script built on top of Git to extend the functionality of Git into dependency management. It helps manage several Git repos, does the uploads to revision control systems, and automates part of the development workflow. The next thing, which I found close to what subtree does, is git-subrepo. It does the same thing: it clones the entire child repository into a parent repository and enables us to push and pull code from the child repositories on the go. Similar multi-package repo tools are available for JS repos as well. There is Lerna and there is Bit. If you want to have a look at them, you can access them through these links. I have also listed the references for you.
Now it's time for some questions.
Q1. So how did you get the idea to do such a PoC? What motivates you to do such a thing?
I would say that it was something that I was given during my internship at Red Hat. I didn't know such stuff existed. Before that, I used to do the same thing that I mentioned earlier: I used to keep the project in a folder and copy it from folder to folder, and that just made things difficult. But then I explored, and I came across submodules, and then I came across subtrees. So I tried to do stuff with them, and it worked fine. So I guess that worked for me, and it works in general as well.
Q2. What were the challenges you faced while doing such a PoC? Did you sometimes get frustrated while doing this? Is there anything like that?
Yeah, because when you just type git subtree, there is just a GitHub documentation page that pops up. And that doesn't serve your purpose, because it was very local. As I told you, it took me some time to learn that there is this Git subtree version with a .gittrees config file, similar to what I have with Git submodules. Git submodules are available within the default Git package itself, so that was not difficult to figure out. But in the case of subtrees, it was. When I found it, I thought, okay, this is something that can work for a code-sharing development workflow as well. Those were some of the challenges. It was also difficult to understand, as I mentioned, that a Git submodule is a commit reference; it's a link to a commit reference of the child repository.
Q3. Is there any place where you would not recommend subtrees and submodules?
I would say, if you don't need your dependencies in Git, and they are available via npm or some other package manager, then you don't need Git for that, because the dependency is available as a simple npm package or some other package, a Maven package, whatever.
Q4. Suppose there is a case where I have a parent project in which I don't change my dependencies. I have some dependency, and I'm not going to change it regularly. In that case, would you recommend using subtrees and submodules, performance-wise?
I don't think so. If you don't require it, you can just keep it as a normal folder. You don't need it if there are no regular updates to the child project. I don't think you require it, because why would you increase your troubles?
Q5. As you mentioned, there is a limitation of size in the case of subtrees, right? So is there any limitation of size in the case of submodules? Also, as you mentioned, we can use large repositories in submodules, but what about subtrees? Can we?
I don't think there are any repository size or memory constraints when it comes to submodules, because they are just links. They look like folders, but they are not folders. So they don't increase your repository size much. I would say that if you are concerned about the repository size, then use Git submodules, not Git subtrees.
Q6. What are the alternatives to this approach, especially for JavaScript or Node.js developers?
I haven't explored them much, but I have read about them. So you can use them if you want to, and learn more about them.
THANK YOU VERY MUCH!!