Blog Archives
What is Git cherry picking and how do you use it?
“Cherry pick a commit”. I’ve heard the phrase often. It sounds kind of endearing, yet scarily technical at the same time. What is cherry picking and why would you want to do it? One fine day I found that I needed it, and suddenly I appreciated the what and the why. So I figured out the how. I hope this post will help you towards the same understanding.
Here’s the scenario: I’d applied a change to the latest version of the Kubeflow docs. Specifically, the change added a banner and associated logic to inform readers if they’re reading an archived version of the docs. Now I needed to copy the same banner and logic to the older (archived) versions of the docs.
More details of the scenario
The screenshot below shows the banner that I wanted to add to all the archived versions of the docs:
The way we store archived versions of the Kubeflow docs is to make a branch of the current version (that is, a branch from the master). For example, here’s v0.6 of the docs, for which the source is in this branch on GitHub. The master branch contains the current version of the docs.
I’d added the banner and accompanying logic to the master branch in this pull request (PR). Now I needed to copy the code to all the archived branches. I didn’t want to have to copy/paste all my changes into the relevant files in every affected branch.
Enter cherry picking.
Picking sweet cherries
It’s useful to know that, when you’re using GitHub, cherry picking a commit can be equivalent to cherry picking a PR. When a PR is merged with GitHub’s “Squash and merge” option, GitHub squashes all the commits in the PR into a single commit on the base branch, so picking that one commit picks up the whole PR.
What does a cherry-picked PR look like? No different from any other PR. It’s a collection of changes that you want to make, pointing to the branch on which you want to make them. For example, PR #1550 is a cherry pick of PR #1535, with a few extra changes added after cherry picking.
Below are the steps that I figured out to prepare and do the cherry picking. One thing to note in particular is that I had to do something different if my fork of the repository already contained a copy of the branch into which I intended to cherry pick.
The first step is to check out the master branch, which contains the updates that I want to copy to the archive branches:
git checkout master
Make sure my local working directory is up to date, by pulling all content from the remote master branch. (I’m working on a fork of the Kubeflow website repository. The convention is to give the name upstream to the repository from which you forked.)
git pull upstream master
Get a log of commits made to the master branch, to find the commit that I want to cherry pick:
git log upstream/master
A commit name consists of a long string of letters and numbers. Let’s say that I need the commit named e895a107edba5e68cc0e36fa3a05a687e806cc19.
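When the log is long, a one-line format can make the commit easier to spot. The sketch below is only an illustration: it builds a throwaway repository (the file name and commit messages are made up, and this is not the Kubeflow repository), lists the commits with git log --oneline, and then expands the abbreviated commit name back to the full one with git rev-parse.

```shell
set -eu
# Throwaway repository for illustration only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

# Two commits, the second of which is the one we want to find.
echo "first" > doc.md
git add doc.md
git commit -q -m "Add doc"
echo "banner" >> doc.md
git commit -q -am "Add archive banner"

# One line per commit: abbreviated commit name, then the commit message.
git log --oneline

# Find the commit by its message, then expand the short name to the full name.
short_name="$(git log --oneline --grep="Add archive banner" | cut -d' ' -f1)"
full_name="$(git rev-parse "$short_name")"
echo "short: $short_name"
echo "full:  $full_name"
```

Either form of the name should work in the later git cherry-pick command, as long as the abbreviated name is unambiguous in the repository.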
Check to see which branches I have locally:
git branch -v
Also check my fork on GitHub to see which branches I already have there.
Now I’m ready to prepare the first archived branch for cherry picking. Let’s say I start with the version 0.6 branch of the docs, named v0.6-branch. If I don’t already have the branch on my fork, I need to get a copy of the branch from the upstream repository, and then push that copy up to my fork, so that I have a clean slate to apply the cherry pick to. So, I fetch the branch down to my local working directory then push it up to my fork. (I use git fetch rather than git pull here, because git pull would also merge the fetched branch into the branch that’s currently checked out.)
git checkout master
git fetch upstream v0.6-branch:v0.6-branch
git checkout v0.6-branch
git push origin v0.6-branch
(I’m working on a fork of the Kubeflow website repository. If you cloned your local repository from your fork, then by default the name of your fork is origin.)
In the cases where I do already have the branch on my fork, I need to copy the branch from my fork down to my local working directory, bring the branch up to date by pulling updates from the main repository, then push the branch back up to my fork. In this example, the branch name is v0.5-branch. (The pull is what merges the upstream updates into the local branch. A plain git fetch upstream v0.5-branch would download the updates without applying them.)
git fetch origin v0.5-branch:v0.5-branch
git checkout v0.5-branch
git status
git pull upstream v0.5-branch
git push origin v0.5-branch
Now I’m ready to cherry pick the changes I need. Remember, I’m cherry picking from master into an archive branch. Let’s say I want to cherry pick into the v0.6-branch:
git checkout v0.6-branch
git cherry-pick e895a107edba5e68cc0e36fa3a05a687e806cc19
The long string of letters and numbers is the name of the commit, which I obtained earlier by running git log.
The changes are now in my local copy of the branch. I can make extra changes if I want to. (For example, in my case I needed to update some metadata that relates specifically to the branch, including an archive flag used in the logic that determines whether to display the banner on the doc pages.)
When I’m happy with the cherry-picked updates and any other changes I’ve made, I push the updated branch up to my fork:
git push origin v0.6-branch
Then I create a PR and specify the base branch to be the name of the branch into which I’m cherry picking the changes. In the case of the above example, the base branch should be “v0.6-branch”. The screenshot below shows the base option, currently pointing to “master”, on the GitHub UI when creating a PR:
Can the cherries turn sour?
In the above scenario, I used cherry picking to apply a change going backwards in time. The requirement was to apply an update to older versions of the docs, which as a rule we don’t update very often. I didn’t cherry pick from a feature branch into the master branch. There are plenty of warnings on the web about things that could go wrong when you cherry pick. I found this post by Rob Friesel helpful in providing context in a non-scary way.
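One of the things that can go wrong is a conflict, when the archive branch has changed the same lines as the commit being picked. The following is a sketch in a throwaway repository (the file name, branch contents, and commit messages are invented for illustration) showing that git cherry-pick --abort puts the branch back the way it was before the attempt.

```shell
set -eu
# Throwaway repository for illustration only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "original line" > page.md
git add page.md
git commit -q -m "Initial page"
git branch v0.6-branch

# master and the archive branch change the same line in different ways.
echo "changed on master" > page.md
git commit -q -am "Reword page on master"
master_commit="$(git rev-parse HEAD)"

git checkout -q v0.6-branch
echo "changed on archive branch" > page.md
git commit -q -am "Reword page on archive branch"
before="$(git rev-parse HEAD)"

# The cherry pick stops with a conflict, so show the state and back out.
if ! git cherry-pick "$master_commit"; then
  git status --short
  git cherry-pick --abort
fi
after="$(git rev-parse HEAD)"
echo "branch unchanged: $before -> $after"
```

Instead of aborting, you could edit the conflicted file, run git add on it, and then run git cherry-pick --continue.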
How did I make the banner itself?
That’s another story. 🙂
How to remove an updated file from a PR on GitHub
This is my third post about GitHub techniques that aren’t necessarily obvious to those of us who think in non-Git terminology. This post derives from the fact that I searched the internet for “remove file from PR” and was led astray by helpful people telling me how to use Git to delete a file.
Say you changed the content of a file by mistake, and to your surprise the file has become part of your set of changes tracked by Git, and has thus become part of your pull request (PR). Now you want to remove the file from the PR. You don’t want to delete the file itself, you just want to revert the inclusion of the file in the PR. (Because, if you delete the file, then that deletion becomes part of your change set, which is not what you want at all.)
The basic principle is: Undo the changes that you made to the file, to make the file disappear from the PR.
Assumptions:
- The instructions below assume that you have not yet merged the PR into the GitHub repository. If you have merged the PR, then you should revert the PR and create a new PR with just the updates you need.
- The instructions below assume that the unwanted updates are in a file that already exists in the GitHub repository. If the unwanted updates are in an entirely new file that you created, then you can delete the file from your file system, then run git rm <file path> followed by git commit to remove the file from the PR.
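The new-file case in the second assumption can be sketched as follows. This is a sandbox illustration with invented names (notes.md, my-pr-branch), not a recipe for any particular repository. The final command lists the files that differ between master and the PR branch, which is roughly what the PR would show.

```shell
set -eu
# Throwaway repository for illustration only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "existing file" > README.md
git add README.md
git commit -q -m "Initial commit"

# A brand-new file gets committed to the PR branch by mistake.
git checkout -q -b my-pr-branch
echo "scratch notes" > notes.md
git add notes.md
git commit -q -m "Add notes by mistake"

# Delete the file, tell Git about the deletion, and commit.
rm notes.md
git rm -q notes.md
git commit -q -m "Remove notes added by mistake"

# Files that differ between master and the PR branch: prints nothing now.
git diff --name-only master my-pr-branch
```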
Note: Save your updates if you need them. If you still need the updates that you made to the file, perhaps to include in another PR, copy and save the updates somewhere outside your Git working area. You’ll lose the updates when you follow the steps described below.
Removing the updates manually
You can remove the updates manually, by copy-pasting the original contents of the file into your version of the file.
- Find the original version of the file, either on your fork if you forked from the main repository, or on the repository from which you cloned your local repository.
- Copy the entire content of the file and paste it into your copy of the file.
- Commit the new changes if necessary:
- If you had not yet committed your unwanted changes, then you don’t need to do any more.
- If you’d already committed your unwanted changes, create another commit now. The new commit wipes out your previous changes.
- If you’d already pushed the changes up to a remote repository, push the new commit now too.
Using git to remove the updates
Case 1: You haven’t yet committed the unwanted updates to your local repository.
Run the following command if you want to undo the updates in your working directory and you haven’t yet committed the unwanted updates:
git restore <file-name-including-path>
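Here’s a small sandbox sketch of Case 1, using a made-up file named doc.md. Note that git restore is available from Git 2.23 onwards. On older versions, git checkout -- <file-name-including-path> does the same job.

```shell
set -eu
# Throwaway repository for illustration only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "original content" > doc.md
git add doc.md
git commit -q -m "Add doc"

# An unwanted, uncommitted edit shows up as modified.
echo "unwanted edit" >> doc.md
git status --short

# Discard the edit. The file goes back to the committed version.
git restore doc.md
git status --short
cat doc.md
```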
Case 2: You’ve committed the unwanted updates to your local repository but you haven’t yet pushed the unwanted updates to your remote repository.
By default, you have a remote repository named origin, which is the repository from which you cloned your local copy of the files.
If you haven’t pushed the unwanted updates to the remote origin repository, you can retrieve the file contents from the origin repository.
The following sequence of commands assumes that the master branch in the remote repository named origin contains the unchanged version of the file. This is the case if you have not pushed your unwanted changes up to origin master.
git fetch origin master
git checkout origin/master -- <file-name-including-path>
git commit -m "Reverted content of file."
Example of the checkout command:
git checkout origin/master -- docs/my-file.md
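The Case 2 sequence can be rehearsed in a sandbox before you try it on a real PR. In this sketch a local bare repository stands in for origin, and the branch name my-pr-branch and the file docs/other.md are assumptions added for illustration. After the revert, only the intended file still differs from origin/master.

```shell
set -eu
sandbox="$(mktemp -d)"
# A local bare repository stands in for the remote named origin.
git init -q --bare "$sandbox/origin.git"

git init -q "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$sandbox/origin.git"

mkdir docs
echo "good content" > docs/my-file.md
echo "other content" > docs/other.md
git add docs
git commit -q -m "Initial docs"
git push -q origin master

# One intended change and one unwanted change, committed but not pushed.
git checkout -q -b my-pr-branch
echo "intended change" >> docs/other.md
echo "unwanted change" >> docs/my-file.md
git commit -q -am "Update docs"

# Take the file back to the version on origin/master and commit that.
git fetch -q origin master
git checkout origin/master -- docs/my-file.md
git commit -q -m "Reverted content of file."

# Only the intended file still differs from origin/master.
git diff --name-only origin/master my-pr-branch
```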
Case 3: You’ve already pushed the unwanted updates to your remote origin repository.
In this case, you may be able to retrieve the file contents from your upstream repository.
If you’re working on a fork of a repository, the convention is to give the name upstream to the repository from which you forked. You can run the following command to see which remote repositories Git knows about:
git remote -v
Name the upstream repository now if you haven’t already named it. In the following example, replace <project> with the GitHub project name and <repository> with the repository name within the project:
git remote add upstream https://github.com/<project>/<repository>.git
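Run the following commands to retrieve the file content from the upstream file. The fetch makes sure that your local repository knows about the current state of upstream/master, which matters if you’ve only just added the upstream remote. Then push the new commit up to origin as usual, so that the PR picks it up:
git fetch upstream master
git checkout upstream/master -- <file-name-including-path>
git commit -m "Reverted content of file."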
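As a sketch of Case 3 from start to finish, the script below uses two local bare repositories to stand in for the main repository (upstream) and your fork (origin). The repository paths, the branch name my-pr-branch, and the file name are all hypothetical. The final push is what updates the PR branch on the fork.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/upstream.git"   # stands in for the main repository
git init -q --bare "$sandbox/fork.git"       # stands in for your fork (origin)

git init -q "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$sandbox/fork.git"
git remote add upstream "$sandbox/upstream.git"
git remote -v

echo "good content" > my-file.md
git add my-file.md
git commit -q -m "Initial docs"
git push -q upstream master
git push -q origin master

# The unwanted change is committed and already pushed to the fork.
git checkout -q -b my-pr-branch
echo "unwanted change" >> my-file.md
git commit -q -am "Change file by mistake"
git push -q origin my-pr-branch

# Restore the file from upstream, commit, and push to update the PR branch.
git fetch -q upstream master
git checkout upstream/master -- my-file.md
git commit -q -m "Reverted content of file."
git push -q origin my-pr-branch

# Prints nothing: the file no longer differs from upstream/master.
git diff --name-only upstream/master my-pr-branch
```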
That’s all, folks
I’ve tried out all these commands myself, and they do what I expect them to do. Let me know if they do what you wanted to achieve too!
Here are my other two posts about Git and GitHub:
How to update your PR on GitHub
So, you’ve created a pull request (PR) on GitHub, and you’ve received some review comments. Now you need to update the PR to address the comments. How do you do that? This post shows you how to update the files in an existing PR, using either the command line or the GitHub UI.
When you ask how to update the files in an existing PR on GitHub, the answer is usually something like this:
Just push the updates to the same branch and GitHub takes care of it automagically.
Well, it’s true. GitHub does take care of it, and it is fairly magical. But how do you “just push the updates to the same branch”?
It took me a while to figure that out, the first time I needed to update the files in a PR. You can do it using command-line Git or using the GitHub UI. I prefer the command line, as it’s usually cleaner and simpler in the end, particularly if you need to update more than one file.
First, of course, you need to create the initial PR. If you haven’t yet created a PR, you can follow this guide to working on GitHub, which I created for the Kubeflow open source doc set that I’m currently working on.
I’ll assume that you already have a PR and that you need to update one or more files in that PR. Below are instructions on how to use the command line or use the GitHub UI.
Using the command line to update the files in a PR
Prerequisites:
- You need Git on your local computer. See the Git installation guide.
- If you don’t already have a Git repository on your local computer with a branch for your PR, create the local Git repository and branch now. See my blog post on how to download a PR from GitHub to your local computer. This process is necessary if you’ve been working online using the GitHub UI up to now, or if you’ve used a different computer to work on this PR up to now.
Follow these steps to update the PR:
- Open a command window on your local computer.
- Go to the directory containing the GitHub repository on your local computer. The command below assumes that you’ve cloned the repository into a directory named git-repositories/awesome-repo:
cd git-repositories/awesome-repo
- Tell Git to check out the relevant branch. In this example, replace your-branch-name with the name of your branch:
git checkout your-branch-name
- Run git status to check the status of your local files. Make sure the response shows that you’re in the branch that you expect to be in, and that there are no uncommitted changes in your local repository. (It’s a good idea to run git status regularly, just to check that things are as you expect.) If there are uncommitted changes in your local repository, you should take a look at them and consider committing them and pushing them up to GitHub before going any further. The result from git status should look like this:
On branch your-branch-name
nothing to commit, working tree clean
- Run the following command to pull down the most recent changes from your branch on GitHub to your local repository. It’s fine to run this command even if there are no changes to pull:
git pull origin your-branch-name
- Update the files that you need to change. You can add, edit, and delete files using your favourite text editor. I’m currently using Visual Studio Code.
- Run git status to check the status of your local files. The response should tell you which files you still need to commit for Git tracking. For example, the following status response shows one changed file, named your-updated-doc.md, that you still need to commit:
On branch your-branch-name
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   your-updated-doc.md

no changes added to commit (use "git add" and/or "git commit -a")
Another example: The following status response shows one deleted file and one added file, both of which you need to commit for Git tracking:
On branch your-branch-name
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    your-old-doc.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        your-new-doc.md

no changes added to commit (use "git add" and/or "git commit -a")
- Run the following commands to commit the updated files to your local Git repository. Here’s an example commit command to tell Git about one or more updated files:
git commit -am "Fixed some doc errors."
Here’s another example, which tells Git that you’ve added a file named your-new-doc.md, and then commits all changes including the added file:
git add your-new-doc.md
git commit -am "Added a shiny new doc."
And another example, which tells Git that you’ve deleted a file named your-old-doc.md, and then commits all changes including the file deletion:
git rm your-old-doc.md
git commit -am "Deleted an obsolete doc."
- Push the updates from your local branch to the corresponding branch on GitHub:
git push origin your-branch-name
That’s it. When you look at your PR on GitHub, you’ll see a new commit listed among the comments on the PR. For example, the following screenshot shows two commits on a PR. One commit has the description “Updated for review comments”, and the other has the description “Added instruction to use different deployment name”:
When you view the files in the PR, you’ll see your changes incorporated into the latest version of each file. When the repository maintainers approve your PR, the changes will be merged into the repository.
Some notes:
- The word origin refers to the remote repository on GitHub from which you cloned your local repository when you first started working on it.
- You can use the following command to see which remote repositories Git knows about:
git remote -v
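One detail in the commit step of this section is worth trying out for yourself: the -a option in git commit -am picks up changes and deletions in files that Git already tracks, but not brand-new files, which is why the example for a new file runs git add first. Here’s a sandbox sketch that reuses the file names from the examples above in a throwaway repository (the initial file contents are invented).

```shell
set -eu
# Throwaway repository for illustration only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "old doc" > your-old-doc.md
echo "some text" > your-updated-doc.md
git add .
git commit -q -m "Initial docs"

# Edit a tracked file and create a new, untracked file.
echo "a fix" >> your-updated-doc.md
echo "new doc" > your-new-doc.md

# -a stages the edit to the tracked file, but leaves the new file alone.
git commit -q -am "Fixed some doc errors."
left_over="$(git status --porcelain)"
echo "still uncommitted: $left_over"

# Add the new file and remove the old one explicitly, then commit.
git add your-new-doc.md
git rm -q your-old-doc.md
git commit -q -am "Added a shiny new doc and deleted an obsolete doc."
git status --short
```

At the end, git status --short prints nothing, which means everything is committed and ready for git push origin your-branch-name.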
Using the GitHub UI to update the files in a PR
There are two ways to update a file in an existing PR using the GitHub UI:
- Access the file from the PR (described in this section).
- Access the file from your fork of the repository that you’re updating (described in the section below this one).
The first way is the most direct route to the file that needs updating, once you know how to do it.
To access a file from a PR:
- Open a browser window.
- Open your PR in GitHub, and click the Files changed tab at the top of the PR:
- Click the three dots on the right-hand side of the window next to the name of the file that you want to edit, then click Edit file in the panel that opens up:
- Make your changes in the editing interface that opens up.
- When you’re ready, scroll down to the bottom of the editing interface. Enter a short description of the updates, and a longer description if necessary. Make sure the option is selected to Commit directly to the your-branch-name branch:
- Click Commit changes.
That’s it. You’ve now updated the file in the PR. When you look at your PR on GitHub, you’ll see a new commit listed among the comments on the PR. When the repository maintainers approve your PR, the changes will be merged into the repository.
An interesting point is that other people can also add commits into your PR, provided they have authority to do so. Sometimes the repository maintainers may make an update to your PR before merging it, if it’s simpler to make the update than to explain it in a PR comment. (When you create the PR, you can grant or deny permission for the maintainers to make changes to the PR.) For example, the following screenshot shows a commit that I made into someone else’s PR, before merging the PR into the main repo. The commit has the description “Added specific link to Chainer page”:
Accessing your PR from your fork of the repository in the GitHub UI
The above section shows you how to access a file directly from a PR in order to update the file. As an alternative, you can go to your fork of the repository that you’re updating, and navigate through the files in the repository to find the one you want to update.
Usually, you work on a fork of the main repository on GitHub when you create a PR. If you use the GitHub UI to create the PR, GitHub creates the fork automatically for you, the first time you create a PR for a particular repository. The reason for creating the fork is that you probably don’t have update rights on the main repository. In the instructions below, I’m assuming that you have a fork of the repository.
To access a file from your fork of the repository:
- Open a browser window.
- Find your fork of the repository on GitHub. For example, if the repository name is “awesome-repo”, then the fork should be at this URL: https://github.com/your-github-username/awesome-repo. You can find your fork by going to the list of your repositories on GitHub. Click the dropdown arrow next to your profile image, then click Your repositories:
You should see a list of repositories something like this:
- Click the name of the forked repository that you want to update. You should see the details of the forked repository, including the files within the repository. For example, I clicked website to open my fork of the Kubeflow website repository. The Code tab shows the list of files within the repository:
- Change to the branch that contains your PR within the repository fork. By default, the repository fork opens on the master branch. To change the branch, first click the Branch option:
Then in the dropdown list that appears, select the branch that contains your PR. The branch name may be something like “your-username-patch-1”, or it may be something meaningful that you entered when you created the PR. For example, I need to select the v1-discussion branch to find my PR:
Now you should see the same repository fork, with roughly the same files as you saw in the master branch. But the branch which you’ve selected contains all the changes you’ve made in your PR.
- Click a file or directory name to navigate through the files within your fork of the repository. For example, I need to click the Content directory to find the file I’m interested in:
- When you find your file, click the edit icon to edit the file as usual:
- When you’re ready, scroll down to the bottom of the editing interface. Enter a short description of the updates, and a longer description if necessary. Make sure the option is selected to Commit directly to the your-branch-name branch:
- Click Commit changes.
That’s it. You’ve now updated the file in the PR. When you look at your PR on GitHub, you’ll see a new commit listed among the comments on the PR. When the repository maintainers approve your PR, the changes will be merged into the repository.
How to download a PR from GitHub to your computer
This is a quick tip about a useful Git technique. It took me a while to figure this out when I first needed it. I was working on a pull request (PR) on one computer when I was in the office. Then I wanted to continue working on the PR from my laptop at home. I needed to transfer my work from my work computer to my laptop, using GitHub as middleman.
Another scenario for this technique is when you’ve used the GitHub UI to make some changes, but now you want to swap to command-line usage while in the middle of your PR. This could be useful, for example, if you find that your PR needs to include changes to more than one file, which is hard to do in the GitHub UI.
Prerequisites
You need Git on your local computer. See the Git installation guide.
I’m assuming the following things:
- You’re comfortable using command-line Git.
- You already have a PR that you’ve been working on, and you want to make a local copy of the PR so that you can update one or more files in that PR. (If you haven’t yet created a PR, you can follow this quick guide to working on GitHub, which I created for the Kubeflow open source doc set that I’m currently working on.)
- You’ve pushed your latest changes up from your other machine to GitHub, so that GitHub contains the latest version of the PR.
All you want to do now is to copy a particular PR down from GitHub so that you can work on it on this computer.
Clone the repository to your computer
If you’ve already cloned the GitHub repository to your local computer, you can skip this section. This would be the case if you’ve previously done some work on this repository and on this computer.
You need a clone of the GitHub repository on the computer you’re currently using, so that Git can track the changes you make in the repository. Usually, you fork the main repository on GitHub before creating a PR. The reason for creating the fork is that you probably don’t have update rights on the main repository. I’m assuming that you have a fork of the repository, and therefore your next step is to clone your fork of the repository to your local computer, as described below.
Note: If you’re working directly on the main repository rather than on your fork of the repository, then you should clone the main repository to your local computer.
To clone your fork of the repository onto your local computer:
- Find your fork of the repository on GitHub. For example, if the repository name is “awesome-repo”, then the fork should be at this URL: https://github.com/your-github-username/awesome-repo.
- Open a command window on your local computer.
- Run the following commands to clone your forked repository onto your local machine. The commands create a directory called git-repositories and then use HTTPS cloning to download the files:
mkdir git-repositories
cd git-repositories/
git clone https://github.com/your-github-username/awesome-repo.git
cd awesome-repo/
If you prefer, you can use SSH cloning instead of HTTPS cloning:
mkdir git-repositories
cd git-repositories/
git clone git@github.com:your-github-username/awesome-repo.git
cd awesome-repo/
You’re now in a directory called awesome-repo. If you take a look at the files in the directory, you should see some file and directory names starting with .git, indicating that Git is tracking the files in the directory. You should also see the files and directories belonging to the GitHub repository that you cloned.
Download the PR to your computer
Follow these steps to copy the PR from GitHub to your local computer:
- Find your PR on GitHub and check the name of the branch that contains the PR. In the screenshot below, the branch name is gcpsdk:
- Go to the directory containing the repository on your local computer. The commands below assume that you’ve cloned the repository into a directory named git-repositories/awesome-repo:
cd git-repositories/awesome-repo
- Run these commands to copy the branch containing your PR to your computer. In the commands, change your-branch-name to the actual branch name:
git status
git checkout master
git fetch origin your-branch-name:your-branch-name
git checkout your-branch-name
That’s it. You’re now in the branch that contains all the updates from your PR. You can continue working on the files or adding new files. Remember to git commit and git push as usual, to copy your updates back up to GitHub.
Here’s an explanation of each of the above commands:
- git status: Run this command to see where you are and what the current status is of your files. You may have been busy with something that still needs tidying up before you can create a new branch.
- git checkout master: Go to the master branch to make sure you have a tidy place from which to create a new branch.
- git fetch origin your-branch-name:your-branch-name: This is the key command. It tells Git to copy the branch from GitHub (“origin”) and to create a new branch on your local computer with the same updates and the same name.
- git checkout your-branch-name: This puts you in the new branch on your local computer. This branch now has the same updates as the equivalent branch on GitHub.
Some notes for those who are interested
The above set of commands assumes that you want the branch name on your local computer to be the same as the branch name on GitHub. That’s most likely to be the case, but you can use a different local branch name if you need to. The command pattern is this:
git fetch origin your-origin-branch-name:your-local-branch-name
git checkout your-local-branch-name
The word origin refers to the remote repository on GitHub from which you cloned your local repository when you first started working on it. You can use the following command to see which remote repositories Git knows about:
git remote -v
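If you’d like to rehearse the download before doing it for real, the sketch below imitates the two-computer scenario on one machine. A local bare repository plays the part of GitHub, and the directories office and laptop play the two computers. All of the paths and file names are assumptions made up for this illustration.

```shell
set -eu
sandbox="$(mktemp -d)"
# A local bare repository stands in for the repository on GitHub.
git init -q --bare "$sandbox/origin.git"
git -C "$sandbox/origin.git" symbolic-ref HEAD refs/heads/master

# "Office computer": create master and a PR branch, and push both.
git init -q "$sandbox/office"
cd "$sandbox/office"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$sandbox/origin.git"
echo "docs" > index.md
git add index.md
git commit -q -m "Initial docs"
git push -q origin master
git checkout -q -b your-branch-name
echo "work in progress" > draft.md
git add draft.md
git commit -q -m "Start the PR"
git push -q origin your-branch-name

# "Laptop": a fresh clone, then the four commands from this post.
git clone -q "$sandbox/origin.git" "$sandbox/laptop"
cd "$sandbox/laptop"
git status
git checkout -q master
git fetch -q origin your-branch-name:your-branch-name
git checkout -q your-branch-name

# The PR's file is now available locally.
ls
```

As far as I can tell, the git checkout master step also matters for a practical reason: Git refuses to fetch into the branch that is currently checked out, so you need to be on a different branch when you run the fetch.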
Discovered the Issue Mover for GitHub and it’s super cool
This app makes it easy for me to move GitHub issues from one repository to another within the same GitHub org. I’ve just used it for the first time. It’s a real time saver. And it’s pretty too, especially if you’re fond of ladybirds.
The Issue Mover for GitHub:
What problem does the app solve? Let’s say you belong to an organisation on GitHub with a number of repositories. The number of repos has grown over the months and years, as it inevitably does. As a result, you frequently find yourself needing to move issues from one repo to another. It’s time-consuming to do that by hand. You need to copy across all the content of and comments from each issue, reassign each issue to the relevant contributor, add back all the labels, and finally close the original issue with a note saying that it’s moved.
I’ve jotted down some example use cases. I’m pretty sure you’ll have others in mind too:
Docs: Let’s say, in the earlier days of your project people were adding the doc issues to the code repo. But now you have a website, with its own code and its own repo. So, you want to move the open doc issues from the code repo to the website repo. This is the situation I found myself in. The Issue Mover worked like a charm.
Community: At first, all your community-related requests were lumped together with doc issues. But now you have a large community that’s creating procedures and tools of its own. You create a shiny new community repo and you want to move the relevant issues into it.
Software components: Your app/framework is expanding rapidly, and it makes sense to split off separate code repos for some of the larger, less tightly-coupled components. Of course, the relevant issues should go along for the ride.
General and ongoing: People keep putting issues into the wrong repo! 😉 You like to keep things tidy, and want to move the issues to the logical place.
I was delighted to find this app, and I hope you find it useful too!
Disclaimer: Even though many of the contributors on the Issue Mover project work for Google (and so do I) this is not an official Google Product.