Recipe: Strip notebook output before committing

Notebooks are awesome for experimentation, but hard to work with inside repositories. Whenever it was time to commit, I always felt I had to choose between committing the cell outputs, which makes diffs huge and no fun to work with, or stripping the outputs and losing them locally.

A better way

This gist shows how to commit Jupyter notebooks to git without their outputs while keeping the outputs intact locally:

  1. Add a filter to the git config by running the following command in bash inside the repo:

git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

  2. Create a .gitattributes file inside the directory with the notebooks.

  3. Add the following to that file:

*.ipynb filter=strip-notebook-output
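
The three steps can be tried end to end in a scratch repository before touching a real one. The sketch below rests on a few assumptions: the temporary directory and the file name example.ipynb are hypothetical, and the final check only confirms that git associates the filter with notebook paths (it does not run nbconvert itself).

```shell
# Work in a throwaway repository (hypothetical scratch directory).
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Step 1: register the clean filter in this repository's git config.
git config filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

# Steps 2 and 3: create .gitattributes and route notebooks through the filter.
echo '*.ipynb filter=strip-notebook-output' > .gitattributes

# Ask git which filter applies to a notebook path.
# example.ipynb is a hypothetical name; the file does not need to exist.
attr="$(git check-attr filter -- example.ipynb)"
echo "$attr"   # prints: example.ipynb: filter: strip-notebook-output
```

If the last line reports the filter as unspecified instead, the .gitattributes file is probably not in a directory that covers the notebook's path.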

After that, commit to git as usual. The notebook output will be stripped out in git commits, but it will remain unchanged locally.
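
To see why the local file stays untouched, it can help to watch a clean filter at work on something simpler than a notebook. The following is a minimal sketch that swaps in a stand-in filter (a sed command that drops lines starting with OUTPUT) in place of nbconvert. The filter name demo-strip, the file demo.txt, and the scratch directory are all hypothetical.

```shell
# Work in a throwaway repository (hypothetical scratch directory).
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Stand-in clean filter: removes every line that starts with "OUTPUT".
git config filter.demo-strip.clean "sed '/^OUTPUT/d'"
echo '*.txt filter=demo-strip' > .gitattributes

# A file with one "code" line and one "output" line.
printf 'code line\nOUTPUT result\n' > demo.txt
git add demo.txt

staged="$(git show :demo.txt)"   # what git stored in the index
local_copy="$(cat demo.txt)"     # what is still on disk

echo "staged: $staged"           # only "code line"
echo "local:  $local_copy"       # both lines, unchanged
```

The clean filter runs when a file is staged, so only the copy that git stores is rewritten. The notebook filter from the recipe behaves the same way, with nbconvert doing the stripping.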

This gist is based on @dirkjot's answer to this StackOverflow question.

One comment at the bottom is important to add to the recipe: “run git add --renormalize . to go through all of your existing notebook files and scrub the outputs. Otherwise, you could get heinous merge conflicts later.”
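
The effect of that command can be sketched in a scratch repository. As an illustration only, this example again uses a stand-in sed filter rather than nbconvert, and every name in it (demo-strip, demo.txt, the demo identity passed to git commit) is hypothetical.

```shell
# Work in a throwaway repository (hypothetical scratch directory).
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Commit a file BEFORE any filter exists, so the stored copy keeps its "output".
printf 'code line\nOUTPUT result\n' > demo.txt
git add demo.txt
git -c user.name=demo -c user.email=demo@example.com \
  commit -q -m 'Add file with output'

# Introduce the stand-in clean filter afterwards.
git config filter.demo-strip.clean "sed '/^OUTPUT/d'"
echo '*.txt filter=demo-strip' > .gitattributes

# Re-run the clean filter over every tracked file and restage the result.
git add --renormalize .

committed="$(git show HEAD:demo.txt)"   # copy in the last commit, output included
restaged="$(git show :demo.txt)"        # copy staged after renormalizing

echo "last commit: $committed"
echo "staged now:  $restaged"
```

In a real repository, running git add --renormalize . once after adding the filter and then committing the result should bring previously committed notebooks in line with the stripped form.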

Published Jan 23, 2024

I am a computer scientist specializing in building machine learning powered products. I’m currently a machine learning developer at Local Logic.