Managing Messy Notebooks

A paper that won a best paper award at this year’s ACM CHI Conference on Human Factors in Computing Systems presents a set of tools that help programmers and data scientists clean up their computational notebooks so they can program more effectively and efficiently.

“Programming in computational notebooks is helpful for seeing intermediate pieces of code and results interlaced together, but often these notebooks become very long and messy. This likely resonates with many students, but also data science and industry professionals, since it is a widely used technology,” said Fred Hohman, a School of Computational Science and Engineering (CSE) Ph.D. student and co-author of the paper.

The set of tools, called code gathering tools, lets users go to any part of a long notebook, such as a particular variable or equation buried in messy code, and pull out the code relevant to it.

“What we did is create a means to pull out, or gather, a desired item out of a large notebook and show all its changes from previous versions. This will show you what minimal set of code you need to get a certain result,” said Hohman.
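The idea of gathering the “minimal set of code” behind a result can be illustrated with a simplified backward slice over notebook cells. This is a rough sketch, not the tool described in the paper. It assumes the cells are listed in the order they were executed, and it tracks only plain variable names and imports via Python’s standard `ast` module. Attribute access, in-place mutation through method calls, and function bodies are ignored. The `Cell` class and the `gather` function are hypothetical names chosen for illustration.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical notebook cell: its source plus the names it defines and reads."""
    source: str
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)

    def __post_init__(self):
        # Simplification: only bare names and imports are tracked, not attributes
        # or objects mutated by method calls such as df.dropna(inplace=True).
        for node in ast.walk(ast.parse(self.source)):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    self.defs.add(node.id)
                else:
                    self.uses.add(node.id)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    self.defs.add((alias.asname or alias.name).split(".")[0])


def gather(cells, target):
    """Return positions of the cells needed to reproduce cells[target].

    Walks backwards through the execution history. A cell is kept only if it
    defines a name that a cell already in the slice still needs.
    """
    needed = set(cells[target].uses)
    kept = [target]
    for i in range(target - 1, -1, -1):
        provided = cells[i].defs & needed
        if provided:
            kept.append(i)
            # Those names are now satisfied; this cell's own inputs become needed.
            needed = (needed - provided) | cells[i].uses
    return sorted(kept)


cells = [
    Cell("data = [3, 1, 2]"),
    Cell("plot_title = 'scratch'"),   # unrelated exploration
    Cell("clean = sorted(data)"),
    Cell("data = [10, 20]"),          # redefinition run after `clean` was computed
    Cell("total = sum(clean)"),
]
print(gather(cells, 4))  # → [0, 2, 4]
```

In this toy history, the scratch cell and the later redefinition of `data` are left out, because neither contributed to `total`. Walking backwards in execution order, rather than in on-screen order, is what lets a slice like this skip a cell that was run after the value it overwrites had already been consumed.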

Beyond efficiency, the tools also support reproducibility, code sharing, and communication by helping analysts find, clean, recover, and compare versions of code in cluttered, inconsistent notebooks.

According to the paper, “Managing Messes in Computational Notebooks,” the tools also archive every version of code outputs, allowing analysts to review these versions and recover the subsets of code that produced them. These subsets can serve as succinct summaries of analysis activity or as starting points for new analyses.

