Cheap polyglot notebooks


RStudio v1.0 is released. One of the key features is the notebook. Yes, there is a Jupyter kernel for R, but this is probably going to be much better.

The idea of having interactive notebooks is not really new. Mathematica had it before I was born or something. IPython and then Jupyter brought these to the mainstream. Now almost every product on these lines have great many features including cool visualization support, multiple kernels (many in the same notebook) etc. You have some big products like beaker, zeppelin and of course Jupyter.

Everything is fine. But I don’t particularly enjoy it when I accidently open a notebook in my editor and its a mess of JSON waiting to force me into believing that its just a set of simple scripts I wrote to do something much simpler than

working interactively with large and complex datasets

R Markdowns shine here. They are by default cleaner to read. So that’s a plus. What else do you need? Well, R is not a general purpose language. Neither is python good for everything. A notebook supporting multiple languages? That will work. But then they are too heavy to open up everything and get going. Honestly, they are not targeting your tiny scripts crossing and piping data over language barriers to make sense of your local files or sating your fussiness for using requests over request. Notebooks simply have great overheads for small experiments with random jumble of scripts.

Org-babel

Org-babel is an Org-mode thing for running chunks of code in Org files. You could add language chunks, like in R Markdowns, and execute those with optional shared sessions or variable passing. So, for example, you could get around with your ggplot envy while using Python (at least until altair goes on big). Anyways, the point is not heavy plotting (those are much better on web based UIs), its the speed with which you can play around with your ideas inside org files.

You could use JS, Python and other languages with great packages and weave them with your writings or class notes in Org with . Additionally, being inside Emacs, you could exploit Emacs lisp and Org to have cool side effects with ease. For example, shifting your scheduled tasks depending on weather gathered using python requests. Obviously there are other ways. Get data with Elisp itself, or use something like PyOrgMode. But you might really love requests for http things and want to use Emacs’ buffer manipulation for writing to files. And you might want this weather dependent org agenda file to be self contained. Or you are of the other type and might be okay with firing up (or planning to) different processes and piping everything externally.

I recently started trying out Babel for some of the use cases involving web-scraping, some pythonish things, some directory restructuring and some planning and writing. As of now, I am really enjoying the neat polyglot notebookish properties with great many freedoms for plumbing code with prose.

Here is a quick preview for generating some results in Python

#+NAME: chunks
#+BEGIN_SRC python :results silent
  import requests
  animals = ["doge", "moon moon", "pepe", "grumpy cat"]
  chunks = []
  for animal in animals:
    r = requests.get("https://google.com/search?q=" + animal + " meme")
    chunks.append([animal, r.text])
  return chunks
#+END_SRC

You have your usual org code block with few specifications (like NAME for variable name to be used for the block result). Essentially this snip is counting Google search hits for “[name] meme” for different names. It uses requests to get the search results. Now for parsing the HTML element which says something like “About 100 results found”, lets pass the results to JS

#+NAME: counts
#+BEGIN_SRC js :var data=chunks :cmd babel-node :results list
  const cheerio = require("cheerio")
  return data.map(d => [d[0],
                        cheerio.load(d[1])("#resultStats")
                        .text()
                        .replace(/,/g, "")
                        .split(" ")[1]])
#+END_SRC

#+RESULTS: counts
- ("doge" "735000")
- ("moon moon" "3420000")
- ("pepe" "775000")
- ("grumpy cat" "1250000")

Nice. It works. We could go back to python and do a frequency plot because we have problems counting number of zeros.

#+BEGIN_SRC python :var counts=counts :results file :export both
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set()

f, ax = plt.subplots(figsize=(5, 3))
frequencies = [int(c[1]) for c in counts]
names = [c[0] for c in counts]

pos = np.arange(len(names))
ax.set_xticks(pos + 0.5)
ax.set_xticklabels(names)

plt.bar(pos, frequencies, 1)
plt.savefig("plot.png", transparent=True)

return "plot.png"
#+END_SRC

#+RESULTS:
[[file:plot.png]]

There are many languages supported and you have the usual org structures like tables and lists for mediating chunks. The working model is really simple. Even the plotting is nothing more than writing to file and displaying with the usual org-inline image toggle. But, that is expressive enough. Here is another screen using R and ggplot (plot taken from hexbin example). You could also put in something weird like C++.

All in all, these files are cheap, readable and usable. Exportable to any format imaginable. And these integrate deeply with your usual Emacs + Org-mode workflow.


Org files are not “the” polyglot notebooks. They don’t solve everything and of course not everyone uses Emacs. They are limited in many aspects, for example visualizations. But they let you do a few things in a notebookish manner which would have been a mess otherwise.

I used to do many things in Jupyter notebooks before I learned easier non-python alternatives for quick tasks. Some time after that I switched to Emacs and realized I don’t like editing in browser that much and the ipython-notebook mode was not much fun either. Org-babel looks great now. It sits nicely in between and fills many gaps. In case you are having certain likelihood of getting hooked too, try these samples.

Here is the notebook I used.