What I cannot create, I do not understand.

Just because you've implemented something, doesn't mean you understand it.

Literate blogging with Org mode and Octopress

My blog used to be hosted on Posterous, but as of this post it’s generated by a little bit of Emacs Lisp that turns Org mode files into Markdown files which are then processed by Octopress to produce the site you see here. This has the usual benefits of static site generators (simple to deploy, easy to version in Git, fast, etc) but I hope that using Org mode will also motivate me to experiment with literate programming via Babel.

An Octopress-flavored Markdown backend

There are already a few other ways to publish Org files as Markdown or Jekyll (or Octopress) blog posts, however desipite my short list of requirements, I found them to be a bit lacking. In proper not-invented-here-style, I set out to write some Emacs Lisp to export Org files to something Octopress could work with. I learned a bit about some new features of Org’s internals along the way, as well as about Emacs, since this is the largest project I’ve undertaken using Emacs Lisp.

Org has a new exporting system that’s not released with the latest stable build yet, but is well documented and is more flexible than the existing generic export library. Writing a simple backend that covered a small subset of Org’s features turned out to be pretty easy to do. If you have some need to process Org files that’s not supported out of the box, I’d recommend trying this approach. What follows are notes I took while working on this, however since this version of Org has not been released yet, you should of course refer to Org’s documentation for any inconsistencies and so on you may find. Extra caveats apply here since I’m a novice both with Org and with Emacs Lisp in general.

Since this new export system is not really released yet, you have to clone the latest from Git. The new export functionality is defined in ox.el. Of course, there are some backends packaged with Org already, including one that exports to “ASCII” text which was extremely helpful to use as an example.

Defining a new backend is done with the org-export-define-backend function, which accepts two arguments, a symbol and an alist. The symbol is just the name of your new backend. The alist is what actually defines your backend. The “keys” should be the type of Org syntax element, and the “values” should be a function that can export that type of element to your chosen format.

org-export-define-backend
1
2
3
4
5
6
7
8
9
10
11
12
(org-export-define-backend 'octopress
  '(
    (bold . org-octopress-bold)
    (fixed-width . org-octopress-fixed-width)
    (headline . org-octopress-headline)
    (italic . org-octopress-italic)
    (link . org-octopress-link)
    (paragraph . org-octopress-paragraph)
    (section . org-octopress-section)
    (src-block . org-octopress-src-block)
    (template . org-octopress-template)
))

Here I’ve defined a small backend that supports the bare minimum I needed for this blog. There are many more types of Org syntax that I’m not supporting, but since I’m a novice Org user, I figure when I discover a need for those, I’ll add them to the backend.

how Org parses and exports files

When Org exports a file using this backend, and it comes across an element of the type “bold” for instance, it will call the function associated with 'bold, in this case org-octopress-bold. That function should take three arguments: the node of the abstract syntax tree for that element, the “contents” of that node, and a plist with extra information (called the “communication channel” in the Org docs). It should return the result of exporting that node as a string.

When you publish an Org file, it’s first parsed into an abstract syntax tree, and then the export system calls these functions, in a bottom-up fashion starting with the leaves of the tree. Each function returns a string, and these strings are accumulated together until eventually the function for the root of the tree is called, and the entire document has been converted. This is probably best demonstrated with an example. Here’s a snippet of an Org file:

1
2
3
4
5
6
7
* Foo

Bar, baz! Qux flarp zoot, zip.

** Bar

Frobnicate blop, Fribble *tele* gorx.

Org will parse this file into the following syntax tree (actually the real tree has much more data attached to each node, and is also recursive making it difficult to print):

org-export-define-backend
1
2
3
4
5
6
7
8
9
(org-data
 (headline "Foo"
           (section
            (paragraph "Bar, baz! ..."))
           (headline "Bar"
                     (section
                      (paragraph "Frobnicate..."
                                 (bold "tele")
                                 "gorx.")))))

Org will walk over this tree and call your backend functions. In this case, the first function it will call is the one associated with the type bold since the node (bold "tele") is the first leaf. In my backend, this is the function org-octopress-bold.

org-octopress-bold
1
2
3
(defun org-octopress-bold (text contents info)
  "Transcode bold text to Octopress equiv of <strong>"
  (format "**%s**" contents))

This is a simple conversion from Org mode to Markdown, and since my backend is very bare-bones, I ignore the other arguments and just wrap the contents string, which in this case will be "tele", in asterisks. (Markdown doesn’t really have a concept of “bold” but instead uses HTML “strong” and “emphasis” tags which most browser’s default CSS renders as bold and italic, respectively)

paragraphs

The next function that will be called is org-octopress-paragraph, since it’s the next least node in the syntax tree.

org-octopress-paragraph
1
2
(defun org-octopress-paragraph (paragraph contents info)
  contents)

The contents of the paragraph will be a string, containing the results of transcoding the children of the paragraph, in this case two plain strings and a bold string. While I think there are some subtleties around newlines, the simplest way to deal with paragraphs are to just return the contents unchanged. Of course, if we were writing a new HTML backend, we would wrap the contents in p tags.

To be honest, paragraphs are actuall part of this system I’m a little shaky on. From what I could determine by reading ox.el and doing some experiments, there are a few syntax elements that you must provide transcoder functions for. Paragraphs are one of them. The element types headline, section, and the special type “template” are others that must be provided. The reason for this is that these are intermediary nodes in the syntax tree, so if they are not provided at all, the results of other nodes will never be accumulated.

headlines

Continuing our example parse, a node for section will be transcoded, and in my case I’m using a similar function as for paragraphs, which just returns the contents unchanged. The next node after that will be the headline node for the headline “Bar” in the original Org source.

org-octopress-headline
1
2
3
4
5
6
7
8
(defun org-octopress-headline (headline contents info)
  (let ((value (org-element-property :raw-value headline))
        (level (org-element-property :level headline)))
    (concat (apply 'concat (repeat "#" level))
            " "
            value
            "\n"
            contents)))

Markdown has a similar syntax for headlines as Org, but uses pound or hash symbols instead of asterisks. Here, we use the function org-element-property to extract some properties from the AST node headline. We need the raw value which is the headline without asterisks, and the level which is the number of asterisks. In this case, I convert all levels of headlines to Markdown headlines, but if you were to be writing a “real” backend I think you would want to respect the :headline-levels option for the project, and only convert headlines of a certain level. Again, like the paragraph node, the contents are the children of the headline, which includes everything under that headline, so we must concatenate the Markdown-style headline string with the contents so as not to lose the rest of the document.

document template

The export process continues in this fashion, until all the nodes are transcoded, their strings accumulated. There’s a special AST node type called template which represents the root of the Org document. The docs suggest using this to add a preamble and/or postamble to the result. In my case, I wanted to output the YAML front matter used by Jekyll to generate blog posts. The template transcoder function takes only two arguments, the contents string and the info plist. The root AST node is not passed into this function, I assume because the idea is that you’ve already transcoded its children and there’s not really any concrete Org syntax associated with it, so there’s nothing to do with the root node but return the contents, wrapped in pre- or postamble.

org-octopress-template
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
(defun org-octopress-template (contents info)
  "Accepts the final transcoded string and a plist of export options,
returns final string with YAML frontmatter as preamble"
  (let ((title (car (plist-get info :title)))
        (date (car (plist-get info :date)))
        (time "")
        (frontmatter
"---
layout: post
title: %s
date: %s %s
comments: true
external-url:
categories:
---
"))
    (if *org-octopress-yaml-front-matter*
        (concat (format frontmatter title date time) contents)
      contents)))

This function is an example of using the “communication channel” which is the third argument of the other transcoder functions but in this case is the second. The info plist contains all the metadata about the document that’s defined in the Org “export options template”. It’s from this that we extract the title of the post and the date and add it to the YAML front matter.

code blocks

Source blocks are another area where I customized things to output something specific to Octopress. The Github-style “backtick” code blocks used by Octopress take optional language and name parameters, which are used for syntax highlighting and for adding captions to the source block itself. Similarly, Org supports passing the language to source blocks, and attaching a name to elements in general, so I used that to add this to the backtick code block if present. I also added a little hack to ignore the language if it was something not supported by Pygments.

org-octopress-src-block
1
2
3
4
5
6
7
8
9
10
11
12
(defun org-octopress-src-block (src-block contents info)
  "Transcode a #+begin_src block from Org to Github style backtick code blocks"
  (let* ((lang (get-lang (org-element-property :language src-block)))
         (value (org-element-property :value src-block))
         (name (org-element-property :name src-block))
         (lang-and-name (or (and lang name (format " %s %s\n" lang name)) "\n")))
    (concat
     "```"
     lang-and-name
     value
     "```\n"
     contents)))

Tying it together

Having defined the new backend, all that’s left is a little boilerplate to make this backend available to Org projects:

org-octopress-publish-to-octopress
1
2
(defun org-octopress-publish-to-octopress (plist filename pub-dir)
  (org-publish-org-to 'octopress filename ".md" plist pub-dir))

My blog’s Org project alist is then, at the bare minimum:

1
2
3
4
5
6
'(("posts"
   :base-directory "/path/to/blog/root"
   :base-extension "org"
   :publishing-directory "/path/to/octopress/root/source/_posts"
   :publishing-function org-octopress-publish-to-octopress)
  ("blog name" :components ("posts")))

To start a new blog post, I use this little helper to create a new Org file with the right naming convention and export template:

octopress-helpers
1
2
3
4
5
6
7
8
9
10
11
12
(defun new-post (dir title)
  "Create and visit a new .org file in dir named $date-$title.org, ie
Octopress/Jekyll style"
  (interactive "Mdirectory: \nMtitle: ")
  (let* ((date (format-time-string "%Y-%m-%d"))
         (title-no-spaces (replace-regexp-in-string " +" "-" title))
         (dirname (file-name-as-directory dir))
         (filename (format (concat dirname "%s-%s.org") date title-no-spaces)))
    (find-file filename)
    (rename-buffer title)
    (org-insert-export-options-template)
    (rename-buffer filename)))

While I’m working on a post, I can start the Octopress preview server the normal way (rake preview) and then export the current Org file with org-publish-current-file to preview the final output in a browser.

Literate example

In case it wasn’t obvious, this whole post was written using this system, and in fact the entire (albeit small) body of code is contained in the Org file for this post, using Babel’s noweb-style literate facilities. The source for this post is on Github, and contains some extra code not exported, like various helper functions and tests.

Working on this blog post while adding some features and fixing bugs to the code was pretty interesting, even as a small taste of literate programming. I’ve got some ideas for literate posts I’d like to write, so I’m looking forward to experimenting with this and will probably add to the backend as I find new Org features.

Please let me know if you find any errors in the code, it’s definitely just an alpha MVP at this point, but if anyone’s interested in using it I would be pleased to hear from you, and help if you run into problems.