Code Highlighting for the Blog

posted by Jake on

Here are some high-level highlights of what I did to get code highlighting working for the blog. The blog runs on Python/Django, so this is a very Pythonic solution -- which, I have to say, is not a bad thing.

I've always wanted good syntax highlighting for the code I post on the web as I go through my various adventures in software development.

I used a wonderful custom Django template filter that someone was kind enough to share as a snippet (http://www.djangosnippets.org/snippets/119/); it brings all the important pieces together.

I altered it ever so slightly and reproduce it here:

from django import template
register = template.Library()

# Pygments: http://pygments.org -- a generic syntax highlighter.
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name, guess_lexer

# Python Markdown (installed with easy_install, see below)
from markdown import markdown

# BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
# (BeautifulSoup.py dropped in my project directory)
from aprilandjake.tech.BeautifulSoup import BeautifulSoup

@register.filter("code")
def rendercode(content, safe="unsafe"):
    """Render this content for display."""

    # First, pull out all the code blocks, to keep them away
    # from Markdown (and preserve whitespace).
    soup = BeautifulSoup(str(content))
    code_blocks = soup.findAll('code')
    for block in code_blocks:
        block.replaceWith('<code class="removed"></code>')

    # Run the post through markdown.
    # Markdown's safe_mode is only switched on when the filter gets an
    # argument other than "unsafe", e.g. {{ body|code:"safe" }}.
    safe_mode = (safe != "unsafe")
    markeddown = markdown(str(soup), safe_mode=safe_mode)

    # Replace the pulled code blocks with syntax-highlighted versions.
    soup = BeautifulSoup(markeddown)
    empty_code_blocks = soup.findAll('code', 'removed')
    formatter = HtmlFormatter(cssclass='source')
    # Placeholders come back in document order, so pair them up with
    # the original blocks one-to-one.
    for empty_block, block in zip(empty_code_blocks, code_blocks):
        # The class attribute names the language; default to plain text.
        language = block.get('class', 'text')
        try:
            lexer = get_lexer_by_name(language, stripnl=True, encoding='UTF-8')
        except ValueError:
            # Pygments raises ClassNotFound (a ValueError) for unknown names.
            try:
                # Guess a lexer by the contents of the block.
                lexer = guess_lexer(block.renderContents())
            except ValueError:
                # Just make it plain text.
                lexer = get_lexer_by_name('text', stripnl=True, encoding='UTF-8')
        empty_block.replaceWith(
                highlight(block.renderContents(), lexer, formatter))

    return str(soup)
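The core trick in the filter is to pull the code blocks out before Markdown runs and drop them back in afterwards, so Markdown never touches their whitespace. Here's a minimal, standard-library-only sketch of that same pull-out/put-back pattern, assuming a regex is good enough to find the code tags (the real filter uses BeautifulSoup, which is far more forgiving). The names render, transform_text and transform_code are hypothetical: transform_text stands in for Markdown and transform_code for Pygments.

```python
import re

# Matches <code>...</code> or <code class="lang">...</code> (single or double
# quotes). Non-greedy and DOTALL so multi-line blocks keep their whitespace.
# Simplification: a regex cannot cope with nested <code> tags.
CODE_RE = re.compile(
    r'<code(?:\s+class=["\'](?P<lang>[^"\']*)["\'])?\s*>(?P<body>.*?)</code>',
    re.DOTALL)
PLACEHOLDER = '<code class="removed"></code>'


def render(content, transform_text, transform_code):
    """Shield <code> blocks from transform_text, then restore them via transform_code.

    transform_text(text) stands in for Markdown and transform_code(lang, body)
    for Pygments; both are hypothetical callables supplied by the caller.
    """
    blocks = []

    def stash(match):
        # Remember the language (default 'text') and the untouched body.
        blocks.append((match.group('lang') or 'text', match.group('body')))
        return PLACEHOLDER

    # 1. Swap every code block for an empty placeholder.
    protected = CODE_RE.sub(stash, content)

    # 2. Process the surrounding prose; no code bodies are left in it.
    processed = transform_text(protected)

    # 3. Fill the placeholders back in, in document order.
    remaining = iter(blocks)
    return re.sub(re.escape(PLACEHOLDER),
                  lambda m: transform_code(*next(remaining)),
                  processed)
```

Like the filter, this matches placeholders to the original blocks purely by order, so if the text step ever dropped or duplicated a placeholder, the blocks would come back misaligned.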

This custom filter uses three third-party modules:

  • Pygments - a syntax highlighter for Python 2.3 and above.

Installation is easy with a Python .egg:

easy_install Pygments

For Windows, a tarball is also available at SourceForge.

  • Python Markdown - a Python implementation of Markdown.

Installation is likewise easy:

easy_install markdown

If you're on Windows, a win32 installer is also available.

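  • BeautifulSoup - an HTML/XML parser for Python.

Download the BeautifulSoup .py file and stick it somewhere in your project (the import in the filter above expects it at aprilandjake/tech/BeautifulSoup.py).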
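Before wiring up the template, it can help to confirm all three modules are actually importable from your project. Below is a small sketch for a modern Python 3 (importlib.util.find_spec needs Python 3.4+, newer than the setup described here). The helper name is_importable is hypothetical, and the BeautifulSoup path is this blog's own, so swap in yours.

```python
import importlib.util


def is_importable(name):
    """Return True if the (possibly dotted) module `name` can be found.

    Note: for a dotted name, find_spec imports the parent packages first.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package of a dotted name is missing.
        return False


# Module names the filter imports. 'aprilandjake.tech.BeautifulSoup' is this
# blog's own project path; substitute wherever you put BeautifulSoup.py.
REQUIRED = ['pygments', 'markdown', 'aprilandjake.tech.BeautifulSoup']

if __name__ == '__main__':
    for name in REQUIRED:
        print('%-35s %s' % (name, 'ok' if is_importable(name) else 'MISSING'))
```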

With all the pieces in place (and the filter's module loaded in your template with {% load %}), you just apply the filter like any other nifty Django filter:

{{ entry.body|code|safe }}
(I added the 'safe' filter so that Django doesn't escape the highlighted HTML.)
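Why the |safe? Django's autoescaping treats the filter's return value as plain text and escapes it, so the browser would show the highlighting markup as literal tags. Here's a rough stand-in using Python 3's html.escape (Django's own escaping behaves similarly); the sample markup is illustrative only, not real Pygments output.

```python
from html import escape

# Roughly the kind of markup the 'code' filter returns (illustrative only;
# real Pygments output has more wrapper elements and classes).
highlighted = '<div class="source"><pre><span class="k">print</span></pre></div>'

# Without |safe, autoescaping turns every tag into visible text, much like this.
escaped = escape(highlighted)
print(escaped)
```

Another option, untested here, would be to have the filter return django.utils.safestring.mark_safe(str(soup)) so templates don't need |safe. Either way, only do it for content you trust.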

Now you just need a stylesheet to make all the newly created spans around your code look as magical as they really are. I horked the .css file off the Pygments demo page and did a find-and-replace from ".syntax" to ".source" (matching the cssclass='source' passed to HtmlFormatter in the filter), so now my .css looks something like this:

.source .c { color: #60a0b0; font-style: italic } /* Comment */
.source .err { border: 1px solid #FF0000 } /* Error */
.source .k { color: #007020; font-weight: bold } /* Keyword */
/** ...  */
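Pygments can also generate this stylesheet itself: HtmlFormatter(cssclass='source').get_style_defs('.source') returns CSS rules scoped to .source, which skips the find-and-replace entirely. If you go the hand-edited route instead, here's a sketch of the rename as a regex. The function name rename_css_scope is hypothetical; it only rewrites .syntax when it's the whole class name, so something like .syntax-extra is left alone.

```python
import re


def rename_css_scope(css, old='syntax', new='source'):
    """Replace the .old class selector with .new throughout a stylesheet."""
    # The lookahead stops '.syntax' from matching inside '.syntax-extra'
    # or '.syntaxfoo' (CSS class names may contain letters, digits, - and _).
    pattern = r'\.' + re.escape(old) + r'(?![\w-])'
    return re.sub(pattern, '.' + new, css)


css = '.syntax .c { color: #60a0b0; font-style: italic } /* Comment */'
print(rename_css_scope(css))
```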

Then, in the content you feed through the 'code' filter, just wrap each piece of code in a 'code' tag like this:

<code class='python'>print "Hello, World"</code>

Swap the 'python' class for any other language name that Pygments supports; if the name isn't recognized, the filter falls back to guessing the language from the code itself.
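Here's how the filter decides which language to use, pulled out as a standalone sketch that mirrors its try/except chain. The parameters known_aliases and guess are hypothetical stand-ins for Pygments' get_lexer_by_name lookup and guess_lexer, and fake_guess is made up purely for the demo.

```python
def resolve_language(css_class, body, known_aliases, guess):
    """Mirror the filter's fallback order and return the chosen language name.

    known_aliases and guess are hypothetical stand-ins for Pygments'
    get_lexer_by_name lookup and guess_lexer, respectively.
    """
    language = css_class or 'text'        # no class attribute -> plain text
    if language in known_aliases:         # get_lexer_by_name would succeed
        return language
    try:
        return guess(body)                # unknown name: guess from contents
    except ValueError:                    # guessing failed too
        return 'text'


def fake_guess(body):
    # Demo-only guesser: recognizes PHP by its opening tag, nothing else.
    if body.startswith('<?php'):
        return 'php'
    raise ValueError('no lexer matched')


aliases = {'python', 'text'}
print(resolve_language('python', 'x = 1', aliases, fake_guess))
print(resolve_language('pyhton', '<?php echo 1;', aliases, fake_guess))
```

One quirk this makes visible: a block with no class goes straight to plain text and is never guessed, while a misspelled class like 'pyhton' does trigger a guess.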

That's it!

Note: in the last usage example and in the custom filter code, such as this line:

block.replaceWith('<code class="removed"></code>')

the tags should use real angle brackets (< and >) instead of HTML entities such as &lt; and &gt;. Perhaps I'll need to fiddle with this filter, because it chokes when the code in a post contains 'code' tags itself.
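One plausible explanation for the entity problem, sketched under assumptions and not tested against the filter: if you write &lt;code&gt; in a post so BeautifulSoup won't parse it as a real tag, those entities reach Pygments as literal text, and the HTML formatter escapes the & again, so readers see "&lt;" instead of "<". Decoding entities before highlighting would avoid that. Python 3's html.unescape does the decoding below, html.escape stands in for the formatter's own escaping step, and prepare_code_text is a hypothetical helper name.

```python
from html import escape, unescape


def prepare_code_text(raw):
    """Decode HTML entities in a code block's body before highlighting.

    Hypothetical fix, not part of the filter above: the highlighter escapes
    its input exactly once, so it should receive real '<' and '>' characters.
    """
    return unescape(raw)


raw = '&lt;code class="removed"&gt;&lt;/code&gt;'

# html.escape(quote=False) stands in for the formatter's escaping step.
without_fix = escape(raw, quote=False)                     # double-escaped
with_fix = escape(prepare_code_text(raw), quote=False)     # escaped once
print(without_fix)
print(with_fix)
```

With the fix, the output is escaped exactly once, so the browser shows the angle brackets; without it, readers see the entity text.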
