LDIF Lexer for Pygments

I'm quickly becoming a fan of the Python-Markdown implementation (to which I am now a contributor). I have a lot of code blocks containing LDIF, especially in documentation at work, but Pygments didn't support the syntax, so I've done my best to remedy that.

Here it is highlighting the output from ldapsearch sn=mcbroom.

SASL/GSSAPI authentication started
SASL username: rmcbroom@EMPLOYER.COM
SASL SSF: 56
SASL data security layer installed.
# extended LDIF
#
# LDAPv3
# base <dc=employer,dc=com> (default) with scope subtree
# filter: sn=mcbroom
# requesting: ALL
#

# rmcbroom, users, employer.com
dn: uid=rmcbroom,ou=users,dc=employer,dc=com
objectClass: top
objectClass: inetOrgPerson
objectClass: posixAccount
cn: Rob McBroom
displayName: Rob McBroom
gidNumber: 1000
homeDirectory: /home/rmcbroom
loginShell: /bin/tcsh
mobile: 800-555-1212
o: Employer
ou: users
pagerMail: 8005551212@vtext.net
sn: McBroom
givenName: Rob
uidNumber: 1000
uid: rmcbroom
mail: rmcbroom@employer.com
userPassword:: SSBkaWRuJ3QgcHV0IG15IGFjdHVhbCBwYXNzd29yZCBoZXJlLCBqYWNrYXNzLg==
sshPublicKey: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1Z4uWN3tsCVp7ptNjFw69HxP4vEr
 VAK1h3zYHRETM5hK9YqQAQu+ZW+xrJSrWrQVdj7/KLqMHbnHS/0NaJLHne+N5SKGwWUTbKhKIUvEU
 YuMIfqwNpYU85tFkQ+HT29CDEvl/vEHXOO3ZCynGdbntShXDIplfbnmEs1IQJEH3aGQGtyfxsI5ee
 fK8BfY1RSd1S9x+NmtITPWUN0MacPWNt9QoLY/fZG3jmmCPOWpijPdjJZ0V3fVqwcyFHvGg1UD2BQ
 0ONGRc5fxMQpK6vV4G/vc9SdCOnXGv3OR0VKdIizIKg4sC1zLlTDAXNRU3rv8CpagHRhSgEEi+Y8r
 v6l9/w==

# search result
search: 5
result: 0 Success

# numResponses: 2
# numEntries: 1

It handles comments, attributes, values, multi-line values and even some things that aren't LDIF at all, like the authentication info at the top and the search result section at the bottom. I've even tested it on the hoary beast you get from querying Active Directory and it all looks good.
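
To make the multi-line and `::` cases concrete, here's a rough sketch in plain Python (standard library only, not part of the lexer). It assumes the usual LDIF conventions: a continuation line starts with a single space, `::` marks a base64-encoded value, and `:<` marks a URL. The function names, the kind labels, and the sample entry are all made up for illustration.

```python
import base64
import re


def unfold(ldif_text):
    """Join continuation lines (newline followed by one space) onto the previous line."""
    return re.sub(r'\n ', '', ldif_text)


def parse_line(line):
    """Split an unfolded 'attr: value' line into (attr, kind, value).

    kind is 'base64' for '::', 'url' for ':<' and 'plain' for ':'.
    Returns None for anything that isn't an attribute line (comments, blanks).
    """
    # '::' and ':<' are listed before ':' so the longer separators win.
    match = re.match(r'^([\w;.-]+)(::|:<|:)\s*(.*)$', line)
    if match is None:
        return None
    attr, sep, value = match.groups()
    kind = {'::': 'base64', ':<': 'url', ':': 'plain'}[sep]
    return attr, kind, value


# Hypothetical two-attribute entry; the second value is folded across two lines.
sample = (
    "cn: Rob McBroom\n"
    "description:: SGVsbG8s\n"
    " IExEQVA=\n"
)

for line in unfold(sample).splitlines():
    parsed = parse_line(line)
    if parsed is None:
        continue
    attr, kind, value = parsed
    if kind == 'base64':
        value = base64.b64decode(value).decode('utf-8')
    print(attr, kind, value)
```

The lexer doesn't unfold or decode anything, of course. It only needs to colour the folded value as one token, which is why its multi-line rules match `(\n .*)` repeatedly instead of joining the lines.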

The lexer itself is pretty simple, so I'll show it here for the curious.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
    pygments.lexers.ldif
    ~~~~~~~~~~~~~~~~~~~~

    Pygments lexer for LDAP Data Interchange Format.

    :copyright: (c) 2011 by Rob McBroom.
    :license: LICENSE_NAME, see LICENSE_FILE for more details.
"""

from pygments.lexer import RegexLexer, bygroups
from pygments.token import (Comment, Keyword, Name, Number, Operator,
                            Punctuation, String, Text, Whitespace)

class LdifLexer(RegexLexer):
    """Pygments lexer for LDAP Data Interchange Format."""
    name = 'LDAP Data Interchange Format'
    aliases = ['ldif', 'LDIF']
    filenames = ['*.ldif']
    tokens = {
        'root': [
            # authentication noise (not LDIF, but sent to STDOUT)
            (r'^SASL.*$', Text),
            # comments and in betweens
            (r'^#.*(\n .*){1,}$', Comment.Multiline),
            (r'^#.*$', Comment.Single),
            (r'^-$', Punctuation),
            (r'^(search|result|ref):\s.+$', Text),
            # attributes
            (r'^(add|replace|delete|replica|changetype)(?=:)',
             Keyword.Reserved),
            (r'^dn(?=:)', Name.Attribute, 'dn'),
            (r'^\w+(?=:)', Name.Attribute),
            # multiline values
            (r'(?<=:<\s).*(\n .*){1,}$', Name.Namespace),
            (r'(?<=::\s).*(\n .*){1,}$', Number.Hex),
            (r'(?<=:\s).*(\n .*){1,}$', Name.Variable),
            # values
            (r'(?<=changetype:\s)(add|modify|delete)$', Keyword.Reserved),
            (r'(?<=:\s)\d{14}(\.\d)?Z$', Number.Integer),
            (r'(?<=:<\s)\S.*$', String.Doc),
            (r'(?<=::\s)\S.*$', Number.Hex),
            (r'(?<=:\s)\S.*$', Name.Variable),
            # in-line separators
            (r'(?<=:)\s', Whitespace),
            (r'(?<=:<)\s', Whitespace),
            (r':[:<]?', Operator),
        ],
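        'dn': [
            # Entered after a "dn" attribute name so its value is styled as
            # Name.Class. There is no explicit '#pop': RegexLexer falls back
            # to 'root' when nothing matches at the newline ending the value.
            (r'(:)(\s)', bygroups(Operator, Whitespace)),
            (r'(?<=:\s).*(\n .*){1,}$', Name.Class),
            (r'(?<=:\s).*$', Name.Class),
        ],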
    }
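
One detail worth spelling out: the order of the value rules matters. RegexLexer tries the patterns for the current state in order and takes the first one that matches, and the plain `(?<=:\s)` lookbehind is also satisfied after `:: `, so the `:<` and `::` rules have to come first. Here's a small sketch of that using plain `re` instead of Pygments. The three single-line patterns are copied from the lexer above; `classify_value` and the kind labels are hypothetical names for this example.

```python
import re

# Single-line value patterns from the lexer's 'root' state, in the same order.
VALUE_RULES = [
    ('url',    re.compile(r'(?<=:<\s)\S.*$', re.MULTILINE)),
    ('base64', re.compile(r'(?<=::\s)\S.*$', re.MULTILINE)),
    ('plain',  re.compile(r'(?<=:\s)\S.*$', re.MULTILINE)),
]


def classify_value(line, rules=VALUE_RULES):
    """Return (kind, value) using the first rule that matches where the value starts."""
    # Skip the attribute name, the separator and the whitespace after it.
    sep = re.match(r'^\w+:[:<]?\s', line)
    if sep is None:
        return None
    pos = sep.end()
    for kind, pattern in rules:
        # Passing pos (instead of slicing) lets the lookbehind see the separator.
        found = pattern.match(line, pos)
        if found:
            return kind, found.group()
    return None


print(classify_value('userPassword:: c2VjcmV0'))
# Same line with the rules reversed: the plain rule now wins.
print(classify_value('userPassword:: c2VjcmV0', list(reversed(VALUE_RULES))))
```

With the rules in the lexer's order the value is classified as base64. With the order reversed, the same value is picked up by the plain rule, which is the mistake the ordering avoids.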

I've asked the Pygments folks about including it by default, so hopefully it'll be widely available before long. Enjoy.

Side Note

I also created the LDIF bundle for TextMate. I know I'm not the only person in the world using LDAP. My theory is that I'm just the only person in the world using LDAP in conjunction with nice, modern tools.
