A Documentation Repository

So, there were some changes at work a few months back. The previous mandate to keep all of our team's documentation on a shitty SharePoint server no longer applied, so I started looking around for something we could use instead.

I came up with the following goals:

  • Documentation needs to be accessible as if it were on a local disk (so people can use their preferred, locally installed tools for file management, editing, etc.).
  • The repository must support Markdown.
  • The repository must not require Markdown. (Any file type should be allowed.)
  • Users should be able to link not only to documents, but to specific sections within documents.
  • The interface should be simple, low-overhead, and low-noise, and the set-up procedure should be simple too.

I looked around a bit and didn't really find anything I liked, but I figured I could roll my own without failing (too badly) on the fifth goal. What I came up with has four basic components.

  1. Apache's mod_autoindex
  2. Apache's mod_dav
  3. My own RenderMarkdown thing
  4. A simple search engine using the Zend_Search_Lucene component of Zend Framework

Browsing Files

As limited as it is, Apache's built-in ability to browse a filesystem is good enough to get the job done. That's not to say that I didn't try to make it better. I added these options to the root of the directory to be browsed. A description follows each one.

IndexOptions FancyIndexing SuppressRules HTMLTable NameWidth=*

These options and more are described in the Apache mod_autoindex documentation.

IndexIgnore .??* *~ *# HEADER* README* RCS CVS *,v *,t images Public

Most of that is default, but I added "images" and "Public". I'll cover images later. As for Public, all of our documentation is protected by default using HTTPS and HTTP authentication (via LDAP). The Public folder is the document root for regular HTTP requests. By placing a document there, you can share it with "the public" (or the rest of the company in our case, since the web server is internal). Files that are hidden using this directive are still visible when using WebDAV, so it's easy to manage Public documents, but there's no reason to show them when browsing the directory via HTTPS.
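To get a feel for what those patterns hide, here's a rough Python sketch that approximates the matching with shell-style wildcards from the standard library. This is only an approximation of what mod_autoindex does internally, and the names `is_hidden` and `visible_entries` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns copied from the IndexIgnore line above.
IGNORE_PATTERNS = [".??*", "*~", "*#", "HEADER*", "README*",
                   "RCS", "CVS", "*,v", "*,t", "images", "Public"]

def is_hidden(name):
    """Return True if any ignore pattern matches the file name.

    Assumption: shell-style matching (fnmatchcase) is close enough to
    Apache's own wildcard matching for patterns that only use * and ?.
    """
    return any(fnmatchcase(name, pattern) for pattern in IGNORE_PATTERNS)

def visible_entries(names):
    """Filter a directory listing down to the names that would be shown."""
    return [name for name in names if not is_hidden(name)]
```

One thing this makes obvious: under this approximation, a document named README.md matches README* and would be left out of the listing as well.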

IndexStyleSheet /indexing/indexing.css

Apache will add a link to this style sheet. The HTML it generates doesn't include id or class attributes, so it's somewhat limited, but it's something.

HeaderName /indexing/HEADER.html

The contents of this file get inserted before the list of files on each index page. In my case, it contains an <h1> and a simple search form.

ReadmeName /indexing/README.html

The contents of this file get appended after the list of files. I use it to provide some helpful information about the repository.

I also replaced the stock icons supplied with Apache. There are a few replacement sets out there, so take your pick.

Managing Files

Files are managed via WebDAV. It was the only option I could find with native support in every major operating system, and it's available as a standard Apache module, so nothing additional needed to be installed.

The only hard part here is making sure files served via WebDAV aren't modified in any way. If you just enable DAV for a folder and open a PHP script in your file browser, Apache will process it as PHP and send you the result, not the original file. Rewrite rules can also do confusing things. To get around this, I created an alias called mount to serve as the root for WebDAV requests. This alias just points to the DocumentRoot, but because it looks like a different location, we can give it its own rules.

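# Expose the DocumentRoot a second time under /mount/ for WebDAV only
Alias /mount/ "/var/www/docroot/"
<Location /mount/>
    Dav On
    # Serve PHP scripts as plain files instead of executing them
    # (engine is a boolean setting, so use php_flag rather than php_value)
    php_flag engine off
    # Don't apply rewrite rules to WebDAV requests
    RewriteEngine off
    ## additional options
</Location>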

So if you view the site in your browser at an address like https://docserver/docs/, you could access the same files via WebDAV by mounting https://docserver/mount/docs/. WebDAV activity is handled by the web server, so whatever user it runs as will need to be the owner of any files and directories you want to manage.

(I originally just enabled WebDAV for the docs subdirectory, but this prevents Windows from mounting it. To support Windows Explorer, you need to enable WebDAV for a top-level directory on the server. You can use ownership and permissions to block unwanted access to files outside of docs.)

Markdown

I wrote a script years ago to share Markdown files on-the-fly as HTML from my Mac OS X systems. This documentation repository project gave me an excuse to clean it up and make it more general-purpose. The result is RenderMarkdown. The README for that project speaks for itself, so I won't repeat it here.

Searching

Again, I looked for existing tools and didn't really find anything simple that met my needs. Since this web server already housed some Zend Framework applications, I created a simple system using Zend_Search_Lucene. There's an indexing script that runs once an hour, and there's a web front-end for searching and displaying results. The code isn't published anywhere, but I can share it upon request.

The stuff worth talking about here is the metadata support. I wrote the indexer to understand MultiMarkdown-style metadata. A typical example would include "author" and "tags", like this:

author: Rob McBroom  
tags:   ldap  
        users

You could then search for such documents with a query like "tags:users" or "author:mcbroom". But there are no predefined attributes. You can make up your own at will and they'll get indexed. Adding this:

pants: optional

would get a document listed in a search for "pants:optional". Pretty sweet.
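The real indexer is PHP built on Zend_Search_Lucene and isn't published, so the following is not that code. It's a minimal Python sketch of the idea, assuming metadata is a block of "key: value" lines at the top of the file, that indented lines continue the previous key, and that the block ends at the first blank line. The function names `parse_metadata` and `matches` are hypothetical.

```python
import re

# A metadata line looks like "key: value" and must start in column one.
_KEY_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9 _-]*):\s*(.*)$")

def parse_metadata(text):
    """Collect leading "key: value" lines until the first blank line."""
    metadata = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            break  # a blank line ends the metadata block
        match = _KEY_LINE.match(line)
        if match:
            # Keys are lowercased so "Author" and "author" are the same field.
            current = match.group(1).strip().lower()
            metadata.setdefault(current, [])
            value = match.group(2).strip()
            if value:
                metadata[current].append(value)
        elif current is not None and raw[0].isspace():
            # An indented line adds another value to the previous key.
            metadata[current].append(line.strip())
        else:
            break  # first line that is not metadata: stop
    return metadata

def matches(metadata, query):
    """Check a "field:term" query with a case-insensitive whole-word match."""
    field, _, term = query.partition(":")
    words = " ".join(metadata.get(field.lower(), [])).lower().split()
    return term.lower() in words
```

Because nothing in the parser knows about specific keys, made-up attributes like "pants" fall out for free. A real Lucene index tokenizes and scores fields rather than doing a plain word lookup, so treat `matches` as a stand-in for the query side.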

And finally, since RenderMarkdown allows you to display metadata values as links based on some pattern, I can make the tags and authors on every page link to searches for other documents with the same tag or author.

Images

Images could go anywhere, but I don't want to clutter up the file listing with what are likely to be supporting files for other documents. I also don't want authors to have to think too hard about the proper path to an image file.

To accomplish those two goals, every directory has (or can have) a subdirectory called "images". The index options above prevent these folders from showing up in Apache's listing, but the files can still be served out normally. By putting an images directory in the same directory as the document, authors can just use a simple relative link in all documents.

![Figure 1](images/figure1.png)

Other Files

Markdown is great, but sometimes you just need a spreadsheet. In that case, create a spreadsheet. Non-Markdown files will be sent to your browser untouched.

Weaknesses

I'm not afraid to admit that there are some problems with this system, but they're minor in my opinion.

Linking between documents

Because everything can be managed using a familiar filesystem interface, people might do just that. If you start adding links between documents, there's no real guarantee that they'll always work. Your options are:

  • Obsessively stay on top of all links between documents and make sure they're valid
  • Just let them break and fix them when you catch them
  • Don't link between documents

This also affects images. If you move a document to another folder, you need to remember to move any supporting images.
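If you go with the "fix them when you catch them" option, catching them can at least be automated. Here's a rough Python sketch, not part of the system described here, that walks a directory tree and reports relative links and images whose targets are missing. It assumes documents end in .md and only looks at inline `[text](target)` links, so reference-style links are not checked. The function name `broken_links` is made up.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlparse

# Matches inline Markdown links and images: [text](target) or ![alt](target)
_LINK = re.compile(r"!?\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")

def broken_links(root):
    """Yield (document, target) for relative links whose file is missing."""
    root = Path(root)
    for doc in sorted(root.rglob("*.md")):
        text = doc.read_text(encoding="utf-8", errors="replace")
        for target in _LINK.findall(text):
            parsed = urlparse(target)
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue  # external URL or same-page "#section" link
            if parsed.path.startswith("/"):
                continue  # site-absolute paths depend on server config
            # Resolve relative to the document's own directory,
            # ignoring any "#section" fragment.
            if not (doc.parent / unquote(parsed.path)).exists():
                yield doc.relative_to(root), target
```

Run against the directory that holds the documents, this catches both moved documents and images that were left behind in an old images folder.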

Search Indexing

Since figuring out what needs to be updated is a pain in the ass and actually making updates to an existing Lucene index is an even bigger pain in the ass, my indexing script rebuilds the entire index every time it runs. This is no problem with our current repository, but it will obviously fall apart on a larger site.
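For what it's worth, the "figuring out what needs to be updated" half can be sketched fairly cheaply by remembering each file's modification time and size between runs. The Python below is only an illustration under that assumption, not the indexing script described here, and it says nothing about the harder half (updating the Lucene index in place). All names and the manifest file are hypothetical.

```python
import json
from pathlib import Path

def snapshot(root):
    """Map each file's relative path to its [mtime_ns, size] signature."""
    root = Path(root)
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            info = path.stat()
            # Lists, not tuples, so the value survives a JSON round trip.
            result[str(path.relative_to(root))] = [info.st_mtime_ns, info.st_size]
    return result

def diff_snapshots(old, new):
    """Return (added_or_changed, removed) paths between two snapshots."""
    changed = sorted(p for p, sig in new.items() if old.get(p) != sig)
    removed = sorted(p for p in old if p not in new)
    return changed, removed

def changes_since_last_run(root, manifest_path):
    """Compare against the saved manifest, then save the new one.

    Assumption: the manifest lives outside the document tree, so it
    never shows up as a changed file itself.
    """
    manifest = Path(manifest_path)
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    new = snapshot(root)
    manifest.write_text(json.dumps(new))
    return diff_snapshots(old, new)
```

An hourly job could call `changes_since_last_run` and skip the rebuild entirely when both lists come back empty, which keeps the simple full-rebuild approach but avoids doing it when nothing changed.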

Conclusion

So with all this in place, writing documentation is as simple as mounting a volume, writing a text file, and saving it. Without any further action on my part, it looks pretty, it has a table of contents, it's searchable, it has links to related documents via metadata, etc. It's been a huge time saver.

I haven't packaged all this up as a single product because I have no reason to think anyone cares, but if I'm wrong, let me know.
