State-Of-The-Art Unsupervised Part-Of-Speech Tagging in 300 lines of Clojure (from Scratch)

Recently, Yoong-Keok Lee, Regina Barzilay, and myself, published a paper on doing unsupervised part-of-speech tagging. I.e., how do we learn syntactic categories of words from raw text. This model is actually pretty simple relevant to other published papers and actually yields the best results on several languages. The C++ code for this project is available and can finish in under a few minutes for a large corpus.

Although the model is pretty simple, you might not be able to tell from the C++ code, despite Yoong being a top-notch coder. The problem is the language just doesn’t facilitate expressiveness the way my favorite language, Clojure, does. In fact the entire code for the model, without dependencies beyond the language and the standard library, clojure contrib, can be written in about 300 lines of code, complete with comments. This includes a lot of standard probabilistic computation utilities necessary for doing something like Gibbs Sampling, which is how inference is done here.

Without further ado, the code is on gisthub and github (in case I make changes).


One Response to “State-Of-The-Art Unsupervised Part-Of-Speech Tagging in 300 lines of Clojure (from Scratch)”

  1. Chris Brew Says:

    I hope you’ll expand your paper for a journal article. If you do, a useful
    reference point might be Julian Kupiec’s article

    Computer Speech & Language
    Volume 6, Issue 3, July 1992, Pages 225-242


    I find this work clearer on what “unsupervised” might mean than is Merialdo. As usual, unsupervised is really a matter of degree, not a 1-0 thing.

    To my horror, I find myself drawn into the role of Guardian of the History.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: