Compiling Mallet

Steven P. Crain s...@gatech.edu

1: Download Mallet

You should first read through the official installation instructions. That will tell you where to download the source and what tools you will need. I have some different recommendations, though, so read on!

2: Uncompress Mallet

For Windows, the best program to uncompress the source code (and anything else you find tar-gzipped) is 7-zip. It works great, is well integrated with Windows, and is FREE. I like it much better than WinZip.

3: Install Ant!

Mallet comes with a Makefile, but it only works with one particular version of make. If you really want to use it, go ahead; a tiny modification to the Makefile is described below. You would be better off using Ant, though, as it will definitely work and is the "right way" to do Java builds. The Ant build script also needs a change (see below)!

Ant Site

Ant will install easily on whatever machine you plan to use. I'll tell you about Windows, since, if you have a real OS, Ant is probably already installed. After installing on Windows, click "Start," then right-click "My Computer" and select "Properties." Click the "Advanced" tab (since when were environment variables "advanced?") and hit the "Environment Variables" button. In the User section, click the "New" button and add a variable with the "Variable name" ANT_HOME and, as the "Variable value," the path to where you installed Ant. Also add %ANT_HOME%\bin to the "PATH" environment variable, or create PATH if it's missing.

Example value of PATH:

C:\Program Files (x86)\PuTTY;C:\Program Files (x86)\MiKTeX 2.6\miktex\bin;C:\Program Files (x86)\GnuWin32\bin;C:\Program Files\gnuplot\bin;C:\MinGW\bin;C:\dmake;C:\bin;%ANT_HOME%\bin;C:\Program Files (x86)\svn-win32-1.4.5\bin
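If you want to double-check the two environment variables before moving on, a small Java program can do it. This is only a sketch: the class name AntEnvCheck and the helper pathHasAntBin are made up for illustration, and it assumes the shell has already expanded %ANT_HOME% inside PATH by the time Java sees it.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ant or Mallet.
class AntEnvCheck {
    // True if one of the PATH entries is the "bin" directory under antHome.
    static boolean pathHasAntBin(String path, String antHome, String separator) {
        if (path == null || antHome == null) {
            return false;
        }
        // File normalizes trailing separators, so "C:\ant\bin\" matches too.
        String antBin = new File(antHome, "bin").getPath();
        for (String entry : path.split(Pattern.quote(separator))) {
            if (new File(entry).getPath().equalsIgnoreCase(antBin)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String antHome = System.getenv("ANT_HOME");
        String path = System.getenv("PATH");
        System.out.println("ANT_HOME = " + antHome);
        System.out.println("Ant bin directory on PATH: "
                + pathHasAntBin(path, antHome, File.pathSeparator));
    }
}
```

Open a new "cmd" window before running it, since windows that were already open will not pick up the changed variables.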

4: Edit Build Script or Makefile

It may be that Mallet builds as easily as the official installation instructions would lead you to believe. However, Mallet's source code has a few characters in ISO 8859 encoding. If your machine happens to use that encoding by default, everything will build fine. (It won't process documents in other encodings optimally, but that's another matter altogether.) Windows, however, uses a Microsoft-specific encoding and modern Linux machines use UTF-8, so the compile will generate warnings like the following:

C:\src\mallet-0.4\src\edu\umass\cs\mallet\projects\seg_plus_coref\ie\TUI_CorefIE.java:309: warning: unmappable character for encoding Cp1252
private static String ALPHA = "[A-Z....±é.a-zêô?Äù?ò]";

Google for warning: unmappable character for encoding Cp1252 and you will find pages that explain the basic procedure for resolving unmappable character for encoding warnings. A lucky guess that code of Mallet's vintage would be in ISO 8859 encoding gives us the particular solution we need.
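To make the cause of the warning concrete, here is a minimal Java sketch of what happens when bytes written in one encoding are read in another. The class and method names are made up for illustration, and it uses UTF-8 as the "wrong" encoding rather than Cp1252 simply because UTF-8 and ISO 8859-1 are guaranteed to exist in every JVM.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Mallet.
class EncodingDemo {
    // True if the bytes can be decoded in the given charset without errors.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "[a-z\u00e9]" as stored in ISO 8859-1: the accented e is the single byte 0xE9.
        byte[] latin1 = {'[', 'a', '-', 'z', (byte) 0xE9, ']'};
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // false
    }
}
```

Every byte value is a valid character in ISO 8859-1, which is why telling the compiler to read the source in that encoding makes the warnings go away.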

Makefile

Add -encoding 8859_1 to the end of JAVA_FLAGS, as on the last line below:

JAVA_FLAGS = \
-classpath "$(MALLET_DIR)/class:$(MALLET_DIR)/lib/mallet-deps.jar" \
-sourcepath "$(MALLET_DIR)/src" \
-g:lines,vars,source \
-d $(MALLET_DIR)/class \
-J-Xmx200m -source 1.4 -encoding 8859_1

build.xml

Add the encoding="8859_1" attribute to the javac task, as below:

<javac
source="1.4"
encoding="8859_1"
destdir="${class}"
classpathref="project.classpath"
debug="true"
>
<src path="${src}"/>
<include name="edu/**/*.java"/>
<include name="gnu/**/*.java"/>
</javac>
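If you are curious which source files the encoding setting actually matters for, you can look for files containing bytes outside plain ASCII. The following is a rough sketch, not anything shipped with Mallet: the class name NonAsciiScan is invented, and the default directory "src" is an assumption based on the sourcepath used in the Makefile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mallet.
class NonAsciiScan {
    // Bytes 0x80..0xFF show up as negative values in Java's signed byte type.
    static boolean hasNonAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }

    // Lists the .java files under root that contain at least one non-ASCII byte.
    static List<Path> scan(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .filter(p -> {
                        try {
                            return hasNonAscii(Files.readAllBytes(p));
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src");
        for (Path p : scan(root)) {
            System.out.println(p);
        }
    }
}
```

Run it from the top-level Mallet directory, or pass the path to the source tree as the first argument.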

5: Build

Now you should have no trouble building Mallet by following the official installation instructions. Basically, you just start up a "cmd" shell in Windows, change to the Mallet directory, and type ant. The same basic procedure applies to Unix/Linux/Mac.

6: Experiment!

There is a great Mallet tutorial that will walk you through the process of experimenting with the Twenty Newsgroups dataset. The only thing to add is that you must be in the top-level Mallet directory for the programs to work! To run them from elsewhere, add the Mallet jar files to your CLASSPATH environment variable. Oh, and on Windows you cannot run the scripts directly. You can read them to see how to invoke Java, install MinGW or Cygwin to run them, or port them to Windows .cmd files.
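One detail to watch when porting the scripts or setting CLASSPATH by hand: Unix separates classpath entries with a colon, while Windows uses a semicolon. The sketch below builds the classpath from the two entries the Makefile uses (the class directory and lib/mallet-deps.jar) with the right separator for the current platform. The class name MalletClasspath is invented for illustration, and whether these two entries are all your build needs is an assumption.

```java
import java.io.File;

// Hypothetical helper, not part of Mallet.
class MalletClasspath {
    // Joins the two classpath entries from the Makefile with the given separator.
    static String classpath(String malletDir, String separator) {
        return malletDir + "/class" + separator + malletDir + "/lib/mallet-deps.jar";
    }

    public static void main(String[] args) {
        // The Mallet directory defaults to the current directory.
        String malletDir = args.length > 0 ? args[0] : ".";
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        System.out.println(classpath(malletDir, File.pathSeparator));
    }
}
```

Java accepts forward slashes in paths on Windows, so the same string works on both platforms apart from the separator.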


Last Modified: Nov. 8, 2007