FASTlib Tutorial

The FASTlib core, the library you are about to download, is meant to provide the basic tools needed to start hands-on with machine learning. Its design goals are to have a low learning curve, allow distributed development, be as fast as possible, yet avoid most of the worst problems of developing in a low-level language. It's very new and researchy, so it is possible that there will be a fair number of bugs.

Right now, it looks like FASTlib's core has the following:

Contents

This Document

We have four main types of documentation that come with the FASTlib, for different purposes.

Getting Started

Before installation

Systems that we have tested:

Tools you'll need:

Installation

Download fastlib here

Let's now assume that FASTlib is located at $FASTLIBPATH. That is, if I extracted fastlib.tar.bz2 to /net/hc293/garryb/fastlib, then whenever I see "$FASTLIBPATH" in this tutorial I will assume /net/hc293/garryb/fastlib. Make sure this is an absolute path (it will start with a "/" at the beginning).

After this, you will need to make sure your environment variables are set up properly. Simply add $FASTLIBPATH/script into your $PATH environment variable. First find out what shell you are using by typing

echo $0

and it will be bash, ksh, csh, or tcsh. If you are using the bash or ksh shell, you should add this to your ~/.bashrc (for bash) or ~/.kshrc (for ksh), or ~/.profile if the other doesn't exist, substituting $FASTLIBPATH accordingly:

export PATH="$FASTLIBPATH/script:$PATH"

If you are using csh or tcsh, put the following in your ~/.cshrc file: setenv PATH "$FASTLIBPATH/script:""$PATH"

Close your terminal and re-open it. Check to see if FASTlib is working by typing:

fl-build  # that is lowercase "FL-BUILD"

It should give you the help message for FASTlib's build system. If it doesn't, type:

echo $PATH

and make sure the $FASTLIBPATH/script is there and is spelled exactly correctly. Try typing the export or setenv command by hand and see if it starts working afterwards. Consult your friend who knows Unix when necessary.

Now, change into your $FASTLIBPATH, and change into examples/platonic_allnn. This is where the example code is, and you can try building it by:

fl-build allnn_main

If this does not work, email rriegel [at] cc and tell what kind of system you are trying this on or anything else you think would be interesting (really old versions of python, etc). If it worked, there will be a ./allnn_main executable. Now, try:

./allnn_main --r=../../fastlib/data/test/fake.csv

This finds the (degenerate) nearest neighbors of points in fake.csv, which are the points themselves except when multiple points occur at the same location. You can see the results in output.txt.

Code Organization

FASTlib's core has a major C++ part, but smaller python parts with a few shell scripts. These are laid out in the following way:

While we do all the tests we can to make sure changes doesn't break anything, we cannot guarantee against the occasional build break. Emails are highly appropriate in this case :-)

Building

Using the build tool

As stated earlier, building is done via the fl-build tool, which stands for FASTlib build. This tool reads through very short files which just have a list of sources (.cc files), headers (.h files), and other sub-packages it depends on, and produces a Makefile, which it runs automatically.

There are a handful of compilation modes supported by fl-build, specified by the --mode=mode parameter. Each mode has a use, and enables certain flags:

(For the curious: The command line flags are listed in script/buildsys.py)

Thus, we could re-build the example, but turn off all debug checks:

fl-build allnn_main --mode=debug

To add custom flags, add --cflags. To get increased performance on Pentium 4 and define a macro called COAGULATE, you would use:

fl-build mybinary --mode=fast --cflags="-march=pentium4 -DCOAGULATE"

Writing build files

You probably want to skip this subsection until you start writing new code.

The fl-build tool looks in the current directory for a file called build.py. The build file is executed as straight Python code, with access to a few specific functions defined by the build system, that correspond to "meta build rules". In processing these, the build system recursively pulls build files from other directories and resolves the dependencies. A Makefile is then created in your current directory, which fl-build automatically runs for you.

Here is a sample build rule that will build an executable test compiled from test.cc and test.h, and link it against the fastlib front-end library:

binrule(
   name = "test",
   sources = ["test.cc"],
   headers = ["test.h"],
   deplibs = ["fastlib:fastlib"]
   )

Each string you see is actually processed by the build system, which figures out what you are referring to. If there is no colon ":" character, it looks for a file in the same directory as the build file. If it finds a ":", such as "fastlib:fastlib", it looks in the directory "fastlib" for a rule named "fastlib" -- the left part of the colon is the directory, the right part is the rule name. For example, the full path to the rule for the main executable in u/example would be "u/example:main".

Next, suppose you want to create a small reusable library. Taking a look at the build file examples/platonic_allnn/build.py,

librule(
    name = "allnn",              # this line can be safely omitted
          # (since this is examples/platonic_allnn, the build system
          # will automatically name it "platonic_allnn" if you omit it -- most
          # other sub-packages will omit the name)
    #sources = ["helper.cc"],       # Any .c or .cc files where library functions are defined. This line can be omitted if there are no .cc or .c files.
    headers = ["allnn.h"],        # include files part of the 'lib'
    deplibs = ["fastlib:fastlib"]  # depends on fastlib core
    )

binrule(
    name = "allnn_main",                 # the executable name
    sources = ["allnn_main.cc"],         # compile allnn_main.cc
    headers = [],                  # no extra headers
    deplibs = [":allnn"]       # depends on allnn in this folder
    )

Use of C++

FASTlib is C++, but only to an extent. If you are familiar with C, you will have no problem. The things we use from C++ is:

FASTlib mostly avoids inheritance, virtual functions, and operator overloading. It also notably avoids most constructors due to their pickiness about when they are called. Here's an example of what this looks like. In particular, we'll create a multiplication table:

Matrix mult_table;

mult_table.Init(10, 10); // make it 10 rows by 10 columns

for (index_t i = 0; i < 10; i++) {
  for (index_t j = 0; j < 10; j++) {
    mult_table.set(i, j, i * j);
  }
}
// the matrix does not have to be freed

Some quick notes before we move on. The matrix must be initialized before it is usable, but keep in mind everything is freed by default. Caution -- if you declare a matrix and never initialize it, the program will crash at the end of the function. In debug mode, many FASTlib classes will let you know that this is happening.

Next, the index_t type is usually an regular int, but is wired through the system to become 64-bit if you tell it to.

Copying and Aliasing

C++ is infamous for its desire to make copies of everything. If you forget to pass a parameter as a constant reference (const Classname&), everything will be copied, but sometimes, there is just no way to get around of it. By avoiding constructors, we find copying is usually not needed. To avoid accidental copies of your class, put FORBID_COPY at the beginning like this:

class X {
  FORBID_COPY(X);
  ...
}

Sometimes, though, you really need to make copies, or at least things like copies. Many classes support a Copy method to explicitly do so, through the Matrix and Vector.

The Vector and Matrix classes support a concept of aliasing. A vector can be a vector on its own, or it could also point to some column within a matrix; a matrix can also refer to a subset of another matrix's columns. The concept is simple, but there is one caveat: without garbage collection, somebody eventually has to free the memory. Each Vector and Matrix then knows whether it is the owner of the memory it points to, and if it is, it will free the data automatically; otherwise, it is only an alias.

An example is shown here:

   Matrix original;
   original.Init(5, 5);
   ...
   for (int j = 0; j < 3; ++) {
     Matrix weak_alias;
     weak_alias.Alias(original); // this matrix now aliases the original
     weak_alias.set(j, j, 999.0); // this modifies the original matrix
     // the destructor of weak_alias is called, but the memory is not freed
   }
   Matrix newowner;
   newowner.Own(&original); // newowner
   Matrix copy;
   copy.Copy(original); // a completely new copy 
   original.Destruct(); // newowner is still valid
   original.Init(99, 99);

Debugging

C++, and equally C, can sometimes make it easy to shoot yourself in the foot. To help you avoid this, we made debugging an important part of FASTlib.

The build system by default will enable all checks in the form of #ifdef DEBUG (we'll get back to this later). These checks and safeguards, such as bounds checking and memory poisoning, are present in nearly all core FASTlib classes. If you are debugging and find 0xDEADBEEF, 2146666666, or 0x7FF388AA, or NaN, it probably means you forgot to initialize something; if you go beyond the end of a Vector, it will let you know. In practice, these have a 20% or so performance penalty -- as a habit, we've found it's just fine to leave these on until you want to run timing experiments. In machine learning code, since you can sometimes generate seemingly good results from garbage data, these types of safeguards are almost essential.

To use debugging yourself, plaster your code with the following:

You can safely leave these in at no cost in non debug mode. To set verbosity level to 3.0, you would specify the command line argument: --debug/verbosity_level=3.0

See the Doxygen for base/debug.h for more information.

FASTexec - Command-line parameters and experimentation

We felt it is important that machine learning researchers can run a lot of experiments without too much trouble. Parameter passing is integral to the experimentation process, so we unified these.

The core idea behind the system is that within one run of your program, a hierarchial data store is created. In some ways it is similar to a "Windows registry" except it only has a very brief existence. In fact, it more like an XML document tree, except without attributes, and has a "stateless" output format.

Basic command-line parameters

Let's look at a non-trivial example. We're measuring how different strided memory access patterns affect cache performance (something you probably won't care about, but is trivial to code):

#include "fastlib/fastlib.h"
int main(int argc, char *argv[]) {
  fx_init(argc, argv); // initialize command-line parameters and the data store
  int count = fx_param_int_req( // get a REQUIRED integer parameter
      NULL, // directly from command line (not from a sub module)
      "count"); // called "count", as in --count=num
  int stride = fx_param_int(NULL, "stride", 1); // defaults to stride 1
  double factor = fx_param_double(NULL, "factor", 1.0); // default value 1.0
  
  ArrayList<double> values;
  values.Init(count);

  fx_timer_start(NULL, "strides"); // timers have names
  // Let's see how stride affects cache performance
  for (int s = 0; s < stride; s++) {
    for (int j = s; j < count; j += stride) {
      values[j] = factor * j;
    }
  }
  fx_timer_stop(NULL, "strides");
  fx_format_result(NULL, "success", "%d", 1); // store some result
  fx_done();
}

The useful functions we used were fx_param_int (to get an integer), fx_param_double (to get a double-precision value), fx_timer_{start|stop} to have timers, and fx_done() to output the results. I ran this example with the parameters:

./test --count=10000 --factor=2.0

and the following text resulted:

/info/system/node/name watson.cc.gatech.edu
/info/system/arch/name x86_64
/info/system/kernel/name Linux
/info/system/kernel/release 2.6.9-55.0.6.ELsmp
/info/system/kernel/build %231%20SMP%20Thu%20Aug%2023%2011%3A13%3A21%20EDT%202007
/info/rusage/self/WARNING your_OS_might_not_support_all_of_these
/info/rusage/self/utime 0.000999
/info/rusage/self/stime 0.001999
/info/rusage/self/minflt 418
/info/rusage/self/majflt 0
/info/rusage/self/maxrss 0
/info/rusage/self/ixrss 0
/info/rusage/self/idrss 0
/info/rusage/self/isrss 0
/info/rusage/self/nswap 0
/info/rusage/self/inblock 0
/info/rusage/self/oublock 0
/info/rusage/self/msgsnd 0
/info/rusage/self/msgrcv 0
/info/rusage/self/nsignals 0
/info/rusage/self/nvcsw 1
/info/rusage/self/nivcsw 0
/info/rusage/children/WARNING your_OS_might_not_support_all_of_these
/info/rusage/children/utime 0.000000
/info/rusage/children/stime 0.000000
/info/rusage/children/minflt 0
/info/rusage/children/majflt 0
/info/rusage/children/maxrss 0
/info/rusage/children/ixrss 0
/info/rusage/children/idrss 0
/info/rusage/children/isrss 0
/info/rusage/children/nswap 0
/info/rusage/children/inblock 0
/info/rusage/children/oublock 0
/info/rusage/children/msgsnd 0
/info/rusage/children/msgrcv 0
/info/rusage/children/nsignals 0
/info/rusage/children/nvcsw 0
/info/rusage/children/nivcsw 0
/params/count 10000
/params/factor 2.0
/params/debug/print_warnings 1
/params/debug/abort_on_nonfatal 0
/params/debug/pause_on_nonfatal 0
/params/debug/print_notify_locs 0
/params/debug/noisy 0
/params/fx/timing 0
/params/stride 1
/timers/default/wall/sec 0.000240
/timers/default/user 0.000000
/timers/default/sys 0.000000
/timers/strides/wall/sec 0.000034
/timers/strides/user 0.000000
/timers/strides/sys 0.000000
/results/success 1

We admit this is a lot of text, but when you later comb through the results, you can select only the portions you care about. Briefly, each portion of the tree:

This output could indeed just be an s-expression, or it could be an attribute-less XML file. We chose this path-value dump format because it is stateless -- i.e. anyone can process it with a little bit of grep. Corruption in the file will only affect it until the next newline.

Running automated experiments and collecting results

Running experiments and collecting results is made simpler using FASTexec. After using the FASTexec code to store output variables in the datastore, you can use FASTexec to run multiple experiments.

If you type the command:

fx-run | less

you will see the help screen for fx-run. This program will run your executable with all combinations of the parameters you give it, and save results in a directory called fx that is created in the same directory you run the tests. After that, gather results using fx-csv to make an Excel-compatible comma-separated-values file, or fx-latex to create a properly escaped LaTeX table.

You can try an example with the all nearest neighbors classifier example in u/example. We just consider with time statistics since the output doesn't vary per leaf size of the trees used:

  cd $FASTLIBPATH/examples/platonic_allnn && fl-build allnn_main
  fx-run allnn_k ./allnn_main --r=../../fastlib/data/test/fake.csv --allnn/leaf_size=10,15,20,25,30
  fx-csv allnn_k ./alnn_main /params/allnn/leaf_size /timers/default/wall/sec
  fx-latex allnn_k ./alnn_main /params/allnn/leaf_size /timers/default/wall/sec --preview

Little gotchas

Success and failure

The success_t type in base/common.h defines our standard for indicating success or failure. Rather than assuming 1 or 0 or -1 indicates something or other, we explicitly return SUCCESS_PASS (succeeded), SUCCESS_FAIL (failed), or SUCCESS_WARN (something was suboptimal).

Basic types

You will notice heavy use of index_t (in base/scale.h) rather than integers. For practical purposes, index_t is a signed integer -- signed so you can loop over >= 0. On 32-bit and 64-bit Intel/AMD machines, this will be 32-bits, which is the fastest int for both systems and is relatively compact. But if you want to operate on datasets larger than a few gigabytes, you can define the SCALE_LARGE macro, and these will instantly switch to the largest size your computer can address.

Also, the base/basic_types.h file (it will be in bin/ since it is auto-generated) defines standard int16, int32, int64, uint16, uint32, uint64 types.

Printf versus Streams

We operate under the assumption that more of our audience is familiar with C I/O than with C++ I/O, especially when it comes to fancy floating-point formatting.

The caveat is that printf only works with native types like int, short, long. To print an index_t, you use:

printf("Iteration %"LI"d complete.", some_index_var);   // no special formatting
printf("Iteration %04"LI"d complete.", some_index_var); // with formatting

The LI macro is a string that will have the suitable "l" modifier for index_t (see man sprintf for more details). For other types:

Alternately, you can just cast the variable to a native type like (int) (short) or (long), if you want to be lazy.

Coding style

In short, the funny usage of capitalization and underscores does have a method to the madness. Please see Coding Style Page for a full description. Some pocket-book examples:

After a while it will make sense. The inconsistency in member methods looks very odd at first, but you must admit it kind of helps you mentally understand this very non-Demeterish example:

warehouse.carboard_box().contents().bottom().AttachTape(
    samsclub.utilities_department().shelf(3).item("packaging tape").Purchase());

Please adhere to this when making FASTlib code.