Newsgroups: git.cc.talk.c++
Subject: Re: for_each with a member function
Date: 3 Feb 2000 10:30:16 GMT

Sinth <sinth@shaftnet.org> once said:
>> Did you take 2360 back on quarters?  An understanding of LISP (or some
>> other functional language) will make understanding them much easier.
>> (If you did, I'll see if I can make a short tutorial/example that
>> captures the essence of functors and leverages 2360 knowledge.)
>
>Yep, I had it with Kurt.

Neat.

So as you know, in LISP, you can define functions whose return type is
another function.  For example, you could do something along the lines 
of (I haven't written LISP in years, so pardon me if it's a little off)

   (defun mul-n (n)
      (lambda (x) (* x n)))

where mul-n lets us define a function to multiply by n.  Thus we could
say things like (pardon the side effects)

   (setq 'double (mul-n 2))
   (setq 'triple (mul-n 3))

to create new functions which we could then call like

   (double 5)    // should return 10

Actually, I think in LISP it wasn't quite that simple (scheme does it
better, IIRC), but that's the gist of it.


You can do a similar trick in C++.  The "problem" is that in C++,
functions can only return "values" like primitives (int) or structs or
objects (class instances).  The "trick" is that you can overload the
function call operator ("operator()") in classes, to create objects that
behave like functions.  As a result, I can create a class Foo so that I
can use its instances as objects, as in

   Foo f;   // declare a foo
   f();     // call it like it's a function

Now, once you can create function-objects which return other
function-objects, you can start doing really clever things like create
infinite data structures.  Here's an example...


Suppose we want to make the infinite list 1, 2, 3, 4, ...  Imagine that
we've a function cons() which takes a data element and puts it on the
front of a list (just as in LISP).  The list is then easy to create:

   Cons count_from( int x ) {
      return cons( x, count_from(x+1) );
   }

The problem is, of course, we have infinite recursion, and we eventually
either run out of memory, or experience the heat death of the universe,
depending on relative space v time issues.  :)


However, what if each Cons cell held not (data, ptr to next cons) but
instead held (data, a function to call to generate the next cons).
Then we're in business.  Imagine a construct called lazy{}, which,
rather than evaluating its argument now, simply returns a function we
can call later to evaluate it.  That is,

   lazy{ 3+4 }

would not return 7, but rather it would return a function which, when
called, would return 7.  (This "function" is non-trivial to write, but
it can be done straightforwardly and reusably.)  Now we can truly create
an infinite list:

   Cons count_from( int x ) {
      return cons( x, lazy{ count_from(x+1) } );
   }

Now if I call count_from(1), I get a single cons cell back.  It's
"first" is 1, and its "rest" is a function.  If I call that function, it
will evaluate count_from(2), which will return a cons cell whose "first"
is 2, and whose "rest" is yet another function... which I can call
if/when I need it.


This seems like a nice/clever thing--making so-called "infinite" data
structures--but do they have practical value?  Yes!  Here's an example
where an infinite list lets us solve a problem easily and efficiently.

Let us define the "fringe" of a tree to be the list containing all of
its leaf nodes, in the order you visit them on an inorder traversal.
For example, these two trees:

      2                   4     
     / \                 / \   
    1   4               2   5 
       / \             / \   
      3   5           1   3 

both have the same "fringe", namely [1,3,5].  (Make sure you grok the
fringe idea before continuing.)

Now, suppose we want to see if two trees have the same fringe.  In the
imperative style, there are two "obvious" ways we could do this:

(1) We could compute the fringe of each tree, and then compare the
results.  This is easy to implement; something like

   list fringe( tree t ) {
      if( t == NIL )
         return empty_list;
      else if( is_leaf(t) )
         return one_element_list( t.data );
      else
         return concat_lists( fringe( t.left ), fringe( t.right ) );
   }
   fringe( tree_a ) == fringe( tree_b )   // the actual test

The problem is, this is wasteful; if the trees differ at the very first
node, then we traversed the whole rest of the tree "for nothing".

(2) We could create some kind of "tree iterator", where we could define
a method next() which would continue traversing the tree to the next
leaf.  This is difficult to do; you need to remember where you are in
the tree, and where you just came from, and have a way to traverse both
up and down the tree.  It's possible, but it's a big ugly mess and
requires parent pointers (which, for argument's sake, we'll imagine we
have).  If we could implement this, then seeing if two trees have the
same fringe would be somewhat easy:

   a = tree_a.start_iterator();
   b = tree_b.start_iterator();
   while( !a.done() && !b.done() ) {
      if( a.value() != b.value() )
         return false;                   // they don't match
      a.next();
      b.next();
   }
   return a.done() && b.done();  // they do match, unless one still isn't done

This solution is difficult to implement overall, but it has the nice
property that if, say, the very first leaf doesn't match, we quit
immediately, rather than keep walking the tree.

So, in short, the first solution is easy but wasteful, while the second
solution is harder to implement but more efficient.


We can make it easy _and_ efficient with our lazy-lists!

   Cons fringe( tree t ) {
      if( t == NIL )
         return NIL;
      else if( is_leaf(t) )
         return cons( t.data, lazy{ NIL } );
      else
         return concat_lazy_lists( lazy{ fringe( t.left ) }, 
                                   lazy{ fringe( t.right ) } );
   }
   fringe( tree_a ) == fringe( tree_b )   // the actual test

Now, since we are using "lazy lists", we don't actually compute the
values of the elements until someone demands them.  As a result, when we
do the fringe equality test (the last line of code above), then it will
compare the lists, where list comparison is defined as

   bool operator==( Cons a, Cons b ) {
      if( a==NIL  &&  b==NIL )
         return true;
      if( a==NIL  ||  b==NIL )
         return false;
      if( first(a) != first(b) )
         return false;
      return rest(a)() == rest(b)();   // note, we call functions here!
   }

Now, if the tree fringes mismatch at the first node, we never bother to
traverse the rest of the trees, because we never bother to call the
function stored in the "rest" of the cons, since we don't need it.
We have short-circuited the portion of the computation we don't need by
being lazy (just as in "true || some_expensive_function()" we needn't
call the function).


This approach can be generalized to any type of problem where it is
easy to express a solution that examines the whole solution-space, but
where examining only part of the solution-space may be enough to solve
the problem.  Back with my original example, if I want a list of N
integers, I can express the solution simply as "start with the infinite
list of all integers, and then just take the first N of them".  Clearly
without "lazy evaluation", this solution wouldn't work at all.  But if
we assume values are only computed "on demand", then everything works
out peachy.


This turned out to be more "why functors are cool" rather than "how
functors work in C++", but that's ok, as this will motivate
understanding their implementation, which I will get around to
explaining sometime.  :)

-- 
 Brian M. McNamara   lorgon@acm.org  :  I am a parsing fool!
   ** Reduce - Reuse - Recycle **    :  (Where's my medication? ;) )

