Tuesday, November 22, 2011

Learning Scala : Recursion and TCO 1


Authored by Win Myo Htet



fib=1:1[a+b|(a,b)<-zip fib(tail fib)]
My buddy texted me the above Haskell Fibonacci code, which uses  recursion, of course. Very neat, huh? We have also seen recursion making QuickSort self documenting  and gracefully simple. I have also used recursion in my Boyer-Moore search without making a conscious effort. Yes, recursion is at home in functional programming.

Recursion can make thing simple. Yet, simple is not easy! It takes recursively using recursion to master recursion. Ok, that's not a very good joke. But you get the point that you need to keep using recursion whenever possible to learn to think in recursion if you want to be the master in the land of functional programming. In Scala, you can get away with, without using recursion, then, you are not really in functional programming land. Assume that you have decided to start using recursion buying into my much lauded hype, well, let me raise the bar a bit then. With great power comes ... I mean, "With Recursion comes this nasty bug stack overflow."

Great... Well, all hope is not lost, sort of... The stack overflow happens because the function keeps calling itself thus putting itself on the stack memory again and again until it runs out of stack memory. The remedy to this is TCO (Tail Call Optimization). In TCO, we are using a particular pattern of recursion so that the compiler can optimize the code to avoid the stack overflow. Let's try lookat some code.

scala> List(1,2,3,4,5,6).foldLeft(10)(_ + _)
res0: Int = 31

Are we still talking about recursion? Yes, we are. I have mentioned in one of the blog before that foldLeft is preferred over foldRight. The reason being is that foldLeft is TCO'ed while foldRight is not. Here is a bit detail analysis of their process.

Here is the foldLeft sequence:
((((((10 + 1) + 2) + 3) + 4) + 5) + 6)
((((((11) + 2) + 3) + 4) + 5) + 6)
(((((13) + 3) + 4) + 5) + 6)
((((16) + 4) + 5) + 6)
(((20) + 5) + 6)
((25) + 6)
(31)

Here is the foldRight sequence:
(1 + (2 + (3 + (4 + (5 + (6 + 10))))))
(1 + (2 + (3 + (4 + (5 + (16))))))
(1 + (2 + (3 + (4 + (21)))))
(1 + (2 + (3 + (25))))
(1 + (2 + (28)))
(1 + (30))
(31)

In foldLeft, you can see that the result can be computed before the next iteration, while foldRight have to traverse all the way to the end, then compute and backtrack for more computing. Alright, how do we do that for our own Fibonacci code? Following the above foldLeft example, here is our Fibonacci code

object Fibonacci{

  def main(args:Array[String]):Unit={
    println(fibonacci(6))  
  }

  import scala.annotation.tailrec
  private def fibonacci(n:Int):Int={
  
    @tailrec
    def fibonacciTCO(n:Int,accu:Int):Int={
      if(n==1) return n+accu
      fibonacciTCO(n-1,n+accu)
    }
    
    fibonacciTCO(n,10)
  }  
}

Authored by Win Myo Htet

We have to include proper import above fibonacci and @tailrec annotation above fibonacciTCO. The fibonacci function has to be declared private or final so that the method cannot be overridden to undo the TCO. You also need to have accumulator (e.g. In our case accu) so that we can carry on our result and return  the final result at the last iteration without needing to backtrack all the way back. If you look carefully at the code, you will notice that, our (near ?) result(accu) is always in the same function scope and the stack reference to the previous call is not needed any more. Now, to make sure that our method is TCO'ed, we will look into the byte code of our compilation using javap with -c and -private option on the class file of Fibonacci$.class itself.

aunndroid@ubuntu:/host/Users/aunndroid/workspace/source_code$ javap -c -private Fibonacci\$
Compiled from "fibonacci.scala"
public final class Fibonacci$ extends java.lang.Object implements scala.ScalaObject{
public static final Fibonacci$ MODULE$;

public static {};
  Code:
   0: new #9; //class Fibonacci$
   3: invokespecial #12; //Method "":()V
   6: return

public void main(java.lang.String[]);
  Code:
   0: getstatic #19; //Field scala/Predef$.MODULE$:Lscala/Predef$;
   3: aload_0
   4: bipush 6
   6: invokespecial #24; //Method fibonacci:(I)I
   9: invokestatic #30; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
   12: invokevirtual #34; //Method scala/Predef$.println:(Ljava/lang/Object;)V
   15: return

private int fibonacci(int);
  Code:
   0: aload_0
   1: iload_1
   2: bipush 10
   4: invokespecial #42; //Method fibonacciTCO$1:(II)I
   7: ireturn

private final int fibonacciTCO$1(int, int);
  Code:
   0: iload_1
   1: iconst_1
   2: if_icmpne 9
   5: iload_1
   6: iload_2
   7: iadd
   8: ireturn
   9: iload_1
   10: iconst_1
   11: isub
   12: iload_1
   13: iload_2
   14: iadd
   15: istore_2
   16: istore_1
   17: goto 0

private Fibonacci$();
  Code:
   0: aload_0
   1: invokespecial #48; //Method java/lang/Object."":()V
   4: aload_0
   5: putstatic #50; //Field MODULE$:LFibonacci$;
   8: return

}

aunndroid@ubuntu:/host/Users/aunndroid/workspace/source_code$ 

Don't worry too much if you don't know anything here. Well, if you know everything and if you are the master of the Scala Universe, you might even not bother to read this, right ? ;) So, we find our TCO'ed fibonacciTCO$1, at line 30. At line 47, instead of re-referencing itself, we find this 17: goto 0. When you see that goto expression, you can be rest assured that your tail call recursion has been TCO'ed. The tail call recursion get optimized by scala compiler using goto. Compare to other functional languages, Scala has some restraints. Since Scala run on JVM and JVM does not support TCO(yet), Scala requires that tail call recursion must be self recursive tail call (e.g. fibonacciTCO must call fibonacciTCO).

Yup, that's quite a bit of info. I don't conceive these knowledge auto-magically. I am not the creator of Scala. I have read the blogs and books. So, I also recommend you to keep on reading about recursion and encourage you to blog about it too. Reading the same info from different narration will help you grasp the concept better. Here are the two I have read: Graham's blog and Nick's blog. They are also explaining the same info in their own way. As usual, we will be getting into some code in the next blog in this series.



Authored by Win Myo Htet

No comments:

Post a Comment