Tuesday, December 27, 2011

Learning Scala : Reading the exotic and essential List API scaladoc 6

Authored by Win Myo Htet



def /:\ [A1 >: A] (z: A1)(op: (A1, A1) ⇒ A1): A1
def fold [A1 >: A] (z: A1)(op: (A1, A1) ⇒ A1): A1
def reduce [A1 >: A] (op: (A1, A1) ⇒ A1): A1
def scan [B >: A, That] (z: B)(op: (B, B) ⇒ B)(implicit cbf: CanBuildFrom[List[A], B, That]): That
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
Since these functions are grouped together, you can say that they have some common functionality. Yes, it is their ability to do parallel processing on the List collection. fold (/:\) keeps folding, reduce:reducing and scan:scanning. Since they all are doing what they are supposed be doing(Wow, very clear!), I will be focusing on fold only and how it does parallel processing.
scala> val list=(for (i <- 1 to 25) yield i) toList
list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)

scala> list.fold(0){(sum,a)=>println(a+":"+sum);sum+a}
1:0
2:1
3:3
4:6
5:10
6:15
7:21
8:28
9:36
10:45
11:55
12:66
13:78
14:91
15:105
16:120
17:136
18:153
19:171
20:190
21:210
22:231
23:253
24:276
25:300
res0: Int = 325

scala> list.par.fold(0){(sum,a)=>println(a+":"+sum);sum+a}
13:0
1:0
14:0
2:0
15:14
3:2
29:13
5:1
16:0
4:0
17:16
5:4
18:33
6:9
51:42
15:6
19:0
7:0
20:19
8:7
21:39
9:15
22:60
10:24
23:82
11:34
24:105
25:129
12:45
154:93
57:21
247:78
res1: Int = 325

We create a List[Int] of 25 Int. We operate fold on it to sum up the number. If we look at the printed log, we notice that it is similar to foldLeft. The number increase in sequence. It only seems to get the optimization of foldLeft. Then, we use the function par.
def par : ParSeq[A]
Returns a parallel implementation of this collection.
With par returning the parallel collection, fold operation becomes truly parallel processing and we can see that in the printed log. It seems very easy to do parallel processing in this example because we are using the very primitive nature Int type with the sum operator + having an associative property. Well, what if we are to deal with data type that is not like Int type, then you can use aggregate and pass on the custom function for the combop lambda, which reconcile the sub-results produced by parallel processing,while seqop lambda being what you normally feed to foldLeft or reduceLeft for processing. Though the concept is easy to grasp, the implementation will be quite a challenge depending on the complexities of the underlying data. Instead of providing my own example, I will provide you the links for further reading(watching). Markus Jais has blog about the performance analysis of the aggregate's parallel processing power. Aleksandar Prokopec talks about implementation and designing for parallel processing. As usual, the question has been asked on SO and answered.

Shall we go to foreach and forall now?


Authored by Win Myo Htet

No comments:

Post a Comment