Friday, November 18, 2011

Learning Scala : Boyer–Moore search 3

Authored by Win Myo Htet


We have both the first table and second. Now we just need to create a startSearch method, which will take the two tables and two Strings : searchBody and searchString. We will return a List[Int],  which will have positions of the match strings, thus the size will be the number of the matches.

def startSearch(searchBody : String, searchString : String,
  pos : Int, table1 : Map[Int, Int],
  table2 : Map[Char, Int]) : List[Int] = {
  
  def search(pos : Int, inc : Int = 0) : List[Int] = {
   if (pos >= searchBody.length - 1) return List[Int]()
   val jump = matchChar(pos + searchString.length - 1, inc)
   if (jump > 0) return search(pos + jump, 0)
   if (jump == 0) return search(pos, inc + 1)
   if (jump == -1) 
     return pos :: search(pos + searchString.length, 0)
   List[Int]()
  }

  def matchChar(pos : Int, inc : Int) : Int = {
   if (!searchBody.isDefinedAt(pos - inc)) return -2
   
   if (!table2.isDefinedAt(searchBody(pos))) 
     return searchString.length
   if (searchBody(pos - inc) 
       == searchString(searchString.length - 1 - inc)) {
    if (searchString.length - 1 - (inc + 1) == -1) return -1
    return 0
   }
   val result1 = table1.get(inc).get
   
   if (!table2.isDefinedAt(searchBody(pos - inc))) 
     return result1   
   val result2 = table2.get(searchBody(pos - inc)).get
   if (result2 > result1) return result2
   result1
  }
  
  search(pos)
 }


We have two nested functions inside our function. We check the validity of our search continuity by searchBody string length. We invoke matchChar function to get the jump value. -1 means we found a string match. 0 means the character match. All the positive value means, we will jump that much position. The sort of default return is empty list. Inside matchChar, we will do the validity check and return -2, which will fall through our default empty list return in search function. We return the length of the searchString if the last character is not a valid character according to table2. Then, we do the individual character matching. If it is matched, we will check to see it is the last character to match. If it is the last character matched, we found a match String and return -1. If it is not the last character matched, we will return 0. If the characters are not matched, the two tables are referred for the jump value and larger value is return. In case of the character not being in the table2, we cannot jump the whole searchString's length since there still is a possible match inside the range.

Below is some output. I admit that it is not well tested. You can get the complete source code here.

Look at the two Map values of ANPANMAN
aunndroid@ubuntu:/host/Users/aunndroid/workspace/source_code$ scala boyer_moore.scala 

Usage: boyer_more string_body search_string
e.g. : boyer_more "HERE IS A SIMPLE EXAMPLE" "ANPANMAN"
Map(A -> 1, M -> 2, N -> 3, P -> 5)
Map(0 -> 1, 5 -> 6, 1 -> 8, 6 -> 6, 2 -> 3, 7 -> 6, 3 -> 6, 4 -> 6)
ANPANMAN
HERE IS A SIMPLE EXAMPLE
       -
        ANPANMAN
HERE IS A SIMPLE EXAMPLE
               -
                ANPANMAN
HERE IS A SIMPLE EXAMPLE
                       -
Match found : 0 Pos : 

Here is the search replica of step by step B-M search demo from the official site <?>

aunndroid@ubuntu:/host/Users/aunndroid/workspace/source_code$ scala boyer_moore.scala "HERE IS A SIMPLE EXAMPLE" "EXAMPLE"
Map(E -> 6, X -> 5, A -> 4, M -> 3, L -> 1, P -> 2)
Map(0 -> 1, 5 -> 6, 1 -> 7, 6 -> 6, 2 -> 6, 3 -> 6, 4 -> 6)
EXAMPLE
HERE IS A SIMPLE EXAMPLE
      -
       EXAMPLE
HERE IS A SIMPLE EXAMPLE
             -
         EXAMPLE
HERE IS A SIMPLE EXAMPLE
               -
         EXAMPLE
HERE IS A SIMPLE EXAMPLE
              -
         EXAMPLE
HERE IS A SIMPLE EXAMPLE
             -
         EXAMPLE
HERE IS A SIMPLE EXAMPLE
            -
         EXAMPLE
HERE IS A SIMPLE EXAMPLE
           -
               EXAMPLE
HERE IS A SIMPLE EXAMPLE
                     -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                       -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                      -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                     -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                    -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                   -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                  -
                 EXAMPLE
HERE IS A SIMPLE EXAMPLE
                 -
Match found : 1 Pos : 17
aunndroid@ubuntu:/host/Users/aunndroid/workspace/source_code$

That's it, folk!


Authored by Win Myo Htet

No comments:

Post a Comment