What are the main differences between the Knuth-Morris-Pratt search algorithm and the Boyer-Moore search algorithm?

Technology CommunityCategory: StringsWhat are the main differences between the Knuth-Morris-Pratt search algorithm and the Boyer-Moore search algorithm?
VietMX Staff asked 3 years ago
  • Boyer-Moore’s approach is to try to match the last character of the pattern instead of the first one with the assumption that if there’s not match at the end no need to try to match at the beginning. This allows for “big jumps” therefore BM works better when the pattern and the text you are searching resemble “natural text” (i.e. English). Scanning backwards with BM is faster because if you spot a character that isn’t in the word, then you can skip many spaces forward. Best-case, you may only need to compare the last character every time, and skip forward the full length of the word. However, If you were scanning forward with KMP, and spotted the non-word character, then you can only skip forward as far as you have matched.
  • Knuth-Morris-Pratt searches for occurrences of a “word” W within a main “text string” S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

This means KMP is better suited for small sets like DNA (ACTG) however, there have been some modifications of BM that have made small-alphabet searching viable. BM can be faster or slower than KMP depending on the input, while KMP is perfectly reliable O(n).

Things to note:

KMP:

  • Compares text left-to-right
  • Uses a failure array to shift intelligently
  • takes O(m), where m is the length of the pattern, to compute failure array
  • takes O(m), space
  • takesO(n), time to search a string

BM:

  • Compares pattern from last character
  • Uses bad character jumps and good suffix jumps
  • takes O(m + size of alphabet) to compute tables
  • takes O(m + size of alphabet), space
  • takes O(n), but usually better to search