Java Program to Implement Aho-Corasick Algorithm for String Matching

This is a java program to implement Aho-Corasick Algorithm. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the “dictionary”) within an input text. It matches all patterns simultaneously. The complexity of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches. Note that because all matches are found, there can be a quadratic number of matches if every substring matches (e.g. dictionary = a, aa, aaa, aaaa and input string is aaaa).

Here is the source code of the Java Program to Implement Aho-Corasick Algorithm for String Matching. The Java program is successfully compiled and run on a Windows system. The program output is also shown below.

package com.maixuanviet.setandstring;
 
import java.util.Arrays;
import java.util.Scanner;
 
public class StringSearchUsingAhoCorasickAlgo
{
    static final int ALPHABET_SIZE = 26;
    Node[]           nodes;
    int              nodeCount;
 
    public static class Node
    {
        int     parent;
        char    charFromParent;
        int     suffLink    = -1;
        int[]   children    = new int[ALPHABET_SIZE];
        int[]   transitions = new int[ALPHABET_SIZE];
        boolean leaf;
        {
            Arrays.fill(children, -1);
            Arrays.fill(transitions, -1);
        }
    }
 
    public StringSearchUsingAhoCorasickAlgo(int maxNodes)
    {
        nodes = new Node[maxNodes];
        // create root
        nodes[0] = new Node();
        nodes[0].suffLink = 0;
        nodes[0].parent = -1;
        nodeCount = 1;
    }
 
    public void addString(String s)
    {
        int cur = 0;
        for (char ch : s.toCharArray())
        {
            int c = ch - 'a';
            if (nodes[cur].children == -1)
            {
                nodes[nodeCount] = new Node();
                nodes[nodeCount].parent = cur;
                nodes[nodeCount].charFromParent = ch;
                nodes[cur].children = nodeCount++;
            }
            cur = nodes[cur].children;
        }
        nodes[cur].leaf = true;
    }
 
    public int suffLink(int nodeIndex)
    {
        Node node = nodes[nodeIndex];
        if (node.suffLink == -1)
            node.suffLink = node.parent == 0 ? 0 : transition(
                    suffLink(node.parent), node.charFromParent);
        return node.suffLink;
    }
 
    public int transition(int nodeIndex, char ch)
    {
        int c = ch - 'a';
        Node node = nodes[nodeIndex];
        if (node.transitions == -1)
            node.transitions = node.children != -1 ? node.children
                    : (nodeIndex == 0 ? 0 : transition(suffLink(nodeIndex), ch));
        return node.transitions;
    }
 
    // Usage example
    public static void main(String[] args)
    {
        StringSearchUsingAhoCorasickAlgo ahoCorasick = new StringSearchUsingAhoCorasickAlgo(
                1000);
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter the main string: ");
        String main = sc.nextLine().toLowerCase().trim();
        System.out.println("Enter the pattern string: ");
        String pattern = sc.nextLine().toLowerCase().trim();
        ahoCorasick.addString(pattern);
        int node = 0;
        for (char ch : main.toCharArray())
        {
            node = ahoCorasick.transition(node, ch);
        }
        if (ahoCorasick.nodes[node].leaf)
            System.out.println("A '" + pattern + "' string is substring of "
                    + main + " string.");
        else
            System.out.println("A '" + pattern
                    + "' string is not substring of " + main + " string.");
        sc.close();
    }
}

Output:

$ javac StringSearchUsingAhoCorasickAlgo.java
$ java StringSearchUsingAhoCorasickAlgo
 
Enter the main string: 
Sanfoundry
Enter the pattern string: 
Asanfoundry
A 'asanfoundry' string is not substring of sanfoundry string.

Related posts:

Hướng dẫn sử dụng luồng vào ra nhị phân trong Java
Tạo ứng dụng Java RESTful Client với thư viện OkHttp
Servlet 3 Async Support with Spring MVC and Spring Security
Java Program to Perform Postorder Recursive Traversal of a Given Binary Tree
Life Cycle of a Thread in Java
Java Program to Implement Ford–Fulkerson Algorithm
Java Program to Check whether Directed Graph is Connected using BFS
Spring @Primary Annotation
Prevent Brute Force Authentication Attempts with Spring Security
Guide to Java Instrumentation
Queue và PriorityQueue trong Java
Java Program to Describe the Representation of Graph using Adjacency Matrix
Java Program to Implement ArrayDeque API
Java Program to Implement the Checksum Method for Small String Messages and Detect
Java Program to Find the Vertex Connectivity of a Graph
Encode a String to UTF-8 in Java
Checked and Unchecked Exceptions in Java
Recommended Package Structure of a Spring Boot Project
Java Program to Perform Addition Operation Using Bitwise Operators
Using a List of Values in a JdbcTemplate IN Clause
Java Program to Implement Variable length array
Từ khóa throw và throws trong Java
Spring Boot - OAuth2 with JWT
Java Program to Implement the Program Used in grep/egrep/fgrep
Giới thiệu Aspect Oriented Programming (AOP)
Java Program to Represent Graph Using 2D Arrays
Spring Cloud – Adding Angular
Multipart Upload with HttpClient 4
Giới thiệu JDBC Connection Pool
Java Program to Represent Graph Using Linked List
Apache Commons Collections OrderedMap
Java Program to Find Transitive Closure of a Graph