This is a java program to implement Min Hash. In computer science, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are.
Here is the source code of the Java Program to Implement Min Hash. The Java program is successfully compiled and run on a Windows system. The program output is also shown below.
package com.maixuanviet.datastructures; import java.util.HashMap; import java.util.HashSet; import java.util.Map; import java.util.Random; import java.util.Set; public class MinHash<T> { private int hash[]; private int numHash; public MinHash(int numHash) { this.numHash = numHash; hash = new int[numHash]; Random r = new Random(11); for (int i = 0; i < numHash; i++) { int a = (int) r.nextInt(); int b = (int) r.nextInt(); int c = (int) r.nextInt(); int x = hash(a * b * c, a, b, c); hash[i] = x; } } public double similarity(Set<T> set1, Set<T> set2) { int numSets = 2; Map<T, boolean[]> bitMap = buildBitMap(set1, set2); int[][] minHashValues = initializeHashBuckets(numSets, numHash); computeMinHashForSet(set1, 0, minHashValues, bitMap); computeMinHashForSet(set2, 1, minHashValues, bitMap); return computeSimilarityFromSignatures(minHashValues, numHash); } private static int[][] initializeHashBuckets(int numSets, int numHashFunctions) { int[][] minHashValues = new int[numSets][numHashFunctions]; for (int i = 0; i < numSets; i++) { for (int j = 0; j < numHashFunctions; j++) { minHashValues[i][j] = Integer.MAX_VALUE; } } return minHashValues; } private static double computeSimilarityFromSignatures( int[][] minHashValues, int numHashFunctions) { int identicalMinHashes = 0; for (int i = 0; i < numHashFunctions; i++) { if (minHashValues[0][i] == minHashValues[1][i]) { identicalMinHashes++; } } return (1.0 * identicalMinHashes) / numHashFunctions; } private static int hash(int x, int a, int b, int c) { int hashValue = (int) ((a * (x >> 4) + b * x + c) & 131071); return Math.abs(hashValue); } private void computeMinHashForSet(Set<T> set, int setIndex, int[][] minHashValues, Map<T, boolean[]> bitArray) { int index = 0; for (T element : bitArray.keySet()) { /* * for every element in the bit array */ for (int i = 0; i < numHash; i++) { /* * for every hash */ if (set.contains(element)) { /* * if the set contains the element */ int hindex = hash[index]; if (hindex < minHashValues[setIndex][index]) { /* * if current hash is smaller than the existing hash in * the slot then replace with the smaller hash value */ minHashValues[setIndex][i] = hindex; } } } index++; } } public Map<T, boolean[]> buildBitMap(Set<T> set1, Set<T> set2) { Map<T, boolean[]> bitArray = new HashMap<T, boolean[]>(); for (T t : set1) { bitArray.put(t, new boolean[] { true, false }); } for (T t : set2) { if (bitArray.containsKey(t)) { // item is not present in set1 bitArray.put(t, new boolean[] { true, true }); } else if (!bitArray.containsKey(t)) { // item is not present in set1 bitArray.put(t, new boolean[] { false, true }); } } return bitArray; } public static void main(String[] args) { Set<String> set1 = new HashSet<String>(); set1.add("FRANCISCO"); set1.add("MISSION"); set1.add("SAN"); Set<String> set2 = new HashSet<String>(); set2.add("FRANCISCO"); set2.add("MISSION"); set2.add("SAN"); set2.add("USA"); MinHash<String> minHash = new MinHash<String>(set1.size() + set2.size()); System.out.println("Set1 : " + set1); System.out.println("Set2 : " + set2); System.out.println("Similarity between two sets: " + minHash.similarity(set1, set2)); } }
Output:
$ javac MinHash.java $ java MinHash Set1 : [SAN, MISSION, FRANCISCO] Set2 : [SAN, USA, MISSION, FRANCISCO] Similarity between two sets: 1.0
Related posts:
Java Program to Implement Self Balancing Binary Search Tree
Java Program to Convert a Decimal Number to Binary Number using Stacks
Java Program to Implement Shunting Yard Algorithm
Rest Web service: Filter và Interceptor với Jersey 2.x (P2)
Iterable to Stream in Java
Introduction to Java Serialization
A Guide to Java HashMap
Giới thiệu SOAP UI và thực hiện test Web Service
Java Program to Implement Dijkstra’s Algorithm using Set
Guide to Java 8’s Collectors
Java Program to Implement Pollard Rho Algorithm
Spring Boot Application as a Service
Hướng dẫn Java Design Pattern – Chain of Responsibility
Spring Cloud – Bootstrapping
Spring Webflux with Kotlin
Set Interface trong Java
Summing Numbers with Java Streams
Java Program to Check if an UnDirected Graph is a Tree or Not Using DFS
Java Program to Implement AVL Tree
Custom Thread Pools In Java 8 Parallel Streams
Java Program to Implement Maximum Length Chain of Pairs
Server-Sent Events in Spring
Cơ chế Upcasting và Downcasting trong java
An Intro to Spring Cloud Task
Introduction to Spring Cloud Rest Client with Netflix Ribbon
Rate Limiting in Spring Cloud Netflix Zuul
“Stream has already been operated upon or closed” Exception in Java
Spring Boot - Apache Kafka
Java Program to Perform Inorder Non-Recursive Traversal of a Given Binary Tree
Check if there is mail waiting
Removing Elements from Java Collections
Java Program to Implement Hash Tables Chaining with Doubly Linked Lists