This is a java program to implement Min Hash. In computer science, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are.
Here is the source code of the Java Program to Implement Min Hash. The Java program is successfully compiled and run on a Windows system. The program output is also shown below.
package com.maixuanviet.datastructures; import java.util.HashMap; import java.util.HashSet; import java.util.Map; import java.util.Random; import java.util.Set; public class MinHash<T> { private int hash[]; private int numHash; public MinHash(int numHash) { this.numHash = numHash; hash = new int[numHash]; Random r = new Random(11); for (int i = 0; i < numHash; i++) { int a = (int) r.nextInt(); int b = (int) r.nextInt(); int c = (int) r.nextInt(); int x = hash(a * b * c, a, b, c); hash[i] = x; } } public double similarity(Set<T> set1, Set<T> set2) { int numSets = 2; Map<T, boolean[]> bitMap = buildBitMap(set1, set2); int[][] minHashValues = initializeHashBuckets(numSets, numHash); computeMinHashForSet(set1, 0, minHashValues, bitMap); computeMinHashForSet(set2, 1, minHashValues, bitMap); return computeSimilarityFromSignatures(minHashValues, numHash); } private static int[][] initializeHashBuckets(int numSets, int numHashFunctions) { int[][] minHashValues = new int[numSets][numHashFunctions]; for (int i = 0; i < numSets; i++) { for (int j = 0; j < numHashFunctions; j++) { minHashValues[i][j] = Integer.MAX_VALUE; } } return minHashValues; } private static double computeSimilarityFromSignatures( int[][] minHashValues, int numHashFunctions) { int identicalMinHashes = 0; for (int i = 0; i < numHashFunctions; i++) { if (minHashValues[0][i] == minHashValues[1][i]) { identicalMinHashes++; } } return (1.0 * identicalMinHashes) / numHashFunctions; } private static int hash(int x, int a, int b, int c) { int hashValue = (int) ((a * (x >> 4) + b * x + c) & 131071); return Math.abs(hashValue); } private void computeMinHashForSet(Set<T> set, int setIndex, int[][] minHashValues, Map<T, boolean[]> bitArray) { int index = 0; for (T element : bitArray.keySet()) { /* * for every element in the bit array */ for (int i = 0; i < numHash; i++) { /* * for every hash */ if (set.contains(element)) { /* * if the set contains the element */ int hindex = hash[index]; if (hindex < minHashValues[setIndex][index]) { /* * if current hash is smaller than the existing hash in * the slot then replace with the smaller hash value */ minHashValues[setIndex][i] = hindex; } } } index++; } } public Map<T, boolean[]> buildBitMap(Set<T> set1, Set<T> set2) { Map<T, boolean[]> bitArray = new HashMap<T, boolean[]>(); for (T t : set1) { bitArray.put(t, new boolean[] { true, false }); } for (T t : set2) { if (bitArray.containsKey(t)) { // item is not present in set1 bitArray.put(t, new boolean[] { true, true }); } else if (!bitArray.containsKey(t)) { // item is not present in set1 bitArray.put(t, new boolean[] { false, true }); } } return bitArray; } public static void main(String[] args) { Set<String> set1 = new HashSet<String>(); set1.add("FRANCISCO"); set1.add("MISSION"); set1.add("SAN"); Set<String> set2 = new HashSet<String>(); set2.add("FRANCISCO"); set2.add("MISSION"); set2.add("SAN"); set2.add("USA"); MinHash<String> minHash = new MinHash<String>(set1.size() + set2.size()); System.out.println("Set1 : " + set1); System.out.println("Set2 : " + set2); System.out.println("Similarity between two sets: " + minHash.similarity(set1, set2)); } }
Output:
$ javac MinHash.java $ java MinHash Set1 : [SAN, MISSION, FRANCISCO] Set2 : [SAN, USA, MISSION, FRANCISCO] Similarity between two sets: 1.0
Related posts:
A Quick Guide to Using Keycloak with Spring Boot
Java Program to Implement the MD5 Algorithm
Java Program to Find the GCD and LCM of two Numbers
Summing Numbers with Java Streams
Overview of Spring Boot Dev Tools
Java Program to subtract two large numbers using Linked Lists
Guide to the Java Queue Interface
Guide to Guava Multimap
Java Program to Perform Uniform Binary Search
Tạo chương trình Java đầu tiên sử dụng Eclipse
Tìm hiểu về Web Service
Hướng dẫn Java Design Pattern – Object Pool
The Dining Philosophers Problem in Java
Java Program to Perform the Sorting Using Counting Sort
Properties with Spring and Spring Boot
New Features in Java 8
Spring Boot - Google Cloud Platform
Functional Interfaces in Java 8
Java Program to Implement Patricia Trie
Spring Data JPA Delete and Relationships
Java Program to Solve Tower of Hanoi Problem using Stacks
Notify User of Login From New Device or Location
Spring’s RequestBody and ResponseBody Annotations
Java Program to Implement Sorting of Less than 100 Numbers in O(n) Complexity
Java – Write a Reader to File
Guide to PriorityBlockingQueue in Java
Java Program to Implement Solovay Strassen Primality Test Algorithm
Java – InputStream to Reader
Java Program to Implement Shunting Yard Algorithm
Using Java Assertions
Sao chép các phần tử của một mảng sang mảng khác như thế nào?
Giới thiệu luồng vào ra (I/O) trong Java