How to check for braces balance in a really large (1T) file in parallel?

Technology CommunityCategory: StacksHow to check for braces balance in a really large (1T) file in parallel?
VietMX Staff asked 3 years ago
Problem

There is a large file( 1TB) containing braces. How to check for braces balance? Make this process parallel.

If ( = 1 and ) = -1. We should check the same two conditions:

  1. The total sum of the brackets is 0
  2. We never go negative at any point of time.
int counter = 0;
for (int i=0; i<string.Length; i++)
{
    char c = string[i];
    if (c == '{')
    {
        counter++;
    }
    else if (c == '}')
    {
        counter--;
        if (counter < 0)
        {
            return false;
        }
    }
}
return (counter == 0);

The first instinct to solving the parallel problem might be to split the work up into equal-sized chunks for each processor. If there are N characters in the file and K processors, give each processor a chunk of N/K characters. When processing a chunk, we can no longer simply return false when the count drops below 0, because there may be more opening braces in an earlier chunk processed by a different processor. We also can’t return false if the counter is not 0 at the end, because it may be that there are more closing braces in later chunks. What we should calculate for each chunk, instead, is:

  1. If we were doing a global brace counter as in the single-threaded approach, by how much would this chunk change the counter? This can be a positive or negative number. Call this the “brace delta“. This number is equal to the number of opening braces minus the number of closing braces in the chunk.
  2. While we are calculating the “brace delta“, what is the lowest point to which the brace delta drops anytime during the processing of the chunk? Call that the “brace delta minimum“. This number is important because this is the point in the chunk where the global counter would be the smallest. We can check whether the global counter would have ever been negative by seeing whether (global counter before chunk) + (brace delta minimum in chunk) < 0 for any chunk.

These two things would be calculated by:

ChunkResult BraceDeltaAlgorithm (int chunkStartPos, int chunkEndPos, String input)
{
    int braceDelta = 0;
    int minBraceDelta = braceDelta;
    for (int i=chunkStartPos; i<chunkEndPos; i++)
    {
        char c = input[i];
        if (c == '{') 
        {
            braceDelta++;
        }
        else if (c == '}')  
        {
            braceDelta--;
            if (braceDelta < minBraceDelta)
            {
                minBraceDelta = braceDelta;
            }
        }
    }
    return new ChunkResult (braceDelta, minBraceDelta);
}

The complexity of the above code is O(N/K) for each of K processors. However, we can consider it O(N/K) time because all the processors can work in parallel.

After we process each chunk in parallel, we combine the results in a single-threaded way. We use the brace deltas to calculate the value of the global counter at the beginning of every chunk, and we use the brace delta min of the chunks to verify that the global counter never drops below 0. The global counter value should be 0 at the end of all the chunks as before.

int globalCounter = 0;
for (int i=0; i<K; i++)
{
    ChunkResult result = childThreads[i].getResult();
    if ( globalCounter + result.minBraceDelta < 0)
    {
        return false;
    }
    globalCounter += result.braceDelta;
}
return (globalCounter == 0);

The non-parallel part takes O(K) time, making the overall time for the solution O(N/K + K).