Wednesday, August 26, 2020

Longest Palindrome : Manacher's Algorithm

Originally Posted on SeedBx


Palindromes are one of the most beautiful strings in the world but finding the longest palindromic substring is one of the most computationally daunting task. In this post I am going to talk about the Manacher’s Algorithm which achieves this in linear time.


Problem Statement

Given a string P, find the longest palindromic substring.


Naive Approach

Try 1 :

Loosely speaking, we can generate all the substring and check each of these substrings whether they are a palindrome or not. To be more concrete, we run an outer loop which iterates over each and every position of the string P, and for each position we run an inside loop, which generates each substring with that position as the starting point. For each of the generated we run a kind of two pointer method to check if that substring is a palindrome or not.

Pseudo Code :

checkPalindrome(S)

for i=0 upto S.Length, j=S.Length-1 to 0

    if S[i]!=S[j]

        return false

return true

 

maxLength=0

maxPalindrome=null

for i=0 upto P.Length

    for j=i+1 upto P.Length

        if checkPalindrome(P[i...j]) and P[i...j].Length>maxLength

            maxLength=P[i...j].Length

            maxPalindrome=P[i...j]


Time Complexity : O(|P|3)

Space Complexity : O(1)


Try 2:

One other way in which we can tackle this problem is by observing that each palindrome has a center along which the string becomes symmetric.

For Example :

·       S=”radar” has a center at ‘d’.

·       T=”aayushhsuyaa” has centres at both the ‘h’s.

Using this fact, we can construct a solution to exploiting this fact. More concretely, we will, for each point check whether an odd-length palindrome has a center at that point and compare its length or if an even-length palindrome has a center in the neighbourhood.

Pseudo Code :

maxLength=0

maxPalindrome=null

 

for i=0 upto P.Length

    // Checking for Odd-Length Palindromes

    low=i-1

    high=i+1

    while low>=0 and high<P.Length and P[low]==P[high]

        if P[low...high].Length>maxLength

            maxLength=P[low...high].Length

            maxPalindrome=P[low...high]

        low--

        high++

       

    // Checking for Even-Length Palindromes

    low=i-1

    high=i

    while low>=0 and high<P.Length and P[low]==P[high]

        if P[low...high].Length>maxLength

            maxLength=P[low...high].Length

            maxPalindrome=P[low...high]

        low--

        high++


Time Complexity : O(|P|2)

Space Complexity: O(1)


Can we do better?

The answer to this lies in Manacher’s Algorithm.


Manacher’s Algorithm

Manacher‘s Algorithm is the most efficient program to find out the longest palindrome in a string. It uses a lot of clever insights to convert the previous quadratic algorithm to a linear algorithm.


Key Insights

1. Each palindrome has a centre of symmetry.

As shown before, each palindrome has a centre of symmetry about which if we expand we can get the palindrome.

Example : S=”ABABA” has a centre at 3rd position.

Using this subtle fact as a backbone, the algorithm uses an auxiliary string which consists of special character (or more a character which is not present in the string) between each of the letter and having two mutually different characters other than the special character to mark the starting and ending of the string.

Example: For string S given above, the auxililary string, T, will look like $#A#B#A#B#A#@.

Using this string as the starting point solves the dilemma of having two centres for even-length palindromes as for an even-length palindrome, the special character serves as the centre.

 

2. Palindromes are symmetric about the centre uptill the boundary.

If we look carefully, right part looks just like the left part of the palindrome. Hence there exists a mirror counterpart for each position of the palinidrome.

Example: Let us take the string S=”ABABA”

Tree Representation of string ABABA

However this may seem more than tempting to assume that len(palindrome(right))=len(palindrome(left)).

But the one thing that we are missing right now, is that we are ignoring anything before and after the palindromic substring “ABABA”.

If we consider the string S=”ABABAB”.

 

Tree Representation of string ABABAB

So we can actually observe that len(palindrome(right))>=len(palindrome(left)).

Now let us consider another string S=”BABABA”.

Tree Representation of string BABABA

So we can observe that len(palindrome(right))>=R-right’s index

Combining the two observations, we get:

If len(palindrome(left)) goes beyond L

len(palindrome(right))>=R-right’s index

If len(palindrome(left)) is within L

len(palindrome(right))>=len(palindrome(left))

Hence len(palindrome(right))>=min(len(palindrome(left)), R-right’s index)

 

In Manacher's algorithm, for every index element, we expand beyond the minimum length that we get from the above established fact to find the actual length of the palindrome.


Note : The auxiliary string that is constructed should have different starting and ending special characters.

Example : T="@#A#B#A#B#A#@" is a wrong auxiliary string as the end characters are also involved in the largest palindromic substring.


Pseudo Code:

A=[]

C=0, R=0

for i=1 upto T.Length

    mirr=2*C-i

    if i<R

        A[i]=min(R-i,A[mirr])

    while(T[i+(1+A[i])]==T[i-(1+A[i])])

        A[i]++

    if i+A[i]>R

        C=i

        R=1+A[i]


Time Complexity : I will not give a rigorous proof of the time complexity, but if you look carefully then at the moment we find a centre, we can at max go till the boundary of the string and hence at the worst case we will traverse at max twice the length of the array which asymptotically O(|P|).

Space Complexity : O(|P|)


Manacher's Algorithm is one of the most beautiful algorithm, which effectively finds the largest palindromic substring. Although there are not many practical applications of Manacher's Algorithm apart from being used in some compression procedures, it gives a different viewpoint towards strings and in particular palindromes.


Thanks for reading.


Also See : Pattern Searching : KMP Algorithm

No comments:

Post a Comment