Originally Posted on SeedBx
Palindromes are one of the most beautiful strings in the world but finding the longest palindromic substring is one of the most computationally daunting task. In this post I am going to talk about the Manacher’s Algorithm which achieves this in linear time.
Problem Statement
Given a string P, find the longest palindromic substring.
Naive Approach
Try 1 :
Loosely speaking, we can generate all the substring and check each of these substrings whether they are a palindrome or not. To be more concrete, we run an outer loop which iterates over each and every position of the string P, and for each position we run an inside loop, which generates each substring with that position as the starting point. For each of the generated we run a kind of two pointer method to check if that substring is a palindrome or not.
Pseudo Code :
checkPalindrome(S)
for i=0 upto S.Length, j=S.Length-1 to 0
if S[i]!=S[j]
return false
return true
maxLength=0
maxPalindrome=null
for i=0 upto P.Length
for j=i+1 upto P.Length
if checkPalindrome(P[i...j]) and P[i...j].Length>maxLength
maxLength=P[i...j].Length
maxPalindrome=P[i...j]
Time Complexity : O(|P|3)
Space Complexity : O(1)
Try 2:
One other way in which we can tackle this problem is by observing that each palindrome has a center along which the string becomes symmetric.
For Example :
· S=”radar” has a center at ‘d’.
· T=”aayushhsuyaa” has centres at both the ‘h’s.
Using this fact, we can construct a solution to exploiting this fact. More concretely, we will, for each point check whether an odd-length palindrome has a center at that point and compare its length or if an even-length palindrome has a center in the neighbourhood.
Pseudo Code :
maxLength=0
maxPalindrome=null
for i=0 upto P.Length
// Checking for Odd-Length Palindromes
low=i-1
high=i+1
while low>=0 and high<P.Length and P[low]==P[high]
if P[low...high].Length>maxLength
maxLength=P[low...high].Length
maxPalindrome=P[low...high]
low--
high++
// Checking for Even-Length Palindromes
low=i-1
high=i
while low>=0 and high<P.Length and P[low]==P[high]
if P[low...high].Length>maxLength
maxLength=P[low...high].Length
maxPalindrome=P[low...high]
low--
high++
Time Complexity : O(|P|2)
Space Complexity: O(1)
Can we do better?
The answer to this lies in Manacher’s Algorithm.
Manacher’s Algorithm
Manacher‘s Algorithm is the most efficient program to find out the longest palindrome in a string. It uses a lot of clever insights to convert the previous quadratic algorithm to a linear algorithm.
Key Insights
1. Each palindrome has a centre of symmetry.
As shown before, each palindrome has a centre of symmetry about which if we expand we can get the palindrome.
Example : S=”ABABA” has a centre at 3rd position.
Using this subtle fact as a backbone, the algorithm uses an auxiliary string which consists of special character (or more a character which is not present in the string) between each of the letter and having two mutually different characters other than the special character to mark the starting and ending of the string.
Example: For string S given above, the auxililary string, T, will look like $#A#B#A#B#A#@.
Using this string as the starting point solves the dilemma of having two centres for even-length palindromes as for an even-length palindrome, the special character serves as the centre.
2. Palindromes are symmetric about the centre uptill the boundary.
If we look carefully, right part looks just like the left part of the palindrome. Hence there exists a mirror counterpart for each position of the palinidrome.
Example: Let us take the string S=”ABABA”
However this may seem more than tempting to assume that len(palindrome(right))=len(palindrome(left)).
But the one thing that we are missing right now, is that we are ignoring anything before and after the palindromic substring “ABABA”.
If we consider the string S=”ABABAB”.
So we can actually observe that len(palindrome(right))>=len(palindrome(left)).
Now let us consider another string S=”BABABA”.
So we can observe that len(palindrome(right))>=R-right’s index
Combining the two observations, we get:
If len(palindrome(left)) goes beyond L
len(palindrome(right))>=R-right’s index
If len(palindrome(left)) is within L
len(palindrome(right))>=len(palindrome(left))
Hence len(palindrome(right))>=min(len(palindrome(left)), R-right’s index)
In Manacher's algorithm, for every index element, we expand beyond the minimum length that we get from the above established fact to find the actual length of the palindrome.
Note : The auxiliary string that is constructed should have different starting and ending special characters.
Example : T="@#A#B#A#B#A#@" is a wrong auxiliary string as the end characters are also involved in the largest palindromic substring.
Pseudo Code:
A=[]
C=0, R=0
for i=1 upto T.Length
mirr=2*C-i
if i<R
A[i]=min(R-i,A[mirr])
while(T[i+(1+A[i])]==T[i-(1+A[i])])
A[i]++
if i+A[i]>R
C=i
R=1+A[i]
Time Complexity : I will not give a rigorous proof of the time complexity, but if you look carefully then at the moment we find a centre, we can at max go till the boundary of the string and hence at the worst case we will traverse at max twice the length of the array which asymptotically O(|P|).
Space Complexity : O(|P|)
Manacher's Algorithm is one of the most beautiful algorithm, which effectively finds the largest palindromic substring. Although there are not many practical applications of Manacher's Algorithm apart from being used in some compression procedures, it gives a different viewpoint towards strings and in particular palindromes.
Thanks for reading.
Also See : Pattern Searching : KMP Algorithm
No comments:
Post a Comment