Monday, September 7, 2020

Maximum Sub-Array Sum : Kadane's Algorithm

Originally Posted on SeedBx


Array is almost, in some way, in every application of programming and computer science in general. With array being so important in practical application any procedure on array becomes significant.

In this post we will look at the problem of finding maximum sub-array sum which can be done efficiently by using Kadane's Algorithm.


Problem Statement

Given an array of elements A, find the maximum sub-array sum.


Note : Sub-array is a collection or block of contiguous items in an array.

Example : For array A[]={1, 2, 3, 4}, {1, 2} is a sub-array whereas {1, 3} is not.


Naive Approach

Loosely speaking, we can traverse each and every subset that can be generated and calculate the sum for each and every possible subset.

Pseudo Code :

maxsum=0

for i=0 upto A.length

    sum=0

    for j=i+1 upto A.length

        sum=sum+A[j]

        if(sum>maxsum)

            maxsum=sum

 

Time Complexity : O(|A|2)

Space Complexity : O(|A|)

 

Kadane's Algorithm

The approach of Kadane's Algorithm is simple. We keep two flags, namely maxSum and currSum which store the global maximum and the local maximum respectively.

At each iteration we update the currSum as the maximum of currSum+currElement and currElement.

·       If currSum + currElement is bigger then, we are adding to a sub-array started at a position before the currElement's position.

·       If currElement is bigger then, we are considering the sub-array that is starting at the currElement's position.

At each iteration, we also update the maxSum by checking if the local maximum is the global maximum i.e. we update maxSum to be the maximum of maxSum and currSum.


Example : Let us consider an array A[]={12, 23, -38, 1, 2}.

  • We initialise maxSum, currSum with the value at A[0].

Array at index 0
  • At index 1, we update the value of currSum=max(12, 12+23)=35 and maxSum=max(12, 35).
Array at index 1

  • At index 2, we update the value of currSum=max((-38), 35+(-38))=-3 whereas the value of maxSum remains unchanged.
Array at index 2

  • At index 3, we update the value of currSum=max(1, 1+(-3))=1 and the value of maxSum remains unchanged.
Array at index 3

  • At index 4, we update the value of currSum=max(2, 2+1)=3 and the value of maxSum remains unchanged.
Array at index 4

Hence, the maximum sub-array sum for this array A is maxSum=35.


Pseudo Code :

maxSum=currSum=A[0]

 

for i=1 upto A.length

    currSum=max(A[i], currSum+A[i])

    maxSum=max(maxSum, currSum)

 

Time Complexity : O(|A|)

Space Complexity : O(|A|)

 

Kadane's Algorithm is one simple algorithm that in a very few lines of code finds the maximum sub-array sum of an array. This algorithm can be applied for business analysis to study sales, profits and losses so as to make judicial business decisions.


Thanks for reading.

 

Tuesday, September 1, 2020

Gentle Introduction to C : Array

Originally Posted on SeedBx 


Welcome to the ninth part of the Gentle Introduction to C series. As said before this series aims to provide you a brief introduction to C language.

In this post, we will learn about a very basic data structure, namely single-dimensional array, often just called array.


Interesting Fact : Array is at the base of computer memory. Computer memory is just a very big array.


Array

Array is a collection of similar objects or items which are stored in a contiguous fashion.

Let us understand this definition by breaking it into two parts.


Collection of Similar Items

Array can be used to store almost any data type. However given the data type, we cannot store data items of any other kind in it.

Example :

Sample Valid Array


The above array is valid and is an example of an int type array.

Sample Invalid Array


However, the above array is invalid because the array contains more than one type, namely char, float and int.


Stored in Contiguous Manner

Array elements, in the computer memory, are stored in contiguous memory locations. Hence by knowing the memory address of any of the elements of an array, we can reach to any other array element by just traversing the memory addresses in a linear fashion.


Note : Array elements in C (and most of the programming languages) can be accessed using their index. Indexes are like placeholder which can be used to get information or value at any position in the array. In C, 0-based indexing is used i.e. the first element is at the 0th index position.

Example :

Indexing in Array


Declaring an Array

An array in C can be declared in three different ways.

1.    Declaration by Specifying Size

  datatype arr_name[size];

2.    Declaration by Initialising Array

  datatype arr_name[]={items, in, a, comma, seperated, list};

3.    Declaration by Specifying Size and Initialising Array

  datatype arr_name[size]={size, items, in, a, comma, seperated, list};

For example,

int a[10];

char name[]={'a', 'a', 'y', 'u', 's', 'h'};

char foo[6]={'r', 'a', 'n', 'd', 'o','m'};

In the above code snippet, three arrays have been declared one, a, of int type of size 10 and the others, name and foo, of char type and size 6.


Note : An array declaration like,

char bar[1]={'m', 'o', 'h', 'a', 'n'};

will not give a compiler error and rather will just give a warning. However this declaration is not encouraged.


Accessing Array Elements

As said above, array elements can be accessed using indexes. In C (and many similar programming languages), indexes start at 0 and are always positive.

To access the nth element in an array arr_name we will write the array name followed by the index in square brackets i.e.

     arr_name[n-1]

For example, in the above given array name, if we want to access the first char, we can do so by

    name[0];


Note :

     arr[-1];

arr[100];

In C, there is no index out of bound checking i.e. if for a given array, arr of size 10, the above code snippet will not give any error, however the results returned will be unexpected and totally garbage.


Array is a data structure that is not only limited to C and rather is fundamental to almost all programming languages. There are some flaws in array, particularly it's static nature, which are overcame in many other array-like data structures. However array is often the starting point for many applications.

There is still much to be said about arrays, in particular of it's other variant i.e. multi-dimensional array but I will leave that for another post.


Thanks for reading.


Also See : Function, Pointers

Wednesday, August 26, 2020

Longest Palindrome : Manacher's Algorithm

Originally Posted on SeedBx


Palindromes are one of the most beautiful strings in the world but finding the longest palindromic substring is one of the most computationally daunting task. In this post I am going to talk about the Manacher’s Algorithm which achieves this in linear time.


Problem Statement

Given a string P, find the longest palindromic substring.


Naive Approach

Try 1 :

Loosely speaking, we can generate all the substring and check each of these substrings whether they are a palindrome or not. To be more concrete, we run an outer loop which iterates over each and every position of the string P, and for each position we run an inside loop, which generates each substring with that position as the starting point. For each of the generated we run a kind of two pointer method to check if that substring is a palindrome or not.

Pseudo Code :

checkPalindrome(S)

for i=0 upto S.Length, j=S.Length-1 to 0

    if S[i]!=S[j]

        return false

return true

 

maxLength=0

maxPalindrome=null

for i=0 upto P.Length

    for j=i+1 upto P.Length

        if checkPalindrome(P[i...j]) and P[i...j].Length>maxLength

            maxLength=P[i...j].Length

            maxPalindrome=P[i...j]


Time Complexity : O(|P|3)

Space Complexity : O(1)


Try 2:

One other way in which we can tackle this problem is by observing that each palindrome has a center along which the string becomes symmetric.

For Example :

·       S=”radar” has a center at ‘d’.

·       T=”aayushhsuyaa” has centres at both the ‘h’s.

Using this fact, we can construct a solution to exploiting this fact. More concretely, we will, for each point check whether an odd-length palindrome has a center at that point and compare its length or if an even-length palindrome has a center in the neighbourhood.

Pseudo Code :

maxLength=0

maxPalindrome=null

 

for i=0 upto P.Length

    // Checking for Odd-Length Palindromes

    low=i-1

    high=i+1

    while low>=0 and high<P.Length and P[low]==P[high]

        if P[low...high].Length>maxLength

            maxLength=P[low...high].Length

            maxPalindrome=P[low...high]

        low--

        high++

       

    // Checking for Even-Length Palindromes

    low=i-1

    high=i

    while low>=0 and high<P.Length and P[low]==P[high]

        if P[low...high].Length>maxLength

            maxLength=P[low...high].Length

            maxPalindrome=P[low...high]

        low--

        high++


Time Complexity : O(|P|2)

Space Complexity: O(1)


Can we do better?

The answer to this lies in Manacher’s Algorithm.


Manacher’s Algorithm

Manacher‘s Algorithm is the most efficient program to find out the longest palindrome in a string. It uses a lot of clever insights to convert the previous quadratic algorithm to a linear algorithm.


Key Insights

1. Each palindrome has a centre of symmetry.

As shown before, each palindrome has a centre of symmetry about which if we expand we can get the palindrome.

Example : S=”ABABA” has a centre at 3rd position.

Using this subtle fact as a backbone, the algorithm uses an auxiliary string which consists of special character (or more a character which is not present in the string) between each of the letter and having two mutually different characters other than the special character to mark the starting and ending of the string.

Example: For string S given above, the auxililary string, T, will look like $#A#B#A#B#A#@.

Using this string as the starting point solves the dilemma of having two centres for even-length palindromes as for an even-length palindrome, the special character serves as the centre.

 

2. Palindromes are symmetric about the centre uptill the boundary.

If we look carefully, right part looks just like the left part of the palindrome. Hence there exists a mirror counterpart for each position of the palinidrome.

Example: Let us take the string S=”ABABA”

Tree Representation of string ABABA

However this may seem more than tempting to assume that len(palindrome(right))=len(palindrome(left)).

But the one thing that we are missing right now, is that we are ignoring anything before and after the palindromic substring “ABABA”.

If we consider the string S=”ABABAB”.

 

Tree Representation of string ABABAB

So we can actually observe that len(palindrome(right))>=len(palindrome(left)).

Now let us consider another string S=”BABABA”.

Tree Representation of string BABABA

So we can observe that len(palindrome(right))>=R-right’s index

Combining the two observations, we get:

If len(palindrome(left)) goes beyond L

len(palindrome(right))>=R-right’s index

If len(palindrome(left)) is within L

len(palindrome(right))>=len(palindrome(left))

Hence len(palindrome(right))>=min(len(palindrome(left)), R-right’s index)

 

In Manacher's algorithm, for every index element, we expand beyond the minimum length that we get from the above established fact to find the actual length of the palindrome.


Note : The auxiliary string that is constructed should have different starting and ending special characters.

Example : T="@#A#B#A#B#A#@" is a wrong auxiliary string as the end characters are also involved in the largest palindromic substring.


Pseudo Code:

A=[]

C=0, R=0

for i=1 upto T.Length

    mirr=2*C-i

    if i<R

        A[i]=min(R-i,A[mirr])

    while(T[i+(1+A[i])]==T[i-(1+A[i])])

        A[i]++

    if i+A[i]>R

        C=i

        R=1+A[i]


Time Complexity : I will not give a rigorous proof of the time complexity, but if you look carefully then at the moment we find a centre, we can at max go till the boundary of the string and hence at the worst case we will traverse at max twice the length of the array which asymptotically O(|P|).

Space Complexity : O(|P|)


Manacher's Algorithm is one of the most beautiful algorithm, which effectively finds the largest palindromic substring. Although there are not many practical applications of Manacher's Algorithm apart from being used in some compression procedures, it gives a different viewpoint towards strings and in particular palindromes.


Thanks for reading.


Also See : Pattern Searching : KMP Algorithm

Tuesday, August 25, 2020

Gentle Introduction to C : Recursion

Originally Posted on SeedBx


Welcome to the eight part of the Gentle Introduction to C series. As said before this series aims to provide you a brief introduction to C language.

In this post, we will be talking about the concept of recursion.


Recursion

Recursion is the process in which a function calls itself directly or indirectly and the corresponding function is called a recursive function.

Example :

void func() {

    // some code...

    func();

    // some code...

}

The above function func() calls itself in the function, so this is an instance when a function calls itself directly in a recursion.

void func1() {

    // some code...

    func2();

    // some code...

}

void func2() {

    // some code...

    func1();

    // some code...

}  

In the above code functions func1() and func2() calls itself, however unlike the previous code both the functions func1() and func2() calls itself indirectly.


Each recursion function consists of two parts :

1.    General (Recursive) Case: The part of the recursive functions which expresses the solution in terms of smaller version of itself.

2.    Base Case: The part of the recursive function for which the solution can be stated non-recursviely.

Let us understand about these parts through an example,

int fact(int n) {

    if(n==1)                // - (i)

        return 1;

    return n*fact(n-1);     // - (ii)

}

In the above program, a function fact() has been described which calculates the factorial of a given number. Here line -(i) is the base case where as line -(ii) is the general case.

Typically to solve a problem using recursion, we often try to break the problem statement in terms of smaller version/ inputs of the given statement and think of a base case for which the solution can be calculated easily.


Note : If the base case is not defined or not reached then a stack overflow error/ problem arises.


Example :

int fact(int n) {

    return n*fact(n-1);

}

The above function will never terminate and hence a stack overflow error occurs.


Memory Allocation to Different Function Calls

Whenever a function is called from main(), a portion of memory is allocated to it on the memory stack. So following the same pattern, whenever a recursive function calls itself a portion of the memory stack is allocated to that particular call on top of the previous call. This process continues until the base case is reached, after which the function calls are popped from the stack in a Last-In-First-Out (LIFO) manner.

For example, the call stack for fact(4) will be,


Note : Stack overflow error also occurs when a lot of function calls/ data are being pushed onto the stack, so that the total calls exceed the memory/ capacity of the stack.


Comparison with Iterative Method

The fact that every recursive solution has an iterative counterpart makes them direct candidates for comparison.

Example :

int fact(int n) {

    int res=1;

    for(int i=1;i<=n;i++)

    res=res*i;

    return res;

}

In the above code, fact() function described earlier is written in an iterative fashion.

Advantages of Recursion over Iteration

Recursive programs are often clean and easy to read and understand.

Disadvantages of Recursion over Iteration

·       Recursion requires higher space requirements as all the function calls remain in the memory stack until the base case is reached.

·       Recursion programs are slow, with respect to time, compared to their iterative counterparts due to all the function calls and returns overhead.


Recursion is a programming concept that is not only limited to C and expands to every programming language. Using recursion is one of the many ways to efficiently use computational powers to do difficult tasks and problems.


Thanks for reading.

Also See : Function, main() Function