Exercise: Recursive String Permutations

The challenge, from Interview Cake weekly problem #399, is to write a recursive function that generates all permutations of an input string. They should all be returned as a set, and don't worry about time or space complexity. I decided to give this one a shot. One can assume that every character in the string is unique.

That last condition means that I can just use a set to contain my characters, which is handy since it means no duplicates.

Before I start, I'll point out: any attempt to generate all permutations is necessarily going to be large. For any set of n unique elements, there will be n-factorial permutations. That escalates very quickly and it won't be too long before your workstation is going to run out of memory. I never went above 8 characters in my approaches.

First Go: The Naive Approach

First, I established the basis case. If you have a string with only a single character in length, there is only one possible permutation. So, you just return that string.

When you have more than one character, you remove a character from that set and make a recursive call with the remaining characters. When the response comes back, you take the character you removed and intersperse it among all the strings in your result set. So, in the case of the string 'ab', you remove 'a' and call the function again with just the string 'b' - this will return a set containing only a single string. Then I do the interspersing: I put 'a' in position 0, and then again in position '1', resulting in the strings 'ab' and 'ba.'

This generates every permutation of those characters together. You then bundle all of those strings into a new set and return it.

What are the problems with this approach?

For one thing, there is a lot of repeated work. You will be making a lot of recursive calls on strings you already know the permutations of. Think of it this way: Let's say my input string is 'abcde.' There are 120 permutations of this string. But what happens when I omit 'a' and run the function again for 'bcde' - how many times will I call a function for a combination of 'de?' Quite a few! Optimizations are needed to prevent the number of unnecessary recursive calls.

Secondly, the complexity is very high. It's guaranteed to be at least n-factorial complexity, but it's actually worse than that because of the for loops contained within the recursive call (given that I need to iterate over each character in the set, and for each character, over the results of the recursive call. Yikes!

I can optimize this by using memoization.

Second Attempt: Memoization

With memoization, you maintain data so you don't have to make repeated calls for data you already know. An example of how this could be done in other classic recursive examples, like generating Fibonacci numbers, is to maintain an array of size n, where n is the input number, and store values therein once you calculate them. That way all of the Fibonacci numbers are calculated only once, and repeated calls are replaced with references to the right position in the array.

Of course, we're dealing with strings, not numbers. We can't use an array. A better option might be to use a hash map, with a string as a key and a set of permutations of that string as the value.

I did this with my second attempt, and it definitely resulted in fewer recursive calls than my first. In fact, the recursive function was only called 31 times for a 5 character string, as opposed to 206 times with the naive approach. That's a huge improvement, though there is still high complexity due to the loop structures within the recursive calls. And the space requirements are still going to be large, as the memo structure is going to contain an entry for just about every possible substring.

Let's see if there's another option I can take with fewer space requirements.

Third Attempt: Heap's Algorithm

There is a third option that requires no additional space: heap's algorithm.

The function (in my implementation) has three parameters: the input, a position, and a set to contain the results (if you just want to print the results, you won't need this). The position initially begins as the length of the string.

The basis case is when the position is equal to one. At this point you have a permutation. Add it to the results!

Then there is the for loop, which begins at 0 and ends at position. It makes a recursive call with position - 1. Then, it makes a swap: if position is even, it swaps input[0] with input[position - 1]. If odd, it swaps input[i] with input[position - 1].

This is enough to generate all permutations, and with using no additional space to boot (aside from the space to store the results).

Which performed best?

With three different approaches, I decided to do some performance testing. I ran each version of the algorithm ten times with a different sized input - ranging from 3 to 8 - and recorded the average run time in milliseconds. The results were very clear: Heap's Algorithm was the best performer.

It's not just the lower run-time, it also increases at a lower rate than the other two. I wanted to go above 8 for more data points, but going up to 9 took too long.

So I think we have our winner. My naive approach was obviously not going to do the job, memoization improves the run time but at the cost of more memory consumed. Heap's algorithm gives the best of both words: better run-time and no additional space constraints.

Mark's Blog

Search This Blog