The shortest superstring problem

I was introduced to the shortest superstring problem by a request I got on Fiverr. The request turned out to be a waste of time but I really enjoyed learning about it.

Problem statement

The shortest common superstring problem aims to find a string with a minimal length that contains every string in a given set.

shortest superstring problem

Example

Suppose the given set has two strings: ABA and AAB.

We could say ABAAB contains both ABA and AAB by simply merging the last character in ABA and the first character in AAB, both being A. The resulting superstring has 5 characters. This is a possible “solution”.

Another way to do it is to flip the order and merge the last two characters in AAB and the first two in ABA. We then get AABA. This contains both ABA and AAB and has only 4 characters. So this one is shorter than the first “solution”. We then say this one is a better solution, a more optimal solution.

This one is after all the best solution for the given set.

Relevance

As a professor in grad school used to say: “So what?”. We care because this can help in various ways. The most commonly cited uses are in DNA research and in the development of data compression algorithms.

The difficulty

This problem is a lot harder than it looks. The problem will grow exponentially more complex with more strings in the set, longer strings, and more distinct characters.

It is a somewhat known and classical problem in computer science and can be “solved” but it is considered an NP-complete problem. Basically, it means that it will be very hard to solve in a timely manner. I mentioned similarly hard problems in a previous post.

These problems are so hard to solve that they are usually tackled with a “greedy algorithm” which simply provides an approximate, good (but not the best) solution in a very efficient and quick way.

“Future work”

In a separate post, I will show how I implemented the greedy algorithm using VBA, as I am not a programmer but I repeatedly end up doing programmatic things, and VBA is very easy to work with.

And hopefully, in a third post, I will show how I combined the greedy algorithm with some optimization to get a better solution than with the greedy algorithm alone.

Balloons 728x90 Made Easy
Things that I use, like, and am affiliated with:
Google Fi offers great cell phone service in 120 countries, get $20 off using the link. Get discounted phones with service activation and no contract.
Uber and Lyft are offering discount rates on your first rides using the links.
AirBnB where you can be home anywhere in the world; get up to $55 off with the link.
I never spend money before I check Mr Rebates, Raise, Ebates or Honey to get cashbacks, rebates, discounts, coupons or cheaper gift cards.
Personal Capital is a free online financial tool that allows you to see all your money-related accounts and analyze allocation, performance, risk, fees, etc and decide and plan accordingly. Get $20 for opening an account with the link
This blog is hosted at Hostgator

Leave a Reply