System Design - TinyURL

Problem

Please design a service like TinyURL, a URL shortening service, a web service that provides short aliases for redirection of long URLs.

Thoughts

Basically we need a one to one mapping to get shorten URL which can retrieve original URL later. This will involve saving such data into database. Things to check:

  • What's the traffic volume?
  • What's the mapping function?
  • How long is the shortened URL?
  • Single machine or multiple machines?
Traffic

Twitter has about 300M active users per month. Let's assume we are 10% popular as Twitter and each user generates 1 shortened URL per day. This leads to 30M service calls per month (1M calls/day). If we are going to keep our service for 5 years, our service will generate about 1.8B records.

URL length

Shortened URL can be combinations of numbers(0-9) and characters(a-Z). If we have shortened URL length as k, it gives us 62^k unique combinations. Let's set k = 6, this will render 56.8B combinations which is far more than what we need.

Database data model

The table design is simple. We just need original_url and short_url with id. Since id is auto increment, we can simply use it as our hash key to generate shortened URL.

+--------------------+----------------------+-------------+
| id (auto increment)| original_url         | short_url   |
+--------------------+----------------------+-------------+
|                    |                      |             |
+--------------------+----------------------+-------------+

Implementation

After inserted original_url into the database, we get the id for following Python function. (base = 62 as we are using 0-9 a-Z)

def getShortURL(id, base):
    res = []
    while id > 0:
        digit = id % base
        res.append(digitMap(digit))
        id /= base
    while len(res) < 6:
        res.append(digitMap(0))
    return ''.join(reversed(res))

def digitMap(digit):
    if digit <= 9:
        # 0-9
        return chr(48+chr)   
    elif digit <= 35:
        # A-Z
        return chr(65+digit)
    else:
        # a-z
        return chr(97+digit)

To generate a shortened URL, it takes O(k) runtime complexity.

Multiple machines

If we are dealing with massive data of our service, distributed storage can increase our capacity. The idea is simple, get a hash code from original URL and go to corresponding machine then use the same process as a single machine. For routing to the correct node in cluster, Consistent Hashing is commonly used.

Following is the pseudo code for example,

Get shortened URL
  • hash original URL string to 2 digits as hashed value hash_val
  • use hash_val to locate machine on the ring
  • insert original URL into the database and use getShortURL function to get shortened URL short_url
  • Combine hash_val and short_url as our final_short_url (length=8) and return to the user
Retrieve original from short URL
  • get first two chars in final_short_url as hash_val
  • use hash_val to locate the machine
  • find the row in the table by rest of 6 chars in final_short_url as short_url
  • return original_url to the user

Comments

comments powered by Disqus