Did You Mean… Search Suggestions in the Gribly?

Posted on October 9th, 2007 by

Finding people in the Gribly just got a lot easier. That’s right, you will now get helpful suggestions that may help you find who you are looking for, especially when you don’t know how to spell someone’s name.

This is a popular feature in many search engines including Google and we are very excited to add it to the Gribly. We hope that this will save valuable time and frustration while providing more relevant results in Gribly searches.

For instance, if you search for “Peterson,” the Gribly will suggest “Pederson,” “Petersen,” and “Patterson.” Pretty slick.

How did you do it?

On the technical side of things, most of the magic is done using a combination of the soundex algorithm and Levenshtein distance of your search term and possible other matches.

Thankfully, MSSQL has some built-in SOUNDEX functions, so that part was a snap. Calculating the Levenshtein distance was a bit more work, but thanks to Joseph Gama we were able to borrow some functions that did most of the heavy lifting for us.

 


7 Comments

  1. Subhash Pant says:

    It would be nice if you could share the code, and if not, at least some expertise. I’ve been playing around with this, and so far, I’ve got upto Levenshtein code only. During QA, some of the words failed. Few questions that I have is:

    Levenshtein allows you to calculate the distance between words too. This is how, if one doesnt match, other will. Thinking of a scenario where I am using a geographical database, where the occurance of a word is once and not more than once, would you suggest to go ahead with soundex? Because sounds to be like soundex is for comparing words which are similar, but among many similar words.

    I would appreciate when you get back to this.

  2. Joe Lencioni says:

    We have a suggestion function that accepts the search terms, performs a query on our database table that contains people’s names, and returns a few similar names that are in the table.

    Here’s a part of the query where the magic happens:
    SELECT DISTINCT lastName, CONVERT(INT, database.owner.LEVENSHTEIN(lastName, '$lastName')) AS score
    FROM table
    WHERE lastName <> '$lastName'
    AND SOUNDEX('$lastName') = SOUNDEX(lastName)

    So here, the SOUNDEX will get us close to some words and then LEVENSCHTEIN will rank them for us.

    In our implementation, we also union the top 3 last name results with the top 3 first name results and then take the top 3 of all of those and return those suggestions to the browser.

    I’m not entirely positive, but it seems like a similar implementation may work for you. Let us know what you come up with.

  3. Subhash Pant says:

    Thank you very much. I was able to accomplish this by doing levenshtein only, however. i heard that soundex was a bit slower based on algorithm, but i still have to see how efficient it is in the MS SQL server. Your help on this regard is much beneficial to me.

    -SP

  4. Lucy M says:

    Hello, I want to add the ‘did you mean’ feature (like on Google) on my shareware site, where a lot of users have mistypes, and they get 0 results… so they exit the site very fast. Do you have any idea how I could implement such a thing? I tried to use your suggestion, but could not make it work… Does it require a dictionary or something? I found some scripts, but none of them gave relevant results… just stupid, irrelevant suggestions. Any help would be appreciated…
    Thanks, Lucy

  5. Joe Lencioni says:

    @Lucy M: To my knowledge, it does not require any special dictionary. For our results we are retrieving suggestions based on other names in our directory database. One option would be to log each search query with some additional statistical information such as how many results were returned and how many people click on relevant results. Then you can use that information to suggest to the user the one that was a good match and a good score based on the number of results returned, query frequency, and how many results were clicked.

    Otherwise, maybe you are using a database other than MS SQL (MySQL maybe?). The information that I posted is for MS SQL and probably won’t work in MySQL. There is a SOUNDEX function in MySQL but I don’t know about a built-in MySQL function that calculates the Levenschtein distance. It looks like there may be some solutions for Levenschtein.

  6. Subhash Pant says:

    The way I did it was:

    Used levenshtein distance to calculate the possible matches, built a subset of 100s and then did a soundex on it. Depending on the database server that you are using, the functions should be self explanatory. However, using stored procedure like Joe mentioned is very effective in terms of time.

  7. Thanigainathan.S says:

    Hi,

    Really a nice post. I have wondered for these kinds of features.

    Thanks,
    Thani