fperkins.com – Frank Perkins
Retro Gaming Image

Personal blog of Frank Perkins

Make SQL Server 2000 Full Text Search Accent Insensitive

Category: tech

For anyone that has used Microsoft Full Text Search to replace inefficient like queries, you know it’s been a huge help. I have seen performance gains increase from 1000-3000+ms down to 100-500ms at most. Definitely the way to go if you’re building your own search queries in a MS SQL 2000 database.

However, and this is a big however, by default, it’s accent sensitive. It doesn’t matter if your DB is setup as accent insensitive, it will still be AS. From reading various newsgroups articles, I could find only three ways around this problem:

  1. Upgrade to SQL Server 2005. This obviously isn’t an option in most cases cause it probably going to be a lot of work to migrate all your systems over
  2. Install the evaluation copy of MS Sharepoint Server. Apparently there is a “patch” that will make the full text catalogs AI. However, this sounded like a hack and I wasn’t about to start installing SPS.
  3. Programmatically fix it yourself

Luckily, the process in which we search the FTC was already setup to be “patched”. Meaning, using stored procs, we combine all our data into one “super meta table” which can easily be searched. e.g. comment + articleName + articleTitle = searchText

Now to get it to be AI, was simply to append the AI version of the word to the searchText column. You can do this through a Data Driven Query Task DTS job setup as follows:

  • Source: The table you are looking to fix
  • Bindings: Create a temporary table and add two columns: The unique identifier and the column you are making AI
  • Transformations: Setup ActiveX script transformations between the two columns in the two tables. e.g. id -> id and text -> text. For the text transformation, you will need to modify this snippet to match your own personal needs, but this give you the general idea of how I’m stripping the accents.
    '**********************************************************************
    '  Visual Basic Transformation Script
    '************************************************************************
    
    '  Copy each source column to the destination column
    Function Main()
    myArray = split(rtrim(DTSSource("searchText")), " ")
    addString = " "
    
    for each i in myArray
    dim strCheckAccent
    strCheckAccent = CheckAccent(lcase(i))
    if strCheckAccent  <> "" then
    addString = addString +  strCheckAccent + " "
    end if
    Next
    
    DTSDestination("varSearchText") = rtrim(DTSSource("searchText")) + addString
    Main = DTSTransformstat_UpdateQuery
    End Function
    
    function  CheckAccent(pWord)
    dim found
    found = 0
    if InStr(pWord, "à") then
    pWord = replace(pWord,"à","a")
    found = 1
    end if
    if InStr(pWord, "â") then
    pWord = replace(pWord,"â","a")
    found = 1
    end if
    if InStr(pWord, "é") then
    pWord = replace(pWord,"é","e")
    found = 1
    end if
    if InStr(pWord, "è") then
    pWord = replace(pWord,"è","e")
    found = 1
    end if
    if InStr(pWord, "ê") then
    pWord = replace(pWord,"ê","e")
    found = 1
    end if
    if InStr(pWord, "ë") then
    pWord = replace(pWord,"ë","e")
    found = 1
    end if
    if InStr(pWord, "î") then
    pWord = replace(pWord,"î","i")
    found = 1
    end if
    if InStr(pWord, "ï") then
    pWord = replace(pWord,"ï","i")
    found = 1
    end if
    if InStr(pWord, "ô") then
    pWord = replace(pWord,"ô","o")
    found = 1
    end if
    if InStr(pWord, "ù") then
    pWord = replace(pWord,"ù","u")
    found = 1
    end if
    if InStr(pWord, "û") then
    pWord = replace(pWord,"û","u")
    found = 1
    end if
    if InStr(pWord, "ü") then
    pWord = replace(pWord,"ü","u")
    found = 1
    end if
    if InStr(pWord, "ç") then
    pWord = replace(pWord,"ç","c")
    found = 1
    end if
    if InStr(pWord, "œ") then
    pWord = replace(pWord,"œ","oe")
    found = 1
    end if
    if found = 1 then
    checkAccent = pWord
    else
    checkAccent = ""
    end if
    end function
  • Queries: I do this for the Update Query Type: update flat_table set searchText = ? where uniqueId = ? Then make sure parameter 1 and parameter 2 are setup to point to the temporary table.
  • That’s it really. What’s happening here is it’s taking the original text and simply appending the accent insensitive variation of the words to the end and repopulating it back into the same table.

    e.g. “1/2 lb de bœuf haché maigre, cuit et égoutté” becomes “1/2 lb de bœuf haché maigre, cuit et égoutté boeuf hache egoutte”

No Comments, Comment or Ping

Reply to “Make SQL Server 2000 Full Text Search Accent Insensitive”