Make SQL Server 2000 Full Text Search Accent Insensitive
Category: tech
For anyone that has used Microsoft Full Text Search to replace inefficient like queries, you know it’s been a huge help. I have seen performance gains increase from 1000-3000+ms down to 100-500ms at most. Definitely the way to go if you’re building your own search queries in a MS SQL 2000 database.
However, and this is a big however, by default, it’s accent sensitive. It doesn’t matter if your DB is setup as accent insensitive, it will still be AS. From reading various newsgroups articles, I could find only three ways around this problem:
- Upgrade to SQL Server 2005. This obviously isn’t an option in most cases cause it probably going to be a lot of work to migrate all your systems over
- Install the evaluation copy of MS Sharepoint Server. Apparently there is a “patch” that will make the full text catalogs AI. However, this sounded like a hack and I wasn’t about to start installing SPS.
- Programmatically fix it yourself
Luckily, the process in which we search the FTC was already setup to be “patched”. Meaning, using stored procs, we combine all our data into one “super meta table” which can easily be searched. e.g. comment + articleName + articleTitle = searchText
Now to get it to be AI, was simply to append the AI version of the word to the searchText column. You can do this through a Data Driven Query Task DTS job setup as follows:
- Source: The table you are looking to fix
- Bindings: Create a temporary table and add two columns: The unique identifier and the column you are making AI
- Transformations: Setup ActiveX script transformations between the two columns in the two tables. e.g. id -> id and text -> text. For the text transformation, you will need to modify this snippet to match your own personal needs, but this give you the general idea of how I’m stripping the accents.
'********************************************************************** ' Visual Basic Transformation Script '************************************************************************ ' Copy each source column to the destination column Function Main() myArray = split(rtrim(DTSSource("searchText")), " ") addString = " " for each i in myArray dim strCheckAccent strCheckAccent = CheckAccent(lcase(i)) if strCheckAccent <> "" then addString = addString + strCheckAccent + " " end if Next DTSDestination("varSearchText") = rtrim(DTSSource("searchText")) + addString Main = DTSTransformstat_UpdateQuery End Function function CheckAccent(pWord) dim found found = 0 if InStr(pWord, "à") then pWord = replace(pWord,"à","a") found = 1 end if if InStr(pWord, "â") then pWord = replace(pWord,"â","a") found = 1 end if if InStr(pWord, "é") then pWord = replace(pWord,"é","e") found = 1 end if if InStr(pWord, "è") then pWord = replace(pWord,"è","e") found = 1 end if if InStr(pWord, "ê") then pWord = replace(pWord,"ê","e") found = 1 end if if InStr(pWord, "ë") then pWord = replace(pWord,"ë","e") found = 1 end if if InStr(pWord, "î") then pWord = replace(pWord,"î","i") found = 1 end if if InStr(pWord, "ï") then pWord = replace(pWord,"ï","i") found = 1 end if if InStr(pWord, "ô") then pWord = replace(pWord,"ô","o") found = 1 end if if InStr(pWord, "ù") then pWord = replace(pWord,"ù","u") found = 1 end if if InStr(pWord, "û") then pWord = replace(pWord,"û","u") found = 1 end if if InStr(pWord, "ü") then pWord = replace(pWord,"ü","u") found = 1 end if if InStr(pWord, "ç") then pWord = replace(pWord,"ç","c") found = 1 end if if InStr(pWord, "œ") then pWord = replace(pWord,"œ","oe") found = 1 end if if found = 1 then checkAccent = pWord else checkAccent = "" end if end function - Queries: I do this for the Update Query Type: update flat_table set searchText = ? where uniqueId = ? Then make sure parameter 1 and parameter 2 are setup to point to the temporary table.
That’s it really. What’s happening here is it’s taking the original text and simply appending the accent insensitive variation of the words to the end and repopulating it back into the same table.
e.g. “1/2 lb de bœuf haché maigre, cuit et égoutté” becomes “1/2 lb de bœuf haché maigre, cuit et égoutté boeuf hache egoutte”
No Comments, Comment or Ping
Reply to “Make SQL Server 2000 Full Text Search Accent Insensitive”