Index Server Faq
How to deal with Noise Words
Q. When I issue the following query
Select * from TableName where (*, '"The rain in Spain stays mainly in the plain"')
I get the following message:
Server: Msg 7619, Level 16, State 1, Line 1 Execution of a full-text operation failed. A clause of the query contained only ignored words.
How to I handle this?
A. This is a problem with SQL 2000 & 7 Full Text Search. It is also a problem with other Microsoft search servers, for instance Index Server, and Indexing Services also generate this error. This error is caused because at least one search argument or token in your search phrase is considered to be a noise word.
There are several strategies for getting around this.
The best option is edit your noise word list (%programfiles%\microsoft sql server\mssql\ftdata\sqlserver\config) and called noise.ext, where extension is coded after your language name. For instance for US English, the noise word list is noise.enu, and for the Queens English, it is noise.eng. Remove all the words and symbols you find there and replace it with a single space.
If you don't replace it with a single space, SQL Server will use the indexing services noise word lists found in %windir%\system32.
Some people raise objections to this as it does cause your catalog to swell and there is the perceived decrease in both indexing and querying performance. Indexing does take slightly longer, but only at the tail end of your indexing process when your catalog is being compiled. Querying performance is not measurably impacted.
A Microsoft search person, told me that they were consdering with doing away with noise word lists altogether as they were so problematic for them, and also as disk space is now so cheap. Noise word lists are fossilized remnants of a time when disk space was more expensive. Noise word lists are also more meaningful and necessary for internet seach engines. For example the MSN desktop search engine does not seem to care about noise words at all. Nor does it appear that search.msn.com does either.
One other factor when you empty your noise word list, is that you will find that ranking is depressed, in other words hits that before would generate a rank of 400, will now generate ranks of perhaps 300. The standard deviation of the ranking also goes down - in otherwords, before you might see a ranking differential of 50 between hits; with no noise word list you might see a differential of 30.
Another options are using a FreeText search. This has the drawback of perhaps returning more search results, which are sometimes false positives. Also you can also get the "A clause of the query contained only ignored words" error message when you only search on noise words with a Freetext search as well.
A third option is to parse at the client. I have included a link to a page that will parse out the noise words in the search phrase before sending it back to the web server. Click here to download the file.
Click here for a xp to parse the noise words out at the client. To get this to work, copy the dll to C:\Program Files\Microsoft Sql Server\Mssql\Binn.
Then issue the following commands:
declare @input char(255)
declare @output char(255)
set @input ='this is a string with a lot of noise words in it'
exec xp_parser @output OUTPUT, @input
set @input ='this is a jennifer with a lot of noise words in it'
exec xp_parser @output OUTPUT, @input
Notice how the output for the second phrase is
'jenny lot noise words'
We have replaced Jennifer with Jenny.
Finally here is some tsql to parse the noise words out. I present this merely as a proof of concept. The performance of this is so painful, that it will be useless in a production environment.
Create a table called parser:
create table parser (col1 char(50), spid_id int)
and then run this stored procedure
Create procedure parse_this_string @str char(255), @rtn_str char(255) OUTPUT
set nocount on
declare @count int
declare @spid int
declare @string char(255)
declare @number_of_white_spaces int
???calculating the number of white spaces
select @number_of_white_spaces =len(@str) - len(replace(@str,' ',''))
???iterating through the search phrase
while (@count<= @number_of_white_spaces)
???inserting each word in the search phrase into the table parser along with the
???spid of the client
insert into parser (col1,spid_id)values (rtrim(substring(@str,0,charindex(char(32),@str))), @@spid)
select @str=ltrim(substring(@str, charindex(char(32),@str),255))
???select only non noise words from your search string (by seeing what words are
???not in the noise word table), and building the search string
select @string =rtrim(@string)+ ' contains(au_lname,'+char(39)+rtrim(col1)+char(39)+') or ' from parser where col1 not in (select col001 from noise)
???deleteing this spid???s entries from the parser table delete from parser where spid_id=@@spid
???putting the front part on your search string and removing the final or
select @rtn_str ='select * from authors where' + substring(@string,0,len(@string) 2)
declare @output nchar(255)
exec parse_this_string 'test1 me myself and i Hilary', @output output
exec sp_executesql @output
Q. I'm having trouble with Editing Noise Words files on Clustered SQL server. It
appears that when virtual server sits on first node FTS uses C:\Program
designed by :: smilla group