MySQL can perform boolean full-text searches using the
IN BOOLEAN MODE modifier:
mysql>SELECT * FROM articles WHERE MATCH (title,body)->AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);+----+-----------------------+-------------------------------------+ | id | title | body | +----+-----------------------+-------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 2 | How To Use MySQL Well | After you went through a ... | | 3 | Optimizing MySQL | In this tutorial we will show ... | | 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... | | 6 | MySQL Security | When configured properly, MySQL ... | +----+-----------------------+-------------------------------------+
The + and - operators
indicate that a word is required to be present or absent,
respectively, for a match to occur. Thus, this query retrieves
all the rows that contain the word “MySQL” but that
do not contain the word
“YourSQL”.
In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which
+ stands for AND
- stands for NOT
[no operator] implies
OR
Boolean full-text searches have these characteristics:
They do not use the 50% threshold.
They do not automatically sort rows in order of decreasing relevance. You can see this from the preceding query result: The row with the highest relevance is the one that contains “MySQL” twice, but it is listed last, not first.
They can work even without a FULLTEXT
index, although a search executed in this fashion would be
quite slow.
The minimum and maximum word length full-text parameters apply.
The stopword list applies.
The boolean full-text search capability supports the following operators:
+
A leading plus sign indicates that this word must be present in each row that is returned.
-
A leading minus sign indicates that this word must not be present in any of the rows that are returned.
Note: The - operator acts only to exclude
rows that are otherwise matched by other search terms. Thus,
a boolean-mode search that contains only terms preceded by
- returns an empty result. It does not
return “all rows except those containing any of the
excluded terms.”
(no operator)
By default (when neither + nor
- is specified) the word is optional, but
the rows that contain it are rated higher. This mimics the
behavior of MATCH() ...
AGAINST() without the IN BOOLEAN
MODE modifier.
> <
These two operators are used to change a word's contribution
to the relevance value that is assigned to a row. The
> operator increases the contribution
and the < operator decreases it. See
the example following this list.
( )
Parentheses group words into subexpressions. Parenthesized groups can be nested.
~
A leading tilde acts as a negation operator, causing the
word's contribution to the row's relevance to be negative.
This is useful for marking “noise” words. A row
containing such a word is rated lower than others, but is
not excluded altogether, as it would be with the
- operator.
*
The asterisk serves as the truncation (or wildcard)
operator. Unlike the other operators, it should be
appended to the word to be affected.
Words match if they begin with the word preceding the
* operator.
If a stopword or too-short word is specified with the
truncation operator, it will not be stripped from a boolean
query. For example, a search for '+word
+stopword*' will likely return fewer rows than a
search for '+word +stopword' because the
former query remains as is and requires
stopword* to be present in a document.
The latter query is transformed to +word.
"
A phrase that is enclosed within double quote
(“"”) characters matches
only rows that contain the phrase literally, as it
was typed. The full-text engine splits the phrase
into words, performs a search in the
FULLTEXT index for the words. Prior to
MySQL 5.0.3, the engine then performed a substring search
for the phrase in the records that were found, so the match
must include non-word characters in the phrase. As of MySQL
5.0.3, non-word characters need not be matched exactly:
Phrase searching requires only that matches contain exactly
the same words as the phrase and in the same order. For
example, "test phrase" matches
"test, phrase" in MySQL 5.0.3, but not
before.
If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.
The following examples demonstrate some search strings that use boolean full-text operators:
'apple banana'
Find rows that contain at least one of the two words.
'+apple +juice'
Find rows that contain both words.
'+apple macintosh'
Find rows that contain the word “apple”, but rank rows higher if they also contain “macintosh”.
'+apple -macintosh'
Find rows that contain the word “apple” but not “macintosh”.
'+apple ~macintosh'
Find rows that contain the word “apple”, but if
the row also contains the word “macintosh”,
rate it lower than if row does not. This is
“softer” than a search for '+apple
-macintosh', for which the presence of
“macintosh” causes the row not to be returned
at all.
'+apple +(>turnover <strudel)'
Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”.
'apple*'
Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.
'"some words"'
Find rows that contain the exact phrase “some
words” (for example, rows that contain “some
words of wisdom” but not “some noise
words”). Note that the
“"” characters that enclose
the phrase are operator characters that delimit the phrase.
They are not the quotes that enclose the search string
itself.

User Comments
Just having a negated term (such as -apple or ~orange) will always return null in boolean mode full text searches.
For example if +apple -macintosh returns:
apple orange
then -macintosh returns the empty set rather than the same or larger result. This seems like buggy behavior.
A workaround is to use NOT MATCH.....
I have found this with both 4.0.16 and 4.0.18
Keep in mind that although MATCH() AGAINST() is case-insensitive, it also is basically **accent-insensitive**. In other words, if you do not want _mangé_ to match with _mange_ (this example is in French), you have no choice but to use the BOOLEAN MODE with the double quote operator. This is the only way that MATCH() AGAINST() will make accent-sensitive matches.
E.g.:
SELECT * FROM quotes_table WHERE MATCH (quote) AGAINST ('"mangé"' IN BOOLEAN MODE)
For multiword searches:
SELECT * FROM quotes_table MATCH (quote) AGAINST ('"mangé" "pensé"' IN BOOLEAN MODE)
SELECT * FROM quotes_table MATCH (quote) AGAINST ('+"mangé" +"pensé"' IN BOOLEAN MODE)
Although the double quotes are intended to enable phrase searching, just like any web search engine for example, you can also use them to signify single words where accents and other diacritics matter.
The only drawback to this method seems to be that the asterisk operator is mutually exclusive with the double quote. Or I just haven't been able to combine both effectively.
Be careful with the phrase search when short words are involved!
Words that are shorter than the minimum word length (by default, words with up to 3 characters) are sometimes taken into consideration when you search for phrases, but sometimes not!
Example 1:
A search for the phrase "the creation" will find all records that really contain this phrase, and only those. So, a record containing only "la creation du monde", even without the accent aigu on the e in creation, won't be found. This is just fine and what one would expect.
Example 2: A search for the phrase "let it be" won't find any record, not even records containing something like "The Beatles: Let It Be". According to the MySQL team, this is not a bug.
I personally find it very counterintuitive to sometimes take short words into consideration for phrase searches, but only if there is at least one properly long word in the search phrase.
Why the default value of minimum character length is 4?
there are many words with 4character is meaningful, e.g. see, hot, hat, pie,... Is there any techinical problem when setting this default value to 3?
>Just having a negated term (such as -apple or ~orange) will always return null in boolean mode full text searches.
I have this problem too. Anyone can help?
For those of you who interface MySQL with PHP and wonder what the problem is with getting "exact phrases" working properly, here's the way to go.
$query= "SELECT code, category, header, date FROM articles WHERE MATCH (text,header,summary) AGAINST ('" . stripslashes (str_replace (""", "\"", ($_POST['keywords']))) . "' IN BOOLEAN MODE)";
PHP, or some setups or with some browsers, convert double quotes from POST data to their HTML-equivalents even without being asked to do that. The above will fix the issue. Stripslashes() is optional, I prefer to keep it in to keep things looking clean, though the \" doesn't seem to break the boolean literal search.
It's also possible to create a prioritized boolean query with the following SQL:
SELECT id, text, MATCH (text) AGAINST ('word1 word2 word3' in boolean mode)
AS score FROM table1
WHERE MATCH (text) AGAINST ('word1 word2 word3' in boolean mode) order by score desc;
According to the last comment by Rob Thorpe it's possible to prioritize the boolean query like so:
SELECT id, text, MATCH (text) AGAINST ('word1 word2 word3' in boolean mode)
AS score FROM table1
WHERE MATCH (text) AGAINST ('word1 word2 word3' in boolean mode) order by score desc;
I tried this and it failed to work, i.e. all the scores turned out to be '1' even though the number of matches differed from record to record.
Boolean searching has two deficiencies: 1) results are not sorted by relevance and; 2) no method by which to weigh certain columns. There is a way around both of these problems. For example, if I have a table of articles and want to weigh the title more heavily than the text, I can do the following:
SELECT *, ( (1.3 * (MATCH(title) AGAINST ('+term +term2' IN BOOLEAN MODE))) + (0.6 * (MATCH(text) AGAINST ('+term +term2' IN BOOLEAN MODE))) ) AS relevance FROM [table_name] WHERE ( MATCH(title,text) AGAINST ('+term +term2' IN BOOLEAN MODE) ) HAVING relevance > 0 ORDER BY relevance DESC;
Here we artificially manipulate the relevancy score to give title more weight by multiplying by the constant 1.3. In the above query, it doesn't seem to matter whether I have 3 fulltext indexes or just one comprising the title and text columns. From my testing, the results appear to be the same.
In response to the note above Posted by Adam George on December 13 2004 7:32pm:
In my tests it would seem that the score returned is an integer equal to the number of words matched. So if you match on 3 words the scores will range from 1 to 3. If you match only on one word, or only one word is matched in any document, then the scores would all be 1.
In response to Vincent Tsang:
The default value of minimum character length (which is 4) can be modified. It's #defined as ft_min_word_len at myisam/ft_static.c of you sources.
Of course you can always use mysql variables (read http://dev.mysql.com/doc/mysql/en/Fulltext_Fine-tuning.html)
Hope this works!
George Gkotsis
In response to Joe Laffey and Adam George:
To enhance sorting of the results in boolean mode you can do the following:
SELECT id, text, MATCH (text) AGAINST ('word1 word2 word3')
AS score FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2 +word3' in boolean mode) order by score desc;
Using the first MATCH() we get the score in non-boolean search mode (more distinctive). The second MATCH() ensures we really get back only the results we want (with all 3 words). If you want to do 'any of the words' search only, it's better to use non-boolean search instead (unless you are using boolean in order to get rid of 50% treshold limit).
I'm not sure why MATCH/AGAINST uses a different scoring method when in boolean mode and when it's not. As stated above, if searching 3 terms in boolean mode, the score will be between 1 and 3 (integer). However, if not in boolean mode, the score is a floating point value.
It seems that non-boolean mode returns a "real" relevancy (based on how often each term was found I presume). In boolean mode it only returns how many terms were found. To me, this is not really relevancy. For instance, if searching on 2 terms, one result may have 20 occurrences of each term and another may have only 1 occurrence of each word, yet they will both return "2" as their relevance.
This seems to work well for ranking relevance in boolean queries:
select products_id,match(products_model) against ('printer' ) as Relevance
from products
where match(products_model) against ('+"printer"' in boolean mode)
Hi,
if You want to combine the phrase search with the *, simply search for the words separately and apply a having clause.
For example:
If You wish to find `bird cathcing`, `bird cathcer`, `bird cathers`
try this
SELECT column, MATCH( column ) AGAINST ('bird catch' IN BOOLEAN MODE) AS rank
FROM mytable
WHERE MATCH( column ) AGAINST ('bird catch' IN BOOLEAN MODE) > 0
HAVING column LIKE '%bird catch%'
ORDER BY rank DESC
I know the HAVING clause is slow, but it is only allpied to the rows that match the search criteira.
Have'n tested on big tables, but I have a feeling it works just fine.
I find that IN BOOLEAN MODE is the only thing that actually returns results back. I don't understand why either. Also, I found that utf8_bin causes a lot of issues with FULLTEXT searching, (not reporting a bug), most likely because it's some sort of CASE SENSITIVE, or something kind of search, however I changed from utf8_bin to utf8_general_ci, and my BOOLEAN MODE FULLTEXT searching works great.
I noticed another thing that FULLTEXT searching doesn't do, it doesn't give an accurate Relevancy.
For example:
I have 646,000 Books, of which I want to search the title for the word Hatchet. In the results, if the book title "Hatchet" is found, it rates 1.5, when it should rate it 100, since it is a 100% match for the word Hatchet. However, this "Hatchet" that was found, is rated 1.5, as well as 30 other book titles, that have the word "Hatchet" in their title, but not necessarily the exact book title "Hatchet", if you know what I mean.
In this case, I am being forced to make an Array ReArrange. Which will take the MOST RELEVANT, on top of the relevant found by FULLTEXT, and rearrange to meet my needs.
I am doing this:
Take original array, rebuild array, find EXACT matches, put them first. Then find matches starting with the keyword "Hatchet", and putting them next, and then everything else after that. I wish MySQL would do that for me, but I guess the relevance engine has a function like this, I am just using it incorrectly.
Here is my syntax, if anyone has any ideas how to speed it up too, that would be great:
SELECT ProductID, Title, Description, Price, RetailPrice, MATCH(Title)
AGAINST ('+(>$keyword)' IN BOOLEAN MODE)
AS m FROM server.book WHERE MATCH(Title)
AGAINST ('+(>$keyword)' IN BOOLEAN MODE) ORDER BY m DESC, Title ASC LIMIT 50;
Before writing me, test it yourself.
Thanks! :)
Nathan
Also, if my email isn't displayed, it's admin@sellchain.com
I am working on the solution, but if anyone finds it, im sure others could use it.
Nathan
This works for me so I get a score and the benefits of a boolean search. However, it's doing two different fulltext searches so it may slow things down a lot:
SELECT *, MATCH(post_content, post_title) AGAINST('string') AS `score` FROM posts MATCH(post_content, post_title) AGAINST('string' IN BOOLEAN MODE) ORDER BY score DESC LIMIT 10
The boolean mode after the from statement automatically pulls out the 0's and then the Match statement in the SELECT clause allows me to get the relevance score so I can sort by it.
Following on from Robert's comment, I've been playing around using his technique trying to get 'exact' and 'any word' searches to work. I've been having trouble with plurals, e.g. searching for 'anchor bolts' when the fulltext index includes 'anchors' and 'bolt'. This can be solved with the boolean part of the Match statement by trimming and searching for +anchor* +bolt*, but the non-boolean part of the Match statement for relevance is still a problem returning zeros in some instances. My 'fix' for this is to search for 'anchor anchors bolt bolts' in the non-boolean part (fairly easy to program in php), which seems to solve the problem. Any easier solutions such as like '%anchor%'?
It seems that the * modifier can only be applied to the end of a word. Placed at the beginning of a word, it will remove that word from the search altogether.
I recently had this experience when using the "match against" in boolean :
"There is something strange about the FULLTEXT SEARCH, MATCH AGAINST IN BOOLEAN MODE that is not intuitive when searching for a PHRASE, and that is the situation where the phrase contains a special character like a backslash(/). The documentation states that anything within the double quotes of the phrase, including punctuation and white space, is used in the search. Apparently this is not totally true."
Here is the response I received from Paul DuBois after reporting this as a bug:
"A phrase that is enclosed within double quote (`"') characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words, performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include non-word characters in the phrase. As of MySQL 5.0.3, non-word characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test,phrase" in MySQL 5.0.3, but not before."
Thank you, Paul, now I understand. It would have saved me a lot of time if I had realized this before. That is why I have posted this here.
I needed to be able to take a user search that might have words in it that are less then the min word length and return only results that contained all words.
What I did was break the query up into two sets of words, one set contained all the words that were >= ft_min_word_len, the other set contained all the shorter words. I did a fulltext search on the words that met the fulltext search length requirement and augmented it with an AND clause that used LIKE '%<shortword1>%'. MySQL uses the fulltext index to narrow down the results and then applies the LIKE conditions, so it stays fast.
This way you don't need to change your ft_min_word_len to a smaller number, which will make your indexes bigger.
Here's the regex I used to break things up:
\b(\w{4,})\b
where ft_min_word_len = 4, that will match all words of 4 or more letters.
Example
Search for the string "axle hub nut" will result in:
WHERE MATCH (col) AGAINST ("+axle*" IN BOOLEAN MODE) AND col LIKE '%hub%' AND col LIKE '%nut%'
Fulltext boolean mode search returning (relevance):
SELECT MATCH (x) AGAINST ('word1 word2' IN BOOLEAN MODE) AS relevance
The returned relevance is 0,1 or 2. 0 = no match, 1 = one of the words is matched (word1 or word2), 2 = both word are matched.
When using eg. '+word1 word2 word3' it returns a floating point number (double) which is the relevance (all returned rows contain word1, some are more relevant (they return word2 and word3, others less..they contain word1 and (word2 or word3).. others even less..containing only word1 (relevance = 1).
Seemingly inconsistent behavior that others might encounter.
MATCH(..) AGAINST('-a -b')
is not the same as
NOT MATCH(..) AGAINST('+a +b')
A query like '-macintosh' will not return any data because too many results would match. From the documentation:
'A word that matches half of the rows in a table is less likely to locate relevant documents. In fact, it most likely finds plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular data set in which they occur. A given word may reach the 50% threshold in one data set but not another.'
And
MATCH(..) AGAINST('-a -b')
is not the same as
NOT MATCH(..) AGAINST('+a +b')
is true, take a look at http://en.wikipedia.org/wiki/De_Morgan's_laws
Add your own comment.