Basic fuzzy queries:
GET /docs/doc/_search
{
  "query": {
    "fuzzy": {
      "body": "robot"
    }
  }
}

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "robot",
        "fuzziness": "auto"
      }
    }
  }
}
You can use either a `fuzzy` query or a `match` query with an explicit `fuzziness` attribute. Be warned that the `fuzzy` query is not analyzed: the search term is compared with the indexed terms exactly as you typed it.
`fuzziness` is the maximum edit distance, that is, how different a matched word may be from the word in the query. Your best option is `AUTO`, which picks the value according to the word's length. Valid values are 0, 1 and 2; you can't go higher than 2.
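As a rough illustration of how `AUTO` scales with term length, the sketch below assumes the thresholds documented for recent Elasticsearch versions (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The function name `auto_fuzziness` and its `low`/`high` parameters are hypothetical and only mirror that rule locally; they are not an Elasticsearch API.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Approximate the maximum edit distance AUTO allows for a term.

    Assumes the documented default thresholds (low=3, high=6).
    This is a local illustration, not a call into Elasticsearch.
    """
    length = len(term)
    if length < low:
        return 0  # very short terms must match exactly
    if length < high:
        return 1  # medium terms tolerate one edit
    return 2      # longer terms tolerate two edits


for word in ("to", "robot", "transformers"):
    print(word, auto_fuzziness(word))
```

Under these assumptions a five-letter term such as `robot` is allowed a single edit.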
With `"fuzziness": "auto"` the query yields only the "Transformers" document in the results. If you set `fuzziness` to 2 it will yield "Robin Hood" as well, because `robot` has an edit distance of 2 from `robin`.
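To check the distance claim locally, here is a minimal sketch of plain Levenshtein distance. Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this simplified version does not; for `robot` versus `robin` the difference does not matter. `levenshtein` is an illustrative helper, not part of any Elasticsearch client.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("robot", "robin"))  # 2: substitute o->i and t->n
```

A distance of 2 exceeds the single edit that `AUTO` grants a five-letter term, which is why the second document only shows up with `"fuzziness": 2`.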
GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": [ "title", "body", "keywords" ],
      "query": "leetle nemmo fisch",
      "fuzziness": "auto",
      "operator": "and"
    }
  }
}
`"fuzziness": "auto"` is a sensible default value. The default for `operator` is `or`; if you use `"operator": "and"`, every term in the query has to match, so the results are more oriented towards complete phrases, just like you need. You cannot use `match_phrase` with fuzzy queries, so you have to fall back to this strategy.
You can use `prefix_length` to set how many initial characters must match exactly (they will not be fuzzied), which makes the query less expensive:
GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": [ "title*", "body", "keywords" ],
      "query": "litle nemmo fisch",
      "fuzziness": "auto",
      "operator": "and",
      "prefix_length": 3
    }
  }
}
If you increase `prefix_length` to 4 there will be no matches with the example query (`litl` will not match `litt`, `fisc` will not match `fish`, and so on).
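The effect of `prefix_length` can be sketched without a cluster: a candidate term is only considered if its first `prefix_length` characters are identical to those of the query term. The helper `shares_prefix` is hypothetical, and the indexed terms `little`, `nemo` and `fish` are assumed from the discussion above.

```python
def shares_prefix(query_term: str, indexed_term: str, prefix_length: int) -> bool:
    """Return True if the first prefix_length characters are identical."""
    return query_term[:prefix_length] == indexed_term[:prefix_length]


# (misspelled query term, term assumed to be in the index)
pairs = [("litle", "little"), ("nemmo", "nemo"), ("fisch", "fish")]

for prefix_length in (3, 4):
    for query_term, indexed_term in pairs:
        ok = shares_prefix(query_term, indexed_term, prefix_length)
        print(prefix_length, query_term, indexed_term, ok)
```

With a prefix of 3 all three pairs pass the prefix check; with 4 none of them do, so the fuzzy comparison is never attempted.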
Let's get all the documents whose title does not contain "nemo" and whose body does not contain "fish", using the misspelled forms "nemmo" and "fisch" in the query:
GET /docs/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "title": {
              "query": "nemmo",
              "fuzziness": "auto"
            }
          }
        },
        {
          "match": {
            "body": {
              "query": "fisch",
              "fuzziness": "auto"
            }
          }
        }
      ]
    }
  }
}
This works as a filter: `must_not` clauses only exclude documents and do not contribute to the score. You can add as many `must_not` clauses as you want, just keep the following format:
{
  "match": {
    "title": {
      "query": "nemmo",
      "fuzziness": "auto"
    }
  }
}
It's the usual `match` query, but with an explicit `fuzziness` attribute to make it fuzzy.
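If the clauses are generated programmatically, a small helper keeps the format consistent. This is only a sketch that builds the request body as a Python dict; `fuzzy_must_not` is a hypothetical name, and actually sending the body to Elasticsearch is left out.

```python
import json


def fuzzy_must_not(terms_by_field: dict, fuzziness: str = "auto") -> dict:
    """Build a bool query excluding documents that fuzzily match any field/term pair."""
    clauses = [
        {"match": {field: {"query": term, "fuzziness": fuzziness}}}
        for field, term in terms_by_field.items()
    ]
    return {"query": {"bool": {"must_not": clauses}}}


body = fuzzy_must_not({"title": "nemmo", "body": "fisch"})
print(json.dumps(body, indent=2))
```

The printed body has the same shape as the `must_not` request above, with one `match` clause per entry in the dict.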
This query gives a higher boost to documents where the match is in the `body` field:
GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywords": {
              "query": "action",
              "fuzziness": "auto"
            }
          }
        },
        {
          "match": {
            "body": {
              "query": "litle",
              "fuzziness": "auto",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}
Not clear what "no-op" means, could you please elaborate?
Anyway, the basic `must` query is this:
GET /docs/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "fuzzy": { "title": "nemmo" } },
        { "fuzzy": { "body": "fisch" } }
      ]
    }
  }
}
Remember that `fuzzy` is not analyzed. It may work for you if you don't need analysis, but if you do, you should use `match` + `fuzziness` as in the examples above.
Here I am enhancing the `should` fuzzy query with a `geo_distance` filter:
GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "keywords": {
                  "query": "action",
                  "fuzziness": "auto"
                }
              }
            },
            {
              "match": {
                "body": {
                  "query": "litle",
                  "fuzziness": "auto"
                }
              }
            }
          ]
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "location": {
            "lat": 45,
            "lon": 10
          }
        }
      }
    }
  }
}
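The query without the `geo_distance` filter returned 2 results; with the filter it picks only the one within the expected distance. Note that the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions, put the same `geo_distance` clause in the `filter` section of a `bool` query.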
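To sanity-check which documents should survive the filter, the great-circle distance from the filter's origin can be estimated with the haversine formula. This is an approximation on a spherical Earth (a mean radius of 6371 km is assumed) and not necessarily the exact computation Elasticsearch performs; the helper names and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_filter(lat: float, lon: float,
                  origin=(45.0, 10.0), max_km: float = 100.0) -> bool:
    """Mimic the 100km geo_distance filter around lat 45, lon 10."""
    return haversine_km(origin[0], origin[1], lat, lon) <= max_km


# Hypothetical document locations, not taken from the original data set.
print(within_filter(45.5, 10.5))
print(within_filter(47.0, 10.0))
```

The first sample point lies well inside the 100 km radius, while the second is two degrees of latitude away (over 200 km) and would be filtered out.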