Andrew Pennebaker
https://github.com/mcandre/cheatsheets/blob/master/lucene.md
Lucene is a search engine library whose query syntax is used by Elasticsearch and Kibana to search public and private data collections.
Lucene indexes can be case-sensitive or case-insensitive, depending on configuration.
cats
CATS
CaTs
Unlike many other search engines, Lucene joins adjacent terms with OR rather than AND by default.
cats dogs
cats OR dogs
Most of the time, you will want to AND terms together explicitly:
cats AND dogs
+cats +dogs
(+cats +dogs) (+"peanut butter" +jelly)
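As a minimal sketch of building such queries programmatically, the hypothetical helper below (not part of Lucene) joins terms with the `+` prefix so that every term is required. Quoting multi-word terms as phrases is an assumption made for this illustration.

```python
def and_query(*terms):
    """Join terms into an explicit AND query using the + (required) prefix."""
    parts = []
    for term in terms:
        # Assumption: a term containing a space is meant as a quoted phrase.
        if " " in term:
            term = f'"{term}"'
        parts.append(f"+{term}")
    return " ".join(parts)


print(and_query("cats", "dogs"))            # +cats +dogs
print(and_query("peanut butter", "jelly"))  # +"peanut butter" +jelly
```

The helper does not escape special characters; it assumes the terms are plain words.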
Minus (-) excludes a term from results, and automatically ANDs it with the rest of the query.
cats -dogs
cats AND NOT dogs
Double quotes group terms into an exact phrase.
"grumpy cat"
Question mark (?) matches a single, arbitrary character.
Asterisk (*) matches zero or more arbitrary characters within a term.
Notes:
- Wildcards and other special characters (e.g., +, -, &, |, !, (, ), {, }, [, ], ^, ", ~, *, ?, :, and \) need to be escaped with a backslash (e.g., \*, \?) when used inside phrases/strings, or searched for as a literal.
- By default, an asterisk cannot be used as the first character of a term (e.g., *oogle is bad syntax).
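The escaping rule in the notes above can be sketched as a small helper. This is an illustration only: `escape_lucene` is a hypothetical name, and the character set is simply the list given in the notes, so a particular Lucene version may treat further characters as special.

```python
import re

# Special characters taken from the notes above; each one is
# escaped individually by prefixing it with a backslash.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\])')


def escape_lucene(text):
    """Backslash-escape every special character in text."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_lucene("error:"))  # error\:
print(escape_lucene("Geck?"))   # Geck\?
```

Escape user-supplied text before embedding it in a larger query; escaping a complete query would also neutralize the operators you intended to use.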
cats
c?ts
+khtml +like +Gecko
+khtml +like +Geck?
"khtml like Geck\?"
+"khtml, like" +Ge*
"khtml, like \*"
error\:
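To make the wildcard semantics concrete, the sketch below translates a single wildcard term into a Python regular expression. It only models how `?`, `*`, and backslash escapes behave on one term; it is not how Lucene evaluates queries internally, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # An escaped character is matched literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "?":
            out.append(".")    # exactly one arbitrary character
        elif ch == "*":
            out.append(".*")   # zero or more arbitrary characters
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))


def matches(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("c?ts", "cats"))       # True
print(matches("Ge*", "Gecko"))       # True
print(matches(r"Geck\?", "Gecko"))   # False
```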
Lucene can search for similar terms: integer~ will match on integer, integers, and intejer.
An optional fuzziness threshold can be specified, from 0.0 (very loose) to 1.0 (very strict).
integer~
integer~0.5
integer~0.4
integer~0.6
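Fuzzy matching in Lucene is based on edit distance between terms. The sketch below computes the classic Levenshtein distance to show why integers and intejer count as close to integer. How a given threshold maps onto a maximum distance depends on the Lucene version and is not modeled here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("integer", "integers"))  # 1
print(edit_distance("integer", "intejer"))   # 1
```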
Hosts tend to require fully qualified domain names (e.g., google is bad syntax, google.com is good syntax), though wildcards can help abbreviate this.
host:tomcat.apache.org
host:tomcat*
path:catalina*
Each Lucene index may specify additional query operators. Common operators include message: and timestamp:.
Note: When a term is not prefixed with an operator, it is automatically searched for across all operators. For the broadest results, it is often useful to leave operators off your search terms.
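A short sketch of combining operator-prefixed and bare terms into a single query string follows. The helper names are hypothetical, and the operator names (host, message) are only examples; the operators available depend on the index.

```python
def field_term(field, value):
    """Prefix value with an operator such as host:, or leave it bare."""
    return f"{field}:{value}" if field else value


def build_query(clauses):
    """AND together (field, value) pairs; a field of None means a bare term."""
    return " AND ".join(field_term(field, value) for field, value in clauses)


print(build_query([("host", "tomcat*"), (None, "error")]))
# host:tomcat* AND error
```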