[[analysis-synonym-tokenfilter]]
=== Synonym Token Filter
The `synonym` token filter makes it easy to handle synonyms during the
analysis process. Synonyms are configured using a configuration file.
Here is an example:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/synonym.txt"
                }
            }
        }
    }
}
--------------------------------------------------
The above configures a `synonym` filter with a path of
`analysis/synonym.txt` (relative to the `config` location). The
`synonym` analyzer is then configured with the filter. Additional
settings are `ignore_case` (defaults to `false`) and `expand`
(defaults to `true`).

The `tokenizer` parameter controls the tokenizer that will be used to
tokenize the synonyms, and defaults to the `whitespace` tokenizer.
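
As a rough sketch of how these settings fit together, the following
filter definition sets all three explicitly. The filter name
`my_synonym` and the chosen values are illustrative assumptions, not
recommendations:

[source,js]
--------------------------------------------------
{
    "filter" : {
        "my_synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis/synonym.txt",
            "ignore_case" : true,
            "expand" : false,
            "tokenizer" : "keyword"
        }
    }
}
--------------------------------------------------

In this sketch, the `keyword` tokenizer would keep each synonym entry,
such as `i pod`, as a single token when the rules are parsed, and
`expand` set to `false` would map comma-separated equivalents to the
first term in the list.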

Two synonym formats are supported: Solr and WordNet.

[float]
==== Solr synonyms
The following is a sample format of the file:
[source,js]
--------------------------------------------------
# blank lines and lines starting with pound are comments.
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod
#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
--------------------------------------------------

You can also define synonyms for the filter directly in the
configuration file (note the use of `synonyms` instead of
`synonyms_path`):

[source,js]
--------------------------------------------------
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "synonyms" : [
                "i-pod, i pod => ipod",
                "universe, cosmos"
            ]
        }
    }
}
--------------------------------------------------

However, it is recommended to define large synonym sets in a file using
`synonyms_path`.

[float]
==== WordNet synonyms
Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
declared using `format`:
[source,js]
--------------------------------------------------
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "format" : "wordnet",
            "synonyms" : [
                "s(100000001,1,'abstain',v,1,0).",
                "s(100000001,2,'refrain',v,1,0).",
                "s(100000001,3,'desist',v,1,0)."
            ]
        }
    }
}
--------------------------------------------------
Using `synonyms_path` to define WordNet synonyms in a file is supported
as well.
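
For instance, a file-based WordNet filter might look like the following
minimal sketch. The file name `analysis/wordnet_synonyms.txt` is a
hypothetical placeholder, and it is assumed here to be resolved
relative to the `config` location in the same way as a Solr-format
file:

[source,js]
--------------------------------------------------
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "format" : "wordnet",
            "synonyms_path" : "analysis/wordnet_synonyms.txt"
        }
    }
}
--------------------------------------------------

The file would then contain one WordNet `s(...)` entry per line, in the
same form as the inline `synonyms` entries shown above.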