
lucene - Unable to filter out n-word shingle (n-gram) facets using the "exclude" option of the "facets" query

Question:

I am trying to build a tag cloud of words and phrases using the facets feature of Elasticsearch.

My mapping:

curl -XPOST http://localhost:9200/myIndex/ -d '{
  ...
  "analysis":{
    "filter":{
      "myCustomShingle":{
        "type":"shingle",
        "max_shingle_size":3,
        "output_unigrams":true
      }
    },
    "analyzer":{ //making a custom analyzer
      "myAnalyzer":{
        "type":"custom",
        "tokenizer":"standard",
        "filter":[
          "lowercase",
          "myCustomShingle",
          "stop"
        ]
      }
    }
  }
  ...
},
"mappings":{
  ...
  "description":{ //the field to be analyzed for making the tag cloud
    "type":"string",
    "analyzer":"myAnalyzer",
    "null_value" : "null"
  },
  ...
}

Query for generating facets:

curl -X POST "http://localhost:9200/myIndex/myType/_search?pretty=true" -d '
{
  "size": "0",
  "query": {
    "match_all": {}
  },
  "facets": {
    "blah": {
      "terms": {
        "fields": ["description"],
        "exclude": ["evil"], //remove facets that contain these words
        "size": "50"
      }
    }
  }
}'

My problem: when I put a word, say 'evil', in the "exclude" option of the facet, it removes the single-word facet (the unigram shingle) that matches 'evil', but it does not remove the two- and three-word shingles such as "resident evil", "evil computer", and "my evil cat". How do I remove the phrase facets that contain an excluded word?

Answer:

It isn't completely clear what you want to achieve. You usually wouldn't build facets on analyzed fields. Maybe you could explain why you're producing shingles, so that we can help you achieve what you want in a better way.

With the exclude facet parameter you can exclude specific entries, but "evil" is not the same entry as "resident evil"; to exclude the latter you need to list it explicitly. Facets are built from indexed terms, and "resident evil" is in fact a single term in the index, distinct from the term "evil".
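
To see why an exact-match exclude misses the phrases, here is a rough Python sketch that imitates what a shingle filter with max_shingle_size 3 and output_unigrams true would emit. It is not Lucene's implementation; the shingles helper and the sample text are made up for illustration.

```python
def shingles(tokens, max_size=3, output_unigrams=True):
    """Return word n-grams of size 1..max_size (2..max_size without unigrams)."""
    min_size = 1 if output_unigrams else 2
    terms = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                terms.append(" ".join(tokens[start:start + size]))
    return terms


# Hypothetical sample text, lowercased and split on whitespace.
tokens = "my evil cat".lower().split()
terms = shingles(tokens)

# Exact-match exclusion: a term is dropped only if it equals an excluded entry.
exclude = {"evil"}
remaining = [t for t in terms if t not in exclude]

print(terms)      # every shingle is its own term
print(remaining)  # only the bare "evil" term is gone
```

Each shingle is a separate term, so dropping "evil" leaves "my evil", "evil cat" and "my evil cat" untouched.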

Given the choices you have already made for indexing and faceting, there is still a way to achieve what you want. Elasticsearch has a powerful scripting module, and you can use a script to decide whether each entry should be included in the facet, like this:

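{
  "query": {
    "match_all" : {}
  },
  "facets": {
    "tags": {
      "terms": {
        "field" : "description",
        // the script returns true to keep a term and false to drop it,
        // so terms containing 'evil' must map to false
        "script" : "term.contains('evil') ? false : true"
      }
    }
  }
}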
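
One caveat with a substring test such as contains: it also matches terms where "evil" is only part of a longer word, for example "devil". The sketch below is in Python rather than the facet script language and only illustrates the difference between a substring check and a whole-word check. The has_word helper and the sample terms are hypothetical, and the same idea would have to be ported to the script.

```python
def has_word(term, word):
    """True if `word` is one of the whitespace-separated words in `term`."""
    return word in term.split()


# Hypothetical facet terms, as a shingle analyzer might produce them.
terms = ["resident evil", "evil computer", "my evil cat", "devil may cry", "cat"]

# Substring check: mirrors a contains-style test and also drops "devil may cry".
substring_kept = [t for t in terms if "evil" not in t]

# Whole-word check: drops only the terms that have "evil" as a separate word.
word_kept = [t for t in terms if not has_word(t, "evil")]

print(substring_kept)
print(word_kept)
```

Which of the two behaviours is wanted depends on the tag cloud; the substring form is shorter, while the whole-word form avoids hiding unrelated terms.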