
java - Using payload boost with FuzzyQuery in Lucene-4.x

Question:

Is there any way to use payload boost, as described here, with FuzzyQuery? Or can someone suggest the best strategy for implementing a hybrid fuzzy/payload search?

Currently I have documents in the index in which certain parts have been given higher match priority using the technique described in the article. Everything works well until fuzzy queries come into play.

Right now I'm planning to hack the Lucene code so that term scoring can be adjusted by the payload factor, e.g. in MultiTermQuery.TopTermsScoringBooleanQueryRewrite.addClause(). However, I'm not sure that this is the best way to solve the problem.

Please suggest.

A very similar question was asked a while ago, but it did not receive a satisfactory answer.

Answer:

I have a solution.

Use only PayloadTermQuery, and extend your token stream with a custom filter. With this filter you can add new, simplified terms to the token chain: ASCII-folded, with double letters removed, and so on. With PayloadTermQuery you can also give these new terms a lower score.

This solution works well for me and is really fast. I hope this helps.

Some code from my solution:

  private String simplifyingToken(String token) {
      // Reassign the parameter; redeclaring "String token" here does not compile.
      token = H.foldToAscii(token);                   // strip accents
      if (!H.isNumber(token)) {
          token = token.replaceAll("(.)\\1", "$1");   // collapse double letters
      }
      token = token.replaceAll("\\-", "");            // drop hyphens
      token = token.replaceAll("(ou)", "u");
      token = token.replaceAll("(cz)", "c");
      token = token.replaceAll("w", "v");
      return simpleTokenJocker + token;               // tf-idf correction
  }
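
To try the simplification step on its own, here is a minimal self-contained sketch. The H helper and simpleTokenJocker from the snippet above are not shown in the answer, so this version substitutes stand-ins that are assumptions: foldToAscii is approximated with java.text.Normalizer (which only strips combining accents), isNumber is a plain digits check, and the "~" marker prefix is a hypothetical value. The Lucene filter and payload wiring is left out.

```java
import java.text.Normalizer;

final class TokenSimplifier {
    // Hypothetical stand-in for simpleTokenJocker: a prefix that keeps
    // simplified terms distinct from the original terms in the index.
    static final String SIMPLE_TOKEN_MARKER = "~";

    // Assumed stand-in for H.foldToAscii: decompose, then drop combining marks.
    static String foldToAscii(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Assumed stand-in for H.isNumber: digits only.
    static boolean isNumber(String s) {
        return s.matches("\\d+");
    }

    static String simplify(String token) {
        String t = foldToAscii(token);
        if (!isNumber(t)) {
            t = t.replaceAll("(.)\\1", "$1"); // collapse double letters
        }
        t = t.replace("-", "")
             .replace("ou", "u")
             .replace("cz", "c")
             .replace("w", "v");
        return SIMPLE_TOKEN_MARKER + t;
    }

    public static void main(String[] args) {
        System.out.println(simplify("M\u00fcller")); // "Müller"
        System.out.println(simplify("Muler"));
        System.out.println(simplify("Kowalski"));
        System.out.println(simplify("Kovalski"));
        System.out.println(simplify("2011"));
    }
}
```

Spelling variants such as "Müller" and "Muler", or "Kowalski" and "Kovalski", reduce to the same simplified term, so an exact term query on the simplified form can match them without FuzzyQuery. Numbers skip the double-letter rule, so "2011" keeps both of its 1s.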