java - Exact field match within Lucene query

Question:

I'm looking for a way to exact-match a field when a user includes it in a query.

For example, assume we have these docs:

  • Doc 1: catchall: "hello world", subject: "science"
  • Doc 2: catchall: "goodbye world", subject: "life science"

If the user searches for subject:science world, I want only Doc 1 to be returned, since it is an exact match on subject. However, I am getting both docs.

I tried indexing subject with KeywordAnalyzer but I get this error:

java.lang.IllegalStateException: field "subject" was indexed without position data; cannot run Phrase Query

Answer:

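The error on the "subject" field means the field was indexed without position data, which a phrase query needs. (Are you using StringField or TextField in your Lucene code? StringField does not index positions.)

To index positions for the field, and optionally store term vectors as well, use Lucene's Field class and define the field with a FieldType like the one below:

    // Written against the Lucene 4.x API; newer releases drop setIndexed()
    // and rely on the index options alone.
    FieldType fieldType = new FieldType();
    fieldType.setIndexed(true);
    // Positions must be indexed for phrase queries to run on this field.
    fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
    fieldType.setStored(true);
    // Optional: term vectors with positions (not what the phrase query reads).
    fieldType.setStoreTermVectors(true);
    fieldType.setStoreTermVectorPositions(true);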

Then add a field of that type to the document:

    doc.add(new Field("field_name", "data", fieldType));

As for your original question, I can think of two ways:

1) Implement a custom Similarity: create a new Similarity class derived from the default similarity and change the lengthNorm method so that a document with "science" is ranked above one with "life science". How? Score every document by the ratio (matched terms from query / total terms in document). Shorter, more relevant documents score better under this ratio. Note that this only reorders the results; it does not exclude Doc 2.

2) Post-process the documents Lucene returns (not really recommended): get the list of documents from Lucene and drop the ones that do not match your search criteria.
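
To make the two ideas concrete, here is a minimal plain-Java sketch that does not touch the Lucene API at all. The Hit class and the ratioScore and keepExactSubject methods are hypothetical names made up for illustration. In real code the subject values would come from the stored fields of the returned documents, and the ratio would have to be wired into a Similarity subclass rather than computed by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExactSubjectSketch {

    /** Hypothetical stand-in for a search hit; not a Lucene class. */
    public static final class Hit {
        public final String id;
        public final String subject;

        public Hit(String id, String subject) {
            this.id = id;
            this.subject = subject;
        }
    }

    /** Way 1: ratio of matching terms to total terms in the field value. */
    public static double ratioScore(String queryTerm, String fieldValue) {
        String wanted = queryTerm.toLowerCase(Locale.ROOT);
        // Naive whitespace tokenization, assumed here only for illustration.
        String[] terms = fieldValue.trim().toLowerCase(Locale.ROOT).split("\\s+");
        long matched = Arrays.stream(terms).filter(t -> t.equals(wanted)).count();
        return (double) matched / terms.length;
    }

    /** Way 2: keep only the hits whose entire subject equals the requested value. */
    public static List<String> keepExactSubject(List<Hit> hits, String subject) {
        List<String> ids = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.subject.equalsIgnoreCase(subject)) {
                ids.add(hit.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("doc1", "science"),
                new Hit("doc2", "life science"));

        System.out.println(ratioScore("science", "science"));      // 1.0
        System.out.println(ratioScore("science", "life science")); // 0.5
        System.out.println(keepExactSubject(hits, "science"));     // [doc1]
    }
}
```

The ratio ranks Doc 1 above Doc 2 but still returns both, while the filter drops Doc 2 outright, which is closer to what the question asks for.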
