I'm looking for a way to exact-match a field when a user includes it in a query.
For example, assume we have these docs:
If the user searches for subject:science world I want only doc1 to be returned since it is an exact match for subject. However, I am getting both docs.
I tried indexing subject with KeywordAnalyzer but I get this error:
java.lang.IllegalStateException: field "subject" was indexed without position data; cannot run Phrase Query
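The error on the "subject" field means the field was indexed without position data, which a phrase query needs (are you using StringField or TextField in your Lucene code? StringField does not index positions). Positions in the index are controlled by the field's index options; term vectors are a separate, optional setting.
To index positions (and store term vectors as well), use Lucene's Field class and define the field with a FieldType like this:
FieldType fieldType = new FieldType();
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setIndexed(true); // Lucene 4.x setter; newer releases rely on the index options alone
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); // positions are required for phrase queries
fieldType.setStored(true);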
Then add a field of that type to the document:
doc.add(new Field("field_name", "data", fieldType));
Now, for your first question, I can think of two ways:
1) Implement a custom Similarity: create a new Similarity class derived from the default similarity and change the lengthNorm method so that a document with "science" is prioritized over one with "life science". How? Score every document by the ratio (matched terms from query / total terms in document). Shorter, more relevant documents score better under this ratio.
2) Post-process the documents Lucene returns (not really recommended): get the list of hits from Lucene and drop the ones that do not match your search criteria.
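To make the two options above more concrete, here is a minimal plain-Java sketch of the logic behind each one. It does not call Lucene at all: the class name, the method names, the whitespace tokenizer, and the sample subject values are all hypothetical stand-ins, so treat it as an illustration of the ratio and the filter rather than as a working Similarity or a Lucene collector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; none of these names come from Lucene.
public class SubjectMatchSketch {

    // Lowercase and split on whitespace. This is only a stand-in for
    // whatever analyzer the real index uses.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Option 1 idea: number of field terms that also occur in the query,
    // divided by the total number of terms in the field.
    static double ratioScore(String query, String fieldValue) {
        List<String> fieldTerms = tokenize(fieldValue);
        if (fieldTerms.isEmpty()) {
            return 0.0;
        }
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        int matched = 0;
        for (String term : fieldTerms) {
            if (queryTerms.contains(term)) {
                matched++;
            }
        }
        return (double) matched / fieldTerms.size();
    }

    // Option 2 idea: keep only the values whose token sequence equals
    // the query's token sequence.
    static List<String> keepExactMatches(String query, List<String> fieldValues) {
        List<String> queryTerms = tokenize(query);
        List<String> kept = new ArrayList<>();
        for (String value : fieldValues) {
            if (tokenize(value).equals(queryTerms)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed sample subjects, used only for illustration.
        List<String> subjects = Arrays.asList("science world", "science world history");
        for (String subject : subjects) {
            System.out.println(subject + " -> " + ratioScore("science world", subject));
        }
        System.out.println(keepExactMatches("science world", subjects));
    }
}
```

With the ratio, a subject that contains nothing but the query terms scores 1.0, and every extra term pulls the score down, which is the effect a lengthNorm override would aim for. The filter is stricter: it discards anything that is not a token-for-token match, at the cost of running after the search instead of inside it.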