I am trying to set up Solr but encountered the problem mentioned in the title. I just downloaded Solr and used the built-in example. When I used a query with words occurred in the example documents, such as "ipod". Solr worked properly. However, when I added some words that are not in these documents, such as "what". Solr does not return anything. For me, it is weird since the relevance scores should be computed to query terms separately and added up. Non-existing query term should not affect the ranking (even though the coord norm is affected, thus the scores of documents will change).
Could anyone tell me what might be the issue? Thanks.
There are several ways of configuring how you want this behavior. I'll assume that you're using the edismax query handler for these examples, although some of these also apply to the standard lucene query parser.
The reason for not always wanting "ipod what" to retrieve the same subset sa "ipod" is that you'll get a poor result set and user experience for terms that are more general than "ipod" (i.e. searching for "microsoft windows" will not be perceived as a good search result if you're showing only general hits for anything about windows - it's usually better to say "we didn't find anything" in those cases). It all depends on your use case.
First, you can do it yourself, by applying either
OR between terms to get the exact kind of matching you're looking for.
You can use
q.op to configure wether each term should be AND-ed together (all required) or OR-ed together (any one is sufficient). This overrides the (now deprecated) value from
<solrQueryParser defaultOperator=".."/> in schema.xml.
For (e)dismax, there's the
mm parameter, which allows you do more specific, but in a general way, handling of how you want matches to be performed.
mm allows you to say "at least 50% of the terms should match" or "if there's only two terms, both should match, but any over that should be optional" or "match everything up to four, and 75% after that".