I want to construct a feature vector for each document in a Lucene index.
I also have a set of keywords, and want to construct a feature vector for them as well.
Then I want to match documents to the keywords according to the similarity between the document vectors and the keyword vector.
Any hints on how Lucene can help me with these three tasks?
As bmargulies says, you can use Mahout, which can create vectors from an existing Lucene index. Here's the documentation on it: https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text#CreatingVectorsfromText-FromLucene
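
To make the matching step concrete, here is a minimal sketch of the idea in plain Java. It does not use the Lucene or Mahout APIs: the class name `FeatureVectors`, the simple regex tokenizer, and the raw term-count weighting are all assumptions made for illustration. In practice the term counts would come from the index (or from the vectors Mahout produces), and you would probably weight them with TF-IDF rather than raw counts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of Lucene or Mahout.
class FeatureVectors {

    // Build a sparse term-frequency vector from raw text.
    // Assumption: lowercase the text and split on anything that is not a letter or digit.
    static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        // The keyword set is treated as one short "document".
        Map<String, Double> keywords = termFrequencies("lucene index vector");
        String[] docs = {
            "Building a feature vector from a Lucene index",
            "Cooking pasta with tomato sauce"
        };
        // Score every document against the keyword vector.
        for (String doc : docs) {
            double score = cosine(keywords, termFrequencies(doc));
            System.out.printf("%.3f  %s%n", score, doc);
        }
    }
}
```

Ranking the documents by this score, highest first, gives the match you describe. Cosine similarity is a common choice here because it ignores document length, so a long document is not favoured just for containing more words.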