
rdf - DBLP Author Disambiguation

Question:

Hi all, I am doing some research on DBLP, using Hugh Glaser's repository, RKB-EXPLORER DBLP (RDF/XML).

Consider this page for an article in DBLP:

http://dblp.rkbexplorer.com/id/journals/jvcir/YuanWSZ13

As you can see, the author ID for this article looks like this:

http://dblp.rkbexplorer.com/id/people-b3f641eef09c498bdd94087b74854be9-36a6b8e7b69947e5659953aaf7fb802c.

I tried the same author name with different articles, and found that the ID above breaks down like this:

b3f641eef09c498bdd94087b74854be9: a 32-character encoding of the author's name (the details don't matter here).

36a6b8e7b69947e5659953aaf7fb802c: a 32-character encoding of the article's name.

So it actually gives the same ID to people with the "same name", but many different people have exactly the same name. This is the ambiguity I need to resolve.
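To make the ID layout described above concrete, here is a minimal Python sketch that splits such an author URI into its two 32-character parts and groups URIs that share the first part. The regular expression is an assumption inferred from the single example URI in this question, not a documented RKB Explorer format, and the function names are made up for illustration.

```python
import re

# Assumed pattern, inferred from the one example URI in the question:
#   .../id/people-<32 hex chars>-<32 hex chars>
AUTHOR_ID = re.compile(r"/id/people-([0-9a-f]{32})-([0-9a-f]{32})$")


def split_author_uri(uri):
    """Return (first_part, second_part), or None if the URI does not match."""
    match = AUTHOR_ID.search(uri.rstrip("."))  # tolerate a trailing full stop
    if match is None:
        return None
    return match.group(1), match.group(2)


def group_by_name_part(uris):
    """Group author URIs whose first 32-character part is identical."""
    groups = {}
    for uri in uris:
        parts = split_author_uri(uri)
        if parts is not None:
            groups.setdefault(parts[0], []).append(uri)
    return groups
```

Each group returned by `group_by_name_part` is a set of candidate records for one name string, which is exactly the set that still needs to be disambiguated.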

For DBLP author disambiguation, I tried the two approaches below:

  1. Get the affiliation for each article; if the same name appears in two articles with the same affiliation, I think we can be fairly sure it is the same person.

    But the difficulty is that the dblp.rkbexplorer.com dataset doesn't provide enough information about affiliations, and searching Google for the article title doesn't give enough information either.

  2. Get the photos of all authors of each article, and do something like face matching to check whether the same name refers to the same person.

    But this is not really feasible either, as too few articles include photos of their authors.
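The first approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: each record is a hypothetical `(name, affiliation, article)` tuple that you would have to assemble yourself, since the dataset does not reliably provide affiliations. Records without an affiliation are set aside rather than guessed.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def cluster_by_affiliation(records):
    """Group articles by (author name, affiliation).

    records: iterable of (name, affiliation, article) tuples; this layout
    is hypothetical. An affiliation of None or "" means it is unknown.
    Returns (clusters, unresolved).
    """
    clusters = defaultdict(list)
    unresolved = []
    for name, affiliation, article in records:
        if not affiliation:
            # No affiliation available: cannot decide with this heuristic.
            unresolved.append((name, article))
            continue
        clusters[(normalize(name), normalize(affiliation))].append(article)
    return dict(clusters), unresolved
```

Two articles that land under the same key are treated as the same person. This heuristic will still split one person who changed institutions and may merge two namesakes at the same institution, so it is a starting point rather than a complete answer.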

So, any suggestions? Thanks very much.
