How do I use OpenNLP to do coreference resolution?


Answer by David Dearing:

Coreference resolution is hard, so there's no guarantee that OpenNLP will solve your specific problem, but I recently wrote a series of blog posts on using the OpenNLP 1.5.x tools for it. They're a bit dense to copy here, so here's a link with more details:

Making Coreference Resolution with OpenNLP 1.5.0 your bitch

At a high level, you need to load the OpenNLP coreference models and the WordNet 3.0 dictionary. With those dependencies in place, initializing the linker object is straightforward:

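import opennlp.tools.coref.DefaultLinker;
import opennlp.tools.coref.Linker;
import opennlp.tools.coref.LinkerMode;
// Use LinkerMode.TEST. I first tried LinkerMode.EVAL, and that turned out to be the problem.
Linker _linker = new DefaultLinker("lib/opennlp/coref", LinkerMode.TEST); // throws IOException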
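Since the linker depends on both the coreference models and the WordNet dictionary, it can help to check both paths before constructing it. This is a minimal sketch of a hypothetical helper (`CorefResources` and `prepare` are my names, not part of OpenNLP). It assumes that OpenNLP 1.5.x's coreference code locates WordNet through the `WNSEARCHDIR` system property; verify that against the version you are using.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: verifies that the coreference model directory and the
 * WordNet 3.0 "dict" directory exist, then points WNSEARCHDIR at the dictionary.
 */
final class CorefResources {
    // Assumption: OpenNLP 1.5.x coref reads this system property to find WordNet.
    static final String WORDNET_PROPERTY = "WNSEARCHDIR";

    private CorefResources() {}

    /** Returns the model directory as a String suitable for new DefaultLinker(...). */
    static String prepare(Path modelDir, Path wordNetDict) {
        requireDirectory(modelDir, "coreference model directory");
        requireDirectory(wordNetDict, "WordNet dictionary directory");
        // Set the property before the linker is constructed.
        System.setProperty(WORDNET_PROPERTY, wordNetDict.toAbsolutePath().toString());
        return modelDir.toString();
    }

    // Fails fast with a readable message instead of a later, less obvious error.
    private static void requireDirectory(Path dir, String what) {
        if (!Files.isDirectory(dir)) {
            throw new IllegalArgumentException(what + " not found: " + dir.toAbsolutePath());
        }
    }
}
```

Usage, with `lib/wordnet/dict` as an example location only: `Linker linker = new DefaultLinker(CorefResources.prepare(Paths.get("lib/opennlp/coref"), Paths.get("lib/wordnet/dict")), LinkerMode.TEST);`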

(aside: it sure would be nice to have some code formatting here, Quora)

Using it, however, is a bit less obvious.  You need to:

  1. Break the content down into sentences and their corresponding tokens
  2. Create a Parse object for each sentence using the OpenNLP parser
  3. Wrap each sentence's Parse together with its sentence index i, so the linker knows the sentence ordering:
    final DefaultParse parseWrapper = new DefaultParse(parse, i);
  4. Use the Linker to get the Mention objects from each wrapped parse:
    final Mention[] extents = _linker.getMentionFinder().getMentions(parseWrapper);
  5. Finally, use the Linker to identify the distinct entities across the Mentions from all sentences:
    DiscourseEntity[] entities = _linker.getEntities(arrayOfAllMentions); // every sentence's Mentions, in sentence order
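For step 1, here is a runnable sketch that swaps in the JDK's `java.text.BreakIterator` in place of OpenNLP's own sentence detector and tokenizer models (`SentenceDetectorME` and `TokenizerME`), so it runs without any model files; expect different splits on abbreviations and other edge cases. `SentenceTokens` is a hypothetical name. The list index of each sentence is the value to pass as `i` to `new DefaultParse(parse, i)`, and `toParserLine` builds the whitespace-separated token line that, as far as I know, `ParserTool.parseLine` in OpenNLP 1.5.x expects as input for step 2.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in for step 1: JDK BreakIterator instead of OpenNLP's sentence/token models. */
final class SentenceTokens {
    private SentenceTokens() {}

    /** Splits text into sentences; list position = sentence index for DefaultParse. */
    static List<List<String>> split(String text) {
        List<List<String>> sentences = new ArrayList<>();
        BreakIterator sentenceIt = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentenceIt.setText(text);
        int start = sentenceIt.first();
        for (int end = sentenceIt.next(); end != BreakIterator.DONE; start = end, end = sentenceIt.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(tokens(sentence));
            }
        }
        return sentences;
    }

    // Word boundaries include whitespace runs, which are dropped here.
    private static List<String> tokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        BreakIterator wordIt = BreakIterator.getWordInstance(Locale.ENGLISH);
        wordIt.setText(sentence);
        int start = wordIt.first();
        for (int end = wordIt.next(); end != BreakIterator.DONE; start = end, end = wordIt.next()) {
            String token = sentence.substring(start, end);
            if (!token.trim().isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Joins tokens with single spaces (assumed input format for the parser step). */
    static String toParserLine(List<String> tokens) {
        return String.join(" ", tokens);
    }
}
```

In a real pipeline you would likely replace `split` with OpenNLP's own models, so that the tokenization matches what the parser was trained on.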

