java.lang.Object
  abner.Trainer
The Trainer class trains a CRF (conditional random field) to extract entities from a custom dataset. The input file must be tokenized with one sentence per line, with a "|" (vertical pipe) separating each word/token from its label. The first token of an entity name should have a label beginning with "B-", all other tokens of that entity should have labels beginning with "I-", and non-entity tokens should be labeled "O":
IL-2|B-DNA gene|I-DNA expression|O and|O NF-kappa|B-PROTEIN B|I-PROTEIN activation|O ...
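The format above can be produced and sanity-checked with a few lines of Java. The following is a minimal sketch, not part of ABNER: the class TrainingLineFormat and its methods are hypothetical names, and the checks encode assumptions that go slightly beyond the description (the label is taken to be everything after the last "|" in an item, and an "I-" label is expected to follow a "B-" or "I-" label of the same entity type).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of ABNER) for the token|label training format. */
final class TrainingLineFormat {

    /** Joins parallel token and label arrays into one line: "tok|LABEL tok|LABEL ...". */
    static String formatLine(String[] tokens, String[] labels) {
        if (tokens.length != labels.length) {
            throw new IllegalArgumentException("tokens and labels differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('|').append(labels[i]);
        }
        return sb.toString();
    }

    /** Returns the problems found in one line; an empty list means it looks well formed. */
    static List<String> validateLine(String line) {
        List<String> problems = new ArrayList<>();
        String previousTag = null; // entity type of the previous token, null after "O"
        for (String item : line.trim().split("\\s+")) {
            // Assumption: the label is whatever follows the last "|" in the item.
            int bar = item.lastIndexOf('|');
            if (bar <= 0 || bar == item.length() - 1) {
                problems.add("missing token or label: " + item);
                previousTag = null;
                continue;
            }
            String label = item.substring(bar + 1);
            if (label.equals("O")) {
                previousTag = null;
            } else if (label.startsWith("B-") && label.length() > 2) {
                previousTag = label.substring(2);
            } else if (label.startsWith("I-") && label.length() > 2) {
                String tag = label.substring(2);
                if (!tag.equals(previousTag)) {
                    problems.add("I- label without matching B-: " + item);
                }
                previousTag = tag;
            } else {
                problems.add("unrecognized label: " + item);
                previousTag = null;
            }
        }
        return problems;
    }
}
```

Running validateLine over every line of a training file before starting a long training run is a cheap way to catch labeling mistakes early.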
Constructor Summary
Trainer()
Method Summary
void train(java.lang.String trainFile, java.lang.String modelFile)
    Takes input trainFile (format described above), and saves a trained linear-chain CRF on the data using ABNER's default feature set in the corresponding output modelFile.
void train(java.lang.String trainFile, java.lang.String modelFile, java.lang.String[] tags)
    Identical to the other train routine, but the set of tags (e.g. "PROTEIN", "DNA", etc.) allows the model to periodically output progress in terms of precision/recall/f1 during training.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public Trainer()
Method Detail
public void train(java.lang.String trainFile,
java.lang.String modelFile)
Takes input trainFile (format described above), and saves a trained linear-chain CRF on the data using ABNER's default feature set in the corresponding output modelFile.
Warning: training can take several hours, or even days, depending on corpus size and the number of entity tags.
public void train(java.lang.String trainFile,
java.lang.String modelFile,
java.lang.String[] tags)
Identical to the two-argument train routine, but the set of tags (e.g. "PROTEIN", "DNA", etc.) allows the model to periodically output progress in terms of precision/recall/F1 during training. Note: do not use "B-" or "I-" prefixes in the tags.
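Because the tags array must list bare entity types without "B-" or "I-" prefixes, it can be derived from the labels already present in the training data. The following is a minimal sketch under that assumption. TagSetSketch and collectTags are hypothetical names, not part of ABNER, and the label is again assumed to be everything after the last "|" in each item.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of ABNER) that derives the tags array from labeled lines. */
final class TagSetSketch {

    /** Collects entity types, dropping the "B-"/"I-" prefix and skipping "O" labels. */
    static String[] collectTags(Iterable<String> lines) {
        Set<String> tags = new TreeSet<>(); // sorted, so the result order is deterministic
        for (String line : lines) {
            for (String item : line.trim().split("\\s+")) {
                int bar = item.lastIndexOf('|');
                if (bar < 0) {
                    continue; // no label on this item; ignore it here
                }
                String label = item.substring(bar + 1);
                boolean prefixed = label.startsWith("B-") || label.startsWith("I-");
                if (prefixed && label.length() > 2) {
                    tags.add(label.substring(2));
                }
            }
        }
        return tags.toArray(new String[0]);
    }
}
```

With ABNER on the classpath, the result could then be passed to the three-argument method documented above, for example new Trainer().train(trainFile, modelFile, tags), where trainFile and modelFile are paths of your choosing.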