A B E G I N P S T

A

abner - package abner
 

B

BIOCREATIVE - Static variable in class abner.Tagger
The tagger trained on the BioCreative corpus.

E

EXTERNAL - Static variable in class abner.Tagger
Indicates a tagger for some externally-trained model.

G

getEntities(String) - Method in class abner.Tagger
Similar to getSegments, but returns all segments in the entire document that correspond to entities (e.g.
getEntities(String, String) - Method in class abner.Tagger
Returns only segments corresponding to the entity provided in the tag argument (do not use "B-" or "I-" prefixes).
getMode() - Method in class abner.Tagger
Return the tagger's mode (NLPBA, BIOCREATIVE, or EXTERNAL).
getSegments(String) - Method in class abner.Tagger
Take an input string (if tokenization is turned on, this string will be tokenized as well) and return a Vector of 2D String arrays, where sentence tokens are segments (not individual words).
getTokenization() - Method in class abner.Tagger
Return the tagger's current tokenization setting.
getWords(String) - Method in class abner.Tagger
Take an input string (if tokenization is turned on, this string will be tokenized as well) and return a Vector of 2D String arrays, where sentence tokens are words stored in result[0][...] and tags are stored in result[1][...].
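
The result shape described for getWords (a Vector of 2D String arrays, words in result[0][...] and tags in result[1][...]) can be walked as in the sketch below. It does not call ABNER: the class WordTagSketch, its methods, and the sample sentence are hypothetical, and it assumes the tags carry the "B-" and "I-" prefixes mentioned under getEntities. The second method shows one way word-level tags could be merged into entity segments; it is an illustration, not ABNER's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical helper, not part of ABNER.
class WordTagSketch {

    // Flatten each sentence into "word/tag" strings, one per token.
    static List<String> pairs(Vector<String[][]> sentences) {
        List<String> out = new ArrayList<>();
        for (String[][] sentence : sentences) {
            String[] words = sentence[0]; // result[0][...] holds the words
            String[] tags = sentence[1];  // result[1][...] holds the tags
            for (int i = 0; i < words.length; i++) {
                out.add(words[i] + "/" + tags[i]);
            }
        }
        return out;
    }

    // Merge each run of B-X / I-X tokens into one "X: phrase" string.
    // Assumes IOB-style tags; any other tag (such as "O") ends the run.
    static List<String> entities(Vector<String[][]> sentences) {
        List<String> out = new ArrayList<>();
        for (String[][] sentence : sentences) {
            String type = null;
            StringBuilder phrase = new StringBuilder();
            for (int i = 0; i < sentence[0].length; i++) {
                String word = sentence[0][i];
                String tag = sentence[1][i];
                if (tag.startsWith("I-") && tag.substring(2).equals(type)) {
                    phrase.append(' ').append(word); // extend the open entity
                    continue;
                }
                if (type != null) { // close the entity in progress
                    out.add(type + ": " + phrase);
                    type = null;
                }
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    type = tag.substring(2); // open a new entity
                    phrase = new StringBuilder(word);
                }
            }
            if (type != null) {
                out.add(type + ": " + phrase);
            }
        }
        return out;
    }

    // Invented sample data in the described shape; not real tagger output.
    static Vector<String[][]> sample() {
        Vector<String[][]> sentences = new Vector<>();
        sentences.add(new String[][] {
            {"IL-2", "gene", "expression", "requires", "CD28"},
            {"B-DNA", "I-DNA", "O", "O", "B-PROTEIN"}
        });
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(pairs(sample()));
        System.out.println(entities(sample()));
    }
}
```

With ABNER on the classpath, the sample() data would presumably be replaced by the Vector returned from getWords(String); the element type of that Vector is an assumption here, so a cast may be needed.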

I

Input2TokenSequence - class abner.Input2TokenSequence.
Input2TokenSequence is a text processing Pipe for the MALLET framework.
Input2TokenSequence(boolean) - Constructor for class abner.Input2TokenSequence
 
Input2TokenSequence() - Constructor for class abner.Input2TokenSequence
 

N

NLPBA - Static variable in class abner.Tagger
The tagger trained on the NLPBA corpus.
nextToken() - Method in class abner.Scanner
 

P

pipe(Instance) - Method in class abner.Input2TokenSequence
 

S

Scanner - class abner.Scanner.
ABNER's Scanner class implements the finite state machine used in tokenization.
Scanner(Reader) - Constructor for class abner.Scanner
 
Scanner(InputStream) - Constructor for class abner.Scanner
 
setTokenization(boolean) - Method in class abner.Tagger
Turn on/off ABNER's built-in tokenization (default is true).
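
For readers unfamiliar with the idea behind the Scanner entry above, here is a minimal sketch of a finite state machine tokenizer. The class FsmTokenizerSketch and its splitting rules (letters and digits form words, every other non-whitespace character becomes its own token) are assumptions made for illustration; they are not ABNER's actual tokenization rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-state tokenizer, not ABNER's Scanner.
class FsmTokenizerSketch {

    // The machine is either between tokens or inside a word.
    enum State { BETWEEN_TOKENS, IN_WORD }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.BETWEEN_TOKENS;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Letters and digits start or extend a word.
                current.append(c);
                state = State.IN_WORD;
            } else {
                // Any other character ends the word in progress.
                if (state == State.IN_WORD) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    state = State.BETWEEN_TOKENS;
                }
                // Non-whitespace symbols become one-character tokens.
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c));
                }
            }
        }
        // Flush a word that runs to the end of the input.
        if (state == State.IN_WORD) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [IL, -, 2, binds, (, weakly, ), .]
        System.out.println(tokenize("IL-2 binds (weakly)."));
    }
}
```

A real biomedical tokenizer would likely treat hyphenated names and numbers more carefully; the point here is only the state transitions.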

T

Tagger - class abner.Tagger.
This is the interface to the CRF that does named entity tagging.
Tagger() - Constructor for class abner.Tagger
Basic Constructor: Loads the "NLPBA" model by default.
Tagger(int) - Constructor for class abner.Tagger
Advanced constructor: Specify either the "NLPBA" or the "BioCreative" model.
Tagger(File) - Constructor for class abner.Tagger
External constructor: Load a trained CRF specified by the external model file.
Trainer - class abner.Trainer.
The Trainer class will train a CRF to extract entities from a customized dataset.
Trainer() - Constructor for class abner.Trainer
 
tagABNER(String) - Method in class abner.Tagger
Takes input text and returns a string of annotated text in the ABNER training format:
tagIOB(String) - Method in class abner.Tagger
Takes input text and returns a string of annotated text in CoNLL-style "IOB" format:
tagSGML(String) - Method in class abner.Tagger
Takes input text and returns a string of annotated text in a generic SGML-style format:
tokenize(String) - Method in class abner.Tagger
Take raw text and apply ABNER's built-in tokenization to it.
train(String, String) - Method in class abner.Trainer
Takes input trainFile (format described in the Trainer class documentation), trains a linear-chain CRF on the data using ABNER's default feature set, and saves it in the corresponding output modelFile.
train(String, String, String[]) - Method in class abner.Trainer
Identical to the other train routine, but the set of tags (e.g.
