Launching Naiban presents the command-line interface menu. Three ClassificationEngineModel operations are accessible through this interface:
- Train - the train operation takes one argument: the full path to a directory holding classifiable files.
- Classify - the classify operation takes one argument: the full path to a classifiable file.
- Store - the store operation takes no argument and persists the currently acquired knowledge to the configured datastore (filesystem or RDBMS).
The default installation comes with sample data in the ./data directory. Test your installation with the following operations:
- t data/train - this operation will train the configured classifier with the data in the data/train directory. The sample data defines two categories (Buckets):
  - Spam
  - Ham
  The data used to distinguish between these two categories includes:
  - Text Attributes:
    - To
    - From
    - Body
  - Numeric Attributes:
    - SourceLength
- s - this operation will persist the knowledge acquired through training to the configured datastore.
- c data/tests/spamBody.txt - this operation will classify the given file using the current knowledge.
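The commands above share one shape: an operation letter, optionally followed by a path. As an illustration only, the sketch below shows one way such input could be parsed and validated in Java. The class, enum, and method names (CommandSketch, Operation, Command, parse) are hypothetical and are not taken from Naiban's source; the sketch also assumes that the letters t, c, and s map to Train, Classify, and Store, and that Store takes no argument.

```java
import java.util.Optional;

// Hypothetical sketch: none of these names come from Naiban's source.
public class CommandSketch {

    // The three operations, keyed by the letter typed at the menu.
    enum Operation {
        TRAIN("t", true),     // argument: directory of classifiable files
        CLASSIFY("c", true),  // argument: a single classifiable file
        STORE("s", false);    // no argument

        final String key;
        final boolean needsPath;

        Operation(String key, boolean needsPath) {
            this.key = key;
            this.needsPath = needsPath;
        }

        // Looks up an operation by its menu letter (exact, case-sensitive match).
        static Optional<Operation> fromKey(String key) {
            for (Operation op : values()) {
                if (op.key.equals(key)) {
                    return Optional.of(op);
                }
            }
            return Optional.empty();
        }
    }

    // A parsed command: the operation plus its path argument (null for STORE).
    static final class Command {
        final Operation operation;
        final String path;

        Command(Operation operation, String path) {
            this.operation = operation;
            this.path = path;
        }
    }

    // Splits a line such as "t data/train" into an operation and its argument.
    static Command parse(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        Operation op = Operation.fromKey(parts[0]).orElseThrow(
                () -> new IllegalArgumentException("Unknown operation: " + parts[0]));
        String path = parts.length > 1 ? parts[1] : null;
        if (op.needsPath && path == null) {
            throw new IllegalArgumentException(op + " requires a path argument");
        }
        if (!op.needsPath && path != null) {
            throw new IllegalArgumentException(op + " takes no argument");
        }
        return new Command(op, path);
    }

    public static void main(String[] args) {
        // The three test commands from the list above.
        for (String line : new String[] {"t data/train", "s", "c data/tests/spamBody.txt"}) {
            Command cmd = parse(line);
            System.out.println(cmd.operation + " " + (cmd.path == null ? "" : cmd.path));
        }
    }
}
```

Rejecting a missing or unexpected argument at parse time keeps the error close to the input that caused it; whether Naiban itself validates input this way is not stated here.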
The default configuration persists knowledge to, and reads it from, a local JDBC-enabled datastore (hsqldb).
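For readers unfamiliar with hsqldb, the following sketch shows how a local, file-based hsqldb database is typically opened over JDBC. It is an assumption-laden illustration, not Naiban's persistence code: the database path data/knowledge and the class name HsqldbUrlSketch are hypothetical, and the real location and credentials come from Naiban's configuration. The URL prefix jdbc:hsqldb:file: and the default SA account with an empty password are standard hsqldb conventions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch: shows the general JDBC pattern, not Naiban's own code.
public class HsqldbUrlSketch {

    // Builds a file-mode hsqldb JDBC URL for a database stored at the given path.
    static String fileUrl(String databasePath) {
        return "jdbc:hsqldb:file:" + databasePath;
    }

    public static void main(String[] args) {
        // Assumed location; Naiban's actual database path is set in its configuration.
        String url = fileUrl("data/knowledge");

        // "SA" with an empty password is hsqldb's default account for a new database.
        try (Connection conn = DriverManager.getConnection(url, "SA", "")) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        } catch (SQLException e) {
            // Raised, for example, when the hsqldb jar is not on the classpath.
            System.out.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

Running this requires the hsqldb jar on the classpath; without it, the catch branch reports the failure instead of terminating with an exception.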