Naiban: User's Guide

Launching Naiban presents its command-line interface menu. Three ClassificationEngineModel operations are available from this menu:

  • Train - takes one argument: the full path to a directory holding classifiable files.
  • Classify - takes one argument: the full path to a classifiable file.
  • Store - persists the currently acquired knowledge to the configured datastore (filesystem or RDBMS).

The default installation comes with sample data in the ./data directory. Test your installation with the following operations:

  • t data/train - this operation will train the configured classifier with the data in the data/train directory. The sample data defines two categories (Buckets):
    1. Spam
    2. Ham
    The data used to distinguish between these two categories includes:
    • Text Attributes:
      1. To
      2. From
      3. Body
    • Numeric Attributes:
      1. SourceLength
  • s - this operation will persist the knowledge acquired through training to the configured datastore.
  • c data/tests/spamBody.txt - this operation will classify the given file using the current knowledge.
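
The three menu commands above follow a simple shape: a one-letter operation, optionally followed by a path. The sketch below shows one way such input could be parsed. It is an illustration only, not Naiban's actual source: the class CommandSketch, its nested types, and the parse method are hypothetical names, and the rule that "s" takes no argument is an assumption based on the examples in this guide.

```java
import java.util.Optional;

/** Hypothetical sketch of parsing the menu commands; not Naiban's actual code. */
public class CommandSketch {

    /** The three operations described in this guide. */
    enum Operation { TRAIN, CLASSIFY, STORE }

    /** A parsed command: the operation plus its path argument (null for STORE). */
    static final class Command {
        final Operation operation;
        final String path;

        Command(Operation operation, String path) {
            this.operation = operation;
            this.path = path;
        }
    }

    /**
     * Parses one line of input such as "t data/train", "s", or
     * "c data/tests/spamBody.txt". Returns empty for anything else.
     */
    static Optional<Command> parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        // Split into the command letter and the rest, so a path keeps any inner spaces.
        String[] parts = trimmed.split("\\s+", 2);
        String arg = parts.length > 1 ? parts[1] : null;

        switch (parts[0]) {
            case "t": // train: requires a directory path
                return arg == null ? Optional.empty()
                                   : Optional.of(new Command(Operation.TRAIN, arg));
            case "c": // classify: requires a file path
                return arg == null ? Optional.empty()
                                   : Optional.of(new Command(Operation.CLASSIFY, arg));
            case "s": // store: assumed here to take no argument
                return arg == null ? Optional.of(new Command(Operation.STORE, null))
                                   : Optional.empty();
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Walk through the sample session from this guide.
        String[] session = { "t data/train", "s", "c data/tests/spamBody.txt" };
        for (String line : session) {
            Optional<Command> parsed = parse(line);
            if (parsed.isPresent()) {
                Command cmd = parsed.get();
                System.out.println(cmd.operation + " " + (cmd.path == null ? "" : cmd.path));
            } else {
                System.out.println("unrecognized: " + line);
            }
        }
    }
}
```

Returning an empty Optional for malformed input keeps the decision about how to report errors with the caller; a real menu loop would likely reprint the menu in that case.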

The default configuration persists knowledge to, and reads it from, a JDBC-enabled local datastore (hsqldb).