Spider: Introduction

Spider is a complete, standalone Java application designed to integrate varied data sources easily. Spider retrieves data from these sources, either on demand or on an automated schedule, and can be configured to push it to consumers such as listeners on a messaging queue, email accounts, flat files, or databases. Providing implementations of every possible driver and protocol used to interact with data producers and consumers is beyond the intended scope of this document. However, Spider was designed with ease of extension as its highest priority, so its default configuration can be easily modified and extended to support other third-party or proprietary invocation or dissemination methods. To demonstrate Spider's ease of use, a sample Spider configuration file is listed below:
<task domain="production" source="website" subject="slashdot" id="login">
  <target>
    <file format="html" class="com.tempeststrings.spider.tasks.targets.file.StateCapableHTTPFile">
      <url>
        <literal value="http://slashdot.org/users.pl"/>
      </url>
      <url requestType="POST">
        <literal value="http://slashdot.org/users.pl"/>
        <parameter name="unickname" value="my_userid"/>
        <parameter name="upasswd" value="my_password"/>
        <parameter name="userlogin" value="Log in"/>
        <parameter name="op" value="userlogin"/>
      </url>
      <url>
        <literal value="http://slashdot.org/"/>
      </url>
    </file>
  </target>
  <schedule>
    <minute value="0"/>
    <minute value="15"/>
    <minute value="30"/>
    <minute value="45"/>
    <hour value="*"/>
  </schedule>
</task>
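
As a rough illustration of how the schedule element in the sample above might be read, the following Java sketch parses a schedule fragment and checks whether a given time of day falls on it. It is not Spider's own scheduler: the class ScheduleSketch and its methods are hypothetical, and it assumes that repeated minute elements are alternatives and that a value of "*" matches any value, in the style of cron.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Spider: interprets a <schedule> element
// under the ASSUMPTION that it behaves like a cron expression.
public class ScheduleSketch {

    // The <schedule> element from the sample configuration.
    static final String SAMPLE =
        "<schedule>"
        + "<minute value=\"0\"/><minute value=\"15\"/>"
        + "<minute value=\"30\"/><minute value=\"45\"/>"
        + "<hour value=\"*\"/>"
        + "</schedule>";

    // Collects the "value" attribute of every element with the given tag name.
    static List<String> values(Element schedule, String tag) {
        List<String> out = new ArrayList<>();
        NodeList nodes = schedule.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(((Element) nodes.item(i)).getAttribute("value"));
        }
        return out;
    }

    // True if any allowed value is "*" (assumed wildcard) or equals the actual field.
    static boolean fieldMatches(List<String> allowed, int actual) {
        for (String v : allowed) {
            if (v.equals("*") || Integer.parseInt(v) == actual) {
                return true;
            }
        }
        return false;
    }

    // True if the time matches both the minute and the hour entries of the schedule.
    static boolean isDue(String scheduleXml, LocalTime time) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    scheduleXml.getBytes(StandardCharsets.UTF_8)));
            Element schedule = doc.getDocumentElement();
            return fieldMatches(values(schedule, "minute"), time.getMinute())
                && fieldMatches(values(schedule, "hour"), time.getHour());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read schedule", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isDue(SAMPLE, LocalTime.of(9, 30))); // prints "true"
        System.out.println(isDue(SAMPLE, LocalTime.of(9, 31))); // prints "false"
    }
}
```

Under these assumptions, the sample schedule would fire four times an hour, at minutes 0, 15, 30, and 45 of every hour.
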
This configuration file: