Simple search queries consist of terms, optionally with attributes. Complex queries combine subqueries with operators such as "and" and "or".
Search results are represented by result sets. A result set is an ordered collection of records. Each record contains data, potentially in several forms (a short form and a long form, for instance) and has a content-type (usually a MIME type).
The Gazebo Client Toolkit uses an asynchronous, lazy evaluation retrieval scheme. Instead of blocking while results are retrieved, a Gazebo client registers handlers for result sets and records prior to submitting search requests. Those handlers are invoked when results are returned. Result sets may only be partially retrieved initially, but accessors for as-yet-not-retrieved records can be used transparently along with accessors for records which have been retrieved already, and the client toolkit will make additional requests to Gazebo when necessary to retrieve them. The toolkit also performs rudimentary caching, again transparently. The effect is to minimize network traffic in the typical case that many records match, but the user only needs to peruse a small number of them to determine how to further refine their query.
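The lazy retrieval and caching described above can be sketched roughly as follows. This is only an illustration under stated assumptions: LazyRecords, its fetcher callback, and requestCount are hypothetical names invented for this sketch, not classes or methods of the toolkit, and the "network request" is simulated by a local function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of lazy, cached record access; not the Gazebo toolkit's code.
final class LazyRecordsSketch {

    static final class LazyRecords {
        private final int hits;                     // total matches reported up front
        private final IntFunction<String> fetcher;  // stands in for one request to the server
        private final Map<Integer, String> cache = new HashMap<>();
        private int requests = 0;

        LazyRecords(int hits, IntFunction<String> fetcher) {
            this.hits = hits;
            this.fetcher = fetcher;
        }

        // Returns record `index` (1-based), fetching it only on first access.
        String get(int index) {
            if (index < 1 || index > hits) {
                throw new IndexOutOfBoundsException("record " + index + " of " + hits);
            }
            String record = cache.get(index);
            if (record == null) {
                requests++;
                record = fetcher.apply(index);
                cache.put(index, record);
            }
            return record;
        }

        int requestCount() { return requests; }
        int size() { return hits; }
    }

    public static void main(String[] args) {
        LazyRecords records = new LazyRecords(1000, i -> "record #" + i);
        System.out.println(records.get(3));  // triggers a simulated fetch
        System.out.println(records.get(3));  // answered from the cache
        System.out.println(records.requestCount() + " request(s) for "
                + records.size() + " hits");
    }
}
```

In this sketch a thousand records match, but only the one record the caller actually reads costs a request, which is the saving the paragraph above describes.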
Each search attribute is a name/value pair in which the name represents the type or class of search attribute and the value identifies a member of that class. For instance, a search attribute with the name "field" and the value "title" might indicate that the search term should be matched against the title field. A search attribute with the name "frequency" and the value "megahertz" might indicate that the search term is a frequency expressed in megahertz. Finally, an attribute with the name "structure" and the value "beginning" might indicate that the search should be performed with right truncation.
Attributes nest, and each path to a leaf node represents a combination of attributes which can be applied to a search term. Each data source in the Gazebo configuration has its own attribute tree, but when two data sources share part of an attribute tree, it indicates a semantic equivalence between the data sources. A Gazebo client can retrieve a merged attribute tree from the Gazebo server, and then search using that attribute tree as if the Gazebo server represented a single data source.
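To make the idea of a merged attribute tree concrete, here is a small sketch. It assumes that merging is a plain union of branches, which may be simpler than what the Gazebo server actually does; Node, mergeInto, and leafPaths are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an attribute tree; not the Gazebo toolkit's own classes.
final class AttributeTreeSketch {

    // A node's children are keyed by "name=value", e.g. "field=title".
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();

        Node child(String name, String value) {
            return children.computeIfAbsent(name + "=" + value, k -> new Node());
        }
    }

    // Copies every branch of `source` into `target`; shared branches collapse into one.
    static void mergeInto(Node target, Node source) {
        for (Map.Entry<String, Node> e : source.children.entrySet()) {
            Node merged = target.children.computeIfAbsent(e.getKey(), k -> new Node());
            mergeInto(merged, e.getValue());
        }
    }

    // Lists each root-to-leaf path, i.e. each usable attribute combination.
    static List<String> leafPaths(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Node node, String prefix, List<String> out) {
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + " / " + e.getKey();
            if (e.getValue().children.isEmpty()) {
                out.add(path);
            } else {
                collect(e.getValue(), path, out);
            }
        }
    }

    public static void main(String[] args) {
        Node library = new Node();
        library.child("field", "title").child("structure", "beginning");
        library.child("field", "author");

        Node images = new Node();
        images.child("field", "title").child("structure", "beginning");
        images.child("frequency", "megahertz");

        Node merged = new Node();
        mergeInto(merged, library);
        mergeInto(merged, images);
        leafPaths(merged).forEach(System.out::println);
    }
}
```

The branch the two sources share (field=title with structure=beginning) appears once in the merged tree, so a client could offer it as a single search option covering both sources.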
Attributes are represented by the class
ncsa.gazebo.protocol.Attribute.
Simple attributes can be constructed
using a single name/value pair, as follows:
import ncsa.gazebo.protocol.Attribute;
...
Attribute myAttribute = new Attribute("field", "title");
|
Nested attribute combinations can be constructed by adding the sub-attribute as an additional argument to the above constructor, in the following manner:
import ncsa.gazebo.protocol.Attribute;
...
Attribute myAttributeCombo = new Attribute("field", "title",
    new Attribute("structure", "beginning"));
|
A sub-attribute can itself take a sub-attribute, so nesting can be repeated to any depth.
ncsa.gazebo.protocol.Query represents a search query. The
simplest query consists only of a term:
import ncsa.gazebo.protocol.Query;
...
Query myQuery = Query.newInstance("fish");
|
Or a query can contain an attribute:
import ncsa.gazebo.protocol.Query;
import ncsa.gazebo.protocol.Attribute;
...
Query myQuery = Query.newInstance("fish",
    new Attribute("field", "subject"));
|
Complex queries are constructed by combining subqueries with an operator:
import ncsa.gazebo.protocol.Query;
import ncsa.gazebo.protocol.Attribute;
...
Query q1 = Query.newInstance("fish", new Attribute("field", "subject"));
Query q2 = Query.newInstance("duck");
Query myQuery = Query.newInstance(q1, "and", q2);
|
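Because a complex query combines subqueries, the result forms a tree. The sketch below models that shape on its own, assuming that a combined query may in turn serve as a subquery. It does not use the toolkit: the Node interface and the term and combine helpers are hypothetical stand-ins, and the bracketed rendering is an invented notation for display only.

```java
// Hypothetical model of query composition; the real Query class may differ.
final class QueryTreeSketch {

    interface Node {
        String render();
    }

    // A bare search term.
    static Node term(String text) {
        return () -> text;
    }

    // A search term qualified by one name/value attribute.
    static Node term(String text, String attrName, String attrValue) {
        return () -> text + "[" + attrName + "=" + attrValue + "]";
    }

    // Two subqueries joined by an operator such as "and" or "or".
    static Node combine(Node left, String operator, Node right) {
        return () -> "(" + left.render() + " " + operator + " " + right.render() + ")";
    }

    public static void main(String[] args) {
        Node q1 = term("fish", "field", "subject");
        Node q2 = term("duck");
        Node q3 = term("goose");
        Node nested = combine(combine(q1, "and", q2), "or", q3);
        System.out.println(nested.render());
        // prints: ((fish[field=subject] and duck) or goose)
    }
}
```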
Requests to the Gazebo server are represented by
ncsa.gazebo.protocol.Request.
Search requests are represented by
ncsa.gazebo.protocol.Search, which is a subclass of
Request. To construct a search request,
pass a Query to Search's constructor:
import ncsa.gazebo.protocol.Attribute;
import ncsa.gazebo.protocol.Query;
import ncsa.gazebo.protocol.Search;
...
Query myQuery = Query.newInstance("Smith", new Attribute("field", "author"));
Search mySearch = new Search(myQuery);
|
Search requests have several other optional parameters. These include
the number of records to (initially) retrieve (zero by default), and a
set of data sources to search (by default, all data sources available
through the Gazebo server). The number of records to retrieve is set
with setCount and data sources are added with
addDB:
...
mySearch.setCount(25);
mySearch.addDB("Library of Congress");
mySearch.addDB("Astronomy Digital Image Library");
|
Data source names are discovered by the client using a Meta request (more on that later). By default, a Search request searches all data sources. Once one data source has been added, only that data source is searched; each further data source added extends the search to include it as well.
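The selection rule just described (no sources named means all sources; otherwise only the named ones) can be sketched in a few lines. The class below is a hypothetical model written for this illustration, not part of the toolkit; only the method name addDB is borrowed from Search, and "Hypothetical Archive" is an invented data source name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of how a search's data source list is resolved.
final class DataSourceSelectionSketch {
    private final Set<String> requested = new LinkedHashSet<>();

    void addDB(String name) {
        requested.add(name);
    }

    // An empty selection means "everything available"; otherwise only the named sources.
    List<String> resolve(List<String> available) {
        if (requested.isEmpty()) {
            return new ArrayList<>(available);
        }
        return new ArrayList<>(requested);
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList(
                "Library of Congress",
                "Astronomy Digital Image Library",
                "Hypothetical Archive");

        DataSourceSelectionSketch search = new DataSourceSelectionSketch();
        System.out.println(search.resolve(available));  // nothing added: all three sources

        search.addDB("Library of Congress");
        System.out.println(search.resolve(available));  // only the one that was added

        search.addDB("Astronomy Digital Image Library");
        System.out.println(search.resolve(available));  // both added sources
    }
}
```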
Result sets are represented by
ncsa.gazebo.protocol.ResultSet. They're received
asynchronously by implementations of the
ncsa.gazebo.ctk.ResultSetListener interface.
A result set is an ordered collection of records that match a
query. The number of records in a set can be retrieved with
ResultSet.getHits, and the name of the data source can be
retrieved with ResultSet.getDBName:
import ncsa.gazebo.ctk.*;
import ncsa.gazebo.protocol.ResultSet;
...
public class MyResultSetListener implements ResultSetListener {
    public void resultSetReceived(ResultSetEvent rse) {
        ResultSet rs = rse.getResultSet();
        // report the hit count and the data source it came from
        System.out.println(rs.getHits() + " hits on " + rs.getDBName());
    }
}
|
Records are retrieved with
ResultSet.fetchBrief and ResultSet.fetchFull, each of which
takes a 1-based ordinal index as its argument.
The two predefined
record types, ResultSet.BRIEF and ResultSet.FULL,
represent summary and complete record forms,
respectively. Brief records are usually no longer than a title, and
full records are often abstracts or entire documents. Record content
types are represented as MIME types. The content type of any record
can be determined by calling ResultSet.getContentType. In
the example below, the first record of a result set is passed to an
HTML renderer if it is an HTML document:
...
ResultSet rs;
...
String doc = rs.fetchFull(1);
if (rs.getContentType(rs).equals("text/html")) {
    myHTMLRenderer.render(doc);
}
|
ncsa.gazebo.ctk.Session represents a client
interaction with a Gazebo server. It provides mechanisms for
connecting, disconnecting, submitting search requests, adding
listeners, and delivering results to those listeners
asynchronously.
To construct a Session, just pass the host and port of the Gazebo
server to its constructor:
import ncsa.gazebo.ctk.*;
...
Session mySession = new Session("host.domain.edu",2323);
|
To add a ResultSetListener, call
Session.addResultSetListener:
...
mySession.addResultSetListener(new ResultSetListener() {
    public void resultSetReceived(ResultSetEvent rse) {
        ResultSet rs = rse.getResultSet();
        System.out.println("got results for " + rs.getDBName());
    }
});
|
To submit a request to Gazebo, pass it to Session.search:
...
Query myQuery = Query.newInstance("trichotillomania");
mySession.search(new Search(myQuery));
|
The following complete client combines the pieces described so far:
import java.io.*;
import ncsa.gazebo.ctk.*;
import ncsa.gazebo.protocol.*;

public class SimpleClient {
    Session theSession;

    // submit a single-term search qualified by the given attribute
    public void searchFor(String word, Attribute attr) {
        try {
            theSession.search(new Search(Query.newInstance(word, attr)));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public SimpleClient(String host, int port) {
        try {
            theSession = new Session(host, port);
        } catch (CTKException e) {
            e.printStackTrace();
            return; // no session, so there is nothing to listen on
        }
        // print a one-line summary of each result set as it arrives
        theSession.addResultSetListener(new ResultSetListener() {
            public void resultSetReceived(ResultSetEvent rse) {
                ResultSet rs = rse.getResultSet();
                System.out.println(rs.getHits() + " hits from " + rs.getDBName());
            }
        });
    }

    public static void main(String args[]) {
        SimpleClient sc = new SimpleClient("ospsun1.nci.nih.gov", 9270);
        // first arg is author's name to search for
        sc.searchFor(args[0], new Attribute("field", "Author"));
    }
}
|
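Because results arrive asynchronously, a command-line program may need to wait for its listener to run before exiting. One common way to do that in Java is a java.util.concurrent.CountDownLatch. The sketch below shows the pattern in isolation, with the toolkit replaced by a simulated search running on an executor thread; searchAsync and searchAndWait are hypothetical helpers, and whether a real Session needs this treatment is not something this document states.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: blocking until an asynchronous listener has been called.
final class AwaitResultsSketch {

    // Simulates an asynchronous search: the listener is invoked on another thread.
    static void searchAsync(ExecutorService pool, String term, Consumer<String> listener) {
        pool.submit(() -> listener.accept("3 hits for " + term));
    }

    // Submits a search, then waits until the listener has run or the timeout passes.
    // Returns the listener's summary, or null if nothing arrived in time.
    static String searchAndWait(String term, long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        String[] result = new String[1];
        try {
            searchAsync(pool, term, summary -> {
                result[0] = summary;  // written before countDown, so visible after await
                done.countDown();
            });
            boolean arrived = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return arrived ? result[0] : null;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(searchAndWait("Smith", 2000));
    }
}
```

The timeout keeps the caller from hanging forever if no result ever arrives, which matters when the server or network is unreachable.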
To catch status messages, implement
ncsa.gazebo.ctk.ResponseListener and add the listener to a
Session object with
Session.addResponseListener.
ncsa.gazebo.ctk.ResponseEvent objects, each
containing an ncsa.gazebo.protocol.Response, are passed to the ResponseListener.
Each Response object has a status code associated with
it, which can be determined by calling
Response.getStatus. A human-readable status message
is available from Response.getStatusMessage. Status code
constants are defined in Response. A convenience method,
Response.isStatusGood, can be used to determine whether the
response indicates an error.
Note that result sets generate ResponseEvents as well as
ResultSetEvents, so you will want to make sure to handle
each ResultSet in only one place. This is typically not
a problem, because the only information of interest about a
ResponseEvent is its status code.
The following ResponseListener prints a message when an error occurs:
import ncsa.gazebo.ctk.*;
import ncsa.gazebo.protocol.Response;
...
public class MyResponseListener implements ResponseListener {
    public void responseReceived(ResponseEvent rse) {
        Response rs = rse.getResponse();
        // only report responses whose status indicates an error
        if (!rs.isStatusGood()) {
            String dbName = rs.getDB();
            String message = rs.getStatusMessage();
            System.out.println(dbName + ": error: " + message);
        }
    }
}
|
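The double delivery mentioned above, where a result set reaches both kinds of listener, can be pictured with a small dispatcher. This is a sketch under assumptions: it models a result set as a kind of response, which the text does not confirm, and the Response, ResultSet, and deliver names here are local stand-ins defined inside the sketch, not the toolkit's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical dispatcher; models only the double delivery of result sets.
final class DualDispatchSketch {

    static class Response {
        final String db;
        final boolean good;
        Response(String db, boolean good) { this.db = db; this.good = good; }
    }

    // Assumption for this sketch: a result set is a successful response with hits.
    static final class ResultSet extends Response {
        final int hits;
        ResultSet(String db, int hits) { super(db, true); this.hits = hits; }
    }

    final List<Consumer<Response>> responseListeners = new ArrayList<>();
    final List<Consumer<ResultSet>> resultSetListeners = new ArrayList<>();

    // Every response reaches the response listeners; a result set additionally
    // reaches the result-set listeners, so it is seen twice overall.
    void deliver(Response r) {
        responseListeners.forEach(l -> l.accept(r));
        if (r instanceof ResultSet) {
            ResultSet rs = (ResultSet) r;
            resultSetListeners.forEach(l -> l.accept(rs));
        }
    }

    public static void main(String[] args) {
        DualDispatchSketch session = new DualDispatchSketch();
        List<String> log = new ArrayList<>();
        // Response listener: inspect the status only and ignore the payload.
        session.responseListeners.add(r -> { if (!r.good) log.add(r.db + ": error"); });
        // Result-set listener: the single place where hits are processed.
        session.resultSetListeners.add(rs -> log.add(rs.hits + " hits from " + rs.db));

        session.deliver(new ResultSet("Library of Congress", 12));
        session.deliver(new Response("Astronomy Digital Image Library", false));
        log.forEach(System.out::println);
    }
}
```

Keeping the response listener limited to status checks, as here, means each result set is processed exactly once even though it is delivered to both listeners.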