Journaling Context Tutorial
From Tupelo Wiki
What is it?
The journaling context is a context that keeps a running list of all write operations and stores them as transactions, which are then processed sequentially against a backing context. Possible reasons to use it include:
- Error recovery: if the backing context has intermittent failures, the journal retries operations until they succeed.
- A slow backing context: because writes are first recorded in a local journal, the journaling context can be appreciably faster than many backing contexts.
- An expensive backing context: If the backing context has significant overhead for each call, the journaling context can batch up pending writes.
- A very basic form of transaction management or caching: the journal can accumulate pending operations and then be synced or cleared.
Configuration, various flavors
The most commonly used version of the journaling context is JournalingContext (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/current/org/tupeloproject/journaling/JournalingContext.html), and you should understand how it operates. It relies on an embedded database (H2 at this writing) for its transaction management and on a hash file context to stage blobs. All of this is configured automatically when the context is created. If you need to change this setup, consider using the fully configurable version, BaseJournalingContext (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/current/org/tupeloproject/journaling/BaseJournalingContext.html), which lets you create the journal explicitly and set up the blob store used for staging.
JournalingContext can recover from problems with the backing context, but if a client needs to shut down and restart the VM, it must start the journaling context again with the same working directory (the journal location). Here is how to do that:
    import java.io.File;
    import org.tupeloproject.journaling.JournalingContext;

    JournalingContext jc = new JournalingContext(backingContext);
    File journalLocation = jc.getJournalLocation(); // persist this value
    // set any other configuration values
    jc.open();
    // ... ready for use
After a VM restart you would do something like this:
    JournalingContext jc = new JournalingContext(backingContext);
    jc.setJournalLocation(journalLocation); // the location persisted before the restart
    // set any configuration values
    jc.open();
    // ... ready for use
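The snippets above leave "persist this value" up to the client. One way to do it, sketched under the assumption that a small properties file is an acceptable place to keep the path, is java.util.Properties. The class JournalLocationStore, the settings file, and the property key are hypothetical illustrations, not part of Tupelo.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper (not part of Tupelo): remembers the journal location across VM restarts.
class JournalLocationStore {
    private static final String KEY = "journal.location"; // hypothetical property name

    /** Saves the absolute path of the journal location to a properties file. */
    static void save(Path settings, File journalLocation) {
        Properties props = new Properties();
        props.setProperty(KEY, journalLocation.getAbsolutePath());
        try (Writer out = Files.newBufferedWriter(settings)) {
            props.store(out, "Journaling context working directory");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the saved location, or null if nothing has been saved yet. */
    static File load(Path settings) {
        if (!Files.exists(settings)) {
            return null;
        }
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(settings)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String path = props.getProperty(KEY);
        return path == null ? null : new File(path);
    }
}
```

On restart, a client could pass a non-null result of JournalLocationStore.load(...) to jc.setJournalLocation(...) before calling jc.open().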
Features
There are several features and options that the journaling context supports.
Deferred writes
Deferring writes (setDeferWrites(true)) means accumulating all write operations until a sync is explicitly invoked. When the context is synced, each pending operation is performed in the order it was received.
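To make the ordering guarantee concrete, here is a toy, in-memory model of deferred writes. DeferringJournal is a hypothetical class, not Tupelo's implementation, and a plain list stands in for the backing context.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Tupelo code): a plain list stands in for the backing context.
class DeferringJournal {
    private final List<String> backing = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private final boolean deferWrites;

    DeferringJournal(boolean deferWrites) {
        this.deferWrites = deferWrites;
    }

    /** Journals a write; unless writes are deferred, it is processed right away. */
    void write(String op) {
        pending.add(op);
        if (!deferWrites) {
            sync();
        }
    }

    /** Processes every pending operation in the order it was received. */
    void sync() {
        backing.addAll(pending);
        pending.clear();
    }

    List<String> backing() {
        return backing;
    }

    int pendingCount() {
        return pending.size();
    }
}
```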
Batch mode
Batch mode can be enabled when writes are deferred. In this case all triple writes are journaled, and when the context is synced they are assembled into a single operation. The original order of all operations is still respected: triples are batched only until another kind of operation is encountered. For instance, if several triple writes are followed by a blob write and then more triple writes, the sync performs one large triple write, then the blob write, then another large triple write.
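The grouping rule can be sketched as a small function over the journal contents. This is an illustrative model only: BatchPlanner is hypothetical, and operations are represented as strings whose "triple:" prefix marks a triple write.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model (not Tupelo code). A "triple:" prefix marks a triple write.
class BatchPlanner {
    /** Groups consecutive triple writes into one batch; every other operation is its own step. */
    static List<List<String>> plan(List<String> ops) {
        List<List<String>> steps = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (String op : ops) {
            if (op.startsWith("triple:")) {
                run.add(op); // keep accumulating the current run of triple writes
            } else {
                if (!run.isEmpty()) { // another kind of operation ends the run
                    steps.add(run);
                    run = new ArrayList<>();
                }
                List<String> single = new ArrayList<>();
                single.add(op);
                steps.add(single);
            }
        }
        if (!run.isEmpty()) {
            steps.add(run);
        }
        return steps;
    }
}
```

For the sequence triple:a, triple:b, blob:x, triple:c, triple:d, plan returns three steps, [triple:a, triple:b], [blob:x] and [triple:c, triple:d], matching the example in the text.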
Limiting the batch size
If the context is in batch mode (which, again, implies that it is also deferring writes), it may be necessary to limit the amount of triple data written to the backing context at one time. This can be for performance or other reasons (e.g. the backing context actually fronts a web service that rejects requests over a certain size). The size is estimated roughly from the N-Triples serialized form.
    JournalingContext jc = new JournalingContext(backingContext);
    jc.setDeferWrites(true);
    jc.setBatchMode(true);
    jc.setBatchSize(100000L); // don't send more than 100,000 bytes at once
    jc.open();
    // lots and lots of triple writes or any other operations
    jc.sync(); // the writes are batched
Note that if the batch size is smaller than a single triple, it is ignored, because it is not possible to send only part of a triple. A batch size less than or equal to zero is likewise ignored.
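A sketch of how a run of triples might be split under a byte limit, assuming each triple is already an N-Triples line and using its UTF-8 length as the rough size estimate. SizeLimitedBatcher is a hypothetical helper, and Tupelo's exact estimate may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Tupelo code): splits a run of triples under a byte limit.
class SizeLimitedBatcher {
    /**
     * Splits N-Triples lines into chunks whose estimated size (UTF-8 bytes) stays within maxBytes.
     * A triple is never split: one larger than the limit is sent in a chunk of its own.
     * A limit less than or equal to zero is ignored, so everything goes in one chunk.
     */
    static List<List<String>> split(List<String> nTriplesLines, long maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        if (nTriplesLines.isEmpty()) {
            return chunks;
        }
        if (maxBytes <= 0) {
            chunks.add(new ArrayList<>(nTriplesLines));
            return chunks;
        }
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String line : nTriplesLines) {
            long size = line.getBytes(StandardCharsets.UTF_8).length; // rough size estimate
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current); // flush before this triple would exceed the limit
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(line);
            currentBytes += size;
        }
        chunks.add(current);
        return chunks;
    }
}
```

With three 14-byte lines such as "<a> <b> <c> .\n" and a 30-byte limit, the first two triples share a chunk and the third goes alone; with a 10-byte limit each triple is sent on its own, since a triple cannot be split.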
Running asynchronously
For top speed and reliability, the journaling context can be run asynchronously. In that case all writes are deferred (and batched, if configured), and a thread is started that periodically processes the pending operations. The thread is started on the journaling context instance:
    JournalingContext jc = new JournalingContext(backingContext);
    jc.setDeferWrites(true);
    // could also turn on batch mode or set the batch size if wanted
    jc.startSyncThread(); // starts it with the default sleep interval
    jc.open();
    // operations... until we are done
    jc.stopSyncThread();
    jc.close(); // dispose of all resources
Note that the sync thread belongs to a single journaling context instance, so startSyncThread() must be called on each journaling context that should run asynchronously.
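The per-instance sync thread can be modeled with a ScheduledExecutorService. BackgroundSyncer is a hypothetical, in-memory stand-in rather than Tupelo's implementation; in particular, the final flush in stopSyncThread() is a choice made for this sketch, not documented Tupelo behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// In-memory stand-in (not Tupelo code) for a journal with its own periodic sync thread.
class BackgroundSyncer {
    private final List<String> pending = new ArrayList<>();
    private final List<String> backing = new ArrayList<>(); // stands in for the backing context
    private ScheduledExecutorService scheduler;

    synchronized void write(String op) {
        pending.add(op); // writes are always deferred in this model
    }

    synchronized void sync() {
        backing.addAll(pending);
        pending.clear();
    }

    /** Starts this instance's sync thread; other instances are unaffected. */
    synchronized void startSyncThread(long intervalMillis) {
        if (scheduler != null) {
            return; // already running
        }
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::sync, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the sync thread, then flushes once more (a choice made for this sketch). */
    void stopSyncThread() {
        ScheduledExecutorService s;
        synchronized (this) {
            s = scheduler;
            scheduler = null;
        }
        if (s != null) {
            s.shutdown(); // cancels the periodic task
            try {
                s.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        sync();
    }

    synchronized List<String> backingSnapshot() {
        return new ArrayList<>(backing);
    }
}
```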
Clearing the journal
After a restart, the journal may contain entries that are no longer wanted. To discard them, a client needs to set the clean-journal flag with setCleanJournal(true) before the first call to open().
    JournalingContext jc = new JournalingContext(backingContext);
    jc.setCleanJournal(true);
    // could also turn on other options at this point
    jc.open();
    // continue with a clean log
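A toy model of why the flag must be set before open(): entries left over from a previous run are either kept for processing or discarded at the moment the journal is opened. ReopenableJournal is hypothetical, and rejecting a late setCleanJournal call is a choice made for this sketch, not documented Tupelo behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Tupelo code) of a journal reopened after a restart.
class ReopenableJournal {
    private final List<String> pending;
    private boolean cleanJournal;
    private boolean opened;

    /** leftoverEntries plays the role of entries already on disk from a previous run. */
    ReopenableJournal(List<String> leftoverEntries) {
        this.pending = new ArrayList<>(leftoverEntries);
    }

    void setCleanJournal(boolean clean) {
        if (opened) {
            // too late: in this model the flag only takes effect at open()
            throw new IllegalStateException("set the flag before open()");
        }
        this.cleanJournal = clean;
    }

    void open() {
        if (cleanJournal) {
            pending.clear(); // discard unwanted entries from before the restart
        }
        opened = true;
    }

    int pendingCount() {
        return pending.size();
    }
}
```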
Reads and fetches
Note: write operations are journaled, but matches, reads, and fetches are passed directly through to the backing context. It is therefore possible to set the journaling context to defer all writes, issue several write operations, and then attempt to fetch triples or blobs that are not yet in the backing context. To make sure all pending writes have reached the backing context, call sync() before any reads.
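The stale-read pitfall can be reproduced with a small in-memory model in which writes are deferred but reads go straight to the backing store. PassThroughReads is hypothetical, and a HashSet stands in for the backing context.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (not Tupelo code): writes are deferred, reads bypass the journal.
class PassThroughReads {
    private final Set<String> backing = new HashSet<>(); // stands in for the backing context
    private final List<String> pendingAdds = new ArrayList<>();

    /** Deferred write: only recorded in the journal. */
    void addTriple(String triple) {
        pendingAdds.add(triple);
    }

    /** Read: answered by the backing context alone, so pending writes are invisible. */
    boolean contains(String triple) {
        return backing.contains(triple);
    }

    void sync() {
        backing.addAll(pendingAdds);
        pendingAdds.clear();
    }
}
```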
When are OperatorExceptions thrown?
An OperatorException is thrown during a journaled operation only when there is a bona fide problem with the journaling context itself. If the backing context is unavailable or fails in some other way, the error is not passed to the caller, since doing so would largely negate the usefulness of journaling. Read operations, on the other hand, are passed directly to the backing context, and if they fail the exception is propagated to the caller.
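The asymmetry between journaled writes and pass-through reads can be modeled as follows. ErrorPolicyModel and its BackendUnavailable exception are hypothetical stand-ins (not Tupelo's OperatorException) for a backing context that goes down and later recovers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (not Tupelo code) of which failures reach the caller.
class ErrorPolicyModel {
    /** Hypothetical stand-in for a failure in the backing context. */
    static class BackendUnavailable extends RuntimeException {
        BackendUnavailable() {
            super("backing context unavailable");
        }
    }

    private final Deque<String> journal = new ArrayDeque<>();
    private final List<String> stored = new ArrayList<>(); // contents of the simulated backing context
    private boolean backendUp = true;

    void setBackendUp(boolean up) {
        backendUp = up;
    }

    private void backendWrite(String op) {
        if (!backendUp) {
            throw new BackendUnavailable();
        }
        stored.add(op);
    }

    /** Journaled write: backend failures are absorbed and the entry is retried later. */
    void write(String op) {
        journal.addLast(op);
        sync();
    }

    void sync() {
        try {
            while (!journal.isEmpty()) {
                backendWrite(journal.peekFirst());
                journal.removeFirst(); // discard an entry only after it succeeds
            }
        } catch (BackendUnavailable e) {
            // not propagated: the entry stays journaled until the backend recovers
        }
    }

    /** Read: passed straight to the backend, so its failure reaches the caller. */
    boolean contains(String op) {
        if (!backendUp) {
            throw new BackendUnavailable();
        }
        return stored.contains(op);
    }

    int pendingCount() {
        return journal.size();
    }
}
```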
