Author: Matthew Smith, Senior Systems Programmer, LTS.
eSpace workflows are required to allow a flexible definition of processes in eSpace. Examples of this are inserting objects, reviewing metadata and changing data formats. A workflow can be an automatic procedure such as generating thumbnails for images or a manual procedure such as reviewing metadata.
Workflows are state based: each state has an associated process, which may be automated or may require manual completion. When the process for a state completes, the workflow moves to the next state.
Workflows are triggered by events in the system. A set of pre-defined events will be provided (insert, update, review and so on). Workflows that are not associated with an event can also be launched explicitly.
Note that Fedora already supports a way of processing records through disseminators. Repository objects can be associated with methods that invoke remote procedure calls on the object when the associated disseminator is invoked. These could be used by the eSpace workflow system, with the remote procedure calls linking back to eSpace to process the data.
NOTE: This was investigated but not implemented.
The eSpace workflows allow the user to define a set of states that an object associated with a workflow must progress through. Each state has an associated action and access privileges.
Workflows are created in eSpace without needing to be related to any objects or collections. The workflow just describes a set of steps to follow. It is applied to particular objects and collections by associating it with them. A workflow can be assigned to many objects in the repository.
Objects can be associated with a workflow manually. For example, a manager may wish to review a collection so they apply the review workflow to the collection.
Objects can be associated automatically with a workflow through a set of pre-defined events including insert, update and delete. For example, a collection may have a workflow event set so that all inserted objects are associated with a ‘new object’ workflow.
When the end of a workflow is reached, the workflow is disassociated from the object so that other workflows can be applied to that object. An object can only be associated with one workflow at a time.
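The one-workflow-at-a-time rule can be illustrated with a minimal sketch. The `Repository` class, its method names and the example PID are hypothetical and are not part of eSpace; the sketch only shows the association being refused while a workflow is active and released when the workflow ends.

```python
class WorkflowAssociationError(Exception):
    """Raised when an object already has a workflow attached."""


class Repository:
    """Hypothetical in-memory stand-in for the object/workflow association."""

    def __init__(self):
        # Maps an object identifier (PID) to the name of its active workflow.
        self._active = {}

    def associate(self, pid, workflow_name):
        # An object can only be associated with one workflow at a time.
        if pid in self._active:
            raise WorkflowAssociationError(
                f"{pid} is already in workflow {self._active[pid]!r}"
            )
        self._active[pid] = workflow_name

    def finish(self, pid):
        # Reaching the end of a workflow frees the object for other workflows.
        self._active.pop(pid, None)

    def active_workflow(self, pid):
        # Returns None when no workflow is currently attached.
        return self._active.get(pid)
```

A second call to `associate` for the same PID raises until `finish` has been called for that PID.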
The workflow has sets of actions that occur when the item moves to each state – these might be automatic like generating thumbnails or emailing the collection owner, or they might be manual such as entering and reviewing metadata.
Automatic state action scripts are stored in a workflow script directory that eSpace accesses. eSpace can allow the scripts to be edited, or they can be copied into the directory. The output of each script is stored as a datastream, which forms part of the record of what has happened to the object.
Scripts may need to change the object, which they do by invoking eSpace functions to update or edit items. An API should be provided for this purpose – perhaps as a web service.
To launch an automatic action, the action script whether internal or external needs access to the object. There will need to be a way of passing the object and information from the object metadata to the action script.
The workflow defines a set of states that the object goes through. The states only apply within the workflow and have no meaning outside of it. The states may have different meanings in different workflows.
A state may lead to multiple states as long as it doesn’t have automatic processing. There is no way to make an automatic decision for the next state to go to. A manually processed state may provide a list of states to transition the item to.
Each state of the workflow has an access level so that only roles above a certain level can access an item.
Each manual process state is assigned to a user or group using the eSpace roles that the item either inherits or has already set.
Implementation and Design
Workflow Data Structure
The workflows will be stored in the eSpace database (not Fedora). Most of the workflow logic lives in the workflow states.
The states can form a complex web which is defined by a state links table. The restriction is that an automatic state can link to only one next state, since it cannot make a decision. For manual states, the user is presented with a list of possible states to transition to, according to the links set up by the administrator.
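The restriction on automatic states can be checked mechanically. The following is a minimal sketch, assuming the state links table can be read as (from state, to state) rows and that each state carries an automatic/manual flag; the function names and state names are illustrative only.

```python
from collections import defaultdict


def build_links(link_rows):
    """Group (from_state, to_state) rows into a mapping of outgoing links."""
    links = defaultdict(list)
    for src, dst in link_rows:
        links[src].append(dst)
    return dict(links)


def invalid_automatic_states(states, links):
    """Return the automatic states that link to more than one next state."""
    return sorted(
        name
        for name, is_automatic in states.items()
        if is_automatic and len(links.get(name, [])) > 1
    )


# Hypothetical workflow: True marks an automatic state, False a manual one.
states = {
    "enter_metadata": False,
    "make_thumbnail": True,
    "review": False,
    "publish": True,
}
rows = [
    ("enter_metadata", "make_thumbnail"),
    ("make_thumbnail", "review"),
    ("review", "publish"),         # manual state: reviewer may approve...
    ("review", "enter_metadata"),  # ...or send the item back
]
problems = invalid_automatic_states(states, build_links(rows))
```

Here `problems` is empty, because only the manual `review` state branches. Adding a second link out of `make_thumbnail` would cause that state to be reported.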
The behaviours can be automatic or manual. An automatic program is executed and the state is moved on to the next state.
A manual process shows the user a form which they must submit to move on. The next state is determined by which button they use to submit the form.
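The two kinds of behaviour could be driven roughly as follows. This is a sketch under assumed data structures (a dictionary of states and a dictionary of links, both hypothetical), not the eSpace implementation: automatic states run their program and move on, while a manual state waits for a form submission whose button names the next state.

```python
def run_automatic(current, states, links, obj):
    """Run automatic states from `current`; stop at a manual state or the end.

    Assumes the links contain no cycle made up only of automatic states.
    """
    while current is not None and states[current]["automatic"]:
        states[current]["action"](obj)  # execute the automatic program
        next_states = links.get(current, [])
        # An automatic state has at most one outgoing link.
        current = next_states[0] if next_states else None
    return current


def submit_form(current, button, links):
    """Resolve the next state from the button used to submit a manual form."""
    if button not in links.get(current, []):
        raise ValueError(f"{button!r} is not a valid transition from {current!r}")
    return button


def make_thumbnail(obj):
    # Placeholder for a real action script.
    obj["thumbnail"] = True


def publish(obj):
    obj["published"] = True


states = {
    "make_thumbnail": {"automatic": True, "action": make_thumbnail},
    "review": {"automatic": False, "action": None},
    "publish": {"automatic": True, "action": publish},
}
links = {"make_thumbnail": ["review"], "review": ["publish", "make_thumbnail"]}

item = {}
waiting_at = run_automatic("make_thumbnail", states, links, item)  # stops at the manual state
chosen = submit_form(waiting_at, "publish", links)                 # user clicks the publish button
finished = run_automatic(chosen, states, links, item)              # None: workflow has ended
```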
The forms for each workflow state are defined in scripts which are included in the workflow processing.
Association, Process Inputs and Workflow Status
The progress through the workflow will be saved with the object being worked on. The object will be associated with a workflow, but the workflow also needs to know the current state of the object and possibly some additional status details.
To do this, each object will have a datastream that stores the associated workflow and workflow status information. NOTE: This was implemented using an object stored in the session, not as a datastream.
The behaviours / action scripts may need to obtain inputs from the object’s workflow status.
eSpace has ACML datastreams which define roles that users and groups can have on an object. The ACMLs control what roles apply to an object and who can fill those roles (groups and individuals).
A few options here:
- Use a second permissions system for items in a workflow, restricting access for certain roles – the ACML is left alone (or inherited) and the workflow stores a list of allowable roles. eSpace checks the workflow as well as the ACML. Some extra logic would still be needed so that an editor can view an object even though the view role is disallowed in the workflow (assuming the editor role is allowed). To configure this, the user just selects which roles to enable.
- Edit the ACML in the workflow but keep the original handy to be restored at the end of the workflow
- Use a simple published / unpublished status – The object can be flagged as published or some other status. When it is not published, then it can’t be listed or viewed (except by editors and above). This is simpler than being able to select which roles can have access during a workflow. Part of the workflow is just a decision – “is the object publicly accessible when this workflow is associated with it?” One of the actions of a workflow can be to change the published status.
Option 3 is the favourite at the moment.
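Option 3 reduces the access decision to a single flag plus a role comparison. A minimal sketch, assuming roles can be ranked numerically; the role names and level values below are illustrative and are not taken from the eSpace ACML definitions.

```python
# Hypothetical role ranking: higher numbers mean more privilege.
ROLE_LEVELS = {"viewer": 0, "editor": 2, "approver": 3}
EDITOR_LEVEL = ROLE_LEVELS["editor"]


def can_view(published, role):
    """Unpublished objects are visible only to editors and above."""
    if published:
        return True
    # Unknown roles are treated as the lowest level.
    return ROLE_LEVELS.get(role, 0) >= EDITOR_LEVEL


def set_published(obj, published):
    """A workflow action may change the published status of the object."""
    obj["published"] = published
    return obj
```

With this arrangement a workflow state only has to call something like `set_published`; the listing and viewing code consults `can_view` and needs no knowledge of the workflow itself.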
There are four trigger types for automatically assigning an item to a workflow:
- Create – creating a record
- Ingest – ingesting a datastream
- Update – updating a metadata record
- Delete – deleting a record
A table in the eSpace database stores these triggers.
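One possible shape for such a table is sketched below using an in-memory SQLite database. The table name, column names and example PID are assumptions made for illustration; the actual eSpace schema is not given here.

```python
import sqlite3


def open_trigger_store():
    """Create an in-memory database holding a hypothetical trigger table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE workflow_trigger (
            pid          TEXT NOT NULL,     -- record the trigger is set on
            trigger_type TEXT NOT NULL
                CHECK (trigger_type IN ('create', 'ingest', 'update', 'delete')),
            xdis_id      INTEGER,           -- object display type, if relevant
            workflow     TEXT NOT NULL      -- workflow to start
        )
        """
    )
    return conn


def workflows_for(conn, pid, trigger_type):
    """Return (xdis_id, workflow) pairs registered for a record and trigger."""
    rows = conn.execute(
        "SELECT xdis_id, workflow FROM workflow_trigger "
        "WHERE pid = ? AND trigger_type = ? ORDER BY workflow",
        (pid, trigger_type),
    )
    return rows.fetchall()


conn = open_trigger_store()
conn.executemany(
    "INSERT INTO workflow_trigger VALUES (?, ?, ?, ?)",
    [
        ("demo:collection", "create", 10, "single import"),
        ("demo:collection", "create", 10, "bulk import"),
        ("demo:collection", "update", 10, "review"),
    ],
)
create_workflows = workflows_for(conn, "demo:collection", "create")
```

The `CHECK` constraint restricts `trigger_type` to the four trigger types listed above, so a misspelt trigger is rejected at insert time.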
When the Update or Delete buttons are clicked for an object, the user must choose from a set of workflows that can apply to that object (depending on its display type). The user is then guided through a set of screens as defined in the workflow states.
When creating an object, the Create Triggers for the parent collection or community apply.
When the user clicks the ‘new’ link, the workflow/new.php script is invoked. This script first gets all the assigned workflows. It looks in the parent record’s workflow triggers as well.
If there are workflows for more than one type of Object Display Type (xdis_id) then the user is prompted to select the Object Type. If there is more than one workflow for the selected object type, then the user is prompted to select a workflow as well. For example – when creating images, there is a bulk import workflow and a single import workflow.
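The prompting rules can be summarised as a small decision function. This is a sketch of the logic as described, not a transcription of workflow/new.php; the function name, the return values and the example xdis_id numbers are hypothetical.

```python
def next_prompt(candidates, xdis_id=None, workflow=None):
    """Decide what to ask the user next.

    `candidates` is a list of (xdis_id, workflow_name) pairs gathered from the
    create triggers. Returns a (step, payload) tuple.
    """
    if not candidates:
        return ("no_workflow", None)

    types = sorted({x for x, _ in candidates})
    if xdis_id is None:
        if len(types) > 1:
            return ("select_type", types)  # prompt for the Object Type
        xdis_id = types[0]

    names = sorted(w for x, w in candidates if x == xdis_id)
    if not names:
        return ("no_workflow", None)       # nothing registered for this type
    if workflow is None:
        if len(names) > 1:
            return ("select_workflow", names)  # prompt for the workflow
        workflow = names[0]

    return ("start", (xdis_id, workflow))


# Hypothetical triggers: display type 10 stands for images, 20 for documents.
candidates = [(10, "bulk import"), (10, "single import"), (20, "new document")]
first = next_prompt(candidates)               # asks for the object type
second = next_prompt(candidates, xdis_id=10)  # asks which image workflow
third = next_prompt(candidates, xdis_id=20)   # only one workflow: start it
```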
The ingest trigger is a bit tricky as it can interrupt the create and update triggers. To make things easier, we will only accept automatic steps in the ingest process. The differences are:
- Ingest works with a dsID as well as a PID
- Ingest may use a different workflow depending on the mimetype and the document display type.
- Ingest doesn’t allow manual steps – it is automated only.
- Ingest uses the first and closest workflow – the children workflows override the parent workflows.
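The "first and closest" rule amounts to walking from the object up through its parents and stopping at the first matching trigger. A minimal sketch, assuming the ancestry is available as an ordered list of PIDs and each ingest trigger records a mimetype, a display type and a workflow name; all names and values are illustrative.

```python
def closest_ingest_workflow(ancestry, triggers, mimetype, xdis_id):
    """Return the workflow of the nearest matching ingest trigger, or None.

    `ancestry` lists PIDs from the object itself up to the top-level community.
    `triggers` maps a PID to a list of (mimetype, xdis_id, workflow) tuples.
    """
    for pid in ancestry:  # nearest record first, so children override parents
        for trig_mime, trig_xdis, workflow in triggers.get(pid, []):
            if trig_mime == mimetype and trig_xdis == xdis_id:
                return workflow  # first match wins
    return None


triggers = {
    "demo:community": [("image/jpeg", 10, "community thumbnails")],
    "demo:collection": [("image/jpeg", 10, "collection thumbnails")],
}
ancestry = ["demo:object", "demo:collection", "demo:community"]
selected = closest_ingest_workflow(ancestry, triggers, "image/jpeg", 10)
```

In this example the collection's trigger is selected because it sits closer to the object than the community's trigger.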
Perhaps a purge trigger will be needed, separate from the delete trigger. Like the ingest trigger, the purge trigger would apply to each datastream.
There are special triggers for communities and collections – when working on a community, the display document type is not used – instead they are preset to the constant xdis_ids used for communities and collections.
A record of the actions that have been performed on a record should be kept in a datastream. Perhaps an XML schema for recording process records needs to be developed. The process record would record the actions performed, the date, the user and maybe a comment from the user who performed them.
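No schema has been defined yet, so the following is only one possible shape for a process record, built with Python's standard `xml.etree.ElementTree` module. The element names (`processHistory`, `record`, `action`, `date`, `user`, `comment`) and the example values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def append_process_record(history, action, date, user, comment=None):
    """Append one process record to a history element and return it."""
    record = ET.SubElement(history, "record")
    ET.SubElement(record, "action").text = action
    ET.SubElement(record, "date").text = date
    ET.SubElement(record, "user").text = user
    if comment:
        # The comment is optional, so the element is only written when given.
        ET.SubElement(record, "comment").text = comment
    return record


history = ET.Element("processHistory")
append_process_record(history, "review metadata", "2006-01-31", "reviewer1", "Title corrected")
append_process_record(history, "generate thumbnail", "2006-01-31", "system")
datastream_xml = ET.tostring(history, encoding="unicode")
```

The resulting string could then be written to the datastream that holds the object's history.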
The scripts are kept in the ‘workflow/’ directory under Fez’s base directory. Some of the scripts are quite specialised, but once the base functionality is in place, additional scripts should be easier to write.
Automatic workflow behaviours do not need templates. For manual behaviours, the templates for the scripts are kept in ‘templates/en/workflow/’. The template for workflow scripts should always be set to ‘workflow/index.tpl.html’ and then the sub-template is set using