Process Flow
Graph data can provide an overview of the high-level project process, supported by underlying, queryable data. For this example, the steps in the flow were first recorded in a spreadsheet.
R was used to read the spreadsheet and convert the data to RDF using the R package rdflib. Each process step is represented by its ID value prefixed with PROSTEP_ to indicate that it is a process step. A unique hash value for each step would be the preferred method for IRI creation, but it would make this document harder to read: PROSTEP_e147d974 is more difficult to follow than PROSTEP_ID4. Values from the other columns (step number, title, and description) become objects in the data, as shown in Figure 5.2.
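The trade-off between hash-based and readable IRIs can be sketched as follows. This is a minimal Python illustration, not the author's R code; the function names and the choice of hashing the project name joined to the spreadsheet ID with SHA-256 are assumptions made for the example.

```python
import hashlib


def hashed_step_name(project: str, step_id: str, length: int = 8) -> str:
    """Hypothetical hash-based local name, e.g. PROSTEP_<8 hex digits>.

    Hashing "project|step_id" is an assumption made for this sketch; any
    input that is stable and unique per step would do. Truncating the
    digest keeps names short at the cost of a small collision risk.
    """
    digest = hashlib.sha256(f"{project}|{step_id}".encode("utf-8")).hexdigest()
    return "PROSTEP_" + digest[:length]


def readable_step_name(step_id: str) -> str:
    """Readable local name as used in this document, e.g. PROSTEP_ID4."""
    return "PROSTEP_" + step_id


print(readable_step_name("ID4"))                  # PROSTEP_ID4
print(hashed_step_name("FileCatProject", "ID4"))  # PROSTEP_ followed by 8 hex digits
```

The hashed form stays stable as long as its inputs do, which is why it suits IRIs; the readable form is easier to follow in prose.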
The term hasNext has the corresponding inverse term hasPrevious defined in the project ontology.
:hasNext rdf:type owl:ObjectProperty .

:hasPrevious rdf:type owl:ObjectProperty ;
    owl:inverseOf :hasNext .
This means that only the hasNext term needs to be created in the source data:
:PROSTEP_ID2 :hasNext :PROSTEP_ID3 .
and a reasoner can then infer:
:PROSTEP_ID3 :hasPrevious :PROSTEP_ID2 .
making it possible to query the process flow in both directions.
A process step can also be declared “part of” a project (in this case the File Catalog Project, identified as :FileCatProject) using schema:isPartOf. A reasoner can then use the inverse predicate, schema:hasPart, to infer the parts (process steps) that belong to the project. A step therefore isPartOf a project, and a project hasPart many steps.
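The inference described above, for both hasNext/hasPrevious and isPartOf/hasPart, comes down to one rule: for each asserted (s, p, o), add (o, inverse-of-p, s). The Python sketch below applies only that rule to plain tuples. It is an illustration under simplifying assumptions, not a substitute for the database's reasoner, and it assumes the isPartOf/hasPart inverse is declared in the ontology the reasoner loads.

```python
from typing import Iterable, Set, Tuple

# A triple is modelled here as a plain (subject, predicate, object) tuple of
# prefixed names; this is a simplification, not an RDF library type.
Triple = Tuple[str, str, str]


def infer_inverse(triples: Iterable[Triple], prop: str, inverse: str) -> Set[Triple]:
    """Return the input triples plus (o, inverse, s) for every (s, prop, o).

    This is the single forward-chaining rule that an owl:inverseOf
    declaration licenses; a real OWL reasoner applies many more rules.
    """
    asserted = set(triples)
    inferred = {(o, inverse, s) for (s, p, o) in asserted if p == prop}
    return asserted | inferred


# Source data: only hasNext and isPartOf are asserted.
source = {
    (":PROSTEP_ID2", ":hasNext", ":PROSTEP_ID3"),
    (":PROSTEP_ID2", "schema:isPartOf", ":FileCatProject"),
    (":PROSTEP_ID3", "schema:isPartOf", ":FileCatProject"),
}

closed = infer_inverse(source, ":hasNext", ":hasPrevious")
closed = infer_inverse(closed, "schema:isPartOf", "schema:hasPart")

# The project now "hasPart" each step, and ID3 "hasPrevious" ID2.
project_parts = sorted(
    o for (s, p, o) in closed if s == ":FileCatProject" and p == "schema:hasPart"
)
print(project_parts)  # [':PROSTEP_ID2', ':PROSTEP_ID3']
```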
The model sketch in Figure 5.2 serves as a guide when writing the data conversion code in R. The “next step” in the process is represented by the hasNext predicate, which links the step ID in column A of the spreadsheet (Figure 5.1) to the ID of the next step in column E. The resulting Turtle (TTL) data is stored in a file for upload to the database.
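A Python sketch of the row-to-triples conversion follows. The original conversion was written in R with rdflib, so this only illustrates the mapping. The CSV column names (id, stepNumber, title, description, nextId), the order of the middle columns, and the filecat: namespace IRI are assumptions; the output mirrors the shape of the Step 4 triples below.

```python
import csv
import io

# Assumed namespace IRIs: filecat: matches the ":" prefix used in this
# section's SPARQL queries; the others are the standard vocabularies.
PREFIXES = (
    "@prefix filecat: <https://www.example.org/tw/filecat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix schema: <https://schema.org/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def turtle_string(value: str) -> str:
    """Quote text as a Turtle xsd:string literal, escaping \\, " and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^xsd:string'


def row_to_turtle(row: dict) -> str:
    """Convert one spreadsheet row (hypothetical column names) to Turtle."""
    lines = [
        f"filecat:PROSTEP_{row['id']} a filecat:ProcessStep ;",
        f"    dcterms:title {turtle_string(row['title'])} ;",
        f"    dcterms:description {turtle_string(row['description'])} ;",
        "    schema:isPartOf filecat:FileCatProject ;",
    ]
    if row["nextId"]:  # the next-step column is empty for the last step
        lines.append(f"    filecat:hasNext filecat:PROSTEP_{row['nextId']} ;")
    lines.append(f"    filecat:stepNumber {int(row['stepNumber'])} .")
    return "\n".join(lines)


def csv_to_turtle(csv_text: str) -> str:
    """Convert a CSV export of the process-flow sheet to a Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n\n".join(row_to_turtle(r) for r in rows) + "\n"


# One row, using the values of Step 4 shown in this section.
SAMPLE_CSV = (
    "id,stepNumber,title,description,nextId\n"
    "ID4,4,Dictionary,Develop from headers. Revise based on use,ID5\n"
)

turtle_text = csv_to_turtle(SAMPLE_CSV)
print(turtle_text)
```

Writing turtle_text to a .ttl file would then give a file ready for upload.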
Example Triples for Process Flow Step 4
filecat:PROSTEP_ID4
    dcterms:description "Develop from headers. Revise based on use"^^xsd:string ;
    dcterms:title "Dictionary"^^xsd:string ;
    a filecat:ProcessStep ;
    schema:isPartOf filecat:FileCatProject ;
    filecat:hasNext filecat:PROSTEP_ID5 ;
    filecat:stepNumber 4 .
Query Along the Process Path
You wish to answer the question: “What project steps are affected when the dictionary file is changed?” In other words, what are the process steps that occur after Step 4? A SPARQL query easily answers this question.
PREFIX : <https://www.example.org/tw/filecat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <https://schema.org/>

SELECT ?step ?process
WHERE {
    ?startStep :stepNumber 4 ;
               :hasNext+ ?nextId .
    ?nextId :stepNumber ?step ;
            dcterms:title ?process .
}
ORDER BY ?step
step  process
5     "Process Folders"
6     "Extract Values"
7     "Identify terms"
8     "Create TTL Files"
9     "Load TTL into Database"
10    "Validate with SHACL"
11    "View Project Report"
12    "Find Files App"
13    "Revise data"
Observe in the query above how the plus symbol on the hasNext predicate (:hasNext+) forms a SPARQL 1.1 property path that follows the predicate one or more times, returning every step after Step 4.
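The :hasNext+ path computes the set of nodes reachable by one or more hasNext edges. The Python sketch below performs the same traversal with a breadth-first search over an in-memory adjacency list. It assumes a linear flow in which IDn is step n and is followed by ID(n+1) through ID13; that is consistent with the query results above but is not taken from the source spreadsheet.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes reachable from `start` by one or more edges, like :hasNext+.

    `start` itself is only included if a cycle leads back to it, which
    matches the one-or-more semantics of the SPARQL + operator.
    """
    found: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in found:
            continue  # already visited; also stops infinite loops on cycles
        found.add(node)
        queue.extend(graph.get(node, []))
    return found


# Assumption: a linear flow in which step IDn is followed by ID(n+1), ending at ID13.
has_next = {f"ID{n}": [f"ID{n + 1}"] for n in range(1, 13)}

affected = sorted(reachable(has_next, "ID4"), key=lambda s: int(s[2:]))
print(affected)  # ID5 through ID13, in step order
```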
In databases that support a reasoner, a reverse query using the inferred hasPrevious predicate can find the steps that precede Step 4. Other queries could confine the scope of impact to a set number of next or previous steps.
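Both ideas in the previous paragraph can be sketched in Python: reversing the hasNext edges yields hasPrevious, and capping the number of hops confines the scope. This is an illustration only; the function names and the linear ID1 to ID13 flow are assumptions. In SPARQL 1.1 itself, the inverse path operator (as in ^:hasNext+) can also walk backwards without a reasoner.

```python
from collections import defaultdict
from typing import Dict, List, Set


def invert(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every edge: hasNext adjacency becomes hasPrevious adjacency."""
    reverse: Dict[str, List[str]] = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Nodes reachable from `start` in 1..max_hops edges (a bounded impact scope)."""
    found: Set[str] = set()
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop, skipping nodes already reached by a shorter path.
        frontier = {nxt for node in frontier for nxt in graph.get(node, [])} - found
        if not frontier:
            break
        found |= frontier
    return found


# Assumed linear flow: IDn is followed by ID(n+1), from ID1 to ID13.
has_next = {f"ID{n}": [f"ID{n + 1}"] for n in range(1, 13)}
has_previous = invert(has_next)

print(sorted(within_hops(has_previous, "ID4", 12)))  # ['ID1', 'ID2', 'ID3']
print(sorted(within_hops(has_next, "ID4", 2)))       # ['ID5', 'ID6']
```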
Visualize
Visual process flows can be built from the graph data. The query that supplies the data is simple. Note that the :hasNext pattern is OPTIONAL because the last step in a series has no following step. Information about each step is retrieved, and the results are sorted by :stepNumber.
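PREFIX : <https://www.example.org/tw/filecat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <https://schema.org/>

SELECT ?proStepId ?stepNum ?title ?description ?hasNextStepId
WHERE {
    ?proStepId a :ProcessStep ;
               schema:isPartOf :FileCatProject ;
               :stepNumber ?stepNum ;
               dcterms:title ?title ;
               dcterms:description ?description .
    OPTIONAL { ?proStepId :hasNext ?hasNextStepId . }
}
ORDER BY ?stepNum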
Values resulting from the query can then be displayed using various graphics packages. This example uses the DiagrammeR R package.
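The rendering here is done in R with DiagrammeR. As a sketch, and not the author's code, the Python below turns rows shaped like the query results into Graphviz DOT text, a format that DiagrammeR's grViz() function can render. The local names PROSTEP_ID11 to PROSTEP_ID13 assume that IDs match step numbers, and the rankdir and shape settings are arbitrary choices.

```python
from typing import List, Optional, Tuple

# (proStepId, stepNum, title, hasNextStepId); the last field is None when
# the OPTIONAL hasNext pattern left it unbound.
Row = Tuple[str, int, str, Optional[str]]


def rows_to_dot(rows: List[Row]) -> str:
    """Build a left-to-right Graphviz digraph: one box per step, one edge per hasNext."""
    lines = ["digraph process {", "  rankdir=LR;", "  node [shape=box];"]
    for step_id, step_num, title, _ in sorted(rows, key=lambda r: r[1]):
        label = f"{step_num}. {title}".replace('"', '\\"')
        lines.append(f'  {step_id} [label="{label}"];')
    for step_id, _, _, next_id in rows:
        if next_id is not None:  # the last step has no outgoing edge
            lines.append(f"  {step_id} -> {next_id};")
    lines.append("}")
    return "\n".join(lines)


rows: List[Row] = [
    ("PROSTEP_ID11", 11, "View Project Report", "PROSTEP_ID12"),
    ("PROSTEP_ID12", 12, "Find Files App", "PROSTEP_ID13"),
    ("PROSTEP_ID13", 13, "Revise data", None),
]

dot = rows_to_dot(rows)
print(dot)
```

Keeping the unbound hasNext value as None, rather than dropping the row, is what lets the final step still appear as a node.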