1. Getting Started
"Where, where is my common sense? How...did I get in a jam like this?" - Talking Heads "Uh-Oh, Love Comes to Town" [Album: Talking Heads: 77]
Constructing a File Catalog
The problem I chose to solve was organizing my R and SPARQL code for ease of reuse. After several years of working with graph data, I have accumulated a large number of R and SPARQL scripts spread over many projects and folders. I increasingly find myself searching for code that I wrote months or even years ago. Where is that code now? What visualizations did I create? A file catalog built as a Knowledge Graph would help me answer these questions. The need to reorganize my code also presented an opportunity to further develop my R and Knowledge Graph skills.
The methods described in this project are often not the most elegant or efficient. Choices were made to illustrate specific concepts and approaches and may not apply to your situation. To contain scope, these pages cover the code inventory for the File Catalog Project itself and not the larger catalog for all my code, thus eating my own dog food along the way.
Skills and Tools Required
R is used as the language for parsing folder structures, data manipulation, data conversion to RDF, reporting, and visualization. You are encouraged to use the tool set with which you are most familiar. Python is an excellent alternative with strong support for RDF as demonstrated in the recent update to rdflib. Python boasts a large number of visualization libraries that are especially helpful in presenting and exploring graph data.
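As a rough illustration of the first two steps (parsing a folder structure and converting the result to RDF), here is a minimal sketch in Python, the alternative mentioned above, using only the standard library. The namespace `http://example.org/filecatalog/`, the property names `fileName` and `inFolder`, and the list of file suffixes are assumptions made for this example and are not taken from the project.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import quote

# Hypothetical namespace and file suffixes, chosen only for this sketch.
BASE = "http://example.org/filecatalog/"
CODE_SUFFIXES = {".r", ".rmd", ".rq"}  # R, RMarkdown, SPARQL


def find_code_files(root):
    """Return the relative POSIX paths of code files under `root`, sorted."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in CODE_SUFFIXES
    )


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(rel_paths):
    """Describe each file with two triples: its name and its folder."""
    lines = []
    for rel in rel_paths:
        path = PurePosixPath(rel)
        # quote() percent-encodes spaces and other characters unsafe in IRIs.
        subject = f"<{BASE}file/{quote(rel)}>"
        folder = f"<{BASE}folder/{quote(path.parent.as_posix())}>"
        lines.append(f'{subject} <{BASE}fileName> "{escape_literal(path.name)}" .')
        lines.append(f"{subject} <{BASE}inFolder> {folder} .")
    return lines
```

Calling `to_ntriples(find_code_files("."))` returns one N-Triples line per statement, which can be written to a file and loaded into a triple store. A library such as rdflib would handle serialization and escaping more thoroughly than this hand-rolled version.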
The primary tool set for this project includes:
- Linked Data as Resource Description Framework (RDF)
- R
  - RStudio - code development
  - RMarkdown - reporting, templates, interactive tables
  - Shiny - search application
  - Blogdown - site content
- Free version of the Stardog database for RDF triples, reasoner, and similarity model
- Protege ontology editor
Competency Questions
Following the classic approach outlined in Ontology Development 101 (Noy and McGuinness), I compiled a list of Competency Questions that my project must answer, both for the cataloging project itself and for the larger effort of cataloging all my R and SPARQL code.
- Full Catalog Questions (excerpt)
  - Where are R Shiny scripts that use SPARQL to obtain data for display in a visNetwork graph?
  - Do I have a code example for a Lollipop Chart?
  - Is there a link to StackOverflow for this code example?
  - What do the visualizations look like for this R Shiny app?
  - What is the most recent code I wrote for Sankey plots?
  - Have code files or images changed since the project was cataloged?
  - ...and many more.
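The question about whether code files or images have changed since cataloging lends itself to a small example. One common approach, shown here as a sketch and not necessarily the method used in this project, is to store a content hash for each file at catalog time and compare it with a fresh hash later. The function names and the default suffix list are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root, suffixes=(".r", ".rq", ".png")):
    """Map each matching file's relative path to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    }


def diff_snapshots(cataloged, current):
    """Compare two snapshots and list added, removed, and changed paths."""
    old, new = set(cataloged), set(current)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "changed": sorted(k for k in old & new if cataloged[k] != current[k]),
    }
```

In a Knowledge Graph setting, the digest taken at catalog time could be stored as a property of each file, so that the comparison becomes a lookup against the graph.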
A second set of questions focuses on the code for the File Catalog Project itself. The same methods for creating the inventory are followed in both cases.
- File Catalog Project Questions (excerpt)
  - What is the R program execution order in this project?
  - Which R files produce the TTL files that are uploaded to the graph database?
  - Which R scripts render RMarkdown files?
  - ...and more.
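The execution-order question can be answered from the data alone if the catalog records which files each script produces and consumes. The sketch below derives an order with a topological sort from Python's standard `graphlib` module (Python 3.9 or later). The script and file names are hypothetical, and the produces/consumes dictionaries stand in for relationships that would live in the graph.

```python
from graphlib import TopologicalSorter

# Hypothetical scripts and files, for illustration only.
PRODUCES = {
    "scan-folders.R": ["files.csv"],
    "build-rdf.R": ["catalog.ttl"],
    "report.R": ["report.html"],
}
CONSUMES = {
    "build-rdf.R": ["files.csv"],
    "report.R": ["catalog.ttl"],
}


def execution_order(produces, consumes):
    """Order scripts so each runs after the scripts that produce its inputs."""
    producer_of = {out: script for script, outs in produces.items() for out in outs}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {script: set() for script in produces}
    for script, inputs in consumes.items():
        graph.setdefault(script, set())
        for item in inputs:
            upstream = producer_of.get(item)
            if upstream is not None and upstream != script:
                graph[script].add(upstream)
    return list(TopologicalSorter(graph).static_order())
```

With the sample data, `execution_order(PRODUCES, CONSUMES)` places `scan-folders.R` before `build-rdf.R` and `build-rdf.R` before `report.R`. A circular dependency raises `graphlib.CycleError`, which is itself a useful data-quality check.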
A successful project is one where the data model and instance data can satisfy the competency questions that remain within the agreed-upon project scope.
The questions lead to the development of an ontology and data model that will drive the project. Entities and relationships are identified from the questions and formed into a data model that is revised as the project progresses. Standards are developed to ensure that the data collected can answer the competency questions.
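To make the step from question to data model concrete, take "Which R files produce the TTL files that are uploaded to the graph database?" The nouns suggest entities (an R script, a Turtle file, a database) and the verbs suggest relationships (produces, uploaded to). The sketch below stores a few instance triples as Python tuples and answers the question with simple pattern matching. Every class name, property name, and identifier here is hypothetical, and in the project itself the equivalent would be a SPARQL query against the triple store.

```python
# Hypothetical classes, properties, and instances, for illustration only.
TRIPLES = {
    ("script:build-rdf.R", "rdf:type", "cat:RScript"),
    ("script:build-rdf.R", "cat:produces", "file:catalog.ttl"),
    ("file:catalog.ttl", "rdf:type", "cat:TurtleFile"),
    ("file:catalog.ttl", "cat:uploadedTo", "db:catalog"),
    ("script:report.R", "rdf:type", "cat:RScript"),
    ("script:report.R", "cat:renders", "file:report.Rmd"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def scripts_producing_uploaded_ttl(triples):
    """Answer: which scripts produce Turtle files that were uploaded?"""
    uploaded = {s for s, _, _ in match(triples, p="cat:uploadedTo")}
    turtle = {s for s, _, _ in match(triples, p="rdf:type", o="cat:TurtleFile")}
    targets = uploaded & turtle
    return sorted(s for s, _, o in match(triples, p="cat:produces") if o in targets)
```

If a question cannot be expressed as a pattern over the available classes and properties, that gap signals either a missing piece of the data model or a question that falls outside the project scope.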