Basic Relationships

Classes of Things

From the Competency Questions and File Headers it is possible to identify and classify several different types of entities that must appear in the data model.

You may want to answer the question, “What R files (scripts, markdown, RShiny, datafiles) are part of the project?” One way to do this is to define subclasses for these types of files. A File entity can belong to one or more File Type classes. The file types RDA, RMD, etc. are defined as subclasses of file type R. When a reasoner is used, a query for FileTypeR will return instance data that are members of its subclasses, thus finding all the various types of R files. The Protege ontology editor is useful for specifying classes and subclasses in a file that is uploaded to the database along with the instance data.

Figure 2.1. Classes in Protege (excerpt)
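
The effect of subclass reasoning can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how a triplestore reasoner is implemented. The class names FileTypeRDA, FileTypeRMD, FileTypeRShiny and FileTypeText, and the three file names, are hypothetical; only FileTypeR comes from the text above.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "FileTypeRDA": "FileTypeR",
    "FileTypeRMD": "FileTypeR",
    "FileTypeRShiny": "FileTypeR",
}

# Hypothetical instance data: file -> set of directly asserted classes.
ASSERTED_TYPES = {
    "analysis.Rmd": {"FileTypeRMD"},
    "results.rda": {"FileTypeRDA"},
    "notes.txt": {"FileTypeText"},
}


def ancestors(cls):
    """Yield cls and every class above it in the hierarchy."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def instances_of(target):
    """Return files whose asserted class is the target or one of its subclasses."""
    return sorted(
        name
        for name, classes in ASSERTED_TYPES.items()
        if any(target in ancestors(c) for c in classes)
    )


print(instances_of("FileTypeR"))
```

Neither file is asserted to be a FileTypeR directly, yet both R files are returned, because membership is derived by walking up the subclass hierarchy. The text file is excluded.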

Text Mining extracts bigrams and keywords from the file headers. These terms are classified in the ontology as TermBigram and TermKeyword. Their parent class Term enables a query to find both bigrams and keywords using the more generic term class. Keywords and bigrams are linked to the files as described on the Dictionary page.

In addition to serving as a source for term extraction, the fields in the File Headers form key components of the data model in their own right. A file has a title (TITL), a description (DESC), etc. The diagram below illustrates how some values are represented as string literals, while others are IRIs that can link to other values in the graph of data.

Figure 2.2. Initial Model for File Information

Relationships between Things

A triple is the basic building block of RDF. It consists of a Subject joined to an Object by a Predicate relationship.

Figure 2.3. The Subject, Predicate, Object Model of RDF.

The video What is Linked Data? from 2012 provides a unique and informative introduction to the topic. Another good way to learn about Linked Data is by writing queries. I highly recommend Bob DuCharme’s book Learning SPARQL. A tutorial on Linked Data and the SPARQL query language is beyond the scope of these pages. Please see the Resources page for more information.

When creating Linked Data as RDF, re-use existing ontologies whenever possible for classifying entities and specifying relationships. Dublin Core Metadata Terms (specified with the prefix dcterms: in the model above) and Schema.org (as schema:) are good sources. Many predicates in this project use a default prefix of : which refers to the IRI reference https://www.example.org/tw/filecat#. Use of a default prefix should be revisited later to increase alignment with public ontologies and to help ensure common understanding and data interoperability.
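
A prefix is only shorthand for the start of a full IRI. The sketch below shows the expansion in plain Python. The default prefix IRI is the one given above and the dcterms: IRI is the published Dublin Core namespace; the schema: IRI is an assumption here, so check it against the prefix declarations actually used in the project. The function name expand is hypothetical.

```python
# Prefix table. The default ("") prefix is taken from the text; the schema:
# namespace IRI is an assumption made for this illustration.
PREFIXES = {
    "": "https://www.example.org/tw/filecat#",
    "dcterms": "http://purl.org/dc/terms/",
    "schema": "https://schema.org/",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dcterms:title' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


print(expand(":hasTerm"))
print(expand("dcterms:title"))
```

Two data sets that both use dcterms:title expand to the same IRI and can be queried together, whereas a predicate under the default prefix is only meaningful within this project.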

Writing out a few of the entities and relationships in common language helps illustrate how they will appear as Subject, Predicate, Object triples in RDF.

  • A File has title TITL
  • A File has description DESC
  • A File has Term Keyword
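
The three statements above can be sketched as Subject, Predicate, Object tuples in plain Python. This is an illustration only: the mapping of TITL and DESC to dcterms:title and dcterms:description is an assumption, the title and description strings are placeholders, and the Literal and Triple types are hypothetical helpers. Literals are wrapped to distinguish them from IRIs, which are written as prefixed names.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain string value that cannot link onward to other nodes."""
    value: str


class Triple(NamedTuple):
    subject: str                 # IRI, written as a prefixed name
    predicate: str               # IRI, written as a prefixed name
    obj: Union[str, Literal]     # either an IRI or a Literal


FILE = ":ExampleFile1.R"

# Predicates for title and description are assumed; values are placeholders.
triples = [
    Triple(FILE, "dcterms:title", Literal("Example file one")),
    Triple(FILE, "dcterms:description", Literal("A placeholder description")),
    Triple(FILE, ":hasTerm", ":visnetwork"),
]


def objects(subject, predicate):
    """Return every object attached to subject by predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects(FILE, ":hasTerm"))
print(objects(FILE, "dcterms:title"))
```

The keyword is an IRI, so other files can point at the same node. The title is a literal and ends the path through the graph.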

It is helpful to view triples graphically as two nodes with the relation between them. The figure below shows how a File is associated with a Keyword. Recall how the ontology defines both bigrams and keywords as types of terms.

Figure 2.4. Relationship between a File and a Term (Keyword or Bigram).

The hasTerm predicate is defined in Protege as an owl:ObjectProperty using Web Ontology Language (OWL).

  :hasTerm rdf:type owl:ObjectProperty .

Every file has one or more terms. Likewise, a term can be assigned to one or more files. It is useful to be able to query in both directions: to find the files that have specified terms, and to find the terms that are assigned to files. This is accomplished by defining isTermOf as the owl:inverseOf hasTerm. A reasoner can then infer that wherever there is a hasTerm relationship, the inverse relation isTermOf also holds.

:hasTerm  rdf:type      owl:ObjectProperty ;
          owl:inverseOf :isTermOf .
:isTermOf rdf:type      owl:ObjectProperty .

Inferred Relations

In the example data below, both ExampleFile1.R and ExampleFile2.R have the term visnetwork. A reasoner will infer the inverse relationship isTermOf, shown as an orange dotted line.

Figure 2.5. Inferred Relation “isTermOf.”
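
The inference can be imitated in plain Python to make the mechanics concrete. This is a minimal sketch of what an OWL reasoner does for owl:inverseOf, not a description of any particular reasoner. The function name infer_inverses is hypothetical; the file names and the term come from the example data above.

```python
# Declared inverse pairs, mirroring ":hasTerm owl:inverseOf :isTermOf".
INVERSE_OF = {":hasTerm": ":isTermOf"}

# Asserted triples only: nothing here mentions :isTermOf.
asserted = {
    (":ExampleFile1.R", ":hasTerm", ":visnetwork"),
    (":ExampleFile2.R", ":hasTerm", ":visnetwork"),
}


def infer_inverses(triples):
    """Return the asserted triples plus one inverse triple per matching predicate."""
    # owl:inverseOf works in both directions, so register each pair twice.
    inverse = dict(INVERSE_OF)
    inverse.update({v: k for k, v in INVERSE_OF.items()})
    inferred = {(o, inverse[p], s) for s, p, o in triples if p in inverse}
    return set(triples) | inferred


graph = infer_inverses(asserted)

# Query from the term side: which files is :visnetwork a term of?
files = sorted(o for s, p, o in graph
               if s == ":visnetwork" and p == ":isTermOf")
print(files)
```

Only the two hasTerm triples were asserted. The two isTermOf triples are derived, which is why the query from the term side finds both files.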

The utility of classifying keywords and bigrams as terms, then using hasTerm / isTermOf relationships, will be demonstrated in chapter 6, Finding Files.

A similar inferred relation for hasNext / hasPrevious will be used in chapter 5, Process and Code Flow.
