Shared SI Service




Service and Datasets

This is a public SPARQL endpoint driven by open source Virtuoso and accessible from https://shared.semantics.cancer.gov/sparql. It is meant to provide an integrated view of the terminology and data elements produced and/or published by the Cancer Data Standards Registry and Repository (caDSR) and Enterprise Vocabulary Services (EVS) projects in the Semantic Infrastructure group. We anticipate that eventually it will include various other datasets of utility to the NCI.

The endpoint is fronted by this website, which provides a SPARQL query editor for ad hoc queries (to facilitate testing queries and exploring data), documentation (this page), sample queries, and various file downloads.

The web server running this web site also examines queries sent to the endpoint for various issues. This has two consequences:

  1. The rules might in some cases prevent valid queries from executing.
    • If you believe that your query is valid, please contact us at NCIAppSupport@nih.gov. We will review your query and either rewrite it or edit our rules to allow it to run.
  2. The checks add to the response time. The overhead depends on the query itself and the size of the resultset, and ranges between 20 and 500 ms.
    • If this is an issue for you, and your application runs from within the NCI network or externally on behalf of an NCI program, please contact us at NCIAppSupport@nih.gov to be safelisted. We will contact you for additional information.

The current datasets in the endpoint are:

  • A subset of the data in caDSR, primarily the Data Elements sans some administrative information (no forms are included). Monthly XML exports from the caDSR are taken and converted to RDF and loaded into the quadstore; the monthly export taken for processing is from the first day of the month.
  • The NCI Thesaurus (NCIt) in a flat RDF representation. Monthly distributions of the NCIt in OWL DL are processed and the flat RDF generated. This RDF consists of a conversion of class expressions to simple assertions so that, with the exception of owl:Axioms, blank nodes are eliminated.

We are not currently making the RDF available but could periodically post a downloadable version to serve as an example of the dataset(s). Please let us know if this is of interest to you via email at NCIAppSupport@nih.gov.

In addition to the support email above, you can post requests or issues in the "shared-si-issues" umbrella GitHub repository created for this purpose. Some of the repositories for this project are private, so this issue repository covers all aspects of the project.



Data examples

Example Turtle representation of the caDSR data element ID 2936520

cmdr:DE2936520 rdfs:label 
       "Person Dual X-ray Absorptometry Body Composition Value" ;
   cmdr:short_name "PERS_DXA_BOD_COM_VAL" ;
   cmdr:publicId "2936520" ;
   a rdfs:Class ;
   isomdr:version "1" ;
   skos:definition "Body composition measured by Dual-energy X-ray 
       Absorptiometry (DXA) measured at 8 years and older. [Manually
       curated]_The numerical result of the determination of the 
       vertical force exerted by the mass of an individual as a 
       result of gravity." ;
   isomdr:registration_status "Application" ;
   isomdr:administration_status "RELEASED" ;
   skos:altLabel "Measurement of the study subject lean body mass 
       using Dual-energy X-ray Absorptiometry (DXA) measured at 8
       years and older";
   isomdr:VD_publicId "2179445" ;
   isomdr:VD_version "2" ;
   cmdr:value_domain_type "NonEnumerated" ;
   cmdr:value_domain_datatype "NUMBER" ;
   isomdr:DEC_Long_Name "Person Dual X-ray Absorptometry Body 
       Composition" ;
   isomdr:DEC_publicId "2925903" ;
   isomdr:Object_Class
      [ cmdr:main_concept ncit:C25190 ; cmdr:display_order "0" ] ;
   isomdr:Property
      [ cmdr:main_concept ncit:C53414 ; cmdr:display_order "0" ] ,
      [ cmdr:minor_concept ncit:C13041 ; cmdr:display_order "1" ] ,
      [ cmdr:minor_concept ncit:C48789 ; cmdr:display_order "2" ] .

[] a owl:Axiom ;
   owl:annotatedSource cmdr:DE2936520 ;
   owl:annotatedProperty skos:altLabel ;
   owl:annotatedTarget "Measurement of the study subject lean body 
       mass using Dual-energy X-ray Absorptiometry (DXA) measured 
       at 8 years and older" ;
   cmdr:alternate_name_type "Preferred Question Text" .

Example Turtle representation of the NCIt (rdf) concept C18369

ncit:C18369 rdf:type owl:Class ;
   rdfs:subClassOf ncit:C18340 ;
   ncit:R155 ncit:C13625 ;
   ncit:R37 ncit:C17133 ;
   ncit:R41 ncit:C14225 ;
   ncit:P102 "U02082" ;
   ncit:NHC0 "C18369" ;
   ncit:P208 "CL448410" ;
   ncit:P108 "Oncogene TIM" ;
   ncit:P90 "Oncogene TIM" ;
   ncit:P97 "Oncogene TIM encodes a predicted 60 kD protein containing
       a DBL homology domain, shared by several signal transducing 
       regulators of small GTP-binding proteins. TIM is thought to 
       control cytoskeletal organization through regulation of small 
       GTP-binding proteins. The human gene is located at 7q33-q35." ;
   ncit:P90 "Rho guanine nucleotide exchange factor 5 gene" ;
   ncit:P100 "600888" ;
   rdfs:label "Oncogene TIM" ;
   ncit:P90 "Transforming immortalized mammary oncogene" ,
      "Guanine nucleotide regulatory protein TIM gene" ;
   ncit:P106 "Gene or Genome" ;
   ncit:P366 "Oncogene_TIM" .

[ rdf:type owl:Axiom ;
   owl:annotatedSource ncit:C18369 ;
   owl:annotatedProperty ncit:P90 ;
   owl:annotatedTarget "Oncogene TIM" ;
   ncit:P383 "PT" ;
   ncit:P384 "NCI"
 ] .

  • The syntaxes of Turtle (for RDF) and SPARQL are very similar. Looking at the compact Turtle representation can be very valuable when writing a SPARQL query.
  • In the two examples above (caDSR first, NCIt second), line breaks have been introduced within the text values of several properties. These line breaks render the representations invalid Turtle; they are there only to increase readability in the page (escaped characters are allowed).
  • The . ; , (dot, semicolon, comma) noted in the Sample Queries page are used in the same manner in Turtle and in SPARQL (see Predicate Lists and Object Lists in the Turtle spec).
  • Blank nodes can be denoted with the [...] bracket notation.
    • The caDSR isomdr:Object_Class and isomdr:Property predicates show blank-node objects with the pattern [ p1 o1 ; p2 o2 ], and the owl:Axiom's subject is written as [].
    • The NCIt owl:Axiom is represented entirely within brackets.
    • Except for the owl:Axioms in the flat RDF representation of the NCIt, there are no other blank nodes. Arbitrary class expressions represented as blank nodes in OWL DL make it difficult to build queries generically.
  • Although the data might be loaded to the quadstore with a given set of prefixes (e.g. cmdr, isomdr), those prefixes are not persisted. You query the data using the prefixes that you declare (or with full IRIs).
  • Using prefixes, the bracket notation for blank nodes, and using path expressions in complex queries, can increase the readability and clarify the intent of the query.
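
The points above about prefixes and bracket notation can be sketched from a client's side as follows. This is only an illustration: the helper with_prefixes is hypothetical, and the mapping of the cmdr, isomdr and ncit labels to the caDSR, ISO 11179 and NCI Thesaurus namespaces listed on this page, as well as the use of the http://cbiit.nci.nih.gov/caDSR graph, are assumptions.

```python
# Prefix labels are arbitrary; the query only works if the IRIs are right.
# Assumption: these labels map to the namespaces listed on this page.
PREFIXES = {
    "cmdr": "http://cbiit.nci.nih.gov/caDSR#",
    "isomdr": "http://www.iso.org/11179/MDR#",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def with_prefixes(body: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Prepend one prefix declaration per entry, keeping the query on one line."""
    declarations = " ".join(
        f"prefix {name}: <{iri}>" for name, iri in prefixes.items()
    )
    return f"{declarations} {body}"


# Data elements whose object class has ncit:C25190 as its main concept.
# The [ ... ] pattern matches the blank node shown in the caDSR example.
QUERY = with_prefixes(
    "select ?de ?label "
    "from <http://cbiit.nci.nih.gov/caDSR> "
    "where { ?de rdfs:label ?label ; "
    "isomdr:Object_Class [ cmdr:main_concept ncit:C25190 ] } "
    "limit 100"
)
print(QUERY)
```

Because the prefixes are declared in the query itself, any other labels would work equally well as long as the IRIs stay the same.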


Graph Names used in this service (excluding internal graphs)

Graph Name
http://www.openlinksw.com/schemas/virtrdf#
http://www.w3.org/ns/ldp#
http://localhost:18890/sparql
http://localhost:18890/DAV/
http://www.w3.org/2002/07/owl#
http://cbiit.nci.nih.gov/srro
http://cbiit.nci.nih.gov/caDSR
http://www.geneontology.org/GO
http://ncim.nci.nih.gov/NCIMetathesaurus.rdf
http://ncicb.nci.nih.gov/xml/owl/EVS/ThesaurusInf.rdf

The following query returns the current graph names:

select distinct ?graph 
where { 
  graph ?graph { ?s ?p ?o } 
} limit 100
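
When the query above is requested as application/sparql-results+json, the graph names can be pulled out of the response with a few lines of client code. The sketch below assumes the standard SPARQL JSON results layout; the function name graph_names and the SAMPLE document are made up for illustration and are not actual output of this service.

```python
import json


def graph_names(results_json: str, var: str = "graph") -> list[str]:
    """Return the values bound to `var` in a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return [row[var]["value"] for row in doc["results"]["bindings"] if var in row]


# Hand-written sample in the SPARQL JSON results format (not real service output).
SAMPLE = json.dumps({
    "head": {"vars": ["graph"]},
    "results": {"bindings": [
        {"graph": {"type": "uri", "value": "http://cbiit.nci.nih.gov/caDSR"}},
        {"graph": {"type": "uri", "value": "http://www.geneontology.org/GO"}},
    ]},
})

print(graph_names(SAMPLE))
```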


Common prefixes already declared in this service

Prefix  IRI
dc      http://purl.org/dc/elements/1.1/
owl     http://www.w3.org/2002/07/owl#
rdf     http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs    http://www.w3.org/2000/01/rdf-schema#
skos    http://www.w3.org/2004/02/skos/core#
void    http://rdfs.org/ns/void#
xml     http://www.w3.org/XML/1998/namespace
xsd     http://www.w3.org/2001/XMLSchema#

Even though these common prefixes are "known" by this service, we recommend that you include prefixes in your queries. One exception is for the case mentioned further below, involving casting values from one numeric type to another using the xsd datatypes.



Namespaces referenced in caDSR RDF

Description         IRI
caDSR               http://cbiit.nci.nih.gov/caDSR#
ISO 11179           http://www.iso.org/11179/MDR#
MADS/RDF            http://www.loc.gov/mads/rdf/v1#
NCI Thesaurus       http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#
NCI Metathesaurus   http://ncim.nci.nih.gov/NCIMetathesaurus.owl#
MGED Ontology       http://mged.org/MGEDOntology#
NDF-RT              http://evs.nci.nih.gov/ftp1/NDF-RT/NDF-RT.owl#
CTCAE v5            http://ncicb.nci.nih.gov/xml/owl/EVS/ctcae5.owl#
LOINC               https://loinc.org/code#
SNOMED              http://snomed.info/id#
RadLex              http://radlex.org/RID/
caBIG               http://ncicb.nci.nih.gov/cabig#
MedDRA              https://identifiers.org/meddra#
Gene Ontology       http://purl.org/obo/owl/GO#


SPARQL coverage and protocol

The SPARQL Query forms select, ask, describe, and construct are all supported. SPARQL Update and the Graph Store HTTP Protocol are not supported.

  • HTTP GET and URL-encoded POST are supported; the table below only shows information for POST requests.
  • SPARQL queries must be encoded in UTF-8.
  • The data in this service is Unicode, so your Accept content type should include charset=utf-8.
  • Content-Type in POST request: application/x-www-form-urlencoded
  • Parameters: query is required. Additional HTTP request parameters can be included, either from the spec (e.g. default-graph-uri) or specific to the service (e.g. maxrows). For the most part this information can be passed as part of the query itself, and we are considering not supporting these parameters in the future.
Query form    Accept Content-Type
select
  • application/sparql-results+xml
  • application/sparql-results+json
  • text/tab-separated-values
  • text/csv
  • text/html
ask
  • application/sparql-results+xml
  • application/sparql-results+json
  • text/tab-separated-values
  • text/csv
  • text/html
describe
  • text/turtle
  • text/rdf+n3
  • application/rdf+xml
construct
  • text/turtle
  • text/rdf+n3
  • application/rdf+xml
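
The POST settings listed above can be assembled with the Python standard library as in the sketch below. It only builds the request object and does not contact the service; build_request is a hypothetical helper name, and appending "; charset=utf-8" to the Accept value is one reading of the charset recommendation above.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://shared.semantics.cancer.gov/sparql"


def build_request(query: str,
                  accept: str = "application/sparql-results+json") -> Request:
    """Build a URL-encoded POST request carrying the required `query` parameter."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": f"{accept}; charset=utf-8",
        },
    )


request = build_request(
    "select distinct ?graph where { graph ?graph { ?s ?p ?o } } limit 100"
)
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```

For describe or construct queries, pass one of the RDF content types from the table, e.g. build_request(query, accept="text/turtle").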


Do, Don't, Hints

  • Do not include comments or carriage returns in your query when accessing the endpoint directly. Comments can be included in the SPARQL query editor, but only up to a point: the query could still wind up being rejected by this service.

  • Include prefixes in your query, but do not declare a prefix for XMLSchema if you need to cast a value to a float (this is a bug). In that case, rely on the xsd prefix already declared by this service, e.g.
    
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    select ?labelA ?labelB ( xsd:float(strlen(?labelA)) / xsd:float(strlen(?labelB)) as ?fx)
    from ...
    
  • Identify the datasets (graphs) that you want to query using the FROM, FROM NAMED or GRAPH keywords, for instance
    
    select ?x ?y ?z
    from <graphname>
    where ...
    
  • Include a limit on the resultset no matter how small you expect it to be, e.g.
    
    select ?x ?y ?z
    from <graphname>
    where { ?x ?y ?z }
    limit 1000
    
    Use of the limit keyword is not currently being enforced, but it will be in the future.
    • If there's a need to generate a large resultset (>10000 rows), you can page with order by... limit... offset...
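
Two of the hints above (no comments or line breaks when calling the endpoint directly, and paging large resultsets) can be handled on the client side roughly as follows. This is a sketch under assumptions: flatten and paged are hypothetical helper names, triple-quoted SPARQL strings and backslash-escaped '#' in prefixed names are not handled, and the base query passed to paged is expected to already contain an order by clause.

```python
import re

# IRIs and quoted strings are copied through untouched (they may contain '#');
# runs of whitespace and '#' comments collapse to a single space.
_TOKEN = re.compile(
    r"""(<[^<>"{}|^`\\\s]*>)"""      # IRI reference
    r"""|("(?:[^"\\\n]|\\.)*")"""    # double-quoted string
    r"""|('(?:[^'\\\n]|\\.)*')"""    # single-quoted string
    r"""|((?:\s|#[^\n]*)+)"""        # whitespace and comments
)


def flatten(query: str) -> str:
    """Return the query on a single line with comments removed."""
    def keep_or_space(match: re.Match) -> str:
        return " " if match.group(4) is not None else match.group(0)
    return _TOKEN.sub(keep_or_space, query).strip()


def paged(base_query: str, page_size: int, max_rows: int):
    """Yield one query per page by appending limit/offset to an ordered query."""
    flat = flatten(base_query)
    for offset in range(0, max_rows, page_size):
        limit = min(page_size, max_rows - offset)
        yield f"{flat} limit {limit} offset {offset}"


EXAMPLE = """
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# find labels
select ?s ?l
where { ?s rdfs:label ?l . filter(?l != "a # b") }
order by ?s
"""

for page_query in paged(EXAMPLE, page_size=1000, max_rows=2500):
    print(page_query)
```

A client would typically stop requesting pages as soon as one comes back with fewer rows than the page size.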