Shared SI Service
Service and Datasets
This is a public SPARQL endpoint driven by open source Virtuoso and accessible from https://shared.semantics.cancer.gov/sparql. It is meant to provide an integrated view of the terminology and data elements produced and/or published by the Cancer Data Standards Registry and Repository (caDSR) and Enterprise Vocabulary Services (EVS) projects in the Semantic Infrastructure group. We anticipate that eventually it will include various other datasets of utility to the NCI.
The endpoint is fronted by this website, which provides a SPARQL query editor for ad hoc queries (to facilitate query testing and data exploration), documentation (this page), sample queries, and various file downloads.
The web server running this website also examines queries sent to the endpoint for various issues. This has two consequences:
- The rules might in some cases prevent valid queries from executing.
  - If you believe that your query is valid, please contact us at NCIAppSupport@nih.gov. We will review your query and either rewrite it or edit our rules to allow it to run.
- The examination adds to the response time. The overhead depends on the query itself and the size of the resultset, and can be between 20 and 500 msecs.
  - If this is an issue for you, and your application runs from within the NCI network or externally on behalf of an NCI program, please contact us at NCIAppSupport@nih.gov to be safelisted. We will contact you for additional information.
The current datasets in the endpoint are:
- A subset of the data in caDSR, primarily the Data Elements sans some administrative information (no forms are included). Monthly XML exports from the caDSR are taken and converted to RDF and loaded into the quadstore; the monthly export taken for processing is from the first day of the month.
- The NCI Thesaurus (NCIt) in a flat RDF representation. Monthly distributions of the NCIt in OWL DL are processed and the flat RDF generated. This RDF consists of a conversion of class expressions to simple assertions so that, with the exception of owl:Axioms, blank nodes are eliminated.
We are not currently making the RDF available but could periodically post a downloadable version to serve as an example of the dataset(s). Please let us know if this is of interest to you via email at NCIAppSupport@nih.gov.
In addition to the support email above, you can post requests or issues in the "shared-si-issues" umbrella GitHub repository created for this purpose. Some of the repositories for this project are private, so this issue repository covers all aspects of the project.
Data examples
Example Turtle representation of the caDSR data element ID 2936520
cmdr:DE2936520 rdfs:label
"Person Dual X-ray Absorptometry Body Composition Value" ;
cmdr:short_name "PERS_DXA_BOD_COM_VAL" ;
cmdr:publicId "2936520" ;
a rdfs:Class ;
isomdr:version "1" ;
skos:definition "Body composition measured by Dual-energy X-ray
Absorptiometry (DXA) measured at 8 years and older. [Manually
curated]_The numerical result of the determination of the
vertical force exerted by the mass of an individual as a
result of gravity." ;
isomdr:registration_status "Application" ;
isomdr:administration_status "RELEASED" ;
skos:altLabel "Measurement of the study subject lean body mass
using Dual-energy X-ray Absorptiometry (DXA) measured at 8
years and older";
isomdr:VD_publicId "2179445" ;
isomdr:VD_version "2" ;
cmdr:value_domain_type "NonEnumerated" ;
cmdr:value_domain_datatype "NUMBER" ;
isomdr:DEC_Long_Name "Person Dual X-ray Absorptometry Body
Composition" ;
isomdr:DEC_publicId "2925903" ;
isomdr:Object_Class
[ cmdr:main_concept ncit:C25190 ; cmdr:display_order "0" ] ;
isomdr:Property
[ cmdr:main_concept ncit:C53414 ; cmdr:display_order "0" ] ,
[ cmdr:minor_concept ncit:C13041 ; cmdr:display_order "1" ] ,
[ cmdr:minor_concept ncit:C48789 ; cmdr:display_order "2" ] .
[] a owl:Axiom ;
owl:annotatedSource cmdr:DE2936520 ;
owl:annotatedProperty skos:altLabel ;
owl:annotatedTarget "Measurement of the study subject lean body
mass using Dual-energy X-ray Absorptiometry (DXA) measured
at 8 years and older" ;
cmdr:alternate_name_type "Preferred Question Text" .
Example Turtle representation of the NCIt (rdf) concept C18369
ncit:C18369 rdf:type owl:Class ;
rdfs:subClassOf ncit:C18340 ;
ncit:R155 ncit:C13625 ;
ncit:R37 ncit:C17133 ;
ncit:R41 ncit:C14225 ;
ncit:P102 "U02082" ;
ncit:NHC0 "C18369" ;
ncit:P208 "CL448410" ;
ncit:P108 "Oncogene TIM" ;
ncit:P90 "Oncogene TIM" ;
ncit:P97 "Oncogene TIM encodes a predicted 60 kD protein containing
a DBL homology domain, shared by several signal transducing
regulators of small GTP-binding proteins. TIM is thought to
control cytoskeletal organization through regulation of small
GTP-binding proteins. The human gene is located at 7q33-q35." ;
ncit:P90 "Rho guanine nucleotide exchange factor 5 gene" ;
ncit:P100 "600888" ;
rdfs:label "Oncogene TIM" ;
ncit:P90 "Transforming immortalized mammary oncogene" ,
"Guanine nucleotide regulatory protein TIM gene" ;
ncit:P106 "Gene or Genome" ;
ncit:P366 "Oncogene_TIM" .
[ rdf:type owl:Axiom ;
owl:annotatedSource ncit:C18369 ;
owl:annotatedProperty ncit:P90 ;
owl:annotatedTarget "Oncogene TIM" ;
ncit:P383 "PT" ;
ncit:P384 "NCI"
] .
- The Turtle syntax for RDF and the SPARQL syntax are very similar. Looking at the compact Turtle representation can be very valuable when writing a SPARQL query.
- In the two examples above (caDSR first, NCIt second), line breaks have been introduced within the text values of several properties. These line breaks make the representations invalid Turtle, but they are there to increase readability on the page (escaped characters are allowed).
- The . ; , (dot, semicolon, comma) noted in the Sample Queries page are used in the same manner in Turtle and in SPARQL (see Predicate Lists and Object Lists in the Turtle spec).
- Blank nodes can be denoted with the [...] bracket notation.
  - The caDSR isomdr:Object_Class and isomdr:Property predicates show the blank node objects with the pattern [ p1 o1 ; p2 o2 ], and the owl:Axiom's subject as [].
  - The NCIt owl:Axiom is represented entirely within brackets.
  - Except for the owl:Axioms in the flat RDF representation of the NCIt, there are no other blank nodes. Arbitrary class expressions in OWL DL blank nodes make it difficult to build queries generically.
- Although the data might be loaded to the quadstore with a given set of prefixes (e.g. cmdr, isomdr), those prefixes are not persisted. You query the data using the prefixes that you declare (or with full IRIs).
- Using prefixes, the bracket notation for blank nodes, and path expressions in complex queries can increase the readability and clarify the intent of the query.
Graph Names used in this service (excluding internal graphs)
Graph Name |
---|
http://www.openlinksw.com/schemas/virtrdf# |
http://www.w3.org/ns/ldp# |
http://localhost:18890/sparql |
http://localhost:18890/DAV/ |
http://www.w3.org/2002/07/owl# |
http://cbiit.nci.nih.gov/srro |
http://cbiit.nci.nih.gov/caDSR |
http://www.geneontology.org/GO |
http://ncim.nci.nih.gov/NCIMetathesaurus.rdf |
http://ncicb.nci.nih.gov/xml/owl/EVS/ThesaurusInf.rdf |
The following query returns the current graph names:
select distinct ?graph
where {
graph ?graph { ?s ?p ?o }
} limit 100
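As a rough illustration of consuming the result of the query above, the sketch below extracts one column from a response in the W3C SPARQL 1.1 Query Results JSON Format. The sample document is hand-written for illustration (using two graph names from the table above) and is not captured output from this service; the helper name `column` is hypothetical.

```python
import json

# Hand-written sample shaped like the W3C SPARQL 1.1 Query Results JSON
# Format. It is NOT real output from this service.
sample_response = """
{
  "head": {"vars": ["graph"]},
  "results": {"bindings": [
    {"graph": {"type": "uri", "value": "http://cbiit.nci.nih.gov/caDSR"}},
    {"graph": {"type": "uri", "value": "http://www.geneontology.org/GO"}}
  ]}
}
"""


def column(results_json, variable):
    """Return the values bound to one variable, skipping unbound rows."""
    doc = json.loads(results_json)
    return [
        row[variable]["value"]
        for row in doc["results"]["bindings"]
        if variable in row
    ]


graphs = column(sample_response, "graph")
print(graphs)
```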
Common prefixes already declared in this service
Prefix | IRI |
---|---|
dc | http://purl.org/dc/elements/1.1/ |
owl | http://www.w3.org/2002/07/owl# |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs | http://www.w3.org/2000/01/rdf-schema# |
skos | http://www.w3.org/2004/02/skos/core# |
void | http://rdfs.org/ns/void# |
xml | http://www.w3.org/XML/1998/namespace |
xsd | http://www.w3.org/2001/XMLSchema# |
Even though these common prefixes are "known" by this service, we recommend that you include prefixes in your queries. One exception is for the case mentioned further below, involving casting values from one numeric type to another using the xsd datatypes.
Namespaces referenced in caDSR RDF
Description | IRI |
---|---|
caDSR | http://cbiit.nci.nih.gov/caDSR# |
ISO 11179 | http://www.iso.org/11179/MDR# |
MADS/RDF | http://www.loc.gov/mads/rdf/v1# |
NCI Thesaurus | http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl# |
NCI Metathesaurus | http://ncim.nci.nih.gov/NCIMetathesaurus.owl# |
MGED Ontology | http://mged.org/MGEDOntology# |
NDF-RT | http://evs.nci.nih.gov/ftp1/NDF-RT/NDF-RT.owl# |
CTCAE v5 | http://ncicb.nci.nih.gov/xml/owl/EVS/ctcae5.owl# |
LOINC | https://loinc.org/code# |
SNOMED | http://snomed.info/id# |
RadLex | http://radlex.org/RID/ |
caBIG | http://ncicb.nci.nih.gov/cabig# |
MedDRA | https://identifiers.org/meddra# |
Gene Ontology | http://purl.org/obo/owl/GO# |
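Because prefixes are not persisted with the data, each query has to declare the ones it uses. One possible way to do that is sketched below: a small helper that prepends single-line prefix declarations to a query. The helper name `with_prefixes` is hypothetical, and the mapping of the labels `cmdr` and `isomdr` to the caDSR and ISO 11179 namespaces in the table above is an assumption, as is the choice of the caDSR graph name in the sample query.

```python
# Prefix label -> namespace IRI. The IRIs come from the tables on this page;
# pairing "cmdr" with caDSR and "isomdr" with ISO 11179 is an assumption.
NAMESPACES = {
    "cmdr": "http://cbiit.nci.nih.gov/caDSR#",
    "isomdr": "http://www.iso.org/11179/MDR#",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def with_prefixes(query, prefixes, namespaces=NAMESPACES):
    """Prepend 'prefix p: <iri>' declarations, keeping the query on one line."""
    prologue = " ".join(f"prefix {p}: <{namespaces[p]}>" for p in prefixes)
    return f"{prologue} {query}"


# Look up the label of the data element shown in the Turtle example.
query = with_prefixes(
    'select ?de ?label from <http://cbiit.nci.nih.gov/caDSR> '
    'where { ?de cmdr:publicId "2936520" ; rdfs:label ?label } limit 10',
    ["cmdr", "rdfs"],
)
print(query)
```

Only the prefixes a query actually uses are passed in, which also makes it easy to leave out a declaration on purpose.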
SPARQL coverage and protocol
The SPARQL Query forms select, ask, describe, and construct are all supported. SPARQL Update and the Graph Store HTTP Protocol are not supported.
- HTTP GET and URL-encoded POST are supported; the table below only shows information for POST requests.
- SPARQL queries must be encoded in UTF-8.
- The data in this service is Unicode, so your Accept content type should include charset=utf-8.
- Content-Type in POST request: application/x-www-form-urlencoded
- Parameters: query is required. There are additional HTTP request parameters in the spec that can be included, e.g. default-graph-uri, or specific to the service, e.g. maxrows. For the most part this information can be passed as part of the query, and we are considering not supporting these additional parameters in the future.
Query form | Accept Content-Type |
---|---|
select | |
ask | |
describe | |
construct | |
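The POST conventions described above can be sketched with the Python standard library. This is a minimal illustration that builds the request without sending it. The function name `build_post_request` is hypothetical, and the Accept value `application/sparql-results+json` (the standard SPARQL 1.1 JSON results media type) is assumed here to be accepted by the service for select queries.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

ENDPOINT = "https://shared.semantics.cancer.gov/sparql"


def build_post_request(
    query, accept="application/sparql-results+json; charset=utf-8"
):
    """Build (but do not send) a URL-encoded POST request for a query."""
    # Collapse every run of whitespace, including line breaks, into a single
    # space. Caveats: this also changes whitespace inside string literals,
    # and any comments must be removed from the query beforehand.
    single_line = " ".join(query.split())
    body = urlencode({"query": single_line}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": accept,
        },
        method="POST",
    )


request = build_post_request("""
select distinct ?graph
where { graph ?graph { ?s ?p ?o } }
limit 100
""")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without network access.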
Do, Don't, Hints
- Do not include comments or carriage returns in your query when accessing the endpoint directly. Comments can be included in the SPARQL query editor, but only up to a point; beyond that, the query could wind up being rejected by this service.
- Include prefixes in your query, but do not include a prefix for XMLSchema if you need to cast a value to a float (this is a bug). For example, declare
  prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  but leave out
  prefix xsd: <https://www.w3.org/2001/XMLSchema#>
  in a query such as
  select ?labelA ?labelB ( xsd:float(strlen(?labelA)) / xsd:float(strlen(?labelB)) as ?fx) from ...
- Identify the datasets (graphs) that you want to query using the FROM, FROM NAMED or GRAPH keywords, for instance
  select ?x ?y ?z from <graphname> where ...
- Include a limit on the resultset no matter how small you expect it to be, e.g.
  select ?x ?y ?z from <graphname> where { ?x ?y ?z } limit 1000
  Use of the limit keyword is not currently being enforced, but it will be in the future.
  - If there's a need to generate a large resultset (>10000 rows), you can page with order by... limit... offset...
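The paging hint can be illustrated with a short sketch that generates one query per page by appending order by, limit, and offset clauses. The function name `paged_queries` and its parameters are hypothetical, and `<graphname>` is the same placeholder used in the hints above.

```python
def paged_queries(base_query, order_by, page_size=10000, max_pages=3):
    """Yield base_query with order by / limit / offset appended per page."""
    for page in range(max_pages):
        offset = page * page_size
        # A stable ordering is needed so that consecutive pages do not
        # overlap or skip rows.
        yield (
            f"{base_query} order by {order_by} "
            f"limit {page_size} offset {offset}"
        )


base = "select ?x ?y ?z from <graphname> where { ?x ?y ?z }"
for paged in paged_queries(base, "?x ?y ?z", page_size=1000, max_pages=3):
    print(paged)
```

In practice a client would likely stop as soon as a page comes back with fewer rows than the page size, rather than rely on a fixed page count.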