Simple Format

The simple format is a simplified, relational view into each terminology, mapset, or subset. It is a series of pipe-separated, UTF-8 text files that represent different aspects of the content. For each file, the columns are documented below. An asterisk (*) marks a required column, the data type (STRING, INTEGER, or BOOLEAN) is shown in parentheses, and PK marks the primary key.
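
As a minimal sketch of how one of these files might be read, the Python below splits each line on the pipe character and converts BOOLEAN columns. The function name read_simple_file is hypothetical, and two behaviors are assumptions rather than part of the format definition: fields are taken to contain no escaped pipe characters, and a first line that exactly matches the column names is treated as a header row and skipped. The sample rows are invented.

```python
import io


def read_simple_file(stream, columns, boolean_columns=()):
    """Parse pipe-separated lines into dicts keyed by column name.

    Assumptions (not stated by the format itself): fields never contain a
    literal pipe, and a first line equal to the column names is a header.
    """
    rows = []
    for line_number, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("|")
        if line_number == 1 and fields == list(columns):
            continue  # optional header row
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} fields, got {len(fields)}"
            )
        row = dict(zip(columns, fields))
        for name in boolean_columns:
            row[name] = row[name].lower() == "true"
        rows.append(row)
    return rows


# Invented sample data in the concepts.txt layout; semanticType may be empty.
sample = io.StringIO(
    "C001|true|Disorder|Example disorder\n"
    "C002|false||Retired example\n"
)
concepts = read_simple_file(
    sample,
    ["code", "active", "semanticType", "conceptName"],
    boolean_columns=["active"],
)
```

For a file on disk, the same function could be given open(path, encoding="utf-8") in place of the in-memory sample.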

Terminology (or code system) Files

  • concepts.txt - code|active|semanticType|conceptName

    • This file has one entry per code

    • *code (STRING, PK) = the concept code

    • *active (BOOLEAN) = true if active, false if not

    • semanticType (STRING) = the high level semantic type of the concept (related to “domain” or “entityType” but not exactly). It may be empty for things like very high level concepts in the hierarchy for some terminologies.

    • *conceptName (STRING) = the preferred name of the concept

  • attributes.txt - code|attributeName|attributeValue

    • One line for each concept attribute/property.

    • *code (STRING) = the concept code

    • *attributeName (STRING) = the attribute/property name

    • *attributeValue (STRING) = the attribute/property value

  • metadata.txt - abbreviation|description

    • Entries for expansions of abbreviated values that may be used in the other files. This file is informational: it provides extra context, in a more human-readable form, about what values such as term types and attribute names mean. Values are only provided where they are sourced from the original terminology.

    • *abbreviation (STRING, PK) = an abbreviation for a value used in one of the other files, e.g. “PT”

    • *description (STRING) = the description (or expanded form) of the abbreviation, e.g. “Preferred Term”

  • parChd.txt - parentCode|childCode

    • One entry for each parent-child relationship

    • Some terminologies have “poly-hierarchy” which means individual nodes can have multiple parents. When this occurs, a transitive closure computation of the “tree positions” produces multiple entries for the same concept/code.

    • *parentCode (STRING)= code of the parent concept

    • *childCode (STRING)= code of the child concept

  • relationships.txt - fromCode|defining|group|type|additionalType|toCode

    • One entry per non parent/child relationship. These are lateral relationships in the terminology that may be part of a “description logic” definition of the content or simply represent other kinds of associations between codes. Inactive relationships are NOT loaded into TermHub.

    • *fromCode (STRING) = the code on the left-hand-side of the relationship (the “from” or the “source”)

    • defining (BOOLEAN) = a true/false indicator of whether this is part of a logical definition for the concept

    • group (STRING) = used primarily by SNOMEDCT terminologies to “group” relationships together. For example, a SNOMEDCT concept may have a “finding site” and a “morphology” relationship that are bound to each other, asserting that a “finding site” of “kidney” has a morphology of “lesion”. For concepts that express multiple finding sites, the group links each morphological abnormality with its particular finding site.

    • *type (STRING) = a high level type of the relationship, like “other” or “broader” or “narrower”.

    • additionalType (STRING) = a more specific relationship/association type.

    • *toCode (STRING) = the code on the right-hand-side of the relationship (the “to” or the “target”)

  • terms.txt - code|termId|active|language|termType|term

    • Each line represents one of the terms associated with a concept. Every name associated with the concept can be found in this file, including one entry that matches the name shown in concepts.txt. Attributes for these entries can be found in termAttributes.txt.

    • *code (STRING) = the concept code

    • *termId (STRING, PK) = the id of this entry, so it can be linked to in termAttributes.txt

    • *active (BOOLEAN) = true if active, false if not

    • *language (STRING) = the language code for the synonym, typically “en”

    • *termType (STRING) = this is a value that indicates the type of term. This may be a value like “PT” for (preferred term) or “SY” (for synonym), or may be a more complex value. This information is derived from original source data and has to be interpreted/conceptualized in that context. There is no overarching “standard” model for term types. However, this field is useful to understand how to use the synonym.

    • *term (STRING) = the actual designation/name/term/display of the term.

  • termAttributes.txt - code|termId|attributeName|attributeValue

    • Each line represents an attribute of one of the synonym entries (and there may be multiple attributes). What entries exist here depends on the nature of the terminology and how it defines attributes on synonyms.

    • *code (STRING) = the concept code of the term. This field is denormalized but makes it easy to “grep” for a code across the simple files and see all data associated with it.

    • *termId (STRING) = the id of this entry, so it can be linked from terms.txt

    • *attributeName (STRING) = the attribute/property name

    • *attributeValue (STRING) = the attribute/property value

  • version.txt - abbreviation|description|uri|oid|version|releaseDate

    • Contains a header with a single line for terminology metadata

    • *abbreviation (STRING, PK) = a simple abbreviation for the terminology, e.g. “SNOMEDCT_US” used within TermHub to identify the terminology

    • *description (STRING) = a description of the terminology

    • uri (STRING) = the FHIR code system URI for the terminology

    • oid (STRING) = the FHIR code system OID for the terminology

    • *version (STRING) = the version for this terminology

    • *releaseDate (STRING) = the release date always represented as YYYY-MM-DD. This could be represented as a date but is intended to be alpha sortable to determine whether a version is newer than a previously loaded one.
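
The poly-hierarchy note for parChd.txt can be illustrated with a short sketch. The Python below takes parentCode|childCode rows and enumerates every root-to-code path, so a code with more than one parent ends up with more than one “tree position”. The function name tree_positions and the sample codes are invented for illustration, the hierarchy is assumed to be acyclic, and this is not necessarily how TermHub computes tree positions.

```python
def tree_positions(par_chd_rows):
    """Return {code: [path, ...]}, each path a tuple of codes from a root to the code.

    Assumes the parent-child graph has no cycles.
    """
    parents = {}
    codes = set()
    for parent, child in par_chd_rows:
        parents.setdefault(child, []).append(parent)
        codes.update((parent, child))

    cache = {}

    def paths_to(code):
        if code not in cache:
            own_parents = parents.get(code, [])
            if not own_parents:
                cache[code] = [(code,)]  # a root has exactly one position
            else:
                cache[code] = [
                    path + (code,)
                    for parent in own_parents
                    for path in paths_to(parent)
                ]
        return cache[code]

    return {code: paths_to(code) for code in sorted(codes)}


# Invented parChd.txt content: D has two parents, B and C.
rows = [line.split("|") for line in "A|B\nA|C\nB|D\nC|D".splitlines()]
positions = tree_positions(rows)
for path in positions["D"]:
    print(" > ".join(path))
# prints:
# A > B > D
# A > C > D
```

Recursion keeps the sketch short; a very deep hierarchy might call for an iterative traversal instead.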

Mapset (or concept map) Files

  • mapsets.txt - code|fromTerminology|toTerminology|name|description|mappingsFile

    • This file has one entry per mapset. More than one entry is allowed because we use this format to import data into TermHub, but in the export context there will only ever be one line in this file.

    • *code (STRING, PK) = the mapset code

    • *fromTerminology (STRING) = the abbreviation of the terminology for the “from” or “source” side of the map

    • *toTerminology (STRING) = the abbreviation of the terminology for the “to” or “destination” side of the map

    • *name (STRING) = the name of the mapset

    • *description (STRING) = a description of the mapset

    • *mappingsFile (STRING) = the filename for the mappings of this mapset. In exported simple mapsets this value will always be “mappings.txt”

  • mappings.txt - fromCode|toCode

    • One line per mapping

    • *fromCode (STRING) = the code of the concept being mapped “from”. The mapsets.txt file will indicate which terminology is involved via the fromTerminology field.

    • *toCode (STRING) = the code of the concept being mapped “to”. The mapsets.txt file will indicate which terminology is involved via the toTerminology field.

  • mappings.json - not a relational file, but this is created because “complex” mappings may have more information than the simple format can support and this is a way to leverage that while keeping mappings.txt as simple as possible.

  • version.txt - abbreviation|description|uri|oid|version|releaseDate

    • Contains a header with a single line for mapset metadata. The format is identical to that of terminology downloads, to maintain consistency.

    • *abbreviation (STRING, PK) = a simple abbreviation for the mapset, e.g. “SNOMEDCT_US-ICD10CM” used within TermHub to identify the mapset

    • *description (STRING) = a description of the mapset

    • uri (STRING) = the FHIR code system URI for the mapset

    • oid (STRING) = the FHIR code system OID for the mapset

    • *version (STRING) = the version for this mapset

    • *releaseDate (STRING) = the release date always represented as YYYY-MM-DD. This could be represented as a date but is intended to be alpha sortable to determine whether a version is newer than a previously loaded one.
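
Two points from the mapset files lend themselves to a short sketch: mappings.txt rows only become unambiguous once combined with the fromTerminology and toTerminology values in mapsets.txt, and releaseDate values in YYYY-MM-DD form can be compared as plain strings. The function names qualify_mappings and is_newer, and all sample values, are hypothetical; rows are assumed to be already parsed into dicts keyed by column name.

```python
def qualify_mappings(mapset_row, mapping_rows):
    """Attach the source and target terminology abbreviations to each mapping."""
    return [
        {
            "fromTerminology": mapset_row["fromTerminology"],
            "fromCode": mapping["fromCode"],
            "toTerminology": mapset_row["toTerminology"],
            "toCode": mapping["toCode"],
        }
        for mapping in mapping_rows
    ]


def is_newer(candidate_release_date, loaded_release_date):
    """Compare two YYYY-MM-DD strings; lexical order matches date order."""
    return candidate_release_date > loaded_release_date


# Invented rows in the mapsets.txt and mappings.txt layouts.
mapset = {
    "code": "EXAMPLE-MAP",
    "fromTerminology": "SRC",
    "toTerminology": "TGT",
    "name": "Example map",
    "description": "Invented mapset for illustration",
    "mappingsFile": "mappings.txt",
}
mappings = [
    {"fromCode": "100", "toCode": "X1"},
    {"fromCode": "200", "toCode": "X2"},
]
qualified = qualify_mappings(mapset, mappings)
should_load = is_newer("2024-03-01", "2023-09-01")
```

The string comparison only works because the year, month, and day are zero-padded and ordered from most to least significant, which is the reason the field is kept alpha sortable.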

Subset (or value set) Files

  • subsets.txt - code|name|description|membersFile

    • This file has one entry per subset. More than one entry is allowed because we use this format to import data into TermHub, but in the export context there will only ever be one line in this file.

    • *code (STRING, PK) = the subset code

    • *name (STRING) = the name of the subset

    • *description (STRING) = a description of the subset

    • *membersFile (STRING) = the filename for the members of this subset. In exported simple subsets this value will always be “members.txt”

  • members.txt - terminology|code

    • *terminology (STRING) = the abbreviation of the terminology for subset member code. This field exists because subsets can reference members from multiple terminologies

    • *code (STRING) = the subset member code

  • members.json - not a relational file, but this is created because subset members may have more information than the simple format can support and this is a way to leverage that while keeping members.txt as simple as possible.

  • version.txt - abbreviation|description|uri|oid|version|releaseDate

    • Contains a header with a single line for subset metadata. The format is identical to that of terminology downloads, to maintain consistency.

    • *abbreviation (STRING) = a simple abbreviation for the subset, e.g. “SNOMEDCT-CORE” used within TermHub to identify the subset

    • *description (STRING) = a description of the subset

    • uri (STRING) = the FHIR code system URI for the subset

    • oid (STRING) = the FHIR code system OID for the subset

    • *version (STRING) = the version for this subset

    • *releaseDate (STRING) = the release date always represented as YYYY-MM-DD. This could be represented as a date but is intended to be alpha sortable to determine whether a version is newer than a previously loaded one.
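
Because a subset can reference members from multiple terminologies, a consumer of members.txt will often want to group codes by the terminology column first. The Python below is one possible sketch of that step. The function name members_by_terminology and the sample terminology abbreviations and codes are invented, and fields are assumed to contain no pipe characters.

```python
from collections import defaultdict


def members_by_terminology(lines):
    """Group member codes from terminology|code lines by terminology."""
    grouped = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        terminology, code = line.split("|")
        grouped[terminology].add(code)
    return dict(grouped)


# Invented members.txt content spanning two terminologies.
sample_lines = [
    "TERM_A|100",
    "TERM_A|200",
    "TERM_B|X1",
]
grouped = members_by_terminology(sample_lines)
```

Using a set per terminology means a code listed twice for the same terminology is only kept once; a list could be used instead if duplicates or ordering matter.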