Semantics in action: a guide for representing clinical data elements with SNOMED CT

The competency building framework and rule creation cycle have been applied over the past 4 years [2], resulting in the following set of rules. They apply first to fully understanding the data element’s meaning, then to representing that meaning accurately (Fig. 2). The rules are grouped into four categories, each containing two subrules: 1) Context, 2) Single concepts first, 3) Approved post-coordination (PC), and 4) Extending post-coordination. Semantic accuracy is the foundation of our strategy and the intended outcome of the guide’s application. This principle underlies all the following rules, and examples of how semantic inaccuracy can lead to loss of meaning are given throughout. Most of the rules address more complex tasks and therefore mainly concern PC. The four categories are applied in order of priority, so that the simplest expression with the fewest codes is used and representations remain consistent. A quick guide summarizing the rules and a detailed example of the guide’s application are available in Appendix A.
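The priority order can be pictured as a chain of fallbacks in which each category is tried only when the previous one fails. The Python sketch below is a minimal illustration under stated assumptions: `represent`, the lookup tables and the placeholder steps are hypothetical, the Context rules are assumed to have been applied already, and real steps would involve terminology searches and human review rather than dictionaries.

```python
from typing import Callable, Optional

# A step receives an already disambiguated data element and returns an
# SCT expression string, or None when it cannot represent the element.
Step = Callable[[str], Optional[str]]


def represent(element: str, steps: list[tuple[str, Step]]) -> tuple[str, str]:
    """Try the rule categories in priority order; stop at the first success."""
    for category, step in steps:
        expression = step(element)
        if expression is not None:
            return category, expression
    # Nothing fits: flag for review instead of forcing a less precise code.
    raise LookupError(f"no representation found for {element!r}")


# Hypothetical lookup tables standing in for real terminology searches.
SINGLE = {"reduced visual acuity": "13164000 |Reduced visual acuity (finding)|"}
MULTI_FOCUS = {
    "post-infectious reactionary bacterial arthritis": (
        "12913305 |Post-bacterial arthropathy (disorder)| + "
        "239783001 |Post-infective arthritis (disorder)|"
    ),
}

STEPS: list[tuple[str, Step]] = [
    ("single concept", SINGLE.get),
    ("multiple focus", MULTI_FOCUS.get),
    ("refinement", lambda element: None),  # placeholder for MRCM-guided refinement
    ("extension", lambda element: None),   # placeholder for extended attributes
]

print(represent("post-infectious reactionary bacterial arthritis", STEPS)[0])
```

Because each step returns `None` on failure, a simpler category always wins over a more complex one whenever both could represent the element.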

Fig. 2

A summary of the rules and their intended sequence. Once the context has been understood (steps 1 and 2), the corresponding SCT expression is derived from one of the subsequent steps, depending on the complexity of the data element.

Context

Context encompasses both the clinical acquisition context, i.e., patient and medical service characteristics, and the structural context, i.e., how fields are organized in a clinical form. Both are equally important. When an unclear data element is encountered, its context must be examined to define it before it is represented.

Structural context

When representing fields from clinical forms, the following situation was encountered in the data warehouse (DWH): a field in a form called “PF_0886”, with an id of “FORMIDOC_90” and a label of “suites” (roughly, “what happens next”). Without further investigation, this field could be incorrectly represented as “308273005 |Follow-up status (finding)|”. However, after opening the EHR and locating the correct form (Obstetrics service, Pregnancy record) and section (Post Partum), it can be correctly understood and represented (Fig. 3).

Fig. 3

Complete understanding achieved through visualisation of the data element’s structural context (translated from French)

Sometimes a data element seems simple to represent but is ambiguous or lacks detail, and context is needed to disambiguate it. The label “breast”, for example, appears in two different forms, once as a checkbox and once as an entry in a drop-down list, with a different meaning in each. It is tempting to represent both as “76752008 |Breast structure (body structure)|”, but that would be incorrect (Fig. 4): although both belong to an “oncological history” section, one refers to the patient and the other to the patient’s family.
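Labels such as “breast” can be flagged mechanically before any mapping begins. A minimal Python sketch, assuming each extracted data element is recorded with its form, section and subject (the `DataElement` class and the form names `form_A` and `form_B` are hypothetical), groups elements by label and reports labels that occur in more than one context. Flagged labels still need to be checked manually in the EHR.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    form: str
    section: str
    label: str
    subject: str  # who the element is about, e.g. "patient" or "family"


def ambiguous_labels(elements: list[DataElement]) -> dict[str, set[tuple[str, str, str]]]:
    """Map each label found in more than one (form, section, subject) context to those contexts."""
    contexts: dict[str, set[tuple[str, str, str]]] = defaultdict(set)
    for element in elements:
        contexts[element.label.casefold()].add((element.form, element.section, element.subject))
    return {label: found for label, found in contexts.items() if len(found) > 1}


# Hypothetical extract: the two "breast" fields described above plus one other field.
ELEMENTS = [
    DataElement("form_A", "oncological history", "breast", "patient"),
    DataElement("form_B", "oncological history", "breast", "family"),
    DataElement("PF_0886", "Post Partum", "suites", "patient"),
]

print(ambiguous_labels(ELEMENTS))
```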

Fig. 4

The term “breast” has different meanings depending on context in the electronic health record (translated from French)

Clinical acquisition context

The environment in which data is acquired plays an important role in the meaning attached to it. One example is the language used by the caregiver and the system. Most of our database is in French, but the most complete version of SCT is the International Edition, which only includes English descriptions. SCT is described as language-independent but was created by an English-speaking team, and its concept segmentation is therefore biased towards English. If concepts are sufficiently defined, their relationships should unambiguously convey the intended meaning. However, given the large number of primitive concepts with few or no stated attribute relationships, errors remain possible. Particular care must therefore be taken to avoid semantic shifts when representing French expressions with English-described concepts.

Even lexically identical words can have opposite meanings, so simply translating the labels of data elements can lead to major errors. For example, in English, nyctalopia is defined as the inability to see in low light, or night-blindness [28]. In French, it means the opposite: nyctalopie is the ability to see in low light [29]. Abbreviations can also be ambiguous: in French, BAV can mean “bloc auriculo-ventriculaire” (233917008 |Atrioventricular block (disorder)|) or “baisse de l’acuité visuelle” (13164000 |Reduced visual acuity (finding)|). The temptation to use the first code that comes to mind must be resisted, and the context understood first.
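A context-dependent lookup can enforce the rule that terms are never mapped on their label alone. In this hypothetical Python sketch, the codes come from the text, but associating each BAV expansion with a medical service (cardiology, ophthalmology) is an assumption made for illustration. `resolve` returns `None` whenever the context does not settle the meaning, so the element goes to manual review instead of being coded.

```python
from typing import Optional

# Codes are taken from the text; tying each expansion to a service is an
# assumption made for this illustration only.
AMBIGUOUS_ABBREVIATIONS = {
    "BAV": {
        "cardiology": "233917008 |Atrioventricular block (disorder)|",   # bloc auriculo-ventriculaire
        "ophthalmology": "13164000 |Reduced visual acuity (finding)|",  # baisse de l'acuité visuelle
    },
}

# French words whose lexically identical English counterpart differs in meaning.
FALSE_FRIENDS = {"nyctalopie"}


def resolve(term: str, service: Optional[str]) -> Optional[str]:
    """Return a code only when the clinical context settles the meaning; None means manual review."""
    if term.casefold() in FALSE_FRIENDS:
        return None  # never map through a literal translation
    candidates = AMBIGUOUS_ABBREVIATIONS.get(term.upper())
    if candidates is None or service is None:
        return None
    return candidates.get(service)


print(resolve("BAV", "cardiology"))
print(resolve("BAV", None))
```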

Single concepts first

Representations should be as precise as possible, using the most granular code available and preferring single concepts. Granularity increases down a hierarchy, as children are refinements of their parents [30, 31]. Two approaches help ensure that single codes are used first and that unnecessary PC is avoided: exploration and inference.

Exploration

The first approach is to explore SCT’s poly-hierarchy and infer attributes from parent concepts. A concept often has multiple parents, to which it is linked by “is a” relationships [30], and multiple descendants linked to it by the same relationship. By exploring the parents, the children of neighbouring branches, and semantically similar terms (a “tree-walk” [12]), the most appropriate concept can be located [32]. When deciding which concept to use, it is very important to choose the correct hierarchy and to pay close attention to synonyms, descriptions, and relationships. For example, take the French term “thrombaphérèse”. Searching SNOMED CT’s browser for “thrombapheresis” or “thrombopheresis” returns no results. However, by searching for a similar term such as “plasmapheresis” and then exploring its parents and children, the appropriate concept, “plateletpheresis”, can be found (Fig. 5).
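A tree-walk amounts to a bounded breadth-first search over is-a links in both directions. The sketch below uses a toy, simplified hierarchy (not the real SNOMED CT structure, and without identifiers) and a hypothetical `tree_walk` function. A real implementation would query a terminology server, and the annotator would still read the synonyms and descriptions of each candidate.

```python
from collections import deque

# Toy, simplified is-a fragment (child -> parents); NOT the real SNOMED CT
# structure. A real walk would query a terminology server instead.
PARENTS: dict[str, set[str]] = {
    "Plasmapheresis": {"Apheresis"},
    "Plateletpheresis": {"Apheresis"},
    "Apheresis": {"Procedure"},
}


def children_of(concept: str) -> set[str]:
    """Invert the parent map to find direct children."""
    return {child for child, parents in PARENTS.items() if concept in parents}


def tree_walk(seed: str, max_steps: int = 2) -> dict[str, int]:
    """Breadth-first walk over parents and children; returns concept -> steps from seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        concept = queue.popleft()
        if distance[concept] == max_steps:
            continue  # do not expand beyond the search radius
        for neighbour in PARENTS.get(concept, set()) | children_of(concept):
            if neighbour not in distance:
                distance[neighbour] = distance[concept] + 1
                queue.append(neighbour)
    return distance


print(tree_walk("Plasmapheresis"))
```

Starting from “plasmapheresis”, two steps (up to the shared parent, then down) are enough to reach “plateletpheresis” in this toy graph.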

Fig. 5

Exploring the SNOMED CT hierarchy to find the appropriate concept by (1) finding a similar term, then (2) exploring its surrounding concepts

When no sufficiently precise single code exists, a more general concept should not be used instead, because doing so loses information.

Inference

As stated above, inference is already built into SCT through stated and inferred concept definitions [33]. The annotator can also draw inferences, using clinical knowledge and common sense, when certain details can be deduced from the context with a high level of confidence. This allows the use of a single code that appears less precise but is complete. When representing the concept “fracture of the tibial tuberosity”, there are multiple options available (Fig. 6). The available pre-coordinated concepts carry additional information on fracture morphology, such as whether the fracture is open or closed, which the data element does not state, so PC initially seems the best option. In clinical practice, however, the open/closed distinction is generally only documented when the fracture is open. Therefore, if a fracture is not explicitly described as open, it can be considered closed and represented with a single code without losing any information. The opposite applies to wounds, which are considered open by default.
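These default assumptions can be made explicit so that every annotator applies them the same way. A minimal Python sketch, in which `infer_state` and `DEFAULT_STATE` are hypothetical names: an explicitly documented status always wins, and for kinds without an agreed default the function refuses to guess. Choosing the matching single concept is left to the annotator.

```python
from typing import Optional

# Defaults from the rule in the text: a fracture not stated to be open is
# treated as closed; a wound is treated as open unless stated otherwise.
DEFAULT_STATE = {"fracture": "closed", "wound": "open"}


def infer_state(kind: str, stated: Optional[str] = None) -> str:
    """Return the stated open/closed status, or the clinical default for this kind."""
    if stated is not None:
        return stated  # explicit documentation always wins over a default
    if kind not in DEFAULT_STATE:
        raise ValueError(f"no agreed default for {kind!r}; do not infer")
    return DEFAULT_STATE[kind]


# "Fracture of the tibial tuberosity" with no mention of an open fracture:
print(infer_state("fracture"))
```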

Fig. 6

Inferring information to avoid unnecessary post-coordination

Approved PC

Once post-coordination becomes unavoidable, differences in style mean that similar meanings can be expressed in different ways [34, 35]. Although this variability cannot be eliminated entirely, rules are needed to maintain consistency as far as possible.

Multiple focus expressions

Multiple focus expressions should be used first; these are usually limited to concepts from the same top-level hierarchy [36]. When representing the clinical concept “post-infectious reactionary bacterial arthritis”, two SCT concepts together encompass almost all the required information: “12913305 |Post-bacterial arthropathy (disorder)|” and “239783001 |Post-infective arthritis (disorder)|”. However, they are neither equivalent nor individually complete: the first lacks the inflammatory morphology and the second the bacterial causality. PC is therefore required. Combining the two gives a fully defined expression and allows representation with a high level of precision while avoiding complex PC (Fig. 7). When semantically equivalent representations exist, the simplest one should always be chosen.
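In SNOMED CT compositional grammar, the focus concepts of a multiple focus expression are joined with `+`. The small Python helper below (hypothetical function names) builds such an expression from the two concepts above, with identifiers copied exactly as given in the text.

```python
def concept(sctid: str, term: str) -> str:
    """Format a concept reference as 'id |term|'."""
    return f"{sctid} |{term}|"


def multiple_focus(*foci: tuple[str, str]) -> str:
    """Join two or more focus concepts with '+'."""
    if len(foci) < 2:
        raise ValueError("a multiple focus expression needs at least two focus concepts")
    return " + ".join(concept(sctid, term) for sctid, term in foci)


expression = multiple_focus(
    ("12913305", "Post-bacterial arthropathy (disorder)"),
    ("239783001", "Post-infective arthritis (disorder)"),
)
print(expression)
```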

Fig. 7

Different but semantically equivalent ways of using post-coordination

Refinements

The next step is to use PC with refinements defined by SI, which becomes complex once attribute groups and nested refinements are involved [37]. Refinements must comply with SI’s rules; these are not detailed here, as sufficient guidance already exists from SI and others. Consulting the Machine Readable Concept Model (MRCM) and the Editorial Guide is generally sufficient to post-coordinate refinements proficiently [38, 39].
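In compositional grammar, refinements follow the focus concept after a colon, and grouped attributes are enclosed in braces. The Python sketch below renders attribute groups with a hypothetical `refine` helper. The example, which refines “239783001 |Post-infective arthritis (disorder)|” with “255234002 |After (attribute)|” = “87628006 |Bacterial infectious disease (disorder)|”, is illustrative only: it has not been checked against the MRCM and is not necessarily one of the alternatives shown in Fig. 7.

```python
def concept(sctid: str, term: str) -> str:
    """Format a concept reference as 'id |term|'."""
    return f"{sctid} |{term}|"


def refine(focus: str, groups: list[list[tuple[str, str]]]) -> str:
    """Render 'focus : { attr = value, ... }, { ... }' with one brace pair per attribute group."""
    rendered = ", ".join(
        "{ " + ", ".join(f"{attribute} = {value}" for attribute, value in group) + " }"
        for group in groups
    )
    return f"{focus} : {rendered}"


# Illustrative only; not validated against the MRCM.
expression = refine(
    concept("239783001", "Post-infective arthritis (disorder)"),
    [[(concept("255234002", "After (attribute)"),
       concept("87628006", "Bacterial infectious disease (disorder)"))]],
)
print(expression)
```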

Extending PC

Despite SCT’s extensiveness [40, 41], challenges inevitably arise when no pre-coordinated concept or approved post-coordination fits. In this case, the only solution is to extend SCT’s compositional grammar and editorial guidance. Such changes modify the MRCM and must be added to the institution’s classifier to avoid classification issues, which can be a complex process. Examples are shown in Table 1.

Table 1 Attribute extensions and use of unapproved attributes

Domain and range extensions

The first approach is to use approved attributes in situations that resemble those they were approved for, by extending an attribute’s domain, range, or both. Since these are approved attributes, coherence with current rules must be maintained, which means that attribute definitions cannot be changed. For the attribute “255234002 |After (attribute)|”, “<< 71388002 |Procedure (procedure)|” has been added to the domain, while the range and definition remain unchanged; only the scope is expanded.
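A domain extension can be modelled as widening the set of allowed focus concepts while leaving the range untouched. The Python sketch below assumes, for illustration only, that the approved rule for “255234002 |After (attribute)|” has domain << 404684003 |Clinical finding| and range << 404684003 |Clinical finding| or << 71388002 |Procedure|. The `AttributeRule` class, `extend_domain` and the two-link hierarchy are hypothetical simplifications, not the published MRCM.

```python
from dataclasses import dataclass, replace

# Simplified is-a fragment (child -> parents), for illustration only.
PARENTS: dict[str, set[str]] = {
    "387713003": {"71388002"},   # Surgical procedure -> Procedure
    "64572001": {"404684003"},   # Disease -> Clinical finding
}


def is_self_or_descendant(concept: str, ancestor: str) -> bool:
    """True if concept equals ancestor or lies below it (the '<<' constraint)."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, set()))
    return False


@dataclass(frozen=True)
class AttributeRule:
    attribute: str
    domains: frozenset[str]  # each entry is read as "<< concept"
    ranges: frozenset[str]

    def allows(self, focus: str, value: str) -> bool:
        in_domain = any(is_self_or_descendant(focus, d) for d in self.domains)
        in_range = any(is_self_or_descendant(value, r) for r in self.ranges)
        return in_domain and in_range


def extend_domain(rule: AttributeRule, new_domain: str) -> AttributeRule:
    """Widen the domain only; the range (and the attribute's definition) stay untouched."""
    return replace(rule, domains=rule.domains | {new_domain})


# Assumed approved rule, then the local extension to << 71388002 |Procedure|.
AFTER = AttributeRule("255234002", frozenset({"404684003"}), frozenset({"404684003", "71388002"}))
LOCAL_AFTER = extend_domain(AFTER, "71388002")

print(AFTER.allows("387713003", "64572001"), LOCAL_AFTER.allows("387713003", "64572001"))
```

Returning a new rule rather than mutating the approved one keeps the original constraint available for comparison and for sharing expressions outside the institution.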

New attributes

As a last resort, unapproved attributes may be used, but sparingly, as they may disappear or change in future SCT versions. They are useful for refining hierarchies that have no approved attributes, such as “410607006 |Organism (organism)|”, and for representing scalar values such as sizes [42]. To ensure consistent use, a complete set of domain, range, cardinalities, and definition must be validated for each new attribute. If correctly classified, these new attributes can then be submitted to the national SNOMED centre for approval if necessary. As always, but especially for new attributes, all expressions are reviewed extensively in focus groups to ensure semantic equivalence with the source data and a uniform approach.
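The validation requirement can be checked mechanically before a local attribute is used. In the Python sketch below, `LocalAttribute` and its fields are hypothetical, the identifier is a placeholder that would have to be replaced by an SCTID from the institution’s extension namespace, and the `#` prefix follows the compositional grammar notation for numeric concrete values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalAttribute:
    """A locally defined (unapproved) attribute; all four properties must be validated before use."""
    identifier: str
    term: str
    domain: Optional[str] = None        # e.g. an ECL constraint on the focus concept
    value_range: Optional[str] = None
    cardinality: Optional[str] = None   # e.g. "0..1"
    definition: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of the properties that have not yet been specified."""
        required = {
            "domain": self.domain,
            "range": self.value_range,
            "cardinality": self.cardinality,
            "definition": self.definition,
        }
        return [name for name, value in required.items() if not value]

    def refine(self, focus: str, value: int) -> str:
        """Render 'focus : attribute = #value' using a numeric concrete value."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"{self.term!r} is not validated; missing: {', '.join(gaps)}")
        return f"{focus} : {self.identifier} |{self.term}| = #{value}"


# Hypothetical attribute; the identifier is a placeholder, not a real SCTID.
SIZE = LocalAttribute("LOCAL_SIZE_ID", "Size in millimetres (attribute)")
print(SIZE.missing())
```

Calling `refine` before all four properties are filled in raises an error, mirroring the requirement that domain, range, cardinalities and definition be validated first.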
