Natural language processing for comorbidity detection in electronic health records of COVID-19 patients in a Swiss hospital group

doi:10.4414/smi.2022.00001

Original article

DOI: https://doi.org/10.4414/smi.2022.00001
Publication Date: 10.03.2022
Swiss Med Informatics. 2022;(01-2021-SMI):XX-XX

Schöning Verena^a, Liakoni Evangelia^a, Drewe Jürgen^b, Hammann Felix^a

Affiliations

^a Clinical Pharmacology and Toxicology, Department of General Internal Medicine, Inselspital, Bern University Hospital, University of Bern, Switzerland

^b Department of Clinical Pharmacology, University Hospital Basel, Switzerland

Summary

Several risk factors have been identified for severe clinical outcomes of COVID-19. Some can be found in structured data of patients’ Electronic Health Records (EHRs). Others are included in medical histories as unstructured free-text, and thus cannot be easily detected automatically.

We propose a real-time system for comorbidity detection in German free-text medical histories for triage support at a tertiary Swiss hospital group using dictionary-based Natural Language Processing (NLP). Focusing on arterial hypertension, chronic heart failure, atrial fibrillation/flutter, coronary heart disease, asthma, chronic obstructive pulmonary disease (COPD), diabetes, dementia, and cancer, we were able to detect comorbidities with an accuracy of ≥98%. Thus, a basic NLP algorithm can provide time-saving support for risk assessment, especially in patients with long medical histories and multiple comorbidities.

Background

Coronavirus disease 19 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in Wuhan, China, in December 2019 [1], and spread globally with rising numbers of cases and deaths [2].

Several risk factors have been identified for a severe COVID-19 progression. Some of the reported risk factors, such as age [3, 4], sex [5, 6], or obesity [7–10], are included in tabulated form in the electronic health records (EHRs). However, information on other relevant comorbidities such as diabetes [11–13], cardiac [14–16] and pulmonary diseases [17, 18], cancer [19], or dementia [20] is usually included as free-text in the medical history of the EHR and can be time-consuming to retrieve. This holds in particular for patients with a long medical history and conditions that might not be included separately in the list of diagnoses (e.g., pre-diabetes). Encoding into ICD-10 (International Classification of Diseases) identifiers – which would be highly suitable for machine processing – is often performed only after discharge or death of the patient and is thus not readily available on admission.

Natural language processing (NLP) can be used to extract information such as comorbidities from unstructured text. Even though numerous tools for medical NLP exist, the implementations are mainly suitable only for English text [21]. Although dictionary-based approaches were successfully implemented in other languages [22, 23], we were not able to find a suitable implementation for German EHRs in-house. Due to the specialised language used within EHRs, the application of existing tools which were trained on other domains might not perform as expected without further adaptation [24]. NLP approaches have previously been used in German EHRs for highly specific tasks, such as radiological, nephrological and cancer report generation [25-27].

Here we present an easily implementable and maintainable dictionary-based NLP algorithm for the real-time detection of comorbidities in unstructured (free-text) medical histories of German EHRs. We chose comorbidities that may influence the risk and the prognosis of a COVID-19 infection, namely arterial hypertension, chronic heart failure, atrial fibrillation or flutter, coronary heart disease, asthma, chronic obstructive pulmonary disease (COPD), diabetes, dementia, and cancer [14, 15, 17, 28, 29].

After development and internal validation, we evaluated the usefulness of this method in a clinical setting at a large hospital network in Switzerland by applying it to EHRs of patients who had SARS-CoV-2 testing at any point during the ‘first’ and ‘second’ waves of the pandemic in 2020.

Methods

Ethics approval and consent to participate

The study was approved by the Cantonal Ethics Committee of Bern (Project-ID 2020-00973). Participants either agreed to a general research consent or, for participants with no registered general research consent status (neither agreement nor rejection), a waiver of consent was granted by the ethics committee.

Study population

The retrospective study was carried out at the Insel Hospital Group (IHG), a tertiary hospital network and the biggest healthcare provider in Switzerland, with six locations and about 860,000 patients treated per year. We considered all individuals tested for SARS-CoV-2 at the IHG between 1 February and 16 November 2020, covering the ‘first wave’ and part of the ‘second wave’ of COVID-19 in the country, and who did not reject the IHG general research consent. For patients with no registered general research consent status, a waiver of consent was granted by the ethics committee. Patients who objected to the general research consent of the IHG were excluded from the study. Participation in other trials (including COVID-19-related treatment studies) was not an exclusion criterion and was not recorded separately. A reverse-transcriptase polymerase chain reaction (RT-PCR) assay on nasopharyngeal swabs was in use throughout the entire observation period as diagnostic test for SARS-CoV-2 detection. All patients had been discharged or died by the time of the analysis.

The NLP validation dataset consisted of patients who tested positive for SARS-CoV-2 during the first wave in Switzerland (February to August 2020) and whose medical history was available. In total, the validation dataset comprised 138 patients. The medical histories of the EHRs were screened manually by two of the authors for the following comorbidities of interest: arterial hypertension, chronic heart failure, atrial fibrillation or flutter, coronary heart disease, asthma, COPD, diabetes (including type I, type II and pre-diabetes), dementia (including mild cognitive impairment), and cancer [14, 15, 17, 28, 29].

For the example use case, patients were classified according to their test result and disease severity, with the worst outcome at any point determining the class:

Negative: Patients who always tested negative for SARS-CoV-2
Positive: Patients who tested positive for SARS-CoV-2 at any point
Non-severe: Patients who tested positive for SARS-CoV-2, but were neither admitted to the intensive care unit (ICU) nor died of any cause during their hospital stay.
Severe: Composite outcome for patients who tested positive for SARS-CoV-2 and required ICU admission at any stage during the disease or died of any cause during their hospital stay.
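
The classification rules above can be sketched in a few lines of Python. This is only an illustration: the `PatientRecord` class and its field names are hypothetical and do not come from the study database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary; field names are illustrative only."""
    any_positive_test: bool   # at least one positive SARS-CoV-2 test
    icu_admission: bool       # admitted to the ICU at any stage
    died_in_hospital: bool    # died of any cause during the hospital stay


def classify(record: PatientRecord) -> Tuple[str, Optional[str]]:
    """Return (test class, severity class); severity is None for negative patients."""
    if not record.any_positive_test:
        return "negative", None
    # Composite outcome: ICU admission or in-hospital death of any cause.
    severe = record.icu_admission or record.died_in_hospital
    return "positive", "severe" if severe else "non-severe"
```

Because the worst outcome at any point determines the class, a single positive test or a single ICU admission is sufficient to move a patient into the corresponding group.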

Natural language processing

The comorbidities of interest were identified in the medical history, which is the unstructured, free-text part of the EHRs. In a first step, we merged the information of different time points into a single text file per patient and removed duplicate sentences automatically, keeping only the earliest entry, as information was often copied and pasted within the medical history. We then read the text files in as a data corpus, on which we performed the actual NLP (fig. 1). General data cleaning of these files consisted of converting special characters (e.g., umlauts such as ‘ä’ to ‘ae’), setting the whole text to lower case, and removing dates. The text was then tokenised, i.e., separated into single sentences or half-sentences (tokens) [30]. For the separation of tokens, we used paragraphs and punctuation (periods and commas), with the exception of decimal punctuation for numbers. Tokens with fewer than two characters were removed. We also removed tokens that contained information on family history as identified by key terms (e.g., mother, father, uncle). As the analysed medical history consisted mainly of catch phrases and half-sentences, we decided against a removal of stop words and part-of-speech tagging.

For each comorbidity, a specific dictionary of key terms was generated, which consisted of common terms (one or more words) and abbreviations used to describe the diseases and comorbidities in German EHR systems. The key term list was created based on suggestions from the physicians involved in this study and an unstructured internet survey of German-language medical websites (e.g., from other hospitals).

As there are many different terms for the diverse types of cancer, another approach was chosen for this entity. We first scanned all EHRs for words containing cancer-specific terms such as ‘cancer’, ‘carcinoma’, or ‘neoplasia’, and created a key terms list. This list was then extended with other common terms and abbreviations for specific types of cancer, for instance ‘ALL’ for acute lymphoblastic leukaemia [31]. As some terms are used more or less interchangeably (e.g., ‘carcinoma’ and ‘cancer’) and sometimes the spelling was German or Latin (‘karzinoma’ vs ‘carcinoma’), we standardised this by replacement in the key terms and also in the EHRs, even though this sometimes resulted in unusual terms. This accounted for the different terminologies used by different physicians without unnecessarily inflating the key terms list.
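
The cleaning and tokenisation steps described in this section could look roughly like the following sketch. It is not the study code: the date pattern, the family-history trigger words and all function names are assumptions chosen for illustration.

```python
import re

# Special characters are converted after lower-casing, so only lower-case forms are listed.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Assumed date format (e.g. 12.03.2015); real records will need more patterns.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{2,4}\b")

# Split on line breaks and on periods/commas, except decimal punctuation
# between two digits (e.g. "7,5" or "7.5").
SPLIT_PATTERN = re.compile(r"\n+|(?<!\d)[.,]|[.,](?!\d)")

# Hypothetical family-history triggers (mother, father, uncle).
FAMILY_PATTERN = re.compile(r"\b(?:mutter|vater|onkel)\b")


def preprocess(text: str) -> list:
    """Clean a medical history and return its tokens (half-sentences)."""
    text = text.lower()
    for char, replacement in UMLAUTS.items():
        text = text.replace(char, replacement)
    text = DATE_PATTERN.sub(" ", text)  # remove dates before splitting on periods
    tokens = []
    for raw in SPLIT_PATTERN.split(text):
        token = " ".join(raw.split())  # collapse whitespace
        if len(token) < 2:
            continue  # drop tokens with fewer than two characters
        if FAMILY_PATTERN.search(token):
            continue  # drop family-history tokens
        tokens.append(token)
    return tokens
```

For example, `preprocess("Arterielle Hypertonie seit 12.03.2015, Diabetes mellitus Typ 2 (HbA1c 7,5 %).\nMutter: Diabetes mellitus")` keeps the two tokens describing the patient, leaves the decimal comma in "7,5" intact, and discards the family-history entry.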

As we also used the key terms for selective dictionary-based correction of spelling errors and lemmatization, the single terms were not chosen to be grammatically correct in the context, but to be close to all possible grammatical forms. For spelling correction and lemmatization, the key terms list was transformed into a dictionary in which each word appeared only once. Words within the medical history were replaced with the corresponding dictionary word when their similarity, based on the Levenshtein distance (the minimum number of single-character edits required to transform one word into another), was at least 90%. This threshold was set empirically. Then, for each token within the medical history, we checked for the presence of the defined key terms. If the key term consisted of more than one word (e.g., ‘arterial hypertension’), the algorithm allowed up to three words between the single words of the key term, so that phrases such as ‘arterial and pulmonary hypertension’ could be correctly detected. This further reduced the need for removing stop words. Additionally, the words of a key term could appear in the token in reverse order compared to the key terms list.
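
A minimal sketch of these two steps is given below. The study used the FuzzyWuzzy package for similarity scoring; to stay dependency-free, this sketch substitutes a hand-written Levenshtein distance normalised by the longer word, which is an assumption and will not reproduce FuzzyWuzzy scores exactly. All function names are illustrative.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def normalise(token: str, dictionary: set, threshold: float = 0.90) -> str:
    """Replace each word by its closest dictionary word if similar enough."""
    words = []
    for word in token.split():
        best = max(dictionary, key=lambda entry: similarity(word, entry))
        words.append(best if similarity(word, best) >= threshold else word)
    return " ".join(words)


def contains_key_term(token: str, key_term: str, max_gap: int = 3) -> bool:
    """True if all words of the key term occur in the token, in any order,
    with at most max_gap other words between neighbouring key-term words."""
    words = token.split()
    occurrences = [[i for i, w in enumerate(words) if w == part]
                   for part in key_term.split()]
    for combination in product(*occurrences):
        ordered = sorted(combination)
        if len(set(ordered)) == len(ordered) and all(
                later - earlier - 1 <= max_gap
                for earlier, later in zip(ordered, ordered[1:])):
            return True
    return False
```

With a dictionary containing ‘arterielle’ and ‘hypertonie’, the inflected form ‘arteriellen’ is mapped to ‘arterielle’, whereas ‘hypotonie’ stays unchanged because it falls below the threshold. The phrase ‘arterielle und pulmonale hypertonie’ then matches the key term ‘arterielle hypertonie’ because only two words separate its parts.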

If the key term was present, we determined the context of the key term (affirmation or negation) using a German implementation of NegEx [32]. The output of the negation detection was screened and the negation trigger list appended accordingly where necessary (e.g., ‘non-insulin’). In addition, false positive results could be identified during that screening and the pre-processing was amended accordingly (e.g., the nutrient solution ‘All-in-one^®’ caused interference during cancer detection with the ‘ALL’ abbreviation for acute lymphoblastic leukaemia). Additionally, on a random basis, we examined the context of the identified key terms by extracting the token with the key term, and the token before and after it. If the number of affirmed statements was equal to or greater than the number of negated statements, the patient was tagged as ‘positive’ for the comorbidity. The disease status of each patient as determined by NLP was compared with the manually tagged label. For cases where the automatically detected disease status differed from the manually determined one, the medical history was screened again, and, if possible, the key term list was corrected accordingly. In some cases, the comorbidity detected by NLP was correct and we amended the tagged label. The NLP detection was considered suitable if the accuracy per disease was ≥95%.
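
The tagging rule can be illustrated with the sketch below. Note that `is_negated` is a deliberately crude stand-in and not NegEx: it only looks for a trigger word before the key term, and the trigger list is hypothetical. The sketch also assumes that a patient without any mention of the key terms is tagged as negative.

```python
# Hypothetical German negation triggers; the real NegEx trigger list is far larger.
NEGATION_TRIGGERS = ("kein", "keine", "ohne", "ausschluss")


def is_negated(token: str, key_term: str) -> bool:
    """Crude stand-in for NegEx: a trigger word anywhere before the key term."""
    words_before = token.split(key_term, 1)[0].split()
    return any(word in NEGATION_TRIGGERS for word in words_before)


def tag_patient(tokens: list, key_terms: list) -> bool:
    """Tag a patient as positive if affirmed mentions >= negated mentions."""
    affirmed = negated = 0
    for token in tokens:
        for term in key_terms:
            if term in token:
                if is_negated(token, term):
                    negated += 1
                else:
                    affirmed += 1
                break  # count each token only once
    # Assumption: no mention at all means the comorbidity is not tagged.
    return affirmed + negated > 0 and affirmed >= negated
```

Resolving ties in favour of ‘positive’ errs on the side of flagging a possible comorbidity, so that a clinician can check it.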

Software and statistical tests

NLP was performed in Python (version 3.8.5). The text file corpus was generated using the Natural Language Toolkit (nltk) package (version 3.5) [33]. Steps requiring regular expressions (e.g., text cleaning, sentence tokenisation, word replacement) were performed using the re package (version 2.2.1). For spelling correction and lemmatization, we used the FuzzyWuzzy package (version 0.18.0) and specific, manually generated keyword lists. Detection of negation was performed using NegEx [34] in combination with its German implementation [32].

Data processing, analysis and visualisation were performed in GNU R (version 4.0.2, R Foundation for Statistical Computing, http://www.R-project.org, Vienna, Austria). Standard statistics, such as chi-square or Fisher’s exact tests, were conducted using the stats package (version 4.0.2).

Comparisons were performed between (a) patients tested negative and positive for SARS-CoV-2 and (b) patients with a non-severe and severe clinical manifestation of COVID-19. Statistical significance levels between the compared cohorts were determined using the chi-square test or Fisher’s exact test (sample size ≤5) for categorical parameters. A p-value of <0.05 was considered statistically significant.
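
As an illustration of the chi-square comparison, the following Python sketch computes the test for a single 2×2 table using only the standard library. The analysis in this study was run in R; the sketch assumes Yates' continuity correction, which R's `chisq.test` applies to 2×2 tables by default, and the function name is illustrative. The counts are the arterial hypertension figures from table 2.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    difference = abs(a * d - b * c)
    if yates:
        # Continuity correction (assumed, mirroring R's default for 2x2 tables).
        difference = max(0.0, difference - n / 2)
    statistic = n * difference ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# aHT in table 2: 2270 of 5664 negative and 270 of 586 positive patients.
statistic, p_value = chi_square_2x2(2270, 5664 - 2270, 270, 586 - 270)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.006, in agreement with the value reported for arterial hypertension in the comparison of negative and positive patients.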

Results

The comorbidities of interest, namely arterial hypertension, chronic heart failure, atrial fibrillation or flutter, coronary heart disease, asthma, COPD, diabetes, dementia, and cancer could be automatically detected in the German EHR of the IHG using NLP with an accuracy of ≥98%. Detailed validation metrics for all diseases studied are presented in table 1. The final key terms used for each disease are shown in the supplementary material (table S1).

Table 1:

Validation metrics of natural language processing (NLP) detection of diseases.

Disease	Accuracy	Sensitivity / recall	Specificity	Positive predictive value / precision	Negative predictive value
aHT	0.99	0.99	1.00	1.00	0.99
Chronic heart failure	0.99	1.00	0.99	0.97	1.00
Atrial fibrillation or flutter	0.99	1.00	0.99	0.94	1.00
Coronary heart disease	0.99	1.00	0.98	0.90	1.00
Asthma	0.99	0.90	1.00	1.00	0.98
COPD	0.99	0.91	1.00	1.00	0.99
Dementia	0.99	1.00	0.98	0.90	1.00
Cancer	0.99	0.95	0.98	0.91	0.99
Diabetes	0.98	0.97	1.00	1.00	0.99

aHT: arterial hypertension; COPD: chronic obstructive pulmonary disease
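
For readers less familiar with these metrics, the sketch below shows how they follow from comparing the NLP result with the manually tagged label. The function name and the example labels in the usage note are hypothetical and are not the study data.

```python
def validation_metrics(manual: list, detected: list) -> dict:
    """Compare manual labels (reference) with NLP labels, one boolean per patient."""
    pairs = list(zip(manual, detected))
    tp = sum(1 for m, d in pairs if m and d)          # true positives
    tn = sum(1 for m, d in pairs if not m and not d)  # true negatives
    fp = sum(1 for m, d in pairs if not m and d)      # false positives
    fn = sum(1 for m, d in pairs if m and not d)      # false negatives
    # Note: a denominator of zero (e.g. no positive patients) is not handled here.
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # precision
        "npv": tn / (tn + fn),
    }
```

For instance, with ten patients who have a comorbidity according to the manual label, of whom nine are detected, and 128 patients without it who are all classified correctly, sensitivity is 9/10 and accuracy is 137/138.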

Further investigation into misclassification revealed several instances that could not be fixed by improving the key terms. In one case, gestational diabetes was diagnosed by the screening physician based on the reported results of the oral glucose tolerance test, without gestational diabetes being explicitly mentioned in the EHR. In some cases, the diagnosis was mentioned in a part of the EHR that was ultimately not used in the NLP pipeline. Especially in the detection of cancer, the rule-based system struggled with abbreviations, as these are used ambiguously. We judged the amendment of the key terms list that would have been necessary to carry a risk of over-fitting and decided against it.

The comorbidities of a total of 6250 included patients were detected using the NLP algorithm. The patients were categorised as either SARS-CoV-2 negative (n = 5664) or positive (n = 586), and – among the latter – as having a non-severe (n = 461) or severe (n = 125) clinical course. The distribution of comorbidities is shown in table 2.

Table 2:

Distribution of risk factors as determined by automated analysis of the electronic health records (EHRs).

Disease	SARS-CoV-2 negative (N = 5664)	SARS-CoV-2 positive (N = 586)	p-value (negative vs positive)	Non-severe COVID-19 (N = 461)	Severe COVID-19 (N = 125)	p-value (non-severe vs severe)
aHT; n (%)	2270 (40.1%)	270 (46.1%)	0.006	192 (41.7%)	78 (62.4%)	<0.002
Chronic heart failure; n (%)	1591 (28.1%)	140 (23.9%)	0.035	97 (21.0%)	43 (34.4%)	0.003
Atrial fibrillation or flutter; n (%)	1025 (18.1%)	84 (14.3%)	0.027	59 (12.8%)	25 (20.0%)	0.058
Coronary heart disease; n (%)	1052 (18.6%)	92 (15.7%)	0.098	56 (12.1%)	36 (28.8%)	<0.002
Asthma; n (%)	350 (6.2%)	52 (8.9%)	0.015	45 (9.8%)	7 (5.6%)	0.203
COPD; n (%)	612 (10.8%)	45 (7.7%)	0.023	28 (6.1%)	17 (13.6%)	0.009
Dementia; n (%)	516 (9.1%)	55 (9.4%)	0.885	40 (8.7%)	15 (12.0%)	0.339
Cancer; n (%)	1407 (24.8%)	96 (16.4%)	<0.002	73 (15.8%)	23 (18.4%)	0.582
Diabetes; n (%)	1129 (19.9%)	155 (26.5%)	<0.002	111 (24.1%)	44 (35.2%)	0.017

Bold numbers indicate significant differences (p <0.05) between compared groups. aHT: arterial hypertension; COPD: chronic obstructive pulmonary disease

Discussion

During the ongoing pandemic, the swift identification of patients at risk for infection and a severe course of COVID-19 is highly important. Manual identification of risk factors is a time-consuming task, especially for patients with co-morbidities, as the respective medical histories might be very long. Using NLP on EHRs poses several challenges, as written clinical text contains abbreviations, acronyms, spelling errors and nested negations [30]. Furthermore, most available software solutions focus on English language text. We were able to develop an NLP algorithm to detect relevant comorbidities as risk factors for COVID-19 in the medical history of patients at time of admission. The selected key terms allowed for a detection in German EHRs with an accuracy of ≥98% on the validation dataset.

The threefold use of the key terms for dictionary-based spelling error correction, lemmatization and disease detection had several advantages:

Independence from third-party spelling correction and lemmatization tools capable of handling German medical notes;
Standardisation of grammatical forms and expressions, which led to a reduction of required key terms;
Quick implementation of new comorbidities by defining new key term lists.

Furthermore, even though the algorithm was deliberately kept simple to allow for easier use and maintenance (e.g., no part-of-speech tagging was performed), we were able to show its utility within this defined setting.

In the example use case, arterial hypertension and diabetes were identified as risk factors for a positive COVID-19 test result as well as for a more severe clinical outcome. This is in line with observations of other studies [35]. Diabetes was seen in approximately one third of patients with a severe disease course. This proportion was within the range of other studies [36, 37].

Conclusions

In conclusion, we were able to develop a suitable NLP algorithm for real-time screening of German EHRs for comorbidities relevant for COVID-19 with an accuracy of ≥98%. Our technique can easily be transferred to other sites, including general medical practices. Use of NLP can provide time-saving support for risk assessment, especially in patients with long medical histories and multiple comorbidities. Even though the applied approach is simplistic and not comparable to the performance of commercially available solutions (e.g., 3M™ 360 Encompass™ System for computer-assisted coding or ID DIACOS^®), the setup was easy and suitable for the research question. These advanced systems also allow for, e.g., ICD-10 coding and integrate feedback from hundreds of hospitals worldwide [38]. However, depending on the setting and planned use, the required fees might outweigh the benefit of use. As all components used in our study are available free of charge, our approach can be applied to a variety of research questions, e.g., as a pre-selection tool when working with historical EHRs where encoding is missing and/or no automatic encoding system is implemented. Some limitations of this algorithm have to be considered: the setup is more time-consuming and site-specific than out-of-the-box solutions, and needs to be adapted to the use case. Additionally, a dictionary-based approach does not consider semantic connections, which are important for the linguistic context. Therefore, it is not suitable for research questions for which a deeper understanding of the written text is needed.

Acknowledgements

We thank Noel Frey, Myoori Wijayasingham, and the Insel Data Science Centre for database and infrastructure support.

Author contributions: FH and VS conceptualised this study. VS and FH performed the data analysis. FH and EL contributed to data extraction. All authors critically revised and approved the final manuscript.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Financial disclosure

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Correspondence

Felix Hammann, MD, PhD

Clinical Pharmacology and Toxicology

Department of General Internal Medicine

Inselspital, Bern University Hospital

Freiburgstrasse

CH-3010 Bern

Felix.Hammann@insel.ch

References

1. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG A new coronavirus associated with human respiratory disease in China. Nature. 2020 Mar;579(7798):265–9. http://dx.doi.org/10.1038/s41586-020-2008-3 PubMed 1476-4687

2. Johns Hopkins University. COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). 2020 [cited 2020 November 17]; Available from: https://coronavirus.jhu.edu/map.html

3. Bencivenga L, Rengo G, Varricchi G. Elderly at time of COronaVIrus disease 2019 (COVID-19): possible role of immunosenescence and malnutrition. Geroscience. 2020 Aug;42(4):1089–92. http://dx.doi.org/10.1007/s11357-020-00218-9 PubMed 2509-2723

4. Cunha LL, Perazzio SF, Azzi J, Cravedi P, Riella LV. Remodeling of the Immune Response With Aging: Immunosenescence and Its Potential Impact on COVID-19 Immune Response. Front Immunol. 2020 Aug;11:1748–1748. http://dx.doi.org/10.3389/fimmu.2020.01748 PubMed 1664-3224

5. Peckham H, de Gruijter NM, Raine C, Radziszewska A, Ciurtin C, Wedderburn LR Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission. Nat Commun. 2020 Dec;11(1):6317. http://dx.doi.org/10.1038/s41467-020-19741-6 PubMed 2041-1723

6. Klein SL, Dhakal S, Ursin RL, Deshpande S, Sandberg K, Mauvais-Jarvis F. Biological sex impacts COVID-19 outcomes. PLoS Pathog. 2020 Jun;16(6):e1008570. http://dx.doi.org/10.1371/journal.ppat.1008570 PubMed 1553-7374

7. Kang Z, Luo S, Gui Y, Zhou H, Zhang Z, Tian C Obesity is a potential risk factor contributing to clinical manifestations of COVID-19. Int J Obes. 2020 Dec;44(12):2479–85. http://dx.doi.org/10.1038/s41366-020-00677-2 PubMed 1476-5497

8. Kwok S, Adam S, Ho JH, Iqbal Z, Turkington P, Razvi S Obesity: A critical risk factor in the COVID-19 pandemic. Clin Obes. 2020 Dec;10(6):e12403. http://dx.doi.org/10.1111/cob.12403 PubMed 1758-8111

9. Földi M, Farkas N, Kiss S, Zádori N, Váncsa S, Szakó L, KETLAK Study Group. Obesity is a risk factor for developing critical condition in COVID-19 patients: A systematic review and meta-analysis. Obes Rev. 2020 Oct;21(10):e13095. http://dx.doi.org/10.1111/obr.13095 PubMed 1467-789X

10. Sattar N, McInnes IB, McMurray JJ. Obesity Is a Risk Factor for Severe COVID-19 Infection: Multiple Potential Mechanisms. Circulation. 2020 Jul;142(1):4–6. http://dx.doi.org/10.1161/CIRCULATIONAHA.120.047659 PubMed 1524-4539

11. Zhu L, She ZG, Cheng X, Qin JJ, Zhang XJ, Cai J Association of Blood Glucose Control and Outcomes in Patients with COVID-19 and Pre-existing Type 2 Diabetes. Cell Metab. 2020 Jun;31(6):1068–1077.e3. http://dx.doi.org/10.1016/j.cmet.2020.04.021 PubMed 1932-7420

12. Tadic M., Cuspidi C., The influence of diabetes and hypertension on outcome in COVID-19 patients: Do we mix apples and oranges? The Journal of Clinical Hypertension, 2020. n/a(n/a).

13. Lim S, Bae JH, Kwon HS, Nauck MA. COVID-19 and diabetes mellitus: from pathophysiology to clinical management. Nat Rev Endocrinol. 2021 Jan;17(1):11–30. http://dx.doi.org/10.1038/s41574-020-00435-4 PubMed 1759-5037

14. Ssentongo P, Ssentongo AE, Heilbrunn ES, Ba DM, Chinchilli VM. Association of cardiovascular disease and 10 other pre-existing comorbidities with COVID-19 mortality: A systematic review and meta-analysis. PLoS One. 2020 Aug;15(8):e0238215. http://dx.doi.org/10.1371/journal.pone.0238215 PubMed 1932-6203

15. Inciardi RM, Adamo M, Lupi L, Cani DS, Di Pasquale M, Tomasoni D Characteristics and outcomes of patients hospitalized for COVID-19 and cardiac disease in Northern Italy. Eur Heart J. 2020 May;41(19):1821–9. http://dx.doi.org/10.1093/eurheartj/ehaa388 PubMed 1522-9645

16. Collard D, Nurmohamed NS, Kaiser Y, Reeskamp LF, Dormans T, Moeniralam H Cardiovascular risk factors and COVID-19 outcomes in hospitalised patients: a prospective cohort study. BMJ Open. 2021 Feb;11(2):e045482. http://dx.doi.org/10.1136/bmjopen-2020-045482 PubMed 2044-6055

17. Yang J, Zheng Y, Gou X, Pu K, Chen Z, Guo Q Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. Int J Infect Dis. 2020 May;94:91–5. http://dx.doi.org/10.1016/j.ijid.2020.03.017 PubMed 1878-3511

18. Olloquequi J. COVID-19 Susceptibility in chronic obstructive pulmonary disease. Eur J Clin Invest. 2020 Oct;50(10):e13382. http://dx.doi.org/10.1111/eci.13382 PubMed 1365-2362

19. Desai A, Khaki AR, Kuderer NM. Use of Real-World Electronic Health Records to Estimate Risk, Risk Factors, and Disparities for COVID-19 in Patients With Cancer. JAMA Oncol. 2021 Feb;7(2):227–9. http://dx.doi.org/10.1001/jamaoncol.2020.5461 PubMed 2374-2445

20. Wang Q, Davis PB, Gurney ME, Xu R. COVID-19 and dementia: analyses of risk, disparity, and outcomes from electronic health records in the US. Alzheimers Dement. 2021 Aug;17(8):1297–306. http://dx.doi.org/10.1002/alz.12296 PubMed 1552-5279

21. Starlinger J., How to improve information extraction from German medical records. it - Information Technology, 2017. 59(4): p. 171-179.

22. Wolf M, Petukhova V, Klako D. Term-Based Extraction of Medical Information: Pre-Operative Patient Education Use Case. in Proceedings of Recent Advances in Natural Language Processing. 2018. Varna, Bulgaria.

23. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010 Sep-Oct;17(5):507–13. http://dx.doi.org/10.1136/jamia.2009.001560 PubMed 1527-974X

24. Ferraro JP, Daumé H, Duvall SL, Chapman WW, Harkema H, Haug PJ. Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. J Am Med Inform Assoc. 2013 Sep-Oct;20(5):931–9. http://dx.doi.org/10.1136/amiajnl-2012-001453 PubMed 1527-974X

25. Toepfer M, Corovic H, Fette G, Klügl P, Störk S, Puppe F. Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med Inform Decis Mak. 2015 Nov;15(1):91. http://dx.doi.org/10.1186/s12911-015-0215-x PubMed 1472-6947

26. Becker M, Kasper S, Böckmann B, Jöckel KH, Virchow I. Natural language processing of German clinical colorectal cancer notes for guideline-based treatment evaluation. Int J Med Inform. 2019 Jul;127:141–6. http://dx.doi.org/10.1016/j.ijmedinf.2019.04.022 PubMed 1872-8243

27. Roller R mEx - An Information Extraction Platform for German Medical Text. in Proceedings of the 11th International Conference on Semantic Web Applications and Tools for Healthcare and Life Sciences (SWAT4HCLS'2018). Semantic Web Applications and Tools for Healthcare and Life Sciences (SWAT4HCLS-2018), December 3-5. 2018. Antwerp, Belgium.

28. Nandy K, Salunke A, Pathak SK, Pandey A, Doctor C, Puj K Coronavirus disease (COVID-19): A systematic review and meta-analysis to evaluate the impact of various comorbidities on serious events. Diabetes Metab Syndr. 2020 Sep - Oct;14(5):1017–25. http://dx.doi.org/10.1016/j.dsx.2020.06.064 PubMed 1878-0334

29. Liu N, Sun J, Wang X, Zhao M, Huang Q, Li H. The Impact of Dementia on the Clinical Outcome of COVID-19: A Systematic Review and Meta-Analysis. J Alzheimers Dis. 2020;78(4):1775–82. http://dx.doi.org/10.3233/JAD-201016 PubMed 1875-8908

30. Dalianis H. Clinical Text Mining: Secondary Use of Electronic Patient Records. 2018: Springer Nature.

31. ONKO-Internetportal. Krebsarten A-Z. 2018 [cited 14.04.2021]; Available from: https://www.krebsgesellschaft.de/basis-informationen-krebs/krebsarten.html

32. Cotik V., Negation Detection in Clinical Reports Written in German. 2016.

33. Bird S, Klein E, Loper E. Natural Language Processing with Python. 2009: O'Reilly Media, Inc.

34. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001 Oct;34(5):301–10. http://dx.doi.org/10.1006/jbin.2001.1029 PubMed 1532-0464

35. Wolff D Risk factors for Covid-19 severity and fatality: a structured literature review. Infection. 2021. PubMed 0300-8126

36. Bhatraju PK, Ghassemieh BJ, Nichols M, Kim R, Jerome KR, Nalla AK Covid-19 in Critically Ill Patients in the Seattle Region - Case Series. N Engl J Med. 2020 May;382(21):2012–22. http://dx.doi.org/10.1056/NEJMoa2004500 PubMed 1533-4406

37. Arentz M, Yim E, Klaff L, Lokhandwala S, Riedo FX, Chong M Characteristics and Outcomes of 21 Critically Ill Patients With COVID-19 in Washington State. JAMA. 2020 Apr;323(16):1612–4. http://dx.doi.org/10.1001/jama.2020.4326 PubMed 1538-3598

38. 3M. 3M™ 360 Encompass™ System for computer-assisted coding. 2021 [cited 2021 December 20]; Available from: https://www.3m.com/3M/en_US/health-information-systems-us/improve-revenue-cycle/coding/facility/360-encompass-computer-assisted-coding/

Copyright

Published under the copyright license
“Attribution – Non-Commercial – NoDerivatives 4.0”.
No commercial reuse without permission.
See: emh.ch/en/emh/rights-and-licences/