Concept Extraction in Electronic Health Records

Image credit: Health Catalyst

Deep learning algorithms have been successful in addressing problems in the healthcare industry. One of the most studied tasks in this industry is concept extraction in electronic health records, where machines search for and classify relevant tokens in text records. For this task, deep learning models represent the text as vectors that embed the words and their contexts within the medical records. Although this dense, vectorized representation allows high precision and recall for the concept extraction task, it is not easy to determine which elements within the documents guided the machine in classifying the tokens.

An alternative to deep learning algorithms for the concept extraction task is the family of bootstrap learning algorithms. They can represent the semantics of electronic health records by grouping similar words and their contexts, organized in a knowledge base. Bootstrap learning algorithms rely on shallow features of the semantics of the text, that is, they consider only words and close dependency contexts when learning. Although these algorithms cannot capture long-term dependencies as deep learning algorithms do, their knowledge bases are built in natural language and are easy to understand.
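The grouping of words and contexts described above can be illustrated with a minimal sketch. It is not the group's actual algorithm: the function names (`context`, `bootstrap`), the toy sentences, the seed term, and the use of adjacent words as a stand-in for "close dependency contexts" are all assumptions made for illustration.

```python
def context(tokens, i):
    """Shallow context of token i: its immediate left and right neighbours.
    (Assumption: adjacent words stand in for close dependency contexts.)"""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, right)


def bootstrap(sentences, seeds, rounds=2):
    """Alternately grow a set of concept terms and a set of context patterns."""
    known = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: record every context in which a known term occurs.
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in known:
                    patterns.add(context(tokens, i))
        # Step 2: any unknown word seen in a recorded context becomes a new term.
        new_terms = {
            tok
            for tokens in sentences
            for i, tok in enumerate(tokens)
            if tok not in known and context(tokens, i) in patterns
        }
        if not new_terms:
            break
        known |= new_terms
    return known, patterns


# Hypothetical toy records; "aspirin" is the only seed for a drug concept.
sentences = [
    "patient was prescribed aspirin for pain".split(),
    "patient was prescribed ibuprofen for fever".split(),
    "patient was given ibuprofen daily".split(),
    "patient was given metformin daily".split(),
    "patient was discharged home today".split(),
]
known, patterns = bootstrap(sentences, seeds={"aspirin"})
print(sorted(known))
print(sorted(patterns))
```

The resulting knowledge base is just a set of words and a set of readable patterns such as "prescribed _ for", so a person can inspect why a given token was grouped under the concept.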

With the recent success of deep learning models in addressing real-world problems in the healthcare industry, these complex models are likely to become more prevalent in the future. To increase transparency and make these solutions more accessible to the general public, we propose the novel idea of using the natural-language knowledge base built by bootstrap learning algorithms to interpret complex deep learning models for the task of concept extraction in electronic health records.

3778 Care
Research Group