World Sci-Tech R&D (世界科技研究与发展) ›› 2024, Vol. 46 ›› Issue (6): 831-849. doi: 10.16507/j.issn.1006-6055.2024.01.001 cstr: 32308.14.1006-6055.2024.01.001

• Digital Medicine •

Research Progress of Electronic Health Record Mining Technology

MA Xiaosheng1,2,3 LIU Wei1,2 WANG Sili1,2 YANG Heng1,2

  1. Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences; 2. Key Laboratory of Knowledge Computing and Intelligent Decision, Gansu Province; 3. Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences
  • Online: 2025-01-03 Published: 2025-01-03
  • Funding:
    Natural Science Foundation of Gansu Province, "Research on the Asset Management Model and Reuse Mechanism of Medical and Health Big Data in Gansu Province" (23JRRA581); Gansu Provincial Philosophy and Social Sciences Planning Project, "Research on Enhancing the Public Opinion Supervision Capacity of News Media Based on Big Data Technology" (2021YB158)

Abstract: Artificial intelligence and data-driven mining of electronic health records (EHR) can uncover latent medical patterns and knowledge, providing high-value intelligence and methodological support for precise, personalised medical decision-making and health management. This paper retrieves the EHR mining literature from the Web of Science, PubMed, and CNKI databases and analyses research hotspots and trends in the field by visualising publication trends and keyword co-occurrence. Building on an overview of EHR data types and database sources, it summarises and compares the existing EHR mining methods and their advantages and disadvantages. The study finds that current EHR mining techniques fall into four types: association-rule-based, combined dictionary and rule-based, statistical machine learning, and deep learning. Among them, deep learning-based EHR mining is the current research hotspot and trend, and it can efficiently mine large-scale, complex, heterogeneous EHR data and predict outcomes. Overall, the research still suffers from poor interpretability of mining results, methods that are narrow and insufficiently integrated, a low degree of intelligence and poor portability, weak representation learning for multimodal heterogeneous data, and difficulty in deploying practical applications in the medical field. Future research should focus on the interpretability of EHR mining results, strong representation of multimodal heterogeneous data, the integration and standardisation of EHR data, and deployability in clinical practice. In addition, given the rapid development of large language models and knowledge graph technologies, we recommend exploring the feasibility of applying them in EHR mining.

Key words: Electronic Health Records (EHR), Electronic Medical Records, Data Mining, Machine Learning, Deep Learning