Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed
}
TY - GEN
T1 - Speeding up the interpretation of differential gene expression analysis results
AU - Dordiuk, Vladislav
AU - Demicheva, Ekaterina
AU - Ushenin, Konstantin
N1 - The research funding from the Ministry of Science and Higher Education of the Russian Federation (Ural Federal University Program of Development within the Priority-2030 Program) is gratefully acknowledged.
PY - 2022/11/11
Y1 - 2022/11/11
N2 - The existing methods for interpreting differential gene expression analysis results fall mainly into three categories: cluster analysis, enrichment analysis, and the construction of genetic networks. Despite their rich capabilities, all of these approaches take a long time to compute, and the final results are not always sufficient for understanding the logic that binds genes into groups. In this paper, we propose a complete pipeline that makes understanding the results of differential gene expression analysis much faster, easier, and more efficient. The pipeline takes in Gene Ontology terms along with descriptions of the collected genes and returns gene clusters, the topics they are related to, and a filtered list of the most common words found in each of them. The processing involves the BERT artificial neural network model for semantic information extraction, BERTopic for unsupervised topic extraction, dimensionality reduction for data simplification, and clustering for the search of dependencies. The pipeline was tested with an ablation study, and its performance was evaluated by an expert on gene expression datasets from NCBI GEO that cover healthy individuals and several types of cardiomyopathy: dilated, inflammatory, ischemic, and non-ischemic.
AB - The existing methods for interpreting differential gene expression analysis results fall mainly into three categories: cluster analysis, enrichment analysis, and the construction of genetic networks. Despite their rich capabilities, all of these approaches take a long time to compute, and the final results are not always sufficient for understanding the logic that binds genes into groups. In this paper, we propose a complete pipeline that makes understanding the results of differential gene expression analysis much faster, easier, and more efficient. The pipeline takes in Gene Ontology terms along with descriptions of the collected genes and returns gene clusters, the topics they are related to, and a filtered list of the most common words found in each of them. The processing involves the BERT artificial neural network model for semantic information extraction, BERTopic for unsupervised topic extraction, dimensionality reduction for data simplification, and clustering for the search of dependencies. The pipeline was tested with an ablation study, and its performance was evaluated by an expert on gene expression datasets from NCBI GEO that cover healthy individuals and several types of cardiomyopathy: dilated, inflammatory, ischemic, and non-ischemic.
UR - http://www.scopus.com/inward/record.url?partnerID=8YFLogxK&scp=85147514896
U2 - 10.1109/SIBIRCON56155.2022.10017117
DO - 10.1109/SIBIRCON56155.2022.10017117
M3 - Conference contribution
SN - 978-1-6654-6480-2
SP - 560
EP - 565
BT - 2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)
Y2 - 11 November 2022 through 13 November 2022
ER -