Recent Advances in Clinical Trials

Open Access ISSN: 2771-9057

Abstract


Using Bioinformatics and Machine Learning Techniques to Identify Potential Genes that may be Associated with Lung Cancer and Facilitate the Screening of Such Genes

Authors: Zeynep Kucukakcali, Ipek Balikci Cicek.

Aim: Lung cancer, the most frequently diagnosed cancer globally, is the leading cause of cancer-related deaths. Because of its increasing prevalence and low survival rates, new biomarkers are needed for diagnosis. This study therefore has three aims: to identify genes potentially associated with lung cancer by applying bioinformatics methods to gene expression data from lung cancer and non-tumour tissues; to classify the data with stochastic gradient boosting (SGB), a machine learning model; and to determine the genes most strongly associated with the disease from the variable importance values produced by the model.

Methods: Bioinformatics analyses were performed with the limma package in R. The SGB model was used for classification. Classification performance was evaluated with accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score. After modeling, variable importance values were used to identify the genes most influential for the target variable.
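The modeling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the study worked in R, whereas this sketch uses Python with scikit-learn's `GradientBoostingClassifier`, where setting `subsample` below 1.0 gives the stochastic variant of gradient boosting. The data are synthetic, and the sample size, feature count, and hyperparameters are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an expression matrix (rows = samples, columns = genes).
# These are NOT the study's data; all sizes here are assumptions.
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# subsample < 1.0 fits each tree on a random fraction of the training rows,
# which is what makes the boosting "stochastic".
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, subsample=0.8, random_state=0
)
model.fit(X_train, y_train)

# Impurity-based variable importance; the values are normalized to sum to 1.
importances = model.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
top_features = [index for index, _ in ranked[:3]]

test_accuracy = model.score(X_test, y_test)
print("Top feature indices:", top_features)
print("Held-out accuracy:", round(test_accuracy, 3))
```

With real expression data, the column indices in `top_features` would map back to gene identifiers, which is how a ranking of candidate genes could be read off the fitted model.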

Results: In the bioinformatics analysis, a total of 7098 expressions differed significantly between the two groups. The SGB model achieved an accuracy of 93.5%, balanced accuracy of 94.1%, sensitivity of 88.2%, specificity of 100%, positive predictive value of 100%, negative predictive value of 87.5%, and F1-score of 93.8%. According to the variable importance values, the AGTR1, TNXB///TNXA, and SPP1 genes were the most influential in the model and may play an important role in tumorigenesis.
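The performance measures reported above all follow from the four cells of a binary confusion matrix. The sketch below computes them with the standard definitions. The counts passed in (15 true positives, 2 false negatives, 14 true negatives, 0 false positives) are an assumption: the abstract does not report the confusion matrix, and these values were chosen only because they reproduce the reported percentages after rounding to one decimal place. Treating the tumour class as "positive" is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Equivalent to the harmonic mean of ppv and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts (not given in the abstract), see the note above.
metrics = classification_metrics(tp=15, fn=2, tn=14, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note that balanced accuracy is the mean of sensitivity and specificity, so the reported 94.1% follows directly from the reported 88.2% and 100% regardless of the underlying counts.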

Conclusion: Genes associated with lung cancer were identified using bioinformatics and machine learning models. Further research on these genes could confirm their association with the disease and support the design of treatments that target them.
