2021, Issue 05

Prediction of Gene Binding Protein Based on Deep Learning and Support Vector Machine


Abstract:

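Traditional protein prediction methods have low accuracy and rely on manual feature extraction. To address these problems, a gene-binding protein prediction algorithm based on deep learning and a support vector machine is proposed. The algorithm combines a convolutional neural network with a gated recurrent unit to scan the protein sequence while preserving the positional dependence of the amino acids in it, and it uses a support vector machine in place of the neural network's Softmax classifier to make predictions from the protein's feature sequence. The model was evaluated in comparative experiments on the benchmark datasets DBP2858 and PDB14189. The results show that the model predicts deoxyribonucleic acid (DNA)-binding proteins better than the compared methods, with higher prediction accuracy and efficiency.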
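The data flow described in the abstract (convolution over the sequence, a gated recurrent unit that preserves residue order, then a support vector machine in place of Softmax) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the one-hot encoding, layer sizes, random untrained weights, omission of bias terms, and all function names (`one_hot`, `conv1d_relu`, `gru_last_state`, `extract_features`, `svm_decision`, `init_params`) are assumptions made for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode each residue as a 20-dim one-hot vector (unknown residues -> zeros)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def conv1d_relu(x, filters, width):
    """Slide every filter over windows of `width` residues; output keeps sequence order."""
    out = []
    for t in range(len(x) - width + 1):
        window = [value for row in x[t:t + width] for value in row]
        out.append([max(0.0, a) for a in matvec(filters, window)])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_last_state(xs, p, hidden):
    """Run a GRU cell (bias terms omitted for brevity) and return the final state."""
    h = [0.0] * hidden
    for x in xs:
        xh = x + h
        z = [sigmoid(a) for a in matvec(p["Wz"], xh)]        # update gate
        r = [sigmoid(a) for a in matvec(p["Wr"], xh)]        # reset gate
        xrh = x + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(p["Wh"], xrh)]  # candidate state
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h


def init_params(n_filters=8, width=3, hidden=6, seed=0):
    """Random, untrained weights; sizes are assumptions chosen for illustration."""
    rng = random.Random(seed)
    in_dim = n_filters + hidden
    return {
        "width": width,
        "hidden": hidden,
        "filters": rand_matrix(n_filters, width * len(AMINO_ACIDS), rng),
        "gru": {k: rand_matrix(hidden, in_dim, rng) for k in ("Wz", "Wr", "Wh")},
        "svm_w": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "svm_b": 0.0,
    }


def extract_features(seq, params):
    """CNN followed by GRU: maps a variable-length sequence to a fixed-length vector."""
    conv = conv1d_relu(one_hot(seq), params["filters"], params["width"])
    return gru_last_state(conv, params["gru"], params["hidden"])


def svm_decision(features, weights, bias):
    """Linear SVM decision value; its sign would give the predicted class."""
    return sum(w * f for w, f in zip(weights, features)) + bias


params = init_params()
features = extract_features("MKVLAAGIVGLR", params)  # hypothetical toy sequence
score = svm_decision(features, params["svm_w"], params["svm_b"])
```

In a real pipeline the network weights would be learned first, and the SVM would then be fitted on the extracted feature vectors rather than using random weights as here. The sketch only shows why the recurrent stage makes the features depend on residue order.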

Keywords: deep learning; support vector machine; deoxyribonucleic acid (DNA)-binding protein; protein prediction

Foundation: National Natural Science Foundation of China (61662028); Guangxi Science and Technology Plan Project (2019AC20168)

Authors: CHEN Zuozan (陈佐瓒), XU Bing (徐兵), DING Xiaojun (丁小军), GAN Jingzhong (甘井中)

DOI: 10.13349/j.cnki.jdxbn.20210309.001
