Selected publications

mMGE: a database for human metagenomic extrachromosomal mobile genetic elements.
Lai S, Jia L, Subramanian B, Pan S, Zhang J, Dong Y, Chen WH, Zhao XM.
Nucleic Acids Research (2020)

Here we present mMGE, a comprehensive catalog of 517,251 non-redundant extrachromosomal mobile genetic elements (eMGEs), including 92,492 plasmids and 424,759 phages, derived from 66,425 human metagenomic samples collected from diverse body sites.

Docking sites inside Cas9 for adenine base editing diversification and RNA off-target elimination.
Li S, Yuan B, Cao J, Chen J, Chen J, Qiu J, Zhao XM, Wang X, Qiu Z, Cheng TL.
Nature Communications (2020)

Here, functional ABE variants with diversified editing scopes and reduced RNA off-target activities are identified using domain insertion profiling inside SpCas9 and with different combinations of TadA variants. Engineered ABE variants in this study display narrowed, expanded or shifted editing scopes with efficient editing activities across protospacer positions 2-16.

Identifying age-specific gene signatures of the human cerebral cortex with joint analysis of transcriptomes and functional connectomes.
Zhao X, Chen J, Xiao P, Feng J, Nie Q, Zhao XM.
Briefings in Bioinformatics (2020)

Here, using a novel method, transcriptome-connectome correlation analysis (TCA), which integrates brain functional magnetic resonance images with region-specific transcriptomes, we identify age-specific cortex (ASC) gene signatures for adolescence, early adulthood and late adulthood.

nMAGMA: a network-enhanced method for inferring risk genes from GWAS summary statistics and its application to schizophrenia.
Yang A, Chen J, Zhao XM.
Briefings in Bioinformatics (2020)

We propose a new approach, namely network-enhanced MAGMA (nMAGMA), for gene-wise annotation of variants from GWAS summary statistics. Compared with MAGMA and H-MAGMA, nMAGMA significantly extends the lists of genes that can be annotated to SNPs by integrating local signals, long-range regulation signals (i.e. interactions between distal DNA elements), and tissue-specific gene networks.

STAB: a spatio-temporal cell atlas of the human brain.
Song L, Pan S, Zhang Z, Jia L, Chen WH, Zhao XM.
Nucleic Acids Research (2020)

Here, we present STAB (a Spatio-Temporal cell Atlas of the human Brain), a database consisting of single-cell transcriptomes across multiple brain regions and developmental periods. Currently, STAB contains single-cell gene expression profiles of 42 cell subtypes across 20 brain regions and 11 developmental periods.

Deep learning of brain magnetic resonance images: A brief review.
Zhao X, Zhao XM.
Methods (2020)

In this survey, we give a brief review of recent popular deep learning approaches and their applications in brain MRI analysis. Popular brain MRI databases and deep learning tools are also introduced. The strengths and weaknesses of different approaches are addressed, and challenges as well as future directions are discussed.

GMrepo: a database of curated and consistently annotated human gut metagenomes.
Wu S, Sun C, Li Y, Wang T, Jia L, Lai S, Yang Y, Luo P, Dai D, Yang YQ, Luo Q, Gao NL, Ning K, He LJ, Zhao XM, Chen WH.
Nucleic Acids Research (2020)

GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purpose is to improve the reusability and accessibility of the rapidly growing body of human metagenomic data.

Hierarchical graphical model reveals HFR1 bridging circadian rhythm and flower development in Arabidopsis thaliana.
Duren Z, Wang Y, Wang J, Zhao XM, Lv L, Li X, Liu J, Zhu XG, Chen L, Wang Y.
npj Systems Biology and Applications (2019)

Here, we proposed a hierarchical graphical model that estimates transcription factor (TF) activity from mRNA expression by building TF complexes with protein cofactors while simultaneously inferring each TF's downstream regulatory network. We then applied the model to flower development and circadian rhythm processes in Arabidopsis thaliana.

EnImpute: imputing dropout events in single cell RNA sequencing data via ensemble learning.
Zhang XF, Ou-Yang L, Yang S, Zhao XM, Hu X, Yan H.
Bioinformatics (2019)

Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result.
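To make the ensemble idea concrete, here is a minimal Python sketch of combining several imputed matrices into one. It is not EnImpute's code (EnImpute is an R package). The function name `ensemble_impute`, the element-wise median as the combination rule, and the choice to treat only zero entries as candidate dropouts are all assumptions made for illustration.

```python
from statistics import median


def ensemble_impute(observed, imputed_results):
    """Combine several imputed genes-x-cells matrices into one (hypothetical sketch).

    Assumption: zero entries in `observed` are treated as candidate dropouts
    and replaced by the element-wise median of the methods' imputed values;
    non-zero observed counts are kept unchanged.
    """
    n_rows, n_cols = len(observed), len(observed[0])
    # Every method's output must have the same shape as the observed matrix.
    for result in imputed_results:
        if len(result) != n_rows or any(len(row) != n_cols for row in result):
            raise ValueError("all matrices must share the same shape")

    combined = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            if observed[i][j] != 0:
                row.append(observed[i][j])  # keep a non-zero observation
            else:
                # Median across methods is robust to one method overshooting.
                row.append(median(result[i][j] for result in imputed_results))
        combined.append(row)
    return combined


if __name__ == "__main__":
    observed = [[0, 5], [3, 0]]
    imputed = [
        [[1.0, 5], [3, 2.0]],  # output of hypothetical method A
        [[2.0, 5], [3, 4.0]],  # output of hypothetical method B
        [[9.0, 5], [3, 3.0]],  # output of hypothetical method C
    ]
    print(ensemble_impute(observed, imputed))  # [[2.0, 5], [3, 3.0]]
```

The median is used here because one method that badly overshoots, such as method C's 9.0 above, does not drag the combined value with it. A trimmed mean would be a similarly robust alternative.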

Identification of Functional Gene Modules Associated With STAT-Mediated Antiviral Responses to White Spot Syndrome Virus in Shrimp.
Zhu G, Li S, Wu J, Li F, Zhao XM.
Frontiers in Physiology (2019)

In this work, based on gene expression profiles of shrimp injected with WSSV and STAT double-stranded RNA (dsRNA), we constructed a gene co-expression network for shrimp and identified gene modules that may be responsible for STAT-mediated antiviral responses.

DeepPhos: prediction of protein phosphorylation sites with deep learning.
Luo F, Wang M, Liu Y, Zhao XM, Li A.
Bioinformatics (2019)

In this study, we present DeepPhos, a novel deep learning architecture for predicting protein phosphorylation sites. Unlike conventional multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks that capture multiple representations of sequences; these are combined by intra-block and inter-block concatenation layers to make the final phosphorylation prediction.

MVP: a microbe-phage interaction database.
Gao NL, Zhang C, Zhang Z, Hu S, Lercher MJ, Zhao XM, Bork P, Liu Z, Chen WH.
Nucleic Acids Research (2018)

The main purpose of MVP (Microbe Versus Phage) is to provide a comprehensive catalog of phage–microbe interactions and to assist users in selecting phage(s) that can target (and potentially manipulate) specific microbes of interest.

HISP: A Hybrid Intelligent Approach for Identifying Directed Signaling Pathways.
Zhao XM, Li S.
Journal of Molecular Cell Biology (2017)

In this paper, we propose a novel hybrid intelligent method, namely HISP (Hybrid Intelligent approach for identifying directed Signaling Pathways), that determines both the topologies of signaling pathways and the direction of signaling flows within a pathway based on integer linear programming and a genetic algorithm. By integrating protein-protein interaction, gene expression, and gene knockout data, HISP accurately determines the optimal topologies of signaling pathways.

CSTEA: a webserver for the Cell State Transition Expression Atlas.
Zhu G, Yang H, Chen X, Wu J, Zhang Y, Zhao XM.
Nucleic Acids Research (2017)

Here, we present CSTEA (Cell State Transition Expression Atlas), a webserver that organizes, analyzes and visualizes the time-course gene expression data during cell differentiation, cellular reprogramming and trans-differentiation in human and mouse.

PhosD: inferring kinase-substrate interactions based on protein domains.
Qin GM, Li RY, Zhao XM.
Bioinformatics (2017)

In this paper, we propose a novel probabilistic model named PhosD to predict kinase–substrate relationships based on protein domains, under the assumption that kinase–substrate interactions are mediated by kinase–domain interactions.
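One common way to turn such an assumption into a probability is a noisy-OR combination. A kinase is taken to act on a substrate if it interacts with at least one of the substrate's domains, and the domains are treated as independent. The sketch below shows only that idea. The noisy-OR form, the function name, and the probability table `kd_prob` are assumptions for illustration, not PhosD's published model or parameters.

```python
def kinase_substrate_prob(kinase, substrate_domains, kd_prob):
    """Noisy-OR probability that `kinase` acts on a substrate (hypothetical sketch).

    kd_prob maps (kinase, domain) -> probability of a kinase-domain interaction;
    pairs missing from the table are treated as probability 0.
    """
    p_none = 1.0  # probability that no domain of the substrate interacts with the kinase
    for domain in set(substrate_domains):  # count each distinct domain once
        p_none *= 1.0 - kd_prob.get((kinase, domain), 0.0)
    return 1.0 - p_none


if __name__ == "__main__":
    # Hypothetical kinase-domain interaction probabilities.
    kd_prob = {("K1", "SH2"): 0.5, ("K1", "PDZ"): 0.2}
    print(kinase_substrate_prob("K1", ["SH2", "PDZ", "X"], kd_prob))  # about 0.6
```

In a full model the entries of `kd_prob` would themselves be estimated from known kinase–substrate pairs. Here they are simply given as inputs.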

GEAR: A database of Genomic Elements Associated with drug Resistance.
Wang YY, Chen WH, Xiao PP, Xie WB, Luo Q, Bork P, Zhao XM.
Scientific Reports (2017)

Here, we present GEAR (A database of Genomic Elements Associated with drug Resistance) that aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance.

Identifying disease associated miRNAs based on protein domains.
Qin GM, Li RY, Zhao XM.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (2016)

In this work, we present a new approach to identify disease associated miRNAs based on domains, the functional and structural blocks of proteins. The results on real datasets demonstrate that our method can effectively identify disease related miRNAs with high precision.

The exploration of network motifs as potential drug targets from post-translational regulatory networks.
Zhang XD, Song J, Bork P, Zhao XM.
Scientific Reports (2016)

In this work, we construct a post-translational regulatory network (PTRN) consisting of phosphorylation and proteolysis processes, which enables us to investigate the regulatory interplay between these two post-translational modifications (PTMs).

PPIM: A Protein-Protein Interaction Database for Maize.
Zhu G, Wu A, Xu XJ, Xiao PP, Lu L, Zhao XM, et al.
Plant Physiology (2016)

In this work, we present the Protein-Protein Interaction Database for Maize (PPIM), which covers 2,762,560 interactions among 14,000 proteins. PPIM contains not only accurately predicted PPIs but also molecular interactions collected from the literature. The database is freely available online with a user-friendly and powerful interface.

Oxidized glutathione (GSSG) inhibits epithelial sodium channel activity in primary alveolar epithelial cells.
Downs CA, Kreiner L, Zhao XM, Trac P, Johnson NM, et al.
American Journal of Physiology-Lung Cellular and Molecular Physiology (2015)

In the present study, we used single channel patch-clamp recordings to examine the effect of oxidative stress, via direct application of glutathione disulfide (GSSG), on ENaC activity.

Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks.
Zhang X, Zhao J, Hao JK, Zhao XM, Chen L.
Nucleic Acids Research (2015)

In this work, to overcome the limitations of existing association measures, we propose a novel concept, namely conditional mutual inclusive information (CMI2), to quantify regulatory associations between genes. Furthermore, with CMI2, we develop a new approach, namely CMI2NI (CMI2-based network inference), for reverse-engineering gene regulatory networks (GRNs).

Identifying cancer-related microRNAs based on gene expression data.
Zhao XM, Liu KQ, Zhu G, He F, Duval B, Richer JM, Huang DS, Jiang CJ, Hao JK, Chen L.
Bioinformatics (2015)

We present a novel computational framework to identify the cancer-related miRNAs based solely on gene expression profiles without requiring either miRNA expression data or the matched gene and miRNA expression data.

jNMFMA: a joint non-negative matrix factorization meta-analysis of transcriptomics data.
Wang HQ, Zheng CH, Zhao XM.
Bioinformatics (2015)

This article proposes jNMFMA, a new meta-analysis method for identifying differentially expressed genes (DEGs) based on joint non-negative matrix factorization.

Cascleave 2.0, a new approach for predicting caspase and granzyme cleavage targets.
Wang M, Zhao XM, Tan H, Akutsu T, Whisstock J, Song J.
Bioinformatics (2014)

We develop a new bioinformatics tool, termed Cascleave 2.0, which builds on the previous success of the Cascleave tool for predicting generic caspase cleavage sites.

A survey on computational approaches to identifying disease biomarkers based on molecular networks.
Qin G, Zhao XM.
Journal of Theoretical Biology (2014)

In this paper, we survey recent progress in computational approaches that have been developed to identify disease biomarkers based on molecular networks.

Human monogenic disease genes have frequently functionally redundant paralogs.
Chen WH, Zhao XM, van Noort V, Bork P.
PLoS Computational Biology (2013)

We propose that functional compensation by gene duplication masks the phenotypic effects of deleterious mutations and reduces the probability of purging the defective genes from the human population. This functional compensation could be further enhanced by stronger purifying selection acting on disease genes and their duplicates, as well as on their orthologous counterparts, compared with non-disease genes.

eFG: an electronic resource for Fusarium graminearum.
Liu X, Zhang X, Tang WH, Chen L, Zhao XM.
Database (Oxford) (2013)

In this work, we present a comprehensive database, namely eFG (Electronic resource for Fusarium graminearum), to help the community further understand this destructive pathogen.

NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference.
Zhang X, Liu K, Liu ZP, Duval B, Richer JM, Zhao XM, Hao JK, Chen L.
Bioinformatics (2013)

In this work, we present a novel method, namely NARROMI, to improve the accuracy of GRN inference by combining ordinary differential equation-based recursive optimization (RO) and information theory-based mutual information (MI).
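As a rough illustration of the mutual information (MI) component only, the sketch below scores gene pairs by MI and keeps pairs above a threshold as candidate regulatory edges. It assumes that expression profiles are jointly Gaussian, in which case MI has the closed form MI = -0.5 * ln(1 - r^2), where r is the Pearson correlation. The Gaussian assumption, the function names, and the threshold are illustrative choices. NARROMI's actual MI estimator may differ, and the recursive optimization (RO) step that removes redundant edges is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no association signal
    return cov / math.sqrt(vx * vy)


def gaussian_mi(x, y):
    """MI in nats under a bivariate Gaussian assumption: -0.5 * ln(1 - r^2)."""
    r = pearson(x, y)
    r2 = min(r * r, 1.0 - 1e-12)  # guard against log(0) for perfectly correlated profiles
    return -0.5 * math.log(1.0 - r2)


def candidate_edges(expr, threshold):
    """Return (gene1, gene2, mi) for every gene pair whose MI exceeds `threshold`.

    expr maps gene name -> expression profile (same samples for every gene).
    """
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            mi = gaussian_mi(expr[g], expr[h])
            if mi > threshold:
                edges.append((g, h, mi))
    return edges


if __name__ == "__main__":
    expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [1, -1, -1, 1]}
    for g, h, mi in candidate_edges(expr, threshold=0.5):
        print(g, h, round(mi, 2))
```

Pure MI thresholding like this tends to keep indirect associations, for example between two genes that share a regulator. That weakness is what the recursive-optimization component described above is meant to address.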

Identifying dysregulated pathways in cancers from pathway interaction networks.
Liu KQ, Liu ZP, Hao JK, Chen L, Zhao XM.
BMC Bioinformatics (2012)

In this paper, we propose a novel approach to identify dysregulated pathways in cancer based on a pathway interaction network.

Identifying disease genes and module biomarkers by differential interactions.
Liu X, Liu ZP, Zhao XM, Chen L.
Journal of the American Medical Informatics Association (JAMIA) (2012)

In this paper, we present a novel approach to predict disease genes and identify dysfunctional networks or modules based on the analysis of differential interactions between disease and control samples, in contrast to the analysis of differential gene or protein expression widely adopted by existing methods.

Prediction of drug combinations by integrating molecular and pharmacological data.
Zhao XM, Iskar M, Zeller G, Kuhn M, van Noort V, Bork P.
PLOS Computational Biology (2011)

Here, we present a novel computational approach to predict drug combinations by integrating molecular and pharmacological data.

A systems biology approach to identify effective cocktail drugs.
Wu Z, Zhao XM, Chen L.
BMC Systems Biology (2010)

In this paper, we present a novel network-based systems biology approach to identify effective drug combinations by exploiting high-throughput data.

A discriminative approach to identifying domain-domain interactions from protein-protein interactions.
Zhao XM, Chen L, Aihara K.
Proteins (2010)

In this article, we propose a novel discriminative approach for predicting DDIs based on both protein–protein interactions (PPIs) and the derived information of non-PPIs.

FPPI: Fusarium graminearum protein-protein interaction database.
Zhao XM, Zhang XW, Tang WH, Chen L.
Journal of Proteome Research (2009)

The F. graminearum protein-protein interaction (FPPI) database provides comprehensive information on protein-protein interactions (PPIs) in F. graminearum, predicted from both interologs in PPI databases of seven species and domain-domain interactions experimentally determined from protein structures.

Uncovering signal transduction networks from high-throughput data by integer linear programming.
Zhao XM, Wang RS, Chen L, Aihara K.
Nucleic Acids Research (2008)

In this article, we propose a novel method for uncovering signal transduction networks (STNs) by integrating protein interaction with gene expression data.

Gene function prediction using labeled and unlabeled data.
Zhao XM, Wang Y, Chen L, Aihara K.
BMC Bioinformatics (2008)

In this paper, we present a new technique, namely Annotating Genes with Positive Samples (AGPS), for defining negative samples in gene function prediction.

Protein classification with imbalanced data.
Zhao XM, Li X, Chen L, Aihara K.
Proteins (2008)

This article presents a new technique for protein classification with imbalanced data.