ANALYZING GENOMIC DIVERSITY OF CANDIDATUS LIBERIBACTER BY PAN-GENOME CONSTRUCTION - Sarah N Batarseh

Sarah N Batarseh

University of California Irvine, Irvine, CA

Candidatus Liberibacter is a genus of bacteria that cause plant diseases, including Huanglongbing of citrus and zebra chip of potato, as obligate intracellular pathogens of the phloem. Although Candidatus Liberibacter species are known to be highly divergent, pan-genomes of the individual species and of the genus as a whole have not been constructed or analyzed. We ask two questions: what is the composition of the pan-genome across all Candidatus Liberibacter species and within each individual species, and which genes, if any, are under selection? We hypothesize that the genetic diversity among Candidatus Liberibacter species will be reflected in their individual pan-genomes. We further hypothesize that virulence-associated genes, such as those encoding Sec-dependent effectors, may be under positive selection because they must continually evolve to invade the host. To address these questions, we retrieved 42 genomes from the NCBI database, encompassing several pathogenic, unculturable species (Ca. Liberibacter asiaticus, solanacearum, africanus, americanus, and europaeus) and one nonpathogenic, culturable species (Liberibacter crescens). After annotating the genomes with Prokka, we used the Roary pipeline to construct a pan-genome of all Candidatus Liberibacter species, with Bartonella bacilliformis as the outgroup. The pan-genome consisted of 242 core genes (99% ≤ strains ≤ 100%), 285 soft-core genes (95% ≤ strains < 99%), 824 shell genes (15% ≤ strains < 95%), and 3119 cloud genes (0% ≤ strains < 15%). We aligned the core genes with PRANK and used RAxML to infer a maximum-likelihood phylogenetic tree from the core-gene alignment. We categorized the functions of core and accessory genes using the EggNOG database.
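The four pan-genome categories above are defined by the fraction of genomes that carry each gene cluster. As a minimal sketch (not the Roary implementation), the Python below applies those bounds to gene counts, assuming inclusive lower bounds as in Roary's summary output. The gene names and the 20-genome toy data set are hypothetical; only the four category sizes in `reported` come from the abstract.

```python
def classify_gene(n_present: int, n_genomes: int) -> str:
    """Assign a pan-genome category from the number of genomes carrying a gene."""
    if n_genomes <= 0 or not 0 <= n_present <= n_genomes:
        raise ValueError("need n_genomes > 0 and 0 <= n_present <= n_genomes")
    # Integer arithmetic avoids floating-point surprises exactly at a boundary.
    if n_present * 100 >= 99 * n_genomes:
        return "core"
    if n_present * 100 >= 95 * n_genomes:
        return "soft-core"
    if n_present * 100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


# Hypothetical toy data: gene cluster -> number of genomes (out of 20) carrying it.
toy_counts = {"geneA": 20, "geneB": 19, "geneC": 10, "geneD": 2}
toy_categories = {gene: classify_gene(n, 20) for gene, n in toy_counts.items()}

# Category sizes reported in the abstract for the genus-level pan-genome.
reported = {"core": 242, "soft-core": 285, "shell": 824, "cloud": 3119}
total = sum(reported.values())
shares = {cat: round(100 * n / total, 1) for cat, n in reported.items()}

print(toy_categories)
print(total, shares)
```

Applied to the reported category sizes, the same arithmetic shows how each category's share of the gene clusters is obtained from the four counts.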
The core genome was dominated by genes integral to essential cell functions, with 30.8% of core genes involved in transcription and translation, whereas the accessory genes spanned more variable functions and included 25 virulence genes. We used the codeml program in the PAML package to estimate the ratio of nonsynonymous to synonymous substitution rates (dN/dS) for the core genes. The core genes had a mean dN/dS of 0.0916, suggesting that most are under strong selective constraint. The highest dN/dS values were observed for cdsA, involved in cell membrane formation (0.31797), and pcs, involved in biofilm formation (0.59136). Species-level pan-genome analyses revealed variation in the accessory genomes. Notably, shell genes accounted for 81.5% (N=1913) and 82.6% (N=2075) of the genes in the americanus and asiaticus pan-genomes, respectively. Analyzing the composition of the pan-genome across Candidatus Liberibacter species can provide insight into the genetic diversity of each species. Understanding the similarities and differences among Candidatus Liberibacter species, along with which genes are under selection, can contribute to a better understanding of disease mechanisms and to the management of plant disease.
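The dN/dS values reported above are conventionally read against a neutral expectation of 1: values well below 1 point to purifying selection, values near 1 to neutral evolution, and values above 1 to positive selection. The Python sketch below applies that reading to the three reported values. The function name and the `tol` band around 1 are hypothetical illustration choices, not part of PAML, and a fixed band is no substitute for a statistical test such as a likelihood-ratio comparison of codeml models.

```python
def interpret_omega(omega: float, tol: float = 0.05) -> str:
    """Give the conventional qualitative reading of a gene-wide dN/dS ratio.

    `tol` is an illustrative (hypothetical) band around the neutral value of 1.
    """
    if omega < 0:
        raise ValueError("dN/dS cannot be negative")
    if omega < 1 - tol:
        return "purifying selection"
    if omega > 1 + tol:
        return "positive selection"
    return "approximately neutral"


# dN/dS values reported in the abstract.
reported_omega = {"core-gene mean": 0.0916, "cdsA": 0.31797, "pcs": 0.59136}
readings = {name: interpret_omega(w) for name, w in reported_omega.items()}

for name, omega in reported_omega.items():
    print(f"{name}: dN/dS = {omega} -> {readings[name]}")
```

Under this reading, even the two highest core-gene values remain below 1. A gene-wide average can mask positive selection acting on a few sites, which codeml's site models are designed to detect.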