On March 3, 2023, the leading international journal Cell published online a research paper entitled “Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation”. The research was carried out by an international team led by Shao-Hua Fan (first author) of the School of Life Sciences and Human Phenome Institute at Fudan University and Sarah A. Tishkoff (corresponding author) of the Department of Genetics at the University of Pennsylvania.
The team spent the last seven years analyzing whole-genome sequencing data from 180 individuals in 12 representative African populations. These ethnic groups are widely distributed geographically, practice a range of subsistence styles (farming, hunting and gathering, and pastoralism), and speak languages spanning four major language phyla in Africa.
Modern humans originated in Africa and have inhabited Africa longer than any other continent. Africans have the greatest level of genetic and phenotypic diversity in the world. In addition, nearly one-third of all modern human languages are spoken in Africa, making it one of the most linguistically diverse regions in the world.
However, Africans are underrepresented in current genetic and genomic research. Studying the genetic diversity of African populations is not only crucial to understanding the origin, ancient genetic structure, and adaptive evolution of modern humans, but will also provide the basis for more precise human medicine.
Introgression and severe population bottlenecks have resulted in low genetic diversity in some African populations
By comparing the sequences against the human reference genome, the study identified 32 million single nucleotide polymorphism (SNP) sites, of which approximately 5.3 million were previously undiscovered. These newly discovered variants were widely found in functional regions of the genome such as enhancers, promoters, and transcription factor binding sites.
The study found significant differences in both the average number of SNPs and genetic diversity among the 12 African populations analyzed. The San (Ju|’hoansi and !Xoo) and rainforest hunter-gatherers (RHG), who rely on hunting and gathering for a living, had the most SNPs and the highest genetic diversity, while populations such as the Amhara, Fulani, Chabu, and Hadza had the lowest genetic diversity. The main reasons for the low genetic diversity in some African populations are pervasive gene flow from outside Africa (into the Amhara and Fulani populations) and severe population bottlenecks (in the Hadza and Chabu populations). For example, the Hadza and Chabu groups, who live in Tanzania and Ethiopia, respectively, already number fewer than 1,000 people.
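For readers unfamiliar with how genetic diversity is quantified, one common measure is nucleotide diversity (π), the average number of differences per site between two randomly chosen chromosomes. The article does not say which statistic the study used, so the sketch below is only an illustration under that assumption; the function name and the toy allele counts are hypothetical.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Estimate nucleotide diversity (pi) for one population.

    alt_counts    -- alternate-allele count at each variable (biallelic) site
    n_chromosomes -- number of sampled chromosomes (2 x diploid individuals)
    n_sites       -- total number of sites surveyed, variable or not
    """
    n = n_chromosomes
    n_pairs = n * (n - 1) / 2
    # At a site with k alternate alleles, k * (n - k) chromosome pairs differ.
    differing_pairs = sum(k * (n - k) for k in alt_counts)
    return differing_pairs / n_pairs / n_sites


# Toy input (not data from the study): 3 variable sites among 10 surveyed
# sites, sampled from 2 diploid individuals (4 chromosomes).
toy_pi = nucleotide_diversity([1, 2, 3], n_chromosomes=4, n_sites=10)
```

A population that has passed through a severe bottleneck tends to carry fewer variable sites and more skewed allele counts, both of which lower this value.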
The common ancestor of the San and the RHG diverged earliest in evolution
The scientists used the neighbor-joining method to systematically compare the 12 ethnic groups in this study with Northern Europeans from Utah (CEU), Han Chinese in Beijing, China (CHB), and Toscani in Italy (TSI) from the 1000 Genomes Project, and with Papuan samples from the Simons Genome Diversity Project (SGDP). The results showed that the genetic ancestor of the San was the earliest branch in the evolution of modern humans, followed by the ancestor of the rainforest hunter-gatherers (RHG). The early divergence of the ancestors of the modern San and RHG was also supported by principal component analysis (PCA) and ADMIXTURE.
The study found that the clustering patterns of the different ethnic groups in the neighbor-joining phylogenetic tree were significantly related to their current geographical environments, indicating that geography is an important factor constraining gene flow between ethnic groups.
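The neighbor-joining method mentioned above builds a tree from pairwise genetic distances by repeatedly joining the pair of groups that minimizes a corrected distance criterion. The following is a minimal sketch of the textbook algorithm, not the study's pipeline; the labels `a` to `e`, the distance values, and the internal names such as `node1` are made up for illustration.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Run neighbor-joining on a symmetric distance matrix.

    Returns a list of joins (child_a, child_b, branch_a, branch_b, parent).
    The last entry records the single edge (length branch_a) between the
    two remaining nodes; its branch_b is 0.0 and its parent is None.
    """
    dist = {frozenset((names[i], names[j])): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}

    def d(x, y):
        return 0.0 if x == y else dist[frozenset((x, y))]

    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = {x: sum(d(x, y) for y in nodes) for x in nodes}
        # Q criterion: join the pair with the smallest corrected distance.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d(p[0], p[1]) - totals[p[0]] - totals[p[1]],
        )
        branch_a = 0.5 * d(a, b) + (totals[a] - totals[b]) / (2 * (n - 2))
        branch_b = d(a, b) - branch_a
        parent = f"node{len(joins) + 1}"  # hypothetical internal-node label
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                dist[frozenset((parent, k))] = 0.5 * (d(a, k) + d(b, k) - d(a, b))
        nodes = [k for k in nodes if k not in (a, b)] + [parent]
        joins.append((a, b, branch_a, branch_b, parent))
    a, b = nodes
    joins.append((a, b, d(a, b), 0.0, None))
    return joins


# Toy distances between five hypothetical groups (not data from the study).
toy_names = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_joins = neighbor_joining(toy_names, toy_matrix)
```

Because the method only sees a matrix of pairwise distances, it produces a single bifurcating tree and has no way to represent gene flow between branches.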
Because of factors such as gene flow and genetic recombination, the neighbor-joining method, PCA, and ADMIXTURE mentioned above cannot be used to reconstruct the early genetic structure of modern humans. However, when the scientists incorporated all these factors into a complex model, they found that the common ancestor of the San and the RHG, rather than the ancestor of the San alone, was the earliest branch of modern humans. The study also inferred that the earliest divergence among modern humans occurred about 280,000 years ago, which is consistent with previous estimates based on archaeological evidence and ancient DNA.
Through PCA and ADMIXTURE analysis, the scientists also discovered San-related ancestry components in the genomes of the Hadza and Sandawe of Tanzania, who currently speak Khoesan languages. Using PCA, they projected 55 previously published ancient African samples from different regions onto the genetic variation of modern African populations. The results showed that, although Khoesan languages are now spoken only by the San in southern Africa and the Hadza and Sandawe in Tanzania, a large number of ancient Africans from different regions are related to the San and to the Hadza and Sandawe of East Africa, even though no ethnic groups in those regions speak Khoesan languages today. This suggests that Khoesan-speaking Africans were once spread widely across the continent, but that these populations later disappeared from most of that range, perhaps because of the Bantu migration.
Key mutations responsible for lighter skin color of the San
Moreover, by combining tests for positive selection, functional annotation, and the whole-genome sequencing results, the team examined genetic variants specific to African ethnic groups across the 12 indigenous populations, providing a comprehensive picture of the adaptive evolution of African populations. Skin color diversity is an important hallmark of adaptive evolution in modern humans across different geographical regions. The San are the lightest-skinned ethnic group among African populations, but the genetic basis of their light skin color is still poorly understood.
Using functional genomics and other methods, this study identified mutations in an intronic enhancer of PDPK1 as a decisive factor in the light skin color of the San. This enhancer is active only in melanocytes. Silencing the enhancer with CRISPR interference (CRISPRi) inhibits PDPK1 gene expression and decreases the melanin level in MNT-1 cells. The study further found that the frequency of rs77665059-C exceeds 80% in some San populations, but is very low in other African populations (average frequency of 14%) and in non-African populations (average frequency of 3%). rs77665059-C inhibits the activity of the enhancer in melanoma cell lines, reduces the expression of PDPK1, and is significantly associated with the light skin color phenotype in the San, providing a new genetic explanation for the evolution of their light skin color.
Presented by Fudan University Media Center
Writer: Bogdan Zabarov
Editor: Li Yijie, Wang Mengqi