A CNIO team reduces the count of human protein-coding genes to 19,000.
How nutrients are metabolised and how neurons communicate in the brain are just some of the messages coded by the 3 billion letters that make up the human genome. The detection and characterisation of the genes present in this mass of information is a complex task that has been a source of ongoing debate since the first systematic attempts by the Human Genome Project more than ten years ago.
A study at the Spanish National Cancer Research Centre (CNIO) revises the number of human genes that can generate proteins to about 19,000. That is 1,700 fewer than in the most recent annotation, and well below the initial estimates of 100,000 genes. The work concludes that almost all of these genes have ancestors that predate the appearance of primates 50 million years ago.
"The shrinking human genome" is how the team describes the continuous corrections to the number of protein-coding genes in the human genome over the years, which have culminated in the approximately 19,000 genes described in the present work. According to the researchers, the coding part of the genome [which produces proteins] is in constant flux, and no one could have imagined a few years ago that such a small number of genes could make something so complex.
The scientists began by analysing proteomics experiments; proteomics is the most powerful tool for detecting protein molecules. To build a map of human proteins, the researchers integrated data from seven large-scale mass spectrometry studies covering more than 50 human tissues, in order to verify which genes really do produce proteins.
The results brought to light just over 12,000 proteins, which the researchers mapped to the corresponding regions of the genome. They then analysed thousands of genes that were annotated as protein-coding in the human genome but did not appear in the proteomics analysis. They concluded that 1,700 of the genes that are supposed to produce proteins almost certainly do not, either because they lack any protein-coding features or because the conservation of their reading frames does not support protein-coding ability.
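The filtering logic described above can be pictured as a simple set operation. The sketch below is a hypothetical illustration, not the team's actual pipeline: it assumes each mass spectrometry study is reduced to a set of gene names with protein evidence, and it reduces the two tests mentioned (protein-coding features and reading-frame conservation) to invented boolean flags on a made-up GeneRecord type. All gene names and data are toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """One annotated 'protein-coding' gene (hypothetical, simplified record)."""
    name: str
    has_coding_features: bool      # hypothetical flag: gene shows protein-coding features
    reading_frame_conserved: bool  # hypothetical flag: reading frame conserved in other species


def detected_genes(studies: list[set[str]]) -> set[str]:
    """Pool the genes with protein evidence from every study (set union)."""
    found: set[str] = set()
    for study in studies:
        found |= study
    return found


def flag_doubtful(annotation: list[GeneRecord], studies: list[set[str]]) -> list[str]:
    """Return genes with no protein evidence that also fail either coding check."""
    seen = detected_genes(studies)
    doubtful = []
    for gene in annotation:
        if gene.name in seen:
            continue  # detected as a protein: keep as coding
        if not gene.has_coding_features or not gene.reading_frame_conserved:
            doubtful.append(gene.name)
    return sorted(doubtful)


# Toy data: gene names and study contents are invented for illustration.
ANNOTATION = [
    GeneRecord("GENE_A", True, True),
    GeneRecord("GENE_B", True, True),
    GeneRecord("GENE_C", False, True),
    GeneRecord("GENE_D", True, False),
    GeneRecord("GENE_E", True, True),
]
STUDIES = [{"GENE_A"}, {"GENE_A", "GENE_B"}]

if __name__ == "__main__":
    print(flag_doubtful(ANNOTATION, STUDIES))  # ['GENE_C', 'GENE_D']
```

In this sketch, GENE_E is not detected in any study but passes both checks, so it is kept: missing proteomic evidence alone is not treated as grounds for removing a gene, mirroring the "either ... or" reasoning in the text.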
One hypothesis derived from the study is that more than 90% of human genes produce proteins that originated in metazoans, the multicellular organisms of the animal kingdom, hundreds of millions of years ago, and that more than 99% of human genes originated before primates appeared 50 million years ago.
The figures indicate that the differences between humans and other primates at the level of genes and proteins are very small. The number of new genes that separate humans from mice [those that have evolved since the emergence of primates] may even be fewer than ten. This contrasts with the more than 500 human genes of post-primate origin found in the current annotation. The researchers conclude that the physiological and developmental differences between primates are likely to be caused by gene regulation rather than by differences in the basic functions of the proteins in question.
The sources of human complexity lie more in how genes are used than in how many genes there are: in the thousands of chemical changes that occur in proteins, or in the control of protein production by non-coding regions of the genome. These regions make up more than 90% of the entire genome and have been described in the latest findings of the international ENCODE project, a project in which the team participates.
The work brings the number of human genes closer to that of other species such as the nematode worm Caenorhabditis elegans, a worm just 1 mm long and apparently less complex than humans. The team, however, prefers not to make comparisons. The human genome is the best annotated of all, yet the researchers believe that 1,700 of its genes may still have to be re-annotated. The work suggests that the calculations will have to be redone for all genomes, not only the human one.
The research results are part of GENCODE, a consortium within the ENCODE Project made up of research groups from around the world, whose task is to provide an annotation of all the gene-based elements in the human genome.
The data are being discussed within GENCODE for incorporation into the new annotations. Once incorporated, they will redefine the entire map of the human genome and the way it is used in large-scale projects such as cancer genome analysis.