Types of Genomic Copy Number Variations

Classification of CNVs based on CN level and genomic size

Genomic copy number variations (CNVs) can affect the production of proteins by modifying the "dosage" (i.e. number of alleles) of genes covered by the CNV, as well as through the complete or partial disruption of coding or regulatory elements. The supposed effects depend on the magnitude of change, i.e. the number of gained or lost copies. Especially in cancer cells, the "normal" allele count for the given genomic region has to be considered: cancer genomes may have undergone general aneuploidization events, preceded and/or followed by regional CNV events.

Copy number changes can be expressed as absolute (total allele count) or relative (e.g. uncalibrated CN measurements such as log2 ratio) values. Empirically, and to overcome a number of issues with exact CN count calibration, a classification into a number of CN types has become "standard operating procedure", with details differing to some extent. This post attempts to summarize some previous annotation practices and emerging standards for CNV annotation, especially with respect to somatic CNVs and cancer genomics.
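
As a minimal sketch of how absolute and relative values relate, the following converts between a log2 ratio and an approximate total copy number. It assumes a pure sample (no normal-cell admixture) and a ratio calibrated against a known baseline ploidy; real measurements usually violate both assumptions, which is part of the calibration problem mentioned above. The function names and defaults are illustrative, not taken from any standard.

```python
import math


def log2_ratio_to_copy_number(log2_ratio: float, baseline_ploidy: float = 2.0) -> float:
    """Approximate total copy number from a log2 ratio.

    Assumption: 100% sample purity and a ratio measured relative to
    `baseline_ploidy` (2 for a bi-allelic region).
    """
    return baseline_ploidy * 2 ** log2_ratio


def copy_number_to_log2_ratio(copy_number: float, baseline_ploidy: float = 2.0) -> float:
    """Inverse conversion; a complete loss (0 copies) maps to -infinity."""
    if copy_number <= 0:
        return float("-inf")
    return math.log2(copy_number / baseline_ploidy)


# A single-copy gain on a bi-allelic region gives a log2 ratio of about 0.58,
# while a log2 ratio of 1.0 corresponds to a doubling (4 copies).
print(round(copy_number_to_log2_ratio(3), 2))
print(log2_ratio_to_copy_number(1.0))
```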

Additional measures used in some of those classification attempts include:

  1. the locality of additional copies of genomic regions in gain CNVs
  2. the size of the CNV events
  3. an overall context or inferred mechanism

Here we will mostly consider item 2, the CNV size, where it makes sense; the locality and overall mechanism (such as "chromothripsis", "kataegis" etc.) constitute special cases which in some instances (e.g. many tandem duplications) can be considered gene-specific structural alterations more than "dosage events".

Regarding size, CNVs can broadly be separated into "cytogenetic" or "arm-level" changes [1], which arise from errors in chromosomal distribution during cell division with or without structural chromosomal rearrangements, and "focal" CNVs, which are based on DNA replication errors and comprise smaller (i.e. [sub-]megabase) deletions or (intra- or extrachromosomal) multi-copy duplications [2]. While the role of disease-related arm- and chromosome-level CNVs in cancer is still poorly understood, focal CNVs frequently involve the genomic locations of genes with known involvement in neoplastic processes.

Importantly, the magnitude of change may differ between these CNV size classes. All arm-level gain CNVs and many focal CNVs show a low divergence in CN count, while some focal CNVs can reach a very high relative magnitude, such as an amplification with tens or hundreds of copies. However, the definition of genomic amplification is ambiguous. Generally, it means multiple copies of a chromosomal segment, but the exact copy number threshold used to define amplification is not consistent across studies, varying from 4 to 9 or more (Table 1).

For deletions, the definition of a high-magnitude deletion is clear: it usually represents a complete loss of genomic elements and their functions.

Purported Effects of CNV "Classes" in Oncogenomics

The observation that high-amplitude focal CNVs often occur in specific regions where oncogenes and tumor suppressor genes are located indicates that this type of CNV is likely to contain the chromosomal segments most relevant for promoting cancer development, and thus has great power for discovering cancer-related genes. The prevalence of low-amplitude chromosomal arm-level CNVs in cancer patients shows a close relationship between CNVs and the molecular properties of cancer cells. The length and amplitude of somatic CNVs define distinct classes, which differ in their frequency of occurrence and their impact on genomic components. A better understanding of both types of somatic CNVs is needed. Currently, there are several standards and ontologies to specify different classes of CNV. More details can be found in the comparison in Table 2 (taken from the Beacon v2 documentation).

Varying Definitions of "High Level" Gain CNVs

| Study ID & link | Study name | Threshold (>=) |
|---|---|---|
| PMID:21494657 | Network-Guided Analysis of Genes with Altered Somatic Copy Number and Gene Expression Reveals Pathways Commonly Perturbed in Metastatic Melanoma | 4 |
| PMID:17161620 | Specificity, selection and significance of gene amplifications in cancer | 5 |
| PMID:32024823 | High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations | 5 |
| PMID:35524475 | Refinement of computational identification of somatic copy number alterations using DNA methylation microarrays illustrated in cancers of unknown primary | 5 |
| PMID:25719666 | Whole genomes redefine the mutational landscape of pancreatic cancer | 6 |
| DOI:10.1036/ommbid.321 | Gene Amplification in Human Cancers: Biological and Clinical Significance | 7 |
| PMID:25110350 | Focal chromosomal copy number aberrations in cancer—Needles in a genome haystack | 9 |
| COSMIC database | | 5 if ploidy <= 2.7; 9 if ploidy > 2.7 |
| PMID:28445112 | Tracking the Evolution of Non–Small-Cell Lung Cancer | 2 x ploidy + 1 |
| PMID:28104840 | Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy | logR >= 1 |
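
To illustrate how much the call for one segment depends on the chosen definition, the sketch below applies the thresholds from the table to a single copy number. The rule names are hypothetical labels, not identifiers used by the cited studies. The `log_ratio_1` rule additionally assumes that logR equals log2(copy number / ploidy) at full sample purity, which the table does not state.

```python
from typing import Callable, Dict

# Each rule maps (total copy number, sample ploidy) to True for a "high-level" gain.
# Rule names are hypothetical labels for the threshold families in the table.
HIGH_LEVEL_GAIN_RULES: Dict[str, Callable[[float, float], bool]] = {
    "fixed_4": lambda cn, ploidy: cn >= 4,
    "fixed_5": lambda cn, ploidy: cn >= 5,
    "fixed_6": lambda cn, ploidy: cn >= 6,
    "fixed_7": lambda cn, ploidy: cn >= 7,
    "fixed_9": lambda cn, ploidy: cn >= 9,
    # COSMIC-style rule: the threshold switches with the average ploidy.
    "cosmic": lambda cn, ploidy: cn >= (5 if ploidy <= 2.7 else 9),
    "twice_ploidy_plus_1": lambda cn, ploidy: cn >= 2 * ploidy + 1,
    # Assumption: logR = log2(cn / ploidy), so logR >= 1 means cn >= 2 * ploidy.
    "log_ratio_1": lambda cn, ploidy: cn >= 2 * ploidy,
}


def high_level_gain_calls(copy_number: float, ploidy: float = 2.0) -> Dict[str, bool]:
    """Evaluate every rule for one segment and return the per-rule verdicts."""
    return {name: rule(copy_number, ploidy) for name, rule in HIGH_LEVEL_GAIN_RULES.items()}


# Six copies in a near-triploid genome: "high-level" under some rules only.
for name, is_high in high_level_gain_calls(6, ploidy=3.0).items():
    print(f"{name}: {is_high}")
```

With these assumptions, a 6-copy segment in a genome of ploidy 3 passes the fixed thresholds up to 6 and the log-ratio rule, but fails the ploidy-adjusted COSMIC and "2 x ploidy + 1" rules.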

A CNV Classification Based on Relative CN Alterations

In 2021 - based on a recognized need to formalize the representation of relative CN value classes as a concept capturing general usage scenarios from cancer genomics and genetics while accommodating different data generation practices - we proposed a set of basic CNV classes for adoption by the Experimental Factor Ontology, following discussions with members of the hCNV community and the GA4GH Variant Representation Standard working group.

Term Use Comparison

| Beacon | VCF | SO | EFO | VRS | Notes |
|---|---|---|---|---|---|
| DUP | DUP [3] | SO:0001742 copy_number_gain | EFO:0030070 copy number gain | low-level gain (implicit) | a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence |
| DUP | DUP [3] | SO:0001742 copy_number_gain | EFO:0030071 low-level copy number gain | low-level gain | |
| DUP | DUP [3] | SO:0001742 copy_number_gain | EFO:0030072 high-level copy number gain | high-level gain | commonly but not consistently used for >=5 copies on a bi-allelic genome region |
| DUP | DUP [3] | SO:0001742 copy_number_gain | EFO:0030073 focal genome amplification | high-level gain | commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) |
| DEL | DEL [3] | SO:0001743 copy_number_loss | EFO:0030067 copy number loss | partial loss (implicit) | a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence |
| DEL | DEL [3] | SO:0001743 copy_number_loss | EFO:0030068 low-level copy number loss | partial loss | |
| DEL | DEL [3] | SO:0001743 copy_number_loss | EFO:0030069 complete genomic deletion | complete loss | complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region) |
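
The EFO terms in the table can be assigned from a total copy number and a segment length with a few comparisons. The following is a minimal sketch, not a reference implementation: it assumes a bi-allelic baseline of 2 copies, uses the ">=5 copies" convention that the table itself describes as common but not consistent, and picks 5 Mb (the upper end of the "1-5Mb" operational range) as the focal size limit. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def classify_cnv(
    copy_number: int,
    length_bp: Optional[int] = None,
    baseline: int = 2,                # assumption: bi-allelic region
    high_level_min: int = 5,          # assumption: ">=5 copies" convention
    focal_max_bp: int = 5_000_000,    # assumption: upper end of "1-5Mb"
) -> Optional[Tuple[str, str]]:
    """Return (EFO id, label) for a segment, or None if it is copy-neutral."""
    if copy_number < 0:
        raise ValueError("copy_number must be non-negative")
    if copy_number == baseline:
        return None
    if copy_number > baseline:
        if copy_number >= high_level_min:
            # A high-level gain of limited size is reported as a focal amplification.
            if length_bp is not None and length_bp <= focal_max_bp:
                return ("EFO:0030073", "focal genome amplification")
            return ("EFO:0030072", "high-level copy number gain")
        return ("EFO:0030071", "low-level copy number gain")
    if copy_number == 0:
        return ("EFO:0030069", "complete genomic deletion")
    return ("EFO:0030068", "low-level copy number loss")


print(classify_cnv(3))
print(classify_cnv(12, length_bp=800_000))
print(classify_cnv(0))
```

When only relative values are available, or the baseline is not 2, the thresholds would need to be adapted; the parent terms EFO:0030070 and EFO:0030067 remain the safe choice if the level cannot be determined.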

  1. This represents an approximation and doesn't try to address the (fascinating) biology of genomic rearrangements such as chromothripsis, kataegis etc. Maybe another time...

  2. Beroukhim, Rameen, et al. "The landscape of somatic copy-number alteration across human cancers." Nature 463.7283 (2010): 899-905. 

  3. VCFv4.4 introduces an SVCLAIM field to disambiguate between in situ events (such as tandem duplications; known adjacency / break junction: SVCLAIM=J) and events where e.g. only the change in abundance / read depth (SVCLAIM=D) has been determined. Both J and D flags can be combined.