CNV Annotation Formats

Because copy number variant (CNV) analysis has a "dual origin" in cytogenetics ("chromosome based") and genomics ("sequencing based"), the annotation of CNVs has evolved from two different directions. This page summarizes common annotation schemes, terminologies and file formats that are applicable to genomic copy number variations.

Cytogenetics vs. Molecular Biology...

On the cytogenetic side, the use of cytogenetic bands as a coordinate system has been supplemented by an increasing use of mapping positions (i.e. for molecular-cytogenetic or hybrid analyses with known probe positions). For array- and sequencing-based CNV detection, the focus increasingly lies on determining discrete allelic copy number counts and on assigning a limited set of CNV classes that reflect commonly used concepts.
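As a minimal sketch of how a discrete copy number count could be assigned to one of a limited set of CNV classes, the function below maps a total copy count on a bi-allelic (diploid) region to an EFO term from the comparison table on this page. The cut-offs are assumptions for illustration: the ">= 5 copies" rule for high-level gains is described as common but not consistent usage, and treating a single remaining copy as "low-level copy number loss" is this sketch's choice, not a documented rule. The function name `classify_copy_number` is hypothetical.

```python
from typing import Optional, Tuple

# Assumed cut-off for a bi-allelic (diploid) region; commonly but not
# consistently used according to the term comparison table.
HIGH_LEVEL_GAIN_MIN_COPIES = 5


def classify_copy_number(total_copies: int) -> Optional[Tuple[str, str]]:
    """Return an (EFO id, label) pair for a total copy count, or None if unchanged."""
    if total_copies < 0:
        raise ValueError("copy number cannot be negative")
    if total_copies == 0:
        return ("EFO:0030069", "complete genomic deletion")
    if total_copies < 2:
        # Assumption: one remaining copy is reported as a low-level loss.
        return ("EFO:0030068", "low-level copy number loss")
    if total_copies == 2:
        return None  # matches the reference state, no CNV class assigned
    if total_copies >= HIGH_LEVEL_GAIN_MIN_COPIES:
        return ("EFO:0030072", "high-level copy number gain")
    return ("EFO:0030071", "low-level copy number gain")
```

Regions that are not bi-allelic in the reference (for example X and Y in male samples) would need a different baseline than 2, which this sketch does not handle.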

CNV Term Use Comparison in Computational (File/Schema) Formats

This table is maintained in parallel with the Beacon v2 documentation.

| EFO term | Legacy term (Beacon, VCF) | SO term | VRS proposal [1] | Notes |
|---|---|---|---|---|
| EFO:0030070 copy number gain | DUP [2] or | SO:0001742 copy_number_gain | low-level gain (implicit) ⇒ EFO:0030070 copy number gain | a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence |
| EFO:0030071 low-level copy number gain | DUP [2] or | SO:0001742 copy_number_gain | low-level gain ⇒ EFO:0030071 low-level copy number gain | |
| EFO:0030072 high-level copy number gain | DUP [2] or | SO:0001742 copy_number_gain | high-level gain ⇒ EFO:0030072 high-level copy number gain | commonly but not consistently used for >=5 copies on a bi-allelic genome region |
| EFO:0030073 focal genome amplification | DUP [2] or | SO:0001742 copy_number_gain | high-level gain ⇒ EFO:0030073 focal genome amplification | commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) |
| EFO:0030067 copy number loss | DEL [2] or | SO:0001743 copy_number_loss | partial loss (implicit) ⇒ EFO:0030067 copy number loss | a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence |
| EFO:0030068 low-level copy number loss | DEL [2] or | SO:0001743 copy_number_loss | partial loss ⇒ EFO:0030068 low-level copy number loss | |
| EFO:0020073 high-level copy number loss | DEL [2] or | SO:0001743 copy_number_loss | partial loss ⇒ EFO:0020073 high-level copy number loss | a loss of several copies; also used in cases where a complete genomic deletion cannot be asserted |
| EFO:0030069 complete genomic deletion | DEL [2] or | SO:0001743 copy_number_loss | complete loss ⇒ EFO:0030069 complete genomic deletion | complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region) |

Last updated 2023-03-22 by @mbaudis (EFO:0020073)
updated 2023-03-20 by @mbaudis (VRS proposal)
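The term correspondences in the table can be captured in a small lookup structure. The sketch below copies the EFO, legacy (DUP/DEL) and SO values from the table and shows one possible way to expand a legacy term into the more specific EFO terms it covers, in the spirit of the term expansion mentioned in the footnotes. The names `CNV_TERMS`, `legacy_term` and `expand_legacy_term` are hypothetical and not part of any Beacon or Progenetix API.

```python
from typing import Dict, List, Tuple

# EFO id -> (EFO label, legacy DUP/DEL term, SO id, SO label), copied from the table.
CNV_TERMS: Dict[str, Tuple[str, str, str, str]] = {
    "EFO:0030070": ("copy number gain", "DUP", "SO:0001742", "copy_number_gain"),
    "EFO:0030071": ("low-level copy number gain", "DUP", "SO:0001742", "copy_number_gain"),
    "EFO:0030072": ("high-level copy number gain", "DUP", "SO:0001742", "copy_number_gain"),
    "EFO:0030073": ("focal genome amplification", "DUP", "SO:0001742", "copy_number_gain"),
    "EFO:0030067": ("copy number loss", "DEL", "SO:0001743", "copy_number_loss"),
    "EFO:0030068": ("low-level copy number loss", "DEL", "SO:0001743", "copy_number_loss"),
    "EFO:0020073": ("high-level copy number loss", "DEL", "SO:0001743", "copy_number_loss"),
    "EFO:0030069": ("complete genomic deletion", "DEL", "SO:0001743", "copy_number_loss"),
}


def legacy_term(efo_id: str) -> str:
    """Collapse a specific EFO term to its legacy DUP or DEL value."""
    return CNV_TERMS[efo_id][1]


def expand_legacy_term(term: str) -> List[str]:
    """Return the sorted EFO ids covered by a legacy term (DUP or DEL)."""
    wanted = term.upper()
    return sorted(efo_id for efo_id, row in CNV_TERMS.items() if row[1] == wanted)
```

Collapsing to DUP or DEL loses information (a focal amplification and a low-level gain both become DUP), which is one reason to prefer the more specific terms where they are available.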


Since 1963, the International System for Human Cytogenetic Nomenclature (ISCN) has provided standards and guidelines for the annotation of human karyotypes and cytogenetic abnormalities.

Recent editions have tried to accommodate genomic variants derived from molecular and molecular-cytogenetic technologies such as FISH, genomic microarrays and DNA sequencing.

Examples (CNV)

  • 46,XX,trp(8)(q21q24)
  • ish cgh dim(17p12p11),enh(8)(q24)
    • chromosomal Comparative Genomic Hybridization (CGH)
  • ISCN 2020 is the latest edition, available as a book (Karger)



While VCF is a file format originally developed (and optimised) for representing possibly recurring variants across a set of analyses, it also allows for the storage and representation of CNV events [3].
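To illustrate how the `SVCLAIM` field described in the VCF footnote on this page could be read, the following sketch parses the INFO column of a single VCF record and reports which kind of evidence the claim letters stand for. The INFO strings in the usage note are made-up examples, and the helper names `parse_info` and `svclaim_evidence` are hypothetical; a real pipeline would normally rely on an established VCF parser instead.

```python
from typing import Dict, List, Union

# Evidence type per SVCLAIM letter, paraphrased from the footnote on this page.
SVCLAIM_EVIDENCE = {
    "D": "change in abundance / read depth",
    "J": "known adjacency / break junction",
}


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into key/value pairs; bare flags map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def svclaim_evidence(info: str) -> List[str]:
    """List the evidence types claimed by SVCLAIM, or [] if the field is absent."""
    claim = parse_info(info).get("SVCLAIM")
    if claim is None or claim is True:
        return []
    unknown = set(claim) - set(SVCLAIM_EVIDENCE)
    if unknown:
        raise ValueError(f"unexpected SVCLAIM letters: {sorted(unknown)}")
    return [SVCLAIM_EVIDENCE[letter] for letter in claim]
```

For a depth-only deletion call such as `SVTYPE=DEL;SVCLAIM=D` the function returns only the read depth entry, while `SVCLAIM=DJ` yields both entries.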

Variant Schemas

GA4GH "Variant Representation" schema

The "Genomic Knowledge Standards" (GKS) work stream of the Global Alliance for Genomics and Health (GA4GH) develops a modern schema for the unambiguous representation, transmission and recovery of sequence variants (genomic and beyond).

The first release of the GA4GH Variation Representation Specification (vr-spec v1.0) does not yet include the option to represent structural variants. However, the internal roadmap of the project points towards an extension for CNV representation in 2020.

Ad-Hoc & "Community" Formats

Progenetix Variant schema

The Progenetix cancer genomics resource stores its millions of CNVs as data objects in MongoDB document databases. The format of the individual variants is based on the original GA4GH schema (see above), with some extensions and modifications.

Development of the Progenetix format closely follows the work of the GA4GH GKS group and adopts core objects and concepts from the vr-spec repository.

The Progenetix data serves as the repository behind Beacon+, the forward-looking implementation of the ELIXIR Beacon project.

Progenetix CNV example

{
  _id: ObjectId("5bab576a727983b2e00b8d32"),
  id: 'pgxvar-5bab576a727983b2e00b8d32',
  variant_internal_id: '11:52900000-134452384:DEL',
  callset_id: 'pgxcs-kftvldsu',
  biosample_id: 'pgxbs-kftva59y',
  individual_id: 'pgxind-kftx25eh',
  variant_state: { id: 'EFO:0030067', label: 'copy number loss' },
  type: 'RelativeCopyNumber',
  location: {
    sequence_id: 'refseq:NC_000011.10',
    type: 'SequenceLocation',
    interval: {
      start: { type: 'Number', value: 52900000 },
      end: { type: 'Number', value: 134452384 }
    }
  },
  relative_copy_class: 'partial loss',
  updated: '2022-03-29T14:36:47.454674'
}
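The example document stores the same event twice: once in structured form (`location`, `variant_state`) and once as the compact `variant_internal_id` string. The sketch below shows one way such a string could be derived from the structured fields. It is an illustration that reproduces the string in the example, not the actual Progenetix code; the function names are hypothetical, and the mapping of RefSeq accessions `NC_000001` to `NC_000024` onto chromosomes 1-22, X and Y is an assumption stated in the code.

```python
import re

# Assumption: gain terms collapse to DUP and loss terms to DEL, as in the table.
LEGACY_TYPE = {
    "EFO:0030070": "DUP", "EFO:0030071": "DUP",
    "EFO:0030072": "DUP", "EFO:0030073": "DUP",
    "EFO:0030067": "DEL", "EFO:0030068": "DEL",
    "EFO:0020073": "DEL", "EFO:0030069": "DEL",
}


def chromosome_from_refseq(sequence_id: str) -> str:
    """Map e.g. 'refseq:NC_000011.10' to '11' (assumes human chromosome accessions)."""
    match = re.fullmatch(r"refseq:NC_0000(\d{2})\.\d+", sequence_id)
    if match is None:
        raise ValueError(f"unsupported sequence id: {sequence_id}")
    number = int(match.group(1))
    if not 1 <= number <= 24:
        raise ValueError(f"not a chromosome accession: {sequence_id}")
    return {23: "X", 24: "Y"}.get(number, str(number))


def internal_id(variant: dict) -> str:
    """Build a 'chromosome:start-end:TYPE' string from a variant document."""
    location = variant["location"]
    chromosome = chromosome_from_refseq(location["sequence_id"])
    start = location["interval"]["start"]["value"]
    end = location["interval"]["end"]["value"]
    legacy = LEGACY_TYPE[variant["variant_state"]["id"]]
    return f"{chromosome}:{start}-{end}:{legacy}"


example = {
    "variant_state": {"id": "EFO:0030067", "label": "copy number loss"},
    "location": {
        "sequence_id": "refseq:NC_000011.10",
        "interval": {
            "start": {"type": "Number", "value": 52900000},
            "end": {"type": "Number", "value": 134452384},
        },
    },
}
print(internal_id(example))  # prints 11:52900000-134452384:DEL
```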

  1. The VRS annotations refer to the status at v1.2 (2022). The GA4GH VRS team is currently (Spring 2023) preparing an updated specification which will introduce the new class CopyNumberChange (discussion...) with the use of the EFO terms (including a new term for high level deletion (EFO:0020073) in the April 2023 EFO release). 

  2. While the use of VCF-derived (DUP, DEL) values had been introduced with Beacon v1, usage of these terms has always been a recommendation rather than an integral part of the API. We now encourage Beacon developers to support more specific terms (particularly EFO). As an example, the Progenetix Beacon API uses EFO terms but provides an internal term expansion for legacy DUP, DEL support. 

  3. VCFv4.4 introduces an SVCLAIM field to disambiguate between in situ events (such as tandem duplications; known adjacency / break junction: SVCLAIM=J) and events where e.g. only the change in abundance / read depth (SVCLAIM=D) has been determined. Both J and D flags can be combined.