CNV Annotation Formats
Cytogenetics vs. Molecular Biology...¶
Because of their "dual origin" in cytogenetic ("chromosome based") and genomic ("sequencing based") analyses, the annotation conventions for copy number variants have evolved from different directions. On the cytogenetic side, the use of cytogenetic bands as the coordinate system has increasingly been supplemented by mapping positions (e.g. for molecular-cytogenetic or hybrid analyses with known probe positions). For array and sequencing based CNV detection, the focus increasingly lies on determining discrete allelic copy number counts and on assigning a limited set of CNV classes that reflect commonly used concepts.
CNV Term Use Comparison in Computational (File/Schema) Formats¶
This table is maintained in parallel with the Beacon v2 documentation.
EFO | Beacon | VCF | SO | GA4GH VRS [1] | Notes |
---|---|---|---|---|---|
EFO:0030070 copy number gain | DUP [2] or EFO:0030070 | DUP SVCLAIM=D [3] | SO:0001742 copy_number_gain | EFO:0030070 gain | a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence |
EFO:0030071 low-level copy number gain | DUP [2] or EFO:0030071 | DUP SVCLAIM=D [3] | SO:0001742 copy_number_gain | EFO:0030071 | |
EFO:0030072 high-level copy number gain | DUP [2] or EFO:0030072 | DUP SVCLAIM=D [3] | SO:0001742 copy_number_gain | EFO:0030072 | commonly but not consistently used for >=5 copies on a bi-allelic genome region |
EFO:0030073 focal genome amplification | DUP [2] or EFO:0030073 | DUP SVCLAIM=D [3] | SO:0001742 copy_number_gain | EFO:0030072 [4] | commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) |
EFO:0030067 copy number loss | DEL [2] or EFO:0030067 | DEL SVCLAIM=D [3] | SO:0001743 copy_number_loss | EFO:0030067 | a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence |
EFO:0030068 low-level copy number loss | DEL [2] or EFO:0030068 | DEL SVCLAIM=D [3] | SO:0001743 copy_number_loss | EFO:0030068 | |
EFO:0020073 high-level copy number loss | DEL [2] or EFO:0020073 | DEL SVCLAIM=D [3] | SO:0001743 copy_number_loss | EFO:0020073 | a loss of several copies; also used in cases where a complete genomic deletion cannot be asserted |
EFO:0030069 complete genomic deletion | DEL [2] or EFO:0030069 | DEL SVCLAIM=D [3] | SO:0001743 copy_number_loss | EFO:0030069 | complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region) |
Last updated 2023-03-22 by @mbaudis (VRS 1.3 adjustment)¶
updated 2023-03-22 by @mbaudis (EFO:0020073) & 2023-03-20 by @mbaudis (VRS proposal)¶
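The term comparison can also be held as a small lookup structure. The following is a minimal sketch, not the Progenetix or Beacon implementation: the names `CnvTerm`, `CNV_TERMS` and `expand_legacy_term` are hypothetical, and the expansion of legacy `DUP`/`DEL` values to all matching EFO terms is an assumed reading of the table rows.

```python
from typing import NamedTuple


class CnvTerm(NamedTuple):
    label: str  # EFO label
    vcf: str    # legacy VCF-derived value (DUP or DEL)
    so: str     # Sequence Ontology id


# Rows transcribed from the comparison table on this page.
CNV_TERMS = {
    "EFO:0030070": CnvTerm("copy number gain", "DUP", "SO:0001742"),
    "EFO:0030071": CnvTerm("low-level copy number gain", "DUP", "SO:0001742"),
    "EFO:0030072": CnvTerm("high-level copy number gain", "DUP", "SO:0001742"),
    "EFO:0030073": CnvTerm("focal genome amplification", "DUP", "SO:0001742"),
    "EFO:0030067": CnvTerm("copy number loss", "DEL", "SO:0001743"),
    "EFO:0030068": CnvTerm("low-level copy number loss", "DEL", "SO:0001743"),
    "EFO:0020073": CnvTerm("high-level copy number loss", "DEL", "SO:0001743"),
    "EFO:0030069": CnvTerm("complete genomic deletion", "DEL", "SO:0001743"),
}


def expand_legacy_term(term: str) -> list[str]:
    """Return the EFO ids a query term should match (assumed behaviour).

    An EFO id matches only itself; a legacy DUP/DEL value matches every
    EFO term listed with that value in the table. Unknown terms match nothing.
    """
    if term in CNV_TERMS:
        return [term]
    return [efo_id for efo_id, row in CNV_TERMS.items() if row.vcf == term]
```

With this sketch, `expand_legacy_term("DEL")` returns the four loss terms in table order, while a specific id such as `"EFO:0030072"` is returned unchanged.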
ISCN¶
Since 1963, the International System for Human Cytogenetic Nomenclature (ISCN) has provided standards and guidelines for the annotation of human karyotypes and cytogenetic abnormalities.
Recent editions have tried to accommodate genomic variants derived from molecular and molecular-cytogenetic technologies such as FISH, genomic microarrays and DNA sequencing.
Examples (CNV)¶
46,XX,trp(8)(q21q24)
ish cgh dim(17p12p11),enh(8)(q24)
- chromosomal Comparative Genomic Hybridization (CGH)
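As a rough illustration of how such a CGH expression relates to the CNV classes used on this page, the sketch below splits an `ish cgh` string into its `dim` (diminished) and `enh` (enhanced) items. The mapping of these symbols to EFO terms and the names `ISCN_CGH_TO_EFO` and `parse_ish_cgh` are assumptions made for this example; the region text is kept as a raw string and is not converted to genome coordinates.

```python
import re

# Assumed mapping of ISCN CGH symbols to the general EFO CNV terms.
ISCN_CGH_TO_EFO = {
    "enh": "EFO:0030070",  # enhanced signal, read here as copy number gain
    "dim": "EFO:0030067",  # diminished signal, read here as copy number loss
}

# A symbol followed by one or more parenthesised groups, e.g. enh(8)(q24).
ITEM = re.compile(r"(enh|dim)((?:\([^()]*\))+)")


def parse_ish_cgh(iscn: str) -> list[tuple[str, str]]:
    """Return (EFO id, raw ISCN region) pairs for an 'ish cgh' expression."""
    prefix = "ish cgh "
    if not iscn.startswith(prefix):
        raise ValueError("not an 'ish cgh' expression")
    return [
        (ISCN_CGH_TO_EFO[match.group(1)], match.group(2))
        for match in ITEM.finditer(iscn[len(prefix):])
    ]
```

Only the two symbols that occur in the example are handled; other ISCN symbols are silently ignored by this sketch.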
Links¶
- ISCN 2020 is the latest edition, available as book (Karger)
HGVS¶
Links¶
- HGVS DNA Sequence Variant Nomenclature
VCF¶
VCF is a file format that was originally developed (and optimised) for the representation of possibly recurring variants across a set of analyses, but it also allows for the storage and representation of CNV events.
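A minimal sketch of reading a CNV from a VCF data line might look as follows. The record is hypothetical and simplified (it is not validated against any VCF version), the function name `describe_cnv_record` is made up for this example, and the mapping of `<DUP>`/`<DEL>` to the general EFO terms follows the comparison table on this page. The `SVCLAIM` letters are interpreted as described in the VCFv4.4 note at the end of this page.

```python
# Symbolic ALT alleles mapped to the general EFO terms (per the table above).
VCF_ALT_TO_EFO = {"<DUP>": "EFO:0030070", "<DEL>": "EFO:0030067"}

# Meaning of the SVCLAIM letters; both may be combined in one value.
SVCLAIM_MEANING = {
    "D": "change in abundance / read depth",
    "J": "known adjacency / break junction",
}


def describe_cnv_record(line: str) -> dict:
    """Extract position, EFO class and SVCLAIM meaning from one VCF data line."""
    chrom, pos, _id, _ref, alt, _qual, _filter, info = (
        line.rstrip("\n").split("\t")[:8]
    )
    # INFO is a ';'-separated list; flag entries without '=' are skipped here.
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    claim = fields.get("SVCLAIM", "")
    return {
        "chrom": chrom,
        "pos": int(pos),
        "efo": VCF_ALT_TO_EFO.get(alt),  # None for non-CNV alleles
        "claims": [SVCLAIM_MEANING[c] for c in claim if c in SVCLAIM_MEANING],
    }


# Hypothetical record: a deletion called from read depth only.
record = "11\t52900000\t.\tN\t<DEL>\t.\tPASS\tEND=134452384;SVCLAIM=D"
summary = describe_cnv_record(record)
```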
Links¶
- VCF specification v4.2 PDF
Variant Schemas¶
GA4GH "Variant Representation" schema¶
The "Genomic Knowledge Standards" (GKS) work stream of the Global Alliance for Genomics and Health (GA4GH) develops a modern schema for the unambiguous representation, transmission and recovery of sequence variants (genomic and beyond).
The first release of the GA4GH Variation Representation Specification (vr-spec v1.0) does not yet include the option to represent structural variants. However, the internal roadmap of the project points towards an extension for CNV representation in 2020.
Links¶
- vr-spec repository
- documentation
Ad-Hoc & "Community" Formats¶
Progenetix Variant schema¶
The Progenetix cancer genomics resource stores its millions of CNVs as data objects in MongoDB document databases. The format of the single variants is based on the Beacon v2 default model with some modifications (e.g. incorporating the VRS 1.3 RelativeCopyNumber concept but with slightly rewrapped components).
The Progenetix data serves as the repository behind the Beacon+ forward looking implementation of the ELIXIR Beacon project. Accordingly, upon export through the API variants are re-mapped to a Beacon v2 representation.
Progenetix CNV example¶
{
    "id": "pgxvar-5bab576a727983b2e00b8d32",
    "variant_internal_id": "11:52900000-134452384:DEL",
    "callset_id": "pgxcs-kftvldsu",
    "biosample_id": "pgxbs-kftva59y",
    "individual_id": "pgxind-kftx25eh",
    "variant_state": { "id": "EFO:0030067", "label": "copy number loss" },
    "type": "RelativeCopyNumber",
    "location": {
        "sequence_id": "refseq:NC_000011.10",
        "chromosome": "11",
        "type": "SequenceLocation",
        "interval": { "start": 52900000, "end": 134452384 }
    },
    "updated": "2022-03-29T14:36:47.454674"
}
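The fields of such a variant object can be combined into the compact form seen in `variant_internal_id`. The sketch below is an illustration only: the helper names `legacy_digest` and `cnv_length` are hypothetical, the `chromosome:start-end:TYPE` pattern is inferred from the single example above, and the length calculation assumes interbase (0-based, half-open) coordinates.

```python
# Assumed mapping from EFO variant states to the legacy DUP/DEL values.
EFO_TO_LEGACY = {
    "EFO:0030070": "DUP", "EFO:0030071": "DUP",
    "EFO:0030072": "DUP", "EFO:0030073": "DUP",
    "EFO:0030067": "DEL", "EFO:0030068": "DEL",
    "EFO:0020073": "DEL", "EFO:0030069": "DEL",
}

# Subset of the example variant, limited to the fields used below.
variant = {
    "variant_state": {"id": "EFO:0030067", "label": "copy number loss"},
    "location": {
        "sequence_id": "refseq:NC_000011.10",
        "chromosome": "11",
        "interval": {"start": 52900000, "end": 134452384},
    },
}


def legacy_digest(v: dict) -> str:
    """Build a 'chromosome:start-end:TYPE' string from a variant object."""
    location = v["location"]
    interval = location["interval"]
    kind = EFO_TO_LEGACY[v["variant_state"]["id"]]
    return f'{location["chromosome"]}:{interval["start"]}-{interval["end"]}:{kind}'


def cnv_length(v: dict) -> int:
    """Length in bases, assuming interbase (half-open) coordinates."""
    interval = v["location"]["interval"]
    return interval["end"] - interval["start"]
```

For the example variant, `legacy_digest(variant)` reproduces the string stored in `variant_internal_id`.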
Links¶
- schema in progenetix/bycon code repository
1. The VRS annotations refer to the status from v1.3 (2022), when the new class CopyNumberChange was introduced (discussion...) with the use of the EFO terms.
2. While the use of VCF derived (DUP, DEL) values had been introduced with Beacon v1, usage of these terms has always been a recommendation rather than an integral part of the API. We now encourage the support of more specific terms (particularly EFO) by Beacon developers. As an example, the Progenetix Beacon API uses EFO terms but provides an internal term expansion for legacy DUP, DEL support.
3. VCFv4.4 introduces an SVCLAIM field to disambiguate between in situ events (such as tandem duplications; known adjacency / break junction: SVCLAIM=J) and events where e.g. only the change in abundance / read depth (SVCLAIM=D) has been determined. Both J and D flags can be combined.
4. VRS did not adopt the "amplification" term due to possible inconsistencies.