Variant

From Original GA4GH schema

Status: proposed

Provenance

Used by

Authors

Schema source: YAML file

Properties of the Variant class

| Property | Type | Format | Description |
|---|---|---|---|
| alternate_bases | string | | One or more bases, relative to the start position in the reference genome, replacing the reference_bases value. Used for precise variants; normally not used for structural (e.g. DUP, DEL) alterations. |
| biosample_id | | | The optional identifier ("biosample.id") of the biosample this variant was reported from. This is a shortcut to using the variant -> callset -> biosample chaining. |
| callset_id | string | | The identifier ("callset.id") of the callset this variant is part of. Optional if another provenance method is provided (e.g. if variants are nested within the parent object, as in a Phenopacket). |
| created | timestamp | | The creation time of this record, in ISO 8601 format. |
| digest | string | | Concatenated unique specific elements of the variant. Optional; a convenience element to derive unique variants in "individual variant from callset" storage systems. |
| end | array | int64 | Array of 0 (for precise sequence variants), 1 or 2 (for imprecise end position of a structural variant) integers. |
| genotype | array | | List of strings which represent the (phased) alleles in which the variant was observed. |
| id | string | | The locally unique identifier of this variant (referenced as "variant_id"). Optional. |
| info | ./Info | | Additional variant information, as defined in the example and accompanying documentation. |
| mate_name | string | | Mate name (chromosome) for fusion (BRK) events; otherwise left empty. Accepted values: 1-22, X, Y. |
| reference_bases | string | | One or more bases at the start position in the reference genome, which have been replaced by the `alternate_bases` value. |
| reference_name | string | | Reference name (chromosome). Accepted values: 1-22, X, Y. |
| start | array | int64 | Array of 1 or 2 (for imprecise start position of a structural variant) integers. |
| updated | timestamp | | The time of the last edit of this record, in ISO 8601 format. |
| variant_type | string | | The variant type in case of a named (structural) variant (e.g. DUP, DEL, BND ...). |
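
The constraints listed for the properties can be checked mechanically. The following is a minimal sketch, not part of the schema itself: the function name `validate_variant`, the returned list of messages, and the choice to treat `start` as required are assumptions made for illustration. It only covers the chromosome names and the `start`/`end` array shapes.

```python
from typing import Any, Dict, List

# Chromosome names accepted for reference_name and mate_name, per the property list.
ALLOWED_NAMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_variant(variant: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were detected."""
    problems: List[str] = []

    for key in ("reference_name", "mate_name"):
        name = variant.get(key)
        if name in (None, ""):
            continue  # absent or left empty
        if not isinstance(name, str) or name not in ALLOWED_NAMES:
            problems.append(f"{key} must be one of 1-22, X, Y (as a string)")

    # ASSUMPTION: start is treated as required because it needs 1 or 2 values.
    start = variant.get("start")
    if (not isinstance(start, list) or len(start) not in (1, 2)
            or not all(_is_int(v) for v in start)):
        problems.append("start must be an array of 1 or 2 integers")

    end = variant.get("end", [])
    if (not isinstance(end, list) or len(end) > 2
            or not all(_is_int(v) for v in end)):
        problems.append("end must be an array of 0, 1 or 2 integers")

    return problems
```

A record whose positions are given as strings, or whose chromosome is given as a number, would be reported by this check rather than silently accepted.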

Description

This document describes the attributes of the variant object. In its current implementation, variant (and related genomic objects such as callset) represents an extended version of the original, VCF-derived GA4GH schema. This format may be superseded or augmented based on current developments in the GA4GH::GKS work stream.

Examples

{
   "biosample_id" : "fcl-bs-0099615",
   "callset_id" : "structdb-cs-nhl-0009876",
   "created" : "2017-10-25T07:06:03Z",
   "digest" : "8,14:20867740-21977798,21978106:BND",
   "end" : [
      21977798,
      21978106
   ],
   "id" : "structdb-var-123456789",
   "mate_name" : "14",
   "reference_bases" : "N",
   "reference_name" : "8",
   "start" : [
      20867740
   ],
   "updated" : "2017-10-25T07:06:03Z",
   "variant_type" : "BND"
}
{
   "biosample_id" : "structdb-bs-nhl-0009876",
   "callset_id" : "structdb-cs-nhl-0009876",
   "created" : "2019-01-22T03:06:45Z",
   "digest" : "6:63450000,63550000-63450000,63550000:DEL",
   "end" : [
      63450000,
      63550000
   ],
   "id" : "structdb-var-123456790",
   "info" : {
      "cnv_length" : 85500000,
      "cnv_value" : "-0.294"
   },
   "reference_bases" : "N",
   "reference_name" : "6",
   "start" : [
      63450000,
      63550000
   ],
   "updated" : "2019-02-01T12:40:21Z",
   "variant_type" : "DEL"
}
{
   "alternate_bases" : "AC",
   "callset_id" : "DIPG_CS_0290",
   "created" : "2018-11-06T11:46:30.028Z",
   "digest" : "2:203420136:A>AC",
   "genotype" : [
      "1",
      "."
   ],
   "id" : "5be1840772798347f0ed9e8b",
   "reference_bases" : "A",
   "reference_name" : "2",
   "start" : [
      203420136
   ],
   "updated" : "2018-11-06T11:46:30.028Z"
}

Notes and examples on the Variant properties

alternate_bases
'alternate_bases' : "AC"
biosample_id
'biosample_id' : "pgx-bs-987647"
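
The shortcut role of `biosample_id` can be illustrated with a small lookup. This is a sketch under stated assumptions: the function name `resolve_biosample_id` is hypothetical, callsets are assumed to be available as a dictionary keyed by callset id, and each callset record is assumed to expose a `biosample_id` field (the callset schema is not described on this page).

```python
from typing import Any, Dict, Optional


def resolve_biosample_id(
    variant: Dict[str, Any],
    callsets_by_id: Dict[str, Dict[str, Any]],
) -> Optional[str]:
    """Return the biosample id for a variant, or None if it cannot be resolved."""
    # Shortcut: the variant carries the biosample id directly.
    direct = variant.get("biosample_id")
    if direct:
        return direct

    # Fallback: follow the variant -> callset -> biosample chain.
    # ASSUMPTION: callset records expose a "biosample_id" field.
    callset = callsets_by_id.get(variant.get("callset_id"))
    if callset is None:
        return None
    return callset.get("biosample_id")
```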
callset_id
'callset_id' : "PGX_AM_CS_GSM1690424"
created
'created' : "2017-10-25T07:06:03Z"
digest
'digest' : "4:12282-46465:DEL"
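
The exact rule for building a digest is not specified here. The sketch below reproduces the pattern seen in the example values on this page (`names:start-end:TYPE` for named variants, `name:start:REF>ALT` for precise ones); the function name `build_digest` and the branching on `variant_type` are assumptions inferred from those examples, not a definition taken from the schema.

```python
from typing import Any, Dict


def build_digest(variant: Dict[str, Any]) -> str:
    """Concatenate the identifying elements of a variant into a digest string."""
    # Chromosome part: reference name, plus the mate name when present.
    names = [str(variant["reference_name"])]
    if variant.get("mate_name"):
        names.append(str(variant["mate_name"]))
    name_part = ",".join(names)

    # Position arrays are joined with commas (two values for imprecise positions).
    start_part = ",".join(str(p) for p in variant["start"])

    variant_type = variant.get("variant_type")
    if variant_type:
        # Named (structural) variant: positions plus the type label.
        end_part = ",".join(str(p) for p in variant.get("end") or [])
        return f"{name_part}:{start_part}-{end_part}:{variant_type}"

    # Precise sequence variant: position plus the base change.
    return (f"{name_part}:{start_part}:"
            f"{variant['reference_bases']}>{variant['alternate_bases']}")
```

Because the digest is derived only from fields already stored on the variant, it can be recomputed at any time and used as a grouping key for identical variants observed in different callsets.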
end
'end' : [
  21977798,
  21978106
]
db.variants.find( { "reference_name" : "9",  "variant_type" : "DEL", "start" : { $lte : 21975098 }, "end" : { $gte : 21967753 } } )
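
The query above is a range-overlap test: a variant matches when it starts at or before the end of the queried range and ends at or after its start. A minimal sketch of the same logic follows; the helper names `build_overlap_query` and `overlaps` are hypothetical, and the filter relies on MongoDB's `$lte` and `$gte` comparison operators, which match an array field if any of its elements satisfies the condition.

```python
from typing import Any, Dict


def build_overlap_query(reference_name: str, variant_type: str,
                        range_start: int, range_end: int) -> Dict[str, Any]:
    """Build a MongoDB filter for variants overlapping [range_start, range_end]."""
    return {
        "reference_name": reference_name,
        "variant_type": variant_type,
        # The variant starts at or before the end of the queried range ...
        "start": {"$lte": range_end},
        # ... and ends at or after the start of the queried range.
        "end": {"$gte": range_start},
    }


def overlaps(variant: Dict[str, Any], range_start: int, range_end: int) -> bool:
    """Pure-Python version of the positional test, mirroring array matching:
    the comparison holds if any element of the array satisfies it."""
    return (any(s <= range_end for s in variant["start"])
            and any(e >= range_start for e in variant["end"]))
```

Calling `build_overlap_query("9", "DEL", 21967753, 21975098)` yields a filter document with the same conditions as the shell query, which could then be passed to a driver's `find` call.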
genotype
'genotype' : [
  '1',
  '.'
]
id
'id' : "amvar-8754-7751-1119-8539"
info
'info' : {
  'cnv_length' : 1205290,
  'cnv_value' : '-0.294'
}
mate_name
'mate_name' : "14"
reference_bases
'reference_bases' : "G"
reference_name
'reference_name' : "8"
start
'start' : [
  20867740
]
updated
'updated' : "2022-11-11T09:45:13Z"
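
Both `created` and `updated` are ISO 8601 timestamps with a trailing `Z` for UTC, optionally with fractional seconds. The following sketch shows one way to read and write such values using only the Python standard library; the function names `parse_timestamp` and `now_timestamp` are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as "2017-10-25T07:06:03Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def now_timestamp() -> str:
    """Return the current UTC time in the second-resolution form used above."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Parsed values are timezone-aware, so a consistency check such as `parse_timestamp(record["created"]) <= parse_timestamp(record["updated"])` compares them directly.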
variant_type
'variant_type' : "DEL"