Variant

From Original GA4GH schema, with modifications

Status: proposed

Provenance

Used by

Authors

Schema source: YAML file

Properties of the Variant class

| Property | Type | Format | Description |
|---|---|---|---|
| alternate_bases | string | | One or more bases relative to the start position of the reference genome, replacing the reference_bases value; for precise variants |
| biosample_id | | | The identifier ("biosample.id") of the biosample this variant was reported from. This is a shortcut to using the variant -> callset -> biosample chaining. |
| callset_id | string | | The identifier ("callset.id") of the callset this variant is part of. |
| created | timestamp | | The creation time of this record, in ISO 8601 format |
| digest | string | | Concatenation of the elements that uniquely identify the variant |
| end | array | int64 | Array of 0 (for precise sequence variants), 1 or 2 (for imprecise end position of a structural variant) integers; corresponds to the end position and CIEND interval in VCF; some implementations express this concept in a modified form (e.g. Beacon v1.0 uses separate values instead of an array for position bracketing) |
| genotype | array | | List of strings representing the (phased) alleles in which the variant was observed |
| id | string | | The locally unique identifier of this variant (referenced as "variant_id"). |
| info | | | Additional variant information, as defined in the example and accompanying documentation |
| mate_name | string | | Mate name (chromosome) for fusion (BND) events; otherwise left empty. Accepted values: 1-22, X, Y. |
| reference_bases | string | | One or more bases at the start position in the reference genome, which have been replaced by the alternate_bases value; for precise variants |
| reference_name | string | | Reference name (chromosome). Accepted values: 1-22, X, Y. |
| start | array | int64 | Array of 1 or 2 (for imprecise start position of a structural variant) integers; corresponds to the position and CIPOS interval in VCF; some implementations express this concept in a modified form (e.g. Beacon v1.0 uses separate values instead of an array for position bracketing) |
| updated | timestamp | | The time of the last edit of this record, in ISO 8601 format |
| variant_type | string | | The variant type in case of a named (structural) variant (e.g. DUP, DEL, BND ...) |
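The constraints listed in the table can be checked mechanically. The following is a minimal sketch, not part of the schema: the function name `validate_variant` and its helpers are hypothetical, and the timestamp check assumes the UTC "Z" form of ISO 8601 shown in the examples, which is only one of the forms the standard allows.

```python
from datetime import datetime

# Chromosome names accepted for reference_name and mate_name: 1-22, X, Y.
ALLOWED_NAMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def _is_int_list(value, min_len, max_len):
    """True if value is a list of min_len..max_len integers (booleans excluded)."""
    return (
        isinstance(value, list)
        and min_len <= len(value) <= max_len
        and all(isinstance(v, int) and not isinstance(v, bool) for v in value)
    )


def _is_timestamp(value):
    """True if value looks like 2017-10-25T07:06:03Z (assumed timestamp form)."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except (TypeError, ValueError):
        return False


def _is_name(value):
    """True if value is a string naming one of the accepted chromosomes."""
    return isinstance(value, str) and value in ALLOWED_NAMES


def validate_variant(variant):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not _is_name(variant.get("reference_name")):
        problems.append("reference_name must be a string in 1-22, X, Y")
    # mate_name is only set for fusion (BND) events and may otherwise be empty.
    mate = variant.get("mate_name")
    if mate not in (None, "") and not _is_name(mate):
        problems.append("mate_name must be empty or a string in 1-22, X, Y")
    if not _is_int_list(variant.get("start"), 1, 2):
        problems.append("start must be an array of 1 or 2 integers")
    if not _is_int_list(variant.get("end", []), 0, 2):
        problems.append("end must be an array of 0, 1 or 2 integers")
    for key in ("created", "updated"):
        if key in variant and not _is_timestamp(variant[key]):
            problems.append(f"{key} must be an ISO 8601 timestamp")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a record at once.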

Description

This document describes the attributes of the variant object. In its current implementation, variant (like related genomic objects such as callset) represents an extended version of the original, VCF-derived GA4GH schema. This format may be superseded or augmented based on current developments in the GA4GH::GKS work stream.

Examples

{
   "alternate_bases" : "AC",
   "biosample_id" : "biosample_id",
   "callset_id" : "callset_id",
   "created" : "2017-10-25T07:06:03Z",
   "digest" : "digest",
   "end" : [
      21977798,
      21978106
   ],
   "genotype" : [
      "1",
      "."
   ],
   "id" : "id",
   "info" : {
      "cnv_length" : 1205290,
      "cnv_value" : "-0.294"
   },
   "mate_name" : "14",
   "reference_bases" : "G",
   "reference_name" : "8",
   "start" : [
      20867740
   ],
   "updated" : "2017-10-25T07:06:03Z",
   "variant_type" : "DEL"
}
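The property table notes that some implementations use separate values instead of arrays for position bracketing. A rough sketch of that flattening is given below. The function names and the output keys (`start_min`, `start_max`, `end_min`, `end_max`) are hypothetical and chosen for illustration only; they are not taken from the Beacon specification. The sketch also assumes that a two-element array gives the lower and upper bound of an imprecise position.

```python
def bracket(positions):
    """Map a start/end position array to a (minimum, maximum) pair.

    []      -> None         (no position given)
    [p]     -> (p, p)       (precise position)
    [a, b]  -> (min, max)   (imprecise position, bracketed)
    """
    values = [int(p) for p in positions]  # tolerate numeric strings
    if not values:
        return None
    if len(values) > 2:
        raise ValueError("expected at most 2 positions")
    return min(values), max(values)


def to_separate_values(variant):
    """Flatten the start/end arrays of a variant into separate bracketing values."""
    flat = {}
    for key in ("start", "end"):
        span = bracket(variant.get(key, []))
        if span is not None:
            flat[f"{key}_min"], flat[f"{key}_max"] = span
    return flat
```

Applied to the record above, `to_separate_values` would give a `start_min` and `start_max` of 20867740, an `end_min` of 21977798 and an `end_max` of 21978106.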

Notes and examples on the Variant properties

alternate_bases
'alternate_bases' : "AC"
biosample_id
'biosample_id' : "pgx-bs-987647"
callset_id
'callset_id' : "PGX_AM_CS_GSM1690424"
created
'created' : "2017-10-25T07:06:03Z"
digest
'digest' : "4:12282-46465:DEL"
end
'end' : [
  21977798,
  21978106
]
genotype
'genotype' : [
  '1',
  '.'
]
id
'id' : "amvar-8754-7751-1119-8539"
info
'info' : {
  'cnv_length' : 1205290,
  'cnv_value' : '-0.294'
}
mate_name
'mate_name' : "14"
reference_bases
'reference_bases' : "G"
reference_name
'reference_name' : "8"
start
'start' : [
  20867740
]
updated
'updated' : "2022-11-11T09:45:13Z"
variant_type
'variant_type' : "DEL"
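
The digest example above ("4:12282-46465:DEL") reads like chromosome, start, end and variant type joined together. The sketch below builds and splits a string of that shape. The layout is an assumption inferred from that single example, the exact composition of the digest is not specified here, and the function names `make_digest` and `parse_digest` are hypothetical. Precise sequence variants would presumably need additional elements, such as the bases, which this sketch does not cover.

```python
def make_digest(reference_name, start, end, variant_type):
    """Join the elements as <reference_name>:<start>-<end>:<variant_type> (assumed layout)."""
    return f"{reference_name}:{start}-{end}:{variant_type}"


def parse_digest(digest):
    """Split a digest of the assumed layout back into its elements."""
    reference_name, span, variant_type = digest.split(":")
    start, end = span.split("-")
    return {
        "reference_name": reference_name,
        "start": int(start),
        "end": int(end),
        "variant_type": variant_type,
    }
```

Under that assumption, `make_digest("4", 12282, 46465, "DEL")` reproduces the example string, and `parse_digest` recovers the four elements from it.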