The GA4GH data model recommends the use of a default object hierarchy in standard and product design processes. While it reflects concepts from the original GA4GH schema, it serves mainly as a structural guideline for API and data store design and is not intended as a set of absolute implementation requirements.
The GA4GH data model for genomics recommends the use of a principal object hierarchy, consisting of
These basic definitions will be detailed further on.
Additional concepts (e.g. dataset, study …) may be added in the future.
In the design of genomics APIs, file formats and storage protocols, it is important to adhere to a logical object structure that reflects physical reality and common data handling procedures.
At the core of many databases and procedural systems (human health related and other) is the concept of a “biosample”, representing the source of biological material on which some analyses (molecular or other) are performed, leading to a set of observations (e.g. the genomic variants detected by whole genome sequencing of DNA extracted from a tissue biopsy and called against a reference genome).
For a consistent API design, it is important to relate observations and measurements to the correct object in the data model’s hierarchy. A typical example in human genomic data analysis is the association of phenotypic information with the type of biosample being analysed. When associating genomic variants with a cancer diagnosis, it is essential to know whether, for an individual with a cancer diagnosis, the observed variants were called from a germline biosample (i.e. analysis of cancer predisposition) or from a cancer tissue biosample (i.e. somatic mutation analysis).
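The germline versus cancer tissue distinction can be illustrated with a minimal Python sketch. It is not part of the GA4GH schema: the class names, the field names (such as `sample_origin`) and the helper `variants_by_origin` are hypothetical, and the coordinates are made-up placeholders. It only shows how attaching variants to the biosample, rather than directly to the individual, keeps the two kinds of observation separable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variant:
    # Hypothetical minimal variant record; field names are illustrative only.
    reference_name: str
    start: int
    reference_bases: str
    alternate_bases: str


@dataclass
class Biosample:
    # The biosample is the object the observations (variants) belong to.
    id: str
    individual_id: str
    sample_origin: str  # assumed labels: "germline" or "tumor"
    variants: List[Variant] = field(default_factory=list)


@dataclass
class Individual:
    # Information about the person (e.g. a diagnosis) lives at this level.
    id: str
    diagnosis: str
    biosamples: List[Biosample] = field(default_factory=list)


def variants_by_origin(individual: Individual, origin: str) -> List[Variant]:
    """Collect the variants observed in biosamples of the given origin."""
    return [
        variant
        for biosample in individual.biosamples
        if biosample.sample_origin == origin
        for variant in biosample.variants
    ]


# Placeholder data, for illustration only.
patient = Individual(
    id="ind-1",
    diagnosis="cancer",
    biosamples=[
        Biosample(
            id="bs-1",
            individual_id="ind-1",
            sample_origin="germline",
            variants=[Variant("1", 1000, "A", "G")],
        ),
        Biosample(
            id="bs-2",
            individual_id="ind-1",
            sample_origin="tumor",
            variants=[Variant("1", 2000, "C", "T"), Variant("2", 3000, "G", "A")],
        ),
    ],
)

germline_variants = variants_by_origin(patient, "germline")  # predisposition analysis
somatic_candidates = variants_by_origin(patient, "tumor")    # somatic mutation analysis
```

Had the variants been attached directly to the individual, the two lists could not be told apart without additional annotation, which is the situation the hierarchy is meant to avoid.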