Class: DataSet

A collection of related data items or records that are organized together in a common format or structure, to enable their computational manipulation as a unit.

URI: vacoreim:DataSet

classDiagram
    class DataSet
    InformationEntity <|-- DataSet
    DataSet : contributions
    DataSet --|> Contribution : contributions
    DataSet : dateAuthored
    DataSet : derivedFrom
    DataSet --|> InformationEntity : derivedFrom
    DataSet : description
    DataSet : extensions
    DataSet --|> Extension : extensions
    DataSet : id
    DataSet : identifiers
    DataSet : label
    DataSet : license
    DataSet : recordMetadata
    DataSet --|> RecordMetadata : recordMetadata
    DataSet : releaseDate
    DataSet : reportedIn
    DataSet --|> Document : reportedIn
    DataSet : specifiedBy
    DataSet --|> Method : specifiedBy
    DataSet : subtype
    DataSet --|> Coding : subtype
    DataSet : type
    DataSet : urls
    DataSet : version
    DataSet : xrefs

Inheritance

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| subtype | 0..1 Coding | A specific type of data set the DataSet object represents. | direct |
| releaseDate | 0..1 DateTime | The date when a version-level Data Set was released. | direct |
| license | 0..1 String | The type of license that dictates legal permissions for how the Data Set can be used - typically referenced by its URL. | direct |
| version | 0..1 String | The version of the Data Set (use this field in cases where version is not reflected in an identifier associated with the Data Set). | direct |
| contributions | 0..* Contribution | A specific contribution made by some Agent to the creation, modification, or validation of the information represented in the Information Entity. | InformationEntity |
| dateAuthored | 0..1 DateTime | The date the information content expressed in this entity was generated. | InformationEntity |
| specifiedBy | 0..* Method | A particular plan specification that describes all or part of the process that led to creation of the reported information (e.g. a specific experimental protocol, analysis specification, cohort selection criteria, interpretation guideline, etc.). | InformationEntity |
| derivedFrom | 0..* InformationEntity | An information resource from which the Information Entity is derived, in whole or in part. | InformationEntity |
| reportedIn | 0..* Document | A document in which the information content carried by the Information Entity is expressed. | InformationEntity |
| id | 1..1 Identifier | The logical identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. | Entity |
| identifiers | 0..* Identifier | A business identifier or accession number for the entity, typically as provided by an external system or authority, that is globally unique and persists across implementing systems. | Entity |
| label | 0..1 String | A primary name for the Entity. | Entity |
| urls | 0..* Url | The URL/web address of a digital resource representing the entity, or providing information about it. | Entity |
| xrefs | 0..* String | Cross-references to database identifier(s) representing the same (or a closely related) entity or concept as the Entity. | Entity |
| recordMetadata | 0..1 RecordMetadata | A reusable structure that encapsulates provenance metadata about the present record/data object (as opposed to provenance information about the real world entity this record/data object represents). | Entity |
| type | 1..1 Class | The schema class that is instantiated by the data object. Must be the name of a class from the VA schema. | Element |
| description | 0..1 String | A free text description of the Element. | Element |
| extensions | 0..* Extension | A key-value data structure that allows definition of custom fields to capture information not directly supported by the VA specification. | Element |
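
The cardinalities listed in the Slots table can be checked mechanically. The following is a minimal sketch and not part of the VA specification. It assumes a DataSet is held as a plain Python dict keyed by slot name, that multivalued (`0..*`) slots are lists, and that keys outside the table are reported as unknown. The names `SLOT_CARDINALITY` and `check_cardinality` are hypothetical, and the checks cover cardinality only, not the range of each slot.

```python
# Cardinalities copied from the Slots table: (minimum, maximum).
# A maximum of None stands for "*" (unbounded).
SLOT_CARDINALITY = {
    "subtype": (0, 1),
    "releaseDate": (0, 1),
    "license": (0, 1),
    "version": (0, 1),
    "contributions": (0, None),
    "dateAuthored": (0, 1),
    "specifiedBy": (0, None),
    "derivedFrom": (0, None),
    "reportedIn": (0, None),
    "id": (1, 1),
    "identifiers": (0, None),
    "label": (0, 1),
    "urls": (0, None),
    "xrefs": (0, None),
    "recordMetadata": (0, 1),
    "type": (1, 1),
    "description": (0, 1),
    "extensions": (0, None),
}


def check_cardinality(record):
    """Return a list of cardinality problems found in a dict-based DataSet (empty if none)."""
    problems = []
    # Report keys that are not slots of DataSet (an assumption of this sketch).
    for name in record:
        if name not in SLOT_CARDINALITY:
            problems.append(f"unknown slot: {name}")
    for name, (minimum, maximum) in SLOT_CARDINALITY.items():
        value = record.get(name)
        if value is None:
            count = 0
        elif isinstance(value, list):
            count = len(value)
        else:
            count = 1
        if count < minimum:
            problems.append(f"{name}: required slot is missing")
        if maximum == 1 and isinstance(value, list):
            problems.append(f"{name}: single-valued slot given a list")
        if maximum is None and value is not None and not isinstance(value, list):
            problems.append(f"{name}: multivalued slot should be a list")
    return problems


# Only 'id' and 'type' are required (1..1), so this minimal record passes.
print(check_cardinality({"id": "example:dataset-1", "type": "DataSet"}))
```

The `id` value here is a placeholder string. How an `Identifier` is serialized in a real implementation is not stated on this page.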

Usages

| used by | used in | type | used |
| --- | --- | --- | --- |
| StudyResult | derivedFrom | range | DataSet |
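
As an illustration of this usage, the sketch below pulls the DataSet objects out of a StudyResult's `derivedFrom` slot. It is an assumption-laden example rather than an API of the VA specification: records are plain dicts, nested objects carry their class name in `type`, and `derivedFrom` is accepted either as a single object or as a list because this page does not state its cardinality on StudyResult. The function name `datasets_derived_from` and all identifier values are hypothetical placeholders.

```python
def datasets_derived_from(study_result):
    """Return the DataSet objects referenced by a dict-based StudyResult's derivedFrom slot."""
    value = study_result.get("derivedFrom")
    if value is None:
        return []
    # Accept a single object or a list of objects (assumption of this sketch).
    items = value if isinstance(value, list) else [value]
    return [
        item
        for item in items
        if isinstance(item, dict) and item.get("type") == "DataSet"
    ]


# Placeholder records for illustration only.
study_result = {
    "id": "example:study-result-1",
    "type": "StudyResult",
    "derivedFrom": [
        {"id": "example:dataset-1", "type": "DataSet", "label": "Example data set"},
    ],
}

for dataset in datasets_derived_from(study_result):
    print(dataset["id"], dataset.get("label"))
```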

Comments

  • Instances of this class represent a specific version of a given dataset (e.g. gnomAD v3.1.2). At present, the model does not support 'Summary Level' or 'Distribution Level' representations (see https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels).

Examples include the Broad ExAC dataset on allele population frequency, a SIFT dataset of computational predictions of functional impact for a set of variants, or the contents of a VCF file that describes variations observed in a particular patient and various annotations made on them.
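
A version-level instance such as the gnomAD v3.1.2 example mentioned above might be assembled as follows. This is a hedged sketch only: the `DataSetRecord` dataclass is hypothetical, it covers a handful of the slots, the `id` is a placeholder, and `releaseDate` and `license` are left unset rather than guessed. Because the placeholder `id` does not encode the version, the version is recorded in the `version` slot.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataSetRecord:
    """Hypothetical, partial stand-in for a DataSet object (not the official model)."""

    id: str                            # required (1..1); placeholder values used below
    label: Optional[str] = None
    version: Optional[str] = None
    releaseDate: Optional[str] = None
    license: Optional[str] = None
    type: str = "DataSet"              # required (1..1); names the instantiated class

    def to_dict(self):
        """Serialize to a dict, omitting optional slots that were not set."""
        return {key: value for key, value in asdict(self).items() if value is not None}


# One instance represents one specific version of the dataset.
gnomad = DataSetRecord(id="example:dataset-1", label="gnomAD", version="3.1.2")
print(json.dumps(gnomad.to_dict(), indent=2))
```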

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/ga4gh-va-core-im

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | vacoreim:DataSet |
| native | vacoreim:DataSet |

LinkML Source

Direct

name: DataSet
description: A collection of related data items or records that are organized together
  in a common format or structure, to enable their computational manipulation as a
  unit.
title: Data Set
comments:
- 'Instances of this class represent a specific version of a given dataset (e.g. gnomAD
  v3.1.2). At present, the model does not support ''Summary Level'' or ''Distribution
  Level'' representations (see https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels).


  Examples include the Broad ExAC dataset on allele population frequency, or a SIFT
  dataset of computational predictions of functional impact for a set of variants, or
  the contents of a VCF file that describes variations observed in a particular patient
  and various annotations made on them.'
from_schema: https://w3id.org/ga4gh-va-core-im
is_a: InformationEntity
slots:
- subtype
- releaseDate
- license
- version
slot_usage:
  subtype:
    name: subtype
    description: A specific type of data set the DataSet object represents.
    comments:
    - The recorded type is typically based on the nature of the data in the dataset,
      and what it describes (e.g. genomic sequence dataset, genome feature annotation
      dataset).
    multivalued: false
    domain_of:
    - DataItem
    - DataSet
    - Document
    - Statement
    - StudyResult
    - EvidenceLine
    - Method
    - Activity
    - Agent
    - Proposition
    range: Coding
    required: false
  releaseDate:
    name: releaseDate
    description: The date when a version-level Data Set was released.
    comments:
    - This attribute may apply to version and distribution-level Data Set representations.
    multivalued: false
    domain_of:
    - DataSet
    range: DateTime
    required: false
  license:
    name: license
    description: The type of license that dictates legal permissions for how the Data
      Set can be used - typically referenced by its URL.
    comments:
    - 'The VA Model can record license information for resources such as Data Sets
      and Documents that get published and released as a unit for community use.

      This attribute may apply to summary, version, and distribution-level Data Set
      representations.'
    multivalued: false
    domain_of:
    - DataSet
    - Document
    - Method
    range: string
    required: false
  version:
    name: version
    description: The version of the Data Set (use this field in cases where version
      is not reflected in an identifier associated with the Data Set)
    comments:
    - The VA Model can record version information for resources such as Data Sets
      and Documents that get published and released as a unit for community use. These
      may go through rounds of revisions that add or modify content, but don’t change
      the identity of the resource.
    multivalued: false
    domain_of:
    - DataSet
    - Document
    range: string
    required: false

Induced

name: DataSet
description: A collection of related data items or records that are organized together
  in a common format or structure, to enable their computational manipulation as a
  unit.
title: Data Set
comments:
- 'Instances of this class represent a specific version of a given dataset (e.g. gnomAD
  v3.1.2). At present, the model does not support ''Summary Level'' or ''Distribution
  Level'' representations (see https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels).


  Examples include the Broad ExAC dataset on allele population frequency, or a SIFT
  dataset of computational predictions of functional impact for a set of variants, or
  the contents of a VCF file that describes variations observed in a particular patient
  and various annotations made on them.'
from_schema: https://w3id.org/ga4gh-va-core-im
is_a: InformationEntity
slot_usage:
  subtype:
    name: subtype
    description: A specific type of data set the DataSet object represents.
    comments:
    - The recorded type is typically based on the nature of the data in the dataset,
      and what it describes (e.g. genomic sequence dataset, genome feature annotation
      dataset).
    multivalued: false
    domain_of:
    - DataItem
    - DataSet
    - Document
    - Statement
    - StudyResult
    - EvidenceLine
    - Method
    - Activity
    - Agent
    - Proposition
    range: Coding
    required: false
  releaseDate:
    name: releaseDate
    description: The date when a version-level Data Set was released.
    comments:
    - This attribute may apply to version and distribution-level Data Set representations.
    multivalued: false
    domain_of:
    - DataSet
    range: DateTime
    required: false
  license:
    name: license
    description: The type of license that dictates legal permissions for how the Data
      Set can be used - typically referenced by its URL.
    comments:
    - 'The VA Model can record license information for resources such as Data Sets
      and Documents that get published and released as a unit for community use.

      This attribute may apply to summary, version, and distribution-level Data Set
      representations.'
    multivalued: false
    domain_of:
    - DataSet
    - Document
    - Method
    range: string
    required: false
  version:
    name: version
    description: The version of the Data Set (use this field in cases where version
      is not reflected in an identifier associated with the Data Set)
    comments:
    - The VA Model can record version information for resources such as Data Sets
      and Documents that get published and released as a unit for community use. These
      may go through rounds of revisions that add or modify content, but don’t change
      the identity of the resource.
    multivalued: false
    domain_of:
    - DataSet
    - Document
    range: string
    required: false
attributes:
  subtype:
    name: subtype
    description: A specific type of data set the DataSet object represents.
    comments:
    - The recorded type is typically based on the nature of the data in the dataset,
      and what it describes (e.g. genomic sequence dataset, genome feature annotation
      dataset).
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: subtype
    owner: DataSet
    domain_of:
    - DataItem
    - DataSet
    - Document
    - Statement
    - StudyResult
    - EvidenceLine
    - Method
    - Activity
    - Agent
    - Proposition
    range: Coding
    required: false
  releaseDate:
    name: releaseDate
    description: The date when a version-level Data Set was released.
    comments:
    - This attribute may apply to version and distribution-level Data Set representations.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: releaseDate
    owner: DataSet
    domain_of:
    - DataSet
    range: DateTime
    required: false
  license:
    name: license
    description: The type of license that dictates legal permissions for how the Data
      Set can be used - typically referenced by its URL.
    comments:
    - 'The VA Model can record license information for resources such as Data Sets
      and Documents that get published and released as a unit for community use.

      This attribute may apply to summary, version, and distribution-level Data Set
      representations.'
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: license
    owner: DataSet
    domain_of:
    - DataSet
    - Document
    - Method
    range: string
    required: false
  version:
    name: version
    description: The version of the Data Set (use this field in cases where version
      is not reflected in an identifier associated with the Data Set)
    comments:
    - The VA Model can record version information for resources such as Data Sets
      and Documents that get published and released as a unit for community use. These
      may go through rounds of revisions that add or modify content, but don’t change
      the identity of the resource.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: version
    owner: DataSet
    domain_of:
    - DataSet
    - Document
    range: string
    required: false
  contributions:
    name: contributions
    description: A specific contribution made by some Agent to the creation, modification,
      or validation of the information represented in the Information Entity.
    comments:
    - This attribute points to a Contribution object, which holds a structured description
      of the actions taken by a particular agent in contributing to an Information
      Entity.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: contributions
    owner: DataSet
    domain_of:
    - InformationEntity
    - RecordMetadata
    range: Contribution
    required: false
  dateAuthored:
    name: dateAuthored
    description: The date the information content expressed in this entity was generated.
    comments:
    - We use the term 'authored' to refer to the generation of information in the
      abstract - i.e. the information content expressed in a Statement, not a concrete
      encoding of it in a specific language or format.  The 'dateAuthored' attribute
      captures when this abstract information content was first created.  Information
      about when a particular concrete encoding of this information was created (e.g.
      as a VA-based JSON object) would live in a RecordMetadata object attached to
      the Information Entity.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: dateAuthored
    owner: DataSet
    domain_of:
    - InformationEntity
    range: DateTime
    required: false
  specifiedBy:
    name: specifiedBy
    description: A particular plan specification that describes all or part of the
      process that led to creation of the reported information (e.g. a specific experimental
      protocol, analysis specification, cohort selection criteria, interpretation
      guideline, etc.).
    comments:
    - 'This field captures specific instances of specifications / methods (vs the
      ''methodTypes'' attribute which captures types of methods applied).

      In various domains and communities, such specifications may be called ''protocols'',
      ''guidelines'', ''methods'', ''rule sets'', etc.'
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: specifiedBy
    owner: DataSet
    domain_of:
    - InformationEntity
    - Activity
    range: Method
    required: false
  derivedFrom:
    name: derivedFrom
    description: An information resource from which the Information Entity is derived,
      in whole or in part
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: derivedFrom
    owner: DataSet
    domain_of:
    - InformationEntity
    - StudyResult
    - RecordMetadata
    range: InformationEntity
    required: false
  reportedIn:
    name: reportedIn
    description: A document in which the information content carried by the Information
      Entity is expressed
    comments:
    - "This attribute is used specifically to reference documents/publications where\
      \ the Information Entity is expressed or reported.  For a Statement, this might\
      \ be a publication where the authors express the statement in text.  For a Data\
      \ Item, this might be a publication with a table or figure that reports the\
      \ value of the data.\nNote that the VA-Spec provides separate attributes for\
      \ describing different types of 'references' from an Information Entity to a\
      \ Document (e.g. hasEvidenceFromSource is used by a Statement to reference a\
      \ Document that provided evidence for the knowledge the Statement expresses)."
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: reportedIn
    owner: DataSet
    domain_of:
    - InformationEntity
    range: Document
    required: false
  id:
    name: id
    description: The logical identifier of the entity in the system of record, e.g.
      a UUID.  This 'id' is unique within a given system. The identified entity may
      have a different 'id' in a different system.
    comments:
    - FHIR naming conventions are followed here, where an 'id' field holds logical
      identifiers which are unique only within a given system, and an 'identifier'
      field holds business identifiers, which are globally unique and used to connect
      entities and share content across systems.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: id
    owner: DataSet
    domain_of:
    - Entity
    range: Identifier
    required: true
  identifiers:
    name: identifiers
    description: A business identifier or accession number for the entity, typically
      as provided by an external system or authority, that is globally unique and
      persists across implementing systems.
    comments:
    - FHIR naming conventions are followed here, where an 'id' field holds logical
      identifiers which are unique only within a given system, and an 'identifier'
      field holds business identifiers, which are globally unique and used to connect
      entities and share content across systems.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: identifiers
    owner: DataSet
    domain_of:
    - Entity
    range: Identifier
    required: false
  label:
    name: label
    description: A primary name for the Entity.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: label
    owner: DataSet
    domain_of:
    - Entity
    - Coding
    range: string
    required: false
  urls:
    name: urls
    description: The URL/web address of a digital resource representing the entity,
      or providing information about it.
    comments:
    - This attribute is meant to point directly to locations on the web where more
      information about the Entity can be found.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: urls
    owner: DataSet
    domain_of:
    - Entity
    range: Url
    required: false
  xrefs:
    name: xrefs
    description: Cross-references to database identifier(s) representing the same
      (or a closely related) entity or concept as the Entity.
    comments:
    - Preferred values for this field are CURIEs or URLs for database records - so
      the system that provisioned the identifier is clear.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: xrefs
    owner: DataSet
    domain_of:
    - Entity
    range: string
    required: false
  recordMetadata:
    name: recordMetadata
    description: A reusable structure that encapsulates provenance metadata about
      the present record/data object  (as opposed to provenance information about
      the real world entity this record/data object represents).
    comments:
    - Record-level metadata applies to a specific concrete encoding/serialization
      of information (e.g. as a record in a specific knowledgebase, or an online digital
      resource). A RecordMetadata object can capture when, how, and by whom a specific
      record was generated or modified; what upstream resources it was derived/retrieved
      from; and record-level administrative information such as versioning and system
      / lifecycle status.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: recordMetadata
    owner: DataSet
    domain_of:
    - Entity
    range: RecordMetadata
    required: false
  type:
    name: type
    description: The schema class that is instantiated by the data object.  Must be
      the name of a class from the VA schema.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: type
    owner: DataSet
    domain_of:
    - Element
    range: Class
    required: true
  description:
    name: description
    description: A free text description of the Element.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: description
    owner: DataSet
    domain_of:
    - Element
    - Extension
    range: string
    required: false
  extensions:
    name: extensions
    description: A key-value data structure that allows definition of custom fields
      to capture information not directly supported by the VA specification.
    comments:
    - The VA-Spec provides implementers the ability to extend any model element
      with new attributes using this flexible Extension element.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: extensions
    owner: DataSet
    domain_of:
    - Element
    range: Extension
    required: false