Class: DataSet
A collection of related data items or records that are organized together in a common format or structure, to enable their computational manipulation as a unit.
URI: vacoreim:DataSet
classDiagram
class DataSet
InformationEntity <|-- DataSet
DataSet : contributions
DataSet --|> Contribution : contributions
DataSet : dateAuthored
DataSet : derivedFrom
DataSet --|> InformationEntity : derivedFrom
DataSet : description
DataSet : extensions
DataSet --|> Extension : extensions
DataSet : id
DataSet : identifiers
DataSet : label
DataSet : license
DataSet : recordMetadata
DataSet --|> RecordMetadata : recordMetadata
DataSet : releaseDate
DataSet : reportedIn
DataSet --|> Document : reportedIn
DataSet : specifiedBy
DataSet --|> Method : specifiedBy
DataSet : subtype
DataSet --|> Coding : subtype
DataSet : type
DataSet : urls
DataSet : version
DataSet : xrefs
Inheritance
- Element
    - Entity
        - InformationEntity
            - DataSet
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
subtype | 0..1 Coding | A specific type of data set the DataSet object represents. | direct |
releaseDate | 0..1 DateTime | The date when a version-level Data Set was released. | direct |
license | 0..1 String | The type of license that dictates legal permissions for how the Data Set can be used, typically referenced by its URL. | direct |
version | 0..1 String | The version of the Data Set (use this field in cases where the version is not reflected in an identifier associated with the Data Set). | direct |
contributions | 0..* Contribution | A specific contribution made by some Agent to the creation, modification, or validation of the information represented in the Information Entity. | InformationEntity |
dateAuthored | 0..1 DateTime | The date the information content expressed in this entity was generated. | InformationEntity |
specifiedBy | 0..* Method | A particular plan specification that describes all or part of the process that led to creation of the reported information (e.g. a specific experimental protocol, analysis specification, cohort selection criteria, interpretation guideline, etc.). | InformationEntity |
derivedFrom | 0..* InformationEntity | An information resource from which the Information Entity is derived, in whole or in part. | InformationEntity |
reportedIn | 0..* Document | A document in which the information content carried by the Information Entity is expressed. | InformationEntity |
id | 1..1 Identifier | The logical identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. | Entity |
identifiers | 0..* Identifier | A business identifier or accession number for the entity, typically as provided by an external system or authority, that is globally unique and persists across implementing systems. | Entity |
label | 0..1 String | A primary name for the Entity. | Entity |
urls | 0..* Url | The URL/web address of a digital resource representing the entity, or providing information about it. | Entity |
xrefs | 0..* String | Cross-references to database identifier(s) representing the same (or a closely related) entity or concept as the Entity. | Entity |
recordMetadata | 0..1 RecordMetadata | A reusable structure that encapsulates provenance metadata about the present record/data object (as opposed to provenance information about the real world entity this record/data object represents). | Entity |
type | 1..1 Class | The schema class that is instantiated by the data object. Must be the name of a class from the VA schema. | Element |
description | 0..1 String | A free text description of the Element. | Element |
extensions | 0..* Extension | A key-value data structure that allows definition of custom fields to capture information not directly supported by the VA specification. | Element |
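The cardinalities in the table map directly onto validation rules. `id` and `type` are the only required slots, every `0..*` slot holds a list, and every `0..1` slot holds a single value. The following is a minimal sketch that checks a plain-dict DataSet against those rules. `CARDINALITY` is transcribed from the table above. `check_cardinality` is a hypothetical helper, not part of a VA or LinkML library; in practice, LinkML's own validation tooling would be the more complete option. Keys outside the table are flagged because custom fields are meant to go in `extensions`.

```python
from typing import Any, Optional

# (min, max) cardinality per slot, transcribed from the Slots table;
# a max of None means unbounded (0..*).
CARDINALITY: dict[str, tuple[int, Optional[int]]] = {
    "subtype": (0, 1), "releaseDate": (0, 1), "license": (0, 1), "version": (0, 1),
    "contributions": (0, None), "dateAuthored": (0, 1), "specifiedBy": (0, None),
    "derivedFrom": (0, None), "reportedIn": (0, None),
    "id": (1, 1), "identifiers": (0, None), "label": (0, 1), "urls": (0, None),
    "xrefs": (0, None), "recordMetadata": (0, 1),
    "type": (1, 1), "description": (0, 1), "extensions": (0, None),
}

def check_cardinality(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for slot in record:
        if slot not in CARDINALITY:
            problems.append(f"unknown slot: {slot}")
    for slot, (lo, hi) in CARDINALITY.items():
        value = record.get(slot)
        if value is None:
            if lo > 0:
                problems.append(f"missing required slot: {slot}")
            continue
        if hi is None:  # multivalued slot: value must be a list
            if not isinstance(value, list):
                problems.append(f"{slot} must be a list")
        elif isinstance(value, list):  # single-valued slot given a list
            problems.append(f"{slot} is single-valued")
    return problems

record = {
    "id": "example:dataset-1",                  # placeholder identifier
    "type": "DataSet",
    "label": "Example variant frequency data set",
    "version": "1.0",
    "urls": ["https://example.org/dataset-1"],  # placeholder URL
}
print(check_cardinality(record))  # -> []
```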
Usages
used by | used in | type | used |
---|---|---|---|
StudyResult | derivedFrom | range | DataSet |
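According to the Usages table, a StudyResult points to the DataSet it was computed from through `derivedFrom`. The following sketch uses plain dicts and assumes the slot stays multivalued on StudyResult, as it is on InformationEntity. Only `id`, `type`, `label`, `version`, and `derivedFrom` are taken from this page; the StudyResult's other slots are omitted because they are not documented here. All identifiers are placeholders, and `source_datasets` is a hypothetical helper.

```python
from typing import Any

dataset = {
    "id": "example:dataset-1",   # placeholder id
    "type": "DataSet",
    "label": "Example population frequency data set",
    "version": "1.0",
}

study_result = {
    "id": "example:result-1",    # placeholder id
    "type": "StudyResult",
    "derivedFrom": [dataset],    # DataSet embedded as the range value
}

def source_datasets(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the DataSet objects a StudyResult was derived from.

    Filtering on 'type' is a defensive check against mistyped entries.
    """
    return [d for d in result.get("derivedFrom", []) if d.get("type") == "DataSet"]

for ds in source_datasets(study_result):
    print(ds["label"], ds["version"])
```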
Comments
- Instances of this class represent a specific version of a given dataset (e.g. gnomAD v3.1.2). At present, the model does not support 'Summary Level' or 'Distribution Level' representations (see https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels).
  Examples include the Broad ExAC dataset on allele population frequency, a SIFT dataset of computational predictions of functional impact for a set of variants, or the contents of a VCF file that describes variations observed in a particular patient and the various annotations made on them.
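Because each instance stands for one version of a dataset, successive releases become separate DataSet objects. They share a label but differ in `id`, `version`, and `releaseDate`. The following sketch uses plain dicts. The ids, dates, and license URL are placeholders, not real metadata for any named resource. `version` is populated on the assumption that the chosen `id` does not already encode the version, which is the case the `version` slot is meant for. `latest_release` is a hypothetical helper. It raises `ValueError` if no record has a `releaseDate`.

```python
from datetime import date
from typing import Any

# Two version-level DataSet records for one hypothetical resource.
releases: list[dict[str, Any]] = [
    {"id": "example:ds-0001", "type": "DataSet", "label": "Example Frequency Data",
     "version": "1.0", "releaseDate": "2021-03-01",
     "license": "https://example.org/license"},   # placeholder license URL
    {"id": "example:ds-0002", "type": "DataSet", "label": "Example Frequency Data",
     "version": "1.1", "releaseDate": "2022-06-15",
     "license": "https://example.org/license"},
]

def latest_release(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Pick the most recently released version.

    releaseDate is a DateTime; only the leading YYYY-MM-DD part is compared.
    """
    dated = [r for r in records if r.get("releaseDate")]
    return max(dated, key=lambda r: date.fromisoformat(r["releaseDate"][:10]))

print(latest_release(releases)["version"])  # -> 1.1
```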
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/ga4gh-va-core-im
Mappings
Mapping Type | Mapped Value |
---|---|
self | vacoreim:DataSet |
native | vacoreim:DataSet |
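The mapped value `vacoreim:DataSet` is a CURIE. Expanding it to a full IRI requires the schema's prefix map. The `vacoreim` expansion below is an assumption made for illustration; the `prefixes` section of the schema itself is the authoritative source. The same expansion applies to `xrefs` values, whose slot comment recommends CURIEs. `PREFIXES` and `expand_curie` are hypothetical names.

```python
# Hypothetical prefix map: the real expansion for 'vacoreim' is declared in
# the schema's prefixes section and may differ from this assumed value.
PREFIXES = {
    "vacoreim": "https://w3id.org/ga4gh-va-core-im/",
}

def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' to a full IRI; full URLs pass through unchanged."""
    if curie.startswith(("http://", "https://")):
        return curie
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}: unknown prefix")
    return prefixes[prefix] + local

print(expand_curie("vacoreim:DataSet"))
```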
LinkML Source
Direct
name: DataSet
description: A collection of related data items or records that are organized together
  in a common format or structure, to enable their computational manipulation as a
  unit.
title: Data Set
comments:
- 'Instances of this class represent a specific version of a given dataset (e.g. gnomAD
  v3.1.2). At present, the model does not support ''Summary Level'' or ''Distribution
  Level'' representations (see https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels).
  Examples include the Broad ExAC dataset on allele population frequency, a SIFT
  dataset of computational predictions of functional impact for a set of variants, or
  the contents of a VCF file that describes variations observed in a particular patient
  and the various annotations made on them.'
from_schema: https://w3id.org/ga4gh-va-core-im
is_a: InformationEntity
slots:
- subtype
- releaseDate
- license
- version
slot_usage:
  subtype:
    name: subtype
    description: A specific type of data set the DataSet object represents.
    comments:
    - The recorded type is typically based on the nature of the data in the dataset,
      and what it describes (e.g. genomic sequence dataset, genome feature annotation
      dataset).
    multivalued: false
    domain_of:
    - DataItem
    - DataSet
    - Document
    - Statement
    - StudyResult
    - EvidenceLine
    - Method
    - Activity
    - Agent
    - Proposition
    range: Coding
    required: false
  releaseDate:
    name: releaseDate
    description: The date when a version-level Data Set was released.
    comments:
    - This attribute may apply to version and distribution-level Data Set representations.
    multivalued: false
    domain_of:
    - DataSet
    range: DateTime
    required: false
  license:
    name: license
    description: The type of license that dictates legal permissions for how the Data
      Set can be used - typically referenced by its URL.
    comments:
    - 'The VA Model can record license information for resources such as Data Sets
      and Documents that get published and released as a unit for community use.
      This attribute may apply to summary, version, and distribution-level Data Set
      representations.'
    multivalued: false
    domain_of:
    - DataSet
    - Document
    - Method
    range: string
    required: false
  version:
    name: version
    description: The version of the Data Set (use this field in cases where the version
      is not reflected in an identifier associated with the Data Set).
    comments:
    - The VA Model can record version information for resources such as Data Sets
      and Documents that get published and released as a unit for community use. These
      may go through rounds of revisions that add or modify content, but don’t change
      the identity of the resource.
    multivalued: false
    domain_of:
    - DataSet
    - Document
    range: string
    required: false
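The Direct view records only the `slot_usage` refinements that DataSet declares. The Induced view that follows combines each slot's base definition with those refinements and adds bookkeeping keys such as `owner` and `alias`. The following is a simplified sketch of that overlay. It assumes refinements override base keys one level deep and that `alias` equals the slot name, as it does throughout the Induced output. LinkML's actual induction logic (e.g. `SchemaView.induced_slot` in linkml-runtime) handles more cases, such as ancestors and mixins. `induce_slot` and the base definition used here are hypothetical.

```python
from typing import Any

def induce_slot(base: dict[str, Any], usage: dict[str, Any], owner: str) -> dict[str, Any]:
    """Overlay a class's slot_usage on a base slot definition (simplified)."""
    induced = {**base, **usage}          # slot_usage keys win over base keys
    induced["alias"] = induced["name"]   # alias mirrors the slot name here
    induced["owner"] = owner             # record the class that owns the attribute
    return induced

# Hypothetical base definition; the real one lives elsewhere in the schema.
base_subtype = {"name": "subtype", "range": "Coding", "multivalued": False}
usage_subtype = {
    "name": "subtype",
    "description": "A specific type of data set the DataSet object represents.",
    "required": False,
}

print(induce_slot(base_subtype, usage_subtype, owner="DataSet"))
```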
Induced
name: DataSet
description: A collection of related data items or records that are organized together
  in a common format or structure, to enable their computational manipulation as a
  unit.
title: Data Set
comments:
- 'Instances of this class represent a specific version of a given dataset (e.g. gnomAD
  v3.1.2). At present, the model does not support ''Summary Level'' or ''Distribution
  Level'' representations (see https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels).
  Examples include the Broad ExAC dataset on allele population frequency, a SIFT
  dataset of computational predictions of functional impact for a set of variants, or
  the contents of a VCF file that describes variations observed in a particular patient
  and the various annotations made on them.'
from_schema: https://w3id.org/ga4gh-va-core-im
is_a: InformationEntity
slot_usage:
  subtype:
    name: subtype
    description: A specific type of data set the DataSet object represents.
    comments:
    - The recorded type is typically based on the nature of the data in the dataset,
      and what it describes (e.g. genomic sequence dataset, genome feature annotation
      dataset).
    multivalued: false
    domain_of:
    - DataItem
    - DataSet
    - Document
    - Statement
    - StudyResult
    - EvidenceLine
    - Method
    - Activity
    - Agent
    - Proposition
    range: Coding
    required: false
  releaseDate:
    name: releaseDate
    description: The date when a version-level Data Set was released.
    comments:
    - This attribute may apply to version and distribution-level Data Set representations.
    multivalued: false
    domain_of:
    - DataSet
    range: DateTime
    required: false
  license:
    name: license
    description: The type of license that dictates legal permissions for how the Data
      Set can be used - typically referenced by its URL.
    comments:
    - 'The VA Model can record license information for resources such as Data Sets
      and Documents that get published and released as a unit for community use.
      This attribute may apply to summary, version, and distribution-level Data Set
      representations.'
    multivalued: false
    domain_of:
    - DataSet
    - Document
    - Method
    range: string
    required: false
  version:
    name: version
    description: The version of the Data Set (use this field in cases where the version
      is not reflected in an identifier associated with the Data Set).
    comments:
    - The VA Model can record version information for resources such as Data Sets
      and Documents that get published and released as a unit for community use. These
      may go through rounds of revisions that add or modify content, but don’t change
      the identity of the resource.
    multivalued: false
    domain_of:
    - DataSet
    - Document
    range: string
    required: false
attributes:
  subtype:
    name: subtype
    description: A specific type of data set the DataSet object represents.
    comments:
    - The recorded type is typically based on the nature of the data in the dataset,
      and what it describes (e.g. genomic sequence dataset, genome feature annotation
      dataset).
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: subtype
    owner: DataSet
    domain_of:
    - DataItem
    - DataSet
    - Document
    - Statement
    - StudyResult
    - EvidenceLine
    - Method
    - Activity
    - Agent
    - Proposition
    range: Coding
    required: false
  releaseDate:
    name: releaseDate
    description: The date when a version-level Data Set was released.
    comments:
    - This attribute may apply to version and distribution-level Data Set representations.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: releaseDate
    owner: DataSet
    domain_of:
    - DataSet
    range: DateTime
    required: false
  license:
    name: license
    description: The type of license that dictates legal permissions for how the Data
      Set can be used - typically referenced by its URL.
    comments:
    - 'The VA Model can record license information for resources such as Data Sets
      and Documents that get published and released as a unit for community use.
      This attribute may apply to summary, version, and distribution-level Data Set
      representations.'
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: license
    owner: DataSet
    domain_of:
    - DataSet
    - Document
    - Method
    range: string
    required: false
  version:
    name: version
    description: The version of the Data Set (use this field in cases where the version
      is not reflected in an identifier associated with the Data Set).
    comments:
    - The VA Model can record version information for resources such as Data Sets
      and Documents that get published and released as a unit for community use. These
      may go through rounds of revisions that add or modify content, but don’t change
      the identity of the resource.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: version
    owner: DataSet
    domain_of:
    - DataSet
    - Document
    range: string
    required: false
  contributions:
    name: contributions
    description: A specific contribution made by some Agent to the creation, modification,
      or validation of the information represented in the Information Entity.
    comments:
    - This attribute points to a Contribution object, which holds a structured description
      of the actions taken by a particular agent in contributing to an Information
      Entity.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: contributions
    owner: DataSet
    domain_of:
    - InformationEntity
    - RecordMetadata
    range: Contribution
    required: false
  dateAuthored:
    name: dateAuthored
    description: The date the information content expressed in this entity was generated.
    comments:
    - We use the term 'authored' to refer to the generation of information in the
      abstract - i.e. the information content expressed in a Statement, not a concrete
      encoding of it in a specific language or format. The 'dateAuthored' attribute
      captures when this abstract information content was first created. Information
      about when a particular concrete encoding of this information was created (e.g.
      as a VA-based json object) would live in a RecordMetadata object attached to
      the Information Entity.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: dateAuthored
    owner: DataSet
    domain_of:
    - InformationEntity
    range: DateTime
    required: false
  specifiedBy:
    name: specifiedBy
    description: A particular plan specification that describes all or part of the
      process that led to creation of the reported information (e.g. a specific experimental
      protocol, analysis specification, cohort selection criteria, interpretation
      guideline, etc.).
    comments:
    - 'This field captures specific instances of specifications / methods (vs the
      ''methodTypes'' attribute which captures types of methods applied).
      In various domains and communities, such specifications may be called ''protocols'',
      ''guidelines'', ''methods'', ''rule sets'', etc.'
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: specifiedBy
    owner: DataSet
    domain_of:
    - InformationEntity
    - Activity
    range: Method
    required: false
  derivedFrom:
    name: derivedFrom
    description: An information resource from which the Information Entity is derived,
      in whole or in part.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: derivedFrom
    owner: DataSet
    domain_of:
    - InformationEntity
    - StudyResult
    - RecordMetadata
    range: InformationEntity
    required: false
  reportedIn:
    name: reportedIn
    description: A document in which the information content carried by the Information
      Entity is expressed.
    comments:
    - "This attribute is used specifically to reference documents/publications where\
      \ the Information Entity is expressed or reported. For a Statement, this might\
      \ be a publication where the authors express the statement in text. For a Data\
      \ Item, this might be a publication with a table or figure that reports the\
      \ value of the data. \nNote that the VA-Spec provides separate attributes for\
      \ describing different types of 'references' from an Information Entity to a\
      \ Document (e.g. hasEvidenceFromSource is used by a Statement to reference a\
      \ Document that provided evidence for the knowledge the Statement expresses)."
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: reportedIn
    owner: DataSet
    domain_of:
    - InformationEntity
    range: Document
    required: false
  id:
    name: id
    description: The logical identifier of the entity in the system of record, e.g.
      a UUID. This 'id' is unique within a given system. The identified entity may
      have a different 'id' in a different system.
    comments:
    - FHIR naming conventions are followed here, where an 'id' field holds logical
      identifiers which are unique only within a given system, and an 'identifier'
      field holds business identifiers, which are globally unique and used to connect
      entities and share content across systems.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: id
    owner: DataSet
    domain_of:
    - Entity
    range: Identifier
    required: true
  identifiers:
    name: identifiers
    description: A business identifier or accession number for the entity, typically
      as provided by an external system or authority, that is globally unique and
      persists across implementing systems.
    comments:
    - FHIR naming conventions are followed here, where an 'id' field holds logical
      identifiers which are unique only within a given system, and an 'identifier'
      field holds business identifiers, which are globally unique and used to connect
      entities and share content across systems.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: identifiers
    owner: DataSet
    domain_of:
    - Entity
    range: Identifier
    required: false
  label:
    name: label
    description: A primary name for the Entity.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: label
    owner: DataSet
    domain_of:
    - Entity
    - Coding
    range: string
    required: false
  urls:
    name: urls
    description: The URL/web address of a digital resource representing the entity,
      or providing information about it.
    comments:
    - This attribute is meant to point directly to locations on the web where more
      information about the Entity can be found.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: urls
    owner: DataSet
    domain_of:
    - Entity
    range: Url
    required: false
  xrefs:
    name: xrefs
    description: Cross-references to database identifier(s) representing the same
      (or a closely related) entity or concept as the Entity.
    comments:
    - Preferred values for this field are CURIEs or URLs for database records - so
      the system that provisioned the identifier is clear.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: xrefs
    owner: DataSet
    domain_of:
    - Entity
    range: string
    required: false
  recordMetadata:
    name: recordMetadata
    description: A reusable structure that encapsulates provenance metadata about
      the present record/data object (as opposed to provenance information about
      the real world entity this record/data object represents).
    comments:
    - Record-level metadata applies to a specific concrete encoding/serialization
      of information (e.g. as a record in a specific knowledgebase, or an online digital
      resource). A RecordMetadata object can capture when, how, and by whom a specific
      record was generated or modified; what upstream resources it was derived/retrieved
      from; and record-level administrative information such as versioning and system
      / lifecycle status.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: recordMetadata
    owner: DataSet
    domain_of:
    - Entity
    range: RecordMetadata
    required: false
  type:
    name: type
    description: The schema class that is instantiated by the data object. Must be
      the name of a class from the VA schema.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: type
    owner: DataSet
    domain_of:
    - Element
    range: Class
    required: true
  description:
    name: description
    description: A free text description of the Element.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: false
    alias: description
    owner: DataSet
    domain_of:
    - Element
    - Extension
    range: string
    required: false
  extensions:
    name: extensions
    description: A key-value data structure that allows definition of custom fields
      to capture information not directly supported by the VA specification.
    comments:
    - The VA-Spec provides implementers the ability to extend any model elements
      with new attributes using this flexible Extension element.
    from_schema: https://w3id.org/ga4gh-va-core-im
    rank: 1000
    multivalued: true
    alias: extensions
    owner: DataSet
    domain_of:
    - Element
    range: Extension
    required: false
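Every Element carries a required `type` slot naming its VA class, so a consumer can use that field as a discriminator when deserializing JSON. The following is a minimal sketch with a hypothetical registry. `DataSetRecord`, `REGISTRY`, and `load` are illustrative names, not an official VA binding. The registry covers only a subset of the DataSet slots documented on this page; other VA classes would need their own entries. A record without an `id` raises `KeyError`.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class DataSetRecord:
    """Hypothetical, trimmed-down DataSet holder (not an official VA binding)."""
    id: str
    label: Optional[str] = None
    version: Optional[str] = None
    releaseDate: Optional[str] = None
    license: Optional[str] = None
    derivedFrom: list[dict[str, Any]] = field(default_factory=list)

# Registry keyed by the value of the 'type' slot; only DataSet is covered here.
REGISTRY: dict[str, Callable[[dict[str, Any]], Any]] = {
    "DataSet": lambda d: DataSetRecord(
        id=d["id"],
        label=d.get("label"),
        version=d.get("version"),
        releaseDate=d.get("releaseDate"),
        license=d.get("license"),
        derivedFrom=d.get("derivedFrom", []),
    ),
}

def load(text: str) -> Any:
    """Parse JSON and build the object named by its 'type' field."""
    data = json.loads(text)
    kind = data.get("type")
    if kind not in REGISTRY:
        raise ValueError(f"unsupported or missing type: {kind!r}")
    return REGISTRY[kind](data)

obj = load('{"type": "DataSet", "id": "example:ds-0001", "version": "1.0"}')
print(obj)
```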