Variables
Variables describe the individual measurements, calculations, or contextual data columns within a dataset. The OAE Data Protocol uses a class hierarchy to capture the different levels of metadata required for different kinds of variables — a directly measured pH value needs calibration and instrument details, while a calculated CO₂ variable needs the calculation method, and a contextual column like a station ID needs minimal metadata.
Variable Hierarchy
```mermaid
graph LR
V("`*Variable*
(abstract)`")
ISV("`*InSituVariable*
(abstract)`")
MV("`*MeasuredVariable*
(abstract)`")
NMV["NonMeasuredVariable"]
SEV["SocioeconomicVariable"]
CV["CalculatedVariable"]
DM["DiscreteMeasuredVariable"]
CM["ContinuousMeasuredVariable"]
DPH["`DiscretePHVariable
DiscreteTAVariable
DiscreteDICVariable
DiscretePhysiologicalVariable
*…and others*`"]
CPH["`ContinuousPHVariable
ContinuousTAVariable
ContinuousDICVariable
ContinuousPhysiologicalVariable
*…and others*`"]
V --> NMV
V --> ISV
ISV --> SEV
ISV --> CV
ISV --> MV
MV --> DM
MV --> CM
DM --> DPH
CM --> CPH
classDef abstract fill:#f5f5f5,stroke:#999,stroke-dasharray: 4 3,color:#555
classDef concrete fill:#e0e8f0,stroke:#4F656A
classDef leaf fill:#d0e8d0,stroke:#4F656A
class V,ISV,MV abstract
class NMV,SEV,CV,DPH,CPH concrete
class DM,CM leaf
```
This hierarchy aims to align with NOAA-PMEL's OAPMetadata XSD schema. That alignment eases interoperability between NOAA's OCADS system and the other repositories where OAE researchers may choose to host their data, whether other ocean data repositories or generalist repositories such as Zenodo.
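To make the inheritance in the diagram concrete, here is a plain-Python sketch of the class chain. It is an illustration only: the real schema is defined in LinkML, and the `is_abstract` guard and the helper functions `can_instantiate` and `ancestry` are hypothetical, used here to mimic which classes are abstract in the diagram. Only one leaf per branch is shown.

```python
class Variable:
    """Root of the hierarchy; abstract in the diagram."""

    is_abstract = True

    def __init__(self) -> None:
        # Only classes that set is_abstract in their own body are treated as
        # abstract; subclasses that do not re-declare it are concrete.
        if type(self).__dict__.get("is_abstract", False):
            raise TypeError(f"{type(self).__name__} is abstract")


class NonMeasuredVariable(Variable):
    """Contextual columns such as station IDs."""


class InSituVariable(Variable):
    is_abstract = True


class SocioeconomicVariable(InSituVariable):
    pass


class CalculatedVariable(InSituVariable):
    pass


class MeasuredVariable(InSituVariable):
    is_abstract = True


class DiscreteMeasuredVariable(MeasuredVariable):
    pass


class ContinuousMeasuredVariable(MeasuredVariable):
    pass


class DiscretePHVariable(DiscreteMeasuredVariable):
    pass


class ContinuousPHVariable(ContinuousMeasuredVariable):
    pass


def can_instantiate(cls: type) -> bool:
    """Return True if cls is concrete in this sketch."""
    try:
        cls()
    except TypeError:
        return False
    return True


def ancestry(cls: type) -> list[str]:
    """List class names from cls up to Variable, following the MRO."""
    return [c.__name__ for c in cls.__mro__ if c is not object]
```

For example, `ancestry(DiscretePHVariable)` walks DiscreteMeasuredVariable, MeasuredVariable and InSituVariable before reaching Variable, while NonMeasuredVariable sits outside the InSituVariable branch entirely.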
Choosing a Variable Type
Up to three selections determine which schema class a variable uses:
1. Variable Type (variable_type)
What kind of measurement is this?
| Value | Description | Examples |
|---|---|---|
| pH | pH measurement | pH on total scale, NBS scale |
| ta | Total alkalinity | TA from titration |
| dic | Dissolved inorganic carbon | DIC from coulometry |
| co2 | CO₂ measurement variables | pCO₂, fCO₂, xCO₂ |
| sediment | Sediment variable | Sediment core measurements |
| hplc | HPLC pigments | Chlorophyll, carotenoids |
| physiological | Physiological response | Organism growth rates, calcification |
| socioeconomic | Social/economic data | Survey responses, ecosystem valuations |
| other | Generic variable | Temperature, salinity, nutrients |
| non_measured | Contextual data | Station ID, timestamps, coordinates |
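When ingesting metadata, it can help to validate `variable_type` against the allowed values before anything else. A minimal sketch, assuming the values are exactly the strings in the table above (including the mixed-case `pH`); the `VariableType` enum and `parse_variable_type` helper are hypothetical names, not part of the schema's published API.

```python
from enum import Enum


class VariableType(str, Enum):
    """Hypothetical enum mirroring the variable_type values in the table."""

    PH = "pH"
    TA = "ta"
    DIC = "dic"
    CO2 = "co2"
    SEDIMENT = "sediment"
    HPLC = "hplc"
    PHYSIOLOGICAL = "physiological"
    SOCIOECONOMIC = "socioeconomic"
    OTHER = "other"
    NON_MEASURED = "non_measured"


def parse_variable_type(raw: str) -> VariableType:
    """Convert a raw string to a VariableType, listing valid options on failure."""
    try:
        # Matching is exact after trimming whitespace, so case matters.
        return VariableType(raw.strip())
    except ValueError:
        valid = ", ".join(v.value for v in VariableType)
        raise ValueError(
            f"unknown variable_type {raw!r}; expected one of: {valid}"
        ) from None
```

Note that generic measurements such as salinity are not a type of their own; they use `other`.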
2. Genesis (genesis)
How was this variable produced? (Not applicable for non_measured)
| Value | Description |
|---|---|
| measured | Directly measured by an instrument |
| calculated | Derived from other variables (e.g., CO₂ from pH + DIC) |
3. Sampling (sampling)
How were measurements collected? (Only applicable when genesis is measured)
| Value | Description |
|---|---|
| discrete | Bottle samples, grab samples |
| continuous | Autonomous sensors, underway systems |
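The applicability notes on genesis and sampling can be checked mechanically. The sketch below encodes only the two rules stated above (no genesis for `non_measured`; sampling only when genesis is `measured`) and does not require sampling for every measured variable, because the mapping table lists `socioeconomic` without one. The `Genesis`/`Sampling` enums and `check_selections` function are hypothetical.

```python
from enum import Enum
from typing import Optional


class Genesis(str, Enum):
    MEASURED = "measured"
    CALCULATED = "calculated"


class Sampling(str, Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


def check_selections(
    variable_type: str,
    genesis: Optional[Genesis],
    sampling: Optional[Sampling],
) -> list[str]:
    """Return a list of problems with a (variable_type, genesis, sampling) selection."""
    problems: list[str] = []
    if variable_type == "non_measured":
        # Contextual columns take neither a genesis nor a sampling mode.
        if genesis is not None:
            problems.append("genesis is not applicable to non_measured variables")
        if sampling is not None:
            problems.append("sampling is not applicable to non_measured variables")
        return problems
    if genesis is None:
        problems.append("genesis is required")
    if sampling is not None and genesis is not Genesis.MEASURED:
        problems.append("sampling only applies when genesis is measured")
    return problems
```

An empty list means the selection is consistent with these rules; whether a schema class exists for it is a separate question answered by the mapping table.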
Selection → Schema Class Mapping
| variable_type | genesis | sampling | Schema Class |
|---|---|---|---|
| pH | measured | discrete | DiscretePHVariable |
| pH | measured | continuous | ContinuousPHVariable |
| ta | measured | discrete | DiscreteTAVariable |
| ta | measured | continuous | ContinuousTAVariable |
| dic | measured | discrete | DiscreteDICVariable |
| dic | measured | continuous | ContinuousDICVariable |
| co2 | measured | discrete | DiscreteCO2Variable |
| co2 | measured | continuous | ContinuousCO2Variable |
| sediment | measured | discrete | DiscreteSedimentVariable |
| sediment | measured | continuous | ContinuousSedimentVariable |
| hplc | measured | discrete | HPLCVariable |
| physiological | measured | discrete | DiscretePhysiologicalVariable |
| physiological | measured | continuous | ContinuousPhysiologicalVariable |
| socioeconomic | measured | — | SocioeconomicVariable |
| other | measured | discrete | DiscreteMeasuredVariable |
| other | measured | continuous | ContinuousMeasuredVariable |
| Any except non_measured | calculated | — | CalculatedVariable |
| non_measured | — | — | NonMeasuredVariable |
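The mapping table is effectively a lookup function. A minimal sketch of that lookup, transcribing the rows above; `resolve_schema_class` is a hypothetical helper (not a function shipped with the protocol), and it raises for combinations the table does not list, such as continuous HPLC.

```python
from typing import Optional

# (discrete, continuous) class names for measured variables, transcribed
# from the mapping table. None marks a combination the table does not list.
_MEASURED = {
    "pH": ("DiscretePHVariable", "ContinuousPHVariable"),
    "ta": ("DiscreteTAVariable", "ContinuousTAVariable"),
    "dic": ("DiscreteDICVariable", "ContinuousDICVariable"),
    "co2": ("DiscreteCO2Variable", "ContinuousCO2Variable"),
    "sediment": ("DiscreteSedimentVariable", "ContinuousSedimentVariable"),
    "hplc": ("HPLCVariable", None),
    "physiological": ("DiscretePhysiologicalVariable", "ContinuousPhysiologicalVariable"),
    "other": ("DiscreteMeasuredVariable", "ContinuousMeasuredVariable"),
}


def resolve_schema_class(
    variable_type: str,
    genesis: Optional[str] = None,
    sampling: Optional[str] = None,
) -> str:
    """Return the schema class name for a selection, or raise ValueError."""
    if variable_type == "non_measured":
        return "NonMeasuredVariable"
    if genesis == "calculated":
        # Any non-contextual type that is derived maps to one class.
        return "CalculatedVariable"
    if genesis != "measured":
        raise ValueError(f"genesis must be 'measured' or 'calculated', got {genesis!r}")
    if variable_type == "socioeconomic":
        return "SocioeconomicVariable"
    if variable_type not in _MEASURED:
        raise ValueError(f"unknown variable_type {variable_type!r}")
    index = {"discrete": 0, "continuous": 1}.get(sampling)
    if index is None:
        raise ValueError(f"sampling must be 'discrete' or 'continuous', got {sampling!r}")
    name = _MEASURED[variable_type][index]
    if name is None:
        raise ValueError(f"no schema class for {variable_type} with {sampling} sampling")
    return name
```

For example, `resolve_schema_class("dic", "measured", "discrete")` yields `"DiscreteDICVariable"`, and `resolve_schema_class("co2", "calculated")` yields `"CalculatedVariable"`.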
What Each Level Adds
All Variables
Every variable has these basic fields:
- `schema_class` — identifies which class this variable is (auto-set)
- `variable_type` — the high-level classification
- `dataset_variable_name` — column header name in the data file
- `long_name` — full descriptive name
- `standard_identifier` — reference to a community vocabulary (e.g., NERC P01)
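One way to model the "auto-set" behaviour of `schema_class` is to derive it from the concrete class name at construction time. The dataclass sketch below shows that idea; it assumes `standard_identifier` is optional (the list above does not say), and it is a hypothetical illustration, not the code LinkML generates for this schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """Fields shared by every variable (names taken from the list above)."""

    variable_type: str
    dataset_variable_name: str  # column header in the data file
    long_name: str
    standard_identifier: Optional[str] = None  # e.g. a NERC P01 code; optional here by assumption
    # Excluded from __init__ and filled in from the concrete class name.
    schema_class: str = field(init=False)

    def __post_init__(self) -> None:
        self.schema_class = type(self).__name__


@dataclass
class NonMeasuredVariable(Variable):
    """A contextual column; adds no fields beyond the base set."""
```

Because `schema_class` is computed rather than supplied, a record can never claim to be a different class than the one it was built as.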
InSituVariable (measured or calculated)
Adds project-acquired data fields:
- `units` (required)
- `genesis` — measured or calculated
- `method_reference` — citation for the method used
- `measurement_researcher` — the individual who measured/derived this parameter
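Because `units` is required for every in-situ variable, a loader can reject incomplete records early. A minimal sketch, assuming `method_reference` and `measurement_researcher` are optional and that `genesis` is one of the two string values above; the base fields are trimmed to `dataset_variable_name`, and the class is a hypothetical stand-in, not the schema's generated code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InSituVariable:
    """Hypothetical sketch of the fields InSituVariable adds (most base fields omitted)."""

    dataset_variable_name: str
    units: str  # required by the schema
    genesis: str  # "measured" or "calculated"
    method_reference: Optional[str] = None  # citation for the method (assumed optional)
    measurement_researcher: Optional[str] = None  # assumed optional

    def __post_init__(self) -> None:
        # Reject blank units and unknown genesis values at construction time.
        if not self.units.strip():
            raise ValueError(f"{self.dataset_variable_name}: units are required")
        if self.genesis not in ("measured", "calculated"):
            raise ValueError(f"{self.dataset_variable_name}: bad genesis {self.genesis!r}")
```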
MeasuredVariable
Adds instrument and sampling fields:
- `sampling_method`, `analyzing_method` — how samples were collected and analyzed
- `sampling`, `observation_type` — discrete/continuous, profile/underway/etc.
- `analyzing_instrument` — instrument details with calibration
- QC fields: `uncertainty`, `qc_steps_taken`, `missing_value_indicators`
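The QC field `missing_value_indicators` matters when the data file itself is read: declared fill codes should become missing values rather than numbers. A minimal sketch, assuming the indicators are stored as a list of strings; `MeasuredQC` and `to_floats` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MeasuredQC:
    """Hypothetical subset of the MeasuredVariable QC fields."""

    uncertainty: Optional[str] = None
    qc_steps_taken: Optional[str] = None
    missing_value_indicators: list[str] = field(default_factory=list)


def to_floats(raw_column: list[str], qc: MeasuredQC) -> list[Optional[float]]:
    """Parse a raw data column, mapping declared missing-value codes to None."""
    missing = {code.strip() for code in qc.missing_value_indicators}
    values: list[Optional[float]] = []
    for cell in raw_column:
        cell = cell.strip()
        # Declared fill codes become None; everything else must parse as a number.
        values.append(None if cell in missing else float(cell))
    return values
```

Without the declared indicators, a code such as `-999` would silently pass through as a real measurement.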
CalculatedVariable
Adds calculation provenance:
- `calculation_method_and_parameters` — software, input variables, constants used
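If the calculation provenance is captured in a structured form, one useful consistency check is that every declared input variable is actually a column in the dataset. The sketch assumes a structured record with software, input variables and constants; the schema may instead store this as free text, and `CalculationMethod` and `missing_inputs` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CalculationMethod:
    """Hypothetical structure for calculation_method_and_parameters."""

    software: str
    input_variables: list[str]
    constants: dict[str, str] = field(default_factory=dict)


def missing_inputs(method: CalculationMethod, dataset_columns: list[str]) -> list[str]:
    """Return declared input variables that are not columns in the dataset."""
    present = set(dataset_columns)
    return [name for name in method.input_variables if name not in present]
```

For instance, a CO₂ variable calculated from pH and DIC would list both column names as inputs, and `missing_inputs` would flag either one that the data file lacks.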
Type-Specific Fields (Traits / Mixins)
Many measured variables carry additional fields determined by their variable_type, and these fields are the same whether the variable is discrete or continuous. Rather than duplicating them, the schema uses LinkML's mixin feature to compose each set of type-specific fields, trait-style, into both the Discrete and Continuous class for that variable_type (e.g., MeasuredPHFields into both DiscretePHVariable and ContinuousPHVariable).
| Variable Type | Mixin |
|---|---|
| pH | MeasuredPHFields |
| ta | MeasuredTAFields |
| dic | MeasuredDICFields |
| co2 | MeasuredCO2Fields |
| sediment | MeasuredSedimentFields |
| physiological | MeasuredPhysiologicalFields |
| other | — |
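Python multiple inheritance offers a close analogy to this mixin pattern. The sketch below composes one hypothetical pH mixin into both the discrete and the continuous pH class; the `ph_scale` field is an invented example slot, not one of the real MeasuredPHFields slots, and LinkML's own mixin semantics are defined at the schema level rather than by this Python code.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MeasuredVariable:
    """Trimmed stand-in for the fields every measured variable shares."""

    dataset_variable_name: str
    units: str


@dataclass
class DiscreteMeasuredVariable(MeasuredVariable):
    pass


@dataclass
class ContinuousMeasuredVariable(MeasuredVariable):
    pass


@dataclass
class MeasuredPHFields:
    """Mixin holding pH-specific fields; ph_scale is a hypothetical example slot."""

    ph_scale: Optional[str] = None


# The mixin is listed first: dataclasses collect fields in reverse MRO order,
# so this keeps the defaulted mixin field after the required inherited ones.
@dataclass
class DiscretePHVariable(MeasuredPHFields, DiscreteMeasuredVariable):
    pass


@dataclass
class ContinuousPHVariable(MeasuredPHFields, ContinuousMeasuredVariable):
    pass


def field_names(cls: type) -> list[str]:
    """List a dataclass's field names in __init__ order."""
    return [f.name for f in fields(cls)]
```

Both pH classes end up with the same pH-specific field while keeping their separate discrete and continuous parents, which is the duplication the mixin avoids.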
Leaf classes (e.g., DiscretePHVariable, DiscreteTAVariable) may add further type-specific fields as well. For a complete list of required and optional fields, refer to the individual class pages linked in the mapping table above.