Describing a dataset (metadata)


The Data Store has a minimal dataset metadata record schema and uses RO-Crate as the metadata file format. The minimal dataset metadata record fields are:
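For readers unfamiliar with RO-Crate, the sketch below shows the general shape of a minimal RO-Crate 1.1 metadata document (`ro-crate-metadata.json`). It follows the public RO-Crate 1.1 specification, which requires a metadata descriptor entity and a root data entity. It is not Provena's actual record schema: the helper name `minimal_ro_crate`, the publish date and the licence URL are hypothetical placeholders chosen for illustration.

```python
import json

# Minimal RO-Crate 1.1 document: the metadata descriptor ("ro-crate-metadata.json")
# describes the crate, and the root data entity ("./") describes the dataset.
# Hypothetical helper; the properties Provena actually writes may differ.
def minimal_ro_crate(name: str, description: str, date_published: str, licence: str) -> dict:
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": licence},
            },
        ],
    }

# Placeholder values for illustration only.
crate = minimal_ro_crate(
    name="Coral reef locations with turtle activity in the Capricorn Group (Great Barrier Reef)",
    description="Polygons of reefs and islands that have turtle activity.",
    date_published="2023-01-01",
    licence="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(crate, indent=2))
```

The user-entered fields described below would, under this assumption, become additional properties of the root `./` entity.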

User entered data fields

Record Creator Organisation

  • Record Creator Organisation*: The registered Organisation which is registering the data. This is searchable by typing the Organisation’s name in the search bar or manually entering the known ID of the Organisation.
  • Dataset Custodian: The registered Person who could best be described as the custodian of this data. This is searchable by typing the Person’s name in the search bar or manually entering the known ID of the Person.
  • Point of Contact: Please provide a point of contact for enquiries about this data, e.g. an email address. Please ensure you have sought consent to include these details in the record.

Dataset Approvals

Warning! The Dataset Approvals section must be carefully considered by the registrant of the data. If you believe the dataset is subject to any of the below concerns, but the necessary consents, approvals or permissions have not been granted and/or provided, the dataset should not be registered in Provena's Data Store. Feel free to contact us if you are uncertain about registering a dataset.

  • Dataset Registration Ethics and Privacy*: Does this dataset include any human data or require ethics/privacy approval for its registration? If so, have you included any required ethics approvals, consent from the participants and/or appropriate permissions to register this dataset in this information system? Required consents or permissions can be reposited as part of the dataset files where appropriate.
    • Subject to ethics and privacy concerns for registration?: Use the tick box to specify whether this dataset is subject to ethical and privacy concerns for registration.
    • Necessary consents and permissions required?: This tick box will only appear if the dataset is subject to the aforementioned concerns. If you have not acquired the necessary consents and permissions, the dataset should not be registered and submission will fail.
  • Dataset Access Ethics and Privacy*: Does this dataset include any human data or require ethics/privacy approval for enabling its access by users of the information system? If so, have you included any required consent from the participants and/or appropriate permissions to facilitate access to this dataset in this information system? Required consents or permissions can be reposited as part of the dataset files where appropriate.
    • Subject to ethics and privacy concerns for data access?: Use the tick box to specify whether this dataset is subject to ethical and privacy concerns for data access.
    • Necessary consents and permissions required?: This tick box will only appear if the dataset is subject to the aforementioned concerns. If you have not acquired the necessary consents and permissions, the dataset should not be registered and submission will fail.
  • Indigenous Knowledge and Consent*: Does this dataset contain Indigenous Knowledge? If so, do you have consent from the relevant Indigenous communities for its use and access via this data store?
    • Contains Indigenous Knowledge?: Use the tick box to specify whether this dataset contains Indigenous Knowledge.
    • Necessary permission acquired?: This tick box will only appear if the dataset contains Indigenous Knowledge. If you have not acquired the necessary consents and permissions, the dataset should not be registered and submission will fail.
  • Export Controls*: Is this dataset subject to any export control permits? If so, has this dataset cleared any required due diligence checks and have you obtained any required permits?
    • Subject to export controls?: Use the tick box to specify whether this dataset is subject to export controls.
    • Cleared due diligence checks and obtained required permits?: This tick box will only appear if the dataset was marked as subject to export controls. If you have not performed the necessary due diligence checks and acquired the relevant permits, the dataset should not be registered and submission will fail.
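Each approval above follows the same two-step pattern: a first tick box states whether the concern applies, and a conditional second tick box states whether the consents, permissions or permits have been obtained. The sketch below illustrates that rule; the `Approval` class and `blocking_approvals` function are hypothetical and do not reflect how Provena implements its checks.

```python
from dataclasses import dataclass

# Hypothetical model of one approval question: whether the concern applies,
# and whether the follow-up requirement (consent, permission or permit) is met.
@dataclass
class Approval:
    name: str
    concern_applies: bool
    requirement_met: bool = False

def blocking_approvals(approvals: list[Approval]) -> list[str]:
    """Return the names of approvals that should prevent registration.

    The follow-up answer only matters when the concern applies, mirroring
    the conditional second tick box.
    """
    return [a.name for a in approvals if a.concern_applies and not a.requirement_met]

approvals = [
    Approval("Dataset Registration Ethics and Privacy", concern_applies=False),
    Approval("Dataset Access Ethics and Privacy", concern_applies=True, requirement_met=True),
    Approval("Indigenous Knowledge and Consent", concern_applies=True, requirement_met=False),
    Approval("Export Controls", concern_applies=False),
]
blocked = blocking_approvals(approvals)
if blocked:
    print("Registration should not proceed:", ", ".join(blocked))
```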

Dataset Information

  • Dataset name*: A title that identifies the dataset well enough to distinguish it from other datasets, e.g. “Coral reef locations with turtle activity in the Capricorn Group (Great Barrier Reef)”
  • Dataset description*: A short description of the dataset. This should include the nature of the data, the intended usage, and any other relevant information, e.g. “The dataset Coral reef locations in the Capricorn Group (Great Barrier Reef) contains polygons of 150 reefs and islands that have turtle activity. The data was collected from satellite and survey information. Please see the readme.txt file for details on the data processing steps undertaken. The data was obtained as part of the Reef Turtle monitoring program.”
  • Access Info*: Provides information about whether the dataset files will be stored in the Data Store, or hosted externally. Externally hosted datasets can be described and registered to enable data and activity provenance without enabling file upload or download. Use the checkbox “Store data in the Provena Data Store” to toggle this setting (checked indicates that the dataset is to be stored on the Provena Data Store, unchecked indicates that the dataset is stored externally). If the data is externally hosted, you must provide two additional fields:
    • URI: Provide a valid RFC 3986 URI which describes the location of the data. Explain how to use this URI to access the data in the Description field below. Examples of valid URIs include: http://website.com/file/path, https://website.com/file/path, ftp://ftp.server.com/file/path and file:///path/to/file.
    • Description: Provide a description of how the above URI can be used to access the dataset files.
  • Publisher*: The registered Organisation which published or produced the data. This is searchable by typing the Organisation’s name in the search bar.
  • Dataset creation date*: The date on which this version of the dataset was produced or generated.
  • Dataset publish date*: The date on which this version of the dataset was first published. If the data has never been published before, please select today’s date.
  • Usage licence*: Select a licence from the dropdown list. The default will be ‘Copyright’. A list of licences is available here.
  • Dataset purpose: A brief description of the reason the data asset was created. It should be a good guide to the potential usefulness of the data asset to other users.
  • Dataset rights holder: Specify the party owning or managing rights over the resource. Please ensure you have sought consent to include these details in the record.
  • Usage limitations: A statement that provides information on any caveats or restrictions on access or on the use of the data asset, including legal, security, privacy, commercial or other limitations.
  • Preferred Citation: Optionally specify a citation which users of this dataset should use when referencing it. To provide a preferred citation, tick the “Provide preferred citation” checkbox and enter your citation into the text field.
  • Spatial Information: If your dataset includes spatial data, you can provide more information about the extent and resolution of this data.
    • Spatial Coverage: The geographic area applicable to the data asset. Please specify spatial coverage using the EWKT format.
    • Spatial Resolution: The spatial resolution applicable to the data asset. Please use the Decimal Degrees standard.
    • Spatial Extent: The range of spatial coordinates applicable to the data asset. Please provide a bounding box extent using the EWKT format.
  • Temporal Information: If your dataset includes data spanning a period of time, you can provide more information about the duration and resolution of this data.
    • Temporal Duration: The start and end date of the time period applicable to the data asset (note that both a start and an end date must be provided if a temporal duration is specified).
    • Temporal Resolution: The temporal resolution (i.e. time step) of the data. Please use the ISO 8601 duration format, e.g. “P1Y2M10DT2H30M”.
  • Dataset File Formats: What file formats are present in this dataset? E.g. “pdf”, “csv” etc. You can use the plus and minus symbols to add and remove file formats.
  • Keywords: List of keywords which describe the dataset. These keywords are searchable.
  • Custom User Metadata: If you would like to include additional custom annotations to describe your dataset, you can do so here. Tick “Include Custom User Metadata” and then click the “Add a new entry” plus icon to add a row. Your metadata is composed of a set of key-value pairs. Enter a key and a value, for example “my_special_dataset_id” and “1234”. You can add another row using the plus icon on the right, or remove an entry using the red minus sign on the right. To remove all custom metadata, untick the “Include Custom User Metadata” box.

* Denotes required field.
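When a dataset is hosted externally, it can help to sanity-check the URI before submitting it. The sketch below is a rough pre-check, not a full RFC 3986 parser and not Provena's own validation: it confirms that the value has a syntactically valid scheme (RFC 3986, section 3.1), something after the colon, and no whitespace. The function name `looks_like_absolute_uri` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")

def looks_like_absolute_uri(value: str) -> bool:
    """Rough check for an externally hosted data location (hypothetical helper)."""
    # URIs cannot contain raw whitespace; spaces must be percent-encoded.
    if not value or any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:  # e.g. an unterminated IPv6 literal such as "http://[::1"
        return False
    if not _SCHEME.match(parts.scheme):
        return False
    # Something must follow "scheme:", e.g. "//host/path" or "///path".
    return len(value) > len(parts.scheme) + 1

for uri in ["https://website.com/file/path", "file:///path/to/file", "website.com/file/path"]:
    print(uri, looks_like_absolute_uri(uri))
```

A value without a scheme, such as "website.com/file/path", is rejected because RFC 3986 requires a scheme for an absolute URI.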
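The spatial and temporal fields expect values in specific text formats. The sketch below shows one way to produce and check them, under stated assumptions: the bounding box is written as an EWKT polygon with longitude before latitude and an assumed SRID of 4326 (WGS 84); the duration parser covers only the common PnYnMnDTnHnMnS subset of ISO 8601 (no weeks); and the rule that the start date must not be after the end date is an added assumption. The function names and the coordinates are hypothetical and illustrative only.

```python
import re
from datetime import date

def ewkt_bbox(min_lon, min_lat, max_lon, max_lat, srid=4326):
    """Bounding box as an EWKT polygon; the ring is closed by repeating the first point."""
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"SRID={srid};POLYGON(({coords}))"

# Subset of ISO 8601 durations: PnYnMnDTnHnMnS (weeks are not handled).
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)

def parse_iso8601_duration(text):
    """Return the non-empty components of a duration such as "P1Y2M10DT2H30M"."""
    match = _DURATION.match(text)
    # "P", "PT" and a trailing "T" match the regex but contain no components.
    if match is None or text in ("P", "PT") or text.endswith("T"):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    return {key: (float(val) if key == "seconds" else int(val))
            for key, val in match.groupdict().items() if val is not None}

def check_temporal_duration(start, end):
    """Both dates must be given together; start after end is rejected (assumption)."""
    if (start is None) != (end is None):
        raise ValueError("start and end dates must both be provided")
    if start is not None and start > end:
        raise ValueError("start date must not be after end date")

print(ewkt_bbox(151.0, -24.0, 152.5, -23.0))  # illustrative coordinates only
print(parse_iso8601_duration("P1Y2M10DT2H30M"))
check_temporal_duration(date(2020, 1, 1), date(2021, 1, 1))
```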

Generated data fields

Auto generated dataset details

  • Handle: Persistent identifier [Auto generated]
  • URL: Path to the online dataset [Auto generated]

The metadata fields included in the Data Store enable data registration, upload and sharing via the S3 APIs (application programming interfaces) to support current modelling activities in projects. Future releases will incrementally add the capability to capture project, ISO and science-related metadata for when wider data publication is needed.