Externally Hosted Datasets

What is an externally hosted dataset?

By default, datasets registered in the Data Store include dataset files that are hosted as part of the Information System. In some cases it is a better choice to reference an existing source of the dataset files instead, e.g.

  • Dataset size - if the dataset is extremely large, it may be cost-prohibitive to rehost it
  • Unnecessary duplication - if the dataset is already hosted in a reliable external repository, there is no need to rehost it in the Data Store

If you are unsure whether a dataset can or should be stored in the Data Store, please get in touch with us.

What can I do with an externally hosted dataset?

Externally hosted datasets behave identically to normal datasets. The option to label data as externally hosted is provided as a convenient way to redirect users to the primary dataset files; you can still use Provena’s file hosting mechanisms to store and retrieve data. This could include file attachments, metadata, logs or other information that should be easily accessible from the Data Store.