The provenance layer in Hopsworks tracks file (HopsFS) and application (YARN) operations in order to provide users with additional details. There are currently four provenance settings: DISABLED, and the three active provenance types listed here:

  • META - enables search of projects and datasets.
  • MIN - enables search of feature groups, training datasets and features, including attached tags.
  • FULL - allows linking of feature groups, training datasets, experiments and models.

Note: Linking of ML artifacts (FULL provenance) is a Hopsworks Enterprise feature.

Temporary limitations:

  • Provenance can only be set cluster-wide, by an administrator, under Hopsworks Variables. The variable of interest is provenance_type and it can take the values DISABLED / META / MIN / FULL.
  • Changing the provenance type only affects newly created projects.
  • There is currently no way to change/clean provenance for old projects. The only workaround for the moment is to delete and recreate the project under a different cluster provenance setup.

Figure: Provenance Cluster Variable
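
As a rough illustration of the setting described above, the following Python sketch models the four values of provenance_type and rejects anything else. It is not part of Hopsworks: ProvenanceType, ENABLES and parse_provenance_type are hypothetical names introduced here, and the description given for DISABLED is an assumption inferred from the name. The real variable is changed by an administrator under Hopsworks Variables.

```python
from enum import Enum


class ProvenanceType(Enum):
    """The four values listed for the provenance_type cluster variable."""
    DISABLED = "DISABLED"
    META = "META"
    MIN = "MIN"
    FULL = "FULL"


# What each value enables, taken from the list of provenance types above.
ENABLES = {
    # Assumption: the page does not describe DISABLED; inferred from its name.
    ProvenanceType.DISABLED: "no provenance tracking",
    ProvenanceType.META: "search of projects and datasets",
    ProvenanceType.MIN: "search of feature groups, training datasets and features, including attached tags",
    ProvenanceType.FULL: "linking of feature groups, training datasets, experiments and models (Enterprise)",
}


def parse_provenance_type(raw: str) -> ProvenanceType:
    """Validate a candidate value (hypothetical helper, not a Hopsworks API)."""
    try:
        return ProvenanceType(raw.strip().upper())
    except ValueError:
        allowed = " / ".join(t.value for t in ProvenanceType)
        raise ValueError(
            f"provenance_type must be one of {allowed}, got {raw!r}"
        ) from None


if __name__ == "__main__":
    chosen = parse_provenance_type("min")
    print(f"{chosen.value}: {ENABLES[chosen]}")
```

Since a change only affects newly created projects, a check like this is most useful before the first projects are created on a cluster.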