Provenance
The provenance layer in Hopsworks tracks file (HopsFS) and application (YARN) operations so that users can search for, and trace the relationships between, their projects, datasets and ML artifacts.
There are currently four types of provenance:
- DISABLED
- META - enables search of projects and datasets.
- MIN - enables search of feature groups, training datasets and features, including attached tags.
- FULL - allows linking of feature groups, training datasets, experiments and models.
Note: Linking of ML artifacts (FULL provenance) is a Hopsworks Enterprise feature.
Temporary limitations:
- Provenance can only be set cluster-wide, by an administrator, under Hopsworks Variables. The relevant variable is provenance_type, which accepts the values DISABLED, META, MIN and FULL.
- Changing provenance type only affects newly created projects.
- There is currently no way to change or clean up provenance for existing projects. The only workaround is to delete the project and recreate it after the cluster provenance setting has been changed.
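The Python sketch below models the provenance_type variable and the rule that a project keeps the provenance type that was in effect when it was created. It is a minimal illustration, not the Hopsworks API. ProvenanceType, Cluster, Project and the capability strings are hypothetical names. The sketch also assumes, based only on the ordering of the list above, that each level includes the capabilities of the levels below it.

```python
from dataclasses import dataclass
from enum import IntEnum


class ProvenanceType(IntEnum):
    # Ordered from least to most provenance tracking.
    DISABLED = 0
    META = 1
    MIN = 2
    FULL = 3


# What each level adds, per the list above.
# Assumption (hypothetical): higher levels also include everything lower levels enable.
ADDED_CAPABILITIES = {
    ProvenanceType.META: {"search projects", "search datasets"},
    ProvenanceType.MIN: {"search feature groups", "search training datasets",
                         "search features", "search attached tags"},
    ProvenanceType.FULL: {"link ML artifacts"},
}


def parse_provenance_type(value: str) -> ProvenanceType:
    """Validate a provenance_type value (DISABLED / META / MIN / FULL)."""
    try:
        return ProvenanceType[value.strip().upper()]
    except KeyError:
        raise ValueError(f"invalid provenance_type: {value!r}") from None


def capabilities(level: ProvenanceType) -> set[str]:
    """Union of the capabilities of every level up to and including `level`."""
    caps: set[str] = set()
    for lvl, added in ADDED_CAPABILITIES.items():
        if lvl <= level:
            caps |= added
    return caps


@dataclass(frozen=True)
class Project:
    name: str
    provenance: ProvenanceType  # fixed when the project is created


class Cluster:
    def __init__(self, provenance_type: str) -> None:
        # Cluster-wide setting, changed only by an administrator.
        self.provenance = parse_provenance_type(provenance_type)

    def create_project(self, name: str) -> Project:
        # A new project captures the cluster setting at creation time.
        # Changing the setting later does not touch existing projects.
        return Project(name, self.provenance)
```

Because Project is frozen and copies the setting at creation, changing Cluster.provenance affects only projects created afterwards. This mirrors the limitation that existing projects cannot be switched to a different provenance type.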
Provenance Links
For Feature Groups, provenance links show:
- Training Datasets generated from this Feature Group
For Training Datasets, provenance links show:
- Feature Groups used to create this Training Dataset
- Experiments using this Training Dataset
- Models generated by an Experiment using this Training Dataset
For Experiments, provenance links show:
- Training Dataset used in this Experiment
For Models, provenance links show:
- Training Dataset used to generate this Model
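The links listed above can be sketched as a directed graph: Feature Group -> Training Dataset -> Experiment -> Model. The graph is walked downstream to find artifacts derived from a node and upstream to find the artifacts it was built from. This is a minimal Python illustration only. LineageGraph, the kind strings and the artifact names are hypothetical, and none of this describes how Hopsworks stores or exposes provenance.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical provenance graph: feature_group -> training_dataset -> experiment -> model."""

    def __init__(self) -> None:
        self._down: dict[str, set[str]] = defaultdict(set)  # node -> derived artifacts
        self._up: dict[str, set[str]] = defaultdict(set)    # node -> source artifacts
        self._kind: dict[str, str] = {}

    def add(self, name: str, kind: str) -> None:
        self._kind[name] = kind

    def link(self, src: str, dst: str) -> None:
        # Record each edge in both directions so either end can be queried.
        self._down[src].add(dst)
        self._up[dst].add(src)

    def _walk(self, start: str, edges: dict[str, set[str]], kind: str) -> set[str]:
        # Breadth-first traversal; collects every reachable artifact of `kind`.
        seen, frontier, found = {start}, [start], set()
        while frontier:
            nxt = []
            for node in frontier:
                for other in edges[node]:
                    if other not in seen:
                        seen.add(other)
                        nxt.append(other)
                        if self._kind[other] == kind:
                            found.add(other)
            frontier = nxt
        return found

    def downstream(self, start: str, kind: str) -> set[str]:
        return self._walk(start, self._down, kind)

    def upstream(self, start: str, kind: str) -> set[str]:
        return self._walk(start, self._up, kind)
```

Each artifact is linked only to its direct neighbors. Multi-hop relations come from the traversal: a Training Dataset reaches its Models through the Experiment, and a Model reaches its Training Dataset the same way.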