This tutorial covers how Python environments are managed in Hopsworks.
Hopsworks provides a premade environment for machine learning and data science development using Python 3.7. The environment contains the most popular machine learning libraries, including TensorFlow, Keras, PyTorch and scikit-learn.
The environment ensures compatibility between the CUDA version and the installed TensorFlow and PyTorch versions for applications using NVIDIA GPUs.
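One way to see which CUDA version the installed frameworks were built for is to query them from a notebook. The following is a minimal sketch and not part of Hopsworks: the helper name gpu_stack_report is hypothetical, and a framework that cannot be imported is simply reported as None.

```python
import importlib


def gpu_stack_report():
    """Collect version and CUDA details for PyTorch and TensorFlow, if importable."""
    report = {"torch": None, "tensorflow": None}

    try:
        torch = importlib.import_module("torch")
        report["torch"] = {
            "version": torch.__version__,
            "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
            "gpu_available": torch.cuda.is_available(),
        }
    except ImportError:
        pass  # PyTorch is not installed in this environment

    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow"] = {
            "version": tf.__version__,
            "built_with_cuda": tf.test.is_built_with_cuda(),
        }
    except ImportError:
        pass  # TensorFlow is not installed in this environment

    return report


print(gpu_stack_report())
```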
There is one Anaconda environment for each project, and the environment is layered on top of the project’s base Docker image. That is, a library installed in the project’s Anaconda environment can be used in a Job or Jupyter notebook (Python or PySpark) run in the project. In the Python UI, libraries can be installed or uninstalled from the project’s Anaconda environment.
The Docker image is pulled on all hosts to ensure that the running application gets the latest version of the Anaconda environment, as libraries can be installed and uninstalled at any time in the project.
The preinstalled libraries are listed under the Manage Environment tab.
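The same information can also be inspected from code running inside the environment, for example in a Jupyter notebook. The sketch below uses only the Python standard library, with a fallback to pkg_resources (part of setuptools) because importlib.metadata is not available on Python 3.7. The helper name installed_packages is hypothetical.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    metadata = None


def installed_packages():
    """Return {distribution name: version} for the active environment."""
    if metadata is None:
        # Fallback for Python 3.7; pkg_resources ships with setuptools.
        import pkg_resources
        return {d.project_name: d.version for d in pkg_resources.working_set}

    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return packages


# Print the packages in pip's pinned "name==version" form, sorted by name.
for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
    print(f"{name}=={version}")
```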
Python packages can be installed from the following sources:
Installation option 1: Install by name and version
Enter the name of the Python package and the version you want to install.
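A name and version pair corresponds to what pip calls a pinned requirement, written as name==version. The sketch below only illustrates that format. It assumes that the UI handles the two fields much like pip would, which the tutorial does not state, and the helper pinned_requirement and the example package are illustrative.

```python
def pinned_requirement(name, version):
    """Build a pip-style pinned requirement such as 'numpy==1.19.5'."""
    name = name.strip()
    version = version.strip()
    if not name or not version:
        raise ValueError("both a package name and a version are required")
    return f"{name}=={version}"


# Illustrative values; any package name and released version would do.
print(pinned_requirement("numpy", "1.19.5"))
```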
Installation option 2: Search and install
Enter the search term and select the version from the drop-down list.
Installation option 3: Install from a .whl, .egg, requirements.txt or environment.yml file
Choose the uploaded file to install by selecting it in the file browser.
Installation option 4: Install from a git repository
To install from a git repository, provide the repository URL. This is the same URL you would use as {repo_url} on the command line in pip install git+{repo_url}. For a private git repository, also select whether it is a GitHub or GitLab repository and choose the preconfigured access token for the repository.
Note: If the git repository is hosted on neither GitHub nor GitLab, supply the access token in the URL itself. Keep in mind that in this case the access token may be visible to other users in the logs.
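Because a token embedded in the URL can leak through logs, it helps to redact it before printing or sharing the URL yourself. The following is a minimal sketch using only the standard library. The helper names and the example URL are hypothetical, and the sketch does not change what Hopsworks itself writes to its logs.

```python
from urllib.parse import urlsplit, urlunsplit


def pip_git_target(repo_url):
    """Return the argument pip expects for a git install: git+{repo_url}."""
    return f"git+{repo_url}"


def redact_credentials(url):
    """Replace the user or token part of a URL with *** so it is safe to log."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment)
    )


# Hypothetical repository URL with an access token embedded in it.
repo_url = "https://my-token@git.example.com/team/mylib.git"
target = pip_git_target(repo_url)
print(redact_credentials(target))
```

Only the redacted form should be printed or pasted into shared notes. The unredacted value of target is what would be passed to pip.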
Track installation progress
The progress of libraries being installed and uninstalled can be tracked in the Ongoing Operations tab. The CREATE operation creates a new Docker image based on the project’s Anaconda environment. Once that operation has finished, the INSTALL operation runs and installs the library in the new environment.
To uninstall a library, navigate to the Manage Environment tab and click Uninstall.
After each installation or uninstallation of a library, the environment is analyzed to detect libraries that may not work properly. This analysis uses the pip check tool, which identifies missing dependencies and dependencies that are installed with an incompatible version.
An alert is shown automatically if such an issue is found.
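The same check can be run manually from a notebook or job in the project. The sketch below invokes pip check through the current interpreter. The helper name run_pip_check is hypothetical, and the exact way Hopsworks invokes the tool internally may differ.

```python
import subprocess
import sys


def run_pip_check():
    """Run pip check for the current interpreter and return (ok, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; also works on Python 3.7
    )
    # pip check exits with 0 when no broken requirements are found.
    return result.returncode == 0, result.stdout.strip()


ok, output = run_pip_check()
print("No broken requirements found." if ok else output)
```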
If the environment ends up in a bad state, it can be recreated. First click Remove Environment in the Manage Environment tab. After the environment has been removed, click Enable Environment to recreate it.
An existing Anaconda environment can be exported as a yml file.
An environment can be created from an Anaconda yml file.
An environment can be created from a requirements.txt file.
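For reference, a requirements.txt file is a plain-text list of pip requirements, one per line. The sketch below writes such a file from a dictionary of pinned versions. The helper name write_requirements is hypothetical and the package versions are illustrative rather than the versions shipped with Hopsworks.

```python
import os
import tempfile


def write_requirements(pins, path):
    """Write {name: version} pins to path as name==version lines, sorted by name."""
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


# Illustrative pins only.
pins = {"scikit-learn": "0.24.2", "pandas": "1.1.5"}
path = os.path.join(tempfile.mkdtemp(), "requirements.txt")
write_requirements(pins, path)

with open(path) as handle:
    print(handle.read())
```

The resulting file can then be uploaded to the project and selected in the file browser.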