Python

This tutorial explains how Python environments are managed in Hopsworks.

Python Environment Basics

Hopsworks provides a premade environment for machine learning and data science development using Python 3.7. The environment contains the most popular machine learning libraries, including TensorFlow, Keras, PyTorch and scikit-learn.

The environment ensures compatibility between the CUDA version and the installed TensorFlow and PyTorch versions for applications using NVIDIA GPUs.

There is one Anaconda environment for each project, and the environment is layered on top of the project’s base Docker image. That is, a library installed in the project’s Anaconda environment can be used in a Job or Jupyter notebook (Python or PySpark) run in the project. In the Python UI, libraries can be installed or uninstalled from the project’s Anaconda environment.

The Docker image is pulled on all hosts to ensure that the running application gets the latest version of the Anaconda environment, as libraries can be installed and uninstalled at any time in the project.

Listing installed libraries

The preinstalled libraries are listed under the Manage Environment tab.

List installed libraries

Installed libraries overview
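The same information can also be inspected from code running inside the project, for example in a Jupyter notebook. This is a minimal sketch using Python's standard packaging metadata rather than any Hopsworks API; the function name installed_libraries is hypothetical. On Python 3.7 it assumes the importlib_metadata backport is installed, since importlib.metadata only entered the standard library in Python 3.8.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is available.
    import importlib_metadata as metadata


def installed_libraries():
    """Return {name: version} for every distribution visible to this interpreter."""
    libs = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            libs[name] = dist.version
    # Sort case-insensitively so the listing is easy to scan.
    return dict(sorted(libs.items(), key=lambda kv: kv[0].lower()))


if __name__ == "__main__":
    for name, version in installed_libraries().items():
        print(name, version)
```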

Installing libraries

Python packages can be installed from the following sources:

  • PyPI, using the pip package manager
  • A conda channel, using the conda package manager
  • Package files in .whl or .egg format
  • A public or private git repository
  • A requirements.txt file to install many libraries at the same time using pip
  • An environment.yml file to install many libraries at the same time using conda or pip
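For the last two options, it can help to see what such files typically contain. The sketch below renders a small requirements.txt and environment.yml as strings, following the usual pip and conda file layouts. The helper names, the environment name and the pinned packages are illustrative assumptions, not values required by Hopsworks. Note that pip pins use == while conda pins use a single =.

```python
from typing import List


def render_requirements(pins: List[str]) -> str:
    """Render pip requirement specifiers, one per line."""
    return "\n".join(pins) + "\n"


def render_environment_yml(name: str, channels: List[str],
                           conda_deps: List[str], pip_deps: List[str]) -> str:
    """Render a minimal conda environment file with an optional pip section."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + c for c in channels]
    lines.append("dependencies:")
    lines += ["  - " + d for d in conda_deps]
    if pip_deps:
        # conda installs pip itself, then hands the nested list to pip.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines += ["    - " + p for p in pip_deps]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative pins only.
    print(render_requirements(["numpy==1.19.5", "requests==2.25.1"]))
    print(render_environment_yml(
        name="demo",
        channels=["defaults"],
        conda_deps=["python=3.7"],
        pip_deps=["requests==2.25.1"],
    ))
```

Writing the returned strings to files named requirements.txt and environment.yml gives files that can then be uploaded and selected in the file browser.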

The search bar for pip libraries has been unavailable since Hopsworks 2.1; it will be added back in 2.2.

Select library package manager

Select package manager

Installation option 1: Install by name and version

Enter the name of the Python package and the version you want to install.

Install library by name and version

Installing library by name and version

Installation option 2: Install from .whl, .egg, requirements.txt or environment.yml file

Select the uploaded file to install in the file browser.

Install library from file browser

Installing package uploaded in the file browser

Installation option 3: Install from a git repository

To install from a git repository, provide the repository URL. This is the same URL you would pass on the command line in pip install git+{repo_url}. For a private git repository, also select whether it is hosted on GitHub or GitLab and choose the preconfigured access token for the repository.

Note: If you are installing from a git repository that is hosted on neither GitHub nor GitLab, supply the access token in the URL. Keep in mind that in this case the access token may be visible in logs that other users can read.

Install library from git repository

Installing a Python library using a git repository URL
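To make the URL form concrete, the sketch below builds the argument pip accepts for a git install, optionally pinned to a branch, tag or commit with pip's @ref suffix. The part after git+ corresponds to the repository URL described above. It also includes a small helper that masks credentials embedded in a URL before the URL is printed or logged, which is relevant when a token is placed in the URL. Both function names and the example repositories are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def git_install_target(repo_url, ref=None):
    """Build the argument for `pip install`, e.g. git+https://host/org/repo.git@v1.0."""
    target = "git+" + repo_url
    if ref:
        # pip's VCS support accepts a branch, tag or commit after '@'.
        target += "@" + ref
    return target


def redact_credentials(url):
    """Mask any user/token part of a URL so it is safer to print or log."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port:
        host += ":" + str(parts.port)
    return urlunsplit(
        (parts.scheme, "***@" + host, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Hypothetical repositories, for illustration only.
    print(git_install_target("https://github.com/example-org/example-lib.git", ref="v1.0"))
    print(redact_credentials("https://mytoken@git.example.com/team/lib.git"))
```

Redacting in your own scripts does not change what the installation itself may write to its logs, so the warning in the note above still applies.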

Track installation progress

The progress of libraries being installed and uninstalled can be tracked in the Ongoing Operations tab. The CREATE operation creates a new Docker image based on the project’s Anaconda environment. After that operation has finished, the INSTALL operation runs and installs the library in the new environment.

Uninstalling libraries

To uninstall a library, navigate to the Manage Environment tab and click Uninstall next to the library.

Uninstalling a library

Debugging the environment

After each library installation or uninstallation, the environment is analyzed to detect libraries that may not work properly. This is done with the pip check tool, which identifies missing dependencies and dependencies installed in an incompatible version. An alert is shown automatically if such an issue is found.

Environment conflicts
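The same check can be run by hand from a job or notebook in the project. The sketch below invokes pip check for the current interpreter and returns whether it passed, along with the lines it printed. The function name run_pip_check is hypothetical, and this only approximates what the platform runs internally, which is not described here.

```python
import subprocess
import sys


def run_pip_check():
    """Run `pip check` for this interpreter; return (ok, output_lines)."""
    proc = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold errors into the same stream
        universal_newlines=True,    # decode output as text (works on Python 3.7)
    )
    lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
    # pip check exits with a non-zero status when it finds broken requirements.
    return proc.returncode == 0, lines


if __name__ == "__main__":
    ok, lines = run_pip_check()
    print("environment is consistent" if ok else "conflicts found")
    for line in lines:
        print(" ", line)
```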

Recreating environment

If the environment ends up in a bad state, it can be recreated. First click Remove Environment in the Manage Environment tab. After the environment has been removed, recreate it by clicking Enable Environment.

Removing an environment

Remove the environment

Exporting an environment

An existing Anaconda environment can be exported as a yml file.

Exporting an environment

Create an environment from environment.yml

An environment can be created from an Anaconda yml file.

Create an environment from yml file

Create an environment from yml

Create an environment from requirements.txt

An environment can be created from a requirements.txt file.

Create an environment from requirements.txt file

Create an environment from requirements.txt