Jobs

Members of a project in Hopsworks can launch the following types of applications through the project’s Jobs service:

  • Python (Hopsworks Enterprise only)
  • Apache Spark
  • Apache Flink

If you are a beginner, it is highly recommended to click on the Spark button on the landing page under the available tours. It guides you through launching your first Spark application, and the steps for launching any other job type are similar. Details on running Python programs are provided in the Python section below.

Guided tours

To create a new job, click on the Jobs tab from the Project Menu and follow the steps below:

  • Step 1: Press the New Job button in the top left corner
  • Step 2: Give your job a name
  • Step 3: Select one of the available job types
  • Step 4: Select the executable file of your job, which you have previously uploaded to a Dataset
  • Step 5: If the job is a Spark job, the main class will be inferred from the jar, but it can also be set manually
  • Step 6 (Optional): Configure default arguments to run your job with
  • Step 7: In the Configure and create tab, you can manually specify the desired configuration for your job, any additional dependencies, and arbitrary Spark/Flink parameters
  • Step 8: Click on the Create button
  • Step 9: Click on the Run button to launch your job. If no default arguments have been configured, a dialog textbox will ask for any runtime arguments the job may require. If the job requires no arguments, the field can be left empty. The figure below shows the dialog.

Job input arguments

After creating a job with the new job wizard, you can manage all jobs and their runs from the landing page of the Jobs service. The figure below shows a project with six jobs, of which five are shown per page. Once a job has run at least once, all its past and current runs are shown in the UI.

Jobs UI

Users can interact with the jobs in the following ways:

  1. Search jobs by using the Search text box
  2. Filter jobs by creation date
  3. Set the number of jobs to be displayed per page
  4. Run a job
  5. Stop a job; this stops all ongoing runs of the job
  6. Edit a job, for example to change its Spark configuration parameters
  7. View the Monitoring UI, with detailed job information such as the Spark UI, YARN, real-time logs and metrics

Job real-time logs

  8. View a job’s details

Job details

  9. Make a copy of a job
  10. Export a job, which prompts the user to download a JSON file. A job can then be imported by clicking on New Job and then the Import Job button.

Additionally, users can click on a job to view further information about its runs:

  1. Information about the run, such as the location of its log files and its id
  2. Stop a run
  3. Monitoring UI of this particular run
  4. View/Download stdout logs
  5. View/Download stderr logs

Job aggregated logs

By default, all files and folders created by Spark are group writable (i.e., umask=007). If you want to change this default umask, you can add the Spark property spark.hadoop.fs.permissions.umask-mode=<umask> under More Spark Properties when you create a new job.
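For example, to make files created by Spark readable but not writable by the group, you could enter the following line in the More Spark Properties field (022 is only an illustrative umask; use whichever value fits your project):

    spark.hadoop.fs.permissions.umask-mode=022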

Python

(Available in Hopsworks Enterprise only)

There are three ways of running Python programs in Hopsworks:

  • Jupyter notebooks: Covered in the Jupyter section of the user guide.
  • Jobs UI
  • Programmatically

The GIF below demonstrates how to create a Python job from the Jobs UI by selecting a Python file that is already uploaded to a Hopsworks dataset and attaching a few other files so that they are immediately available to the application at runtime. However, any file can also be made available to the application at runtime by calling, from within the Python application itself, the copy_to_local function of the hdfs module of the hops Python library (http://hops-py.logicalclocks.com/hops.html#module-hops.hdfs), as sketched below.
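A minimal sketch of this pattern is shown below; the dataset path Resources/config.json is a hypothetical example, and it assumes copy_to_local returns the local path of the copied file. Check the exact signature in the hops documentation linked above.

    # Inside the Python program submitted as a job.
    from hops import hdfs

    # Copy a file from a project dataset (hypothetical path) into the
    # application's working directory and get its local path back.
    local_path = hdfs.copy_to_local("Resources/config.json")

    # The file can now be read like any local file.
    with open(local_path) as f:
        config = f.read()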

Create a new Python job from the Jobs UI

You do not have to upload the Python program through the UI to run it. This can also be done from within a Python program by using the upload function of the dataset module of the hops Python library (http://hops-py.logicalclocks.com).

To do that, first generate an API key for your project (see Generate an API key), then use the project.connect() function of the same library to connect to a project of your Hopsworks cluster, and finally call dataset.upload. A sketch of this flow is shown below.
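The following is a minimal sketch of that flow; the host, project name, API key file and paths are hypothetical placeholders, and the exact parameter names of project.connect and dataset.upload should be checked against the hops documentation linked above.

    from hops import project, dataset

    # Connect to a project on the Hopsworks cluster using an API key
    # (hypothetical host, project name and key file).
    project.connect("demo_project", "hopsworks.example.com",
                    api_key="/path/to/api_key_file")

    # Upload a local Python program to a dataset in the project.
    dataset.upload("my_job.py", "Resources")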

Hopsworks IDE Plugin

It is also possible to work on jobs while developing in your IntelliJ/PyCharm IDE by installing the Hopsworks Plugin from the marketplace.

Usage

  • Open the Hopsworks Job Preferences UI for specifying user preferences under Settings -> Tools -> Hopsworks Job Preferences.
  • Input the Hopsworks project preferences and the job details you wish to work on.
  • Open a project and, within the Project Explorer, right-click on the program (.jar, .py, .ipynb) you wish to execute as a job on Hopsworks. The available job actions (Create, Run, Stop, etc.) are listed in the context menu.
  • Note: The Python job type is only supported on Hopsworks Enterprise.

Actions

  • Create: Creates or updates the job as specified in Hopsworks Job Preferences
  • Run: First uploads the program to the specified HDFS path, then runs the job
  • Stop: Stops the job
  • Delete: Deletes the job
  • Job Execution Status / Job Execution Logs: Gets the job status or logs, respectively. You can retrieve a particular job execution by specifying the execution id in the ‘Hopsworks Job Preferences’ UI; otherwise the last execution of the specified job name is used by default.
Working with jobs from the Hopsworks IntelliJ/PyCharm plugin