Getting started with Hopsworks.ai

Hopsworks.ai is our managed platform for running Hopsworks and the Feature Store in the cloud. It integrates with third-party platforms such as Databricks, SageMaker and Kubeflow. This guide shows how to set up Hopsworks.ai with your organization’s AWS account.

Step 1: Connecting your AWS account

Hopsworks.ai deploys Hopsworks clusters to your AWS account. To enable this, you have to give us permission to do so, using either AWS cross-account roles or AWS access keys. We strongly recommend cross-account roles whenever possible, for security reasons.

Option 1: Using AWS Cross-Account Roles

To create a cross-account role for Hopsworks.ai, you need our AWS account id and the external id we created for you. You can find this information on the first screen of the cross-account configuration flow. Take note of the account id and external id, then go to the Roles section of the IAM service in the AWS Management Console and select Create role.

Creating the cross-account role instructions

Select Another AWS account as trusted entity and fill in our AWS account id and the external id generated for you:

Creating the cross-account role step 1
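For orientation, the trust relationship that results from this step can be sketched as a standard IAM trust policy. The snippet below is an illustrative sketch only, not part of the Hopsworks.ai tooling: the function name build_trust_policy and the sample account id and external id are placeholders, and the real values are the ones shown in the cross-account configuration flow.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy that lets one AWS account assume a role,
    but only when the caller supplies the expected external id."""
    if not (trusted_account_id.isascii() and trusted_account_id.isdigit()
            and len(trusted_account_id) == 12):
        raise ValueError("AWS account ids consist of exactly 12 digits")
    if not external_id:
        raise ValueError("the external id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external id guards against the role being assumed
                # on behalf of a different customer.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values: use the account id and external id from Hopsworks.ai.
    policy = build_trust_policy("123456789012", "example-external-id")
    print(json.dumps(policy, indent=4))
```

The console wizard creates an equivalent document for you, so this is mainly useful for reviewing the role's Trust relationships tab after the role has been created.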

Go to the last step of the wizard, name the role and create it:

Creating the cross-account role step 2

Next, you need to create an access policy to give Hopsworks.ai permissions to manage clusters in your organization’s AWS account. By default, Hopsworks.ai automates all steps required to launch a new Hopsworks cluster. If you want to limit the required AWS permissions, see Limiting AWS permissions.

Copy the permission JSON from the instructions:

Adding the policy instructions

Identify your newly created cross-account role in the Roles section of the IAM service in the AWS Management Console and select Add inline policy:

Adding the inline policy step 1

Replace the JSON policy with the JSON from our instructions and continue in the wizard:

Adding the inline policy step 2

Name and create the policy:

Adding the inline policy step 3

Copy the Role ARN from the summary of your cross-account role:

Adding the inline policy step 4

Paste the Role ARN into Hopsworks.ai and select Configure:

Saving the cross-account role
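Before pasting the Role ARN, it can help to check that the copied string really is an IAM role ARN and belongs to the account you expect. The following is a minimal sketch under that assumption; parse_role_arn is a hypothetical helper, not a Hopsworks.ai or AWS API, and the sample ARN is a placeholder.

```python
import re
from typing import Tuple

# General shape of an IAM role ARN:
#   arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
_ROLE_ARN = re.compile(r"^arn:aws(?:-[a-z-]+)?:iam::([0-9]{12}):role/(.+)$")


def parse_role_arn(arn: str) -> Tuple[str, str]:
    """Split a role ARN into (account id, role path and name).

    Raises ValueError if the string does not look like an IAM role ARN.
    """
    match = _ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Placeholder ARN: use the one from the summary of your cross-account role.
    account_id, role = parse_role_arn("arn:aws:iam::123456789012:role/hopsworks-ai")
    print(account_id, role)
```

The account id in the ARN should be your organization's AWS account, not the Hopsworks.ai account id used as the trusted entity.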

Option 2: Using AWS Access Keys

You can either create a new IAM user or use an existing IAM user to create access keys for Hopsworks.ai. If you want to create a new IAM user, see Creating an IAM User in Your AWS Account.

Warning

We recommend using Cross-Account Roles instead of Access Keys whenever possible. See Option 1: Using AWS Cross-Account Roles.

Hopsworks.ai requires a set of permissions to be able to launch clusters in your AWS account. The permissions can be granted by attaching an access policy to your IAM user. By default, Hopsworks.ai automates all steps required to launch a new Hopsworks cluster. If you want to limit the required AWS permissions, see Limiting AWS permissions.

The required permissions are shown in the instructions. Copy them if you want to create a new access policy:

Configuring access key instructions

Add a new inline policy to your IAM user:

Configuring the access key on AWS step 1

Replace the JSON policy with the JSON from our instructions and continue in the wizard:

Adding the inline policy step 2

Name and create the policy:

Adding the inline policy step 3

In the overview of your IAM user, select Create access key:

Configuring the access key on AWS step 2

Copy the Access Key ID and the Secret Access Key:

Configuring the access key on AWS step 3

Paste the Access Key ID and the Secret Access Key into Hopsworks.ai and select Configure:

Saving the access key pair

Step 2: Deploying a Hopsworks cluster

In Hopsworks.ai, select Run a new instance:

Create a Hopsworks cluster

Configure the instance by selecting the location, instance type and optionally the VPC, subnet and security group. Select Deploy.

Note

We recommend that you always configure an SSH key under advanced options to ensure you can troubleshoot the instance if necessary.

If you select an S3 bucket, HopsFS will store all files in that bucket. Do not forget to set an appropriate instance profile so that the cluster instances can access the selected bucket.

Configuring Instance Profile for Hopsworks cluster

The following is an example of the permissions an instance profile needs so that HopsFS can store the file system blocks in an S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "HopsFSS3Permissions",
            "Effect": "Allow",
            "Action": [
                "S3:PutObject",
                "S3:ListBucket",
                "S3:GetBucketLocation",
                "S3:GetObject",
                "S3:DeleteObject",
                "S3:AbortMultipartUpload",
                "S3:ListBucketMultipartUploads"
            ],
            "Resource": [
                "arn:aws:s3:::bucket.name/*",
                "arn:aws:s3:::bucket.name"
            ]
        }
    ]
}

Replace bucket.name with the name of your S3 bucket.
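If you prefer to generate the policy rather than edit it by hand, the substitution can be sketched as below. This is an illustration only: build_hopsfs_s3_policy and the sample bucket name are hypothetical, while the actions and statement id are copied from the example policy above.

```python
import json

# Actions copied from the example policy above.
HOPSFS_S3_ACTIONS = [
    "S3:PutObject",
    "S3:ListBucket",
    "S3:GetBucketLocation",
    "S3:GetObject",
    "S3:DeleteObject",
    "S3:AbortMultipartUpload",
    "S3:ListBucketMultipartUploads",
]


def build_hopsfs_s3_policy(bucket_name: str) -> dict:
    """Return the example HopsFS policy with bucket_name filled in."""
    if not bucket_name or "/" in bucket_name or ":" in bucket_name:
        raise ValueError("expected a bare bucket name, not a path or ARN")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "HopsFSS3Permissions",
                "Effect": "Allow",
                "Action": list(HOPSFS_S3_ACTIONS),
                "Resource": [
                    # Objects inside the bucket (get, put, delete, uploads).
                    f"arn:aws:s3:::{bucket_name}/*",
                    # The bucket itself (list, location).
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name.
    print(json.dumps(build_hopsfs_s3_policy("my-hopsfs-bucket"), indent=4))
```

Both resource entries are needed: the entry ending in /* covers the objects, and the bare bucket ARN covers bucket-level actions such as listing.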

The instance will start. This might take a couple of minutes:

Booting Hopsworks cluster

As soon as the instance has started, you will be able to log in to your new Hopsworks instance with the username and password provided. We recommend that you change the password after your first login. You can also stop or terminate the instance.

Running Hopsworks cluster

Step 3: Outside Access to the Feature Store

By default, your Hopsworks instance exposes only the Hopsworks UI, that is, makes it available to clients on external networks such as the Internet. To integrate with external platforms and access APIs for services such as the Feature Store, you have to expose them as well.

Expose services by selecting them and pressing Update. This will update the Security Group attached to the Hopsworks instance to allow incoming traffic on the relevant ports.

Outside Access to the Feature Store

Step 4: Next steps

Check out our other guides for how to get started with Hopsworks and the Feature Store: