How to Run JupyterHub in Kubernetes

With JupyterHub you can create a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. JupyterHub allows users to interact with a computing environment through a webpage. As most devices have access to a web browser, JupyterHub makes it easy to provide and standardize the computing environment for a group of people (e.g., for a class of students or an analytics team).

Project Jupyter created JupyterHub to support many users. The Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high-performance computing group.

Because JupyterHub manages a separate Jupyter environment for each user, it is well suited to serving Jupyter notebooks to many users at once.

Jupyter Notebook is a simplified notebook authoring application, and is a part of Project Jupyter, a large umbrella project centered around the goal of providing tools (and standards) for interactive computing with computational notebooks.

In this guide we will examine how to install and use JupyterHub on Kubernetes. We will use Helm to install the JupyterHub Helm chart.

Prerequisites

Ensure that you have the following before proceeding:

A Kubernetes cluster that is set up and that you can access
The latest version of kubectl installed
The latest version of Helm installed
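
The prerequisites above can be checked from the terminal before going further. The following is a minimal sketch; the function name require_cmd is hypothetical and it only verifies that the tools are on your PATH, not that they are the latest versions.

```sh
# require_cmd NAME...: report any command that is not on PATH.
# Returns 0 when every command is found, 1 otherwise.
require_cmd() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check the tools, then confirm the cluster is reachable.
#   require_cmd kubectl helm && kubectl cluster-info
```

If the usage line prints the cluster endpoints, both tools are installed and kubectl can reach the cluster.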

Installing JupyterHub

With a Kubernetes cluster available and Helm installed, we can install JupyterHub in the Kubernetes cluster using the JupyterHub Helm chart.

Initialize a Helm chart configuration file

Helm charts contain templates that can be rendered into the Kubernetes resources to be installed. A user of a Helm chart can override the chart’s default values to influence how the templates render.

In this step we will initialize a chart configuration file that you can use to adjust your installation of JupyterHub. We will name it config.yaml and refer to it by that name from here on.

Open config.yaml with your favourite text editor. I am using vim:

vim config.yaml

You can leave it empty apart from a comment to use the defaults. As of version 1.0.0, you don’t need any configuration to get started.

# This file can update the JupyterHub Helm chart's default configuration values.
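
If you would rather not open an editor, the same starting file can be created from the shell. This is a small sketch that writes the comment above into config.yaml in the current directory; the guard that skips an existing file is my own addition so that a configuration you already have is not overwritten.

```sh
# Create config.yaml with a single explanatory comment, unless it already exists.
if [ ! -e config.yaml ]; then
  printf '%s\n' "# This file can update the JupyterHub Helm chart's default configuration values." > config.yaml
fi
```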

Install JupyterHub

Make Helm aware of the JupyterHub Helm chart repository so you can install the JupyterHub chart from it without having to use a long URL name.

helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
helm repo update

This should show output like:

Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
...Successfully got an update from the "jupyterhub" chart repository
Update Complete. ⎈ Happy Helming!⎈

Now install the chart configured by your config.yaml by running this command from the directory that contains your config.yaml:

helm upgrade --cleanup-on-fail \
  --install <helm-release-name> jupyterhub/jupyterhub \
  --namespace <k8s-namespace> \
  --create-namespace \
  --version=<chart-version> \
  --values config.yaml

where:

<helm-release-name> refers to a Helm release name, an identifier used to differentiate chart installations. You need it when you are changing or deleting the configuration of this chart installation. If your Kubernetes cluster will contain multiple JupyterHubs make sure to differentiate them. You can list your Helm releases with helm list.
<k8s-namespace> refers to a Kubernetes namespace, an identifier used to group Kubernetes resources, in this case all Kubernetes resources associated with the JupyterHub chart. You’ll need the namespace identifier for performing any commands with kubectl.
This step may take a moment, during which time there will be no output to your terminal. JupyterHub is being installed in the background.
If you get a release named <helm-release-name> already exists error, delete the release by running helm delete <helm-release-name> --namespace <k8s-namespace>, then reinstall by repeating this step. If the error persists, also run kubectl delete namespace <k8s-namespace> and try again.
In general, if something goes wrong with the install step, delete the Helm release by running helm delete <helm-release-name> --namespace <k8s-namespace> before re-running the install command.
If you’re pulling from a large Docker image you may get an Error: timed out waiting for the condition error. If so, add a --timeout=<number-of-minutes>m parameter to the helm command.
The --version parameter corresponds to the version of the Helm chart, not the version of JupyterHub. Each version of the JupyterHub Helm chart is paired with a specific version of JupyterHub. E.g., 0.11.1 of the Helm chart runs JupyterHub 1.3.0. For a list of which JupyterHub version is installed in each version of the JupyterHub Helm chart, see the Helm chart repository. You can also run helm search repo jupyterhub to see the latest chart version and the JupyterHub version it installs (add --versions to list older chart versions as well):

➜ helm search repo jupyterhub
NAME                 	CHART VERSION	APP VERSION	DESCRIPTION
jupyterhub/jupyterhub	3.0.3        	4.0.2      	Multi-user Jupyter installation

This is the command with the parameters filled in on my system:

helm upgrade --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyterhub \
  --create-namespace \
  --version=3.0.3 \
  --values config.yaml
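
Since the same release name, namespace, and chart version come up again on every upgrade, it can help to wrap the command in a small function. The sketch below is only an illustration: the function name install_jupyterhub is hypothetical, and the 10m default for --timeout is an arbitrary value chosen for this example rather than a recommendation from the chart.

```sh
# install_jupyterhub RELEASE NAMESPACE CHART_VERSION [TIMEOUT]
# Hypothetical wrapper around the install command shown above.
# TIMEOUT defaults to 10m, an arbitrary choice for this sketch.
install_jupyterhub() {
  release=$1
  namespace=$2
  chart_version=$3
  timeout=${4:-10m}

  helm upgrade --cleanup-on-fail \
    --install "$release" jupyterhub/jupyterhub \
    --namespace "$namespace" \
    --create-namespace \
    --version="$chart_version" \
    --timeout="$timeout" \
    --values config.yaml
}

# Usage (same values as the example command above):
#   install_jupyterhub jupyterhub jupyterhub 3.0.3
```

Passing a fourth argument such as 20m raises the timeout for clusters that pull large images.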

Once the command succeeds, confirm that the release is deployed:

➜ helm ls -n jupyterhub
NAME      	NAMESPACE 	REVISION	UPDATED                             	STATUS  	CHART           	APP VERSION
jupyterhub	jupyterhub	2       	2023-08-31 09:31:50.562032 +0300 EAT	deployed	jupyterhub-3.0.3	4.0.2

Get the pods to confirm that everything is running successfully:

➜ kubectl --namespace jupyterhub get pods
NAME                              READY   STATUS    RESTARTS   AGE
continuous-image-puller-fzrfg     1/1     Running   0          21m
continuous-image-puller-r98j8     1/1     Running   0          21m
continuous-image-puller-s9997     1/1     Running   0          21m
user-scheduler-6c7d7bb7c9-bzzbx   1/1     Running   0          21m
user-scheduler-6c7d7bb7c9-xvxp6   1/1     Running   0          21m
proxy-5cbd85956d-hkd4k            1/1     Running   0          17m
hub-68584d877f-5f782              1/1     Running   0          17m
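
With more pods, scanning the STATUS column by eye gets tedious. As a rough sketch, the listing can be piped through a small awk filter that prints only the pods that are not Running. The function name not_running is hypothetical, and the filter assumes the default five-column table shown above.

```sh
# not_running: read `kubectl get pods` output on stdin and print the
# name and status of every pod whose STATUS column is not "Running".
# Assumes the default table layout: NAME READY STATUS RESTARTS AGE.
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Usage:
#   kubectl --namespace jupyterhub get pods | not_running
```

No output means every listed pod reported Running.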

Accessing JupyterHub

Wait for the hub and proxy pods to enter the Running state.

Find the IP we can use to access JupyterHub. Run the following command until the EXTERNAL-IP of the proxy-public service is available, as in the example output.

kubectl --namespace <k8s-namespace> get service proxy-public

This is the output on my machine:

➜ kubectl --namespace jupyterhub get service proxy-public

NAME           TYPE           CLUSTER-IP   EXTERNAL-IP     PORT(S)        AGE
proxy-public   LoadBalancer   10.43.5.78   104.196.41.97   80:31223/TCP   24m

Or, use the short form:

kubectl --namespace <k8s-namespace> get service proxy-public --output jsonpath='{.status.loadBalancer.ingress[].ip}'
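
Re-running the command by hand until the IP appears can be automated with a small polling loop. This is a sketch under a few assumptions: the helper name poll_until_output is hypothetical, and the retry count and delay in the usage comment are arbitrary values you would tune for your cluster.

```sh
# poll_until_output MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints something, then echoes that output.
# Returns 1 if COMMAND is still silent after MAX_TRIES attempts.
poll_until_output() {
  max_tries=$1
  delay=$2
  shift 2
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    out=$("$@" 2>/dev/null) || out=""
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: try every 5 seconds for up to 5 minutes.
#   poll_until_output 60 5 kubectl --namespace jupyterhub get service proxy-public \
#     --output jsonpath='{.status.loadBalancer.ingress[].ip}'
```

The short form works well here because it prints nothing until the load balancer has been assigned an IP, which is exactly the condition the loop waits for.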

To use JupyterHub, enter the external IP for the proxy-public service into a browser. JupyterHub is running with a default dummy authenticator, so entering any username and password combination will let you enter the hub.

Congratulations! Now that you have basic JupyterHub running, you can extend it and optimize it in many ways to meet your needs.

Common customizations that I normally perform

Customizing User Environment

The user environment is the set of software packages, environment variables, and various files that are present when the user logs into JupyterHub. The user may also see different tools that provide interfaces to perform specialized tasks, such as JupyterLab, RStudio, RISE and others.

A Docker image built from a Dockerfile will lay the foundation for the environment that you will provide for the users. The image will for example determine what Linux software (curl, vim …), programming languages (Julia, Python, R, …) and development environments (JupyterLab, RStudio, …) are made available for use.

These are some configs I normally update for the user environment in the config.yaml file.

singleuser:
  cpu:
    limit: 2
    guarantee: 0.05
  memory:
    limit: 2G
    guarantee: 512M
  storage:
    capacity: 2Gi

For an extensive list of user environment options, please consult the documentation.

Customizing the ingress

Ingress allows incoming traffic to access the cluster. It is easier for users to remember a domain like jhub.citizix.com than an IP address. Add these to the config.yaml file to configure the ingress.

ingress:
  enabled: true
  hosts:
  - jhub.citizix.com
  annotations:
    kubernetes.io/tls-acme: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod-issuer
  tls:
  - hosts:
    - jhub.citizix.com
    secretName: jhub-tls-citizix

This is my final config.yaml file for my test environment:

singleuser:
  cpu:
    limit: 2
    guarantee: 0.05
  memory:
    limit: 2G
    guarantee: 512M
  storage:
    capacity: 2Gi

debug:
  enabled: true

ingress:
  enabled: true
  hosts:
  - hub.in.citizix.com
  annotations:
    kubernetes.io/tls-acme: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod-issuer
  tls:
  - hosts:
    - hub.in.citizix.com
    secretName: citizix-tls-jupyterhub

Applying the custom configurations

Once you are done with the customizations, you can apply them to the cluster by running an upgrade.

Make and save the changes in the config.yaml.
Run a helm upgrade:

helm upgrade --cleanup-on-fail \
  <helm-release-name> jupyterhub/jupyterhub \
  --namespace <k8s-namespace> \
  --version=<chart-version> \
  --values config.yaml
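
Before applying a changed config.yaml, it can be useful to preview what Helm would do. The sketch below reuses the upgrade command above and adds Helm's --dry-run flag, which simulates the upgrade and prints the rendered manifests without changing the release. The function name preview_upgrade is hypothetical. Note that a dry run still talks to the cluster, so kubectl access is required.

```sh
# preview_upgrade RELEASE NAMESPACE CHART_VERSION
# Hypothetical helper: the same upgrade as above, plus --dry-run so that
# Helm renders the manifests without applying them.
preview_upgrade() {
  helm upgrade --cleanup-on-fail \
    "$1" jupyterhub/jupyterhub \
    --namespace "$2" \
    --version="$3" \
    --values config.yaml \
    --dry-run
}

# Usage:
#   preview_upgrade jupyterhub jupyterhub 3.0.3 | less
```

A YAML mistake in config.yaml usually surfaces here as an error, which is cheaper to find than after a real upgrade.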

Note that helm list --namespace <k8s-namespace> should display <helm-release-name> if you forgot it.

Verify that the hub and proxy pods entered the Running state after the upgrade completed.

NAMESPACE=jupyterhub

kubectl get pod --namespace $NAMESPACE

Conclusion

We managed to set up JupyterHub on Kubernetes in this guide. Please consult the Zero to JupyterHub with Kubernetes guide for a more comprehensive walkthrough of setting up JupyterHub in your environment.