With JupyterHub you can create a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. JupyterHub allows users to interact with a computing environment through a webpage. As most devices have access to a web browser, JupyterHub makes it easy to provide and standardize the computing environment for a group of people (e.g., for a class of students or an analytics team).
Project Jupyter created JupyterHub to support many users. Because JupyterHub manages a separate Jupyter environment for each user, the Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high-performance computing group.
Jupyter Notebook is a simplified notebook authoring application, and is a part of Project Jupyter, a large umbrella project centered around the goal of providing tools (and standards) for interactive computing with computational notebooks.
In this guide we will examine how to install and use JupyterHub on Kubernetes, using Helm to install the JupyterHub Helm chart.
Prerequisites
Ensure that you have the following before proceeding:
- A Kubernetes cluster set up, and access to it
- The latest version of kubectl installed
- The latest version of helm installed
Installing JupyterHub
With a Kubernetes cluster available and Helm installed, we can install JupyterHub in the Kubernetes cluster using the JupyterHub Helm chart.
Initialize a Helm chart configuration file
Helm charts contain templates that can be rendered into the Kubernetes resources to be installed. A user of a Helm chart can override the chart’s default values to influence how the templates render.
In this step we will initialize a chart configuration file for you to adjust your installation of JupyterHub. We will name it config.yaml and refer to it by that name from here on.
Create config.yaml and open it with your favourite text editor; I am using vim in my case.
You can leave it empty, apart from some comments, to use the chart’s defaults: as of version 1.0.0 of the chart, you don’t need any configuration to get started.
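A minimal sketch of creating such a file from the shell, assuming a POSIX shell. The comment text is illustrative and is not taken from the chart:

```shell
# Write a config.yaml that contains only comments.
# With no keys set, the JupyterHub Helm chart falls back to its defaults.
cat > config.yaml <<'EOF'
# config.yaml: overrides for the JupyterHub Helm chart.
# Intentionally empty for now; every unset key uses the chart default.
EOF
```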
Install JupyterHub
Make Helm aware of the JupyterHub Helm chart repository so you can install the JupyterHub chart from it without having to use a long URL.
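A sketch of this step is shown below. The repository URL is the one published by the Zero to JupyterHub project, and "jupyterhub" is the local alias used in the rest of this guide:

```shell
# Chart repository published by the Zero to JupyterHub project.
CHART_REPO_URL="https://hub.jupyter.org/helm-chart/"

# Register the repository under the local alias "jupyterhub" ...
helm repo add jupyterhub "$CHART_REPO_URL"
# ... and refresh the local cache of charts from all configured repositories.
helm repo update
```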
Helm should confirm that the jupyterhub repository has been added and then report that the update is complete.
Now install the chart configured by your config.yaml by running this command from the directory that contains your config.yaml:
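The command below is modelled on the install command in the Zero to JupyterHub guide. The release name, namespace, and chart version are hypothetical values; replace them with your own. The 1.0.0 version matches the chart version mentioned above, but any version listed by helm search repo works:

```shell
# Hypothetical values: substitute your own.
RELEASE="${RELEASE:-jhub}"              # <helm-release-name>
NAMESPACE="${NAMESPACE:-jhub}"          # <k8s-namespace>
CHART_VERSION="${CHART_VERSION:-1.0.0}" # Helm chart version, not JupyterHub version

# Install the chart, or upgrade the release if it already exists,
# creating the namespace when it is missing.
helm upgrade --cleanup-on-fail \
  --install "$RELEASE" jupyterhub/jupyterhub \
  --namespace "$NAMESPACE" \
  --create-namespace \
  --version="$CHART_VERSION" \
  --values config.yaml
```

Because upgrade --install updates an existing release instead of failing, the same command can be re-run after editing config.yaml.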
where:
- <helm-release-name> refers to a Helm release name, an identifier used to differentiate chart installations. You need it when you change or delete the configuration of this chart installation. If your Kubernetes cluster will contain multiple JupyterHubs, make sure to differentiate them. You can list your Helm releases with helm list (add --all-namespaces to see releases in every namespace).
- <k8s-namespace> refers to a Kubernetes namespace, an identifier used to group Kubernetes resources, in this case all Kubernetes resources associated with the JupyterHub chart. You’ll need the namespace identifier for performing any commands with kubectl.

- This step may take a moment, during which time there will be no output to your terminal. JupyterHub is being installed in the background.
- If you get a "release named <helm-release-name> already exists" error, delete the release by running helm delete <helm-release-name> --namespace <k8s-namespace>, then reinstall by repeating this step. If the error persists, also run kubectl delete namespace <k8s-namespace> and try again.
- In general, if something goes wrong with the install step, delete the Helm release by running helm delete <helm-release-name> --namespace <k8s-namespace> before re-running the install command.
- If you’re pulling a large Docker image, you may get an "Error: timed out waiting for the condition" error. In that case, add a --timeout=<number-of-minutes>m parameter to the helm command.
- The --version parameter corresponds to the version of the Helm chart, not the version of JupyterHub. Each version of the JupyterHub Helm chart is paired with a specific version of JupyterHub; e.g., version 0.11.1 of the Helm chart runs JupyterHub 1.3.0. For a list of which JupyterHub version is installed in each version of the JupyterHub Helm chart, see the Helm chart repository. You can also run helm search repo jupyterhub --versions to see the available chart versions alongside the JupyterHub version each one ships.
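The version lookup and the delete-and-retry advice above can be combined into a short recovery sequence. This is a sketch: the release name and namespace are hypothetical, and the 10-minute timeout is an arbitrary example value:

```shell
RELEASE="${RELEASE:-jhub}"      # hypothetical release name
NAMESPACE="${NAMESPACE:-jhub}"  # hypothetical namespace

# List every published chart version with the JupyterHub version it ships
# (shown in the APP VERSION column).
helm search repo jupyterhub/jupyterhub --versions

# Remove the failed or conflicting release. Helm 3 releases are
# namespaced, hence --namespace.
helm delete "$RELEASE" --namespace "$NAMESPACE"

# Re-run the install, allowing up to 10 minutes for large images to be pulled.
helm upgrade --cleanup-on-fail \
  --install "$RELEASE" jupyterhub/jupyterhub \
  --namespace "$NAMESPACE" \
  --create-namespace \
  --timeout=10m \
  --values config.yaml
```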
Once the command succeeds, confirm:
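One way to confirm the install is to list the Helm releases in the namespace, sketched here with a hypothetical namespace name:

```shell
NAMESPACE="${NAMESPACE:-jhub}"  # hypothetical namespace

# The JupyterHub release should appear with STATUS "deployed".
helm list --namespace "$NAMESPACE"
```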
List the pods to confirm that everything is running successfully.
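A minimal example, again assuming a hypothetical namespace name:

```shell
NAMESPACE="${NAMESPACE:-jhub}"  # hypothetical namespace

# Show the pods created by the chart and their current status.
kubectl get pod --namespace "$NAMESPACE"
```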
Accessing JupyterHub
Wait for the hub and proxy pods to enter the Running state.
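You can poll kubectl get pod by hand, or block until the pods are Ready, which implies Running. The sketch below assumes the chart labels these pods component=hub and component=proxy. The namespace is hypothetical and the 5-minute timeout is arbitrary:

```shell
NAMESPACE="${NAMESPACE:-jhub}"  # hypothetical namespace

# Block until the hub and proxy pods report Ready, or give up after 5 minutes.
# Assumption: the chart labels these pods with component=hub / component=proxy.
kubectl wait pod \
  --namespace "$NAMESPACE" \
  --selector 'component in (hub,proxy)' \
  --for=condition=Ready \
  --timeout=300s
```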
Find the IP address you can use to access JupyterHub: run the following command until the EXTERNAL-IP of the proxy-public service is available.
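A sketch of that check, with a hypothetical namespace name:

```shell
NAMESPACE="${NAMESPACE:-jhub}"  # hypothetical namespace

# Re-run until EXTERNAL-IP shows an address instead of <pending>.
kubectl --namespace "$NAMESPACE" get service proxy-public
```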
Or, use the short form:
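The short form uses kubectl's abbreviations -n and svc. The second command extracts only the IP with a JSONPath expression. The namespace is hypothetical, and some cloud providers publish a hostname instead of an IP, in which case the field to read is hostname:

```shell
NAMESPACE="${NAMESPACE:-jhub}"  # hypothetical namespace

# -n and svc are short for --namespace and service.
kubectl -n "$NAMESPACE" get svc proxy-public

# Print only the external IP; the output is empty while the load balancer
# is still pending.
kubectl -n "$NAMESPACE" get svc proxy-public \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```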
To use JupyterHub, enter the external IP of the proxy-public service into a browser. JupyterHub is running with a default dummy authenticator, so any username and password combination will let you into the hub.
Congratulations! Now that you have basic JupyterHub running, you can extend it and optimize it in many ways to meet your needs.
Common customizations that I normally perform
Customizing User Environment
The user environment is the set of software packages, environment variables, and various files that are present when the user logs into JupyterHub. The user may also see different tools that provide interfaces to perform specialized tasks, such as JupyterLab, RStudio, RISE and others.
A Docker image built from a Dockerfile will lay the foundation for the environment that you will provide for the users. The image will for example determine what Linux software (curl, vim …), programming languages (Julia, Python, R, …) and development environments (JupyterLab, RStudio, …) are made available for use.
These are some settings I normally update for the user environment in the config.yaml file.
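A sketch of a typical singleuser section is shown below. It is appended to config.yaml with a heredoc and assumes the file does not already contain a singleuser key. The image, resource limits, and storage size are illustrative assumptions, not recommendations:

```shell
# Append a singleuser section to config.yaml (values are illustrative).
cat >> config.yaml <<'EOF'
singleuser:
  image:
    # Any Jupyter-compatible image works; pin a specific tag in production.
    name: jupyter/datascience-notebook
    tag: latest
  # Open JupyterLab by default.
  defaultUrl: "/lab"
  cpu:
    limit: 1
    guarantee: 0.5
  memory:
    limit: 2G
    guarantee: 1G
  storage:
    capacity: 10Gi
EOF
```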
For an extensive list of user environment options, consult the Zero to JupyterHub documentation on customizing the user environment.
Customizing the ingress
An Ingress routes incoming HTTP(S) traffic from outside the cluster to services inside it. A domain like jhub.citizix.com is easier for users to remember than an IP address. Add an ingress section to the config.yaml file to configure it.
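A minimal sketch of the ingress section, appended to config.yaml. It assumes an ingress controller is already running in the cluster and that DNS for the domain points at it; TLS settings are left out:

```shell
# Append an ingress section to config.yaml.
# Assumptions: an ingress controller is installed, and DNS for the host
# resolves to it.
cat >> config.yaml <<'EOF'
ingress:
  enabled: true
  hosts:
    - jhub.citizix.com
EOF
```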
Applying the custom configurations
Once done with the customization, you can apply them to the cluster by doing an upgrade.
- Make and save the changes in config.yaml.
- Run a helm upgrade:
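The upgrade command below is modelled on the one in the Zero to JupyterHub guide. The release name, namespace, and chart version are hypothetical; keep the version at the one you installed unless you also intend to upgrade the chart:

```shell
RELEASE="${RELEASE:-jhub}"              # hypothetical release name
NAMESPACE="${NAMESPACE:-jhub}"          # hypothetical namespace
CHART_VERSION="${CHART_VERSION:-1.0.0}" # the chart version already installed

# Re-render the chart with the edited config.yaml and roll the changes
# out to the existing release.
helm upgrade --cleanup-on-fail \
  "$RELEASE" jupyterhub/jupyterhub \
  --namespace "$NAMESPACE" \
  --version="$CHART_VERSION" \
  --values config.yaml
```

Afterwards, kubectl get pod --namespace "$NAMESPACE" shows whether the hub pod restarted cleanly with the new settings.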
Conclusion
We managed to set up JupyterHub in this guide. Please consult the Zero to JupyterHub with Kubernetes guide for more comprehensive instructions on setting up JupyterHub in your environment.