Explore Jupyter interface for Jetstream2
Problem / Opportunity Statement
Research institutions often face issues in providing consistent, scalable, and easy-to-access computing environments for their researchers. These issues include:
- Resource Allocation: Allocating computing resources like CPUs, GPUs, and memory across a large group of users with different needs.
- Environment Standardization: Making sure all researchers have the same software and tools to help with reproducibility and collaboration.
- Accessibility: Making it easy for researchers who do not have a lot of computer or IT knowledge to use computing tools.
- Scalability: Being able to handle different amounts of computing needs for different projects, and for different stages in the same project.
- Security and Compliance: Keeping code & data safe and following rules set by institutions and the government.
- Cost Efficiency: Lowering costs for hardware and software.
Use Cases for Deploying JupyterHubs
- Collaborative Research Projects:
  - Researchers from different institutions can work together on shared notebooks.
  - Real-time sharing and editing of Jupyter notebooks help teamwork on complex projects.
- Educational Programs and Workshops:
  - Teachers can easily give out materials and assignments using Jupyter notebooks.
  - Students can access a consistent computing environment from anywhere, making learning more flexible and accessible.
- High-Performance Computing (HPC) Access:
  - Linking with cloud-based HPC resources lets researchers expand their computations as needed.
  - Makes it easier to do large-scale simulations and data analysis.
- Reproducible Research:
  - Researchers can share notebooks that show their research results, as well as the code and data that led to those results, making it easier to reproduce scientific findings (especially when following best practices).
  - Version control systems can be integrated to track changes and manage contributions.
- Data-Intensive Research:
  - Helps manage large data sets by using cloud storage and computing resources.
  - Supports various data science and machine learning libraries and frameworks for advanced data analysis and modeling.
- Customized Computational Environments:
  - Researchers can make and manage custom environments for their specific needs without impacting others.
  - Reduces the need to maintain individual setups and makes sure all users have the latest tools and libraries.
- Grant and Research Compliance:
  - Can make it easier to ensure that computational research follows rules about sharing code and data in an accessible way.
  - Can help with the management of sensitive or proprietary data according to compliance standards.
Jupyter notebooks are very popular with (or at least very widely used by) researchers and data scientists. It is also possible to deploy other popular tools like Visual Studio Code and RStudio side-by-side with Jupyter notebooks.
By providing JupyterHubs, Jetstream2 can help meet the above needs and serve the above use cases for many research institutions, and a large segment (long tail) of researchers & data scientists. Such a platform could foster a more inclusive and effective research environment where advanced computing resources are more accessible and easier to manage. This will improve the productivity, collaboration, and innovative ability of the national research community.
Also see Jupyter's Institutional FAQ.
The vision is that as more users and more notebook environments are added to the JupyterHub, more instances (or containers, or whatever) are created to accommodate them. There is also a desire for customizability of the notebook environment, and probably a desire for efficient use of GPU slices.
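As a rough illustration of that scaling vision, the sketch below estimates how many worker instances a hub would need as users are added. It is only a back-of-the-envelope lower bound, not a scheduler: the `Resources` type, the `instances_needed` function, and every size used here (per-user requests, instance capacity, the idea of a fractional GPU share) are hypothetical and do not describe actual Jetstream2 flavors or any existing autoscaler.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resource request or capacity. All example values are hypothetical."""
    cpus: float
    memory_gib: float
    gpu_fraction: float = 0.0  # share of GPUs, e.g. 0.25 = a quarter slice


def instances_needed(user_requests, instance_capacity):
    """Lower bound on instances required to host every user's server.

    Sums each resource across users and takes the most constrained one.
    Real placement can need more instances because of fragmentation.
    """
    total_cpus = sum(r.cpus for r in user_requests)
    total_mem = sum(r.memory_gib for r in user_requests)
    total_gpu = sum(r.gpu_fraction for r in user_requests)

    needed = [
        math.ceil(total_cpus / instance_capacity.cpus),
        math.ceil(total_mem / instance_capacity.memory_gib),
    ]
    if total_gpu > 0:
        if instance_capacity.gpu_fraction <= 0:
            raise ValueError("GPU requested but the instance type has no GPU")
        needed.append(math.ceil(total_gpu / instance_capacity.gpu_fraction))
    return max(needed)


# Hypothetical instance type: 16 CPUs, 64 GiB RAM, one whole GPU.
capacity = Resources(cpus=16, memory_gib=64, gpu_fraction=1.0)
# Hypothetical notebook profile: 2 CPUs, 8 GiB RAM, a quarter GPU slice.
profile = Resources(cpus=2, memory_gib=8, gpu_fraction=0.25)

print(instances_needed([profile] * 10, capacity))  # 3 (GPU-bound)
print(instances_needed([profile] * 30, capacity))  # 8 (GPU-bound)
```

In this made-up example the GPU share is the binding constraint, which is one reason efficient use of GPU slices matters: the per-user GPU fraction largely determines how many instances have to be created.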
Resolution
Build an ACCESS-compatible Jetstream2 JupyterHub interface that researchers and educators can easily deploy and use for their work.
Next steps
Discover the functional requirements for this solution. In the spirit of #77: are there existing projects or automations to accomplish this with widespread community support and some maintainership?
Notes on Chameleon Cloud's implementation
Chameleon Cloud demonstrated their JupyterHub interface for the Jetstream2 team on a monthly NSF cloud call.
- Architecture doc
- Authentication workflow doc
- Python orchestration doc
- Slides about JupyterHub Integration
- User-facing docs
- Chameleon SSO Paper
- Paper on JupyterHub Integration
- Chameleon with Trovi
- Jupyter_and_Trovi_Recipe.pdf
From Mike Sherman:
The current deployment is based off of the “zero-to-jupyterhub with kubernetes” project and helm charts. Our scripts and customizations (without secrets and such) are stored in this repo:
https://github.com/ChameleonCloud/jupyterhub-kubernetes
It does, however, currently point to the “trovi” and “keycloak” instances we operate. On my list to find out is how to “generalize” this, and what the plan is for integration on your side. This will pull from and use our jupyterhub extension:
https://github.com/ChameleonCloud/jupyterhub-chameleon
and our customized “singleuser” container image