# Science Gateways

## About Science Gateways

A science gateway is a web-based interface for accessing HPC computers
and storage systems.  Gateways allow science teams to access data,
perform shared computations, and generally interact with NERSC
resources over the web. Common gateway goals are:

* to improve ease of use in HPC so that more scientists can benefit
  from NERSC resources
* to create collaborative workspaces around data and computing for
  science teams that use NERSC
* to make your data accessible and useful to the broader scientific
  community.

NERSC encourages its users to create their own science gateways by
using the resources described on this page. The center engages with
science teams interested in using web services, assists with
deployment, accepts feedback, and tries to recycle successful
approaches into methods that other science teams can benefit
from. Below you will find links to current projects and details about
the building blocks available to NERSC users. If you would like to
participate, or if you have questions, please open a ticket with
[NERSC Consulting](https://help.nersc.gov).

## Science Gateway Availability and Support

Developers of science gateway applications hosted at NERSC should be
aware that if their gateways critically depend on NERSC infrastructure
then their gateways will inherit availability from NERSC's underlying
infrastructure to some degree.  Some examples:

* If the Community file system is out of service for multiple days and
  a science gateway uses scripts, HTML templates, or other web content
  stored on the Community file system, then the site will not work for
  the duration of the Community outage.
* In contrast, applications that depend only on data files stored at
  NERSC (e.g. on Community) will lose only the functionality that
  those data files make possible.  In such cases it is up to
  maintainers of gateways to inform their users of any degradation in
  service.
* Applications that submit jobs to one of the supercomputer system
  queues via NEWT (see below) may be unable to support that
  functionality for the duration of a system outage; however, proper
  use of the API can allow a web site to keep working with reduced
  functionality.  It is up to gateway maintainers to handle
  graceful failures in such cases.
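One way to approach the graceful-failure case in the last item is to probe the backend before offering a feature, and to fall back to a reduced mode when it does not answer. The following is a minimal shell sketch, not NERSC-provided code: the function names `backend_up` and `gateway_mode` are hypothetical, and the status URL in the usage comment is an assumption that should be replaced with whatever endpoint your gateway actually depends on.

```shell
# Return success if the given URL answers with a non-error HTTP status
# within 5 seconds. Only standard curl options are used.
backend_up() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Print "full" when the backend responds and "degraded" otherwise, so the
# calling page or script can hide job submission instead of failing.
gateway_mode() {
    if backend_up "$1"; then
        echo full
    else
        echo degraded
    fi
}

# Usage (the URL is an assumed example, not a documented endpoint):
#   mode=$(gateway_mode "https://newt.nersc.gov/newt/status")
```

A gateway could call `gateway_mode` when rendering a page and show a notice to users whenever the result is `degraded`.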

Science gateway application developers should keep in mind that
NERSC's goal is to make sharing scientific data and high-performance
computing resources over the web practical, but NERSC is not a web
hosting service.  Developers and users should not anticipate
availability approaching what is available in commercial offerings.
The NEWT API provides one avenue for users who require >99% uptime for
a web presence that exposes NERSC resources like data or job
submission.  This offers a clean separation between the web
application and NERSC infrastructure that can be managed by
developers.

If a science gateway or website does not clearly depend on NERSC
resources or data, we encourage users to pursue other hosting
solutions.  For example, [Google Sites](https://sites.google.com) is
an excellent alternative for users seeking to establish merely a web
presence for their project.  Such simple websites are not within the
scope of science gateways at NERSC, and we do not provide support to
users attempting to set them up.

The service level for NERSC science gateway support is formally 8x5
(business hours).  Outside of those hours the NERSC Data and Analytics
Services staff provide support on a best-effort basis.

## Gateway Technologies

NERSC provides science teams with the building blocks to create their
own science gateways and web interfaces into NERSC. Many of these
interfaces are built on web and database technologies.

### Web Methods for Data

Science gateways can be configured to provide public unauthenticated
access to data sets and services as well as authenticated access, if
needed. The following features are available to projects that wish to
enable gateway access to their data through the web. Other features
can be made available on request. Direct access to the Community
file system and HPSS tape archives is described in the table below.

There are two science gateway nodes available to all NERSC users:
portal and portal-auth. They function very similarly, but the former
serves unauthenticated traffic on port 80 and the latter serves HTTPS
traffic. Both nodes are available to users for general development.
Service level agreements and dedicated resources are possible for
projects that wish to build robustly monitored web services.

NERSC Resource | Path On NERSC Resource | URL on the Web
--- | --- | ---
Community | /global/cfs/cdirs/myproj/www | https://portal.nersc.gov/cfs/myproj/
DNA file system | /global/dna/projectdirs/myproj/mysubproj/www | https://portal.nersc.gov/dna/myproj/mysubproj/
HPSS archive (home) | /home/m/myuser/www | https://portal.nersc.gov/archive/home/m/myuser/www/
HPSS archive (project) | /home/projects/myproj/www | https://portal.nersc.gov/archive/projects/myproj/www/

### Web Methods for Computing

Science gateways can use a REST-based web API
([NEWT](https://newt.nersc.gov/)) to access the NERSC center,
including authentication, file management, job submission and
accounting interfaces. These interfaces allow you to run large or
small jobs on NERSC machines through the web. The NEWT demos show how
to submit a parallel batch job through a simple HTML form. Other
programming language and web-toolkit-level building blocks include

* Full-featured back-end programming environments in the language of
  your choice (PHP or Python recommended).
* Support for LDAP and Shibboleth authentication.
* Conduits to PostgreSQL/MySQL/NoSQL databases.
* Modern Web 2.0 interfaces with AJAX front-ends such as Google Maps
  and visualization kits.
* OPeNDAP access to large data sets (netCDF and HDF5).
* Access to NERSC file systems and HPSS through the NEWT API, grid
  tools, or other custom interfaces.

### Database Methods

Science gateways can also access data from NERSC's science database
nodes. These are specially configured nodes that support MySQL,
PostgreSQL, and MongoDB for high-performance access. More detail on
the science gateway database services is provided on
the [Databases page](../databases). Some
examples of database methods used by gateways are

* Access file catalogs and other persistently stored collections from
  your batch jobs
* Connect a web-based gateway to datasets stored in a database (read
  and read-write)
* Store, search, and analyze data objects (e.g., job output) through
  map/reduce-like MongoDB methods
* Expose public read-only data collections through database protocols

For more information on databases for user science data, please submit a
question or request via the [science database request
form](https://nersc.servicenowservices.com/com.glideapp.servicecatalog_cat_item_view.do?v=1&sysparm_id=ff78364bdbdb3200b259fb0e0f9619b9&sysparm_link_parent=e15706fc0a0a0aa7007fc21e1ab70c2f&sysparm_catalog=e0d08b13c3330100c8b837659bba8fb4&sysparm_catalog_view=catalog_default).

## Science Gateways in Production

Science gateways that have moved from development to providing
services to broader communities are listed on
the [Science Gateways index page](https://portal.nersc.gov/).

Nagios monitoring and service level checks of gateway functions are
available.

## Getting Started

A [Community directory](../filesystems/community.md) is a good place
to host a science gateway. Both Community and HPSS allow users to
create a special web directory. You can publish data through a
publicly accessible URL by simply making an appropriate subdirectory
called "www". The procedure differs slightly depending on which file
system you choose, as detailed below. You can also use the NEWT API to
make web applications that use NERSC resources.

### How to publish your data on NGF to the web:

    ssh portal-auth.nersc.gov

In the above example, you can replace portal-auth with any other NERSC
compute platform that has access to Community. Create a www directory
in your Community directory:

    mkdir /global/cfs/cdirs/yourproject/www

Make sure your Community directory and the www directory are world
executable and that the www directory is also world readable. If not,
the owner of each of them will need to change its permissions:

    chmod 751 /global/cfs/cdirs/yourproject/
    chmod 755 /global/cfs/cdirs/yourproject/www
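The permission requirements above can be verified before publishing. The sketch below is illustrative only: `other_perms` and `check_www_perms` are hypothetical helper names, and the check reads the mode string printed by `ls -ld` rather than relying on any NERSC-specific tool.

```shell
# Print the "other" permission triplet (e.g. "r-x") of a file or directory,
# taken from characters 8-10 of the mode string shown by ls -ld.
other_perms() {
    ls -ld -- "$1" | cut -c8-10
}

# Succeed only if the project directory is world executable and its www
# subdirectory is both world readable and world executable.
check_www_perms() {
    proj_dir=$1
    www_dir=$proj_dir/www
    case "$(other_perms "$proj_dir")" in
        ??[xt]) ;;
        *) echo "$proj_dir is not world executable" >&2; return 1 ;;
    esac
    case "$(other_perms "$www_dir")" in
        r?[xt]) ;;
        *) echo "$www_dir is not world readable and executable" >&2; return 1 ;;
    esac
}

# Usage:
#   check_www_perms /global/cfs/cdirs/yourproject && echo "permissions look fine"
```

Only the directory modes are checked here; individual files placed under www still need to be world readable to be served.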

Copy your data to this www directory. Any public data will need to be
world readable. Add PHP and HTML files to this directory to build
custom gateway interfaces to the data. Any data under
`/global/cfs/cdirs/yourproject/www` will be publicly
accessible through `https://portal.nersc.gov/cfs/yourproject/`.
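The path-to-URL mapping described above can be expressed as a small shell function, which is convenient when generating links to published files. This is a sketch based only on the Community mapping stated in this section; the function name `cfs_path_to_url` is hypothetical.

```shell
# Map a path inside a Community www directory to its public portal URL.
# Prints the URL on success; returns 1 for paths outside a www directory.
cfs_path_to_url() {
    path=$1
    rest=${path#/global/cfs/cdirs/}          # e.g. yourproject/www/plots/a.png
    [ "$rest" != "$path" ] || {
        echo "not a Community path: $path" >&2
        return 1
    }
    proj=${rest%%/*}                          # e.g. yourproject
    case "$rest" in
        "$proj/www")   sub= ;;
        "$proj/www/"*) sub=${rest#"$proj/www/"} ;;
        *) echo "not under a www directory: $path" >&2; return 1 ;;
    esac
    printf 'https://portal.nersc.gov/cfs/%s/%s\n' "$proj" "$sub"
}

# Usage:
#   cfs_path_to_url /global/cfs/cdirs/yourproject/www/plots/a.png
```

Rejecting paths outside the www directory keeps the helper from producing links to files that the portal does not serve.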

### How to publish data in HPSS to the web:

You can also publish data in the HPSS archive system directly to a
public URL on the web. Note that this is not intended to be a
high-performance interface; it is just a quick way to make data
publicly available.

Generally we recommend that users share data from the Community File
System when creating a science gateway. Sharing data from the HPSS
tape archives via a science gateway should be reserved for
infrequent accesses to a data pool that is too large to be
practically kept on the Community File System. If you need to serve
very large files very frequently via a science gateway, please
contact [NERSC Consulting](https://help.nersc.gov) for assistance.

Retrieving data from HPSS via a science gateway can be very slow. If
files have not been accessed in some time they will have to be
retrieved from tape. If you are accessing multiple files, multiple
tapes may need to be read, and special care is needed to retrieve the
data files in an efficient order. Finally, the number of concurrent
connections per IP address is limited to two. All of these factors can
combine to cause long delays in file retrieval from HPSS via a NERSC
web portal.

Log in to the archive via hsi:

    hsi -h archive.nersc.gov

Create a www directory:

    mkdir /home/projects/DIRNAME/www

Make sure the parent directory and the www directory are world
executable and that the www directory is also world readable. If not,
the owner of each of them will need to change its permissions:

    chmod 751 /home/projects/DIRNAME
    chmod 755 /home/projects/DIRNAME/www

The data in the www directory will now be available at a URL of the
form
`https://portal.nersc.gov/archive/home/projects/DIRNAME/www/{FILE|DIR}`
where DIRNAME is the project directory and FILE|DIR is the name of a
file or directory.

Files will be downloaded directly, while directories will give you a
listing. Note that all files and directories in the path must be world
readable.

Here is an example:
https://portal.nersc.gov/archive/home/projects/incite11/www/1935

For a home directory in HPSS, the permissions should be as follows 
(where i is the initial of your home directory's name):

    chmod 751 /home/i/HOMEDIRNAME
    chmod 755 /home/i/HOMEDIRNAME/www
    
The data in www should then be available at a URL of the form
https://portal.nersc.gov/archive/home/HOMEDIRNAME/www/{FILE|DIR}.

!!! note
    Downloads of files stored on tape may take some time to start
    while the tape robot finds and mounts the correct tape.

### How to get started with NEWT

To build more sophisticated web apps, we recommend using
the [NEWT API](https://newt.nersc.gov/), which allows you to build
rich, interactive JavaScript applications that can communicate
directly with NERSC HPC resources via a RESTful Web API. This includes
access to authentication, jobs, files, interactive commands, system
information, accounting information and object storage.

To get started, insert the following in your HTML files to give you
access to all NERSC compute and data resources through NEWT:

    <script src="https://newt.nersc.gov/js/jquery-1.7.2.js"></script>
    <script src="https://newt.nersc.gov/js/newt.js"></script>

Follow the "Hello World" example
at [https://newt.nersc.gov/](https://newt.nersc.gov/), or work through
some of the fuller examples at
[https://newt.nersc.gov/examples/](https://newt.nersc.gov/examples/)
to get a feel for how NEWT works. The complete NEWT API docs can be
found at [https://newt.nersc.gov/api](https://newt.nersc.gov/api).

## Moving Beyond Simple Gateway Functions

If you are building a web gateway to your science at NERSC, please
contact us by opening a ticket
with [NERSC Consulting](https://help.nersc.gov). We are interested in
engaging directly with science teams so that you can build a gateway
that meets your specific needs.