Commit f8cc9c38 authored by Mat

Add support for file:// and shm://

Previous versions only used POSIX shared memory as data storage for
the arrays. This adds the option of using regular files instead of
shared memory. Regular files are cached in memory by the kernel and
therefore there shouldn't be too much performance difference. A key
benefit is data persistence across reboots.
parent 59d46b5d
+61 −49
SharedArray python/numpy extension
==================================
# SharedArray python/numpy extension

This is a simple python extension that lets you share numpy arrays
with other processes on the same computer.  It uses posix shared
memory internally and therefore should work on most operating systems.
with other processes on the same computer. It uses either shared files
or POSIX shared memory as data stores and therefore should work on
most operating systems.

Example
-------
## Example

Here's a simple example to give an idea of how it works. This example
does everything from a single python interpreter for the sake of
clarity, but the actual intentention is to share arrays between
clarity, but the real point is to share arrays between python
interpreters.

	import numpy as np
	import SharedArray as sa

	# Create an array in shared memory
	a = sa.create("test1", 10)
	a = sa.create("shm://test", 10)

	# Attach it as a different array. This can be done from another
	# python interpreter as long as it runs on the same computer.
	b = sa.attach("test1")
	b = sa.attach("shm://test")

	# See how they are actually sharing the same memory block
	a[0] = 42
@@ -31,55 +30,70 @@ interpreters.
	del a
	print(b[0])

	# See how "test1" is still present in shared memory even though we
	# See how "test" is still present in shared memory even though we
	# destroyed the array a.
	sa.list()

	# Now destroy the array "test1" from memory.
	sa.delete("test1")
	# Now destroy the array "test" from memory.
	sa.delete("test")

	# The array b is not affected, but once you destroy it then the
	# data are lost.
	print(b[0])

Functions
---------
## Functions

### `SharedArray.create(name, shape, dtype=float)`

This function creates an array in shared memory identified by `name`.
The `shape` and `dtype` arguments are the same as the numpy function
`numpy.zeros()`.  The returned array is initialized to zero.  The
shared memory block holding the content of the array will not be
deleted when this array is destroyed, either implicitly or explicitly
by calling `del`, it will simply be detached from the current process.
To delete a shared array use the `SharedArray.delete()` function.
This function creates an array identified by `name`, which can use the
`file://` prefix to indicate that the data backend will be a file, or
`shm://` to indicate that the data backend shall be a POSIX shared
memory object. For backward compatibility `shm://` is assumed when no
prefix is given. The `shape` and `dtype` arguments are the same as the
numpy function `numpy.zeros()` and the returned array is indeed
initialized to zero.

The contents of the array will not be deleted when this array is
destroyed, either implicitly or explicitly by calling `del`, it will
simply be detached from the current process.  To delete a shared array
and therefore reclaim system resources use the `SharedArray.delete()`
function.
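
The file-backed creation described above can be approximated with the standard library alone. This is a minimal sketch, not SharedArray's implementation: `create_zeroed` is a hypothetical helper, it assumes a POSIX system, and it ignores the metadata that SharedArray stores alongside the array data. It shows exclusive creation, growing the file to size, and the zero-filled result.

```python
import mmap
import os
import tempfile


def create_zeroed(path, nbytes):
    """Create a new file of nbytes zero bytes and return a writable mapping.

    Hypothetical helper for illustration only.
    """
    # O_EXCL makes the call fail if the file already exists.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        # Growing an empty file with ftruncate() fills it with zero bytes.
        os.ftruncate(fd, nbytes)
        return mmap.mmap(fd, nbytes)
    finally:
        # The mapping stays valid after the descriptor is closed.
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    buf = create_zeroed(path, 80)  # room for ten 8-byte floats
    was_zeroed = buf[:] == bytes(80)
    try:
        create_zeroed(path, 80)
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True
    buf.close()

print(was_zeroed, second_create_failed)
```

Creating the same name twice fails instead of silently reusing the data, which matches the `O_CREAT | O_EXCL` flags visible in the `do_create` hunk of this commit.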

### `SharedArray.attach(name)`

This function attaches an array previously created in shared memory
and identified by `name`.  The shared memory block holding the content
of the array will not be deleted when this array is destroyed, either
implicitly or explicitly by calling `del`, it will simply be detached
from the current process.  To delete a shared array use the
`SharedArray.delete()` function.
This function attaches the previously created array identified by
`name`, which can use the `file://` prefix to indicate that the array
is stored as a file, or `shm://` to indicate that the array is stored
as a POSIX shared memory object. For backward compatibility `shm://`
is assumed when no prefix is given.

The contents of the array will not be deleted when this array is
destroyed, either implicitly or explicitly by calling `del`, it will
simply be detached from the current process.  To delete a shared array
and therefore reclaim system resources use the `SharedArray.delete()`
function.
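
Attaching amounts to mapping data that already exists. The sketch below uses only the standard library and assumes a POSIX system; `attach_file` is a hypothetical helper, not part of SharedArray. It maps one file twice and shows that a write through one mapping is visible through the other.

```python
import mmap
import os
import tempfile


def attach_file(path):
    """Map an existing file read-write (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, 0)  # length 0 maps the whole file
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as f:
        f.write(bytes(16))

    a = attach_file(path)
    b = attach_file(path)  # could equally be done from another process
    a[0] = 42
    seen_by_b = b[0]
    a.close()
    b.close()

print(seen_by_b)
```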

### `SharedArray.delete(name)`

This function destroys an array previously created in shared memory
and identified by `name`.  After calling `delete`, the array will not
be attachable anymore, but existing attachments will remain valid
until they are themselves destroyed.
This function destroys the previously created array identified by
`name`, which can use the `file://` prefix to indicate that the array
is stored as a file, or `shm://` to indicate that the array is stored
as a POSIX shared memory object. For backward compatibility `shm://`
is assumed when no prefix is given.

After calling `delete`, the array will not be attachable anymore, but
existing attachments will remain valid until they are themselves
destroyed.
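
The behaviour described above follows from POSIX unlink semantics: removing a name does not invalidate mappings that already exist. A minimal standard-library sketch, assuming a POSIX system and using a plain temporary file rather than SharedArray itself:

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as f:
        f.write(bytes(16))

    # Map the file, then drop the descriptor; the mapping remains usable.
    fd = os.open(path, os.O_RDWR)
    mapping = mmap.mmap(fd, 0)
    os.close(fd)
    mapping[0] = 42

    # Remove the name. The existing mapping keeps the data alive.
    os.unlink(path)
    still_readable = mapping[0]

    # A new attachment by name is no longer possible.
    try:
        os.close(os.open(path, os.O_RDWR))
        reattach_failed = False
    except FileNotFoundError:
        reattach_failed = True

    mapping.close()  # the storage is reclaimed once the last mapping goes away

print(still_readable, reattach_failed)
```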

### `SharedArray.list()`

This function returns a list of previously created shared arrays,
their name, data type and dimensions.  At the moment this function
only works on Linux because it accesses files exposed under
`/dev/shm`.  There doesn't seem to be a portable method of doing that.
This function returns a list of previously created arrays stored as
POSIX SHM objects, along with their name, data type and dimensions.
At the moment this function only works on Linux because it accesses
files exposed under `/dev/shm`.  There doesn't seem to be a portable
method of doing that.

Requirements
------------
## Requirements

* Python 2.7 or 3+
* Numpy 1.8
@@ -89,23 +103,22 @@ SharedArray uses the posix shm interface (`shm_open` and `shm_unlink`)
and so should work on most operating systems that follow the posix
standards (Linux, *BSD, etc.).

FAQ
---
## FAQ

### On Linux, I get segfaults when working with very large arrays.

A few people have reported segfaults with very large arrays. To my
great relief I eventually found out that this is not a bug in
SharedArray but rather an indication that the system ran out of POSIX
shared memory. On Linux a `tmpfs` virtual filesystem is used to
provide POSIX shared memory, and by default it is given only about 20%
of the total available memory. That amount can be changed by
re-mounting the `tmpfs` filesystem with the `size` option:
A few people have reported segfaults with very large arrays using
POSIX shared memory. This is not a bug in SharedArray but rather an
indication that the system ran out of POSIX shared memory. On Linux a
`tmpfs` virtual filesystem is used to provide POSIX shared memory, and
by default it is given only about 20% of the total available memory,
depending on the distribution. That amount can be changed by
re-mounting the `tmpfs` filesystem with the `size=100%` option:

	sudo mount -o remount,size=100% /run/shm

Also you can make the change permanent, on next boot, by setting
`SHM_SIZE=100%` in `/etc/default/tmpfs` on recent Debian/Devuan
`SHM_SIZE=100%` in `/etc/default/tmpfs` on recent Debian
installations.
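
One way to avoid running out of space is to check the free space of the backing filesystem before creating a large array. The sketch below uses `shutil.disk_usage` from the standard library; `has_room` is a hypothetical helper, and the mount points it probes (`/dev/shm`, `/run/shm`) are assumptions that vary between distributions.

```python
import os
import shutil
import tempfile


def has_room(path, nbytes):
    """Return True if the filesystem holding path has at least nbytes free."""
    return shutil.disk_usage(path).free >= nbytes


# Sanity checks against the temporary directory, which always exists.
tmp_has_zero = has_room(tempfile.gettempdir(), 0)
tmp_has_yobibyte = has_room(tempfile.gettempdir(), 2 ** 80)

# Space needed for a 1000 x 1000 array of 8-byte floats.
needed = 1000 * 1000 * 8
for candidate in ("/dev/shm", "/run/shm"):
    if os.path.isdir(candidate):
        print(candidate, has_room(candidate, needed))
```

The check is only advisory: another process can consume the space between the check and the allocation.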

### I can't attach old (pre 0.4) arrays anymore.
@@ -116,8 +129,7 @@ arrays created with a previous version of SharedArray aren't
compatible with the new version (the location of the metadata
changed). Save your work before upgrading.

Installation
------------
## Installation

The extension uses the `distutils` python package that should be
familiar to most python users. To test the extension directly from the
+8 −0
@@ -19,6 +19,10 @@
#ifndef __SHARED_ARRAY_H__
#define __SHARED_ARRAY_H__

#define NPY_NO_DEPRECATED_API	NPY_1_8_API_VERSION
#define PY_ARRAY_UNIQUE_SYMBOL	SHARED_ARRAY_ARRAY_API
#define NO_IMPORT_ARRAY

#include <Python.h>
#include <structseq.h>
#include <numpy/arrayobject.h>
@@ -57,4 +61,8 @@ extern PyObject *shared_array_attach(PyObject *self, PyObject *args);
extern PyObject *shared_array_delete(PyObject *self, PyObject *args);
extern PyObject *shared_array_list(PyObject *self, PyObject *args);

/* Support functions */
extern int open_file(const char *name, int flags, mode_t mode);
extern int unlink_file(const char *name);

#endif /* !__SHARED_ARRAY_H__ */
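
The bodies of `open_file` and `unlink_file` are not shown in this diff. Going by the README text, they presumably select a backend from the name prefix. The Python sketch below illustrates only that prefix rule; `split_name` is a hypothetical helper and not a transcription of the C code.

```python
def split_name(name):
    """Split 'file://path' or 'shm://name' into (backend, location).

    Hypothetical helper: names without a prefix fall back to 'shm',
    as the README describes for backward compatibility.
    """
    for prefix in ("file://", "shm://"):
        if name.startswith(prefix):
            # Strip the trailing '://' to get the backend label.
            return prefix[:-3], name[len(prefix):]
    return "shm", name


examples = {
    "shm://test": split_name("shm://test"),
    "test": split_name("test"),
    "file:///tmp/test": split_name("file:///tmp/test"),
}
for name, parsed in examples.items():
    print(name, "->", parsed)
```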
+2 −2
@@ -40,8 +40,8 @@ static PyObject *do_attach(const char *name)
	PyObject *ret;
	PyLeonObject *leon;

	/* Open the shm block */
	if ((fd = shm_open(name, O_RDWR, 0)) < 0)
	/* Open the file */
	if ((fd = open_file(name, O_RDWR, 0)) < 0)
		return PyErr_SetFromErrnoWithFilename(PyExc_OSError, name);

	/* Seek to the meta data location */
+2 −2
@@ -56,8 +56,8 @@ static PyObject *do_create(const char *name, int ndims, npy_intp *dims, PyArray_
	/* Calculate the size of the mmap'd area */
	map_size = size + sizeof (*meta);

	/* Create the shm block */
	if ((fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0666)) < 0)
	/* Create the file */
	if ((fd = open_file(name, O_RDWR | O_CREAT | O_EXCL, 0666)) < 0)
		return PyErr_SetFromErrnoWithFilename(PyExc_OSError, name);

	/* Grow the file */
+4 −4
@@ -36,8 +36,8 @@ static PyObject *do_delete(const char *name)
	int fd;
	int size;

	/* Open the shm block */
	if ((fd = shm_open(name, O_RDWR, 0)) < 0)
	/* Open the file */
	if ((fd = open_file(name, O_RDWR, 0)) < 0)
		return PyErr_SetFromErrnoWithFilename(PyExc_OSError, name);

	/* Seek to the meta data location */
@@ -66,8 +66,8 @@ static PyObject *do_delete(const char *name)
		return NULL;
	}

	/* Unlink the shm block */
	if (shm_unlink(name) < 0)
	/* Unlink the file */
	if (unlink_file(name) < 0)
		return PyErr_SetFromErrnoWithFilename(PyExc_OSError, name);

	Py_RETURN_NONE;