Commit 4571164b authored by gerd


git-svn-id:[email protected] 55289a75-7b90-4627-9e07-ffb4263930b2
parent 5db2ce8b
@@ -133,7 +133,7 @@ BUG "Cannot sync as fast as configured" at namenode startup
 Plasma_client/other clients: configurable retry constant
-rename directories
+rename directories - DONE
 documentation for release:
 - quickstart
@@ -10,14 +10,17 @@ LOAD = \
 -load $(SRC)/pfs_nfs3/pfs_nfs3.idoc \
 -load $(SRC)/mr_framework/mapred.idoc
-DOCS = plasmafs_start.txt \
+DOCS = plasma_release.txt \
+plasmafs_start.txt \
 plasmafs_deployment.txt \
 plasmafs_protocol.txt \
 commands/cmd_plasma.txt \
 commands/cmd_plasmad.txt \
 commands/cmd_plasma_datanode_init.txt \
 commands/cmd_plasma_admin.txt \
-commands/cmd_nfs3d.txt
+commands/cmd_nfs3d.txt \
+plasmamr_start.txt \
 IPC = ../ipc
 X = $(IPC)/pfs_types.x \
@@ -3,6 +3,9 @@ Plasma known to interested developers. This release contains:
 - {{:#l_pfs_main} PlasmaFS: filesystem}
 - {{:#l_pmr_main} PlasmaMapReduce: compute framework}
+General documents:
+- {!Plasma_release}: Release notes
 {1:l_pfs_main PlasmaFS Documentation}
 PlasmaFS is the distributed transactional filesystem. It is implemented
@@ -123,6 +126,13 @@ the server):
 {1:l_pmr_main Plasma MapReduce}
+Plasma MapReduce is the compute framework for Map/Reduce-style data
+transformations. It uses PlasmaFS for storing the data:
+- {!Plasmamr_start}: What is Map/Reduce?
+- {!Plasmamr_howto}: How to run a Map/Reduce job
 {2 Plasma MapReduce Interfaces}
 {3 [mr_framework]: Framework}
{1 Release Notes For Plasma}
{b This is version:} 0.1 "vorfreude". This is an alpha release to make
Plasma known to interested developers.
{2 What is working and not working in PlasmaFS}
Generally, PlasmaFS works as described in the documentation. Crashes
have not been observed for quite some time now, but occasionally one
might see critical exceptions in the log file.
PlasmaFS has so far only been tested on 64 bit machines, and only with
Linux as the operating system. There are known issues on 32 bit machines;
in particular, the blocksize must not be larger than 4M.
Data safety: Cannot be guaranteed. Putting valuable data into PlasmaFS
is not recommended.
Known problems:
- It is still unclear whether the timeout settings are acceptable.
- There might be name clashes for generated file names. Right now it is
assumed that the random number generator returns unique names, but this
is certainly not guaranteed.
- The generated inode numbers are not necessarily unique after namenode
- The transaction semantics are not fully clear. Some operations use
"read committed", others "repeatable read".
Not implemented features:
- The namenodes cannot yet detect crashed datanodes. Datanodes are always
reported as alive.
- The ticket system is not fully implemented (support for "read").
- There is no authorization system (file access rights are ignored)
- There is no authentication system to secure filesystem accesses (such
as Kerberos)
- There are too many hard-coded constants.
- The [names] SQL table should also store the parent inode number.
- The file name read/lookup functions should never return [ECONFLICT]
- Write support for NFS
- Translation of access rights to NFS
- Support for checksums
- Support for "host groups", so that it is easier to control which machines
may store which blocks. The semantics have yet to be specified.
- Better block allocation algorithms. The current algorithm works well only
when many blocks are allocated at once. It performs very badly when a file
is extended block by block.
- Define how blocks are handled that are allocated but never written.
- Support for resolving symbolic links
- Recognition of the death of the coordinator, and restart of the
election algorithm.
- Multicast discovery of datanodes
- Lock manager (avoid that clients have to busy wait on locks)
- Restoration of missing replicas
- Rebalancing of the cluster
- Automated copying of the namenode database to freshly added namenode slaves
{2 What is working and not working in Plasma MapReduce}
Known problems:
- Management of available RAM is not yet sufficient
Not implemented features:
- Task servers should be able to provide several kinds of jobs
- Think about dynamically extensible task servers
- Run jobs that define only [map] but no [reduce].
- Streaming mode as in Hadoop
- Support for combining (an additional fold function run after each
shuffle task to reduce the amount of data)
- nice web interface
- per-task log files (stored in PlasmaFS)
- support user counters as in Hadoop
- restart/relocation of failed tasks
- recompute intermediate files that are no longer accessible due to node
- Support chunk sizes larger than the block size
- Speculative execution of tasks
- Support job management (remember which jobs have been run etc.)
- Support to execute several jobs at once
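
The "combining" item in the list above describes a fold function that shrinks the data after a shuffle step. As a rough illustration of the idea, the sketch below merges adjacent records with equal keys in an already sorted list. It is an assumption-laden example: `combine` and its `fold` argument are hypothetical and are not part of the [mr_framework] interface, and records are modeled as plain in-memory key/value pairs rather than files in PlasmaFS.

```ocaml
(* Hypothetical sketch; not the mr_framework API. *)

(* [combine ~fold records] merges adjacent records with equal keys using
   [fold]. The input is assumed to be sorted by key, as it would be after
   a sort or shuffle step. *)
let combine ~fold records =
  let rec go acc = function
    | [] -> List.rev acc
    | (k, v) :: rest ->
      (match acc with
       | (k', v') :: acc' when k' = k ->
         (* Same key as the previous record: fold the values together. *)
         go ((k, fold v' v) :: acc') rest
       | _ ->
         (* New key: start a new output record. *)
         go ((k, v) :: acc) rest)
  in
  go [] records

let () =
  (* Word-count style input: one record per occurrence. *)
  let counts = [("a", 1); ("a", 1); ("b", 1); ("c", 1); ("c", 1)] in
  List.iter
    (fun (k, n) -> Printf.printf "%s %d\n" k n)
    (combine ~fold:( + ) counts)
```

For this to be safe, `fold` has to be associative, because the same function may be applied again to partially combined results in a later step.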
What we will never implement:
- Jobs consisting only of [reduce] but no [map] cannot be supported
due to the task scheme. (Reason: Input files for sort tasks must
not exceed [sort_limit].)