Currently, to use Hadoop with GlusterFS, the Hadoop Map/Reduce daemons, viz. the TaskTracker and JobTracker, need to run as the super-user. This is required to be able to mount/unmount the GlusterFS volume and to access/modify data in it. In contrast, using Hadoop with HDFS has no such limitation: the daemons can run as any user and still have full permission over the FS.
This problem can be solved by using Mountbroker. A detailed explanation of its working is given here: https://gist.github.com/71ff8faa041425662185
In short, it allows an unprivileged process to own a GlusterFS mount. This is done by registering a label (and DSL options) with glusterd (via the glusterd volfile). A mount request can then be sent to glusterd from the CLI to obtain an alias (symlink) of the mounted volume. This alias is later used to umount the volume.
Mountbroker has a pre-defined DSL for geo-replication. Pre-defining a DSL for Hadoop-specific options would be a good approach.
Config DSL for Hadoop:
"SUP("
"volfile-server=%s "
"volfile-id=%s "
"user-map-root=%s "
")"
"SUB+("
"log-file="DEFAULT_LOG_FILE_DIRECTORY"/"GHADOOP"*/* "
"log-level=* "
")"
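If the semantics follow the existing geo-replication DSL, the SUP() entries would be filled in from the parameters of the mount request, while the SUB+() entries constrain optional client-supplied options against the given patterns (e.g. any log-file option would have to match the DEFAULT_LOG_FILE_DIRECTORY/GHADOOP*/* pattern). A request under this DSL might then carry options such as (hypothetical values):

    volfile-server=gfs1.example.com volfile-id=hive user-map-root=mapred log-level=INFO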
glusterd options:
option mountbroker-root <path>
option mountbroker-glusterfs-hadoop.foo <volume>:<user>:<volfile-server> # omitting <volfile-server> uses localhost as the --volfile-server arg
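As a concrete sketch of the two options above, assuming a volume hive, an unprivileged user mapred, and a server gfs1.example.com (all hypothetical values):

    option mountbroker-root /var/mountbroker-root
    option mountbroker-glusterfs-hadoop.foo hive:mapred:gfs1.example.com

Here foo is the label that the mount request later refers to.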
This would require some additions to this patch: http://review.gluster.com/#change,128
Once the above configuration is done, the user mounts the volume with:

gluster> system:: mount foo user-map-root=<user> volfile-id=<volume> volfile-server=<volfile-server>

which returns an alias such as:

/mnt/mbr/mb_hive/mntUKSQlK

This mount alias goes into the Hadoop configuration file, through which the plugin does all I/O.
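For illustration, assuming the GlusterFS Hadoop plugin reads its mount point from a property in core-site.xml (the property name here is hypothetical), the alias would be wired in roughly as:

    <property>
      <name>fs.glusterfs.mount</name>
      <value>/mnt/mbr/mb_hive/mntUKSQlK</value>
    </property>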