Authorization in Hadoop using Apache Ranger


Summary

Over the past couple of years, Apache Hadoop has made great progress in the area of security. Security for any computing system can be divided into two categories:

  1. Authentication is the process of ascertaining that somebody really is who they claim to be. In Hadoop, this is achieved by Kerberizing the cluster. This post does not cover authentication, as ample material is already available online.
  2. Authorization refers to rules that determine who is allowed to do what. For example, Manoj may be authorized to create and delete databases, while Terence is only authorized to read tables.

For those coming from the traditional database world, authorization is a well-understood concept. It involves creating roles and granting privileges at various levels on a variety of objects. The Apache Hadoop community is emulating some of those concepts in the recently incubated Apache Ranger project. We recently implemented Ranger at one of our customer sites, and we describe that experience here.
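The role-and-grant model can be pictured with a few lines of Python. This is a hypothetical illustration of the general concept, using the two users from the example above; the role names, objects and the `is_authorized` function are made up and are not part of Ranger or of any database.

```python
# Hypothetical role -> set of (action, object) grants, in the spirit of SQL GRANTs.
ROLE_GRANTS = {
    "db_admin": {("create", "database"), ("delete", "database"), ("select", "table")},
    "analyst": {("select", "table")},
}

# Hypothetical user -> roles assignment.
USER_ROLES = {
    "manoj": {"db_admin"},
    "terence": {"analyst"},
}


def is_authorized(user: str, action: str, obj: str) -> bool:
    """Return True if any role held by the user grants (action, obj)."""
    return any(
        (action, obj) in ROLE_GRANTS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_authorized("manoj", "delete", "database"))    # True
print(is_authorized("terence", "delete", "database"))  # False
```

Granting privileges to roles rather than to individual users keeps the rules manageable as people join or change teams.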

What does Ranger do?

Ranger provides a centralized way to manage security across the various components of a Hadoop cluster. Currently, it provides authorization as well as auditing for the HDFS, Hive, HBase, Knox and Storm services. At its core, Ranger is a centralized web application consisting of policy administration, audit and reporting modules. Authorized users can manage their security policies using the web tool or the REST APIs. These policies are enforced within the Hadoop ecosystem by lightweight Ranger Java plugins, which run inside the same process as the NameNode (HDFS), HiveServer2 (Hive), HBase server (HBase), Nimbus server (Storm) and Knox server (Knox). Thus there is no additional OS-level process to manage. This design also means there is no single point of failure: if the web application or the policy database goes down, security is not compromised; the security administrator simply cannot push new policies until it is back.
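The failure behaviour described above follows from the plugins enforcing a locally held copy of the policies. The Python sketch below illustrates the idea under stated assumptions: `fetch` stands in for whatever call the plugin uses to download policies from the Policy Manager, and the JSON cache file and the `PolicyCache` class are hypothetical, not Ranger's actual classes or file format.

```python
import json
import os
import tempfile
from typing import Callable, Optional


class PolicyCache:
    """Hypothetical sketch: enforce the last known policies when the admin is unreachable."""

    def __init__(self, fetch: Callable[[], dict], cache_file: str):
        self.fetch = fetch            # stand-in for a download from the Policy Manager
        self.cache_file = cache_file  # local copy kept on disk by the plugin
        self.policies: Optional[dict] = None

    def refresh(self) -> None:
        try:
            fresh = self.fetch()
        except OSError:
            # Policy Manager or policy database down: keep the in-memory policies,
            # or load the local copy if this process has none yet (e.g. after a restart).
            if self.policies is None and os.path.exists(self.cache_file):
                with open(self.cache_file) as f:
                    self.policies = json.load(f)
            return
        self.policies = fresh
        with open(self.cache_file, "w") as f:
            json.dump(fresh, f)  # persist so a restarted process can start from it


def admin_down() -> dict:
    raise ConnectionError("Policy Manager unreachable")  # ConnectionError is an OSError


cache_path = os.path.join(tempfile.mkdtemp(), "hdfs-policies.json")
PolicyCache(lambda: {"TestPolicy": {"path": "/user/mmurumkar"}}, cache_path).refresh()

restarted = PolicyCache(admin_down, cache_path)  # e.g. NameNode restarted while the admin is down
restarted.refresh()
print(restarted.policies)  # {'TestPolicy': {'path': '/user/mmurumkar'}}
```

A failed refresh never clears the policies already in effect; it only means new policies are not picked up, which matches the behaviour described above.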

Components of Ranger

There are three main components of Ranger:

  1. Portal/Policy Manager is the central UI for security administration. Users create and update policies, which are then stored in a policy database. Plugins within each component poll these policies at regular intervals.
  2. Plugins are lightweight Java programs embedded in the processes of each cluster component. For example, the HDFS plugin runs as part of the NameNode process. The plugins pull policies from the Policy Manager at regular (configurable) intervals and store them locally in a file. Whenever a request is made for a resource, the plugin intercepts the request and evaluates it against the security policies in effect. Plugins also collect data about every request and send it to the configured audit destinations (database and/or HDFS) in a separate thread.
  3. User/Group Sync is a utility that synchronizes users and groups from OS/LDAP/AD. This information is used when defining policies (we'll see an example shortly).
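Item 2 describes a pattern where the request is evaluated synchronously while the audit record is handed to a separate thread, so auditing does not slow the request down. A minimal Python sketch of that pattern follows; the `AsyncAuditor` class and `evaluate` function are hypothetical, and an in-memory list stands in for the real audit destinations.

```python
import queue
import threading
from typing import Callable, Dict, List, Set


class AsyncAuditor:
    """Hypothetical sketch: request threads enqueue records, one worker ships them."""

    def __init__(self, sink: Callable[[dict], None]):
        self.sink = sink
        self.q: queue.Queue = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def log(self, record: dict) -> None:
        self.q.put(record)  # returns immediately; the request is not blocked

    def _run(self) -> None:
        while True:
            record = self.q.get()
            if record is None:  # sentinel: stop the worker
                break
            self.sink(record)

    def close(self) -> None:
        self.q.put(None)
        self.worker.join()  # wait until every queued record has been shipped


def evaluate(user: str, path: str, allowed: Dict[str, Set[str]]) -> bool:
    return user in allowed.get(path, set())


shipped: List[dict] = []
auditor = AsyncAuditor(shipped.append)
allowed = {"/user/mmurumkar": {"svemuri"}}
for user in ("svemuri", "rbolla"):
    result = evaluate(user, "/user/mmurumkar", allowed)  # decision made on the request thread
    auditor.log({"user": user, "resource": "/user/mmurumkar", "allowed": result})
auditor.close()
print(shipped)
```

A single worker thread draining a FIFO queue keeps audit records in request order.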

Setup

Ranger Admin

Ranger software is already included in the HDP 2.2 repos. Find the Ranger policy admin package (assuming you have set up the yum repos correctly):

yum search ranger
====================================================== N/S Matched: ranger =========================================
ranger_2_2_0_0_1947-admin.x86_64 : Web Interface for Ranger
ranger_2_2_0_0_1947-debuginfo.x86_64 : Debug information for package ranger_2_2_0_0_1947
ranger_2_2_0_0_1947-hbase-plugin.x86_64 : ranger plugin for hbase
ranger_2_2_0_0_1947-hdfs-plugin.x86_64 : ranger plugin for hdfs
ranger_2_2_0_0_1947-hive-plugin.x86_64 : ranger plugin for hive
ranger_2_2_0_0_1947-knox-plugin.x86_64 : ranger plugin for knox
ranger_2_2_0_0_1947-storm-plugin.x86_64 : ranger plugin for storm
ranger_2_2_0_0_1947-usersync.x86_64 : Synchronize User/Group information from Corporate LD/AD or Unix

Install the admin module

yum install ranger_2_2_0_0_1947-admin

In the installation directory (/usr/hdp/current/ranger-admin/), edit the install.properties file:

SQL_COMMAND_INVOKER=mysql
SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

# DB password for the DB admin user-id
db_root_user=root
db_root_password=<password>
db_host=<database host>

#
# DB UserId used for the XASecure schema
#
db_name=ranger
db_user=rangeradmin
db_password=<password>

# DB UserId for storing auditlog information
#
# * audit_db can be same as the XASecure schema db
# * audit_db must exist in the same ${db_host} as xaserver database ${db_name}
# * audit_user must be a different user than db_user (as audit user has access to only audit tables)
#
audit_db_name=ranger_audit
audit_db_user=rangerlogger
audit_db_password=<password>

#
# ------- PolicyManager CONFIG ----------------
#

policymgr_external_url=http://<portal host>:6080
policymgr_http_enabled=true

#
# ------- UNIX User CONFIG ----------------
#
unix_user=ranger
unix_group=ranger

#
# ** The xasecure-unix-ugsync package can be installed after the policymanager installation is finished.
#
#LDAP|ACTIVE_DIRECTORY|UNIX|NONE
authentication_method=NONE
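Before running setup.sh, it can help to check that no `<...>` placeholder was left behind and that audit_db_user differs from db_user, as the comment in the file requires. The Python sketch below is a hypothetical helper; it uses a simplified key=value parser rather than the full Java properties-file rules.

```python
import re
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified parser: skips blank lines and # comments, splits on the first '='."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_install_properties(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    props = parse_properties(text)
    problems = [
        f"{key} still has a placeholder: {value}"
        for key, value in props.items()
        if re.fullmatch(r"<.*>", value)
    ]
    audit_user = props.get("audit_db_user")
    if audit_user and audit_user == props.get("db_user"):
        problems.append("audit_db_user must be different from db_user")
    return problems


sample = """
db_user=rangeradmin
db_password=<password>
audit_db_user=rangeradmin
"""
for problem in check_install_properties(sample):
    print(problem)
# db_password still has a placeholder: <password>
# audit_db_user must be different from db_user
```

To check the real file, pass the contents of /usr/hdp/current/ranger-admin/install.properties to `check_install_properties`.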

Run the setup as root:

export JAVA_HOME=<path of installed jdk version folder>
/usr/hdp/current/ranger-admin/setup.sh
service ranger-admin start
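Once the service is started, the portal should answer on the port given in policymgr_external_url (6080 above). A minimal Python check using only the standard library is sketched below; the `portal_is_up` helper is hypothetical, and it treats any HTTP response, including an error status, as a sign that the web application is listening.

```python
import urllib.error
import urllib.request


def portal_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if something answers HTTP at `url` (any status code counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, just not with a 2xx status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, ...


# Example (hostname is hypothetical):
# portal_is_up("http://rangerhost.example.com:6080/")
```

HTTPError is caught before URLError because it is a subclass of it.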

UserSync

Install the module

yum install ranger_2_2_0_0_1947-usersync

In the Ranger UserSync installation directory (/usr/hdp/current/ranger-usersync), update the install.properties file as appropriate for HDP 2.2:

POLICY_MGR_URL = http://<portal host>:6080

# sync source,  only unix and ldap are supported at present
# defaults to unix
SYNC_SOURCE =  ldap

# sync interval in minutes
# user, groups would be synced again at the end of each sync interval
# defaults to 5   if SYNC_SOURCE is unix
# defaults to 360 if SYNC_SOURCE is ldap
SYNC_INTERVAL=1

#User and group for the usersync process
unix_user=ranger
unix_group=ranger

# URL of source ldap
# a sample value would be:  ldap://ldap.example.com:389
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_URL = ldap://<ldap host>:389

# ldap bind dn used to connect to ldap and query for users and groups
# a sample value would be cn=admin,ou=users,dc=hadoop,dc=apache,dc=org
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_BIND_DN = <bind username>

# ldap bind password for the bind dn specified above
# please ensure read access to this file  is limited to root, to protect the password
# Must specify a value if SYNC_SOURCE is ldap
# unless anonymous search is allowed by the directory on users and group
SYNC_LDAP_BIND_PASSWORD = <password>
CRED_KEYSTORE_FILENAME=/usr/lib/xausersync/.jceks/xausersync.jceks

# search base for users
# sample value would be ou=users,dc=hadoop,dc=apache,dc=org
SYNC_LDAP_USER_SEARCH_BASE = <Value depends upon your LDAP setup>

# search scope for the users, only base, one and sub are supported values
# please customize the value to suit your deployment
# default value: sub
SYNC_LDAP_USER_SEARCH_SCOPE = sub

# objectclass to identify user entries
# please customize the value to suit your deployment
# default value: person
SYNC_LDAP_USER_OBJECT_CLASS = person

# optional additional filter constraining the users selected for syncing
# a sample value would be (dept=eng)
# please customize the value to suit your deployment
# default value is empty
SYNC_LDAP_USER_SEARCH_FILTER = <Value depends upon your LDAP setup>

# attribute from user entry that would be treated as user name
# please customize the value to suit your deployment
# default value: cn
SYNC_LDAP_USER_NAME_ATTRIBUTE=sAMAccountName

# attribute from user entry whose values would be treated as
# group values to be pushed into Policy Manager database
# You could provide multiple attribute names separated by comma
# default value: memberof, ismemberof
SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE=memberOf

#
# UserSync - Case Conversion Flags
# possible values:  none, lower, upper
SYNC_LDAP_GROUPNAME_CASE_CONVERSION=lower

NOTE: Customize the SYNC_LDAP_USER_SEARCH_FILTER parameter to suit your needs.
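To reason about which entries a given SYNC_LDAP_USER_SEARCH_FILTER selects, it helps to see how it might combine with the object class, and how a memberOf value could become a group name. The sketch below assumes usersync ANDs the object-class test with the optional filter and uses the first CN of each group DN, converted per SYNC_LDAP_GROUPNAME_CASE_CONVERSION. These are assumptions to verify against your Ranger version, and both helper names are hypothetical.

```python
from typing import Optional


def user_search_filter(object_class: str = "person", extra: str = "") -> str:
    """Assumed combination: AND the object-class test with the optional extra filter."""
    base = f"(objectclass={object_class})"
    return f"(&{base}{extra})" if extra else base


def group_name(member_of_dn: str, case: str = "lower") -> Optional[str]:
    """Return the first CN value of a group DN, with case conversion (none/lower/upper).

    Naive: splits on every comma, so DNs containing escaped commas are not handled.
    """
    for rdn in member_of_dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        if attr.strip().lower() == "cn":
            value = value.strip()
            return {"lower": value.lower(), "upper": value.upper()}.get(case, value)
    return None


print(user_search_filter(extra="(dept=eng)"))  # (&(objectclass=person)(dept=eng))
print(group_name("CN=Hadoop_Admins,OU=Groups,DC=example,DC=com"))  # hadoop_admins
```

The composed filter can be pasted into an LDAP browser against SYNC_LDAP_USER_SEARCH_BASE to preview which users would be synced.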

Run the setup:

export JAVA_HOME=<path of installed jdk version folder>
/usr/hdp/current/ranger-usersync/setup.sh

service ranger-usersync start

Verify by visiting the Ranger portal and clicking the Users/Groups tab. You should see the LDAP users selected by your search base and filter. Furthermore, if you add a user or group in LDAP/AD, it should show up in the portal within SYNC_INTERVAL minutes.

PlugIns

We will walk through one of the plugins (HDFS). The other plugins follow a similar setup.

HDFS

On the NameNode (in an HA NameNode setup, on all NameNodes), install the plugin.

yum install ranger_2_2_0_0_1947-hdfs-plugin

In the plugin installation directory (/usr/hdp/current/ranger-hdfs-plugin), edit the install.properties file:

POLICY_MGR_URL=http://<portal host>:6080
SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
#
# Example:
# REPOSITORY_NAME=hadoopdev
#
REPOSITORY_NAME=<This is the repo that'll be looked up when plugin is loaded>
XAAUDIT.DB.IS_ENABLED=true
XAAUDIT.DB.FLAVOUR=MYSQL
XAAUDIT.DB.HOSTNAME=<database host>
XAAUDIT.DB.DATABASE_NAME=ranger_audit
XAAUDIT.DB.USER_NAME=rangerlogger
XAAUDIT.DB.PASSWORD=<password>
XAAUDIT.HDFS.IS_ENABLED=true
XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://<NameNode>:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=/var/log/hadoop/%app-type%/audit
XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=/var/log/hadoop/%app-type%/audit/archive
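The HDFS audit paths above contain macros. The sketch below shows one plausible expansion, assuming %app-type% is replaced by the component name (hdfs here) and %time:PATTERN% by the current time in that Java date pattern. It translates only a few pattern letters (yyyy, MM, dd, HH, mm, ss), and the `expand_audit_path` helper is hypothetical, not the plugin's own code.

```python
import re
from datetime import datetime

# Only the Java date-pattern letters needed here, mapped to strftime codes (assumption).
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}


def java_to_strftime(pattern: str) -> str:
    """Translate a small subset of Java date patterns, e.g. yyyyMMdd -> %Y%m%d."""
    return re.sub(r"yyyy|MM|dd|HH|mm|ss", lambda m: _JAVA_TO_STRFTIME[m.group(0)], pattern)


def expand_audit_path(template: str, app_type: str, now: datetime) -> str:
    """Hypothetical expansion of the %app-type% and %time:PATTERN% macros."""
    path = template.replace("%app-type%", app_type)
    return re.sub(r"%time:([^%]+)%",
                  lambda m: now.strftime(java_to_strftime(m.group(1))), path)


print(expand_audit_path(
    "hdfs://<NameNode>:8020/ranger/audit/%app-type%/%time:yyyyMMdd%",
    "hdfs", datetime(2014, 11, 20, 2, 22)))
# hdfs://<NameNode>:8020/ranger/audit/hdfs/20141120
```

With a daily pattern such as yyyyMMdd, each component gets one audit directory per day.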

Run the script to enable the plugin:

export JAVA_HOME=<path of installed jdk version folder>
/usr/hdp/current/ranger-hdfs-plugin/enable-hdfs-plugin.sh

Restart the NameNode(s), either from Ambari or manually.

Test the setup

  • On the Ranger portal, click “Policy Manager”. Click the “+” sign on the HDFS tab and create a repository. Ensure the name of this repository is EXACTLY the same as the REPOSITORY_NAME you specified during installation.
  • Let’s take a test user “svemuri” and check his permissions on a test directory:
[svemuri@sfdmgctmn005 ~]$ hadoop fs -ls /user/mmurumkar
ls: Permission denied: user=svemuri, access=READ_EXECUTE,
inode="/user/mmurumkar":mmurumkar:sfdmgct_admin:drwxr-x---:user:mmurumkar:r--,user:ranger:---,user:rbolla:---,user:svemuri:---,group::r-x
  • Now, let’s create a policy called “TestPolicy” that grants “svemuri” all privileges on “/user/mmurumkar”.

[Screenshot: Policy]

  • Now the earlier command should work:
[svemuri@sfdmgctmn005 ~]$ hadoop fs -ls /user/mmurumkar
Found 9 items
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-20 02:22 /user/mmurumkar/.hiveJars
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-18 20:00 /user/mmurumkar/test
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-18 20:01 /user/mmurumkar/test1
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-18 20:08 /user/mmurumkar/test2
drwxr-xr-x   - rbolla    sfdmgct_admin          0 2014-11-18 20:09 /user/mmurumkar/test3
drwxr-xr-x   - rbolla    sfdmgct_admin          0 2014-11-18 20:10 /user/mmurumkar/test4
drwxr-xr-x   - ranger    sfdmgct_admin          0 2014-11-18 20:18 /user/mmurumkar/test5
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-20 18:01 /user/mmurumkar/test7
drwxr-xr-x   - ranger    sfdmgct_admin          0 2014-11-19 14:21 /user/mmurumkar/test8
  • Audit records now show up in the audit UI on the portal.
[Screenshot: Audit]
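The test works because the HDFS plugin consults Ranger policies before the native HDFS permissions. Below is a minimal Python sketch of that decision under two assumptions worth checking against your version: a recursive policy on a directory also covers everything beneath it, and if no policy allows the access, the plugin falls back to the regular HDFS permission/ACL check. The `Policy` class, `is_allowed` and the ACL stub are hypothetical and ignore groups and other policy features.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Policy:
    """Hypothetical, simplified HDFS policy: users only, no groups or deny rules."""
    name: str
    path: str
    recursive: bool
    users: Set[str] = field(default_factory=set)
    permissions: Set[str] = field(default_factory=set)  # subset of {"read", "write", "execute"}

    def covers(self, path: str) -> bool:
        if path == self.path:
            return True
        return self.recursive and path.startswith(self.path.rstrip("/") + "/")


def is_allowed(user: str, path: str, access: Set[str], policies: List[Policy],
               hdfs_native_check: Callable[[str, str, Set[str]], bool]) -> bool:
    """Ranger policies first; if none grants the access, defer to HDFS permissions/ACLs."""
    for policy in policies:
        if policy.covers(path) and user in policy.users and access <= policy.permissions:
            return True
    return hdfs_native_check(user, path, access)


def acl_denies_svemuri(user: str, path: str, access: Set[str]) -> bool:
    # Mimics the ACL in the test above (user:svemuri:---), simplified to "deny".
    return False


test_policy = Policy("TestPolicy", "/user/mmurumkar", recursive=True,
                     users={"svemuri"}, permissions={"read", "write", "execute"})
print(is_allowed("svemuri", "/user/mmurumkar", {"read", "execute"},
                 [test_policy], acl_denies_svemuri))  # True
print(is_allowed("svemuri", "/user/mmurumkar", {"read", "execute"},
                 [], acl_denies_svemuri))  # False
```

Under this model, removing TestPolicy would bring back the original "Permission denied" error, since the native ACL still denies svemuri.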

Conclusion

Apache Ranger is starting to fill critical security needs in Hadoop environments, marking significant progress toward making Hadoop an enterprise data platform.