Summary
Over the past couple of years, Apache Hadoop has made great progress in the area of security. Security for any computing system falls into two broad categories:
- Authentication is the process of ascertaining that somebody really is who they claim to be. In Hadoop, this is achieved via Kerberization. This post will not cover the details, as there's ample material already available online.
- Authorization refers to rules that determine who is allowed to do what. For example, Manoj may be authorized to create and delete databases, while Terence is only authorized to read tables.
For folks coming from the traditional database world, authorization is a well-understood concept: it involves creating roles and granting privileges at various levels on a variety of objects. The Apache Hadoop community is bringing some of those concepts to Hadoop through the recently incubated Apache Ranger project. We recently implemented Ranger at one of our customer sites and describe that experience here.
What does Ranger do?
Ranger provides a centralized way to manage security across the various components of a Hadoop cluster. Currently, it provides authorization as well as auditing for the HDFS, Hive, HBase, Knox and Storm services. At its core, Ranger is a centralized web application consisting of policy administration, audit and reporting modules. Authorized users can manage their security policies using the web tool or REST APIs. These security policies are enforced within the Hadoop ecosystem by lightweight Ranger Java plugins, which run as part of the same process as the NameNode (HDFS), HiveServer2 (Hive), HBase server (HBase), Nimbus server (Storm) and Knox server (Knox). Thus there is no additional OS-level process to manage. It also means there is no single point of failure: if the web application or the policy database goes down, security enforcement is not compromised; it only prevents the security administrator from pushing new policies.
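As a quick illustration of the REST side, you can list the policies the admin server is serving with a single authenticated call. This is only a sketch: the admin credentials and the exact endpoint path (here the v1 public API at /service/public/api/policy, as shipped around the Ranger 0.4 / HDP 2.2 timeframe) are assumptions that may differ in your version.

# List policies from the Ranger admin (hypothetical credentials; v1 public API path assumed)
curl -u admin:admin -H "Accept: application/json" \
     "http://<portal host>:6080/service/public/api/policy"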
Components of Ranger
There are three main components of Ranger:
- Portal/Policy Manager is the central UI for security administration. Users can create and update policies, which are then stored in a policy database. Plugins within each component poll for these policies at regular intervals.
- Plugins are lightweight Java programs which embed within the processes of each cluster component. For example, the plugin for HDFS runs as part of the NameNode process. The plugins pull policies from the policy admin server at regular intervals (configurable) and cache them locally in a file. Whenever a request is made for a resource, the plugin intercepts the request and evaluates it against the security policies in effect. Plugins also collect data about all requests and send it back to the audit server via a separate thread.
- User/Group Sync is a utility provided to synchronize users and groups from the OS, LDAP, or AD. This information is used while defining policies (we'll see an example shortly).
Setup
Ranger Admin
The Ranger software is already included in the HDP 2.2 repos. Find the Ranger policy admin package (assuming you have set up the yum repos correctly):
yum search ranger
====================================================== N/S Matched: ranger =========================================
ranger_2_2_0_0_1947-admin.x86_64 : Web Interface for Ranger
ranger_2_2_0_0_1947-debuginfo.x86_64 : Debug information for package ranger_2_2_0_0_1947
ranger_2_2_0_0_1947-hbase-plugin.x86_64 : ranger plugin for hbase
ranger_2_2_0_0_1947-hdfs-plugin.x86_64 : ranger plugin for hdfs
ranger_2_2_0_0_1947-hive-plugin.x86_64 : ranger plugin for hive
ranger_2_2_0_0_1947-knox-plugin.x86_64 : ranger plugin for knox
ranger_2_2_0_0_1947-storm-plugin.x86_64 : ranger plugin for storm
ranger_2_2_0_0_1947-usersync.x86_64 : Synchronize User/Group information from Corporate LD/AD or Unix
Install the admin module
yum install ranger_2_2_0_0_1947-admin
In the installation directory (/usr/hdp/current/ranger-admin/), edit the install.properties file:
SQL_COMMAND_INVOKER=mysql
SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

# DB password for the DB admin user-id
db_root_user=root
db_root_password=<password>
db_host=<database host>

#
# DB UserId used for the XASecure schema
#
db_name=ranger
db_user=rangeradmin
db_password=<password>

# DB UserId for storing auditlog infromation
#
# * audit_db can be same as the XASecure schema db
# * audit_db must exists in the same ${db_host} as xaserver database ${db_name}
# * audit_user must be a different user than db_user (as audit user has access to only audit tables)
#
audit_db_name=ranger_audit
audit_db_user=rangerlogger
audit_db_password=<password>

#
# ------- PolicyManager CONFIG ----------------
#
policymgr_external_url=http://<portal host>:6080
policymgr_http_enabled=true

#
# ------- UNIX User CONFIG ----------------
#
unix_user=ranger
unix_group=ranger

#
# ** The installation of xasecure-unix-ugsync package can be installed after the policymanager installation is finished.
#
#LDAP|ACTIVE_DIRECTORY|UNIX|NONE
authentication_method=NONE
Run the setup as root:
export JAVA_HOME=<path of installed jdk version folder>
/usr/hdp/current/ranger-admin/setup.sh
service ranger-admin start
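A quick way to confirm the admin service came up is to probe the portal port. This is nothing Ranger-specific, just an HTTP check against the policymgr_external_url configured above:

# Expect an HTTP 200 (or a redirect to the login page) once ranger-admin is running
curl -s -o /dev/null -w "%{http_code}\n" "http://<portal host>:6080/"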
UserSync
Install the module
yum install ranger_2_2_0_0_1947-usersync
In the Ranger UserSync installation directory (/usr/hdp/current/ranger-usersync), update the install.properties file as shown below (adjust values for your environment):
POLICY_MGR_URL = http://<portal host>:6080

# sync source, only unix and ldap are supported at present
# defaults to unix
SYNC_SOURCE = ldap

# sync interval in minutes
# user, groups would be synced again at the end of each sync interval
# defaults to 5 if SYNC_SOURCE is unix
# defaults to 360 if SYNC_SOURCE is ldap
SYNC_INTERVAL=1

#User and group for the usersync process
unix_user=ranger
unix_group=ranger

# URL of source ldap
# a sample value would be: ldap://ldap.example.com:389
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_URL = ldap://<ldap host>:389

# ldap bind dn used to connect to ldap and query for users and groups
# a sample value would be cn=admin,ou=users,dc=hadoop,dc=apache,dc-org
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_BIND_DN = <bind username>

# ldap bind password for the bind dn specified above
# please ensure read access to this file is limited to root, to protect the password
# Must specify a value if SYNC_SOURCE is ldap
# unless anonymous search is allowed by the directory on users and group
SYNC_LDAP_BIND_PASSWORD = <password>
CRED_KEYSTORE_FILENAME=/usr/lib/xausersync/.jceks/xausersync.jceks

# search base for users
# sample value would be ou=users,dc=hadoop,dc=apache,dc=org
SYNC_LDAP_USER_SEARCH_BASE = <Value depends upon your LDAP setup>

# search scope for the users, only base, one and sub are supported values
# please customize the value to suit your deployment
# default value: sub
SYNC_LDAP_USER_SEARCH_SCOPE = sub

# objectclass to identify user entries
# please customize the value to suit your deployment
# default value: person
SYNC_LDAP_USER_OBJECT_CLASS = person

# optional additional filter constraining the users selected for syncing
# a sample value would be (dept=eng)
# please customize the value to suit your deployment
# default value is empty
SYNC_LDAP_USER_SEARCH_FILTER = <Value depends upon your LDAP setup>

# attribute from user entry that would be treated as user name
# please customize the value to suit your deployment
# default value: cn
SYNC_LDAP_USER_NAME_ATTRIBUTE=sAMAccountName

# attribute from user entry whose values would be treated as
# group values to be pushed into Policy Manager database
# You could provide multiple attribute names separated by comma
# default value: memberof, ismemberof
SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE=memberOf

#
# UserSync - Case Conversion Flags
# possible values: none, lower, upper
SYNC_LDAP_GROUPNAME_CASE_CONVERSION=lower
NOTE: Customize SYNC_LDAP_USER_SEARCH_FILTER parameter to suit your needs.
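Before starting usersync, it helps to verify the bind DN, search base, and filter directly against the directory. Below is a minimal ldapsearch sketch using the same placeholders as the file above; the filter and attribute list are assumptions to adjust for your deployment.

# Dry-run the query that usersync will effectively perform against LDAP/AD
ldapsearch -x -H ldap://<ldap host>:389 \
  -D "<bind username>" -w "<password>" \
  -b "<Value depends upon your LDAP setup>" -s sub \
  "(objectClass=person)" sAMAccountName memberOf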
Run the setup:
export JAVA_HOME=<path of installed jdk version folder>
/usr/hdp/current/ranger-usersync/setup.sh
service ranger-usersync start
Verify by visiting the Ranger portal and clicking the Users/Groups tab. You should see all of your LDAP users. Furthermore, you can add an LDAP/AD user or group, and it should show up in the portal within SYNC_INTERVAL.
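You can also check the sync from the command line. This is a hedged sketch against the admin's internal user REST endpoint; the /service/xusers/users path and the admin credentials are assumptions that may differ by Ranger version.

# List users known to the Ranger admin (hypothetical credentials; endpoint path assumed)
curl -u admin:admin -H "Accept: application/json" \
     "http://<portal host>:6080/service/xusers/users"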
Plugins
We will go over one of the plugins here; a similar setup applies to any other plugin you are interested in.
HDFS
On the NameNode (or, in an HA NameNode setup, on all NameNodes), install the plugin:
yum install ranger_2_2_0_0_1947-hdfs-plugin
In the plugin installation directory (/usr/hdp/current/ranger-hdfs-plugin), edit the install.properties file:
POLICY_MGR_URL=http://<portal host>:6080
SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

#
# Example:
# REPOSITORY_NAME=hadoopdev
#
REPOSITORY_NAME=<This is the repo that'll be looked up when plugin is loaded>

XAAUDIT.DB.IS_ENABLED=true
XAAUDIT.DB.FLAVOUR=MYSQL
XAAUDIT.DB.HOSTNAME=<database host>
XAAUDIT.DB.DATABASE_NAME=ranger_audit
XAAUDIT.DB.USER_NAME=rangerlogger
XAAUDIT.DB.PASSWORD=<password>

XAAUDIT.HDFS.IS_ENABLED=true
XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://<NameNode>:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=/var/log/hadoop/%app-type%/audit
XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=/var/log/hadoop/%app-type%/audit/archive
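Since XAAUDIT.HDFS.IS_ENABLED is set to true above, make sure the HDFS destination directory exists and is writable by the service user. A small sketch, assuming the /ranger/audit path from the config above and running as the hdfs superuser:

# Pre-create the HDFS audit destination used by the plugin (path assumed from the config above)
su -l hdfs -c "hadoop fs -mkdir -p /ranger/audit"
su -l hdfs -c "hadoop fs -ls /ranger"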
Run the script to enable the plugin:
export JAVA_HOME=<path of installed jdk version folder>
/usr/hdp/current/ranger-hdfs-plugin/enable-hdfs-plugin.sh
Restart the NameNode(s) from Ambari or manually.
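If you restart manually rather than through Ambari, the HDP 2.2 layout looks roughly like the commands below. The daemon-script path is an assumption based on typical HDP 2.2 installs and may vary slightly on your nodes.

# Stop and start the NameNode so it picks up the Ranger plugin (HDP 2.2 style paths, assumed)
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode"
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode"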
Test the setup
- On the Ranger portal, click "Policy Manager". Click the "+" sign on the HDFS tab and create a repository. Ensure the name of this repository is EXACTLY the same as the REPOSITORY_NAME you specified during plugin installation (you can double-check it with the REST call sketched below).
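To confirm the repository name matches, you can list the repositories the admin knows about over REST. A sketch only: the v1 public API path and the admin credentials are assumptions for this Ranger version.

# List repositories defined in the Ranger admin and confirm the HDFS repo name
curl -u admin:admin -H "Accept: application/json" \
     "http://<portal host>:6080/service/public/api/repository"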
- Let’s take a test user “svemuri” and check his permissions on a test directory:
[svemuri@sfdmgctmn005 ~]$ hadoop fs -ls /user/mmurumkar
ls: Permission denied: user=svemuri, access=READ_EXECUTE, inode="/user/mmurumkar":mmurumkar:sfdmgct_admin:drwxr-x---:user:mmurumkar:r--,user:ranger:---,user:rbolla:---,user:svemuri:---,group::r-x
- Now, let's create a policy called "TestPolicy" that grants "svemuri" all privileges on "/user/mmurumkar". (The same policy can also be created via the REST API, as sketched below.)
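Here is a hedged REST sketch of that policy creation against the v1 public API. The JSON field names are assumptions from the Ranger 0.4 era and may differ in your release, so verify them against the REST API documentation for your version before using this.

# Create a policy granting svemuri full access to /user/mmurumkar (endpoint and field names assumed, v1 public API)
curl -u admin:admin -X POST -H "Content-Type: application/json" \
  "http://<portal host>:6080/service/public/api/policy" \
  -d '{
        "policyName": "TestPolicy",
        "repositoryName": "<repository name>",
        "repositoryType": "hdfs",
        "resourceName": "/user/mmurumkar",
        "isRecursive": true,
        "isEnabled": true,
        "isAuditEnabled": true,
        "permMapList": [ { "userList": ["svemuri"],
                           "permList": ["Read", "Write", "Execute", "Admin"] } ]
      }'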
- Now the earlier command should work:
[svemuri@sfdmgctmn005 ~]$ hadoop fs -ls /user/mmurumkar
Found 9 items
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-20 02:22 /user/mmurumkar/.hiveJars
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-18 20:00 /user/mmurumkar/test
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-18 20:01 /user/mmurumkar/test1
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-18 20:08 /user/mmurumkar/test2
drwxr-xr-x   - rbolla    sfdmgct_admin          0 2014-11-18 20:09 /user/mmurumkar/test3
drwxr-xr-x   - rbolla    sfdmgct_admin          0 2014-11-18 20:10 /user/mmurumkar/test4
drwxr-xr-x   - ranger    sfdmgct_admin          0 2014-11-18 20:18 /user/mmurumkar/test5
drwxr-xr-x   - mmurumkar sfdmgct_admin          0 2014-11-20 18:01 /user/mmurumkar/test7
drwxr-xr-x   - ranger    sfdmgct_admin          0 2014-11-19 14:21 /user/mmurumkar/test8
- Audit records for these accesses will now show up in the Audit UI on the portal.
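If you prefer the command line to the audit UI, the same records land in the audit database configured earlier. This is only a sketch; the xa_access_audit table name is an assumption based on the Ranger 0.4 schema and may differ in your release.

# Peek at the most recent access-audit rows (table name assumed from the Ranger 0.4 schema)
mysql -h <database host> -u rangerlogger -p ranger_audit \
  -e "SELECT * FROM xa_access_audit ORDER BY 1 DESC LIMIT 10;"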
Conclusion
Apache Ranger is starting to fill critical security needs in the Hadoop environment, marking significant progress toward making Hadoop an enterprise data platform.
Comments
So you don't need to set up a separate server outside of the cluster to run Ranger on?
No, you don’t.
Nice blog. Thanks for the post.
Excellent blog; you have provided a lot more useful information than Ranger's docs. I agree, this product will eventually be an excellent addition to HDP, but they do need to sort out their docs.
Have you done the HBase plugin? Are you planning on posting a tutorial for it? I have got it working with UNIX users but not with their groups.