Installing Livy on Your HDFS Cluster
Audience: System Administrators
Content Summary: Livy is an open-source REST service for Apache Spark. It allows users to submit Spark jobs over HTTP from outside the cluster. This page covers installing and configuring Livy to meet the needs of your cluster, including Basic Installation, Cloudera Manager Installation, Amazon EMR Installation, Configuration, and Security.
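To make "submit Spark jobs over HTTP" concrete, here is a minimal Python sketch that builds a batch submission for Livy's `POST /batches` endpoint. The host name, jar path, class name, and user are hypothetical placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_batch_request(livy_url, file, class_name=None, args=None, proxy_user=None):
    """Build (but do not send) a POST /batches request for a Livy server."""
    payload = {"file": file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    if proxy_user:
        payload["proxyUser"] = proxy_user
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, jar, class, and user: replace with your own values.
    req = build_batch_request(
        "http://livy.example.com:8998",
        file="hdfs:///user/alice/jobs/wordcount.jar",
        class_name="com.example.WordCount",
        proxy_user="alice",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the job: urllib.request.urlopen(req)
```

On a Kerberos-secured cluster the request would also need SPNEGO authentication, which this sketch does not attempt.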
Basic Installation
Cloudera provides compiled Livy packages on their website. For the most basic installation, simply download the 0.3.0 version and unzip it.
- Select a node in your cluster to install Livy on. Be sure that you can run `spark-submit` from this node.
- Configurations can be modified in the `livy-server-0.3.0/conf/livy.conf` file.
- You can start Livy in the background by running `./livy-server-0.3.0/bin/livy-server &`. You may want to create init scripts or other functionality for making sure that Livy is restarted when the installation node is restarted.
This method is only recommended for highly customized clusters that are not managed by Cloudera Manager or Amazon EMR.
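Before starting Livy on the chosen node, it can help to confirm the two prerequisites mentioned above. The following is a rough sketch, not part of the Livy distribution: `preflight` is a hypothetical helper that checks that `spark-submit` is on the search path and that the `livy-server` launcher under the unzipped directory is executable.

```python
import os
import shutil


def preflight(livy_home, path=None):
    """Return a list of problems that would stop Livy from starting on this node.

    livy_home is the unzipped directory (for example livy-server-0.3.0).
    path optionally overrides the PATH that is searched for spark-submit.
    """
    problems = []
    if shutil.which("spark-submit", path=path) is None:
        problems.append("spark-submit is not on PATH")
    server = os.path.join(livy_home, "bin", "livy-server")
    if not os.access(server, os.X_OK):
        problems.append("not executable: " + server)
    return problems


if __name__ == "__main__":
    for problem in preflight("./livy-server-0.3.0"):
        print("WARNING:", problem)
```

An empty list means both checks passed. It does not prove that `spark-submit` can reach the cluster.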
Cloudera Manager Installation
Livy can be deployed as a parcel in Cloudera Manager. For more information on Cloudera Parcels, see the official Cloudera Documentation.
If you wish to install Livy through Cloudera Manager, you will receive three files from Immuta:
- A jar file with Livy's Custom Service Descriptor (e.g. `LIVY-0.3.0.jar`).
- A parcel file (e.g. `LIVY-0.3.0-el7.parcel`).
- A SHA1 hash of the parcel file (e.g. `LIVY-0.3.0-el7.parcel.sha`).
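Since a SHA1 hash accompanies the parcel, you may want to confirm that the two agree before copying anything into place. The sketch below assumes the `.sha` file holds the hexadecimal digest as its first whitespace-separated token. `parcel_matches_hash` is a hypothetical helper name.

```python
import hashlib


def parcel_matches_hash(parcel_path, sha_path):
    """Return True if the SHA-1 of the parcel equals the digest in the .sha file."""
    digest = hashlib.sha1()
    with open(parcel_path, "rb") as parcel:
        # Read in 1 MiB chunks so large parcels are not loaded into memory at once.
        for chunk in iter(lambda: parcel.read(1 << 20), b""):
            digest.update(chunk)
    with open(sha_path, "r", encoding="ascii") as sha_file:
        tokens = sha_file.read().split()
    expected = tokens[0].lower() if tokens else ""
    return digest.hexdigest() == expected


if __name__ == "__main__":
    ok = parcel_matches_hash("LIVY-0.3.0-el7.parcel", "LIVY-0.3.0-el7.parcel.sha")
    print("hash matches" if ok else "hash MISMATCH")
```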
To install Livy via a Cloudera Parcel:
- Copy these files to the node on your cluster where Cloudera Manager is installed.
- Move the parcel and parcel hash to the Parcel Repository. This is usually `/opt/cloudera/parcel-repo`.
- Move the CSD jar file to the CSD directory. This is usually `/opt/cloudera/csd`.
- Be sure to execute `chown cloudera-scm:cloudera-scm` on all of the files in their new locations.
- In the Cloudera Manager UI, open the Parcel page.
- Click `Check for New Parcels` in the top right corner of the page.
- The `LIVY` parcel should appear in the Parcel table. Click `Distribute` in the `Actions` tab for Livy.
- Click `Activate` in the `Actions` tab for Livy.
- Restart the Cloudera SCM server. From a terminal on the node where Cloudera Manager is installed, run `service cloudera-scm-server restart`.
- Restart the Cloudera Management Service from the Cloudera Manager UI.
- Select `add a service` from the Cloudera Manager UI, and select `Livy`.
  - When choosing a node for Livy, be sure to select a node that you can run `spark-submit` from.
- Start the service. Note that the Livy service depends on an existing `YARN` service running in Cloudera Manager.
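The file placement steps at the start of this procedure can be sketched as a small script. This is an illustration only: `plan_install` and `apply_plan` are hypothetical helpers, the default directories are the "usual" locations named above and may differ on your cluster, and `apply_plan` must run as a user allowed to change file ownership.

```python
import os
import shutil

PARCEL_REPO = "/opt/cloudera/parcel-repo"  # usual location; adjust if yours differs
CSD_DIR = "/opt/cloudera/csd"              # usual location; adjust if yours differs


def plan_install(files, parcel_repo=PARCEL_REPO, csd_dir=CSD_DIR):
    """Map each delivered file to the path it should be moved to."""
    plan = {}
    for path in files:
        name = os.path.basename(path)
        if name.endswith(".jar"):
            plan[path] = os.path.join(csd_dir, name)
        elif name.endswith((".parcel", ".parcel.sha")):
            plan[path] = os.path.join(parcel_repo, name)
        else:
            raise ValueError("unexpected file: " + path)
    return plan


def apply_plan(plan, owner="cloudera-scm"):
    """Move each file into place and give it to cloudera-scm:cloudera-scm."""
    for source, destination in plan.items():
        shutil.move(source, destination)
        shutil.chown(destination, user=owner, group=owner)


if __name__ == "__main__":
    delivered = ["LIVY-0.3.0.jar", "LIVY-0.3.0-el7.parcel", "LIVY-0.3.0-el7.parcel.sha"]
    for source, destination in plan_install(delivered).items():
        print(source, "->", destination)
```

Printing the plan first, as in the usage above, lets you review the destinations before calling `apply_plan`.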
Amazon EMR Installation
If you wish to install Livy on an Amazon Elastic MapReduce (EMR) cluster, Immuta will provide a script artifact for you to upload to S3 and use as a bootstrap action when deploying your EMR cluster.
Custom configuration options can be set in the `setLivyConfig` method in the provided script.
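If you create the EMR cluster programmatically, the bootstrap action is described by a small structure. The sketch below builds one entry in the shape accepted by the `BootstrapActions` parameter of the boto3 EMR client's `run_job_flow` call. The bucket, script name, and action name are hypothetical, and the function only builds the dictionary without contacting AWS.

```python
def livy_bootstrap_action(script_s3_uri, args=None, name="Install Livy"):
    """Describe the uploaded Livy script as an EMR bootstrap action entry."""
    if not script_s3_uri.startswith("s3://"):
        raise ValueError("bootstrap script must be an s3:// URI: " + script_s3_uri)
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_s3_uri,
            "Args": list(args or []),
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and key: use the location you uploaded the script to.
    action = livy_bootstrap_action("s3://my-bucket/bootstrap/install-livy.sh")
    print(action)
    # With boto3 installed, this entry would be passed as
    # BootstrapActions=[action] to the EMR client's run_job_flow call.
```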
Configuration
Livy has a number of configuration options that you may want to tweak to fit the needs of your cluster.
Option | Description | Required? | Default |
---|---|---|---|
`livy.spark.master` | Where Spark jobs are submitted. | Y | yarn |
`livy.spark.deploy-mode` | How Spark jobs are deployed. | Y | cluster |
`livy.server.host` | The host address that the server will start on. | Y | 0.0.0.0 |
`livy.server.port` | The host port that the server will start on. | Y | 8998 |
`livy.server.session.timeout` | Time in milliseconds before idle sessions time out. | Y | 600000 |
`livy.impersonation.enabled` | Whether Livy should proxy users when submitting a Spark job. Always true. | Y | true |
`livy.keystore` | Used for SSL certificate & key. | N | N/A |
`livy.keystore.password` | The keystore password. | N | N/A |
`livy.server.auth.type` | Can only be set to `kerberos`. | N | N/A |
`livy.server.auth.kerberos.principal` | This should be set to the HTTP principal for this node. | N | N/A |
`livy.server.auth.kerberos.keytab` | The keytab used to kinit with this principal. | N | N/A |
`livy.server.launch.kerberos.principal` | The principal used to launch Spark jobs. | N | N/A |
`livy.server.launch.kerberos.keytab` | The keytab used to kinit with this principal. | N | N/A |
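As an illustration of how these options fit together, the following sketch renders the contents of a `livy.conf` file from the defaults in the table plus any overrides. It assumes the file uses one `key = value` pair per line. `render_livy_conf` is a hypothetical helper, not part of Livy.

```python
# Required options and their defaults, copied from the table above.
DEFAULTS = {
    "livy.spark.master": "yarn",
    "livy.spark.deploy-mode": "cluster",
    "livy.server.host": "0.0.0.0",
    "livy.server.port": "8998",
    "livy.server.session.timeout": "600000",
    "livy.impersonation.enabled": "true",
}

# Optional options from the table; these have no default.
OPTIONAL = (
    "livy.keystore",
    "livy.keystore.password",
    "livy.server.auth.type",
    "livy.server.auth.kerberos.principal",
    "livy.server.auth.kerberos.keytab",
    "livy.server.launch.kerberos.principal",
    "livy.server.launch.kerberos.keytab",
)


def render_livy_conf(overrides=None):
    """Return livy.conf text: table defaults with overrides applied."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(DEFAULTS) - set(OPTIONAL)
    if unknown:
        raise ValueError("unknown options: " + ", ".join(sorted(unknown)))
    settings = dict(DEFAULTS)
    settings.update({key: str(value) for key, value in overrides.items()})
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print(render_livy_conf({"livy.server.port": 8999}), end="")
```

Rejecting unknown keys is a deliberate choice here so that a misspelled option name fails loudly instead of being silently ignored.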
HDFS Configuration
In order for Livy to be able to submit Spark jobs on behalf of Immuta users, the `livy` user must be given full impersonation permissions. Add the following to your `core-site.xml`. If you are using Cloudera Manager, this can be set in `Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml`.
<property>
<name>hadoop.proxyuser.livy.hosts</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.livy.users</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.livy.groups</name>
<value>*</value>
<final>true</final>
</property>
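The three properties above follow one pattern, so they can be generated instead of typed by hand. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. `proxyuser_properties` is a hypothetical helper, and the `prefix` parameter is included so the same function can produce properties under a different prefix.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user, prefix="hadoop.proxyuser", value="*"):
    """Return <property> elements granting `user` impersonation rights."""
    elements = []
    for scope in ("hosts", "users", "groups"):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"{prefix}.{user}.{scope}"
        ET.SubElement(prop, "value").text = value
        ET.SubElement(prop, "final").text = "true"
        elements.append(prop)
    return elements


if __name__ == "__main__":
    for prop in proxyuser_properties("livy"):
        print(ET.tostring(prop, encoding="unicode"))
```

The wildcard `*` mirrors the snippet above. Narrowing `value` to specific hosts or groups is possible if your security policy requires it.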
Java KeyStore KMS Configuration
If your cluster has the Java KeyStore KMS configured, you will need to give the `livy` user the ability to impersonate any users who will run Spark jobs using Livy. Add the following to `kms-site.xml`. If you are using Cloudera Manager, this can be set in `Key Management Server Advanced Configuration Snippet (Safety Valve) for kms-site.xml`.
<property>
<name>hadoop.kms.proxyuser.livy.hosts</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.kms.proxyuser.livy.users</name>
<value>*</value>
<final>true</final>
</property>
<property>
<name>hadoop.kms.proxyuser.livy.groups</name>
<value>*</value>
<final>true</final>
</property>
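After editing the file, a quick check can confirm that all three properties are present. The sketch below assumes the site file has the standard Hadoop layout with `<property>` elements under a single root element. `missing_proxyuser_properties` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def missing_proxyuser_properties(site_xml, user="livy", prefix="hadoop.kms.proxyuser"):
    """Return the proxyuser property names that are absent from the XML text."""
    root = ET.fromstring(site_xml)
    present = {prop.findtext("name", "").strip() for prop in root.iter("property")}
    wanted = [f"{prefix}.{user}.{scope}" for scope in ("hosts", "users", "groups")]
    return [name for name in wanted if name not in present]


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>hadoop.kms.proxyuser.livy.hosts</name>
        <value>*</value>
      </property>
    </configuration>
    """
    print(missing_proxyuser_properties(sample))
```

Passing `prefix="hadoop.proxyuser"` runs the same check against the contents of `core-site.xml`.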
Security
Authentication
Livy can integrate with Kerberos if it is enabled for your cluster. You must provide two principals and their corresponding keytabs. One must be the HTTP principal for the node that Livy is installed on. The other is a custom principal that you must create for Livy. If you use Cloudera Manager and choose to install via parcel, all relevant principals will be created for you.
See the Configuration section for details on configuring authentication.
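The requirements above can be expressed as a small consistency check over the Kerberos options from the Configuration section. This is a sketch of one possible validation, not something Livy itself performs. `kerberos_problems` is a hypothetical helper, and the principal names in the usage example are made up.

```python
PREFIXES = ("livy.server.auth.kerberos", "livy.server.launch.kerberos")


def kerberos_problems(conf):
    """Return a list of inconsistencies in the Kerberos-related options."""
    problems = []
    auth_type = conf.get("livy.server.auth.type")
    if auth_type not in (None, "kerberos"):
        problems.append("livy.server.auth.type can only be set to kerberos")
    for prefix in PREFIXES:
        principal = conf.get(prefix + ".principal")
        keytab = conf.get(prefix + ".keytab")
        if auth_type == "kerberos" and not (principal and keytab):
            problems.append(prefix + ": principal and keytab are both required")
        elif bool(principal) != bool(keytab):
            problems.append(prefix + ": principal and keytab must be set together")
    http_principal = conf.get("livy.server.auth.kerberos.principal", "")
    if http_principal and not http_principal.startswith("HTTP/"):
        problems.append("livy.server.auth.kerberos.principal should be an HTTP/ principal")
    return problems


if __name__ == "__main__":
    # Hypothetical principals and keytab paths for illustration only.
    conf = {
        "livy.server.auth.type": "kerberos",
        "livy.server.auth.kerberos.principal": "HTTP/node1.example.com@EXAMPLE.COM",
        "livy.server.auth.kerberos.keytab": "/etc/security/keytabs/http.keytab",
        "livy.server.launch.kerberos.principal": "livy/node1.example.com@EXAMPLE.COM",
    }
    for problem in kerberos_problems(conf):
        print("WARNING:", problem)
```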
SSL
If your cluster has SSL enabled, all you need to do is point Livy to the keystore where your SSL certificate and key are stored, using the `livy.keystore` and `livy.keystore.password` options. See the Configuration section for more details.