
You are viewing documentation for Immuta version 2.8.


Installing Livy on Your HDFS Cluster

Audience: System Administrators

Content Summary: Livy is an open-source REST service for Apache Spark. It lets users submit Spark jobs over HTTP from outside the cluster. This page covers installing and configuring Livy to meet the needs of your cluster, including Basic Installation, Cloudera Manager Installation, Amazon EMR Installation, Configuration, and Security.

Basic Installation

Cloudera provides compiled Livy packages on their website. For the most basic installation, simply download the 0.3.0 version and unzip it.

  • Select a node in your cluster to install Livy on. Be sure that you can run spark-submit from this node.
  • Configurations can be modified in the livy-server-0.3.0/conf/livy.conf file.
  • You can start Livy in the background by running ./livy-server-0.3.0/bin/livy-server &. You may want to create an init script or another mechanism to ensure that Livy restarts when the installation node restarts.

This method is only recommended for highly customized clusters that are not managed by Cloudera Manager or Amazon EMR.
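
The start-up step above can be wrapped in a small helper. The following is a minimal sketch, not an official script: the LIVY_HOME variable, the start_livy function name, and the livy-server.out log file are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: start Livy in the background from an unzipped package.
# LIVY_HOME and the log file name are assumptions; adjust them for your node.
LIVY_HOME="${LIVY_HOME:-./livy-server-0.3.0}"

start_livy() {
    # The installation node must be able to run spark-submit, so fail early if it cannot.
    if ! command -v spark-submit >/dev/null 2>&1; then
        echo "spark-submit not found on PATH" >&2
        return 1
    fi
    # nohup keeps the server running after the launching shell exits.
    nohup "$LIVY_HOME/bin/livy-server" >"$LIVY_HOME/livy-server.out" 2>&1 &
    echo "Livy started with PID $!"
}
```

Calling start_livy from an init script is one possible way to bring Livy back after a reboot.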

Cloudera Manager Installation

Livy can be deployed as a parcel in Cloudera Manager. For more information on Cloudera Parcels, see the official Cloudera Documentation.

If you wish to install Livy through Cloudera Manager, you will receive three files from Immuta:

  • A jar file with Livy's Custom Service Descriptor (e.g. LIVY-0.3.0.jar).
  • A parcel file (e.g. LIVY-0.3.0-el7.parcel).
  • A SHA1 hash of the parcel file (e.g. LIVY-0.3.0-el7.parcel.sha).

To install Livy via a Cloudera Parcel:

  1. Copy these files to the node on your cluster where Cloudera Manager is installed.
  2. Move the parcel and parcel hash to the Parcel Repository. This is usually /opt/cloudera/parcel-repo.
  3. Move the CSD jar file to the CSD directory. This is usually /opt/cloudera/csd.
  4. Be sure to execute chown cloudera-scm:cloudera-scm on all of the files in their new locations.
  5. In the Cloudera Manager UI, open the Parcel page.
  6. Click Check for New Parcels in the top right corner of the page.
  7. The LIVY parcel should appear in the Parcel table. Click Distribute in the Actions tab for Livy.
  8. Click Activate in the Actions tab for Livy.
  9. Restart the Cloudera SCM server. From a terminal on the node where Cloudera Manager is installed, run service cloudera-scm-server restart.
  10. Restart the Cloudera Management Service from the Cloudera Manager UI.
  11. Choose to add a service in the Cloudera Manager UI, and select Livy.
    • When choosing a node for Livy, be sure to select a node that you can run spark-submit from.
  12. Start the service. Note that the Livy service depends on a YARN service that is already running in Cloudera Manager.
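
Steps 1 through 4 above can be scripted. The sketch below is illustrative only: the function names are hypothetical, the file names are the examples used on this page, and it assumes the .sha file holds the hex SHA1 digest as its first field and that sha1sum is available on the node.

```shell
#!/bin/sh
# Sketch: verify the parcel against its SHA1 file, then stage the files.
# Assumption: the .sha file contains the hex digest as its first field.
verify_parcel() {
    parcel="$1"
    expected="$(awk '{print $1}' "$parcel.sha")"
    actual="$(sha1sum "$parcel" | awk '{print $1}')"
    [ "$expected" = "$actual" ]
}

# File names below are the examples from this page; yours may differ.
stage_livy_parcel() {
    src_dir="$1"
    parcel_repo="${2:-/opt/cloudera/parcel-repo}"
    csd_dir="${3:-/opt/cloudera/csd}"
    if ! verify_parcel "$src_dir/LIVY-0.3.0-el7.parcel"; then
        echo "parcel checksum mismatch" >&2
        return 1
    fi
    mv "$src_dir/LIVY-0.3.0-el7.parcel" "$src_dir/LIVY-0.3.0-el7.parcel.sha" "$parcel_repo/"
    mv "$src_dir/LIVY-0.3.0.jar" "$csd_dir/"
    # Cloudera Manager must own the files in their new locations.
    chown cloudera-scm:cloudera-scm \
        "$parcel_repo/LIVY-0.3.0-el7.parcel" \
        "$parcel_repo/LIVY-0.3.0-el7.parcel.sha" \
        "$csd_dir/LIVY-0.3.0.jar"
}
```

Checking the digest before moving the files is an optional precaution; a mismatch usually means the parcel was corrupted in transit.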

Amazon EMR Installation

If you wish to install Livy on an Amazon Elastic MapReduce (EMR) cluster, Immuta will provide a script artifact for you to upload to S3 and use as a bootstrap action when deploying your EMR cluster.

Custom configuration options can be set in the setLivyConfig method in the provided script.
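
As a rough illustration of the upload-and-bootstrap flow, the sketch below uses the AWS CLI. The function name, the bootstrap/ key prefix, the cluster name, and the instance type and count are all assumptions; the script path, bucket, and EMR release label are passed in because they depend on your environment.

```shell
#!/bin/sh
# Sketch: upload the bootstrap script to S3 and reference it when creating the cluster.
# Everything except the AWS CLI flags themselves is a placeholder.
deploy_emr_with_livy() {
    script="$1"         # local path to the script artifact provided by Immuta
    bucket="$2"         # an S3 bucket you control
    release_label="$3"  # the EMR release you deploy
    s3_path="s3://$bucket/bootstrap/$(basename "$script")"

    aws s3 cp "$script" "$s3_path" || return 1
    aws emr create-cluster \
        --name "spark-with-livy" \
        --release-label "$release_label" \
        --applications Name=Spark \
        --instance-type m4.large \
        --instance-count 3 \
        --use-default-roles \
        --bootstrap-actions "Path=$s3_path,Name=InstallLivy"
}
```

A real deployment will likely need further options (key pair, subnet, security groups) that are outside the scope of this page.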

Configuration

Livy has a number of configuration options that you may want to tweak to fit the needs of your cluster.

| Option | Description | Required? | Default |
|---|---|---|---|
| livy.spark.master | Where Spark jobs are submitted. | Y | yarn |
| livy.spark.deploy-mode | How Spark jobs are deployed. | Y | cluster |
| livy.server.host | The host address that the server will start on. | Y | 0.0.0.0 |
| livy.server.port | The host port that the server will start on. | Y | 8998 |
| livy.server.session.timeout | Time in milliseconds before idle sessions time out. | Y | 600000 |
| livy.impersonation.enabled | Whether Livy should proxy users when submitting a Spark job. Always true. | Y | true |
| livy.keystore | Used for SSL certificate & key. | N | N/A |
| livy.keystore.password | The keystore password. | N | N/A |
| livy.server.auth.type | Can only be set to kerberos. | N | N/A |
| livy.server.auth.kerberos.principal | This should be set to the HTTP principal for this node. | N | N/A |
| livy.server.auth.kerberos.keytab | The keytab used to kinit with this principal. | N | N/A |
| livy.server.launch.kerberos.principal | The principal used to launch Spark jobs. | N | N/A |
| livy.server.launch.kerberos.keytab | The keytab used to kinit with this principal. | N | N/A |
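
For reference, the required options with their listed defaults could be written to livy.conf as follows. This is a sketch: the write_livy_conf helper is hypothetical, and the "key = value" layout is assumed to match the livy.conf shipped with your Livy package.

```shell
#!/bin/sh
# Sketch: write the required options, using the defaults listed above, to a livy.conf file.
# Note that this overwrites the target file.
write_livy_conf() {
    conf_file="$1"
    cat > "$conf_file" <<'EOF'
livy.spark.master = yarn
livy.spark.deploy-mode = cluster
livy.server.host = 0.0.0.0
livy.server.port = 8998
livy.server.session.timeout = 600000
livy.impersonation.enabled = true
EOF
}
```

For the basic installation, the target would be livy-server-0.3.0/conf/livy.conf.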

HDFS Configuration

In order for Livy to be able to submit Spark jobs on behalf of Immuta users, the livy user must be given full impersonation permissions. Add the following to your core-site.xml. If you are using Cloudera Manager, this can be set in Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml.

<property>
    <name>hadoop.proxyuser.livy.hosts</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.livy.users</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.livy.groups</name>
    <value>*</value>
    <final>true</final>
</property>
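
One way to check that impersonation works after restarting the affected services is to submit a batch through Livy's REST API with a proxyUser field. The sketch below assumes Livy is reachable over plain HTTP without Kerberos; the function names, the application path, and the user name are hypothetical.

```shell
#!/bin/sh
# Sketch: build and submit a Livy batch request that runs as another user.
livy_batch_payload() {
    # $1 = path to the application file, $2 = user to impersonate
    printf '{"file": "%s", "proxyUser": "%s"}' "$1" "$2"
}

submit_batch() {
    # $1 = Livy base URL, $2 = application file, $3 = user to impersonate
    # On a Kerberos-enabled cluster, curl would also need its SPNEGO
    # options (--negotiate -u :).
    curl -s -X POST -H 'Content-Type: application/json' \
        -d "$(livy_batch_payload "$2" "$3")" "$1/batches"
}

# Example (hypothetical host, file, and user):
# submit_batch http://livy-node:8998 hdfs:///tmp/pi.py alice
```

If the proxy user settings are missing, the job is expected to fail with an impersonation error rather than run as the requested user.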

Java KeyStore KMS Configuration

If your cluster has the Java KeyStore KMS configured, you will need to give the livy user the ability to impersonate any users who will run Spark jobs using Livy. Add the following to kms-site.xml. If you are using Cloudera Manager, this can be set in Key Management Server Advanced Configuration Snippet (Safety Valve) for kms-site.xml.

<property>
    <name>hadoop.kms.proxyuser.livy.hosts</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.kms.proxyuser.livy.users</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.kms.proxyuser.livy.groups</name>
    <value>*</value>
    <final>true</final>
</property>

Security

Authentication

Livy can integrate with Kerberos if it is enabled for your cluster. You must provide two principals and corresponding keytabs. One must be the HTTP principal for the node that Livy is installed on. The other is a custom principal that you must create for Livy. If you use Cloudera Manager and choose to install via parcel, all relevant principals will be created for you.

See the Configuration section for details on configuring authentication.
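
For a manual installation, the Kerberos options might be appended to livy.conf along these lines. This is only a sketch: the helper name, the livy/<host>@<REALM> form of the custom principal, and the keytab file names are assumptions, so substitute the principals and keytabs you actually created.

```shell
#!/bin/sh
# Sketch: append Kerberos settings to an existing livy.conf.
# The principal formats and keytab file names are placeholders.
append_kerberos_conf() {
    conf_file="$1"   # path to livy.conf
    host="$2"        # fully qualified name of the Livy node
    realm="$3"       # Kerberos realm
    keytab_dir="$4"  # directory holding both keytabs
    cat >> "$conf_file" <<EOF
livy.server.auth.type = kerberos
livy.server.auth.kerberos.principal = HTTP/${host}@${realm}
livy.server.auth.kerberos.keytab = ${keytab_dir}/http.keytab
livy.server.launch.kerberos.principal = livy/${host}@${realm}
livy.server.launch.kerberos.keytab = ${keytab_dir}/livy.keytab
EOF
}
```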

SSL

If your cluster has SSL enabled, all you need to do is point Livy to the keystore where your SSL certificate and key are stored. See the Configuration section for more details.