You are viewing documentation for Immuta version 2023.1.

Immuta Log Analysis Tool for CDH Deployments

Audience: System Administrators

Content Summary: This page details how to use the immuta_hdfs_log_analyzer tool to troubleshoot slowdowns in your CDH cluster.

Overview

Sub-optimal configuration of the Immuta HDFS NameNode plugin may cause cluster-wide slowdowns under certain conditions. The NameNode plugin contains a variety of cache settings to limit the number of network calls that occur within the NameNode's locked permission checking operation. If these settings are configured properly, there will be little to no impact on the performance of HDFS operations.

You can use the immuta_hdfs_log_analyzer command-line utility to track the number of API calls that the NameNode plugin makes to the Immuta Web Service.

Usage

Download the log analysis tool, then invoke it as follows:

./immuta_hdfs_log_analyzer [-s START_TIME] [-e END_TIME] [-g {MINUTES,HOURS,DAYS}] [-t TIME_FORMAT] <file>

Options

  • START_TIME (-s, --start-time): Timestamp for the beginning of the period to analyze.
  • END_TIME (-e, --end-time): Timestamp for the end of the period to analyze.
  • GRANULARITY (-g, --granularity): Defines the size of the time buckets used for analysis. Can be MINUTES, HOURS, or DAYS.
  • TIME_FORMAT (-t, --time-format): The format to use for timestamps. This should match the timestamp format in the Immuta Web Service logs.

Output

$ ./immuta_hdfs_log_analyzer \
    -s "2020-02-03T02:00:00.000000Z" \
    -e "2020-02-03T08:00:00.000000Z" \
    -g HOURS \
    immuta.log
2020-02-03T02:00:00.000000Z -- HDFS API Calls: 641, Mean ResponseTime: 8.0 ms, Max ResponseTime: 76 ms
2020-02-03T03:00:00.000000Z -- HDFS API Calls: 368, Mean ResponseTime: 6.0 ms, Max ResponseTime: 79 ms
2020-02-03T04:00:00.000000Z -- HDFS API Calls: 407, Mean ResponseTime: 7.0 ms, Max ResponseTime: 63 ms
2020-02-03T05:00:00.000000Z -- HDFS API Calls: 440, Mean ResponseTime: 8.0 ms, Max ResponseTime: 89 ms
2020-02-03T06:00:00.000000Z -- HDFS API Calls: 491, Mean ResponseTime: 10.0 ms, Max ResponseTime: 70 ms
2020-02-03T07:00:00.000000Z -- HDFS API Calls: 481, Mean ResponseTime: 15.0 ms, Max ResponseTime: 422 ms
2020-02-03T08:00:00.000000Z -- HDFS API Calls: 321, Mean ResponseTime: 6.0 ms, Max ResponseTime: 78 ms
HDFS API Calls: 3149
Other API Calls: 10398

If time buckets in this tool's output line up with periods of slow cluster performance, you may need to adjust the configuration of the Immuta HDFS NameNode plugin, in particular the cache settings described in the Overview.
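
To make that correlation easier, the per-bucket lines can be parsed and filtered by response time. The following is a minimal sketch, not part of the Immuta tooling: it assumes the output keeps the exact line format shown in the sample above, and the names parse_buckets and flag_slow, as well as the 200 ms threshold, are hypothetical choices made for illustration.

```python
import re
from typing import List, NamedTuple

# Matches the per-bucket lines from the sample output above, e.g.
# "2020-02-03T07:00:00.000000Z -- HDFS API Calls: 481, Mean ResponseTime: 15.0 ms, Max ResponseTime: 422 ms"
# Assumption: the analyzer always emits this exact layout.
BUCKET_LINE = re.compile(
    r"^(?P<start>\S+) -- HDFS API Calls: (?P<calls>\d+), "
    r"Mean ResponseTime: (?P<mean_ms>[\d.]+) ms, "
    r"Max ResponseTime: (?P<max_ms>[\d.]+) ms$"
)


class Bucket(NamedTuple):
    start: str      # bucket timestamp, kept as the raw string from the output
    calls: int      # HDFS API calls in the bucket
    mean_ms: float  # mean response time in milliseconds
    max_ms: float   # maximum response time in milliseconds


def parse_buckets(text: str) -> List[Bucket]:
    """Extract per-bucket statistics; summary and unrelated lines are skipped."""
    buckets = []
    for line in text.splitlines():
        match = BUCKET_LINE.match(line.strip())
        if match:
            buckets.append(
                Bucket(
                    start=match["start"],
                    calls=int(match["calls"]),
                    mean_ms=float(match["mean_ms"]),
                    max_ms=float(match["max_ms"]),
                )
            )
    return buckets


def flag_slow(buckets: List[Bucket], max_ms_threshold: float = 200.0) -> List[Bucket]:
    """Return buckets whose maximum response time meets or exceeds the threshold.

    The 200 ms default is an arbitrary illustrative value, not an Immuta recommendation.
    """
    return [b for b in buckets if b.max_ms >= max_ms_threshold]


# A few lines copied from the sample output above.
SAMPLE = """\
2020-02-03T06:00:00.000000Z -- HDFS API Calls: 491, Mean ResponseTime: 10.0 ms, Max ResponseTime: 70 ms
2020-02-03T07:00:00.000000Z -- HDFS API Calls: 481, Mean ResponseTime: 15.0 ms, Max ResponseTime: 422 ms
2020-02-03T08:00:00.000000Z -- HDFS API Calls: 321, Mean ResponseTime: 6.0 ms, Max ResponseTime: 78 ms
HDFS API Calls: 3149
Other API Calls: 10398
"""

if __name__ == "__main__":
    for bucket in flag_slow(parse_buckets(SAMPLE)):
        print(f"{bucket.start}: {bucket.calls} calls, max {bucket.max_ms} ms")
```

In practice, you would feed the captured output of immuta_hdfs_log_analyzer to parse_buckets instead of SAMPLE and choose a threshold that reflects what is normal for your cluster.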