Immuta Architecture

Audience: All Immuta users

Content Summary: This page details the major components, installation, scalability, availability, and security of the Immuta platform.

Immuta Components

Immuta's server-side software comprises the following major components:

  • The Immuta Web Service: This service handles all web-based user interaction with Immuta, metadata ingest, and data fingerprinting, and it backs the Query Engine, Spark partition server, and NameNode plugin. Although it is notionally a single web service, the fingerprinting functionality runs as a separate service internally and can be scaled independently.

  • The Immuta Metadata Catalog: This internal database maintains a small amount of data on each object you register with Immuta so that Immuta can provide responsive access to objects to your users while enabling you to dynamically create access policies on the objects.

  • The Immuta Virtual Filesystem: This mountable filesystem represents connected data from across an organization in a directory hierarchy. Each file remains empty until it is read; at that point, the file is hydrated dynamically with the data from the underlying storage technology, and the policies are enforced automatically.

  • The Immuta SQL Query Engine: This service tracks queryable data source configuration and exposes a SQL connection to the Immuta web service and to any SQL client, such as a SQL library in Python or R or a BI / Data Science tool. This service interprets client SQL queries, pushes queries to your connected business databases, applies policies, and returns query responses to your SQL clients.

  • Immuta HDFS Layer (optional): The Immuta HDFS layer allows Immuta to enforce data access policies within your Hadoop infrastructure. To enable this functionality, Immuta supplies two .jar files to be installed on your name node(s) and data nodes.

  • Immuta Spark Context (optional): The Immuta Spark Context is a subclass of SparkSQL that lets you enforce row- and column-level controls on HDFS data that backs Impala or Hive tables when that data is processed by SparkSQL.
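
To make the Query Engine's flow more concrete (receive a query, push work to the connected database, apply policies, return results), here is a minimal sketch. It is not Immuta's implementation or API. It uses Python's built-in sqlite3 module as a stand-in for a business database, and the names `Policy`, `run_with_policy`, `demo`, and the sample `patients` table are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: columns to mask plus a row-level filter."""
    masked_columns: set = field(default_factory=set)
    row_filter: str = "1=1"  # SQL predicate pushed down to the backing database


def run_with_policy(conn, table, policy):
    """Push the row filter to the database, then mask columns in the result."""
    # Row-level control: the predicate runs inside the backing database.
    cursor = conn.execute(f"SELECT * FROM {table} WHERE {policy.row_filter}")
    columns = [description[0] for description in cursor.description]
    rows = []
    for raw in cursor.fetchall():
        # Column-level control: masked values never reach the client.
        rows.append({
            col: ("REDACTED" if col in policy.masked_columns else value)
            for col, value in zip(columns, raw)
        })
    return rows


def demo():
    """Build an in-memory stand-in database and query it under a policy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (name TEXT, region TEXT, ssn TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [("Ada", "east", "111"), ("Bo", "west", "222")],
    )
    analyst_policy = Policy(masked_columns={"ssn"}, row_filter="region = 'east'")
    return run_with_policy(conn, "patients", analyst_policy)
```

Calling `demo()` returns only the rows that match the filter, with the `ssn` column masked. A real engine would build the predicate from parsed policy definitions rather than a raw string; the string is used here only to keep the sketch short.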

Installation

Immuta's standard installation is a Helm installation to a Kubernetes cluster. This could be a Kubernetes cluster you manage or a hosted solution such as AKS, EKS, or GKE. This is the preferred deployment because of the minimal administration needed to achieve scale and availability. However, Kubernetes is not a prerequisite for installing Immuta. Immuta supports installation on single Enterprise Linux (6 or 7) nodes via Docker or RPM installations. These installation types can be configured for scale and availability. Please see the Immuta Installation Guide for details on all deployment options.

Immuta's optional SparkSQL and Hadoop capabilities install as plugins on your Hadoop cluster. Please see the Hadoop Installation Guide for full details.

Scalability

Immuta is designed to be scalable in several dimensions. For the standard Immuta deployment, minimal administrative effort is required to manage scaling beyond the addition of nodes to the Immuta system. Scalability can also be achieved in non-standard deployments, but doing so requires the time of skilled systems administrators.

  • The Immuta web service is stateless and horizontally scalable.
  • By keeping a metadata catalog rather than maintaining separate copies of data, Immuta's database is designed to remain small and responsive. By running replicated instances of this internal database, the catalog can scale in support of the web service.
  • The Immuta SQL Query Engine can scale horizontally with user load. When a query cannot be fully pushed down to business databases, it is limited by the memory allocated to the individual instance that runs it.

High Availability

Because each component of Immuta is designed to be horizontally scalable, Immuta can be configured for high availability. Upgrades and major configuration changes may require scheduled downtime, but even if Immuta's master internal database fails, recovery happens within seconds. With the addition of an external load balancer, Immuta's standard deployment comes preconfigured with these availability features.
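
As an illustration of why stateless, horizontally scaled instances suit this setup, the sketch below shows a toy round-robin balancer that skips instances marked unhealthy. It is a simplified model and does not describe any particular load balancer or Immuta's own configuration. The class `RoundRobinBalancer` and the instance names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: rotates over instances and skips unhealthy ones."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._ring = cycle(self._instances)

    def mark_down(self, instance):
        """Record a failed health check for an instance."""
        self._healthy.discard(instance)

    def mark_up(self, instance):
        """Record a recovered instance so it receives traffic again."""
        if instance in self._instances:
            self._healthy.add(instance)

    def next_instance(self):
        """Return the next healthy instance in rotation."""
        # Because instances are stateless, any healthy one can serve any request.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

For example, with three hypothetical instances `web-0`, `web-1`, and `web-2`, calling `mark_down("web-1")` causes later calls to `next_instance()` to alternate between the two remaining instances until `mark_up("web-1")` is called.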

Security

Immuta’s core function of policy enforcement and management is designed to improve your data security. Beyond this primary feature, Immuta protects your data in several other ways.

  • Immuta is designed to leverage your existing identity management system when desired. This design allows Immuta to benefit from the work your security team has already done to validate users, protect credentials, and define roles and attributes.

  • By default, all network communications with Immuta and within Immuta are encrypted via TLS. This practice ensures your data is protected while in transit.

  • Immuta does not make any persistent copies of data. The Immuta Virtual Filesystem allows for the temporary local caching of file representations of data for performance purposes, but all data is encrypted on disk, and the encryption keys are never written to disk. Data is only decrypted by the Immuta Virtual Filesystem when a file is opened and the content is read; user access to data is revalidated each time the Immuta Virtual Filesystem opens a virtual file.
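
The open-time behavior described above (a file is empty until read, and access is revalidated on every open even when content is cached) can be sketched as follows. This is an illustrative model only, not the Immuta Virtual Filesystem's code. The class `VirtualFile`, its callables, and the example path are hypothetical, and the sketch keeps its cache in memory and omits the on-disk encryption step entirely.

```python
class VirtualFile:
    """Hypothetical stand-in for a file that stays empty until it is read."""

    def __init__(self, path, fetch, is_authorized):
        self.path = path
        self._fetch = fetch                  # callable(path) -> bytes from backing storage
        self._is_authorized = is_authorized  # callable(user, path) -> bool
        self._cache = None                   # in-memory only in this sketch
        self.fetch_count = 0                 # how many times backing storage was hit

    def open_and_read(self, user):
        """Re-check access on every open, then hydrate the content if needed."""
        if not self._is_authorized(user, self.path):
            raise PermissionError(f"{user} may not read {self.path}")
        if self._cache is None:
            # First read: hydrate the file from the underlying storage.
            self._cache = self._fetch(self.path)
            self.fetch_count += 1
        return self._cache


# Usage: access is granted, the file is read, then access is revoked.
grants = {("alice", "/immuta/sales.csv")}
sales = VirtualFile(
    "/immuta/sales.csv",
    fetch=lambda path: b"id,amount\n1,10\n",
    is_authorized=lambda user, path: (user, path) in grants,
)
first_read = sales.open_and_read("alice")  # hydrates from backing storage
grants.clear()                             # a later open by alice now fails
```

The point of the design is that a cached copy never bypasses the authorization check: once the grant is removed, the next `open_and_read("alice")` raises `PermissionError` even though the content is still cached.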