We've Got Answers
Frequently Asked Questions
We compiled a list of the most common questions we hear. Hopefully you can find what you are looking for, but please reach out to us at email@example.com with any unanswered questions.
Gluent Data Platform
Gluent Data Platform is a transparent data virtualization solution that migrates data and processing from Oracle to modern data platforms, without needing to change any code. This transparent data virtualization also provides the ability for applications to query the offloaded data and join it with local RDBMS data, from a single query in the RDBMS’s existing SQL interface.
Gluent Data Platform comprises three core components:
- Gluent Advisor - Gluent Advisor is a free tool that analyzes your RDBMS to determine how much data can be offloaded and safely dropped. This open-text SQL script does not require any software installation.
- Gluent Offload Engine - Gluent Data Platform's fully-automated, rule-based technology that orchestrates all of the processes required to offload data from your RDBMS to a target data platform and to present virtualized data from a modern data platform to your RDBMS.
- Gluent Query Engine - Gluent Data Platform's transparent query engine includes Smart Connector, which enables virtualized data (offloaded or presented with Gluent Offload Engine) to be accessed with local RDBMS data from a single query in the RDBMS’s existing SQL interface.
Gluent Data Platform also has a companion product, Conductor for Gluent. Conductor for Gluent is the UI for Gluent Data Platform; it streamlines offload operations and provides monitoring for them.
- Current supported RDBMS - Oracle Database
- Current supported backends - Cloudera Data Hub, Cloudera Data Platform, Google BigQuery and Amazon EMR.
- Current supported storage solutions - HDFS, Azure ADLS and ABFS, Amazon S3 and Google Cloud Storage.
No, Gluent Data Platform is not a federated solution. In simple terms, a federated solution provides a SQL engine that connects to various backend databases and virtually pulls their data together. The federated approach has two main downsides. First, applications and users must rewrite their SQL for the new SQL engine. Second, joining multiple large datasets from different databases presents a performance challenge that is very difficult to overcome.
Gluent Data Platform, on the other hand, is a transparent data virtualization solution. While the problem definition that Gluent addresses is very similar to federated offerings (trying to eliminate or bridge the silos), the approach to solving the problem is very different.
Unlike other data virtualization products, Gluent Data Platform also migrates data from your RDBMS to your target platform without changing code or building any ETL code or data pipelines. This migration is designed to be a background process and transparent to systems that access the data.
Gluent Advisor is a free tool that gathers data from your RDBMS via a simple open-text SQL script and can be run without installing any software. Nothing proprietary is gathered, only information about the RDBMS’s storage and workload. Gluent Advisor will then analyze the information to determine how much data can be offloaded and safely dropped from the RDBMS. Everything is visualized in a shareable Gluent Advisor report. This report summarizes the benefits of using Gluent Data Platform and provides specific details about each table and partition.
You can run Gluent Advisor here.
The best way to get started is by running Gluent Advisor. Gluent Advisor reports the expected benefits of implementing Gluent Data Platform before installing any software. This information enables Gluent to have conversations with you about your needs, your databases, and the benefits you should expect.
You can find out more about Gluent Advisor here.
Transparent Data Virtualization
Transparent data virtualization enables enterprise data sharing by providing simple, virtual access to data throughout the organization. Rather than developing and maintaining data pipelines or replication jobs to copy data from silo to silo, data can be queried directly from its source.
Unlike many data virtualization products that require a federation engine to translate between variations of datastores and force code changes to existing applications, transparent data virtualization makes data access transparent to applications and end users without code changes, regardless of where the underlying data lives.
- Accelerating cloud migrations
- Enterprise data sharing
- Eliminating enterprise data sprawl
- Active archiving of historical data
- Integrating IoT data, machine learning results and other big data sources back to legacy RDBMS applications
- Faster adoption of cloud platforms and modern data architectures
- Cost reduction in storage, compute and data movement processes
- Decreased RDBMS license cost due to having a smaller footprint
- Enhanced capabilities, such as machine learning across previously siloed datasets
- Transparent access to virtualized data allows applications to continue operating with zero code changes
- Improved application performance by leveraging modern, scalable infrastructure without refactoring existing applications
Gluent Query Engine
When retrieving virtualized data to satisfy existing RDBMS SQL queries, Gluent Query Engine is designed to push down as much processing and filtering to the target platform as possible. This includes pushing down one or more of the following for processing on the target platform:
- Predicates (i.e., the filters in the WHERE clause)
- Projections (i.e., to retrieve only the columns required by the query)
- Aggregations (using our Advanced Aggregation Pushdown feature)
- Join filters (using our Join Filter Pulldown feature)
- Joins (using our Join Pushdown feature)
- Data type conversion and formatting (for optimal performance)
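As a rough illustration of the pushdown ideas listed above, the following Python sketch composes the SQL that would be sent to a target platform from a query's projection, predicates and aggregation. It is a simplified model under stated assumptions, not Gluent Query Engine's implementation; the `HybridQuery` class, the `build_backend_sql` function and the `sales` table are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HybridQuery:
    """Hypothetical description of what one query needs from an offloaded table."""
    table: str
    columns: List[str]                                    # projection
    predicates: List[str] = field(default_factory=list)   # WHERE filters
    group_by: List[str] = field(default_factory=list)     # aggregation keys


def build_backend_sql(q: HybridQuery) -> str:
    """Compose backend SQL so that column pruning, filtering and
    aggregation run on the target platform rather than in the RDBMS."""
    sql = f"SELECT {', '.join(q.columns)} FROM {q.table}"
    if q.predicates:
        sql += " WHERE " + " AND ".join(q.predicates)
    if q.group_by:
        sql += " GROUP BY " + ", ".join(q.group_by)
    return sql


query = HybridQuery(
    table="sales",
    columns=["region", "SUM(amount) AS total"],
    predicates=["sale_date < DATE '2020-01-01'"],
    group_by=["region"],
)
print(build_backend_sql(query))
```

In this model, only one aggregated row per region would travel back to the RDBMS instead of every matching sales row, which is the point of pushing work down.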
Gluent Data Platform creates a hybrid architecture that allows data stored in both your RDBMS and your target platform to be accessed seamlessly. Data from a table can be spread across the RDBMS and your target platform. When a query executes, parts of the query execute on the RDBMS while other parts execute on the target platform. The goal is to push as much processing down to the target platform as possible, while returning only the minimum amount of data needed to the RDBMS. Joins can be pushed down entirely if the required tables are present in the target platform. This is particularly useful when joining multiple large fact tables. Gluent Query Engine has several techniques and features to push down filtering and join logic to the target platform, which avoids moving large datasets and keeps hybrid joins performant.
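One way to picture a table whose rows are spread across two stores is a single view that unions both halves around a boundary date. The sketch below models this with SQLite purely as a stand-in: both "stores" live in one in-memory database, and the table names, view name and boundary date are assumptions made for the example, not Gluent objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for recent rows still held in the RDBMS.
    CREATE TABLE sales_rdbms   (sale_date TEXT, amount REAL);
    -- Stand-in for older rows already offloaded to the target platform.
    CREATE TABLE sales_backend (sale_date TEXT, amount REAL);

    INSERT INTO sales_rdbms   VALUES ('2024-02-01', 10.0), ('2024-03-01', 20.0);
    INSERT INTO sales_backend VALUES ('2022-05-01', 5.0),  ('2023-06-01', 7.5);

    -- One logical table; the boundary date decides which store serves a row.
    CREATE VIEW sales AS
        SELECT sale_date, amount FROM sales_rdbms   WHERE sale_date >= '2024-01-01'
        UNION ALL
        SELECT sale_date, amount FROM sales_backend WHERE sale_date <  '2024-01-01';
""")

# The application queries one name and never sees the split.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
old_rows = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE sale_date < '2024-01-01'"
).fetchone()[0]
print(total, old_rows)
```

A query restricted to dates before the boundary only needs rows from the offloaded half, which is why filters on the boundary column matter so much in a hybrid layout.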
Gluent Data Platform supports all standard SQL supported by Oracle Database, including extensions such as PL/SQL.
Gluent Data Platform keeps the original SQL engine in play, eliminating the need to rewrite any application code and allowing transparent access to the data offloaded to your target platform. Existing PL/SQL continues to work in Oracle Database deployments. Applications continue to connect to Oracle (usually with a much smaller footprint), which processes the incoming SQL and proprietary extensions such as PL/SQL. The heavy lifting is pushed down to the target platform by Gluent Query Engine. Having more data consolidated to the target platform maximizes the amount of processing that can be pushed down.
Gluent Data Platform uses open-source SQL engines on Hadoop and is certified with Hive (with or without LLAP), Impala and Spark. When accessing other platforms such as Google BigQuery, Gluent Data Platform utilizes that platform’s recommended SQL interface.
Yes. Gluent Query Engine allows you to access data that is native to your modern data platform directly and in real-time from your RDBMS without persisting the data in the RDBMS. This feature is enabled by a Gluent Offload Engine component called Present.
Tables that can be queried via Google BigQuery or Hive/Impala (e.g., Parquet, Avro, CSV, JSON) can be presented to your RDBMS, where they can be queried using the existing RDBMS SQL syntax and interface.
This functionality makes it very easy to share data from new modern data platforms back to your existing relational database systems.
Yes. Gluent Data Platform supports queries which run from the RDBMS in parallel.
Gluent Offload Engine
Gluent Offload Engine orchestrates the processes that are required to virtualize data between RDBMSs and modern data platforms (either on-premises or in the cloud). Components include Offload, Present, Schema Sync, Offload Status Report and several other tools that cover a wide range of functional and operational requirements for maintaining hybrid environments.
Offload, for example, synchronizes data from the RDBMS to a modern data platform and automatically builds the appropriate backend table structures, data mappings and partitioning schemes, in addition to the metadata and RDBMS objects required to query the offloaded data from existing RDBMS SQL. Because offloaded data should be shared and accessible, data is stored in open formats when possible. Data is stored in native formats for platforms such as Google BigQuery.
Present is a lightweight tool that creates all of the metadata and RDBMS objects necessary to query virtualized data from the RDBMS. This can be used to share native backend data with the RDBMS and/or to present different views of previously-offloaded data to optimize virtualized data access.
Gluent Data Platform supports compression, but the options available depend on the format in which the offloaded data is stored. When storing data in Hadoop, for example, data can be stored in a compressed columnar format such as Parquet. Google BigQuery's Capacitor columnar file format also supports compression.
Gluent Offload Engine supports a variety of storage formats for offloading data:
- Google BigQuery Capacitor
Depending on the target data platform, Gluent Offload Engine supports a number of on-premises and cloud storage solutions for either transient data (i.e. staging data), offloaded table data or both:
- Google Cloud Storage
- Amazon S3
- Azure ADLS/ABFS
Sqoop is the de facto bulk data movement tool for Hadoop. Gluent Offload Engine is much more than a data movement tool because of its automation, its intelligence and its support for platforms other than Hadoop-based systems (such as Google BigQuery).
For example, Gluent Offload Engine:
- Automates table structure creation on the target platform, including data mapping and partitioning
- Has built-in, configurable rules for offloading data, such as by partition, subpartition, predicate or full table
- Performs a variety of validations to ensure data consistency
- Manages and exposes metadata to ensure hybrid queries run optimally from the RDBMS
Gluent Offload Engine can utilize either Sqoop or Spark for its data transport phase (i.e. the movement of data from the RDBMS to the target data platform), but this is just one of many operations that are orchestrated when offloading data.
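To make the idea of a consistency validation more concrete, here is a minimal Python sketch that compares a row count and a column total between source and target rows. It only illustrates the general technique; the checks that Gluent Offload Engine actually performs are not described in this FAQ, and the `summarize` and `is_consistent` helpers are hypothetical.

```python
from typing import Iterable, Tuple

Row = Tuple[int, float]  # (primary key, amount); an assumed row shape


def summarize(rows: Iterable[Row]) -> Tuple[int, float]:
    """Return (row count, sum of the amount column) for a set of rows."""
    count, total = 0, 0.0
    for _key, amount in rows:
        count += 1
        total += amount
    return count, round(total, 2)


def is_consistent(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> bool:
    """Treat the copy as consistent when both summaries match."""
    return summarize(source_rows) == summarize(target_rows)


source = [(1, 10.0), (2, 5.5)]
target_ok = [(2, 5.5), (1, 10.0)]   # same rows, different order
target_short = [(1, 10.0)]          # one row missing

print(is_consistent(source, target_ok), is_consistent(source, target_short))
```

Matching summaries do not prove that every value is identical, so a check like this is best read as a cheap first gate rather than a full comparison.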
Gluent Offload Engine automates the creation of tables, including partitioning and views when offloading or presenting data. Administration tasks such as adding new columns to existing offloaded tables are also automated. Depending on the target platform, databases can be created up-front by administrators or created automatically with Gluent Offload Engine. Database objects such as user-defined functions may be created by Gluent Data Platform during installation, but this is target platform-dependent and is not required on many systems.
Conductor for Gluent provides scheduling capabilities and other operational automations.
Additionally, offload commands can be executed through any external scheduling tool that supports generic command line calls.
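For teams that drive jobs from their own scheduler, a thin wrapper around a command-line call is often all that is needed. The Python sketch below shows one possible shape, assuming the scheduler only cares about the exit code. The `run_offload_job` function is hypothetical, and the command shown is a placeholder that simply runs the Python interpreter; substitute the offload command line from your own installation and documentation.

```python
import subprocess
import sys
from typing import List


def run_offload_job(command: List[str], timeout_s: int = 3600) -> bool:
    """Run one command-line job and report success via its exit code."""
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )
    if result.returncode != 0:
        # Surface the failure so the calling scheduler can alert or retry.
        print(
            f"job failed ({result.returncode}): {result.stderr.strip()}",
            file=sys.stderr,
        )
    return result.returncode == 0


# Placeholder command, NOT a real offload invocation.
ok = run_offload_job([sys.executable, "-c", "print('offload placeholder')"])
print(ok)
```

Passing the command as a list avoids shell quoting problems when table names or dates are supplied as arguments.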
Gluent Offload Engine has several available options for optimizing offload performance, depending on the specific use case, data types involved, degree of parallelism and other factors. These optimizations can be found and followed in the Gluent Data Platform documentation. For more detailed optimization support, ask about Gluent consulting services for Gluent Data Platform implementation.
Gluent Offload Engine supports several offload scenarios:
- Offloading entire tables
- Offloading some or all partitions for range- or list-partitioned tables
- Offloading some or all range subpartitions for range-subpartitioned tables
- Predicate-based offloading
- Offloading joins
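As an example of how a partition-based rule might be expressed, the Python sketch below picks the range partitions that fall entirely before a cutoff date. It assumes each partition is described by an exclusive upper bound, as with range partitioning; the `partitions_to_offload` function and the partition names are hypothetical and do not represent Gluent Offload Engine's own rule syntax.

```python
from datetime import date
from typing import Dict, List


def partitions_to_offload(partitions: Dict[str, date], older_than: date) -> List[str]:
    """Select partitions whose exclusive upper bound is at or before the cutoff.

    Every row in such a partition is older than the cutoff, so the whole
    partition can be offloaded as a unit.
    """
    return sorted(name for name, high in partitions.items() if high <= older_than)


# Partition name -> exclusive upper bound of its date range.
sales_partitions = {
    "P2022": date(2023, 1, 1),
    "P2023": date(2024, 1, 1),
    "P2024": date(2025, 1, 1),
}

selected = partitions_to_offload(sales_partitions, older_than=date(2024, 1, 1))
print(selected)
```

A predicate-based rule would follow the same pattern with a row-level filter in place of the partition bound.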
Yes. There are times when data that has been offloaded from your RDBMS needs to be updated. This can be either to fix historical data or to refresh dimension tables. Gluent Data Platform’s Incremental Update functionality provides a variety of options to handle these scenarios.
When Incremental Update is enabled, additional objects are created (in both the RDBMS and the target platform) to track and store changes to the offloaded data. Changes are periodically synchronized to the target platform.
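The general pattern of tracking changes and synchronizing them later can be sketched in a few lines of Python. This is a conceptual model only: the change-log layout, the operation names and the `apply_changes` function are assumptions for illustration, not the objects that Incremental Update creates.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int, dict]  # (operation, primary key, row values)


def apply_changes(target: Dict[int, dict], change_log: List[Change]) -> Dict[int, dict]:
    """Apply captured changes, in order, to a copy of the offloaded rows."""
    merged = dict(target)
    for op, key, row in change_log:
        if op == "DELETE":
            merged.pop(key, None)
        else:  # INSERT and UPDATE both write the latest row image
            merged[key] = row
    return merged


offloaded = {1: {"name": "North"}, 2: {"name": "South"}}
log = [
    ("UPDATE", 1, {"name": "North-East"}),
    ("DELETE", 2, {}),
    ("INSERT", 3, {"name": "West"}),
]

synced = apply_changes(offloaded, log)
print(synced)
```

Because changes are applied in log order, a later change to the same key always wins, which keeps repeated synchronization runs predictable.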
Yes, if Conductor for Gluent is installed, it can be configured to manage data removal/retention from the RDBMS. Gluent Data Platform does not automatically remove data from your RDBMS when it is offloaded. For example, if an entire table is offloaded to the target platform, the table remains in the RDBMS but access to the data is routed to the target platform. Similarly, when partitions are offloaded, the partitions remain in the RDBMS but queries needing data from those partitions will pull from the target platform. Removal of data is decoupled from offloading and is controlled by Conductor for Gluent.
Gluent Data Platform supports the use of encryption both at rest and in flight for all communication between its components. Privileges for service/system accounts used by Gluent Data Platform are assigned according to the principle of least privilege. Access to data in the backend system via Gluent Query Engine is managed via privileges in the RDBMS.
Gluent Data Platform's architecture allows for a variety of installation patterns. Some lightweight components must be installed on the RDBMS server(s) to support Gluent Query Engine, but there are multiple options for installing the remaining components such as Gluent Offload Engine. These components can be installed on shared or dedicated servers, physical or VM, on-premises or in the cloud. The exact configuration depends on your existing environment, target platform, and security requirements.
No, there are no specific hardware requirements for Gluent Data Platform. Hardware can be deployed as shared or dedicated servers, physical or VM, on-premises or in the cloud. Gluent provides pre-configured VM images on various cloud providers' marketplace platforms but these, too, are customizable.
Gluent Data Platform has multiple licensing models designed to meet your needs.
- On-premises perpetual licenses
- On-premises subscription licenses
- Cloud-based licenses
Detailed pricing descriptions can be found in our Licensing Guide.