
Gluent Offload Engine - FAQs

Frequently Asked Questions About Gluent Offload Engine

We’ve compiled a list of the most common questions we’ve been asked about Gluent Offload Engine. Please reach out to us at info@gluent.com with any unanswered questions.

What is Gluent Offload Engine?

Gluent Offload Engine orchestrates the processes required to virtualize data between RDBMSs and modern data platforms (either on-premises or in the cloud). Components include Offload, Present, Schema Sync, Offload Status Report, and several other tools that cover a wide range of functional and operational requirements for maintaining hybrid environments.

Offload, for example, synchronizes data from the RDBMS to a modern data platform and automatically builds the appropriate backend table structures, data mappings, and partitioning schemes, as well as the metadata and RDBMS objects required to query the offloaded data from existing RDBMS SQL. Because offloaded data should be shared and accessible, it is stored in open formats where possible; for platforms with their own native storage format, such as Google BigQuery, data is stored in that native format.

Present is a lightweight tool that creates all of the metadata and RDBMS objects necessary to query virtualized data from the RDBMS. This can be used to share native backend data with the RDBMS and/or to present different views of previously-offloaded data to optimize virtualized data access.
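As a concrete (though hypothetical) illustration, both tools are driven from the command line and can therefore be scripted. The sketch below assumes an installation path, command locations, table names, and a basic table option purely for illustration; the authoritative command syntax is described in the Gluent Data Platform documentation.

    # Minimal sketch: invoking Offload and Present from Python.
    # The install path, table names and options shown here are assumptions
    # for illustration only; check the documentation for the exact syntax.
    import subprocess

    OFFLOAD_HOME = "/u01/app/gluent/offload"  # hypothetical install location

    # Offload an RDBMS table to the backend data platform.
    subprocess.run([f"{OFFLOAD_HOME}/bin/offload", "-t", "SH.SALES"], check=True)

    # Present a native backend table (or another view of offloaded data)
    # back to the RDBMS as a queryable hybrid object.
    subprocess.run([f"{OFFLOAD_HOME}/bin/present", "-t", "SH.SALES_AGG"], check=True)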

Does Gluent Data Platform compress offloaded data?

Yes, Gluent Data Platform does compress data, but the compression used depends on the format in which the offloaded data is stored. When storing data in Hadoop, for example, data can be stored in a compressed columnar format such as Parquet. Google BigQuery’s Capacitor columnar file format also supports compression.

Which storage formats does Gluent Offload Engine support?

Gluent Offload Engine supports a variety of storage formats for offloading data:

  • Parquet
  • ORC
  • Avro
  • Google BigQuery Capacitor

Which storage platforms does Gluent Offload Engine support?

Depending on the target data platform, Gluent Offload Engine supports a number of on-premises and cloud storage solutions, used for transient (i.e. staging) data, offloaded table data, or both:

  • HDFS
  • Google Cloud Storage
  • Amazon S3
  • Azure ADLS/ABFS

How is Gluent Offload Engine different from open source tools such as Sqoop?

Sqoop is the de facto bulk data movement tool for Hadoop, but Gluent Offload Engine is much more than a data movement tool: it adds automation and intelligence, and it supports platforms other than Hadoop-based systems (such as Google BigQuery).

For example, Gluent Offload Engine:

  • Automates table structure creation on the target platform, including data mapping and partitioning
  • Has built-in, configurable rules for offloading data, such as by partition, subpartition, predicate or full table
  • Performs a variety of validations to ensure data consistency
  • Manages and exposes metadata to ensure hybrid queries run optimally from the RDBMS

Gluent Offload Engine can utilize either Sqoop or Spark for its data transport phase (i.e. the movement of data from the RDBMS to the target data platform), but this is just one of many operations that are orchestrated when offloading data.

Do I need to create tables or metadata on the target platform?

No, you do not need to create any tables or metadata on the target platform. Gluent Offload Engine automates the creation of tables (including partitioning) and views when offloading or presenting data. Administration tasks such as adding new columns to existing offloaded tables are also automated. Depending on the target platform, databases can be created up-front by administrators or created automatically by Gluent Offload Engine. Database objects such as user-defined functions may be created by Gluent Data Platform during installation, but this is target platform-dependent and is not required on many systems.

Does Gluent Offload Engine provide a scheduling tool?

Yes. Scheduling is provided by Conductor for Gluent, which offers scheduling capabilities and other operational automations. Additionally, offload commands can be executed through any external scheduling tool that supports generic command-line calls, as sketched below.
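For example, a cron job or any enterprise scheduler could invoke a small wrapper such as the one below. The install path, table name, and options are hypothetical and are shown only to illustrate the command-line integration point.

    # Hypothetical wrapper an external scheduler (e.g. cron) could run nightly.
    # The install path, table name and options are illustrative assumptions.
    import subprocess
    import sys

    result = subprocess.run(
        ["/u01/app/gluent/offload/bin/offload", "-t", "SH.SALES"],
        capture_output=True,
        text=True,
    )
    print(result.stdout)

    # Propagate success or failure back to the scheduler.
    sys.exit(result.returncode)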

How can I optimize offload performance?

Gluent Offload Engine has several options for optimizing offload performance, depending on the specific use case, data types involved, degree of parallelism, and other factors. These optimizations are described in the Gluent Data Platform documentation. For more detailed optimization support, ask about Gluent consulting services for Gluent Data Platform implementation.

Which offload scenarios does Gluent Offload Engine support?

Gluent Offload Engine supports several offload scenarios, including:

  • Offloading entire tables
  • Offloading some or all partitions for range- or list-partitioned tables
  • Offloading some or all range subpartitions for range-subpartitioned tables
  • Predicate-based offloading
  • Offloading joins
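To make these scenarios concrete, the sketch below shows a few illustrative offload invocations. The option names, predicate syntax, and table names are assumptions for illustration; the supported options and predicate grammar are defined in the Gluent Data Platform documentation.

    # Illustrative offload invocations for different scenarios.
    # Option names, predicate syntax and table names are assumptions;
    # consult the documentation for the exact supported flags.
    import subprocess

    OFFLOAD = "/u01/app/gluent/offload/bin/offload"  # hypothetical path

    examples = [
        # Offload an entire table.
        [OFFLOAD, "-t", "SH.PRODUCTS"],
        # Offload partitions older than a boundary date (range-partitioned table).
        [OFFLOAD, "-t", "SH.SALES", "--older-than-date=2021-01-01"],
        # Predicate-based offload of a subset of rows.
        [OFFLOAD, "-t", "SH.SALES", "--offload-predicate=column(CHANNEL_ID) = numeric(3)"],
    ]

    for cmd in examples:
        subprocess.run(cmd, check=True)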

Can data be updated after it has been offloaded?

Yes, data can be updated after it has been offloaded from the RDBMS, for example to correct historical data or to refresh dimension tables. Gluent Data Platform’s Incremental Update functionality provides a variety of options to handle these scenarios.

When Incremental Update is enabled, additional objects are created (in both the RDBMS and the target platform) to track and store changes to the offloaded data. Changes are periodically synchronized to the target platform.

Does offloading remove data from the RDBMS?

No, Gluent Data Platform does not automatically remove data from your RDBMS when it is offloaded. However, if Conductor for Gluent is installed, it can be configured to manage data removal and retention in the RDBMS. For example, if an entire table is offloaded to the target platform, the table remains in the RDBMS but access to the data is routed to the target platform. Similarly, when partitions are offloaded, the partitions remain in the RDBMS but queries needing data from those partitions will pull from the target platform. Removal of data is decoupled from offloading and is controlled by Conductor for Gluent.

Want to learn more about how Gluent can help you?