Gluent Offload Engine FAQs
Q: What is Gluent Offload Engine?
Gluent Offload Engine is software that synchronizes tables from enterprise relational databases to modern data storage platforms like Hadoop, both on-premises and in the cloud. The offloaded data is stored in open data formats, with no need for a proprietary database engine for data access. Gluent Offload Engine will sync the data to Hadoop, store it in a compressed, columnar format, and create the metadata for the table structure and partitions. Once offloaded, the data is immediately accessible to native Hadoop tools used by data scientists and others, and is easily shared throughout the enterprise.
Q: How is Gluent Offload Engine different from open source tools such as Sqoop?
While Sqoop is the well-known, de facto bulk data movement tool for Hadoop, it has a number of limitations that Gluent Offload Engine is able to offset, starting with differences between the source and target platforms:
- Different capabilities of datatypes in commercial RDBMS vs Hadoop
- Different capabilities of range partitioning methods in RDBMS vs Hadoop
Gluent Offload Engine will translate each RDBMS datatype to the appropriate Hadoop datatype to ensure there is no data loss or corruption.
What Sqoop doesn’t provide out of the box (without extra scripting & coding):
- Atomic offloading (no data loss or duplicates, even in case of Hadoop node or network failure)
- Quick offloading of small tables (even small table offloads launch a big MapReduce job)
- Sync, merge, and compact incremental changes on HDFS for one “current” view of data
- Load microsecond-precision timestamps without precision loss (Sqoop loads only milliseconds by default)
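To illustrate the last point in the list above, here is a minimal sketch in plain Python (not Sqoop or Gluent code) of what is lost when a microsecond-precision timestamp is truncated to milliseconds. The helper name truncate_to_millis is hypothetical and exists only for this illustration.

```python
from datetime import datetime


def truncate_to_millis(ts: datetime) -> datetime:
    """Drop the sub-millisecond part of a timestamp (hypothetical helper)."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


source = datetime(2020, 1, 15, 10, 30, 45, 123456)  # microsecond precision
loaded = truncate_to_millis(source)                 # millisecond precision only

print(source.isoformat())  # 2020-01-15T10:30:45.123456
print(loaded.isoformat())  # 2020-01-15T10:30:45.123000
print(source == loaded)    # False: the two values no longer compare equal
```

Two source rows that differ only in the last three digits would become indistinguishable after such a load, which matters when timestamps are used for ordering or as part of a key.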
Gluent Offload Engine has built-in data validation and HDFS update support to ensure the RDBMS data is fully synchronized with Hadoop and represents the current state of the source data.
Q: What Hadoop storage formats are supported?
When syncing data from an RDBMS to Hadoop, we support Parquet and ORC (ORC on Hortonworks). We have built technology that allows updates and changes to existing offloaded data (data in HDFS/Parquet is normally immutable). When presenting existing Hadoop data to an RDBMS (without offloading it from the database first), we support all data types that your underlying Hadoop distribution supports. Any tables or files that can be queried via Hive or Impala (for example, Parquet, Avro, CSV, JSON) can be presented to your RDBMS for transparent query.
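One common way to present a single current view over immutable files is to keep changes in a separate delta set and merge them with the base data at read or compaction time. The sketch below illustrates that general idea only. It is not Gluent's implementation, and the names merge_current_view, op, and seq are hypothetical.

```python
from typing import Dict, Iterable, List


def merge_current_view(base: Iterable[dict], deltas: Iterable[dict], key: str = "id") -> List[dict]:
    """Merge immutable base rows with logged changes into one current view.

    Each delta carries a hypothetical 'seq' (ordering) and 'op' field:
    'I' = insert, 'U' = update, 'D' = delete.
    """
    current: Dict[object, dict] = {row[key]: row for row in base}
    # Apply changes in sequence order so the latest change for a key wins.
    for change in sorted(deltas, key=lambda c: c["seq"]):
        if change["op"] == "D":
            current.pop(change[key], None)
        else:
            # Strip the bookkeeping fields before storing the row.
            current[change[key]] = {k: v for k, v in change.items() if k not in ("op", "seq")}
    return sorted(current.values(), key=lambda r: r[key])


base_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
delta_rows = [
    {"seq": 2, "op": "D", "id": 2},
    {"seq": 1, "op": "U", "id": 1, "name": "a2"},
    {"seq": 3, "op": "I", "id": 3, "name": "c"},
]
print(merge_current_view(base_rows, delta_rows))
# [{'id': 1, 'name': 'a2'}, {'id': 3, 'name': 'c'}]
```

The base rows are never modified in place, which mirrors the constraint that files already written to HDFS cannot be updated.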
Q: Where is the software installed?
The software must be installed on the relational database server. Optionally, if you would like to run the offload command from the Hadoop cluster, or for specific security implementations, Gluent Offload Engine components can also be installed on a Hadoop Edge Node.
Q: Does Gluent require specific hardware resources for the Hadoop cluster (for example, SSD storage, number of cores etc.)?
There are no specific hardware requirements.
Q: What change data capture methods are used to identify and capture source data changes?
Gluent Offload Engine will periodically query tables and/or partitions for changes and sync these to Hadoop. It can also poll Gluent-maintained log tables (populated by triggers on source tables or when updating offloaded, Hadoop-only data from within the RDBMS) and sync the logged changes to Hadoop.
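The log-table polling approach described above can be sketched as follows. This is an illustration under stated assumptions, not Gluent code: an in-memory SQLite database stands in for the RDBMS, a Python dict stands in for the Hadoop copy, and the table customers_log, its columns, and the function sync_logged_changes are all hypothetical.

```python
import sqlite3


def sync_logged_changes(conn, target, last_seq):
    """Apply log rows newer than last_seq to target; return the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, op, id, name FROM customers_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id, name in rows:
        if op == "D":
            target.pop(row_id, None)  # delete
        else:
            target[row_id] = name     # insert or update
        last_seq = seq
    return last_seq


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_log (seq INTEGER PRIMARY KEY, op TEXT, id INTEGER, name TEXT)"
)
# In a real system these rows would be written by triggers on the source table.
conn.executemany(
    "INSERT INTO customers_log (op, id, name) VALUES (?, ?, ?)",
    [("I", 1, "Ann"), ("I", 2, "Bob"), ("U", 1, "Anna"), ("D", 2, None)],
)

target = {}
high_water = sync_logged_changes(conn, target, last_seq=0)
print(target, high_water)  # {1: 'Anna'} 4
```

Remembering the highest sequence number already applied lets each poll pick up only the changes logged since the previous one.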
Q: How does Gluent Offload Engine keep my data secure?
Gluent Offload Engine supports encryption of data at rest and data in motion. Role-based access control is also fully supported.
Q: Does Gluent Offload Engine provide a scheduling tool?
No, but the Gluent Offload command can be run via any external scheduling tool that supports generic command line calls.
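As a rough sketch of what such a scheduled job might look like, the wrapper below runs an arbitrary command line and returns its exit code so the scheduler can detect failures. The function run_scheduled_command is hypothetical, and the command shown is a stand-in; the actual offload command line and its options should be taken from the Gluent documentation.

```python
import subprocess
import sys
from typing import List


def run_scheduled_command(argv: List[str], timeout_seconds: int = 3600) -> int:
    """Run a command line and return its exit code (hypothetical wrapper)."""
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout_seconds
    )
    if completed.returncode != 0:
        # Surface the error output so the scheduler's job log captures it.
        print(completed.stderr, file=sys.stderr)
    return completed.returncode


# Stand-in command; a real job would pass the offload command line here.
exit_code = run_scheduled_command([sys.executable, "-c", "print('offload placeholder')"])
print(exit_code)  # 0
```

Most schedulers treat a non-zero exit code as a failed run, so passing the code through unchanged keeps retries and alerting in the scheduler's hands.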
Q: Do I need to create any tables or metadata on Hadoop manually?
Not at all. Gluent Offload Engine will copy the data to HDFS in a compressed, columnar format and create the table metadata in Hadoop (Impala or Hive). While creating the Hadoop-based table, Gluent Offload Engine will determine which data type best represents each source RDBMS data type, ensuring no data is compromised or lost during the offload.
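The idea of choosing a target data type for each source column can be sketched as a simple lookup. The mapping below is an assumption made for illustration (Oracle-style source types, Hive/Impala-style target types) and does not reproduce Gluent's actual mapping rules; TYPE_MAP and map_columns are hypothetical names.

```python
from typing import List, Tuple

# Assumed, illustrative mapping only; not Gluent's actual rules.
TYPE_MAP = {
    "VARCHAR2": "STRING",
    "CHAR": "STRING",
    "NUMBER": "DECIMAL",
    "DATE": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
    "BINARY_DOUBLE": "DOUBLE",
}


def map_columns(columns: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Translate (column name, source type) pairs to (column name, target type)."""
    mapped = []
    for name, source_type in columns:
        # Ignore length/precision such as "(50)" when looking up the base type.
        base = source_type.split("(")[0].strip().upper()
        if base not in TYPE_MAP:
            # Fail loudly rather than guess and risk losing data.
            raise ValueError(f"No mapping for column {name!r} of type {source_type!r}")
        mapped.append((name, TYPE_MAP[base]))
    return mapped


print(map_columns([("id", "NUMBER(10)"), ("name", "varchar2(50)"), ("created", "DATE")]))
# [('id', 'DECIMAL'), ('name', 'STRING'), ('created', 'TIMESTAMP')]
```

Rejecting unmapped types instead of falling back to a default reflects the goal stated above that no data is compromised during the offload.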
Q: How do I optimize the offload of data?
Gluent Offload Engine has several options for optimizing the speed of ingestion, depending on the specific use case, the data types involved, and other factors. These optimizations are described in the Gluent documentation. For more detailed optimization support, ask about Gluent consulting services for Gluent Offload Engine implementation.