Klustron Access to Oracle using oracle_fdw
Klustron Access to Oracle using oracle_fdw
Note:
Unless specifically stated otherwise, the version numbers mentioned in the text can be substituted with any released version number. For a list of all released versions, please visit: http://doc.klustron.com/zh/Release_notes.html
Objective of This Article:
Klustron provides Foreign Data Wrappers (FDW) as a mechanism to access various external data sources. This article explains how to configure oracle_fdw in Klustron to access tables in an Oracle database and then demonstrates scenarios involving multi-table joins from different data sources.
01 Introduction to oracle_fdw
The oracle_fdw extension is a Foreign Data Wrapper (FDW) that allows users to access Oracle tables and views (including materialized views) through external tables.
When PostgreSQL clients access these external tables, oracle_fdw accesses the corresponding data in the external Oracle database via Oracle Call Interface (OCI) libraries on the PostgreSQL server.
The specific workflow of oracle_fdw is illustrated in the following diagram:
02 Specific Configuration Process
Unless otherwise noted, all operations mentioned below are performed by Klustron's installation user, kunlun.
2.1 Downloading the oracle_fdw Software Package
The specific download URL is: https://github.com/laurenz/oracle_fdw. Download it to the path /kunlun/fdw/oracle and unzip it:
cd /kunlun/fdw/oracle
unzip oracle_fdw-master.zip
2.2 Installing Oracle Client Software
Install Oracle 11.2.0.4 client on all compute nodes, choosing the “Administrator” version of the client. The installation process is omitted here. The specific installation path is /kunlun/fdw/oracle/11.2.0/client_1.
Also, add the following environment variables in the /home/kunlun/.bash_profile of the kunlun user:
export ORACLE_HOME=/kunlun/fdw/oracle/11.2.0/client_1
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
2.3 Oracle Client Configuration
Configure the ORACLE_HOME environment variable in the server processes of Klustron's compute nodes (if there are multiple compute nodes, each node needs the following configuration). Modify the file /kunlun/kunlun-node-manager-1.3.1/bin/extra.env
Add the following environment variables at the end of the file:
ORACLE_HOME=/kunlun/fdw/oracle/11.2.0/client_1; #KUNLUN_SET_ENV
LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
export ORACLE_HOME; #KUNLUN_SET_ENV
export LD_LIBRARY_PATH
Then, restart the node manager service of the compute node.
ORACLE_HOME=/kunlun/fdw/oracle/11.2.0/client_1; #KUNLUN_SET_ENV
LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
export ORACLE_HOME; #KUNLUN_SET_ENV
export LD_LIBRARY_PATH
2.4 Installing oracle_fdw
Ensure the kunlun user can directly run the pg_config command.
Execute the following commands to install oracle_fdw:
cd /kunlun/fdw/oracle/oracle_fdw-master/
make
make install
2.5 Loading the oracle_fdw Extension
First, create a Klustron user and database:
psql -h 192.168.40.152 -p 47001 postgres
create user kunlun_test with superuser password 'kunlun';
create database testdb owner kunlun_test;
grant all privileges on database testdb to kunlun_test;
\q
Log in to testdb as user kunlun_test and then load the oracle_fdw extension:
psql -h 192.168.40.152 -p 47001 -U kunlun_test testdb
CREATE EXTENSION oracle_fdw;
Check if oracle_fdw is installed successfully:
2.6 Preparing the Oracle Database
Execute the following SQL in the Oracle database:
sqlplus / as sysdba
create user orauser identified by oracle;
grant dba to orauser;
con orauser/oracle
create table employee ( id int primary key, name varchar2(10));
insert into employee values(1,'Jack');
insert into employee values(2,'Tom');
commit;
The Oracle version is 11.2.0.4, server IP is 192.168.40.163, listening port is 1521, and service name is testdb.
2.7 Using oracle_fdw in Klustron
Create an external server, 'server', to configure the connection to the Oracle database:
CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver
'//192.168.40.163:1521/testdb');
-- oradb: A custom server name.
-- OPTIONS: There are 3 parameters - dbserver, isolation_level, nchar.
-- dbserver (required): Defines the connection string to the Oracle database.
-- isolation_level (optional, default is serializable): Transaction isolation level used in Oracle, with possible values being serializable, read_committed, read_only.
-- nchar (optional, default is off): Whether to enable character conversion on the Oracle side, which can significantly impact performance.
View the created Server:
Create a mapping between Klustron and Oracle:
CREATE USER MAPPING FOR kunlun_test SERVER oradb OPTIONS (user 'orauser',
password 'oracle');
-- kunlun_test: An existing user in Klustron.
-- oradb: The created server name.
-- OPTIONS: There are 2 parameters - user, password.
-- user (required): Oracle username.
-- password (required): Password for the Oracle user.
View the newly created mapping:
Create an external table in Klustron associated with an Oracle table:
CREATE FOREIGN TABLE ora_emp
( empno NUMERIC(4,0) OPTIONS (key 'true') NOT NULL
, ename VARCHAR(10)
)
SERVER oradb OPTIONS (schema 'ORAUSER', table 'EMPLOYEE');
Please note:
- External table names and column names do not need to match those in Oracle, but the column order must.
- The primary key of the external table must match Oracle's table.
- In the OPTIONS attribute, schema and table names must be uppercase, or Klustron operations on the external table will throw 'ORA-00942: table or view does not exist'.
- Schema and table in OPTIONS must be enclosed in single quotes.
- Only columns that oracle_fdw can convert must be defined.
Runtime errors occur if data lengths exceed actual column lengths. Note that data type behavior, like in floating-point and datetime types, might differ.
By default, CHAR and VARCHAR2 types in Oracle are specified in bytes, whereas CHAR, VARCHAR, and TEXT types in Klustron are specified in characters.
After creating the external table, you can perform queries to access Oracle's tables from Klustron:
Batch creation of external tables in Klustron:
If many Oracle tables need to be queried in Klustron, for example, all tables under the orauser schema, you can use the following method for batch creation of external tables:
IMPORT FOREIGN SCHEMA "ORAUSER" FROM SERVER oradb INTO public;
First, create a new table 'salary' in Oracle's orauser and insert data:
create table salary (id int primary key, amount int);
insert into salary values(1,10000);
insert into salary values(2,5000);
commit;
Then execute in Klustron:
drop foreign table ora_emp;
IMPORT FOREIGN SCHEMA "ORAUSER" FROM SERVER oradb INTO public;
Things to note about IMPORT FOREIGN SCHEMA:
- This method does not require specifying table structures, but external table names must match Oracle's, meaning no table with the same name can exist in Klustron's schema, or creation fails.
- Oracle's schema names are usually uppercase. Since Klustron converts names to lowercase before processing, you must enclose the schema name in double quotes (e.g., "ORAUSER").
- LIMIT TO imports tables listed within the parentheses, separated by commas, while EXCEPT imports all but the tables listed (excluded), also separated by commas.
This method can import specific tables, for example:
IMPORT FOREIGN SCHEMA "ORAUSER" limit to (SALARY) from server oradb into public;
IMPORT FOREIGN SCHEMA "ORAUSER" limit to (EMPLOYEE,SALARY) from server oradb into public;
View the created external tables:
Execute corresponding queries:
2.8 Multi-Table Joins with External Tables
First, create the kl_salary table in Klustron:
create table kl_salary (id int primary key, amount money);
insert into kl_salary values (1,10000.00);
insert into kl_salary values (2,8888.88);
Refer to the article “Klustron Access to MSSQL using tds_fdw” to prepare the following data in MSSQL:
create table empinfo (id int primary key, firstday varchar(10));
insert into empinfo values(1,'2018-09-01');
insert into empinfo values(1,'2010-07-15');
In Klustron, create the tds_fdw and an external table mssql_empinfo to associate with the empinfo table in MSSQL.
Users can then perform the following multi-table join, fetching data from Klustron, Oracle, and MSSQL:
select ora.ename,kl.amount,ms.firstday from kl_salary kl, ora_emp ora, mssql_empinfo ms
where kl.id=ora.empno and kl.id=ms.id;
View the execution plan:
It can be seen that data is joined from tables in Klustron, Oracle, and MSSQL.
03 Key Points
When using Zetuo Kunlun Klustron Cluster, if a transaction writes data within the cluster and simultaneously uses the compute node's FDW interface to write to several external database systems, those external databases are not part of the distributed transaction branches. Klustron does not perform two-phase commits or fault recovery on these external database systems.
As a result, at the global transaction level, data updates to external databases lack any ACID guarantees or data consistency and fault recovery assurance from Klustron. If, during such write operations, a Klustron compute node or external database instance unexpectedly exits (e.g., due to power failure), it's possible that some external database updates have been committed while others have not. Furthermore, if the external data sources themselves do not support transaction processing, faults during such write operations could lead to data corruption, loss, or inconsistency in the external data sources.