Oracle Machine Learning: SQL-Native ML in the Database

Oracle Machine Learning (OML) is Oracle’s bet that machine learning belongs in the database, not on a separate Python or Spark cluster. It’s a real product with real capabilities — and also one that frequently surprises developers who didn’t know it existed.

This post is a practical introduction to what OML is, what it can do, and when it’s the right tool.

The two-sentence pitch

OML lets you train and apply machine learning models directly inside Oracle Database — using SQL, PL/SQL, Python, or R — without exporting data to a separate ML platform. The data stays in the database; the model runs where the data lives.

The three flavors

OML ships in three forms:

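OML4SQL: SQL and PL/SQL interfaces for classification, regression, clustering, anomaly detection, and related techniques. You build a model with the DBMS_DATA_MINING package and apply it with SQL functions such as PREDICTION.
OML4Py: a Python interface with a familiar pandas-style API. Its transparency layer translates DataFrame-style operations into SQL that runs in the database, and embedded Python execution runs your Python functions in database-managed Python engines, so data doesn’t have to be pulled to the client.
OML4R: the same idea for R.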

For most developers in Oracle environments, OML4SQL is the most accessible. You can train a model with a stored procedure and query it from any application that speaks SQL.

A concrete example

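Suppose you want to predict churn from customer purchase history. A customer_training table holds past customers with the known outcome in a churned column, and a customer_features table holds the current customers you want to score. With OML4SQL: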

-- Settings table describing the model
CREATE TABLE churn_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

-- Random forest classifier, with automatic data preparation turned on
INSERT INTO churn_settings VALUES ('ALGO_NAME', 'ALGO_RANDOM_FOREST');
INSERT INTO churn_settings VALUES ('PREP_AUTO', 'ON');

-- Train the model
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'customer_training',
    case_id_column_name => 'customer_id',
    target_column_name  => 'churned',
    settings_table_name => 'churn_settings'
  );
END;
/

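-- Apply the model to new data. customer_features needs the same predictor
-- columns as customer_training; the 1 below assumes churned is coded 1/0.
SELECT
  customer_id,
  PREDICTION(CHURN_MODEL USING *)                    AS predicted_churn,
  PREDICTION_PROBABILITY(CHURN_MODEL, 1 USING *)     AS churn_probability
FROM customer_features;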

The model is now a database object. Other queries can use it without re-training. Stored procedures, APEX apps, and external clients can all call it via SQL.
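Because the model is a schema object, you can look it up in the data dictionary, check it against held-out data, and score rows that don’t come from any table. A minimal sketch, assuming the CHURN_MODEL above exists; customer_test (a held-out table shaped like customer_training) and the tenure_months and monthly_spend columns are hypothetical names:

```sql
-- The model appears in the data dictionary like any other object
SELECT model_name, mining_function, algorithm, creation_date
FROM   user_mining_models
WHERE  model_name = 'CHURN_MODEL';

-- Confusion counts on a hypothetical held-out table with known outcomes
SELECT actual, predicted, COUNT(*) AS customers
FROM (
  SELECT churned                         AS actual,
         PREDICTION(churn_model USING *) AS predicted
  FROM   customer_test
)
GROUP  BY actual, predicted
ORDER  BY actual, predicted;

-- Score a single customer supplied inline; predictors left out are treated as missing.
-- tenure_months and monthly_spend are placeholder column names.
SELECT PREDICTION(churn_model USING 24   AS tenure_months,
                                    79.5 AS monthly_spend) AS predicted_churn
FROM   dual;
```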

Where OML earns its place

Data stays in the database. No exporting data to another system, no copy-and-sync pipelines, no second source of truth. For regulated data, this is significant.
No separate ML platform to operate. If you’re already running Oracle, OML adds capability without adding infrastructure.
Models are queryable. A model is just another schema object: you can grant access to it, audit its use, and keep versions side by side under different names.
Scaling on existing infrastructure. OML uses the database’s parallel execution, so large training and scoring jobs run on the same parallel-query infrastructure your other workloads use (and compete with them for the same resources).
Multiple languages. SQL, Python, and R can all work with the same models. Data scientists can use familiar tools; developers can call the models from SQL.
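The access-control, documentation, and versioning points above come down to ordinary DDL and PL/SQL, and scoring accepts standard parallel-query hints. A sketch, assuming the CHURN_MODEL from the churn example; the churn_reporting role, the parallel degree of 8, and the _V1 naming scheme are hypothetical choices:

```sql
-- Allow a (hypothetical) reporting role to score with the model
GRANT SELECT ON MINING MODEL churn_model TO churn_reporting;

-- Describe the model in the data dictionary
COMMENT ON MINING MODEL churn_model IS 'Random forest churn classifier trained on customer_training';

-- Scoring is plain SQL, so the usual parallel hint applies (8 is an arbitrary degree)
SELECT /*+ PARALLEL(c, 8) */
       c.customer_id,
       PREDICTION_PROBABILITY(churn_model, 1 USING *) AS churn_probability
FROM   customer_features c;

-- Keep the current model as a named version before retraining under the original name
BEGIN
  DBMS_DATA_MINING.RENAME_MODEL(
    model_name     => 'CHURN_MODEL',
    new_model_name => 'CHURN_MODEL_V1'
  );
END;
/
```

Once renamed, CREATE_MODEL can build a replacement under the original CHURN_MODEL name, so queries that reference it pick up the new model while CHURN_MODEL_V1 remains available for comparison.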

Where it’s the wrong choice

Deep learning at scale. OML supports neural networks, but for serious deep-learning work (large transformer models, computer vision, etc.) you want GPUs and a framework like PyTorch. OML4Py can call external libraries, but the database isn’t the right host.
Iterating with notebooks against arbitrary data. If your data scientists live in Jupyter and need access to data from many sources, a separate platform fits better.
You’re not committed to Oracle. OML is Oracle-specific. If your data lives in Snowflake or BigQuery, use their equivalent (Snowpark, BigQuery ML).

The Autonomous Database connection

OML is included with Autonomous Database. If you’re on Autonomous Transaction Processing (ATP) or Autonomous Data Warehouse (ADW), OML is ready to use, with no separate licensing and no extra setup. This is one of the under-marketed reasons Autonomous Database is a strong choice for analytics-heavy workloads.

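If you run Oracle on-premises, note that OML was historically part of the Advanced Analytics option, a separately licensed add-on to Enterprise Edition. Oracle has since changed those terms, so verify what your version and edition include before assuming you’re covered.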
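A query can at least show whether the feature is installed and whether the database has recorded using it. That is useful input for a license check but not a substitute for one. A sketch that needs access to V$OPTION and DBA_FEATURE_USAGE_STATISTICS (typically DBA privileges); the feature’s name has changed across releases (Data Mining, then Machine Learning), so it matches on patterns rather than exact names:

```sql
-- Is the feature installed? Installed is not the same as licensed.
SELECT parameter, value
FROM   v$option
WHERE  UPPER(parameter) LIKE '%MINING%'
   OR  UPPER(parameter) LIKE '%MACHINE LEARNING%';

-- Has the database detected use of the feature?
SELECT name, detected_usages, currently_used, last_usage_date
FROM   dba_feature_usage_statistics
WHERE  UPPER(name) LIKE '%MINING%'
   OR  UPPER(name) LIKE '%MACHINE LEARNING%';
```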

The OML Notebooks experience

Autonomous Database also ships with OML Notebooks, a notebook environment based on Apache Zeppelin and reached through the Oracle Machine Learning user interface. SQL, Python, R, and Markdown paragraphs sit side by side and run directly against the database. For data scientists who’d otherwise need to set up their own environment, this is a meaningful time saver.

When to evaluate it

OML deserves serious consideration if:

Your data already lives in Oracle, or you’re already paying for Oracle Database
You’re doing classification, regression, clustering, or anomaly detection (the bread-and-butter ML workloads)
You want models accessible from SQL applications
You care about keeping data inside the database for compliance or operational reasons
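Those workloads all follow the same build-then-query pattern as the churn example. As a sketch, here is customer segmentation with k-means, assuming the same hypothetical customer_features table; segment_settings, SEGMENT_MODEL, and the choice of five clusters are illustrative:

```sql
-- Settings for a k-means clustering model; 5 clusters is an arbitrary example value
CREATE TABLE segment_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

INSERT INTO segment_settings VALUES ('ALGO_NAME', 'ALGO_KMEANS');
INSERT INTO segment_settings VALUES ('CLUS_NUM_CLUSTERS', '5');
INSERT INTO segment_settings VALUES ('PREP_AUTO', 'ON');

-- Clustering is unsupervised, so no target column is passed
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'SEGMENT_MODEL',
    mining_function     => DBMS_DATA_MINING.CLUSTERING,
    data_table_name     => 'customer_features',
    case_id_column_name => 'customer_id',
    settings_table_name => 'segment_settings'
  );
END;
/

-- Assign each customer to its most likely cluster
SELECT customer_id,
       CLUSTER_ID(segment_model USING *)          AS segment,
       CLUSTER_PROBABILITY(segment_model USING *) AS segment_probability
FROM   customer_features;
```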

Skip it if:

Your ML work is dominated by deep learning at scale
Your data sources are heterogeneous and Oracle is one of many
Your team’s expertise is firmly in the Python/Spark ecosystem

The honest summary

OML is one of the better-kept secrets in the Oracle stack. It does what it claims to do — train and apply real models in the database, with the same tools developers and DBAs already use. For workloads where the data is already in Oracle and the ML problem fits classical patterns, it’s frequently the simplest answer.

The mistake is treating it as a competitor to PyTorch or scikit-learn at scale. OML’s real competitor is not doing ML at all because the data is locked in Oracle. That’s a different problem, and OML solves it well.