Self-Hosted AI Text Summarization¶
Overview¶
Learn how to invoke open-source Large Language Models (LLMs) directly from your Exasol database using UDFs and Ollama. This tutorial demonstrates a fully self-hosted AI pipeline where your data never leaves your infrastructure.
Why Self-Hosted?¶
Running open-source AI models in your own environment offers significant advantages:
- Cost Savings
No per-token API fees
No usage-based pricing
One-time model download, unlimited use
- Data Privacy & Security
Data never leaves your infrastructure
No third-party API calls
Full compliance control
- Open Source Freedom
Apache 2.0 licensed models (Mistral 7B)
No vendor lock-in
Modify and fine-tune as needed
- Performance Control
Predictable latency (~1-2 seconds per summary)
No rate limits
Scale on your own hardware
Architecture¶
            ┌─────────────────┐      HTTP Request      ┌─────────────────┐
 SQL ─────> │    Exasol DB    │ ─────────────────────> │  Mistral model  │
            │    (Docker)     │                        │    (Ollama)     │
 AI text    │  (Python UDF)   │ <───────────────────── │                 │
 summary <─ └─────────────────┘      JSON Response     └─────────────────┘
Prerequisites¶
Required:
Docker (for Exasol database)
Ollama installed locally
Basic SQL and Python knowledge
~4GB disk space (for Mistral model)
~15 minutes
Step 1: Start Exasol Database¶
If you don’t have an Exasol database running, set one up quickly using Docker. Exasol provides a free Docker image for development and testing which supports up to 10GB of data (plenty for this tutorial). Full instructions for deploying the Exasol Docker image can be found in the Exasol Docker repository.
Run Exasol in Docker:
docker run -d \
--name exasol \
-p 8563:8563 \
exasol/docker-db:latest
Connect with your SQL client using the default credentials for the Docker image. Unless you are running a production instance or have changed the credentials, they are:
Host: localhost:8563 (you will need to append the unique fingerprint for your instance; see the documentation linked above)
User: sys
Password: exasol
DbVisualizer is the preferred client for Exasol, but you can use any SQL client of your choice, including DBeaver, SQL Workbench, Visual Studio, or any JDBC-compatible tool.
Step 2: Prepare Sample Data¶
Create Schema and Table¶
-- Create a dedicated schema for the demo
CREATE SCHEMA IF NOT EXISTS DEMO;
-- Create the articles table
CREATE OR REPLACE TABLE DEMO.ARTICLES (
    article      VARCHAR(20000),  -- Full article text
    article_date VARCHAR(100),    -- Publication date
    heading      VARCHAR(1000),   -- Article headline
    news_type    VARCHAR(100)     -- Category (e.g., business, sports)
);
Import CSV Data¶
You can import the data directly from the URL:
IMPORT INTO DEMO.ARTICLES
FROM CSV AT 'https://dut5tonqye28.cloudfront.net/ai_lab/gen-ai/articles_data.csv'
COLUMN SEPARATOR = ','
SKIP = 1; -- Skip header row
Alternatively, download articles_data.csv and import it locally:
IMPORT INTO DEMO.ARTICLES
FROM LOCAL CSV FILE '/path/to/articles_data.csv'
COLUMN SEPARATOR = ','
SKIP = 1; -- Skip header row
Verify the import:
SELECT COUNT(*) as total_articles FROM DEMO.ARTICLES;
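To spot-check the contents as well, preview a few rows (the columns are the ones defined above):
-- Quick sanity check of the imported rows
SELECT heading, news_type, article_date
FROM DEMO.ARTICLES
LIMIT 5;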
Step 3: Set Up Ollama¶
Ollama is a local LLM management tool that makes it easy to download, run, and serve open-source models.
Install Ollama¶
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download installer from ollama.com
Download the Model¶
We chose Mistral 7B for the best balance of speed, quality, and open-source licensing:

| Model | Speed | Quality | Notes |
|---|---|---|---|
| qwen3:4b | ⚡⚡⚡ Fast (~1s) | Good | Includes verbose “thinking” output |
| mistral:latest | ⚡⚡ Medium (~1-2s) | Excellent | Clean output, recommended |
| gpt-oss:20b | ⚡ Slow (~5s+) | Best | Too slow for real-time use |
Download Mistral:
ollama pull mistral:latest
This downloads approximately 4GB. The model is fully open source under Apache 2.0 license.
Step 4: Test Ollama API¶
Find Your Local IP Address¶
Since Exasol runs in Docker, it needs to access Ollama via your machine’s local network IP (not localhost):
# macOS
ipconfig getifaddr en0
# Linux
hostname -I | awk '{print $1}'
Example output: 10.0.0.186
Test the Endpoint¶
curl http://YOUR_IP:11434/api/generate -d '{
  "model": "mistral:latest",
  "prompt": "Summarize in one sentence: This is a test article about AI.",
  "stream": false
}'
Expected response (truncated):
{
  "model": "mistral:latest",
  "created_at": "2024-01-07T19:23:24.548236Z",
  "response": "This article discusses AI technology.",
  "done": true,
  "total_duration": 1068666750
}
If you see this JSON response, Ollama is working correctly!
Note
The UDF running in Docker reaches Ollama via your machine’s IP address. If the request times out, make sure Ollama is listening on a network-reachable interface rather than only on localhost (the OLLAMA_HOST environment variable controls this).
Step 5: Create UDFs¶
We will create Python UDFs in Exasol that call the Ollama API to summarize articles. Exasol’s User Defined Functions (UDF) framework allows us to run Python code directly within the database. UDFs support several languages: Python, R, Lua, and Java. UDFs also have default access to a variety of packages, including requests, which we will use to make HTTP calls to Ollama. UDFs can be either simple scalar functions that return a single value, or more advanced SET functions that iterate over groups of rows and EMIT multiple columns. We will write both a simple scalar function and a more complex SET (EMITS) function to tackle different use cases, as sketched below.
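As a minimal sketch of the two shapes (the DEMO.ECHO* scripts are hypothetical placeholders, not part of the tutorial):
-- Scalar UDF: one value in, one value out
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT DEMO.ECHO(txt VARCHAR(2000))
RETURNS VARCHAR(2000) AS
def run(ctx):
    return ctx.txt
/

-- SET UDF: iterates over its whole input group and can emit several columns
CREATE OR REPLACE PYTHON3 SET SCRIPT DEMO.ECHO_ALL(txt VARCHAR(2000))
EMITS (txt VARCHAR(2000), txt_length INT) AS
def run(ctx):
    while True:
        ctx.emit(ctx.txt, len(ctx.txt or ''))
        if not ctx.next():
            break
/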
Simple UDF: Summary Only¶
Returns just the summarized text. Best for:
Ad-hoc queries
Simple SELECT statements
Create the UDF:
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT DEMO.SUMMARIZE_ARTICLE(article_text VARCHAR(20000))
RETURNS VARCHAR(2000) AS
import requests

def run(ctx):
    """
    Summarizes article text using a local Ollama model.

    Returns:
        str: One-sentence summary, or error message if processing fails
    """
    text = ctx.article_text

    # Handle NULL or empty input
    if text is None or text.strip() == '':
        return None

    try:
        # Prepare request to Ollama API
        payload = {
            'model': 'mistral:latest',
            'prompt': 'Summarize this news article in exactly one brief sentence: ' + text,
            'stream': False,
            'options': {
                'temperature': 0.3,  # Lower = more focused
                'num_predict': 50    # Limit to ~50 tokens
            }
        }

        # IMPORTANT: Replace 10.0.0.186 with YOUR machine's IP
        response = requests.post(
            'http://10.0.0.186:11434/api/generate',
            json=payload,
            timeout=15
        )
        response.raise_for_status()

        result = response.json()
        return result['response'].strip()

    except Exception as e:
        return 'ERROR: ' + str(e)
/
Important
Replace 10.0.0.186 with your actual machine’s IP address from Step 4.
Test the UDF:
SELECT
    HEADING,
    DEMO.SUMMARIZE_ARTICLE(ARTICLE) AS summary
FROM DEMO.ARTICLES
LIMIT 5;
Note
The first time you call the UDF, Ollama may take a few extra seconds to load the model into memory.
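To pay that load cost up front, you can warm the model with a throwaway call before a batch run. A minimal sketch using the UDF defined above:
-- Warm-up call: forces Ollama to load mistral:latest into memory
SELECT DEMO.SUMMARIZE_ARTICLE('Short warm-up text.') AS warmup;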
Adding a Reusable Ollama Connection¶
For the Ollama endpoint we can create a connection object within Exasol. This securely stores the connection information and lets us reuse it without hardcoding the details into the UDF.
First we build the connection object:
CREATE OR REPLACE CONNECTION OLLAMA_CONNECTION
TO 'http://10.0.0.186:11434/api/generate'
IDENTIFIED BY ''; -- No authentication needed
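If the UDF will run as a non-admin user, that user may also need access to the connection details. A sketch, under the assumption that PUBLIC should be allowed to read it (check the GRANT syntax for your Exasol version):
-- Hypothetical grant: allow the script to read the connection details
GRANT ACCESS ON CONNECTION OLLAMA_CONNECTION
FOR SCRIPT DEMO.SUMMARIZE_ARTICLE_WITH_CONNECTION TO PUBLIC;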
Next, we update the UDF to use the connection object:
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT DEMO.SUMMARIZE_ARTICLE_WITH_CONNECTION(
    connection_name VARCHAR(200),
    article_text VARCHAR(20000)
)
RETURNS VARCHAR(2000) AS
import requests

def run(ctx):
    """
    Summarizes article text using a local Ollama model via CONNECTION object.

    Args:
        connection_name: Name of Exasol CONNECTION object with Ollama URL
        article_text: Text to summarize

    Returns:
        str: One-sentence summary, or error message if processing fails
    """
    text = ctx.article_text

    # Handle NULL or empty input
    if text is None or text.strip() == '':
        return None

    try:
        # Get Ollama endpoint from CONNECTION object
        conn_info = exa.get_connection(ctx.connection_name)
        ollama_url = conn_info.address

        # Prepare request to Ollama API
        payload = {
            'model': 'mistral:latest',
            'prompt': 'Summarize this news article in exactly one brief sentence: ' + text,
            'stream': False,
            'options': {
                'temperature': 0.3,
                'num_predict': 50
            }
        }

        response = requests.post(
            ollama_url,
            json=payload,
            timeout=15
        )
        response.raise_for_status()

        result = response.json()
        return result['response'].strip()

    except Exception as e:
        return 'ERROR: ' + str(e)
/
Finally, we test the new UDF:
SELECT
    HEADING,
    DEMO.SUMMARIZE_ARTICLE_WITH_CONNECTION('OLLAMA_CONNECTION', ARTICLE) AS summary
FROM DEMO.ARTICLES
LIMIT 5;
Advanced UDF: Summary + Timing¶
Returns both the summary and the execution duration. Uses the Ollama connection object, and includes the heading for row identification (future iterations could use a row id instead). Best for:
Performance monitoring
Batch processing workflows
Production deployments
Analyzing model performance
Create the UDF:
CREATE OR REPLACE PYTHON3 SET SCRIPT DEMO.SUMMARIZE_ARTICLE_WITH_TIMING(
    connection_name VARCHAR(200),
    heading VARCHAR(1000),
    article_text VARCHAR(20000)
)
EMITS (heading VARCHAR(1000), summary VARCHAR(2000), duration_seconds DOUBLE) AS
import requests

def run(ctx):
    """
    Summarizes articles and tracks execution time, using CONNECTION object.

    Args:
        connection_name: Name of Exasol CONNECTION object with Ollama URL
        heading: Article heading (identifier)
        article_text: Text to summarize

    Emits:
        heading (str): Original article heading
        summary (str): One-sentence summary
        duration_seconds (float): Processing time from Ollama
    """
    # Get Ollama endpoint from CONNECTION object (once, outside the loop)
    conn_info = exa.get_connection(ctx.connection_name)
    ollama_url = conn_info.address

    while True:
        heading = ctx.heading
        text = ctx.article_text

        # Handle NULL input
        if text is None:
            ctx.emit(heading, None, None)
            if not ctx.next():
                break
            continue

        try:
            payload = {
                'model': 'mistral:latest',
                'prompt': 'Summarize this news article in exactly one brief sentence: ' + text,
                'stream': False,
                'options': {
                    'temperature': 0.3,
                    'num_predict': 50
                }
            }

            response = requests.post(
                ollama_url,
                json=payload,
                timeout=15
            )
            response.raise_for_status()
            result = response.json()

            # Extract timing (nanoseconds → seconds)
            duration = float(result.get('total_duration', 0)) / 1000000000.0
            summary = result['response'].strip()

            # Emit all three values
            ctx.emit(heading, summary, duration)
        except Exception as e:
            ctx.emit(heading, 'ERROR: ' + str(e), 0.0)

        if not ctx.next():
            break
/
Warning
SET UDF scripts process their entire input set. Explicitly limit the data before calling the UDF, for example with a WHERE clause, LIMIT, or a temporary table, as in the example below.
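For example, to summarize only five articles with the timing UDF (the inner subquery does the limiting):
SELECT DEMO.SUMMARIZE_ARTICLE_WITH_TIMING('OLLAMA_CONNECTION', heading, article)
FROM (
    SELECT heading, article
    FROM DEMO.ARTICLES
    LIMIT 5
);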
Results¶
Sample output from our test run with 5 articles:
| Duration (s) | Summary |
|---|---|
| 2.83 | Saina Nehwal attributes her success in badminton to hard work and dedication. |
| 2.07 | Video shows Cristiano Ronaldo throwing reporter’s microphone into lake. |
| 2.34 | Pakistan cricket team will offer full support to Mohammad Amir’s return. |
| 1.37 | Barcelona do not plan to sell Neymar to Paris Saint Germain. |
| 1.89 | Portugal and Hungary advance to Euro 2016 after exciting 3-3 draw. |
Performance Metrics:
Average: ~2.1 seconds per article
Range: 1.37s to 2.83s
Consistent quality across different article types
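You can compute metrics like these directly in SQL by aggregating over the timing UDF's output; a sketch, again limiting the batch in the inner subquery:
SELECT
    AVG(duration_seconds) AS avg_seconds,
    MIN(duration_seconds) AS min_seconds,
    MAX(duration_seconds) AS max_seconds
FROM (
    SELECT DEMO.SUMMARIZE_ARTICLE_WITH_TIMING('OLLAMA_CONNECTION', heading, article)
    FROM (SELECT heading, article FROM DEMO.ARTICLES LIMIT 5)
) t;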
Next Steps¶
Expand UDF & Analytics tasks¶
Monitor summarization performance over time using duration.
Add enhanced error handling for better debugging experience.
Cache summarized responses so summaries are only generated for new articles (see the sketch below).
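A minimal caching sketch, assuming a hypothetical DEMO.ARTICLE_SUMMARIES table keyed by heading:
-- Hypothetical cache table for generated summaries
CREATE TABLE IF NOT EXISTS DEMO.ARTICLE_SUMMARIES (
    heading VARCHAR(1000),
    summary VARCHAR(2000)
);

-- Summarize only articles that have no cached summary yet
INSERT INTO DEMO.ARTICLE_SUMMARIES (heading, summary)
SELECT a.heading, DEMO.SUMMARIZE_ARTICLE(a.article)
FROM DEMO.ARTICLES a
LEFT JOIN DEMO.ARTICLE_SUMMARIES s ON s.heading = a.heading
WHERE s.heading IS NULL;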
Extend the Concept¶
This pattern works for many AI tasks:
- Text Classification
Categorize documents, detect sentiment, identify topics
- Entity Extraction
Pull out names, dates, locations from unstructured text
- Question Answering
Query your data in natural language
- SQL Generation
Use models like sqlcoder to generate queries from descriptions
- Data Quality
Detect anomalies, fix formatting, validate content
Try Different Models¶
Explore Ollama’s model library:
# Specialized for SQL
ollama pull sqlcoder
# Code generation
ollama pull codellama
# Multilingual support
ollama pull aya
# Smaller, faster models
ollama pull phi3
Optimize Performance¶
Fine-tune model parameters:
'options': {
    'temperature': 0.1,     # More deterministic
    'top_p': 0.9,           # Nucleus sampling
    'top_k': 40,            # Limit vocabulary
    'repeat_penalty': 1.1,  # Reduce repetition
    'num_predict': 100      # Max output tokens
}
Use quantized models for faster inference:
# Q4 quantization (4-bit, very fast); check ollama.com for the exact tag name
ollama pull mistral:7b-q4
Scale Up¶
Multiple Models:
You can run different models, each best suited to a different task:
-- Summarization
CREATE SCRIPT SUMMARIZE_WITH_MISTRAL(...)
-- SQL generation
CREATE SCRIPT GENERATE_SQL_WITH_SQLCODER(...)
-- Classification
CREATE SCRIPT CLASSIFY_WITH_LLAMA(...)
It is also straightforward to modify the UDF to accept the model name as a parameter, making it easy to reuse and to switch models on the fly, as sketched below.
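A sketch of that variant; DEMO.SUMMARIZE_WITH_MODEL is a hypothetical name, and the body mirrors SUMMARIZE_ARTICLE_WITH_CONNECTION:
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT DEMO.SUMMARIZE_WITH_MODEL(
    connection_name VARCHAR(200),
    model_name VARCHAR(200),
    article_text VARCHAR(20000)
)
RETURNS VARCHAR(2000) AS
import requests

def run(ctx):
    # Skip NULL or empty input
    if ctx.article_text is None or ctx.article_text.strip() == '':
        return None
    try:
        # Endpoint comes from the CONNECTION object, model from the caller
        ollama_url = exa.get_connection(ctx.connection_name).address
        payload = {
            'model': ctx.model_name,
            'prompt': 'Summarize this news article in exactly one brief sentence: ' + ctx.article_text,
            'stream': False,
            'options': {'temperature': 0.3, 'num_predict': 50}
        }
        response = requests.post(ollama_url, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()['response'].strip()
    except Exception as e:
        return 'ERROR: ' + str(e)
/
Callers then pass the model tag per query, e.g. 'mistral:latest' or any other model you have pulled.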
Distributed Processing:
For large-scale deployments, consider:
Multiple Ollama instances with load balancing
GPU acceleration for faster inference
Dedicated inference servers
Feedback¶
Contact us on the Exasol Community
—
Last updated: January 2026