# AI Chat Flow
**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Flow Type:** AI-Powered Company Discovery & Chat
---
## Overview
This document describes the **complete AI chat flow** for the Norda Biznes Partner application, covering:
- **Chat Interface** (`/chat` route)
- **Conversation Management** (start, message, history)
- **Context Building** with full company database
- **Gemini API Integration** for intelligent responses
- **Cost Tracking** and performance metrics
- **Search Integration** for company discovery
**Key Technology:**
- **AI Model:** Google Gemini 2.5 Flash (gemini-2.5-flash)
- **Chat Engine:** NordaBizChatEngine (nordabiz_chat.py)
- **Gemini Service:** Centralized GeminiService (gemini_service.py)
- **Search Integration:** Unified SearchService (search_service.py)
- **Database:** PostgreSQL (conversations, messages, companies)
**Key Features:**
- Full company database context (all 80 companies available to AI)
- Multi-turn conversation with history (last 10 messages)
- Intelligent company selection by AI (no pre-filtering)
- Real-time cost tracking (tokens, latency, theoretical cost)
- Free tier usage monitoring (1,500 requests/day limit)
- Compact data format to minimize token usage
**Cost & Performance:**
- **Model:** Gemini 2.5 Flash
- **Pricing:** $0.075/$0.30 per 1M tokens (input/output)
- **Free Tier:** 1,500 requests/day, unlimited tokens
- **Typical Response:** 200-400ms latency, 5,000-15,000 tokens
- **Actual Cost:** $0.00 (free tier)
- **Theoretical Cost:** ~$0.0008-0.0015 per message
---
## 1. High-Level Chat Flow
### 1.1 Complete Chat Flow Diagram
```mermaid
flowchart TD
User[User] -->|1. Navigate to /chat| Browser[Browser]
Browser -->|2. GET /chat| Flask[Flask App<br/>app.py]
Flask -->|3. Require login| AuthCheck{Authenticated?}
AuthCheck -->|No| Login[Redirect to /login]
AuthCheck -->|Yes| ChatUI[Render chat.html]
ChatUI -->|4. Load UI| Browser
Browser -->|5. POST /api/chat/start| Flask
Flask -->|6. Create conversation| ChatEngine[NordaBizChatEngine<br/>nordabiz_chat.py]
ChatEngine -->|7. INSERT| ConvDB[(ai_chat_conversations)]
ConvDB -->|8. conversation_id| ChatEngine
ChatEngine -->|9. Return conversation| Flask
Flask -->|10. JSON response| Browser
Browser -->|11. User types message| UserInput[User Message]
UserInput -->|12. POST /api/chat/:id/message| Flask
Flask -->|13. Verify ownership| DB[(PostgreSQL)]
Flask -->|14. send_message| ChatEngine
ChatEngine -->|15. Save user message| MsgDB[(ai_chat_messages)]
ChatEngine -->|16. Build context| ContextBuilder[Context Builder<br/>_build_conversation_context]
ContextBuilder -->|17. Load ALL companies| DB
ContextBuilder -->|18. Load last 10 messages| MsgDB
ContextBuilder -->|19. Compact format| Context[Full Context<br/>JSON]
Context -->|20. Query AI| GeminiService[Gemini Service<br/>gemini_service.py]
GeminiService -->|21. API call| GeminiAPI[Google Gemini API<br/>gemini-2.5-flash]
GeminiAPI -->|22. AI response| GeminiService
GeminiService -->|23. Track cost| CostDB[(ai_api_costs)]
GeminiService -->|24. Response text| ChatEngine
ChatEngine -->|25. Count tokens| TokenCounter[Tokenizer]
TokenCounter -->|26. tokens_input, tokens_output| ChatEngine
ChatEngine -->|27. Save AI message| MsgDB
ChatEngine -->|28. Update conversation| ConvDB
ChatEngine -->|29. Return response| Flask
Flask -->|30. JSON + tech_info| Browser
Browser -->|31. Display message| User
style ChatEngine fill:#4CAF50
style GeminiService fill:#2196F3
style ContextBuilder fill:#FF9800
style DB fill:#9C27B0
```
---
## 2. Chat Initialization Flow
### 2.1 Start Conversation
**Route:** `POST /api/chat/start`
**File:** `app.py` (lines 3511-3533)
**Authentication:** Required (`@login_required`)
```mermaid
sequenceDiagram
actor User
participant Browser
participant Flask as Flask App (app.py)
participant Engine as NordaBizChatEngine (nordabiz_chat.py)
participant DB as PostgreSQL (ai_chat_conversations)
User->>Browser: Click "Start Chat"
Browser->>Flask: POST /api/chat/start {title: "Rozmowa..."}
Note over Flask: @login_required
Flask->>Flask: Get current_user.id
Flask->>Engine: start_conversation(user_id=current_user.id, title="Rozmowa - 2026-01-10 10:30")
Engine->>Engine: Auto-generate title if not provided
Engine->>DB: INSERT INTO ai_chat_conversations<br/>(user_id, started_at, title, conversation_type,<br/>is_active, message_count, model_name)
DB->>Engine: conversation.id = 123
Engine->>Flask: Return AIChatConversation object
Flask->>Browser: JSON {success: true, conversation_id: 123, title: "Rozmowa - 2026-01-10 10:30"}
Browser->>User: Chat session ready
```
**Database Operation:**
```sql
INSERT INTO ai_chat_conversations (
user_id, started_at, conversation_type, title,
is_active, message_count, model_name, created_at
) VALUES (
?, NOW(), 'general', ?,
TRUE, 0, 'gemini-2.5-flash', NOW()
);
```
**Response:**
```json
{
"success": true,
"conversation_id": 123,
"title": "Rozmowa - 2026-01-10 10:30"
}
```
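The auto-generated fallback title seen above ("Rozmowa - YYYY-MM-DD HH:MM") can be sketched as a small helper. The helper name is illustrative; the engine builds the title inline:

```python
from datetime import datetime

def default_title(now=None):
    """Build the fallback conversation title when the client sends none.

    Format matches the examples above; the helper name is illustrative.
    """
    now = now or datetime.now()
    return f"Rozmowa - {now.strftime('%Y-%m-%d %H:%M')}"
```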
---
## 3. Message Flow (Core Chat Logic)
### 3.1 Send Message Sequence
**Route:** `POST /api/chat/:id/message`
**File:** `app.py` (lines 3536-3603)
**Authentication:** Required (`@login_required`)
```mermaid
sequenceDiagram
actor User
participant Browser
participant Flask as Flask App
participant Engine as NordaBizChatEngine
participant DB as PostgreSQL
participant Context as Context Builder
participant Search as SearchService
participant Gemini as GeminiService
participant API as Gemini API
participant CostDB as ai_api_costs
User->>Browser: Type: "Kto robi strony www?"
Browser->>Flask: POST /api/chat/123/message {message: "Kto robi strony www?"}
Note over Flask: Verify conversation ownership
Flask->>DB: SELECT * FROM ai_chat_conversations WHERE id = 123 AND user_id = ?
DB->>Flask: Conversation found
Flask->>Engine: send_message(conversation_id=123, user_message="Kto robi strony www?", user_id=current_user.id)
Note over Engine: 1. Save user message
Engine->>DB: INSERT INTO ai_chat_messages<br/>(conversation_id, role='user', content="Kto robi strony www?")
DB->>Engine: Message saved
Note over Engine: 2. Build context with ALL companies
Engine->>Context: _build_conversation_context(db, conversation, message)
Context->>DB: SELECT * FROM companies WHERE status = 'active'
DB->>Context: 80 companies
Context->>DB: SELECT * FROM ai_chat_messages<br/>WHERE conversation_id = 123<br/>ORDER BY created_at DESC LIMIT 10
DB->>Context: Last 10 messages
Context->>Context: Build compact JSON format (minimize tokens)
Context->>Engine: Return full context dict
Note over Engine: 3. Query AI with full context
Engine->>Gemini: generate_text(prompt=system_prompt + context + history,<br/>feature='ai_chat', user_id=current_user.id, temperature=0.7)
Gemini->>API: POST /v1/models/gemini-2.5-flash:generateContent
API->>Gemini: AI response text
Note over Gemini: Track API cost to database
Gemini->>Gemini: Count tokens (input, output)
Gemini->>Gemini: Calculate cost ($0.075/$0.30 per 1M tokens)
Gemini->>CostDB: INSERT INTO ai_api_costs<br/>(api_provider, model_name, feature, tokens, cost, latency_ms)
Gemini->>Engine: Return response text
Note over Engine: 4. Calculate per-message metrics
Engine->>Engine: tokenizer.count_tokens(user_message)
Engine->>Engine: tokenizer.count_tokens(response)
Engine->>Engine: Calculate latency_ms, cost_usd
Note over Engine: 5. Save AI response
Engine->>DB: INSERT INTO ai_chat_messages<br/>(conversation_id, role='assistant', content=response,<br/>tokens_input, tokens_output, cost_usd, latency_ms)
Note over Engine: 6. Update conversation stats
Engine->>DB: UPDATE ai_chat_conversations<br/>SET message_count = message_count + 2, updated_at = NOW()<br/>WHERE id = 123
Engine->>Flask: Return AIChatMessage object
Note over Flask: Get free tier usage stats
Flask->>CostDB: SELECT COUNT(*), SUM(tokens) FROM ai_api_costs<br/>WHERE DATE(timestamp) = TODAY()
CostDB->>Flask: requests_today, tokens_today
Flask->>Browser: JSON {success: true, message: "PIXLAB, WebStorm...", tech_info: {...}}
Browser->>User: Display AI response
```
### 3.2 Message Implementation Details
**Input Validation:**
- Message cannot be empty (`.strip()` check)
- Conversation ownership verified (user_id match)
- Conversation must exist and be active
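A minimal sketch of these three checks (function and variable names are illustrative, not the exact `app.py` code):

```python
from types import SimpleNamespace

def validate_chat_request(data, conversation):
    """Apply the validation rules above; returns (ok, error_message)."""
    message = (data.get('message') or '').strip()
    if not message:
        return False, 'Wiadomość nie może być pusta'  # -> HTTP 400
    if conversation is None:
        return False, 'Conversation not found'        # -> HTTP 404
    if not conversation.is_active:
        return False, 'Conversation is not active'
    return True, ''

# Stand-in conversation object for demonstration
demo_conv = SimpleNamespace(is_active=True)
```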
**Database Operations:**
```sql
-- Save user message
INSERT INTO ai_chat_messages (
conversation_id, created_at, role, content,
edited, regenerated
) VALUES (?, NOW(), 'user', ?, FALSE, FALSE);
-- Save AI response with metrics
INSERT INTO ai_chat_messages (
conversation_id, created_at, role, content,
tokens_input, tokens_output, cost_usd, latency_ms,
edited, regenerated
) VALUES (?, NOW(), 'assistant', ?, ?, ?, ?, ?, FALSE, FALSE);
-- Update conversation
UPDATE ai_chat_conversations
SET message_count = message_count + 2,
updated_at = NOW()
WHERE id = ?;
```
**Response Format:**
```json
{
"success": true,
"message": "Znalazłem kilka firm zajmujących się stronami www: PIXLAB (www.pixlab.pl, tel: 509 509 689), WebStorm Agencja Interaktywna...",
"message_id": 456,
"created_at": "2026-01-10T10:35:22.123456",
"tech_info": {
"model": "gemini-2.5-flash",
"data_source": "PostgreSQL (80 firm Norda Biznes)",
"architecture": "Full DB Context (wszystkie firmy w kontekście AI)",
"tokens_input": 8543,
"tokens_output": 234,
"tokens_total": 8777,
"latency_ms": 342,
"theoretical_cost_usd": 0.00128,
"actual_cost_usd": 0.0,
"free_tier": {
"is_free": true,
"daily_limit": 1500,
"requests_today": 47,
"tokens_today": 423891,
"remaining": 1453
}
}
}
```
---
## 4. Context Building (Core Intelligence)
### 4.1 Context Building Flow
**Method:** `_build_conversation_context(db, conversation, current_message)`
**File:** `nordabiz_chat.py` (lines 254-310)
**Strategy:** Full database context (AI does intelligent filtering)
```mermaid
flowchart TD
Start(["User Message:<br/>Kto robi strony www?"]) --> LoadCompanies["Load ALL active companies<br/>FROM companies WHERE status='active'"]
LoadCompanies --> Count[Total: 80 companies]
Count --> LoadCategories[Load all categories with counts]
LoadCategories --> LoadHistory[Load last 10 conversation messages<br/>ORDER BY created_at DESC]
LoadHistory --> BuildContext[Build context dict]
BuildContext --> CompactFormat[Convert ALL companies<br/>to compact format]
CompactFormat --> CompactLoop{For each<br/>company}
CompactLoop -->|Process| CompactFields["Include only non-empty fields:<br/>- name, cat (category)<br/>- desc (description_short)<br/>- history (founding_history)<br/>- svc (services)<br/>- comp (competencies)<br/>- web, tel, mail<br/>- city, year<br/>- cert (top 3 certifications)"]
CompactFields --> SaveTokens["Save tokens by:<br/>- Short field names<br/>- Omit empty fields<br/>- Limit certs to 3"]
SaveTokens --> NextCompany{More<br/>companies?}
NextCompany -->|Yes| CompactLoop
NextCompany -->|No| ContextReady[Context ready]
ContextReady --> ContextDict{Context Dictionary}
ContextDict --> Field1[conversation_type: 'general']
ContextDict --> Field2[total_companies: 80]
ContextDict --> Field3[categories: Array]
ContextDict --> Field4[all_companies: Array<br/>~8,000-12,000 tokens]
ContextDict --> Field5[recent_messages: Array<br/>Last 10 messages]
Field1 & Field2 & Field3 & Field4 & Field5 --> Return[Return to _query_ai]
style BuildContext fill:#4CAF50
style CompactFormat fill:#FF9800
style ContextDict fill:#2196F3
```
### 4.2 Compact Company Format
**Purpose:** Minimize token usage while preserving all important data
**Example Company Object:**
```json
{
"name": "PIXLAB Sp. z o.o.",
"cat": "IT i Technologie",
"desc": "Agencja interaktywna - strony www, sklepy online, aplikacje",
"history": "Założona przez Macieja Pieńczyńskiego w 2015 roku",
"svc": ["Strony WWW", "E-commerce", "Aplikacje webowe", "SEO"],
"comp": ["WordPress", "Shopify", "React", "Node.js"],
"web": "https://pixlab.pl",
"tel": "509 509 689",
"mail": "kontakt@pixlab.pl",
"city": "Wejherowo",
"year": 2015,
"cert": ["ISO 9001", "Google Partner"]
}
```
**Token Savings:**
- Short field names: `svc` instead of `services` (-40%)
- Omit empty fields: Only include if data exists (-30%)
- Limit certifications: Top 3 instead of all (-20%)
- Compact JSON: No extra whitespace (-10%)
**Typical Token Usage:**
- Single company: ~100-150 tokens (compact)
- All 80 companies: ~8,000-12,000 tokens
- System prompt: ~500 tokens
- Conversation history (10 msgs): ~1,000-2,000 tokens
- **Total input:** ~10,000-15,000 tokens
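A sketch of a serializer producing this compact format (shown over plain dicts for self-containment; the production `_company_to_compact_dict` works on ORM objects, and the source attribute names here are assumptions):

```python
import json

def company_to_compact_dict(c):
    """Serialize a company with short keys, omitting empty fields.

    Input is a plain dict; the key names (description_short,
    founding_history, ...) are assumed source fields.
    """
    compact = {'name': c.get('name'), 'cat': c.get('category')}
    optional = {
        'desc': c.get('description_short'),
        'history': c.get('founding_history'),
        'svc': c.get('services'),
        'comp': c.get('competencies'),
        'web': c.get('website'),
        'tel': c.get('phone'),
        'mail': c.get('email'),
        'city': c.get('city'),
        'year': c.get('founded_year'),
    }
    for key, value in optional.items():
        if value:                       # omit empty/None fields entirely
            compact[key] = value
    certs = c.get('certifications') or []
    if certs:
        compact['cert'] = certs[:3]     # cap at top 3 certifications
    return compact

def to_compact_json(companies):
    """Compact JSON: no whitespace, Polish characters left unescaped."""
    return json.dumps([company_to_compact_dict(c) for c in companies],
                      ensure_ascii=False, separators=(',', ':'))
```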
---
## 5. AI Query & Prompt Engineering
### 5.1 AI Query Flow
**Method:** `_query_ai(context, user_message, user_id)`
**File:** `nordabiz_chat.py` (lines 406-481)
```mermaid
flowchart TD
Start([Context + User Message]) --> BuildPrompt[Build system prompt]
BuildPrompt --> SystemPrompt["SYSTEM PROMPT:<br/>- Role definition<br/>- Database stats<br/>- Instructions<br/>- Data format guide"]
SystemPrompt --> AddCompanies[Add ALL companies JSON<br/>~8,000-12,000 tokens]
AddCompanies --> AddHistory[Add conversation history<br/>Last 10 messages]
AddHistory --> AddUserMsg[Add current user message]
AddUserMsg --> FullPrompt[Complete prompt ready<br/>~10,000-15,000 tokens]
FullPrompt --> UseGlobal{use_global_service?}
UseGlobal -->|"Yes (default)"| GeminiSvc[gemini_service.generate_text]
UseGlobal -->|"No (legacy)"| DirectAPI[model.generate_content]
GeminiSvc --> AutoCost[Automatic cost tracking<br/>to ai_api_costs table]
DirectAPI --> NoCost[No cost tracking]
AutoCost --> APICall[Gemini API Call<br/>gemini-2.5-flash]
NoCost --> APICall
APICall --> Response[AI Response<br/>~200-400 tokens]
Response --> Return[Return response text]
style SystemPrompt fill:#4CAF50
style GeminiSvc fill:#2196F3
style AutoCost fill:#FF9800
```
### 5.2 System Prompt Structure
**File:** `nordabiz_chat.py` (lines 426-458)
```
Jesteś pomocnym asystentem portalu Norda Biznes - katalogu firm
zrzeszonych w stowarzyszeniu Norda Biznes z Wejherowa.
📊 MASZ DOSTĘP DO PEŁNEJ BAZY DANYCH:
- Liczba firm: 80
- Kategorie: IT i Technologie (25), Budownictwo (18), Usługi (15), ...
🎯 TWOJA ROLA:
- Analizujesz CAŁĄ bazę firm i wybierasz najlepsze dopasowania do pytania
- Odpowiadasz zwięźle (2-3 zdania), chyba że użytkownik prosi o szczegóły
- Podajesz konkretne nazwy firm z kontaktem
- Możesz wyszukiwać po: nazwie, usługach, kompetencjach, właścicielach, mieście
📋 FORMAT DANYCH (skróty):
- name: nazwa firmy
- cat: kategoria
- desc: krótki opis
- history: historia firmy, właściciele, założyciele
- svc: usługi
- comp: kompetencje
- web/tel/mail: kontakt
- city: miasto
- cert: certyfikaty
⚠️ WAŻNE:
- ZAWSZE podawaj nazwę firmy i kontakt (tel/web/mail jeśli dostępne)
- Jeśli pytanie o osobę (np. "kto to Roszman") - szukaj w polu "history"
- Odpowiadaj PO POLSKU
🏢 PEŁNA BAZA FIRM (wybierz najlepsze):
[JSON array with all 80 companies in compact format]
# HISTORIA ROZMOWY:
Użytkownik: [previous message 1]
Ty: [previous response 1]
Użytkownik: [previous message 2]
Ty: [previous response 2]
...
Użytkownik: Kto robi strony www?
Ty:
```
**Prompt Engineering Principles:**
1. **Clear role definition:** "Jesteś pomocnym asystentem..."
2. **Database context:** Total companies, category distribution
3. **Response guidelines:** Concise (2-3 sentences), specific contacts
4. **Data format guide:** Field name abbreviations explained
5. **Search capabilities:** What AI can search by
6. **Important notes:** Always include contact, search in "history" for people
7. **Language:** Always respond in Polish
8. **Full context:** ALL companies provided (AI does filtering)
9. **Conversation history:** Last 10 messages for context continuity
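The assembly of system prompt, company JSON, and history into one prompt can be sketched as follows (a simplification of `_query_ai`; the exact wording lives in `nordabiz_chat.py`):

```python
def build_prompt(system_prompt, companies_json, history, user_message):
    """Concatenate the prompt sections in the order shown above.

    `history` is a list of {'role': ..., 'content': ...} dicts
    (last 10 messages, oldest first). A sketch, not the production code.
    """
    lines = [system_prompt,
             '🏢 PEŁNA BAZA FIRM (wybierz najlepsze):',
             companies_json,
             '# HISTORIA ROZMOWY:']
    for msg in history:
        speaker = 'Użytkownik' if msg['role'] == 'user' else 'Ty'
        lines.append(f"{speaker}: {msg['content']}")
    lines.append(f'Użytkownik: {user_message}')
    lines.append('Ty:')
    return '\n'.join(lines)
```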
---
## 6. Cost Tracking & Performance
### 6.1 Dual Cost Tracking System
The application uses **TWO levels** of cost tracking:
**Level 1: Global API Cost Tracking** (ai_api_costs table)
- Managed by `gemini_service.py`
- Tracks ALL Gemini API calls (chat, image analysis, etc.)
- Automatic via `_log_api_cost()` method
**Level 2: Per-Message Chat Metrics** (ai_chat_messages table)
- Managed by `nordabiz_chat.py`
- Tracks tokens, cost, latency per chat message
- User-facing metrics for transparency
### 6.2 Cost Tracking Flow
```mermaid
sequenceDiagram
participant Engine as NordaBizChatEngine
participant Gemini as GeminiService
participant API as Gemini API
participant GlobalDB as ai_api_costs
participant ChatDB as ai_chat_messages
Engine->>Gemini: generate_text(prompt, feature='ai_chat', user_id=123)
Note over Gemini: Start timer
Gemini->>API: POST /generateContent
API->>Gemini: Response text
Note over Gemini: Stop timer (latency_ms)
Note over Gemini: Count tokens
Gemini->>Gemini: input_tokens = count_tokens(prompt)
Gemini->>Gemini: output_tokens = count_tokens(response)
Note over Gemini: Calculate cost
Gemini->>Gemini: input_cost = (input/1M) * $0.075
Gemini->>Gemini: output_cost = (output/1M) * $0.30
Gemini->>Gemini: total_cost = input + output
Note over Gemini: Global cost tracking
Gemini->>GlobalDB: INSERT INTO ai_api_costs<br/>(api_provider='gemini', model='gemini-2.5-flash',<br/>feature='ai_chat', user_id=123, tokens, cost, latency)
Gemini->>Engine: Return response text
Note over Engine: Per-message tracking
Engine->>Engine: tokenizer.count_tokens(user_msg)
Engine->>Engine: tokenizer.count_tokens(response)
Engine->>Engine: Calculate cost again (for message record)
Engine->>ChatDB: INSERT INTO ai_chat_messages<br/>(role='assistant', content, tokens_input,<br/>tokens_output, cost_usd, latency_ms)
```
### 6.3 Cost Calculation
**Gemini 2.5 Flash Pricing:**
- **Input:** $0.075 per 1M tokens
- **Output:** $0.30 per 1M tokens
- **Free Tier:** 1,500 requests/day (unlimited tokens)
**Typical Chat Message:**
```
Input:  10,000 tokens (system prompt + companies + history) = $0.00075
Output:    300 tokens (AI response)                         = $0.00009
Total:  10,300 tokens                                       = $0.00084
```
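The same arithmetic as a small helper (rates as listed above):

```python
# Gemini 2.5 Flash rates as listed above
INPUT_RATE = 0.075 / 1_000_000    # USD per input token
OUTPUT_RATE = 0.30 / 1_000_000    # USD per output token

def theoretical_cost_usd(input_tokens, output_tokens):
    """Theoretical (paid-tier) cost; actual cost is $0.00 on the free tier."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
```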
**Daily Usage Estimate:**
- 100 chat messages/day
- Average 10,000 input + 300 output tokens
- Theoretical cost: $0.084/day ($2.52/month)
- **Actual cost: $0.00** (free tier covers all usage)
### 6.4 Free Tier Monitoring
**Function:** `get_free_tier_usage()`
**File:** `app.py`
```python
def get_free_tier_usage():
    """Get free tier usage stats for today"""
    db = SessionLocal()
    try:
        today_start = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
        stats = db.query(
            func.count(AIAPICostLog.id).label('requests'),
            func.sum(AIAPICostLog.total_tokens).label('tokens')
        ).filter(
            AIAPICostLog.timestamp >= today_start,
            AIAPICostLog.api_provider == 'gemini',
            AIAPICostLog.success == True
        ).first()
        return {
            'requests_today': stats.requests or 0,
            'tokens_today': stats.tokens or 0,
            'daily_limit': 1500,
            'remaining': max(0, 1500 - (stats.requests or 0))
        }
    finally:
        db.close()
```
**Response in `/api/chat/:id/message`:**
```json
{
"tech_info": {
"free_tier": {
"is_free": true,
"daily_limit": 1500,
"requests_today": 47,
"tokens_today": 423891,
"remaining": 1453
}
}
}
```
---
## 7. Conversation History
### 7.1 Get History Flow
**Route:** `GET /api/chat/:id/history`
**File:** `app.py` (lines 3606-3634)
**Authentication:** Required (`@login_required`)
```mermaid
sequenceDiagram
actor User
participant Browser
participant Flask as Flask App
participant Engine as NordaBizChatEngine
participant DB as ai_chat_messages
User->>Browser: Load chat history
Browser->>Flask: GET /api/chat/123/history
Note over Flask: Verify ownership
Flask->>DB: SELECT * FROM ai_chat_conversations WHERE id = 123 AND user_id = ?
DB->>Flask: Conversation found
Flask->>Engine: get_conversation_history(123)
Engine->>DB: SELECT * FROM ai_chat_messages<br/>WHERE conversation_id = 123 ORDER BY created_at ASC
DB->>Engine: All messages in conversation
Engine->>Engine: Format messages as dicts
Engine->>Flask: Return messages array
Flask->>Browser: JSON {success: true, messages: [...]}
Browser->>User: Display conversation history
```
**Response Format:**
```json
{
"success": true,
"messages": [
{
"id": 789,
"role": "user",
"content": "Kto robi strony www?",
"created_at": "2026-01-10T10:35:00.123456",
"tokens_input": 0,
"tokens_output": 0,
"cost_usd": 0.0,
"latency_ms": 0
},
{
"id": 790,
"role": "assistant",
"content": "Znalazłem kilka firm zajmujących się stronami www...",
"created_at": "2026-01-10T10:35:02.456789",
"tokens_input": 8543,
"tokens_output": 234,
"cost_usd": 0.00128,
"latency_ms": 342
}
]
}
```
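The per-message dict shape can be sketched like this (shown over plain dicts for self-containment; user rows carry NULL metrics in the database and are surfaced as zeros, matching the response above; field sources are assumptions):

```python
def message_to_dict(m):
    """Format one chat message row for the history response (illustrative)."""
    return {
        'id': m['id'],
        'role': m['role'],
        'content': m['content'],
        'created_at': m['created_at'],
        # User rows have NULL metrics; surface them as zeros
        'tokens_input': m.get('tokens_input') or 0,
        'tokens_output': m.get('tokens_output') or 0,
        'cost_usd': m.get('cost_usd') or 0.0,
        'latency_ms': m.get('latency_ms') or 0,
    }
```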
---
## 8. Database Schema
### 8.1 Conversation Tables
**ai_chat_conversations** (conversation metadata)
```sql
CREATE TABLE ai_chat_conversations (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
started_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP,
conversation_type VARCHAR(50) DEFAULT 'general',
title VARCHAR(500),
is_active BOOLEAN DEFAULT TRUE,
message_count INTEGER DEFAULT 0,
model_name VARCHAR(100)
);
CREATE INDEX idx_chat_conv_user_id ON ai_chat_conversations(user_id);
CREATE INDEX idx_chat_conv_started_at ON ai_chat_conversations(started_at DESC);
```
**ai_chat_messages** (individual messages)
```sql
CREATE TABLE ai_chat_messages (
id SERIAL PRIMARY KEY,
conversation_id INTEGER NOT NULL REFERENCES ai_chat_conversations(id) ON DELETE CASCADE,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
role VARCHAR(20) NOT NULL, -- 'user' or 'assistant'
content TEXT NOT NULL,
tokens_input INTEGER,
tokens_output INTEGER,
cost_usd DECIMAL(10,6),
latency_ms INTEGER,
edited BOOLEAN DEFAULT FALSE,
regenerated BOOLEAN DEFAULT FALSE
);
CREATE INDEX idx_chat_msg_conv_id ON ai_chat_messages(conversation_id);
CREATE INDEX idx_chat_msg_created_at ON ai_chat_messages(created_at);
```
**ai_api_costs** (global API cost tracking)
```sql
CREATE TABLE ai_api_costs (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
api_provider VARCHAR(50) NOT NULL, -- 'gemini'
model_name VARCHAR(100), -- 'gemini-2.5-flash'
feature VARCHAR(100), -- 'ai_chat', 'image_analysis', etc.
user_id INTEGER REFERENCES users(id),
input_tokens INTEGER,
output_tokens INTEGER,
total_tokens INTEGER,
input_cost DECIMAL(10,6),
output_cost DECIMAL(10,6),
total_cost DECIMAL(10,6),
success BOOLEAN DEFAULT TRUE,
error_message TEXT,
latency_ms INTEGER,
prompt_hash VARCHAR(64)
);
CREATE INDEX idx_api_costs_timestamp ON ai_api_costs(timestamp DESC);
CREATE INDEX idx_api_costs_provider ON ai_api_costs(api_provider);
CREATE INDEX idx_api_costs_feature ON ai_api_costs(feature);
CREATE INDEX idx_api_costs_user_id ON ai_api_costs(user_id);
```
### 8.2 Entity Relationships
```mermaid
erDiagram
users ||--o{ ai_chat_conversations : "has many"
ai_chat_conversations ||--o{ ai_chat_messages : "contains"
users ||--o{ ai_api_costs : "generates"
users {
int id PK
varchar email
varchar name
boolean is_admin
}
ai_chat_conversations {
int id PK
int user_id FK
timestamp started_at
varchar conversation_type
varchar title
boolean is_active
int message_count
varchar model_name
}
ai_chat_messages {
int id PK
int conversation_id FK
timestamp created_at
varchar role
text content
int tokens_input
int tokens_output
decimal cost_usd
int latency_ms
}
ai_api_costs {
int id PK
timestamp timestamp
varchar api_provider
varchar model_name
varchar feature
int user_id FK
int total_tokens
decimal total_cost
int latency_ms
}
```
---
## 9. Error Handling
### 9.1 Common Error Scenarios
**1. Conversation Not Found**
```python
# app.py
conversation = db.query(AIChatConversation).filter_by(
    id=conversation_id,
    user_id=current_user.id
).first()

if not conversation:
    return jsonify({
        'success': False,
        'error': 'Conversation not found'
    }), 404
```
**2. Empty Message**
```python
message = data.get('message', '').strip()
if not message:
    return jsonify({
        'success': False,
        'error': 'Wiadomość nie może być pusta'
    }), 400
```
**3. Gemini API Error**
```python
# gemini_service.py
try:
    response = self.model.generate_content(prompt)
    # Check safety filters
    if not response.candidates:
        raise Exception("Response blocked by safety filters")
    # Check finish reason (1 = STOP, 0 = UNSPECIFIED)
    candidate = response.candidates[0]
    if candidate.finish_reason not in [1, 0]:
        raise Exception(f"Response incomplete: {candidate.finish_reason}")
except Exception as e:
    logger.error(f"Gemini API error: {e}")
    # Log failed request to database
    self._log_api_cost(
        prompt=prompt,
        response_text='',
        input_tokens=self.count_tokens(prompt),
        output_tokens=0,
        success=False,
        error_message=str(e)
    )
    raise Exception(f"Gemini API call failed: {str(e)}")
```
**4. Database Connection Error**
```python
# nordabiz_chat.py
db = SessionLocal()
try:
    # Database operations
    conversation = db.query(AIChatConversation).filter_by(id=conversation_id).first()
    # ...
finally:
    db.close()  # Always close connection
```
### 9.2 Error Response Format
```json
{
"success": false,
"error": "Conversation not found"
}
```
**HTTP Status Codes:**
- `400` - Bad Request (empty message, invalid input)
- `404` - Not Found (conversation doesn't exist)
- `500` - Internal Server Error (Gemini API failure, database error)
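On the client side, this contract might be handled with a small helper that maps status code plus payload to a user-facing message (illustrative, not part of the codebase):

```python
def classify_chat_error(status_code, payload):
    """Map the error contract above to a user-facing string (illustrative)."""
    if payload.get('success'):
        return 'ok'
    error = payload.get('error', 'Unknown error')
    if status_code == 400:
        return f'Invalid input: {error}'
    if status_code == 404:
        return f'Not found: {error}'
    return f'Server error: {error}'
```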
---
## 10. Search Integration
### 10.1 Search Service Integration
**Method:** `_find_relevant_companies(db, message)`
**File:** `nordabiz_chat.py` (lines 383-404)
**Status:** DEPRECATED (kept for reference, not used in production)
**Historical Context:**
The chat engine originally used SearchService to **pre-filter** companies before sending to AI:
```python
# OLD APPROACH (deprecated):
def _find_relevant_companies(self, db, message):
    """Find companies relevant to user's message"""
    results = search_companies(db, message, limit=10)
    return [result.company for result in results]

# In _build_conversation_context:
relevant_companies = self._find_relevant_companies(db, current_message)
context['companies'] = [self._company_to_compact_dict(c) for c in relevant_companies]
```
**Current Approach:**
Send **ALL companies** to AI and let it do intelligent filtering:
```python
# NEW APPROACH (current production):
def _build_conversation_context(self, db, conversation, current_message):
    """Build context with ALL companies (not pre-filtered)"""
    all_companies = db.query(Company).filter_by(status='active').all()
    context = {}  # categories, history, etc. omitted in this excerpt
    context['all_companies'] = [
        self._company_to_compact_dict(c)
        for c in all_companies
    ]
    return context
```
**Why the Change?**
| Aspect | Old (Pre-filtered) | New (Full Context) |
|--------|-------------------|-------------------|
| **Companies sent** | 8-10 (search filtered) | 80 (all active) |
| **Token usage** | ~1,500 tokens | ~10,000 tokens |
| **Search quality** | Keyword-based, limited | AI-powered, intelligent |
| **Multi-criteria** | Difficult | Excellent |
| **Owner searches** | Impossible | Works perfectly |
| **Cost** | $0.0001/msg | $0.0008/msg |
| **User experience** | Sometimes misses results | Always comprehensive |
**Example:**
- User: "Kto to Roszman?" (Who is Roszman?)
- Old approach: Search for "roszman" in services/competencies → 0 results ❌
- New approach: AI searches `founding_history` field → Finds company owner ✅
---
## 11. Performance & Optimization
### 11.1 Performance Metrics
**Typical Chat Message:**
- **Latency:** 200-400ms
- **Input tokens:** 8,000-15,000 (system prompt + 80 companies + history)
- **Output tokens:** 200-500 (AI response)
- **Total tokens:** 8,500-15,500
- **Theoretical cost:** $0.0008-0.0015
- **Actual cost:** $0.00 (free tier)
**Database Queries:**
- Conversation lookup: ~5ms (indexed on user_id, id)
- All companies query: ~50ms (80 rows, no complex joins)
- Last 10 messages: ~10ms (indexed on conversation_id, created_at)
- **Total DB time:** ~65ms
**Gemini API:**
- Network latency: ~100-200ms
- Processing time: ~100-200ms
- **Total API time:** ~250-350ms
### 11.2 Token Optimization Strategies
**1. Compact Field Names**
```python
# GOOD (saves ~40% tokens):
{"name": "PIXLAB", "svc": ["WWW", "SEO"], "comp": ["WordPress"]}
# BAD (wasteful):
{"company_name": "PIXLAB", "services": ["WWW", "SEO"], "competencies": ["WordPress"]}
```
**2. Omit Empty Fields**
```python
# GOOD:
compact = {"name": c.name}
if c.description_short:
    compact['desc'] = c.description_short  # only adds field if data exists

# BAD:
compact = {
    "name": c.name,
    "desc": c.description_short or "",  # wastes tokens on ""
}
```
**3. Limit Arrays**
```python
# GOOD (top 3 certifications):
if c.certifications:
    compact['cert'] = [cert.name for cert in c.certifications[:3]]

# BAD (all certifications):
compact['cert'] = [cert.name for cert in c.certifications]  # may be 10+
```
**4. Compact JSON (no whitespace)**
```python
# GOOD:
json.dumps(data, ensure_ascii=False, indent=None)
# {"name":"PIXLAB","svc":["WWW"]}
# BAD:
json.dumps(data, ensure_ascii=False, indent=2)
# {
# "name": "PIXLAB",
# "svc": ["WWW"]
# }
```
**Token Savings:**
- Single company: 200 tokens → 100 tokens (50% reduction)
- 80 companies: 16,000 tokens → 8,000 tokens (50% reduction)
- Cost savings: $0.0016 → $0.0008 per message (50% reduction)
### 11.3 Caching Opportunities (Future)
**Not Currently Implemented** (all companies loaded per message)
**Potential Optimizations:**
1. **Company data caching** (Redis)
- Cache all companies JSON for 5 minutes
- Invalidate on company data changes
- Reduce DB query time: 50ms → 5ms
2. **Prompt template caching**
- Cache system prompt template
- Only rebuild when companies change
3. **Conversation context caching**
- Cache last 10 messages per conversation
- Invalidate on new message
- Reduce DB query time: 10ms → 1ms
**Why Not Implemented Yet:**
- Current performance is acceptable (250-350ms total)
- Database queries are local and cheap (the free-tier limit applies only to Gemini calls)
- Premature optimization (80 companies is small dataset)
- Complexity vs. benefit tradeoff
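The caching pattern described above (short TTL plus explicit invalidation on change) can be illustrated without Redis; a production version would swap this in-memory store for a Redis client with the same get/set/invalidate shape:

```python
import time

class TTLCache:
    """Minimal time-based cache illustrating the 5-minute company-JSON cache idea."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}          # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def invalidate(self, key):
        """Call on company data changes."""
        self._store.pop(key, None)
```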
---
## 12. Security & Access Control
### 12.1 Authentication & Authorization
**All chat routes require authentication:**
```python
@app.route('/chat')
@login_required
def chat():
    """AI Chat interface"""
    return render_template('chat.html')

@app.route('/api/chat/start', methods=['POST'])
@login_required
def chat_start():
    # Only logged-in users can start conversations
    ...

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
@login_required
def chat_send_message(conversation_id):
    # Verify conversation ownership
    conversation = db.query(AIChatConversation).filter_by(
        id=conversation_id,
        user_id=current_user.id  # IMPORTANT: ownership check
    ).first()
    if not conversation:
        return jsonify({'error': 'Conversation not found'}), 404
    ...
```
### 12.2 Input Sanitization
**User message sanitization:**
```python
# app.py
message = data.get('message', '').strip()

# Gemini treats input as plain text and the database stores it as TEXT,
# so no code executes server-side; the frontend should still escape
# message content when rendering to prevent XSS.
```
**No SQL Injection:**
```python
# Safe (parameterized query):
conversation = db.query(AIChatConversation).filter_by(
    id=conversation_id,
    user_id=current_user.id
).first()
# SQLAlchemy emits parameterized SQL, preventing SQL injection
```
### 12.3 Rate Limiting
**Gemini API Free Tier Limits:**
- 1,500 requests/day
- No per-minute limit
- No token limit
**Application-Level Limits:**
- No specific rate limiting on chat endpoints (yet)
- User must be logged in (reduces abuse)
- Flask-Limiter can be added if needed
**Future Rate Limiting:**
```python
from flask_limiter import Limiter

# Flask-Limiter >= 3.x signature: key_func first, app as keyword
limiter = Limiter(key_func=lambda: str(current_user.id), app=app)

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
@login_required
@limiter.limit("60 per hour")  # 60 messages per hour per user
def chat_send_message(conversation_id):
    ...
```
---
## 13. Monitoring & Debugging
### 13.1 Cost Tracking Queries
**Daily API usage:**
```sql
SELECT
DATE(timestamp) as date,
COUNT(*) as requests,
SUM(total_tokens) as tokens,
SUM(total_cost) as cost_usd
FROM ai_api_costs
WHERE api_provider = 'gemini'
AND feature = 'ai_chat'
GROUP BY DATE(timestamp)
ORDER BY date DESC;
```
**Top users by API usage:**
```sql
SELECT
u.name,
u.email,
COUNT(*) as chat_messages,
SUM(c.total_tokens) as total_tokens,
SUM(c.total_cost) as total_cost_usd
FROM ai_api_costs c
JOIN users u ON c.user_id = u.id
WHERE c.api_provider = 'gemini'
AND c.feature = 'ai_chat'
GROUP BY u.id, u.name, u.email
ORDER BY total_cost_usd DESC
LIMIT 10;
```
**Free tier usage today:**
```sql
SELECT
COUNT(*) as requests_today,
SUM(total_tokens) as tokens_today,
GREATEST(0, 1500 - COUNT(*)) as remaining_requests
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
AND api_provider = 'gemini'
AND success = TRUE;
```
### 13.2 Chat Analytics
**Most active conversations:**
```sql
SELECT
    c.id,
    c.title,
    u.name AS user_name,
    c.message_count,
    c.started_at,
    c.updated_at
FROM ai_chat_conversations c
JOIN users u ON c.user_id = u.id
WHERE c.is_active = TRUE
ORDER BY c.message_count DESC
LIMIT 20;
```
**Average response metrics:**
```sql
SELECT
    AVG(tokens_input) AS avg_input_tokens,
    AVG(tokens_output) AS avg_output_tokens,
    AVG(latency_ms) AS avg_latency_ms,
    AVG(cost_usd) AS avg_cost_usd
FROM ai_chat_messages
WHERE role = 'assistant'
  AND created_at > NOW() - INTERVAL '7 days';
```
### 13.3 Error Monitoring
**Failed API requests:**
```sql
SELECT
    timestamp,
    model_name,
    feature,
    error_message,
    latency_ms
FROM ai_api_costs
WHERE success = FALSE
  AND api_provider = 'gemini'
ORDER BY timestamp DESC
LIMIT 20;
```
**Conversations with errors:**
```sql
-- Conversations where the last message is from the user (AI didn't respond)
SELECT
    c.id,
    c.title,
    c.message_count,
    c.updated_at,
    (SELECT content FROM ai_chat_messages
     WHERE conversation_id = c.id
     ORDER BY created_at DESC LIMIT 1) AS last_message
FROM ai_chat_conversations c
WHERE c.message_count % 2 = 1  -- odd count: a user message without a response
  AND c.updated_at > NOW() - INTERVAL '1 hour'
ORDER BY c.updated_at DESC;
```
---
## 14. Future Enhancements
### 14.1 Planned Features
**1. Conversation Context Memory**
- Remember user preferences across sessions
- "Remember that I'm looking for IT services"
- Personalized recommendations
**2. Conversation Sharing**
- Share conversation URL with other users
- Public vs. private conversations
- Embed chat widget on company profiles
**3. Voice Input/Output**
- Web Speech API for voice input
- Text-to-speech for AI responses
- Hands-free interaction
**4. Multi-Modal Input**
- Upload images (company logo, product photos)
- Gemini Vision API for image analysis
- "Find companies similar to this logo"
**5. Conversation Search**
- Full-text search across all user conversations
- Filter by date, company mentioned, topic
- Export conversation history
**6. Advanced Analytics**
- Which companies are most recommended by AI?
- What services are users asking about most?
- Conversation funnel (browse → chat → contact)
### 14.2 Optimization Opportunities
**1. Redis Caching**
```python
# Cache the full company list as JSON
redis_key = f"companies:all:{version_hash}"
cached = redis.get(redis_key)
if cached:
    all_companies = json.loads(cached)
else:
    all_companies = load_from_db()
    redis.setex(redis_key, 300, json.dumps(all_companies))  # 5-minute TTL
```
**2. Prompt Compression**
- Use Gemini's context caching feature (when available)
- Cache the system prompt + company database
- Send only the new user message (saves roughly 90% of input tokens)
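To make the ~90% figure concrete, here is a back-of-the-envelope calculation using the glossary's ~4 characters/token heuristic. The sizes below are illustrative assumptions, not measured values from this application:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

# Illustrative sizes: the static context (system prompt + company database)
# dwarfs each new user message, so caching the static part saves most tokens.
static_context = "x" * 36_000  # ~9,000 tokens of system prompt + companies
new_message = "x" * 4_000      # ~1,000 tokens of new user input + history

without_cache = estimate_tokens(static_context + new_message)
with_cache = estimate_tokens(new_message)
savings = 1 - with_cache / without_cache
print(f"Token savings from caching: {savings:.0%}")  # → Token savings from caching: 90%
```

The exact percentage depends on how large the conversation history grows relative to the cached static context, but the static part dominates for typical messages.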
**3. Streaming Responses**
```python
@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
def chat_send_message(conversation_id):
    # Enable streaming
    response = gemini_service.generate_text(
        prompt=full_prompt,
        stream=True  # Return a generator of chunks
    )

    # Stream chunks to the client via Server-Sent Events (SSE)
    def generate():
        for chunk in response:
            yield f"data: {json.dumps({'text': chunk.text})}\n\n"

    return Response(generate(), mimetype='text/event-stream')
```
**4. Conversation Summarization**
- Auto-summarize conversations longer than 20 messages
- Send the summary plus the most recent messages instead of the full history
- Cuts token usage by roughly 50%
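A minimal sketch of the history-compaction step. The function and parameter names are illustrative (not the engine's actual API), and the `summarize` callable stands in for a Gemini summarization call:

```python
from typing import Callable


def compact_history(messages: list[dict], summarize: Callable[[str], str],
                    keep_last: int = 10, threshold: int = 20) -> list[dict]:
    """Once a conversation exceeds `threshold` messages, replace everything
    except the most recent `keep_last` messages with a single summary entry."""
    if len(messages) <= threshold:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = {
        "role": "system",
        "content": f"Summary of earlier messages: {summarize(transcript)}",
    }
    return [summary] + recent


# Example with a stub summarizer (a real one would call GeminiService):
msgs = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
compacted = compact_history(msgs, summarize=lambda t: "(earlier discussion)")
print(len(compacted))  # → 11: one summary entry + the 10 most recent messages
```

The summary entry would be prepended to the prompt in place of the full transcript, keeping multi-turn context while bounding token growth.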
---
## 15. Troubleshooting Guide
### 15.1 Common Issues
**Issue: "Conversation not found" error**
```
Cause: User trying to access someone else's conversation
Fix: Verify conversation_id belongs to current_user.id
SQL Debug:
SELECT id, user_id FROM ai_chat_conversations WHERE id = 123;
```
**Issue: Empty AI responses**
```
Cause: Gemini safety filters blocking response
Fix: Check ai_api_costs for error_message
SQL Debug:
SELECT error_message, prompt_hash FROM ai_api_costs
WHERE success = FALSE ORDER BY timestamp DESC LIMIT 10;
```
**Issue: Slow response times (> 1 second)**
```
Cause: Large context (many companies, long history)
Fix: Check token counts, consider summarization
SQL Debug:
SELECT tokens_input, tokens_output, latency_ms
FROM ai_chat_messages
WHERE latency_ms > 1000
ORDER BY created_at DESC LIMIT 20;
```
**Issue: "Free tier limit exceeded"**
```
Cause: > 1,500 requests in 24 hours
Fix: Wait for quota reset (midnight Pacific Time)
SQL Debug:
SELECT COUNT(*) FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE AND api_provider = 'gemini';
```
### 15.2 Diagnostic Commands
**Check Gemini API connectivity:**
```bash
python3 -c "
from gemini_service import GeminiService
svc = GeminiService()
response = svc.generate_text('Hello', feature='test')
print(response)
"
```
**Verify database connection:**
```bash
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) as conversations FROM ai_chat_conversations;
SELECT COUNT(*) as messages FROM ai_chat_messages;
SELECT COUNT(*) as api_calls FROM ai_api_costs WHERE api_provider = 'gemini';
"
```
**Test chat flow:**
```python
from nordabiz_chat import NordaBizChatEngine
engine = NordaBizChatEngine()
conv = engine.start_conversation(user_id=1, title="Test")
response = engine.send_message(conv.id, "Test message", user_id=1)
print(f"Response: {response.content}")
```
---
## 16. Related Documentation
- **[Search Flow](./02-search-flow.md)** - Company search integration
- **[Authentication Flow](./01-authentication-flow.md)** - User authentication
- **[Flask Components](../04-flask-components.md)** - Application architecture
- **[External Integrations](../06-external-integrations.md)** - Gemini API details
- **[Database Schema](../05-database-schema.md)** - Database structure
---
## 17. Glossary
| Term | Definition |
|------|------------|
| **NordaBizChatEngine** | Main chat engine class in `nordabiz_chat.py` |
| **GeminiService** | Centralized Gemini API wrapper in `gemini_service.py` |
| **Conversation** | Chat session with multiple messages |
| **Context** | Full company database + history sent to AI |
| **Compact Format** | Token-optimized company data format |
| **Free Tier** | Google Gemini free tier (1,500 req/day) |
| **Token** | Unit of text (~4 characters) for AI models |
| **Latency** | Response time in milliseconds |
| **Cost Tracking** | Dual-level system (global + per-message) |
| **System Prompt** | Instructions sent to AI with each query |
---
## 18. Maintenance
**When to Update This Document:**
- ✅ Gemini model version change (e.g., 2.5 → 3.0)
- ✅ Pricing changes
- ✅ New chat features (voice, images, etc.)
- ✅ Context building algorithm changes
- ✅ Database schema changes
- ✅ Performance optimization implementations
**Document Owner:** Development Team
**Review Frequency:** Quarterly or after major changes
**Last Review:** 2026-01-10
---
**END OF DOCUMENT**