# AI Chat Flow

**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Flow Type:** AI-Powered Company Discovery & Chat

---
## Overview

This document describes the **complete AI chat flow** for the Norda Biznes Partner application, covering:

- **Chat Interface** (`/chat` route)
- **Conversation Management** (start, message, history)
- **Context Building** with the full company database
- **Gemini API Integration** for intelligent responses
- **Cost Tracking** and performance metrics
- **Search Integration** for company discovery

**Key Technology:**

- **AI Model:** Google Gemini 2.5 Flash (`gemini-2.5-flash`)
- **Chat Engine:** `NordaBizChatEngine` (`nordabiz_chat.py`)
- **Gemini Service:** Centralized `GeminiService` (`gemini_service.py`)
- **Search Integration:** Unified `SearchService` (`search_service.py`)
- **Database:** PostgreSQL (conversations, messages, companies)

**Key Features:**

- Full company database context (all 80 companies available to the AI)
- Multi-turn conversation with history (last 10 messages)
- Intelligent company selection by the AI (no pre-filtering)
- Real-time cost tracking (tokens, latency, theoretical cost)
- Free tier usage monitoring (1,500 requests/day limit)
- Compact data format to minimize token usage

**Cost & Performance:**

- **Model:** Gemini 2.5 Flash
- **Pricing:** $0.075/$0.30 per 1M tokens (input/output)
- **Free Tier:** 1,500 requests/day, unlimited tokens
- **Typical Response:** 200-400ms latency, 5,000-15,000 tokens
- **Actual Cost:** $0.00 (free tier)
- **Theoretical Cost:** $0.0008-0.0015 per message (see Section 11.1)

---
## 1. High-Level Chat Flow

### 1.1 Complete Chat Flow Diagram

```mermaid
flowchart TD
    User[User] -->|1. Navigate to /chat| Browser[Browser]
    Browser -->|2. GET /chat| Flask[Flask App<br/>app.py]
    Flask -->|3. Require login| AuthCheck{Authenticated?}

    AuthCheck -->|No| Login[Redirect to /login]
    AuthCheck -->|Yes| ChatUI[Render chat.html]

    ChatUI -->|4. Load UI| Browser
    Browser -->|5. POST /api/chat/start| Flask
    Flask -->|6. Create conversation| ChatEngine[NordaBizChatEngine<br/>nordabiz_chat.py]
    ChatEngine -->|7. INSERT| ConvDB[(ai_chat_conversations)]

    ConvDB -->|8. conversation_id| ChatEngine
    ChatEngine -->|9. Return conversation| Flask
    Flask -->|10. JSON response| Browser

    Browser -->|11. User types message| UserInput[User Message]
    UserInput -->|12. POST /api/chat/:id/message| Flask

    Flask -->|13. Verify ownership| DB[(PostgreSQL)]
    Flask -->|14. send_message| ChatEngine

    ChatEngine -->|15. Save user message| MsgDB[(ai_chat_messages)]
    ChatEngine -->|16. Build context| ContextBuilder[Context Builder<br/>_build_conversation_context]

    ContextBuilder -->|17. Load ALL companies| DB
    ContextBuilder -->|18. Load last 10 messages| MsgDB
    ContextBuilder -->|19. Compact format| Context[Full Context<br/>JSON]

    Context -->|20. Query AI| GeminiService[Gemini Service<br/>gemini_service.py]
    GeminiService -->|21. API call| GeminiAPI[Google Gemini API<br/>gemini-2.5-flash]

    GeminiAPI -->|22. AI response| GeminiService
    GeminiService -->|23. Track cost| CostDB[(ai_api_costs)]
    GeminiService -->|24. Response text| ChatEngine

    ChatEngine -->|25. Count tokens| TokenCounter[Tokenizer]
    TokenCounter -->|26. tokens_input, tokens_output| ChatEngine
    ChatEngine -->|27. Save AI message| MsgDB
    ChatEngine -->|28. Update conversation| ConvDB

    ChatEngine -->|29. Return response| Flask
    Flask -->|30. JSON + tech_info| Browser
    Browser -->|31. Display message| User

    style ChatEngine fill:#4CAF50
    style GeminiService fill:#2196F3
    style ContextBuilder fill:#FF9800
    style DB fill:#9C27B0
```

---
## 2. Chat Initialization Flow

### 2.1 Start Conversation

**Route:** `POST /api/chat/start`
**File:** `app.py` (lines 3511-3533)
**Authentication:** Required (`@login_required`)

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Flask as Flask App<br/>(app.py)
    participant Engine as NordaBizChatEngine<br/>(nordabiz_chat.py)
    participant DB as PostgreSQL<br/>(ai_chat_conversations)

    User->>Browser: Click "Start Chat"
    Browser->>Flask: POST /api/chat/start<br/>{title: "Rozmowa..."}

    Note over Flask: @login_required
    Flask->>Flask: Get current_user.id

    Flask->>Engine: start_conversation(<br/> user_id=current_user.id,<br/> title="Rozmowa - 2026-01-10 10:30"<br/>)

    Engine->>Engine: Auto-generate title if not provided
    Engine->>DB: INSERT INTO ai_chat_conversations<br/>(user_id, started_at, title,<br/> conversation_type, is_active,<br/> message_count, model_name)

    DB->>Engine: conversation.id = 123
    Engine->>Flask: Return AIChatConversation object

    Flask->>Browser: JSON {<br/> success: true,<br/> conversation_id: 123,<br/> title: "Rozmowa - 2026-01-10 10:30"<br/>}

    Browser->>User: Chat session ready
```

**Database Operation:**
```sql
INSERT INTO ai_chat_conversations (
    user_id, started_at, conversation_type, title,
    is_active, message_count, model_name, created_at
) VALUES (
    ?, NOW(), 'general', ?,
    TRUE, 0, 'gemini-2.5-flash', NOW()
);
```

**Response:**
```json
{
  "success": true,
  "conversation_id": 123,
  "title": "Rozmowa - 2026-01-10 10:30"
}
```
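The auto-generated title follows the `Rozmowa - YYYY-MM-DD HH:MM` pattern ("Rozmowa" is Polish for "conversation") seen in the response above. A minimal sketch of that fallback; the helper name is illustrative, not the actual `nordabiz_chat.py` code:

```python
from datetime import datetime
from typing import Optional

def default_conversation_title(title: Optional[str] = None) -> str:
    """Return the given title, or auto-generate one.

    Mirrors the documented fallback format: "Rozmowa - 2026-01-10 10:30".
    """
    if title and title.strip():
        return title.strip()
    return f"Rozmowa - {datetime.now():%Y-%m-%d %H:%M}"
```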
---
## 3. Message Flow (Core Chat Logic)

### 3.1 Send Message Sequence

**Route:** `POST /api/chat/<conversation_id>/message`
**File:** `app.py` (lines 3536-3603)
**Authentication:** Required (`@login_required`)

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Flask as Flask App
    participant Engine as NordaBizChatEngine
    participant DB as PostgreSQL
    participant Context as Context Builder
    participant Search as SearchService
    participant Gemini as GeminiService
    participant API as Gemini API
    participant CostDB as ai_api_costs

    User->>Browser: Type: "Kto robi strony www?"
    Browser->>Flask: POST /api/chat/123/message<br/>{message: "Kto robi strony www?"}

    Note over Flask: Verify conversation ownership
    Flask->>DB: SELECT * FROM ai_chat_conversations<br/>WHERE id = 123 AND user_id = ?
    DB->>Flask: Conversation found

    Flask->>Engine: send_message(<br/> conversation_id=123,<br/> user_message="Kto robi strony www?",<br/> user_id=current_user.id<br/>)

    Note over Engine: 1. Save user message
    Engine->>DB: INSERT INTO ai_chat_messages<br/>(conversation_id, role='user',<br/> content="Kto robi strony www?")
    DB->>Engine: Message saved

    Note over Engine: 2. Build context with ALL companies
    Engine->>Context: _build_conversation_context(<br/> db, conversation, message<br/>)

    Context->>DB: SELECT * FROM companies<br/>WHERE status = 'active'
    DB->>Context: 80 companies

    Context->>DB: SELECT * FROM ai_chat_messages<br/>WHERE conversation_id = 123<br/>ORDER BY created_at DESC<br/>LIMIT 10
    DB->>Context: Last 10 messages

    Context->>Context: Build compact JSON format<br/>(minimize tokens)
    Context->>Engine: Return full context dict

    Note over Engine: 3. Query AI with full context
    Engine->>Gemini: generate_text(<br/> prompt=system_prompt + context + history,<br/> feature='ai_chat',<br/> user_id=current_user.id,<br/> temperature=0.7<br/>)

    Gemini->>API: POST /v1/models/gemini-2.5-flash:generateContent
    API->>Gemini: AI response text

    Note over Gemini: Track API cost to database
    Gemini->>Gemini: Count tokens (input, output)
    Gemini->>Gemini: Calculate cost<br/>($0.075/$0.30 per 1M tokens)
    Gemini->>CostDB: INSERT INTO ai_api_costs<br/>(api_provider, model_name, feature,<br/> tokens, cost, latency_ms)

    Gemini->>Engine: Return response text

    Note over Engine: 4. Calculate per-message metrics
    Engine->>Engine: tokenizer.count_tokens(user_message)
    Engine->>Engine: tokenizer.count_tokens(response)
    Engine->>Engine: Calculate latency_ms, cost_usd

    Note over Engine: 5. Save AI response
    Engine->>DB: INSERT INTO ai_chat_messages<br/>(conversation_id, role='assistant',<br/> content=response, tokens_input,<br/> tokens_output, cost_usd, latency_ms)

    Note over Engine: 6. Update conversation stats
    Engine->>DB: UPDATE ai_chat_conversations<br/>SET message_count = message_count + 2,<br/> updated_at = NOW()<br/>WHERE id = 123

    Engine->>Flask: Return AIChatMessage object

    Note over Flask: Get free tier usage stats
    Flask->>CostDB: SELECT COUNT(*), SUM(tokens)<br/>FROM ai_api_costs<br/>WHERE DATE(timestamp) = TODAY()
    CostDB->>Flask: requests_today, tokens_today

    Flask->>Browser: JSON {<br/> success: true,<br/> message: "PIXLAB, WebStorm...",<br/> tech_info: {...}<br/>}

    Browser->>User: Display AI response
```

### 3.2 Message Implementation Details

**Input Validation:**

- Message cannot be empty (`.strip()` check)
- Conversation ownership verified (user_id match)
- Conversation must exist and be active

**Database Operations:**
```sql
-- Save user message
INSERT INTO ai_chat_messages (
    conversation_id, created_at, role, content,
    edited, regenerated
) VALUES (?, NOW(), 'user', ?, FALSE, FALSE);

-- Save AI response with metrics
INSERT INTO ai_chat_messages (
    conversation_id, created_at, role, content,
    tokens_input, tokens_output, cost_usd, latency_ms,
    edited, regenerated
) VALUES (?, NOW(), 'assistant', ?, ?, ?, ?, ?, FALSE, FALSE);

-- Update conversation
UPDATE ai_chat_conversations
SET message_count = message_count + 2,
    updated_at = NOW()
WHERE id = ?;
```

**Response Format:**
```json
{
  "success": true,
  "message": "Znalazłem kilka firm zajmujących się stronami www: PIXLAB (www.pixlab.pl, tel: 509 509 689), WebStorm Agencja Interaktywna...",
  "message_id": 456,
  "created_at": "2026-01-10T10:35:22.123456",
  "tech_info": {
    "model": "gemini-2.5-flash",
    "data_source": "PostgreSQL (80 firm Norda Biznes)",
    "architecture": "Full DB Context (wszystkie firmy w kontekście AI)",
    "tokens_input": 8543,
    "tokens_output": 234,
    "tokens_total": 8777,
    "latency_ms": 342,
    "theoretical_cost_usd": 0.00128,
    "actual_cost_usd": 0.0,
    "free_tier": {
      "is_free": true,
      "daily_limit": 1500,
      "requests_today": 47,
      "tokens_today": 423891,
      "remaining": 1453
    }
  }
}
```
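The validation rules above can be sketched as a small pure helper. The function name and the `(status, payload)` return shape are illustrative, not the actual `app.py` code:

```python
from typing import Optional, Tuple

def validate_chat_request(message: str,
                          conversation_owner_id: Optional[int],
                          current_user_id: int) -> Tuple[int, dict]:
    """Apply the documented checks; return (http_status, json_payload).

    conversation_owner_id is None when the (id, user_id) lookup
    matched no conversation.
    """
    # Ownership check first: a missing or foreign conversation is a 404.
    if conversation_owner_id is None or conversation_owner_id != current_user_id:
        return 404, {"success": False, "error": "Conversation not found"}
    # Empty-after-strip messages are rejected with a 400.
    if not message.strip():
        return 400, {"success": False, "error": "Wiadomość nie może być pusta"}
    return 200, {"success": True}
```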
---
## 4. Context Building (Core Intelligence)

### 4.1 Context Building Flow

**Method:** `_build_conversation_context(db, conversation, current_message)`
**File:** `nordabiz_chat.py` (lines 254-310)
**Strategy:** Full database context (the AI does the filtering)

```mermaid
flowchart TD
    Start([User Message:<br/>"Kto robi strony www?"]) --> LoadCompanies[Load ALL active companies<br/>FROM companies WHERE status='active']

    LoadCompanies --> Count[Total: 80 companies]
    Count --> LoadCategories[Load all categories with counts]
    LoadCategories --> LoadHistory[Load last 10 conversation messages<br/>ORDER BY created_at DESC]

    LoadHistory --> BuildContext[Build context dict]

    BuildContext --> CompactFormat[Convert ALL companies<br/>to compact format]

    CompactFormat --> CompactLoop{For each<br/>company}
    CompactLoop -->|Process| CompactFields[Include only non-empty fields:<br/>- name, cat (category)<br/>- desc (description_short)<br/>- history (founding_history)<br/>- svc (services)<br/>- comp (competencies)<br/>- web, tel, mail<br/>- city, year<br/>- cert (top 3 certifications)]

    CompactFields --> SaveTokens[Save tokens by:<br/>- Short field names<br/>- Omit empty fields<br/>- Limit certs to 3]

    SaveTokens --> NextCompany{More<br/>companies?}
    NextCompany -->|Yes| CompactLoop
    NextCompany -->|No| ContextReady[Context ready]

    ContextReady --> ContextDict{Context Dictionary}
    ContextDict --> Field1[conversation_type: 'general']
    ContextDict --> Field2[total_companies: 80]
    ContextDict --> Field3[categories: Array]
    ContextDict --> Field4[all_companies: Array<br/>~8,000-12,000 tokens]
    ContextDict --> Field5[recent_messages: Array<br/>Last 10 messages]

    Field1 & Field2 & Field3 & Field4 & Field5 --> Return[Return to _query_ai]

    style BuildContext fill:#4CAF50
    style CompactFormat fill:#FF9800
    style ContextDict fill:#2196F3
```

### 4.2 Compact Company Format

**Purpose:** Minimize token usage while preserving all important data

**Example Company Object:**
```json
{
  "name": "PIXLAB Sp. z o.o.",
  "cat": "IT i Technologie",
  "desc": "Agencja interaktywna - strony www, sklepy online, aplikacje",
  "history": "Założona przez Macieja Pieńczyńskiego w 2015 roku",
  "svc": ["Strony WWW", "E-commerce", "Aplikacje webowe", "SEO"],
  "comp": ["WordPress", "Shopify", "React", "Node.js"],
  "web": "https://pixlab.pl",
  "tel": "509 509 689",
  "mail": "kontakt@pixlab.pl",
  "city": "Wejherowo",
  "year": 2015,
  "cert": ["ISO 9001", "Google Partner"]
}
```

**Token Savings:**

- Short field names: `svc` instead of `services` (-40%)
- Omit empty fields: a field is only included if data exists (-30%)
- Limit certifications: top 3 instead of all (-20%)
- Compact JSON: no extra whitespace (-10%)

**Typical Token Usage:**

- Single company: ~100-150 tokens (compact)
- All 80 companies: ~8,000-12,000 tokens
- System prompt: ~500 tokens
- Conversation history (10 msgs): ~1,000-2,000 tokens
- **Total input:** ~10,000-15,000 tokens
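The compacting step can be sketched as a standalone function. The `Company` stand-in and the subset of fields shown here follow the documented format; the real `_company_to_compact_dict` covers the full field list and may differ in detail:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Company:
    """Minimal stand-in for the ORM model (illustrative only)."""
    name: str
    category: Optional[str] = None
    description_short: Optional[str] = None
    founding_history: Optional[str] = None
    services: List[str] = field(default_factory=list)
    certifications: List[str] = field(default_factory=list)

def company_to_compact_dict(c: Company) -> dict:
    """Short keys, skip empty fields, cap certifications at 3."""
    compact = {"name": c.name}
    if c.category:
        compact["cat"] = c.category
    if c.description_short:
        compact["desc"] = c.description_short
    if c.founding_history:
        compact["history"] = c.founding_history
    if c.services:
        compact["svc"] = c.services
    if c.certifications:
        compact["cert"] = c.certifications[:3]  # top 3 only
    return compact
```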
---
## 5. AI Query & Prompt Engineering

### 5.1 AI Query Flow

**Method:** `_query_ai(context, user_message, user_id)`
**File:** `nordabiz_chat.py` (lines 406-481)

```mermaid
flowchart TD
    Start([Context + User Message]) --> BuildPrompt[Build system prompt]

    BuildPrompt --> SystemPrompt[SYSTEM PROMPT:<br/>- Role definition<br/>- Database stats<br/>- Instructions<br/>- Data format guide]

    SystemPrompt --> AddCompanies[Add ALL companies JSON<br/>~8,000-12,000 tokens]

    AddCompanies --> AddHistory[Add conversation history<br/>Last 10 messages]

    AddHistory --> AddUserMsg[Add current user message]

    AddUserMsg --> FullPrompt[Complete prompt ready<br/>~10,000-15,000 tokens]

    FullPrompt --> UseGlobal{use_global_service?}

    UseGlobal -->|Yes - default| GeminiSvc[gemini_service.generate_text]
    UseGlobal -->|No - legacy| DirectAPI[model.generate_content]

    GeminiSvc --> AutoCost[Automatic cost tracking<br/>to ai_api_costs table]
    DirectAPI --> NoCost[No cost tracking]

    AutoCost --> APICall[Gemini API Call<br/>gemini-2.5-flash]
    NoCost --> APICall

    APICall --> Response[AI Response<br/>~200-400 tokens]

    Response --> Return[Return response text]

    style SystemPrompt fill:#4CAF50
    style GeminiSvc fill:#2196F3
    style AutoCost fill:#FF9800
```

### 5.2 System Prompt Structure

**File:** `nordabiz_chat.py` (lines 426-458)

The prompt is written in Polish (the application's language); it is reproduced verbatim below.

```
Jesteś pomocnym asystentem portalu Norda Biznes - katalogu firm
zrzeszonych w stowarzyszeniu Norda Biznes z Wejherowa.

📊 MASZ DOSTĘP DO PEŁNEJ BAZY DANYCH:
- Liczba firm: 80
- Kategorie: IT i Technologie (25), Budownictwo (18), Usługi (15), ...

🎯 TWOJA ROLA:
- Analizujesz CAŁĄ bazę firm i wybierasz najlepsze dopasowania do pytania
- Odpowiadasz zwięźle (2-3 zdania), chyba że użytkownik prosi o szczegóły
- Podajesz konkretne nazwy firm z kontaktem
- Możesz wyszukiwać po: nazwie, usługach, kompetencjach, właścicielach, mieście

📋 FORMAT DANYCH (skróty):
- name: nazwa firmy
- cat: kategoria
- desc: krótki opis
- history: historia firmy, właściciele, założyciele
- svc: usługi
- comp: kompetencje
- web/tel/mail: kontakt
- city: miasto
- cert: certyfikaty

⚠️ WAŻNE:
- ZAWSZE podawaj nazwę firmy i kontakt (tel/web/mail jeśli dostępne)
- Jeśli pytanie o osobę (np. "kto to Roszman") - szukaj w polu "history"
- Odpowiadaj PO POLSKU

🏢 PEŁNA BAZA FIRM (wybierz najlepsze):
[JSON array with all 80 companies in compact format]

# HISTORIA ROZMOWY:
Użytkownik: [previous message 1]
Ty: [previous response 1]
Użytkownik: [previous message 2]
Ty: [previous response 2]
...

Użytkownik: Kto robi strony www?
Ty:
```

**Prompt Engineering Principles:**

1. **Clear role definition:** "Jesteś pomocnym asystentem..." ("You are a helpful assistant...")
2. **Database context:** Total companies, category distribution
3. **Response guidelines:** Concise (2-3 sentences), specific contacts
4. **Data format guide:** Field name abbreviations explained
5. **Search capabilities:** What the AI can search by
6. **Important notes:** Always include contact details; search the `history` field for people
7. **Language:** Always respond in Polish
8. **Full context:** ALL companies provided (the AI does the filtering)
9. **Conversation history:** Last 10 messages for context continuity
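The assembly order above (system prompt, then companies JSON, then history, then the current message) can be sketched as follows. The function name and exact joining are illustrative; only the section labels come from the documented prompt:

```python
import json
from typing import List, Tuple

def build_prompt(system_prompt: str,
                 companies: List[dict],
                 history: List[Tuple[str, str]],
                 user_message: str) -> str:
    """Concatenate the prompt parts in the documented order.

    history is a list of (role, text) pairs, oldest first,
    where role is 'user' or 'assistant'.
    """
    parts = [system_prompt]
    parts.append("🏢 PEŁNA BAZA FIRM (wybierz najlepsze):")
    # Compact JSON, non-ASCII preserved, to keep token usage down.
    parts.append(json.dumps(companies, ensure_ascii=False, separators=(",", ":")))
    parts.append("# HISTORIA ROZMOWY:")
    for role, text in history:
        label = "Użytkownik" if role == "user" else "Ty"
        parts.append(f"{label}: {text}")
    parts.append(f"Użytkownik: {user_message}")
    parts.append("Ty:")
    return "\n".join(parts)
```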
---
## 6. Cost Tracking & Performance

### 6.1 Dual Cost Tracking System

The application uses **TWO levels** of cost tracking:

**Level 1: Global API Cost Tracking** (`ai_api_costs` table)

- Managed by `gemini_service.py`
- Tracks ALL Gemini API calls (chat, image analysis, etc.)
- Automatic via the `_log_api_cost()` method

**Level 2: Per-Message Chat Metrics** (`ai_chat_messages` table)

- Managed by `nordabiz_chat.py`
- Tracks tokens, cost, and latency per chat message
- User-facing metrics for transparency

### 6.2 Cost Tracking Flow

```mermaid
sequenceDiagram
    participant Engine as NordaBizChatEngine
    participant Gemini as GeminiService
    participant API as Gemini API
    participant GlobalDB as ai_api_costs
    participant ChatDB as ai_chat_messages

    Engine->>Gemini: generate_text(<br/> prompt, feature='ai_chat',<br/> user_id=123<br/>)

    Note over Gemini: Start timer
    Gemini->>API: POST /generateContent
    API->>Gemini: Response text
    Note over Gemini: Stop timer (latency_ms)

    Note over Gemini: Count tokens
    Gemini->>Gemini: input_tokens = count_tokens(prompt)
    Gemini->>Gemini: output_tokens = count_tokens(response)

    Note over Gemini: Calculate cost
    Gemini->>Gemini: input_cost = (input/1M) * $0.075
    Gemini->>Gemini: output_cost = (output/1M) * $0.30
    Gemini->>Gemini: total_cost = input_cost + output_cost

    Note over Gemini: Global cost tracking
    Gemini->>GlobalDB: INSERT INTO ai_api_costs<br/>(api_provider='gemini',<br/> model='gemini-2.5-flash',<br/> feature='ai_chat',<br/> user_id=123,<br/> tokens, cost, latency)

    Gemini->>Engine: Return response text

    Note over Engine: Per-message tracking
    Engine->>Engine: tokenizer.count_tokens(user_msg)
    Engine->>Engine: tokenizer.count_tokens(response)
    Engine->>Engine: Calculate cost again (for message record)

    Engine->>ChatDB: INSERT INTO ai_chat_messages<br/>(role='assistant',<br/> content, tokens_input,<br/> tokens_output, cost_usd,<br/> latency_ms)
```

### 6.3 Cost Calculation

**Gemini 2.5 Flash Pricing:**

- **Input:** $0.075 per 1M tokens
- **Output:** $0.30 per 1M tokens
- **Free Tier:** 1,500 requests/day (unlimited tokens)

**Typical Chat Message:**
```
Input:  10,000 tokens (system prompt + companies + history) = $0.00075
Output:    300 tokens (AI response)                         = $0.00009
Total:                                                      = $0.00084
```

**Daily Usage Estimate:**

- 100 chat messages/day
- Average 10,000 input + 300 output tokens
- Theoretical cost: $0.084/day ($2.52/month)
- **Actual cost: $0.00** (free tier covers all usage)
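The arithmetic above is a straight linear rate; a minimal sketch using the documented prices (the function name is illustrative):

```python
# Gemini 2.5 Flash pricing from the table above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.075
OUTPUT_PRICE_PER_M = 0.30

def gemini_flash_cost(input_tokens: int, output_tokens: int) -> float:
    """Theoretical cost in USD for one request at the documented rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost
```

For the typical message above, `gemini_flash_cost(10_000, 300)` reproduces the $0.00084 figure, and 100 such messages the $0.084/day estimate.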
### 6.4 Free Tier Monitoring

**Function:** `get_free_tier_usage()`
**File:** `app.py`

```python
from datetime import datetime

from sqlalchemy import func

def get_free_tier_usage():
    """Get free tier usage stats for today"""
    db = SessionLocal()
    try:
        today_start = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)

        stats = db.query(
            func.count(AIAPICostLog.id).label('requests'),
            func.sum(AIAPICostLog.total_tokens).label('tokens')
        ).filter(
            AIAPICostLog.timestamp >= today_start,
            AIAPICostLog.api_provider == 'gemini',
            AIAPICostLog.success == True
        ).first()

        return {
            'requests_today': stats.requests or 0,
            'tokens_today': stats.tokens or 0,
            'daily_limit': 1500,
            'remaining': max(0, 1500 - (stats.requests or 0))
        }
    finally:
        db.close()
```

**Response in `/api/chat/:id/message`:**
```json
{
  "tech_info": {
    "free_tier": {
      "is_free": true,
      "daily_limit": 1500,
      "requests_today": 47,
      "tokens_today": 423891,
      "remaining": 1453
    }
  }
}
```
---
## 7. Conversation History

### 7.1 Get History Flow

**Route:** `GET /api/chat/<conversation_id>/history`
**File:** `app.py` (lines 3606-3634)
**Authentication:** Required (`@login_required`)

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Flask as Flask App
    participant Engine as NordaBizChatEngine
    participant DB as ai_chat_messages

    User->>Browser: Load chat history
    Browser->>Flask: GET /api/chat/123/history

    Note over Flask: Verify ownership
    Flask->>DB: SELECT * FROM ai_chat_conversations<br/>WHERE id = 123 AND user_id = ?
    DB->>Flask: Conversation found

    Flask->>Engine: get_conversation_history(123)

    Engine->>DB: SELECT * FROM ai_chat_messages<br/>WHERE conversation_id = 123<br/>ORDER BY created_at ASC

    DB->>Engine: All messages in conversation

    Engine->>Engine: Format messages as dicts
    Engine->>Flask: Return messages array

    Flask->>Browser: JSON {<br/> success: true,<br/> messages: [...]<br/>}

    Browser->>User: Display conversation history
```

**Response Format:**
```json
{
  "success": true,
  "messages": [
    {
      "id": 789,
      "role": "user",
      "content": "Kto robi strony www?",
      "created_at": "2026-01-10T10:35:00.123456",
      "tokens_input": 0,
      "tokens_output": 0,
      "cost_usd": 0.0,
      "latency_ms": 0
    },
    {
      "id": 790,
      "role": "assistant",
      "content": "Znalazłem kilka firm zajmujących się stronami www...",
      "created_at": "2026-01-10T10:35:02.456789",
      "tokens_input": 8543,
      "tokens_output": 234,
      "cost_usd": 0.00128,
      "latency_ms": 342
    }
  ]
}
```
---
## 8. Database Schema

### 8.1 Conversation Tables

**ai_chat_conversations** (conversation metadata)
```sql
CREATE TABLE ai_chat_conversations (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    started_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP,
    conversation_type VARCHAR(50) DEFAULT 'general',
    title VARCHAR(500),
    is_active BOOLEAN DEFAULT TRUE,
    message_count INTEGER DEFAULT 0,
    model_name VARCHAR(100)
);

CREATE INDEX idx_chat_conv_user_id ON ai_chat_conversations(user_id);
CREATE INDEX idx_chat_conv_started_at ON ai_chat_conversations(started_at DESC);
```

**ai_chat_messages** (individual messages)
```sql
CREATE TABLE ai_chat_messages (
    id SERIAL PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES ai_chat_conversations(id) ON DELETE CASCADE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    role VARCHAR(20) NOT NULL, -- 'user' or 'assistant'
    content TEXT NOT NULL,
    tokens_input INTEGER,
    tokens_output INTEGER,
    cost_usd DECIMAL(10,6),
    latency_ms INTEGER,
    edited BOOLEAN DEFAULT FALSE,
    regenerated BOOLEAN DEFAULT FALSE
);

CREATE INDEX idx_chat_msg_conv_id ON ai_chat_messages(conversation_id);
CREATE INDEX idx_chat_msg_created_at ON ai_chat_messages(created_at);
```

**ai_api_costs** (global API cost tracking)
```sql
CREATE TABLE ai_api_costs (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
    api_provider VARCHAR(50) NOT NULL, -- 'gemini'
    model_name VARCHAR(100), -- 'gemini-2.5-flash'
    feature VARCHAR(100), -- 'ai_chat', 'image_analysis', etc.
    user_id INTEGER REFERENCES users(id),
    input_tokens INTEGER,
    output_tokens INTEGER,
    total_tokens INTEGER,
    input_cost DECIMAL(10,6),
    output_cost DECIMAL(10,6),
    total_cost DECIMAL(10,6),
    success BOOLEAN DEFAULT TRUE,
    error_message TEXT,
    latency_ms INTEGER,
    prompt_hash VARCHAR(64)
);

CREATE INDEX idx_api_costs_timestamp ON ai_api_costs(timestamp DESC);
CREATE INDEX idx_api_costs_provider ON ai_api_costs(api_provider);
CREATE INDEX idx_api_costs_feature ON ai_api_costs(feature);
CREATE INDEX idx_api_costs_user_id ON ai_api_costs(user_id);
```

### 8.2 Entity Relationships

```mermaid
erDiagram
    users ||--o{ ai_chat_conversations : "has many"
    ai_chat_conversations ||--o{ ai_chat_messages : "contains"
    users ||--o{ ai_api_costs : "generates"

    users {
        int id PK
        varchar email
        varchar name
        boolean is_admin
    }

    ai_chat_conversations {
        int id PK
        int user_id FK
        timestamp started_at
        varchar conversation_type
        varchar title
        boolean is_active
        int message_count
        varchar model_name
    }

    ai_chat_messages {
        int id PK
        int conversation_id FK
        timestamp created_at
        varchar role
        text content
        int tokens_input
        int tokens_output
        decimal cost_usd
        int latency_ms
    }

    ai_api_costs {
        int id PK
        timestamp timestamp
        varchar api_provider
        varchar model_name
        varchar feature
        int user_id FK
        int total_tokens
        decimal total_cost
        int latency_ms
    }
```
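The two chat tables map onto the SQLAlchemy models the application code queries (`AIChatConversation`, `AIChatMessage`). A hedged sketch of those mappings, exercised against an in-memory SQLite database — the column set follows the DDL above, but the real models in the codebase may differ:

```python
from datetime import datetime

from sqlalchemy import (Boolean, Column, DateTime, ForeignKey, Integer,
                        Numeric, String, Text, create_engine)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class AIChatConversation(Base):
    __tablename__ = 'ai_chat_conversations'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)  # FK to users.id in production
    started_at = Column(DateTime, nullable=False, default=datetime.utcnow)
    conversation_type = Column(String(50), default='general')
    title = Column(String(500))
    is_active = Column(Boolean, default=True)
    message_count = Column(Integer, default=0)
    model_name = Column(String(100))

class AIChatMessage(Base):
    __tablename__ = 'ai_chat_messages'
    id = Column(Integer, primary_key=True)
    conversation_id = Column(Integer,
                             ForeignKey('ai_chat_conversations.id'),
                             nullable=False)
    created_at = Column(DateTime, nullable=False, default=datetime.utcnow)
    role = Column(String(20), nullable=False)  # 'user' or 'assistant'
    content = Column(Text, nullable=False)
    tokens_input = Column(Integer)
    tokens_output = Column(Integer)
    cost_usd = Column(Numeric(10, 6))
    latency_ms = Column(Integer)

# Exercise the schema against throwaway SQLite.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    conv = AIChatConversation(user_id=1, title='Rozmowa - test',
                              model_name='gemini-2.5-flash')
    session.add(conv)
    session.flush()  # assigns conv.id
    session.add(AIChatMessage(conversation_id=conv.id, role='user',
                              content='Kto robi strony www?'))
    session.commit()
    saved = session.query(AIChatMessage).filter_by(conversation_id=conv.id).all()
```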
---
## 9. Error Handling

### 9.1 Common Error Scenarios

**1. Conversation Not Found**
```python
# app.py
conversation = db.query(AIChatConversation).filter_by(
    id=conversation_id,
    user_id=current_user.id
).first()

if not conversation:
    return jsonify({
        'success': False,
        'error': 'Conversation not found'
    }), 404
```

**2. Empty Message**
```python
message = data.get('message', '').strip()

if not message:
    return jsonify({
        'success': False,
        'error': 'Wiadomość nie może być pusta'  # "Message cannot be empty"
    }), 400
```

**3. Gemini API Error**
```python
# gemini_service.py
try:
    response = self.model.generate_content(prompt)

    # Check safety filters
    if not response.candidates:
        raise Exception("Response blocked by safety filters")

    # Check finish reason
    candidate = response.candidates[0]
    if candidate.finish_reason not in [1, 0]:  # STOP or UNSPECIFIED
        raise Exception(f"Response incomplete: {candidate.finish_reason}")

except Exception as e:
    logger.error(f"Gemini API error: {e}")

    # Log failed request to database
    self._log_api_cost(
        prompt=prompt,
        response_text='',
        input_tokens=self.count_tokens(prompt),
        output_tokens=0,
        success=False,
        error_message=str(e)
    )

    raise Exception(f"Gemini API call failed: {str(e)}")
```

**4. Database Connection Error**
```python
# nordabiz_chat.py
db = SessionLocal()
try:
    # Database operations
    conversation = db.query(AIChatConversation).filter_by(id=conversation_id).first()
    # ...
finally:
    db.close()  # Always close the connection
```

### 9.2 Error Response Format

```json
{
  "success": false,
  "error": "Conversation not found"
}
```

**HTTP Status Codes:**

- `400` - Bad Request (empty message, invalid input)
- `404` - Not Found (conversation doesn't exist)
- `500` - Internal Server Error (Gemini API failure, database error)
---
## 10. Search Integration

### 10.1 Search Service Integration

**Method:** `_find_relevant_companies(db, message)`
**File:** `nordabiz_chat.py` (lines 383-404)
**Status:** DEPRECATED (kept for reference, not used in production)

**Historical Context:**

The chat engine originally used SearchService to **pre-filter** companies before sending them to the AI:

```python
# OLD APPROACH (deprecated):
def _find_relevant_companies(self, db, message):
    """Find companies relevant to user's message"""
    results = search_companies(db, message, limit=10)
    return [result.company for result in results]

# In _build_conversation_context:
relevant_companies = self._find_relevant_companies(db, current_message)
context['companies'] = [self._company_to_compact_dict(c) for c in relevant_companies]
```

**Current Approach:**

Send **ALL companies** to the AI and let it do the filtering:

```python
# NEW APPROACH (current production):
def _build_conversation_context(self, db, conversation, current_message):
    """Build context with ALL companies (not pre-filtered)"""
    all_companies = db.query(Company).filter_by(status='active').all()

    context['all_companies'] = [
        self._company_to_compact_dict(c)
        for c in all_companies
    ]
    return context
```

**Why the Change?**

| Aspect | Old (Pre-filtered) | New (Full Context) |
|--------|-------------------|-------------------|
| **Companies sent** | 8-10 (search filtered) | 80 (all active) |
| **Token usage** | ~1,500 tokens | ~10,000 tokens |
| **Search quality** | Keyword-based, limited | AI-powered, intelligent |
| **Multi-criteria** | Difficult | Excellent |
| **Owner searches** | Impossible | Works perfectly |
| **Cost** | $0.0001/msg | $0.0008/msg |
| **User experience** | Sometimes misses results | Always comprehensive |

**Example:**

- User: "Kto to Roszman?" (Who is Roszman?)
- Old approach: search for "roszman" in services/competencies → 0 results ❌
- New approach: the AI searches the `founding_history` field → finds the company owner ✅
---
## 11. Performance & Optimization
|
|
|
|
### 11.1 Performance Metrics
|
|
|
|
**Typical Chat Message:**
|
|
- **Latency:** 200-400ms
|
|
- **Input tokens:** 8,000-15,000 (system prompt + 80 companies + history)
|
|
- **Output tokens:** 200-500 (AI response)
|
|
- **Total tokens:** 8,500-15,500
|
|
- **Theoretical cost:** $0.0008-0.0015
|
|
- **Actual cost:** $0.00 (free tier)
|
|
|
|
**Database Queries:**
|
|
- Conversation lookup: ~5ms (indexed on user_id, id)
|
|
- All companies query: ~50ms (80 rows, no complex joins)
|
|
- Last 10 messages: ~10ms (indexed on conversation_id, created_at)
|
|
- **Total DB time:** ~65ms
|
|
|
|
**Gemini API:**
|
|
- Network latency: ~100-200ms
|
|
- Processing time: ~100-200ms
|
|
- **Total API time:** ~250-350ms
|
|
|
|
### 11.2 Token Optimization Strategies

**1. Compact Field Names**
```python
# GOOD (saves ~40% tokens):
{"name": "PIXLAB", "svc": ["WWW", "SEO"], "comp": ["WordPress"]}

# BAD (wasteful):
{"company_name": "PIXLAB", "services": ["WWW", "SEO"], "competencies": ["WordPress"]}
```

**2. Omit Empty Fields**
```python
# GOOD (only adds the field if data exists):
compact = {"name": c.name}
if c.description_short:
    compact['desc'] = c.description_short

# BAD (wastes tokens on ""):
compact = {
    "name": c.name,
    "desc": c.description_short or "",
}
```

**3. Limit Arrays**
```python
# GOOD (top 3 certifications):
if c.certifications:
    compact['cert'] = [cert.name for cert in c.certifications[:3]]

# BAD (all certifications, which may be 10+):
compact['cert'] = [cert.name for cert in c.certifications]
```

**4. Compact JSON (no whitespace)**
```python
# GOOD (compact separators drop the spaces after ',' and ':'):
json.dumps(data, ensure_ascii=False, separators=(',', ':'))
# {"name":"PIXLAB","svc":["WWW"]}

# BAD:
json.dumps(data, ensure_ascii=False, indent=2)
# {
#   "name": "PIXLAB",
#   "svc": ["WWW"]
# }
```

**Token Savings:**
- Single company: 200 tokens → 100 tokens (50% reduction)
- 80 companies: 16,000 tokens → 8,000 tokens (50% reduction)
- Cost savings: $0.0016 → $0.0008 per message (50% reduction)
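
Taken together, the four strategies can be sketched as one helper. This is an illustrative sketch, not the actual `_company_to_compact_dict` from `nordabiz_chat.py` — it works on plain dicts instead of ORM objects, and the key names (`desc`, `svc`, `comp`, `cert`) simply follow the examples above:

```python
import json

def company_to_compact_dict(c: dict) -> dict:
    """Token-optimized company record: short keys, no empty fields, capped lists.

    Illustrative sketch -- the production _company_to_compact_dict works on
    ORM objects and may include more fields.
    """
    compact = {"name": c["name"]}                   # always present
    if c.get("description_short"):
        compact["desc"] = c["description_short"]    # omit empty fields
    if c.get("services"):
        compact["svc"] = c["services"]
    if c.get("competencies"):
        compact["comp"] = c["competencies"]
    if c.get("certifications"):
        compact["cert"] = c["certifications"][:3]   # top 3 only
    return compact

def serialize_context(companies: list[dict]) -> str:
    """Compact JSON: no indentation, no spaces after separators."""
    return json.dumps(
        [company_to_compact_dict(c) for c in companies],
        ensure_ascii=False,
        separators=(",", ":"),
    )
```

For example, `serialize_context([{"name": "PIXLAB", "services": ["WWW", "SEO"]}])` yields `'[{"name":"PIXLAB","svc":["WWW","SEO"]}]'`.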

### 11.3 Caching Opportunities (Future)

**Not currently implemented** — all companies are loaded from the database on every message.

**Potential Optimizations:**
1. **Company data caching** (Redis)
   - Cache the all-companies JSON for 5 minutes
   - Invalidate on company data changes
   - Reduces DB query time: 50ms → 5ms
2. **Prompt template caching**
   - Cache the system prompt template
   - Only rebuild when companies change
3. **Conversation context caching**
   - Cache the last 10 messages per conversation
   - Invalidate on new message
   - Reduces DB query time: 10ms → 1ms

**Why Not Implemented Yet:**
- Current performance is acceptable (250-350ms total)
- Database load is low at this scale, and DB queries are not rate-limited
- Premature optimization (80 companies is a small dataset)
- Complexity vs. benefit tradeoff

---

## 12. Security & Access Control

### 12.1 Authentication & Authorization

**All chat routes require authentication:**
```python
@app.route('/chat')
@login_required
def chat():
    """AI Chat interface"""
    return render_template('chat.html')

@app.route('/api/chat/start', methods=['POST'])
@login_required
def chat_start():
    # Only logged-in users can start conversations
    ...

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
@login_required
def chat_send_message(conversation_id):
    # Verify conversation ownership
    conversation = db.query(AIChatConversation).filter_by(
        id=conversation_id,
        user_id=current_user.id  # IMPORTANT: ownership check
    ).first()

    if not conversation:
        return jsonify({'error': 'Conversation not found'}), 404
    ...
```

### 12.2 Input Sanitization

**User message sanitization:**
```python
# app.py
message = data.get('message', '').strip()

# No HTML/JavaScript injection possible:
# - the Gemini API treats all input as plain text
# - the database stores messages as TEXT (no code execution)
```

**No SQL Injection:**
```python
# Safe (parameterized query):
conversation = db.query(AIChatConversation).filter_by(
    id=conversation_id,
    user_id=current_user.id
).first()

# SQLAlchemy emits parameterized SQL, which prevents injection
```
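
Defense in depth still matters when stored messages are rendered back in the browser. Jinja2 autoescaping covers normal `render_template` output; the sketch below shows the equivalent manual step with the stdlib `html.escape`, for any hypothetical code path that assembles HTML by hand:

```python
from html import escape

def safe_render_message(content: str) -> str:
    """Escape user-supplied chat text before embedding it in HTML.

    Jinja2 autoescaping already does this in templates; this manual variant
    is only needed where HTML is built directly (illustrative sketch).
    """
    return escape(content, quote=True)
```

For example, `safe_render_message('<script>alert(1)</script>')` returns `'&lt;script&gt;alert(1)&lt;/script&gt;'`.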

### 12.3 Rate Limiting

**Gemini API Free Tier Limits:**
- 1,500 requests/day
- No per-minute limit
- No token limit

**Application-Level Limits:**
- No rate limiting on chat endpoints yet
- Users must be logged in (reduces abuse)
- Flask-Limiter can be added if needed

**Future Rate Limiting:**
```python
from flask_limiter import Limiter

# flask-limiter >= 2.0 takes key_func as the first argument
limiter = Limiter(key_func=lambda: str(current_user.id), app=app)

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
@login_required
@limiter.limit("60 per hour")  # 60 messages per hour per user
def chat_send_message(conversation_id):
    ...
```

---

## 13. Monitoring & Debugging

### 13.1 Cost Tracking Queries

**Daily API usage:**
```sql
SELECT
    DATE(timestamp) AS date,
    COUNT(*) AS requests,
    SUM(total_tokens) AS tokens,
    SUM(total_cost) AS cost_usd
FROM ai_api_costs
WHERE api_provider = 'gemini'
  AND feature = 'ai_chat'
GROUP BY DATE(timestamp)
ORDER BY date DESC;
```

**Top users by API usage:**
```sql
SELECT
    u.name,
    u.email,
    COUNT(*) AS chat_messages,
    SUM(c.total_tokens) AS total_tokens,
    SUM(c.total_cost) AS total_cost_usd
FROM ai_api_costs c
JOIN users u ON c.user_id = u.id
WHERE c.api_provider = 'gemini'
  AND c.feature = 'ai_chat'
GROUP BY u.id, u.name, u.email
ORDER BY total_cost_usd DESC
LIMIT 10;
```

**Free tier usage today:**
```sql
SELECT
    COUNT(*) AS requests_today,
    SUM(total_tokens) AS tokens_today,
    1500 - COUNT(*) AS remaining_requests
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
  AND api_provider = 'gemini'
  AND success = TRUE;
```
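
The same arithmetic as the query above, wrapped in a small helper that a dashboard or cron check could call with the query result. The 80% warning threshold is illustrative, not part of the current codebase:

```python
FREE_TIER_DAILY_LIMIT = 1500  # Gemini free tier: requests per day

def free_tier_status(requests_today: int,
                     limit: int = FREE_TIER_DAILY_LIMIT) -> dict:
    """Summarize free-tier consumption from today's request count."""
    remaining = max(limit - requests_today, 0)
    used_pct = round(requests_today / limit * 100, 1)
    return {
        "requests_today": requests_today,
        "remaining": remaining,
        "used_pct": used_pct,
        "warning": used_pct >= 80.0,  # illustrative alert threshold
    }
```

For example, `free_tier_status(1200)` reports 300 remaining requests and raises the warning flag at exactly 80% usage.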

### 13.2 Chat Analytics

**Most active conversations:**
```sql
SELECT
    c.id,
    c.title,
    u.name AS user_name,
    c.message_count,
    c.started_at,
    c.updated_at
FROM ai_chat_conversations c
JOIN users u ON c.user_id = u.id
WHERE c.is_active = TRUE
ORDER BY c.message_count DESC
LIMIT 20;
```

**Average response metrics:**
```sql
SELECT
    AVG(tokens_input) AS avg_input_tokens,
    AVG(tokens_output) AS avg_output_tokens,
    AVG(latency_ms) AS avg_latency_ms,
    AVG(cost_usd) AS avg_cost_usd
FROM ai_chat_messages
WHERE role = 'assistant'
  AND created_at > NOW() - INTERVAL '7 days';
```

### 13.3 Error Monitoring

**Failed API requests:**
```sql
SELECT
    timestamp,
    model_name,
    feature,
    error_message,
    latency_ms
FROM ai_api_costs
WHERE success = FALSE
  AND api_provider = 'gemini'
ORDER BY timestamp DESC
LIMIT 20;
```

**Conversations with errors:**
```sql
-- Conversations whose last message is from the user (AI didn't respond)
SELECT
    c.id,
    c.title,
    c.message_count,
    c.updated_at,
    (SELECT content FROM ai_chat_messages
     WHERE conversation_id = c.id
     ORDER BY created_at DESC LIMIT 1) AS last_message
FROM ai_chat_conversations c
WHERE c.message_count % 2 = 1  -- odd count: a user message without a response
  AND c.updated_at > NOW() - INTERVAL '1 hour'
ORDER BY c.updated_at DESC;
```

---

## 14. Future Enhancements

### 14.1 Planned Features

**1. Conversation Context Memory**
- Remember user preferences across sessions
- "Remember that I'm looking for IT services"
- Personalized recommendations

**2. Conversation Sharing**
- Share a conversation URL with other users
- Public vs. private conversations
- Embed a chat widget on company profiles

**3. Voice Input/Output**
- Web Speech API for voice input
- Text-to-speech for AI responses
- Hands-free interaction

**4. Multi-Modal Input**
- Upload images (company logo, product photos)
- Gemini Vision API for image analysis
- "Find companies similar to this logo"

**5. Conversation Search**
- Full-text search across all user conversations
- Filter by date, company mentioned, topic
- Export conversation history

**6. Advanced Analytics**
- Which companies are most recommended by the AI?
- What services are users asking about most?
- Conversation funnel (browse → chat → contact)

### 14.2 Optimization Opportunities

**1. Redis Caching**
```python
# Cache the all-companies JSON
redis_key = f"companies:all:{version_hash}"
cached = redis.get(redis_key)

if cached:
    all_companies = json.loads(cached)
else:
    all_companies = load_from_db()
    redis.setex(redis_key, 300, json.dumps(all_companies))  # 5 min TTL
```

**2. Prompt Compression**
- Use Gemini's context caching feature (when available)
- Cache the system prompt + company database
- Only send the new user message (saves ~90% of input tokens)

**3. Streaming Responses**
```python
@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
def chat_send_message(conversation_id):
    # Enable streaming
    response = gemini_service.generate_text(
        prompt=full_prompt,
        stream=True  # Return a generator
    )

    # Server-Sent Events (SSE)
    def generate():
        for chunk in response:
            yield f"data: {json.dumps({'text': chunk.text})}\n\n"

    return Response(generate(), mimetype='text/event-stream')
```

**4. Conversation Summarization**
- Auto-summarize conversations > 20 messages
- Include the summary instead of the full history
- Reduces token usage by ~50%

---

## 15. Troubleshooting Guide

### 15.1 Common Issues

**Issue: "Conversation not found" error**
```
Cause: User trying to access someone else's conversation
Fix: Verify the conversation_id belongs to current_user.id

SQL Debug:
SELECT id, user_id FROM ai_chat_conversations WHERE id = 123;
```

**Issue: Empty AI responses**
```
Cause: Gemini safety filters blocking the response
Fix: Check ai_api_costs for the error_message

SQL Debug:
SELECT error_message, prompt_hash FROM ai_api_costs
WHERE success = FALSE ORDER BY timestamp DESC LIMIT 10;
```

**Issue: Slow response times (> 1 second)**
```
Cause: Large context (many companies, long history)
Fix: Check token counts, consider summarization

SQL Debug:
SELECT tokens_input, tokens_output, latency_ms
FROM ai_chat_messages
WHERE latency_ms > 1000
ORDER BY created_at DESC LIMIT 20;
```

**Issue: "Free tier limit exceeded"**
```
Cause: > 1,500 requests in 24 hours
Fix: Wait for the quota reset (midnight Pacific Time)

SQL Debug:
SELECT COUNT(*) FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE AND api_provider = 'gemini';
```

### 15.2 Diagnostic Commands

**Check Gemini API connectivity:**
```bash
python3 -c "
from gemini_service import GeminiService
svc = GeminiService()
response = svc.generate_text('Hello', feature='test')
print(response)
"
```

**Verify database connection:**
```bash
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) AS conversations FROM ai_chat_conversations;
SELECT COUNT(*) AS messages FROM ai_chat_messages;
SELECT COUNT(*) AS api_calls FROM ai_api_costs WHERE api_provider = 'gemini';
"
```

**Test chat flow:**
```python
from nordabiz_chat import NordaBizChatEngine

engine = NordaBizChatEngine()
conv = engine.start_conversation(user_id=1, title="Test")
response = engine.send_message(conv.id, "Test message", user_id=1)
print(f"Response: {response.content}")
```

---

## 16. Related Documentation

- **[Search Flow](./02-search-flow.md)** - Company search integration
- **[Authentication Flow](./01-authentication-flow.md)** - User authentication
- **[Flask Components](../04-flask-components.md)** - Application architecture
- **[External Integrations](../06-external-integrations.md)** - Gemini API details
- **[Database Schema](../05-database-schema.md)** - Database structure

---

## 17. Glossary

| Term | Definition |
|------|------------|
| **NordaBizChatEngine** | Main chat engine class in `nordabiz_chat.py` |
| **GeminiService** | Centralized Gemini API wrapper in `gemini_service.py` |
| **Conversation** | Chat session with multiple messages |
| **Context** | Full company database + history sent to the AI |
| **Compact Format** | Token-optimized company data format |
| **Free Tier** | Google Gemini free tier (1,500 req/day) |
| **Token** | Unit of text (~4 characters) for AI models |
| **Latency** | Response time in milliseconds |
| **Cost Tracking** | Dual-level system (global + per-message) |
| **System Prompt** | Instructions sent to the AI with each query |
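
The glossary's ~4-characters-per-token rule of thumb, combined with the pricing quoted in this document ($0.075/$0.30 per 1M input/output tokens), gives a quick back-of-envelope estimator. Real tokenizer counts will differ somewhat:

```python
# Gemini 2.5 Flash list prices (USD per 1M tokens), per the pricing
# quoted in this document
PRICE_INPUT_PER_M = 0.075
PRICE_OUTPUT_PER_M = 0.30

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def theoretical_cost(tokens_input: int, tokens_output: int) -> float:
    """Theoretical cost in USD (actual cost is $0.00 on the free tier)."""
    return (tokens_input / 1_000_000) * PRICE_INPUT_PER_M \
         + (tokens_output / 1_000_000) * PRICE_OUTPUT_PER_M
```

A typical message (10,000 input + 400 output tokens) comes out around $0.00087, consistent with the theoretical per-message range in section 11.1.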

---

## 18. Maintenance

**When to Update This Document:**
- ✅ Gemini model version change (e.g., 2.5 → 3.0)
- ✅ Pricing changes
- ✅ New chat features (voice, images, etc.)
- ✅ Context-building algorithm changes
- ✅ Database schema changes
- ✅ Performance optimization implementations

**Document Owner:** Development Team
**Review Frequency:** Quarterly or after major changes
**Last Review:** 2026-01-10

---

**END OF DOCUMENT**