# NordaGPT Identity, Memory & Performance — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Transform NordaGPT from an anonymous chatbot into a personalized assistant with user identity, persistent memory, smart routing, and streaming responses.

**Architecture:** Four-phase rollout: (1) inject user identity into the AI prompt, (2) smart router + selective context loading, (3) streaming SSE responses, (4) persistent user memory with async extraction. Each phase is independently deployable and testable.

**Tech Stack:** Flask 3.0, SQLAlchemy 2.0, PostgreSQL, Google Gemini API (3-Flash, 3.1-Flash-Lite), Server-Sent Events, Jinja2 inline JS.

**Spec:** `docs/superpowers/specs/2026-03-28-nordagpt-identity-memory-design.md`

---

## File Structure

### New files

| File | Responsibility |
|------|---------------|
| `smart_router.py` | Classifies query complexity, selects data categories and model |
| `memory_service.py` | CRUD for user memory facts + conversation summaries, extraction prompt |
| `context_builder.py` | Loads selective data from DB based on router decision |
| `database/migrations/092_ai_user_memory.sql` | Memory + summary tables |
| `database/migrations/093_ai_conversation_summary.sql` | Summary table |

### Modified files

| File | Changes |
|------|---------|
| `database.py` | Add AIUserMemory, AIConversationSummary models (before line 5954) |
| `nordabiz_chat.py` | Accept user_context, integrate router, selective context, memory injection |
| `gemini_service.py` | Token counting for streamed responses |
| `blueprints/chat/routes.py` | Build user_context, add streaming endpoint, memory CRUD routes |
| `templates/chat.html` | Streaming UI, thinking animation, memory settings panel |

---

## Phase 1: User Identity (Tasks 1-3)

### Task 1: Pass user context from route to chat engine

**Files:**
- Modify: `blueprints/chat/routes.py:234-309`
- Modify: `nordabiz_chat.py:163-180`

- [ ] **Step 1: Build user_context dict in chat route**

In `blueprints/chat/routes.py`, modify `chat_send_message()`. After line 262 (where `current_user.id` and `current_user.email` are used for the limit check), add the user_context construction:

```python
# After line 262, before line 268
# Build user context for AI personalization
user_context = {
    'user_id': current_user.id,
    'user_name': current_user.name,
    'user_email': current_user.email,
    'company_name': current_user.company.name if current_user.company else None,
    'company_id': current_user.company.id if current_user.company else None,
    'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
    'company_role': current_user.company_role or 'MEMBER',
    'is_norda_member': current_user.is_norda_member,
    'chamber_role': current_user.chamber_role,
    'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
}
```

- [ ] **Step 2: Pass user_context to send_message()**

In the same function, modify the `chat_engine.send_message()` call (around line 282):

```python
# Before:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level
)

# After:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level,
    user_context=user_context
)
```

- [ ] **Step 3: Update send_message() signature in nordabiz_chat.py**

In `nordabiz_chat.py`, modify `send_message()` at line 163:

```python
# Before:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high'
) -> AIChatMessage:

# After:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> AIChatMessage:
```

Add `from typing import Optional, Dict, Any` to the imports if not already present.

- [ ] **Step 4: Thread user_context through to _query_ai()**

In `send_message()`, find the call to `_query_ai()` (around line 239) and add user_context:

```python
# Before:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level)

# After:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)
```

- [ ] **Step 5: Update _query_ai() signature**

In `nordabiz_chat.py`, modify `_query_ai()` at line 890:

```python
# Before:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high'
) -> str:

# After:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> str:
```

- [ ] **Step 6: Commit**

```bash
git add blueprints/chat/routes.py nordabiz_chat.py
git commit -m "refactor(chat): thread user_context from route through to _query_ai"
```

---

### Task 2: Inject user identity into system prompt

**Files:**
- Modify: `nordabiz_chat.py:920-930`

- [ ] **Step 1: Add user identity block to system prompt**

In `nordabiz_chat.py`, inside `_query_ai()`, find line ~922 where `system_prompt` starts.
Insert the user identity block BEFORE the main system prompt string (after line 921, before line 922):

```python
# Build user identity section
user_identity = ""
if user_context:
    user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}

ZASADY PERSONALIZACJI:
- Zwracaj się do użytkownika po imieniu (pierwsze słowo z imienia i nazwiska)
- W pierwszej wiadomości konwersacji przywitaj się: "Cześć [imię], w czym mogę pomóc?"
- Na pytania "co wiesz o mnie?" / "kim jestem?" — wypisz powyższe dane + powiązania firmowe z bazy
- Uwzględniaj kontekst firmy użytkownika w odpowiedziach (np. sugeruj partnerów z komplementarnych branż)
- NIE ujawniaj danych technicznych (user_id, company_id, rola systemowa)
"""
```

- [ ] **Step 2: Prepend user_identity to system_prompt**

Find where `system_prompt` is first assigned (line 922) and prepend:

```python
# Line 922 area — the system_prompt f-string starts here
system_prompt = user_identity + f"""Jesteś pomocnym asystentem portalu Norda Biznes...
```

This is a minimal change — just concatenate `user_identity` (which is an empty string if there is no context) before the existing prompt.

- [ ] **Step 3: Verify syntax compiles**

```bash
python3 -m py_compile nordabiz_chat.py && echo "OK"
```

- [ ] **Step 4: Test locally**

Start the local dev server and send a chat message. Verify in the logs that the prompt now contains the user identity block. Check that the AI greets by name.
```bash
python3 app.py

# In another terminal:
curl -X POST http://localhost:5000/api/chat/1/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Kim jestem?"}'
```

(Note: requires an auth cookie — easier to test via the browser.)

- [ ] **Step 5: Commit**

```bash
git add nordabiz_chat.py
git commit -m "feat(nordagpt): inject user identity into AI system prompt — personalized greetings and context"
```

---

### Task 3: Deploy Phase 1 and verify

**Files:** None (deployment only)

- [ ] **Step 1: Push to remotes**

```bash
git push origin master && git push inpi master
```

- [ ] **Step 2: Deploy to staging**

```bash
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```

- [ ] **Step 3: Test on staging — verify AI greets by name**

Open https://staging.nordabiznes.pl/chat, start a new conversation, and type "Cześć". Verify the AI responds with your name. Type "Co wiesz o mnie?" — verify the AI lists your profile data.

- [ ] **Step 4: Deploy to production**

```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```

- [ ] **Step 5: Commit deployment notes (update release_notes in routes.py)**

Add a new release entry in the `_get_releases()` function in `blueprints/public/routes.py`.

---

## Phase 2: Smart Router + Context Builder (Tasks 4-7)

### Task 4: Create context_builder.py — selective data loading

**Files:**
- Create: `context_builder.py`

- [ ] **Step 1: Create context_builder.py with selective loading functions**

```python
"""
Context Builder for NordaGPT Smart Router
==========================================
Loads only the data categories requested by the Smart Router,
instead of loading everything for every query.
"""
import json
import logging
from typing import Dict, Any, List, Optional
from datetime import datetime, timedelta

from database import (
    SessionLocal, Company, Category, CompanyRecommendation, NordaEvent,
    Classified, ForumTopic, ForumReply, CompanyPerson, Person, User,
    CompanySocialMedia, GBPAudit, CompanyWebsiteAnalysis, ZOPKNews,
    UserCompanyPermissions
)
from sqlalchemy import func, desc

logger = logging.getLogger(__name__)


def _company_to_compact_dict(company) -> Dict:
    """Convert company to compact dict for AI context. Mirrors nordabiz_chat.py format."""
    return {
        'name': company.name,
        'cat': company.category.name if company.category else None,
        'profile': f'/firma/{company.slug}',
        'desc': company.description_short,
        'about': company.description_full[:500] if company.description_full else None,
        'svc': company.services,
        'comp': company.competencies,
        'web': company.website,
        'tel': company.phone,
        'mail': company.email,
        'city': company.city,
    }


def build_selective_context(
    data_needed: List[str],
    conversation_id: int,
    current_message: str,
    user_context: Optional[Dict] = None
) -> Dict[str, Any]:
    """
    Build AI context with only the requested data categories.
    Args:
        data_needed: List of category strings from Smart Router, e.g.:
            ["companies_all", "companies_filtered:IT", "companies_single:termo",
             "events", "news", "classifieds", "forum", "company_people",
             "registered_users", "social_media", "audits"]
        conversation_id: Current conversation ID for history
        current_message: User's message text
        user_context: User identity dict

    Returns:
        Context dict compatible with nordabiz_chat.py _query_ai()
    """
    db = SessionLocal()
    context = {}
    try:
        # Always load: basic stats and conversation history
        active_companies = db.query(Company).filter_by(status='active').all()
        context['total_companies'] = len(active_companies)

        categories = db.query(Category).all()
        context['categories'] = [
            {'name': c.name, 'slug': c.slug,
             'company_count': len([co for co in active_companies if co.category_id == c.id])}
            for c in categories
        ]

        # Conversation history (always loaded)
        from database import AIChatMessage, AIChatConversation
        messages = db.query(AIChatMessage).filter_by(
            conversation_id=conversation_id
        ).order_by(AIChatMessage.created_at.desc()).limit(10).all()
        context['recent_messages'] = [
            {'role': msg.role, 'content': msg.content}
            for msg in reversed(messages)
        ]

        # Selective data loading based on router decision
        for category in data_needed:
            if category == 'companies_all':
                context['all_companies'] = [_company_to_compact_dict(c) for c in active_companies]

            elif category.startswith('companies_filtered:'):
                filter_cat = category.split(':', 1)[1]
                filtered = [c for c in active_companies
                            if c.category and c.category.name.lower() == filter_cat.lower()]
                context['all_companies'] = [_company_to_compact_dict(c) for c in filtered]

            elif category.startswith('companies_single:'):
                search = category.split(':', 1)[1].lower()
                matched = [c for c in active_companies
                           if search in c.name.lower() or search in (c.slug or '')]
                context['all_companies'] = [_company_to_compact_dict(c) for c in matched[:5]]

            elif category == 'events':
                events = db.query(NordaEvent).filter(
                    NordaEvent.event_date >= datetime.now(),
                    NordaEvent.event_date <= datetime.now() + timedelta(days=60)
                ).order_by(NordaEvent.event_date).all()
                context['upcoming_events'] = [
                    {'title': e.title, 'date': str(e.event_date), 'type': e.event_type,
                     'location': e.location, 'url': f'/kalendarz/{e.id}'}
                    for e in events
                ]

            elif category == 'news':
                news = db.query(ZOPKNews).filter(
                    ZOPKNews.published_at >= datetime.now() - timedelta(days=30),
                    ZOPKNews.status == 'approved'
                ).order_by(ZOPKNews.published_at.desc()).limit(10).all()
                context['recent_news'] = [
                    {'title': n.title, 'summary': n.ai_summary, 'date': str(n.published_at),
                     'source': n.source_name, 'url': n.source_url}
                    for n in news
                ]

            elif category == 'classifieds':
                classifieds = db.query(Classified).filter(
                    Classified.status == 'active',
                    Classified.is_test == False
                ).order_by(Classified.created_at.desc()).limit(20).all()
                context['classifieds'] = [
                    {'type': c.listing_type, 'title': c.title, 'description': c.description,
                     'company': c.company.name if c.company else None,
                     'budget': c.budget_text, 'url': f'/b2b/{c.id}'}
                    for c in classifieds
                ]

            elif category == 'forum':
                topics = db.query(ForumTopic).filter(
                    ForumTopic.is_test == False
                ).order_by(ForumTopic.created_at.desc()).limit(15).all()
                context['forum_topics'] = [
                    {'title': t.title, 'content': t.content[:300],
                     'author': t.author.name if t.author else None,
                     'replies': t.reply_count, 'url': f'/forum/{t.slug}'}
                    for t in topics
                ]

            elif category == 'company_people':
                people_query = db.query(CompanyPerson).join(Person).join(Company).filter(
                    Company.status == 'active'
                ).all()
                grouped = {}
                for cp in people_query:
                    cname = cp.company.name
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'name': cp.person.name,
                        'role': cp.role_description,
                        'shares': cp.shares_value
                    })
                context['company_people'] = grouped

            elif category == 'registered_users':
                users = db.query(User).filter(
                    User.is_active == True,
                    User.company_id.isnot(None)
                ).all()
                grouped = {}
                for u in users:
                    cname = u.company.name if u.company else 'Brak firmy'
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'name': u.name, 'email': u.email,
                        'role': u.company_role, 'member': u.is_norda_member
                    })
                context['registered_users'] = grouped

            elif category == 'social_media':
                socials = db.query(CompanySocialMedia).filter_by(is_valid=True).all()
                grouped = {}
                for s in socials:
                    cname = s.company.name if s.company else 'Unknown'
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'platform': s.platform, 'url': s.url,
                        'followers': s.followers_count
                    })
                context['company_social_media'] = grouped

            elif category == 'audits':
                # GBP audits
                gbp = db.query(GBPAudit).order_by(GBPAudit.created_at.desc()).all()
                seen = set()
                gbp_unique = []
                for g in gbp:
                    if g.company_id not in seen:
                        seen.add(g.company_id)
                        gbp_unique.append({
                            'company': g.company.name if g.company else None,
                            'score': g.overall_score,
                            'reviews': g.total_reviews,
                            'rating': g.average_rating
                        })
                context['gbp_audits'] = gbp_unique

                # SEO audits
                seo = db.query(CompanyWebsiteAnalysis).all()
                context['seo_audits'] = [
                    {'company': s.company.name if s.company else None,
                     'seo': s.seo_score, 'performance': s.performance_score}
                    for s in seo
                ]

        # Ensure the key always exists, even when no company category was requested
        if 'all_companies' not in context:
            context['all_companies'] = []
    finally:
        db.close()

    return context
```

- [ ] **Step 2: Verify syntax**

```bash
python3 -m py_compile context_builder.py && echo "OK"
```

- [ ] **Step 3: Commit**

```bash
git add context_builder.py
git commit -m "feat(nordagpt): add context_builder.py — selective data loading for smart router"
```

---

### Task 5: Create smart_router.py — query classification

**Files:**
- Create: `smart_router.py`

- [ ] **Step 1: Create smart_router.py**

```python
"""
Smart Router for NordaGPT
==========================
Classifies query complexity and selects which data categories to load.
Uses Gemini 3.1 Flash-Lite for fast, cheap classification (~1-2s).
"""
import json
import logging
import time
from typing import Dict, Any, List, Optional

logger = logging.getLogger(__name__)

# Keyword-based fast routing (no API call needed)
# Keywords must be lowercase — they are matched against the lowercased message.
FAST_ROUTES = {
    'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'katalog', 'porównaj firmy'],
    'events': ['wydarzenie', 'spotkanie', 'kalendarz', 'konferencja', 'szkolenie', 'kiedy'],
    'news': ['aktualności', 'nowości', 'wiadomości', 'pej', 'atom', 'elektrownia', 'zopk'],
    'classifieds': ['ogłoszenie', 'b2b', 'zlecenie', 'oferta', 'szukam', 'oferuję'],
    'forum': ['forum', 'dyskusja', 'temat', 'wątek', 'post'],
    'company_people': ['zarząd', 'krs', 'właściciel', 'prezes', 'udziały', 'wspólnik'],
    'registered_users': ['użytkownik', 'kto jest', 'profil', 'zarejestrowany', 'członek'],
    'social_media': ['facebook', 'instagram', 'linkedin', 'social media', 'media społeczn'],
    'audits': ['seo', 'google', 'gbp', 'opinie', 'ocena', 'pagespeed'],
}

# Model selection by complexity
MODEL_MAP = {
    'simple': {'model': '3.1-flash-lite', 'thinking': 'minimal'},
    'medium': {'model': '3-flash', 'thinking': 'low'},
    'complex': {'model': '3-flash', 'thinking': 'high'},
}

ROUTER_PROMPT = """Jesteś routerem zapytań. Przeanalizuj pytanie i zdecyduj jakie dane są potrzebne.

Użytkownik: {user_name} z firmy {company_name}
Pytanie: {message}

Zwróć TYLKO JSON (bez markdown):
{{
  "complexity": "simple|medium|complex",
  "data_needed": ["lista kategorii z poniższych"]
}}

Kategorie:
- companies_all — wszystkie firmy (porównania, przeglądy, "ile firm")
- companies_filtered:KATEGORIA — firmy z kategorii (np. companies_filtered:IT)
- companies_single:NAZWA — jedna firma (np. companies_single:termo)
- events — nadchodzące wydarzenia
- news — aktualności, PEJ, ZOPK
- classifieds — ogłoszenia B2B
- forum — tematy forum
- company_people — zarząd, KRS, udziałowcy
- registered_users — użytkownicy portalu
- social_media — profile social media firm
- audits — wyniki SEO/GBP

Zasady:
- "simple" = jedno pytanie o konkretną rzecz (telefon, adres, link)
- "medium" = porównanie, lista, filtrowanie
- "complex" = analiza, strategia, rekomendacje
- Wybierz MINIMUM kategorii. Nie ładuj niepotrzebnych danych.
- Jeśli pytanie dotyczy konkretnej firmy, użyj companies_single:nazwa
- Pytania ogólne o użytkownika (kim jestem, co wiesz) = [] (dane z profilu wystarczą)
"""


def route_query_fast(message: str, user_context: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
    """
    Fast keyword-based routing. No API call.
    Returns routing decision or None if uncertain (needs AI router).
    """
    msg_lower = message.lower()

    # Check for personal questions — no data needed
    personal_patterns = ['kim jestem', 'co wiesz o mnie', 'mój profil', 'moje dane']
    if any(p in msg_lower for p in personal_patterns):
        return {
            'complexity': 'simple',
            'data_needed': [],
            'model': '3.1-flash-lite',
            'thinking': 'minimal',
            'routed_by': 'fast'
        }

    # Check for greetings — no data needed
    greeting_patterns = ['cześć', 'hej', 'witam', 'dzień dobry', 'siema', 'hello']
    if any(msg_lower.strip().startswith(p) for p in greeting_patterns) and len(message) < 30:
        return {
            'complexity': 'simple',
            'data_needed': [],
            'model': '3.1-flash-lite',
            'thinking': 'minimal',
            'routed_by': 'fast'
        }

    # Check keyword matches
    matched_categories = []
    for category, keywords in FAST_ROUTES.items():
        if any(kw in msg_lower for kw in keywords):
            matched_categories.append(category)

    # Check for specific company name mention
    # Simple heuristic: if message has quotes or specific capitalized words
    if not matched_categories:
        # Can't determine — return None to trigger AI router
        return None

    # Determine complexity
    if len(matched_categories) <= 1 and len(message) < 80:
        complexity = 'simple'
    elif len(matched_categories) <= 2:
        complexity = 'medium'
    else:
        complexity = 'complex'

    model_config = MODEL_MAP[complexity]
    return {
        'complexity': complexity,
        'data_needed': matched_categories,
        'model': model_config['model'],
        'thinking': model_config['thinking'],
        'routed_by': 'fast'
    }


def route_query_ai(
    message: str,
    user_context: Optional[Dict] = None,
    gemini_service=None
) -> Dict[str, Any]:
    """
    AI-powered routing using Flash-Lite.
    Called when fast routing is uncertain.
    """
    if not gemini_service:
        # Fallback: load everything
        return _fallback_route()

    user_name = user_context.get('user_name', 'Nieznany') if user_context else 'Nieznany'
    company_name = user_context.get('company_name', 'brak') if user_context else 'brak'

    prompt = ROUTER_PROMPT.format(
        user_name=user_name,
        company_name=company_name,
        message=message
    )

    try:
        start = time.time()
        response = gemini_service.generate_text(
            prompt=prompt,
            temperature=0.1,
            max_tokens=200,
            model='gemini-3.1-flash-lite-preview',
            thinking_level='minimal',
            feature='smart_router'
        )
        latency = int((time.time() - start) * 1000)
        logger.info(f"Smart Router AI response in {latency}ms: {response[:200]}")

        # Parse JSON from response
        # Handle potential markdown wrapping
        text = response.strip()
        if text.startswith('```'):
            text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

        result = json.loads(text)
        complexity = result.get('complexity', 'medium')
        model_config = MODEL_MAP.get(complexity, MODEL_MAP['medium'])
        return {
            'complexity': complexity,
            'data_needed': result.get('data_needed', []),
            'model': model_config['model'],
            'thinking': model_config['thinking'],
            'routed_by': 'ai',
            'router_latency_ms': latency
        }
    except Exception as e:  # covers json.JSONDecodeError, KeyError, API errors
        logger.warning(f"Smart Router AI failed: {e}, falling back to full context")
        return _fallback_route()


def route_query(
    message: str,
    user_context: Optional[Dict] = None,
    gemini_service=None
) -> Dict[str, Any]:
    """
    Main entry point.
    Tries fast routing first, falls back to AI routing.
    """
    # Try fast keyword-based routing
    result = route_query_fast(message, user_context)
    if result is not None:
        logger.info(f"Smart Router FAST: complexity={result['complexity']}, data={result['data_needed']}")
        return result

    # Fall back to AI routing
    result = route_query_ai(message, user_context, gemini_service)
    logger.info(f"Smart Router AI: complexity={result['complexity']}, data={result['data_needed']}")
    return result


def _fallback_route() -> Dict[str, Any]:
    """Fallback: load everything, use default model. Safe but slow."""
    return {
        'complexity': 'medium',
        'data_needed': [
            'companies_all', 'events', 'news', 'classifieds',
            'forum', 'company_people', 'registered_users'
        ],
        'model': '3-flash',
        'thinking': 'low',
        'routed_by': 'fallback'
    }
```

- [ ] **Step 2: Verify syntax**

```bash
python3 -m py_compile smart_router.py && echo "OK"
```

- [ ] **Step 3: Commit**

```bash
git add smart_router.py
git commit -m "feat(nordagpt): add smart_router.py — fast keyword routing + AI fallback"
```

---

### Task 6: Integrate Smart Router into nordabiz_chat.py

**Files:**
- Modify: `nordabiz_chat.py:163-282, 347-643, 890-1365`

- [ ] **Step 1: Add imports at top of nordabiz_chat.py**

After the existing imports (around line 30), add:

```python
from smart_router import route_query
from context_builder import build_selective_context
```

- [ ] **Step 2: Modify send_message() to use Smart Router**

In `send_message()`, replace the call to `_build_conversation_context()` and `_query_ai()` (around lines 236-239). The key change: use the router to decide model and data, then use context_builder for selective loading.
Find the section where context is built and the AI is queried (around lines 236-241):

```python
# Before (approximately lines 236-241):
# context = self._build_conversation_context(db, conversation, original_message)
# ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)

# After:
# Smart Router — classify query and select data + model
route_decision = route_query(
    message=original_message,
    user_context=user_context,
    gemini_service=self.gemini_service
)

# Override model and thinking based on router decision
effective_model = route_decision.get('model', '3-flash')
effective_thinking = route_decision.get('thinking', thinking_level)

# Build selective context (only requested data categories)
context = build_selective_context(
    data_needed=route_decision.get('data_needed', []),
    conversation_id=conversation.id,
    current_message=original_message,
    user_context=user_context
)

# Use the original _query_ai but with router-selected parameters
ai_response_text = self._query_ai(
    context,
    original_message,
    user_id=user_id,
    thinking_level=effective_thinking,
    user_context=user_context
)
```

Note: Keep `_build_conversation_context()` and the full `_query_ai()` intact as a fallback. The router's `_fallback_route()` loads all data, so this is safe.

- [ ] **Step 3: Log routing decisions**

After the route_query call, add logging:

```python
logger.info(
    f"NordaGPT Router: user={user_context.get('user_name') if user_context else '?'}, "
    f"complexity={route_decision['complexity']}, model={effective_model}, "
    f"thinking={effective_thinking}, data={route_decision['data_needed']}, "
    f"routed_by={route_decision.get('routed_by')}"
)
```

- [ ] **Step 4: Update the GeminiService call in _query_ai() to use the effective model**

Currently `_query_ai()` uses `self.gemini_service`, which has a fixed model. We need to pass the router-selected model to the generate_text call.
In `_query_ai()`, around line 1352, find the call:

```python
# Before:
response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat'
)
```

Rather than stashing the route decision on `self`, pass it through the context dict. In `send_message()`, add to context before calling `_query_ai()`:

```python
context['_route_decision'] = route_decision
```

In `_query_ai()`, read it at the generate_text call:

```python
route = context.get('_route_decision', {})
effective_model_id = None
model_alias = route.get('model')
if model_alias:
    from gemini_service import GEMINI_MODELS
    effective_model_id = GEMINI_MODELS.get(model_alias)

response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat',
    model=effective_model_id
)
```

- [ ] **Step 5: Verify syntax**

```bash
python3 -m py_compile nordabiz_chat.py && echo "OK"
```

- [ ] **Step 6: Commit**

```bash
git add nordabiz_chat.py
git commit -m "feat(nordagpt): integrate smart router — selective context loading + adaptive model selection"
```

---

### Task 7: Deploy Phase 2 and verify

- [ ] **Step 1: Push and deploy to staging**

```bash
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```

- [ ] **Step 2: Test on staging — verify routing works**

Test a simple query: "Jaki jest telefon do TERMO?" — should be fast (2-3s), Flash-Lite model.
Test a medium query: "Porównaj firmy budowlane w Izbie" — should load companies_all, medium speed.
Test a complex query: "Jakie firmy mogłyby współpracować przy projekcie PEJ?" — should use full context.
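The keyword path can also be sanity-checked offline before deploying. The sketch below inlines a reduced, self-contained copy of the fast-routing rules from Task 5 (the names mirror `smart_router.py`, and the keyword lists are truncated for brevity — it is an illustration, not the module itself):

```python
# Reduced re-implementation of route_query_fast from smart_router.py,
# inlined so it runs without the app. Keyword lists are a subset.
FAST_ROUTES = {
    'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'porównaj firmy'],
    'events': ['wydarzenie', 'spotkanie', 'kalendarz'],
}

def route_query_fast(message: str):
    msg_lower = message.lower()
    # Personal questions need no portal data
    if any(p in msg_lower for p in ['kim jestem', 'co wiesz o mnie']):
        return {'complexity': 'simple', 'data_needed': [], 'routed_by': 'fast'}
    # Short greetings need no portal data either
    if any(msg_lower.strip().startswith(p) for p in ['cześć', 'hej', 'witam']) and len(message) < 30:
        return {'complexity': 'simple', 'data_needed': [], 'routed_by': 'fast'}
    matched = [c for c, kws in FAST_ROUTES.items() if any(k in msg_lower for k in kws)]
    if not matched:
        return None  # uncertain — would fall through to the Flash-Lite AI router
    complexity = 'simple' if len(matched) <= 1 and len(message) < 80 else 'medium'
    return {'complexity': complexity, 'data_needed': matched, 'routed_by': 'fast'}

# Greetings and personal questions load no data at all
assert route_query_fast("Cześć!")['data_needed'] == []
assert route_query_fast("Kim jestem?")['data_needed'] == []
# Keyword hits select only the matching categories
assert route_query_fast("Porównaj firmy budowlane")['data_needed'] == ['companies_all']
# Queries with no keyword match return None and go to the AI router
assert route_query_fast("Jaki jest telefon do TERMO?") is None
```

If these invariants hold, the staging log check below should show matching `routed_by=fast` entries for the same kinds of queries.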
Check the logs for routing decisions:

```bash
ssh maciejpi@10.22.68.248 "journalctl -u nordabiznes -n 30 --no-pager | grep 'Router'"
```

- [ ] **Step 3: Deploy to production**

```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```

---

## Phase 3: Streaming Responses (Tasks 8-10)

### Task 8: Add streaming endpoint in Flask

**Files:**
- Modify: `blueprints/chat/routes.py`
- Modify: `nordabiz_chat.py`

- [ ] **Step 1: Add SSE streaming endpoint**

In `blueprints/chat/routes.py`, add a new route after `chat_send_message()` (after line ~309):

```python
@bp.route('/api/chat/<int:conversation_id>/message/stream', methods=['POST'])
@login_required
@member_required
def chat_send_message_stream(conversation_id):
    """Send message to AI chat with streaming response (SSE)"""
    from flask import Response, stream_with_context
    import json as json_module

    data = request.get_json()
    if not data or not data.get('message', '').strip():
        return jsonify({'error': 'Wiadomość nie może być pusta'}), 400
    message = data['message'].strip()

    # Check limits
    from nordabiz_chat import check_user_limits
    limit_result = check_user_limits(current_user.id, current_user.email)
    if limit_result.get('limited'):
        return jsonify({'error': 'Przekroczono limit', 'limit_info': limit_result}), 429

    # Build user context
    user_context = {
        'user_id': current_user.id,
        'user_name': current_user.name,
        'user_email': current_user.email,
        'company_name': current_user.company.name if current_user.company else None,
        'company_id': current_user.company.id if current_user.company else None,
        'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
        'company_role': current_user.company_role or 'MEMBER',
        'is_norda_member': current_user.is_norda_member,
        'chamber_role': current_user.chamber_role,
        'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
    }

    model_choice = data.get('model') or session.get('chat_model', 'flash')
    model_key = '3-flash' if model_choice == 'flash' else '3-pro'

    def generate():
        try:
            chat_engine = NordaBizChatEngine(model=model_key)
            for chunk in chat_engine.send_message_stream(
                conversation_id=conversation_id,
                user_message=message,
                user_id=current_user.id,
                user_context=user_context
            ):
                yield f"data: {json_module.dumps(chunk, ensure_ascii=False)}\n\n"
        except PermissionError:
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Brak dostępu do tej konwersacji'})}\n\n"
        except Exception as e:
            logger.error(f"Streaming error: {e}")
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Wystąpił błąd'})}\n\n"

    return Response(
        stream_with_context(generate()),
        mimetype='text/event-stream',
        headers={
            'Cache-Control': 'no-cache',
            'X-Accel-Buffering': 'no',  # Disable Nginx buffering
        }
    )
```

- [ ] **Step 2: Add send_message_stream() to NordaBizChatEngine**

In `nordabiz_chat.py`, add a new method after `send_message()` (after line ~282):

```python
def send_message_stream(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    user_context: Optional[Dict[str, Any]] = None
):
    """
    Generator that yields streaming chunks for SSE.
    Yields dicts: {'type': 'thinking'|'token'|'done'|'error', 'content': '...'}
    """
    import time

    db = SessionLocal()
    try:
        conversation = db.query(AIChatConversation).filter_by(
            id=conversation_id, user_id=user_id
        ).first()
        if not conversation:
            yield {'type': 'error', 'content': 'Konwersacja nie znaleziona'}
            return

        # Save user message
        original_message = user_message
        sanitized = self._sanitize_message(user_message)
        user_msg = AIChatMessage(
            conversation_id=conversation_id,
            role='user',
            content=sanitized
        )
        db.add(user_msg)
        db.commit()

        # Smart Router
        route_decision = route_query(
            message=original_message,
            user_context=user_context,
            gemini_service=self.gemini_service
        )
        yield {'type': 'thinking', 'content': 'Analizuję pytanie...'}

        # Build selective context
        context = build_selective_context(
            data_needed=route_decision.get('data_needed', []),
            conversation_id=conversation.id,
            current_message=original_message,
            user_context=user_context
        )
        context['_route_decision'] = route_decision

        # Build prompt (reuse _query_ai logic for prompt building)
        full_prompt = self._build_prompt(context, original_message, user_context, route_decision.get('thinking', 'low'))

        # Get effective model
        from gemini_service import GEMINI_MODELS
        model_alias = route_decision.get('model', '3-flash')
        effective_model = GEMINI_MODELS.get(model_alias, self.model_name)

        # Stream from Gemini
        start_time = time.time()
        stream_response = self.gemini_service.generate_text(
            prompt=full_prompt,
            temperature=0.7,
            stream=True,
            thinking_level=route_decision.get('thinking', 'low'),
            user_id=user_id,
            feature='chat_stream',
            model=effective_model
        )

        full_text = ""
        for chunk in stream_response:
            if hasattr(chunk, 'text') and chunk.text:
                full_text += chunk.text
                yield {'type': 'token', 'content': chunk.text}

        latency_ms = int((time.time() - start_time) * 1000)

        # Save AI response to DB
        ai_msg = AIChatMessage(
            conversation_id=conversation_id,
            role='assistant',
            content=full_text,
            latency_ms=latency_ms
        )
        db.add(ai_msg)
        conversation.updated_at = datetime.now()
        conversation.message_count = (conversation.message_count or 0) + 2
        db.commit()

        yield {
            'type': 'done',
            'message_id': ai_msg.id,
            'latency_ms': latency_ms,
            'model': model_alias,
            'complexity': route_decision.get('complexity')
        }
    except Exception as e:
        logger.error(f"Stream error: {e}", exc_info=True)
        yield {'type': 'error', 'content': 'Wystąpił błąd podczas generowania odpowiedzi'}
    finally:
        db.close()
```

- [ ] **Step 3: Extract prompt building into a reusable method**

Add a `_build_prompt()` method to `NordaBizChatEngine` that extracts prompt construction from `_query_ai()`. This method builds the full prompt string without calling Gemini:

```python
def _build_prompt(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_context: Optional[Dict[str, Any]] = None,
    thinking_level: str = 'low'
) -> str:
    """Build the full prompt string. Extracted from _query_ai() for reuse in streaming."""
    # Build user identity section
    user_identity = ""
    if user_context:
        user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}
"""

    # Reuse the existing system_prompt from _query_ai() lines 922-1134.
    # This is the same static prompt — extract it to a class attribute or method.
    # NOTE: In implementation, refactor the static prompt into a separate method
    # to avoid duplication. The key point is that _build_prompt returns the
    # same prompt string that _query_ai would build.
    # ... (reuse existing system prompt construction logic) ...
return full_prompt ``` **Implementation note:** The actual implementation should refactor `_query_ai()` to call `_build_prompt()` internally, then the streaming method also calls `_build_prompt()`. This avoids prompt duplication. - [ ] **Step 4: Verify syntax** ```bash python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK" ``` - [ ] **Step 5: Commit** ```bash git add nordabiz_chat.py blueprints/chat/routes.py git commit -m "feat(nordagpt): add streaming SSE endpoint + send_message_stream method" ``` --- ### Task 9: Frontend streaming UI **Files:** - Modify: `templates/chat.html` - [ ] **Step 1: Add streaming sendMessage function** In `templates/chat.html`, replace the existing `sendMessage()` function (lines 2373-2454) with a streaming version: ```javascript async function sendMessage() { const input = document.getElementById('messageInput'); const message = input.value.trim(); if (!message || isSending) return; isSending = true; document.getElementById('sendBtn').disabled = true; input.value = ''; autoResizeTextarea(); // Add user message to chat addMessage('user', message); // Create conversation if needed if (!currentConversationId) { try { const startRes = await fetch('/api/chat/start', { method: 'POST', headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken}, body: JSON.stringify({title: message.substring(0, 50)}) }); const startData = await startRes.json(); currentConversationId = startData.conversation_id; } catch (e) { addMessage('assistant', 'Błąd tworzenia konwersacji.'); isSending = false; document.getElementById('sendBtn').disabled = false; return; } } // Add empty assistant bubble with thinking animation const msgDiv = document.createElement('div'); msgDiv.className = 'message assistant'; msgDiv.innerHTML = `
<div class="message-avatar">AI</div>
<div class="message-content"><div class="thinking-dots"><span>.</span><span>.</span><span>.</span></div></div>
`; document.getElementById('chatMessages').appendChild(msgDiv); scrollToBottom(); const contentDiv = msgDiv.querySelector('.message-content'); try { const response = await fetch(`/api/chat/${currentConversationId}/message/stream`, { method: 'POST', headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken}, body: JSON.stringify({message: message, model: currentModel}) }); if (response.status === 429) { contentDiv.innerHTML = ''; contentDiv.textContent = 'Przekroczono limit zapytań.'; showLimitBanner(); isSending = false; document.getElementById('sendBtn').disabled = false; return; } const reader = response.body.getReader(); const decoder = new TextDecoder(); let fullText = ''; let thinkingRemoved = false; let buffer = ''; while (true) { const {done, value} = await reader.read(); if (done) break; // A data: line can be split across reads; buffer the incomplete tail buffer += decoder.decode(value, {stream: true}); const lines = buffer.split('\n'); buffer = lines.pop(); for (const line of lines) { if (!line.startsWith('data: ')) continue; try { const chunk = JSON.parse(line.slice(6)); if (chunk.type === 'thinking') { // Keep thinking dots visible continue; } if (chunk.type === 'token') { if (!thinkingRemoved) { contentDiv.innerHTML = ''; thinkingRemoved = true; } fullText += chunk.content; contentDiv.innerHTML = formatMessage(fullText); scrollToBottom(); } if (chunk.type === 'done') { // Add tech info badge if (chunk.latency_ms) { const badge = document.createElement('div'); badge.className = 'thinking-info-badge'; badge.textContent = `${chunk.model || 'AI'} · ${(chunk.latency_ms/1000).toFixed(1)}s`; msgDiv.appendChild(badge); } loadConversations(); } if (chunk.type === 'error') { contentDiv.innerHTML = ''; contentDiv.textContent = chunk.content || 'Wystąpił błąd'; } } catch (e) { // Skip malformed chunks } } } } catch (e) { contentDiv.innerHTML = ''; contentDiv.textContent = 'Błąd połączenia z serwerem.'; } isSending = false; document.getElementById('sendBtn').disabled = false; } ``` - [ ] **Step 2: Add CSS for thinking animation** In `templates/chat.html`, in
the `{% block extra_css %}` section, add: ```css .thinking-dots { display: flex; gap: 4px; padding: 8px 0; } .thinking-dots span { animation: thinkBounce 1.4s infinite ease-in-out both; font-size: 1.5rem; color: var(--text-secondary); } .thinking-dots span:nth-child(1) { animation-delay: -0.32s; } .thinking-dots span:nth-child(2) { animation-delay: -0.16s; } .thinking-dots span:nth-child(3) { animation-delay: 0s; } @keyframes thinkBounce { 0%, 80%, 100% { transform: scale(0); } 40% { transform: scale(1); } } ``` - [ ] **Step 3: Verify locally and commit** ```bash python3 -m py_compile app.py && echo "OK" git add templates/chat.html git commit -m "feat(nordagpt): streaming UI — word-by-word response with thinking animation" ``` --- ### Task 10: Deploy Phase 3 and verify streaming - [ ] **Step 1: Check Nginx/NPM config for SSE support** SSE requires Nginx to NOT buffer the response. The streaming endpoint sets `X-Accel-Buffering: no` header. Verify NPM custom config allows this: ```bash ssh maciejpi@57.128.200.27 "cat /etc/nginx/sites-enabled/nordabiznes.conf 2>/dev/null || echo 'Using NPM proxy'" ``` If using NPM, the `X-Accel-Buffering: no` header should be sufficient. If not, add to NPM custom Nginx config for nordabiznes.pl: ``` proxy_buffering off; proxy_cache off; ``` - [ ] **Step 2: Push, deploy to staging, test streaming** ```bash git push origin master && git push inpi master ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" ``` Test on staging: open chat, send message, verify text appears word-by-word. 
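Before relying on the Nginx checks above, the SSE wire format itself can be sanity-checked offline. The sketch below is illustrative (the helper names `format_sse` and `iter_sse_frames` are not part of the codebase): it encodes `data:` frames the same way the Flask generator does and shows why a consumer must buffer partial lines, since a single frame may be split across network reads.

```python
import json

def format_sse(payload: dict) -> bytes:
    """Encode one event the way the streaming endpoint does."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")

def iter_sse_frames(chunks):
    """Reassemble 'data:' payloads from arbitrarily split byte chunks.

    A network read can end mid-frame, so only lines terminated by a
    newline are parsed; the incomplete tail is kept for the next read.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk.decode("utf-8")
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])

# A frame cut in the middle of its JSON still parses once complete:
frame = format_sse({"type": "token", "content": "Witaj"})
assert list(iter_sse_frames([frame[:7], frame[7:]])) == [
    {"type": "token", "content": "Witaj"}
]
```

The same buffering rule applies to the frontend reader loop, which splits `response.body` reads on newlines.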
- [ ] **Step 3: Deploy to production** ```bash ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" curl -sI https://nordabiznes.pl/health | head -3 ``` --- ## Phase 4: Persistent User Memory (Tasks 11-15) ### Task 11: Database migration — memory tables **Files:** - Create: `database/migrations/092_ai_user_memory.sql` - Create: `database/migrations/093_ai_conversation_summary.sql` - [ ] **Step 1: Create migration 092** ```sql -- 092_ai_user_memory.sql -- Persistent memory for NordaGPT — per-user facts extracted from conversations CREATE TABLE IF NOT EXISTS ai_user_memory ( id SERIAL PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, fact TEXT NOT NULL, category VARCHAR(50) DEFAULT 'general', source_conversation_id INTEGER REFERENCES ai_chat_conversations(id) ON DELETE SET NULL, confidence FLOAT DEFAULT 1.0, created_at TIMESTAMP DEFAULT NOW(), expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '12 months'), is_active BOOLEAN DEFAULT TRUE ); CREATE INDEX idx_ai_user_memory_user_active ON ai_user_memory(user_id, is_active, confidence DESC); CREATE INDEX idx_ai_user_memory_expires ON ai_user_memory(expires_at) WHERE is_active = TRUE; GRANT ALL ON TABLE ai_user_memory TO nordabiz_app; GRANT USAGE, SELECT ON SEQUENCE ai_user_memory_id_seq TO nordabiz_app; ``` - [ ] **Step 2: Create migration 093** ```sql -- 093_ai_conversation_summary.sql -- Auto-generated summaries of AI conversations for memory context CREATE TABLE IF NOT EXISTS ai_conversation_summary ( id SERIAL PRIMARY KEY, conversation_id INTEGER NOT NULL UNIQUE REFERENCES ai_chat_conversations(id) ON DELETE CASCADE, user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, summary TEXT NOT NULL, key_topics JSONB DEFAULT '[]', created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() ); CREATE INDEX idx_ai_conv_summary_user ON ai_conversation_summary(user_id, created_at DESC); GRANT ALL ON TABLE 
ai_conversation_summary TO nordabiz_app; GRANT USAGE, SELECT ON SEQUENCE ai_conversation_summary_id_seq TO nordabiz_app; ``` - [ ] **Step 3: Commit migrations** ```bash git add database/migrations/092_ai_user_memory.sql database/migrations/093_ai_conversation_summary.sql git commit -m "feat(nordagpt): add migrations for user memory and conversation summary tables" ``` --- ### Task 12: Add SQLAlchemy models **Files:** - Modify: `database.py` (insert before line 5954) - [ ] **Step 1: Add AIUserMemory model** Insert before the `# DATABASE INITIALIZATION` comment (line 5954): ```python class AIUserMemory(Base): __tablename__ = 'ai_user_memory' id = Column(Integer, primary_key=True) user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False) fact = Column(Text, nullable=False) category = Column(String(50), default='general') source_conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='SET NULL'), nullable=True) confidence = Column(Float, default=1.0) created_at = Column(DateTime, default=datetime.utcnow) expires_at = Column(DateTime) is_active = Column(Boolean, default=True) user = relationship('User') source_conversation = relationship('AIChatConversation') class AIConversationSummary(Base): __tablename__ = 'ai_conversation_summary' id = Column(Integer, primary_key=True) conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='CASCADE'), nullable=False, unique=True) user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False) summary = Column(Text, nullable=False) key_topics = Column(JSON, default=list) created_at = Column(DateTime, default=datetime.utcnow) updated_at = Column(DateTime, default=datetime.utcnow) user = relationship('User') conversation = relationship('AIChatConversation') ``` - [ ] **Step 2: Verify syntax** ```bash python3 -m py_compile database.py && echo "OK" ``` - [ ] **Step 3: Commit** ```bash git add database.py git commit -m "feat(nordagpt): 
add AIUserMemory and AIConversationSummary ORM models" ``` --- ### Task 13: Create memory_service.py **Files:** - Create: `memory_service.py` - [ ] **Step 1: Create memory_service.py** ```python """ Memory Service for NordaGPT ============================= Manages persistent per-user memory: fact extraction, storage, retrieval, cleanup. """ import json import logging from datetime import datetime, timedelta from typing import Dict, Any, List, Optional from database import SessionLocal, AIUserMemory, AIConversationSummary, AIChatMessage logger = logging.getLogger(__name__) EXTRACT_FACTS_PROMPT = """Na podstawie tej rozmowy wyciągnij kluczowe fakty o użytkowniku {user_name} ({company_name}). Rozmowa: {conversation_text} Istniejące fakty (NIE DUPLIKUJ): {existing_facts} Zwróć TYLKO JSON array (bez markdown): [{{"fact": "...", "category": "interests|needs|contacts|insights"}}] Zasady: - Tylko nowe, nietrywialne fakty przydatne w przyszłych rozmowach - Nie zapisuj: "zapytał o firmę X" (to za mało) - Zapisuj: "szuka podwykonawców do projektu PEJ w branży elektrycznej" - Max 3 fakty. Jeśli nie ma nowych faktów, zwróć [] - Kategorie: interests (zainteresowania), needs (potrzeby biznesowe), contacts (kontakty), insights (wnioski/preferencje) """ SUMMARIZE_PROMPT = """Podsumuj tę rozmowę w 1-3 zdaniach. Skup się na tym, czego użytkownik szukał i co ustalono. 
Rozmowa: {conversation_text} Zwróć TYLKO JSON (bez markdown): {{"summary": "...", "key_topics": ["temat1", "temat2"]}} """ def get_user_memory(user_id: int, limit: int = 10) -> List[Dict]: """Get active memory facts for a user, sorted by recency and confidence.""" db = SessionLocal() try: facts = db.query(AIUserMemory).filter( AIUserMemory.user_id == user_id, AIUserMemory.is_active == True, AIUserMemory.expires_at > datetime.now() ).order_by( AIUserMemory.confidence.desc(), AIUserMemory.created_at.desc() ).limit(limit).all() return [ { 'id': f.id, 'fact': f.fact, 'category': f.category, 'confidence': f.confidence, 'created_at': f.created_at.isoformat() } for f in facts ] finally: db.close() def get_conversation_summaries(user_id: int, limit: int = 5) -> List[Dict]: """Get recent conversation summaries for a user.""" db = SessionLocal() try: summaries = db.query(AIConversationSummary).filter( AIConversationSummary.user_id == user_id ).order_by( AIConversationSummary.created_at.desc() ).limit(limit).all() return [ { 'summary': s.summary, 'topics': s.key_topics or [], 'date': s.created_at.strftime('%Y-%m-%d') } for s in summaries ] finally: db.close() def format_memory_for_prompt(user_id: int) -> str: """Format user memory and summaries for injection into AI prompt.""" facts = get_user_memory(user_id) summaries = get_conversation_summaries(user_id) if not facts and not summaries: return "" parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"] if facts: parts.append("Znane fakty:") for f in facts: parts.append(f"- [{f['category']}] {f['fact']}") if summaries: parts.append("\nOstatnie rozmowy:") for s in summaries: topics = ", ".join(s['topics'][:3]) if s['topics'] else "" parts.append(f"- {s['date']}: {s['summary']}" + (f" (tematy: {topics})" if topics else "")) parts.append("\nWykorzystuj tę wiedzę do personalizacji odpowiedzi. 
Nawiązuj do wcześniejszych rozmów gdy to naturalne.") return "\n".join(parts) def extract_facts_async( conversation_id: int, user_id: int, user_context: Dict, gemini_service ): """ Extract memory facts from a conversation. Run async after response is sent. Uses Flash-Lite for minimal cost. """ db = SessionLocal() try: # Get conversation messages messages = db.query(AIChatMessage).filter_by( conversation_id=conversation_id ).order_by(AIChatMessage.created_at).all() if len(messages) < 2: return # Too short to extract conversation_text = "\n".join([ f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content}" for m in messages[-10:] # Last 10 messages ]) # Get existing facts to avoid duplicates existing = db.query(AIUserMemory).filter( AIUserMemory.user_id == user_id, AIUserMemory.is_active == True ).all() existing_text = "\n".join([f"- {f.fact}" for f in existing]) or "Brak" prompt = EXTRACT_FACTS_PROMPT.format( user_name=user_context.get('user_name', 'Nieznany'), company_name=user_context.get('company_name', 'brak'), conversation_text=conversation_text, existing_facts=existing_text ) response = gemini_service.generate_text( prompt=prompt, temperature=0.1, max_tokens=300, model='gemini-3.1-flash-lite-preview', thinking_level='minimal', feature='memory_extraction' ) # Parse response text = response.strip() if text.startswith('```'): text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() facts = json.loads(text) if not isinstance(facts, list): return for fact_data in facts[:3]: if not fact_data.get('fact'): continue memory = AIUserMemory( user_id=user_id, fact=fact_data['fact'], category=fact_data.get('category', 'general'), source_conversation_id=conversation_id, expires_at=datetime.now() + timedelta(days=365) ) db.add(memory) db.commit() logger.info(f"Extracted {len(facts)} memory facts for user {user_id}") except Exception as e: logger.warning(f"Memory extraction failed for conversation {conversation_id}: {e}") db.rollback() finally: db.close() def 
summarize_conversation_async( conversation_id: int, user_id: int, gemini_service ): """Generate or update conversation summary. Run async.""" db = SessionLocal() try: messages = db.query(AIChatMessage).filter_by( conversation_id=conversation_id ).order_by(AIChatMessage.created_at).all() if len(messages) < 2: return conversation_text = "\n".join([ f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content[:200]}" for m in messages[-10:] ]) prompt = SUMMARIZE_PROMPT.format(conversation_text=conversation_text) response = gemini_service.generate_text( prompt=prompt, temperature=0.1, max_tokens=200, model='gemini-3.1-flash-lite-preview', thinking_level='minimal', feature='conversation_summary' ) text = response.strip() if text.startswith('```'): text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() result = json.loads(text) existing = db.query(AIConversationSummary).filter_by( conversation_id=conversation_id ).first() if existing: existing.summary = result.get('summary', existing.summary) existing.key_topics = result.get('key_topics', existing.key_topics) existing.updated_at = datetime.now() else: summary = AIConversationSummary( conversation_id=conversation_id, user_id=user_id, summary=result.get('summary', ''), key_topics=result.get('key_topics', []) ) db.add(summary) db.commit() logger.info(f"Summarized conversation {conversation_id}") except Exception as e: logger.warning(f"Conversation summary failed for {conversation_id}: {e}") db.rollback() finally: db.close() def delete_user_fact(user_id: int, fact_id: int) -> bool: """Soft-delete a memory fact. 
Returns True if deleted.""" db = SessionLocal() try: fact = db.query(AIUserMemory).filter_by(id=fact_id, user_id=user_id).first() if fact: fact.is_active = False db.commit() return True return False finally: db.close() ``` - [ ] **Step 2: Verify syntax** ```bash python3 -m py_compile memory_service.py && echo "OK" ``` - [ ] **Step 3: Commit** ```bash git add memory_service.py git commit -m "feat(nordagpt): add memory_service.py — fact extraction, summaries, CRUD" ``` --- ### Task 14: Integrate memory into chat flow **Files:** - Modify: `nordabiz_chat.py` - Modify: `blueprints/chat/routes.py` - [ ] **Step 1: Inject memory into system prompt** In `nordabiz_chat.py`, in the `_build_prompt()` or `_query_ai()` method, after the user identity block and before the data sections, add memory: ```python from memory_service import format_memory_for_prompt # After user_identity block, before data injection: user_memory_text = "" if user_context and user_context.get('user_id'): user_memory_text = format_memory_for_prompt(user_context['user_id']) # Prepend to system prompt: system_prompt = user_identity + user_memory_text + f"""Jesteś pomocnym asystentem...""" ``` - [ ] **Step 2: Trigger async memory extraction after response** In `send_message()` and `send_message_stream()`, after saving the AI response, trigger async extraction using threading: ```python import threading from memory_service import extract_facts_async, summarize_conversation_async # After saving AI response to DB (end of send_message/send_message_stream): # Async memory extraction — don't block the response def _extract_memory(): extract_facts_async(conversation_id, user_id, user_context, self.gemini_service) # Summarize every 5 messages if (conversation.message_count or 0) % 5 == 0: summarize_conversation_async(conversation_id, user_id, self.gemini_service) threading.Thread(target=_extract_memory, daemon=True).start() ``` - [ ] **Step 3: Add memory CRUD API routes** In `blueprints/chat/routes.py`, add routes 
for viewing and deleting memory: ```python @bp.route('/api/chat/memory', methods=['GET']) @login_required @member_required def get_user_memory_api(): """Get current user's NordaGPT memory facts and summaries""" from memory_service import get_user_memory, get_conversation_summaries return jsonify({ 'facts': get_user_memory(current_user.id, limit=20), 'summaries': get_conversation_summaries(current_user.id, limit=10) }) @bp.route('/api/chat/memory/<int:fact_id>', methods=['DELETE']) @login_required @member_required def delete_memory_fact(fact_id): """Delete a memory fact""" from memory_service import delete_user_fact if delete_user_fact(current_user.id, fact_id): return jsonify({'status': 'ok'}) return jsonify({'error': 'Nie znaleziono'}), 404 ``` - [ ] **Step 4: Verify syntax** ```bash python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK" ``` - [ ] **Step 5: Commit** ```bash git add nordabiz_chat.py blueprints/chat/routes.py git commit -m "feat(nordagpt): integrate memory into chat — injection, async extraction, CRUD API" ``` --- ### Task 15: Deploy Phase 4 — migrations + code - [ ] **Step 1: Push to remotes** ```bash git push origin master && git push inpi master ``` - [ ] **Step 2: Deploy to staging with migrations** ```bash ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull" ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql" ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql" ssh maciejpi@10.22.68.248 "sudo systemctl restart nordabiznes" ``` - [ ] **Step 3: Test on staging** 1. Open chat, have a conversation about looking for IT companies 2. Open another chat, ask "o czym rozmawialiśmy?" — verify AI mentions previous topics 3.
Check memory API: `curl https://staging.nordabiznes.pl/api/chat/memory` (with auth) 4. Verify facts are extracted - [ ] **Step 4: Deploy to production** ```bash ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull" ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql" ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql" ssh maciejpi@57.128.200.27 "sudo systemctl restart nordabiznes" curl -sI https://nordabiznes.pl/health | head -3 ``` - [ ] **Step 5: Update release notes** Add entry in `blueprints/public/routes.py` `_get_releases()`. --- ## Post-Implementation Checklist - [ ] Verify AI greets users by name - [ ] Verify Smart Router logs show correct classification - [ ] Verify streaming works on mobile (Android + iOS) - [ ] Verify memory facts are extracted after conversations - [ ] Verify memory is private (user A cannot see user B's facts) - [ ] Verify response times: simple <3s, medium <6s, complex <12s - [ ] Monitor costs for first week — compare with estimates - [ ] Send message to Jakub Pornowski confirming speed improvements
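A cheap way to cover the memory-extraction items in this checklist without calling Gemini: the fence-stripping step that `extract_facts_async` and `summarize_conversation_async` both perform before `json.loads` can be pulled out and unit-tested. A sketch under that assumption (the `parse_model_json` helper is illustrative, not an existing function):

```python
import json

def parse_model_json(raw: str):
    """Strip an optional markdown fence from a model reply, then parse JSON.

    Mirrors the inline parsing in memory_service.py: drop the opening
    fence line and everything after the closing fence, then json.loads
    the remainder.
    """
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    return json.loads(text)

# Handles both fenced and bare replies:
fenced = '```json\n[{"fact": "szuka podwykonawców do projektu PEJ", "category": "needs"}]\n```'
assert parse_model_json(fenced)[0]["category"] == "needs"
assert parse_model_json('{"summary": "ok", "key_topics": []}')["key_topics"] == []
```

If the model returns something that is not JSON at all, `json.loads` raises and the existing `except` blocks in memory_service.py already log and roll back.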