diff --git a/docs/superpowers/plans/2026-03-28-nordagpt-identity-memory.md b/docs/superpowers/plans/2026-03-28-nordagpt-identity-memory.md new file mode 100644 index 0000000..6541ad9 --- /dev/null +++ b/docs/superpowers/plans/2026-03-28-nordagpt-identity-memory.md @@ -0,0 +1,1926 @@ +# NordaGPT Identity, Memory & Performance — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Transform NordaGPT from an anonymous chatbot into a personalized assistant with user identity, persistent memory, smart routing, and streaming responses. + +**Architecture:** Four-phase rollout: (1) inject user identity into AI prompt, (2) smart router + selective context loading, (3) streaming SSE responses, (4) persistent user memory with async extraction. Each phase is independently deployable and testable. + +**Tech Stack:** Flask 3.0, SQLAlchemy 2.0, PostgreSQL, Google Gemini API (3-Flash, 3.1-Flash-Lite), Server-Sent Events, Jinja2 inline JS. 
+
+**Spec:** `docs/superpowers/specs/2026-03-28-nordagpt-identity-memory-design.md`
+
+---
+
+## File Structure
+
+### New files
+
+| File | Responsibility |
+|------|---------------|
+| `smart_router.py` | Classifies query complexity, selects data categories and model |
+| `memory_service.py` | CRUD for user memory facts + conversation summaries, extraction prompt |
+| `context_builder.py` | Loads selective data from DB based on router decision |
+| `database/migrations/092_ai_user_memory.sql` | User memory facts table |
+| `database/migrations/093_ai_conversation_summary.sql` | Conversation summary table |
+
+### Modified files
+
+| File | Changes |
+|------|---------|
+| `database.py` | Add AIUserMemory, AIConversationSummary models (before line 5954) |
+| `nordabiz_chat.py` | Accept user_context, integrate router, selective context, memory injection |
+| `gemini_service.py` | Token counting for streamed responses |
+| `blueprints/chat/routes.py` | Build user_context, add streaming endpoint, memory CRUD routes |
+| `templates/chat.html` | Streaming UI, thinking animation, memory settings panel |
+
+---
+
+## Phase 1: User Identity (Tasks 1-3)
+
+### Task 1: Pass user context from route to chat engine
+
+**Files:**
+- Modify: `blueprints/chat/routes.py:234-309`
+- Modify: `nordabiz_chat.py:163-180`
+
+- [ ] **Step 1: Build user_context dict in chat route**
+
+In `blueprints/chat/routes.py`, modify `chat_send_message()`. 
After line 262 (where `current_user.id` and `current_user.email` are used for limit check), add user_context construction: + +```python +# After line 262, before line 268 +# Build user context for AI personalization +user_context = { + 'user_id': current_user.id, + 'user_name': current_user.name, + 'user_email': current_user.email, + 'company_name': current_user.company.name if current_user.company else None, + 'company_id': current_user.company.id if current_user.company else None, + 'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None, + 'company_role': current_user.company_role or 'MEMBER', + 'is_norda_member': current_user.is_norda_member, + 'chamber_role': current_user.chamber_role, + 'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None, +} +``` + +- [ ] **Step 2: Pass user_context to send_message()** + +In the same function, modify the `chat_engine.send_message()` call (around line 282): + +```python +# Before: +ai_response = chat_engine.send_message( + conversation_id, + user_message=message, + user_id=current_user.id, + thinking_level=thinking_level +) + +# After: +ai_response = chat_engine.send_message( + conversation_id, + user_message=message, + user_id=current_user.id, + thinking_level=thinking_level, + user_context=user_context +) +``` + +- [ ] **Step 3: Update send_message() signature in nordabiz_chat.py** + +In `nordabiz_chat.py`, modify `send_message()` at line 163: + +```python +# Before: +def send_message( + self, + conversation_id: int, + user_message: str, + user_id: int, + thinking_level: str = 'high' +) -> AIChatMessage: + +# After: +def send_message( + self, + conversation_id: int, + user_message: str, + user_id: int, + thinking_level: str = 'high', + user_context: Optional[Dict[str, Any]] = None +) -> AIChatMessage: +``` + +Add `from typing import Optional, Dict, Any` to imports if not already present. 
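The same `user_context` dict gets rebuilt verbatim in the Task 8 streaming endpoint, so it may be worth extracting a shared helper while touching this code. A sketch under that assumption (`build_user_context` is a hypothetical name, not an existing function in this codebase; the stub user is for illustration only):

```python
from datetime import datetime
from types import SimpleNamespace

def build_user_context(user):
    """Hypothetical helper mirroring the user_context keys from Step 1,
    so the non-streaming and streaming routes can share one builder."""
    company = getattr(user, 'company', None)
    return {
        'user_id': user.id,
        'user_name': user.name,
        'user_email': user.email,
        'company_name': company.name if company else None,
        'company_id': company.id if company else None,
        'company_category': company.category.name if company and company.category else None,
        'company_role': getattr(user, 'company_role', None) or 'MEMBER',
        'is_norda_member': getattr(user, 'is_norda_member', False),
        'chamber_role': getattr(user, 'chamber_role', None),
        'member_since': user.created_at.strftime('%Y-%m-%d') if user.created_at else None,
    }

# Smoke check with a stub user who has no company: every company field degrades to None
stub = SimpleNamespace(id=1, name='Anna Kowalska', email='anna@example.com',
                       company=None, company_role=None, is_norda_member=True,
                       chamber_role=None, created_at=datetime(2024, 5, 1))
ctx = build_user_context(stub)
```

With a helper like this, both `chat_send_message()` and the later streaming route would reduce to `user_context = build_user_context(current_user)`.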
+ +- [ ] **Step 4: Thread user_context through to _query_ai()** + +In `send_message()`, find the call to `_query_ai()` (around line 239) and add user_context: + +```python +# Before: +ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level) + +# After: +ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context) +``` + +- [ ] **Step 5: Update _query_ai() signature** + +In `nordabiz_chat.py`, modify `_query_ai()` at line 890: + +```python +# Before: +def _query_ai( + self, + context: Dict[str, Any], + user_message: str, + user_id: Optional[int] = None, + thinking_level: str = 'high' +) -> str: + +# After: +def _query_ai( + self, + context: Dict[str, Any], + user_message: str, + user_id: Optional[int] = None, + thinking_level: str = 'high', + user_context: Optional[Dict[str, Any]] = None +) -> str: +``` + +- [ ] **Step 6: Commit** + +```bash +git add blueprints/chat/routes.py nordabiz_chat.py +git commit -m "refactor(chat): thread user_context from route through to _query_ai" +``` + +--- + +### Task 2: Inject user identity into system prompt + +**Files:** +- Modify: `nordabiz_chat.py:920-930` + +- [ ] **Step 1: Add user identity block to system prompt** + +In `nordabiz_chat.py`, inside `_query_ai()`, find line ~922 where `system_prompt` starts. 
Insert the user identity block BEFORE the main system prompt string (after line 921, before line 922): + +```python + # Build user identity section + user_identity = "" + if user_context: + user_identity = f""" +# AKTUALNY UŻYTKOWNIK +Rozmawiasz z: {user_context.get('user_name', 'Nieznany')} +Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')} +Rola w firmie: {user_context.get('company_role', 'MEMBER')} +Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'} +Rola w Izbie: {user_context.get('chamber_role') or '—'} +Na portalu od: {user_context.get('member_since', 'nieznana data')} + +ZASADY PERSONALIZACJI: +- Zwracaj się do użytkownika po imieniu (pierwsze słowo z imienia i nazwiska) +- W pierwszej wiadomości konwersacji przywitaj się: "Cześć [imię], w czym mogę pomóc?" +- Na pytania "co wiesz o mnie?" / "kim jestem?" — wypisz powyższe dane + powiązania firmowe z bazy +- Uwzględniaj kontekst firmy użytkownika w odpowiedziach (np. sugeruj partnerów z komplementarnych branż) +- NIE ujawniaj danych technicznych (user_id, company_id, rola systemowa) +""" +``` + +- [ ] **Step 2: Prepend user_identity to system_prompt** + +Find where `system_prompt` is first assigned (line 922) and prepend: + +```python + # Line 922 area - the system_prompt f-string starts here + system_prompt = user_identity + f"""Jesteś pomocnym asystentem portalu Norda Biznes... +``` + +This is a minimal change — just concatenate `user_identity` (which is empty string if no context) before the existing prompt. + +- [ ] **Step 3: Verify syntax compiles** + +```bash +python3 -m py_compile nordabiz_chat.py && echo "OK" +``` + +- [ ] **Step 4: Test locally** + +Start local dev server and send a chat message. Verify in logs that the prompt now contains the user identity block. Check that the AI greets by name. 
+ +```bash +python3 app.py +# In another terminal: +curl -X POST http://localhost:5000/api/chat/1/message \ + -H "Content-Type: application/json" \ + -d '{"message": "Kim jestem?"}' +``` + +(Note: requires auth cookie — easier to test via browser) + +- [ ] **Step 5: Commit** + +```bash +git add nordabiz_chat.py +git commit -m "feat(nordagpt): inject user identity into AI system prompt — personalized greetings and context" +``` + +--- + +### Task 3: Deploy Phase 1 and verify + +**Files:** None (deployment only) + +- [ ] **Step 1: Push to remotes** + +```bash +git push origin master && git push inpi master +``` + +- [ ] **Step 2: Deploy to staging** + +```bash +ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" +``` + +- [ ] **Step 3: Test on staging — verify AI greets by name** + +Open https://staging.nordabiznes.pl/chat, start new conversation, type "Cześć". Verify AI responds with your name. + +Type "Co wiesz o mnie?" — verify AI lists your profile data. + +- [ ] **Step 4: Deploy to production** + +```bash +ssh maciejpi@10.22.68.249 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" +curl -sI https://nordabiznes.pl/health | head -3 +``` + +- [ ] **Step 5: Commit deployment notes (update release_notes in routes.py)** + +Add new release entry in `blueprints/public/routes.py` `_get_releases()` function. + +--- + +## Phase 2: Smart Router + Context Builder (Tasks 4-7) + +### Task 4: Create context_builder.py — selective data loading + +**Files:** +- Create: `context_builder.py` + +- [ ] **Step 1: Create context_builder.py with selective loading functions** + +```python +""" +Context Builder for NordaGPT Smart Router +========================================== +Loads only the data categories requested by the Smart Router, +instead of loading everything for every query. 
+""" + +import json +import logging +from typing import Dict, Any, List, Optional +from datetime import datetime, timedelta + +from database import ( + SessionLocal, Company, Category, CompanyRecommendation, + NordaEvent, Classified, ForumTopic, ForumReply, + CompanyPerson, Person, User, CompanySocialMedia, + GBPAudit, CompanyWebsiteAnalysis, ZOPKNews, + UserCompanyPermissions +) +from sqlalchemy import func, desc + +logger = logging.getLogger(__name__) + + +def _company_to_compact_dict(company) -> Dict: + """Convert company to compact dict for AI context. Mirrors nordabiz_chat.py format.""" + return { + 'name': company.name, + 'cat': company.category.name if company.category else None, + 'profile': f'/firma/{company.slug}', + 'desc': company.description_short, + 'about': company.description_full[:500] if company.description_full else None, + 'svc': company.services, + 'comp': company.competencies, + 'web': company.website, + 'tel': company.phone, + 'mail': company.email, + 'city': company.city, + } + + +def build_selective_context( + data_needed: List[str], + conversation_id: int, + current_message: str, + user_context: Optional[Dict] = None +) -> Dict[str, Any]: + """ + Build AI context with only the requested data categories. 
+ + Args: + data_needed: List of category strings from Smart Router, e.g.: + ["companies_all", "companies_filtered:IT", "companies_single:termo", + "events", "news", "classifieds", "forum", "company_people", + "registered_users", "social_media", "audits"] + conversation_id: Current conversation ID for history + current_message: User's message text + user_context: User identity dict + + Returns: + Context dict compatible with nordabiz_chat.py _query_ai() + """ + db = SessionLocal() + context = {} + + try: + # Always load: basic stats and conversation history + active_companies = db.query(Company).filter_by(status='active').all() + context['total_companies'] = len(active_companies) + + categories = db.query(Category).all() + context['categories'] = [ + {'name': c.name, 'slug': c.slug, 'company_count': len([co for co in active_companies if co.category_id == c.id])} + for c in categories + ] + + # Conversation history (always loaded) + from database import AIChatMessage, AIChatConversation + messages = db.query(AIChatMessage).filter_by( + conversation_id=conversation_id + ).order_by(AIChatMessage.created_at.desc()).limit(10).all() + context['recent_messages'] = [ + {'role': msg.role, 'content': msg.content} + for msg in reversed(messages) + ] + + # Selective data loading based on router decision + for category in data_needed: + if category == 'companies_all': + context['all_companies'] = [_company_to_compact_dict(c) for c in active_companies] + + elif category.startswith('companies_filtered:'): + filter_cat = category.split(':', 1)[1] + filtered = [c for c in active_companies + if c.category and c.category.name.lower() == filter_cat.lower()] + context['all_companies'] = [_company_to_compact_dict(c) for c in filtered] + + elif category.startswith('companies_single:'): + search = category.split(':', 1)[1].lower() + matched = [c for c in active_companies + if search in c.name.lower() or search in (c.slug or '')] + context['all_companies'] = [_company_to_compact_dict(c) 
for c in matched[:5]] + + elif category == 'events': + events = db.query(NordaEvent).filter( + NordaEvent.event_date >= datetime.now(), + NordaEvent.event_date <= datetime.now() + timedelta(days=60) + ).order_by(NordaEvent.event_date).all() + context['upcoming_events'] = [ + {'title': e.title, 'date': str(e.event_date), 'type': e.event_type, + 'location': e.location, 'url': f'/kalendarz/{e.id}'} + for e in events + ] + + elif category == 'news': + news = db.query(ZOPKNews).filter( + ZOPKNews.published_at >= datetime.now() - timedelta(days=30), + ZOPKNews.status == 'approved' + ).order_by(ZOPKNews.published_at.desc()).limit(10).all() + context['recent_news'] = [ + {'title': n.title, 'summary': n.ai_summary, 'date': str(n.published_at), + 'source': n.source_name, 'url': n.source_url} + for n in news + ] + + elif category == 'classifieds': + classifieds = db.query(Classified).filter( + Classified.status == 'active', + Classified.is_test == False + ).order_by(Classified.created_at.desc()).limit(20).all() + context['classifieds'] = [ + {'type': c.listing_type, 'title': c.title, 'description': c.description, + 'company': c.company.name if c.company else None, + 'budget': c.budget_text, 'url': f'/b2b/{c.id}'} + for c in classifieds + ] + + elif category == 'forum': + topics = db.query(ForumTopic).filter( + ForumTopic.is_test == False + ).order_by(ForumTopic.created_at.desc()).limit(15).all() + context['forum_topics'] = [ + {'title': t.title, 'content': t.content[:300], + 'author': t.author.name if t.author else None, + 'replies': t.reply_count, 'url': f'/forum/{t.slug}'} + for t in topics + ] + + elif category == 'company_people': + people_query = db.query(CompanyPerson).join(Person).join(Company).filter( + Company.status == 'active' + ).all() + grouped = {} + for cp in people_query: + cname = cp.company.name + if cname not in grouped: + grouped[cname] = [] + grouped[cname].append({ + 'name': cp.person.name, + 'role': cp.role_description, + 'shares': cp.shares_value + }) 
+ context['company_people'] = grouped + + elif category == 'registered_users': + users = db.query(User).filter( + User.is_active == True, + User.company_id.isnot(None) + ).all() + grouped = {} + for u in users: + cname = u.company.name if u.company else 'Brak firmy' + if cname not in grouped: + grouped[cname] = [] + grouped[cname].append({ + 'name': u.name, 'email': u.email, + 'role': u.company_role, 'member': u.is_norda_member + }) + context['registered_users'] = grouped + + elif category == 'social_media': + socials = db.query(CompanySocialMedia).filter_by(is_valid=True).all() + grouped = {} + for s in socials: + cname = s.company.name if s.company else 'Unknown' + if cname not in grouped: + grouped[cname] = [] + grouped[cname].append({ + 'platform': s.platform, 'url': s.url, + 'followers': s.followers_count + }) + context['company_social_media'] = grouped + + elif category == 'audits': + # GBP audits + gbp = db.query(GBPAudit).order_by(GBPAudit.created_at.desc()).all() + seen = set() + gbp_unique = [] + for g in gbp: + if g.company_id not in seen: + seen.add(g.company_id) + gbp_unique.append({ + 'company': g.company.name if g.company else None, + 'score': g.overall_score, 'reviews': g.total_reviews, + 'rating': g.average_rating + }) + context['gbp_audits'] = gbp_unique + + # SEO audits + seo = db.query(CompanyWebsiteAnalysis).all() + context['seo_audits'] = [ + {'company': s.company.name if s.company else None, + 'seo': s.seo_score, 'performance': s.performance_score} + for s in seo + ] + + # If no companies were loaded by any category, load a minimal summary + if 'all_companies' not in context: + context['all_companies'] = [] + + finally: + db.close() + + return context +``` + +- [ ] **Step 2: Verify syntax** + +```bash +python3 -m py_compile context_builder.py && echo "OK" +``` + +- [ ] **Step 3: Commit** + +```bash +git add context_builder.py +git commit -m "feat(nordagpt): add context_builder.py — selective data loading for smart router" +``` + +--- + +### 
Task 5: Create smart_router.py — query classification
+
+**Files:**
+- Create: `smart_router.py`
+
+- [ ] **Step 1: Create smart_router.py**
+
+```python
+"""
+Smart Router for NordaGPT
+==========================
+Classifies query complexity and selects which data categories to load.
+Uses Gemini 3.1 Flash-Lite for fast, cheap classification (~1-2s).
+"""
+
+import json
+import logging
+import time
+from typing import Dict, Any, List, Optional
+
+logger = logging.getLogger(__name__)
+
+# Keyword-based fast routing (no API call needed)
+# Keywords must be lowercase: they are matched against the lowercased message
+FAST_ROUTES = {
+    'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'katalog', 'porównaj firmy'],
+    'events': ['wydarzenie', 'spotkanie', 'kalendarz', 'konferencja', 'szkolenie', 'kiedy'],
+    'news': ['aktualności', 'nowości', 'wiadomości', 'pej', 'atom', 'elektrownia', 'zopk'],
+    'classifieds': ['ogłoszenie', 'b2b', 'zlecenie', 'oferta', 'szukam', 'oferuję'],
+    'forum': ['forum', 'dyskusja', 'temat', 'wątek', 'post'],
+    'company_people': ['zarząd', 'krs', 'właściciel', 'prezes', 'udziały', 'wspólnik'],
+    'registered_users': ['użytkownik', 'kto jest', 'profil', 'zarejestrowany', 'członek'],
+    'social_media': ['facebook', 'instagram', 'linkedin', 'social media', 'media społeczn'],
+    'audits': ['seo', 'google', 'gbp', 'opinie', 'ocena', 'pagespeed'],
+}
+
+# Model selection by complexity
+MODEL_MAP = {
+    'simple': {'model': '3.1-flash-lite', 'thinking': 'minimal'},
+    'medium': {'model': '3-flash', 'thinking': 'low'},
+    'complex': {'model': '3-flash', 'thinking': 'high'},
+}
+
+ROUTER_PROMPT = """Jesteś routerem zapytań. Przeanalizuj pytanie i zdecyduj jakie dane są potrzebne.
+
+Użytkownik: {user_name} z firmy {company_name}
+Pytanie: {message}
+
+Zwróć TYLKO JSON (bez markdown):
+{{
+  "complexity": "simple|medium|complex",
+  "data_needed": ["lista kategorii z poniższych"]
+}}
+
+Kategorie:
+- companies_all — wszystkie firmy (porównania, przeglądy, "ile firm")
+- companies_filtered:KATEGORIA — firmy z kategorii (np. 
companies_filtered:IT)
+- companies_single:NAZWA — jedna firma (np. companies_single:termo)
+- events — nadchodzące wydarzenia
+- news — aktualności, PEJ, ZOPK
+- classifieds — ogłoszenia B2B
+- forum — tematy forum
+- company_people — zarząd, KRS, udziałowcy
+- registered_users — użytkownicy portalu
+- social_media — profile social media firm
+- audits — wyniki SEO/GBP
+
+Zasady:
+- "simple" = jedno pytanie o konkretną rzecz (telefon, adres, link)
+- "medium" = porównanie, lista, filtrowanie
+- "complex" = analiza, strategia, rekomendacje
+- Wybierz MINIMUM kategorii. Nie ładuj niepotrzebnych danych.
+- Jeśli pytanie dotyczy konkretnej firmy, użyj companies_single:nazwa
+- Pytania ogólne o użytkownika (kim jestem, co wiesz) = [] (dane z profilu wystarczą)
+"""
+
+
+def route_query_fast(message: str, user_context: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
+    """
+    Fast keyword-based routing. No API call.
+    Returns routing decision or None if uncertain (needs AI router).
+    """
+    msg_lower = message.lower()
+
+    # Check for personal questions — no data needed
+    personal_patterns = ['kim jestem', 'co wiesz o mnie', 'mój profil', 'moje dane']
+    if any(p in msg_lower for p in personal_patterns):
+        return {
+            'complexity': 'simple',
+            'data_needed': [],
+            'model': '3.1-flash-lite',
+            'thinking': 'minimal',
+            'routed_by': 'fast'
+        }
+
+    # Check for greetings — no data needed
+    greeting_patterns = ['cześć', 'hej', 'witam', 'dzień dobry', 'siema', 'hello']
+    if any(msg_lower.strip().startswith(p) for p in greeting_patterns) and len(message) < 30:
+        return {
+            'complexity': 'simple',
+            'data_needed': [],
+            'model': '3.1-flash-lite',
+            'thinking': 'minimal',
+            'routed_by': 'fast'
+        }
+
+    # Check keyword matches
+    matched_categories = []
+    for category, keywords in FAST_ROUTES.items():
+        if any(kw in msg_lower for kw in keywords):
+            matched_categories.append(category)
+
+    # Check for specific company name mention
+    # Simple heuristic: if message has quotes or specific 
capitalized words + if not matched_categories: + # Can't determine — return None to trigger AI router + return None + + # Determine complexity + if len(matched_categories) <= 1 and len(message) < 80: + complexity = 'simple' + elif len(matched_categories) <= 2: + complexity = 'medium' + else: + complexity = 'complex' + + model_config = MODEL_MAP[complexity] + return { + 'complexity': complexity, + 'data_needed': matched_categories, + 'model': model_config['model'], + 'thinking': model_config['thinking'], + 'routed_by': 'fast' + } + + +def route_query_ai( + message: str, + user_context: Optional[Dict] = None, + gemini_service=None +) -> Dict[str, Any]: + """ + AI-powered routing using Flash-Lite. Called when fast routing is uncertain. + """ + if not gemini_service: + # Fallback: load everything + return _fallback_route() + + user_name = user_context.get('user_name', 'Nieznany') if user_context else 'Nieznany' + company_name = user_context.get('company_name', 'brak') if user_context else 'brak' + + prompt = ROUTER_PROMPT.format( + user_name=user_name, + company_name=company_name, + message=message + ) + + try: + start = time.time() + response = gemini_service.generate_text( + prompt=prompt, + temperature=0.1, + max_tokens=200, + model='gemini-3.1-flash-lite-preview', + thinking_level='minimal', + feature='smart_router' + ) + latency = int((time.time() - start) * 1000) + logger.info(f"Smart Router AI response in {latency}ms: {response[:200]}") + + # Parse JSON from response + # Handle potential markdown wrapping + text = response.strip() + if text.startswith('```'): + text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() + + result = json.loads(text) + complexity = result.get('complexity', 'medium') + model_config = MODEL_MAP.get(complexity, MODEL_MAP['medium']) + + return { + 'complexity': complexity, + 'data_needed': result.get('data_needed', []), + 'model': model_config['model'], + 'thinking': model_config['thinking'], + 'routed_by': 'ai', + 'router_latency_ms': 
latency + } + + except (json.JSONDecodeError, KeyError, Exception) as e: + logger.warning(f"Smart Router AI failed: {e}, falling back to full context") + return _fallback_route() + + +def route_query( + message: str, + user_context: Optional[Dict] = None, + gemini_service=None +) -> Dict[str, Any]: + """ + Main entry point. Tries fast routing first, falls back to AI routing. + """ + # Try fast keyword-based routing + result = route_query_fast(message, user_context) + if result is not None: + logger.info(f"Smart Router FAST: complexity={result['complexity']}, data={result['data_needed']}") + return result + + # Fall back to AI routing + result = route_query_ai(message, user_context, gemini_service) + logger.info(f"Smart Router AI: complexity={result['complexity']}, data={result['data_needed']}") + return result + + +def _fallback_route() -> Dict[str, Any]: + """Fallback: load everything, use default model. Safe but slow.""" + return { + 'complexity': 'medium', + 'data_needed': [ + 'companies_all', 'events', 'news', 'classifieds', + 'forum', 'company_people', 'registered_users' + ], + 'model': '3-flash', + 'thinking': 'low', + 'routed_by': 'fallback' + } +``` + +- [ ] **Step 2: Verify syntax** + +```bash +python3 -m py_compile smart_router.py && echo "OK" +``` + +- [ ] **Step 3: Commit** + +```bash +git add smart_router.py +git commit -m "feat(nordagpt): add smart_router.py — fast keyword routing + AI fallback" +``` + +--- + +### Task 6: Integrate Smart Router into nordabiz_chat.py + +**Files:** +- Modify: `nordabiz_chat.py:163-282, 347-643, 890-1365` + +- [ ] **Step 1: Add imports at top of nordabiz_chat.py** + +After existing imports (around line 30), add: + +```python +from smart_router import route_query +from context_builder import build_selective_context +``` + +- [ ] **Step 2: Modify send_message() to use Smart Router** + +In `send_message()`, replace the call to `_build_conversation_context()` and `_query_ai()` (around lines 236-239). 
The key change: use the router to decide model and data, then use context_builder for selective loading. + +Find the section where context is built and AI is queried (around lines 236-241): + +```python +# Before (approximately lines 236-241): +# context = self._build_conversation_context(db, conversation, original_message) +# ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context) + +# After: +# Smart Router — classify query and select data + model +route_decision = route_query( + message=original_message, + user_context=user_context, + gemini_service=self.gemini_service +) + +# Override model and thinking based on router decision +effective_model = route_decision.get('model', '3-flash') +effective_thinking = route_decision.get('thinking', thinking_level) + +# Build selective context (only requested data categories) +context = build_selective_context( + data_needed=route_decision.get('data_needed', []), + conversation_id=conversation.id, + current_message=original_message, + user_context=user_context +) + +# Use the original _query_ai but with router-selected parameters +ai_response_text = self._query_ai( + context, original_message, + user_id=user_id, + thinking_level=effective_thinking, + user_context=user_context +) +``` + +Note: Keep `_build_conversation_context()` and full `_query_ai()` intact as fallback. The router's `_fallback_route()` loads all data, so it's safe. 
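Because the fast-routing path is pure string logic, the router decision shape can be unit-tested without any API calls or app context. A standalone approximation for illustration: it inlines a two-category slice of the Task 5 keyword table instead of importing `smart_router`, which only exists once Task 5 is done:

```python
# Standalone approximation of route_query_fast's greeting/keyword branches
# (inlined here because smart_router.py is created elsewhere in this plan).
FAST_ROUTES = {
    'events': ['wydarzenie', 'kalendarz'],
    'classifieds': ['ogłoszenie', 'b2b'],
}

def route_query_fast(message):
    msg = message.lower()
    # Short greetings need no portal data at all
    if any(msg.strip().startswith(g) for g in ('cześć', 'hej', 'witam')) and len(message) < 30:
        return {'complexity': 'simple', 'data_needed': [], 'routed_by': 'fast'}
    matched = [cat for cat, kws in FAST_ROUTES.items() if any(k in msg for k in kws)]
    if not matched:
        return None  # uncertain: caller falls back to the AI router
    return {'complexity': 'simple' if len(matched) == 1 else 'medium',
            'data_needed': matched, 'routed_by': 'fast'}

assert route_query_fast('Cześć!')['data_needed'] == []
assert route_query_fast('Jakie wydarzenie jest w kalendarzu?') == {
    'complexity': 'simple', 'data_needed': ['events'], 'routed_by': 'fast'}
assert route_query_fast('Strategia ekspansji?') is None  # AI router takes over
```

A real test module for Task 5 would follow the same pattern against the actual `smart_router.route_query_fast`, including the `None` fallback case that triggers the AI router.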
+
+- [ ] **Step 3: Log routing decisions**
+
+After the route_query call, add logging:
+
+```python
+logger.info(
+    f"NordaGPT Router: user={user_context.get('user_name') if user_context else '?'}, "
+    f"complexity={route_decision['complexity']}, model={effective_model}, "
+    f"thinking={effective_thinking}, data={route_decision['data_needed']}, "
+    f"routed_by={route_decision.get('routed_by')}"
+)
+```
+
+- [ ] **Step 4: Update the GeminiService call in _query_ai() to use effective model**
+
+Currently `_query_ai()` uses `self.gemini_service`, which has a fixed model. We need to pass the router-selected model to the generate_text call, but `route_decision` lives in `send_message()`, not in `_query_ai()`. The cleanest way to bridge that gap is to pass the decision through the context dict.
+
+In `send_message()`, add to context before calling `_query_ai()`:
+```python
+context['_route_decision'] = route_decision
+```
+
+In `_query_ai()`, read it at the generate_text call (around line 1352):
+```python
+route = context.get('_route_decision', {})
+effective_model_id = None
+model_alias = route.get('model')
+if model_alias:
+    from gemini_service import GEMINI_MODELS
+    effective_model_id = GEMINI_MODELS.get(model_alias)
+
+response = self.gemini_service.generate_text(
+    prompt=full_prompt,
+    temperature=0.7,
+    thinking_level=thinking_level,
+    user_id=user_id,
+    feature='chat',
+    model=effective_model_id
+)
+```
+
+- [ ] **Step 5: Verify syntax**
+
+```bash
+python3 -m py_compile nordabiz_chat.py && echo "OK"
+```
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add nordabiz_chat.py
+git commit -m 
"feat(nordagpt): integrate smart router — selective context loading + adaptive model selection"
+```
+
+---
+
+### Task 7: Deploy Phase 2 and verify
+
+- [ ] **Step 1: Push and deploy to staging**
+
+```bash
+git push origin master && git push inpi master
+ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
+```
+
+- [ ] **Step 2: Test on staging — verify routing works**
+
+Test simple query: "Jaki jest telefon do TERMO?" — should be fast (2-3s), Flash-Lite model.
+Test medium query: "Porównaj firmy budowlane w Izbie" — should load companies_all, medium speed.
+Test complex query: "Jakie firmy mogłyby współpracować przy projekcie PEJ?" — should use full context.
+
+Check logs for routing decisions:
+```bash
+ssh maciejpi@10.22.68.248 "journalctl -u nordabiznes -n 30 --no-pager | grep 'Router'"
+```
+
+- [ ] **Step 3: Deploy to production**
+
+```bash
+ssh maciejpi@10.22.68.249 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
+curl -sI https://nordabiznes.pl/health | head -3
+```
+
+---
+
+## Phase 3: Streaming Responses (Tasks 8-10)
+
+### Task 8: Add streaming endpoint in Flask
+
+**Files:**
+- Modify: `blueprints/chat/routes.py`
+- Modify: `nordabiz_chat.py`
+
+- [ ] **Step 1: Add SSE streaming endpoint**
+
+In `blueprints/chat/routes.py`, add a new route after `chat_send_message()` (after line ~309):
+
+```python
+@bp.route('/api/chat/<int:conversation_id>/message/stream', methods=['POST'])
+@login_required
+@member_required
+def chat_send_message_stream(conversation_id):
+    """Send message to AI chat with streaming response (SSE)"""
+    from flask import Response, stream_with_context
+    import json as json_module
+
+    data = request.get_json()
+    if not data or not data.get('message', '').strip():
+        return jsonify({'error': 'Wiadomość nie może być pusta'}), 400
+
+    message = data['message'].strip()
+
+    # Check limits
+    from nordabiz_chat import check_user_limits
+    limit_result = 
check_user_limits(current_user.id, current_user.email) + if limit_result.get('limited'): + return jsonify({'error': 'Przekroczono limit', 'limit_info': limit_result}), 429 + + # Build user context + user_context = { + 'user_id': current_user.id, + 'user_name': current_user.name, + 'user_email': current_user.email, + 'company_name': current_user.company.name if current_user.company else None, + 'company_id': current_user.company.id if current_user.company else None, + 'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None, + 'company_role': current_user.company_role or 'MEMBER', + 'is_norda_member': current_user.is_norda_member, + 'chamber_role': current_user.chamber_role, + 'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None, + } + + model_choice = data.get('model') or session.get('chat_model', 'flash') + model_key = '3-flash' if model_choice == 'flash' else '3-pro' + + def generate(): + try: + chat_engine = NordaBizChatEngine(model=model_key) + for chunk in chat_engine.send_message_stream( + conversation_id=conversation_id, + user_message=message, + user_id=current_user.id, + user_context=user_context + ): + yield f"data: {json_module.dumps(chunk, ensure_ascii=False)}\n\n" + except PermissionError: + yield f"data: {json_module.dumps({'type': 'error', 'content': 'Brak dostępu do tej konwersacji'})}\n\n" + except Exception as e: + logger.error(f"Streaming error: {e}") + yield f"data: {json_module.dumps({'type': 'error', 'content': 'Wystąpił błąd'})}\n\n" + + return Response( + stream_with_context(generate()), + mimetype='text/event-stream', + headers={ + 'Cache-Control': 'no-cache', + 'X-Accel-Buffering': 'no', # Disable Nginx buffering + } + ) +``` + +- [ ] **Step 2: Add send_message_stream() to NordaBizChatEngine** + +In `nordabiz_chat.py`, add a new method after `send_message()` (after line ~282): + +```python +def send_message_stream( + self, + 
conversation_id: int, + user_message: str, + user_id: int, + user_context: Optional[Dict[str, Any]] = None +): + """ + Generator that yields streaming chunks for SSE. + Yields dicts: {'type': 'thinking'|'token'|'done'|'error', 'content': '...'} + """ + import time + + db = SessionLocal() + try: + conversation = db.query(AIChatConversation).filter_by( + id=conversation_id, user_id=user_id + ).first() + if not conversation: + yield {'type': 'error', 'content': 'Konwersacja nie znaleziona'} + return + + # Save user message + original_message = user_message + sanitized = self._sanitize_message(user_message) + user_msg = AIChatMessage( + conversation_id=conversation_id, + role='user', + content=sanitized + ) + db.add(user_msg) + db.commit() + + # Smart Router + route_decision = route_query( + message=original_message, + user_context=user_context, + gemini_service=self.gemini_service + ) + + yield {'type': 'thinking', 'content': 'Analizuję pytanie...'} + + # Build selective context + context = build_selective_context( + data_needed=route_decision.get('data_needed', []), + conversation_id=conversation.id, + current_message=original_message, + user_context=user_context + ) + context['_route_decision'] = route_decision + + # Build prompt (reuse _query_ai logic for prompt building) + full_prompt = self._build_prompt(context, original_message, user_context, route_decision.get('thinking', 'low')) + + # Get effective model + from gemini_service import GEMINI_MODELS + model_alias = route_decision.get('model', '3-flash') + effective_model = GEMINI_MODELS.get(model_alias, self.model_name) + + # Stream from Gemini + start_time = time.time() + stream_response = self.gemini_service.generate_text( + prompt=full_prompt, + temperature=0.7, + stream=True, + thinking_level=route_decision.get('thinking', 'low'), + user_id=user_id, + feature='chat_stream', + model=effective_model + ) + + full_text = "" + for chunk in stream_response: + if hasattr(chunk, 'text') and chunk.text: + full_text 
+= chunk.text + yield {'type': 'token', 'content': chunk.text} + + latency_ms = int((time.time() - start_time) * 1000) + + # Save AI response to DB + ai_msg = AIChatMessage( + conversation_id=conversation_id, + role='assistant', + content=full_text, + latency_ms=latency_ms + ) + db.add(ai_msg) + conversation.updated_at = datetime.now() + conversation.message_count = (conversation.message_count or 0) + 2 + db.commit() + + yield { + 'type': 'done', + 'message_id': ai_msg.id, + 'latency_ms': latency_ms, + 'model': model_alias, + 'complexity': route_decision.get('complexity') + } + + except Exception as e: + logger.error(f"Stream error: {e}", exc_info=True) + yield {'type': 'error', 'content': 'Wystąpił błąd podczas generowania odpowiedzi'} + finally: + db.close() +``` + +- [ ] **Step 3: Extract prompt building into reusable method** + +Add a `_build_prompt()` method to `NordaBizChatEngine` that extracts prompt construction from `_query_ai()`. This method builds the full prompt string without calling Gemini: + +```python +def _build_prompt( + self, + context: Dict[str, Any], + user_message: str, + user_context: Optional[Dict[str, Any]] = None, + thinking_level: str = 'low' +) -> str: + """Build the full prompt string. 
Extracted from _query_ai() for reuse in streaming.""" + # Build user identity section + user_identity = "" + if user_context: + user_identity = f""" +# AKTUALNY UŻYTKOWNIK +Rozmawiasz z: {user_context.get('user_name', 'Nieznany')} +Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')} +Rola w firmie: {user_context.get('company_role', 'MEMBER')} +Członek Izby: {'tak' if user_context.get('is_norda_member') else 'nie'} +Rola w Izbie: {user_context.get('chamber_role') or '—'} +Na portalu od: {user_context.get('member_since', 'nieznana data')} +""" + + # Reuse the existing system_prompt from _query_ai() lines 922-1134 + # This is the same static prompt — extract it to a class attribute or method + # For now, call _query_ai's prompt logic + # NOTE: In implementation, refactor the static prompt into a separate method + # to avoid duplication. The key point is that _build_prompt returns the + # same prompt string that _query_ai would build. + + # ... (reuse existing system prompt construction logic) ... + + return full_prompt +``` + +**Implementation note:** The actual implementation should refactor `_query_ai()` to call `_build_prompt()` internally, then the streaming method also calls `_build_prompt()`. This avoids prompt duplication. 
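+
+The SSE framing produced by `generate()` in Step 1 (`data: <json>\n\n` per event) is easy to mishandle on the client, because a single network read can end in the middle of a frame. The sketch below is a standalone model of that wire format, not part of the codebase: it serializes chunk dicts the same way the endpoint does, then shows that a client which buffers partial data reassembles events correctly even when a read boundary splits a frame.

```python
import json


def sse_frames(chunks):
    """Serialize chunk dicts the way the endpoint's generate() does."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n"


def parse_sse(parts):
    """Reassemble events even when reads split a frame mid-line."""
    buffer = ""
    for part in parts:
        buffer += part
        # A complete SSE event ends with a blank line ("\n\n")
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            if frame.startswith("data: "):
                yield json.loads(frame[6:])


frames = "".join(sse_frames([
    {'type': 'token', 'content': 'Dzień '},
    {'type': 'done', 'latency_ms': 1200},
]))
# Simulate a network read boundary falling inside the first frame
stream = [frames[:10], frames[10:]]
events = list(parse_sse(stream))
```

The JavaScript client in Task 9 must apply the same buffering idea (keep the incomplete tail of each read for the next iteration); splitting each read on `\n` in isolation silently drops tokens whose frame straddles two reads.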
+ +- [ ] **Step 4: Verify syntax** + +```bash +python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK" +``` + +- [ ] **Step 5: Commit** + +```bash +git add nordabiz_chat.py blueprints/chat/routes.py +git commit -m "feat(nordagpt): add streaming SSE endpoint + send_message_stream method" +``` + +--- + +### Task 9: Frontend streaming UI + +**Files:** +- Modify: `templates/chat.html` + +- [ ] **Step 1: Add streaming sendMessage function** + +In `templates/chat.html`, replace the existing `sendMessage()` function (lines 2373-2454) with a streaming version: + +```javascript +async function sendMessage() { + const input = document.getElementById('messageInput'); + const message = input.value.trim(); + if (!message || isSending) return; + + isSending = true; + document.getElementById('sendBtn').disabled = true; + input.value = ''; + autoResizeTextarea(); + + // Add user message to chat + addMessage('user', message); + + // Create conversation if needed + if (!currentConversationId) { + try { + const startRes = await fetch('/api/chat/start', { + method: 'POST', + headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken}, + body: JSON.stringify({title: message.substring(0, 50)}) + }); + const startData = await startRes.json(); + currentConversationId = startData.conversation_id; + } catch (e) { + addMessage('assistant', 'Błąd tworzenia konwersacji.'); + isSending = false; + document.getElementById('sendBtn').disabled = false; + return; + } + } + + // Add empty assistant bubble with thinking animation + const msgDiv = document.createElement('div'); + msgDiv.className = 'message assistant'; + msgDiv.innerHTML = ` +
        <div class="message-avatar">AI</div>
+        <div class="message-content">
+            <div class="thinking-dots">
+                <span>.</span><span>.</span><span>.</span>
+            </div>
+        </div>
+    `;
+    document.getElementById('chatMessages').appendChild(msgDiv);
+    scrollToBottom();
+
+    const contentDiv = msgDiv.querySelector('.message-content');
+
+    try {
+        const response = await fetch(`/api/chat/${currentConversationId}/message/stream`, {
+            method: 'POST',
+            headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
+            body: JSON.stringify({message: message, model: currentModel})
+        });
+
+        if (response.status === 429) {
+            contentDiv.innerHTML = '';
+            contentDiv.textContent = 'Przekroczono limit zapytań.';
+            showLimitBanner();
+            isSending = false;
+            document.getElementById('sendBtn').disabled = false;
+            return;
+        }
+
+        const reader = response.body.getReader();
+        const decoder = new TextDecoder();
+        let fullText = '';
+        let thinkingRemoved = false;
+        let buffer = '';  // holds an incomplete SSE line across reads
+
+        while (true) {
+            const {done, value} = await reader.read();
+            if (done) break;
+
+            // A read can end mid-line, so keep the unfinished tail for the next read
+            buffer += decoder.decode(value, {stream: true});
+            const lines = buffer.split('\n');
+            buffer = lines.pop();
+
+            for (const line of lines) {
+                if (!line.startsWith('data: ')) continue;
+                try {
+                    const chunk = JSON.parse(line.slice(6));
+
+                    if (chunk.type === 'thinking') {
+                        // Keep thinking dots visible
+                        continue;
+                    }
+
+                    if (chunk.type === 'token') {
+                        if (!thinkingRemoved) {
+                            contentDiv.innerHTML = '';
+                            thinkingRemoved = true;
+                        }
+                        fullText += chunk.content;
+                        contentDiv.innerHTML = formatMessage(fullText);
+                        scrollToBottom();
+                    }
+
+                    if (chunk.type === 'done') {
+                        // Add tech info badge
+                        if (chunk.latency_ms) {
+                            const badge = document.createElement('div');
+                            badge.className = 'thinking-info-badge';
+                            badge.textContent = `${chunk.model || 'AI'} · ${(chunk.latency_ms/1000).toFixed(1)}s`;
+                            msgDiv.appendChild(badge);
+                        }
+                        loadConversations();
+                    }
+
+                    if (chunk.type === 'error') {
+                        contentDiv.innerHTML = '';
+                        contentDiv.textContent = chunk.content || 'Wystąpił błąd';
+                    }
+                } catch (e) {
+                    // Skip malformed chunks
+                }
+            }
+        }
+    } catch (e) {
+        contentDiv.innerHTML = '';
+        contentDiv.textContent = 'Błąd połączenia z serwerem.';
+    }
+ + isSending = false; + document.getElementById('sendBtn').disabled = false; +} +``` + +- [ ] **Step 2: Add CSS for thinking animation** + +In `templates/chat.html`, in the `{% block extra_css %}` section, add: + +```css +.thinking-dots { + display: flex; + gap: 4px; + padding: 8px 0; +} + +.thinking-dots span { + animation: thinkBounce 1.4s infinite ease-in-out both; + font-size: 1.5rem; + color: var(--text-secondary); +} + +.thinking-dots span:nth-child(1) { animation-delay: -0.32s; } +.thinking-dots span:nth-child(2) { animation-delay: -0.16s; } +.thinking-dots span:nth-child(3) { animation-delay: 0s; } + +@keyframes thinkBounce { + 0%, 80%, 100% { transform: scale(0); } + 40% { transform: scale(1); } +} +``` + +- [ ] **Step 3: Verify locally and commit** + +```bash +python3 -m py_compile app.py && echo "OK" +git add templates/chat.html +git commit -m "feat(nordagpt): streaming UI — word-by-word response with thinking animation" +``` + +--- + +### Task 10: Deploy Phase 3 and verify streaming + +- [ ] **Step 1: Check Nginx/NPM config for SSE support** + +SSE requires Nginx to NOT buffer the response. The streaming endpoint sets `X-Accel-Buffering: no` header. Verify NPM custom config allows this: + +```bash +ssh maciejpi@10.22.68.249 "cat /etc/nginx/sites-enabled/nordabiznes.conf 2>/dev/null || echo 'Using NPM proxy'" +``` + +If using NPM, the `X-Accel-Buffering: no` header should be sufficient. If not, add to NPM custom Nginx config for nordabiznes.pl: +``` +proxy_buffering off; +proxy_cache off; +``` + +- [ ] **Step 2: Push, deploy to staging, test streaming** + +```bash +git push origin master && git push inpi master +ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" +``` + +Test on staging: open chat, send message, verify text appears word-by-word. 
+ +- [ ] **Step 3: Deploy to production** + +```bash +ssh maciejpi@10.22.68.249 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" +curl -sI https://nordabiznes.pl/health | head -3 +``` + +--- + +## Phase 4: Persistent User Memory (Tasks 11-15) + +### Task 11: Database migration — memory tables + +**Files:** +- Create: `database/migrations/092_ai_user_memory.sql` +- Create: `database/migrations/093_ai_conversation_summary.sql` + +- [ ] **Step 1: Create migration 092** + +```sql +-- 092_ai_user_memory.sql +-- Persistent memory for NordaGPT — per-user facts extracted from conversations + +CREATE TABLE IF NOT EXISTS ai_user_memory ( + id SERIAL PRIMARY KEY, + user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, + fact TEXT NOT NULL, + category VARCHAR(50) DEFAULT 'general', + source_conversation_id INTEGER REFERENCES ai_chat_conversations(id) ON DELETE SET NULL, + confidence FLOAT DEFAULT 1.0, + created_at TIMESTAMP DEFAULT NOW(), + expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '12 months'), + is_active BOOLEAN DEFAULT TRUE +); + +CREATE INDEX idx_ai_user_memory_user_active ON ai_user_memory(user_id, is_active, confidence DESC); +CREATE INDEX idx_ai_user_memory_expires ON ai_user_memory(expires_at) WHERE is_active = TRUE; + +GRANT ALL ON TABLE ai_user_memory TO nordabiz_app; +GRANT USAGE, SELECT ON SEQUENCE ai_user_memory_id_seq TO nordabiz_app; +``` + +- [ ] **Step 2: Create migration 093** + +```sql +-- 093_ai_conversation_summary.sql +-- Auto-generated summaries of AI conversations for memory context + +CREATE TABLE IF NOT EXISTS ai_conversation_summary ( + id SERIAL PRIMARY KEY, + conversation_id INTEGER NOT NULL UNIQUE REFERENCES ai_chat_conversations(id) ON DELETE CASCADE, + user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, + summary TEXT NOT NULL, + key_topics JSONB DEFAULT '[]', + created_at TIMESTAMP DEFAULT NOW(), + updated_at TIMESTAMP DEFAULT NOW() +); + +CREATE INDEX 
idx_ai_conv_summary_user ON ai_conversation_summary(user_id, created_at DESC); + +GRANT ALL ON TABLE ai_conversation_summary TO nordabiz_app; +GRANT USAGE, SELECT ON SEQUENCE ai_conversation_summary_id_seq TO nordabiz_app; +``` + +- [ ] **Step 3: Commit migrations** + +```bash +git add database/migrations/092_ai_user_memory.sql database/migrations/093_ai_conversation_summary.sql +git commit -m "feat(nordagpt): add migrations for user memory and conversation summary tables" +``` + +--- + +### Task 12: Add SQLAlchemy models + +**Files:** +- Modify: `database.py` (insert before line 5954) + +- [ ] **Step 1: Add AIUserMemory model** + +Insert before the `# DATABASE INITIALIZATION` comment (line 5954): + +```python +class AIUserMemory(Base): + __tablename__ = 'ai_user_memory' + + id = Column(Integer, primary_key=True) + user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False) + fact = Column(Text, nullable=False) + category = Column(String(50), default='general') + source_conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='SET NULL'), nullable=True) + confidence = Column(Float, default=1.0) + created_at = Column(DateTime, default=datetime.utcnow) + expires_at = Column(DateTime) + is_active = Column(Boolean, default=True) + + user = relationship('User') + source_conversation = relationship('AIChatConversation') + + +class AIConversationSummary(Base): + __tablename__ = 'ai_conversation_summary' + + id = Column(Integer, primary_key=True) + conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='CASCADE'), nullable=False, unique=True) + user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False) + summary = Column(Text, nullable=False) + key_topics = Column(JSON, default=list) + created_at = Column(DateTime, default=datetime.utcnow) + updated_at = Column(DateTime, default=datetime.utcnow) + + user = relationship('User') + conversation = 
relationship('AIChatConversation') +``` + +- [ ] **Step 2: Verify syntax** + +```bash +python3 -m py_compile database.py && echo "OK" +``` + +- [ ] **Step 3: Commit** + +```bash +git add database.py +git commit -m "feat(nordagpt): add AIUserMemory and AIConversationSummary ORM models" +``` + +--- + +### Task 13: Create memory_service.py + +**Files:** +- Create: `memory_service.py` + +- [ ] **Step 1: Create memory_service.py** + +```python +""" +Memory Service for NordaGPT +============================= +Manages persistent per-user memory: fact extraction, storage, retrieval, cleanup. +""" + +import json +import logging +from datetime import datetime, timedelta +from typing import Dict, Any, List, Optional + +from database import SessionLocal, AIUserMemory, AIConversationSummary, AIChatMessage + +logger = logging.getLogger(__name__) + +EXTRACT_FACTS_PROMPT = """Na podstawie tej rozmowy wyciągnij kluczowe fakty o użytkowniku {user_name} ({company_name}). + +Rozmowa: +{conversation_text} + +Istniejące fakty (NIE DUPLIKUJ): +{existing_facts} + +Zwróć TYLKO JSON array (bez markdown): +[{{"fact": "...", "category": "interests|needs|contacts|insights"}}] + +Zasady: +- Tylko nowe, nietrywialne fakty przydatne w przyszłych rozmowach +- Nie zapisuj: "zapytał o firmę X" (to za mało) +- Zapisuj: "szuka podwykonawców do projektu PEJ w branży elektrycznej" +- Max 3 fakty. Jeśli nie ma nowych faktów, zwróć [] +- Kategorie: interests (zainteresowania), needs (potrzeby biznesowe), contacts (kontakty), insights (wnioski/preferencje) +""" + +SUMMARIZE_PROMPT = """Podsumuj tę rozmowę w 1-3 zdaniach. Skup się na tym, czego użytkownik szukał i co ustalono. 
+ +Rozmowa: +{conversation_text} + +Zwróć TYLKO JSON (bez markdown): +{{"summary": "...", "key_topics": ["temat1", "temat2"]}} +""" + + +def get_user_memory(user_id: int, limit: int = 10) -> List[Dict]: + """Get active memory facts for a user, sorted by recency and confidence.""" + db = SessionLocal() + try: + facts = db.query(AIUserMemory).filter( + AIUserMemory.user_id == user_id, + AIUserMemory.is_active == True, + AIUserMemory.expires_at > datetime.now() + ).order_by( + AIUserMemory.confidence.desc(), + AIUserMemory.created_at.desc() + ).limit(limit).all() + + return [ + { + 'id': f.id, + 'fact': f.fact, + 'category': f.category, + 'confidence': f.confidence, + 'created_at': f.created_at.isoformat() + } + for f in facts + ] + finally: + db.close() + + +def get_conversation_summaries(user_id: int, limit: int = 5) -> List[Dict]: + """Get recent conversation summaries for a user.""" + db = SessionLocal() + try: + summaries = db.query(AIConversationSummary).filter( + AIConversationSummary.user_id == user_id + ).order_by( + AIConversationSummary.created_at.desc() + ).limit(limit).all() + + return [ + { + 'summary': s.summary, + 'topics': s.key_topics or [], + 'date': s.created_at.strftime('%Y-%m-%d') + } + for s in summaries + ] + finally: + db.close() + + +def format_memory_for_prompt(user_id: int) -> str: + """Format user memory and summaries for injection into AI prompt.""" + facts = get_user_memory(user_id) + summaries = get_conversation_summaries(user_id) + + if not facts and not summaries: + return "" + + parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"] + + if facts: + parts.append("Znane fakty:") + for f in facts: + parts.append(f"- [{f['category']}] {f['fact']}") + + if summaries: + parts.append("\nOstatnie rozmowy:") + for s in summaries: + topics = ", ".join(s['topics'][:3]) if s['topics'] else "" + parts.append(f"- {s['date']}: {s['summary']}" + (f" (tematy: {topics})" if topics else "")) + + parts.append("\nWykorzystuj tę wiedzę do personalizacji odpowiedzi. 
Nawiązuj do wcześniejszych rozmów gdy to naturalne.") + + return "\n".join(parts) + + +def extract_facts_async( + conversation_id: int, + user_id: int, + user_context: Dict, + gemini_service +): + """ + Extract memory facts from a conversation. Run async after response is sent. + Uses Flash-Lite for minimal cost. + """ + db = SessionLocal() + try: + # Get conversation messages + messages = db.query(AIChatMessage).filter_by( + conversation_id=conversation_id + ).order_by(AIChatMessage.created_at).all() + + if len(messages) < 2: + return # Too short to extract + + conversation_text = "\n".join([ + f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content}" + for m in messages[-10:] # Last 10 messages + ]) + + # Get existing facts to avoid duplicates + existing = db.query(AIUserMemory).filter( + AIUserMemory.user_id == user_id, + AIUserMemory.is_active == True + ).all() + existing_text = "\n".join([f"- {f.fact}" for f in existing]) or "Brak" + + prompt = EXTRACT_FACTS_PROMPT.format( + user_name=user_context.get('user_name', 'Nieznany'), + company_name=user_context.get('company_name', 'brak'), + conversation_text=conversation_text, + existing_facts=existing_text + ) + + response = gemini_service.generate_text( + prompt=prompt, + temperature=0.1, + max_tokens=300, + model='gemini-3.1-flash-lite-preview', + thinking_level='minimal', + feature='memory_extraction' + ) + + # Parse response + text = response.strip() + if text.startswith('```'): + text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() + + facts = json.loads(text) + if not isinstance(facts, list): + return + + for fact_data in facts[:3]: + if not fact_data.get('fact'): + continue + memory = AIUserMemory( + user_id=user_id, + fact=fact_data['fact'], + category=fact_data.get('category', 'general'), + source_conversation_id=conversation_id, + expires_at=datetime.now() + timedelta(days=365) + ) + db.add(memory) + + db.commit() + logger.info(f"Extracted {len(facts)} memory facts for user {user_id}") + + 
except Exception as e: + logger.warning(f"Memory extraction failed for conversation {conversation_id}: {e}") + db.rollback() + finally: + db.close() + + +def summarize_conversation_async( + conversation_id: int, + user_id: int, + gemini_service +): + """Generate or update conversation summary. Run async.""" + db = SessionLocal() + try: + messages = db.query(AIChatMessage).filter_by( + conversation_id=conversation_id + ).order_by(AIChatMessage.created_at).all() + + if len(messages) < 2: + return + + conversation_text = "\n".join([ + f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content[:200]}" + for m in messages[-10:] + ]) + + prompt = SUMMARIZE_PROMPT.format(conversation_text=conversation_text) + + response = gemini_service.generate_text( + prompt=prompt, + temperature=0.1, + max_tokens=200, + model='gemini-3.1-flash-lite-preview', + thinking_level='minimal', + feature='conversation_summary' + ) + + text = response.strip() + if text.startswith('```'): + text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() + + result = json.loads(text) + + existing = db.query(AIConversationSummary).filter_by( + conversation_id=conversation_id + ).first() + + if existing: + existing.summary = result.get('summary', existing.summary) + existing.key_topics = result.get('key_topics', existing.key_topics) + existing.updated_at = datetime.now() + else: + summary = AIConversationSummary( + conversation_id=conversation_id, + user_id=user_id, + summary=result.get('summary', ''), + key_topics=result.get('key_topics', []) + ) + db.add(summary) + + db.commit() + logger.info(f"Summarized conversation {conversation_id}") + + except Exception as e: + logger.warning(f"Conversation summary failed for {conversation_id}: {e}") + db.rollback() + finally: + db.close() + + +def delete_user_fact(user_id: int, fact_id: int) -> bool: + """Soft-delete a memory fact. 
Returns True if deleted."""
+    db = SessionLocal()
+    try:
+        fact = db.query(AIUserMemory).filter_by(id=fact_id, user_id=user_id).first()
+        if fact:
+            fact.is_active = False
+            db.commit()
+            return True
+        return False
+    finally:
+        db.close()
+```
+
+- [ ] **Step 2: Verify syntax**
+
+```bash
+python3 -m py_compile memory_service.py && echo "OK"
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add memory_service.py
+git commit -m "feat(nordagpt): add memory_service.py — fact extraction, summaries, CRUD"
+```
+
+---
+
+### Task 14: Integrate memory into chat flow
+
+**Files:**
+- Modify: `nordabiz_chat.py`
+- Modify: `blueprints/chat/routes.py`
+
+- [ ] **Step 1: Inject memory into system prompt**
+
+In `nordabiz_chat.py`, in the `_build_prompt()` or `_query_ai()` method, after the user identity block and before the data sections, add memory:
+
+```python
+from memory_service import format_memory_for_prompt
+
+# After user_identity block, before data injection:
+user_memory_text = ""
+if user_context and user_context.get('user_id'):
+    user_memory_text = format_memory_for_prompt(user_context['user_id'])
+
+# Prepend to system prompt:
+system_prompt = user_identity + user_memory_text + f"""Jesteś pomocnym asystentem..."""
+```
+
+- [ ] **Step 2: Trigger async memory extraction after response**
+
+In `send_message()` and `send_message_stream()`, after saving the AI response, trigger async extraction using threading:
+
+```python
+import threading
+from memory_service import extract_facts_async, summarize_conversation_async
+
+# After saving AI response to DB (end of send_message/send_message_stream):
+# Async memory extraction — don't block the response.
+# Read ORM attributes BEFORE spawning the thread: after commit the
+# conversation object's attributes are expired, and refreshing them from
+# another thread races with db.close().
+msg_count = conversation.message_count or 0
+
+def _extract_memory():
+    extract_facts_async(conversation_id, user_id, user_context, self.gemini_service)
+    # Summarize every 5 messages
+    if msg_count % 5 == 0:
+        summarize_conversation_async(conversation_id, user_id, self.gemini_service)
+
+threading.Thread(target=_extract_memory, 
daemon=True).start()
+```
+
+- [ ] **Step 3: Add memory CRUD API routes**
+
+In `blueprints/chat/routes.py`, add routes for viewing and deleting memory:
+
+```python
+@bp.route('/api/chat/memory', methods=['GET'])
+@login_required
+@member_required
+def get_user_memory_api():
+    """Get current user's NordaGPT memory facts and summaries"""
+    from memory_service import get_user_memory, get_conversation_summaries
+    return jsonify({
+        'facts': get_user_memory(current_user.id, limit=20),
+        'summaries': get_conversation_summaries(current_user.id, limit=10)
+    })
+
+
+@bp.route('/api/chat/memory/<int:fact_id>', methods=['DELETE'])
+@login_required
+@member_required
+def delete_memory_fact(fact_id):
+    """Delete a memory fact"""
+    from memory_service import delete_user_fact
+    if delete_user_fact(current_user.id, fact_id):
+        return jsonify({'status': 'ok'})
+    return jsonify({'error': 'Nie znaleziono'}), 404
+```
+
+- [ ] **Step 4: Verify syntax**
+
+```bash
+python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add nordabiz_chat.py blueprints/chat/routes.py
+git commit -m "feat(nordagpt): integrate memory into chat — injection, async extraction, CRUD API"
+```
+
+---
+
+### Task 15: Deploy Phase 4 — migrations + code
+
+- [ ] **Step 1: Push to remotes**
+
+```bash
+git push origin master && git push inpi master
+```
+
+- [ ] **Step 2: Deploy to staging with migrations**
+
+```bash
+ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull"
+ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
+ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
+ssh maciejpi@10.22.68.248 "sudo systemctl restart nordabiznes"
+```
+
+- [ ] **Step 3: Test on staging**
+
+1. Open chat, have a conversation about looking for IT companies +2. Open another chat, ask "o czym rozmawialiśmy?" — verify AI mentions previous topics +3. Check memory API: `curl https://staging.nordabiznes.pl/api/chat/memory` (with auth) +4. Verify facts are extracted + +- [ ] **Step 4: Deploy to production** + +```bash +ssh maciejpi@10.22.68.249 "cd /var/www/nordabiznes && sudo -u www-data git pull" +ssh maciejpi@10.22.68.249 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql" +ssh maciejpi@10.22.68.249 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql" +ssh maciejpi@10.22.68.249 "sudo systemctl restart nordabiznes" +curl -sI https://nordabiznes.pl/health | head -3 +``` + +- [ ] **Step 5: Update release notes** + +Add entry in `blueprints/public/routes.py` `_get_releases()`. + +--- + +## Post-Implementation Checklist + +- [ ] Verify AI greets users by name +- [ ] Verify Smart Router logs show correct classification +- [ ] Verify streaming works on mobile (Android + iOS) +- [ ] Verify memory facts are extracted after conversations +- [ ] Verify memory is private (user A cannot see user B's facts) +- [ ] Verify response times: simple <3s, medium <6s, complex <12s +- [ ] Monitor costs for first week — compare with estimates +- [ ] Send message to Jakub Pornowski confirming speed improvements
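
Two of the checklist items above, memory privacy and soft-delete, can be smoke-tested without the running app. The sketch below is illustrative only: it runs the same shape of queries that `get_user_memory()` and `delete_user_fact()` issue, against an in-memory SQLite adaptation of migration 092 (column types simplified; production is PostgreSQL).

```python
import sqlite3
from datetime import datetime, timedelta

# SQLite adaptation of migration 092 (SERIAL/TIMESTAMP swapped for SQLite types)
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE ai_user_memory (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    fact TEXT NOT NULL,
    category TEXT DEFAULT 'general',
    is_active INTEGER DEFAULT 1,
    expires_at TEXT
)""")
future = (datetime.now() + timedelta(days=365)).isoformat()
con.executemany(
    "INSERT INTO ai_user_memory (user_id, fact, expires_at) VALUES (?, ?, ?)",
    [(1, "szuka podwykonawców PEJ", future), (2, "branża IT", future)],
)

# Privacy: user 1's query must only ever return user 1's facts
rows = con.execute(
    "SELECT fact FROM ai_user_memory WHERE user_id = ? AND is_active = 1", (1,)
).fetchall()

# Soft delete, as delete_user_fact() does: flag off, row retained
con.execute("UPDATE ai_user_memory SET is_active = 0 WHERE id = 1 AND user_id = 1")
remaining = con.execute(
    "SELECT COUNT(*) FROM ai_user_memory WHERE user_id = 1 AND is_active = 1"
).fetchone()[0]
```

On staging, the equivalent spot-check is a `psql` query against the real tables with two test accounts.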