# NordaGPT Identity, Memory & Performance — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Transform NordaGPT from an anonymous chatbot into a personalized assistant with user identity, persistent memory, smart routing, and streaming responses.

**Architecture:** Four-phase rollout: (1) inject user identity into the AI prompt, (2) smart router + selective context loading, (3) streaming SSE responses, (4) persistent user memory with async extraction. Each phase is independently deployable and testable.

**Tech Stack:** Flask 3.0, SQLAlchemy 2.0, PostgreSQL, Google Gemini API (3-Flash, 3.1-Flash-Lite), Server-Sent Events, Jinja2 inline JS.

**Spec:** `docs/superpowers/specs/2026-03-28-nordagpt-identity-memory-design.md`

---

## File Structure

### New files

| File | Responsibility |
|------|---------------|
| `smart_router.py` | Classifies query complexity, selects data categories and model |
| `memory_service.py` | CRUD for user memory facts + conversation summaries, extraction prompt |
| `context_builder.py` | Loads selective data from DB based on router decision |
| `database/migrations/092_ai_user_memory.sql` | Memory + summary tables |
| `database/migrations/093_ai_conversation_summary.sql` | Summary table |

### Modified files

| File | Changes |
|------|---------|
| `database.py` | Add AIUserMemory, AIConversationSummary models (before line 5954) |
| `nordabiz_chat.py` | Accept user_context, integrate router, selective context, memory injection |
| `gemini_service.py` | Token counting for streamed responses |
| `blueprints/chat/routes.py` | Build user_context, add streaming endpoint, memory CRUD routes |
| `templates/chat.html` | Streaming UI, thinking animation, memory settings panel |

---

## Phase 1: User Identity (Tasks 1-3)

### Task 1: Pass user context from route to chat engine

**Files:**
- Modify: `blueprints/chat/routes.py:234-309`
- Modify: `nordabiz_chat.py:163-180`

- [ ] **Step 1: Build user_context dict in chat route**

In `blueprints/chat/routes.py`, modify `chat_send_message()`. After line 262 (where `current_user.id` and `current_user.email` are used for the limit check), add the user_context construction:

```python
# After line 262, before line 268
# Build user context for AI personalization
user_context = {
    'user_id': current_user.id,
    'user_name': current_user.name,
    'user_email': current_user.email,
    'company_name': current_user.company.name if current_user.company else None,
    'company_id': current_user.company.id if current_user.company else None,
    'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
    'company_role': current_user.company_role or 'MEMBER',
    'is_norda_member': current_user.is_norda_member,
    'chamber_role': current_user.chamber_role,
    'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
}
```

- [ ] **Step 2: Pass user_context to send_message()**

In the same function, modify the `chat_engine.send_message()` call (around line 282):

```python
# Before:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level
)

# After:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level,
    user_context=user_context
)
```

- [ ] **Step 3: Update send_message() signature in nordabiz_chat.py**

In `nordabiz_chat.py`, modify `send_message()` at line 163:

```python
# Before:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high'
) -> AIChatMessage:

# After:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> AIChatMessage:
```

Add `from typing import Optional, Dict, Any` to the imports if not already present.

- [ ] **Step 4: Thread user_context through to _query_ai()**

In `send_message()`, find the call to `_query_ai()` (around line 239) and add user_context:

```python
# Before:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level)

# After:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)
```

- [ ] **Step 5: Update _query_ai() signature**

In `nordabiz_chat.py`, modify `_query_ai()` at line 890:

```python
# Before:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high'
) -> str:

# After:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> str:
```

- [ ] **Step 6: Commit**

```bash
git add blueprints/chat/routes.py nordabiz_chat.py
git commit -m "refactor(chat): thread user_context from route through to _query_ai"
```

---

### Task 2: Inject user identity into system prompt

**Files:**
- Modify: `nordabiz_chat.py:920-930`

- [ ] **Step 1: Add user identity block to system prompt**

In `nordabiz_chat.py`, inside `_query_ai()`, find line ~922 where `system_prompt` starts.
Insert the user identity block BEFORE the main system prompt string (after line 921, before line 922):

```python
# Build user identity section
user_identity = ""
if user_context:
    user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}

ZASADY PERSONALIZACJI:
- Zwracaj się do użytkownika po imieniu (pierwsze słowo z imienia i nazwiska)
- W pierwszej wiadomości konwersacji przywitaj się: "Cześć [imię], w czym mogę pomóc?"
- Na pytania "co wiesz o mnie?" / "kim jestem?" — wypisz powyższe dane + powiązania firmowe z bazy
- Uwzględniaj kontekst firmy użytkownika w odpowiedziach (np. sugeruj partnerów z komplementarnych branż)
- NIE ujawniaj danych technicznych (user_id, company_id, rola systemowa)
"""
```

- [ ] **Step 2: Prepend user_identity to system_prompt**

Find where `system_prompt` is first assigned (line 922) and prepend:

```python
# Line 922 area — the system_prompt f-string starts here
system_prompt = user_identity + f"""Jesteś pomocnym asystentem portalu Norda Biznes...
```

This is a minimal change — just concatenate `user_identity` (which is an empty string if there is no context) before the existing prompt.

- [ ] **Step 3: Verify syntax compiles**

```bash
python3 -m py_compile nordabiz_chat.py && echo "OK"
```

- [ ] **Step 4: Test locally**

Start the local dev server and send a chat message. Verify in the logs that the prompt now contains the user identity block. Check that the AI greets by name.
```bash
python3 app.py

# In another terminal:
curl -X POST http://localhost:5000/api/chat/1/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Kim jestem?"}'
```

(Note: requires an auth cookie — easier to test via the browser.)

- [ ] **Step 5: Commit**

```bash
git add nordabiz_chat.py
git commit -m "feat(nordagpt): inject user identity into AI system prompt — personalized greetings and context"
```

---

### Task 3: Deploy Phase 1 and verify

**Files:** None (deployment only)

- [ ] **Step 1: Push to remotes**

```bash
git push origin master && git push inpi master
```

- [ ] **Step 2: Deploy to staging**

```bash
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```

- [ ] **Step 3: Test on staging — verify AI greets by name**

Open https://staging.nordabiznes.pl/chat, start a new conversation, and type "Cześć". Verify the AI responds with your name. Type "Co wiesz o mnie?" — verify the AI lists your profile data.

- [ ] **Step 4: Deploy to production**

```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```

- [ ] **Step 5: Commit deployment notes (update release_notes in routes.py)**

Add a new release entry in the `_get_releases()` function in `blueprints/public/routes.py`.

---

## Phase 2: Smart Router + Context Builder (Tasks 4-7)

### Task 4: Create context_builder.py — selective data loading

**Files:**
- Create: `context_builder.py`

- [ ] **Step 1: Create context_builder.py with selective loading functions**

```python
"""
Context Builder for NordaGPT Smart Router
==========================================
Loads only the data categories requested by the Smart Router,
instead of loading everything for every query.
"""
import json
import logging
from typing import Dict, Any, List, Optional
from datetime import datetime, timedelta

from database import (
    SessionLocal, Company, Category, CompanyRecommendation, NordaEvent,
    Classified, ForumTopic, ForumReply, CompanyPerson, Person, User,
    CompanySocialMedia, GBPAudit, CompanyWebsiteAnalysis, ZOPKNews,
    UserCompanyPermissions
)
from sqlalchemy import func, desc

logger = logging.getLogger(__name__)


def _company_to_compact_dict(company) -> Dict:
    """Convert company to compact dict for AI context. Mirrors nordabiz_chat.py format."""
    return {
        'name': company.name,
        'cat': company.category.name if company.category else None,
        'profile': f'/firma/{company.slug}',
        'desc': company.description_short,
        'about': company.description_full[:500] if company.description_full else None,
        'svc': company.services,
        'comp': company.competencies,
        'web': company.website,
        'tel': company.phone,
        'mail': company.email,
        'city': company.city,
    }


def build_selective_context(
    data_needed: List[str],
    conversation_id: int,
    current_message: str,
    user_context: Optional[Dict] = None
) -> Dict[str, Any]:
    """
    Build AI context with only the requested data categories.
    Args:
        data_needed: List of category strings from Smart Router, e.g.:
            ["companies_all", "companies_filtered:IT", "companies_single:termo",
             "events", "news", "classifieds", "forum", "company_people",
             "registered_users", "social_media", "audits"]
        conversation_id: Current conversation ID for history
        current_message: User's message text
        user_context: User identity dict

    Returns:
        Context dict compatible with nordabiz_chat.py _query_ai()
    """
    db = SessionLocal()
    context = {}
    try:
        # Always load: basic stats and conversation history
        active_companies = db.query(Company).filter_by(status='active').all()
        context['total_companies'] = len(active_companies)

        categories = db.query(Category).all()
        context['categories'] = [
            {'name': c.name, 'slug': c.slug,
             'company_count': len([co for co in active_companies if co.category_id == c.id])}
            for c in categories
        ]

        # Conversation history (always loaded)
        from database import AIChatMessage, AIChatConversation
        messages = db.query(AIChatMessage).filter_by(
            conversation_id=conversation_id
        ).order_by(AIChatMessage.created_at.desc()).limit(10).all()
        context['recent_messages'] = [
            {'role': msg.role, 'content': msg.content}
            for msg in reversed(messages)
        ]

        # Selective data loading based on router decision
        for category in data_needed:
            if category == 'companies_all':
                context['all_companies'] = [_company_to_compact_dict(c) for c in active_companies]

            elif category.startswith('companies_filtered:'):
                filter_cat = category.split(':', 1)[1]
                filtered = [c for c in active_companies
                            if c.category and c.category.name.lower() == filter_cat.lower()]
                context['all_companies'] = [_company_to_compact_dict(c) for c in filtered]

            elif category.startswith('companies_single:'):
                search = category.split(':', 1)[1].lower()
                matched = [c for c in active_companies
                           if search in c.name.lower() or search in (c.slug or '')]
                context['all_companies'] = [_company_to_compact_dict(c) for c in matched[:5]]

            elif category == 'events':
                events = db.query(NordaEvent).filter(
                    NordaEvent.event_date >= datetime.now(),
                    NordaEvent.event_date <= datetime.now() + timedelta(days=60)
                ).order_by(NordaEvent.event_date).all()
                context['upcoming_events'] = [
                    {'title': e.title, 'date': str(e.event_date), 'type': e.event_type,
                     'location': e.location, 'url': f'/kalendarz/{e.id}'}
                    for e in events
                ]

            elif category == 'news':
                news = db.query(ZOPKNews).filter(
                    ZOPKNews.published_at >= datetime.now() - timedelta(days=30),
                    ZOPKNews.status == 'approved'
                ).order_by(ZOPKNews.published_at.desc()).limit(10).all()
                context['recent_news'] = [
                    {'title': n.title, 'summary': n.ai_summary, 'date': str(n.published_at),
                     'source': n.source_name, 'url': n.source_url}
                    for n in news
                ]

            elif category == 'classifieds':
                classifieds = db.query(Classified).filter(
                    Classified.status == 'active',
                    Classified.is_test == False
                ).order_by(Classified.created_at.desc()).limit(20).all()
                context['classifieds'] = [
                    {'type': c.listing_type, 'title': c.title, 'description': c.description,
                     'company': c.company.name if c.company else None,
                     'budget': c.budget_text, 'url': f'/b2b/{c.id}'}
                    for c in classifieds
                ]

            elif category == 'forum':
                topics = db.query(ForumTopic).filter(
                    ForumTopic.is_test == False
                ).order_by(ForumTopic.created_at.desc()).limit(15).all()
                context['forum_topics'] = [
                    {'title': t.title, 'content': t.content[:300],
                     'author': t.author.name if t.author else None,
                     'replies': t.reply_count, 'url': f'/forum/{t.slug}'}
                    for t in topics
                ]

            elif category == 'company_people':
                people_query = db.query(CompanyPerson).join(Person).join(Company).filter(
                    Company.status == 'active'
                ).all()
                grouped = {}
                for cp in people_query:
                    cname = cp.company.name
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'name': cp.person.name,
                        'role': cp.role_description,
                        'shares': cp.shares_value
                    })
                context['company_people'] = grouped

            elif category == 'registered_users':
                users = db.query(User).filter(
                    User.is_active == True,
                    User.company_id.isnot(None)
                ).all()
                grouped = {}
                for u in users:
                    cname = u.company.name if u.company else 'Brak firmy'
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'name': u.name, 'email': u.email,
                        'role': u.company_role, 'member': u.is_norda_member
                    })
                context['registered_users'] = grouped

            elif category == 'social_media':
                socials = db.query(CompanySocialMedia).filter_by(is_valid=True).all()
                grouped = {}
                for s in socials:
                    cname = s.company.name if s.company else 'Unknown'
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'platform': s.platform, 'url': s.url,
                        'followers': s.followers_count
                    })
                context['company_social_media'] = grouped

            elif category == 'audits':
                # GBP audits
                gbp = db.query(GBPAudit).order_by(GBPAudit.created_at.desc()).all()
                seen = set()
                gbp_unique = []
                for g in gbp:
                    if g.company_id not in seen:
                        seen.add(g.company_id)
                        gbp_unique.append({
                            'company': g.company.name if g.company else None,
                            'score': g.overall_score,
                            'reviews': g.total_reviews,
                            'rating': g.average_rating
                        })
                context['gbp_audits'] = gbp_unique

                # SEO audits
                seo = db.query(CompanyWebsiteAnalysis).all()
                context['seo_audits'] = [
                    {'company': s.company.name if s.company else None,
                     'seo': s.seo_score, 'performance': s.performance_score}
                    for s in seo
                ]

        # Ensure the key always exists, even when no company category was requested
        if 'all_companies' not in context:
            context['all_companies'] = []
    finally:
        db.close()

    return context
```

- [ ] **Step 2: Verify syntax**

```bash
python3 -m py_compile context_builder.py && echo "OK"
```

- [ ] **Step 3: Commit**

```bash
git add context_builder.py
git commit -m "feat(nordagpt): add context_builder.py — selective data loading for smart router"
```

---

### Task 5: Create smart_router.py — query classification

**Files:**
- Create: `smart_router.py`

- [ ] **Step 1: Create smart_router.py**

```python
"""
Smart Router for NordaGPT
==========================
Classifies query complexity and selects which data categories to load.
Uses Gemini 3.1 Flash-Lite for fast, cheap classification (~1-2s).
"""
import json
import logging
import time
from typing import Dict, Any, List, Optional

logger = logging.getLogger(__name__)

# Keyword-based fast routing (no API call needed)
# Keywords must be lowercase — they are matched against the lowercased message.
FAST_ROUTES = {
    'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'katalog', 'porównaj firmy'],
    'events': ['wydarzenie', 'spotkanie', 'kalendarz', 'konferencja', 'szkolenie', 'kiedy'],
    'news': ['aktualności', 'nowości', 'wiadomości', 'pej', 'atom', 'elektrownia', 'zopk'],
    'classifieds': ['ogłoszenie', 'b2b', 'zlecenie', 'oferta', 'szukam', 'oferuję'],
    'forum': ['forum', 'dyskusja', 'temat', 'wątek', 'post'],
    'company_people': ['zarząd', 'krs', 'właściciel', 'prezes', 'udziały', 'wspólnik'],
    'registered_users': ['użytkownik', 'kto jest', 'profil', 'zarejestrowany', 'członek'],
    'social_media': ['facebook', 'instagram', 'linkedin', 'social media', 'media społeczn'],
    'audits': ['seo', 'google', 'gbp', 'opinie', 'ocena', 'pagespeed'],
}

# Model selection by complexity
MODEL_MAP = {
    'simple': {'model': '3.1-flash-lite', 'thinking': 'minimal'},
    'medium': {'model': '3-flash', 'thinking': 'low'},
    'complex': {'model': '3-flash', 'thinking': 'high'},
}

ROUTER_PROMPT = """Jesteś routerem zapytań. Przeanalizuj pytanie i zdecyduj jakie dane są potrzebne.

Użytkownik: {user_name} z firmy {company_name}
Pytanie: {message}

Zwróć TYLKO JSON (bez markdown):
{{
  "complexity": "simple|medium|complex",
  "data_needed": ["lista kategorii z poniższych"]
}}

Kategorie:
- companies_all — wszystkie firmy (porównania, przeglądy, "ile firm")
- companies_filtered:KATEGORIA — firmy z kategorii (np. companies_filtered:IT)
- companies_single:NAZWA — jedna firma (np. companies_single:termo)
- events — nadchodzące wydarzenia
- news — aktualności, PEJ, ZOPK
- classifieds — ogłoszenia B2B
- forum — tematy forum
- company_people — zarząd, KRS, udziałowcy
- registered_users — użytkownicy portalu
- social_media — profile social media firm
- audits — wyniki SEO/GBP

Zasady:
- "simple" = jedno pytanie o konkretną rzecz (telefon, adres, link)
- "medium" = porównanie, lista, filtrowanie
- "complex" = analiza, strategia, rekomendacje
- Wybierz MINIMUM kategorii. Nie ładuj niepotrzebnych danych.
- Jeśli pytanie dotyczy konkretnej firmy, użyj companies_single:nazwa
- Pytania ogólne o użytkownika (kim jestem, co wiesz) = [] (dane z profilu wystarczą)
"""


def route_query_fast(message: str, user_context: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
    """
    Fast keyword-based routing. No API call.
    Returns routing decision or None if uncertain (needs AI router).
    """
    msg_lower = message.lower()

    # Check for personal questions — no data needed
    personal_patterns = ['kim jestem', 'co wiesz o mnie', 'mój profil', 'moje dane']
    if any(p in msg_lower for p in personal_patterns):
        return {
            'complexity': 'simple',
            'data_needed': [],
            'model': '3.1-flash-lite',
            'thinking': 'minimal',
            'routed_by': 'fast'
        }

    # Check for greetings — no data needed
    greeting_patterns = ['cześć', 'hej', 'witam', 'dzień dobry', 'siema', 'hello']
    if any(msg_lower.strip().startswith(p) for p in greeting_patterns) and len(message) < 30:
        return {
            'complexity': 'simple',
            'data_needed': [],
            'model': '3.1-flash-lite',
            'thinking': 'minimal',
            'routed_by': 'fast'
        }

    # Check keyword matches
    matched_categories = []
    for category, keywords in FAST_ROUTES.items():
        if any(kw in msg_lower for kw in keywords):
            matched_categories.append(category)

    # Check for specific company name mention
    # Simple heuristic: if message has quotes or specific capitalized words
    if not matched_categories:
        # Can't determine — return None to trigger AI router
        return None

    # Determine complexity
    if len(matched_categories) <= 1 and len(message) < 80:
        complexity = 'simple'
    elif len(matched_categories) <= 2:
        complexity = 'medium'
    else:
        complexity = 'complex'

    model_config = MODEL_MAP[complexity]
    return {
        'complexity': complexity,
        'data_needed': matched_categories,
        'model': model_config['model'],
        'thinking': model_config['thinking'],
        'routed_by': 'fast'
    }


def route_query_ai(
    message: str,
    user_context: Optional[Dict] = None,
    gemini_service=None
) -> Dict[str, Any]:
    """
    AI-powered routing using Flash-Lite.
    Called when fast routing is uncertain.
    """
    if not gemini_service:
        # Fallback: load everything
        return _fallback_route()

    user_name = user_context.get('user_name', 'Nieznany') if user_context else 'Nieznany'
    company_name = user_context.get('company_name', 'brak') if user_context else 'brak'

    prompt = ROUTER_PROMPT.format(
        user_name=user_name,
        company_name=company_name,
        message=message
    )

    try:
        start = time.time()
        response = gemini_service.generate_text(
            prompt=prompt,
            temperature=0.1,
            max_tokens=200,
            model='gemini-3.1-flash-lite-preview',
            thinking_level='minimal',
            feature='smart_router'
        )
        latency = int((time.time() - start) * 1000)
        logger.info(f"Smart Router AI response in {latency}ms: {response[:200]}")

        # Parse JSON from response
        # Handle potential markdown wrapping
        text = response.strip()
        if text.startswith('```'):
            text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

        result = json.loads(text)
        complexity = result.get('complexity', 'medium')
        model_config = MODEL_MAP.get(complexity, MODEL_MAP['medium'])
        return {
            'complexity': complexity,
            'data_needed': result.get('data_needed', []),
            'model': model_config['model'],
            'thinking': model_config['thinking'],
            'routed_by': 'ai',
            'router_latency_ms': latency
        }
    except Exception as e:  # covers json.JSONDecodeError, KeyError, API errors
        logger.warning(f"Smart Router AI failed: {e}, falling back to full context")
        return _fallback_route()


def route_query(
    message: str,
    user_context: Optional[Dict] = None,
    gemini_service=None
) -> Dict[str, Any]:
    """
    Main entry point.
    Tries fast routing first, falls back to AI routing.
    """
    # Try fast keyword-based routing
    result = route_query_fast(message, user_context)
    if result is not None:
        logger.info(f"Smart Router FAST: complexity={result['complexity']}, data={result['data_needed']}")
        return result

    # Fall back to AI routing
    result = route_query_ai(message, user_context, gemini_service)
    logger.info(f"Smart Router AI: complexity={result['complexity']}, data={result['data_needed']}")
    return result


def _fallback_route() -> Dict[str, Any]:
    """Fallback: load everything, use default model. Safe but slow."""
    return {
        'complexity': 'medium',
        'data_needed': [
            'companies_all', 'events', 'news', 'classifieds',
            'forum', 'company_people', 'registered_users'
        ],
        'model': '3-flash',
        'thinking': 'low',
        'routed_by': 'fallback'
    }
```

- [ ] **Step 2: Verify syntax**

```bash
python3 -m py_compile smart_router.py && echo "OK"
```

- [ ] **Step 3: Commit**

```bash
git add smart_router.py
git commit -m "feat(nordagpt): add smart_router.py — fast keyword routing + AI fallback"
```

---

### Task 6: Integrate Smart Router into nordabiz_chat.py

**Files:**
- Modify: `nordabiz_chat.py:163-282, 347-643, 890-1365`

- [ ] **Step 1: Add imports at top of nordabiz_chat.py**

After the existing imports (around line 30), add:

```python
from smart_router import route_query
from context_builder import build_selective_context
```

- [ ] **Step 2: Modify send_message() to use Smart Router**

In `send_message()`, replace the call to `_build_conversation_context()` and `_query_ai()` (around lines 236-239). The key change: use the router to decide model and data, then use context_builder for selective loading.
Find the section where context is built and the AI is queried (around lines 236-241):

```python
# Before (approximately lines 236-241):
# context = self._build_conversation_context(db, conversation, original_message)
# ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)

# After:
# Smart Router — classify query and select data + model
route_decision = route_query(
    message=original_message,
    user_context=user_context,
    gemini_service=self.gemini_service
)

# Override model and thinking based on router decision
effective_model = route_decision.get('model', '3-flash')
effective_thinking = route_decision.get('thinking', thinking_level)

# Build selective context (only requested data categories)
context = build_selective_context(
    data_needed=route_decision.get('data_needed', []),
    conversation_id=conversation.id,
    current_message=original_message,
    user_context=user_context
)

# Use the original _query_ai but with router-selected parameters
ai_response_text = self._query_ai(
    context,
    original_message,
    user_id=user_id,
    thinking_level=effective_thinking,
    user_context=user_context
)
```

Note: Keep `_build_conversation_context()` and the full `_query_ai()` intact as a fallback. The router's `_fallback_route()` loads all data, so this is safe.

- [ ] **Step 3: Log routing decisions**

After the route_query call, add logging:

```python
logger.info(
    f"NordaGPT Router: user={user_context.get('user_name') if user_context else '?'}, "
    f"complexity={route_decision['complexity']}, model={effective_model}, "
    f"thinking={effective_thinking}, data={route_decision['data_needed']}, "
    f"routed_by={route_decision.get('routed_by')}"
)
```

- [ ] **Step 4: Update the GeminiService call in _query_ai() to use the effective model**

Currently `_query_ai()` uses `self.gemini_service`, which has a fixed model. We need to pass the router-selected model to the generate_text call.
In `_query_ai()`, around line 1352, find the call:

```python
# Before:
response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat'
)
```

Rather than stashing the route decision on `self`, pass it through the context dict. In `send_message()`, add to context before calling `_query_ai()`:

```python
context['_route_decision'] = route_decision
```

In `_query_ai()`, read it at the generate_text call:

```python
route = context.get('_route_decision', {})
effective_model_id = None
model_alias = route.get('model')
if model_alias:
    from gemini_service import GEMINI_MODELS
    effective_model_id = GEMINI_MODELS.get(model_alias)

response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat',
    model=effective_model_id
)
```

- [ ] **Step 5: Verify syntax**

```bash
python3 -m py_compile nordabiz_chat.py && echo "OK"
```

- [ ] **Step 6: Commit**

```bash
git add nordabiz_chat.py
git commit -m "feat(nordagpt): integrate smart router — selective context loading + adaptive model selection"
```

---

### Task 7: Deploy Phase 2 and verify

- [ ] **Step 1: Push and deploy to staging**

```bash
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```

- [ ] **Step 2: Test on staging — verify routing works**

Test a simple query: "Jaki jest telefon do TERMO?" — should be fast (2-3s), Flash-Lite model.
Test a medium query: "Porównaj firmy budowlane w Izbie" — should load companies_all, medium speed.
Test a complex query: "Jakie firmy mogłyby współpracować przy projekcie PEJ?" — should use full context.
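The keyword path can also be sanity-checked offline before deploying. The sketch below inlines a reduced, self-contained copy of the fast-routing rules from Task 5 (the names mirror `smart_router.py`, and the keyword lists are truncated for brevity — it is an illustration, not the module itself):

```python
# Reduced re-implementation of route_query_fast from smart_router.py,
# inlined so it runs without the app. Keyword lists are a subset.
FAST_ROUTES = {
    'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'porównaj firmy'],
    'events': ['wydarzenie', 'spotkanie', 'kalendarz'],
}

def route_query_fast(message: str):
    msg_lower = message.lower()
    # Personal questions need no portal data
    if any(p in msg_lower for p in ['kim jestem', 'co wiesz o mnie']):
        return {'complexity': 'simple', 'data_needed': [], 'routed_by': 'fast'}
    # Short greetings need no portal data either
    if any(msg_lower.strip().startswith(p) for p in ['cześć', 'hej', 'witam']) and len(message) < 30:
        return {'complexity': 'simple', 'data_needed': [], 'routed_by': 'fast'}
    matched = [c for c, kws in FAST_ROUTES.items() if any(k in msg_lower for k in kws)]
    if not matched:
        return None  # uncertain — would fall through to the Flash-Lite AI router
    complexity = 'simple' if len(matched) <= 1 and len(message) < 80 else 'medium'
    return {'complexity': complexity, 'data_needed': matched, 'routed_by': 'fast'}

# Greetings and personal questions load no data at all
assert route_query_fast("Cześć!")['data_needed'] == []
assert route_query_fast("Kim jestem?")['data_needed'] == []
# Keyword hits select only the matching categories
assert route_query_fast("Porównaj firmy budowlane")['data_needed'] == ['companies_all']
# Queries with no keyword match return None and go to the AI router
assert route_query_fast("Jaki jest telefon do TERMO?") is None
```

If these invariants hold, the staging log check below should show matching `routed_by=fast` entries for the same kinds of queries.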
Check the logs for routing decisions:

```bash
ssh maciejpi@10.22.68.248 "journalctl -u nordabiznes -n 30 --no-pager | grep 'Router'"
```

- [ ] **Step 3: Deploy to production**

```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```

---

## Phase 3: Streaming Responses (Tasks 8-10)

### Task 8: Add streaming endpoint in Flask

**Files:**
- Modify: `blueprints/chat/routes.py`
- Modify: `nordabiz_chat.py`

- [ ] **Step 1: Add SSE streaming endpoint**

In `blueprints/chat/routes.py`, add a new route after `chat_send_message()` (after line ~309):

```python
@bp.route('/api/chat/<int:conversation_id>/message/stream', methods=['POST'])
@login_required
@member_required
def chat_send_message_stream(conversation_id):
    """Send message to AI chat with streaming response (SSE)"""
    from flask import Response, stream_with_context
    import json as json_module

    data = request.get_json()
    if not data or not data.get('message', '').strip():
        return jsonify({'error': 'Wiadomość nie może być pusta'}), 400
    message = data['message'].strip()

    # Check limits
    from nordabiz_chat import check_user_limits
    limit_result = check_user_limits(current_user.id, current_user.email)
    if limit_result.get('limited'):
        return jsonify({'error': 'Przekroczono limit', 'limit_info': limit_result}), 429

    # Build user context
    user_context = {
        'user_id': current_user.id,
        'user_name': current_user.name,
        'user_email': current_user.email,
        'company_name': current_user.company.name if current_user.company else None,
        'company_id': current_user.company.id if current_user.company else None,
        'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
        'company_role': current_user.company_role or 'MEMBER',
        'is_norda_member': current_user.is_norda_member,
        'chamber_role': current_user.chamber_role,
        'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
    }

    model_choice = data.get('model') or session.get('chat_model', 'flash')
    model_key = '3-flash' if model_choice == 'flash' else '3-pro'

    def generate():
        try:
            chat_engine = NordaBizChatEngine(model=model_key)
            for chunk in chat_engine.send_message_stream(
                conversation_id=conversation_id,
                user_message=message,
                user_id=current_user.id,
                user_context=user_context
            ):
                yield f"data: {json_module.dumps(chunk, ensure_ascii=False)}\n\n"
        except PermissionError:
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Brak dostępu do tej konwersacji'})}\n\n"
        except Exception as e:
            logger.error(f"Streaming error: {e}")
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Wystąpił błąd'})}\n\n"

    return Response(
        stream_with_context(generate()),
        mimetype='text/event-stream',
        headers={
            'Cache-Control': 'no-cache',
            'X-Accel-Buffering': 'no',  # Disable Nginx buffering
        }
    )
```

- [ ] **Step 2: Add send_message_stream() to NordaBizChatEngine**

In `nordabiz_chat.py`, add a new method after `send_message()` (after line ~282):

```python
def send_message_stream(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    user_context: Optional[Dict[str, Any]] = None
):
    """
    Generator that yields streaming chunks for SSE.
    Yields dicts: {'type': 'thinking'|'token'|'done'|'error', 'content': '...'}
    """
    import time

    db = SessionLocal()
    try:
        conversation = db.query(AIChatConversation).filter_by(
            id=conversation_id, user_id=user_id
        ).first()
        if not conversation:
            yield {'type': 'error', 'content': 'Konwersacja nie znaleziona'}
            return

        # Save user message
        original_message = user_message
        sanitized = self._sanitize_message(user_message)
        user_msg = AIChatMessage(
            conversation_id=conversation_id,
            role='user',
            content=sanitized
        )
        db.add(user_msg)
        db.commit()

        # Smart Router
        route_decision = route_query(
            message=original_message,
            user_context=user_context,
            gemini_service=self.gemini_service
        )
        yield {'type': 'thinking', 'content': 'Analizuję pytanie...'}

        # Build selective context
        context = build_selective_context(
            data_needed=route_decision.get('data_needed', []),
            conversation_id=conversation.id,
            current_message=original_message,
            user_context=user_context
        )
        context['_route_decision'] = route_decision

        # Build prompt (reuse _query_ai logic for prompt building)
        full_prompt = self._build_prompt(context, original_message, user_context, route_decision.get('thinking', 'low'))

        # Get effective model
        from gemini_service import GEMINI_MODELS
        model_alias = route_decision.get('model', '3-flash')
        effective_model = GEMINI_MODELS.get(model_alias, self.model_name)

        # Stream from Gemini
        start_time = time.time()
        stream_response = self.gemini_service.generate_text(
            prompt=full_prompt,
            temperature=0.7,
            stream=True,
            thinking_level=route_decision.get('thinking', 'low'),
            user_id=user_id,
            feature='chat_stream',
            model=effective_model
        )

        full_text = ""
        for chunk in stream_response:
            if hasattr(chunk, 'text') and chunk.text:
                full_text += chunk.text
                yield {'type': 'token', 'content': chunk.text}

        latency_ms = int((time.time() - start_time) * 1000)

        # Save AI response to DB
        ai_msg = AIChatMessage(
            conversation_id=conversation_id,
            role='assistant',
            content=full_text,
            latency_ms=latency_ms
        )
        db.add(ai_msg)
        conversation.updated_at = datetime.now()
        conversation.message_count = (conversation.message_count or 0) + 2
        db.commit()

        yield {
            'type': 'done',
            'message_id': ai_msg.id,
            'latency_ms': latency_ms,
            'model': model_alias,
            'complexity': route_decision.get('complexity')
        }
    except Exception as e:
        logger.error(f"Stream error: {e}", exc_info=True)
        yield {'type': 'error', 'content': 'Wystąpił błąd podczas generowania odpowiedzi'}
    finally:
        db.close()
```

- [ ] **Step 3: Extract prompt building into a reusable method**

Add a `_build_prompt()` method to `NordaBizChatEngine` that extracts prompt construction from `_query_ai()`. This method builds the full prompt string without calling Gemini:

```python
def _build_prompt(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_context: Optional[Dict[str, Any]] = None,
    thinking_level: str = 'low'
) -> str:
    """Build the full prompt string. Extracted from _query_ai() for reuse in streaming."""
    # Build user identity section
    user_identity = ""
    if user_context:
        user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}
"""

    # Reuse the existing system_prompt from _query_ai() lines 922-1134.
    # This is the same static prompt — extract it to a class attribute or method.
    # NOTE: In implementation, refactor the static prompt into a separate method
    # to avoid duplication. The key point is that _build_prompt returns the
    # same prompt string that _query_ai would build.
    # ... (reuse existing system prompt construction logic) ...
return full_prompt ``` **Implementation note:** The actual implementation should refactor `_query_ai()` to call `_build_prompt()` internally, then the streaming method also calls `_build_prompt()`. This avoids prompt duplication. - [ ] **Step 4: Verify syntax** ```bash python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK" ``` - [ ] **Step 5: Commit** ```bash git add nordabiz_chat.py blueprints/chat/routes.py git commit -m "feat(nordagpt): add streaming SSE endpoint + send_message_stream method" ``` --- ### Task 9: Frontend streaming UI **Files:** - Modify: `templates/chat.html` - [ ] **Step 1: Add streaming sendMessage function** In `templates/chat.html`, replace the existing `sendMessage()` function (lines 2373-2454) with a streaming version: ```javascript async function sendMessage() { const input = document.getElementById('messageInput'); const message = input.value.trim(); if (!message || isSending) return; isSending = true; document.getElementById('sendBtn').disabled = true; input.value = ''; autoResizeTextarea(); // Add user message to chat addMessage('user', message); // Create conversation if needed if (!currentConversationId) { try { const startRes = await fetch('/api/chat/start', { method: 'POST', headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken}, body: JSON.stringify({title: message.substring(0, 50)}) }); const startData = await startRes.json(); currentConversationId = startData.conversation_id; } catch (e) { addMessage('assistant', 'Błąd tworzenia konwersacji.'); isSending = false; document.getElementById('sendBtn').disabled = false; return; } } // Add empty assistant bubble with thinking animation const msgDiv = document.createElement('div'); msgDiv.className = 'message assistant'; msgDiv.innerHTML = `
<div class="message-avatar">AI</div>
<div class="message-content"><div class="thinking-dots"><span>.</span><span>.</span><span>.</span></div></div>
`; document.getElementById('chatMessages').appendChild(msgDiv); scrollToBottom(); const contentDiv = msgDiv.querySelector('.message-content'); try { const response = await fetch(`/api/chat/${currentConversationId}/message/stream`, { method: 'POST', headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken}, body: JSON.stringify({message: message, model: currentModel}) }); if (response.status === 429) { contentDiv.innerHTML = ''; contentDiv.textContent = 'Przekroczono limit zapytań.'; showLimitBanner(); isSending = false; document.getElementById('sendBtn').disabled = false; return; } const reader = response.body.getReader(); const decoder = new TextDecoder(); let fullText = ''; let thinkingRemoved = false; let buffer = ''; while (true) { const {done, value} = await reader.read(); if (done) break; // A data: line can be split across reads; buffer the incomplete tail buffer += decoder.decode(value, {stream: true}); const lines = buffer.split('\n'); buffer = lines.pop(); for (const line of lines) { if (!line.startsWith('data: ')) continue; try { const chunk = JSON.parse(line.slice(6)); if (chunk.type === 'thinking') { // Keep thinking dots visible continue; } if (chunk.type === 'token') { if (!thinkingRemoved) { contentDiv.innerHTML = ''; thinkingRemoved = true; } fullText += chunk.content; contentDiv.innerHTML = formatMessage(fullText); scrollToBottom(); } if (chunk.type === 'done') { // Add tech info badge if (chunk.latency_ms) { const badge = document.createElement('div'); badge.className = 'thinking-info-badge'; badge.textContent = `${chunk.model || 'AI'} · ${(chunk.latency_ms/1000).toFixed(1)}s`; msgDiv.appendChild(badge); } loadConversations(); } if (chunk.type === 'error') { contentDiv.innerHTML = ''; contentDiv.textContent = chunk.content || 'Wystąpił błąd'; } } catch (e) { // Skip malformed chunks } } } } catch (e) { contentDiv.innerHTML = ''; contentDiv.textContent = 'Błąd połączenia z serwerem.'; } isSending = false; document.getElementById('sendBtn').disabled = false; } ``` - [ ] **Step 2: Add CSS for thinking animation** In `templates/chat.html`, in
the `{% block extra_css %}` section, add: ```css .thinking-dots { display: flex; gap: 4px; padding: 8px 0; } .thinking-dots span { animation: thinkBounce 1.4s infinite ease-in-out both; font-size: 1.5rem; color: var(--text-secondary); } .thinking-dots span:nth-child(1) { animation-delay: -0.32s; } .thinking-dots span:nth-child(2) { animation-delay: -0.16s; } .thinking-dots span:nth-child(3) { animation-delay: 0s; } @keyframes thinkBounce { 0%, 80%, 100% { transform: scale(0); } 40% { transform: scale(1); } } ``` - [ ] **Step 3: Verify locally and commit** ```bash python3 -m py_compile app.py && echo "OK" git add templates/chat.html git commit -m "feat(nordagpt): streaming UI — word-by-word response with thinking animation" ``` --- ### Task 10: Deploy Phase 3 and verify streaming - [ ] **Step 1: Check Nginx/NPM config for SSE support** SSE requires Nginx to NOT buffer the response. The streaming endpoint sets `X-Accel-Buffering: no` header. Verify NPM custom config allows this: ```bash ssh maciejpi@57.128.200.27 "cat /etc/nginx/sites-enabled/nordabiznes.conf 2>/dev/null || echo 'Using NPM proxy'" ``` If using NPM, the `X-Accel-Buffering: no` header should be sufficient. If not, add to NPM custom Nginx config for nordabiznes.pl: ``` proxy_buffering off; proxy_cache off; ``` - [ ] **Step 2: Push, deploy to staging, test streaming** ```bash git push origin master && git push inpi master ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" ``` Test on staging: open chat, send message, verify text appears word-by-word. 
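Before relying on the Nginx checks above, the SSE wire format itself can be sanity-checked offline. The sketch below is illustrative (the helper names `format_sse` and `iter_sse_frames` are not part of the codebase): it encodes `data:` frames the same way the Flask generator does and shows why a consumer must buffer partial lines, since a single frame may be split across network reads.

```python
import json

def format_sse(payload: dict) -> bytes:
    """Encode one event the way the streaming endpoint does."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")

def iter_sse_frames(chunks):
    """Reassemble 'data:' payloads from arbitrarily split byte chunks.

    A network read can end mid-frame, so only lines terminated by a
    newline are parsed; the incomplete tail is kept for the next read.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk.decode("utf-8")
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])

# A frame cut in the middle of its JSON still parses once complete:
frame = format_sse({"type": "token", "content": "Witaj"})
assert list(iter_sse_frames([frame[:7], frame[7:]])) == [
    {"type": "token", "content": "Witaj"}
]
```

The same buffering rule applies to the frontend reader loop, which splits `response.body` reads on newlines.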
- [ ] **Step 3: Deploy to production** ```bash ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes" curl -sI https://nordabiznes.pl/health | head -3 ``` --- ## Phase 4: Persistent User Memory (Tasks 11-15) ### Task 11: Database migration — memory tables **Files:** - Create: `database/migrations/092_ai_user_memory.sql` - Create: `database/migrations/093_ai_conversation_summary.sql` - [ ] **Step 1: Create migration 092** ```sql -- 092_ai_user_memory.sql -- Persistent memory for NordaGPT — per-user facts extracted from conversations CREATE TABLE IF NOT EXISTS ai_user_memory ( id SERIAL PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, fact TEXT NOT NULL, category VARCHAR(50) DEFAULT 'general', source_conversation_id INTEGER REFERENCES ai_chat_conversations(id) ON DELETE SET NULL, confidence FLOAT DEFAULT 1.0, created_at TIMESTAMP DEFAULT NOW(), expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '12 months'), is_active BOOLEAN DEFAULT TRUE ); CREATE INDEX idx_ai_user_memory_user_active ON ai_user_memory(user_id, is_active, confidence DESC); CREATE INDEX idx_ai_user_memory_expires ON ai_user_memory(expires_at) WHERE is_active = TRUE; GRANT ALL ON TABLE ai_user_memory TO nordabiz_app; GRANT USAGE, SELECT ON SEQUENCE ai_user_memory_id_seq TO nordabiz_app; ``` - [ ] **Step 2: Create migration 093** ```sql -- 093_ai_conversation_summary.sql -- Auto-generated summaries of AI conversations for memory context CREATE TABLE IF NOT EXISTS ai_conversation_summary ( id SERIAL PRIMARY KEY, conversation_id INTEGER NOT NULL UNIQUE REFERENCES ai_chat_conversations(id) ON DELETE CASCADE, user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, summary TEXT NOT NULL, key_topics JSONB DEFAULT '[]', created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() ); CREATE INDEX idx_ai_conv_summary_user ON ai_conversation_summary(user_id, created_at DESC); GRANT ALL ON TABLE 
ai_conversation_summary TO nordabiz_app; GRANT USAGE, SELECT ON SEQUENCE ai_conversation_summary_id_seq TO nordabiz_app; ``` - [ ] **Step 3: Commit migrations** ```bash git add database/migrations/092_ai_user_memory.sql database/migrations/093_ai_conversation_summary.sql git commit -m "feat(nordagpt): add migrations for user memory and conversation summary tables" ``` --- ### Task 12: Add SQLAlchemy models **Files:** - Modify: `database.py` (insert before line 5954) - [ ] **Step 1: Add AIUserMemory model** Insert before the `# DATABASE INITIALIZATION` comment (line 5954): ```python class AIUserMemory(Base): __tablename__ = 'ai_user_memory' id = Column(Integer, primary_key=True) user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False) fact = Column(Text, nullable=False) category = Column(String(50), default='general') source_conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='SET NULL'), nullable=True) confidence = Column(Float, default=1.0) created_at = Column(DateTime, default=datetime.utcnow) expires_at = Column(DateTime) is_active = Column(Boolean, default=True) user = relationship('User') source_conversation = relationship('AIChatConversation') class AIConversationSummary(Base): __tablename__ = 'ai_conversation_summary' id = Column(Integer, primary_key=True) conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='CASCADE'), nullable=False, unique=True) user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False) summary = Column(Text, nullable=False) key_topics = Column(JSON, default=list) created_at = Column(DateTime, default=datetime.utcnow) updated_at = Column(DateTime, default=datetime.utcnow) user = relationship('User') conversation = relationship('AIChatConversation') ``` - [ ] **Step 2: Verify syntax** ```bash python3 -m py_compile database.py && echo "OK" ``` - [ ] **Step 3: Commit** ```bash git add database.py git commit -m "feat(nordagpt): 
add AIUserMemory and AIConversationSummary ORM models" ``` --- ### Task 13: Create memory_service.py **Files:** - Create: `memory_service.py` - [ ] **Step 1: Create memory_service.py** ```python """ Memory Service for NordaGPT ============================= Manages persistent per-user memory: fact extraction, storage, retrieval, cleanup. """ import json import logging from datetime import datetime, timedelta from typing import Dict, Any, List, Optional from database import SessionLocal, AIUserMemory, AIConversationSummary, AIChatMessage logger = logging.getLogger(__name__) EXTRACT_FACTS_PROMPT = """Na podstawie tej rozmowy wyciągnij kluczowe fakty o użytkowniku {user_name} ({company_name}). Rozmowa: {conversation_text} Istniejące fakty (NIE DUPLIKUJ): {existing_facts} Zwróć TYLKO JSON array (bez markdown): [{{"fact": "...", "category": "interests|needs|contacts|insights"}}] Zasady: - Tylko nowe, nietrywialne fakty przydatne w przyszłych rozmowach - Nie zapisuj: "zapytał o firmę X" (to za mało) - Zapisuj: "szuka podwykonawców do projektu PEJ w branży elektrycznej" - Max 3 fakty. Jeśli nie ma nowych faktów, zwróć [] - Kategorie: interests (zainteresowania), needs (potrzeby biznesowe), contacts (kontakty), insights (wnioski/preferencje) """ SUMMARIZE_PROMPT = """Podsumuj tę rozmowę w 1-3 zdaniach. Skup się na tym, czego użytkownik szukał i co ustalono. 
Rozmowa: {conversation_text} Zwróć TYLKO JSON (bez markdown): {{"summary": "...", "key_topics": ["temat1", "temat2"]}} """ def get_user_memory(user_id: int, limit: int = 10) -> List[Dict]: """Get active memory facts for a user, sorted by recency and confidence.""" db = SessionLocal() try: facts = db.query(AIUserMemory).filter( AIUserMemory.user_id == user_id, AIUserMemory.is_active == True, AIUserMemory.expires_at > datetime.now() ).order_by( AIUserMemory.confidence.desc(), AIUserMemory.created_at.desc() ).limit(limit).all() return [ { 'id': f.id, 'fact': f.fact, 'category': f.category, 'confidence': f.confidence, 'created_at': f.created_at.isoformat() } for f in facts ] finally: db.close() def get_conversation_summaries(user_id: int, limit: int = 5) -> List[Dict]: """Get recent conversation summaries for a user.""" db = SessionLocal() try: summaries = db.query(AIConversationSummary).filter( AIConversationSummary.user_id == user_id ).order_by( AIConversationSummary.created_at.desc() ).limit(limit).all() return [ { 'summary': s.summary, 'topics': s.key_topics or [], 'date': s.created_at.strftime('%Y-%m-%d') } for s in summaries ] finally: db.close() def format_memory_for_prompt(user_id: int) -> str: """Format user memory and summaries for injection into AI prompt.""" facts = get_user_memory(user_id) summaries = get_conversation_summaries(user_id) if not facts and not summaries: return "" parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"] if facts: parts.append("Znane fakty:") for f in facts: parts.append(f"- [{f['category']}] {f['fact']}") if summaries: parts.append("\nOstatnie rozmowy:") for s in summaries: topics = ", ".join(s['topics'][:3]) if s['topics'] else "" parts.append(f"- {s['date']}: {s['summary']}" + (f" (tematy: {topics})" if topics else "")) parts.append("\nWykorzystuj tę wiedzę do personalizacji odpowiedzi. 
Nawiązuj do wcześniejszych rozmów gdy to naturalne.") return "\n".join(parts) def extract_facts_async( conversation_id: int, user_id: int, user_context: Dict, gemini_service ): """ Extract memory facts from a conversation. Run async after response is sent. Uses Flash-Lite for minimal cost. """ db = SessionLocal() try: # Get conversation messages messages = db.query(AIChatMessage).filter_by( conversation_id=conversation_id ).order_by(AIChatMessage.created_at).all() if len(messages) < 2: return # Too short to extract conversation_text = "\n".join([ f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content}" for m in messages[-10:] # Last 10 messages ]) # Get existing facts to avoid duplicates existing = db.query(AIUserMemory).filter( AIUserMemory.user_id == user_id, AIUserMemory.is_active == True ).all() existing_text = "\n".join([f"- {f.fact}" for f in existing]) or "Brak" prompt = EXTRACT_FACTS_PROMPT.format( user_name=user_context.get('user_name', 'Nieznany'), company_name=user_context.get('company_name', 'brak'), conversation_text=conversation_text, existing_facts=existing_text ) response = gemini_service.generate_text( prompt=prompt, temperature=0.1, max_tokens=300, model='gemini-3.1-flash-lite-preview', thinking_level='minimal', feature='memory_extraction' ) # Parse response text = response.strip() if text.startswith('```'): text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() facts = json.loads(text) if not isinstance(facts, list): return for fact_data in facts[:3]: if not fact_data.get('fact'): continue memory = AIUserMemory( user_id=user_id, fact=fact_data['fact'], category=fact_data.get('category', 'general'), source_conversation_id=conversation_id, expires_at=datetime.now() + timedelta(days=365) ) db.add(memory) db.commit() logger.info(f"Extracted {len(facts)} memory facts for user {user_id}") except Exception as e: logger.warning(f"Memory extraction failed for conversation {conversation_id}: {e}") db.rollback() finally: db.close() def 
summarize_conversation_async( conversation_id: int, user_id: int, gemini_service ): """Generate or update conversation summary. Run async.""" db = SessionLocal() try: messages = db.query(AIChatMessage).filter_by( conversation_id=conversation_id ).order_by(AIChatMessage.created_at).all() if len(messages) < 2: return conversation_text = "\n".join([ f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content[:200]}" for m in messages[-10:] ]) prompt = SUMMARIZE_PROMPT.format(conversation_text=conversation_text) response = gemini_service.generate_text( prompt=prompt, temperature=0.1, max_tokens=200, model='gemini-3.1-flash-lite-preview', thinking_level='minimal', feature='conversation_summary' ) text = response.strip() if text.startswith('```'): text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip() result = json.loads(text) existing = db.query(AIConversationSummary).filter_by( conversation_id=conversation_id ).first() if existing: existing.summary = result.get('summary', existing.summary) existing.key_topics = result.get('key_topics', existing.key_topics) existing.updated_at = datetime.now() else: summary = AIConversationSummary( conversation_id=conversation_id, user_id=user_id, summary=result.get('summary', ''), key_topics=result.get('key_topics', []) ) db.add(summary) db.commit() logger.info(f"Summarized conversation {conversation_id}") except Exception as e: logger.warning(f"Conversation summary failed for {conversation_id}: {e}") db.rollback() finally: db.close() def delete_user_fact(user_id: int, fact_id: int) -> bool: """Soft-delete a memory fact. 
Returns True if deleted.""" db = SessionLocal() try: fact = db.query(AIUserMemory).filter_by(id=fact_id, user_id=user_id).first() if fact: fact.is_active = False db.commit() return True return False finally: db.close() ``` - [ ] **Step 2: Verify syntax** ```bash python3 -m py_compile memory_service.py && echo "OK" ``` - [ ] **Step 3: Commit** ```bash git add memory_service.py git commit -m "feat(nordagpt): add memory_service.py — fact extraction, summaries, CRUD" ``` --- ### Task 14: Integrate memory into chat flow **Files:** - Modify: `nordabiz_chat.py` - Modify: `blueprints/chat/routes.py` - [ ] **Step 1: Inject memory into system prompt** In `nordabiz_chat.py`, in the `_build_prompt()` or `_query_ai()` method, after the user identity block and before the data sections, add memory: ```python from memory_service import format_memory_for_prompt # After user_identity block, before data injection: user_memory_text = "" if user_context and user_context.get('user_id'): user_memory_text = format_memory_for_prompt(user_context['user_id']) # Prepend to system prompt: system_prompt = user_identity + user_memory_text + f"""Jesteś pomocnym asystentem...""" ``` - [ ] **Step 2: Trigger async memory extraction after response** In `send_message()` and `send_message_stream()`, after saving the AI response, trigger async extraction using threading: ```python import threading from memory_service import extract_facts_async, summarize_conversation_async # After saving AI response to DB (end of send_message/send_message_stream): # Async memory extraction — don't block the response def _extract_memory(): extract_facts_async(conversation_id, user_id, user_context, self.gemini_service) # Summarize every 5 messages if (conversation.message_count or 0) % 5 == 0: summarize_conversation_async(conversation_id, user_id, self.gemini_service) threading.Thread(target=_extract_memory, daemon=True).start() ``` - [ ] **Step 3: Add memory CRUD API routes** In `blueprints/chat/routes.py`, add routes 
for viewing and deleting memory: ```python @bp.route('/api/chat/memory', methods=['GET']) @login_required @member_required def get_user_memory_api(): """Get current user's NordaGPT memory facts and summaries""" from memory_service import get_user_memory, get_conversation_summaries return jsonify({ 'facts': get_user_memory(current_user.id, limit=20), 'summaries': get_conversation_summaries(current_user.id, limit=10) }) @bp.route('/api/chat/memory/<int:fact_id>', methods=['DELETE']) @login_required @member_required def delete_memory_fact(fact_id): """Delete a memory fact""" from memory_service import delete_user_fact if delete_user_fact(current_user.id, fact_id): return jsonify({'status': 'ok'}) return jsonify({'error': 'Nie znaleziono'}), 404 ``` - [ ] **Step 4: Verify syntax** ```bash python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK" ``` - [ ] **Step 5: Commit** ```bash git add nordabiz_chat.py blueprints/chat/routes.py git commit -m "feat(nordagpt): integrate memory into chat — injection, async extraction, CRUD API" ``` --- ### Task 15: Deploy Phase 4 — migrations + code - [ ] **Step 1: Push to remotes** ```bash git push origin master && git push inpi master ``` - [ ] **Step 2: Deploy to staging with migrations** ```bash ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull" ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql" ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql" ssh maciejpi@10.22.68.248 "sudo systemctl restart nordabiznes" ``` - [ ] **Step 3: Test on staging** 1. Open chat, have a conversation about looking for IT companies 2. Open another chat, ask "o czym rozmawialiśmy?" — verify AI mentions previous topics 3.
Check memory API: `curl https://staging.nordabiznes.pl/api/chat/memory` (with auth) 4. Verify facts are extracted - [ ] **Step 4: Deploy to production** ```bash ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull" ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql" ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql" ssh maciejpi@57.128.200.27 "sudo systemctl restart nordabiznes" curl -sI https://nordabiznes.pl/health | head -3 ``` - [ ] **Step 5: Update release notes** Add entry in `blueprints/public/routes.py` `_get_releases()`. --- ## Post-Implementation Checklist - [ ] Verify AI greets users by name - [ ] Verify Smart Router logs show correct classification - [ ] Verify streaming works on mobile (Android + iOS) - [ ] Verify memory facts are extracted after conversations - [ ] Verify memory is private (user A cannot see user B's facts) - [ ] Verify response times: simple <3s, medium <6s, complex <12s - [ ] Monitor costs for first week — compare with estimates - [ ] Send message to Jakub Pornowski confirming speed improvements
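A cheap way to cover the memory-extraction items in this checklist without calling Gemini: the fence-stripping step that `extract_facts_async` and `summarize_conversation_async` both perform before `json.loads` can be pulled out and unit-tested. A sketch under that assumption (the `parse_model_json` helper is illustrative, not an existing function):

```python
import json

def parse_model_json(raw: str):
    """Strip an optional markdown fence from a model reply, then parse JSON.

    Mirrors the inline parsing in memory_service.py: drop the opening
    fence line and everything after the closing fence, then json.loads
    the remainder.
    """
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    return json.loads(text)

# Handles both fenced and bare replies:
fenced = '```json\n[{"fact": "szuka podwykonawców do projektu PEJ", "category": "needs"}]\n```'
assert parse_model_json(fenced)[0]["category"] == "needs"
assert parse_model_json('{"summary": "ok", "key_topics": []}')["key_topics"] == []
```

If the model returns something that is not JSON at all, `json.loads` raises and the existing `except` blocks in memory_service.py already log and roll back.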