# Company Search Flow **Document Version:** 1.0 **Last Updated:** 2026-01-10 **Status:** Production LIVE **Flow Type:** Company Search & Discovery --- ## Overview This document describes the **complete company search flow** for the Norda Biznes Partner application, covering: - **User Search Interface** (`/search` route) - **Search Service Architecture** (unified search with multiple strategies) - **AI Chat Integration** (context-aware company discovery) - **Search Strategies:** - NIP/REGON direct lookup - Synonym expansion - PostgreSQL Full-Text Search (FTS) - Fuzzy matching (pg_trgm) - SQLite keyword scoring fallback **Key Technology:** - **Search Engine:** Custom unified SearchService - **Database:** PostgreSQL FTS with tsvector indexing - **Fuzzy Matching:** pg_trgm extension for typo tolerance - **Synonym Expansion:** Domain-specific keyword mappings - **AI Integration:** Used by NordaBiz Chat for context building **Performance Features:** - Direct identifier lookup (NIP/REGON) bypasses full search - Database-level full-text search indexing - Synonym expansion increases recall - Configurable result limits (default 50) - Fallback mechanisms for SQLite compatibility --- ## 1. Search Flow Overview ### 1.1 High-Level Architecture ```mermaid flowchart TD User[User] -->|1. Search query| UI[Search UI
/search route] AIUser[AI Chat User] -->|1. Natural language| Chat[AI Chat
/chat route] UI -->|2. Call| SearchSvc[Search Service
search_service.py] Chat -->|2. Find companies| SearchSvc SearchSvc -->|3. Detect query type| QueryType{Query Type?} QueryType -->|NIP: 10 digits| NIPLookup[NIP Direct Lookup] QueryType -->|REGON: 9/14 digits| REGONLookup[REGON Direct Lookup] QueryType -->|Text query| DBCheck{Database
Type?} DBCheck -->|PostgreSQL| PGFTS[PostgreSQL FTS
+ Fuzzy Match] DBCheck -->|SQLite| SQLiteFallback[SQLite Keyword
Scoring] NIPLookup -->|4. Query DB| DB[(PostgreSQL
companies)] REGONLookup -->|4. Query DB| DB PGFTS -->|4. FTS query| DB SQLiteFallback -->|4. LIKE query| DB DB -->|5. Results| SearchSvc SearchSvc -->|6. SearchResult[]| UI SearchSvc -->|6. Company[]| Chat UI -->|7. Render| SearchResults[search_results.html] Chat -->|7. Build context| AIContext[AI Context Builder] SearchResults -->|8. Display| User AIContext -->|8. Generate response| AIUser style SearchSvc fill:#4CAF50 style PGFTS fill:#2196F3 style DB fill:#FF9800 style NIPLookup fill:#9C27B0 style REGONLookup fill:#9C27B0 ``` --- ## 2. Search Strategies ### 2.1 Strategy Selection Algorithm ```mermaid flowchart TD Start([User Query]) --> Clean[Strip whitespace] Clean --> Empty{Empty
query?} Empty -->|Yes| AllCompanies[Return all companies
ORDER BY name] Empty -->|No| NIPCheck{Is NIP?
10 digits} NIPCheck -->|Yes| NIPSearch[Direct NIP lookup
WHERE nip = ?] NIPCheck -->|No| REGONCheck{Is REGON?
9 or 14 digits} REGONCheck -->|Yes| REGONSearch[Direct REGON lookup
WHERE regon = ?] REGONCheck -->|No| DBType{Database
Type?} DBType -->|PostgreSQL| PGFlow[PostgreSQL FTS Flow] DBType -->|SQLite| SQLiteFlow[SQLite Keyword Flow] NIPSearch --> Found{Found?} REGONSearch --> Found Found -->|Yes| ReturnSingle[Return single result
score=100, match_type='nip/regon'] Found -->|No| ReturnEmpty[Return empty list] PGFlow --> PGSynonym[Expand synonyms] PGSynonym --> PGExtCheck{pg_trgm
available?} PGExtCheck -->|Yes| FTS_Fuzzy[FTS + Fuzzy search
ts_rank + similarity] PGExtCheck -->|No| FTS_Only[FTS only
ts_rank] FTS_Fuzzy --> PGResults{Results?} FTS_Only --> PGResults PGResults -->|Yes| ReturnScored[Return scored results
ORDER BY score DESC] PGResults -->|No| Fallback[Execute SQLite fallback] SQLiteFlow --> SQLiteSynonym[Expand synonyms] SQLiteSynonym --> Fallback Fallback --> InMemory[In-memory keyword scoring] InMemory --> ReturnScored ReturnSingle --> End([SearchResult[]]) ReturnEmpty --> End ReturnScored --> End AllCompanies --> End style NIPSearch fill:#9C27B0 style REGONSearch fill:#9C27B0 style FTS_Fuzzy fill:#2196F3 style FTS_Only fill:#2196F3 style InMemory fill:#FF9800 ``` ### 2.2 Synonym Expansion **Purpose:** Increase search recall by expanding user queries with domain-specific synonyms **Examples:** ```python KEYWORD_SYNONYMS = { # IT / Web 'strony': ['www', 'web', 'internet', 'witryny', 'seo', 'e-commerce', 'sklep', 'portal'], 'aplikacje': ['software', 'programowanie', 'systemy', 'crm', 'erp', 'app'], 'it': ['informatyka', 'komputery', 'software', 'systemy', 'serwis'], # Construction 'budowa': ['budownictwo', 'konstrukcje', 'remonty', 'wykończenia', 'dach', 'elewacja'], 'remont': ['wykończenie', 'naprawa', 'renowacja', 'modernizacja'], # Services 'księgowość': ['rachunkowość', 'finanse', 'podatki', 'biuro rachunkowe', 'kadry'], 'prawo': ['prawnik', 'adwokat', 'radca', 'kancelaria', 'notariusz'], # Production 'metal': ['stal', 'obróbka', 'spawanie', 'cnc', 'ślusarstwo'], 'drewno': ['stolarka', 'meble', 'tartak', 'carpentry'], } ``` **Algorithm:** 1. Tokenize user query (split on whitespace, strip punctuation) 2. For each word: - Direct lookup in KEYWORD_SYNONYMS keys - Check if word appears in any synonym list - Add matching synonyms to expanded query 3. Return unique set of keywords **Example Expansion:** ``` Input: "strony internetowe" Output: ['strony', 'internetowe', 'www', 'web', 'internet', 'witryny', 'seo', 'e-commerce', 'ecommerce', 'sklep', 'portal', 'online', 'cyfrowe', 'marketing'] ``` --- ## 3. PostgreSQL Full-Text Search (FTS) ### 3.1 FTS Search Sequence ```mermaid sequenceDiagram actor User participant Route as Flask Route
/search participant SearchSvc as SearchService participant PG as PostgreSQL participant FTS as Full-Text Engine
(tsvector) participant Trgm as pg_trgm Extension
(fuzzy matching) User->>Route: GET /search?q=strony www Route->>SearchSvc: search("strony www", limit=50) Note over SearchSvc: Detect PostgreSQL database SearchSvc->>SearchSvc: _expand_keywords("strony www") Note over SearchSvc: Expanded: [strony, www, web, internet,
witryny, seo, e-commerce, ...] SearchSvc->>SearchSvc: Build tsquery: "strony:* | www:* | web:* | ..." SearchSvc->>SearchSvc: Build ILIKE patterns: [%strony%, %www%, %web%, ...] SearchSvc->>PG: Check pg_trgm extension available PG->>SearchSvc: Extension exists SearchSvc->>PG: Execute FTS + Fuzzy query Note over PG: SELECT c.id,
ts_rank(search_vector, tsquery) as fts_score,
similarity(name, query) as fuzzy_score,
CASE WHEN founding_history ILIKE ...
FROM companies c
WHERE search_vector @@ tsquery
OR similarity(name, query) > 0.2
OR name/description ILIKE patterns PG->>FTS: Match against search_vector FTS->>PG: FTS matches with ts_rank scores PG->>Trgm: Calculate similarity(name, query) Trgm->>PG: Fuzzy match scores (0.0-1.0) PG->>SearchSvc: Result rows: [(id, fts_score, fuzzy_score, history_score), ...] SearchSvc->>PG: Fetch full Company objects
WHERE id IN (...) PG->>SearchSvc: Company objects SearchSvc->>SearchSvc: Determine match_type (fts/fuzzy/history) SearchSvc->>SearchSvc: Normalize scores (0-100) SearchSvc->>Route: SearchResult[] with companies, scores, match_types Route->>User: Render search_results.html ``` ### 3.2 PostgreSQL FTS Implementation **File:** `search_service.py` (lines 251-378) **Database Requirements:** - **Extension:** `pg_trgm` (optional, enables fuzzy matching) - **Column:** `companies.search_vector` (tsvector, indexed) - **Index:** GIN index on `search_vector` for fast full-text search **SQL Query Structure (with pg_trgm):** ```sql SELECT c.id, COALESCE(ts_rank(c.search_vector, to_tsquery('simple', :tsquery)), 0) as fts_score, COALESCE(similarity(c.name, :query), 0) as fuzzy_score, CASE WHEN c.founding_history ILIKE ANY(:like_patterns) THEN 0.5 ELSE 0 END as history_score FROM companies c WHERE c.status = 'active' AND ( c.search_vector @@ to_tsquery('simple', :tsquery) -- FTS match OR similarity(c.name, :query) > 0.2 -- Fuzzy name match OR c.name ILIKE ANY(:like_patterns) -- Keyword in name OR c.description_short ILIKE ANY(:like_patterns) -- Keyword in description OR c.founding_history ILIKE ANY(:like_patterns) -- Keyword in owners/founders OR c.description_full ILIKE ANY(:like_patterns) -- Keyword in full text ) ORDER BY GREATEST( COALESCE(ts_rank(c.search_vector, to_tsquery('simple', :tsquery)), 0), COALESCE(similarity(c.name, :query), 0), CASE WHEN c.founding_history ILIKE ANY(:like_patterns) THEN 0.5 ELSE 0 END ) DESC LIMIT :limit ``` **Parameters:** - `:tsquery` - Expanded keywords joined with `|` (OR), each with `:*` prefix matching - Example: `"strony:* | www:* | web:* | internet:*"` - `:query` - Original user query for fuzzy matching - `:like_patterns` - Array of ILIKE patterns for direct keyword matches - Example: `['%strony%', '%www%', '%web%']` - `:limit` - Maximum results (default 50) **Scoring Strategy:** 1. **FTS Score:** `ts_rank()` measures how well document matches query (0.0-1.0) 2. **Fuzzy Score:** `similarity()` from pg_trgm measures string similarity (0.0-1.0) 3. **History Score:** Fixed 0.5 bonus if founders/owners match (important for people search) 4. **Final Score:** `GREATEST()` of all three scores, normalized to 0-100 scale **Match Types:** - `'fts'` - Full-text search match (highest ts_rank) - `'fuzzy'` - Fuzzy string similarity match (highest similarity) - `'history'` - Founding history match (owner/founder keywords) **Fallback Behavior:** - If `pg_trgm` extension not available → Uses FTS only (no fuzzy matching) - If FTS returns 0 results → Falls back to SQLite keyword scoring - If FTS query fails (exception) → Rollback transaction, use SQLite fallback --- ## 4. SQLite Keyword Scoring Fallback ### 4.1 Fallback Sequence ```mermaid sequenceDiagram participant SearchSvc as SearchService participant DB as Database participant Scorer as Keyword Scorer
(in-memory) SearchSvc->>SearchSvc: _expand_keywords(query) Note over SearchSvc: Keywords: [strony, www, web, ...] SearchSvc->>DB: SELECT * FROM companies
WHERE status = 'active' DB->>SearchSvc: All active companies (in-memory) loop For each company SearchSvc->>Scorer: Calculate score Note over Scorer: Name match: +10
(+5 bonus for exact match) Note over Scorer: Description short: +5 Note over Scorer: Services: +8 Note over Scorer: Competencies: +7 Note over Scorer: City: +3 Note over Scorer: Founding history: +12
(owners/founders) Note over Scorer: Description full: +4 Scorer->>SearchSvc: Total score (0+) end SearchSvc->>SearchSvc: Filter companies (score > 0) SearchSvc->>SearchSvc: Sort by score DESC SearchSvc->>SearchSvc: Limit results SearchSvc->>SearchSvc: Build SearchResult[]
with scores and match_types ``` ### 4.2 Keyword Scoring Algorithm **File:** `search_service.py` (lines 162-249) **Scoring Weights:** ```python { 'name_match': 10, # Company name contains keyword 'exact_name_match': +5, # Exact query appears in name (bonus) 'description_short': 5, # Short description contains keyword 'services': 8, # Service tag matches 'competencies': 7, # Competency tag matches 'city': 3, # City/location matches 'founding_history': 12, # Owners/founders match (highest weight) 'description_full': 4, # Full description contains keyword } ``` **Algorithm:** 1. Fetch all active companies from database 2. For each company, calculate score: ```python score = 0 match_type = 'keyword' # Name match (highest weight) if any(keyword in company.name.lower() for keyword in keywords): score += 10 if original_query.lower() in company.name.lower(): score += 5 # Exact match bonus match_type = 'exact' # Description match if any(keyword in company.description_short.lower() for keyword in keywords): score += 5 # Services match if any(keyword in service.name.lower() for service in company.services for keyword in keywords): score += 8 # Competencies match if any(keyword in competency.name.lower() for competency in company.competencies for keyword in keywords): score += 7 # City match if any(keyword in company.city.lower() for keyword in keywords): score += 3 # Founding history match (owners, founders) if any(keyword in company.founding_history.lower() for keyword in keywords): score += 12 # Full description match if any(keyword in company.description_full.lower() for keyword in keywords): score += 4 ``` 3. Filter companies with score > 0 4. Sort by score descending 5. Limit to requested result count 6. Return as `SearchResult[]` with scores and match types **Match Types:** - `'exact'` - Original query appears exactly in company name - `'keyword'` - One or more expanded keywords matched --- ## 5. Direct Identifier Lookup ### 5.1 NIP Lookup Flow ```mermaid sequenceDiagram actor User participant Route as /search route participant SearchSvc as SearchService participant DB as PostgreSQL User->>Route: GET /search?q=5882436505 Route->>SearchSvc: search("5882436505") SearchSvc->>SearchSvc: _is_nip("5882436505") Note over SearchSvc: Regex: ^\d{10}$ SearchSvc->>SearchSvc: Clean: remove spaces/hyphens SearchSvc->>DB: SELECT * FROM companies
WHERE nip = '5882436505'
AND status = 'active' alt Company found DB->>SearchSvc: Company object SearchSvc->>Route: [SearchResult(company, score=100, match_type='nip')] Route->>User: Display single company else Not found DB->>SearchSvc: NULL SearchSvc->>Route: [] Route->>User: "Brak wyników" end ``` **Implementation:** - **File:** `search_service.py` (lines 112-131) - **Input cleaning:** Strip spaces and hyphens (e.g., "588-243-65-05" → "5882436505") - **Validation:** Must be exactly 10 digits - **Score:** Always 100.0 (perfect match) - **Match type:** `'nip'` ### 5.2 REGON Lookup Flow ```mermaid sequenceDiagram actor User participant Route as /search route participant SearchSvc as SearchService participant DB as PostgreSQL User->>Route: GET /search?q=220825533 Route->>SearchSvc: search("220825533") SearchSvc->>SearchSvc: _is_regon("220825533") Note over SearchSvc: Regex: ^\d{9}$ OR ^\d{14}$ SearchSvc->>SearchSvc: Clean: remove spaces/hyphens SearchSvc->>DB: SELECT * FROM companies
WHERE regon = '220825533'
AND status = 'active' alt Company found DB->>SearchSvc: Company object SearchSvc->>Route: [SearchResult(company, score=100, match_type='regon')] Route->>User: Display single company else Not found DB->>SearchSvc: NULL SearchSvc->>Route: [] Route->>User: "Brak wyników" end ``` **Implementation:** - **File:** `search_service.py` (lines 117-142) - **Input cleaning:** Strip spaces and hyphens - **Validation:** Must be exactly 9 or 14 digits - **Score:** Always 100.0 (perfect match) - **Match type:** `'regon'` --- ## 6. User Search Interface ### 6.1 Search Route Flow ```mermaid sequenceDiagram actor User participant Browser participant Flask as Flask App
(app.py /search) participant SearchSvc as SearchService participant DB as PostgreSQL participant Template as search_results.html User->>Browser: Navigate to /search Browser->>Flask: GET /search?q=strony+www&category=1 Note over Flask: @login_required
User must be authenticated Flask->>Flask: Parse query params
q = "strony www"
category = 1 Flask->>SearchSvc: search_companies(db, "strony www", category_id=1, limit=50) SearchSvc->>SearchSvc: Execute search strategy
(NIP/REGON/FTS/Fallback) SearchSvc->>DB: Query companies DB->>SearchSvc: Results SearchSvc->>Flask: List[SearchResult] Flask->>Flask: Extract companies from results
companies = [r.company for r in results] Flask->>Flask: Log search analytics
logger.info(f"Search '{query}': {len} results, types: {match_types}") Flask->>Template: render_template('search_results.html',
companies=companies,
query=query,
category_id=category_id,
result_count=len) Template->>Browser: HTML response Browser->>User: Display search results ``` **Route Details:** - **Path:** `/search` - **Method:** GET - **Authentication:** Required (`@login_required`) - **File:** `app.py` (lines 718-748) **Query Parameters:** - `q` (string, optional) - Search query - `category` (integer, optional) - Category filter (category_id) **Response:** - **Template:** `search_results.html` - **Context Variables:** - `companies` - List of Company objects - `query` - Original search query - `category_id` - Selected category filter - `result_count` - Number of results **Analytics Logging:** ```python if query: match_types = {} for r in results: match_types[r.match_type] = match_types.get(r.match_type, 0) + 1 logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}") ``` Example log output: ``` Search 'strony www': 12 results, types: {'fts': 8, 'fuzzy': 3, 'exact': 1} ``` --- ## 7. AI Chat Integration ### 7.1 AI Chat Search Flow ```mermaid sequenceDiagram actor User participant Chat as AI Chat Interface
/chat participant ChatSvc as NordaBizChatService
nordabiz_chat.py participant SearchSvc as SearchService participant DB as PostgreSQL participant Gemini as Google Gemini API User->>Chat: POST /chat/send
"Szukam firm do stron www" Chat->>ChatSvc: send_message(user_message, conversation_id) ChatSvc->>ChatSvc: _find_relevant_companies(db, message) Note over ChatSvc: Extract search keywords from message ChatSvc->>SearchSvc: search_companies(db, message, limit=10) Note over SearchSvc: Use same search strategies
(NIP/REGON/FTS/Fallback) SearchSvc->>DB: Query companies DB->>SearchSvc: Results SearchSvc->>ChatSvc: List[SearchResult] (max 10) ChatSvc->>ChatSvc: Extract companies from results
companies = [r.company for r in results] ChatSvc->>ChatSvc: _build_conversation_context(db, user, conversation, companies) Note over ChatSvc: Limit to 8 companies (prevent context overflow)
Include last 10 messages for history ChatSvc->>ChatSvc: _company_to_compact_dict(company) Note over ChatSvc: Compress company data
(name, desc, services, competencies, etc) ChatSvc->>Gemini: POST /generateContent
System prompt + context + user message Note over Gemini: Model: gemini-2.5-flash
Max tokens: 2048 Gemini->>ChatSvc: AI response text ChatSvc->>DB: Save conversation messages
(user message + AI response) ChatSvc->>DB: Track API costs
(gemini_cost_tracking) ChatSvc->>Chat: AI response with company recommendations Chat->>User: Display chat response ``` **Key Differences from User Search:** 1. **Result Limit:** 10 companies (vs 50 for user search) 2. **Company Limit to AI:** 8 companies max (prevents context overflow) 3. **Context Building:** Companies converted to compact JSON format 4. **Integration:** Seamless - AI doesn't know about search internals 5. **Message History:** Last 10 messages included in context **Implementation:** - **File:** `nordabiz_chat.py` (lines 383-405) - **Search Call:** ```python results = search_companies(db, message, limit=10) companies = [result.company for result in results] return companies ``` **Company Data Compression:** ```python compact = { 'name': company.name, 'cat': company.category.name, 'desc': company.description_short, 'history': company.founding_history, # Owners, founders 'svc': [service.name for service in company.services], 'comp': [competency.name for competency in company.competencies], 'web': company.website, 'tel': company.phone, 'mail': company.email, 'city': company.address_city, 'year': company.year_established, 'cert': [cert.name for cert in company.certifications[:3]] } ``` **AI System Prompt (includes search context):** ``` Jesteś asystentem bazy firm Norda Biznes z Wejherowa. Odpowiadaj zwięźle, konkretnie, po polsku. Oto firmy które mogą być istotne dla pytania użytkownika: {companies_json} Historia rozmowy: {recent_messages} Odpowiedz na pytanie użytkownika bazując na powyższych danych. ``` --- ## 8. Performance Considerations ### 8.1 Database Indexing **Required Indexes:** ```sql -- Full-text search index (PostgreSQL) CREATE INDEX idx_companies_search_vector ON companies USING gin(search_vector); -- NIP lookup index CREATE UNIQUE INDEX idx_companies_nip ON companies(nip) WHERE status = 'active'; -- REGON lookup index CREATE INDEX idx_companies_regon ON companies(regon) WHERE status = 'active'; -- Status filter index CREATE INDEX idx_companies_status ON companies(status); -- Category filter index CREATE INDEX idx_companies_category ON companies(category_id) WHERE status = 'active'; -- pg_trgm index for fuzzy matching (optional) CREATE INDEX idx_companies_name_trgm ON companies USING gin(name gin_trgm_ops); ``` ### 8.2 Search Vector Maintenance **Automatic Updates:** ```sql -- Trigger to update search_vector on INSERT/UPDATE CREATE TRIGGER companies_search_vector_update BEFORE INSERT OR UPDATE ON companies FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger( search_vector, 'pg_catalog.simple', name, description_short, description_full, founding_history ); ``` **Manual Rebuild:** ```sql -- Rebuild all search vectors UPDATE companies SET search_vector = setweight(to_tsvector('simple', COALESCE(name, '')), 'A') || setweight(to_tsvector('simple', COALESCE(description_short, '')), 'B') || setweight(to_tsvector('simple', COALESCE(description_full, '')), 'C') || setweight(to_tsvector('simple', COALESCE(founding_history, '')), 'B'); ``` ### 8.3 Query Performance **Performance Targets:** - **NIP/REGON lookup:** < 10ms (indexed) - **PostgreSQL FTS:** < 100ms (typical) - **SQLite fallback:** < 500ms (in-memory scoring) - **AI Chat search:** < 200ms (limit 10 results) **Optimization Strategies:** 1. **Early Exit:** NIP/REGON lookup bypasses full search 2. **Result Limiting:** Default 50 results (10 for AI chat) 3. **Category Filtering:** Reduces search space 4. **Synonym Pre-expansion:** Computed once, reused in all clauses 5. **Score-based Ordering:** Database-level sorting (not in-memory) ### 8.4 Fallback Performance **PostgreSQL → SQLite Fallback Triggers:** 1. FTS query returns 0 results 2. FTS query throws exception (syntax error, missing extension) 3. `pg_trgm` extension not available (degrades to FTS-only, not full fallback) **SQLite Fallback Cost:** - Fetches ALL active companies into memory - Scores each company in Python (slower than SQL) - Suitable for development/testing, not recommended for production with 100+ companies **Monitoring:** ```python # Logged in app.py when search executes logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}") # Example outputs: # Search 'strony www': 12 results, types: {'fts': 8, 'fuzzy': 4} # Search '5882436505': 1 results, types: {'nip': 1} # Search 'PIXLAB': 1 results, types: {'exact': 1} ``` --- ## 9. Search Result Structure ### 9.1 SearchResult Dataclass **File:** `search_service.py` (lines 20-25) ```python @dataclass class SearchResult: """Search result with score and match info""" company: Company # Full Company SQLAlchemy object score: float # Relevance score (0.0-100.0) match_type: str # Match type identifier ``` **Match Types:** | Match Type | Description | Score Range | |------------|-------------|-------------| | `'nip'` | Direct NIP match | 100.0 (fixed) | | `'regon'` | Direct REGON match | 100.0 (fixed) | | `'exact'` | Exact name match (SQLite) | Variable (usually high) | | `'fts'` | PostgreSQL full-text search | 0.0-100.0 (normalized ts_rank) | | `'fuzzy'` | PostgreSQL fuzzy similarity | 0.0-100.0 (normalized similarity) | | `'history'` | Founding history match | 50.0 (fixed bonus) | | `'keyword'` | SQLite keyword scoring | Variable (weighted sum) | | `'all'` | All companies (no filter) | 0.0 (no relevance) | ### 9.2 Score Normalization **PostgreSQL FTS Scores:** ```python # ts_rank returns 0.0-1.0, normalize to 0-100 fts_score = ts_rank(...) * 100 # similarity returns 0.0-1.0, normalize to 0-100 fuzzy_score = similarity(...) * 100 # history match is fixed bonus history_score = 0.5 * 100 = 50.0 ``` **SQLite Keyword Scores:** ```python # Sum of all matching field weights score = ( 10 # name match + 5 # exact match bonus + 5 # description_short + 8 # services + 7 # competencies + 3 # city + 12 # founding_history + 4 # description_full ) # Maximum possible: 54 points # Typical: 10-30 points ``` --- ## 10. Error Handling & Edge Cases ### 10.1 PostgreSQL FTS Error Handling **Error Scenarios:** 1. **Invalid tsquery syntax** - Fallback to SQLite 2. **pg_trgm extension missing** - Degrade to FTS-only (no fuzzy) 3. **search_vector column missing** - Exception, fallback to SQLite 4. **Database connection error** - Propagate exception to route **Implementation:** ```python try: result = self.db.execute(sql, params) rows = result.fetchall() # ... process results except Exception as e: print(f"PostgreSQL FTS error: {e}, falling back to keyword search") self.db.rollback() # CRITICAL: prevent InFailedSqlTransaction return self._search_sqlite_fallback(query, category_id, limit) ``` **Critical:** `db.rollback()` is essential before fallback to prevent transaction state errors. ### 10.2 Empty Results Handling **No Results Scenarios:** 1. **NIP/REGON not found** - Return empty list `[]` 2. **FTS returns 0 matches** - Automatic fallback to SQLite scoring 3. **SQLite scoring returns 0 matches** - Return empty list `[]` 4. **Empty query** - Return all active companies (ordered by name) **User Interface:** ```html {% if result_count == 0 %}
Brak wyników dla zapytania "{{ query }}". Spróbuj innych słów kluczowych lub usuń filtry.
{% endif %} ``` ### 10.3 Special Characters & Sanitization **Query Cleaning:** ```python query = query.strip() # Remove leading/trailing whitespace clean_nip = re.sub(r'[\s\-]', '', query) # Remove spaces and hyphens from NIP/REGON ``` **SQL Injection Prevention:** - All queries use SQLAlchemy parameter binding (`:param` syntax) - No raw string concatenation in SQL - ILIKE patterns are passed as array parameters **XSS Prevention:** - All user input sanitized before display (handled by Jinja2 auto-escaping) - Query string displayed in template: `{{ query }}` (auto-escaped) --- ## 11. Testing & Verification ### 11.1 Test Queries **NIP Lookup:** ``` Query: "5882436505" Expected: PIXLAB Sp. z o.o. (single result, score=100, match_type='nip') ``` **REGON Lookup:** ``` Query: "220825533" Expected: Single company with matching REGON (score=100, match_type='regon') ``` **Keyword Search (PostgreSQL FTS):** ``` Query: "strony internetowe" Expected: Multiple results (IT/Web companies, match_type='fts' or 'fuzzy') Keywords expanded to: [strony, internetowe, www, web, internet, witryny, seo, ...] ``` **Exact Name Match:** ``` Query: "PIXLAB" Expected: PIXLAB at top (high score, match_type='exact' or 'fts') ``` **Owner/Founder Search:** ``` Query: "Jan Kowalski" (example founder name) Expected: Companies where Jan Kowalski appears in founding_history Match type: 'history' or high score from founding_history match ``` **Category Filter:** ``` Query: "strony" + category=1 (IT) Expected: Only IT category companies matching "strony" ``` **Empty Query:** ``` Query: "" Expected: All active companies, alphabetically sorted ``` ### 11.2 Performance Testing **Load Testing Scenarios:** ```python # Test 1: Direct lookup performance for nip in all_nips: results = search_companies(db, nip) assert len(results) == 1 assert results[0].match_type == 'nip' # Test 2: Full-text search performance queries = ["strony", "budowa", "księgowość", "metal", "transport"] for query in queries: start = time.time() results = search_companies(db, query) elapsed = time.time() - start assert elapsed < 0.1 # < 100ms print(f"{query}: {len(results)} results in {elapsed*1000:.1f}ms") # Test 3: Fallback trigger test (simulate FTS failure) # Force SQLite fallback by using invalid tsquery syntax results = search_companies(db, "test:query|with:invalid&syntax") # Should not crash, should return results via fallback ``` ### 11.3 Search Quality Metrics **Relevance Testing:** ```python test_cases = [ { 'query': 'strony www', 'expected_top_3': ['PIXLAB', 'Web Agency', 'IT Solutions'], 'min_results': 5 }, { 'query': 'budownictwo', 'expected_categories': ['Construction'], 'min_results': 3 }, # ... more test cases ] for test in test_cases: results = search_companies(db, test['query']) assert len(results) >= test['min_results'] # Check if expected companies appear in top results top_names = [r.company.name for r in results[:3]] for expected in test['expected_top_3']: assert expected in top_names ``` --- ## 12. Maintenance & Monitoring ### 12.1 Database Maintenance **Weekly Tasks:** ```sql -- Rebuild search vectors (if data quality issues) UPDATE companies SET search_vector = setweight(to_tsvector('simple', COALESCE(name, '')), 'A') || setweight(to_tsvector('simple', COALESCE(description_short, '')), 'B') || setweight(to_tsvector('simple', COALESCE(description_full, '')), 'C') || setweight(to_tsvector('simple', COALESCE(founding_history, '')), 'B') WHERE updated_at > NOW() - INTERVAL '7 days'; -- Verify index health SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes WHERE tablename = 'companies' ORDER BY idx_scan DESC; -- Check for missing indexes SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'companies'; ``` **Monthly Tasks:** ```sql -- Vacuum and analyze for performance VACUUM ANALYZE companies; -- Check for slow queries SELECT query, mean_exec_time, calls FROM pg_stat_statements WHERE query LIKE '%companies%search_vector%' ORDER BY mean_exec_time DESC LIMIT 10; ``` ### 12.2 Search Analytics **Logging Search Patterns:** ```python # Already implemented in app.py /search route logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}") ``` **Analytics Queries:** ```sql -- Top search queries (requires search_logs table - not yet implemented) SELECT query, COUNT(*) as frequency FROM search_logs WHERE created_at > NOW() - INTERVAL '30 days' GROUP BY query ORDER BY frequency DESC LIMIT 20; -- Zero-result searches (requires logging) SELECT query, COUNT(*) as frequency FROM search_logs WHERE result_count = 0 AND created_at > NOW() - INTERVAL '30 days' GROUP BY query ORDER BY frequency DESC LIMIT 10; ``` ### 12.3 Synonym Expansion Tuning **Adding New Synonyms:** ```python # Edit search_service.py KEYWORD_SYNONYMS dictionary KEYWORD_SYNONYMS = { # Add new industry-specific terms 'cyberbezpieczeństwo': ['security', 'ochrona', 'firewall', 'antywirus'], # ... more synonyms } ``` **Synonym Effectiveness Testing:** ```python # Test query with and without synonym expansion query = "cyberbezpieczeństwo" # With expansion results_with = search_companies(db, query) print(f"With synonyms: {len(results_with)} results") # Without expansion (mock) # ... compare recall/precision ``` --- ## 13. Future Enhancements ### 13.1 Planned Improvements 1. **Search Result Ranking ML Model** - Learn from user click-through rates - Personalized ranking based on user preferences - A/B testing of ranking algorithms 2. **Search Autocomplete** - Suggest company names as user types - Suggest common search queries - Category-based suggestions 3. **Advanced Filters** - Location-based search (radius from city) - Certification filters (ISO, other) - Founding year range - Employee count range (if available) 4. **Search Analytics Dashboard** - Top queries (daily/weekly/monthly) - Zero-result queries (opportunities for content) - Average result count per query - Match type distribution - Click-through rates by position 5. **Semantic Search** - Integrate sentence embeddings (sentence-transformers) - Vector similarity search for related companies - "More like this" company recommendations 6. **Multi-language Support** - English query translation - German query support (for border region) - Auto-detect query language ### 13.2 Performance Optimization Ideas 1. **Query Result Caching** - Redis cache for common queries (TTL 5 minutes) - Cache key: `search:{query}:{category_id}` - Invalidate on company data updates 2. **Partial Index Optimization** ```sql -- Index only active companies CREATE INDEX idx_companies_active_search ON companies USING gin(search_vector) WHERE status = 'active'; ``` 3. **Materialized View for Search** ```sql -- Pre-compute search data CREATE MATERIALIZED VIEW search_companies_mv AS SELECT id, name, search_vector, category_id, status, ... FROM companies WHERE status = 'active'; -- Refresh daily REFRESH MATERIALIZED VIEW search_companies_mv; ``` 4. **Connection Pooling** - Already implemented via SQLAlchemy - Monitor pool size and overflow - Adjust pool_size/max_overflow if needed --- ## 14. Related Documentation - **[Flask Application Structure](../analysis/flask-application-structure.md)** - Complete route reference - **[Database Schema](./05-database-schema.md)** - Company model and indexes - **[External Integrations](./06-external-integrations.md)** - AI Chat integration details - **[AI Chat Flow](./03-ai-chat-flow.md)** - How AI uses search service (to be created) --- ## 15. Glossary | Term | Description | |------|-------------| | **FTS** | Full-Text Search - PostgreSQL text search engine using tsvector | | **tsvector** | PostgreSQL data type for full-text search, stores preprocessed text | | **tsquery** | PostgreSQL query syntax for full-text search (e.g., "word1 \| word2") | | **ts_rank** | PostgreSQL function to score FTS relevance (0.0-1.0) | | **pg_trgm** | PostgreSQL extension for trigram-based fuzzy string matching | | **similarity()** | pg_trgm function to measure string similarity (0.0-1.0) | | **Synonym Expansion** | Expanding user query with related keywords (e.g., "strony" → "www, web, internet") | | **SearchResult** | Dataclass containing Company, score, and match_type | | **Match Type** | Identifier for how company was matched (nip, regon, fts, fuzzy, keyword, etc.) | | **NIP** | Polish tax identification number (10 digits) | | **REGON** | Polish business registry number (9 or 14 digits) | | **Fallback** | Alternative search method when primary method fails (PostgreSQL FTS → SQLite keyword scoring) | | **SearchService** | Unified search service class (search_service.py) | | **Keyword Scoring** | In-memory scoring algorithm for SQLite fallback | --- ## Document Metadata **Created:** 2026-01-10 **Author:** Architecture Documentation (auto-claude) **Related Files:** - `search_service.py` (main implementation) - `app.py` (lines 718-748, /search route) - `nordabiz_chat.py` (lines 383-405, AI integration) - `database.py` (Company model) **Version History:** - v1.0 (2026-01-10) - Initial documentation --- **End of Document**