nordabiz/docs/architecture/flows/02-search-flow.md
# Company Search Flow
**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Flow Type:** Company Search & Discovery
---
## Overview
This document describes the **complete company search flow** for the Norda Biznes Partner application, covering:
- **User Search Interface** (`/search` route)
- **Search Service Architecture** (unified search with multiple strategies)
- **AI Chat Integration** (context-aware company discovery)
- **Search Strategies:**
  - NIP/REGON direct lookup
  - Synonym expansion
  - PostgreSQL Full-Text Search (FTS)
  - Fuzzy matching (pg_trgm)
  - SQLite keyword scoring fallback
**Key Technology:**
- **Search Engine:** Custom unified SearchService
- **Database:** PostgreSQL FTS with tsvector indexing
- **Fuzzy Matching:** pg_trgm extension for typo tolerance
- **Synonym Expansion:** Domain-specific keyword mappings
- **AI Integration:** Used by NordaBiz Chat for context building
**Performance Features:**
- Direct identifier lookup (NIP/REGON) bypasses full search
- Database-level full-text search indexing
- Synonym expansion increases recall
- Configurable result limits (default 50)
- Fallback mechanisms for SQLite compatibility
---
## 1. Search Flow Overview
### 1.1 High-Level Architecture
```mermaid
flowchart TD
User[User] -->|1. Search query| UI[Search UI<br/>/search route]
AIUser[AI Chat User] -->|1. Natural language| Chat[AI Chat<br/>/chat route]
UI -->|2. Call| SearchSvc[Search Service<br/>search_service.py]
Chat -->|2. Find companies| SearchSvc
SearchSvc -->|3. Detect query type| QueryType{Query Type?}
QueryType -->|NIP: 10 digits| NIPLookup[NIP Direct Lookup]
QueryType -->|REGON: 9/14 digits| REGONLookup[REGON Direct Lookup]
QueryType -->|Text query| DBCheck{Database<br/>Type?}
DBCheck -->|PostgreSQL| PGFTS[PostgreSQL FTS<br/>+ Fuzzy Match]
DBCheck -->|SQLite| SQLiteFallback[SQLite Keyword<br/>Scoring]
NIPLookup -->|4. Query DB| DB[(PostgreSQL<br/>companies)]
REGONLookup -->|4. Query DB| DB
PGFTS -->|4. FTS query| DB
SQLiteFallback -->|4. LIKE query| DB
DB -->|5. Results| SearchSvc
SearchSvc -->|"6. SearchResult[]"| UI
SearchSvc -->|"6. Company[]"| Chat
UI -->|7. Render| SearchResults[search_results.html]
Chat -->|7. Build context| AIContext[AI Context Builder]
SearchResults -->|8. Display| User
AIContext -->|8. Generate response| AIUser
style SearchSvc fill:#4CAF50
style PGFTS fill:#2196F3
style DB fill:#FF9800
style NIPLookup fill:#9C27B0
style REGONLookup fill:#9C27B0
```
---
## 2. Search Strategies
### 2.1 Strategy Selection Algorithm
```mermaid
flowchart TD
Start([User Query]) --> Clean[Strip whitespace]
Clean --> Empty{Empty<br/>query?}
Empty -->|Yes| AllCompanies[Return all companies<br/>ORDER BY name]
Empty -->|No| NIPCheck{Is NIP?<br/>10 digits}
NIPCheck -->|Yes| NIPSearch[Direct NIP lookup<br/>WHERE nip = ?]
NIPCheck -->|No| REGONCheck{Is REGON?<br/>9 or 14 digits}
REGONCheck -->|Yes| REGONSearch[Direct REGON lookup<br/>WHERE regon = ?]
REGONCheck -->|No| DBType{Database<br/>Type?}
DBType -->|PostgreSQL| PGFlow[PostgreSQL FTS Flow]
DBType -->|SQLite| SQLiteFlow[SQLite Keyword Flow]
NIPSearch --> Found{Found?}
REGONSearch --> Found
Found -->|Yes| ReturnSingle[Return single result<br/>score=100, match_type='nip/regon']
Found -->|No| ReturnEmpty[Return empty list]
PGFlow --> PGSynonym[Expand synonyms]
PGSynonym --> PGExtCheck{pg_trgm<br/>available?}
PGExtCheck -->|Yes| FTS_Fuzzy[FTS + Fuzzy search<br/>ts_rank + similarity]
PGExtCheck -->|No| FTS_Only[FTS only<br/>ts_rank]
FTS_Fuzzy --> PGResults{Results?}
FTS_Only --> PGResults
PGResults -->|Yes| ReturnScored[Return scored results<br/>ORDER BY score DESC]
PGResults -->|No| Fallback[Execute SQLite fallback]
SQLiteFlow --> SQLiteSynonym[Expand synonyms]
SQLiteSynonym --> Fallback
Fallback --> InMemory[In-memory keyword scoring]
InMemory --> ReturnScored
ReturnSingle --> End(["SearchResult[]"])
ReturnEmpty --> End
ReturnScored --> End
AllCompanies --> End
style NIPSearch fill:#9C27B0
style REGONSearch fill:#9C27B0
style FTS_Fuzzy fill:#2196F3
style FTS_Only fill:#2196F3
style InMemory fill:#FF9800
```
### 2.2 Synonym Expansion
**Purpose:** Increase search recall by expanding user queries with domain-specific synonyms
**Examples:**
```python
KEYWORD_SYNONYMS = {
    # IT / Web
    'strony': ['www', 'web', 'internet', 'witryny', 'seo', 'e-commerce', 'sklep', 'portal'],
    'aplikacje': ['software', 'programowanie', 'systemy', 'crm', 'erp', 'app'],
    'it': ['informatyka', 'komputery', 'software', 'systemy', 'serwis'],
    # Construction
    'budowa': ['budownictwo', 'konstrukcje', 'remonty', 'wykończenia', 'dach', 'elewacja'],
    'remont': ['wykończenie', 'naprawa', 'renowacja', 'modernizacja'],
    # Services
    'księgowość': ['rachunkowość', 'finanse', 'podatki', 'biuro rachunkowe', 'kadry'],
    'prawo': ['prawnik', 'adwokat', 'radca', 'kancelaria', 'notariusz'],
    # Production
    'metal': ['stal', 'obróbka', 'spawanie', 'cnc', 'ślusarstwo'],
    'drewno': ['stolarka', 'meble', 'tartak', 'carpentry'],
}
```
**Algorithm:**
1. Tokenize user query (split on whitespace, strip punctuation)
2. For each word:
   - Direct lookup in KEYWORD_SYNONYMS keys
   - Check if word appears in any synonym list
   - Add matching synonyms to expanded query
3. Return unique set of keywords
**Example Expansion:**
```
Input: "strony internetowe"
Output: ['strony', 'internetowe', 'www', 'web', 'internet', 'witryny',
'seo', 'e-commerce', 'ecommerce', 'sklep', 'portal', 'online',
'cyfrowe', 'marketing']
```
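The three steps above can be sketched as a small helper. This is an illustrative reimplementation, not the literal `_expand_keywords` from `search_service.py`, and it uses a trimmed-down synonym dictionary:

```python
import re

# Trimmed-down stand-in for the full KEYWORD_SYNONYMS dict in search_service.py
KEYWORD_SYNONYMS = {
    'strony': ['www', 'web', 'internet', 'witryny'],
    'remont': ['wykończenie', 'naprawa', 'renowacja'],
}

def expand_keywords(query: str) -> list[str]:
    """Tokenize the query and add domain-specific synonyms."""
    # Step 1: split on whitespace, strip punctuation
    words = [re.sub(r'[^\w]', '', w.lower()) for w in query.split()]
    expanded = set(w for w in words if w)
    # Step 2: direct and reverse synonym lookup for each word
    for word in list(expanded):
        expanded.update(KEYWORD_SYNONYMS.get(word, []))  # word is a key
        for key, synonyms in KEYWORD_SYNONYMS.items():
            if word in synonyms:                          # word is a synonym
                expanded.add(key)
                expanded.update(synonyms)
    # Step 3: return the unique keyword set
    return sorted(expanded)

print(expand_keywords("strony internetowe"))
```

The reverse lookup means a query like `"www"` also pulls in `'strony'` and its sibling synonyms, which is what makes recall symmetric.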
---
## 3. PostgreSQL Full-Text Search (FTS)
### 3.1 FTS Search Sequence
```mermaid
sequenceDiagram
actor User
participant Route as Flask Route<br/>/search
participant SearchSvc as SearchService
participant PG as PostgreSQL
participant FTS as Full-Text Engine<br/>(tsvector)
participant Trgm as pg_trgm Extension<br/>(fuzzy matching)
User->>Route: GET /search?q=strony www
Route->>SearchSvc: search("strony www", limit=50)
Note over SearchSvc: Detect PostgreSQL database
SearchSvc->>SearchSvc: _expand_keywords("strony www")
Note over SearchSvc: Expanded: [strony, www, web, internet,<br/>witryny, seo, e-commerce, ...]
SearchSvc->>SearchSvc: Build tsquery: "strony:* | www:* | web:* | ..."
SearchSvc->>SearchSvc: Build ILIKE patterns: [%strony%, %www%, %web%, ...]
SearchSvc->>PG: Check pg_trgm extension available
PG->>SearchSvc: Extension exists
SearchSvc->>PG: Execute FTS + Fuzzy query
Note over PG: SELECT c.id,<br/>ts_rank(search_vector, tsquery) as fts_score,<br/>similarity(name, query) as fuzzy_score,<br/>CASE WHEN founding_history ILIKE ...<br/>FROM companies c<br/>WHERE search_vector @@ tsquery<br/>OR similarity(name, query) > 0.2<br/>OR name/description ILIKE patterns
PG->>FTS: Match against search_vector
FTS->>PG: FTS matches with ts_rank scores
PG->>Trgm: Calculate similarity(name, query)
Trgm->>PG: Fuzzy match scores (0.0-1.0)
PG->>SearchSvc: Result rows: [(id, fts_score, fuzzy_score, history_score), ...]
SearchSvc->>PG: Fetch full Company objects<br/>WHERE id IN (...)
PG->>SearchSvc: Company objects
SearchSvc->>SearchSvc: Determine match_type (fts/fuzzy/history)
SearchSvc->>SearchSvc: Normalize scores (0-100)
SearchSvc->>Route: SearchResult[] with companies, scores, match_types
Route->>User: Render search_results.html
```
### 3.2 PostgreSQL FTS Implementation
**File:** `search_service.py` (lines 251-378)
**Database Requirements:**
- **Extension:** `pg_trgm` (optional, enables fuzzy matching)
- **Column:** `companies.search_vector` (tsvector, indexed)
- **Index:** GIN index on `search_vector` for fast full-text search
**SQL Query Structure (with pg_trgm):**
```sql
SELECT c.id,
       COALESCE(ts_rank(c.search_vector, to_tsquery('simple', :tsquery)), 0) AS fts_score,
       COALESCE(similarity(c.name, :query), 0) AS fuzzy_score,
       CASE WHEN c.founding_history ILIKE ANY(:like_patterns) THEN 0.5 ELSE 0 END AS history_score
FROM companies c
WHERE c.status = 'active'
  AND (
    c.search_vector @@ to_tsquery('simple', :tsquery)   -- FTS match
    OR similarity(c.name, :query) > 0.2                 -- Fuzzy name match
    OR c.name ILIKE ANY(:like_patterns)                 -- Keyword in name
    OR c.description_short ILIKE ANY(:like_patterns)    -- Keyword in description
    OR c.founding_history ILIKE ANY(:like_patterns)     -- Keyword in owners/founders
    OR c.description_full ILIKE ANY(:like_patterns)     -- Keyword in full text
  )
ORDER BY GREATEST(
    COALESCE(ts_rank(c.search_vector, to_tsquery('simple', :tsquery)), 0),
    COALESCE(similarity(c.name, :query), 0),
    CASE WHEN c.founding_history ILIKE ANY(:like_patterns) THEN 0.5 ELSE 0 END
) DESC
LIMIT :limit
```
**Parameters:**
- `:tsquery` - Expanded keywords joined with `|` (OR), each with `:*` prefix matching
  - Example: `"strony:* | www:* | web:* | internet:*"`
- `:query` - Original user query for fuzzy matching
- `:like_patterns` - Array of ILIKE patterns for direct keyword matches
  - Example: `['%strony%', '%www%', '%web%']`
- `:limit` - Maximum results (default 50)
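A minimal sketch of how these bound parameters can be assembled from the expanded keyword list (the helper name `build_fts_params` is illustrative, not from the source):

```python
def build_fts_params(keywords, query, limit=50):
    """Assemble the bound parameters for the FTS query above."""
    return {
        'tsquery': ' | '.join(f'{kw}:*' for kw in keywords),  # prefix matching, OR'ed
        'query': query,                                       # raw query for similarity()
        'like_patterns': [f'%{kw}%' for kw in keywords],      # for ILIKE ANY(...)
        'limit': limit,
    }

params = build_fts_params(['strony', 'www', 'web'], 'strony www')
print(params['tsquery'])        # strony:* | www:* | web:*
print(params['like_patterns'])  # ['%strony%', '%www%', '%web%']
```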
**Scoring Strategy:**
1. **FTS Score:** `ts_rank()` measures how well document matches query (0.0-1.0)
2. **Fuzzy Score:** `similarity()` from pg_trgm measures string similarity (0.0-1.0)
3. **History Score:** Fixed 0.5 bonus if founders/owners match (important for people search)
4. **Final Score:** `GREATEST()` of all three scores, normalized to 0-100 scale
**Match Types:**
- `'fts'` - Full-text search match (highest ts_rank)
- `'fuzzy'` - Fuzzy string similarity match (highest similarity)
- `'history'` - Founding history match (owner/founder keywords)
**Fallback Behavior:**
- If `pg_trgm` extension not available → Uses FTS only (no fuzzy matching)
- If FTS returns 0 results → Falls back to SQLite keyword scoring
- If FTS query fails (exception) → Rollback transaction, use SQLite fallback
---
## 4. SQLite Keyword Scoring Fallback
### 4.1 Fallback Sequence
```mermaid
sequenceDiagram
participant SearchSvc as SearchService
participant DB as Database
participant Scorer as Keyword Scorer<br/>(in-memory)
SearchSvc->>SearchSvc: _expand_keywords(query)
Note over SearchSvc: Keywords: [strony, www, web, ...]
SearchSvc->>DB: SELECT * FROM companies<br/>WHERE status = 'active'
DB->>SearchSvc: All active companies (in-memory)
loop For each company
SearchSvc->>Scorer: Calculate score
Note over Scorer: Name match: +10<br/>(+5 bonus for exact match)
Note over Scorer: Description short: +5
Note over Scorer: Services: +8
Note over Scorer: Competencies: +7
Note over Scorer: City: +3
Note over Scorer: Founding history: +12<br/>(owners/founders)
Note over Scorer: Description full: +4
Scorer->>SearchSvc: Total score (0+)
end
SearchSvc->>SearchSvc: Filter companies (score > 0)
SearchSvc->>SearchSvc: Sort by score DESC
SearchSvc->>SearchSvc: Limit results
SearchSvc->>SearchSvc: Build SearchResult[]<br/>with scores and match_types
```
### 4.2 Keyword Scoring Algorithm
**File:** `search_service.py` (lines 162-249)
**Scoring Weights:**
```python
{
    'name_match': 10,         # Company name contains keyword
    'exact_name_match': +5,   # Exact query appears in name (bonus)
    'description_short': 5,   # Short description contains keyword
    'services': 8,            # Service tag matches
    'competencies': 7,        # Competency tag matches
    'city': 3,                # City/location matches
    'founding_history': 12,   # Owners/founders match (highest weight)
    'description_full': 4,    # Full description contains keyword
}
```
**Algorithm:**
1. Fetch all active companies from database
2. For each company, calculate score:
```python
score = 0
match_type = 'keyword'

# Name match (highest weight)
if any(keyword in company.name.lower() for keyword in keywords):
    score += 10
    if original_query.lower() in company.name.lower():
        score += 5  # Exact match bonus
        match_type = 'exact'

# Description match
if any(keyword in company.description_short.lower() for keyword in keywords):
    score += 5

# Services match
if any(keyword in service.name.lower() for service in company.services for keyword in keywords):
    score += 8

# Competencies match
if any(keyword in competency.name.lower() for competency in company.competencies for keyword in keywords):
    score += 7

# City match
if any(keyword in company.city.lower() for keyword in keywords):
    score += 3

# Founding history match (owners, founders)
if any(keyword in company.founding_history.lower() for keyword in keywords):
    score += 12

# Full description match
if any(keyword in company.description_full.lower() for keyword in keywords):
    score += 4
```
3. Filter companies with score > 0
4. Sort by score descending
5. Limit to requested result count
6. Return as `SearchResult[]` with scores and match types
**Match Types:**
- `'exact'` - Original query appears exactly in company name
- `'keyword'` - One or more expanded keywords matched
---
## 5. Direct Identifier Lookup
### 5.1 NIP Lookup Flow
```mermaid
sequenceDiagram
actor User
participant Route as /search route
participant SearchSvc as SearchService
participant DB as PostgreSQL
User->>Route: GET /search?q=5882436505
Route->>SearchSvc: search("5882436505")
SearchSvc->>SearchSvc: _is_nip("5882436505")
Note over SearchSvc: Regex: ^\d{10}$
SearchSvc->>SearchSvc: Clean: remove spaces/hyphens
SearchSvc->>DB: SELECT * FROM companies<br/>WHERE nip = '5882436505'<br/>AND status = 'active'
alt Company found
DB->>SearchSvc: Company object
SearchSvc->>Route: [SearchResult(company, score=100, match_type='nip')]
Route->>User: Display single company
else Not found
DB->>SearchSvc: NULL
SearchSvc->>Route: []
Route->>User: "Brak wyników"
end
```
**Implementation:**
- **File:** `search_service.py` (lines 112-131)
- **Input cleaning:** Strip spaces and hyphens (e.g., "588-243-65-05" → "5882436505")
- **Validation:** Must be exactly 10 digits
- **Score:** Always 100.0 (perfect match)
- **Match type:** `'nip'`
### 5.2 REGON Lookup Flow
```mermaid
sequenceDiagram
actor User
participant Route as /search route
participant SearchSvc as SearchService
participant DB as PostgreSQL
User->>Route: GET /search?q=220825533
Route->>SearchSvc: search("220825533")
SearchSvc->>SearchSvc: _is_regon("220825533")
Note over SearchSvc: Regex: ^\d{9}$ OR ^\d{14}$
SearchSvc->>SearchSvc: Clean: remove spaces/hyphens
SearchSvc->>DB: SELECT * FROM companies<br/>WHERE regon = '220825533'<br/>AND status = 'active'
alt Company found
DB->>SearchSvc: Company object
SearchSvc->>Route: [SearchResult(company, score=100, match_type='regon')]
Route->>User: Display single company
else Not found
DB->>SearchSvc: NULL
SearchSvc->>Route: []
Route->>User: "Brak wyników"
end
```
**Implementation:**
- **File:** `search_service.py` (lines 117-142)
- **Input cleaning:** Strip spaces and hyphens
- **Validation:** Must be exactly 9 or 14 digits
- **Score:** Always 100.0 (perfect match)
- **Match type:** `'regon'`
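The detection logic for both identifiers reduces to input cleaning plus two regular expressions. A sketch whose function names mirror the `_is_nip`/`_is_regon` helpers referenced above (this is an illustrative reconstruction, not the literal source):

```python
import re

def _clean_identifier(query: str) -> str:
    """Strip spaces and hyphens, e.g. '588-243-65-05' -> '5882436505'."""
    return re.sub(r'[\s\-]', '', query)

def is_nip(query: str) -> bool:
    """NIP: exactly 10 digits after cleaning."""
    return bool(re.fullmatch(r'\d{10}', _clean_identifier(query)))

def is_regon(query: str) -> bool:
    """REGON: exactly 9 or 14 digits after cleaning."""
    return bool(re.fullmatch(r'\d{9}|\d{14}', _clean_identifier(query)))
```

Order matters: a 10-digit string is ambiguous only in theory, but per the strategy-selection flowchart the NIP check always runs first, so a 10-digit query is treated as a NIP.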
---
## 6. User Search Interface
### 6.1 Search Route Flow
```mermaid
sequenceDiagram
actor User
participant Browser
participant Flask as Flask App<br/>(app.py /search)
participant SearchSvc as SearchService
participant DB as PostgreSQL
participant Template as search_results.html
User->>Browser: Navigate to /search
Browser->>Flask: GET /search?q=strony+www&category=1
Note over Flask: @login_required<br/>User must be authenticated
Flask->>Flask: Parse query params<br/>q = "strony www"<br/>category = 1
Flask->>SearchSvc: search_companies(db, "strony www", category_id=1, limit=50)
SearchSvc->>SearchSvc: Execute search strategy<br/>(NIP/REGON/FTS/Fallback)
SearchSvc->>DB: Query companies
DB->>SearchSvc: Results
SearchSvc->>Flask: List[SearchResult]
Flask->>Flask: Extract companies from results<br/>companies = [r.company for r in results]
Flask->>Flask: Log search analytics<br/>logger.info(f"Search '{query}': {len} results, types: {match_types}")
Flask->>Template: render_template('search_results.html',<br/>companies=companies,<br/>query=query,<br/>category_id=category_id,<br/>result_count=len)
Template->>Browser: HTML response
Browser->>User: Display search results
```
**Route Details:**
- **Path:** `/search`
- **Method:** GET
- **Authentication:** Required (`@login_required`)
- **File:** `app.py` (lines 718-748)
**Query Parameters:**
- `q` (string, optional) - Search query
- `category` (integer, optional) - Category filter (category_id)
**Response:**
- **Template:** `search_results.html`
- **Context Variables:**
- `companies` - List of Company objects
- `query` - Original search query
- `category_id` - Selected category filter
- `result_count` - Number of results
**Analytics Logging:**
```python
if query:
    match_types = {}
    for r in results:
        match_types[r.match_type] = match_types.get(r.match_type, 0) + 1
    logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}")
```
Example log output:
```
Search 'strony www': 12 results, types: {'fts': 8, 'fuzzy': 3, 'exact': 1}
```
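Putting the pieces together, a runnable sketch of the route described above. The real handler additionally carries `@login_required`, calls the actual `search_companies(db, ...)`, and renders `search_results.html`; here a stub search function and a JSON response keep the example self-contained:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def search_companies(query, category_id=None, limit=50):
    """Stub standing in for SearchService; yields (name, score, match_type)."""
    data = [('PIXLAB Sp. z o.o.', 92.0, 'fts'), ('Web Agency', 71.5, 'fuzzy')]
    return data[:limit] if query else []

@app.route('/search')
def search():
    query = request.args.get('q', '').strip()
    category_id = request.args.get('category', type=int)
    results = search_companies(query, category_id=category_id, limit=50)

    # Analytics: count results per match type, as logged in app.py
    match_types = {}
    for _, _, mt in results:
        match_types[mt] = match_types.get(mt, 0) + 1
    app.logger.info(f"Search '{query}': {len(results)} results, types: {match_types}")

    # The real route renders search_results.html; JSON keeps the sketch standalone
    return jsonify(query=query, result_count=len(results), match_types=match_types)

client = app.test_client()
print(client.get('/search?q=strony+www&category=1').get_json())
```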
---
## 7. AI Chat Integration
### 7.1 AI Chat Search Flow
```mermaid
sequenceDiagram
actor User
participant Chat as AI Chat Interface<br/>/chat
participant ChatSvc as NordaBizChatService<br/>nordabiz_chat.py
participant SearchSvc as SearchService
participant DB as PostgreSQL
participant Gemini as Google Gemini API
User->>Chat: POST /chat/send<br/>"Szukam firm do stron www"
Chat->>ChatSvc: send_message(user_message, conversation_id)
ChatSvc->>ChatSvc: _find_relevant_companies(db, message)
Note over ChatSvc: Extract search keywords from message
ChatSvc->>SearchSvc: search_companies(db, message, limit=10)
Note over SearchSvc: Use same search strategies<br/>(NIP/REGON/FTS/Fallback)
SearchSvc->>DB: Query companies
DB->>SearchSvc: Results
SearchSvc->>ChatSvc: List[SearchResult] (max 10)
ChatSvc->>ChatSvc: Extract companies from results<br/>companies = [r.company for r in results]
ChatSvc->>ChatSvc: _build_conversation_context(db, user, conversation, companies)
Note over ChatSvc: Limit to 8 companies (prevent context overflow)<br/>Include last 10 messages for history
ChatSvc->>ChatSvc: _company_to_compact_dict(company)
Note over ChatSvc: Compress company data<br/>(name, desc, services, competencies, etc)
ChatSvc->>Gemini: POST /generateContent<br/>System prompt + context + user message
Note over Gemini: Model: gemini-2.5-flash<br/>Max tokens: 2048
Gemini->>ChatSvc: AI response text
ChatSvc->>DB: Save conversation messages<br/>(user message + AI response)
ChatSvc->>DB: Track API costs<br/>(gemini_cost_tracking)
ChatSvc->>Chat: AI response with company recommendations
Chat->>User: Display chat response
```
**Key Differences from User Search:**
1. **Result Limit:** 10 companies (vs 50 for user search)
2. **Company Limit to AI:** 8 companies max (prevents context overflow)
3. **Context Building:** Companies converted to compact JSON format
4. **Integration:** Seamless - AI doesn't know about search internals
5. **Message History:** Last 10 messages included in context
**Implementation:**
- **File:** `nordabiz_chat.py` (lines 383-405)
- **Search Call:**
```python
results = search_companies(db, message, limit=10)
companies = [result.company for result in results]
return companies
```
**Company Data Compression:**
```python
compact = {
    'name': company.name,
    'cat': company.category.name,
    'desc': company.description_short,
    'history': company.founding_history,  # Owners, founders
    'svc': [service.name for service in company.services],
    'comp': [competency.name for competency in company.competencies],
    'web': company.website,
    'tel': company.phone,
    'mail': company.email,
    'city': company.address_city,
    'year': company.year_established,
    'cert': [cert.name for cert in company.certifications[:3]]
}
```
**AI System Prompt (includes search context):**
```
Jesteś asystentem bazy firm Norda Biznes z Wejherowa.
Odpowiadaj zwięźle, konkretnie, po polsku.
Oto firmy które mogą być istotne dla pytania użytkownika:
{companies_json}
Historia rozmowy:
{recent_messages}
Odpowiedz na pytanie użytkownika bazując na powyższych danych.
```
---
## 8. Performance Considerations
### 8.1 Database Indexing
**Required Indexes:**
```sql
-- Full-text search index (PostgreSQL)
CREATE INDEX idx_companies_search_vector ON companies USING gin(search_vector);
-- NIP lookup index
CREATE UNIQUE INDEX idx_companies_nip ON companies(nip) WHERE status = 'active';
-- REGON lookup index
CREATE INDEX idx_companies_regon ON companies(regon) WHERE status = 'active';
-- Status filter index
CREATE INDEX idx_companies_status ON companies(status);
-- Category filter index
CREATE INDEX idx_companies_category ON companies(category_id) WHERE status = 'active';
-- pg_trgm index for fuzzy matching (optional)
CREATE INDEX idx_companies_name_trgm ON companies USING gin(name gin_trgm_ops);
```
### 8.2 Search Vector Maintenance
**Automatic Updates:**
```sql
-- Trigger to update search_vector on INSERT/UPDATE
CREATE TRIGGER companies_search_vector_update
BEFORE INSERT OR UPDATE ON companies
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(
search_vector, 'pg_catalog.simple',
name, description_short, description_full, founding_history
);
```
**Manual Rebuild:**
```sql
-- Rebuild all search vectors
UPDATE companies SET search_vector =
    setweight(to_tsvector('simple', COALESCE(name, '')), 'A') ||
    setweight(to_tsvector('simple', COALESCE(description_short, '')), 'B') ||
    setweight(to_tsvector('simple', COALESCE(description_full, '')), 'C') ||
    setweight(to_tsvector('simple', COALESCE(founding_history, '')), 'B');
```
### 8.3 Query Performance
**Performance Targets:**
- **NIP/REGON lookup:** < 10ms (indexed)
- **PostgreSQL FTS:** < 100ms (typical)
- **SQLite fallback:** < 500ms (in-memory scoring)
- **AI Chat search:** < 200ms (limit 10 results)
**Optimization Strategies:**
1. **Early Exit:** NIP/REGON lookup bypasses full search
2. **Result Limiting:** Default 50 results (10 for AI chat)
3. **Category Filtering:** Reduces search space
4. **Synonym Pre-expansion:** Computed once, reused in all clauses
5. **Score-based Ordering:** Database-level sorting (not in-memory)
### 8.4 Fallback Performance
**PostgreSQL → SQLite Fallback Triggers:**
1. FTS query returns 0 results
2. FTS query throws exception (syntax error, missing extension)
3. `pg_trgm` extension not available (degrades to FTS-only, not full fallback)
**SQLite Fallback Cost:**
- Fetches ALL active companies into memory
- Scores each company in Python (slower than SQL)
- Suitable for development/testing, not recommended for production with 100+ companies
**Monitoring:**
```python
# Logged in app.py when search executes
logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}")
# Example outputs:
# Search 'strony www': 12 results, types: {'fts': 8, 'fuzzy': 4}
# Search '5882436505': 1 results, types: {'nip': 1}
# Search 'PIXLAB': 1 results, types: {'exact': 1}
```
---
## 9. Search Result Structure
### 9.1 SearchResult Dataclass
**File:** `search_service.py` (lines 20-25)
```python
@dataclass
class SearchResult:
    """Search result with score and match info"""
    company: Company    # Full Company SQLAlchemy object
    score: float        # Relevance score (0.0-100.0)
    match_type: str     # Match type identifier
```
**Match Types:**
| Match Type | Description | Score Range |
|------------|-------------|-------------|
| `'nip'` | Direct NIP match | 100.0 (fixed) |
| `'regon'` | Direct REGON match | 100.0 (fixed) |
| `'exact'` | Exact name match (SQLite) | Variable (usually high) |
| `'fts'` | PostgreSQL full-text search | 0.0-100.0 (normalized ts_rank) |
| `'fuzzy'` | PostgreSQL fuzzy similarity | 0.0-100.0 (normalized similarity) |
| `'history'` | Founding history match | 50.0 (fixed bonus) |
| `'keyword'` | SQLite keyword scoring | Variable (weighted sum) |
| `'all'` | All companies (no filter) | 0.0 (no relevance) |
### 9.2 Score Normalization
**PostgreSQL FTS Scores:**
```python
# ts_rank returns 0.0-1.0, normalize to 0-100
fts_score = ts_rank(...) * 100

# similarity returns 0.0-1.0, normalize to 0-100
fuzzy_score = similarity(...) * 100

# history match is a fixed bonus
history_score = 0.5 * 100  # = 50.0
```
**SQLite Keyword Scores:**
```python
# Sum of all matching field weights
score = (
    10    # name match
    + 5   # exact match bonus
    + 5   # description_short
    + 8   # services
    + 7   # competencies
    + 3   # city
    + 12  # founding_history
    + 4   # description_full
)
# Maximum possible: 54 points
# Typical: 10-30 points
```
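The normalization and match-type selection described above can be condensed into one helper. This sketch mirrors the `GREATEST(...)` ordering from the SQL in section 3.2; it is illustrative, not the literal code from `search_service.py`:

```python
def normalize_result(fts: float, fuzzy: float, history: float) -> tuple[float, str]:
    """Pick the strongest raw signal (each 0.0-1.0) and scale it to 0-100."""
    scores = {'fts': fts, 'fuzzy': fuzzy, 'history': history}
    match_type = max(scores, key=scores.get)  # same winner GREATEST() would pick
    return round(scores[match_type] * 100, 1), match_type

print(normalize_result(0.34, 0.12, 0.0))  # (34.0, 'fts')
print(normalize_result(0.05, 0.61, 0.5))  # (61.0, 'fuzzy')
```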
---
## 10. Error Handling & Edge Cases
### 10.1 PostgreSQL FTS Error Handling
**Error Scenarios:**
1. **Invalid tsquery syntax** - Fallback to SQLite
2. **pg_trgm extension missing** - Degrade to FTS-only (no fuzzy)
3. **search_vector column missing** - Exception, fallback to SQLite
4. **Database connection error** - Propagate exception to route
**Implementation:**
```python
try:
    result = self.db.execute(sql, params)
    rows = result.fetchall()
    # ... process results
except Exception as e:
    print(f"PostgreSQL FTS error: {e}, falling back to keyword search")
    self.db.rollback()  # CRITICAL: prevent InFailedSqlTransaction
    return self._search_sqlite_fallback(query, category_id, limit)
```
**Critical:** `db.rollback()` is essential before fallback to prevent transaction state errors.
### 10.2 Empty Results Handling
**No Results Scenarios:**
1. **NIP/REGON not found** - Return empty list `[]`
2. **FTS returns 0 matches** - Automatic fallback to SQLite scoring
3. **SQLite scoring returns 0 matches** - Return empty list `[]`
4. **Empty query** - Return all active companies (ordered by name)
**User Interface:**
```html
{% if result_count == 0 %}
  <div class="alert alert-info">
    Brak wyników dla zapytania "{{ query }}".
    Spróbuj innych słów kluczowych lub usuń filtry.
  </div>
{% endif %}
```
### 10.3 Special Characters & Sanitization
**Query Cleaning:**
```python
query = query.strip() # Remove leading/trailing whitespace
clean_nip = re.sub(r'[\s\-]', '', query) # Remove spaces and hyphens from NIP/REGON
```
**SQL Injection Prevention:**
- All queries use SQLAlchemy parameter binding (`:param` syntax)
- No raw string concatenation in SQL
- ILIKE patterns are passed as array parameters
**XSS Prevention:**
- All user input sanitized before display (handled by Jinja2 auto-escaping)
- Query string displayed in template: `{{ query }}` (auto-escaped)
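The parameter-binding claim can be demonstrated in isolation. This self-contained demo uses SQLAlchemy's `:param` syntax against an in-memory SQLite database (`ILIKE ANY(:like_patterns)` is PostgreSQL-only, so a single `LIKE` stands in for it here):

```python
from sqlalchemy import create_engine, text

engine = create_engine('sqlite:///:memory:')
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE companies (name TEXT, status TEXT)"))
    conn.execute(text("INSERT INTO companies VALUES ('PIXLAB', 'active')"))

    # User input is bound, never concatenated into the SQL string
    malicious = "x' OR '1'='1"
    rows = conn.execute(
        text("SELECT name FROM companies WHERE status = 'active' AND name LIKE :pattern"),
        {'pattern': f'%{malicious}%'},
    ).fetchall()

    hit = conn.execute(
        text("SELECT name FROM companies WHERE name LIKE :pattern"),
        {'pattern': '%PIX%'},
    ).fetchall()

print(rows)       # [] - the quote is treated as data, not SQL
print(hit[0][0])  # PIXLAB
```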
---
## 11. Testing & Verification
### 11.1 Test Queries
**NIP Lookup:**
```
Query: "5882436505"
Expected: PIXLAB Sp. z o.o. (single result, score=100, match_type='nip')
```
**REGON Lookup:**
```
Query: "220825533"
Expected: Single company with matching REGON (score=100, match_type='regon')
```
**Keyword Search (PostgreSQL FTS):**
```
Query: "strony internetowe"
Expected: Multiple results (IT/Web companies, match_type='fts' or 'fuzzy')
Keywords expanded to: [strony, internetowe, www, web, internet, witryny, seo, ...]
```
**Exact Name Match:**
```
Query: "PIXLAB"
Expected: PIXLAB at top (high score, match_type='exact' or 'fts')
```
**Owner/Founder Search:**
```
Query: "Jan Kowalski" (example founder name)
Expected: Companies where Jan Kowalski appears in founding_history
Match type: 'history' or high score from founding_history match
```
**Category Filter:**
```
Query: "strony" + category=1 (IT)
Expected: Only IT category companies matching "strony"
```
**Empty Query:**
```
Query: ""
Expected: All active companies, alphabetically sorted
```
### 11.2 Performance Testing
**Load Testing Scenarios:**
```python
# Test 1: Direct lookup performance
for nip in all_nips:
    results = search_companies(db, nip)
    assert len(results) == 1
    assert results[0].match_type == 'nip'

# Test 2: Full-text search performance
queries = ["strony", "budowa", "księgowość", "metal", "transport"]
for query in queries:
    start = time.time()
    results = search_companies(db, query)
    elapsed = time.time() - start
    assert elapsed < 0.1  # < 100ms
    print(f"{query}: {len(results)} results in {elapsed*1000:.1f}ms")

# Test 3: Fallback trigger test (simulate FTS failure)
# Force SQLite fallback by using invalid tsquery syntax
results = search_companies(db, "test:query|with:invalid&syntax")
# Should not crash, should return results via fallback
```
### 11.3 Search Quality Metrics
**Relevance Testing:**
```python
test_cases = [
    {
        'query': 'strony www',
        'expected_top_3': ['PIXLAB', 'Web Agency', 'IT Solutions'],
        'min_results': 5
    },
    {
        'query': 'budownictwo',
        'expected_categories': ['Construction'],
        'min_results': 3
    },
    # ... more test cases
]

for test in test_cases:
    results = search_companies(db, test['query'])
    assert len(results) >= test['min_results']
    # Check if expected companies appear in top results
    # (use .get() - not every test case defines expected_top_3)
    top_names = [r.company.name for r in results[:3]]
    for expected in test.get('expected_top_3', []):
        assert expected in top_names
```
---
## 12. Maintenance & Monitoring
### 12.1 Database Maintenance
**Weekly Tasks:**
```sql
-- Rebuild search vectors (if data quality issues)
UPDATE companies SET search_vector =
    setweight(to_tsvector('simple', COALESCE(name, '')), 'A') ||
    setweight(to_tsvector('simple', COALESCE(description_short, '')), 'B') ||
    setweight(to_tsvector('simple', COALESCE(description_full, '')), 'C') ||
    setweight(to_tsvector('simple', COALESCE(founding_history, '')), 'B')
WHERE updated_at > NOW() - INTERVAL '7 days';
-- Verify index health
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'companies'
ORDER BY idx_scan DESC;
-- Check for missing indexes
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'companies';
```
**Monthly Tasks:**
```sql
-- Vacuum and analyze for performance
VACUUM ANALYZE companies;
-- Check for slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%companies%search_vector%'
ORDER BY mean_exec_time DESC
LIMIT 10;
```
### 12.2 Search Analytics
**Logging Search Patterns:**
```python
# Already implemented in app.py /search route
logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}")
```
**Analytics Queries:**
```sql
-- Top search queries (requires search_logs table - not yet implemented)
SELECT query, COUNT(*) as frequency
FROM search_logs
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY query
ORDER BY frequency DESC
LIMIT 20;
-- Zero-result searches (requires logging)
SELECT query, COUNT(*) as frequency
FROM search_logs
WHERE result_count = 0
AND created_at > NOW() - INTERVAL '30 days'
GROUP BY query
ORDER BY frequency DESC
LIMIT 10;
```
### 12.3 Synonym Expansion Tuning
**Adding New Synonyms:**
```python
# Edit search_service.py KEYWORD_SYNONYMS dictionary
KEYWORD_SYNONYMS = {
    # Add new industry-specific terms
    'cyberbezpieczeństwo': ['security', 'ochrona', 'firewall', 'antywirus'],
    # ... more synonyms
}
```
**Synonym Effectiveness Testing:**
```python
# Test query with and without synonym expansion
query = "cyberbezpieczeństwo"
# With expansion
results_with = search_companies(db, query)
print(f"With synonyms: {len(results_with)} results")
# Without expansion (mock)
# ... compare recall/precision
```
---
## 13. Future Enhancements
### 13.1 Planned Improvements
1. **Search Result Ranking ML Model**
- Learn from user click-through rates
- Personalized ranking based on user preferences
- A/B testing of ranking algorithms
2. **Search Autocomplete**
- Suggest company names as user types
- Suggest common search queries
- Category-based suggestions
3. **Advanced Filters**
- Location-based search (radius from city)
- Certification filters (ISO, other)
- Founding year range
- Employee count range (if available)
4. **Search Analytics Dashboard**
- Top queries (daily/weekly/monthly)
- Zero-result queries (opportunities for content)
- Average result count per query
- Match type distribution
- Click-through rates by position
5. **Semantic Search**
- Integrate sentence embeddings (sentence-transformers)
- Vector similarity search for related companies
- "More like this" company recommendations
6. **Multi-language Support**
- English query translation
- German query support (for border region)
- Auto-detect query language
### 13.2 Performance Optimization Ideas
1. **Query Result Caching**
- Redis cache for common queries (TTL 5 minutes)
- Cache key: `search:{query}:{category_id}`
- Invalidate on company data updates
2. **Partial Index Optimization**
```sql
-- Index only active companies
CREATE INDEX idx_companies_active_search
ON companies USING gin(search_vector)
WHERE status = 'active';
```
3. **Materialized View for Search**
```sql
-- Pre-compute search data
CREATE MATERIALIZED VIEW search_companies_mv AS
SELECT id, name, search_vector, category_id, status, ...
FROM companies
WHERE status = 'active';
-- Refresh daily
REFRESH MATERIALIZED VIEW search_companies_mv;
```
4. **Connection Pooling**
- Already implemented via SQLAlchemy
- Monitor pool size and overflow
- Adjust pool_size/max_overflow if needed
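The caching scheme proposed in item 1 can be sketched without a live Redis instance. Here a plain dict with timestamps stands in for Redis; in production, `SETEX` with a 300-second TTL and the same `search:{query}:{category_id}` key scheme would play this role (all helper names below are hypothetical):

```python
import time

CACHE_TTL = 300  # seconds (5 minutes)
_cache = {}      # stand-in for Redis: key -> (stored_at, results)

def cache_key(query, category_id):
    return f"search:{query}:{category_id}"

def cached_search(query, category_id, search_fn):
    key = cache_key(query, category_id)
    entry = _cache.get(key)
    if entry is not None and time.monotonic() - entry[0] < CACHE_TTL:
        return entry[1]                      # cache hit
    results = search_fn(query, category_id)  # cache miss: run the real search
    _cache[key] = (time.monotonic(), results)
    return results

def invalidate_cache():
    """Drop all cached entries on company data updates."""
    _cache.clear()

calls = []
def fake_search(q, c):
    calls.append(q)
    return [q.upper()]

print(cached_search('strony', 1, fake_search))  # ['STRONY'] (miss)
print(cached_search('strony', 1, fake_search))  # ['STRONY'] (hit)
print(len(calls))  # 1 - the second call was served from cache
```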
---
## 14. Related Documentation
- **[Flask Application Structure](../analysis/flask-application-structure.md)** - Complete route reference
- **[Database Schema](./05-database-schema.md)** - Company model and indexes
- **[External Integrations](./06-external-integrations.md)** - AI Chat integration details
- **[AI Chat Flow](./03-ai-chat-flow.md)** - How AI uses search service (to be created)
---
## 15. Glossary
| Term | Description |
|------|-------------|
| **FTS** | Full-Text Search - PostgreSQL text search engine using tsvector |
| **tsvector** | PostgreSQL data type for full-text search, stores preprocessed text |
| **tsquery** | PostgreSQL query syntax for full-text search (e.g., "word1 \| word2") |
| **ts_rank** | PostgreSQL function to score FTS relevance (0.0-1.0) |
| **pg_trgm** | PostgreSQL extension for trigram-based fuzzy string matching |
| **similarity()** | pg_trgm function to measure string similarity (0.0-1.0) |
| **Synonym Expansion** | Expanding user query with related keywords (e.g., "strony" → "www, web, internet") |
| **SearchResult** | Dataclass containing Company, score, and match_type |
| **Match Type** | Identifier for how company was matched (nip, regon, fts, fuzzy, keyword, etc.) |
| **NIP** | Polish tax identification number (10 digits) |
| **REGON** | Polish business registry number (9 or 14 digits) |
| **Fallback** | Alternative search method when primary method fails (PostgreSQL FTS → SQLite keyword scoring) |
| **SearchService** | Unified search service class (search_service.py) |
| **Keyword Scoring** | In-memory scoring algorithm for SQLite fallback |
---
## Document Metadata
**Created:** 2026-01-10
**Author:** Architecture Documentation (auto-claude)
**Related Files:**
- `search_service.py` (main implementation)
- `app.py` (lines 718-748, /search route)
- `nordabiz_chat.py` (lines 383-405, AI integration)
- `database.py` (Company model)
**Version History:**
- v1.0 (2026-01-10) - Initial documentation
---
**End of Document**