Company Search Flow
Document Version: 1.0
Last Updated: 2026-01-10
Status: Production LIVE
Flow Type: Company Search & Discovery
Overview
This document describes the complete company search flow for the Norda Biznes Hub application, covering:
- User Search Interface (`/search` route)
- Search Service Architecture (unified search with multiple strategies)
- AI Chat Integration (context-aware company discovery)
- Search Strategies:
- NIP/REGON direct lookup
- Synonym expansion
- PostgreSQL Full-Text Search (FTS)
- Fuzzy matching (pg_trgm)
- SQLite keyword scoring fallback
Key Technology:
- Search Engine: Custom unified SearchService
- Database: PostgreSQL FTS with tsvector indexing
- Fuzzy Matching: pg_trgm extension for typo tolerance
- Synonym Expansion: Domain-specific keyword mappings
- AI Integration: Used by NordaBiz Chat for context building
Performance Features:
- Direct identifier lookup (NIP/REGON) bypasses full search
- Database-level full-text search indexing
- Synonym expansion increases recall
- Configurable result limits (default 50)
- Fallback mechanisms for SQLite compatibility
1. Search Flow Overview
1.1 High-Level Architecture
flowchart TD
User[User] -->|1. Search query| UI[Search UI<br/>/search route]
AIUser[AI Chat User] -->|1. Natural language| Chat[AI Chat<br/>/chat route]
UI -->|2. Call| SearchSvc[Search Service<br/>search_service.py]
Chat -->|2. Find companies| SearchSvc
SearchSvc -->|3. Detect query type| QueryType{Query Type?}
QueryType -->|NIP: 10 digits| NIPLookup[NIP Direct Lookup]
QueryType -->|REGON: 9/14 digits| REGONLookup[REGON Direct Lookup]
QueryType -->|Text query| DBCheck{Database<br/>Type?}
DBCheck -->|PostgreSQL| PGFTS[PostgreSQL FTS<br/>+ Fuzzy Match]
DBCheck -->|SQLite| SQLiteFallback[SQLite Keyword<br/>Scoring]
NIPLookup -->|4. Query DB| DB[(PostgreSQL<br/>companies)]
REGONLookup -->|4. Query DB| DB
PGFTS -->|4. FTS query| DB
SQLiteFallback -->|4. LIKE query| DB
DB -->|5. Results| SearchSvc
    SearchSvc -->|"6. SearchResult[]"| UI
    SearchSvc -->|"6. Company[]"| Chat
UI -->|7. Render| SearchResults[search_results.html]
Chat -->|7. Build context| AIContext[AI Context Builder]
SearchResults -->|8. Display| User
AIContext -->|8. Generate response| AIUser
style SearchSvc fill:#4CAF50
style PGFTS fill:#2196F3
style DB fill:#FF9800
style NIPLookup fill:#9C27B0
style REGONLookup fill:#9C27B0
2. Search Strategies
2.1 Strategy Selection Algorithm
flowchart TD
Start([User Query]) --> Clean[Strip whitespace]
Clean --> Empty{Empty<br/>query?}
Empty -->|Yes| AllCompanies[Return all companies<br/>ORDER BY name]
Empty -->|No| NIPCheck{Is NIP?<br/>10 digits}
NIPCheck -->|Yes| NIPSearch[Direct NIP lookup<br/>WHERE nip = ?]
NIPCheck -->|No| REGONCheck{Is REGON?<br/>9 or 14 digits}
REGONCheck -->|Yes| REGONSearch[Direct REGON lookup<br/>WHERE regon = ?]
REGONCheck -->|No| DBType{Database<br/>Type?}
DBType -->|PostgreSQL| PGFlow[PostgreSQL FTS Flow]
DBType -->|SQLite| SQLiteFlow[SQLite Keyword Flow]
NIPSearch --> Found{Found?}
REGONSearch --> Found
Found -->|Yes| ReturnSingle[Return single result<br/>score=100, match_type='nip/regon']
Found -->|No| ReturnEmpty[Return empty list]
PGFlow --> PGSynonym[Expand synonyms]
PGSynonym --> PGExtCheck{pg_trgm<br/>available?}
PGExtCheck -->|Yes| FTS_Fuzzy[FTS + Fuzzy search<br/>ts_rank + similarity]
PGExtCheck -->|No| FTS_Only[FTS only<br/>ts_rank]
FTS_Fuzzy --> PGResults{Results?}
FTS_Only --> PGResults
PGResults -->|Yes| ReturnScored[Return scored results<br/>ORDER BY score DESC]
PGResults -->|No| Fallback[Execute SQLite fallback]
SQLiteFlow --> SQLiteSynonym[Expand synonyms]
SQLiteSynonym --> Fallback
Fallback --> InMemory[In-memory keyword scoring]
InMemory --> ReturnScored
    ReturnSingle --> End(["SearchResult[]"])
ReturnEmpty --> End
ReturnScored --> End
AllCompanies --> End
style NIPSearch fill:#9C27B0
style REGONSearch fill:#9C27B0
style FTS_Fuzzy fill:#2196F3
style FTS_Only fill:#2196F3
style InMemory fill:#FF9800
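The decision tree above can be condensed into a small dispatcher. This is an illustrative sketch only; the function and strategy names are hypothetical, not the actual `SearchService` API:

```python
import re

def select_strategy(query: str, is_postgres: bool) -> str:
    """Pick the search strategy the flowchart above would choose (sketch)."""
    q = query.strip()
    if not q:
        return 'all_companies'            # empty query -> return all companies
    digits = re.sub(r'[\s\-]', '', q)     # "588-243-65-05" -> "5882436505"
    if re.fullmatch(r'\d{10}', digits):
        return 'nip_lookup'               # direct NIP lookup
    if re.fullmatch(r'\d{9}|\d{14}', digits):
        return 'regon_lookup'             # direct REGON lookup
    return 'postgres_fts' if is_postgres else 'sqlite_keyword'
```

Text queries only reach the FTS/keyword paths after both identifier checks fail, which is what makes the NIP/REGON early exit cheap.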
2.2 Synonym Expansion
Purpose: Increase search recall by expanding user queries with domain-specific synonyms
Examples:
KEYWORD_SYNONYMS = {
# IT / Web
'strony': ['www', 'web', 'internet', 'witryny', 'seo', 'e-commerce', 'sklep', 'portal'],
'aplikacje': ['software', 'programowanie', 'systemy', 'crm', 'erp', 'app'],
'it': ['informatyka', 'komputery', 'software', 'systemy', 'serwis'],
# Construction
'budowa': ['budownictwo', 'konstrukcje', 'remonty', 'wykończenia', 'dach', 'elewacja'],
'remont': ['wykończenie', 'naprawa', 'renowacja', 'modernizacja'],
# Services
'księgowość': ['rachunkowość', 'finanse', 'podatki', 'biuro rachunkowe', 'kadry'],
'prawo': ['prawnik', 'adwokat', 'radca', 'kancelaria', 'notariusz'],
# Production
'metal': ['stal', 'obróbka', 'spawanie', 'cnc', 'ślusarstwo'],
'drewno': ['stolarka', 'meble', 'tartak', 'carpentry'],
}
Algorithm:
- Tokenize user query (split on whitespace, strip punctuation)
- For each word:
  - Direct lookup in KEYWORD_SYNONYMS keys
  - Check if word appears in any synonym list
  - Add matching synonyms to expanded query
- Return unique set of keywords
Example Expansion:
Input: "strony internetowe"
Output: ['strony', 'internetowe', 'www', 'web', 'internet', 'witryny',
'seo', 'e-commerce', 'ecommerce', 'sklep', 'portal', 'online',
'cyfrowe', 'marketing']
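A minimal sketch of the expansion algorithm above. The real implementation is `_expand_keywords` in search_service.py; the trimmed synonym map here is for illustration only:

```python
import re

# Trimmed-down synonym map, for illustration only
KEYWORD_SYNONYMS = {
    'strony': ['www', 'web', 'internet', 'witryny', 'seo', 'e-commerce', 'sklep', 'portal'],
    'it': ['informatyka', 'komputery', 'software', 'systemy', 'serwis'],
}

def expand_keywords(query: str) -> list:
    """Tokenize, then expand each word via direct and reverse synonym lookup."""
    words = [re.sub(r'[^\w]+', '', w).lower() for w in query.split()]
    expanded = [w for w in words if w]
    for word in list(expanded):
        # Direct lookup: word is a key in the synonym map
        for syn in KEYWORD_SYNONYMS.get(word, []):
            if syn not in expanded:
                expanded.append(syn)
        # Reverse lookup: word appears inside some synonym list
        for key, syns in KEYWORD_SYNONYMS.items():
            if word in syns and key not in expanded:
                expanded.append(key)
    return expanded
```

The reverse lookup is what lets a query like "web" also pull in companies tagged under "strony".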
3. PostgreSQL Full-Text Search (FTS)
3.1 FTS Search Sequence
sequenceDiagram
actor User
participant Route as Flask Route<br/>/search
participant SearchSvc as SearchService
participant PG as PostgreSQL
participant FTS as Full-Text Engine<br/>(tsvector)
participant Trgm as pg_trgm Extension<br/>(fuzzy matching)
User->>Route: GET /search?q=strony www
Route->>SearchSvc: search("strony www", limit=50)
Note over SearchSvc: Detect PostgreSQL database
SearchSvc->>SearchSvc: _expand_keywords("strony www")
Note over SearchSvc: Expanded: [strony, www, web, internet,<br/>witryny, seo, e-commerce, ...]
SearchSvc->>SearchSvc: Build tsquery: "strony:* | www:* | web:* | ..."
SearchSvc->>SearchSvc: Build ILIKE patterns: [%strony%, %www%, %web%, ...]
SearchSvc->>PG: Check pg_trgm extension available
PG->>SearchSvc: Extension exists
SearchSvc->>PG: Execute FTS + Fuzzy query
Note over PG: SELECT c.id,<br/>ts_rank(search_vector, tsquery) as fts_score,<br/>similarity(name, query) as fuzzy_score,<br/>CASE WHEN founding_history ILIKE ...<br/>FROM companies c<br/>WHERE search_vector @@ tsquery<br/>OR similarity(name, query) > 0.2<br/>OR name/description ILIKE patterns
PG->>FTS: Match against search_vector
FTS->>PG: FTS matches with ts_rank scores
PG->>Trgm: Calculate similarity(name, query)
Trgm->>PG: Fuzzy match scores (0.0-1.0)
PG->>SearchSvc: Result rows: [(id, fts_score, fuzzy_score, history_score), ...]
SearchSvc->>PG: Fetch full Company objects<br/>WHERE id IN (...)
PG->>SearchSvc: Company objects
SearchSvc->>SearchSvc: Determine match_type (fts/fuzzy/history)
SearchSvc->>SearchSvc: Normalize scores (0-100)
SearchSvc->>Route: SearchResult[] with companies, scores, match_types
Route->>User: Render search_results.html
3.2 PostgreSQL FTS Implementation
File: search_service.py (lines 251-378)
Database Requirements:
- Extension: `pg_trgm` (optional, enables fuzzy matching)
- Column: `companies.search_vector` (tsvector, indexed)
- Index: GIN index on `search_vector` for fast full-text search
SQL Query Structure (with pg_trgm):
SELECT c.id,
COALESCE(ts_rank(c.search_vector, to_tsquery('simple', :tsquery)), 0) as fts_score,
COALESCE(similarity(c.name, :query), 0) as fuzzy_score,
CASE WHEN c.founding_history ILIKE ANY(:like_patterns) THEN 0.5 ELSE 0 END as history_score
FROM companies c
WHERE c.status = 'active'
AND (
c.search_vector @@ to_tsquery('simple', :tsquery) -- FTS match
OR similarity(c.name, :query) > 0.2 -- Fuzzy name match
OR c.name ILIKE ANY(:like_patterns) -- Keyword in name
OR c.description_short ILIKE ANY(:like_patterns) -- Keyword in description
OR c.founding_history ILIKE ANY(:like_patterns) -- Keyword in owners/founders
OR c.description_full ILIKE ANY(:like_patterns) -- Keyword in full text
)
ORDER BY GREATEST(
COALESCE(ts_rank(c.search_vector, to_tsquery('simple', :tsquery)), 0),
COALESCE(similarity(c.name, :query), 0),
CASE WHEN c.founding_history ILIKE ANY(:like_patterns) THEN 0.5 ELSE 0 END
) DESC
LIMIT :limit
Parameters:
- `:tsquery` - Expanded keywords joined with `|` (OR), each with `:*` prefix matching. Example: `"strony:* | www:* | web:* | internet:*"`
- `:query` - Original user query for fuzzy matching
- `:like_patterns` - Array of ILIKE patterns for direct keyword matches. Example: `['%strony%', '%www%', '%web%']`
- `:limit` - Maximum results (default 50)
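How those bind parameters might be assembled from the expanded keyword list; the helper name is hypothetical, but the output format matches the examples above:

```python
def build_fts_params(keywords, query, limit=50):
    """Assemble bind parameters for the FTS query above (sketch)."""
    return {
        # Prefix-match each expanded keyword, OR-ed together
        'tsquery': ' | '.join(f'{kw}:*' for kw in keywords),
        'query': query,                                # original text for similarity()
        'like_patterns': [f'%{kw}%' for kw in keywords],
        'limit': limit,
    }
```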
Scoring Strategy:
- FTS Score: `ts_rank()` measures how well the document matches the query (0.0-1.0)
- Fuzzy Score: `similarity()` from pg_trgm measures string similarity (0.0-1.0)
- History Score: fixed 0.5 bonus if founders/owners match (important for people search)
- Final Score: `GREATEST()` of all three scores, normalized to a 0-100 scale
Match Types:
- `'fts'` - Full-text search match (highest ts_rank)
- `'fuzzy'` - Fuzzy string similarity match (highest similarity)
- `'history'` - Founding history match (owner/founder keywords)
Fallback Behavior:
- If `pg_trgm` extension is not available → uses FTS only (no fuzzy matching)
- If FTS returns 0 results → falls back to SQLite keyword scoring
- If FTS query fails (exception) → rolls back transaction, uses SQLite fallback
4. SQLite Keyword Scoring Fallback
4.1 Fallback Sequence
sequenceDiagram
participant SearchSvc as SearchService
participant DB as Database
participant Scorer as Keyword Scorer<br/>(in-memory)
SearchSvc->>SearchSvc: _expand_keywords(query)
Note over SearchSvc: Keywords: [strony, www, web, ...]
SearchSvc->>DB: SELECT * FROM companies<br/>WHERE status = 'active'
DB->>SearchSvc: All active companies (in-memory)
loop For each company
SearchSvc->>Scorer: Calculate score
Note over Scorer: Name match: +10<br/>(+5 bonus for exact match)
Note over Scorer: Description short: +5
Note over Scorer: Services: +8
Note over Scorer: Competencies: +7
Note over Scorer: City: +3
Note over Scorer: Founding history: +12<br/>(owners/founders)
Note over Scorer: Description full: +4
Scorer->>SearchSvc: Total score (0+)
end
SearchSvc->>SearchSvc: Filter companies (score > 0)
SearchSvc->>SearchSvc: Sort by score DESC
SearchSvc->>SearchSvc: Limit results
SearchSvc->>SearchSvc: Build SearchResult[]<br/>with scores and match_types
4.2 Keyword Scoring Algorithm
File: search_service.py (lines 162-249)
Scoring Weights:
{
'name_match': 10, # Company name contains keyword
'exact_name_match': +5, # Exact query appears in name (bonus)
'description_short': 5, # Short description contains keyword
'services': 8, # Service tag matches
'competencies': 7, # Competency tag matches
'city': 3, # City/location matches
'founding_history': 12, # Owners/founders match (highest weight)
'description_full': 4, # Full description contains keyword
}
Algorithm:
- Fetch all active companies from database
- For each company, calculate score:

score = 0
match_type = 'keyword'

# Name match (highest weight)
if any(keyword in company.name.lower() for keyword in keywords):
    score += 10
    if original_query.lower() in company.name.lower():
        score += 5  # Exact match bonus
        match_type = 'exact'

# Description match
if any(keyword in company.description_short.lower() for keyword in keywords):
    score += 5

# Services match
if any(keyword in service.name.lower() for service in company.services for keyword in keywords):
    score += 8

# Competencies match
if any(keyword in competency.name.lower() for competency in company.competencies for keyword in keywords):
    score += 7

# City match
if any(keyword in company.city.lower() for keyword in keywords):
    score += 3

# Founding history match (owners, founders)
if any(keyword in company.founding_history.lower() for keyword in keywords):
    score += 12

# Full description match
if any(keyword in company.description_full.lower() for keyword in keywords):
    score += 4

- Filter companies with score > 0
- Sort by score descending
- Limit to requested result count
- Return as `SearchResult[]` with scores and match types
Match Types:
- `'exact'` - Original query appears exactly in company name
- `'keyword'` - One or more expanded keywords matched
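The scoring loop condenses to a small runnable function. This sketch uses plain dicts in place of the SQLAlchemy Company model (field names simplified); the weights match section 4.2:

```python
def score_company(company: dict, keywords, original_query: str):
    """Weighted keyword scoring, mirroring the SQLite fallback (sketch)."""
    def hit(text):
        return any(kw in (text or '').lower() for kw in keywords)

    score, match_type = 0, 'keyword'
    if hit(company.get('name')):
        score += 10                                   # name match
        if original_query.lower() in company['name'].lower():
            score += 5                                # exact match bonus
            match_type = 'exact'
    if hit(company.get('description_short')):
        score += 5
    if any(hit(s) for s in company.get('services', [])):
        score += 8
    if any(hit(c) for c in company.get('competencies', [])):
        score += 7
    if hit(company.get('city')):
        score += 3
    if hit(company.get('founding_history')):
        score += 12                                   # owners/founders, highest weight
    if hit(company.get('description_full')):
        score += 4
    return score, match_type
```

Sorting the `(score, match_type)` results descending and truncating to the limit yields the fallback's output order.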
5. Direct Identifier Lookup
5.1 NIP Lookup Flow
sequenceDiagram
actor User
participant Route as /search route
participant SearchSvc as SearchService
participant DB as PostgreSQL
User->>Route: GET /search?q=5882436505
Route->>SearchSvc: search("5882436505")
SearchSvc->>SearchSvc: _is_nip("5882436505")
Note over SearchSvc: Regex: ^\d{10}$
SearchSvc->>SearchSvc: Clean: remove spaces/hyphens
SearchSvc->>DB: SELECT * FROM companies<br/>WHERE nip = '5882436505'<br/>AND status = 'active'
alt Company found
DB->>SearchSvc: Company object
SearchSvc->>Route: [SearchResult(company, score=100, match_type='nip')]
Route->>User: Display single company
else Not found
DB->>SearchSvc: NULL
SearchSvc->>Route: []
Route->>User: "Brak wyników"
end
Implementation:
- File: `search_service.py` (lines 112-131)
- Input cleaning: strip spaces and hyphens (e.g., "588-243-65-05" → "5882436505")
- Validation: must be exactly 10 digits
- Score: always 100.0 (perfect match)
- Match type: `'nip'`
5.2 REGON Lookup Flow
sequenceDiagram
actor User
participant Route as /search route
participant SearchSvc as SearchService
participant DB as PostgreSQL
User->>Route: GET /search?q=220825533
Route->>SearchSvc: search("220825533")
SearchSvc->>SearchSvc: _is_regon("220825533")
Note over SearchSvc: Regex: ^\d{9}$ OR ^\d{14}$
SearchSvc->>SearchSvc: Clean: remove spaces/hyphens
SearchSvc->>DB: SELECT * FROM companies<br/>WHERE regon = '220825533'<br/>AND status = 'active'
alt Company found
DB->>SearchSvc: Company object
SearchSvc->>Route: [SearchResult(company, score=100, match_type='regon')]
Route->>User: Display single company
else Not found
DB->>SearchSvc: NULL
SearchSvc->>Route: []
Route->>User: "Brak wyników"
end
Implementation:
- File: `search_service.py` (lines 117-142)
- Input cleaning: strip spaces and hyphens
- Validation: must be exactly 9 or 14 digits
- Score: always 100.0 (perfect match)
- Match type: `'regon'`
6. User Search Interface
6.1 Search Route Flow
sequenceDiagram
actor User
participant Browser
participant Flask as Flask App<br/>(app.py /search)
participant SearchSvc as SearchService
participant DB as PostgreSQL
participant Template as search_results.html
User->>Browser: Navigate to /search
Browser->>Flask: GET /search?q=strony+www&category=1
Note over Flask: @login_required<br/>User must be authenticated
Flask->>Flask: Parse query params<br/>q = "strony www"<br/>category = 1
Flask->>SearchSvc: search_companies(db, "strony www", category_id=1, limit=50)
SearchSvc->>SearchSvc: Execute search strategy<br/>(NIP/REGON/FTS/Fallback)
SearchSvc->>DB: Query companies
DB->>SearchSvc: Results
SearchSvc->>Flask: List[SearchResult]
Flask->>Flask: Extract companies from results<br/>companies = [r.company for r in results]
Flask->>Flask: Log search analytics<br/>logger.info(f"Search '{query}': {len} results, types: {match_types}")
Flask->>Template: render_template('search_results.html',<br/>companies=companies,<br/>query=query,<br/>category_id=category_id,<br/>result_count=len)
Template->>Browser: HTML response
Browser->>User: Display search results
Route Details:
- Path: `/search`
- Method: GET
- Authentication: required (`@login_required`)
- File: `app.py` (lines 718-748)
Query Parameters:
- `q` (string, optional) - Search query
- `category` (integer, optional) - Category filter (category_id)
Response:
- Template: `search_results.html`
- Context variables:
  - `companies` - List of Company objects
  - `query` - Original search query
  - `category_id` - Selected category filter
  - `result_count` - Number of results
Analytics Logging:
if query:
match_types = {}
for r in results:
match_types[r.match_type] = match_types.get(r.match_type, 0) + 1
logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}")
Example log output:
Search 'strony www': 12 results, types: {'fts': 8, 'fuzzy': 3, 'exact': 1}
7. AI Chat Integration
7.1 AI Chat Search Flow
sequenceDiagram
actor User
participant Chat as AI Chat Interface<br/>/chat
participant ChatSvc as NordaBizChatService<br/>nordabiz_chat.py
participant SearchSvc as SearchService
participant DB as PostgreSQL
participant Gemini as Google Gemini API
User->>Chat: POST /chat/send<br/>"Szukam firm do stron www"
Chat->>ChatSvc: send_message(user_message, conversation_id)
ChatSvc->>ChatSvc: _find_relevant_companies(db, message)
Note over ChatSvc: Extract search keywords from message
ChatSvc->>SearchSvc: search_companies(db, message, limit=10)
Note over SearchSvc: Use same search strategies<br/>(NIP/REGON/FTS/Fallback)
SearchSvc->>DB: Query companies
DB->>SearchSvc: Results
SearchSvc->>ChatSvc: List[SearchResult] (max 10)
ChatSvc->>ChatSvc: Extract companies from results<br/>companies = [r.company for r in results]
ChatSvc->>ChatSvc: _build_conversation_context(db, user, conversation, companies)
Note over ChatSvc: Limit to 8 companies (prevent context overflow)<br/>Include last 10 messages for history
ChatSvc->>ChatSvc: _company_to_compact_dict(company)
Note over ChatSvc: Compress company data<br/>(name, desc, services, competencies, etc)
ChatSvc->>Gemini: POST /generateContent<br/>System prompt + context + user message
Note over Gemini: Model: gemini-2.5-flash<br/>Max tokens: 2048
Gemini->>ChatSvc: AI response text
ChatSvc->>DB: Save conversation messages<br/>(user message + AI response)
ChatSvc->>DB: Track API costs<br/>(gemini_cost_tracking)
ChatSvc->>Chat: AI response with company recommendations
Chat->>User: Display chat response
Key Differences from User Search:
- Result Limit: 10 companies (vs 50 for user search)
- Company Limit to AI: 8 companies max (prevents context overflow)
- Context Building: Companies converted to compact JSON format
- Integration: Seamless - AI doesn't know about search internals
- Message History: Last 10 messages included in context
Implementation:
- File: `nordabiz_chat.py` (lines 383-405)
- Search call:

results = search_companies(db, message, limit=10)
companies = [result.company for result in results]
return companies
Company Data Compression:
compact = {
'name': company.name,
'cat': company.category.name,
'desc': company.description_short,
'history': company.founding_history, # Owners, founders
'svc': [service.name for service in company.services],
'comp': [competency.name for competency in company.competencies],
'web': company.website,
'tel': company.phone,
'mail': company.email,
'city': company.address_city,
'year': company.year_established,
'cert': [cert.name for cert in company.certifications[:3]]
}
AI System Prompt (includes search context):
Jesteś asystentem bazy firm Norda Biznes z Wejherowa.
Odpowiadaj zwięźle, konkretnie, po polsku.
Oto firmy które mogą być istotne dla pytania użytkownika:
{companies_json}
Historia rozmowy:
{recent_messages}
Odpowiedz na pytanie użytkownika bazując na powyższych danych.
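A sketch of how the context pieces might be stitched into that prompt. The helper name is hypothetical; the real assembly lives in `_build_conversation_context`:

```python
import json

def build_system_prompt(companies: list, recent_messages: list) -> str:
    """Assemble the Gemini system prompt from compact company dicts (sketch)."""
    companies_json = json.dumps(companies[:8], ensure_ascii=False)  # cap at 8 companies
    history = "\n".join(recent_messages[-10:])                      # last 10 messages
    return (
        "Jesteś asystentem bazy firm Norda Biznes z Wejherowa.\n"
        "Odpowiadaj zwięźle, konkretnie, po polsku.\n"
        "Oto firmy które mogą być istotne dla pytania użytkownika:\n"
        f"{companies_json}\n"
        "Historia rozmowy:\n"
        f"{history}\n"
        "Odpowiedz na pytanie użytkownika bazując na powyższych danych."
    )
```

The two slices (`[:8]` and `[-10:]`) encode the context-overflow limits described above.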
8. Performance Considerations
8.1 Database Indexing
Required Indexes:
-- Full-text search index (PostgreSQL)
CREATE INDEX idx_companies_search_vector ON companies USING gin(search_vector);
-- NIP lookup index
CREATE UNIQUE INDEX idx_companies_nip ON companies(nip) WHERE status = 'active';
-- REGON lookup index
CREATE INDEX idx_companies_regon ON companies(regon) WHERE status = 'active';
-- Status filter index
CREATE INDEX idx_companies_status ON companies(status);
-- Category filter index
CREATE INDEX idx_companies_category ON companies(category_id) WHERE status = 'active';
-- pg_trgm index for fuzzy matching (optional)
CREATE INDEX idx_companies_name_trgm ON companies USING gin(name gin_trgm_ops);
8.2 Search Vector Maintenance
Automatic Updates:
-- Trigger to update search_vector on INSERT/UPDATE
CREATE TRIGGER companies_search_vector_update
BEFORE INSERT OR UPDATE ON companies
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(
search_vector, 'pg_catalog.simple',
name, description_short, description_full, founding_history
);
Manual Rebuild:
-- Rebuild all search vectors
UPDATE companies SET search_vector =
setweight(to_tsvector('simple', COALESCE(name, '')), 'A') ||
setweight(to_tsvector('simple', COALESCE(description_short, '')), 'B') ||
setweight(to_tsvector('simple', COALESCE(description_full, '')), 'C') ||
setweight(to_tsvector('simple', COALESCE(founding_history, '')), 'B');
8.3 Query Performance
Performance Targets:
- NIP/REGON lookup: < 10ms (indexed)
- PostgreSQL FTS: < 100ms (typical)
- SQLite fallback: < 500ms (in-memory scoring)
- AI Chat search: < 200ms (limit 10 results)
Optimization Strategies:
- Early Exit: NIP/REGON lookup bypasses full search
- Result Limiting: Default 50 results (10 for AI chat)
- Category Filtering: Reduces search space
- Synonym Pre-expansion: Computed once, reused in all clauses
- Score-based Ordering: Database-level sorting (not in-memory)
8.4 Fallback Performance
PostgreSQL → SQLite Fallback Triggers:
- FTS query returns 0 results
- FTS query throws exception (syntax error, missing extension)
- `pg_trgm` extension not available (degrades to FTS-only, not full fallback)
SQLite Fallback Cost:
- Fetches ALL active companies into memory
- Scores each company in Python (slower than SQL)
- Suitable for development/testing, not recommended for production with 100+ companies
Monitoring:
# Logged in app.py when search executes
logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}")
# Example outputs:
# Search 'strony www': 12 results, types: {'fts': 8, 'fuzzy': 4}
# Search '5882436505': 1 results, types: {'nip': 1}
# Search 'PIXLAB': 1 results, types: {'exact': 1}
9. Search Result Structure
9.1 SearchResult Dataclass
File: search_service.py (lines 20-25)
@dataclass
class SearchResult:
"""Search result with score and match info"""
company: Company # Full Company SQLAlchemy object
score: float # Relevance score (0.0-100.0)
match_type: str # Match type identifier
Match Types:
| Match Type | Description | Score Range |
|---|---|---|
| `'nip'` | Direct NIP match | 100.0 (fixed) |
| `'regon'` | Direct REGON match | 100.0 (fixed) |
| `'exact'` | Exact name match (SQLite) | Variable (usually high) |
| `'fts'` | PostgreSQL full-text search | 0.0-100.0 (normalized ts_rank) |
| `'fuzzy'` | PostgreSQL fuzzy similarity | 0.0-100.0 (normalized similarity) |
| `'history'` | Founding history match | 50.0 (fixed bonus) |
| `'keyword'` | SQLite keyword scoring | Variable (weighted sum) |
| `'all'` | All companies (no filter) | 0.0 (no relevance) |
9.2 Score Normalization
PostgreSQL FTS Scores:
# ts_rank returns 0.0-1.0, normalize to 0-100
fts_score = ts_rank(...) * 100
# similarity returns 0.0-1.0, normalize to 0-100
fuzzy_score = similarity(...) * 100
# history match is fixed bonus
history_score = 0.5 * 100  # = 50.0
SQLite Keyword Scores:
# Sum of all matching field weights
score = (
10 # name match
+ 5 # exact match bonus
+ 5 # description_short
+ 8 # services
+ 7 # competencies
+ 3 # city
+ 12 # founding_history
+ 4 # description_full
)
# Maximum possible: 54 points
# Typical: 10-30 points
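The GREATEST-based normalization can be expressed as a small function. This is a sketch; in the real code the equivalent logic sits inside the FTS result processing:

```python
def normalize_score(fts_score: float, fuzzy_score: float, history_matched: bool):
    """Pick the best raw score (0.0-1.0) and scale to 0-100, as described above."""
    history_score = 0.5 if history_matched else 0.0
    best = max(fts_score, fuzzy_score, history_score)   # GREATEST(...)
    if best == fts_score and fts_score > 0:
        match_type = 'fts'
    elif best == fuzzy_score and fuzzy_score > 0:
        match_type = 'fuzzy'
    elif history_matched:
        match_type = 'history'
    else:
        match_type = 'keyword'                          # nothing matched strongly
    return best * 100, match_type
```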
10. Error Handling & Edge Cases
10.1 PostgreSQL FTS Error Handling
Error Scenarios:
- Invalid tsquery syntax - Fallback to SQLite
- pg_trgm extension missing - Degrade to FTS-only (no fuzzy)
- search_vector column missing - Exception, fallback to SQLite
- Database connection error - Propagate exception to route
Implementation:
try:
result = self.db.execute(sql, params)
rows = result.fetchall()
# ... process results
except Exception as e:
print(f"PostgreSQL FTS error: {e}, falling back to keyword search")
self.db.rollback() # CRITICAL: prevent InFailedSqlTransaction
return self._search_sqlite_fallback(query, category_id, limit)
Critical: db.rollback() is essential before fallback to prevent transaction state errors.
10.2 Empty Results Handling
No Results Scenarios:
- NIP/REGON not found - return empty list `[]`
- FTS returns 0 matches - automatic fallback to SQLite scoring
- SQLite scoring returns 0 matches - return empty list `[]`
- Empty query - return all active companies (ordered by name)
User Interface:
{% if result_count == 0 %}
<div class="alert alert-info">
Brak wyników dla zapytania "{{ query }}".
Spróbuj innych słów kluczowych lub usuń filtry.
</div>
{% endif %}
10.3 Special Characters & Sanitization
Query Cleaning:
query = query.strip() # Remove leading/trailing whitespace
clean_nip = re.sub(r'[\s\-]', '', query) # Remove spaces and hyphens from NIP/REGON
SQL Injection Prevention:
- All queries use SQLAlchemy parameter binding (`:param` syntax)
- No raw string concatenation in SQL
- ILIKE patterns are passed as array parameters
XSS Prevention:
- All user input sanitized before display (handled by Jinja2 auto-escaping)
- Query string displayed in template as `{{ query }}` (auto-escaped)
11. Testing & Verification
11.1 Test Queries
NIP Lookup:
Query: "5882436505"
Expected: PIXLAB Sp. z o.o. (single result, score=100, match_type='nip')
REGON Lookup:
Query: "220825533"
Expected: Single company with matching REGON (score=100, match_type='regon')
Keyword Search (PostgreSQL FTS):
Query: "strony internetowe"
Expected: Multiple results (IT/Web companies, match_type='fts' or 'fuzzy')
Keywords expanded to: [strony, internetowe, www, web, internet, witryny, seo, ...]
Exact Name Match:
Query: "PIXLAB"
Expected: PIXLAB at top (high score, match_type='exact' or 'fts')
Owner/Founder Search:
Query: "Jan Kowalski" (example founder name)
Expected: Companies where Jan Kowalski appears in founding_history
Match type: 'history' or high score from founding_history match
Category Filter:
Query: "strony" + category=1 (IT)
Expected: Only IT category companies matching "strony"
Empty Query:
Query: ""
Expected: All active companies, alphabetically sorted
11.2 Performance Testing
Load Testing Scenarios:
# Test 1: Direct lookup performance
for nip in all_nips:
results = search_companies(db, nip)
assert len(results) == 1
assert results[0].match_type == 'nip'
# Test 2: Full-text search performance
queries = ["strony", "budowa", "księgowość", "metal", "transport"]
for query in queries:
start = time.time()
results = search_companies(db, query)
elapsed = time.time() - start
assert elapsed < 0.1 # < 100ms
print(f"{query}: {len(results)} results in {elapsed*1000:.1f}ms")
# Test 3: Fallback trigger test (simulate FTS failure)
# Force SQLite fallback by using invalid tsquery syntax
results = search_companies(db, "test:query|with:invalid&syntax")
# Should not crash, should return results via fallback
11.3 Search Quality Metrics
Relevance Testing:
test_cases = [
{
'query': 'strony www',
'expected_top_3': ['PIXLAB', 'Web Agency', 'IT Solutions'],
'min_results': 5
},
{
'query': 'budownictwo',
'expected_categories': ['Construction'],
'min_results': 3
},
# ... more test cases
]
for test in test_cases:
results = search_companies(db, test['query'])
assert len(results) >= test['min_results']
# Check if expected companies appear in top results
top_names = [r.company.name for r in results[:3]]
for expected in test['expected_top_3']:
assert expected in top_names
12. Maintenance & Monitoring
12.1 Database Maintenance
Weekly Tasks:
-- Rebuild search vectors (if data quality issues)
UPDATE companies SET search_vector =
setweight(to_tsvector('simple', COALESCE(name, '')), 'A') ||
setweight(to_tsvector('simple', COALESCE(description_short, '')), 'B') ||
setweight(to_tsvector('simple', COALESCE(description_full, '')), 'C') ||
setweight(to_tsvector('simple', COALESCE(founding_history, '')), 'B')
WHERE updated_at > NOW() - INTERVAL '7 days';
-- Verify index health
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'companies'
ORDER BY idx_scan DESC;
-- Check for missing indexes
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'companies';
Monthly Tasks:
-- Vacuum and analyze for performance
VACUUM ANALYZE companies;
-- Check for slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%companies%search_vector%'
ORDER BY mean_exec_time DESC
LIMIT 10;
12.2 Search Analytics
Logging Search Patterns:
# Already implemented in app.py /search route
logger.info(f"Search '{query}': {len(companies)} results, types: {match_types}")
Analytics Queries:
-- Top search queries (requires search_logs table - not yet implemented)
SELECT query, COUNT(*) as frequency
FROM search_logs
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY query
ORDER BY frequency DESC
LIMIT 20;
-- Zero-result searches (requires logging)
SELECT query, COUNT(*) as frequency
FROM search_logs
WHERE result_count = 0
AND created_at > NOW() - INTERVAL '30 days'
GROUP BY query
ORDER BY frequency DESC
LIMIT 10;
12.3 Synonym Expansion Tuning
Adding New Synonyms:
# Edit search_service.py KEYWORD_SYNONYMS dictionary
KEYWORD_SYNONYMS = {
# Add new industry-specific terms
'cyberbezpieczeństwo': ['security', 'ochrona', 'firewall', 'antywirus'],
# ... more synonyms
}
Synonym Effectiveness Testing:
# Test query with and without synonym expansion
query = "cyberbezpieczeństwo"
# With expansion
results_with = search_companies(db, query)
print(f"With synonyms: {len(results_with)} results")
# Without expansion (mock)
# ... compare recall/precision
13. Future Enhancements
13.1 Planned Improvements
- Search Result Ranking ML Model
  - Learn from user click-through rates
  - Personalized ranking based on user preferences
  - A/B testing of ranking algorithms
- Search Autocomplete
  - Suggest company names as user types
  - Suggest common search queries
  - Category-based suggestions
- Advanced Filters
  - Location-based search (radius from city)
  - Certification filters (ISO, other)
  - Founding year range
  - Employee count range (if available)
- Search Analytics Dashboard
  - Top queries (daily/weekly/monthly)
  - Zero-result queries (opportunities for content)
  - Average result count per query
  - Match type distribution
  - Click-through rates by position
- Semantic Search
  - Integrate sentence embeddings (sentence-transformers)
  - Vector similarity search for related companies
  - "More like this" company recommendations
- Multi-language Support
  - English query translation
  - German query support (for border region)
  - Auto-detect query language
13.2 Performance Optimization Ideas
- Query Result Caching
  - Redis cache for common queries (TTL 5 minutes)
  - Cache key: `search:{query}:{category_id}`
  - Invalidate on company data updates
- Partial Index Optimization

-- Index only active companies
CREATE INDEX idx_companies_active_search
ON companies USING gin(search_vector)
WHERE status = 'active';

- Materialized View for Search

-- Pre-compute search data
CREATE MATERIALIZED VIEW search_companies_mv AS
SELECT id, name, search_vector, category_id, status, ...
FROM companies WHERE status = 'active';
-- Refresh daily
REFRESH MATERIALIZED VIEW search_companies_mv;

- Connection Pooling
  - Already implemented via SQLAlchemy
  - Monitor pool size and overflow
  - Adjust pool_size/max_overflow if needed
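The query-result caching idea could be prototyped with a TTL map before committing to Redis. Everything below is hypothetical; `QueryCache` is not part of the current codebase, only the key scheme comes from the plan above:

```python
import time

class QueryCache:
    """Minimal TTL cache standing in for Redis (hypothetical sketch)."""

    def __init__(self, ttl_seconds: int = 300):        # 5-minute TTL
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def make_key(query: str, category_id) -> str:
        return f"search:{query}:{category_id}"          # cache key scheme from above

    def get(self, query: str, category_id):
        entry = self._store.get(self.make_key(query, category_id))
        if entry is not None and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None                                     # miss or expired

    def set(self, query: str, category_id, results):
        self._store[self.make_key(query, category_id)] = (results, time.time())

    def invalidate_all(self):
        """Call on company data updates."""
        self._store.clear()
```

Swapping in Redis would keep the same interface, with `SETEX` replacing the timestamp check.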
14. Related Documentation
- Flask Application Structure - Complete route reference
- Database Schema - Company model and indexes
- External Integrations - AI Chat integration details
- AI Chat Flow - How AI uses search service (to be created)
15. Glossary
| Term | Description |
|---|---|
| FTS | Full-Text Search - PostgreSQL text search engine using tsvector |
| tsvector | PostgreSQL data type for full-text search, stores preprocessed text |
| tsquery | PostgreSQL query syntax for full-text search (e.g., "word1 \| word2") |
| ts_rank | PostgreSQL function to score FTS relevance (0.0-1.0) |
| pg_trgm | PostgreSQL extension for trigram-based fuzzy string matching |
| similarity() | pg_trgm function to measure string similarity (0.0-1.0) |
| Synonym Expansion | Expanding user query with related keywords (e.g., "strony" → "www, web, internet") |
| SearchResult | Dataclass containing Company, score, and match_type |
| Match Type | Identifier for how company was matched (nip, regon, fts, fuzzy, keyword, etc.) |
| NIP | Polish tax identification number (10 digits) |
| REGON | Polish business registry number (9 or 14 digits) |
| Fallback | Alternative search method when primary method fails (PostgreSQL FTS → SQLite keyword scoring) |
| SearchService | Unified search service class (search_service.py) |
| Keyword Scoring | In-memory scoring algorithm for SQLite fallback |
Document Metadata
Created: 2026-01-10
Author: Architecture Documentation (auto-claude)
Related Files:
- `search_service.py` (main implementation)
- `app.py` (lines 718-748, /search route)
- `nordabiz_chat.py` (lines 383-405, AI integration)
- `database.py` (Company model)
Version History:
- v1.0 (2026-01-10) - Initial documentation
End of Document