# News Monitoring Flow

**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Planned (Database schema ready, implementation pending)
**Flow Type:** Automated News Discovery and Moderation

---

## Overview

This document describes the **complete news monitoring flow** for the Norda Biznes Partner application, covering:

- **News Discovery** via Brave Search API
- **AI-Powered Filtering** using Google Gemini AI
- **Manual Moderation** workflow for admins
- **Company Profile Display** of approved news
- **User Notifications** for new news items
- **Database Storage** in `company_news` and `user_notifications` tables

**Key Technology:**

- **Search API:** Brave Search News API (free tier: 2,000 req/month)
- **AI Filter:** Google Gemini 3 Flash (relevance scoring and classification)
- **Database:** PostgreSQL (company_news and user_notifications tables)
- **Scheduler:** Planned cron job (6-hour intervals)

**Key Features:**

- Automated discovery of company mentions in news media
- AI-powered relevance scoring (0.0-1.0 scale)
- Automatic classification (news_mention, press_release, award, etc.)
- Admin moderation dashboard (`/admin/news`)
- Display on company profiles (approved news only)
- User notification system
- Deduplication by URL

**API Costs & Performance:**

- **API:** Brave Search News API (Free tier: 2,000 searches/month)
- **Pricing:** Free for 2,000 monthly searches
- **Typical Search Time:** 2-5 seconds per company
- **Monthly Capacity:** 2,000 searches ÷ 80 companies = 25 searches per company per month
- **Actual Cost:** $0.00 only while usage stays within the free tier (the planned schedule below does not)

**Planned Schedule:**

- Run every 6 hours (4 times/day)
- 80 companies × 4 runs = 320 searches/day
- 320 × 30 days = 9,600 searches/month
- **⚠️ EXCEEDS FREE TIER** (9,600 vs. 2,000 searches/month) - Need to implement rate limiting or paid tier
- Staying within the free tier allows 25 full runs per month, i.e. roughly one run every 29 hours; at the $5 per 1,000 rate listed in section 2.1, the 7,600 extra searches would cost about $38/month

---

## 1. High-Level News Monitoring Flow

### 1.1 Complete News Monitoring Flow Diagram

```mermaid
flowchart TD
    Cron[Cron Job<br/>Every 6 hours] -->|Trigger| Script[scripts/fetch_company_news.py]
    Script -->|1. Fetch companies| DB[(PostgreSQL<br/>companies table)]
    DB -->|Company list| Script
    Script -->|2. For each company| Loop{More companies?}
    Loop -->|Yes| BraveAPI[Brave Search API]
    Loop -->|No| Complete[Complete]
    BraveAPI -->|3. Search query<br/>#quot;company_name#quot; OR #quot;NIP#quot;| BraveSearch[Brave Search<br/>News Endpoint]
    BraveSearch -->|4. News results<br/>JSON response| BraveAPI
    BraveAPI -->|5. News articles| Filter{Has results?}
    Filter -->|No| Loop
    Filter -->|Yes| AIFilter[AI Filtering Pipeline]
    AIFilter -->|6. For each article| Gemini[Google Gemini AI]
    Gemini -->|7. Analyze relevance| RelevanceScore[Calculate<br/>relevance_score<br/>0.0-1.0]
    RelevanceScore -->|8. Score + classification| Decision{"Score >= 0.3?"}
    Decision -->|No - Irrelevant| Discard[Discard article]
    Decision -->|Yes| SaveNews[Save to DB]
    SaveNews -->|9. INSERT ON CONFLICT| NewsDB[(company_news<br/>table)]
    NewsDB -->|10. Check duplicates<br/>by URL| DupeCheck{Duplicate?}
    DupeCheck -->|Yes| Skip[Skip - Already exists]
    DupeCheck -->|No| CreateRecord[Create news record<br/>status='pending']
    CreateRecord -->|11. News saved| NotifyCheck{Notify users?}
    NotifyCheck -->|Yes| CreateNotif[Create notifications]
    CreateNotif -->|12. INSERT| NotifDB[(user_notifications)]
    NotifyCheck -->|No| Loop
    CreateNotif -->|13. Done| Loop
    Discard --> Loop
    Skip --> Loop

    style BraveSearch fill:#FFD700
    style Gemini fill:#4285F4
    style NewsDB fill:#90EE90
    style NotifDB fill:#90EE90
    style AIFilter fill:#FFB6C1
```

### 1.2 Admin Moderation Flow

```mermaid
sequenceDiagram
    participant Admin as Admin User
    participant Browser
    participant Flask as Flask App
    participant DB as PostgreSQL

    Admin->>Browser: Navigate to /admin/news
    Browser->>Flask: GET /admin/news
    Flask->>Flask: Check permissions (is_admin?)

    alt Not Admin
        Flask-->>Browser: 403 Forbidden
    else Is Admin
        Flask->>DB: SELECT * FROM company_news<br/>WHERE moderation_status='pending'
        DB-->>Flask: Pending news list
        Flask-->>Browser: Render admin_news_moderation.html
        Browser-->>Admin: Display pending news
    end

    Admin->>Browser: Review article #42<br/>Click "Approve"
    Browser->>Flask: POST /api/news/moderate<br/>{news_id: 42, action: 'approve'}
    Flask->>Flask: Verify admin permissions
    Flask->>DB: UPDATE company_news<br/>SET moderation_status='approved',<br/>is_approved=TRUE,<br/>moderated_by=admin_id,<br/>moderated_at=NOW()
    DB-->>Flask: Updated
    Flask->>DB: INSERT INTO user_notifications<br/>(type='news', related_id=42)
    DB-->>Flask: Notification created
    Flask-->>Browser: JSON: {success: true}
    Browser-->>Admin: Show success message

    Note over Admin,DB: Article now visible on company profile
```

### 1.3 User View Flow (Company Profile)

```mermaid
sequenceDiagram
    participant User as Visitor/Member
    participant Browser
    participant Flask as Flask App
    participant DB as PostgreSQL

    User->>Browser: Visit /company/pixlab-sp-z-o-o
    Browser->>Flask: GET /company/pixlab-sp-z-o-o
    Flask->>DB: SELECT * FROM companies<br/>WHERE slug='pixlab-sp-z-o-o'
    DB-->>Flask: Company data
    Flask->>DB: SELECT * FROM company_news<br/>WHERE company_id=26<br/>AND is_approved=TRUE<br/>AND is_visible=TRUE<br/>ORDER BY published_date DESC<br/>LIMIT 5
    DB-->>Flask: Approved news (0-5 items)
    Flask-->>Browser: Render company_detail.html<br/>with news section
    Browser-->>User: Display company profile<br/>with "Aktualności" section

    alt Has approved news
        Browser-->>User: Show news cards<br/>(title, date, source, summary)
        User->>Browser: Click "Czytaj więcej"
        Browser->>User: Open source_url in new tab
    else No news
        Browser-->>User: "Brak aktualności"
    end
```

---

## 2. News Discovery Pipeline

### 2.1 Brave Search API Integration

**Endpoint:** `https://api.search.brave.com/res/v1/news/search`

**Authentication:**

- API Key in `.env`: `BRAVE_SEARCH_API_KEY`
- Header: `X-Subscription-Token: {API_KEY}`

**Search Parameters:**

```python
params = {
    "q": f'"{company_name}" OR "{nip}"',  # Quoted for exact match
    "count": 10,                          # Max results per query
    "freshness": "pw",                    # Past week (pw), month (pm), year (py)
    "country": "pl",                      # Poland
    "search_lang": "pl",                  # Polish language
    "offset": 0                           # Pagination (unused)
}
```

**Rate Limits:**

- **Free Tier:** 2,000 searches/month
- **Paid Tier:** $5/1000 additional searches
- **Throttling:** 1 request/second (built into script)

**Response Format:**

```json
{
  "type": "news",
  "news": {
    "results": [
      {
        "title": "PIXLAB otwiera nową siedzibę w Wejherowie",
        "url": "https://example.com/article",
        "description": "Firma PIXLAB, specjalizująca się...",
        "age": "2 days ago",
        "meta_url": {
          "netloc": "example.com",
          "hostname": "example.com"
        },
        "thumbnail": {
          "src": "https://example.com/image.jpg"
        }
      }
    ]
  }
}
```

**Error Handling:**

```python
import logging
import time

import requests

logger = logging.getLogger(__name__)


def search_news(url, headers, params, company_name, max_retries=3):
    """Return the parsed JSON response, or None if this company should be skipped."""
    for retry_count in range(max_retries):
        try:
            response = requests.get(url, headers=headers, params=params, timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.Timeout:
            # Retry with exponential backoff (1 s, 2 s, 4 s, ...)
            time.sleep(2 ** retry_count)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                # Rate limit exceeded: wait, then retry
                time.sleep(60)
            elif e.response.status_code == 401:
                # Invalid API key: retrying cannot help, so log and skip
                logger.error("Invalid Brave API key")
                return None
            else:
                logger.error(f"HTTP {e.response.status_code} for company {company_name}")
                return None
        except requests.exceptions.RequestException:
            # Network error - skip company
            logger.error(f"Network error for company {company_name}")
            return None
    return None  # All retries used up
```

### 2.2 News Discovery Script

**File:** `scripts/fetch_company_news.py` (planned)

**Usage:**

```bash
# Fetch news for all companies
python scripts/fetch_company_news.py --all

# Fetch for specific company
python scripts/fetch_company_news.py --company pixlab-sp-z-o-o

# Dry run (no database writes)
python scripts/fetch_company_news.py --all --dry-run

# Fetch only high-priority companies
python scripts/fetch_company_news.py --priority
```

**Implementation Outline:**

```python
#!/usr/bin/env python3
"""
Fetch company news from Brave Search API and store in database.
"""

import argparse
import os
import sys
import time
import requests
from datetime import datetime
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from database import Company, CompanyNews
from gemini_service import GeminiService

# NOTE: parse_date() (article age -> datetime) and parse_json() (Gemini reply -> dict)
# are helpers used below that are not defined in this outline yet.

# Configuration
BRAVE_API_KEY = os.getenv('BRAVE_SEARCH_API_KEY')
BRAVE_NEWS_ENDPOINT = 'https://api.search.brave.com/res/v1/news/search'
DATABASE_URL = os.getenv('DATABASE_URL')


def fetch_news_for_company(company: Company, db: Session) -> int:
    """
    Fetch news for a single company.

    Returns: Number of new articles found.
    """
    # Build search query
    query = f'"{company.name}" OR "{company.nip}"'

    # Call Brave API
    headers = {'X-Subscription-Token': BRAVE_API_KEY}
    params = {
        'q': query,
        'count': 10,
        'freshness': 'pw',  # Past week
        'country': 'pl',
        'search_lang': 'pl'
    }

    response = requests.get(BRAVE_NEWS_ENDPOINT, headers=headers, params=params, timeout=10)
    response.raise_for_status()
    data = response.json()
    articles = data.get('news', {}).get('results', [])

    new_count = 0
    for article in articles:
        # Check if already exists (saves an AI call for known URLs)
        existing = db.query(CompanyNews).filter_by(
            company_id=company.id,
            source_url=article['url']
        ).first()
        if existing:
            continue  # Skip duplicate

        # AI filtering
        relevance = filter_with_ai(company, article)
        if relevance['score'] < 0.3:
            continue  # Too irrelevant

        # Create news record
        news = CompanyNews(
            company_id=company.id,
            title=article['title'],
            summary=article.get('description'),
            source_url=article['url'],
            source_name=article['meta_url']['hostname'],
            source_type='web',
            news_type=relevance['type'],
            published_date=parse_date(article.get('age')),
            discovered_at=datetime.utcnow(),
            relevance_score=relevance['score'],
            ai_summary=relevance['summary'],
            ai_tags=relevance['tags'],
            moderation_status='pending',
            is_approved=False,
            is_visible=True
        )
        db.add(news)
        new_count += 1

    db.commit()
    return new_count


def filter_with_ai(company: Company, article: dict) -> dict:
    """
    Use Gemini AI to filter and classify news article.

    Returns: {score: float, type: str, summary: str, tags: list}
    """
    gemini = GeminiService()

    prompt = f"""
    Oceń czy poniższy artykuł jest istotny dla firmy "{company.name}".

    Firma: {company.name}
    NIP: {company.nip}
    Branża: {company.category}
    Opis: {company.description}

    Artykuł:
    Tytuł: {article['title']}
    Treść: {article['description']}

    Zwróć JSON:
    {{
      "relevance": 0.0-1.0,  // 1.0 = bardzo istotny, 0.0 = całkowicie nieistotny
      "type": "news_mention|press_release|award|social_post|event|financial|partnership",
      "reason": "Krótkie uzasadnienie oceny",
      "summary": "Krótkie streszczenie artykułu (max 200 znaków)",
      "tags": ["tag1", "tag2", "tag3"]  // Maksymalnie 5 tagów
    }}
    """

    response = gemini.generate_content(prompt)
    result = parse_json(response.text)

    return {
        'score': result['relevance'],
        'type': result['type'],
        'summary': result['summary'],
        'tags': result['tags']
    }


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(description='Fetch company news from Brave API')
    parser.add_argument('--all', action='store_true', help='Fetch for all companies')
    parser.add_argument('--company', type=str, help='Fetch for specific company (slug)')
    parser.add_argument('--dry-run', action='store_true', help='Dry run (no DB writes)')
    parser.add_argument('--priority', action='store_true', help='Fetch only high-priority')
    args = parser.parse_args()

    # NOTE: --dry-run and --priority are parsed but not acted on yet in this outline.

    engine = create_engine(DATABASE_URL)
    with Session(engine) as db:
        if args.all:
            companies = db.query(Company).filter_by(is_active=True).all()
        elif args.company:
            company = db.query(Company).filter_by(slug=args.company).first()
            if company is None:
                print(f"Error: Company not found: {args.company}")
                sys.exit(1)
            companies = [company]
        else:
            print("Error: Must specify --all or --company")
            sys.exit(1)

        total_new = 0
        for company in companies:
            print(f"Fetching news for {company.name}...")
            new_count = fetch_news_for_company(company, db)
            total_new += new_count
            print(f"  → Found {new_count} new articles")
            time.sleep(1)  # Rate limiting

        print(f"\nTotal: {total_new} new articles")


if __name__ == '__main__':
    main()
```

**Cron Job Setup (planned):**

```bash
# Add to crontab (every 6 hours).
# A crontab entry must stay on one line; cron does not support backslash continuation.
0 */6 * * * cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/fetch_company_news.py --all >> /var/log/nordabiznes/news_fetch.log 2>&1
```

---

## 3. AI Filtering and Classification

### 3.1 Gemini AI Integration

**Purpose:**

- Filter out irrelevant articles (false positives)
- Calculate relevance score (0.0-1.0)
- Classify news type (news_mention, press_release, award, etc.)
- Generate AI summary
- Extract tags for categorization

**Relevance Scoring Criteria:**

Scores are stored with two decimals, so the ranges below are contiguous and line up with the 0.3 discard threshold and the 0.9 auto-approval threshold.

| Score Range | Description | Action |
|-------------|-------------|--------|
| 0.90 - 1.00 | Highly relevant - direct mention, official communication | Auto-approve if the rules in section 5.3 also hold, otherwise pending moderation |
| 0.70 - 0.89 | Very relevant - significant mention or related news | Pending moderation |
| 0.50 - 0.69 | Moderately relevant - indirect mention | Pending moderation |
| 0.30 - 0.49 | Low relevance - tangential mention | Pending moderation |
| 0.00 - 0.29 | Irrelevant - false positive, unrelated | Auto-reject (discard) |

**AI Prompt Template:**

```python
RELEVANCE_PROMPT = """
Jesteś ekspertem od analizy newsów firmowych. Oceń czy poniższy artykuł jest istotny dla firmy.

INFORMACJE O FIRMIE:
Nazwa: {company_name}
NIP: {nip}
Branża: {category}
Opis: {description}
Lokalizacja: {city}

ARTYKUŁ DO OCENY:
Tytuł: {article_title}
Źródło: {source_name}
Data: {published_date}
Treść: {article_content}

KRYTERIA OCENY:
1. Czy artykuł bezpośrednio wspomina o firmie (nazwa lub NIP)?
2. Czy dotyczy działalności firmy, produktów lub usług?
3. Czy jest to oficjalny komunikat prasowy firmy?
4. Czy informacje są istotne dla klientów lub partnerów firmy?
5. Czy artykuł dotyczy nagród, wyróżnień lub osiągnięć firmy?

INSTRUKCJE:
- Zwróć ocenę w formacie JSON
- relevance: 0.0 (całkowicie nieistotny) do 1.0 (bardzo istotny)
- type: klasyfikacja artykułu
- reason: krótkie uzasadnienie (max 100 znaków)
- summary: streszczenie artykułu (max 200 znaków)
- tags: maksymalnie 5 tagów opisujących temat

FORMAT ODPOWIEDZI (tylko JSON, bez dodatkowego tekstu):
{{
  "relevance": 0.85,
  "type": "news_mention",
  "reason": "Artykuł wspomina o nowym projekcie firmy",
  "summary": "Firma PIXLAB rozpoczyna realizację projektu XYZ...",
  "tags": ["projekty", "IT", "wejherowo", "innowacje"]
}}

DOSTĘPNE TYPY:
- news_mention: Wzmianka w mediach
- press_release: Oficjalny komunikat prasowy
- award: Nagroda lub wyróżnienie
- social_post: Post w mediach społecznościowych
- event: Wydarzenie lub konferencja
- financial: Informacje finansowe (wyniki, inwestycje)
- partnership: Partnerstwo lub współpraca
"""
```

**AI Cost Tracking:**

```python
# Track AI API costs in ai_api_costs table
# (AIAPICost is assumed to be the ORM model for that table)
def track_ai_cost(db: Session, prompt_tokens: int, completion_tokens: int, model: str):
    """
    Track AI API usage and cost.

    Gemini 3 Flash: Free tier 1,500 req/day
    """
    cost_per_1k_input = 0.0   # Free tier
    cost_per_1k_output = 0.0  # Free tier

    input_cost = (prompt_tokens / 1000) * cost_per_1k_input
    output_cost = (completion_tokens / 1000) * cost_per_1k_output
    total_cost = input_cost + output_cost

    # Save to database
    cost_record = AIAPICost(
        service='gemini',
        model=model,
        operation='news_filtering',
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
        cost=total_cost,
        created_at=datetime.utcnow()
    )
    db.add(cost_record)
    db.commit()
```

### 3.2 Classification Types

**News Types:**

1. **news_mention** - General media mention
   - Company mentioned in news article
   - Industry news involving the company
   - Local or regional news coverage

2. **press_release** - Official company press release
   - Official statements from company
   - Product launches
   - Company announcements

3. **award** - Award or recognition
   - Industry awards won
   - Certifications achieved
   - Recognition or rankings

4. **social_post** - Social media post
   - Facebook posts
   - LinkedIn updates
   - Instagram stories (future)

5. **event** - Event announcement
   - Company hosting or participating in event
   - Conference appearances
   - Webinars or workshops

6. **financial** - Financial news
   - Revenue reports
   - Investment announcements
   - Funding rounds

7. **partnership** - Partnership or collaboration
   - New partnerships announced
   - Joint ventures
   - Strategic collaborations

**Source Types:**

- `web` - Web news article (Brave Search)
- `facebook` - Facebook post (future)
- `linkedin` - LinkedIn post (future)
- `instagram` - Instagram post (future)
- `press` - Press release portal
- `award` - Award announcement

---

## 4. Database Schema

### 4.1 company_news Table

**Purpose:** Store news and mentions for companies from various sources.

**Schema:**

```sql
CREATE TABLE company_news (
    id SERIAL PRIMARY KEY,

    -- Company reference
    company_id INTEGER NOT NULL REFERENCES companies(id) ON DELETE CASCADE,

    -- News content
    title VARCHAR(500) NOT NULL,
    summary TEXT,
    content TEXT,

    -- Source information
    source_url VARCHAR(1000),
    source_name VARCHAR(255),
    source_type VARCHAR(50),

    -- Classification
    news_type VARCHAR(50) DEFAULT 'news_mention',

    -- Dates
    published_date TIMESTAMP,
    discovered_at TIMESTAMP DEFAULT NOW(),

    -- AI filtering
    is_approved BOOLEAN DEFAULT FALSE,
    is_visible BOOLEAN DEFAULT TRUE,
    relevance_score NUMERIC(3,2),
    ai_summary TEXT,
    ai_tags TEXT[],

    -- Moderation
    moderation_status VARCHAR(20) DEFAULT 'pending',
    moderated_by INTEGER REFERENCES users(id),
    moderated_at TIMESTAMP,
    rejection_reason VARCHAR(255),

    -- Engagement
    view_count INTEGER DEFAULT 0,

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    -- Unique constraint
    CONSTRAINT uq_company_news_url UNIQUE (company_id, source_url)
);
```

**Indexes:**

```sql
-- Performance indexes
CREATE INDEX idx_company_news_company_id ON company_news(company_id);
CREATE INDEX idx_company_news_source_type ON company_news(source_type);
CREATE INDEX idx_company_news_news_type ON company_news(news_type);
CREATE INDEX idx_company_news_is_approved ON company_news(is_approved);
CREATE INDEX idx_company_news_published_date ON company_news(published_date DESC);
CREATE INDEX idx_company_news_discovered_at ON company_news(discovered_at DESC);
CREATE INDEX idx_company_news_moderation ON company_news(moderation_status);

-- Composite index for efficient querying
CREATE INDEX idx_company_news_approved_visible
    ON company_news(company_id, is_approved, is_visible)
    WHERE is_approved = TRUE AND is_visible = TRUE;
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `id` | SERIAL | Primary key |
| `company_id` | INTEGER | Foreign key to companies table |
| `title` | VARCHAR(500) | News headline |
| `summary` | TEXT | Short excerpt or description |
| `content` | TEXT | Full article content (if scraped) |
| `source_url` | VARCHAR(1000) | Original URL of news article |
| `source_name` | VARCHAR(255) | Name of source (e.g., "Gazeta Wyborcza") |
| `source_type` | VARCHAR(50) | Type: web, facebook, linkedin, instagram, press, award |
| `news_type` | VARCHAR(50) | Classification (see section 3.2) |
| `published_date` | TIMESTAMP | Original publication date |
| `discovered_at` | TIMESTAMP | When our system found it |
| `is_approved` | BOOLEAN | Passed AI filter and approved for display |
| `is_visible` | BOOLEAN | Visible on company profile |
| `relevance_score` | NUMERIC(3,2) | AI-calculated relevance (0.00-1.00) |
| `ai_summary` | TEXT | Gemini-generated summary |
| `ai_tags` | TEXT[] | Array of AI-extracted tags |
| `moderation_status` | VARCHAR(20) | Status: pending, approved, rejected |
| `moderated_by` | INTEGER | Admin user ID who moderated |
| `moderated_at` | TIMESTAMP | When moderation happened |
| `rejection_reason` | VARCHAR(255) | Reason if rejected |
| `view_count` | INTEGER | Number of views on platform |
| `created_at` | TIMESTAMP | Record creation time |
| `updated_at` | TIMESTAMP | Last update time |

### 4.2 user_notifications Table

**Purpose:** In-app notifications for users with read/unread tracking.

**Schema:**

```sql
CREATE TABLE user_notifications (
    id SERIAL PRIMARY KEY,

    -- User reference
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,

    -- Notification content
    title VARCHAR(255) NOT NULL,
    message TEXT,
    notification_type VARCHAR(50) DEFAULT 'info',

    -- Related entity (polymorphic reference)
    related_type VARCHAR(50),
    related_id INTEGER,

    -- Status
    is_read BOOLEAN DEFAULT FALSE,
    read_at TIMESTAMP,

    -- Action
    action_url VARCHAR(500),

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW()
);
```

**Indexes:**

```sql
CREATE INDEX idx_user_notifications_user_id ON user_notifications(user_id);
CREATE INDEX idx_user_notifications_type ON user_notifications(notification_type);
CREATE INDEX idx_user_notifications_is_read ON user_notifications(is_read);
CREATE INDEX idx_user_notifications_created_at ON user_notifications(created_at DESC);

-- Composite index for unread notifications badge
CREATE INDEX idx_user_notifications_unread
    ON user_notifications(user_id, is_read, created_at DESC)
    WHERE is_read = FALSE;
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `id` | SERIAL | Primary key |
| `user_id` | INTEGER | Foreign key to users table |
| `title` | VARCHAR(255) | Notification title |
| `message` | TEXT | Full notification message |
| `notification_type` | VARCHAR(50) | Type: news, system, message, event, alert |
| `related_type` | VARCHAR(50) | Type of related entity (company_news, event, message) |
| `related_id` | INTEGER | ID of related entity |
| `is_read` | BOOLEAN | Has user read the notification? |
| `read_at` | TIMESTAMP | When was it read? |
| `action_url` | VARCHAR(500) | URL to navigate when clicked |
| `created_at` | TIMESTAMP | Notification creation time |

**Notification Types:**

- `news` - New company news
- `system` - System announcements
- `message` - Private message notification
- `event` - Event reminder/update
- `alert` - Important alert

---

## 5. Admin Moderation Workflow

### 5.1 Admin Dashboard (`/admin/news`)

**Purpose:** Allow admins to review, approve, or reject pending news items.

**URL:** `/admin/news`

**Authentication:** Requires `is_admin=True`

**Features:**

1. **Pending News List**
   - Display all news with `moderation_status='pending'`
   - Sort by discovered_at DESC (newest first)
   - Show: title, company, source, published_date, relevance_score

2. **Filtering Options**
   - By company (dropdown)
   - By source_type (web, facebook, linkedin, etc.)
   - By news_type (news_mention, press_release, award, etc.)
   - By relevance_score range (0.0-1.0)
   - By date range (last 7 days, last 30 days, custom)

3. **Moderation Actions**
   - **Approve:** Set `moderation_status='approved'`, `is_approved=TRUE`
   - **Reject:** Set `moderation_status='rejected'`, `is_approved=FALSE`
   - **Edit:** Modify title, summary, news_type before approving
   - **Preview:** View full article (open source_url in new tab)

4. **Bulk Actions**
   - Approve all with relevance_score >= 0.8
   - Reject all with relevance_score < 0.4
   - Select multiple items for batch approval/rejection

**UI Layout:**

```
┌───────────────────────────────────────────────────────────────┐
│  NEWS MODERATION DASHBOARD                                    │
├───────────────────────────────────────────────────────────────┤
│  Filters: [Company ▼] [Type ▼] [Score ▼] [Date ▼]             │
│  Bulk: [Approve Score>=0.8] [Reject Score<0.4]                │
├───────────────────────────────────────────────────────────────┤
│  Pending: 42 | Approved: 128 | Rejected: 15                   │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ☐ PIXLAB otwiera nową siedzibę                               │
│    Company: PIXLAB | Type: news_mention | Score: 0.85         │
│    Source: trojmiasto.pl | Published: 2026-01-08              │
│    [Preview] [Approve] [Reject] [Edit]                        │
│                                                               │
│  ☐ Graal zwycięzcą konkursu SME Leader                        │
│    Company: GRAAL | Type: award | Score: 0.95                 │
│    Source: forbes.pl | Published: 2026-01-07                  │
│    [Preview] [Approve] [Reject] [Edit]                        │
│                                                               │
│  ☐ Losowe firmę wspomniała                                    │
│    Company: ABC Sp. z o.o. | Type: news_mention | Score: 0.25 │
│    Source: random-blog.com | Published: 2026-01-05            │
│    [Preview] [Approve] [Reject] [Edit]                        │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```

### 5.2 Moderation API Endpoints

**Endpoint:** `POST /api/news/moderate`

**Authentication:** Admin only

**Request Body:**

```json
{
  "news_id": 42,
  "action": "approve",
  "rejection_reason": "Spam / Nieistotne / Duplikat"
}
```

`action` is either `"approve"` or `"reject"`. `rejection_reason` is required when rejecting and ignored when approving.

**Response:**

```json
{
  "success": true,
  "message": "News approved successfully",
  "news_id": 42,
  "moderation_status": "approved"
}
```

**Implementation:**

```python
@app.route('/api/news/moderate', methods=['POST'])
@login_required
def api_news_moderate():
    """Moderate a news item (admin only)."""
    if not current_user.is_admin:
        return jsonify({'error': 'Unauthorized'}), 403

    # silent=True avoids an exception on a missing or malformed JSON body
    data = request.get_json(silent=True) or {}
    news_id = data.get('news_id')
    action = data.get('action')  # 'approve' or 'reject'
    rejection_reason = data.get('rejection_reason')

    news = db.session.get(CompanyNews, news_id)
    if not news:
        return jsonify({'error': 'News not found'}), 404

    if action == 'approve':
        news.moderation_status = 'approved'
        news.is_approved = True
        news.moderated_by = current_user.id
        news.moderated_at = datetime.utcnow()

        # Create notification for company owner (if exists)
        company_user = db.session.query(User).filter_by(company_id=news.company_id).first()
        if company_user:
            notification = UserNotification(
                user_id=company_user.id,
                title=f"Nowa aktualność o {news.company.name}",
                message=f"Artykuł '{news.title}' został zatwierdzony i jest widoczny na profilu firmy.",
                notification_type='news',
                related_type='company_news',
                related_id=news.id,
                action_url=f"/company/{news.company.slug}#news"
            )
            db.session.add(notification)

    elif action == 'reject':
        if not rejection_reason:
            return jsonify({'error': 'Rejection reason required'}), 400

        news.moderation_status = 'rejected'
        news.is_approved = False
        news.is_visible = False
        news.moderated_by = current_user.id
        news.moderated_at = datetime.utcnow()
        news.rejection_reason = rejection_reason

    else:
        return jsonify({'error': 'Invalid action'}), 400

    db.session.commit()

    return jsonify({
        'success': True,
        # Use the stored status ("approved"/"rejected"); f"{action}d" would yield "rejectd"
        'message': f"News {news.moderation_status} successfully",
        'news_id': news.id,
        'moderation_status': news.moderation_status
    })
```

### 5.3 Auto-Approval Rules

**High-Confidence Auto-Approval:**

Automatically approve news if ALL conditions are met:

1. `relevance_score >= 0.9`
2. `source_type` in ('press', 'award')
3. Company name appears in title
4. Source is a trusted domain (whitelist)

Note that the discovery script in section 2.2 stores every Brave result with `source_type='web'`, so condition 2 can only hold for items whose source type is set or reclassified elsewhere.

**Trusted Sources Whitelist:**

```python
TRUSTED_NEWS_SOURCES = [
    'trojmiasto.pl',
    'gdansk.pl',
    'bizneswkaszubach.pl',
    'pomorska.pl',
    'forbes.pl',
    'pulshr.pl',
    'rp.pl',   # Rzeczpospolita
    'pb.pl',   # Puls Biznesu
    'gp24.pl'  # Gazeta Pomorska
]
```

**Implementation:**

```python
from urllib.parse import urlparse


def should_auto_approve(news: CompanyNews) -> bool:
    """
    Determine if news should be auto-approved.

    Returns True if news meets high-confidence criteria.
    """
    if news.relevance_score is None or news.relevance_score < 0.9:
        return False

    if news.source_type not in ('press', 'award'):
        return False

    # Check if company name in title
    if news.company.name.lower() not in news.title.lower():
        return False

    # Check if source is trusted; drop a leading "www." so that
    # "www.forbes.pl" matches the whitelist entry "forbes.pl"
    domain = urlparse(news.source_url).netloc.lower()
    if domain.startswith('www.'):
        domain = domain[4:]
    if domain not in TRUSTED_NEWS_SOURCES:
        return False

    return True
```

---

## 6. Display on Company Profiles

### 6.1 News Section in Company Profile

**Location:** `templates/company_detail.html`

**Placement:** After "Social Media" section, before "Strona WWW" section

**Visibility Rules:**

- Only show news with `is_approved=TRUE` AND `is_visible=TRUE`
- Sort by `published_date DESC`
- Limit to 5 most recent items
- If no approved news, don't show section

**HTML Structure:**

```html
{% if company_news %}

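{# Class names match the CSS Styling block below #}
<section class="company-news" id="news">
  <h2>Aktualności</h2>

  <div class="news-grid">
    {% for news in company_news %}
    <article class="news-card">
      <div class="news-header">
        <span class="news-type-badge {{ news.news_type }}">{{ news.news_type }}</span>
        <span class="news-date">{{ news.published_date.strftime('%Y-%m-%d') if news.published_date else '' }}</span>
      </div>
      <h3 class="news-title">{{ news.title }}</h3>
      <p class="news-summary">{{ news.summary }}</p>
      <div class="news-meta">
        <span>{{ news.source_name }}</span>
        <span>{{ (news.relevance_score * 100)|int }}%</span>
      </div>
      {% if news.ai_tags %}
      <div class="news-tags">
        {% for tag in news.ai_tags[:5] %}<span class="tag">{{ tag }}</span>{% endfor %}
      </div>
      {% endif %}
      <a href="{{ news.source_url }}" class="news-link" target="_blank" rel="noopener">Czytaj więcej</a>
    </article>
    {% endfor %}
  </div>

  {% if company_news|length >= 5 %}
  <div class="news-load-more">
    <button type="button" onclick="loadMoreNews({{ company.id }})">Pokaż więcej</button>
  </div>
  {% endif %}
</section>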

{% endif %}
```

**CSS Styling:**

```css
.news-grid {
    display: grid;
    grid-template-columns: repeat(auto-fill, minmax(300px, 1fr));
    gap: 20px;
    margin-top: 20px;
}

.news-card {
    background: white;
    border: 1px solid #e0e0e0;
    border-radius: 8px;
    padding: 20px;
    transition: box-shadow 0.2s;
}

.news-card:hover {
    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
}

.news-header {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 12px;
}

.news-type-badge {
    padding: 4px 12px;
    border-radius: 4px;
    font-size: 12px;
    font-weight: 600;
    text-transform: uppercase;
}

.news-type-badge.news_mention { background: #e3f2fd; color: #1976d2; }
.news-type-badge.press_release { background: #f3e5f5; color: #7b1fa2; }
.news-type-badge.award { background: #fff3e0; color: #f57c00; }

.news-date {
    color: #666;
    font-size: 13px;
}

.news-title {
    font-size: 18px;
    font-weight: 600;
    margin-bottom: 12px;
    color: #333;
    line-height: 1.4;
}

.news-summary {
    color: #555;
    font-size: 14px;
    line-height: 1.6;
    margin-bottom: 12px;
}

.news-meta {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 12px;
    font-size: 13px;
    color: #666;
}

.news-tags {
    display: flex;
    flex-wrap: wrap;
    gap: 8px;
    margin-bottom: 12px;
}

.news-tags .tag {
    background: #f5f5f5;
    padding: 4px 10px;
    border-radius: 12px;
    font-size: 12px;
    color: #555;
}

.news-link {
    display: inline-flex;
    align-items: center;
    gap: 6px;
    color: #1976d2;
    font-weight: 500;
    text-decoration: none;
    font-size: 14px;
}

.news-link:hover {
    text-decoration: underline;
}
```

### 6.2 Load More News (Pagination)

**Endpoint:** `GET /api/company/<company_id>/news`

**Parameters:**

- `offset` - Number of items to skip (default: 0)
- `limit` - Number of items to return (default: 5)

**Response:**

```json
{
  "news": [
    {
      "id": 42,
      "title": "PIXLAB otwiera nową siedzibę",
      "summary": "Firma PIXLAB, specjalizująca się...",
      "source_name": "Trojmiasto.pl",
      "source_url": "https://example.com/article",
      "news_type": "news_mention",
      "published_date": "2026-01-08T10:30:00",
      "relevance_score": 0.85,
      "ai_tags": ["projekty", "IT", "wejherowo"]
    }
  ],
  "total": 12,
  "offset": 5,
  "limit": 5,
  "has_more": true
}
```

**JavaScript Implementation:**

```javascript
async function loadMoreNews(companyId) {
    const currentCount = document.querySelectorAll('.news-card').length;

    try {
        const response = await fetch(`/api/company/${companyId}/news?offset=${currentCount}&limit=5`);
        if (!response.ok) {
            // fetch() does not reject on HTTP errors, so check the status explicitly
            throw new Error(`HTTP ${response.status}`);
        }
        const data = await response.json();

        if (data.news.length === 0) {
            document.querySelector('.news-load-more').innerHTML = '<p>Brak więcej aktualności</p>';
            return;
        }

        const newsGrid = document.querySelector('.news-grid');
        data.news.forEach(news => {
            const newsCard = createNewsCard(news);
            newsGrid.appendChild(newsCard);
        });

        if (!data.has_more) {
            document.querySelector('.news-load-more').style.display = 'none';
        }
    } catch (error) {
        console.error('Error loading more news:', error);
        alert('Błąd podczas ładowania aktualności');
    }
}

// newsTypeLabels, formatDate and escapeHtml are helpers expected to be defined elsewhere
function createNewsCard(news) {
    const card = document.createElement('div');
    card.className = 'news-card';
    card.innerHTML = `

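        <div class="news-header">
            <span class="news-type-badge ${escapeHtml(news.news_type)}">${newsTypeLabels[news.news_type]}</span>
            <span class="news-date">${formatDate(news.published_date)}</span>
        </div>
        <h3 class="news-title">${escapeHtml(news.title)}</h3>
        <p class="news-summary">${escapeHtml(news.summary)}</p>
        <div class="news-meta">
            <span>${escapeHtml(news.source_name)}</span>
            <span>${Math.round(news.relevance_score * 100)}%</span>
        </div>
        ${news.ai_tags ? `
        <div class="news-tags">
            ${news.ai_tags.slice(0, 5).map(tag => `<span class="tag">${escapeHtml(tag)}</span>`).join('')}
        </div>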

        ` : ''}
        <a href="${escapeHtml(news.source_url)}" class="news-link" target="_blank" rel="noopener">Czytaj więcej</a>
    `;

    return card;
}
```

---

## 7. User Notification System

### 7.1 Notification Creation

**When to Create Notifications:**

1. **New News Approved** - Notify company owner when their news is approved
2. **News Rejected** - Notify company owner when their news is rejected (optional)
3. **High-Priority News** - Notify NORDA members when high-relevance news appears for companies they follow (future feature)

**Implementation:**

```python
def create_news_approval_notification(news: CompanyNews, db: Session):
    """
    Create notification when news is approved.

    Notify company owner (if user account exists).
    """
    # Find company owner
    company_user = db.query(User).filter_by(company_id=news.company_id).first()
    if not company_user:
        return  # No user account for this company

    notification = UserNotification(
        user_id=company_user.id,
        title=f"Nowa aktualność o {news.company.name}",
        message=f"Artykuł '{news.title}' został zatwierdzony i jest widoczny na profilu firmy.",
        notification_type='news',
        related_type='company_news',
        related_id=news.id,
        action_url=f"/company/{news.company.slug}#news",
        is_read=False
    )
    db.add(notification)
    db.commit()
```

### 7.2 Notification API

**Endpoint:** `GET /api/notifications`

**Authentication:** Requires logged-in user

**Parameters:**

- `limit` - Number of items to return (default: 20)
- `offset` - Number of items to skip (default: 0)
- `unread_only` - `true` to return only unread notifications (default: `false`)

**Response:**

```json
{
  "notifications": [
    {
      "id": 123,
      "title": "Nowa aktualność o PIXLAB",
      "message": "Artykuł 'PIXLAB otwiera nową siedzibę' został zatwierdzony...",
      "notification_type": "news",
      "related_type": "company_news",
      "related_id": 42,
      "action_url": "/company/pixlab-sp-z-o-o#news",
      "is_read": false,
      "created_at": "2026-01-10T14:30:00"
    }
  ],
  "unread_count": 3,
  "total": 15,
  "has_more": false
}
```

**Implementation:**

```python
@app.route('/api/notifications', methods=['GET'])
@login_required
def api_notifications():
    """Get user notifications."""
    limit = request.args.get('limit', 20, type=int)
    offset = request.args.get('offset', 0, type=int)
    unread_only = request.args.get('unread_only', 'false') == 'true'

    query = db.session.query(UserNotification).filter_by(user_id=current_user.id)
    if unread_only:
        query = query.filter_by(is_read=False)

    total = query.count()
    unread_count = db.session.query(UserNotification).filter_by(
        user_id=current_user.id,
        is_read=False
    ).count()

    notifications = (
        query.order_by(UserNotification.created_at.desc())
        .limit(limit)
        .offset(offset)
        .all()
    )

    return jsonify({
        'notifications': [n.to_dict() for n in notifications],
        'unread_count': unread_count,
        'total': total,
        'has_more': (offset + limit) < total
    })
```

### 7.3 Mark as Read

**Endpoint:** `POST /api/notifications/<notification_id>/read`

**Authentication:** Requires logged-in user

**Response:**

```json
{
  "success": true,
  "notification_id": 123,
  "is_read": true
}
```

**Implementation:**

```python
@app.route('/api/notifications/<int:notification_id>/read', methods=['POST'])
@login_required
def api_notification_mark_read(notification_id):
    """Mark notification as read."""
    notification = db.session.get(UserNotification, notification_id)
    if not notification:
        return jsonify({'error': 'Notification not found'}), 404

    # Users may only mark their own notifications as read
    if notification.user_id != current_user.id:
        return jsonify({'error': 'Unauthorized'}), 403

    notification.is_read = True
    notification.read_at = datetime.utcnow()
    db.session.commit()

    return jsonify({
        'success': True,
        'notification_id': notification.id,
        'is_read': notification.is_read
    })
```

### 7.4 Notification Badge (UI)

**Location:** Navigation bar (next to user avatar)

**Implementation:**

```html

```

---

## 8. Performance and Optimization

### 8.1 Rate Limiting

**Brave Search API:**
- Free Tier: 2,000 searches/month
- Rate Limit: 1 request/second (to be enforced in the script)
- Monthly Quota Tracking: Store in database

**Gemini AI:**
- Free Tier: 1,500 requests/day
- Cost per request: $0.00 (free tier)
- Track usage in `ai_api_costs` table

**Database Query Optimization:**
- Use composite indexes for approved + visible news
- Cache company news list (5 min TTL)
- Paginate results (5 items per page)

### 8.2 Caching Strategy

**News List Caching:**

```python
from functools import lru_cache
from datetime import datetime, timezone

@lru_cache(maxsize=128)
def get_company_news_cached(company_id: int, cache_key: str) -> list:
    """
    Cache company news for 5 minutes.
    cache_key format: "news_{company_id}_{timestamp_5min}"
    """
    # cache_key is unused in the body; it only forces a new cache entry
    # whenever the 5-minute bucket changes.
    news = db.session.query(CompanyNews).filter(
        CompanyNews.company_id == company_id,
        CompanyNews.is_approved == True,
        CompanyNews.is_visible == True
    ).order_by(CompanyNews.published_date.desc()).limit(5).all()

    return [n.to_dict() for n in news]

def get_company_news(company_id: int) -> list:
    """Get company news with 5-minute cache."""
    # Generate cache key (changes every 5 minutes)
    now = datetime.now(timezone.utc)
    cache_timestamp = now.replace(minute=(now.minute // 5) * 5, second=0, microsecond=0)
    cache_key = f"news_{company_id}_{cache_timestamp.isoformat()}"

    return get_company_news_cached(company_id, cache_key)
```

### 8.3 Monitoring Queries

**Check Quota Usage:**

```sql
-- Brave API usage (last 30 days)
-- Note: this counts stored articles per day, not API calls, so it is only a
-- rough proxy; remaining_quota is relative to each day's count, not cumulative.
SELECT
    COUNT(*) as total_searches,
    2000 - COUNT(*) as remaining_quota,
    DATE(created_at) as search_date
FROM company_news
WHERE created_at >= NOW() - INTERVAL '30 days'
  AND source_type = 'web'
GROUP BY DATE(created_at)
ORDER BY search_date DESC;
```

**News Statistics:**

```sql
-- News by status
SELECT
    moderation_status,
    COUNT(*) as count,
    AVG(relevance_score) as avg_relevance
FROM company_news
GROUP BY moderation_status;

-- News by type
SELECT
    news_type,
    COUNT(*) as count
FROM company_news
WHERE is_approved = TRUE
GROUP BY news_type
ORDER BY count DESC;

-- Top sources
SELECT
    source_name,
    COUNT(*) as article_count,
    AVG(relevance_score) as avg_relevance
FROM company_news
WHERE is_approved = TRUE
GROUP BY source_name
ORDER BY article_count DESC
LIMIT 10;
```

---

## 9. Security Considerations

### 9.1 Input Validation

**URL Validation:**

```python
import ipaddress
from urllib.parse import urlparse

def is_valid_news_url(url: str) -> bool:
    """Validate news URL before storing."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False

    # Only allow HTTP/HTTPS
    if parsed.scheme not in ('http', 'https'):
        return False

    # Must have a hostname
    if not host:
        return False

    # Block localhost
    if host == 'localhost' or host.endswith('.localhost'):
        return False

    # Block loopback, private, and link-local IP literals
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # Not an IP literal; treat as a regular hostname
    return not (ip.is_loopback or ip.is_private or ip.is_link_local)
```

**Content Sanitization:**

```python
from markupsafe import escape

def sanitize_news_content(text: str) -> str:
    """Sanitize user-generated content."""
    # Remove excessive whitespace
    text = ' '.join(text.split())

    # Limit length (before escaping, so an HTML entity is never cut in half)
    max_length = 5000
    if len(text) > max_length:
        text = text[:max_length] + '...'

    # Escape HTML
    text = str(escape(text))

    return text
```

### 9.2 Rate Limiting (Flask-Limiter)

**API Endpoints:**

```python
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"]
)

@app.route('/api/company/<int:company_id>/news')
@limiter.limit("30 per minute")  # Prevent abuse
def api_company_news(company_id):
    """Get company news (rate limited)."""
    pass

@app.route('/api/news/moderate', methods=['POST'])
@login_required
@limiter.limit("100 per hour")  # Admin moderation limit
def api_news_moderate():
    """Moderate news (rate limited)."""
    pass
```

### 9.3 CSRF Protection

**All POST endpoints:**

```python
from flask_wtf.csrf import CSRFProtect

csrf = CSRFProtect(app)

# Automatically protects all POST/PUT/PATCH/DELETE requests
# Frontend must include the CSRF token with every such request
```

---

## 10. Testing and Validation

### 10.1 Manual Testing Checklist

**News Discovery:**
- [ ] Brave API returns results for company name search
- [ ] Brave API returns results for NIP search
- [ ] Script handles companies with no results
- [ ] Script handles API rate limits (429 error)
- [ ] Script handles network errors gracefully
- [ ] Deduplication works (same URL not inserted twice)

**AI Filtering:**
- [ ] Gemini AI returns valid JSON response
- [ ] Relevance scores are between 0.0 and 1.0
- [ ] News types are correctly classified
- [ ] AI summaries are generated correctly
- [ ] Tags are relevant and limited to 5

**Admin Moderation:**
- [ ] Only admins can access /admin/news
- [ ] Pending news list displays correctly
- [ ] Approve action updates database
- [ ] Reject action updates database
- [ ] Bulk actions work correctly
- [ ] Filtering works (by company, type, score, date)

**Company Profile Display:**
- [ ] News section appears on company profile
- [ ] Only approved news is shown
- [ ] News sorted by published_date DESC
- [ ] "Load more" pagination works
- [ ] News cards display correctly

**Notifications:**
- [ ] Notification
  created when news approved
- [ ] Notification badge shows unread count
- [ ] Mark as read works correctly
- [ ] Notification links to correct company profile

### 10.2 Database Integrity Tests

**Run these queries:**

```sql
-- Check for duplicate URLs (should return 0 rows)
SELECT company_id, source_url, COUNT(*)
FROM company_news
GROUP BY company_id, source_url
HAVING COUNT(*) > 1;

-- Check for invalid relevance scores (should return 0 rows)
SELECT id, relevance_score
FROM company_news
WHERE relevance_score < 0.0 OR relevance_score > 1.0;

-- Check for orphaned news (company deleted)
SELECT cn.id, cn.company_id
FROM company_news cn
LEFT JOIN companies c ON cn.company_id = c.id
WHERE c.id IS NULL;

-- Check for orphaned notifications (user deleted)
SELECT un.id, un.user_id
FROM user_notifications un
LEFT JOIN users u ON un.user_id = u.id
WHERE u.id IS NULL;
```

### 10.3 Performance Tests

**Load Testing:**

```bash
# Test company profile with 50 news items
ab -n 1000 -c 10 https://nordabiznes.pl/company/pixlab-sp-z-o-o

# Test news API pagination (URL quoted so the shell does not interpret "&")
ab -n 500 -c 5 "https://nordabiznes.pl/api/company/26/news?offset=0&limit=5"

# Test notification API
ab -n 500 -c 5 -H "Cookie: session=..." https://nordabiznes.pl/api/notifications
```

**Expected Performance:**
- Company profile load: < 500ms
- News API (5 items): < 200ms
- Notification API: < 150ms

---

## 11. Troubleshooting Guide

### 11.1 Common Issues

**Issue: No news discovered for company**

**Possible Causes:**
1. Company name too generic (e.g., "ABC")
2. No recent news published
3. Brave API rate limit exceeded
4. Network connectivity issues

**Solution:**

```bash
# Manual test for specific company
python scripts/fetch_company_news.py --company pixlab-sp-z-o-o --dry-run

# Check Brave API quota
curl -H "X-Subscription-Token: $BRAVE_API_KEY" \
    "https://api.search.brave.com/res/v1/news/search?q=test"
```

---

**Issue: News not appearing on company profile**

**Possible Causes:**
1. News not approved (`is_approved=FALSE`)
2. News not visible (`is_visible=FALSE`)
3. Moderation status is `pending` or `rejected`
4. Cache not cleared

**Solution:**

```sql
-- Check news status
SELECT id, title, is_approved, is_visible, moderation_status
FROM company_news
WHERE company_id = 26
ORDER BY created_at DESC;

-- Force approve (admin only)
UPDATE company_news
SET is_approved = TRUE, is_visible = TRUE, moderation_status = 'approved'
WHERE id = 42;
```

---

**Issue: AI filtering returns invalid JSON**

**Possible Causes:**
1. Gemini response includes markdown formatting
2. Response truncated (token limit)
3. Response contains invalid JSON characters

**Solution:**

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

def parse_gemini_json_response(response_text: str) -> dict:
    """Parse Gemini JSON response with error handling."""
    # Remove markdown code blocks
    text = re.sub(r'```json\s*|\s*```', '', response_text)

    # Remove leading/trailing whitespace
    text = text.strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON from Gemini: {e}")
        logger.error(f"Response: {text}")

        # Return neutral defaults so the item still reaches manual moderation
        return {
            'relevance': 0.5,
            'type': 'news_mention',
            'reason': 'AI parsing error',
            'summary': '',
            'tags': []
        }
```

---

**Issue: Notification not received**

**Possible Causes:**
1. Company has no user account
2. Notification creation failed (database error)
3. User has notifications disabled (future feature)

**Solution:**

```sql
-- Check if company has user account
SELECT u.id, u.email, u.company_id
FROM users u
WHERE u.company_id = 26;

-- Manually create notification
INSERT INTO user_notifications (
    user_id, title, message, notification_type,
    related_type, related_id, action_url
) VALUES (
    42,  -- user_id
    'Test notification',
    'This is a test message',
    'news',
    'company_news',
    123,  -- news_id
    '/company/pixlab-sp-z-o-o#news'
);
```

---

### 11.2 Diagnostic Queries

**News Discovery Stats:**

```sql
-- News discovered per day (last 30 days)
SELECT
    DATE(discovered_at) as discovery_date,
    COUNT(*) as news_count,
    AVG(relevance_score) as avg_relevance
FROM company_news
WHERE discovered_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(discovered_at)
ORDER BY discovery_date DESC;
```

**Moderation Backlog:**

```sql
-- Pending news count by company
SELECT
    c.name,
    COUNT(*) as pending_count,
    MIN(cn.discovered_at) as oldest_pending
FROM company_news cn
JOIN companies c ON cn.company_id = c.id
WHERE cn.moderation_status = 'pending'
GROUP BY c.name
ORDER BY pending_count DESC;
```

**AI Filtering Performance:**

```sql
-- Relevance score distribution
SELECT
    CASE
        WHEN relevance_score >= 0.8 THEN 'High (0.8-1.0)'
        WHEN relevance_score >= 0.5 THEN 'Medium (0.5-0.7)'
        WHEN relevance_score >= 0.3 THEN 'Low (0.3-0.4)'
        ELSE 'Very Low (0.0-0.2)'
    END as score_range,
    COUNT(*) as count,
    ROUND(AVG(relevance_score)::numeric, 2) as avg_score
FROM company_news
GROUP BY score_range
ORDER BY avg_score DESC;
```

---

## 12. Future Enhancements

### 12.1 Planned Features

**Social Media Integration:**
- Fetch posts from Facebook Pages
- Fetch posts from LinkedIn Company Pages
- Fetch posts from Instagram Business accounts
- Unified news feed (web + social)

**Advanced Filtering:**
- Sentiment analysis (positive, neutral, negative)
- Entity extraction (people, places, organizations)
- Topic clustering (group similar news)
- Trend detection (identify trending topics)

**User Features:**
- Follow companies to receive notifications
- Save news items to favorites
- Share news on social media
- Comment on news articles (internal discussion)

**Analytics:**
- News engagement metrics (views, clicks, shares)
- Company visibility score based on news coverage
- Trending companies dashboard
- News heatmap (by region, industry, time)

**Automation:**
- Auto-approve high-confidence news (score >= 0.95)
- Auto-reject spam/irrelevant (score < 0.2)
- Scheduled email digests for admins
- RSS feed for approved news

### 12.2 Technical Improvements

**Performance:**
- Implement Redis caching for news lists
- Add full-text search for news content
- Optimize database queries with materialized views
- Add CDN for news thumbnails

**Reliability:**
- Add retry mechanism for Brave API
- Implement circuit breaker pattern
- Add health check endpoint for news system
- Monitor API quota usage with alerts

**Security:**
- Add content moderation for user comments (future)
- Implement rate limiting per user (not just IP)
- Add CAPTCHA for public API endpoints
- Scan URLs for malware/phishing

---

## 13. Glossary

**Terms:**
- **Brave Search API** - News search API by Brave (alternative to Google News API)
- **Company News** - News articles and mentions about companies in the Norda Biznes directory
- **AI Filtering** - Automated relevance scoring using Google Gemini AI
- **Moderation** - Manual review and approval process by admins
- **Relevance Score** - AI-calculated score (0.0-1.0) indicating how relevant an article is to a company
- **News Type** - Classification of news (news_mention, press_release, award, etc.)
- **Source Type** - Origin of news (web, facebook, linkedin, etc.)
- **Moderation Status** - Workflow state (pending, approved, rejected)
- **User Notification** - In-app notification for users about new news

**Database Tables:**
- `company_news` - Stores news articles and mentions
- `user_notifications` - Stores user notifications
- `companies` - Company directory
- `users` - User accounts

---

## 14. Related Documentation

- [External Integrations Architecture](../06-external-integrations.md) - Brave Search and Gemini AI integration details
- [Database Schema](../05-database-schema.md) - Complete database schema documentation
- [Flask Components](../04-flask-components.md) - Flask application structure and routes
- [AI Chat Flow](./03-ai-chat-flow.md) - Gemini AI integration patterns
- `CLAUDE.md` - Main project documentation (News Monitoring section)
- `database/migrate_news_tables.sql` - Database migration script

---

## 15. Maintenance Guidelines

### 15.1 Regular Maintenance Tasks

**Daily:**
- Monitor Brave API quota usage
- Check moderation backlog (pending news)
- Review AI filtering accuracy (sample check)

**Weekly:**
- Analyze news discovery statistics
- Review rejected news for false negatives
- Update trusted sources whitelist

**Monthly:**
- Review Brave API costs (if exceeding free tier)
- Analyze news engagement metrics
- Update AI filtering prompt (if needed)
- Clean up old rejected news (> 90 days)

### 15.2 When to Update This Document

Update this document when:
- New news sources are added (e.g., social media)
- AI filtering algorithm changes
- Database schema changes (new fields, indexes)
- Admin dashboard UI changes significantly
- New API endpoints are added
- Rate limits or quotas change

**Update Process:**
1. Edit this markdown file
2. Verify Mermaid diagrams render correctly
3. Update "Last Updated" date at top
4. Commit with descriptive message: `docs: Update news monitoring flow - [what changed]`
5. Notify team via Slack/email

---

**Document Status:** ✅ Complete - Ready for implementation

**Implementation Status:** 🚧 Planned (Database schema ready, scripts pending)

**Next Steps:** Implement `scripts/fetch_company_news.py` and `/admin/news` dashboard
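
As a starting point for the quota handling that `scripts/fetch_company_news.py` will need, the free-tier arithmetic from Section 8.1 (2,000 searches/month) can be expressed as a small budget guard. This is only a sketch under stated assumptions: `QuotaBudget`, `max_runs_per_month`, and `companies_for_run` are hypothetical names, and the counter lives in memory, whereas Section 8.1 calls for storing quota usage in the database.

```python
from dataclasses import dataclass

MONTHLY_QUOTA = 2000  # Brave free tier, per Section 8.1


@dataclass
class QuotaBudget:
    """Hypothetical in-memory budget; a real script would persist `used`."""
    monthly_quota: int = MONTHLY_QUOTA
    used: int = 0

    def remaining(self) -> int:
        return max(self.monthly_quota - self.used, 0)

    def try_consume(self, n: int = 1) -> bool:
        """Reserve n searches; return False (reserving nothing) if over budget."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if self.used + n > self.monthly_quota:
            return False
        self.used += n
        return True


def max_runs_per_month(companies: int, monthly_quota: int = MONTHLY_QUOTA) -> int:
    """How many full passes over all companies fit in the monthly quota."""
    if companies <= 0:
        raise ValueError("companies must be positive")
    return monthly_quota // companies


def companies_for_run(company_ids: list[int], run_index: int, batch_size: int) -> list[int]:
    """Round-robin slice so each run searches only a subset of companies."""
    if not company_ids or batch_size <= 0:
        return []
    start = (run_index * batch_size) % len(company_ids)
    rotated = company_ids[start:] + company_ids[:start]
    return rotated[:batch_size]
```

With 4 runs per day over 30 days (120 runs), `MONTHLY_QUOTA // 120` gives 16 searches per run, so a `batch_size` of 16 would use at most 1,920 searches per month and stay inside the free tier.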

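
The auto-approve and auto-reject thresholds listed under Automation in Section 12.1 amount to a three-way triage on the relevance score. A minimal sketch, assuming a hypothetical `triage_by_score` helper that returns the moderation statuses named in the Glossary (pending, approved, rejected):

```python
AUTO_APPROVE_THRESHOLD = 0.95  # score >= 0.95 -> approve (Section 12.1)
AUTO_REJECT_THRESHOLD = 0.2    # score < 0.2 -> reject (Section 12.1)


def triage_by_score(relevance_score: float) -> str:
    """Map a relevance score (0.0-1.0) to a moderation status string."""
    # The chained comparison is also False for NaN, so NaN is rejected here
    if not 0.0 <= relevance_score <= 1.0:
        raise ValueError(f"relevance_score out of range: {relevance_score!r}")
    if relevance_score >= AUTO_APPROVE_THRESHOLD:
        return 'approved'
    if relevance_score < AUTO_REJECT_THRESHOLD:
        return 'rejected'
    # Everything in between still goes to manual moderation
    return 'pending'
```

Keeping the thresholds as module-level constants makes them easy to tune once real moderation data shows how often high-scoring items are rejected by admins.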