nordabiz/docs/architecture/flows/05-news-monitoring-flow.md
Maciej Pienczyn cebe52f303 refactor: Rebranding and AI model update
- Rename: "Norda Biznes Hub" → "Norda Biznes Partner"
- AI model update: Gemini 2.0 Flash → Gemini 3 Flash
- Historical references kept in the timeline and documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:08:39 +01:00


# News Monitoring Flow
**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Planned (Database schema ready, implementation pending)
**Flow Type:** Automated News Discovery and Moderation
---
## Overview
This document describes the **complete news monitoring flow** for the Norda Biznes Partner application, covering:
- **News Discovery** via Brave Search API
- **AI-Powered Filtering** using Google Gemini AI
- **Manual Moderation** workflow for admins
- **Company Profile Display** of approved news
- **User Notifications** for new news items
- **Database Storage** in `company_news` and `user_notifications` tables
**Key Technology:**
- **Search API:** Brave Search News API (free tier: 2,000 req/month)
- **AI Filter:** Google Gemini 3 Flash (relevance scoring and classification)
- **Database:** PostgreSQL (company_news and user_notifications tables)
- **Scheduler:** Planned cron job (6-hour intervals)
**Key Features:**
- Automated discovery of company mentions in news media
- AI-powered relevance scoring (0.0-1.0 scale)
- Automatic classification (news_mention, press_release, award, etc.)
- Admin moderation dashboard (`/admin/news`)
- Display on company profiles (approved news only)
- User notification system
- Deduplication by URL
**API Costs & Performance:**
- **API:** Brave Search News API (free tier: 2,000 searches/month)
- **Typical Search Time:** 2-5 seconds per company
- **Monthly Capacity:** 2,000 searches ÷ 80 companies = 25 searches per company per month
- **Actual Cost:** $0.00 as long as usage stays within the free tier
**Planned Schedule:**
- Run every 6 hours (4 times/day)
- 80 companies × 4 runs = 320 searches/day
- 320 × 30 days = 9,600 searches/month
- **⚠️ EXCEEDS FREE TIER** by 7,600 searches/month (9,600 planned vs. 2,000 free). Rate limiting or the paid tier is required.
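The schedule arithmetic can be checked with a few lines of Python. This is only a sketch: the constants come from this section, while the function names are illustrative and not part of the codebase.

```python
import math

FREE_TIER_MONTHLY = 2_000   # Brave free-tier searches per month
COMPANIES = 80              # companies monitored
DAYS_PER_MONTH = 30         # same approximation as in the schedule


def monthly_searches(runs_per_day: int) -> int:
    """Searches per month when every company is searched on every run."""
    return COMPANIES * runs_per_day * DAYS_PER_MONTH


def min_hours_between_runs(quota: int = FREE_TIER_MONTHLY) -> int:
    """Smallest whole-hour interval between full runs that stays within quota."""
    full_runs_per_month = quota // COMPANIES
    return math.ceil(DAYS_PER_MONTH * 24 / full_runs_per_month)


print(monthly_searches(4))                      # 9600
print(monthly_searches(4) - FREE_TIER_MONTHLY)  # 7600
print(min_hours_between_runs())                 # 29
```

Under these assumptions, one full run roughly every 29 hours (about 25 runs per month) is the most the free tier allows.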
---
## 1. High-Level News Monitoring Flow
### 1.1 Complete News Monitoring Flow Diagram
```mermaid
flowchart TD
Cron[Cron Job<br/>Every 6 hours] -->|Trigger| Script[scripts/fetch_company_news.py]
Script -->|1. Fetch companies| DB[(PostgreSQL<br/>companies table)]
DB -->|Company list| Script
Script -->|2. For each company| Loop{More companies?}
Loop -->|Yes| BraveAPI[Brave Search API]
Loop -->|No| Complete[Complete]
BraveAPI -->|3. Search query<br/>#quot;company_name#quot; OR #quot;NIP#quot;| BraveSearch[Brave Search<br/>News Endpoint]
BraveSearch -->|4. News results<br/>JSON response| BraveAPI
BraveAPI -->|5. News articles| Filter{Has results?}
Filter -->|No| Loop
Filter -->|Yes| AIFilter[AI Filtering Pipeline]
AIFilter -->|6. For each article| Gemini[Google Gemini AI]
Gemini -->|7. Analyze relevance| RelevanceScore[Calculate<br/>relevance_score<br/>0.0-1.0]
RelevanceScore -->|8. Score + classification| Decision{Score >= 0.3?}
Decision -->|No - Irrelevant| Discard[Discard article]
Decision -->|Yes| SaveNews[Save to DB]
SaveNews -->|9. INSERT ON CONFLICT| NewsDB[(company_news<br/>table)]
NewsDB -->|10. Check duplicates<br/>by URL| DupeCheck{Duplicate?}
DupeCheck -->|Yes| Skip[Skip - Already exists]
DupeCheck -->|No| CreateRecord[Create news record<br/>status='pending']
CreateRecord -->|11. News saved| NotifyCheck{Notify users?}
NotifyCheck -->|Yes| CreateNotif[Create notifications]
CreateNotif -->|12. INSERT| NotifDB[(user_notifications)]
NotifyCheck -->|No| Loop
CreateNotif -->|13. Done| Loop
Discard --> Loop
Skip --> Loop
style BraveSearch fill:#FFD700
style Gemini fill:#4285F4
style NewsDB fill:#90EE90
style NotifDB fill:#90EE90
style AIFilter fill:#FFB6C1
```
### 1.2 Admin Moderation Flow
```mermaid
sequenceDiagram
participant Admin as Admin User
participant Browser
participant Flask as Flask App
participant DB as PostgreSQL
Admin->>Browser: Navigate to /admin/news
Browser->>Flask: GET /admin/news
Flask->>Flask: Check permissions (is_admin?)
alt Not Admin
Flask-->>Browser: 403 Forbidden
else Is Admin
Flask->>DB: SELECT * FROM company_news<br/>WHERE moderation_status='pending'
DB-->>Flask: Pending news list
Flask-->>Browser: Render admin_news_moderation.html
Browser-->>Admin: Display pending news
end
Admin->>Browser: Review article #42<br/>Click "Approve"
Browser->>Flask: POST /api/news/moderate<br/>{news_id: 42, action: 'approve'}
Flask->>Flask: Verify admin permissions
Flask->>DB: UPDATE company_news<br/>SET moderation_status='approved',<br/>is_approved=TRUE,<br/>moderated_by=admin_id,<br/>moderated_at=NOW()<br/>WHERE id=42
DB-->>Flask: Updated
Flask->>DB: INSERT INTO user_notifications<br/>(notification_type='news',<br/>related_type='company_news', related_id=42)
DB-->>Flask: Notification created
Flask-->>Browser: JSON: {success: true}
Browser-->>Admin: Show success message
Note over Admin,DB: Article now visible on company profile
```
### 1.3 User View Flow (Company Profile)
```mermaid
sequenceDiagram
participant User as Visitor/Member
participant Browser
participant Flask as Flask App
participant DB as PostgreSQL
User->>Browser: Visit /company/pixlab-sp-z-o-o
Browser->>Flask: GET /company/pixlab-sp-z-o-o
Flask->>DB: SELECT * FROM companies<br/>WHERE slug='pixlab-sp-z-o-o'
DB-->>Flask: Company data
Flask->>DB: SELECT * FROM company_news<br/>WHERE company_id=26<br/>AND is_approved=TRUE<br/>AND is_visible=TRUE<br/>ORDER BY published_date DESC<br/>LIMIT 5
DB-->>Flask: Approved news (0-5 items)
Flask-->>Browser: Render company_detail.html<br/>with news section
Browser-->>User: Display company profile<br/>with "Aktualności" section
alt Has approved news
Browser-->>User: Show news cards<br/>(title, date, source, summary)
User->>Browser: Click "Czytaj więcej"
Browser->>User: Open source_url in new tab
else No news
Browser-->>User: "Brak aktualności"
end
```
---
## 2. News Discovery Pipeline
### 2.1 Brave Search API Integration
**Endpoint:** `https://api.search.brave.com/res/v1/news/search`
**Authentication:**
- API Key in `.env`: `BRAVE_SEARCH_API_KEY`
- Header: `X-Subscription-Token: {API_KEY}`
**Search Parameters:**
```python
params = {
    "q": f'"{company_name}" OR "{nip}"',  # Quoted for exact match
    "count": 10,                          # Max results per query
    "freshness": "pw",                    # Past week (pw), month (pm), year (py)
    "country": "pl",                      # Poland
    "search_lang": "pl",                  # Polish language
    "offset": 0                           # Pagination (unused)
}
```
**Rate Limits:**
- **Free Tier:** 2,000 searches/month
- **Paid Tier:** $5/1000 additional searches
- **Throttling:** 1 request/second (built into script)
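A minimal sketch of how the 1 request/second throttle could be enforced. The `Throttle` class and its parameters are hypothetical; the planned script simply calls `time.sleep(1)` between companies. The clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: sleep for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each API request only sleeps for the time still missing from the interval, so slow responses do not add an extra full second.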
**Response Format:**
```json
{
  "type": "news",
  "news": {
    "results": [
      {
        "title": "PIXLAB otwiera nową siedzibę w Wejherowie",
        "url": "https://example.com/article",
        "description": "Firma PIXLAB, specjalizująca się...",
        "age": "2 days ago",
        "meta_url": {
          "netloc": "example.com",
          "hostname": "example.com"
        },
        "thumbnail": {
          "src": "https://example.com/image.jpg"
        }
      }
    ]
  }
}
```
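As an illustration, the response can be flattened into the fields that `company_news` stores. This sketch assumes the response shape shown in this section; `extract_articles` and the output key names are hypothetical helpers, not existing code.

```python
from urllib.parse import urlparse


def extract_articles(data: dict) -> list[dict]:
    """Flatten a news response into simple dicts, skipping incomplete results."""
    articles = []
    for item in data.get("news", {}).get("results", []):
        url = item.get("url")
        title = item.get("title")
        if not url or not title:
            continue  # nothing useful to store without a URL and a headline
        meta = item.get("meta_url") or {}
        articles.append({
            "title": title,
            "source_url": url,
            "summary": item.get("description", ""),
            # Fall back to the URL's host when meta_url is missing
            "source_name": meta.get("hostname") or urlparse(url).netloc,
            "age": item.get("age"),
            "thumbnail": (item.get("thumbnail") or {}).get("src"),
        })
    return articles
```

Using `.get()` throughout keeps one malformed result from aborting the whole company run.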
**Error Handling:**
```python
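import logging
import time

import requests

logger = logging.getLogger(__name__)
MAX_RETRIES = 3  # illustrative value


def brave_get(url, headers, params, company_name):
    """Return the parsed JSON response, or None if the company should be skipped."""
    for retry_count in range(MAX_RETRIES):
        try:
            response = requests.get(url, headers=headers, params=params, timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.Timeout:
            # Retry with exponential backoff
            time.sleep(2 ** retry_count)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:  # Rate limit exceeded
                # Wait and retry
                time.sleep(60)
            elif e.response.status_code == 401:  # Invalid API key
                # Log error and skip
                logger.error("Invalid Brave API key")
                return None
            else:
                logger.error(f"HTTP {e.response.status_code} for company {company_name}")
                return None
        except requests.exceptions.RequestException:
            # Network error - skip company
            logger.error(f"Network error for company {company_name}")
            return None
    return None  # Retries exhausted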
```
### 2.2 News Discovery Script
**File:** `scripts/fetch_company_news.py` (planned)
**Usage:**
```bash
# Fetch news for all companies
python scripts/fetch_company_news.py --all
# Fetch for specific company
python scripts/fetch_company_news.py --company pixlab-sp-z-o-o
# Dry run (no database writes)
python scripts/fetch_company_news.py --all --dry-run
# Fetch only high-priority companies
python scripts/fetch_company_news.py --priority
```
**Implementation Outline:**
```python
#!/usr/bin/env python3
"""
Fetch company news from Brave Search API and store in database.
"""
import argparse
import os
import sys
import time
from datetime import datetime

import requests
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

from database import Company, CompanyNews
from gemini_service import GeminiService

# Configuration
BRAVE_API_KEY = os.getenv('BRAVE_SEARCH_API_KEY')
BRAVE_NEWS_ENDPOINT = 'https://api.search.brave.com/res/v1/news/search'
DATABASE_URL = os.getenv('DATABASE_URL')

# NOTE: parse_date() and parse_json() are helpers this outline relies on;
# they still need to be implemented alongside the script.


def fetch_news_for_company(company: Company, db: Session, dry_run: bool = False) -> int:
    """
    Fetch news for a single company.
    Returns: Number of new articles found.
    """
    # Build search query (skip the NIP term when the company has no NIP)
    terms = [f'"{company.name}"']
    if company.nip:
        terms.append(f'"{company.nip}"')
    query = ' OR '.join(terms)

    # Call Brave API
    headers = {'X-Subscription-Token': BRAVE_API_KEY}
    params = {
        'q': query,
        'count': 10,
        'freshness': 'pw',  # Past week
        'country': 'pl',
        'search_lang': 'pl'
    }
    response = requests.get(BRAVE_NEWS_ENDPOINT, headers=headers, params=params, timeout=10)
    response.raise_for_status()
    data = response.json()
    articles = data.get('news', {}).get('results', [])

    new_count = 0
    for article in articles:
        # Check if already exists
        existing = db.query(CompanyNews).filter_by(
            company_id=company.id,
            source_url=article['url']
        ).first()
        if existing:
            continue  # Skip duplicate

        # AI filtering
        relevance = filter_with_ai(company, article)
        if relevance['score'] < 0.3:
            continue  # Too irrelevant

        # Create news record
        news = CompanyNews(
            company_id=company.id,
            title=article['title'],
            summary=article.get('description'),
            source_url=article['url'],
            source_name=article.get('meta_url', {}).get('hostname'),
            source_type='web',
            news_type=relevance['type'],
            published_date=parse_date(article.get('age')),
            discovered_at=datetime.utcnow(),
            relevance_score=relevance['score'],
            ai_summary=relevance['summary'],
            ai_tags=relevance['tags'],
            moderation_status='pending',
            is_approved=False,
            is_visible=True
        )
        if not dry_run:
            db.add(news)
        new_count += 1

    if not dry_run:
        db.commit()
    return new_count


def filter_with_ai(company: Company, article: dict) -> dict:
    """
    Use Gemini AI to filter and classify news article.
    Returns: {score: float, type: str, summary: str, tags: list}
    """
    gemini = GeminiService()
    prompt = f"""
    Oceń czy poniższy artykuł jest istotny dla firmy "{company.name}".
    Firma: {company.name}
    NIP: {company.nip}
    Branża: {company.category}
    Opis: {company.description}
    Artykuł:
    Tytuł: {article['title']}
    Treść: {article.get('description', '')}
    Zwróć JSON:
    {{
    "relevance": 0.0-1.0, // 1.0 = bardzo istotny, 0.0 = całkowicie nieistotny
    "type": "news_mention|press_release|award|social_post|event|financial|partnership",
    "reason": "Krótkie uzasadnienie oceny",
    "summary": "Krótkie streszczenie artykułu (max 200 znaków)",
    "tags": ["tag1", "tag2", "tag3"] // Maksymalnie 5 tagów
    }}
    """
    response = gemini.generate_content(prompt)
    result = parse_json(response.text)
    return {
        'score': result['relevance'],
        'type': result['type'],
        'summary': result['summary'],
        'tags': result['tags']
    }


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(description='Fetch company news from Brave API')
    parser.add_argument('--all', action='store_true', help='Fetch for all companies')
    parser.add_argument('--company', type=str, help='Fetch for specific company (slug)')
    parser.add_argument('--dry-run', action='store_true', help='Dry run (no DB writes)')
    # --priority is parsed but its company selection is not implemented in this outline
    parser.add_argument('--priority', action='store_true', help='Fetch only high-priority')
    args = parser.parse_args()

    engine = create_engine(DATABASE_URL)
    with Session(engine) as db:
        if args.all:
            companies = db.query(Company).filter_by(is_active=True).all()
        elif args.company:
            company = db.query(Company).filter_by(slug=args.company).first()
            if company is None:
                print(f"Error: Company not found: {args.company}")
                sys.exit(1)
            companies = [company]
        else:
            print("Error: Must specify --all or --company")
            sys.exit(1)

        total_new = 0
        for company in companies:
            print(f"Fetching news for {company.name}...")
            new_count = fetch_news_for_company(company, db, dry_run=args.dry_run)
            total_new += new_count
            print(f" → Found {new_count} new articles")
            time.sleep(1)  # Rate limiting
        print(f"\nTotal: {total_new} new articles")


if __name__ == '__main__':
    main()
```
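The outline calls `parse_date(article.get('age'))` without defining it. One possible sketch follows, assuming the `age` field holds relative English strings such as the "2 days ago" value in the sample response. Other formats return `None`, and months and years are left out because their length varies. The `now` parameter is an addition for testability.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}
_AGE_RE = re.compile(r"^\s*(\d+)\s+(minute|hour|day|week)s?\s+ago\s*$", re.IGNORECASE)


def parse_date(age: Optional[str], now: Optional[datetime] = None) -> Optional[datetime]:
    """Convert a relative age like '2 days ago' into an absolute UTC datetime."""
    if not age:
        return None
    match = _AGE_RE.match(age)
    if match is None:
        return None  # unknown format: leave published_date empty
    now = now or datetime.now(timezone.utc)
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
```

The result is only as precise as the relative string, so `published_date` should be treated as approximate for articles discovered this way.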
**Cron Job Setup (planned):**
```bash
# Add to crontab (every 6 hours)
0 */6 * * * cd /var/www/nordabiznes && \
/var/www/nordabiznes/venv/bin/python3 scripts/fetch_company_news.py --all \
>> /var/log/nordabiznes/news_fetch.log 2>&1
```
---
## 3. AI Filtering and Classification
### 3.1 Gemini AI Integration
**Purpose:**
- Filter out irrelevant articles (false positives)
- Calculate relevance score (0.0-1.0)
- Classify news type (news_mention, press_release, award, etc.)
- Generate AI summary
- Extract tags for categorization
**Relevance Scoring Criteria:**
| Score Range | Description | Action |
|-------------|-------------|--------|
| 0.90 - 1.00 | Highly relevant - direct mention, official communication | Auto-approve if the rules in section 5.3 are also met, otherwise pending moderation |
| 0.70 - 0.89 | Very relevant - significant mention or related news | Pending moderation |
| 0.50 - 0.69 | Moderately relevant - indirect mention | Pending moderation |
| 0.30 - 0.49 | Low relevance - tangential mention | Pending moderation |
| 0.00 - 0.29 | Irrelevant - false positive, unrelated | Auto-reject (discard) |
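The thresholds can be expressed as a small function. This is a sketch only: `moderation_action` and its return labels are illustrative names, and the 0.3 and 0.9 cut-offs are the ones used in the flow diagram and in the auto-approval rules.

```python
def moderation_action(score: float) -> str:
    """Map a relevance score to the action described in the scoring table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"relevance score out of range: {score}")
    if score < 0.3:
        return "discard"                 # auto-reject, never stored
    if score >= 0.9:
        return "auto_approve_candidate"  # still subject to the auto-approval rules
    return "pending"                     # stored and queued for manual moderation
```

Using half-open ranges avoids gaps for two-decimal scores such as 0.85 or 0.25.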
**AI Prompt Template:**
```python
RELEVANCE_PROMPT = """
Jesteś ekspertem od analizy newsów firmowych. Oceń czy poniższy artykuł jest istotny dla firmy.
INFORMACJE O FIRMIE:
Nazwa: {company_name}
NIP: {nip}
Branża: {category}
Opis: {description}
Lokalizacja: {city}
ARTYKUŁ DO OCENY:
Tytuł: {article_title}
Źródło: {source_name}
Data: {published_date}
Treść: {article_content}
KRYTERIA OCENY:
1. Czy artykuł bezpośrednio wspomina o firmie (nazwa lub NIP)?
2. Czy dotyczy działalności firmy, produktów lub usług?
3. Czy jest to oficjalny komunikat prasowy firmy?
4. Czy informacje są istotne dla klientów lub partnerów firmy?
5. Czy artykuł dotyczy nagród, wyróżnień lub osiągnięć firmy?
INSTRUKCJE:
- Zwróć ocenę w formacie JSON
- relevance: 0.0 (całkowicie nieistotny) do 1.0 (bardzo istotny)
- type: klasyfikacja artykułu
- reason: krótkie uzasadnienie (max 100 znaków)
- summary: streszczenie artykułu (max 200 znaków)
- tags: maksymalnie 5 tagów opisujących temat
FORMAT ODPOWIEDZI (tylko JSON, bez dodatkowego tekstu):
{{
"relevance": 0.85,
"type": "news_mention",
"reason": "Artykuł wspomina o nowym projekcie firmy",
"summary": "Firma PIXLAB rozpoczyna realizację projektu XYZ...",
"tags": ["projekty", "IT", "wejherowo", "innowacje"]
}}
DOSTĘPNE TYPY:
- news_mention: Wzmianka w mediach
- press_release: Oficjalny komunikat prasowy
- award: Nagroda lub wyróżnienie
- social_post: Post w mediach społecznościowych
- event: Wydarzenie lub konferencja
- financial: Informacje finansowe (wyniki, inwestycje)
- partnership: Partnerstwo lub współpraca
"""
```
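Model output does not always follow the requested format exactly, so the reply should be validated before it is stored. The sketch below is one possible shape for the `parse_json` step. The function name, the clamping of the score, and the fallback to `news_mention` are assumptions, not existing behaviour.

```python
import json
import re

ALLOWED_TYPES = {
    "news_mention", "press_release", "award", "social_post",
    "event", "financial", "partnership",
}


def parse_relevance_response(text: str) -> dict:
    """Extract and validate the JSON object from a model reply."""
    # The reply may be wrapped in a Markdown code fence; take the outermost {...}.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    data = json.loads(match.group(0))

    score = min(1.0, max(0.0, float(data["relevance"])))  # clamp to 0.0-1.0
    news_type = data.get("type")
    if news_type not in ALLOWED_TYPES:
        news_type = "news_mention"  # same default as the database column
    return {
        "score": score,
        "type": news_type,
        "summary": str(data.get("summary", ""))[:200],
        "tags": [str(tag) for tag in data.get("tags", [])][:5],
    }
```

A reply without a `relevance` field raises `KeyError`, which the caller can treat the same way as an irrelevant article.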
**AI Cost Tracking:**
```python
# Track AI API costs in ai_api_costs table
def track_ai_cost(prompt_tokens: int, completion_tokens: int, model: str, db: Session):
    """
    Track AI API usage and cost.
    Gemini 3 Flash: Free tier 1,500 req/day
    """
    cost_per_1k_input = 0.0   # Free tier
    cost_per_1k_output = 0.0  # Free tier
    input_cost = (prompt_tokens / 1000) * cost_per_1k_input
    output_cost = (completion_tokens / 1000) * cost_per_1k_output
    total_cost = input_cost + output_cost

    # Save to database (db is passed in; the original outline used an undefined global)
    cost_record = AIAPICost(
        service='gemini',
        model=model,
        operation='news_filtering',
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
        cost=total_cost,
        created_at=datetime.utcnow()
    )
    db.add(cost_record)
    db.commit()
```
### 3.2 Classification Types
**News Types:**
1. **news_mention** - General media mention
   - Company mentioned in news article
   - Industry news involving the company
   - Local or regional news coverage
2. **press_release** - Official company press release
   - Official statements from company
   - Product launches
   - Company announcements
3. **award** - Award or recognition
   - Industry awards won
   - Certifications achieved
   - Recognition or rankings
4. **social_post** - Social media post
   - Facebook posts
   - LinkedIn updates
   - Instagram stories (future)
5. **event** - Event announcement
   - Company hosting or participating in event
   - Conference appearances
   - Webinars or workshops
6. **financial** - Financial news
   - Revenue reports
   - Investment announcements
   - Funding rounds
7. **partnership** - Partnership or collaboration
   - New partnerships announced
   - Joint ventures
   - Strategic collaborations
**Source Types:**
- `web` - Web news article (Brave Search)
- `facebook` - Facebook post (future)
- `linkedin` - LinkedIn post (future)
- `instagram` - Instagram post (future)
- `press` - Press release portal
- `award` - Award announcement
---
## 4. Database Schema
### 4.1 company_news Table
**Purpose:** Store news and mentions for companies from various sources.
**Schema:**
```sql
CREATE TABLE company_news (
    id                SERIAL PRIMARY KEY,
    -- Company reference
    company_id        INTEGER NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
    -- News content
    title             VARCHAR(500) NOT NULL,
    summary           TEXT,
    content           TEXT,
    -- Source information
    source_url        VARCHAR(1000),
    source_name       VARCHAR(255),
    source_type       VARCHAR(50),
    -- Classification
    news_type         VARCHAR(50) DEFAULT 'news_mention',
    -- Dates
    published_date    TIMESTAMP,
    discovered_at     TIMESTAMP DEFAULT NOW(),
    -- AI filtering
    is_approved       BOOLEAN DEFAULT FALSE,
    is_visible        BOOLEAN DEFAULT TRUE,
    relevance_score   NUMERIC(3,2),
    ai_summary        TEXT,
    ai_tags           TEXT[],
    -- Moderation
    moderation_status VARCHAR(20) DEFAULT 'pending',
    moderated_by      INTEGER REFERENCES users(id),
    moderated_at      TIMESTAMP,
    rejection_reason  VARCHAR(255),
    -- Engagement
    view_count        INTEGER DEFAULT 0,
    -- Timestamps
    created_at        TIMESTAMP DEFAULT NOW(),
    updated_at        TIMESTAMP DEFAULT NOW(),
    -- Unique constraint
    CONSTRAINT uq_company_news_url UNIQUE (company_id, source_url)
);
```
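The unique constraint on `(company_id, source_url)` is what makes URL deduplication safe even if the script's own existence check is skipped. The sketch below demonstrates the idea with Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table is reduced to the relevant columns and `save_article` is a hypothetical helper. PostgreSQL accepts the same `ON CONFLICT ... DO NOTHING` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE company_news (
        id         INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL,
        title      TEXT NOT NULL,
        source_url TEXT,
        UNIQUE (company_id, source_url)
    )
""")


def save_article(company_id: int, title: str, source_url: str) -> bool:
    """Insert an article; return False when the URL is already stored for the company."""
    cur = conn.execute(
        "INSERT INTO company_news (company_id, title, source_url) VALUES (?, ?, ?) "
        "ON CONFLICT (company_id, source_url) DO NOTHING",
        (company_id, title, source_url),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows affected means the insert was skipped
```

The same URL can still be stored for two different companies, which matches the composite constraint in the schema.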
**Indexes:**
```sql
-- Performance indexes
CREATE INDEX idx_company_news_company_id ON company_news(company_id);
CREATE INDEX idx_company_news_source_type ON company_news(source_type);
CREATE INDEX idx_company_news_news_type ON company_news(news_type);
CREATE INDEX idx_company_news_is_approved ON company_news(is_approved);
CREATE INDEX idx_company_news_published_date ON company_news(published_date DESC);
CREATE INDEX idx_company_news_discovered_at ON company_news(discovered_at DESC);
CREATE INDEX idx_company_news_moderation ON company_news(moderation_status);
-- Composite index for efficient querying
CREATE INDEX idx_company_news_approved_visible
    ON company_news(company_id, is_approved, is_visible)
    WHERE is_approved = TRUE AND is_visible = TRUE;
```
**Field Descriptions:**
| Field | Type | Description |
|-------|------|-------------|
| `id` | SERIAL | Primary key |
| `company_id` | INTEGER | Foreign key to companies table |
| `title` | VARCHAR(500) | News headline |
| `summary` | TEXT | Short excerpt or description |
| `content` | TEXT | Full article content (if scraped) |
| `source_url` | VARCHAR(1000) | Original URL of news article |
| `source_name` | VARCHAR(255) | Name of source (e.g., "Gazeta Wyborcza") |
| `source_type` | VARCHAR(50) | Type: web, facebook, linkedin, instagram, press, award |
| `news_type` | VARCHAR(50) | Classification (see section 3.2) |
| `published_date` | TIMESTAMP | Original publication date |
| `discovered_at` | TIMESTAMP | When our system found it |
| `is_approved` | BOOLEAN | Passed AI filter and approved for display |
| `is_visible` | BOOLEAN | Visible on company profile |
| `relevance_score` | NUMERIC(3,2) | AI-calculated relevance (0.00-1.00) |
| `ai_summary` | TEXT | Gemini-generated summary |
| `ai_tags` | TEXT[] | Array of AI-extracted tags |
| `moderation_status` | VARCHAR(20) | Status: pending, approved, rejected |
| `moderated_by` | INTEGER | Admin user ID who moderated |
| `moderated_at` | TIMESTAMP | When moderation happened |
| `rejection_reason` | VARCHAR(255) | Reason if rejected |
| `view_count` | INTEGER | Number of views on platform |
| `created_at` | TIMESTAMP | Record creation time |
| `updated_at` | TIMESTAMP | Last update time |
### 4.2 user_notifications Table
**Purpose:** In-app notifications for users with read/unread tracking.
**Schema:**
```sql
CREATE TABLE user_notifications (
    id                SERIAL PRIMARY KEY,
    -- User reference
    user_id           INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    -- Notification content
    title             VARCHAR(255) NOT NULL,
    message           TEXT,
    notification_type VARCHAR(50) DEFAULT 'info',
    -- Related entity (polymorphic reference)
    related_type      VARCHAR(50),
    related_id        INTEGER,
    -- Status
    is_read           BOOLEAN DEFAULT FALSE,
    read_at           TIMESTAMP,
    -- Action
    action_url        VARCHAR(500),
    -- Timestamps
    created_at        TIMESTAMP DEFAULT NOW()
);
```
**Indexes:**
```sql
CREATE INDEX idx_user_notifications_user_id ON user_notifications(user_id);
CREATE INDEX idx_user_notifications_type ON user_notifications(notification_type);
CREATE INDEX idx_user_notifications_is_read ON user_notifications(is_read);
CREATE INDEX idx_user_notifications_created_at ON user_notifications(created_at DESC);
-- Composite index for unread notifications badge
CREATE INDEX idx_user_notifications_unread
    ON user_notifications(user_id, is_read, created_at DESC)
    WHERE is_read = FALSE;
```
**Field Descriptions:**
| Field | Type | Description |
|-------|------|-------------|
| `id` | SERIAL | Primary key |
| `user_id` | INTEGER | Foreign key to users table |
| `title` | VARCHAR(255) | Notification title |
| `message` | TEXT | Full notification message |
| `notification_type` | VARCHAR(50) | Type: news, system, message, event, alert |
| `related_type` | VARCHAR(50) | Type of related entity (company_news, event, message) |
| `related_id` | INTEGER | ID of related entity |
| `is_read` | BOOLEAN | Has user read the notification? |
| `read_at` | TIMESTAMP | When was it read? |
| `action_url` | VARCHAR(500) | URL to navigate when clicked |
| `created_at` | TIMESTAMP | Notification creation time |
**Notification Types:**
- `news` - New company news
- `system` - System announcements
- `message` - Private message notification
- `event` - Event reminder/update
- `alert` - Important alert
---
## 5. Admin Moderation Workflow
### 5.1 Admin Dashboard (`/admin/news`)
**Purpose:** Allow admins to review, approve, or reject pending news items.
**URL:** `/admin/news`
**Authentication:** Requires `is_admin=True`
**Features:**
1. **Pending News List**
   - Display all news with `moderation_status='pending'`
   - Sort by discovered_at DESC (newest first)
   - Show: title, company, source, published_date, relevance_score
2. **Filtering Options**
   - By company (dropdown)
   - By source_type (web, facebook, linkedin, etc.)
   - By news_type (news_mention, press_release, award, etc.)
   - By relevance_score range (0.0-1.0)
   - By date range (last 7 days, last 30 days, custom)
3. **Moderation Actions**
   - **Approve:** Set `moderation_status='approved'`, `is_approved=TRUE`
   - **Reject:** Set `moderation_status='rejected'`, `is_approved=FALSE`
   - **Edit:** Modify title, summary, news_type before approving
   - **Preview:** View full article (open source_url in new tab)
4. **Bulk Actions**
   - Approve all with relevance_score >= 0.8
   - Reject all with relevance_score < 0.4
   - Select multiple items for batch approval/rejection
**UI Layout:**
```
┌─────────────────────────────────────────────────────────────┐
│ NEWS MODERATION DASHBOARD │
├─────────────────────────────────────────────────────────────┤
│ Filters: [Company ▼] [Type ▼] [Score ▼] [Date ▼] │
│ Bulk: [Approve Score>=0.8] [Reject Score<0.4] │
├─────────────────────────────────────────────────────────────┤
│ Pending: 42 | Approved: 128 | Rejected: 15 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ☐ PIXLAB otwiera nową siedzibę │
│ Company: PIXLAB | Type: news_mention | Score: 0.85 │
│ Source: trojmiasto.pl | Published: 2026-01-08 │
│ [Preview] [Approve] [Reject] [Edit] │
│ │
│ ☐ Graal zwycięzcą konkursu SME Leader │
│ Company: GRAAL | Type: award | Score: 0.95 │
│ Source: forbes.pl | Published: 2026-01-07 │
│ [Preview] [Approve] [Reject] [Edit] │
│ │
│ ☐ Losowe firmę wspomniała │
│ Company: ABC Sp. z o.o. | Type: news_mention | Score: 0.25 │
│ Source: random-blog.com | Published: 2026-01-05 │
│ [Preview] [Approve] [Reject] [Edit] │
│ │
└─────────────────────────────────────────────────────────────┘
```
### 5.2 Moderation API Endpoints
**Endpoint:** `POST /api/news/moderate`
**Authentication:** Admin only
**Request Body:**
```json
{
  "news_id": 42,
  "action": "approve",  // or "reject"
  "rejection_reason": "Spam / Nieistotne / Duplikat"  // Required if rejecting
}
}
```
**Response:**
```json
{
  "success": true,
  "message": "News approved successfully",
  "news_id": 42,
  "moderation_status": "approved"
}
```
**Implementation:**
```python
@app.route('/api/news/moderate', methods=['POST'])
@login_required
def api_news_moderate():
    """Moderate a news item (admin only)."""
    if not current_user.is_admin:
        return jsonify({'error': 'Unauthorized'}), 403

    # get_json() returns None for a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    news_id = data.get('news_id')
    action = data.get('action')  # 'approve' or 'reject'
    rejection_reason = data.get('rejection_reason')

    news = db.session.get(CompanyNews, news_id)
    if not news:
        return jsonify({'error': 'News not found'}), 404

    if action == 'approve':
        news.moderation_status = 'approved'
        news.is_approved = True
        news.is_visible = True  # a previously rejected item becomes visible again
        news.moderated_by = current_user.id
        news.moderated_at = datetime.utcnow()
        # Create notification for company owner (if exists)
        company_user = db.session.query(User).filter_by(company_id=news.company_id).first()
        if company_user:
            notification = UserNotification(
                user_id=company_user.id,
                title=f"Nowa aktualność o {news.company.name}",
                message=f"Artykuł '{news.title}' został zatwierdzony i jest widoczny na profilu firmy.",
                notification_type='news',
                related_type='company_news',
                related_id=news.id,
                action_url=f"/company/{news.company.slug}#news"
            )
            db.session.add(notification)
    elif action == 'reject':
        if not rejection_reason:
            return jsonify({'error': 'Rejection reason required'}), 400
        news.moderation_status = 'rejected'
        news.is_approved = False
        news.is_visible = False
        news.moderated_by = current_user.id
        news.moderated_at = datetime.utcnow()
        news.rejection_reason = rejection_reason
    else:
        return jsonify({'error': 'Invalid action'}), 400

    db.session.commit()
    return jsonify({
        'success': True,
        # moderation_status is 'approved' or 'rejected'; f"{action}d" produced "rejectd"
        'message': f"News {news.moderation_status} successfully",
        'news_id': news.id,
        'moderation_status': news.moderation_status
    })
```
### 5.3 Auto-Approval Rules
**High-Confidence Auto-Approval:**
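Automatically approve news only if ALL of the following conditions are met:
1. `relevance_score >= 0.9`
2. `source_type` in ('press', 'award')
3. Company name appears in title
4. Source is a trusted domain (whitelist)

Note: the planned discovery script (section 2.2) stores every Brave Search result with `source_type='web'`, so under condition 2 those articles always go to manual moderation.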
**Trusted Sources Whitelist:**
```python
TRUSTED_NEWS_SOURCES = [
    'trojmiasto.pl',
    'gdansk.pl',
    'bizneswkaszubach.pl',
    'pomorska.pl',   # Gazeta Pomorska
    'forbes.pl',
    'pulshr.pl',
    'rp.pl',         # Rzeczpospolita
    'pb.pl',         # Puls Biznesu
    'gp24.pl'        # Głos Pomorza
]
```
**Implementation:**
```python
from urllib.parse import urlparse


def should_auto_approve(news: CompanyNews) -> bool:
    """
    Determine if news should be auto-approved.
    Returns True if news meets high-confidence criteria.
    """
    if news.relevance_score is None or news.relevance_score < 0.9:
        return False
    if news.source_type not in ('press', 'award'):
        return False
    # Check if company name in title
    if news.company.name.lower() not in news.title.lower():
        return False
    # Check if source is trusted (ignore case, port and a leading "www.")
    domain = (urlparse(news.source_url or '').hostname or '').lower()
    if domain.startswith('www.'):
        domain = domain[4:]
    return domain in TRUSTED_NEWS_SOURCES
```
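Comparing a URL's raw `netloc` against the whitelist misses hosts such as `www.trojmiasto.pl`. A slightly more tolerant check is sketched below. `is_trusted_source` is a hypothetical helper, and accepting any subdomain of a trusted domain is an assumption that should be confirmed before use. Only a subset of the whitelist is repeated here.

```python
from urllib.parse import urlparse

TRUSTED_NEWS_SOURCES = ("trojmiasto.pl", "gdansk.pl", "forbes.pl", "rp.pl", "pb.pl")


def is_trusted_source(url: str, trusted=TRUSTED_NEWS_SOURCES) -> bool:
    """Return True when the URL's host is a trusted domain or one of its subdomains."""
    # .hostname is lowercased and excludes any port; it is None for non-URLs.
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in trusted)
```

Matching on `"." + domain` rather than a plain suffix prevents look-alike hosts such as `faketrojmiasto.pl` from passing.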
---
## 6. Display on Company Profiles
### 6.1 News Section in Company Profile
**Location:** `templates/company_detail.html`
**Placement:** After "Social Media" section, before "Strona WWW" section
**Visibility Rules:**
- Only show news with `is_approved=TRUE` AND `is_visible=TRUE`
- Sort by `published_date DESC`
- Limit to 5 most recent items
- If no approved news, don't show section
**HTML Structure:**
```html
<!-- Company News Section -->
{% if company_news %}
<section class="company-section">
  <h2 class="section-title">
    <i class="fas fa-newspaper"></i> Aktualności
  </h2>
  <div class="news-grid">
    {% for news in company_news %}
    <div class="news-card">
      <div class="news-header">
        <span class="news-type-badge {{ news.news_type }}">
          {{ news_type_labels[news.news_type] }}
        </span>
        <span class="news-date">{{ news.published_date|format_date }}</span>
      </div>
      <h3 class="news-title">{{ news.title }}</h3>
      <p class="news-summary">
        {{ news.ai_summary or news.summary }}
      </p>
      <div class="news-meta">
        <span class="news-source">
          <i class="fas fa-external-link-alt"></i> {{ news.source_name }}
        </span>
        <span class="news-relevance" title="Relevance: {{ news.relevance_score }}">
          <i class="fas fa-star"></i> {{ (news.relevance_score * 100)|int }}%
        </span>
      </div>
      {% if news.ai_tags %}
      <div class="news-tags">
        {% for tag in news.ai_tags[:5] %}
        <span class="tag">{{ tag }}</span>
        {% endfor %}
      </div>
      {% endif %}
      <!-- rel="noopener noreferrer" keeps the external page from accessing window.opener -->
      <a href="{{ news.source_url }}" target="_blank" rel="noopener noreferrer" class="news-link">
        Czytaj więcej <i class="fas fa-arrow-right"></i>
      </a>
    </div>
    {% endfor %}
  </div>
  {% if company_news|length >= 5 %}
  <div class="news-load-more">
    <button class="btn-secondary" onclick="loadMoreNews({{ company.id }})">
      Załaduj więcej aktualności
    </button>
  </div>
  {% endif %}
</section>
{% endif %}
```
**CSS Styling:**
```css
.news-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(300px, 1fr));
gap: 20px;
margin-top: 20px;
}
.news-card {
background: white;
border: 1px solid #e0e0e0;
border-radius: 8px;
padding: 20px;
transition: box-shadow 0.2s;
}
.news-card:hover {
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
}
.news-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 12px;
}
.news-type-badge {
padding: 4px 12px;
border-radius: 4px;
font-size: 12px;
font-weight: 600;
text-transform: uppercase;
}
.news-type-badge.news_mention {
background: #e3f2fd;
color: #1976d2;
}
.news-type-badge.press_release {
background: #f3e5f5;
color: #7b1fa2;
}
.news-type-badge.award {
background: #fff3e0;
color: #f57c00;
}
.news-date {
color: #666;
font-size: 13px;
}
.news-title {
font-size: 18px;
font-weight: 600;
margin-bottom: 12px;
color: #333;
line-height: 1.4;
}
.news-summary {
color: #555;
font-size: 14px;
line-height: 1.6;
margin-bottom: 12px;
}
.news-meta {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 12px;
font-size: 13px;
color: #666;
}
.news-tags {
display: flex;
flex-wrap: wrap;
gap: 8px;
margin-bottom: 12px;
}
.news-tags .tag {
background: #f5f5f5;
padding: 4px 10px;
border-radius: 12px;
font-size: 12px;
color: #555;
}
.news-link {
display: inline-flex;
align-items: center;
gap: 6px;
color: #1976d2;
font-weight: 500;
text-decoration: none;
font-size: 14px;
}
.news-link:hover {
text-decoration: underline;
}
```
### 6.2 Load More News (Pagination)
**Endpoint:** `GET /api/company/<company_id>/news`
**Parameters:**
- `offset` - Number of items to skip (default: 0)
- `limit` - Number of items to return (default: 5)
**Response:**
```json
{
  "news": [
    {
      "id": 42,
      "title": "PIXLAB otwiera nową siedzibę",
      "summary": "Firma PIXLAB, specjalizująca się...",
      "source_name": "Trojmiasto.pl",
      "source_url": "https://example.com/article",
      "news_type": "news_mention",
      "published_date": "2026-01-08T10:30:00",
      "relevance_score": 0.85,
      "ai_tags": ["projekty", "IT", "wejherowo"]
    }
  ],
  "total": 12,
  "offset": 5,
  "limit": 5,
  "has_more": true
}
```
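The paging fields in this response follow directly from `offset`, `limit`, and the total number of approved items. A minimal sketch of that logic, independent of the database layer: `paginate` is a hypothetical helper, and the upper bound of 50 on `limit` is an assumption.

```python
def paginate(items: list, offset: int = 0, limit: int = 5) -> dict:
    """Build the paging envelope for one slice of an already-sorted list."""
    offset = max(offset, 0)
    limit = max(1, min(limit, 50))  # guard against negative or very large limits
    page = items[offset:offset + limit]
    return {
        "news": page,
        "total": len(items),
        "offset": offset,
        "limit": limit,
        # More items remain when this page does not reach the end of the list
        "has_more": offset + len(page) < len(items),
    }
```

With 12 approved items, `offset=5, limit=5` yields items 6 to 10 and `has_more` is true, matching the sample response. `offset=10` returns the last two items with `has_more` false.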
**JavaScript Implementation:**
```javascript
async function loadMoreNews(companyId) {
const currentCount = document.querySelectorAll('.news-card').length;
try {
const response = await fetch(`/api/company/${companyId}/news?offset=${currentCount}&limit=5`);
const data = await response.json();
if (data.news.length === 0) {
document.querySelector('.news-load-more').innerHTML =
'<p>Brak więcej aktualności</p>';
return;
}
const newsGrid = document.querySelector('.news-grid');
data.news.forEach(news => {
const newsCard = createNewsCard(news);
newsGrid.appendChild(newsCard);
});
if (!data.has_more) {
document.querySelector('.news-load-more').style.display = 'none';
}
} catch (error) {
console.error('Error loading more news:', error);
alert('Błąd podczas ładowania aktualności');
}
}
function createNewsCard(news) {
const card = document.createElement('div');
card.className = 'news-card';
card.innerHTML = `
<div class="news-header">
            <span class="news-type-badge ${escapeHtml(news.news_type)}">
                ${escapeHtml(newsTypeLabels[news.news_type] ?? news.news_type)}
            </span>
<span class="news-date">${formatDate(news.published_date)}</span>
</div>
<h3 class="news-title">${escapeHtml(news.title)}</h3>
<p class="news-summary">${escapeHtml(news.summary)}</p>
<div class="news-meta">
<span class="news-source">
<i class="fas fa-external-link-alt"></i> ${escapeHtml(news.source_name)}
</span>
<span class="news-relevance">
<i class="fas fa-star"></i> ${Math.round(news.relevance_score * 100)}%
</span>
</div>
${news.ai_tags ? `
<div class="news-tags">
${news.ai_tags.slice(0, 5).map(tag =>
`<span class="tag">${escapeHtml(tag)}</span>`
).join('')}
</div>
` : ''}
        <a href="${escapeHtml(news.source_url)}" target="_blank" rel="noopener noreferrer" class="news-link">
Czytaj więcej <i class="fas fa-arrow-right"></i>
</a>
`;
return card;
}
```
---
## 7. User Notification System
### 7.1 Notification Creation
**When to Create Notifications:**
1. **New News Approved** - Notify company owner when their news is approved
2. **News Rejected** - Notify company owner when their news is rejected (optional)
3. **High-Priority News** - Notify NORDA members when high-relevance news appears for companies they follow (future feature)
**Implementation:**
```python
def create_news_approval_notification(news: CompanyNews, db: Session):
    """
    Create notification when news is approved.
    Notify company owner (if user account exists).
    """
    # Find company owner
    company_user = db.query(User).filter_by(company_id=news.company_id).first()
    if not company_user:
        return  # No user account for this company

    notification = UserNotification(
        user_id=company_user.id,
        title=f"Nowa aktualność o {news.company.name}",
        message=f"Artykuł '{news.title}' został zatwierdzony i jest widoczny na profilu firmy.",
        notification_type='news',
        related_type='company_news',
        related_id=news.id,
        action_url=f"/company/{news.company.slug}#news",
        is_read=False
    )
    db.add(notification)
    db.commit()
```
### 7.2 Notification API
**Endpoint:** `GET /api/notifications`
**Authentication:** Requires logged-in user
**Response:**
```json
{
"notifications": [
{
"id": 123,
"title": "Nowa aktualność o PIXLAB",
"message": "Artykuł 'PIXLAB otwiera nową siedzibę' został zatwierdzony...",
"notification_type": "news",
"related_type": "company_news",
"related_id": 42,
"action_url": "/company/pixlab-sp-z-o-o#news",
"is_read": false,
"created_at": "2026-01-10T14:30:00"
}
],
"unread_count": 3,
"total": 15
}
```
**Implementation:**
```python
@app.route('/api/notifications', methods=['GET'])
@login_required
def api_notifications():
    """Get user notifications."""
    # Clamp paging parameters so a client cannot request an unbounded page
    # (the upper bound of 100 is an assumption; adjust as needed)
    limit = min(max(request.args.get('limit', 20, type=int), 1), 100)
    offset = max(request.args.get('offset', 0, type=int), 0)
    unread_only = request.args.get('unread_only', 'false').lower() == 'true'

    query = db.session.query(UserNotification).filter_by(user_id=current_user.id)
    if unread_only:
        query = query.filter_by(is_read=False)

    total = query.count()
    # unread_count always covers all of the user's notifications,
    # regardless of the unread_only filter
    unread_count = db.session.query(UserNotification).filter_by(
        user_id=current_user.id,
        is_read=False
    ).count()

    notifications = query.order_by(UserNotification.created_at.desc()) \
        .limit(limit) \
        .offset(offset) \
        .all()

    return jsonify({
        'notifications': [n.to_dict() for n in notifications],
        'unread_count': unread_count,
        'total': total,
        'has_more': (offset + limit) < total
    })
```
### 7.3 Mark as Read
**Endpoint:** `POST /api/notifications/<notification_id>/read`
**Authentication:** Requires logged-in user
**Response:**
```json
{
"success": true,
"notification_id": 123,
"is_read": true
}
```
**Implementation:**
```python
@app.route('/api/notifications/<int:notification_id>/read', methods=['POST'])
@login_required
def api_notification_mark_read(notification_id):
    """Mark notification as read."""
    notification = db.session.get(UserNotification, notification_id)
    if not notification:
        return jsonify({'error': 'Notification not found'}), 404
    # Users may only modify their own notifications
    if notification.user_id != current_user.id:
        return jsonify({'error': 'Unauthorized'}), 403

    notification.is_read = True
    notification.read_at = datetime.utcnow()
    db.session.commit()

    return jsonify({
        'success': True,
        'notification_id': notification.id,
        'is_read': notification.is_read
    })
```
### 7.4 Notification Badge (UI)
**Location:** Navigation bar (next to user avatar)
**Implementation:**
```html
<!-- Notification Badge -->
<div class="notification-badge" id="notificationBadge">
<i class="fas fa-bell"></i>
<span class="badge" id="unreadCount">0</span>
</div>
<script>
// Fetch unread count on page load
async function fetchUnreadCount() {
try {
const response = await fetch('/api/notifications?unread_only=true&limit=1');
const data = await response.json();
const badge = document.getElementById('unreadCount');
if (data.unread_count > 0) {
badge.textContent = data.unread_count;
badge.style.display = 'inline-block';
} else {
badge.style.display = 'none';
}
} catch (error) {
console.error('Error fetching notifications:', error);
}
}
// Refresh every 60 seconds
setInterval(fetchUnreadCount, 60000);
fetchUnreadCount();
</script>
```
---
## 8. Performance and Optimization
### 8.1 Rate Limiting
**Brave Search API:**
- Free Tier: 2,000 searches/month
- Rate Limit: 1 request/second (to be enforced in the fetch script)
- Monthly Quota Tracking: Store in database
**Gemini AI:**
- Free Tier: 1,500 requests/day
- Cost per request: $0.00 (free tier)
- Track usage in `ai_api_costs` table
**Database Query Optimization:**
- Use composite indexes for approved + visible news
- Cache company news list (5 min TTL)
- Paginate results (5 items per page)
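
The two Brave limits above (1 request/second and 2,000 searches/month) could be enforced by a small guard object that the fetch script consults before each request. This is a sketch under stated assumptions: `BraveQuotaGuard` is a hypothetical name, the counter lives in memory here, and in practice `used` would be loaded from and saved to the database and reset at the start of each month.

```python
import time
from typing import Callable, Optional


class BraveQuotaGuard:
    """Enforce a minimum gap between requests and a monthly request cap."""

    def __init__(self, monthly_quota: int = 2000, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.monthly_quota = monthly_quota
        self.min_interval = min_interval
        self.used = 0  # assumption: persisted in the database in a real script
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def acquire(self) -> bool:
        """Return False if the quota is exhausted; otherwise wait and count."""
        if self.used >= self.monthly_quota:
            return False
        if self._last_request is not None:
            # Sleep only for the part of the interval that has not elapsed yet
            wait = self.min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        self.used += 1
        return True


if __name__ == "__main__":
    guard = BraveQuotaGuard(monthly_quota=3, min_interval=0.1)
    for _ in range(4):
        print(guard.acquire())
```

The clock and sleep functions are injected so the guard can be unit-tested without real delays.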
### 8.2 Caching Strategy
**News List Caching:**
```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=128)
def get_company_news_cached(company_id: int, cache_key: str) -> list:
    """
    Cache company news for 5 minutes.
    cache_key format: "news_{company_id}_{timestamp_5min}"
    The key is not read in the body; a new value simply forces a cache miss.
    """
    news = db.session.query(CompanyNews).filter(
        CompanyNews.company_id == company_id,
        CompanyNews.is_approved.is_(True),
        CompanyNews.is_visible.is_(True)
    ).order_by(CompanyNews.published_date.desc()).limit(5).all()
    return [n.to_dict() for n in news]


def get_company_news(company_id: int) -> list:
    """Get company news with 5-minute cache."""
    # Generate cache key (changes every 5 minutes)
    now = datetime.utcnow()
    cache_timestamp = now.replace(minute=(now.minute // 5) * 5, second=0, microsecond=0)
    cache_key = f"news_{company_id}_{cache_timestamp.isoformat()}"
    return get_company_news_cached(company_id, cache_key)
```
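
One limitation of rotating the cache key is that a freshly approved article stays hidden until the current 5-minute bucket ends. An alternative is an explicit TTL cache with an `invalidate` hook that the moderation code calls after approving or rejecting news. The class below is a hedged sketch: `NewsCache` and its `loader` callback are hypothetical, and the cache is per process, so it would not be shared across multiple workers.

```python
import time
from typing import Callable


class NewsCache:
    """Tiny in-process TTL cache keyed by company_id."""

    def __init__(self, loader: Callable[[int], list], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # e.g. a function that queries company_news
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[int, tuple[float, list]] = {}

    def get(self, company_id: int) -> list:
        """Return the cached list, reloading it when missing or expired."""
        now = self._clock()
        entry = self._entries.get(company_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._loader(company_id)
        self._entries[company_id] = (now, value)
        return value

    def invalidate(self, company_id: int) -> None:
        """Drop the entry so the next get() reloads from the source."""
        self._entries.pop(company_id, None)


if __name__ == "__main__":
    cache = NewsCache(loader=lambda cid: [f"news for company {cid}"])
    print(cache.get(26))
    cache.invalidate(26)
```

Calling `invalidate(company_id)` right after a moderation action would make the change visible on the next profile load instead of after the TTL expires.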
### 8.3 Monitoring Queries
**Check Quota Usage:**
```sql
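-- Approximate Brave API activity (last 30 days).
-- Note: this counts stored articles, not API requests. One search can yield
-- several rows or none, so treat it only as a rough proxy for quota usage
-- and track actual request counts separately.
SELECT
    DATE(created_at) AS search_date,
    COUNT(*) AS articles_stored
FROM company_news
WHERE created_at >= NOW() - INTERVAL '30 days'
  AND source_type = 'web'
GROUP BY DATE(created_at)
ORDER BY search_date DESC;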
```
**News Statistics:**
```sql
-- News by status
SELECT
moderation_status,
COUNT(*) as count,
AVG(relevance_score) as avg_relevance
FROM company_news
GROUP BY moderation_status;
-- News by type
SELECT
news_type,
COUNT(*) as count
FROM company_news
WHERE is_approved = TRUE
GROUP BY news_type
ORDER BY count DESC;
-- Top sources
SELECT
source_name,
COUNT(*) as article_count,
AVG(relevance_score) as avg_relevance
FROM company_news
WHERE is_approved = TRUE
GROUP BY source_name
ORDER BY article_count DESC
LIMIT 10;
```
---
## 9. Security Considerations
### 9.1 Input Validation
**URL Validation:**
```python
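import ipaddress
from urllib.parse import urlparse


def is_valid_news_url(url: str) -> bool:
    """Validate news URL before storing."""
    try:
        parsed = urlparse(url)
        host = (parsed.hostname or '').lower()
    except ValueError:
        return False
    # Only allow HTTP/HTTPS URLs that include a host
    if parsed.scheme not in ('http', 'https') or not host:
        return False
    # Block localhost by name (exact match, not substring)
    if host == 'localhost' or host.endswith('.localhost'):
        return False
    # Block loopback, private, and link-local IP literals
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # Not an IP literal: treat as a regular hostname
    return not (ip.is_loopback or ip.is_private or ip.is_link_local)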
```
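
Because news is deduplicated by URL, it may help to normalize each URL before comparing or storing it, so that the same article reached through different tracking links is not inserted twice. The following is a sketch only: `normalize_news_url` and the `TRACKING_PARAMS` set are hypothetical, and stripping a trailing slash is an assumption that may not hold for every site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters to drop; extend as needed.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid"}


def normalize_news_url(url: str) -> str:
    """Return a canonical form of the URL for duplicate detection."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep an explicit port only when it differs from the scheme default
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    # Drop known tracking parameters but keep everything else in order
    query = urlencode([
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ])
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped: it never reaches the server
    return urlunsplit((scheme, host, path, query, ""))


if __name__ == "__main__":
    print(normalize_news_url("HTTPS://Example.com:443/Article/?utm_source=x&id=7#top"))
```

The normalized value could be stored alongside the original `source_url` and used for the uniqueness check.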
**Content Sanitization:**
```python
from markupsafe import escape


def sanitize_news_content(text: str, max_length: int = 5000) -> str:
    """Sanitize externally sourced news text before storing or rendering."""
    # Remove excessive whitespace
    text = ' '.join(text.split())
    # Limit length before escaping so an HTML entity is never cut in half
    if len(text) > max_length:
        text = text[:max_length] + '...'
    # Escape HTML last and return a plain string
    return str(escape(text))
```
### 9.2 Rate Limiting (Flask-Limiter)
**API Endpoints:**
```python
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"]
)


@app.route('/api/company/<int:company_id>/news')
@limiter.limit("30 per minute")  # Prevent abuse
def api_company_news(company_id):
    """Get company news (rate limited)."""
    pass


@app.route('/api/news/moderate', methods=['POST'])
@login_required
@limiter.limit("100 per hour")  # Admin moderation limit
def api_news_moderate():
    """Moderate news (rate limited)."""
    pass
```
### 9.3 CSRF Protection
**All POST endpoints:**
```python
from flask_wtf.csrf import CSRFProtect
csrf = CSRFProtect(app)
# Automatically protects all POST/PUT/PATCH/DELETE requests
# HTML forms must include the CSRF token:
# <input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
# fetch()/AJAX calls (e.g. mark-as-read) must send the token in the X-CSRFToken header
```
---
## 10. Testing and Validation
### 10.1 Manual Testing Checklist
**News Discovery:**
- [ ] Brave API returns results for company name search
- [ ] Brave API returns results for NIP search
- [ ] Script handles companies with no results
- [ ] Script handles API rate limits (429 error)
- [ ] Script handles network errors gracefully
- [ ] Deduplication works (same URL not inserted twice)
**AI Filtering:**
- [ ] Gemini AI returns valid JSON response
- [ ] Relevance scores are between 0.0 and 1.0
- [ ] News types are correctly classified
- [ ] AI summaries are generated correctly
- [ ] Tags are relevant and limited to 5
**Admin Moderation:**
- [ ] Only admins can access /admin/news
- [ ] Pending news list displays correctly
- [ ] Approve action updates database
- [ ] Reject action updates database
- [ ] Bulk actions work correctly
- [ ] Filtering works (by company, type, score, date)
**Company Profile Display:**
- [ ] News section appears on company profile
- [ ] Only approved news is shown
- [ ] News sorted by published_date DESC
- [ ] "Load more" pagination works
- [ ] News cards display correctly
**Notifications:**
- [ ] Notification created when news approved
- [ ] Notification badge shows unread count
- [ ] Mark as read works correctly
- [ ] Notification links to correct company profile
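
Some of the AI Filtering checks above (score range, known type, at most 5 tags) can be automated with a small validator run against each parsed Gemini result. This is a sketch under assumptions: `validate_ai_result` is a hypothetical helper, the keys `relevance`, `type` and `tags` follow the fallback dictionary used in the troubleshooting section, and `KNOWN_NEWS_TYPES` lists only the three types named in this document.

```python
from typing import Any

# Only the types named in this document; the full list is an assumption.
KNOWN_NEWS_TYPES = {"news_mention", "press_release", "award"}


def validate_ai_result(result: dict[str, Any],
                       known_types: set[str] = KNOWN_NEWS_TYPES) -> list[str]:
    """Return a list of problems found in one AI filter result (empty if OK)."""
    problems = []

    relevance = result.get("relevance")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(relevance, bool) or not isinstance(relevance, (int, float)):
        problems.append("relevance is not a number")
    elif not 0.0 <= relevance <= 1.0:
        problems.append("relevance outside 0.0-1.0")

    if result.get("type") not in known_types:
        problems.append("unknown news type")

    tags = result.get("tags")
    if not isinstance(tags, list):
        problems.append("tags is not a list")
    elif len(tags) > 5:
        problems.append("more than 5 tags")

    return problems


if __name__ == "__main__":
    print(validate_ai_result({"relevance": 0.85, "type": "award", "tags": ["IT"]}))
```

An empty list means the result passed all three checks; anything else could be logged and the article left in the `pending` state for manual review.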
### 10.2 Database Integrity Tests
**Run these queries:**
```sql
-- Check for duplicate URLs (should return 0)
SELECT company_id, source_url, COUNT(*)
FROM company_news
GROUP BY company_id, source_url
HAVING COUNT(*) > 1;
-- Check for invalid relevance scores (should return 0)
SELECT id, relevance_score
FROM company_news
WHERE relevance_score < 0.0 OR relevance_score > 1.0;
-- Check for orphaned news (company deleted)
SELECT cn.id, cn.company_id
FROM company_news cn
LEFT JOIN companies c ON cn.company_id = c.id
WHERE c.id IS NULL;
-- Check for orphaned notifications (user deleted)
SELECT un.id, un.user_id
FROM user_notifications un
LEFT JOIN users u ON un.user_id = u.id
WHERE u.id IS NULL;
```
### 10.3 Performance Tests
**Load Testing:**
```bash
# Test company profile with 50 news items
ab -n 1000 -c 10 https://nordabiznes.pl/company/pixlab-sp-z-o-o
# Test news API pagination
# (URL is quoted so the shell does not treat & as a background operator)
ab -n 500 -c 5 "https://nordabiznes.pl/api/company/26/news?offset=0&limit=5"
# Test notification API
ab -n 500 -c 5 -H "Cookie: session=..." https://nordabiznes.pl/api/notifications
```
**Expected Performance:**
- Company profile load: < 500ms
- News API (5 items): < 200ms
- Notification API: < 150ms
---
## 11. Troubleshooting Guide
### 11.1 Common Issues
**Issue: No news discovered for company**
**Possible Causes:**
1. Company name too generic (e.g., "ABC")
2. No recent news published
3. Brave API rate limit exceeded
4. Network connectivity issues
**Solution:**
```bash
# Manual test for specific company
python scripts/fetch_company_news.py --company pixlab-sp-z-o-o --dry-run
# Check Brave API access (-i prints response headers, which carry rate-limit details if the API provides them)
curl -i -H "X-Subscription-Token: $BRAVE_API_KEY" \
"https://api.search.brave.com/res/v1/news/search?q=test"
```
---
**Issue: News not appearing on company profile**
**Possible Causes:**
1. News not approved (`is_approved=FALSE`)
2. News not visible (`is_visible=FALSE`)
3. Moderation status is `pending` or `rejected`
4. Cache not cleared
**Solution:**
```sql
-- Check news status
SELECT id, title, is_approved, is_visible, moderation_status
FROM company_news
WHERE company_id = 26
ORDER BY created_at DESC;
-- Force approve (admin only)
UPDATE company_news
SET is_approved = TRUE,
is_visible = TRUE,
moderation_status = 'approved'
WHERE id = 42;
```
---
**Issue: AI filtering returns invalid JSON**
**Possible Causes:**
1. Gemini response includes markdown formatting
2. Response truncated (token limit)
3. Response contains invalid JSON characters
**Solution:**
```python
import json
import logging
import re

logger = logging.getLogger(__name__)


def parse_gemini_json_response(response_text: str) -> dict:
    """Parse Gemini JSON response with error handling."""
    # Remove markdown code blocks
    text = re.sub(r'```json\s*|\s*```', '', response_text)
    # Remove leading/trailing whitespace
    text = text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON from Gemini: {e}")
        logger.error(f"Response: {text}")
        # Return neutral default values
        return {
            'relevance': 0.5,
            'type': 'news_mention',
            'reason': 'AI parsing error',
            'summary': '',
            'tags': []
        }
```
---
**Issue: Notification not received**
**Possible Causes:**
1. Company has no user account
2. Notification creation failed (database error)
3. User has notifications disabled (future feature)
**Solution:**
```sql
-- Check if company has user account
SELECT u.id, u.email, u.company_id
FROM users u
WHERE u.company_id = 26;
-- Manually create notification
INSERT INTO user_notifications (
user_id, title, message, notification_type,
related_type, related_id, action_url
) VALUES (
42, -- user_id
'Test notification',
'This is a test message',
'news',
'company_news',
123, -- news_id
'/company/pixlab-sp-z-o-o#news'
);
```
---
### 11.2 Diagnostic Queries
**News Discovery Stats:**
```sql
-- News discovered per day (last 30 days)
SELECT
DATE(discovered_at) as discovery_date,
COUNT(*) as news_count,
AVG(relevance_score) as avg_relevance
FROM company_news
WHERE discovered_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(discovered_at)
ORDER BY discovery_date DESC;
```
**Moderation Backlog:**
```sql
-- Pending news count by company
SELECT
c.name,
COUNT(*) as pending_count,
MIN(cn.discovered_at) as oldest_pending
FROM company_news cn
JOIN companies c ON cn.company_id = c.id
WHERE cn.moderation_status = 'pending'
GROUP BY c.name
ORDER BY pending_count DESC;
```
**AI Filtering Performance:**
```sql
-- Relevance score distribution
SELECT
CASE
        WHEN relevance_score >= 0.8 THEN 'High (0.8-1.0)'
        WHEN relevance_score >= 0.5 THEN 'Medium (0.5-0.79)'
        WHEN relevance_score >= 0.3 THEN 'Low (0.3-0.49)'
        ELSE 'Very Low (below 0.3)'
END as score_range,
COUNT(*) as count,
ROUND(AVG(relevance_score)::numeric, 2) as avg_score
FROM company_news
GROUP BY score_range
ORDER BY avg_score DESC;
```
---
## 12. Future Enhancements
### 12.1 Planned Features
**Social Media Integration:**
- Fetch posts from Facebook Pages
- Fetch posts from LinkedIn Company Pages
- Fetch posts from Instagram Business accounts
- Unified news feed (web + social)
**Advanced Filtering:**
- Sentiment analysis (positive, neutral, negative)
- Entity extraction (people, places, organizations)
- Topic clustering (group similar news)
- Trend detection (identify trending topics)
**User Features:**
- Follow companies to receive notifications
- Save news items to favorites
- Share news on social media
- Comment on news articles (internal discussion)
**Analytics:**
- News engagement metrics (views, clicks, shares)
- Company visibility score based on news coverage
- Trending companies dashboard
- News heatmap (by region, industry, time)
**Automation:**
- Auto-approve high-confidence news (score >= 0.95)
- Auto-reject spam/irrelevant (score < 0.2)
- Scheduled email digests for admins
- RSS feed for approved news
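
The auto-approve and auto-reject thresholds above map directly onto the existing moderation statuses. A minimal sketch, assuming a hypothetical helper `auto_moderation_decision` and treating both thresholds as configurable defaults rather than final values:

```python
def auto_moderation_decision(relevance_score: float,
                             approve_at: float = 0.95,
                             reject_below: float = 0.2) -> str:
    """Map a relevance score to 'approved', 'rejected', or 'pending'."""
    if not 0.0 <= relevance_score <= 1.0:
        raise ValueError(f"relevance_score out of range: {relevance_score}")
    if relevance_score >= approve_at:
        return "approved"
    if relevance_score < reject_below:
        return "rejected"
    # Everything in between still goes through manual moderation
    return "pending"


if __name__ == "__main__":
    for score in (0.97, 0.5, 0.1):
        print(score, auto_moderation_decision(score))
```

Scores between the two thresholds keep the `pending` status, so the manual workflow remains the default path.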
### 12.2 Technical Improvements
**Performance:**
- Implement Redis caching for news lists
- Add full-text search for news content
- Optimize database queries with materialized views
- Add CDN for news thumbnails
**Reliability:**
- Add retry mechanism for Brave API
- Implement circuit breaker pattern
- Add health check endpoint for news system
- Monitor API quota usage with alerts
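
The retry mechanism listed above could take the form of exponential backoff with a small random jitter. The sketch below is illustrative only: `call_with_retry` and `RetryableError` are hypothetical names, and the caller is assumed to raise `RetryableError` for transient failures such as HTTP 429 responses or timeouts.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (e.g. HTTP 429 or timeouts)."""


def call_with_retry(func: Callable[[], T], max_attempts: int = 4,
                    base_delay: float = 1.0, max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on RetryableError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller handle the failure
            # Delay doubles each attempt: base, 2x base, 4x base, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel jobs do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise RetryableError("simulated 429")
        return "ok"

    print(call_with_retry(flaky, base_delay=0.01))
```

A circuit breaker would sit one level above this: after several consecutive exhausted retries, the script would stop calling the API for a cooling-off period.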
**Security:**
- Add content moderation for user comments (future)
- Implement rate limiting per user (not just IP)
- Add CAPTCHA for public API endpoints
- Scan URLs for malware/phishing
---
## 13. Glossary
**Terms:**
- **Brave Search API** - News search API by Brave (alternative to Google News API)
- **Company News** - News articles and mentions about companies in the Norda Biznes directory
- **AI Filtering** - Automated relevance scoring using Google Gemini AI
- **Moderation** - Manual review and approval process by admins
- **Relevance Score** - AI-calculated score (0.0-1.0) indicating how relevant an article is to a company
- **News Type** - Classification of news (news_mention, press_release, award, etc.)
- **Source Type** - Origin of news (web, facebook, linkedin, etc.)
- **Moderation Status** - Workflow state (pending, approved, rejected)
- **User Notification** - In-app notification for users about new news
**Database Tables:**
- `company_news` - Stores news articles and mentions
- `user_notifications` - Stores user notifications
- `companies` - Company directory
- `users` - User accounts
---
## 14. Related Documentation
- [External Integrations Architecture](../06-external-integrations.md) - Brave Search and Gemini AI integration details
- [Database Schema](../05-database-schema.md) - Complete database schema documentation
- [Flask Components](../04-flask-components.md) - Flask application structure and routes
- [AI Chat Flow](./03-ai-chat-flow.md) - Gemini AI integration patterns
- `CLAUDE.md` - Main project documentation (News Monitoring section)
- `database/migrate_news_tables.sql` - Database migration script
---
## 15. Maintenance Guidelines
### 15.1 Regular Maintenance Tasks
**Daily:**
- Monitor Brave API quota usage
- Check moderation backlog (pending news)
- Review AI filtering accuracy (sample check)
**Weekly:**
- Analyze news discovery statistics
- Review rejected news for false negatives
- Update trusted sources whitelist
**Monthly:**
- Review Brave API costs (if exceeding free tier)
- Analyze news engagement metrics
- Update AI filtering prompt (if needed)
- Clean up old rejected news (> 90 days)
### 15.2 When to Update This Document
Update this document when:
- New news sources are added (e.g., social media)
- AI filtering algorithm changes
- Database schema changes (new fields, indexes)
- Admin dashboard UI changes significantly
- New API endpoints are added
- Rate limits or quotas change
**Update Process:**
1. Edit this markdown file
2. Verify Mermaid diagrams render correctly
3. Update "Last Updated" date at top
4. Commit with descriptive message: `docs: Update news monitoring flow - [what changed]`
5. Notify team via Slack/email
---
**Document Status:** ✅ Complete - Ready for implementation
**Implementation Status:** 🚧 Planned (Database schema ready, scripts pending)
**Next Steps:** Implement `scripts/fetch_company_news.py` and `/admin/news` dashboard