nordabiz/docs/architecture/06-external-integrations.md
Maciej Pienczyn 23493f0b61 docs: Update documentation for Gemini 3 Flash
Changed the default model in the documentation and code:
- gemini-2.5-flash → gemini-3-flash-preview
- gemini-2.5-pro → gemini-3-pro-preview

Updated files:
- README.md - technology description
- docs/architecture/*.md - diagrams and flows
- nordabiz_chat.py - fallback model name
- zopk_news_service.py - model for AI evaluation
- templates/admin/zopk_dashboard.html - displayed model

Legacy model mappings kept for backward compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:19:05 +01:00


External Integrations Architecture

Document Version: 1.0
Last Updated: 2026-01-10
Status: Production LIVE
Diagram Type: External Systems Integration Architecture


Overview

This diagram shows the external APIs and data sources integrated with Norda Biznes Partner. It illustrates:

  • 6 major API integrations (Google Gemini, Brave Search, PageSpeed, Places, KRS, MS Graph)
  • 2 web scraping sources (ALEO.com, rejestr.io)
  • Authentication methods for each integration
  • Data flows and usage patterns
  • Rate limits and quota management
  • Cost tracking and optimization

Abstraction Level: External Integration Architecture
Audience: Developers, DevOps, System Architects
Purpose: Understanding external dependencies, API usage, and integration patterns


Integration Architecture Diagram

graph TB
    %% Main system
    subgraph "Norda Biznes Partner System"
        WebApp["🌐 Flask Web Application<br/>app.py"]

        subgraph "Service Layer"
            GeminiSvc["🤖 Gemini Service<br/>gemini_service.py"]
            ChatSvc["💬 Chat Service<br/>nordabiz_chat.py"]
            EmailSvc["📧 Email Service<br/>email_service.py"]
            KRSSvc["🏛️ KRS Service<br/>krs_api_service.py"]
            GBPSvc["📊 GBP Audit Service<br/>gbp_audit_service.py"]
        end

        subgraph "Background Scripts"
            SEOScript["📊 SEO Audit<br/>scripts/seo_audit.py"]
            SocialScript["📱 Social Media Audit<br/>scripts/social_media_audit.py"]
            ImportScript["📥 Data Import<br/>scripts/import_*.py"]
        end

        Database["💾 PostgreSQL Database<br/>localhost:5432"]
    end

    %% External integrations
    subgraph "AI & ML Services"
        Gemini["🤖 Google Gemini API<br/>gemini-3-flash-preview<br/><br/>Free tier: unlimited<br/>Auth: API Key<br/>Cost: Free (preview)"]
    end

    subgraph "SEO & Analytics"
        PageSpeed["📊 Google PageSpeed Insights<br/>v5 API<br/><br/>Free tier: 25,000 req/day<br/>Auth: API Key<br/>Cost: Free"]
        Places["📍 Google Places API<br/>Maps Platform<br/><br/>Pay-per-use<br/>Auth: API Key<br/>Cost: $0.032/request"]
    end

    subgraph "Search & Discovery"
        BraveAPI["🔍 Brave Search API<br/>Web & News Search<br/><br/>Free tier: 2,000 req/month<br/>Auth: API Key<br/>Cost: Free"]
    end

    subgraph "Data Sources"
        KRS["🏛️ KRS Open API<br/>Ministry of Justice Poland<br/><br/>No limits (public API)<br/>Auth: None<br/>Cost: Free"]
        ALEO["🌐 ALEO.com<br/>NIP Verification Service<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
        Rejestr["🔗 rejestr.io<br/>Company Connections<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
    end

    subgraph "Communication"
        MSGraph["📧 Microsoft Graph API<br/>Email & Notifications<br/><br/>10,000 req/10min<br/>Auth: OAuth 2.0 Client Credentials<br/>Cost: Included in M365"]
    end

    %% Service layer connections to external APIs
    GeminiSvc -->|"HTTPS POST<br/>generateContent<br/>API Key: GOOGLE_GEMINI_API_KEY"| Gemini
    ChatSvc --> GeminiSvc
    GBPSvc -->|"Generate AI recommendations"| GeminiSvc

    KRSSvc -->|"HTTPS GET<br/>OdpisAktualny/{krs}<br/>Public API (no auth)"| KRS

    EmailSvc -->|"HTTPS POST<br/>users/{id}/sendMail<br/>OAuth 2.0 + Client Credentials"| MSGraph

    GBPSvc -->|"HTTPS GET<br/>findplacefromtext<br/>placedetails<br/>API Key: GOOGLE_PLACES_API_KEY"| Places

    %% Script connections to external APIs
    SEOScript -->|"HTTPS GET<br/>runPagespeed<br/>API Key: GOOGLE_PAGESPEED_API_KEY<br/>Quota tracking: 25K/day"| PageSpeed

    SocialScript -->|"HTTPS GET<br/>web/search<br/>news/search<br/>API Key: BRAVE_SEARCH_API_KEY"| BraveAPI
    SocialScript -->|"Fallback for reviews"| Places

    ImportScript -->|"NIP verification<br/>Playwright browser automation"| ALEO
    ImportScript -->|"Company connections<br/>Playwright browser automation"| Rejestr
    ImportScript --> KRSSvc

    %% Data flows back to database
    GeminiSvc -->|"Log costs<br/>ai_api_costs table"| Database
    SEOScript -->|"Store metrics<br/>company_website_analysis"| Database
    SocialScript -->|"Store profiles<br/>company_social_media"| Database
    GBPSvc -->|"Store audit results<br/>gbp_audits"| Database
    ImportScript -->|"Import companies<br/>companies table"| Database

    %% Web app connections
    WebApp --> ChatSvc
    WebApp --> EmailSvc
    WebApp --> KRSSvc
    WebApp --> GBPSvc
    WebApp --> GeminiSvc
    WebApp --> Database

    %% Styling
    classDef serviceStyle fill:#85bbf0,stroke:#5d92c7,color:#000000,stroke-width:2px
    classDef scriptStyle fill:#ffd93d,stroke:#ccae31,color:#000000,stroke-width:2px
    classDef aiStyle fill:#ff6b9d,stroke:#cc5579,color:#ffffff,stroke-width:3px
    classDef seoStyle fill:#c44569,stroke:#9d3754,color:#ffffff,stroke-width:3px
    classDef searchStyle fill:#6a89cc,stroke:#5570a3,color:#ffffff,stroke-width:3px
    classDef dataStyle fill:#4a69bd,stroke:#3b5497,color:#ffffff,stroke-width:3px
    classDef commStyle fill:#20bf6b,stroke:#1a9956,color:#ffffff,stroke-width:3px
    classDef dbStyle fill:#438dd5,stroke:#2e6295,color:#ffffff,stroke-width:3px
    classDef appStyle fill:#1168bd,stroke:#0b4884,color:#ffffff,stroke-width:3px

    class GeminiSvc,ChatSvc,EmailSvc,KRSSvc,GBPSvc serviceStyle
    class SEOScript,SocialScript,ImportScript scriptStyle
    class Gemini aiStyle
    class PageSpeed,Places seoStyle
    class BraveAPI searchStyle
    class KRS,ALEO,Rejestr dataStyle
    class MSGraph commStyle
    class Database dbStyle
    class WebApp appStyle

External Integration Details

🤖 Google Gemini API

Purpose: AI-powered text generation, chat, and image analysis
Service Files: gemini_service.py, nordabiz_chat.py, gbp_audit_service.py
Status: Production (Free Tier)

Configuration

| Parameter | Value |
|---|---|
| Provider | Google Generative AI |
| Endpoint | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| Authentication | API Key |
| Environment Variable | GOOGLE_GEMINI_API_KEY |
| Default Model | gemini-3-flash-preview |
| Timeout | None (default) |

Available Models

GEMINI_MODELS = {
    '3-flash': 'gemini-3-flash-preview',   # Default - 7x better reasoning, thinking mode
    '3-pro': 'gemini-3-pro-preview',       # Advanced - best reasoning, 2M context
    'flash': 'gemini-2.5-flash',           # Legacy - balanced cost/quality
    'flash-lite': 'gemini-2.5-flash-lite', # Legacy - ultra cheap
    'pro': 'gemini-2.5-pro',               # Legacy - high quality
    'flash-2.0': 'gemini-2.0-flash',       # Legacy - 1M context (retiring 31.03.2026)
}

Pricing (per 1M tokens)

| Model | Input Cost | Output Cost |
|---|---|---|
| gemini-3-flash-preview | Free | Free |
| gemini-3-pro-preview | $2.00 | $12.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |

Rate Limits

  • Free Tier (Gemini 3 Flash Preview): Unlimited requests
  • Context Window: Model-dependent (1M tokens for gemini-2.0-flash)

Integration Points

| Feature | File | Function |
|---|---|---|
| AI Chat | nordabiz_chat.py | NordaBizChatEngine.chat() |
| GBP Recommendations | gbp_audit_service.py | generate_ai_recommendations() |
| Text Generation | gemini_service.py | generate_text() |
| Image Analysis | gemini_service.py | analyze_image() |
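The endpoint and authentication rows above can be made concrete with a raw generateContent call built from the Python standard library. This is a minimal sketch, not the code in gemini_service.py. The helper names build_generate_request and parse_response are hypothetical. The key is sent in the x-goog-api-key header, which the Gemini REST API accepts. The response fields (candidates, usageMetadata.promptTokenCount, candidatesTokenCount, thoughtsTokenCount) follow Google's public REST format.

```python
import json
import urllib.request
from typing import Tuple

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt: str, api_key: str,
                           model: str = "gemini-3-flash-preview") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for one text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        # A header keeps the key out of URLs, shell history and access logs
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def parse_response(payload: dict) -> Tuple[str, int, int]:
    """Return (text, input_tokens, output_tokens) from a generateContent response."""
    parts = payload["candidates"][0]["content"]["parts"]
    text = "".join(part.get("text", "") for part in parts)
    usage = payload.get("usageMetadata", {})
    input_tokens = usage.get("promptTokenCount", 0)
    # Thinking models report reasoning tokens separately; count them as output
    output_tokens = usage.get("candidatesTokenCount", 0) + usage.get("thoughtsTokenCount", 0)
    return text, input_tokens, output_tokens


# Sending it (requires network access and a valid key):
# with urllib.request.urlopen(build_generate_request("Hello", key), timeout=30) as resp:
#     text, tokens_in, tokens_out = parse_response(json.load(resp))
```

The sketch adds thoughtsTokenCount to the output count because Google's pricing pages state that output prices include thinking tokens. This only matters for the paid models in the pricing table.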

Cost Tracking

All API calls are logged to the ai_api_costs table:

ai_api_costs (
    id, timestamp, api_provider, model_name,
    feature, user_id,
    input_tokens, output_tokens, total_tokens,
    input_cost, output_cost, total_cost,
    success, error_message, latency_ms,
    prompt_hash
)

Data Flow

User Message → ChatService → GeminiService
                                ↓
                        Google Gemini API
                                ↓
                    Response + Token Count
                                ↓
                    Cost Calculation → ai_api_costs
                                ↓
                        Response to User
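The Cost Calculation step can be sketched as below, using the rates from the pricing table above. The function and dictionary names are hypothetical. Decimal is used so that sub-cent amounts are not distorted by float rounding. The real logic in gemini_service.py may differ, for example in how it handles unknown models.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table
PRICING_PER_MILLION = {
    "gemini-3-flash-preview": (Decimal("0"), Decimal("0")),
    "gemini-3-pro-preview": (Decimal("2.00"), Decimal("12.00")),
    "gemini-2.5-flash": (Decimal("0.30"), Decimal("2.50")),
    "gemini-2.5-flash-lite": (Decimal("0.10"), Decimal("0.40")),
    "gemini-2.5-pro": (Decimal("1.25"), Decimal("10.00")),
    "gemini-2.0-flash": (Decimal("0.10"), Decimal("0.40")),
}

ONE_MILLION = Decimal(1_000_000)


def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Return the token and cost columns of one ai_api_costs row.

    Raises KeyError for models missing from the pricing table.
    """
    input_rate, output_rate = PRICING_PER_MILLION[model]
    input_cost = input_rate * input_tokens / ONE_MILLION
    output_cost = output_rate * output_tokens / ONE_MILLION
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }
```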

🏛️ KRS Open API

Purpose: Official company data from the Polish National Court Register (KRS)
Service Files: krs_api_service.py
Status: Production

Configuration

| Parameter | Value |
|---|---|
| Provider | Ministry of Justice Poland |
| Endpoint | https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs}?rejestr=P&format=json |
| Authentication | None (public API) |
| Timeout | 15 seconds |
| Response Format | JSON |

Rate Limits

  • Official Limit: None documented
  • Best Practice: 1-2 second delays between requests
  • Timeout: 15 seconds configured
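A request to the KRS Open API with the 15-second timeout above might look like the sketch below. The query parameters mirror the KRS health check later in this document: rejestr=P selects the register of entrepreneurs, and S is the register of associations. Two details are assumptions rather than the behaviour of KRSService.get_company_from_krs(): the zero-padding of shorter KRS numbers and the helper names. Callers looping over many companies should still sleep 1-2 seconds between calls.

```python
import json
import urllib.parse
import urllib.request

KRS_BASE = "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny"


def build_krs_url(krs_number: str, register: str = "P") -> str:
    """Build an OdpisAktualny URL; register "P" is the register of entrepreneurs."""
    digits = krs_number.strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 10:
        raise ValueError(f"Invalid KRS number: {krs_number!r}")
    query = urllib.parse.urlencode({"rejestr": register, "format": "json"})
    # KRS numbers have 10 digits; left-pad shorter inputs with zeros
    return f"{KRS_BASE}/{digits.zfill(10)}?{query}"


def fetch_krs_extract(krs_number: str, timeout: float = 15.0) -> dict:
    """Fetch the current KRS extract as parsed JSON (network access required)."""
    with urllib.request.urlopen(build_krs_url(krs_number), timeout=timeout) as resp:
        return json.load(resp)
```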

Data Retrieved

  • Basic identifiers (KRS, NIP, REGON)
  • Company name (full and shortened)
  • Legal form (Sp. z o.o., S.A., etc.)
  • Full address (street, city, voivodeship)
  • Share capital and currency
  • Registration dates
  • Management board (anonymized in Open API)
  • Shareholders (anonymized in Open API)
  • Business activities
  • OPP status (Organizacja Pożytku Publicznego)

Integration Points

| Feature | File | Usage |
|---|---|---|
| Data Import | import_*.py scripts | Company verification |
| Manual Verification | verify_all_companies_data.py | Batch verification |
| API Endpoint | app.py | /api/verify-krs |

Data Flow

Import Script → KRSService.get_company_from_krs()
                        ↓
                  KRS Open API
                        ↓
            KRSCompanyData (dataclass)
                        ↓
        Verification & Validation
                        ↓
        Update companies table

📊 Google PageSpeed Insights API

Purpose: SEO, performance, accessibility, and best practices analysis
Service Files: scripts/seo_audit.py, scripts/pagespeed_client.py
Status: Production

Configuration

| Parameter | Value |
|---|---|
| Provider | Google PageSpeed Insights |
| Endpoint | https://www.googleapis.com/pagespeedonline/v5/runPagespeed |
| Authentication | API Key |
| Environment Variable | GOOGLE_PAGESPEED_API_KEY |
| Google Cloud Project | NORDABIZNES (gen-lang-client-0540794446) |
| Timeout | 30 seconds |
| Strategy | Mobile (default), Desktop (optional) |
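One detail matters when building the request. According to the PageSpeed Insights v5 reference, runPagespeed runs only the Performance category unless category parameters are given. Collecting the SEO, accessibility and best-practices scores therefore requires repeating category in the query string. A minimal sketch follows; the helper name is hypothetical.

```python
import urllib.parse

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Without explicit category parameters only PERFORMANCE is evaluated
CATEGORIES = ("PERFORMANCE", "SEO", "ACCESSIBILITY", "BEST_PRACTICES")


def build_pagespeed_url(site_url: str, api_key: str, strategy: str = "mobile") -> str:
    """Build a runPagespeed URL that requests all four Lighthouse categories."""
    strategy = strategy.upper()
    if strategy not in ("MOBILE", "DESKTOP"):
        raise ValueError("strategy must be 'mobile' or 'desktop'")
    params = [("url", site_url), ("key", api_key), ("strategy", strategy)]
    params += [("category", c) for c in CATEGORIES]  # repeated query parameter
    return f"{PSI_ENDPOINT}?{urllib.parse.urlencode(params)}"
```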

Rate Limits

  • Free Tier: 25,000 queries/day
  • Per-Second: Recommended 1 query/second
  • Quota Tracking: In-memory counter in pagespeed_client.py
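The in-memory quota counter mentioned above could work roughly like the class below. This is an illustrative sketch, not the implementation in pagespeed_client.py, and the class name is hypothetical. The clock, sleep and date functions are injectable so the behaviour can be tested. The counter rolls over on the local date, whereas Google Cloud daily quotas reset at midnight Pacific Time.

```python
import time
from datetime import date
from typing import Callable, Optional


class DailyQuota:
    """In-memory daily request counter that also spaces calls min_interval apart."""

    def __init__(self, limit: int = 25_000, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 today: Callable[[], date] = date.today):
        self.limit = limit
        self.min_interval = min_interval
        self._clock, self._sleep, self._today = clock, sleep, today
        self._day = today()
        self._used = 0
        self._last_call: Optional[float] = None

    def remaining(self) -> int:
        self._roll_over()
        return self.limit - self._used

    def acquire(self) -> None:
        """Wait until the next request may be sent; raise if today's quota is spent."""
        self._roll_over()
        if self._used >= self.limit:
            raise RuntimeError("Daily PageSpeed quota exhausted")
        if self._last_call is not None:
            wait = self.min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        self._used += 1

    def _roll_over(self) -> None:
        # Reset the counter when the (local) calendar date changes
        if self._today() != self._day:
            self._day, self._used = self._today(), 0
```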

Metrics Returned

from dataclasses import dataclass
from typing import Optional

@dataclass
class PageSpeedScores:
    seo: int               # 0-100 SEO score
    performance: int       # 0-100 Performance score
    accessibility: int     # 0-100 Accessibility score
    best_practices: int    # 0-100 Best Practices score
    pwa: Optional[int]     # 0-100 PWA score

@dataclass
class CoreWebVitals:
    lcp_ms: Optional[int]  # Largest Contentful Paint
    fid_ms: Optional[int]  # First Input Delay (replaced by INP as a Core Web Vital in March 2024)
    cls: Optional[float]   # Cumulative Layout Shift
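These dataclasses could be filled from a v5 response as sketched below. The sketch assumes lab data from lighthouseResult. Category scores arrive as 0-1 floats and are scaled to 0-100. LCP and CLS come from the largest-contentful-paint and cumulative-layout-shift audits. Lighthouse has no lab measurement for FID, so the sketch leaves it empty. The function name is hypothetical.

```python
from typing import Optional


def parse_pagespeed(payload: dict) -> dict:
    """Extract 0-100 category scores and lab Core Web Vitals from a v5 response."""
    lighthouse = payload.get("lighthouseResult", {})
    categories = lighthouse.get("categories", {})
    audits = lighthouse.get("audits", {})

    def score(key: str) -> Optional[int]:
        raw = categories.get(key, {}).get("score")  # 0.0-1.0, None if not run
        return round(raw * 100) if raw is not None else None

    def numeric(audit_id: str) -> Optional[float]:
        return audits.get(audit_id, {}).get("numericValue")

    lcp = numeric("largest-contentful-paint")  # milliseconds
    return {
        "seo": score("seo"),
        "performance": score("performance"),
        "accessibility": score("accessibility"),
        "best_practices": score("best-practices"),
        "lcp_ms": round(lcp) if lcp is not None else None,
        "cls": numeric("cumulative-layout-shift"),
        # FID only exists as field data (loadingExperience), not in Lighthouse lab runs
        "fid_ms": None,
    }
```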

Database Storage

Results are saved to the company_website_analysis table:

company_website_analysis (
    company_id PRIMARY KEY,
    analyzed_at,
    pagespeed_seo_score,
    pagespeed_performance_score,
    pagespeed_accessibility_score,
    pagespeed_best_practices_score,
    pagespeed_audits JSONB,
    largest_contentful_paint_ms,
    first_input_delay_ms,
    cumulative_layout_shift,
    seo_overall_score,
    seo_health_score,
    seo_issues JSONB
)

Integration Points

| Feature | File | Endpoint/Function |
|---|---|---|
| Admin Dashboard | app.py | /admin/seo |
| Audit Script | scripts/seo_audit.py | CLI tool |
| Batch Audits | scripts/seo_audit.py | SEOAuditor.run_audit() |

Data Flow

Admin Trigger → SEO Audit Script
                        ↓
                PageSpeed API
                        ↓
    Scores + Core Web Vitals + Audits
                        ↓
        company_website_analysis table
                        ↓
            Admin Dashboard Display

📍 Google Places API

Purpose: Business profiles, ratings, reviews, and opening hours
Service Files: gbp_audit_service.py, scripts/social_media_audit.py
Status: Production

Configuration

| Parameter | Value |
|---|---|
| Provider | Google Maps Platform |
| Endpoints | Find Place from Text, Place Details |
| Authentication | API Key |
| Environment Variable | GOOGLE_PLACES_API_KEY |
| Timeout | 15 seconds |
| Language | Polish (pl) |

Cost

  • Pricing Model: Pay-per-use
  • Cost per Request: ~$0.032 per Place Details call
  • Optimization: 24-hour cache in database

Endpoints Used

1. Find Place from Text

https://maps.googleapis.com/maps/api/place/findplacefromtext/json

2. Place Details

https://maps.googleapis.com/maps/api/place/details/json
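The two calls could be built as sketched below. The parameter names (input, inputtype, fields, place_id, language) are those of the Places API web service. The field list is an assumed mapping of the Data Retrieved keys onto Place Details fields, and the helper names are hypothetical. Requesting only the needed fields also affects cost, because Place Details billing depends on which field groups are requested.

```python
import urllib.parse

PLACES_BASE = "https://maps.googleapis.com/maps/api/place"


def find_place_url(query: str, api_key: str) -> str:
    """Find Place from Text: resolve 'business name + city' to a place_id."""
    params = {
        "input": query,
        "inputtype": "textquery",
        "fields": "place_id,name,formatted_address",
        "language": "pl",
        "key": api_key,
    }
    return f"{PLACES_BASE}/findplacefromtext/json?{urllib.parse.urlencode(params)}"


def place_details_url(place_id: str, api_key: str) -> str:
    """Place Details: request only the profile fields the audit stores."""
    fields = ",".join([
        "place_id", "name", "formatted_address", "formatted_phone_number",
        "website", "types", "url", "rating", "user_ratings_total",
        "photos", "opening_hours", "business_status",
    ])
    params = {"place_id": place_id, "fields": fields, "language": "pl", "key": api_key}
    return f"{PLACES_BASE}/details/json?{urllib.parse.urlencode(params)}"
```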

Data Retrieved

{
    'google_place_id': str,           # Unique Place ID
    'google_name': str,               # Business name
    'google_address': str,            # Formatted address
    'google_phone': str,              # Phone number
    'google_website': str,            # Website URL
    'google_types': List[str],        # Business categories
    'google_maps_url': str,           # Google Maps link
    'google_rating': Decimal,         # Rating (1.0-5.0)
    'google_reviews_count': int,      # Number of reviews
    'google_photos_count': int,       # Number of photos
    'google_opening_hours': dict,     # Opening hours
    'google_business_status': str     # OPERATIONAL, CLOSED, etc.
}

Cache Strategy

  • Cache Duration: 24 hours
  • Storage: company_website_analysis (cache age is read from analyzed_at)
  • Force Refresh: force_refresh=True parameter
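The 24-hour check can be expressed as a small predicate. This is a sketch with a hypothetical name. It assumes analyzed_at is timezone-aware, because subtracting a naive datetime from an aware one raises TypeError.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(hours=24)


def needs_refresh(analyzed_at: Optional[datetime], force_refresh: bool = False,
                  now: Optional[datetime] = None) -> bool:
    """Return True when the Places API should be called instead of using cached data."""
    if force_refresh or analyzed_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    # Assumes analyzed_at is timezone-aware (naive/aware subtraction raises TypeError)
    return now - analyzed_at >= CACHE_TTL
```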

Integration Points

| Feature | File | Function |
|---|---|---|
| GBP Audit | gbp_audit_service.py | fetch_google_business_data() |
| Social Media Audit | scripts/social_media_audit.py | GooglePlacesSearcher |
| Admin Dashboard | app.py | /admin/gbp |

Data Flow

Admin/Script Trigger → GBPService.fetch_google_business_data()
                                ↓
                    Check cache (< 24h old?)
                                ↓
                    [Cache miss] → Places API
                                ↓
                Business Profile Data (JSON)
                                ↓
                company_website_analysis table
                                ↓
                    Display in Admin Panel

🔍 Brave Search API

Purpose: News monitoring, social media discovery, web search
Service Files: scripts/social_media_audit.py
Status: Production (Social Media), 📋 Planned (News Monitoring)

Configuration

| Parameter | Value |
|---|---|
| Provider | Brave Search |
| Endpoint (Web) | https://api.search.brave.com/res/v1/web/search |
| Endpoint (News) | https://api.search.brave.com/res/v1/news/search |
| Authentication | API Key |
| Environment Variable | BRAVE_SEARCH_API_KEY or BRAVE_API_KEY |
| Timeout | 15 seconds |

Rate Limits

  • Free Tier: 2,000 requests/month
  • Per-Second: No official limit
  • Recommended: 0.5-1 second delay
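A Brave request carries the key in the X-Subscription-Token header, as in the health check later in this document. The sketch below uses hypothetical helper names. It builds that request and spaces sequential queries by a fixed delay. With 2,000 requests per month the average budget is roughly 65 requests per day, so size batch runs accordingly.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, List

BRAVE_WEB_SEARCH = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(params: Dict, api_key: str) -> urllib.request.Request:
    """Build a Brave web search request; the key travels in X-Subscription-Token."""
    url = f"{BRAVE_WEB_SEARCH}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={
        "Accept": "application/json",
        "X-Subscription-Token": api_key,
    })


def search_all(queries: List[Dict], api_key: str, delay: float = 1.0) -> List[Dict]:
    """Run queries one after another with a fixed pause (network access required)."""
    results = []
    for i, params in enumerate(queries):
        if i:
            time.sleep(delay)  # keep within the recommended 0.5-1 s spacing
        with urllib.request.urlopen(build_brave_request(params, api_key), timeout=15) as resp:
            results.append(json.load(resp))
    return results
```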

Current Usage: Social Media Discovery

# Search for social media profiles
params = {
    "q": f'"{company_name}" {city} facebook OR instagram',
    "count": 10,
    "country": "pl",
    "search_lang": "pl"
}

Planned Usage: News Monitoring

# News search (from CLAUDE.md)
params = {
    "q": f'"{company_name}" OR "{nip}"',
    "count": 10,
    "freshness": "pw",  # past week
    "country": "pl",
    "search_lang": "pl"
}

Pattern Extraction

Social Media URLs: Regex patterns for:

  • Facebook: facebook.com/[username]
  • Instagram: instagram.com/[username]
  • LinkedIn: linkedin.com/company/[name]
  • YouTube: youtube.com/@[channel]
  • Twitter/X: twitter.com/[username] or x.com/[username]
  • TikTok: tiktok.com/@[username]

Google Reviews: patterns such as:

  • "4,5 (123 opinii)"
  • "Rating: 4.5 · 123 reviews"
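The extraction step could be approximated with the regular expressions below. These patterns are assumptions written for this example, not the ones in scripts/social_media_audit.py. They capture the handle after each platform's domain and parse both review formats above, accepting a comma or a dot as the decimal separator. The Polish pattern also accepts the singular and 2-4 plural forms (opinia, opinie).

```python
import re
from typing import Dict, Optional, Tuple

SOCIAL_PATTERNS = {
    "facebook": r"https?://(?:www\.)?facebook\.com/([A-Za-z0-9.\-]+)",
    "instagram": r"https?://(?:www\.)?instagram\.com/([A-Za-z0-9._]+)",
    "linkedin": r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/company/([A-Za-z0-9\-_%]+)",
    "youtube": r"https?://(?:www\.)?youtube\.com/@([A-Za-z0-9._\-]+)",
    "twitter": r"https?://(?:www\.)?(?:twitter|x)\.com/([A-Za-z0-9_]+)",
    "tiktok": r"https?://(?:www\.)?tiktok\.com/@([A-Za-z0-9._]+)",
}

# "4,5 (123 opinii)" in Polish results, "Rating: 4.5 · 123 reviews" in English ones
REVIEW_PATTERNS = [
    re.compile(r"(\d[.,]\d)\s*\((\d+)\s*opini[iea]\)"),
    re.compile(r"Rating:\s*(\d[.,]\d)\s*·\s*(\d+)\s*reviews"),
]


def extract_profiles(text: str) -> Dict[str, str]:
    """Return the first handle found for each platform in search-result text."""
    found = {}
    for platform, pattern in SOCIAL_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[platform] = match.group(1)
    return found


def extract_rating(text: str) -> Optional[Tuple[float, int]]:
    """Return (rating, review_count) from the first matching snippet, else None."""
    for pattern in REVIEW_PATTERNS:
        match = pattern.search(text)
        if match:
            return float(match.group(1).replace(",", ".")), int(match.group(2))
    return None
```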

Integration Points

| Feature | File | Status |
|---|---|---|
| Social Media Discovery | scripts/social_media_audit.py | Implemented |
| Google Reviews Fallback | scripts/social_media_audit.py | Implemented |
| News Monitoring | (planned) | 📋 Pending |

Data Flow

Social Media Audit Script → Brave Search API
                                    ↓
                    Web Search Results (JSON)
                                    ↓
                    Pattern Extraction (regex)
                                    ↓
            Social Media URLs (Facebook, Instagram, etc.)
                                    ↓
                company_social_media table

📧 Microsoft Graph API

Purpose: Email notifications via Microsoft 365
Service Files: email_service.py
Status: Production

Configuration

| Parameter | Value |
|---|---|
| Provider | Microsoft Graph API |
| Endpoint | https://graph.microsoft.com/v1.0 |
| Authentication | OAuth 2.0 Client Credentials Flow |
| Authority | https://login.microsoftonline.com/{tenant_id} |
| Scope | https://graph.microsoft.com/.default |

Environment Variables

MICROSOFT_TENANT_ID=<Azure AD Tenant ID>
MICROSOFT_CLIENT_ID=<Application Client ID>
MICROSOFT_CLIENT_SECRET=<Client Secret Value>
MICROSOFT_MAIL_FROM=noreply@nordabiznes.pl

Authentication Flow

  1. Client Credentials Flow (Application permissions)

    • No user interaction required
    • Service-to-service authentication
    • Uses client ID + client secret
  2. Token Acquisition

    import msal

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        client_credential=client_secret,
    )
    
    # On success the dict contains "access_token"; on failure, "error" and "error_description"
    result = app.acquire_token_for_client(
        scopes=["https://graph.microsoft.com/.default"]
    )
    
  3. Token Caching

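    • MSAL caches tokens in memory per ConfidentialClientApplication instance, so reuse a single instance
    • Access tokens are valid for roughly 1 hour
    • When the cached token expires, acquire_token_for_client() obtains a new one (the client credentials flow issues no refresh token)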

Required Azure AD Permissions

Application Permissions (requires admin consent):

  • Mail.Send - Send mail as any user

Rate Limits

  • Mail.Send: 10,000 requests per 10 minutes per app
  • Throttling: 429 Too Many Requests; wait for the Retry-After interval when the header is present, otherwise retry with backoff
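Throttled Graph responses typically include a Retry-After header giving the wait in seconds. The sketch below uses a hypothetical name. It prefers that header and otherwise falls back to exponential backoff with full jitter.

```python
import random
from typing import Optional


def throttle_delay(attempt: int, retry_after: Optional[str],
                   base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a throttled (429) Graph request.

    Uses Retry-After when it holds a number of seconds; otherwise returns a
    random value in [0, min(cap, base * 2**attempt)] (full-jitter backoff).
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # an HTTP-date value falls through to backoff
    return random.uniform(0, min(cap, base * 2 ** attempt))
```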

Integration Points

| Feature | File | Usage |
|---|---|---|
| User Registration | app.py | Send welcome email |
| Password Reset | app.py | Send reset link |
| Notifications | app.py | News approval notifications |

Data Flow

App Trigger → EmailService.send_mail()
                        ↓
        MSAL Token Acquisition (cached)
                        ↓
            Microsoft Graph API
                        ↓
        POST /users/{id}/sendMail
                        ↓
            Email Sent via M365
                        ↓
        Success/Failure Response
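The POST /users/{id}/sendMail step can be sketched as a request builder. The JSON body follows the Graph sendMail message format, and the helper name is hypothetical. access_token is assumed to be result["access_token"] from the MSAL call above, and sender would be the MICROSOFT_MAIL_FROM mailbox. On success Graph responds with 202 Accepted and an empty body.

```python
import json
import urllib.parse
import urllib.request
from typing import List

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_send_mail_request(access_token: str, sender: str, to: List[str],
                            subject: str, html_body: str) -> urllib.request.Request:
    """Build a POST /users/{id}/sendMail request with an HTML body."""
    payload = {
        "message": {
            "subject": subject,
            "body": {"contentType": "HTML", "content": html_body},
            "toRecipients": [{"emailAddress": {"address": addr}} for addr in to],
        },
        "saveToSentItems": False,
    }
    # {id} may be the mailbox's object ID or its user principal name
    url = f"{GRAPH_BASE}/users/{urllib.parse.quote(sender, safe='@')}/sendMail"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```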

🌐 ALEO.com (Web Scraping)

Purpose: NIP verification and company data enrichment
Service Files: scripts/import_*.py (Playwright integration)
Status: Production (Limited Use)

Configuration

| Parameter | Value |
|---|---|
| Provider | ALEO.com (Polish business directory) |
| Endpoint | https://www.aleo.com/ |
| Authentication | None (public website) |
| Method | Web scraping (Playwright browser automation) |
| Rate Limiting | Self-imposed delays (1-2 seconds) |

Data Retrieved

  • Company NIP verification
  • Company name
  • Address
  • Business category
  • Basic contact information

Best Practices

  • Rate Limiting: 1-2 second delays between requests
  • User Agent: Standard browser user agent
  • Error Handling: Handle missing elements gracefully
  • Caching: Cache results to minimize requests
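The rate-limiting, error-handling and caching practices above can be combined in a small wrapper around whatever function performs the Playwright lookup. The class name is hypothetical. fetch is assumed to take a NIP and return page HTML, for example via page.goto() and page.content(). Failed lookups return None and are not cached, so a later run can retry them.

```python
import random
import time
from typing import Callable, Dict, Optional


class PoliteFetcher:
    """Wrap a page-fetch function with a result cache and a random 1-2 s delay."""

    def __init__(self, fetch: Callable[[str], str], min_delay: float = 1.0,
                 max_delay: float = 2.0, sleep: Callable[[float], None] = time.sleep):
        self._fetch = fetch
        self._min, self._max = min_delay, max_delay
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._fetched_once = False

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]  # cache hit: no request, no delay
        if self._fetched_once:
            self._sleep(random.uniform(self._min, self._max))
        self._fetched_once = True
        try:
            html = self._fetch(key)
        except Exception:
            # Missing element, timeout, navigation error: skip this company for now
            return None
        self._cache[key] = html
        return html
```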

Integration Points

| Feature | File | Usage |
|---|---|---|
| Data Import | import_*.py scripts | NIP verification |

Data Flow

Import Script → Playwright Browser
                        ↓
            ALEO.com Search Page
                        ↓
        Company Search (by NIP)
                        ↓
            Parse HTML Results
                        ↓
        Extract Company Data
                        ↓
    Verify against KRS API
                        ↓
        Save to companies table

🔗 rejestr.io (Web Scraping)

Purpose: Company connections, shareholders, management
Service Files: analyze_connections.py (Playwright integration)
Status: 📋 Planned Enhancement

Configuration

| Parameter | Value |
|---|---|
| Provider | rejestr.io (KRS registry browser) |
| Endpoint | https://rejestr.io/ |
| Authentication | None (public website) |
| Method | Web scraping (Playwright browser automation) |
| Rate Limiting | Self-imposed delays (1-2 seconds) |

Data to Retrieve (Planned)

  • Management board members
  • Shareholders with ownership percentages
  • Beneficial owners
  • Commercial proxies (prokurenci)
  • Links between companies (shared owners/managers)

Planned Database Table

company_people (
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id),
    name VARCHAR(255),
    role VARCHAR(100),  -- e.g. Prezes (president), Członek Zarządu (board member), Wspólnik (partner)
    shares_percent NUMERIC(5,2),
    person_url VARCHAR(500),  -- Link to rejestr.io person page
    created_at TIMESTAMP,
    updated_at TIMESTAMP
)
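Once company_people is populated, the planned "links between companies" feature reduces to grouping rows by person. A minimal sketch follows. It assumes person_url identifies a person uniquely (names alone can collide), and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def shared_person_links(rows: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, int], Set[str]]:
    """Map each pair of company IDs to the person URLs the two companies share.

    rows are (company_id, person_url) tuples, e.g. selected from company_people.
    """
    companies_by_person: Dict[str, Set[int]] = defaultdict(set)
    for company_id, person_url in rows:
        companies_by_person[person_url].add(company_id)

    links: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for person_url, company_ids in companies_by_person.items():
        # Every pair of companies sharing this person becomes connected
        for a, b in combinations(sorted(company_ids), 2):
            links[(a, b)].add(person_url)
    return dict(links)
```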

Integration Points (Planned)

| Feature | File | Status |
|---|---|---|
| Connection Analysis | analyze_connections.py | 📋 Basic implementation exists |
| Company Profile Display | templates/company_detail.html | 📋 Planned |
| Network Visualization | (future) | 📋 Planned |

Authentication Summary

API Key Authentication

| API | Environment Variable | Key Location |
|---|---|---|
| Google Gemini | GOOGLE_GEMINI_API_KEY | Google AI Studio |
| Google PageSpeed | GOOGLE_PAGESPEED_API_KEY | Google Cloud Console |
| Google Places | GOOGLE_PLACES_API_KEY | Google Cloud Console |
| Brave Search | BRAVE_SEARCH_API_KEY | Brave Search API Portal |

OAuth 2.0 Authentication

| API | Flow Type | Environment Variables |
|---|---|---|
| Microsoft Graph | Client Credentials | MICROSOFT_TENANT_ID, MICROSOFT_CLIENT_ID, MICROSOFT_CLIENT_SECRET |

No Authentication

| API | Access Type |
|---|---|
| KRS Open API | Public API |
| ALEO.com | Web scraping (public) |
| rejestr.io | Web scraping (public) |

Rate Limits & Quota Management

Summary Table

| API | Free Tier Quota | Rate Limit | Cost | Tracking |
|---|---|---|---|---|
| Google Gemini | Unlimited (gemini-3-flash-preview) | Built-in | Free (gemini-3-flash-preview); $0.10-$12.00/1M tokens (other models) | ai_api_costs table |
| Google PageSpeed | 25,000 req/day | ~1 req/sec | Free | In-memory counter |
| Google Places | Pay-per-use | No official limit | $0.032/request | 24-hour cache |
| Brave Search | 2,000 req/month | No official limit | Free | None |
| KRS Open API | Unlimited | No official limit | Free | None |
| Microsoft Graph | 10,000 req/10min | Built-in throttling | Included in M365 | None |
| ALEO.com | N/A (scraping) | Self-imposed (1-2s) | Free | None |
| rejestr.io | N/A (scraping) | Self-imposed (1-2s) | Free | None |

Quota Monitoring

Gemini AI - Daily Cost Report:

SELECT
    feature,
    COUNT(*) as calls,
    SUM(total_tokens) as total_tokens,
    SUM(total_cost) as total_cost,
    AVG(latency_ms) as avg_latency_ms
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature
ORDER BY total_cost DESC;

PageSpeed - Remaining Quota:

from scripts.pagespeed_client import GooglePageSpeedClient

client = GooglePageSpeedClient()
remaining = client.get_remaining_quota()
print(f"Remaining quota: {remaining}/25000")

Error Handling Patterns

Common Error Types

1. Authentication Errors

  • Invalid API key
  • Expired credentials
  • Missing environment variables

2. Rate Limiting

  • Quota exceeded (daily/hourly)
  • Too many requests per second
  • Throttling (429 status code)

3. Network Errors

  • Connection timeout
  • DNS resolution failure
  • SSL certificate errors

4. API Errors

  • 400 Bad Request (invalid parameters)
  • 404 Not Found (resource doesn't exist)
  • 500 Internal Server Error (API issue)
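The four error groups above map naturally onto HTTP status codes, and the status decides whether a retry is worthwhile. The sketch below uses hypothetical names. Its retryable set (429, 500, 502, 503, 504) is a common convention, not something specific to these APIs.

```python
from typing import Optional

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status (None = no response received) to an error group."""
    if status is None:
        return "network"          # timeout, DNS or TLS failure
    if status in (401, 403):
        return "authentication"   # fix keys or permissions; retrying will not help
    if status == 429:
        return "rate_limit"       # back off, honouring Retry-After if present
    if 400 <= status < 500:
        return "client"           # e.g. 400/404: fix the request, do not retry
    if status >= 500:
        return "server"           # API-side issue; 500/502/503/504 are worth retrying
    return "ok"


def should_retry(status: Optional[int]) -> bool:
    """Retry network failures and the transient statuses only."""
    return status is None or status in RETRYABLE_STATUSES
```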

Retry Strategy

Exponential Backoff:

import time

max_retries = 3
for attempt in range(max_retries):
    try:
        result = api_client.call()  # api_client and TransientError are placeholders
        break
    except TransientError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # waits 1s, then 2s; the third failure is re-raised
            time.sleep(wait_time)
        else:
            raise

Error Handling Example

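import logging

import requests

logger = logging.getLogger(__name__)

# api_client, params, QuotaExceededError, APIError, queue_for_retry() and
# log_api_call() are project-specific names assumed to be defined elsewhere.
result = None  # set before try so the finally block never sees an unbound name
try:
    result = api_client.call_api(params)
except requests.exceptions.Timeout:
    logger.error("API timeout")
except requests.exceptions.ConnectionError as e:
    logger.error(f"Connection error: {e}")
except QuotaExceededError:
    logger.warning("Quota exceeded, queuing for retry")
    queue_for_retry(params)
except APIError as e:
    logger.error(f"API error: {e.status_code} - {e.message}")
finally:
    log_api_call(success=result is not None)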

Security Considerations

API Key Storage

Best Practices:

  • Store in environment variables
  • Use .env file (NOT committed to git)
  • Rotate keys regularly
  • Use separate keys for dev/prod

Never:

  • Hardcode keys in source code
  • Commit keys to version control
  • Share keys in chat/email
  • Use production keys in development
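Failing fast at startup when a key is missing keeps misconfiguration visible without leaking values. The sketch below uses a hypothetical function name, and its error message lists variable names only. It does not handle the BRAVE_API_KEY alias mentioned in the Brave Search section.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

REQUIRED_KEYS = (
    "GOOGLE_GEMINI_API_KEY",
    "GOOGLE_PAGESPEED_API_KEY",
    "GOOGLE_PLACES_API_KEY",
    "BRAVE_SEARCH_API_KEY",
)


def load_api_keys(names: Sequence[str] = REQUIRED_KEYS,
                  env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read API keys from the environment, failing fast on any missing or empty one.

    The error message names the variables only, never their values, so it is safe to log.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```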

HTTPS/TLS

All APIs use HTTPS:

  • Google APIs: TLS 1.2+
  • Microsoft Graph: TLS 1.2+
  • Brave Search: TLS 1.2+
  • KRS Open API: TLS 1.2+

Secrets Management

Production:

  • Environment variables set in systemd service
  • Restricted file permissions on .env files
  • No secrets in logs or error messages

Development:

  • .env file with restricted permissions (600)
  • Local .env not synced to cloud storage
  • Use test API keys when available

Cost Optimization Strategies

1. Caching

  • Google Places: 24-hour cache in company_website_analysis
  • PageSpeed: Cache results, re-audit only when needed
  • Gemini: Cache common responses (FAQ, greetings)

2. Batch Processing

  • SEO Audits: Run during off-peak hours
  • Social Media Discovery: Process in batches of 10-20
  • News Monitoring: Schedule daily/weekly runs

3. Model Selection

  • Gemini: Use appropriate models for task complexity
    • gemini-3-flash-preview for general use (default, free)
    • gemini-3-pro-preview for complex reasoning (paid)

4. Result Reuse

  • Don't re-analyze unchanged content
  • Check last analysis timestamp before API calls
  • Use force_refresh parameter sparingly

5. Quota Monitoring

  • Daily reports on API usage and costs
  • Alerts when >80% quota used
  • Automatic throttling when approaching limit
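The ">80% quota used" alert could be computed from whatever usage counters are available, such as the ai_api_costs table or the PageSpeed counter. The dataclass and function below are hypothetical sketches, not existing project code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QuotaStatus:
    api: str
    used: int
    limit: int

    @property
    def fraction(self) -> float:
        # A zero limit means "no quota" (e.g. pay-per-use), so never alert
        return self.used / self.limit if self.limit else 0.0


def quota_alerts(statuses: Iterable[QuotaStatus], threshold: float = 0.8) -> List[str]:
    """Return one warning line per API whose usage is above the threshold."""
    return [
        f"{s.api}: {s.used}/{s.limit} ({s.fraction:.0%}) used"
        for s in statuses
        if s.fraction > threshold
    ]
```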

Monitoring & Troubleshooting

Health Checks

Test External API Connectivity:

# Gemini API
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: ${GOOGLE_GEMINI_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'

# PageSpeed API
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://nordabiznes.pl&key=${GOOGLE_PAGESPEED_API_KEY}"

# KRS API
curl "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/0000817317?rejestr=P&format=json"

# Brave Search API
curl -H "X-Subscription-Token: ${BRAVE_SEARCH_API_KEY}" \
  "https://api.search.brave.com/res/v1/web/search?q=test&count=1"

Common Issues

1. Gemini Quota Exceeded

Error: 429 Resource has been exhausted
Solution: Wait for quota reset (hourly/daily) or upgrade to paid tier

2. PageSpeed Timeout

Error: Timeout waiting for PageSpeed response
Solution: Increase timeout, retry later, or skip slow websites

3. Places API 403 Forbidden

Error: This API project is not authorized to use this API
Solution: Enable Places API in Google Cloud Console

4. MS Graph Authentication Failed

Error: AADSTS700016: Application not found in directory
Solution: Verify MICROSOFT_TENANT_ID and MICROSOFT_CLIENT_ID

Diagnostic Commands

Check API Key Configuration:

# Development
grep -E "GOOGLE|BRAVE|MICROSOFT" .env

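# Production (lists www-data's login environment only; variables defined in the systemd unit via Environment= or EnvironmentFile= do not appear here)
sudo -u www-data printenv | grep -E "GOOGLE|BRAVE|MICROSOFT"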

Check Database API Cost Tracking:

-- Gemini API calls today
SELECT
    feature,
    COUNT(*) as calls,
    SUM(total_cost) as cost
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature;

-- Failed API calls
SELECT
    timestamp,
    feature,
    error_message
FROM ai_api_costs
WHERE success = FALSE
ORDER BY timestamp DESC
LIMIT 10;


Maintenance Guidelines

When to Update This Document

  • Adding new external API integration
  • Changing API authentication method
  • Updating rate limits or quotas
  • Modifying data flow patterns
  • Adding new database tables for API data
  • Changing cost tracking or optimization strategies

Update Checklist

  • Update Mermaid diagram with new integration
  • Add detailed section for new API
  • Update authentication summary table
  • Update rate limits & quota table
  • Add integration points
  • Document data flow
  • Add health check commands
  • Update cost optimization strategies

Glossary

| Term | Definition |
|---|---|
| API Key | Secret token for authenticating API requests |
| OAuth 2.0 | Industry-standard protocol for authorization |
| Client Credentials Flow | OAuth flow for service-to-service authentication |
| Rate Limit | Maximum number of API requests allowed per time period |
| Quota | Total allowance for API usage (daily/monthly) |
| Web Scraping | Automated extraction of data from websites |
| Playwright | Browser automation framework for web scraping |
| Exponential Backoff | Retry strategy with increasing delays |
| HTTPS/TLS | Secure protocol for encrypted communication |
| Free Tier | No-cost API usage level with limits |
| Pay-per-use | Pricing model charging per API request |

Document End