External Integrations Architecture
Document Version: 1.0
Last Updated: 2026-01-10
Status: Production LIVE
Diagram Type: External Systems Integration Architecture
Overview
This diagram shows the external APIs and data sources integrated with Norda Biznes Hub. It illustrates:
- 6 major API integrations (Google Gemini, Brave Search, PageSpeed, Places, KRS, MS Graph)
- 2 web scraping sources (ALEO.com, rejestr.io)
- Authentication methods for each integration
- Data flows and usage patterns
- Rate limits and quota management
- Cost tracking and optimization
Abstraction Level: External Integration Architecture
Audience: Developers, DevOps, System Architects
Purpose: Understanding external dependencies, API usage, and integration patterns
Integration Architecture Diagram
graph TB
%% Main system
subgraph "Norda Biznes Hub System"
WebApp["🌐 Flask Web Application<br/>app.py"]
subgraph "Service Layer"
GeminiSvc["🤖 Gemini Service<br/>gemini_service.py"]
ChatSvc["💬 Chat Service<br/>nordabiz_chat.py"]
EmailSvc["📧 Email Service<br/>email_service.py"]
KRSSvc["🏛️ KRS Service<br/>krs_api_service.py"]
GBPSvc["📊 GBP Audit Service<br/>gbp_audit_service.py"]
end
subgraph "Background Scripts"
SEOScript["📊 SEO Audit<br/>scripts/seo_audit.py"]
SocialScript["📱 Social Media Audit<br/>scripts/social_media_audit.py"]
ImportScript["📥 Data Import<br/>scripts/import_*.py"]
end
Database["💾 PostgreSQL Database<br/>localhost:5432"]
end
%% External integrations
subgraph "AI & ML Services"
Gemini["🤖 Google Gemini API<br/>gemini-2.5-flash<br/><br/>Free tier: 200 req/day<br/>Auth: API Key<br/>Cost: $0.075-$5.00/1M tokens"]
end
subgraph "SEO & Analytics"
PageSpeed["📊 Google PageSpeed Insights<br/>v5 API<br/><br/>Free tier: 25,000 req/day<br/>Auth: API Key<br/>Cost: Free"]
Places["📍 Google Places API<br/>Maps Platform<br/><br/>Pay-per-use<br/>Auth: API Key<br/>Cost: $0.032/request"]
end
subgraph "Search & Discovery"
BraveAPI["🔍 Brave Search API<br/>Web & News Search<br/><br/>Free tier: 2,000 req/month<br/>Auth: API Key<br/>Cost: Free"]
end
subgraph "Data Sources"
KRS["🏛️ KRS Open API<br/>Ministry of Justice Poland<br/><br/>No limits (public API)<br/>Auth: None<br/>Cost: Free"]
ALEO["🌐 ALEO.com<br/>NIP Verification Service<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
Rejestr["🔗 rejestr.io<br/>Company Connections<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
end
subgraph "Communication"
MSGraph["📧 Microsoft Graph API<br/>Email & Notifications<br/><br/>10,000 req/10min<br/>Auth: OAuth 2.0 Client Credentials<br/>Cost: Included in M365"]
end
%% Service layer connections to external APIs
GeminiSvc -->|"HTTPS POST<br/>generateContent<br/>API Key: GOOGLE_GEMINI_API_KEY"| Gemini
ChatSvc --> GeminiSvc
GBPSvc -->|"Generate AI recommendations"| GeminiSvc
KRSSvc -->|"HTTPS GET<br/>OdpisAktualny/{krs}<br/>Public API (no auth)"| KRS
EmailSvc -->|"HTTPS POST<br/>users/{id}/sendMail<br/>OAuth 2.0 + Client Credentials"| MSGraph
GBPSvc -->|"HTTPS GET<br/>findplacefromtext<br/>placedetails<br/>API Key: GOOGLE_PLACES_API_KEY"| Places
%% Script connections to external APIs
SEOScript -->|"HTTPS GET<br/>runPagespeed<br/>API Key: GOOGLE_PAGESPEED_API_KEY<br/>Quota tracking: 25K/day"| PageSpeed
SocialScript -->|"HTTPS GET<br/>web/search<br/>news/search<br/>API Key: BRAVE_SEARCH_API_KEY"| BraveAPI
SocialScript -->|"Fallback for reviews"| Places
ImportScript -->|"NIP verification<br/>Playwright browser automation"| ALEO
ImportScript -->|"Company connections<br/>Playwright browser automation"| Rejestr
ImportScript --> KRSSvc
%% Data flows back to database
GeminiSvc -->|"Log costs<br/>ai_api_costs table"| Database
SEOScript -->|"Store metrics<br/>company_website_analysis"| Database
SocialScript -->|"Store profiles<br/>company_social_media"| Database
GBPSvc -->|"Store audit results<br/>gbp_audits"| Database
ImportScript -->|"Import companies<br/>companies table"| Database
%% Web app connections
WebApp --> ChatSvc
WebApp --> EmailSvc
WebApp --> KRSSvc
WebApp --> GBPSvc
WebApp --> GeminiSvc
WebApp --> Database
%% Styling
classDef serviceStyle fill:#85bbf0,stroke:#5d92c7,color:#000000,stroke-width:2px
classDef scriptStyle fill:#ffd93d,stroke:#ccae31,color:#000000,stroke-width:2px
classDef aiStyle fill:#ff6b9d,stroke:#cc5579,color:#ffffff,stroke-width:3px
classDef seoStyle fill:#c44569,stroke:#9d3754,color:#ffffff,stroke-width:3px
classDef searchStyle fill:#6a89cc,stroke:#5570a3,color:#ffffff,stroke-width:3px
classDef dataStyle fill:#4a69bd,stroke:#3b5497,color:#ffffff,stroke-width:3px
classDef commStyle fill:#20bf6b,stroke:#1a9956,color:#ffffff,stroke-width:3px
classDef dbStyle fill:#438dd5,stroke:#2e6295,color:#ffffff,stroke-width:3px
classDef appStyle fill:#1168bd,stroke:#0b4884,color:#ffffff,stroke-width:3px
class GeminiSvc,ChatSvc,EmailSvc,KRSSvc,GBPSvc serviceStyle
class SEOScript,SocialScript,ImportScript scriptStyle
class Gemini aiStyle
class PageSpeed,Places seoStyle
class BraveAPI searchStyle
class KRS,ALEO,Rejestr dataStyle
class MSGraph commStyle
class Database dbStyle
class WebApp appStyle
External Integration Details
🤖 Google Gemini API
Purpose: AI-powered text generation, chat, and image analysis
Service Files: gemini_service.py, nordabiz_chat.py, gbp_audit_service.py
Status: ✅ Production (Free Tier)
Configuration
| Parameter | Value |
|---|---|
| Provider | Google Generative AI |
| Endpoint | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| Authentication | API Key |
| Environment Variable | GOOGLE_GEMINI_API_KEY |
| Default Model | gemini-2.5-flash |
| Timeout | None (default) |
Available Models
GEMINI_MODELS = {
    'flash': 'gemini-2.5-flash',            # Best for general use
    'flash-lite': 'gemini-2.5-flash-lite',  # Ultra cheap
    'pro': 'gemini-2.5-pro',                # High quality
    'flash-2.0': 'gemini-2.0-flash',        # 1M context window
}
Pricing (per 1M tokens)
| Model | Input Cost | Output Cost |
|---|---|---|
| gemini-2.5-flash | $0.075 | $0.30 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.0-flash | $0.075 | $0.30 |
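A minimal sketch of how the per-call cost could be derived from token counts, using the prices in the table above. The function name `estimate_cost` and the returned keys are assumptions chosen to mirror the `input_cost`, `output_cost`, and `total_cost` columns; the real `gemini_service.py` may compute this differently.

```python
from decimal import Decimal
from typing import Dict

# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICING_PER_MILLION = {
    "gemini-2.5-flash": (Decimal("0.075"), Decimal("0.30")),
    "gemini-2.5-flash-lite": (Decimal("0.10"), Decimal("0.40")),
    "gemini-2.5-pro": (Decimal("1.25"), Decimal("5.00")),
    "gemini-2.0-flash": (Decimal("0.075"), Decimal("0.30")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Dict[str, Decimal]:
    """Hypothetical helper: convert token counts into USD costs."""
    input_price, output_price = PRICING_PER_MILLION[model]
    input_cost = input_price * input_tokens / MILLION
    output_cost = output_price * output_tokens / MILLION
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }


if __name__ == "__main__":
    print(estimate_cost("gemini-2.5-pro", input_tokens=2000, output_tokens=500))
```

`Decimal` is used instead of `float` so that very small per-call amounts add up without rounding drift when summed in reports.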
Rate Limits
- Free Tier: 200 requests/day, 50 requests/hour
- Token Limits: Model-dependent (1M for flash-2.0)
Integration Points
| Feature | File | Function |
|---|---|---|
| AI Chat | nordabiz_chat.py | NordaBizChatEngine.chat() |
| GBP Recommendations | gbp_audit_service.py | generate_ai_recommendations() |
| Text Generation | gemini_service.py | generate_text() |
| Image Analysis | gemini_service.py | analyze_image() |
Cost Tracking
All API calls logged to ai_api_costs table:
ai_api_costs (
id, timestamp, api_provider, model_name,
feature, user_id,
input_tokens, output_tokens, total_tokens,
input_cost, output_cost, total_cost,
success, error_message, latency_ms,
prompt_hash
)
Data Flow
User Message → ChatService → GeminiService
↓
Google Gemini API
↓
Response + Token Count
↓
Cost Calculation → ai_api_costs
↓
Response to User
🏛️ KRS Open API
Purpose: Official company data from Polish National Court Register
Service Files: krs_api_service.py
Status: ✅ Production
Configuration
| Parameter | Value |
|---|---|
| Provider | Ministry of Justice Poland |
| Endpoint | https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs} |
| Authentication | None (public API) |
| Timeout | 15 seconds |
| Response Format | JSON |
Rate Limits
- Official Limit: None documented
- Best Practice: 1-2 second delays between requests
- Timeout: 15 seconds configured
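The request URL and the recommended delay between calls can be sketched as follows. The endpoint and the `rejestr` and `format` query parameters come from this document; the helper names `build_krs_url` and `Throttle`, the zero-padding of the KRS number to 10 digits, and the 1.5-second default are illustrative assumptions, not the contents of `krs_api_service.py`.

```python
import time
from urllib.parse import urlencode

KRS_BASE_URL = "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny"


def build_krs_url(krs_number: str, register: str = "P") -> str:
    """Hypothetical helper: build the OdpisAktualny URL for one KRS number."""
    padded = krs_number.strip().zfill(10)  # assumption: numbers are zero-padded to 10 digits
    if len(padded) != 10 or not padded.isdigit():
        raise ValueError(f"not a valid KRS number: {krs_number!r}")
    query = urlencode({"rejestr": register, "format": "json"})
    return f"{KRS_BASE_URL}/{padded}?{query}"


class Throttle:
    """Hypothetical helper: enforce a minimum interval between calls."""

    def __init__(self, min_interval: float = 1.5, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        delay = 0.0
        now = self._clock()
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    print(build_krs_url("817317"))
```

In a batch job, `throttle.wait()` would be called before each HTTP request, with the 15-second timeout passed to whichever HTTP client is in use.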
Data Retrieved
- Basic identifiers (KRS, NIP, REGON)
- Company name (full and shortened)
- Legal form (Sp. z o.o., S.A., etc.)
- Full address (street, city, voivodeship)
- Share capital and currency
- Registration dates
- Management board (anonymized in Open API)
- Shareholders (anonymized in Open API)
- Business activities
- OPP status (Organizacja Pożytku Publicznego)
Integration Points
| Feature | File | Usage |
|---|---|---|
| Data Import | import_*.py scripts | Company verification |
| Manual Verification | verify_all_companies_data.py | Batch verification |
| API Endpoint | app.py | /api/verify-krs |
Data Flow
Import Script → KRSService.get_company_from_krs()
↓
KRS Open API
↓
KRSCompanyData (dataclass)
↓
Verification & Validation
↓
Update companies table
📊 Google PageSpeed Insights API
Purpose: SEO, performance, accessibility, and best practices analysis
Service Files: scripts/seo_audit.py, scripts/pagespeed_client.py
Status: ✅ Production
Configuration
| Parameter | Value |
|---|---|
| Provider | Google PageSpeed Insights |
| Endpoint | https://www.googleapis.com/pagespeedonline/v5/runPagespeed |
| Authentication | API Key |
| Environment Variable | GOOGLE_PAGESPEED_API_KEY |
| Google Cloud Project | NORDABIZNES (gen-lang-client-0540794446) |
| Timeout | 30 seconds |
| Strategy | Mobile (default), Desktop (optional) |
Rate Limits
- Free Tier: 25,000 queries/day
- Per-Second: Recommended 1 query/second
- Quota Tracking: In-memory counter in pagespeed_client.py
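An in-memory daily counter of the kind mentioned above might look like the sketch below. The class name `DailyQuotaCounter` and its methods are hypothetical and are not the API of `pagespeed_client.py`. Two limitations are worth noting: the count is lost when the process restarts, and the reset follows the local calendar date, which may not coincide with the provider's own quota reset.

```python
from datetime import date
from typing import Callable


class DailyQuotaCounter:
    """Hypothetical in-memory counter that resets when the calendar day changes."""

    def __init__(self, daily_limit: int = 25_000, today: Callable[[], date] = date.today):
        self.daily_limit = daily_limit
        self._today = today
        self._day = today()
        self._used = 0

    def _roll_over(self) -> None:
        # Start a fresh count if the date changed since the last call.
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def try_consume(self, n: int = 1) -> bool:
        """Reserve n requests; return False if that would exceed the limit."""
        self._roll_over()
        if self._used + n > self.daily_limit:
            return False
        self._used += n
        return True

    def remaining(self) -> int:
        self._roll_over()
        return self.daily_limit - self._used


if __name__ == "__main__":
    counter = DailyQuotaCounter()
    counter.try_consume()
    print(f"Remaining quota: {counter.remaining()}/{counter.daily_limit}")
```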
Metrics Returned
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSpeedScores:
    seo: int                  # 0-100 SEO score
    performance: int          # 0-100 Performance score
    accessibility: int        # 0-100 Accessibility score
    best_practices: int       # 0-100 Best Practices score
    pwa: Optional[int]        # 0-100 PWA score


@dataclass
class CoreWebVitals:
    lcp_ms: Optional[int]     # Largest Contentful Paint
    fid_ms: Optional[int]     # First Input Delay
    cls: Optional[float]      # Cumulative Layout Shift
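The PageSpeed v5 response reports each Lighthouse category score as a fraction between 0 and 1 under `lighthouseResult.categories`. The sketch below shows one way to turn that into the 0-100 integers used above. The function name `extract_scores` is hypothetical, and the code assumes the request asked for every category of interest; categories missing from the response come back as `None`.

```python
from typing import Any, Dict, Optional

# Maps the field names used in this document to Lighthouse category keys.
CATEGORY_KEYS = {
    "seo": "seo",
    "performance": "performance",
    "accessibility": "accessibility",
    "best_practices": "best-practices",
    "pwa": "pwa",
}


def extract_scores(response: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Hypothetical helper: convert 0-1 category scores to 0-100 integers."""
    categories = response.get("lighthouseResult", {}).get("categories", {})
    scores: Dict[str, Optional[int]] = {}
    for field, key in CATEGORY_KEYS.items():
        raw = (categories.get(key) or {}).get("score")
        scores[field] = None if raw is None else round(raw * 100)
    return scores


if __name__ == "__main__":
    sample = {"lighthouseResult": {"categories": {"seo": {"score": 0.92}}}}
    print(extract_scores(sample))
```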
Database Storage
Results saved to company_website_analysis:
company_website_analysis (
company_id PRIMARY KEY,
analyzed_at,
pagespeed_seo_score,
pagespeed_performance_score,
pagespeed_accessibility_score,
pagespeed_best_practices_score,
pagespeed_audits JSONB,
largest_contentful_paint_ms,
first_input_delay_ms,
cumulative_layout_shift,
seo_overall_score,
seo_health_score,
seo_issues JSONB
)
Integration Points
| Feature | File | Endpoint/Function |
|---|---|---|
| Admin Dashboard | app.py | /admin/seo |
| Audit Script | scripts/seo_audit.py | CLI tool |
| Batch Audits | scripts/seo_audit.py | SEOAuditor.run_audit() |
Data Flow
Admin Trigger → SEO Audit Script
↓
PageSpeed API
↓
Scores + Core Web Vitals + Audits
↓
company_website_analysis table
↓
Admin Dashboard Display
📍 Google Places API
Purpose: Business profiles, ratings, reviews, and opening hours
Service Files: gbp_audit_service.py, scripts/social_media_audit.py
Status: ✅ Production
Configuration
| Parameter | Value |
|---|---|
| Provider | Google Maps Platform |
| Endpoints | Find Place from Text, Place Details |
| Authentication | API Key |
| Environment Variable | GOOGLE_PLACES_API_KEY |
| Timeout | 15 seconds |
| Language | Polish (pl) |
Cost
- Pricing Model: Pay-per-use
- Cost per Request: ~$0.032 per Place Details call
- Optimization: 24-hour cache in database
Endpoints Used
1. Find Place from Text
https://maps.googleapis.com/maps/api/place/findplacefromtext/json
2. Place Details
https://maps.googleapis.com/maps/api/place/details/json
Data Retrieved
{
'google_place_id': str, # Unique Place ID
'google_name': str, # Business name
'google_address': str, # Formatted address
'google_phone': str, # Phone number
'google_website': str, # Website URL
'google_types': List[str], # Business categories
'google_maps_url': str, # Google Maps link
'google_rating': Decimal, # Rating (1.0-5.0)
'google_reviews_count': int, # Number of reviews
'google_photos_count': int, # Number of photos
'google_opening_hours': dict, # Opening hours
'google_business_status': str # OPERATIONAL, CLOSED, etc.
}
Cache Strategy
- Cache Duration: 24 hours
- Storage: company_website_analysis.analyzed_at
- Force Refresh: force_refresh=True parameter
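The cache decision described above reduces to a timestamp comparison. This is a minimal sketch: the function name `should_call_places_api` is hypothetical, and it assumes `analyzed_at` is stored as a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(hours=24)


def should_call_places_api(
    analyzed_at: Optional[datetime],
    now: Optional[datetime] = None,
    force_refresh: bool = False,
) -> bool:
    """Hypothetical helper: True when cached data is missing, stale, or bypassed."""
    if force_refresh or analyzed_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    # Cached data younger than 24 hours counts as a cache hit.
    return now - analyzed_at >= CACHE_TTL


if __name__ == "__main__":
    last_run = datetime.now(timezone.utc) - timedelta(hours=3)
    print(should_call_places_api(last_run))
```

Since each Place Details call is billed, a cache hit here directly avoids a charge.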
Integration Points
| Feature | File | Function |
|---|---|---|
| GBP Audit | gbp_audit_service.py | fetch_google_business_data() |
| Social Media Audit | scripts/social_media_audit.py | GooglePlacesSearcher |
| Admin Dashboard | app.py | /admin/gbp |
Data Flow
Admin/Script Trigger → GBPService.fetch_google_business_data()
↓
Check cache (< 24h old?)
↓
[Cache miss] → Places API
↓
Business Profile Data (JSON)
↓
company_website_analysis table
↓
Display in Admin Panel
🔍 Brave Search API
Purpose: News monitoring, social media discovery, web search
Service Files: scripts/social_media_audit.py
Status: ✅ Production (Social Media), 📋 Planned (News Monitoring)
Configuration
| Parameter | Value |
|---|---|
| Provider | Brave Search |
| Endpoint (Web) | https://api.search.brave.com/res/v1/web/search |
| Endpoint (News) | https://api.search.brave.com/res/v1/news/search |
| Authentication | API Key |
| Environment Variable | BRAVE_SEARCH_API_KEY or BRAVE_API_KEY |
| Timeout | 15 seconds |
Rate Limits
- Free Tier: 2,000 requests/month
- Per-Second: No official limit
- Recommended: 0.5-1 second delay
Current Usage: Social Media Discovery
# Search for social media profiles
params = {
"q": f'"{company_name}" {city} facebook OR instagram',
"count": 10,
"country": "pl",
"search_lang": "pl"
}
Planned Usage: News Monitoring
# News search (from CLAUDE.md)
params = {
"q": f'"{company_name}" OR "{nip}"',
"count": 10,
"freshness": "pw", # past week
"country": "pl",
"search_lang": "pl"
}
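The two parameter sets above could be turned into requests as sketched below. The endpoints and the `X-Subscription-Token` header come from this document; the helper names are hypothetical, and the code only builds the URL and headers so it can be paired with any HTTP client.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlencode

BRAVE_WEB_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"
BRAVE_NEWS_ENDPOINT = "https://api.search.brave.com/res/v1/news/search"


def social_media_params(company_name: str, city: str) -> Dict[str, Any]:
    """Query parameters for social media discovery (current usage)."""
    return {
        "q": f'"{company_name}" {city} facebook OR instagram',
        "count": 10,
        "country": "pl",
        "search_lang": "pl",
    }


def news_params(company_name: str, nip: str) -> Dict[str, Any]:
    """Query parameters for news monitoring (planned usage)."""
    return {
        "q": f'"{company_name}" OR "{nip}"',
        "count": 10,
        "freshness": "pw",  # past week
        "country": "pl",
        "search_lang": "pl",
    }


def build_brave_request(endpoint: str, params: Dict[str, Any], api_key: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: return (url, headers) for a Brave Search GET request."""
    headers = {"Accept": "application/json", "X-Subscription-Token": api_key}
    return f"{endpoint}?{urlencode(params)}", headers


if __name__ == "__main__":
    url, _ = build_brave_request(BRAVE_WEB_ENDPOINT, social_media_params("Acme", "Gdynia"), "test-key")
    print(url)
```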
Pattern Extraction
Social Media URLs: Regex patterns for:
- Facebook: facebook.com/[username]
- Instagram: instagram.com/[username]
- LinkedIn: linkedin.com/company/[name]
- YouTube: youtube.com/@[channel]
- Twitter/X: twitter.com/[username] or x.com/[username]
- TikTok: tiktok.com/@[username]
Google Reviews: Patterns:
- "4,5 (123 opinii)"
- "Rating: 4.5 · 123 reviews"
Integration Points
| Feature | File | Status |
|---|---|---|
| Social Media Discovery | scripts/social_media_audit.py | ✅ Implemented |
| Google Reviews Fallback | scripts/social_media_audit.py | ✅ Implemented |
| News Monitoring | (Planned) | 📋 Pending |
Data Flow
Social Media Audit Script → Brave Search API
↓
Web Search Results (JSON)
↓
Pattern Extraction (regex)
↓
Social Media URLs (Facebook, Instagram, etc.)
↓
company_social_media table
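The pattern extraction step in this flow can be sketched with a handful of regular expressions. The expressions below are illustrative assumptions written from the URL shapes and review snippets listed in this section; the patterns in `scripts/social_media_audit.py` may differ, and real search results usually need extra filtering (for example, skipping share or login links).

```python
import re
from typing import Dict, Optional, Tuple

# One pattern per platform; group 1 captures the profile handle.
SOCIAL_PATTERNS = {
    "facebook": re.compile(r"https?://(?:www\.)?facebook\.com/([A-Za-z0-9_.\-]+)", re.I),
    "instagram": re.compile(r"https?://(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)", re.I),
    "linkedin": re.compile(r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/company/([A-Za-z0-9_\-%]+)", re.I),
    "youtube": re.compile(r"https?://(?:www\.)?youtube\.com/@([A-Za-z0-9_.\-]+)", re.I),
    "twitter": re.compile(r"https?://(?:www\.)?(?:twitter|x)\.com/([A-Za-z0-9_]+)", re.I),
    "tiktok": re.compile(r"https?://(?:www\.)?tiktok\.com/@([A-Za-z0-9_.]+)", re.I),
}

# Review snippets such as "4,5 (123 opinii)" or "Rating: 4.5 · 123 reviews".
REVIEW_PATTERNS = [
    re.compile(r"(\d[.,]\d)\s*\((\d+)\s+opinii\)"),
    re.compile(r"Rating:\s*(\d[.,]\d)\s*·\s*(\d+)\s+reviews"),
]


def extract_social_profiles(text: str) -> Dict[str, str]:
    """Return the first handle found for each platform."""
    found: Dict[str, str] = {}
    for platform, pattern in SOCIAL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[platform] = match.group(1)
    return found


def extract_google_rating(text: str) -> Optional[Tuple[float, int]]:
    """Return (rating, review_count) from a snippet, or None if absent."""
    for pattern in REVIEW_PATTERNS:
        match = pattern.search(text)
        if match:
            return float(match.group(1).replace(",", ".")), int(match.group(2))
    return None


if __name__ == "__main__":
    print(extract_social_profiles("See https://www.facebook.com/acme.gdynia today"))
    print(extract_google_rating("4,5 (123 opinii)"))
```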
📧 Microsoft Graph API
Purpose: Email notifications via Microsoft 365
Service Files: email_service.py
Status: ✅ Production
Configuration
| Parameter | Value |
|---|---|
| Provider | Microsoft Graph API |
| Endpoint | https://graph.microsoft.com/v1.0 |
| Authentication | OAuth 2.0 Client Credentials Flow |
| Authority | https://login.microsoftonline.com/{tenant_id} |
| Scope | https://graph.microsoft.com/.default |
Environment Variables
MICROSOFT_TENANT_ID=<Azure AD Tenant ID>
MICROSOFT_CLIENT_ID=<Application Client ID>
MICROSOFT_CLIENT_SECRET=<Client Secret Value>
MICROSOFT_MAIL_FROM=noreply@nordabiznes.pl
Authentication Flow
1. Client Credentials Flow (Application permissions)
- No user interaction required
- Service-to-service authentication
- Uses client ID + client secret
2. Token Acquisition
app = msal.ConfidentialClientApplication(
    client_id,
    authority=f"https://login.microsoftonline.com/{tenant_id}",
    client_credential=client_secret,
)
result = app.acquire_token_for_client(
    scopes=["https://graph.microsoft.com/.default"]
)
3. Token Caching
- MSAL library handles caching
- Tokens cached for ~1 hour
- Automatic refresh when expired
Required Azure AD Permissions
Application Permissions (requires admin consent):
- Mail.Send: Send mail as any user
Rate Limits
- Mail.Send: 10,000 requests per 10 minutes per app
- Throttling: 429 Too Many Requests (retry with backoff)
Integration Points
| Feature | File | Usage |
|---|---|---|
| User Registration | app.py | Send welcome email |
| Password Reset | app.py | Send reset link |
| Notifications | app.py | News approval notifications |
Data Flow
App Trigger → EmailService.send_mail()
↓
MSAL Token Acquisition (cached)
↓
Microsoft Graph API
↓
POST /users/{id}/sendMail
↓
Email Sent via M365
↓
Success/Failure Response
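The body of the `POST /users/{id}/sendMail` call in this flow is a JSON document with a `message` object. The sketch below builds that URL and payload; the function name `build_send_mail_request` is hypothetical, and setting `saveToSentItems` to false is an illustrative choice, not something this document specifies. Acquiring the bearer token and sending the request are left to the caller.

```python
import json
from typing import Any, Dict, List, Tuple

GRAPH_BASE_URL = "https://graph.microsoft.com/v1.0"


def build_send_mail_request(
    sender: str,
    recipients: List[str],
    subject: str,
    html_body: str,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (url, payload) for a Graph sendMail call."""
    url = f"{GRAPH_BASE_URL}/users/{sender}/sendMail"
    payload = {
        "message": {
            "subject": subject,
            "body": {"contentType": "HTML", "content": html_body},
            "toRecipients": [{"emailAddress": {"address": addr}} for addr in recipients],
        },
        "saveToSentItems": False,  # assumption: notification mail is not kept in Sent Items
    }
    return url, payload


if __name__ == "__main__":
    url, payload = build_send_mail_request(
        "noreply@nordabiznes.pl", ["user@example.com"], "Welcome", "<p>Hello</p>"
    )
    print(url)
    print(json.dumps(payload, indent=2))
```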
🌐 ALEO.com (Web Scraping)
Purpose: NIP verification and company data enrichment
Service Files: scripts/import_*.py (Playwright integration)
Status: ✅ Production (Limited Use)
Configuration
| Parameter | Value |
|---|---|
| Provider | ALEO.com (Polish business directory) |
| Endpoint | https://www.aleo.com/ |
| Authentication | None (public website) |
| Method | Web scraping (Playwright browser automation) |
| Rate Limiting | Self-imposed delays (1-2 seconds) |
Data Retrieved
- Company NIP verification
- Company name
- Address
- Business category
- Basic contact information
Best Practices
- Rate Limiting: 1-2 second delays between requests
- User Agent: Standard browser user agent
- Error Handling: Handle missing elements gracefully
- Caching: Cache results to minimize requests
Integration Points
| Feature | File | Usage |
|---|---|---|
| Data Import | import_*.py scripts | NIP verification |
Data Flow
Import Script → Playwright Browser
↓
ALEO.com Search Page
↓
Company Search (by NIP)
↓
Parse HTML Results
↓
Extract Company Data
↓
Verify against KRS API
↓
Save to companies table
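Before a browser session is started for a lookup, a NIP can be checked locally against its standard checksum (weights 6, 5, 7, 2, 3, 4, 5, 6, 7, modulo 11). This pre-check is a suggestion and is not described as part of the import scripts; the function names are hypothetical.

```python
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def normalize_nip(raw: str) -> str:
    """Keep digits only, so 'PL 526-025-02-74' becomes '5260250274'."""
    return "".join(ch for ch in raw if ch.isdigit())


def is_valid_nip(raw: str) -> bool:
    """Hypothetical helper: validate the 10-digit NIP checksum."""
    nip = normalize_nip(raw)
    if len(nip) != 10:
        return False
    # Weighted sum of the first nine digits, modulo 11, must equal the tenth digit.
    checksum = sum(w * int(d) for w, d in zip(NIP_WEIGHTS, nip)) % 11
    return checksum != 10 and checksum == int(nip[9])


if __name__ == "__main__":
    for candidate in ("526-025-02-74", "1234563219"):
        print(candidate, is_valid_nip(candidate))
```

Rejecting malformed numbers up front avoids spending a scraping request, and the associated delay, on input that cannot match any company.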
🔗 rejestr.io (Web Scraping)
Purpose: Company connections, shareholders, management
Service Files: analyze_connections.py (Playwright integration)
Status: 📋 Planned Enhancement
Configuration
| Parameter | Value |
|---|---|
| Provider | rejestr.io (KRS registry browser) |
| Endpoint | https://rejestr.io/ |
| Authentication | None (public website) |
| Method | Web scraping (Playwright browser automation) |
| Rate Limiting | Self-imposed delays (1-2 seconds) |
Data to Retrieve (Planned)
- Management board members
- Shareholders with ownership percentages
- Beneficial owners
- Prokurents (proxies)
- Links between companies (shared owners/managers)
Planned Database Table
company_people (
id SERIAL PRIMARY KEY,
company_id INTEGER REFERENCES companies(id),
name VARCHAR(255),
role VARCHAR(100), -- Prezes, Członek Zarządu, Wspólnik
shares_percent NUMERIC(5,2),
person_url VARCHAR(500), -- Link to rejestr.io person page
created_at TIMESTAMP,
updated_at TIMESTAMP
)
Integration Points (Planned)
| Feature | File | Status |
|---|---|---|
| Connection Analysis | analyze_connections.py | 📋 Basic implementation exists |
| Company Profile Display | templates/company_detail.html | 📋 Planned |
| Network Visualization | (Future) | 📋 Planned |
Authentication Summary
API Key Authentication
| API | Environment Variable | Key Location |
|---|---|---|
| Google Gemini | GOOGLE_GEMINI_API_KEY | Google AI Studio |
| Google PageSpeed | GOOGLE_PAGESPEED_API_KEY | Google Cloud Console |
| Google Places | GOOGLE_PLACES_API_KEY | Google Cloud Console |
| Brave Search | BRAVE_SEARCH_API_KEY | Brave Search API Portal |
OAuth 2.0 Authentication
| API | Flow Type | Environment Variables |
|---|---|---|
| Microsoft Graph | Client Credentials | MICROSOFT_TENANT_ID, MICROSOFT_CLIENT_ID, MICROSOFT_CLIENT_SECRET |
No Authentication
| API | Access Type |
|---|---|
| KRS Open API | Public API |
| ALEO.com | Web scraping (public) |
| rejestr.io | Web scraping (public) |
Rate Limits & Quota Management
Summary Table
| API | Free Tier Quota | Rate Limit | Cost | Tracking |
|---|---|---|---|---|
| Google Gemini | 200 req/day, 50 req/hour | Built-in | $0.075-$5.00/1M tokens | ai_api_costs table |
| Google PageSpeed | 25,000 req/day | ~1 req/sec | Free | In-memory counter |
| Google Places | Pay-per-use | No official limit | $0.032/request | 24-hour cache |
| Brave Search | 2,000 req/month | No official limit | Free | None |
| KRS Open API | Unlimited | No official limit | Free | None |
| Microsoft Graph | 10,000 req/10min | Built-in throttling | Included in M365 | None |
| ALEO.com | N/A (scraping) | Self-imposed (1-2s) | Free | None |
| rejestr.io | N/A (scraping) | Self-imposed (1-2s) | Free | None |
Quota Monitoring
Gemini AI - Daily Cost Report:
SELECT
feature,
COUNT(*) as calls,
SUM(total_tokens) as total_tokens,
SUM(total_cost) as total_cost,
AVG(latency_ms) as avg_latency_ms
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature
ORDER BY total_cost DESC;
PageSpeed - Remaining Quota:
from scripts.pagespeed_client import GooglePageSpeedClient
client = GooglePageSpeedClient()
remaining = client.get_remaining_quota()
print(f"Remaining quota: {remaining}/{25000}")
Error Handling Patterns
Common Error Types
1. Authentication Errors
- Invalid API key
- Expired credentials
- Missing environment variables
2. Rate Limiting
- Quota exceeded (daily/hourly)
- Too many requests per second
- Throttling (429 status code)
3. Network Errors
- Connection timeout
- DNS resolution failure
- SSL certificate errors
4. API Errors
- 400 Bad Request (invalid parameters)
- 404 Not Found (resource doesn't exist)
- 500 Internal Server Error (API issue)
Retry Strategy
Exponential Backoff:
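import time

# api_client and TransientError are placeholders for the real client and exception type
max_retries = 3
for attempt in range(max_retries):
    try:
        result = api_client.call()
        break
    except TransientError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # waits 1s, then 2s, before the final attempt
            time.sleep(wait_time)
        else:
            raise  # out of retries: propagate the last error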
Error Handling Example
result = None  # ensures the finally block works on every path
try:
    result = api_client.call_api(params)
except requests.exceptions.Timeout:
    logger.error("API timeout")
except requests.exceptions.ConnectionError as e:
    logger.error(f"Connection error: {e}")
except QuotaExceededError:
    logger.warning("Quota exceeded, queuing for retry")
    queue_for_retry(params)
except APIError as e:
    logger.error(f"API error: {e.status_code} - {e.message}")
finally:
    log_api_call(success=result is not None)
Security Considerations
API Key Storage
✅ Best Practices:
- Store in environment variables
- Use .env file (NOT committed to git)
- Rotate keys regularly
- Use separate keys for dev/prod
❌ Never:
- Hardcode keys in source code
- Commit keys to version control
- Share keys in chat/email
- Use production keys in development
HTTPS/TLS
All APIs use HTTPS:
- Google APIs: TLS 1.2+
- Microsoft Graph: TLS 1.2+
- Brave Search: TLS 1.2+
- KRS Open API: TLS 1.2+
Secrets Management
Production:
- Environment variables set in systemd service
- Restricted file permissions on .env files
- No secrets in logs or error messages
Development:
- .env file with restricted permissions (600)
- Local .env not synced to cloud storage
- Use test API keys when available
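A startup check along the following lines can report missing configuration without ever printing a secret value. The variable names are those listed in this document, with `BRAVE_API_KEY` accepted as the alternative to `BRAVE_SEARCH_API_KEY`; the function name `missing_env_vars` is hypothetical.

```python
import os
from typing import List, Mapping, Tuple

# Each entry lists the names that satisfy one requirement (any one is enough).
REQUIRED_VARS: Tuple[Tuple[str, ...], ...] = (
    ("GOOGLE_GEMINI_API_KEY",),
    ("GOOGLE_PAGESPEED_API_KEY",),
    ("GOOGLE_PLACES_API_KEY",),
    ("BRAVE_SEARCH_API_KEY", "BRAVE_API_KEY"),
    ("MICROSOFT_TENANT_ID",),
    ("MICROSOFT_CLIENT_ID",),
    ("MICROSOFT_CLIENT_SECRET",),
    ("MICROSOFT_MAIL_FROM",),
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of unmet requirements; values are never returned or logged."""
    missing = []
    for names in REQUIRED_VARS:
        if not any(env.get(name, "").strip() for name in names):
            missing.append(" or ".join(names))
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        print("Missing configuration:", ", ".join(problems))
    else:
        print("All external API credentials are configured.")
```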
Cost Optimization Strategies
1. Caching
- Google Places: 24-hour cache in company_website_analysis
- PageSpeed: Cache results, re-audit only when needed
- Gemini: Cache common responses (FAQ, greetings)
2. Batch Processing
- SEO Audits: Run during off-peak hours
- Social Media Discovery: Process in batches of 10-20
- News Monitoring: Schedule daily/weekly runs
3. Model Selection
- Gemini: Use cheaper models where appropriate
  - gemini-2.5-flash-lite for simple tasks
  - gemini-2.5-flash for general use
  - gemini-2.5-pro only for complex reasoning
4. Result Reuse
- Don't re-analyze unchanged content
- Check last analysis timestamp before API calls
- Use force_refresh parameter sparingly
5. Quota Monitoring
- Daily reports on API usage and costs
- Alerts when >80% quota used
- Automatic throttling when approaching limit
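The alerting and throttling rules above can be expressed as a small decision function. The 80% alert threshold comes from this section; the 95% throttle threshold, the function name `quota_action`, and the returned labels are assumptions made for illustration.

```python
def quota_action(used: int, limit: int, warn_percent: int = 80, throttle_percent: int = 95) -> str:
    """Hypothetical helper: classify quota usage as ok, alert, throttle, or blocked."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used >= limit:
        return "blocked"   # quota exhausted: stop calling the API
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    if used * 100 > limit * throttle_percent:
        return "throttle"  # assumption: slow down above 95% usage
    if used * 100 > limit * warn_percent:
        return "alert"     # more than 80% of quota used
    return "ok"


if __name__ == "__main__":
    # Example with the Brave Search monthly quota of 2,000 requests.
    for used in (100, 1601, 1950, 2000):
        print(used, quota_action(used, 2000))
```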
Monitoring & Troubleshooting
Health Checks
Test External API Connectivity:
# Gemini API
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${GOOGLE_GEMINI_API_KEY}" \
-H 'Content-Type: application/json' \
-d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
# PageSpeed API
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://nordabiznes.pl&key=${GOOGLE_PAGESPEED_API_KEY}"
# KRS API
curl "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/0000817317?rejestr=P&format=json"
# Brave Search API
curl -H "X-Subscription-Token: ${BRAVE_SEARCH_API_KEY}" \
"https://api.search.brave.com/res/v1/web/search?q=test&count=1"
Common Issues
1. Gemini Quota Exceeded
Error: 429 Resource has been exhausted
Solution: Wait for quota reset (hourly/daily) or upgrade to paid tier
2. PageSpeed Timeout
Error: Timeout waiting for PageSpeed response
Solution: Increase timeout, retry later, or skip slow websites
3. Places API 403 Forbidden
Error: This API project is not authorized to use this API
Solution: Enable Places API in Google Cloud Console
4. MS Graph Authentication Failed
Error: AADSTS700016: Application not found in directory
Solution: Verify MICROSOFT_TENANT_ID and MICROSOFT_CLIENT_ID
Diagnostic Commands
Check API Key Configuration:
# Development
grep -E "GOOGLE|BRAVE|MICROSOFT" .env
# Production
sudo -u www-data printenv | grep -E "GOOGLE|BRAVE|MICROSOFT"
Check Database API Cost Tracking:
-- Gemini API calls today
SELECT
feature,
COUNT(*) as calls,
SUM(total_cost) as cost
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature;
-- Failed API calls
SELECT
timestamp,
feature,
error_message
FROM ai_api_costs
WHERE success = FALSE
ORDER BY timestamp DESC
LIMIT 10;
Related Documentation
- System Context - High-level system overview
- Container Diagram - Container architecture
- Flask Components - Application components
- Database Schema - Database design
- External API Integration Analysis - Detailed API analysis
Maintenance Guidelines
When to Update This Document
- ✅ Adding new external API integration
- ✅ Changing API authentication method
- ✅ Updating rate limits or quotas
- ✅ Modifying data flow patterns
- ✅ Adding new database tables for API data
- ✅ Changing cost tracking or optimization strategies
Update Checklist
- Update Mermaid diagram with new integration
- Add detailed section for new API
- Update authentication summary table
- Update rate limits & quota table
- Add integration points
- Document data flow
- Add health check commands
- Update cost optimization strategies
Glossary
| Term | Definition |
|---|---|
| API Key | Secret token for authenticating API requests |
| OAuth 2.0 | Industry-standard protocol for authorization |
| Client Credentials Flow | OAuth flow for service-to-service authentication |
| Rate Limit | Maximum number of API requests allowed per time period |
| Quota | Total allowance for API usage (daily/monthly) |
| Web Scraping | Automated extraction of data from websites |
| Playwright | Browser automation framework for web scraping |
| Exponential Backoff | Retry strategy with increasing delays |
| HTTPS/TLS | Secure protocol for encrypted communication |
| Free Tier | No-cost API usage level with limits |
| Pay-per-use | Pricing model charging per API request |
Document End