nordabiz/docs/architecture/06-external-integrations.md
Maciej Pienczyn cebe52f303 refactor: Rebranding and AI model update
- Rename: "Norda Biznes Hub" → "Norda Biznes Partner"
- AI model update: Gemini 2.0 Flash → Gemini 3 Flash
- Historical references in the timeline and documentation are preserved

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:08:39 +01:00


# External Integrations Architecture
**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Diagram Type:** External Systems Integration Architecture
---
## Overview
This diagram shows the **external APIs and data sources** integrated with Norda Biznes Partner. It illustrates:
- **6 major API integrations** (Google Gemini, Brave Search, PageSpeed, Places, KRS, MS Graph)
- **2 web scraping sources** (ALEO.com, rejestr.io)
- **Authentication methods** for each integration
- **Data flows** and usage patterns
- **Rate limits** and quota management
- **Cost tracking** and optimization
**Abstraction Level:** External Integration Architecture
**Audience:** Developers, DevOps, System Architects
**Purpose:** Understanding external dependencies, API usage, and integration patterns
---
## Integration Architecture Diagram
```mermaid
graph TB
%% Main system
subgraph "Norda Biznes Partner System"
WebApp["🌐 Flask Web Application<br/>app.py"]
subgraph "Service Layer"
GeminiSvc["🤖 Gemini Service<br/>gemini_service.py"]
ChatSvc["💬 Chat Service<br/>nordabiz_chat.py"]
EmailSvc["📧 Email Service<br/>email_service.py"]
KRSSvc["🏛️ KRS Service<br/>krs_api_service.py"]
GBPSvc["📊 GBP Audit Service<br/>gbp_audit_service.py"]
end
subgraph "Background Scripts"
SEOScript["📊 SEO Audit<br/>scripts/seo_audit.py"]
SocialScript["📱 Social Media Audit<br/>scripts/social_media_audit.py"]
ImportScript["📥 Data Import<br/>scripts/import_*.py"]
end
Database["💾 PostgreSQL Database<br/>localhost:5432"]
end
%% External integrations
subgraph "AI & ML Services"
Gemini["🤖 Google Gemini API<br/>gemini-2.5-flash<br/><br/>Free tier: 200 req/day<br/>Auth: API Key<br/>Cost: $0.075-$5.00/1M tokens"]
end
subgraph "SEO & Analytics"
PageSpeed["📊 Google PageSpeed Insights<br/>v5 API<br/><br/>Free tier: 25,000 req/day<br/>Auth: API Key<br/>Cost: Free"]
Places["📍 Google Places API<br/>Maps Platform<br/><br/>Pay-per-use<br/>Auth: API Key<br/>Cost: $0.032/request"]
end
subgraph "Search & Discovery"
BraveAPI["🔍 Brave Search API<br/>Web & News Search<br/><br/>Free tier: 2,000 req/month<br/>Auth: API Key<br/>Cost: Free"]
end
subgraph "Data Sources"
KRS["🏛️ KRS Open API<br/>Ministry of Justice Poland<br/><br/>No limits (public API)<br/>Auth: None<br/>Cost: Free"]
ALEO["🌐 ALEO.com<br/>NIP Verification Service<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
Rejestr["🔗 rejestr.io<br/>Company Connections<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
end
subgraph "Communication"
MSGraph["📧 Microsoft Graph API<br/>Email & Notifications<br/><br/>10,000 req/10min<br/>Auth: OAuth 2.0 Client Credentials<br/>Cost: Included in M365"]
end
%% Service layer connections to external APIs
GeminiSvc -->|"HTTPS POST<br/>generateContent<br/>API Key: GOOGLE_GEMINI_API_KEY"| Gemini
ChatSvc --> GeminiSvc
GBPSvc -->|"Generate AI recommendations"| GeminiSvc
KRSSvc -->|"HTTPS GET<br/>OdpisAktualny/{krs}<br/>Public API (no auth)"| KRS
EmailSvc -->|"HTTPS POST<br/>users/{id}/sendMail<br/>OAuth 2.0 + Client Credentials"| MSGraph
GBPSvc -->|"HTTPS GET<br/>findplacefromtext<br/>placedetails<br/>API Key: GOOGLE_PLACES_API_KEY"| Places
%% Script connections to external APIs
SEOScript -->|"HTTPS GET<br/>runPagespeed<br/>API Key: GOOGLE_PAGESPEED_API_KEY<br/>Quota tracking: 25K/day"| PageSpeed
SocialScript -->|"HTTPS GET<br/>web/search<br/>news/search<br/>API Key: BRAVE_SEARCH_API_KEY"| BraveAPI
SocialScript -->|"Fallback for reviews"| Places
ImportScript -->|"NIP verification<br/>Playwright browser automation"| ALEO
ImportScript -->|"Company connections<br/>Playwright browser automation"| Rejestr
ImportScript --> KRSSvc
%% Data flows back to database
GeminiSvc -->|"Log costs<br/>ai_api_costs table"| Database
SEOScript -->|"Store metrics<br/>company_website_analysis"| Database
SocialScript -->|"Store profiles<br/>company_social_media"| Database
GBPSvc -->|"Store audit results<br/>gbp_audits"| Database
ImportScript -->|"Import companies<br/>companies table"| Database
%% Web app connections
WebApp --> ChatSvc
WebApp --> EmailSvc
WebApp --> KRSSvc
WebApp --> GBPSvc
WebApp --> GeminiSvc
WebApp --> Database
%% Styling
classDef serviceStyle fill:#85bbf0,stroke:#5d92c7,color:#000000,stroke-width:2px
classDef scriptStyle fill:#ffd93d,stroke:#ccae31,color:#000000,stroke-width:2px
classDef aiStyle fill:#ff6b9d,stroke:#cc5579,color:#ffffff,stroke-width:3px
classDef seoStyle fill:#c44569,stroke:#9d3754,color:#ffffff,stroke-width:3px
classDef searchStyle fill:#6a89cc,stroke:#5570a3,color:#ffffff,stroke-width:3px
classDef dataStyle fill:#4a69bd,stroke:#3b5497,color:#ffffff,stroke-width:3px
classDef commStyle fill:#20bf6b,stroke:#1a9956,color:#ffffff,stroke-width:3px
classDef dbStyle fill:#438dd5,stroke:#2e6295,color:#ffffff,stroke-width:3px
classDef appStyle fill:#1168bd,stroke:#0b4884,color:#ffffff,stroke-width:3px
class GeminiSvc,ChatSvc,EmailSvc,KRSSvc,GBPSvc serviceStyle
class SEOScript,SocialScript,ImportScript scriptStyle
class Gemini aiStyle
class PageSpeed,Places seoStyle
class BraveAPI searchStyle
class KRS,ALEO,Rejestr dataStyle
class MSGraph commStyle
class Database dbStyle
class WebApp appStyle
```
---
## External Integration Details
### 🤖 Google Gemini API
**Purpose:** AI-powered text generation, chat, and image analysis
**Service Files:** `gemini_service.py`, `nordabiz_chat.py`, `gbp_audit_service.py`
**Status:** ✅ Production (Free Tier)
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | Google Generative AI |
| **Endpoint** | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_GEMINI_API_KEY` |
| **Default Model** | gemini-2.5-flash |
| **Timeout** | None (default) |
#### Available Models
```python
GEMINI_MODELS = {
    'flash': 'gemini-2.5-flash',            # Best for general use
    'flash-lite': 'gemini-2.5-flash-lite',  # Ultra cheap
    'pro': 'gemini-2.5-pro',                # High quality
    'flash-2.0': 'gemini-2.0-flash',        # 1M context window
}
```
#### Pricing (per 1M tokens)
| Model | Input Cost | Output Cost |
|-------|-----------|-------------|
| gemini-2.5-flash | $0.075 | $0.30 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.0-flash | $0.075 | $0.30 |
#### Rate Limits
- **Free Tier:** 200 requests/day, 50 requests/hour
- **Token Limits:** Model-dependent (1M for flash-2.0)
#### Integration Points
| Feature | File | Function |
|---------|------|----------|
| AI Chat | `nordabiz_chat.py` | `NordaBizChatEngine.chat()` |
| GBP Recommendations | `gbp_audit_service.py` | `generate_ai_recommendations()` |
| Text Generation | `gemini_service.py` | `generate_text()` |
| Image Analysis | `gemini_service.py` | `analyze_image()` |
#### Cost Tracking
All API calls logged to `ai_api_costs` table:
```sql
ai_api_costs (
id, timestamp, api_provider, model_name,
feature, user_id,
input_tokens, output_tokens, total_tokens,
input_cost, output_cost, total_cost,
success, error_message, latency_ms,
prompt_hash
)
```
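The cost columns above can be derived from the token counts and the pricing table. The following is a minimal sketch, assuming the per-1M-token rates listed in this document; `estimate_cost` and `PRICING_PER_1M` are hypothetical names for illustration and may not match what `gemini_service.py` actually does.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), copied from the pricing table in this document.
PRICING_PER_1M = {
    "gemini-2.5-flash": (Decimal("0.075"), Decimal("0.30")),
    "gemini-2.5-flash-lite": (Decimal("0.10"), Decimal("0.40")),
    "gemini-2.5-pro": (Decimal("1.25"), Decimal("5.00")),
    "gemini-2.0-flash": (Decimal("0.075"), Decimal("0.30")),
}
ONE_MILLION = Decimal(1_000_000)


def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> dict:
    """Return the token and cost columns of an ai_api_costs row (hypothetical helper)."""
    input_rate, output_rate = PRICING_PER_1M[model_name]
    input_cost = input_rate * input_tokens / ONE_MILLION
    output_cost = output_rate * output_tokens / ONE_MILLION
    return {
        "model_name": model_name,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }
```

`Decimal` is used here so that summing many small per-call costs does not accumulate floating-point error; an unknown model name raises `KeyError` rather than silently logging a zero cost.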
#### Data Flow
```
User Message → ChatService → GeminiService
    ↓
Google Gemini API
    ↓
Response + Token Count
    ↓
Cost Calculation → ai_api_costs
    ↓
Response to User
```
---
### 🏛️ KRS Open API
**Purpose:** Official company data from Polish National Court Register
**Service Files:** `krs_api_service.py`
**Status:** ✅ Production
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | Ministry of Justice Poland |
| **Endpoint** | https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs} |
| **Authentication** | None (public API) |
| **Timeout** | 15 seconds |
| **Response Format** | JSON |
#### Rate Limits
- **Official Limit:** None documented
- **Best Practice:** 1-2 second delays between requests
- **Timeout:** 15 seconds configured
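The delay and timeout guidance above can be sketched as a small batch helper. This is an illustration only: `krs_url` and `fetch_all` are hypothetical names, the `rejestr=P&format=json` query string is taken from the health-check command later in this document, and the 1.5-second default is simply a value inside the recommended 1-2 second range.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

KRS_ENDPOINT = "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs}?rejestr=P&format=json"
KRS_TIMEOUT_SECONDS = 15  # timeout documented for this integration


def krs_url(krs_number: str) -> str:
    """Build the request URL; KRS numbers are 10 digits, left-padded with zeros."""
    digits = krs_number.strip()
    if not digits.isdecimal() or len(digits) > 10:
        raise ValueError(f"Invalid KRS number: {krs_number!r}")
    return KRS_ENDPOINT.format(krs=digits.zfill(10))


def fetch_all(
    krs_numbers: Iterable[str],
    fetch: Callable[[str], object],
    delay_seconds: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, object]]:
    """Yield (krs, result) pairs, pausing between consecutive requests."""
    for index, krs in enumerate(krs_numbers):
        if index > 0:
            sleep(delay_seconds)  # be polite: no delay before the first request
        yield krs, fetch(krs_url(krs))
```

In real use, `fetch` could be something like `lambda url: requests.get(url, timeout=KRS_TIMEOUT_SECONDS).json()`. Passing it in as a parameter keeps the throttling logic testable without network access.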
#### Data Retrieved
- Basic identifiers (KRS, NIP, REGON)
- Company name (full and shortened)
- Legal form (Sp. z o.o., S.A., etc.)
- Full address (street, city, voivodeship)
- Share capital and currency
- Registration dates
- Management board (anonymized in Open API)
- Shareholders (anonymized in Open API)
- Business activities
- OPP status (Organizacja Pożytku Publicznego)
#### Integration Points
| Feature | File | Usage |
|---------|------|-------|
| Data Import | `import_*.py` scripts | Company verification |
| Manual Verification | `verify_all_companies_data.py` | Batch verification |
| API Endpoint | `app.py` | `/api/verify-krs` |
#### Data Flow
```
Import Script → KRSService.get_company_from_krs()
    ↓
KRS Open API
    ↓
KRSCompanyData (dataclass)
    ↓
Verification & Validation
    ↓
Update companies table
```
---
### 📊 Google PageSpeed Insights API
**Purpose:** SEO, performance, accessibility, and best practices analysis
**Service Files:** `scripts/seo_audit.py`, `scripts/pagespeed_client.py`
**Status:** ✅ Production
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | Google PageSpeed Insights |
| **Endpoint** | https://www.googleapis.com/pagespeedonline/v5/runPagespeed |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_PAGESPEED_API_KEY` |
| **Google Cloud Project** | NORDABIZNES (gen-lang-client-0540794446) |
| **Timeout** | 30 seconds |
| **Strategy** | Mobile (default), Desktop (optional) |
#### Rate Limits
- **Free Tier:** 25,000 queries/day
- **Per-Second:** Recommended 1 query/second
- **Quota Tracking:** In-memory counter in `pagespeed_client.py`
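An in-memory daily counter of the kind described above might look like the sketch below. `DailyQuotaCounter` is a hypothetical class written for illustration; it is not the implementation in `pagespeed_client.py`, and its method names are assumptions.

```python
from datetime import date
from typing import Callable


class DailyQuotaCounter:
    """In-memory request counter that resets when the calendar day changes."""

    def __init__(self, daily_limit: int = 25_000, today: Callable[[], date] = date.today):
        self.daily_limit = daily_limit
        self._today = today      # injectable clock, which makes the class testable
        self._day = today()
        self._used = 0

    def _roll_over(self) -> None:
        """Reset the counter if the date has changed since the last call."""
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def remaining(self) -> int:
        self._roll_over()
        return self.daily_limit - self._used

    def try_consume(self) -> bool:
        """Record one request; return False if the daily quota is exhausted."""
        self._roll_over()
        if self._used >= self.daily_limit:
            return False
        self._used += 1
        return True
```

Because the count lives only in process memory, it is lost on restart and is not shared between concurrent script runs, so it should be read as a local safety net rather than an exact mirror of the quota Google enforces.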
#### Metrics Returned
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSpeedScores:
    seo: int               # 0-100 SEO score
    performance: int       # 0-100 Performance score
    accessibility: int     # 0-100 Accessibility score
    best_practices: int    # 0-100 Best Practices score
    pwa: Optional[int]     # 0-100 PWA score


@dataclass
class CoreWebVitals:
    lcp_ms: Optional[int]  # Largest Contentful Paint
    fid_ms: Optional[int]  # First Input Delay
    cls: Optional[float]   # Cumulative Layout Shift
#### Database Storage
Results saved to `company_website_analysis`:
```sql
company_website_analysis (
company_id PRIMARY KEY,
analyzed_at,
pagespeed_seo_score,
pagespeed_performance_score,
pagespeed_accessibility_score,
pagespeed_best_practices_score,
pagespeed_audits JSONB,
largest_contentful_paint_ms,
first_input_delay_ms,
cumulative_layout_shift,
seo_overall_score,
seo_health_score,
seo_issues JSONB
)
```
#### Integration Points
| Feature | File | Endpoint/Function |
|---------|------|-------------------|
| Admin Dashboard | `app.py` | `/admin/seo` |
| Audit Script | `scripts/seo_audit.py` | CLI tool |
| Batch Audits | `scripts/seo_audit.py` | `SEOAuditor.run_audit()` |
#### Data Flow
```
Admin Trigger → SEO Audit Script
    ↓
PageSpeed API
    ↓
Scores + Core Web Vitals + Audits
    ↓
company_website_analysis table
    ↓
Admin Dashboard Display
```
---
### 📍 Google Places API
**Purpose:** Business profiles, ratings, reviews, and opening hours
**Service Files:** `gbp_audit_service.py`, `scripts/social_media_audit.py`
**Status:** ✅ Production
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | Google Maps Platform |
| **Endpoints** | Find Place from Text, Place Details |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_PLACES_API_KEY` |
| **Timeout** | 15 seconds |
| **Language** | Polish (pl) |
#### Cost
- **Pricing Model:** Pay-per-use
- **Cost per Request:** ~$0.032 per Place Details call
- **Optimization:** 24-hour cache in database
#### Endpoints Used
**1. Find Place from Text**
```
https://maps.googleapis.com/maps/api/place/findplacefromtext/json
```
**2. Place Details**
```
https://maps.googleapis.com/maps/api/place/details/json
```
#### Data Retrieved
```python
{
    'google_place_id': str,         # Unique Place ID
    'google_name': str,             # Business name
    'google_address': str,          # Formatted address
    'google_phone': str,            # Phone number
    'google_website': str,          # Website URL
    'google_types': List[str],      # Business categories
    'google_maps_url': str,         # Google Maps link
    'google_rating': Decimal,       # Rating (1.0-5.0)
    'google_reviews_count': int,    # Number of reviews
    'google_photos_count': int,     # Number of photos
    'google_opening_hours': dict,   # Opening hours
    'google_business_status': str   # OPERATIONAL, CLOSED, etc.
}
```
#### Cache Strategy
- **Cache Duration:** 24 hours
- **Storage:** `company_website_analysis` table (freshness judged by the `analyzed_at` timestamp)
- **Force Refresh:** `force_refresh=True` parameter
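The cache rule above reduces to a single timestamp comparison. A minimal sketch, assuming freshness is judged from the `analyzed_at` column; `should_call_places_api` is a hypothetical helper name, not a function from `gbp_audit_service.py`.

```python
from datetime import datetime, timedelta
from typing import Optional

CACHE_TTL = timedelta(hours=24)


def should_call_places_api(
    analyzed_at: Optional[datetime],
    now: datetime,
    force_refresh: bool = False,
) -> bool:
    """Return True when the cached Places data is missing, stale, or explicitly bypassed."""
    if force_refresh or analyzed_at is None:
        return True
    return now - analyzed_at >= CACHE_TTL
```

Both timestamps should be either naive or timezone-aware; mixing the two makes the subtraction raise `TypeError`.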
#### Integration Points
| Feature | File | Function |
|---------|------|----------|
| GBP Audit | `gbp_audit_service.py` | `fetch_google_business_data()` |
| Social Media Audit | `scripts/social_media_audit.py` | `GooglePlacesSearcher` |
| Admin Dashboard | `app.py` | `/admin/gbp` |
#### Data Flow
```
Admin/Script Trigger → GBPService.fetch_google_business_data()
    ↓
Check cache (< 24h old?)
    ↓
[Cache miss] → Places API
    ↓
Business Profile Data (JSON)
    ↓
company_website_analysis table
    ↓
Display in Admin Panel
```
---
### 🔍 Brave Search API
**Purpose:** News monitoring, social media discovery, web search
**Service Files:** `scripts/social_media_audit.py`
**Status:** ✅ Production (Social Media), 📋 Planned (News Monitoring)
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | Brave Search |
| **Endpoint (Web)** | https://api.search.brave.com/res/v1/web/search |
| **Endpoint (News)** | https://api.search.brave.com/res/v1/news/search |
| **Authentication** | API Key |
| **Environment Variable** | `BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY` |
| **Timeout** | 15 seconds |
#### Rate Limits
- **Free Tier:** 2,000 requests/month
- **Per-Second:** No official limit
- **Recommended:** 0.5-1 second delay
#### Current Usage: Social Media Discovery
```python
# Search for social media profiles
params = {
    "q": f'"{company_name}" {city} facebook OR instagram',
    "count": 10,
    "country": "pl",
    "search_lang": "pl"
}
```
#### Planned Usage: News Monitoring
```python
# News search (from CLAUDE.md)
params = {
    "q": f'"{company_name}" OR "{nip}"',
    "count": 10,
    "freshness": "pw",  # past week
    "country": "pl",
    "search_lang": "pl"
}
```
#### Pattern Extraction
**Social Media URLs:** Regex patterns for:
- Facebook: `facebook.com/[username]`
- Instagram: `instagram.com/[username]`
- LinkedIn: `linkedin.com/company/[name]`
- YouTube: `youtube.com/@[channel]`
- Twitter/X: `twitter.com/[username]` or `x.com/[username]`
- TikTok: `tiktok.com/@[username]`
**Google Reviews:** Patterns:
- `"4,5 (123 opinii)"`
- `"Rating: 4.5 · 123 reviews"`
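The patterns listed above could be expressed roughly as follows. This is a sketch under assumptions: the exact regular expressions in `scripts/social_media_audit.py` are not shown in this document, so `SOCIAL_PATTERNS`, `REVIEW_PATTERNS`, and both helper functions are illustrative names and simplified expressions.

```python
import re
from typing import Dict, Optional, Tuple

# One simplified pattern per platform; each captures the profile handle.
SOCIAL_PATTERNS = {
    "facebook": re.compile(r"https?://(?:www\.)?facebook\.com/([A-Za-z0-9_.\-]+)", re.I),
    "instagram": re.compile(r"https?://(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)", re.I),
    "linkedin": re.compile(r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/company/([A-Za-z0-9_\-%]+)", re.I),
    "youtube": re.compile(r"https?://(?:www\.)?youtube\.com/@([A-Za-z0-9_.\-]+)", re.I),
    "twitter": re.compile(r"https?://(?:www\.)?(?:twitter|x)\.com/([A-Za-z0-9_]+)", re.I),
    "tiktok": re.compile(r"https?://(?:www\.)?tiktok\.com/@([A-Za-z0-9_.]+)", re.I),
}

# Rating snippets in the two formats listed above (Polish and English).
REVIEW_PATTERNS = [
    re.compile(r"(\d[.,]\d)\s*\((\d+)\s+opinii\)"),
    re.compile(r"Rating:\s*(\d[.,]\d)\s*·\s*(\d+)\s+reviews"),
]


def extract_social_urls(text: str) -> Dict[str, str]:
    """Return the first matching profile URL per platform found in the text."""
    found = {}
    for platform, pattern in SOCIAL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[platform] = match.group(0)
    return found


def extract_rating(text: str) -> Optional[Tuple[float, int]]:
    """Return (rating, review_count) from a search snippet, or None if absent."""
    for pattern in REVIEW_PATTERNS:
        match = pattern.search(text)
        if match:
            rating = float(match.group(1).replace(",", "."))
            return rating, int(match.group(2))
    return None
```

These expressions accept any path segment after the domain, so generic pages such as share or login links would also match; a production version would likely need an exclusion list.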
#### Integration Points
| Feature | File | Status |
|---------|------|--------|
| Social Media Discovery | `scripts/social_media_audit.py` | ✅ Implemented |
| Google Reviews Fallback | `scripts/social_media_audit.py` | ✅ Implemented |
| News Monitoring | (Planned) | 📋 Pending |
#### Data Flow
```
Social Media Audit Script → Brave Search API
    ↓
Web Search Results (JSON)
    ↓
Pattern Extraction (regex)
    ↓
Social Media URLs (Facebook, Instagram, etc.)
    ↓
company_social_media table
```
---
### 📧 Microsoft Graph API
**Purpose:** Email notifications via Microsoft 365
**Service Files:** `email_service.py`
**Status:** ✅ Production
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | Microsoft Graph API |
| **Endpoint** | https://graph.microsoft.com/v1.0 |
| **Authentication** | OAuth 2.0 Client Credentials Flow |
| **Authority** | https://login.microsoftonline.com/{tenant_id} |
| **Scope** | https://graph.microsoft.com/.default |
#### Environment Variables
```bash
MICROSOFT_TENANT_ID=<Azure AD Tenant ID>
MICROSOFT_CLIENT_ID=<Application Client ID>
MICROSOFT_CLIENT_SECRET=<Client Secret Value>
MICROSOFT_MAIL_FROM=noreply@nordabiznes.pl
```
#### Authentication Flow
1. **Client Credentials Flow** (Application permissions)
   - No user interaction required
   - Service-to-service authentication
   - Uses client ID + client secret
2. **Token Acquisition**
   ```python
   import msal

   app = msal.ConfidentialClientApplication(
       client_id,
       authority=f"https://login.microsoftonline.com/{tenant_id}",
       client_credential=client_secret,
   )
   result = app.acquire_token_for_client(
       scopes=["https://graph.microsoft.com/.default"]
   )
   ```
3. **Token Caching**
   - MSAL library handles caching
   - Tokens cached for ~1 hour
   - Automatic refresh when expired
#### Required Azure AD Permissions
**Application Permissions** (requires admin consent):
- `Mail.Send` - Send mail as any user
#### Rate Limits
- **Mail.Send:** 10,000 requests per 10 minutes per app
- **Throttling:** 429 Too Many Requests (retry with backoff)
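When a request is throttled, the wait time can be taken from the `Retry-After` response header, falling back to exponential backoff when the header is absent. A minimal sketch; `throttle_delay` is a hypothetical helper, and the base and cap values are assumptions rather than settings from `email_service.py`.

```python
from typing import Mapping, Optional


def throttle_delay(
    status_code: int,
    headers: Mapping[str, str],
    attempt: int,
    base_seconds: float = 1.0,
    max_seconds: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the response was not throttled."""
    if status_code != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        # The server told us how long to wait; respect it, up to our own cap.
        return min(float(retry_after), max_seconds)
    # No usable header: exponential backoff (1s, 2s, 4s, ...) capped at max_seconds.
    return min(base_seconds * (2 ** attempt), max_seconds)
```

With a plain `dict` the header lookup is case-sensitive; the `headers` attribute of a `requests` response is case-insensitive, so it can be passed in directly.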
#### Integration Points
| Feature | File | Usage |
|---------|------|-------|
| User Registration | `app.py` | Send welcome email |
| Password Reset | `app.py` | Send reset link |
| Notifications | `app.py` | News approval notifications |
#### Data Flow
```
App Trigger → EmailService.send_mail()
    ↓
MSAL Token Acquisition (cached)
    ↓
Microsoft Graph API
    ↓
POST /users/{id}/sendMail
    ↓
Email Sent via M365
    ↓
Success/Failure Response
```
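For reference, the request sent in the `POST /users/{id}/sendMail` step has the shape sketched below. `build_send_mail_request` is a hypothetical helper for illustration; how `email_service.py` actually assembles the request is not shown in this document, and the choice of `saveToSentItems: False` is an assumption.

```python
from typing import Tuple
from urllib.parse import quote

GRAPH_BASE_URL = "https://graph.microsoft.com/v1.0"


def build_send_mail_request(
    sender: str, recipient: str, subject: str, html_body: str
) -> Tuple[str, dict]:
    """Return (url, json_payload) for a Microsoft Graph sendMail call."""
    # The sender mailbox (e.g. the MICROSOFT_MAIL_FROM address) identifies the user.
    url = f"{GRAPH_BASE_URL}/users/{quote(sender, safe='@')}/sendMail"
    payload = {
        "message": {
            "subject": subject,
            "body": {"contentType": "HTML", "content": html_body},
            "toRecipients": [{"emailAddress": {"address": recipient}}],
        },
        "saveToSentItems": False,
    }
    return url, payload
```

The payload would be posted as JSON with an `Authorization: Bearer <access_token>` header, using the token obtained in the authentication flow above. Graph answers a successful `sendMail` with `202 Accepted` and an empty body.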
---
### 🌐 ALEO.com (Web Scraping)
**Purpose:** NIP verification and company data enrichment
**Service Files:** `scripts/import_*.py` (Playwright integration)
**Status:** ✅ Production (Limited Use)
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | ALEO.com (Polish business directory) |
| **Endpoint** | https://www.aleo.com/ |
| **Authentication** | None (public website) |
| **Method** | Web scraping (Playwright browser automation) |
| **Rate Limiting** | Self-imposed delays (1-2 seconds) |
#### Data Retrieved
- Company NIP verification
- Company name
- Address
- Business category
- Basic contact information
#### Best Practices
- **Rate Limiting:** 1-2 second delays between requests
- **User Agent:** Standard browser user agent
- **Error Handling:** Handle missing elements gracefully
- **Caching:** Cache results to minimize requests
#### Integration Points
| Feature | File | Usage |
|---------|------|-------|
| Data Import | `import_*.py` scripts | NIP verification |
#### Data Flow
```
Import Script → Playwright Browser
    ↓
ALEO.com Search Page
    ↓
Company Search (by NIP)
    ↓
Parse HTML Results
    ↓
Extract Company Data
    ↓
Verify against KRS API
    ↓
Save to companies table
```
---
### 🔗 rejestr.io (Web Scraping)
**Purpose:** Company connections, shareholders, management
**Service Files:** `analyze_connections.py` (Playwright integration)
**Status:** 📋 Planned Enhancement
#### Configuration
| Parameter | Value |
|-----------|-------|
| **Provider** | rejestr.io (KRS registry browser) |
| **Endpoint** | https://rejestr.io/ |
| **Authentication** | None (public website) |
| **Method** | Web scraping (Playwright browser automation) |
| **Rate Limiting** | Self-imposed delays (1-2 seconds) |
#### Data to Retrieve (Planned)
- Management board members
- Shareholders with ownership percentages
- Beneficial owners
- Commercial proxies (prokurenci)
- Links between companies (shared owners/managers)
#### Planned Database Table
```sql
company_people (
id SERIAL PRIMARY KEY,
company_id INTEGER REFERENCES companies(id),
name VARCHAR(255),
role VARCHAR(100), -- Prezes, Członek Zarządu, Wspólnik
shares_percent NUMERIC(5,2),
person_url VARCHAR(500), -- Link to rejestr.io person page
created_at TIMESTAMP,
updated_at TIMESTAMP
)
```
#### Integration Points (Planned)
| Feature | File | Status |
|---------|------|--------|
| Connection Analysis | `analyze_connections.py` | 📋 Basic implementation exists |
| Company Profile Display | `templates/company_detail.html` | 📋 Planned |
| Network Visualization | (Future) | 📋 Planned |
---
## Authentication Summary
### API Key Authentication
| API | Environment Variable | Key Location |
|-----|---------------------|--------------|
| Google Gemini | `GOOGLE_GEMINI_API_KEY` | Google AI Studio |
| Google PageSpeed | `GOOGLE_PAGESPEED_API_KEY` | Google Cloud Console |
| Google Places | `GOOGLE_PLACES_API_KEY` | Google Cloud Console |
| Brave Search | `BRAVE_SEARCH_API_KEY` | Brave Search API Portal |
### OAuth 2.0 Authentication
| API | Flow Type | Environment Variables |
|-----|-----------|----------------------|
| Microsoft Graph | Client Credentials | `MICROSOFT_TENANT_ID`<br/>`MICROSOFT_CLIENT_ID`<br/>`MICROSOFT_CLIENT_SECRET` |
### No Authentication
| API | Access Type |
|-----|------------|
| KRS Open API | Public API |
| ALEO.com | Web scraping (public) |
| rejestr.io | Web scraping (public) |
---
## Rate Limits & Quota Management
### Summary Table
| API | Free Tier Quota | Rate Limit | Cost | Tracking |
|-----|----------------|------------|------|----------|
| **Google Gemini** | 200 req/day<br/>50 req/hour | Built-in | $0.075-$5.00/1M tokens | `ai_api_costs` table |
| **Google PageSpeed** | 25,000 req/day | ~1 req/sec | Free | In-memory counter |
| **Google Places** | Pay-per-use | No official limit | $0.032/request | 24-hour cache |
| **Brave Search** | 2,000 req/month | No official limit | Free | None |
| **KRS Open API** | Unlimited | No official limit | Free | None |
| **Microsoft Graph** | 10,000 req/10min | Built-in throttling | Included in M365 | None |
| **ALEO.com** | N/A (scraping) | Self-imposed (1-2s) | Free | None |
| **rejestr.io** | N/A (scraping) | Self-imposed (1-2s) | Free | None |
### Quota Monitoring
**Gemini AI - Daily Cost Report:**
```sql
SELECT
feature,
COUNT(*) as calls,
SUM(total_tokens) as total_tokens,
SUM(total_cost) as total_cost,
AVG(latency_ms) as avg_latency_ms
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature
ORDER BY total_cost DESC;
```
**PageSpeed - Remaining Quota:**
```python
from scripts.pagespeed_client import GooglePageSpeedClient
client = GooglePageSpeedClient()
remaining = client.get_remaining_quota()
print(f"Remaining quota: {remaining}/{25000}")
```
---
## Error Handling Patterns
### Common Error Types
**1. Authentication Errors**
- Invalid API key
- Expired credentials
- Missing environment variables
**2. Rate Limiting**
- Quota exceeded (daily/hourly)
- Too many requests per second
- Throttling (429 status code)
**3. Network Errors**
- Connection timeout
- DNS resolution failure
- SSL certificate errors
**4. API Errors**
- 400 Bad Request (invalid parameters)
- 404 Not Found (resource doesn't exist)
- 500 Internal Server Error (API issue)
### Retry Strategy
**Exponential Backoff:**
```python
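import time

# api_client and TransientError are placeholders for the concrete
# client object and its retryable exception type.
max_retries = 3
for attempt in range(max_retries):
    try:
        result = api_client.call()
        break
    except TransientError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # waits 1s, then 2s, before the final attempt
            time.sleep(wait_time)
        else:
            raise  # retries exhausted; surface the last error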
```
### Error Handling Example
```python
result = None  # ensures `finally` can always evaluate `result`
try:
    result = api_client.call_api(params)
except requests.exceptions.Timeout:
    logger.error("API timeout")
except requests.exceptions.ConnectionError as e:
    logger.error(f"Connection error: {e}")
except QuotaExceededError:
    logger.warning("Quota exceeded, queuing for retry")
    queue_for_retry(params)
except APIError as e:
    logger.error(f"API error: {e.status_code} - {e.message}")
finally:
    log_api_call(success=result is not None)
---
## Security Considerations
### API Key Storage
**Best Practices:**
- Store in environment variables
- Use `.env` file (NOT committed to git)
- Rotate keys regularly
- Use separate keys for dev/prod
**Never:**
- Hardcode keys in source code
- Commit keys to version control
- Share keys in chat/email
- Use production keys in development
### HTTPS/TLS
All APIs use HTTPS:
- Google APIs: TLS 1.2+
- Microsoft Graph: TLS 1.2+
- Brave Search: TLS 1.2+
- KRS Open API: TLS 1.2+
### Secrets Management
**Production:**
- Environment variables set in systemd service
- Restricted file permissions on `.env` files
- No secrets in logs or error messages
**Development:**
- `.env` file with restricted permissions (600)
- Local `.env` not synced to cloud storage
- Use test API keys when available
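A startup check can catch the "missing environment variables" failure early without ever printing a secret. The sketch below assumes the variable names documented in this file; `missing_env_vars` is a hypothetical helper, and which variables are truly mandatory for a given deployment is an assumption.

```python
import os
from typing import List, Mapping

REQUIRED_ENV_VARS = (
    "GOOGLE_GEMINI_API_KEY",
    "GOOGLE_PAGESPEED_API_KEY",
    "GOOGLE_PLACES_API_KEY",
    "MICROSOFT_TENANT_ID",
    "MICROSOFT_CLIENT_ID",
    "MICROSOFT_CLIENT_SECRET",
    "MICROSOFT_MAIL_FROM",
)
# Brave accepts either name, as noted in the Brave Search configuration table.
BRAVE_KEY_NAMES = ("BRAVE_SEARCH_API_KEY", "BRAVE_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return names of unset or empty variables; values are never returned or logged."""
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if not any(env.get(name) for name in BRAVE_KEY_NAMES):
        missing.append(" or ".join(BRAVE_KEY_NAMES))
    return missing
```

Reporting only the names keeps the check safe to run in logs and CI output, in line with the "no secrets in logs or error messages" rule above.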
---
## Cost Optimization Strategies
### 1. Caching
- **Google Places:** 24-hour cache in `company_website_analysis`
- **PageSpeed:** Cache results, re-audit only when needed
- **Gemini:** Cache common responses (FAQ, greetings)
### 2. Batch Processing
- **SEO Audits:** Run during off-peak hours
- **Social Media Discovery:** Process in batches of 10-20
- **News Monitoring:** Schedule daily/weekly runs
### 3. Model Selection
- **Gemini:** Use cheaper models where appropriate
  - `gemini-2.5-flash-lite` for simple tasks
  - `gemini-2.5-flash` for general use
  - `gemini-2.5-pro` only for complex reasoning
### 4. Result Reuse
- Don't re-analyze unchanged content
- Check last analysis timestamp before API calls
- Use `force_refresh` parameter sparingly
### 5. Quota Monitoring
- Daily reports on API usage and costs
- Alerts when >80% quota used
- Automatic throttling when approaching limit
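The 80% alert rule can be expressed as a small decision function. This is a sketch only: `quota_action` and its return values are hypothetical, and the document does not say how alerts or throttling are actually wired up.

```python
def quota_action(used: int, limit: int, warn_percent: int = 80) -> str:
    """Classify current usage as 'ok', 'alert' (above the warning threshold), or 'block'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used >= limit:
        return "block"   # quota exhausted: stop issuing requests
    if used * 100 > limit * warn_percent:
        return "alert"   # more than warn_percent of the quota is used
    return "ok"
```

For example, with the 2,000 requests/month Brave quota, `quota_action(1601, 2000)` returns `"alert"`. Integer arithmetic is used for the threshold so that the boundary case (exactly 80%) is not affected by floating-point rounding.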
---
## Monitoring & Troubleshooting
### Health Checks
**Test External API Connectivity:**
```bash
# Gemini API
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${GOOGLE_GEMINI_API_KEY}" \
-H 'Content-Type: application/json' \
-d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
# PageSpeed API
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://nordabiznes.pl&key=${GOOGLE_PAGESPEED_API_KEY}"
# KRS API
curl "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/0000817317?rejestr=P&format=json"
# Brave Search API
curl -H "X-Subscription-Token: ${BRAVE_SEARCH_API_KEY}" \
"https://api.search.brave.com/res/v1/web/search?q=test&count=1"
```
### Common Issues
**1. Gemini Quota Exceeded**
```
Error: 429 Resource has been exhausted
Solution: Wait for quota reset (hourly/daily) or upgrade to paid tier
```
**2. PageSpeed Timeout**
```
Error: Timeout waiting for PageSpeed response
Solution: Increase timeout, retry later, or skip slow websites
```
**3. Places API 403 Forbidden**
```
Error: This API project is not authorized to use this API
Solution: Enable Places API in Google Cloud Console
```
**4. MS Graph Authentication Failed**
```
Error: AADSTS700016: Application not found in directory
Solution: Verify MICROSOFT_TENANT_ID and MICROSOFT_CLIENT_ID
```
### Diagnostic Commands
**Check API Key Configuration:**
```bash
# Development
grep -E "GOOGLE|BRAVE|MICROSOFT" .env
# Production
sudo -u www-data printenv | grep -E "GOOGLE|BRAVE|MICROSOFT"
```
**Check Database API Cost Tracking:**
```sql
-- Gemini API calls today
SELECT
feature,
COUNT(*) as calls,
SUM(total_cost) as cost
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature;
-- Failed API calls
SELECT
timestamp,
feature,
error_message
FROM ai_api_costs
WHERE success = FALSE
ORDER BY timestamp DESC
LIMIT 10;
```
---
## Related Documentation
- **[System Context](./01-system-context.md)** - High-level system overview
- **[Container Diagram](./02-container-diagram.md)** - Container architecture
- **[Flask Components](./04-flask-components.md)** - Application components
- **[Database Schema](./05-database-schema.md)** - Database design
- **[External API Integration Analysis](./.auto-claude/specs/003-.../analysis/external-api-integrations.md)** - Detailed API analysis
---
## Maintenance Guidelines
### When to Update This Document
- ✅ Adding new external API integration
- ✅ Changing API authentication method
- ✅ Updating rate limits or quotas
- ✅ Modifying data flow patterns
- ✅ Adding new database tables for API data
- ✅ Changing cost tracking or optimization strategies
### Update Checklist
- [ ] Update Mermaid diagram with new integration
- [ ] Add detailed section for new API
- [ ] Update authentication summary table
- [ ] Update rate limits & quota table
- [ ] Add integration points
- [ ] Document data flow
- [ ] Add health check commands
- [ ] Update cost optimization strategies
---
## Glossary
| Term | Definition |
|------|------------|
| **API Key** | Secret token for authenticating API requests |
| **OAuth 2.0** | Industry-standard protocol for authorization |
| **Client Credentials Flow** | OAuth flow for service-to-service authentication |
| **Rate Limit** | Maximum number of API requests allowed per time period |
| **Quota** | Total allowance for API usage (daily/monthly) |
| **Web Scraping** | Automated extraction of data from websites |
| **Playwright** | Browser automation framework for web scraping |
| **Exponential Backoff** | Retry strategy with increasing delays |
| **HTTPS/TLS** | Secure protocol for encrypted communication |
| **Free Tier** | No-cost API usage level with limits |
| **Pay-per-use** | Pricing model charging per API request |
---
**Document End**