# External Integrations Architecture

**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Diagram Type:** External Systems Integration Architecture

---

## Overview

This diagram shows the **external APIs and data sources** integrated with Norda Biznes Partner. It illustrates:

- **6 major API integrations** (Google Gemini, Brave Search, PageSpeed, Places, KRS, MS Graph)
- **2 web scraping sources** (ALEO.com, rejestr.io)
- **Authentication methods** for each integration
- **Data flows** and usage patterns
- **Rate limits** and quota management
- **Cost tracking** and optimization

**Abstraction Level:** External Integration Architecture
**Audience:** Developers, DevOps, System Architects
**Purpose:** Understanding external dependencies, API usage, and integration patterns

---

## Integration Architecture Diagram

```mermaid
graph TB
    %% Main system
    subgraph "Norda Biznes Partner System"
        WebApp["🌐 Flask Web Application<br/>app.py"]

        subgraph "Service Layer"
            GeminiSvc["🤖 Gemini Service<br/>gemini_service.py"]
            ChatSvc["💬 Chat Service<br/>nordabiz_chat.py"]
            EmailSvc["📧 Email Service<br/>email_service.py"]
            KRSSvc["🏛️ KRS Service<br/>krs_api_service.py"]
            GBPSvc["📊 GBP Audit Service<br/>gbp_audit_service.py"]
        end

        subgraph "Background Scripts"
            SEOScript["📊 SEO Audit<br/>scripts/seo_audit.py"]
            SocialScript["📱 Social Media Audit<br/>scripts/social_media_audit.py"]
            ImportScript["📥 Data Import<br/>scripts/import_*.py"]
        end

        Database["💾 PostgreSQL Database<br/>localhost:5432"]
    end

    %% External integrations
    subgraph "AI & ML Services"
        Gemini["🤖 Google Gemini API<br/>gemini-3-flash-preview<br/><br/>Free tier: unlimited<br/>Auth: API Key<br/>Cost: Free (preview)"]
    end

    subgraph "SEO & Analytics"
        PageSpeed["📊 Google PageSpeed Insights<br/>v5 API<br/><br/>Free tier: 25,000 req/day<br/>Auth: API Key<br/>Cost: Free"]
        Places["📍 Google Places API<br/>Maps Platform<br/><br/>Pay-per-use<br/>Auth: API Key<br/>Cost: $0.032/request"]
    end

    subgraph "Search & Discovery"
        BraveAPI["🔍 Brave Search API<br/>Web & News Search<br/><br/>Free tier: 2,000 req/month<br/>Auth: API Key<br/>Cost: Free"]
    end

    subgraph "Data Sources"
        KRS["🏛️ KRS Open API<br/>Ministry of Justice Poland<br/><br/>No limits (public API)<br/>Auth: None<br/>Cost: Free"]
        ALEO["🌐 ALEO.com<br/>NIP Verification Service<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
        Rejestr["🔗 rejestr.io<br/>Company Connections<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
    end

    subgraph "Communication"
        MSGraph["📧 Microsoft Graph API<br/>Email & Notifications<br/><br/>10,000 req/10min<br/>Auth: OAuth 2.0 Client Credentials<br/>Cost: Included in M365"]
    end

    %% Service layer connections to external APIs
    GeminiSvc -->|"HTTPS POST<br/>generateContent<br/>API Key: GOOGLE_GEMINI_API_KEY"| Gemini
    ChatSvc --> GeminiSvc
    GBPSvc -->|"Generate AI recommendations"| GeminiSvc

    KRSSvc -->|"HTTPS GET<br/>OdpisAktualny/{krs}<br/>Public API (no auth)"| KRS

    EmailSvc -->|"HTTPS POST<br/>users/{id}/sendMail<br/>OAuth 2.0 + Client Credentials"| MSGraph

    GBPSvc -->|"HTTPS GET<br/>findplacefromtext<br/>placedetails<br/>API Key: GOOGLE_PLACES_API_KEY"| Places

    %% Script connections to external APIs
    SEOScript -->|"HTTPS GET<br/>runPagespeed<br/>API Key: GOOGLE_PAGESPEED_API_KEY<br/>Quota tracking: 25K/day"| PageSpeed

    SocialScript -->|"HTTPS GET<br/>web/search<br/>news/search<br/>API Key: BRAVE_SEARCH_API_KEY"| BraveAPI
    SocialScript -->|"Fallback for reviews"| Places

    ImportScript -->|"NIP verification<br/>Playwright browser automation"| ALEO
    ImportScript -->|"Company connections<br/>Playwright browser automation"| Rejestr
    ImportScript --> KRSSvc

    %% Data flows back to database
    GeminiSvc -->|"Log costs<br/>ai_api_costs table"| Database
    SEOScript -->|"Store metrics<br/>company_website_analysis"| Database
    SocialScript -->|"Store profiles<br/>company_social_media"| Database
    GBPSvc -->|"Store audit results<br/>gbp_audits"| Database
    ImportScript -->|"Import companies<br/>companies table"| Database

    %% Web app connections
    WebApp --> ChatSvc
    WebApp --> EmailSvc
    WebApp --> KRSSvc
    WebApp --> GBPSvc
    WebApp --> GeminiSvc
    WebApp --> Database

    %% Styling
    classDef serviceStyle fill:#85bbf0,stroke:#5d92c7,color:#000000,stroke-width:2px
    classDef scriptStyle fill:#ffd93d,stroke:#ccae31,color:#000000,stroke-width:2px
    classDef aiStyle fill:#ff6b9d,stroke:#cc5579,color:#ffffff,stroke-width:3px
    classDef seoStyle fill:#c44569,stroke:#9d3754,color:#ffffff,stroke-width:3px
    classDef searchStyle fill:#6a89cc,stroke:#5570a3,color:#ffffff,stroke-width:3px
    classDef dataStyle fill:#4a69bd,stroke:#3b5497,color:#ffffff,stroke-width:3px
    classDef commStyle fill:#20bf6b,stroke:#1a9956,color:#ffffff,stroke-width:3px
    classDef dbStyle fill:#438dd5,stroke:#2e6295,color:#ffffff,stroke-width:3px
    classDef appStyle fill:#1168bd,stroke:#0b4884,color:#ffffff,stroke-width:3px

    class GeminiSvc,ChatSvc,EmailSvc,KRSSvc,GBPSvc serviceStyle
    class SEOScript,SocialScript,ImportScript scriptStyle
    class Gemini aiStyle
    class PageSpeed,Places seoStyle
    class BraveAPI searchStyle
    class KRS,ALEO,Rejestr dataStyle
    class MSGraph commStyle
    class Database dbStyle
    class WebApp appStyle
```

---

## External Integration Details

### 🤖 Google Gemini API

**Purpose:** AI-powered text generation, chat, and image analysis
**Service Files:** `gemini_service.py`, `nordabiz_chat.py`, `gbp_audit_service.py`
**Status:** ✅ Production (Free Tier)

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Google Generative AI |
| **Endpoint** | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_GEMINI_API_KEY` |
| **Default Model** | gemini-3-flash-preview |
| **Timeout** | None (default) |

#### Available Models

```python
GEMINI_MODELS = {
    '3-flash': 'gemini-3-flash-preview',    # Default - 7x better reasoning, thinking mode
    '3-pro': 'gemini-3-pro-preview',        # Advanced - best reasoning, 2M context
    'flash': 'gemini-2.5-flash',            # Legacy - balanced cost/quality
    'flash-lite': 'gemini-2.5-flash-lite',  # Legacy - ultra cheap
    'pro': 'gemini-2.5-pro',                # Legacy - high quality
    'flash-2.0': 'gemini-2.0-flash',        # Legacy - 1M context (being retired on 2026-03-31)
}
```

#### Pricing (USD per 1M tokens)

| Model | Input Cost | Output Cost |
|-------|-----------|-------------|
| gemini-3-flash-preview | Free | Free |
| gemini-3-pro-preview | $2.00 | $12.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |

#### Rate Limits

- **Free Tier (Gemini 3 Flash Preview):** Unlimited requests
- **Token Limits:** Model-dependent (1M for flash-2.0)

#### Integration Points

| Feature | File | Function |
|---------|------|----------|
| AI Chat | `nordabiz_chat.py` | `NordaBizChatEngine.chat()` |
| GBP Recommendations | `gbp_audit_service.py` | `generate_ai_recommendations()` |
| Text Generation | `gemini_service.py` | `generate_text()` |
| Image Analysis | `gemini_service.py` | `analyze_image()` |

#### Cost Tracking

All API calls are logged to the `ai_api_costs` table:

```sql
ai_api_costs (
    id, timestamp, api_provider, model_name,
    feature, user_id,
    input_tokens, output_tokens, total_tokens,
    input_cost, output_cost, total_cost,
    success, error_message, latency_ms,
    prompt_hash
)
```

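The `input_cost`, `output_cost`, and `total_cost` columns can be derived from the token counts and the pricing table in this section. The following is a minimal sketch, not the code in `gemini_service.py`: the names `PRICING_PER_1M` and `calculate_cost` are hypothetical, and the prices are copied from the table in this document.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table in this document.
# PRICING_PER_1M and calculate_cost are illustrative names, not the project's API.
PRICING_PER_1M = {
    "gemini-3-flash-preview": (Decimal("0"), Decimal("0")),
    "gemini-3-pro-preview": (Decimal("2.00"), Decimal("12.00")),
    "gemini-2.5-flash": (Decimal("0.30"), Decimal("2.50")),
    "gemini-2.5-flash-lite": (Decimal("0.10"), Decimal("0.40")),
    "gemini-2.5-pro": (Decimal("1.25"), Decimal("10.00")),
    "gemini-2.0-flash": (Decimal("0.10"), Decimal("0.40")),
}
MILLION = Decimal(1_000_000)


def calculate_cost(model_name, input_tokens, output_tokens):
    """Return (input_cost, output_cost, total_cost) in USD for one API call."""
    input_price, output_price = PRICING_PER_1M[model_name]
    input_cost = input_price * input_tokens / MILLION
    output_cost = output_price * output_tokens / MILLION
    return input_cost, output_cost, input_cost + output_cost
```

`Decimal` is used here so that small per-call amounts sum without floating-point drift when aggregated in reports.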
#### Data Flow

```
User Message → ChatService → GeminiService
                                  ↓
                          Google Gemini API
                                  ↓
                        Response + Token Count
                                  ↓
                    Cost Calculation → ai_api_costs
                                  ↓
                           Response to User
```

---

### 🏛️ KRS Open API

**Purpose:** Official company data from the Polish National Court Register
**Service Files:** `krs_api_service.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Ministry of Justice Poland |
| **Endpoint** | https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs} |
| **Authentication** | None (public API) |
| **Timeout** | 15 seconds |
| **Response Format** | JSON |

#### Rate Limits

- **Official Limit:** None documented
- **Best Practice:** 1-2 second delays between requests
- **Timeout:** 15 seconds configured

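The delay and timeout settings above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation in `krs_api_service.py`: `build_krs_url` and `fetch_many` are hypothetical helpers, the `rejestr=P&format=json` query string is taken from the health-check command in this document, and zero-padding the KRS number to 10 digits is an assumption about the expected input format.

```python
import json
import time
import urllib.request

KRS_URL = "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs}?rejestr=P&format=json"
TIMEOUT_SECONDS = 15  # timeout from the configuration table
DELAY_SECONDS = 1.5   # inside the 1-2 second best-practice range


def build_krs_url(krs_number):
    """Zero-pad the KRS number to 10 digits (assumption) and build the URL."""
    return KRS_URL.format(krs=str(krs_number).zfill(10))


def fetch_many(krs_numbers, fetch=None, sleep=time.sleep):
    """Fetch several KRS records one by one, pausing between requests."""

    def default_fetch(url):
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return json.load(response)

    fetch = fetch or default_fetch
    results = {}
    for index, krs in enumerate(krs_numbers):
        if index > 0:
            sleep(DELAY_SECONDS)  # pause only between requests, not before the first
        results[krs] = fetch(build_krs_url(krs))
    return results
```

The `fetch` and `sleep` parameters are injectable so the throttling logic can be exercised without touching the network.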
#### Data Retrieved

- Basic identifiers (KRS, NIP, REGON)
- Company name (full and shortened)
- Legal form (Sp. z o.o., S.A., etc.)
- Full address (street, city, voivodeship)
- Share capital and currency
- Registration dates
- Management board (anonymized in Open API)
- Shareholders (anonymized in Open API)
- Business activities
- OPP status (Organizacja Pożytku Publicznego, i.e. public benefit organization)

#### Integration Points

| Feature | File | Usage |
|---------|------|-------|
| Data Import | `import_*.py` scripts | Company verification |
| Manual Verification | `verify_all_companies_data.py` | Batch verification |
| API Endpoint | `app.py` | `/api/verify-krs` |

#### Data Flow

```
Import Script → KRSService.get_company_from_krs()
                        ↓
                   KRS Open API
                        ↓
              KRSCompanyData (dataclass)
                        ↓
              Verification & Validation
                        ↓
               Update companies table
```

---

### 📊 Google PageSpeed Insights API

**Purpose:** SEO, performance, accessibility, and best practices analysis
**Service Files:** `scripts/seo_audit.py`, `scripts/pagespeed_client.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Google PageSpeed Insights |
| **Endpoint** | https://www.googleapis.com/pagespeedonline/v5/runPagespeed |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_PAGESPEED_API_KEY` |
| **Google Cloud Project** | NORDABIZNES (gen-lang-client-0540794446) |
| **Timeout** | 30 seconds |
| **Strategy** | Mobile (default), Desktop (optional) |

#### Rate Limits

- **Free Tier:** 25,000 queries/day
- **Per-Second:** Recommended 1 query/second
- **Quota Tracking:** In-memory counter in `pagespeed_client.py`

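An in-memory daily quota counter of the kind mentioned above could look like the sketch below. It is not the code in `pagespeed_client.py`: the class name `DailyQuotaCounter` and its methods are hypothetical, and resetting on the local calendar date is an assumption (the actual reset time of the Google quota is not stated in this document). Being in-memory, such a counter starts from zero again whenever the process restarts.

```python
from datetime import date


class DailyQuotaCounter:
    """In-memory request counter that resets when the calendar day changes."""

    def __init__(self, daily_limit=25_000, today=date.today):
        self.daily_limit = daily_limit
        self._today = today  # injectable clock, useful in tests
        self._day = today()
        self._used = 0

    def _roll_over(self):
        current = self._today()
        if current != self._day:  # a new day has started: count from zero
            self._day = current
            self._used = 0

    def remaining(self):
        """Return how many requests are still available today."""
        self._roll_over()
        return self.daily_limit - self._used

    def consume(self, count=1):
        """Record `count` requests; return False if that would exceed the limit."""
        self._roll_over()
        if self._used + count > self.daily_limit:
            return False
        self._used += count
        return True
```

A caller would check `consume()` before each PageSpeed request and skip or defer the audit when it returns `False`.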
#### Metrics Returned

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSpeedScores:
    seo: int              # 0-100 SEO score
    performance: int      # 0-100 Performance score
    accessibility: int    # 0-100 Accessibility score
    best_practices: int   # 0-100 Best Practices score
    pwa: Optional[int]    # 0-100 PWA score


@dataclass
class CoreWebVitals:
    lcp_ms: Optional[int]  # Largest Contentful Paint (milliseconds)
    fid_ms: Optional[int]  # First Input Delay (milliseconds)
    cls: Optional[float]   # Cumulative Layout Shift (unitless)
```

#### Database Storage

Results are saved to `company_website_analysis`:

```sql
company_website_analysis (
    company_id PRIMARY KEY,
    analyzed_at,
    pagespeed_seo_score,
    pagespeed_performance_score,
    pagespeed_accessibility_score,
    pagespeed_best_practices_score,
    pagespeed_audits JSONB,
    largest_contentful_paint_ms,
    first_input_delay_ms,
    cumulative_layout_shift,
    seo_overall_score,
    seo_health_score,
    seo_issues JSONB
)
```

#### Integration Points

| Feature | File | Endpoint/Function |
|---------|------|-------------------|
| Admin Dashboard | `app.py` | `/admin/seo` |
| Audit Script | `scripts/seo_audit.py` | CLI tool |
| Batch Audits | `scripts/seo_audit.py` | `SEOAuditor.run_audit()` |

#### Data Flow

```
Admin Trigger → SEO Audit Script
                      ↓
                PageSpeed API
                      ↓
      Scores + Core Web Vitals + Audits
                      ↓
       company_website_analysis table
                      ↓
           Admin Dashboard Display
```

---

### 📍 Google Places API

**Purpose:** Business profiles, ratings, reviews, and opening hours
**Service Files:** `gbp_audit_service.py`, `scripts/social_media_audit.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Google Maps Platform |
| **Endpoints** | Find Place from Text, Place Details |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_PLACES_API_KEY` |
| **Timeout** | 15 seconds |
| **Language** | Polish (pl) |

#### Cost

- **Pricing Model:** Pay-per-use
- **Cost per Request:** ~$0.032 per Place Details call
- **Optimization:** 24-hour cache in database

#### Endpoints Used

**1. Find Place from Text**
```
https://maps.googleapis.com/maps/api/place/findplacefromtext/json
```

**2. Place Details**
```
https://maps.googleapis.com/maps/api/place/details/json
```

#### Data Retrieved

```python
{
    'google_place_id': str,         # Unique Place ID
    'google_name': str,             # Business name
    'google_address': str,          # Formatted address
    'google_phone': str,            # Phone number
    'google_website': str,          # Website URL
    'google_types': List[str],      # Business categories
    'google_maps_url': str,         # Google Maps link
    'google_rating': Decimal,       # Rating (1.0-5.0)
    'google_reviews_count': int,    # Number of reviews
    'google_photos_count': int,     # Number of photos
    'google_opening_hours': dict,   # Opening hours
    'google_business_status': str   # OPERATIONAL, CLOSED, etc.
}
```

#### Cache Strategy

- **Cache Duration:** 24 hours
- **Storage:** `company_website_analysis` table (entry age is read from `analyzed_at`)
- **Force Refresh:** `force_refresh=True` parameter bypasses the cache

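The cache rule above reduces to a single freshness check before each Places API call. A minimal sketch, assuming a hypothetical helper `should_call_places_api` (not a function from `gbp_audit_service.py`) and a timezone-aware `analyzed_at` value read from `company_website_analysis`:

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(hours=24)  # cache duration from the list above


def should_call_places_api(analyzed_at, force_refresh=False, now=None):
    """Return True when cached data is missing, stale, or explicitly bypassed.

    `analyzed_at` is assumed to be a timezone-aware datetime, or None when
    the company has never been analyzed.
    """
    if force_refresh or analyzed_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - analyzed_at >= CACHE_TTL
```

At roughly $0.032 per Place Details call, skipping requests for entries younger than 24 hours is what keeps repeated audits of the same company from adding cost.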
#### Integration Points

| Feature | File | Function |
|---------|------|----------|
| GBP Audit | `gbp_audit_service.py` | `fetch_google_business_data()` |
| Social Media Audit | `scripts/social_media_audit.py` | `GooglePlacesSearcher` |
| Admin Dashboard | `app.py` | `/admin/gbp` |

#### Data Flow

```
Admin/Script Trigger → GBPService.fetch_google_business_data()
                              ↓
                   Check cache (< 24h old?)
                              ↓
                  [Cache miss] → Places API
                              ↓
                 Business Profile Data (JSON)
                              ↓
                company_website_analysis table
                              ↓
                   Display in Admin Panel
```

---

### 🔍 Brave Search API

**Purpose:** News monitoring, social media discovery, web search
**Service Files:** `scripts/social_media_audit.py`
**Status:** ✅ Production (Social Media), 📋 Planned (News Monitoring)

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Brave Search |
| **Endpoint (Web)** | https://api.search.brave.com/res/v1/web/search |
| **Endpoint (News)** | https://api.search.brave.com/res/v1/news/search |
| **Authentication** | API Key |
| **Environment Variable** | `BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY` |
| **Timeout** | 15 seconds |

#### Rate Limits

- **Free Tier:** 2,000 requests/month
- **Per-Second:** No official limit
- **Recommended:** 0.5-1 second delay

#### Current Usage: Social Media Discovery

```python
# Search for social media profiles
params = {
    "q": f'"{company_name}" {city} facebook OR instagram',
    "count": 10,
    "country": "pl",
    "search_lang": "pl"
}
```

#### Planned Usage: News Monitoring

```python
# News search (from CLAUDE.md)
params = {
    "q": f'"{company_name}" OR "{nip}"',
    "count": 10,
    "freshness": "pw",  # past week
    "country": "pl",
    "search_lang": "pl"
}
```

#### Pattern Extraction

**Social Media URLs:** Regex patterns for:
- Facebook: `facebook.com/[username]`
- Instagram: `instagram.com/[username]`
- LinkedIn: `linkedin.com/company/[name]`
- YouTube: `youtube.com/@[channel]`
- Twitter/X: `twitter.com/[username]` or `x.com/[username]`
- TikTok: `tiktok.com/@[username]`

**Google Reviews:** Patterns:
- `"4,5 (123 opinii)"`
- `"Rating: 4.5 · 123 reviews"`

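The URL and review patterns listed above can be expressed as regular expressions along these lines. This is an illustrative sketch, not the patterns used in `scripts/social_media_audit.py`: the names `SOCIAL_PATTERNS`, `REVIEW_PATTERN`, `extract_profiles`, and `extract_rating` are hypothetical, and the allowed username characters are assumptions.

```python
import re

# One pattern per platform; character classes for usernames are assumptions.
SOCIAL_PATTERNS = {
    "facebook": re.compile(r"https?://(?:www\.)?facebook\.com/([A-Za-z0-9_.\-]+)"),
    "instagram": re.compile(r"https?://(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)"),
    "linkedin": re.compile(r"https?://(?:[a-z]+\.)?linkedin\.com/company/([A-Za-z0-9_\-]+)"),
    "youtube": re.compile(r"https?://(?:www\.)?youtube\.com/@([A-Za-z0-9_.\-]+)"),
    "twitter": re.compile(r"https?://(?:www\.)?(?:twitter|x)\.com/([A-Za-z0-9_]+)"),
    "tiktok": re.compile(r"https?://(?:www\.)?tiktok\.com/@([A-Za-z0-9_.]+)"),
}

# Covers both "4,5 (123 opinii)" and "Rating: 4.5 · 123 reviews".
REVIEW_PATTERN = re.compile(r"(\d[.,]\d)\s*(?:\(|·\s*)(\d+)\s*(?:opinii|reviews)")


def extract_profiles(text):
    """Return {platform: first matching profile URL} found in the text."""
    found = {}
    for platform, pattern in SOCIAL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[platform] = match.group(0)
    return found


def extract_rating(text):
    """Return (rating, review_count), or None when no rating is present."""
    match = REVIEW_PATTERN.search(text)
    if not match:
        return None
    # Polish results use a decimal comma, so normalize it before converting.
    return float(match.group(1).replace(",", ".")), int(match.group(2))
```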
#### Integration Points

| Feature | File | Status |
|---------|------|--------|
| Social Media Discovery | `scripts/social_media_audit.py` | ✅ Implemented |
| Google Reviews Fallback | `scripts/social_media_audit.py` | ✅ Implemented |
| News Monitoring | (Planned) | 📋 Pending |

#### Data Flow

```
Social Media Audit Script → Brave Search API
                                  ↓
                      Web Search Results (JSON)
                                  ↓
                      Pattern Extraction (regex)
                                  ↓
              Social Media URLs (Facebook, Instagram, etc.)
                                  ↓
                      company_social_media table
```

---

### 📧 Microsoft Graph API

**Purpose:** Email notifications via Microsoft 365
**Service Files:** `email_service.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Microsoft Graph API |
| **Endpoint** | https://graph.microsoft.com/v1.0 |
| **Authentication** | OAuth 2.0 Client Credentials Flow |
| **Authority** | https://login.microsoftonline.com/{tenant_id} |
| **Scope** | https://graph.microsoft.com/.default |

#### Environment Variables

```bash
MICROSOFT_TENANT_ID=<Azure AD Tenant ID>
MICROSOFT_CLIENT_ID=<Application Client ID>
MICROSOFT_CLIENT_SECRET=<Client Secret Value>
MICROSOFT_MAIL_FROM=noreply@nordabiznes.pl
```

#### Authentication Flow

1. **Client Credentials Flow** (Application permissions)
   - No user interaction required
   - Service-to-service authentication
   - Uses client ID + client secret

2. **Token Acquisition**
   ```python
   app = msal.ConfidentialClientApplication(
       client_id,
       authority=f"https://login.microsoftonline.com/{tenant_id}",
       client_credential=client_secret,
   )

   result = app.acquire_token_for_client(
       scopes=["https://graph.microsoft.com/.default"]
   )
   ```

3. **Token Caching**
   - MSAL library handles caching
   - Tokens cached for ~1 hour
   - Automatic refresh when expired

#### Required Azure AD Permissions

**Application Permissions** (requires admin consent):
- `Mail.Send` - Send mail as any user

#### Rate Limits

- **Mail.Send:** 10,000 requests per 10 minutes per app
- **Throttling:** 429 Too Many Requests (retry with backoff)

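Handling a 429 response usually means honoring the `Retry-After` header when the server sends one and falling back to exponential backoff otherwise. A minimal sketch, assuming a hypothetical helper `throttle_delay` (not part of `email_service.py`) and a header mapping keyed exactly as `Retry-After`; a case-insensitive mapping such as the one exposed by the `requests` library would also work:

```python
def throttle_delay(status_code, headers, attempt, max_delay=60.0):
    """Return the seconds to wait before retrying, or None if no retry is needed.

    `attempt` is the zero-based retry counter used for the backoff fallback.
    """
    if status_code != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), max_delay)
        except ValueError:
            pass  # header was not a plain number of seconds; use backoff instead
    return min(2.0 ** attempt, max_delay)
```

Capping the wait with `max_delay` is a design choice of this sketch so that a single throttled email does not block a web request indefinitely.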
#### Integration Points

| Feature | File | Usage |
|---------|------|-------|
| User Registration | `app.py` | Send welcome email |
| Password Reset | `app.py` | Send reset link |
| Notifications | `app.py` | News approval notifications |

#### Data Flow

```
App Trigger → EmailService.send_mail()
                    ↓
     MSAL Token Acquisition (cached)
                    ↓
          Microsoft Graph API
                    ↓
       POST /users/{id}/sendMail
                    ↓
          Email Sent via M365
                    ↓
       Success/Failure Response
```

---

### 🌐 ALEO.com (Web Scraping)

**Purpose:** NIP verification and company data enrichment
**Service Files:** `scripts/import_*.py` (Playwright integration)
**Status:** ✅ Production (Limited Use)

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | ALEO.com (Polish business directory) |
| **Endpoint** | https://www.aleo.com/ |
| **Authentication** | None (public website) |
| **Method** | Web scraping (Playwright browser automation) |
| **Rate Limiting** | Self-imposed delays (1-2 seconds) |

#### Data Retrieved

- Company NIP verification
- Company name
- Address
- Business category
- Basic contact information

#### Best Practices

- **Rate Limiting:** 1-2 second delays between requests
- **User Agent:** Standard browser user agent
- **Error Handling:** Handle missing elements gracefully
- **Caching:** Cache results to minimize requests

#### Integration Points

| Feature | File | Usage |
|---------|------|-------|
| Data Import | `import_*.py` scripts | NIP verification |

#### Data Flow

```
Import Script → Playwright Browser
                      ↓
             ALEO.com Search Page
                      ↓
            Company Search (by NIP)
                      ↓
              Parse HTML Results
                      ↓
             Extract Company Data
                      ↓
            Verify against KRS API
                      ↓
           Save to companies table
```

---

### 🔗 rejestr.io (Web Scraping)

**Purpose:** Company connections, shareholders, management
**Service Files:** `analyze_connections.py` (Playwright integration)
**Status:** 📋 Planned Enhancement

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | rejestr.io (KRS registry browser) |
| **Endpoint** | https://rejestr.io/ |
| **Authentication** | None (public website) |
| **Method** | Web scraping (Playwright browser automation) |
| **Rate Limiting** | Self-imposed delays (1-2 seconds) |

#### Data to Retrieve (Planned)

- Management board members
- Shareholders with ownership percentages
- Beneficial owners
- Prokurents (commercial proxies)
- Links between companies (shared owners/managers)

#### Planned Database Table

```sql
company_people (
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id),
    name VARCHAR(255),
    role VARCHAR(100),           -- Prezes, Członek Zarządu, Wspólnik
    shares_percent NUMERIC(5,2),
    person_url VARCHAR(500),     -- Link to rejestr.io person page
    created_at TIMESTAMP,
    updated_at TIMESTAMP
)
```

#### Integration Points (Planned)

| Feature | File | Status |
|---------|------|--------|
| Connection Analysis | `analyze_connections.py` | 📋 Basic implementation exists |
| Company Profile Display | `templates/company_detail.html` | 📋 Planned |
| Network Visualization | (Future) | 📋 Planned |

---

## Authentication Summary

### API Key Authentication

| API | Environment Variable | Key Location |
|-----|---------------------|--------------|
| Google Gemini | `GOOGLE_GEMINI_API_KEY` | Google AI Studio |
| Google PageSpeed | `GOOGLE_PAGESPEED_API_KEY` | Google Cloud Console |
| Google Places | `GOOGLE_PLACES_API_KEY` | Google Cloud Console |
| Brave Search | `BRAVE_SEARCH_API_KEY` | Brave Search API Portal |

### OAuth 2.0 Authentication

| API | Flow Type | Environment Variables |
|-----|-----------|----------------------|
| Microsoft Graph | Client Credentials | `MICROSOFT_TENANT_ID`<br/>`MICROSOFT_CLIENT_ID`<br/>`MICROSOFT_CLIENT_SECRET` |

### No Authentication

| API | Access Type |
|-----|------------|
| KRS Open API | Public API |
| ALEO.com | Web scraping (public) |
| rejestr.io | Web scraping (public) |

---

## Rate Limits & Quota Management

### Summary Table

| API | Free Tier Quota | Rate Limit | Cost | Tracking |
|-----|----------------|------------|------|----------|
| **Google Gemini** | 200 req/day<br/>50 req/hour | Built-in | $0.075-$5.00/1M tokens | `ai_api_costs` table |
| **Google PageSpeed** | 25,000 req/day | ~1 req/sec | Free | In-memory counter |
| **Google Places** | Pay-per-use | No official limit | $0.032/request | 24-hour cache |
| **Brave Search** | 2,000 req/month | No official limit | Free | None |
| **KRS Open API** | Unlimited | No official limit | Free | None |
| **Microsoft Graph** | 10,000 req/10min | Built-in throttling | Included in M365 | None |
| **ALEO.com** | N/A (scraping) | Self-imposed (1-2s) | Free | None |
| **rejestr.io** | N/A (scraping) | Self-imposed (1-2s) | Free | None |

### Quota Monitoring

**Gemini AI - Daily Cost Report:**
```sql
SELECT
    feature,
    COUNT(*) as calls,
    SUM(total_tokens) as total_tokens,
    SUM(total_cost) as total_cost,
    AVG(latency_ms) as avg_latency_ms
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature
ORDER BY total_cost DESC;
```

**PageSpeed - Remaining Quota:**
```python
from scripts.pagespeed_client import GooglePageSpeedClient

client = GooglePageSpeedClient()
remaining = client.get_remaining_quota()
print(f"Remaining quota: {remaining}/25000")
```

---

## Error Handling Patterns

### Common Error Types

**1. Authentication Errors**
- Invalid API key
- Expired credentials
- Missing environment variables

**2. Rate Limiting**
- Quota exceeded (daily/hourly)
- Too many requests per second
- Throttling (429 status code)

**3. Network Errors**
- Connection timeout
- DNS resolution failure
- SSL certificate errors

**4. API Errors**
- 400 Bad Request (invalid parameters)
- 404 Not Found (resource doesn't exist)
- 500 Internal Server Error (API issue)

### Retry Strategy

**Exponential Backoff:**
```python
import time

# `api_client` and `TransientError` are placeholders for the concrete
# client object and the retryable exception type of a given integration.
max_retries = 3
for attempt in range(max_retries):
    try:
        result = api_client.call()
        break
    except TransientError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # waits 1s, then 2s
            time.sleep(wait_time)
        else:
            raise  # third failure: give up and propagate the error
```

### Error Handling Example

```python
# `api_client`, `QuotaExceededError`, `APIError`, `queue_for_retry`, and
# `log_api_call` are placeholders for integration-specific code.
result = None  # ensures `result` is defined in every branch, including `finally`
try:
    result = api_client.call_api(params)
except requests.exceptions.Timeout:
    logger.error("API timeout")
except requests.exceptions.ConnectionError as e:
    logger.error(f"Connection error: {e}")
except QuotaExceededError:
    logger.warning("Quota exceeded, queuing for retry")
    queue_for_retry(params)
except APIError as e:
    logger.error(f"API error: {e.status_code} - {e.message}")
finally:
    log_api_call(success=result is not None)
```

---

## Security Considerations

### API Key Storage

✅ **Best Practices:**
- Store in environment variables
- Use `.env` file (NOT committed to git)
- Rotate keys regularly
- Use separate keys for dev/prod

❌ **Never:**
- Hardcode keys in source code
- Commit keys to version control
- Share keys in chat/email
- Use production keys in development

### HTTPS/TLS

All APIs use HTTPS:
- Google APIs: TLS 1.2+
- Microsoft Graph: TLS 1.2+
- Brave Search: TLS 1.2+
- KRS Open API: TLS 1.2+

### Secrets Management

**Production:**
- Environment variables set in systemd service
- Restricted file permissions on `.env` files
- No secrets in logs or error messages

**Development:**
- `.env` file with restricted permissions (600)
- Local `.env` not synced to cloud storage
- Use test API keys when available

---

## Cost Optimization Strategies

### 1. Caching
- **Google Places:** 24-hour cache in `company_website_analysis`
- **PageSpeed:** Cache results, re-audit only when needed
- **Gemini:** Cache common responses (FAQ, greetings)

### 2. Batch Processing
- **SEO Audits:** Run during off-peak hours
- **Social Media Discovery:** Process in batches of 10-20
- **News Monitoring:** Schedule daily/weekly runs

### 3. Model Selection
- **Gemini:** Use appropriate models for task complexity
  - `gemini-3-flash-preview` for general use (default, free)
  - `gemini-3-pro-preview` for complex reasoning (paid)

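The model-selection rule above can be captured in a small helper. A minimal sketch, assuming a hypothetical function `select_model` and a boolean flag supplied by the caller; how a task is classified as needing complex reasoning is not specified in this document:

```python
# Aliases copied from the GEMINI_MODELS mapping documented earlier.
GEMINI_MODELS = {
    "3-flash": "gemini-3-flash-preview",  # default, free during preview
    "3-pro": "gemini-3-pro-preview",      # paid, for complex reasoning
}
DEFAULT_ALIAS = "3-flash"


def select_model(needs_complex_reasoning=False):
    """Return the model name for a task; the flag is decided by the caller."""
    alias = "3-pro" if needs_complex_reasoning else DEFAULT_ALIAS
    return GEMINI_MODELS[alias]
```

Defaulting to the free model means a caller has to opt in explicitly before a request incurs per-token cost.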
### 4. Result Reuse
- Don't re-analyze unchanged content
- Check last analysis timestamp before API calls
- Use `force_refresh` parameter sparingly

### 5. Quota Monitoring
- Daily reports on API usage and costs
- Alerts when >80% quota used
- Automatic throttling when approaching limit

---

## Monitoring & Troubleshooting

### Health Checks

**Test External API Connectivity:**
```bash
# Gemini API
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent?key=${GOOGLE_GEMINI_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'

# PageSpeed API
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://nordabiznes.pl&key=${GOOGLE_PAGESPEED_API_KEY}"

# KRS API
curl "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/0000817317?rejestr=P&format=json"

# Brave Search API
curl -H "X-Subscription-Token: ${BRAVE_SEARCH_API_KEY}" \
  "https://api.search.brave.com/res/v1/web/search?q=test&count=1"
```

### Common Issues

**1. Gemini Quota Exceeded**
```
Error: 429 Resource has been exhausted
Solution: Wait for quota reset (hourly/daily) or upgrade to paid tier
```

**2. PageSpeed Timeout**
```
Error: Timeout waiting for PageSpeed response
Solution: Increase timeout, retry later, or skip slow websites
```

**3. Places API 403 Forbidden**
```
Error: This API project is not authorized to use this API
Solution: Enable Places API in Google Cloud Console
```

**4. MS Graph Authentication Failed**
```
Error: AADSTS700016: Application not found in directory
Solution: Verify MICROSOFT_TENANT_ID and MICROSOFT_CLIENT_ID
```

### Diagnostic Commands

**Check API Key Configuration:**
```bash
# Development
grep -E "GOOGLE|BRAVE|MICROSOFT" .env

# Production
sudo -u www-data printenv | grep -E "GOOGLE|BRAVE|MICROSOFT"
```

**Check Database API Cost Tracking:**
```sql
-- Gemini API calls today
SELECT
    feature,
    COUNT(*) as calls,
    SUM(total_cost) as cost
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature;

-- Failed API calls
SELECT
    timestamp,
    feature,
    error_message
FROM ai_api_costs
WHERE success = FALSE
ORDER BY timestamp DESC
LIMIT 10;
```

---

## Related Documentation

- **[System Context](./01-system-context.md)** - High-level system overview
- **[Container Diagram](./02-container-diagram.md)** - Container architecture
- **[Flask Components](./04-flask-components.md)** - Application components
- **[Database Schema](./05-database-schema.md)** - Database design
- **[External API Integration Analysis](./.auto-claude/specs/003-.../analysis/external-api-integrations.md)** - Detailed API analysis

---

## Maintenance Guidelines

### When to Update This Document

- ✅ Adding new external API integration
- ✅ Changing API authentication method
- ✅ Updating rate limits or quotas
- ✅ Modifying data flow patterns
- ✅ Adding new database tables for API data
- ✅ Changing cost tracking or optimization strategies

### Update Checklist

- [ ] Update Mermaid diagram with new integration
- [ ] Add detailed section for new API
- [ ] Update authentication summary table
- [ ] Update rate limits & quota table
- [ ] Add integration points
- [ ] Document data flow
- [ ] Add health check commands
- [ ] Update cost optimization strategies

---

## Glossary

| Term | Definition |
|------|------------|
| **API Key** | Secret token for authenticating API requests |
| **OAuth 2.0** | Industry-standard protocol for authorization |
| **Client Credentials Flow** | OAuth flow for service-to-service authentication |
| **Rate Limit** | Maximum number of API requests allowed per time period |
| **Quota** | Total allowance for API usage (daily/monthly) |
| **Web Scraping** | Automated extraction of data from websites |
| **Playwright** | Browser automation framework for web scraping |
| **Exponential Backoff** | Retry strategy with increasing delays |
| **HTTPS/TLS** | Secure protocol for encrypted communication |
| **Free Tier** | No-cost API usage level with limits |
| **Pay-per-use** | Pricing model charging per API request |

---

**Document End**