SEO Audit Flow
Document Version: 1.0
Last Updated: 2026-01-10
Status: Production LIVE
Flow Type: Admin-Triggered Website SEO Analysis
Overview
This document describes the complete SEO audit flow for the Norda Biznes Partner application, covering:
- Admin Dashboard (/admin/seo route)
- Single Company Audit (admin-triggered via UI/API)
- Batch Audit (script-based for all companies)
- PageSpeed Insights API Integration for performance metrics
- On-Page SEO Analysis (meta tags, headings, images, links)
- Technical SEO Checks (robots.txt, sitemap, canonical URLs)
- Database Storage in company_website_analysis table
- Results Display on admin dashboard and company profiles
Key Technology:
- PageSpeed API: Google PageSpeed Insights (Lighthouse)
- Analysis Engine: SEOAuditor (scripts/seo_audit.py)
- On-Page Analyzer: OnPageSEOAnalyzer (scripts/seo_analyzer.py)
- Technical Checker: TechnicalSEOChecker (scripts/seo_analyzer.py)
- Database: PostgreSQL (company_website_analysis table)
Key Features:
- Full website analysis (PageSpeed + On-Page + Technical SEO)
- Admin dashboard with sortable table and score distribution
- Color-coded score badges (green 90-100, yellow 50-89, red 0-49)
- Filtering by category, score range, and company name
- Single company audit trigger from admin UI
- Batch audit script for all companies (scripts/seo_audit.py)
- API quota tracking (25,000 requests/day free tier)
API Costs & Performance:
- API: Google PageSpeed Insights (Free tier: 25,000 queries/day)
- Pricing: Free for up to 25,000 requests/day, $5/1000 queries after
- Typical Audit Time: 5-15 seconds per company
- Actual Cost: $0.00 (free tier, 80 companies = 80 audits << 25,000 limit)
1. High-Level SEO Audit Flow
1.1 Complete SEO Audit Flow Diagram
flowchart TD
Admin[Admin User] -->|1. Navigate to /admin/seo| Browser[Browser]
Browser -->|2. GET /admin/seo| Flask[Flask App<br/>app.py]
Flask -->|3. Check permissions| AuthCheck{Is Admin?}
AuthCheck -->|No| Deny[Redirect to dashboard]
AuthCheck -->|Yes| Dashboard[Admin SEO Dashboard<br/>admin_seo_dashboard.html]
Dashboard -->|4. Render dashboard| Browser
Browser -->|5. Display stats & table| AdminUI[Admin UI]
AdminUI -->|6. Click 'Uruchom audyt'<br/>for single company| TriggerSingle[Trigger Single Audit]
TriggerSingle -->|7. POST /api/seo/audit| Flask
AdminUI -->|8. Click 'Uruchom audyt'<br/>for batch| TriggerBatch[Trigger Batch Audit]
TriggerBatch -->|9. Run script| Script[scripts/seo_audit.py]
Flask -->|10. Verify admin| PermCheck{Is Admin?}
PermCheck -->|No| Error403[403 Error]
PermCheck -->|Yes| CreateAuditor[Create SEOAuditor]
CreateAuditor -->|11. Initialize| Auditor[SEOAuditor<br/>seo_audit.py]
Auditor -->|12. Fetch page| Website[Company Website]
Website -->|13. HTML + HTTP status| Auditor
Auditor -->|14. Analyze HTML| OnPageAnalyzer[OnPageSEOAnalyzer]
OnPageAnalyzer -->|15. Extract meta tags<br/>headings, images| OnPageResult[On-Page Results]
Auditor -->|16. Technical checks| TechnicalChecker[TechnicalSEOChecker]
TechnicalChecker -->|17. Check robots.txt<br/>sitemap, canonical| TechResult[Technical Results]
Auditor -->|18. Check quota| QuotaCheck{Quota > 0?}
QuotaCheck -->|No| SkipPageSpeed[Skip PageSpeed]
QuotaCheck -->|Yes| PageSpeedClient[GooglePageSpeedClient]
PageSpeedClient -->|19. API call| PageSpeedAPI[Google PageSpeed Insights<br/>Lighthouse]
PageSpeedAPI -->|20. Scores + CWV| PageSpeedResult[PageSpeed Results]
PageSpeedResult -->|21. Combine results| Auditor
OnPageResult -->|22. Combine results| Auditor
TechResult -->|23. Combine results| Auditor
SkipPageSpeed -->|24. Combine results| Auditor
Auditor -->|25. Calculate overall score| ScoreCalc[Score Calculator]
ScoreCalc -->|26. Overall SEO score| AuditResult[Complete Audit Result]
AuditResult -->|27. Save to DB| DB[(company_website_analysis)]
DB -->|28. Saved| Auditor
Auditor -->|29. Return results| Flask
Flask -->|30. JSON response| Browser
Browser -->|31. Reload dashboard| AdminUI
Script -->|32. Batch process| Auditor
Script -->|33. For each company| Auditor
style Auditor fill:#4CAF50
style PageSpeedClient fill:#2196F3
style OnPageAnalyzer fill:#FF9800
style TechnicalChecker fill:#9C27B0
style DB fill:#E91E63
1.2 Admin Dashboard View Flow
sequenceDiagram
participant Admin as Admin User
participant Browser as Browser
participant Flask as Flask App
participant DB as PostgreSQL
Admin->>Browser: Navigate to /admin/seo
Browser->>Flask: GET /admin/seo
Flask->>Flask: Check is_admin permission
alt Not Admin
Flask-->>Browser: Redirect to dashboard
else Is Admin
Flask->>DB: Query companies + SEO analysis
DB-->>Flask: Companies with scores
Flask->>Flask: Calculate stats (avg, distribution)
Flask-->>Browser: Render admin_seo_dashboard.html
Browser-->>Admin: Display dashboard with stats & table
end
Admin->>Browser: Click filter/sort
Browser->>Browser: Client-side filtering (JavaScript)
Browser-->>Admin: Updated table view
Admin->>Browser: Click "Uruchom audyt" for company
Browser->>Browser: Show confirmation modal
Admin->>Browser: Confirm audit
Browser->>Flask: POST /api/seo/audit {slug: "company-slug"}
Flask->>Flask: Verify admin + rate limit (10/hour)
Flask->>DB: Find company by slug
DB-->>Flask: Company record
Note over Flask,DB: SEO audit process (see next diagram)
Flask-->>Browser: JSON {success: true, scores: {...}}
Browser->>Browser: Show success modal
Browser->>Browser: Reload page after 1.5s
Browser->>Flask: GET /admin/seo (refresh)
Flask->>DB: Query companies + updated scores
DB-->>Flask: Companies with new scores
Flask-->>Browser: Updated dashboard
Browser-->>Admin: Display updated scores
2. SEO Audit Process Details
2.1 Single Company Audit Flow
sequenceDiagram
participant Flask as Flask App
participant Auditor as SEOAuditor
participant Web as Company Website
participant OnPage as OnPageSEOAnalyzer
participant Tech as TechnicalSEOChecker
participant PageSpeed as GooglePageSpeedClient
participant API as PageSpeed Insights API
participant DB as PostgreSQL
Flask->>Auditor: audit_company(company_dict)
Note over Auditor: 1. FETCH PAGE
Auditor->>Web: HTTP GET website_url
Web-->>Auditor: HTML content + status (200/404/500)
Note over Auditor,OnPage: 2. ON-PAGE ANALYSIS
Auditor->>OnPage: analyze_html(html, base_url)
OnPage->>OnPage: Extract meta tags (title, description, keywords)
OnPage->>OnPage: Count headings (H1, H2, H3)
OnPage->>OnPage: Analyze images (total, alt text)
OnPage->>OnPage: Count links (internal, external)
OnPage->>OnPage: Detect structured data (JSON-LD, Schema.org)
OnPage->>OnPage: Extract Open Graph tags
OnPage->>OnPage: Extract Twitter Card tags
OnPage->>OnPage: Count words on homepage
OnPage-->>Auditor: OnPageSEOResult
Note over Auditor,Tech: 3. TECHNICAL CHECKS
Auditor->>Tech: check_url(final_url)
Tech->>Web: GET /robots.txt
Web-->>Tech: robots.txt content or 404
Tech->>Tech: Parse robots.txt (exists, blocks Googlebot)
Tech->>Web: GET /sitemap.xml
Web-->>Tech: sitemap.xml content or 404
Tech->>Tech: Validate XML sitemap
Tech->>Tech: Check meta robots tags
Tech->>Tech: Check canonical URL
Tech->>Tech: Detect redirect chains
Tech-->>Auditor: TechnicalSEOResult
Note over Auditor,API: 4. PAGESPEED INSIGHTS
Auditor->>PageSpeed: Check remaining quota
PageSpeed-->>Auditor: quota_remaining (e.g., 24,950/25,000)
alt Quota Available
Auditor->>PageSpeed: analyze_url(url, strategy=MOBILE)
PageSpeed->>API: GET runPagespeed?url=...&strategy=mobile
API->>API: Run Lighthouse audit (5-15 seconds)
API-->>PageSpeed: Lighthouse results JSON
PageSpeed->>PageSpeed: Extract scores (0-100)
PageSpeed->>PageSpeed: Extract Core Web Vitals (LCP, FID, CLS)
PageSpeed->>PageSpeed: Extract audits (failed checks)
PageSpeed-->>Auditor: PageSpeedResult
else No Quota
Auditor->>Auditor: Skip PageSpeed (save quota)
end
Note over Auditor: 5. CALCULATE SCORES
Auditor->>Auditor: _calculate_onpage_score(onpage)
Auditor->>Auditor: _calculate_technical_score(technical)
Auditor->>Auditor: _calculate_overall_score(all_results)
Note over Auditor: Score weights:
Note over Auditor: PageSpeed SEO: 3x
Note over Auditor: PageSpeed Perf: 2x
Note over Auditor: On-Page: 2x
Note over Auditor: Technical: 2x
Note over Auditor,DB: 6. SAVE TO DATABASE
Auditor->>DB: UPSERT company_website_analysis
Note over DB: ON CONFLICT (company_id) DO UPDATE
DB-->>Auditor: Saved successfully
Auditor-->>Flask: Complete audit result dict
2.2 Batch Audit Script Flow
flowchart TD
Start[Start: python seo_audit.py --all] --> Init[Initialize SEOAuditor]
Init --> GetCompanies[Get companies from DB<br/>ORDER BY id]
GetCompanies --> Loop{For each company}
Loop -->|Next company| CheckWebsite{Has website?}
CheckWebsite -->|No| Skip[Skip: No website]
Skip --> Loop
CheckWebsite -->|Yes| CheckQuota{Quota > 0?}
CheckQuota -->|No| QuotaWarn[Warn: Quota exceeded<br/>Skip PageSpeed]
QuotaWarn --> AuditPartial[Audit without PageSpeed]
CheckQuota -->|Yes| AuditFull[Full Audit<br/>PageSpeed + OnPage + Technical]
AuditPartial --> SaveResult[Save to database]
AuditFull --> SaveResult
SaveResult --> UpdateStats[Update summary stats]
UpdateStats --> Sleep[Sleep 1s<br/>Rate limiting]
Sleep --> Loop
Loop -->|Done| PrintSummary[Print Summary Report]
PrintSummary --> ShowStats[Show score distribution<br/>Failed audits<br/>Quota usage]
ShowStats --> End[Exit with code]
style AuditFull fill:#4CAF50
style AuditPartial fill:#FF9800
style QuotaWarn fill:#F44336
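The batch loop in the flowchart above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual scripts/seo_audit.py code: `run_batch`, the `audit_full` and `audit_partial` callables, and the dict shapes are hypothetical stand-ins for SEOAuditor, and the quota is modeled as a plain counter.

```python
import time
from typing import Callable, Iterable


def run_batch(
    companies: Iterable[dict],
    audit_full: Callable[[dict], dict],
    audit_partial: Callable[[dict], dict],
    quota: int,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Audit companies one by one, following the flowchart above.

    All names are hypothetical; the real script may be organized differently.
    """
    summary = {"audited": 0, "partial": 0, "skipped": 0, "quota_left": quota}
    for company in companies:
        if not company.get("website"):
            summary["skipped"] += 1  # Skip: no website
            continue
        if summary["quota_left"] > 0:
            audit_full(company)  # PageSpeed + on-page + technical
            summary["quota_left"] -= 1
        else:
            audit_partial(company)  # on-page + technical only
            summary["partial"] += 1
        summary["audited"] += 1
        sleep(delay_s)  # rate limiting between audits
    return summary
```

Passing `sleep` as a parameter keeps the one-second delay out of tests; a caller that wants the real delay can rely on the default.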
3. Score Calculation
3.1 Overall SEO Score Formula
The overall SEO score is a weighted average of four components:
Overall Score = (
(PageSpeed SEO × 3) +
(PageSpeed Performance × 2) +
(On-Page Score × 2) +
(Technical Score × 2)
) / Total Weight
Weights:
- PageSpeed SEO: 3x (most important for search rankings)
- PageSpeed Performance: 2x (user experience)
- On-Page Score: 2x (content optimization)
- Technical Score: 2x (crawlability and indexability)
Score Ranges:
- 90-100 (Green): Excellent SEO
- 50-89 (Yellow): Needs improvement
- 0-49 (Red): Poor SEO
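As an illustration of the formula and ranges above, the weighted average might be computed like this. The function names and dict keys are hypothetical, and two behaviors are assumptions rather than documented facts: a component that is missing (for example, PageSpeed skipped for quota reasons) is dropped from both the sum and the total weight, and the result is rounded to the nearest integer.

```python
from typing import Optional

# Weights from the list above
WEIGHTS = {
    "pagespeed_seo": 3,
    "pagespeed_performance": 2,
    "on_page": 2,
    "technical": 2,
}


def overall_score(scores: dict) -> Optional[int]:
    """Weighted average over the components that are present (not None)."""
    weighted_sum = 0
    total_weight = 0
    for name, weight in WEIGHTS.items():
        value = scores.get(name)
        if value is None:
            continue  # assumption: a skipped component drops out of the average
        weighted_sum += value * weight
        total_weight += weight
    if total_weight == 0:
        return None  # nothing to average
    return round(weighted_sum / total_weight)


def badge_color(score: int) -> str:
    """Map a 0-100 score to the badge colors listed above."""
    if score >= 90:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"
```

With component scores of 92, 78, 85 and 85, the sum is 772 and the total weight is 9, so the result is about 85.8, which rounds to 86 and falls in the yellow band.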
3.2 On-Page Score Calculation
Starting Score: 100 (perfect)
Deductions:
| Issue | Deduction | Check |
|---|---|---|
| Missing meta title | -15 | meta_tags['title'] is empty |
| Title too short/long | -5 | Length < 30 or > 70 characters |
| Missing meta description | -10 | meta_tags['description'] is empty |
| Description too short/long | -5 | Length < 120 or > 160 characters |
| No canonical URL | -5 | meta_tags['canonical_url'] is empty |
| No H1 heading | -10 | headings['h1_count'] == 0 |
| Multiple H1 headings | -5 | headings['h1_count'] > 1 |
| Improper heading hierarchy | -5 | H3 without H2, etc. |
| >50% images missing alt | -10 | images_without_alt / total_images > 0.5 |
| >20% images missing alt | -5 | images_without_alt / total_images > 0.2 |
| No structured data | -5 | No JSON-LD or Schema.org |
| No Open Graph tags | -3 | No og:title |
Example:
# Start from a perfect page
score = 100
# Missing meta description (-10)
# 1 image without alt out of 10 (-0, 10% is below the 20% threshold)
# No structured data (-5)
final_score = 100 - 10 - 5  # 85 (Good)
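The deduction table above could be applied along these lines. This is a sketch, not the code in scripts/seo_audit.py: the function name, its parameters, and the `improper_hierarchy` key are hypothetical. It also assumes that the two title rows, the two description rows, the two H1 rows and the two image rows are each mutually exclusive, and that the score never drops below 0.

```python
def onpage_score(meta_tags, headings, images, has_structured_data, has_og_title):
    """Apply the on-page deductions from the table above (hypothetical helper)."""
    score = 100

    title = meta_tags.get("title") or ""
    if not title:
        score -= 15  # missing meta title
    elif len(title) < 30 or len(title) > 70:
        score -= 5  # title too short/long

    description = meta_tags.get("description") or ""
    if not description:
        score -= 10  # missing meta description
    elif len(description) < 120 or len(description) > 160:
        score -= 5  # description too short/long

    if not meta_tags.get("canonical_url"):
        score -= 5  # no canonical URL

    h1_count = headings.get("h1_count", 0)
    if h1_count == 0:
        score -= 10  # no H1 heading
    elif h1_count > 1:
        score -= 5  # multiple H1 headings
    if headings.get("improper_hierarchy"):  # hypothetical flag, e.g. H3 without H2
        score -= 5

    total_images = images.get("total_images", 0)
    if total_images > 0:
        missing_ratio = images.get("images_without_alt", 0) / total_images
        if missing_ratio > 0.5:
            score -= 10
        elif missing_ratio > 0.2:
            score -= 5

    if not has_structured_data:
        score -= 5
    if not has_og_title:
        score -= 3
    return max(0, score)
```

Calling it with an empty description, one of ten images lacking alt text, and no structured data reproduces the 85 from the example above.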
3.3 Technical Score Calculation
Starting Score: 100 (perfect)
Deductions:
| Issue | Deduction | Check |
|---|---|---|
| No robots.txt | -10 | robots_txt['exists'] == False |
| Robots blocks Googlebot | -20 | robots_txt['blocks_googlebot'] == True |
| No sitemap.xml | -10 | sitemap['exists'] == False |
| Invalid sitemap XML | -5 | sitemap['is_valid_xml'] == False |
| >3 redirects in chain | -10 | redirect_chain['chain_length'] > 3 |
| >1 redirect | -5 | redirect_chain['chain_length'] > 1 |
| Redirect loop detected | -20 | redirect_chain['has_redirect_loop'] == True |
| Not indexable | -15 | indexability['is_indexable'] == False |
| Canonical to different domain | -10 | Points to external site |
Example:
# Typical site, starting from a perfect score
score = 100
# No robots.txt (-10)
# Has sitemap.xml (-0)
# 1 redirect (-5)
# Indexable (-0)
final_score = 100 - 10 - 5  # 85 (Good)
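A comparable sketch for the technical deductions follows. The dict keys come from the Check column of the table; the function name and the `canonical_is_external` parameter are hypothetical. The sketch assumes the two redirect-length rows are mutually exclusive, that the Googlebot and sitemap-validity checks only apply when the file exists, and that the score is floored at 0.

```python
def technical_score(robots_txt, sitemap, redirect_chain, indexability,
                    canonical_is_external=False):
    """Apply the technical deductions from the table above (hypothetical helper)."""
    score = 100

    if not robots_txt.get("exists"):
        score -= 10  # no robots.txt
    elif robots_txt.get("blocks_googlebot"):
        score -= 20  # robots.txt blocks Googlebot

    if not sitemap.get("exists"):
        score -= 10  # no sitemap.xml
    elif not sitemap.get("is_valid_xml", True):
        score -= 5  # sitemap present but not valid XML

    chain_length = redirect_chain.get("chain_length", 0)
    if chain_length > 3:
        score -= 10
    elif chain_length > 1:
        score -= 5
    if redirect_chain.get("has_redirect_loop"):
        score -= 20

    if not indexability.get("is_indexable", True):
        score -= 15
    if canonical_is_external:
        score -= 10  # canonical points to a different domain
    return max(0, score)
```

A site with no robots.txt, a valid sitemap, and a chain_length of 2 scores 100 - 10 - 5 = 85 under these assumptions.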
4. Database Schema
4.1 CompanyWebsiteAnalysis Table
The company_website_analysis table stores comprehensive SEO audit results.
Location: database.py (lines ~429-520)
Key Fields:
CREATE TABLE company_website_analysis (
-- Identity
id SERIAL PRIMARY KEY,
company_id INTEGER REFERENCES companies(id) UNIQUE,
analyzed_at TIMESTAMP DEFAULT NOW(),
-- Basic Info
website_url VARCHAR(500),
final_url VARCHAR(500), -- After redirects
http_status_code INTEGER,
load_time_ms INTEGER,
-- PageSpeed Scores (0-100)
pagespeed_seo_score INTEGER,
pagespeed_performance_score INTEGER,
pagespeed_accessibility_score INTEGER,
pagespeed_best_practices_score INTEGER,
pagespeed_audits JSONB, -- Failed Lighthouse audits
-- On-Page SEO
meta_title VARCHAR(500),
meta_description TEXT,
meta_keywords TEXT,
h1_count INTEGER,
h2_count INTEGER,
h3_count INTEGER,
h1_text VARCHAR(500),
total_images INTEGER,
images_without_alt INTEGER,
images_with_alt INTEGER,
internal_links_count INTEGER,
external_links_count INTEGER,
broken_links_count INTEGER,
has_structured_data BOOLEAN,
structured_data_types TEXT[], -- ['Organization', 'LocalBusiness']
structured_data_json JSONB,
-- Technical SEO
has_canonical BOOLEAN,
canonical_url VARCHAR(500),
is_indexable BOOLEAN,
noindex_reason VARCHAR(100),
has_sitemap BOOLEAN,
has_robots_txt BOOLEAN,
viewport_configured BOOLEAN,
is_mobile_friendly BOOLEAN,
-- Core Web Vitals
largest_contentful_paint_ms INTEGER, -- LCP (Good: <2500ms)
first_input_delay_ms INTEGER, -- FID (Good: <100ms)
cumulative_layout_shift NUMERIC(4,2), -- CLS (Good: <0.1)
-- Open Graph
has_og_tags BOOLEAN,
og_title VARCHAR(500),
og_description TEXT,
og_image VARCHAR(500),
has_twitter_cards BOOLEAN,
-- Language & International
html_lang VARCHAR(10),
has_hreflang BOOLEAN,
-- Word Count
word_count_homepage INTEGER,
-- Audit Metadata
seo_audit_version VARCHAR(20),
seo_audited_at TIMESTAMP,
seo_audit_errors TEXT[],
seo_overall_score INTEGER,
seo_health_score INTEGER,
seo_issues JSONB
);
-- Indexes
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_analyzed_at ON company_website_analysis(analyzed_at);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);
4.2 Upsert Pattern
The audit uses ON CONFLICT DO UPDATE for idempotent saves:
INSERT INTO company_website_analysis (
company_id, analyzed_at, website_url, ...
) VALUES (
:company_id, :analyzed_at, :website_url, ...
)
ON CONFLICT (company_id) DO UPDATE SET
analyzed_at = EXCLUDED.analyzed_at,
website_url = EXCLUDED.website_url,
pagespeed_seo_score = EXCLUDED.pagespeed_seo_score,
-- ... all fields updated
seo_audited_at = EXCLUDED.seo_audited_at;
Benefits:
- Safe to run multiple times (idempotent)
- Always keeps latest audit results
- No duplicate records
- Atomic operation (transaction-safe)
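The idempotence of this pattern can be tried out locally with a small Python sketch. It uses an in-memory SQLite database purely as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same ON CONFLICT ... DO UPDATE clause), with a reduced, hypothetical subset of the table's columns; `save_audit` is an illustrative helper, not a function from the application.

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL (assumption for the demo only)
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company_website_analysis (
        id INTEGER PRIMARY KEY,
        company_id INTEGER UNIQUE,
        pagespeed_seo_score INTEGER,
        seo_audited_at TEXT
    )
    """
)

UPSERT = """
    INSERT INTO company_website_analysis
        (company_id, pagespeed_seo_score, seo_audited_at)
    VALUES (:company_id, :pagespeed_seo_score, :seo_audited_at)
    ON CONFLICT (company_id) DO UPDATE SET
        pagespeed_seo_score = excluded.pagespeed_seo_score,
        seo_audited_at = excluded.seo_audited_at
"""


def save_audit(company_id, seo_score, audited_at):
    """Insert the audit row, or overwrite the existing row for this company."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, {
            "company_id": company_id,
            "pagespeed_seo_score": seo_score,
            "seo_audited_at": audited_at,
        })


save_audit(26, 80, "2026-01-09T10:00:00")
save_audit(26, 92, "2026-01-10T10:30:00")  # second run updates, does not duplicate
rows = conn.execute(
    "SELECT company_id, pagespeed_seo_score, seo_audited_at "
    "FROM company_website_analysis"
).fetchall()
print(rows)
```

After both calls the table holds a single row for company 26 carrying the values from the second audit.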
5. API Endpoints
5.1 Admin SEO Dashboard
Route: GET /admin/seo
Authentication: Required (Admin only)
Location: app.py lines 4093-4192
Purpose: Display SEO metrics dashboard for all companies
Query Parameters:
- company (optional): Company slug to highlight/filter
Response: HTML (admin_seo_dashboard.html template)
Dashboard Features:
- Summary stats (score distribution, average, not audited count)
- Sortable table by name, category, scores, date
- Filters by category, score range, company name
- Color-coded score badges
- Last audit date with staleness indicator
- Actions: view profile, trigger single audit
Access Control:
if not current_user.is_admin:
    flash('Brak uprawnień do tej strony.', 'error')
    return redirect(url_for('dashboard'))
5.2 Get SEO Audit Results (Read)
Route: GET /api/seo/audit
Authentication: Not required (public API)
Location: app.py lines 3870-3914
Purpose: Retrieve existing SEO audit results for a company
Query Parameters:
- company_id (integer): Company ID
- slug (string): Company slug
Response:
{
"company_id": 26,
"company_name": "PIXLAB Sp. z o.o.",
"company_slug": "pixlab-sp-z-o-o",
"website": "https://pixlab.pl",
"pagespeed": {
"seo_score": 92,
"performance_score": 78,
"accessibility_score": 95,
"best_practices_score": 88,
"audits": {...}
},
"on_page": {
"meta_title": "PIXLAB - Oprogramowanie na miarę",
"meta_description": "Tworzymy dedykowane oprogramowanie...",
"h1_count": 1,
"total_images": 12,
"images_without_alt": 0,
"has_structured_data": true
},
"technical": {
"has_robots_txt": true,
"has_sitemap": true,
"is_indexable": true,
"is_mobile_friendly": true
},
"overall_score": 88,
"audited_at": "2026-01-10T10:30:00"
}
5.3 Trigger SEO Audit (Write)
Route: POST /api/seo/audit
Authentication: Required (Admin only)
Rate Limit: 10 requests per hour per user
Location: app.py lines 3943-4086
Purpose: Trigger a new SEO audit for a company
Request Body:
{
"company_id": 26,
"slug": "pixlab-sp-z-o-o"
}
Response (Success):
{
"success": true,
"message": "Audyt SEO dla firmy \"PIXLAB Sp. z o.o.\" został zakończony pomyślnie.",
"audit_version": "1.0.0",
"triggered_by": "admin@nordabiznes.pl",
"triggered_at": "2026-01-10T10:35:00",
"company_id": 26,
"company_name": "PIXLAB Sp. z o.o.",
"pagespeed": {...},
"on_page": {...},
"technical": {...},
"overall_score": 88
}
Response (Error - No Website):
{
"success": false,
"error": "Firma \"PIXLAB Sp. z o.o.\" nie ma zdefiniowanej strony internetowej.",
"company_id": 26,
"company_name": "PIXLAB Sp. z o.o."
}
Response (Error - Quota Exceeded):
{
"success": false,
"error": "PageSpeed API quota exceeded. Try again tomorrow.",
"company_id": 26
}
Access Control:
if not current_user.is_admin:
    return jsonify({
        'success': False,
        'error': 'Brak uprawnień. Tylko administrator może uruchamiać audyty SEO.'
    }), 403
Rate Limiting:
@limiter.limit("10 per hour")
6. PageSpeed Insights API Integration
6.1 API Configuration
Service File: scripts/pagespeed_client.py
Endpoint: https://www.googleapis.com/pagespeedonline/v5/runPagespeed
Authentication: API Key (GOOGLE_PAGESPEED_API_KEY)
Free Tier:
- 25,000 queries per day
- $5 per 1,000 queries after free tier
API Key:
- Name in Google Cloud: "Page SPEED SEO Audit v2"
- Project: NORDABIZNES (gen-lang-client-0540794446)
- Storage: .env file (GOOGLE_PAGESPEED_API_KEY)
6.2 API Request
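import os

import requests

# Read from the environment; the key is stored in .env
GOOGLE_PAGESPEED_API_KEY = os.environ['GOOGLE_PAGESPEED_API_KEY']

params = {
    'url': 'https://example.com',
    'key': GOOGLE_PAGESPEED_API_KEY,
    'strategy': 'mobile',  # or 'desktop'
    # requests sends a list as repeated category=... query parameters
    'category': ['performance', 'accessibility', 'best-practices', 'seo']
}
response = requests.get(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
    params=params,
    timeout=30
)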
6.3 API Response Structure
{
"lighthouseResult": {
"categories": {
"performance": {"score": 0.78},
"accessibility": {"score": 0.95},
"best-practices": {"score": 0.88},
"seo": {"score": 0.92}
},
"audits": {
"largest-contentful-paint": {"numericValue": 2300},
"first-input-delay": {"numericValue": 85},
"cumulative-layout-shift": {"numericValue": 0.05},
"meta-description": {"score": 1.0},
"robots-txt": {"score": 1.0},
"is-crawlable": {"score": 1.0}
}
},
"loadingExperience": {
"metrics": {
"LARGEST_CONTENTFUL_PAINT_MS": {"category": "FAST"},
"FIRST_INPUT_DELAY_MS": {"category": "FAST"},
"CUMULATIVE_LAYOUT_SHIFT_SCORE": {"category": "FAST"}
}
}
}
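Given a response shaped like the one above, the fields stored in company_website_analysis might be extracted as in the following sketch. `extract_scores` is a hypothetical helper, not the actual GooglePageSpeedClient parser; it assumes category scores arrive as 0-1 floats (as in the sample) and tolerates missing keys by returning None.

```python
def extract_scores(payload: dict) -> dict:
    """Pull 0-100 category scores and two lab metrics out of a PageSpeed response."""
    lighthouse = payload.get("lighthouseResult", {})
    categories = lighthouse.get("categories", {})
    audits = lighthouse.get("audits", {})

    def to_percent(category):
        score = categories.get(category, {}).get("score")
        # round() rather than int(): 0.58 * 100 is 57.99999999999999 in floating point
        return None if score is None else round(score * 100)

    def numeric(audit_id):
        return audits.get(audit_id, {}).get("numericValue")

    return {
        "pagespeed_seo_score": to_percent("seo"),
        "pagespeed_performance_score": to_percent("performance"),
        "pagespeed_accessibility_score": to_percent("accessibility"),
        "pagespeed_best_practices_score": to_percent("best-practices"),
        "largest_contentful_paint_ms": numeric("largest-contentful-paint"),
        "cumulative_layout_shift": numeric("cumulative-layout-shift"),
    }
```

The returned keys mirror column names from the schema in section 4.1, so the dict can feed the upsert parameters directly.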
6.4 Quota Management
Quota Tracking:
class GooglePageSpeedClient:
    def __init__(self):
        self.daily_quota = 25000
        self.used_today = 0  # Reset daily at midnight

    def get_remaining_quota(self) -> int:
        """Returns remaining API quota for today."""
        return max(0, self.daily_quota - self.used_today)

    def analyze_url(self, url: str) -> PageSpeedResult:
        if self.get_remaining_quota() <= 0:
            raise QuotaExceededError("Daily quota exceeded")
        # Make API call
        response = self._call_api(url)
        self.used_today += 1
        return self._parse_response(response)
Quota Exceeded Handling:
- Check quota before audit: if quota > 0
- If exceeded, skip PageSpeed but continue on-page/technical
- Log warning: "PageSpeed quota exceeded, skipping"
- Return partial audit result (no PageSpeed scores)
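The "reset daily at midnight" behavior mentioned in the quota tracking code is not shown there. One way it could work is sketched below; `DailyQuota` is a hypothetical class, the counter lives only in process memory, and using the local calendar date as the reset boundary is an assumption rather than a statement about how the real client or Google's quota window behaves.

```python
from datetime import date
from typing import Callable


class DailyQuota:
    """In-memory daily counter that resets when the calendar date changes."""

    def __init__(self, daily_quota: int = 25000,
                 today: Callable[[], date] = date.today):
        self.daily_quota = daily_quota
        self._today = today  # injectable clock, handy for tests
        self._day = today()
        self.used_today = 0

    def _roll_over(self) -> None:
        current = self._today()
        if current != self._day:
            self._day = current
            self.used_today = 0  # new day, fresh quota

    def remaining(self) -> int:
        self._roll_over()
        return max(0, self.daily_quota - self.used_today)

    def consume(self) -> bool:
        """Reserve one request; returns False when the quota is exhausted."""
        if self.remaining() <= 0:
            return False
        self.used_today += 1
        return True
```

Because the counter is not persisted, a restarted process would start again from zero; a shared store would be needed if several processes call the API.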
7. SEO Audit Script Usage
7.1 Command Line Interface
Script Location: scripts/seo_audit.py
Basic Usage:
# Audit single company by ID
python seo_audit.py --company-id 26
# Audit single company by slug
python seo_audit.py --company-slug pixlab-sp-z-o-o
# Audit batch of companies (rows 1-10)
python seo_audit.py --batch 1-10
# Audit all companies
python seo_audit.py --all
# Dry run (no database writes)
python seo_audit.py --company-id 26 --dry-run
# Export results to JSON
python seo_audit.py --all --json > seo_report.json
Options:
- --company-id ID: Audit single company by ID
- --company-ids IDS: Audit multiple companies (comma-separated: 1,5,10)
- --batch RANGE: Audit batch by row offset (e.g., 1-10)
- --all: Audit all companies
- --dry-run: Print results without saving to database
- --verbose, -v: Enable verbose/debug output
- --quiet, -q: Suppress progress output (only summary)
- --json: Output results as JSON
- --database-url URL: Override DATABASE_URL env var
7.2 Exit Codes
| Code | Meaning |
|---|---|
| 0 | All audits completed successfully |
| 1 | Argument error or invalid input |
| 2 | Partial failures (some audits failed) |
| 3 | All audits failed |
| 4 | Database connection error |
| 5 | API quota exceeded |
7.3 Batch Audit Output
============================================================
SEO AUDIT STARTING
============================================================
Companies to audit: 80
Mode: LIVE
PageSpeed API quota remaining: 24,950
============================================================
[1/80] PIXLAB Sp. z o.o. (ID: 26) - ETA: calculating...
Fetching page: https://pixlab.pl
Page fetched successfully (850ms)
Running on-page SEO analysis...
On-page analysis complete
Running technical SEO checks...
Technical checks complete
Running PageSpeed Insights (quota: 24,949)...
PageSpeed complete - SEO: 92, Perf: 78
Saved SEO audit for company 26
→ SUCCESS: Overall SEO score: 88
[2/80] Hotel SPA Wieniawa (ID: 15) - ETA: 00:15:30
Fetching page: https://wieniawa.pl
...
======================================================================
SEO AUDIT COMPLETE
======================================================================
Mode: LIVE
Duration: 00:18:45
----------------------------------------------------------------------
RESULTS BREAKDOWN
----------------------------------------------------------------------
Total companies: 80
✓ Successful: 72
✗ Failed: 5
○ Skipped: 3
- No website: 3
- Unavailable: 2
- Timeout: 2
- SSL errors: 1
----------------------------------------------------------------------
PAGESPEED API QUOTA
----------------------------------------------------------------------
Quota at start: 24,950
Quota used: 72
Quota remaining: 24,878
----------------------------------------------------------------------
SEO SCORE DISTRIBUTION
----------------------------------------------------------------------
Companies with scores: 72
Average SEO score: 76.3
Highest score: 95
Lowest score: 42
Excellent (90-100): 18 ██████████████░░░░░░░░░░░░░░░░░░
Good (70-89): 38 ████████████████████████████████
Fair (50-69): 12 ████████░░░░░░░░░░░░░░░░░░░░░░░░
Poor (<50): 4 ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
----------------------------------------------------------------------
FAILED AUDITS
----------------------------------------------------------------------
🔴 Firma ABC - HTTP 404
⏱ Firma XYZ - Timeout after 30s
🔌 Firma DEF - Connection refused
======================================================================
7.4 Production Deployment
On NORDABIZ-01 Server:
# Connect to server
ssh maciejpi@57.128.200.27
# Navigate to application directory
cd /var/www/nordabiznes
# Activate virtual environment
source venv/bin/activate
# Run audit for all companies (production database)
cd scripts
python seo_audit.py --all
# Run audit for specific company
python seo_audit.py --company-id 26
# Dry run to test without saving
python seo_audit.py --all --dry-run
# Export results to JSON
python seo_audit.py --all --json > ~/seo_audit_$(date +%Y%m%d).json
IMPORTANT - Database Connection:
Scripts in scripts/ must use localhost (127.0.0.1) for PostgreSQL:
# CORRECT:
DATABASE_URL = 'postgresql://nordabiz_app:NordaBiz2025Secure@127.0.0.1:5432/nordabiz'
# WRONG (PostgreSQL doesn't accept external connections):
DATABASE_URL = 'postgresql://nordabiz_app:NordaBiz2025Secure@57.128.200.27:5432/nordabiz'
7.5 Cron Job (Automated Audits)
Schedule weekly audit:
# Edit crontab
crontab -e
# Add weekly audit (Sundays at 2 AM)
0 2 * * 0 cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/seo_audit.py --all >> /var/log/nordabiznes/seo_audit.log 2>&1
Benefits:
- Automatic SEO monitoring
- Detect score degradation
- Track improvements over time
- Email alerts on failures (future)
8. Security & Performance
8.1 Security Features
1. Admin-Only Access:
if not current_user.is_admin:
    return jsonify({'error': 'Brak uprawnień'}), 403
2. Rate Limiting:
@limiter.limit("10 per hour")
- Prevents API abuse
- Protects PageSpeed quota
- Per-user rate limit
3. CSRF Protection:
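fetch('/api/seo/audit', {
    method: 'POST',  // fetch defaults to GET, which would hit the read endpoint
    headers: {
        'Content-Type': 'application/json',
        'X-CSRFToken': csrfToken
    },
    body: JSON.stringify({ slug: 'company-slug' })
})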
4. Input Validation:
if not company_id and not slug:
    return jsonify({'error': 'Podaj company_id lub slug'}), 400
5. Database Permissions:
GRANT ALL ON TABLE company_website_analysis TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE company_website_analysis_id_seq TO nordabiz_app;
8.2 Performance Optimizations
1. Upsert Instead of Insert:
- ON CONFLICT DO UPDATE (idempotent)
- No duplicate records
- Safe to re-run audits
2. Database Indexing:
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);
3. Batch Processing:
- Process companies sequentially
- Sleep 1s between audits (rate limiting)
- Skip companies without websites
4. API Quota Management:
- Check quota before calling PageSpeed
- Skip PageSpeed if quota low
- Continue with on-page/technical only
5. Timeout Handling:
response = requests.get(url, timeout=30)
- Prevents hanging requests
- Falls back gracefully
6. Caching (Future):
- Cache PageSpeed results for 7 days
- Skip re-audit if recent (<7 days old)
- Force refresh option for admins
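Since caching is listed as future work, the following is only a sketch of how the 7-day rule could be expressed. `needs_audit` is a hypothetical helper; it assumes seo_audited_at and now are comparable datetimes (both naive or both timezone-aware) and that an audit exactly 7 days old counts as stale.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_audit(seo_audited_at: Optional[datetime], now: datetime,
                max_age_days: int = 7, force: bool = False) -> bool:
    """Return True when a company should be (re-)audited."""
    if force or seo_audited_at is None:
        return True  # admin forced a refresh, or the company was never audited
    return now - seo_audited_at >= timedelta(days=max_age_days)
```

Taking `now` as an argument rather than calling `datetime.now()` inside keeps the rule deterministic and easy to test.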
9. Error Handling
9.1 Common Errors
1. No Website URL:
{
"success": false,
"error": "Firma \"ABC\" nie ma zdefiniowanej strony internetowej.",
"company_id": 15
}
2. Website Unreachable:
{
"success": false,
"error": "Audyt nie powiódł się: HTTP 404, Timeout after 30s",
"company_id": 26
}
3. SSL Certificate Error:
⚠ SSL error for https://example.com
Trying HTTP fallback: http://example.com
✓ Fallback successful
4. PageSpeed API Quota Exceeded:
{
"success": false,
"error": "PageSpeed API quota exceeded. Try again tomorrow."
}
5. Database Connection Error:
❌ Error: Database connection failed: connection refused
Exit code: 4
9.2 Error Recovery
1. SSL Errors → HTTP Fallback:
try:
    response = requests.get(https_url, timeout=30)
except requests.exceptions.SSLError:
    # Retry over plain HTTP when the certificate cannot be verified
    http_url = https_url.replace('https://', 'http://', 1)
    response = requests.get(http_url, timeout=30)
2. Timeout → Skip Company:
try:
    response = requests.get(url, timeout=30)
except requests.exceptions.Timeout:
    result['errors'].append('Timeout after 30s')
    # Continue to next company
3. Quota Exceeded → Skip PageSpeed:
if quota_remaining > 0:
    run_pagespeed_audit()
else:
    logger.warning("Quota exceeded, skipping PageSpeed")
    # Continue with on-page/technical only
4. Database Error → Rollback:
try:
    db.execute(query)
    db.commit()
except SQLAlchemyError as e:
    db.rollback()
    logger.error(f"Database error: {e}")
10. Monitoring & Maintenance
10.1 Health Checks
Check SEO Audit Status:
# Check latest audit dates
psql -U nordabiz_app -d nordabiz -c "
SELECT
c.name,
cwa.seo_audited_at,
cwa.pagespeed_seo_score,
cwa.seo_overall_score
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
ORDER BY cwa.seo_audited_at DESC NULLS LAST
LIMIT 10;
"
Check Quota Usage:
# Check how many audits today
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) AS audits_today
FROM company_website_analysis
WHERE seo_audited_at >= CURRENT_DATE;
"
Check Failed Audits:
# Companies with no SEO data
psql -U nordabiz_app -d nordabiz -c "
SELECT c.id, c.name, c.website
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
AND c.website IS NOT NULL
AND cwa.id IS NULL;
"
10.2 Maintenance Tasks
1. Re-audit Stale Data (>30 days):
python seo_audit.py --all --filter-stale 30
2. Audit New Companies:
# Companies added in last 7 days
python seo_audit.py --filter-new 7
3. Fix Failed Audits:
# Re-audit companies with errors
python seo_audit.py --retry-failed
4. Clean Old Data:
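-- Delete audit results older than 90 days (keep latest per company)
-- Note: company_id is UNIQUE (section 4.1), so each company has a single row
-- and this statement deletes nothing until audit history is stored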
DELETE FROM company_website_analysis
WHERE analyzed_at < NOW() - INTERVAL '90 days'
AND id NOT IN (
SELECT DISTINCT ON (company_id) id
FROM company_website_analysis
ORDER BY company_id, analyzed_at DESC
);
10.3 Monitoring Queries
Score Distribution:
SELECT
CASE
WHEN pagespeed_seo_score >= 90 THEN 'Excellent (90-100)'
WHEN pagespeed_seo_score >= 50 THEN 'Good (50-89)'
WHEN pagespeed_seo_score >= 0 THEN 'Poor (0-49)'
ELSE 'Not Audited'
END AS score_range,
COUNT(*) AS companies
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
GROUP BY score_range
ORDER BY score_range;
Top/Bottom Performers:
-- Top 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
ORDER BY cwa.seo_overall_score DESC
LIMIT 10;
-- Bottom 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND cwa.seo_overall_score IS NOT NULL
ORDER BY cwa.seo_overall_score ASC
LIMIT 10;
Audit Coverage:
SELECT
COUNT(*) AS total_companies,
COUNT(cwa.id) AS audited_companies,
ROUND(COUNT(cwa.id)::NUMERIC / COUNT(*)::NUMERIC * 100, 1) AS coverage_percent
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND c.website IS NOT NULL;
11. Future Enhancements
11.1 Planned Features
1. Automated Re-Audit Scheduling:
- Weekly cron job for all companies
- Priority queue for low-scoring sites
- Email alerts for score drops
2. Historical Trend Tracking:
- Store audit history (not just latest)
- Chart score changes over time
- Identify improving/declining sites
3. Competitor Benchmarking:
- Compare scores within categories
- Identify SEO leaders
- Best practice recommendations
4. SEO Report Generation:
- PDF reports for company owners
- Actionable recommendations
- Step-by-step fix guides
5. Integration with Company Profiles:
- Display SEO badge on company page
- Show top SEO issues
- Link to audit details
6. Mobile vs Desktop Audits:
- Separate scores for mobile/desktop
- Mobile-first optimization tracking
- Device-specific recommendations
11.2 Technical Improvements
1. Async Batch Processing:
- Celery background tasks
- Parallel audits (5 concurrent)
- Real-time progress updates
2. API Webhook Notifications:
- Notify company owners of audit results
- Integration with Slack/Discord
- Email summaries
3. Advanced Caching:
- Cache PageSpeed results for 7 days
- Skip re-audit if recent
- Force refresh button for admins
4. Audit Scheduling:
- Per-company audit frequency
- High-priority companies daily
- Low-priority weekly
12. Troubleshooting
12.1 Common Issues
Issue: "PageSpeed API quota exceeded"
Solution: Wait 24 hours for quota reset or upgrade to paid tier
Issue: "Database connection failed"
Solution: Check PostgreSQL is running: systemctl status postgresql
Issue: "SSL certificate verify failed"
Solution: Script automatically tries HTTP fallback
Issue: "Company has no website URL"
Solution: Add website in company edit form or skip
Issue: "Timeout after 30s"
Solution: Website is slow/down, skip or retry later
12.2 Debugging
Enable Verbose Logging:
python seo_audit.py --all --verbose
Check API Key:
echo $GOOGLE_PAGESPEED_API_KEY
# Should print API key, not empty
Test Single Company:
python seo_audit.py --company-id 26 --dry-run
# See full audit output without saving
Check Database Connection:
psql -U nordabiz_app -d nordabiz -h 127.0.0.1 -c "SELECT COUNT(*) FROM companies;"
Test PageSpeed API:
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://pixlab.pl&key=YOUR_API_KEY&strategy=mobile"
13. Related Documentation
- Google PageSpeed API: docs/architecture/flows/external-api-integrations.md#3-google-pagespeed-insights-api
- Database Schema: docs/architecture/05-database-schema.md
- Flask Components: docs/architecture/04-flask-components.md
- Admin Panel: CLAUDE.md#audyt-seo-panel-adminseo
14. Glossary
| Term | Definition |
|---|---|
| SEO | Search Engine Optimization - improving website visibility in search results |
| PageSpeed Insights | Google tool for measuring website performance and SEO quality |
| Lighthouse | Automated audit tool by Google (powers PageSpeed Insights) |
| Core Web Vitals | Google's UX metrics: LCP (Largest Contentful Paint), FID (First Input Delay), CLS (Cumulative Layout Shift) |
| On-Page SEO | SEO factors on the page itself (meta tags, headings, content) |
| Technical SEO | SEO factors related to crawlability (robots.txt, sitemap, indexability) |
| Meta Tags | HTML tags providing metadata about the page (title, description, keywords) |
| Structured Data | Machine-readable format (JSON-LD, Schema.org) for search engines |
| Canonical URL | Preferred version of a page (prevents duplicate content issues) |
| Robots.txt | File telling search engines which pages to crawl/not crawl |
| Sitemap.xml | XML file listing all pages on a website for search engines |
| Open Graph | Meta tags for social media sharing (og:title, og:image, etc.) |
| Twitter Card | Meta tags for Twitter sharing |
| Upsert | Database operation: INSERT or UPDATE if exists |
| Quota | API usage limit (25,000 requests/day for PageSpeed) |
Document End