# SEO Audit Flow

**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Flow Type:** Admin-Triggered Website SEO Analysis

---

## Overview

This document describes the **complete SEO audit flow** for the Norda Biznes Partner application, covering:

- **Admin Dashboard** (`/admin/seo` route)
- **Single Company Audit** (admin-triggered via UI/API)
- **Batch Audit** (script-based for all companies)
- **PageSpeed Insights API Integration** for performance metrics
- **On-Page SEO Analysis** (meta tags, headings, images, links)
- **Technical SEO Checks** (robots.txt, sitemap, canonical URLs)
- **Database Storage** in the `company_website_analysis` table
- **Results Display** on the admin dashboard and company profiles

**Key Technology:**
- **PageSpeed API:** Google PageSpeed Insights (Lighthouse)
- **Analysis Engine:** SEOAuditor (`scripts/seo_audit.py`)
- **On-Page Analyzer:** OnPageSEOAnalyzer (`scripts/seo_analyzer.py`)
- **Technical Checker:** TechnicalSEOChecker (`scripts/seo_analyzer.py`)
- **Database:** PostgreSQL (`company_website_analysis` table)

**Key Features:**
- Full website analysis (PageSpeed + On-Page + Technical SEO)
- Admin dashboard with sortable table and score distribution
- Color-coded score badges (green 90-100, yellow 50-89, red 0-49)
- Filtering by category, score range, and company name
- Single company audit trigger from the admin UI
- Batch audit script for all companies (`scripts/seo_audit.py`)
- API quota tracking (25,000 requests/day free tier)

**API Costs & Performance:**
- **API:** Google PageSpeed Insights (free tier: 25,000 queries/day)
- **Pricing:** Free for up to 25,000 requests/day, $5/1000 queries after
- **Typical Audit Time:** 5-15 seconds per company
- **Actual Cost:** $0.00 (free tier; 80 companies = 80 audits, far below the 25,000 limit)

---

## 1. High-Level SEO Audit Flow

### 1.1 Complete SEO Audit Flow Diagram

```mermaid
flowchart TD
    Admin[Admin User] -->|1. Navigate to /admin/seo| Browser[Browser]
    Browser -->|2. GET /admin/seo| Flask[Flask App<br/>app.py]
    Flask -->|3. Check permissions| AuthCheck{Is Admin?}
    AuthCheck -->|No| Deny[Redirect to dashboard]
    AuthCheck -->|Yes| Dashboard[Admin SEO Dashboard<br/>admin_seo_dashboard.html]
    Dashboard -->|4. Render dashboard| Browser
    Browser -->|5. Display stats & table| AdminUI[Admin UI]

    AdminUI -->|6. Click 'Uruchom audyt'<br/>for single company| TriggerSingle[Trigger Single Audit]
    TriggerSingle -->|7. POST /api/seo/audit| Flask
    AdminUI -->|8. Click 'Uruchom audyt'<br/>for batch| TriggerBatch[Trigger Batch Audit]
    TriggerBatch -->|9. Run script| Script[scripts/seo_audit.py]

    Flask -->|10. Verify admin| PermCheck{Is Admin?}
    PermCheck -->|No| Error403[403 Error]
    PermCheck -->|Yes| CreateAuditor[Create SEOAuditor]
    CreateAuditor -->|11. Initialize| Auditor[SEOAuditor<br/>seo_audit.py]

    Auditor -->|12. Fetch page| Website[Company Website]
    Website -->|13. HTML + HTTP status| Auditor

    Auditor -->|14. Analyze HTML| OnPageAnalyzer[OnPageSEOAnalyzer]
    OnPageAnalyzer -->|15. Extract meta tags<br/>headings, images| OnPageResult[On-Page Results]

    Auditor -->|16. Technical checks| TechnicalChecker[TechnicalSEOChecker]
    TechnicalChecker -->|17. Check robots.txt<br/>sitemap, canonical| TechResult[Technical Results]

    Auditor -->|18. Check quota| QuotaCheck{Quota > 0?}
    QuotaCheck -->|No| SkipPageSpeed[Skip PageSpeed]
    QuotaCheck -->|Yes| PageSpeedClient[GooglePageSpeedClient]
    PageSpeedClient -->|19. API call| PageSpeedAPI[Google PageSpeed Insights<br/>Lighthouse]
    PageSpeedAPI -->|20. Scores + CWV| PageSpeedResult[PageSpeed Results]

    PageSpeedResult -->|21. Combine results| Auditor
    OnPageResult -->|22. Combine results| Auditor
    TechResult -->|23. Combine results| Auditor
    SkipPageSpeed -->|24. Combine results| Auditor

    Auditor -->|25. Calculate overall score| ScoreCalc[Score Calculator]
    ScoreCalc -->|26. Overall SEO score| AuditResult[Complete Audit Result]
    AuditResult -->|27. Save to DB| DB[(company_website_analysis)]
    DB -->|28. Saved| Auditor
    Auditor -->|29. Return results| Flask
    Flask -->|30. JSON response| Browser
    Browser -->|31. Reload dashboard| AdminUI

    Script -->|32. Batch process| Auditor
    Script -->|33. For each company| Auditor

    style Auditor fill:#4CAF50
    style PageSpeedClient fill:#2196F3
    style OnPageAnalyzer fill:#FF9800
    style TechnicalChecker fill:#9C27B0
    style DB fill:#E91E63
```

### 1.2 Admin Dashboard View Flow

```mermaid
sequenceDiagram
    participant Admin as Admin User
    participant Browser as Browser
    participant Flask as Flask App
    participant DB as PostgreSQL

    Admin->>Browser: Navigate to /admin/seo
    Browser->>Flask: GET /admin/seo
    Flask->>Flask: Check is_admin permission

    alt Not Admin
        Flask-->>Browser: Redirect to dashboard
    else Is Admin
        Flask->>DB: Query companies + SEO analysis
        DB-->>Flask: Companies with scores
        Flask->>Flask: Calculate stats (avg, distribution)
        Flask-->>Browser: Render admin_seo_dashboard.html
        Browser-->>Admin: Display dashboard with stats & table
    end

    Admin->>Browser: Click filter/sort
    Browser->>Browser: Client-side filtering (JavaScript)
    Browser-->>Admin: Updated table view

    Admin->>Browser: Click "Uruchom audyt" for company
    Browser->>Browser: Show confirmation modal
    Admin->>Browser: Confirm audit
    Browser->>Flask: POST /api/seo/audit {slug: "company-slug"}
    Flask->>Flask: Verify admin + rate limit (10/hour)
    Flask->>DB: Find company by slug
    DB-->>Flask: Company record

    Note over Flask,DB: SEO audit process (see next diagram)

    Flask-->>Browser: JSON {success: true, scores: {...}}
    Browser->>Browser: Show success modal
    Browser->>Browser: Reload page after 1.5s
    Browser->>Flask: GET /admin/seo (refresh)
    Flask->>DB: Query companies + updated scores
    DB-->>Flask: Companies with new scores
    Flask-->>Browser: Updated dashboard
    Browser-->>Admin: Display updated scores
```

---

## 2. SEO Audit Process Details

### 2.1 Single Company Audit Flow

```mermaid
sequenceDiagram
    participant Flask as Flask App
    participant Auditor as SEOAuditor
    participant Web as Company Website
    participant OnPage as OnPageSEOAnalyzer
    participant Tech as TechnicalSEOChecker
    participant PageSpeed as GooglePageSpeedClient
    participant API as PageSpeed Insights API
    participant DB as PostgreSQL

    Flask->>Auditor: audit_company(company_dict)

    Note over Auditor: 1. FETCH PAGE
    Auditor->>Web: HTTP GET website_url
    Web-->>Auditor: HTML content + status (200/404/500)

    Note over Auditor,OnPage: 2. ON-PAGE ANALYSIS
    Auditor->>OnPage: analyze_html(html, base_url)
    OnPage->>OnPage: Extract meta tags (title, description, keywords)
    OnPage->>OnPage: Count headings (H1, H2, H3)
    OnPage->>OnPage: Analyze images (total, alt text)
    OnPage->>OnPage: Count links (internal, external)
    OnPage->>OnPage: Detect structured data (JSON-LD, Schema.org)
    OnPage->>OnPage: Extract Open Graph tags
    OnPage->>OnPage: Extract Twitter Card tags
    OnPage->>OnPage: Count words on homepage
    OnPage-->>Auditor: OnPageSEOResult

    Note over Auditor,Tech: 3. TECHNICAL CHECKS
    Auditor->>Tech: check_url(final_url)
    Tech->>Web: GET /robots.txt
    Web-->>Tech: robots.txt content or 404
    Tech->>Tech: Parse robots.txt (exists, blocks Googlebot)
    Tech->>Web: GET /sitemap.xml
    Web-->>Tech: sitemap.xml content or 404
    Tech->>Tech: Validate XML sitemap
    Tech->>Tech: Check meta robots tags
    Tech->>Tech: Check canonical URL
    Tech->>Tech: Detect redirect chains
    Tech-->>Auditor: TechnicalSEOResult

    Note over Auditor,API: 4. PAGESPEED INSIGHTS
    Auditor->>PageSpeed: Check remaining quota
    PageSpeed-->>Auditor: quota_remaining (e.g., 24,950/25,000)

    alt Quota Available
        Auditor->>PageSpeed: analyze_url(url, strategy=MOBILE)
        PageSpeed->>API: GET runPagespeed?url=...&strategy=mobile
        API->>API: Run Lighthouse audit (5-15 seconds)
        API-->>PageSpeed: Lighthouse results JSON
        PageSpeed->>PageSpeed: Extract scores (0-100)
        PageSpeed->>PageSpeed: Extract Core Web Vitals (LCP, FID, CLS)
        PageSpeed->>PageSpeed: Extract audits (failed checks)
        PageSpeed-->>Auditor: PageSpeedResult
    else No Quota
        Auditor->>Auditor: Skip PageSpeed (save quota)
    end

    Note over Auditor: 5. CALCULATE SCORES
    Auditor->>Auditor: _calculate_onpage_score(onpage)
    Auditor->>Auditor: _calculate_technical_score(technical)
    Auditor->>Auditor: _calculate_overall_score(all_results)

    Note over Auditor: Score weights:
    Note over Auditor: PageSpeed SEO: 3x
    Note over Auditor: PageSpeed Perf: 2x
    Note over Auditor: On-Page: 2x
    Note over Auditor: Technical: 2x

    Note over Auditor,DB: 6. SAVE TO DATABASE
    Auditor->>DB: UPSERT company_website_analysis
    Note over DB: ON CONFLICT (company_id) DO UPDATE
    DB-->>Auditor: Saved successfully

    Auditor-->>Flask: Complete audit result dict
```

### 2.2 Batch Audit Script Flow

```mermaid
flowchart TD
    Start[Start: python seo_audit.py --all] --> Init[Initialize SEOAuditor]
    Init --> GetCompanies[Get companies from DB<br/>ORDER BY id]
    GetCompanies --> Loop{For each company}

    Loop -->|Next company| CheckWebsite{Has website?}
    CheckWebsite -->|No| Skip[Skip: No website]
    Skip --> Loop

    CheckWebsite -->|Yes| CheckQuota{Quota > 0?}
    CheckQuota -->|No| QuotaWarn[Warn: Quota exceeded<br/>Skip PageSpeed]
    QuotaWarn --> AuditPartial[Audit without PageSpeed]
    CheckQuota -->|Yes| AuditFull[Full Audit<br/>PageSpeed + OnPage + Technical]

    AuditPartial --> SaveResult[Save to database]
    AuditFull --> SaveResult
    SaveResult --> UpdateStats[Update summary stats]
    UpdateStats --> Sleep[Sleep 1s<br/>Rate limiting]
    Sleep --> Loop

    Loop -->|Done| PrintSummary[Print Summary Report]
    PrintSummary --> ShowStats[Show score distribution<br/>Failed audits<br/>Quota usage]
    ShowStats --> End[Exit with code]

    style AuditFull fill:#4CAF50
    style AuditPartial fill:#FF9800
    style QuotaWarn fill:#F44336
```

---

## 3. Score Calculation

### 3.1 Overall SEO Score Formula

The overall SEO score is a **weighted average** of four components:

```
Overall Score = (
    (PageSpeed SEO × 3) +
    (PageSpeed Performance × 2) +
    (On-Page Score × 2) +
    (Technical Score × 2)
) / Total Weight
```

With all four components present, Total Weight = 3 + 2 + 2 + 2 = 9.

**Weights:**
- PageSpeed SEO: **3x** (most important for search rankings)
- PageSpeed Performance: **2x** (user experience)
- On-Page Score: **2x** (content optimization)
- Technical Score: **2x** (crawlability and indexability)

**Score Ranges:**
- **90-100 (Green):** Excellent SEO
- **50-89 (Yellow):** Needs improvement
- **0-49 (Red):** Poor SEO

### 3.2 On-Page Score Calculation

**Starting Score:** 100 (perfect)

**Deductions:**

| Issue | Deduction | Check |
|-------|-----------|-------|
| Missing meta title | -15 | `meta_tags['title']` is empty |
| Title too short/long | -5 | Length < 30 or > 70 characters |
| Missing meta description | -10 | `meta_tags['description']` is empty |
| Description too short/long | -5 | Length < 120 or > 160 characters |
| No canonical URL | -5 | `meta_tags['canonical_url']` is empty |
| No H1 heading | -10 | `headings['h1_count']` == 0 |
| Multiple H1 headings | -5 | `headings['h1_count']` > 1 |
| Improper heading hierarchy | -5 | H3 without H2, etc. |
| >50% images missing alt | -10 | `images_without_alt / total_images` > 0.5 |
| >20% images missing alt | -5 | `images_without_alt / total_images` > 0.2 |
| No structured data | -5 | No JSON-LD or Schema.org |
| No Open Graph tags | -3 | No `og:title` |

**Example:**
```python
# Perfect page
score = 100
# Missing meta description (-10)
# 1 image without alt out of 10 (-0, below the 20% threshold)
# No structured data (-5)
final_score = score - 10 - 5  # = 85 (yellow badge, 50-89 range)
```

### 3.3 Technical Score Calculation

**Starting Score:** 100 (perfect)

**Deductions:**

| Issue | Deduction | Check |
|-------|-----------|-------|
| No robots.txt | -10 | `robots_txt['exists']` == False |
| Robots blocks Googlebot | -20 | `robots_txt['blocks_googlebot']` == True |
| No sitemap.xml | -10 | `sitemap['exists']` == False |
| Invalid sitemap XML | -5 | `sitemap['is_valid_xml']` == False |
| >3 redirects in chain | -10 | `redirect_chain['chain_length']` > 3 |
| >1 redirect | -5 | `redirect_chain['chain_length']` > 1 |
| Redirect loop detected | -20 | `redirect_chain['has_redirect_loop']` == True |
| Not indexable | -15 | `indexability['is_indexable']` == False |
| Canonical to different domain | -10 | Points to external site |

**Example:**
```python
# Typical site
score = 100
# No robots.txt (-10)
# Has sitemap.xml (+0)
# 1 redirect (-5)
# Indexable (+0)
final_score = score - 10 - 5  # = 85 (yellow badge, 50-89 range)
```

---

## 4. Database Schema

### 4.1 CompanyWebsiteAnalysis Table

The `company_website_analysis` table stores comprehensive SEO audit results.
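Each company has at most one row in this table, because `company_id` is declared `UNIQUE` and audits are saved with an upsert. The following is a minimal, self-contained sketch of that contract. It assumes a trimmed-down, three-column version of the table and uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same `ON CONFLICT ... DO UPDATE` / `excluded` syntax). The column subset and the `save_audit` helper are hypothetical illustrations, not the application's code.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for company_website_analysis
# (three columns only, SQLite in memory instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company_website_analysis (
        company_id INTEGER UNIQUE,
        seo_overall_score INTEGER,
        seo_audited_at TEXT
    )
    """
)

UPSERT = """
INSERT INTO company_website_analysis (company_id, seo_overall_score, seo_audited_at)
VALUES (:company_id, :score, :audited_at)
ON CONFLICT (company_id) DO UPDATE SET
    seo_overall_score = excluded.seo_overall_score,
    seo_audited_at = excluded.seo_audited_at
"""


def save_audit(company_id: int, score: int, audited_at: str) -> None:
    """Insert a new audit row, or overwrite the existing row for this company."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(UPSERT, {"company_id": company_id, "score": score, "audited_at": audited_at})


save_audit(26, 81, "2026-01-03T10:00:00")
save_audit(26, 88, "2026-01-10T10:30:00")  # re-audit of the same company

rows = conn.execute(
    "SELECT company_id, seo_overall_score, seo_audited_at FROM company_website_analysis"
).fetchall()
print(rows)  # [(26, 88, '2026-01-10T10:30:00')]
```

Re-auditing company 26 overwrites its row instead of adding a second one. That is why repeated audits are safe, and also why the table keeps no audit history.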
**Location:** `database.py` (lines ~429-520)

**Key Fields:**

```sql
CREATE TABLE company_website_analysis (
    -- Identity
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id) UNIQUE,
    analyzed_at TIMESTAMP DEFAULT NOW(),

    -- Basic Info
    website_url VARCHAR(500),
    final_url VARCHAR(500),                -- After redirects
    http_status_code INTEGER,
    load_time_ms INTEGER,

    -- PageSpeed Scores (0-100)
    pagespeed_seo_score INTEGER,
    pagespeed_performance_score INTEGER,
    pagespeed_accessibility_score INTEGER,
    pagespeed_best_practices_score INTEGER,
    pagespeed_audits JSONB,                -- Failed Lighthouse audits

    -- On-Page SEO
    meta_title VARCHAR(500),
    meta_description TEXT,
    meta_keywords TEXT,
    h1_count INTEGER,
    h2_count INTEGER,
    h3_count INTEGER,
    h1_text VARCHAR(500),
    total_images INTEGER,
    images_without_alt INTEGER,
    images_with_alt INTEGER,
    internal_links_count INTEGER,
    external_links_count INTEGER,
    broken_links_count INTEGER,
    has_structured_data BOOLEAN,
    structured_data_types TEXT[],          -- ['Organization', 'LocalBusiness']
    structured_data_json JSONB,

    -- Technical SEO
    has_canonical BOOLEAN,
    canonical_url VARCHAR(500),
    is_indexable BOOLEAN,
    noindex_reason VARCHAR(100),
    has_sitemap BOOLEAN,
    has_robots_txt BOOLEAN,
    viewport_configured BOOLEAN,
    is_mobile_friendly BOOLEAN,

    -- Core Web Vitals
    largest_contentful_paint_ms INTEGER,   -- LCP (Good: <2500ms)
    first_input_delay_ms INTEGER,          -- FID (Good: <100ms)
    cumulative_layout_shift NUMERIC(4,2),  -- CLS (Good: <0.1)

    -- Open Graph
    has_og_tags BOOLEAN,
    og_title VARCHAR(500),
    og_description TEXT,
    og_image VARCHAR(500),
    has_twitter_cards BOOLEAN,

    -- Language & International
    html_lang VARCHAR(10),
    has_hreflang BOOLEAN,

    -- Word Count
    word_count_homepage INTEGER,

    -- Audit Metadata
    seo_audit_version VARCHAR(20),
    seo_audited_at TIMESTAMP,
    seo_audit_errors TEXT[],
    seo_overall_score INTEGER,
    seo_health_score INTEGER,
    seo_issues JSONB
);

-- Indexes
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_analyzed_at ON company_website_analysis(analyzed_at);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);
```

### 4.2 Upsert Pattern

The audit uses **ON CONFLICT DO UPDATE** for idempotent saves:

```sql
INSERT INTO company_website_analysis (
    company_id, analyzed_at, website_url, ...
) VALUES (
    :company_id, :analyzed_at, :website_url, ...
)
ON CONFLICT (company_id) DO UPDATE SET
    analyzed_at = EXCLUDED.analyzed_at,
    website_url = EXCLUDED.website_url,
    pagespeed_seo_score = EXCLUDED.pagespeed_seo_score,
    -- ... all fields updated
    seo_audited_at = EXCLUDED.seo_audited_at;
```

**Benefits:**
- Safe to run multiple times (idempotent)
- Always keeps the latest audit results
- No duplicate records
- Atomic operation (transaction-safe)

---

## 5. API Endpoints

### 5.1 Admin SEO Dashboard

**Route:** `GET /admin/seo`
**Authentication:** Required (Admin only)
**Location:** `app.py` lines 4093-4192

**Purpose:** Display the SEO metrics dashboard for all companies.

**Query Parameters:**
- `company` (optional): Company slug to highlight/filter

**Response:** HTML (`admin_seo_dashboard.html` template)

**Dashboard Features:**
- Summary stats (score distribution, average, not-audited count)
- Table sortable by name, category, scores, and date
- Filters by category, score range, and company name
- Color-coded score badges
- Last audit date with staleness indicator
- Actions: view profile, trigger single audit (the "Uruchom audyt", i.e. "Run audit", button)

**Access Control:**
```python
if not current_user.is_admin:
    # "You do not have permission to access this page."
    flash('Brak uprawnień do tej strony.', 'error')
    return redirect(url_for('dashboard'))
```

### 5.2 Get SEO Audit Results (Read)

**Route:** `GET /api/seo/audit`
**Authentication:** Not required (public API)
**Location:** `app.py` lines 3870-3914

**Purpose:** Retrieve existing SEO audit results for a company.

**Query Parameters:**
- `company_id` (integer): Company ID
- `slug` (string): Company slug

**Response:**
```json
{
  "company_id": 26,
  "company_name": "PIXLAB Sp. z o.o.",
  "company_slug": "pixlab-sp-z-o-o",
  "website": "https://pixlab.pl",
  "pagespeed": {
    "seo_score": 92,
    "performance_score": 78,
    "accessibility_score": 95,
    "best_practices_score": 88,
    "audits": {...}
  },
  "on_page": {
    "meta_title": "PIXLAB - Oprogramowanie na miarę",
    "meta_description": "Tworzymy dedykowane oprogramowanie...",
    "h1_count": 1,
    "total_images": 12,
    "images_without_alt": 0,
    "has_structured_data": true
  },
  "technical": {
    "has_robots_txt": true,
    "has_sitemap": true,
    "is_indexable": true,
    "is_mobile_friendly": true
  },
  "overall_score": 88,
  "audited_at": "2026-01-10T10:30:00"
}
```

### 5.3 Trigger SEO Audit (Write)

**Route:** `POST /api/seo/audit`
**Authentication:** Required (Admin only)
**Rate Limit:** 10 requests per hour per user
**Location:** `app.py` lines 3943-4086

**Purpose:** Trigger a new SEO audit for a company.

**Request Body:**
```json
{
  "company_id": 26,
  "slug": "pixlab-sp-z-o-o"
}
```

**Response (Success):**
```json
{
  "success": true,
  "message": "Audyt SEO dla firmy \"PIXLAB Sp. z o.o.\" został zakończony pomyślnie.",
  "audit_version": "1.0.0",
  "triggered_by": "admin@nordabiznes.pl",
  "triggered_at": "2026-01-10T10:35:00",
  "company_id": 26,
  "company_name": "PIXLAB Sp. z o.o.",
  "pagespeed": {...},
  "on_page": {...},
  "technical": {...},
  "overall_score": 88
}
```

**Response (Error - No Website):**
```json
{
  "success": false,
  "error": "Firma \"PIXLAB Sp. z o.o.\" nie ma zdefiniowanej strony internetowej.",
  "company_id": 26,
  "company_name": "PIXLAB Sp. z o.o."
}
```

**Response (Error - Quota Exceeded):**
```json
{
  "success": false,
  "error": "PageSpeed API quota exceeded. Try again tomorrow.",
  "company_id": 26
}
```

**Access Control:**
```python
if not current_user.is_admin:
    # "Access denied. Only an administrator can run SEO audits."
    return jsonify({
        'success': False,
        'error': 'Brak uprawnień. Tylko administrator może uruchamiać audyty SEO.'
    }), 403
```

**Rate Limiting:**
```python
@limiter.limit("10 per hour")
```

---

## 6. PageSpeed Insights API Integration

### 6.1 API Configuration

**Service File:** `scripts/pagespeed_client.py`
**Endpoint:** `https://www.googleapis.com/pagespeedonline/v5/runPagespeed`
**Authentication:** API key (`GOOGLE_PAGESPEED_API_KEY`)

**Free Tier:**
- 25,000 queries per day
- $5 per 1,000 queries after the free tier

**API Key:**
- **Name in Google Cloud:** "Page SPEED SEO Audit v2"
- **Project:** NORDABIZNES (gen-lang-client-0540794446)
- **Storage:** `.env` file (`GOOGLE_PAGESPEED_API_KEY`)

### 6.2 API Request

```python
import os

import requests

# Loaded from .env into the process environment
GOOGLE_PAGESPEED_API_KEY = os.environ['GOOGLE_PAGESPEED_API_KEY']

params = {
    'url': 'https://example.com',
    'key': GOOGLE_PAGESPEED_API_KEY,
    'strategy': 'mobile',  # or 'desktop'
    # requests sends a list as repeated parameters: category=...&category=...
    'category': ['performance', 'accessibility', 'best-practices', 'seo']
}

response = requests.get(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
    params=params,
    timeout=30
)
response.raise_for_status()  # surface HTTP errors (e.g. 429 when quota is exhausted)
data = response.json()
```

### 6.3 API Response Structure

```json
{
  "lighthouseResult": {
    "categories": {
      "performance": {"score": 0.78},
      "accessibility": {"score": 0.95},
      "best-practices": {"score": 0.88},
      "seo": {"score": 0.92}
    },
    "audits": {
      "largest-contentful-paint": {"numericValue": 2300},
      "first-input-delay": {"numericValue": 85},
      "cumulative-layout-shift": {"numericValue": 0.05},
      "meta-description": {"score": 1.0},
      "robots-txt": {"score": 1.0},
      "is-crawlable": {"score": 1.0}
    }
  },
  "loadingExperience": {
    "metrics": {
      "LARGEST_CONTENTFUL_PAINT_MS": {"category": "FAST"},
      "FIRST_INPUT_DELAY_MS": {"category": "FAST"},
      "CUMULATIVE_LAYOUT_SHIFT_SCORE": {"category": "FAST"}
    }
  }
}
```

Category scores are returned on a 0-1 scale and stored as 0-100. Lighthouse runs in a lab environment without real user input, so it cannot measure FID, which is a field metric (and was replaced by INP as a Core Web Vital in March 2024). Treat the `first-input-delay` entry above as illustrative, and check a live response for the audit keys actually returned.

### 6.4 Quota Management

**Quota Tracking:**
```python
class QuotaExceededError(Exception):
    """Raised when the locally tracked daily quota is used up."""


class GooglePageSpeedClient:
    def __init__(self):
        self.daily_quota = 25000
        self.used_today = 0  # Reset daily at midnight

    def get_remaining_quota(self) -> int:
        """Returns remaining API quota for today."""
        return max(0, self.daily_quota - self.used_today)

    def analyze_url(self, url: str) -> 'PageSpeedResult':
        if self.get_remaining_quota() <= 0:
            raise QuotaExceededError("Daily quota exceeded")

        # Make API call (_call_api and _parse_response are omitted from this excerpt)
        response = self._call_api(url)
        self.used_today += 1
        return self._parse_response(response)
```

**Quota Exceeded Handling:**
1. Check quota before the audit: `if quota > 0`
2. If exceeded, skip PageSpeed but continue with on-page/technical checks
3. Log warning: "PageSpeed quota exceeded, skipping"
4. Return a partial audit result (no PageSpeed scores)

---

## 7. SEO Audit Script Usage

### 7.1 Command Line Interface

**Script Location:** `scripts/seo_audit.py`

**Basic Usage:**
```bash
# Audit single company by ID
python seo_audit.py --company-id 26

# Audit single company by slug
python seo_audit.py --company-slug pixlab-sp-z-o-o

# Audit batch of companies (rows 1-10)
python seo_audit.py --batch 1-10

# Audit all companies
python seo_audit.py --all

# Dry run (no database writes)
python seo_audit.py --company-id 26 --dry-run

# Export results to JSON
python seo_audit.py --all --json > seo_report.json
```

**Options:**
- `--company-id ID`: Audit single company by ID
- `--company-slug SLUG`: Audit single company by slug
- `--company-ids IDS`: Audit multiple companies (comma-separated: 1,5,10)
- `--batch RANGE`: Audit batch by row offset (e.g., 1-10)
- `--all`: Audit all companies
- `--dry-run`: Print results without saving to database
- `--verbose, -v`: Enable verbose/debug output
- `--quiet, -q`: Suppress progress output (only summary)
- `--json`: Output results as JSON
- `--database-url URL`: Override the `DATABASE_URL` environment variable

### 7.2 Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All audits completed successfully |
| 1 | Argument error or invalid input |
| 2 | Partial failures (some audits failed) |
| 3 | All audits failed |
| 4 | Database connection error |
| 5 | API quota exceeded |

### 7.3 Batch Audit Output

```
============================================================
SEO AUDIT STARTING
============================================================
Companies to audit: 80
Mode: LIVE
PageSpeed API quota remaining: 24,950
============================================================

[1/80] PIXLAB Sp. z o.o. (ID: 26) - ETA: calculating...
  Fetching page: https://pixlab.pl
  Page fetched successfully (850ms)
  Running on-page SEO analysis...
  On-page analysis complete
  Running technical SEO checks...
  Technical checks complete
  Running PageSpeed Insights (quota: 24,949)...
  PageSpeed complete - SEO: 92, Perf: 78
  Saved SEO audit for company 26
  → SUCCESS: Overall SEO score: 88

[2/80] Hotel SPA Wieniawa (ID: 15) - ETA: 00:15:30
  Fetching page: https://wieniawa.pl
  ...

======================================================================
SEO AUDIT COMPLETE
======================================================================
Mode: LIVE
Duration: 00:18:45
----------------------------------------------------------------------
RESULTS BREAKDOWN
----------------------------------------------------------------------
Total companies: 80
  ✓ Successful: 72
  ✗ Failed: 5
  ○ Skipped: 3

  - No website: 3
  - Unavailable: 2
  - Timeout: 2
  - SSL errors: 1
----------------------------------------------------------------------
PAGESPEED API QUOTA
----------------------------------------------------------------------
Quota at start: 24,950
Quota used: 72
Quota remaining: 24,878
----------------------------------------------------------------------
SEO SCORE DISTRIBUTION
----------------------------------------------------------------------
Companies with scores: 72
Average SEO score: 76.3
Highest score: 95
Lowest score: 42

Excellent (90-100): 18 ██████████████░░░░░░░░░░░░░░░░░░
Good (70-89):       38 ████████████████████████████████
Fair (50-69):       12 ████████░░░░░░░░░░░░░░░░░░░░░░░░
Poor (<50):          4 ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
----------------------------------------------------------------------
FAILED AUDITS
----------------------------------------------------------------------
🔴 Firma ABC - HTTP 404
⏱ Firma XYZ - Timeout after 30s
🔌 Firma DEF - Connection refused
======================================================================
```

### 7.4 Production Deployment

**On NORDABIZ-01 Server:**
```bash
# Connect to server
ssh maciejpi@57.128.200.27

# Navigate to application directory
cd /var/www/nordabiznes

# Activate virtual environment
source venv/bin/activate

# Run audit for all companies (production database)
cd scripts
python seo_audit.py --all

# Run audit for specific company
python seo_audit.py --company-id 26

# Dry run to test without saving
python seo_audit.py --all --dry-run

# Export results to JSON
python seo_audit.py --all --json > ~/seo_audit_$(date +%Y%m%d).json
```

**IMPORTANT - Database Connection:**

Scripts in `scripts/` must use **localhost (127.0.0.1)** for PostgreSQL:
```python
# CORRECT:
DATABASE_URL = 'postgresql://nordabiz_app:<password>@127.0.0.1:5432/nordabiz'

# WRONG (PostgreSQL does not accept connections on the server's external address):
DATABASE_URL = 'postgresql://nordabiz_app:<password>@57.128.200.27:5432/nordabiz'
```

### 7.5 Cron Job (Automated Audits)

**Schedule weekly audit:**
```bash
# Edit crontab
crontab -e

# Add weekly audit (Sundays at 2 AM)
0 2 * * 0 cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/seo_audit.py --all >> /var/log/nordabiznes/seo_audit.log 2>&1
```

**Benefits:**
- Automatic SEO monitoring
- Detect score degradation
- Track improvements over time
- Email alerts on failures (future)

---

## 8. Security & Performance

### 8.1 Security Features

**1. Admin-Only Access:**
```python
if not current_user.is_admin:
    return jsonify({'error': 'Brak uprawnień'}), 403  # "No permission"
```

**2. Rate Limiting:**
```python
@limiter.limit("10 per hour")
```
- Prevents API abuse
- Protects PageSpeed quota
- Per-user rate limit

**3. CSRF Protection:**
```javascript
fetch('/api/seo/audit', {
    method: 'POST',  // without this, fetch() sends a GET request
    headers: {
        'Content-Type': 'application/json',
        'X-CSRFToken': csrfToken  // token assumed to be rendered into the page by the server
    },
    body: JSON.stringify({ slug: companySlug })  // companySlug: placeholder for the audited company's slug
})
```

**4. Input Validation:**
```python
if not company_id and not slug:
    return jsonify({'error': 'Podaj company_id lub slug'}), 400  # "Provide company_id or slug"
```

**5. Database Permissions:**
```sql
GRANT ALL ON TABLE company_website_analysis TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE company_website_analysis_id_seq TO nordabiz_app;
```

### 8.2 Performance Optimizations

**1. Upsert Instead of Insert:**
- ON CONFLICT DO UPDATE (idempotent)
- No duplicate records
- Safe to re-run audits

**2. Database Indexing:**
```sql
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);
```
PostgreSQL already creates a unique index for the `UNIQUE` constraint on `company_id`, so `idx_cwa_company_id` duplicates it.

**3. Batch Processing:**
- Process companies sequentially
- Sleep 1s between audits (rate limiting)
- Skip companies without websites

**4. API Quota Management:**
- Check quota before calling PageSpeed
- Skip PageSpeed if the quota is exhausted
- Continue with on-page/technical checks only

**5. Timeout Handling:**
```python
response = requests.get(url, timeout=30)
```
- Prevents hanging requests
- A timed-out company is recorded as failed and the batch moves on to the next one (see 9.2)

**6. Caching (Future):**
- Cache PageSpeed results for 7 days
- Skip re-audit if recent (<7 days old)
- Force refresh option for admins

---

## 9. Error Handling

### 9.1 Common Errors

**1. No Website URL:**
```json
{
  "success": false,
  "error": "Firma \"ABC\" nie ma zdefiniowanej strony internetowej.",
  "company_id": 15
}
```

**2. Website Unreachable:**
```json
{
  "success": false,
  "error": "Audyt nie powiódł się: HTTP 404, Timeout after 30s",
  "company_id": 26
}
```

**3. SSL Certificate Error:**
```
⚠ SSL error for https://example.com
  Trying HTTP fallback: http://example.com
  ✓ Fallback successful
```

**4. PageSpeed API Quota Exceeded:**
```json
{
  "success": false,
  "error": "PageSpeed API quota exceeded. Try again tomorrow."
}
```

**5. Database Connection Error:**
```
❌ Error: Database connection failed: connection refused
Exit code: 4
```

### 9.2 Error Recovery

**1. SSL Errors → HTTP Fallback:**
```python
import requests

try:
    response = requests.get(https_url, timeout=30)
except requests.exceptions.SSLError:
    # Retry the same URL over plain HTTP (replace only the scheme)
    http_url = https_url.replace('https://', 'http://', 1)
    response = requests.get(http_url, timeout=30)
```

**2. Timeout → Skip Company:**
```python
try:
    response = requests.get(url, timeout=30)
except requests.exceptions.Timeout:
    result['errors'].append('Timeout after 30s')
    # Continue to next company
```

**3. Quota Exceeded → Skip PageSpeed:**
```python
if quota_remaining > 0:
    run_pagespeed_audit()
else:
    logger.warning("Quota exceeded, skipping PageSpeed")
    # Continue with on-page/technical only
```

**4. Database Error → Rollback:**
```python
from sqlalchemy.exc import SQLAlchemyError

try:
    db.execute(query)
    db.commit()
except SQLAlchemyError as e:
    db.rollback()  # discard the failed transaction so the session stays usable
    logger.error(f"Database error: {e}")
```

---

## 10. Monitoring & Maintenance

### 10.1 Health Checks

**Check SEO Audit Status:**
```bash
# Check latest audit dates
psql -U nordabiz_app -d nordabiz -c "
SELECT c.name, cwa.seo_audited_at, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
ORDER BY cwa.seo_audited_at DESC NULLS LAST
LIMIT 10;
"
```

**Check Quota Usage:**
```bash
# Approximate number of audits today: one row per company, so repeated audits
# of the same company count once, and audits that skipped PageSpeed are included
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) AS audits_today
FROM company_website_analysis
WHERE seo_audited_at >= CURRENT_DATE;
"
```

**Check Failed Audits:**
```bash
# Active companies with a website but no SEO data
psql -U nordabiz_app -d nordabiz -c "
SELECT c.id, c.name, c.website
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
  AND c.website IS NOT NULL
  AND cwa.id IS NULL;
"
```

### 10.2 Maintenance Tasks

**1. Re-audit Stale Data (>30 days):**
```bash
python seo_audit.py --all --filter-stale 30
```

**2. Audit New Companies:**
```bash
# Companies added in last 7 days
python seo_audit.py --filter-new 7
```

**3. Fix Failed Audits:**
```bash
# Re-audit companies with errors
python seo_audit.py --retry-failed
```

The `--filter-stale`, `--filter-new` and `--retry-failed` flags do not appear in the option list in section 7.1; confirm that your version of `seo_audit.py` supports them before scheduling these commands.

**4. Clean Old Data:**
```sql
-- Delete audit results older than 90 days (keep latest)
DELETE FROM company_website_analysis
WHERE analyzed_at < NOW() - INTERVAL '90 days'
  AND id NOT IN (
    SELECT DISTINCT ON (company_id) id
    FROM company_website_analysis
    ORDER BY company_id, analyzed_at DESC
  );
```

Because `company_id` is `UNIQUE`, every row is already the latest row for its company, so this statement currently deletes nothing. It only becomes useful once audit history is stored (see Historical Trend Tracking in section 11.1).

### 10.3 Monitoring Queries

**Score Distribution:**
```sql
SELECT
  CASE
    WHEN pagespeed_seo_score >= 90 THEN 'Excellent (90-100)'
    WHEN pagespeed_seo_score >= 50 THEN 'Needs improvement (50-89)'
    WHEN pagespeed_seo_score >= 0 THEN 'Poor (0-49)'
    ELSE 'Not Audited'
  END AS score_range,
  COUNT(*) AS companies
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
GROUP BY score_range
ORDER BY score_range;
```

**Top/Bottom Performers:**
```sql
-- Top 10 SEO scores (NULLs excluded; DESC would otherwise sort them first)
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
  AND cwa.seo_overall_score IS NOT NULL
ORDER BY cwa.seo_overall_score DESC
LIMIT 10;

-- Bottom 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
  AND cwa.seo_overall_score IS NOT NULL
ORDER BY cwa.seo_overall_score ASC
LIMIT 10;
```

**Audit Coverage:**
```sql
SELECT
  COUNT(*) AS total_companies,
  COUNT(cwa.id) AS audited_companies,
  ROUND(COUNT(cwa.id)::NUMERIC / COUNT(*)::NUMERIC * 100, 1) AS coverage_percent
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
  AND c.website IS NOT NULL;
```

---

## 11. Future Enhancements

### 11.1 Planned Features

**1. Automated Re-Audit Scheduling:**
- Weekly cron job for all companies
- Priority queue for low-scoring sites
- Email alerts for score drops

**2. Historical Trend Tracking:**
- Store audit history (not just the latest result)
- Chart score changes over time
- Identify improving/declining sites

**3. Competitor Benchmarking:**
- Compare scores within categories
- Identify SEO leaders
- Best practice recommendations

**4. SEO Report Generation:**
- PDF reports for company owners
- Actionable recommendations
- Step-by-step fix guides

**5. Integration with Company Profiles:**
- Display SEO badge on company page
- Show top SEO issues
- Link to audit details

**6. Mobile vs Desktop Audits:**
- Separate scores for mobile/desktop
- Mobile-first optimization tracking
- Device-specific recommendations

### 11.2 Technical Improvements

**1. Async Batch Processing:**
- Celery background tasks
- Parallel audits (5 concurrent)
- Real-time progress updates

**2. API Webhook Notifications:**
- Notify company owners of audit results
- Integration with Slack/Discord
- Email summaries

**3. Advanced Caching:**
- Cache PageSpeed results for 7 days
- Skip re-audit if recent
- Force refresh button for admins

**4. Audit Scheduling:**
- Per-company audit frequency
- High-priority companies daily
- Low-priority companies weekly

---

## 12. Troubleshooting

### 12.1 Common Issues

**Issue:** "PageSpeed API quota exceeded"
**Solution:** Wait 24 hours for the quota reset or upgrade to the paid tier

**Issue:** "Database connection failed"
**Solution:** Check that PostgreSQL is running: `systemctl status postgresql`

**Issue:** "SSL certificate verify failed"
**Solution:** The script automatically tries an HTTP fallback

**Issue:** "Company has no website URL"
**Solution:** Add the website in the company edit form, or skip the company

**Issue:** "Timeout after 30s"
**Solution:** The website is slow or down; skip it or retry later

### 12.2 Debugging

**Enable Verbose Logging:**
```bash
python seo_audit.py --all --verbose
```

**Check API Key:**
```bash
echo $GOOGLE_PAGESPEED_API_KEY
# Should print the API key, not an empty line
# (the key is stored in .env, so load it into the shell first if needed)
```

**Test Single Company:**
```bash
python seo_audit.py --company-id 26 --dry-run
# See full audit output without saving
```

**Check Database Connection:**
```bash
psql -U nordabiz_app -d nordabiz -h 127.0.0.1 -c "SELECT COUNT(*) FROM companies;"
```

**Test PageSpeed API:**
```bash
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://pixlab.pl&key=YOUR_API_KEY&strategy=mobile"
```

---

## 13. Related Documentation

- **Google PageSpeed API:** [docs/architecture/06-external-integrations.md#3-google-pagespeed-insights-api](../06-external-integrations.md#3-google-pagespeed-insights-api)
- **Database Schema:** [docs/architecture/05-database-schema.md](../05-database-schema.md)
- **Flask Components:** [docs/architecture/04-flask-components.md](../04-flask-components.md)
- **Admin Panel:** [CLAUDE.md#audyt-seo-panel-adminseo](../../CLAUDE.md#audyt-seo-panel-adminseo)

---

## 14. Glossary

| Term | Definition |
|------|------------|
| **SEO** | Search Engine Optimization: improving website visibility in search results |
| **PageSpeed Insights** | Google tool for measuring website performance and SEO quality |
| **Lighthouse** | Automated audit tool by Google (powers PageSpeed Insights) |
| **Core Web Vitals** | Google's UX metrics. This document uses LCP (Largest Contentful Paint), FID (First Input Delay) and CLS (Cumulative Layout Shift); since March 2024, INP (Interaction to Next Paint) has replaced FID as a Core Web Vital |
| **On-Page SEO** | SEO factors on the page itself (meta tags, headings, content) |
| **Technical SEO** | SEO factors related to crawlability (robots.txt, sitemap, indexability) |
| **Meta Tags** | HTML tags providing metadata about the page (title, description, keywords) |
| **Structured Data** | Machine-readable format (JSON-LD, Schema.org) for search engines |
| **Canonical URL** | Preferred version of a page (prevents duplicate content issues) |
| **Robots.txt** | File telling search engines which pages to crawl or not crawl |
| **Sitemap.xml** | XML file listing all pages on a website for search engines |
| **Open Graph** | Meta tags for social media sharing (og:title, og:image, etc.) |
| **Twitter Card** | Meta tags for Twitter sharing |
| **Upsert** | Database operation: INSERT, or UPDATE if the row exists |
| **Quota** | API usage limit (25,000 requests/day for PageSpeed) |

---

**Document End**