Allows running the same enrichment workflow as the "Uzbrój firmę" button
directly from the command line, without needing browser/admin login.
Usage: python3 scripts/arm_company.py <company_id> [--force]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DocumentUploadService.get_file_path() resolves paths using uploaded_at,
so import scripts must store files in directories matching that date.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add document management routes (upload, download, soft-delete) to board blueprint,
link BoardDocument to BoardMeeting via meeting_id FK, add documents section to
meeting view template, and include import scripts for meeting 2/2026 data and PDFs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brave Search matched unrelated companies by name token (e.g. VINDOR matched
vindorclothing, vindormusic, beautybyneyador). Social media profiles are now
sourced only from website scraping and manual admin entry.
- Disabled BraveSearcher initialization and call in audit_company()
- Removed Brave Search step from audit progress animation
- Updated missing profile message with explanation and link to profile editor
- Added migration 071 to clean up existing brave_search entries
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
City tokens caused too many false positives (matching any business from
the same city). Reverted to name-only matching. The exclude fix
(checking the extracted handle instead of a substring of the full URL) is
preserved, as it fixes a genuine bug where 'p' in the exclude list matched
any URL containing that letter.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 'p' exclude in SOCIAL_MEDIA_EXCLUDE matched any URL containing
the letter 'p' (e.g. ?locale=pl_PL). Now extracts handle first
and checks exclusion with exact match on first path segment.
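A minimal sketch of the handle-first check described above. The exclusion set
and the function name are hypothetical; the real SOCIAL_MEDIA_EXCLUDE contents
are not shown in this message.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical subset of an exclusion list, for illustration only.
FACEBOOK_EXCLUDE = {"p", "tr", "sharer", "privacy"}


def extract_handle(url: str) -> Optional[str]:
    """Return the first path segment of a profile URL, or None if excluded."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return None
    handle = segments[0].lower()
    # Exact match on the handle, not a substring test against the whole URL,
    # so "?locale=pl_PL" can no longer trigger the 'p' exclusion.
    if handle in FACEBOOK_EXCLUDE:
        return None
    return handle
```

With this shape, a query string never takes part in the exclusion check,
because only the first path segment is compared.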
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Company name tokens have weight 2, city tokens weight 1. City-only
matches accepted only for top-3 Brave results to prevent false positives.
Fixes detection of facebook.com/itwejherowo for Informatyk1 (Wejherowo).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Direct slug check (e.g. linkedin.com/company/waterm) can match
a different company with the same name. LinkedIn public metadata
is too minimal to verify location/industry without API access.
Rely on Brave Search with title/description validation instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add direct URL check for linkedin.com/company/{slug} before Brave Search
- Prioritize /company/ over /in/ in search result ranking
- Use targeted query "company_name linkedin.com/company" first
- Fall back to personal profile search only if company page not found
- Verify page title matches company name to avoid false positives
Fixes: WATERM showed employee's personal profile instead of existing
company page at linkedin.com/company/waterm
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace placeholder _search_brave() with real Brave API integration
- Fix LinkedIn URL construction: /in/ profiles were incorrectly built as /company/
- Add word-boundary matching to validate search results against company name
- Track source (website_scrape vs brave_search) per platform in audit results
- Increase search results from 5 to 10 for better coverage
Fixes: WATERM LinkedIn profile not detected (website has no LinkedIn link,
but Brave Search finds the personal /in/ profile)
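Word-boundary validation of a search result against the company name could
look roughly like this. The function name is an assumption, not the code in
the auditor.

```python
import re


def mentions_company(text: str, company_name: str) -> bool:
    """True if company_name occurs in text as a whole word (case-insensitive)."""
    # re.escape keeps names with dots or other punctuation from being
    # interpreted as regex syntax.
    pattern = r"\b" + re.escape(company_name) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

A title such as "WATERM | LinkedIn" passes, while a longer token that merely
contains the name (for example "vindorclothing" for VINDOR) does not.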
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DATABASE_URL and PAGESPEED_API_KEY are read at module level (import
time), so load_dotenv must run before third-party imports that
reference these variables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add load_dotenv() to seo_audit.py so it reads GOOGLE_PAGESPEED_API_KEY
and DATABASE_URL from .env without requiring manual env var passing.
Fixes PageSpeed 429 errors when running audit via SSH.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
seo_audit.py was missing SSL columns (has_ssl, ssl_expires_at,
ssl_issuer) in its INSERT/UPDATE query, causing all SEO-audited
companies to show has_ssl=false regardless of actual certificate status.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The scraper was matching facebook.com/tr (Meta Pixel tracking endpoint)
as a valid Facebook profile handle. Added 'tr', 'privacy', 'policies',
'ads', 'business', 'legal', 'flx' to the Facebook exclusion list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Display up to 3 next events with RSVP status instead of just one
- Add import script for WhatsApp Norda group data (Feb 2026):
events, company updates, Alter Energy, Croatia announcement
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix ~190 hardcoded Polish strings missing diacritical characters
across seo_audit.html, gbp_audit.html, social_audit.html
- Fix encoding issue in SEO scraper: requests defaults to ISO-8859-1
when server omits charset, causing mojibake for UTF-8 pages.
Now uses apparent_encoding detection.
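A sketch of the encoding fallback, under the assumption that the check is
based on the Content-Type header. It works on any object exposing headers,
encoding and apparent_encoding the way requests.Response does; a stand-in
object is used so the example runs without network access.

```python
from types import SimpleNamespace


def fix_response_encoding(response) -> None:
    """Prefer the detected encoding when the server declared no charset."""
    content_type = response.headers.get("Content-Type", "")
    if "charset" not in content_type.lower():
        detected = response.apparent_encoding
        # Detection can fail on empty bodies; keep the default in that case.
        if detected:
            response.encoding = detected


# Stand-ins for requests.Response objects (hypothetical values).
undeclared = SimpleNamespace(
    headers={"Content-Type": "text/html"},
    encoding="ISO-8859-1",
    apparent_encoding="utf-8",
)
declared = SimpleNamespace(
    headers={"Content-Type": "text/html; charset=iso-8859-2"},
    encoding="iso-8859-2",
    apparent_encoding="utf-8",
)
fix_response_encoding(undeclared)
fix_response_encoding(declared)
```

An explicitly declared charset is left untouched; only the implicit default is
overridden.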
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Company settings page with 4 OAuth cards (GBP, Search Console, Facebook, Instagram)
- 3 API service clients: GBP Management, Search Console, Facebook Graph
- OAuth enrichment in GBP audit (owner responses, posts), social media (FB/IG Graph API),
and SEO prompt (Search Console data)
- Fix OAuth callback redirects to point to company settings page
- All integrations have graceful fallback when no OAuth credentials configured
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Google replaced First Input Delay (FID) with Interaction to Next Paint
(INP) as a Core Web Vital in March 2024. This renames the DB column
from first_input_delay_ms to interaction_to_next_paint_ms, updates the
PageSpeed client to prefer the INP audit key, and fixes all references
across routes, services, scripts, and report generators. Updated INP
thresholds: good ≤200ms, needs improvement ≤500ms.
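The thresholds above translate into a small classifier. This is an
illustrative sketch; the function name and the returned labels are
assumptions.

```python
def classify_inp(inp_ms: float) -> str:
    """Bucket an Interaction to Next Paint value given in milliseconds."""
    if inp_ms <= 200:
        return "good"
    if inp_ms <= 500:
        return "needs_improvement"
    return "poor"
```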
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GBP audit:
- Fix review_response_rate bug: check ownerResponse instead of authorAttribution.displayName
- Mark has_posts/has_products/has_qa as OAuth-dependent in AI prompt
- Add review_keywords and description_keywords to AI prompt
SEO audit:
- Replace deprecated FID with INP (Core Web Vital since March 2024)
- Pass 10 additional metrics to AI prompt: FCP, TTFB, TBT, Speed Index,
meta title/desc length, html lang, Schema.org field details
- Update templates with INP thresholds (200ms/500ms)
Social media audit:
- Calculate engagement_rate from industry base rates × activity multiplier
- Calculate posting_frequency_score (0-10 based on posts_count_30d)
- Enrich AI prompt with page_name, freq_score, engagement, last_post_date
- Add avg engagement rate and brand name consistency check to prompt
Completeness: 52% → ~68% (estimated)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- google_places_service.py: Google Places API integration
- competitor_monitoring_service.py: Competitor tracking service
- scripts/competitor_monitor_cron.py, scripts/generate_audit_report.py
- blueprints/admin/routes_competitors.py, templates/admin/competitor_dashboard.html
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add regex pattern for Facebook /p/PageName-ID/ multi-segment URLs
- Add 'p' to Facebook exclusion list (bare /p is always truncated)
- Add minimum length validation for extracted social handles
- Strip Instagram tracking params (?igsh=, &utm_source=) from handles
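One possible shape for the /p/ pattern and the handle cleanup listed above.
The regex, the minimum length and both function names are assumptions made
for illustration.

```python
import re
from typing import Optional

# Assumed pattern: /p/<PageName>-<numeric id> is kept together as one handle.
FACEBOOK_P_PATTERN = re.compile(r"facebook\.com/p/([A-Za-z0-9._-]+-\d+)", re.IGNORECASE)
MIN_HANDLE_LENGTH = 2  # hypothetical threshold


def facebook_page_handle(url: str) -> Optional[str]:
    """Return 'p/<PageName-ID>' for multi-segment page URLs, else None."""
    match = FACEBOOK_P_PATTERN.search(url)
    return f"p/{match.group(1)}" if match else None


def clean_instagram_handle(raw: str) -> Optional[str]:
    """Drop tracking parameters and reject handles that are too short."""
    handle = re.split(r"[?&#]", raw, maxsplit=1)[0].strip("/")
    return handle if len(handle) >= MIN_HANDLE_LENGTH else None
```

A bare /p link yields no handle, which matches the intent of adding 'p' to the
exclusion list.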
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The regex was capturing 'profile.php' as a username instead of extracting
the numeric ID from profile.php?id=XXX links. Added dedicated pattern for
profile.php URLs and added profile.php to exclusion list.
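Extracting the numeric ID can also be done with the standard URL parser
instead of a regex. This is a sketch with a hypothetical function name, not
the dedicated pattern added in this commit.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def facebook_profile_id(url: str) -> Optional[str]:
    """Return the numeric id from a profile.php?id=... link, else None."""
    parts = urlsplit(url)
    if not parts.path.rstrip("/").endswith("/profile.php"):
        return None
    ids = parse_qs(parts.query).get("id", [])
    # Only accept purely numeric ids; anything else is treated as no match.
    if ids and ids[0].isdigit():
        return ids[0]
    return None
```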
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ~20 remaining is_admin references across backend, templates and scripts
with proper SystemRole checks. Column is_admin stays as deprecated (synced by
set_role()) until DB migration removes it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes email mismatches that would have created duplicate accounts:
- Artur Wiertel: norda-biznes.info → waterm.pl (existing prod ID=3)
- Andrzej Gorczycki: zukwejherowo.pl → ekofabrykawejherowo.pl (prod ID=41)
Adds secretary (Magdalena Klóska) and deactivates corrupted duplicate (ID=40).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- pytest framework with fixtures for auth (auth_client, admin_client)
- Unit tests for SearchService
- Integration tests for auth flow
- Security tests (OWASP Top 10: SQL injection, XSS, CSRF)
- Smoke tests for production health and backup monitoring
- E2E tests with Playwright (basic structure)
- DR tests for backup/restore procedures
- GitHub Actions CI/CD workflow (.github/workflows/test.yml)
- Coverage configuration (.coveragerc) with 80% minimum
- DR documentation and restore script
Staging environment: VM 248, staging.nordabiznes.pl
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Removed limits on services (was 10) and keywords (was 8)
- Added new extraction categories:
  - products: physical/digital products
  - brands: partners, certifications (VMware, Veeam, etc.)
  - specializations: specific competencies
  - target_customers: customer types (SMB, enterprise, etc.)
  - regions: geographic coverage
- Merged all data into services_extracted and main_keywords
- Increased content limit to 20000 chars for AI
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New script: scripts/website_content_updater.py
- Uses Gemini 3 Flash (free tier) for AI extraction
- Extracts: services_extracted, main_keywords, content_summary
- Supports: single company, batch, stale-days filtering, dry-run
- Rate limiting: 2s between API calls
- Documented cron setup in CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add krs_raw_data, krs_fetched_at, krs_registration_date,
krs_representation, krs_activities columns to Company model
- Save complete KRS API response for full data access
- Display in company profile:
  - Board members (zarząd) with functions and avatars
  - Shareholders (wspólnicy) with share amounts
  - Representation method (sposób reprezentacji)
  - Business activities (PKD codes)
  - Registration date with years active
  - KRS address with region info
  - OPP (public benefit) status
  - Metadata (stan_z_dnia, data_odpisu)
- Add migration 037_krs_extended_data.sql
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add enrich_companies_from_registries() that tries KRS first for companies
with KRS number, then falls back to CEIDG
- Add update_company_from_krs() to save KRS data to Company model
- Fix CEIDG search to use 'nip' parameter directly
- Keep enrich_companies_from_ceidg() as alias for compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add new Company fields: ceidg_id, ceidg_status, pkd_codes (JSONB),
correspondence address, owner_citizenships, ceidg_raw_data
- Add enrich_companies_from_ceidg() to fetch full CEIDG details
- Add fetch_full_ceidg_details() for detailed API calls
- Add update_company_from_ceidg() to save all CEIDG fields
- Add --enrich and --apply flags for batch enrichment
- Add migration 036_ceidg_extended_data.sql
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add search_ceidg_by_name() for API v3 name-based queries
- Add search_missing_nip_companies() to find NIP for companies without NIP
- Add --missing-nip flag to search for all companies missing NIP
- Add --apply-nip flag to save found NIPs to database
- Fix API endpoint: /api/ceidg/v3/firmy (not /firma)
- Correctly extract NIP from wlasciciel object in response
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Remove position: sticky from konto sidebar (dane, prywatnosc, bezpieczenstwo, blokady)
- Add "Firmy" link to admin dropdown menu (before "Użytkownicy")
- Add scan_websites_for_nip.py script for data quality
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename: "Norda Biznes Hub" → "Norda Biznes Partner"
- Update AI model: Gemini 2.0 Flash → Gemini 3 Flash
- Keep historical references in the timeline and documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New admin page /admin/model-comparison for comparing AI responses
- Side-by-side comparison: old model (2.5 Flash-Lite) vs new (3 Flash)
- Questions from real conversations (Artur Wiertel, Maciej Pienczyn)
- Run simulation button to generate new responses
- Added link in admin menu under "Porównanie modeli"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add parent_id to the categories table
- Category model with a parent/subcategories relationship
- 4 main groups: Usługi (Services), Budownictwo (Construction), Handel (Trade), Produkcja (Manufacturing)
- assign_category_parents.py script for assigning subcategories
- Migration 030_add_category_hierarchy.sql
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Landing page: "Norda Partner" button + Chamber contact details (email, WhatsApp)
- Landing page: "Strefa Gościa" link → norda-biznes.info
- "Więcej" menu: added "Strefa Gościa (Izba)" for logged-in users
- Forum: hid the category/status filters (UX simplification)
- README: changed "AI Assistant" → "NordaGPT"
- Import script for the test company "Kaszubia 2030"
- .gitignore: exclude meeting notes (MEETING_*.md)
Changes based on the 2026-01-28 meeting and feedback from Artur Wiertel.
Navigation pattern: Vaillant.pl (Klienci indywidualni / Profesjonaliści)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The import script had the wrong start time (18:00). The actual
"Chwila dla Biznesu" meetings take place at 19:00.
Updated:
- Comment on line 50
- Event description (description)
- time_start: time(19, 0)
Existing events in the database were updated manually (UPDATE SQL).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added a cron script for automatic knowledge extraction (scripts/cron_extract_knowledge.py)
- Added a fact deduplication panel (/admin/zopk/knowledge/fact-duplicates)
- Added API and functions for auto-verification of entities and facts
- Added the ZOPK Timeline panel (/admin/zopk/timeline) with CRUD
- Extended the knowledge base dashboard with verification statistics and auto-verification buttons
- Added migration 016_zopk_milestones.sql for the milestones table
- Fixed the duplicate ZOPKMilestone model in database.py
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed 'processed' -> 'success' and 'generated' -> 'success' to
match the values returned by batch_extract() and
generate_chunk_embeddings().
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Runs in sequence: content scraping, AI extraction, embedding generation.
Intended to run as an hourly cron job.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem: News items from Google News RSS had source_domain='news.google.com'
and the Google favicon instead of the real source.
Solution: New script fix_google_news_sources.py that:
- Extracts the source name from the title (after " - ")
- Maps 59 sources to their real domains
- Updates source_domain and image_url (favicon)
Result: 143/143 news items updated with correct sources.
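The title split can be sketched as follows. The function name, the sample
title and the single mapping entry are hypothetical; the real script maps 59
sources.

```python
from typing import Optional, Tuple

# Hypothetical single entry standing in for the full source-to-domain mapping.
SOURCE_DOMAINS = {"Example News": "example.com"}


def split_source(title: str) -> Tuple[str, Optional[str]]:
    """Split 'Headline - Source' on the last ' - ' separator."""
    headline, separator, source = title.rpartition(" - ")
    if not separator:
        return title, None
    return headline, source.strip()


headline, source = split_source("New investment announced - Example News")
domain = SOURCE_DOMAINS.get(source)
```

Splitting on the last separator keeps headlines that themselves contain
" - " intact.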
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replaced requests.Session() with direct requests.get() calls
- Added max_depth=3 as a safeguard against infinite recursion
- Explicitly call response.close() after every request
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added a context manager (with) for the requests session
- Explicitly close HTTP responses (response.close())
- Added flush=True to print for immediate output
- Resolves the problem of 725+ open connections
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Image fetching strategy:
1. Resolve the Google News URL to the original source
2. Fetch og:image from the page's meta tags
3. Fallback: domain logo (Clearbit API)
4. Fallback: favicon (Google Favicon API)
Usage: python scripts/fetch_news_images.py [--dry-run] [--limit N]
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New member_since column in the companies table
- "Członek Izby NORDA od" card on the company profile (blue, #3b82f6)
- Display the number of years in the Chamber
- Import 57 joining dates from Artur's Excel file
- import_member_since.py script for importing the dates
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bug: When page fetch fails (SSL error), result['onpage'] is None.
Using dict.get('key', {}) returns None when key exists with None value.
Fix: Use 'or {}' pattern to handle both missing keys and None values.
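The difference between the two lookups, shown on a minimal dictionary. The
meta_title key is only an example.

```python
# A failed page fetch leaves the key present but set to None.
result = {"onpage": None}

# The default only applies when the key is missing, so this is still None.
unsafe = result.get("onpage", {})

# 'or {}' also covers the present-but-None case.
onpage = result.get("onpage") or {}
title = onpage.get("meta_title")
```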
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add person_id column to users table
- Template shows person profile link when person_id exists
- Add script to match and link users to persons by name
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add is_test field to Classified model
- Add test-item styling (opacity + gray border + badge)
- Add yellow toggle button with localStorage persistence
- Add script to mark existing classifieds as test
- Add 'test' to ForumTopic.CATEGORIES with Polish label 'Testowy'
- Add gray styling for test topics (badge + card opacity)
- Add scripts to list and mark test topics
- Add source and source_note fields to NordaEvent model
- Create import_calendar_2026.py for NORDA calendar events
- Create import_excel_members_2026_01_13.py for new members
- Add .private/ to .gitignore (confidential materials)
Imported 26 events from Kalendarz Izby NORDA 2026 (Artur Wiertel)
Imported 31 new member companies from Excel
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add normalize_social_url() function to database.py to prevent
www vs non-www duplicates in social media records
- Update update_social_media.py to normalize URLs before insert
- Update social_media_audit.py to normalize URLs before insert
- Add inline GBP Audit section to company profile
- Add inline Social Media Audit section to company profile
- Add inline IT Audit section to company profile
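The www normalization from the first bullet might look like this. It is a
sketch under assumptions (lowercased host, trailing slash stripped), not the
normalize_social_url() implementation in database.py.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_social_url(url: str) -> str:
    """Map www and non-www variants of a profile URL to one form."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumption: a trailing slash carries no meaning for profile URLs.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Both https://www.facebook.com/firma/ and https://facebook.com/firma then map
to the same string, so a uniqueness check on the stored URL catches the
duplicate.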
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add company logo display in search results cards
- Make logo clickable (links to company profile)
- Temporarily hide "Aktualności i wydarzenia" section on company profiles
- Add scripts for KRS PDF download/parsing and CEIDG API
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created comprehensive test suite for IT audit collaboration matching:
1. Unit tests (tests/test_it_audit_collaboration.py):
- 12 tests verifying all 6 match types
- Backup replication, shared licensing, Teams federation
- Shared monitoring, collective purchasing, knowledge sharing
- Edge cases for size parsing and similarity
2. Integration test script (scripts/test_collaboration_matching.py):
- Creates test audits with matching criteria
- Runs collaboration matching algorithm
- Verifies matches saved to database
All unit tests pass (12/12).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add 'channel', 'c', 'user', '@' etc. to YouTube exclusion list
- Add 'bold_themes', 'boldthemes' to Twitter/Facebook exclusions (theme creators)
- Fix pattern matching loop to stop after first valid match per platform
- Prevents fallback pattern from overwriting correct channel ID with 'channel'
Fixes issue where youtube.com/channel/ID was being overwritten with
youtube.com/channel/channel by the second fallback pattern.
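A reduced sketch of the first-valid-match loop. The two patterns and the
exclusion set are simplified assumptions, not the auditor's actual lists.

```python
import re
from typing import Optional

# Ordered from most specific to the generic fallback (illustrative patterns).
YOUTUBE_PATTERNS = [
    re.compile(r"youtube\.com/channel/([A-Za-z0-9_-]+)"),
    re.compile(r"youtube\.com/([A-Za-z0-9_-]+)"),
]
YOUTUBE_EXCLUDE = {"channel", "c", "user", "watch"}


def find_youtube_handle(html: str) -> Optional[str]:
    """Return the first non-excluded match; later patterns never overwrite it."""
    for pattern in YOUTUBE_PATTERNS:
        for candidate in pattern.findall(html):
            if candidate.lower() not in YOUTUBE_EXCLUDE:
                return candidate  # stop after the first valid match
    return None
```

Returning from inside the loop is what prevents the fallback pattern from
replacing a channel ID with the literal segment 'channel'.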
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add detailed logging to SocialMediaAuditor (website scan, Brave search, results)
- Slow down progress bar animation (400ms instead of 200ms) for better readability
- Bold "ZNALEZIONO" text for found platforms
- Display Google rating and review count in progress
- Increase wait time before modal close (4 seconds)
- Add console.log for debugging audit response
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixed a bug where google_opening_hours and google_photos_count were being
fetched from the Google Places API but not passed through to the result
dictionary correctly:
- Changed 'opening_hours' key to 'google_opening_hours' to match what
save_audit_result() expects
- Added 'google_photos_count' to the result dictionary
Verified with dry-run: the INPI company now shows its opening-hours schedule
and a photos count of 10 from Google Business Profile.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added google_opening_hours and google_photos_count to INSERT column list
- Added corresponding placeholders to VALUES list
- Added to ON CONFLICT UPDATE SET clause
- Added to parameter dictionary reading from google_reviews result
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add google_photos_count to result dictionary initialization
- Extract photos count from API response using len(place['photos'])
- Update logging to include photos count in output
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added 'photos' field to the fields list in get_place_details() method
to enable fetching business photos from Google Places API.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --company-slug argument to social_media_audit.py for easier testing
- Add get_company_id_by_slug() method to SocialMediaAuditor class
- Add python-dotenv support to load .env file from project root
- Create verify_google_places.py script for direct API testing
Note: Full verification blocked - current API key (PageSpeed) doesn't have
Places API enabled. Requires enabling Places API in Google Cloud Console
for project NORDABIZNES (gen-lang-client-0540794446).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixed bug in social media exclusion logic that was too aggressive.
The substring check `any(ex in match.lower() for ex in excludes)`
was incorrectly excluding valid usernames containing exclusion
strings (e.g., 'testcompany' was excluded because it contained 'p').
Changed to exact match only to properly handle Instagram post URLs
(`instagram.com/p/...`) without false positives on valid usernames.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add google_places_searcher attribute to SocialMediaAuditor
- Initialize GooglePlacesSearcher if GOOGLE_PLACES_API_KEY env var is set
- Update audit_company() to use Places API directly when available
- Fallback to Brave Search when API key not configured
- Log which data source is being used for reviews
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implemented actual Google reviews data collection in BraveSearcher class:
- Uses GooglePlacesSearcher to find company and get place details
- Returns google_rating, google_reviews_count, opening_hours, business_status
- Falls back to Brave Search API parsing when Google API key not available
- Added _search_brave_for_reviews() helper for fallback implementation
- Proper error handling and logging throughout
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements GooglePlacesSearcher class with:
- find_place() method: searches for business by name and city
using Google Places findplacefromtext API
- get_place_details() method: retrieves rating, review count,
opening hours, business status, phone, and website
Features:
- Uses GOOGLE_PLACES_API_KEY environment variable
- Comprehensive error handling (timeout, request errors)
- Polish language locale support
- Follows existing BraveSearcher class pattern
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed PostgreSQL-specific ANY(:ids) to use IN clause with
dynamic placeholders for SQLite/PostgreSQL compatibility
- Verified SEO audit dry-run extracts all metrics correctly:
  - HTTP status, load time, final URL
  - Meta title, H1 count, image analysis
  - Structured data detection
  - robots.txt, sitemap.xml, indexability
  - Overall SEO score calculation (95 for pixlab.pl)
Note: Company ID 26 has no website configured, tested with ID 1 instead.
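The ANY(:ids) replacement can be sketched with named placeholders generated
per value. The helper name and the demo table are assumptions; sqlite3 is used
here only so the example runs standalone.

```python
import sqlite3
from typing import Dict, Sequence, Tuple


def build_in_clause(ids: Sequence[int], prefix: str = "id") -> Tuple[str, Dict[str, int]]:
    """Return ':id_0, :id_1, ...' plus the matching parameter dict."""
    params = {f"{prefix}_{i}": value for i, value in enumerate(ids)}
    placeholders = ", ".join(f":{name}" for name in params)
    return placeholders, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO companies VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])

placeholders, params = build_in_clause([1, 3])
# Only placeholder names are interpolated; the values stay bound parameters.
query = f"SELECT id FROM companies WHERE id IN ({placeholders}) ORDER BY id"
rows = conn.execute(query, params).fetchall()
```

An empty ID list produces "IN ()", which is not valid SQL on either backend,
so callers would need to short-circuit that case.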
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Features:
- Single company HTML reports with full SEO audit data
- Batch HTML summary reports for multiple companies
- JSON exports for integration with other tools
- SEO recommendations based on audit findings
- CLI interface with --company-id, --batch, --all selection
- Output format options: --html, --json
- Score visualization with color-coded badges
- Core Web Vitals section with threshold indicators
- Issues and recommendations sections
- Statistics calculation for batch reports
- Polish language support in reports
Usage examples:
- python seo_report_generator.py --company-id 26 --html
- python seo_report_generator.py --all --html --output ./reports
- python seo_report_generator.py --batch 1-10 --json
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enhanced save_audit_result method with complete column coverage
- Added missing columns to idempotent upsert query:
  - broken_links_count (for future link checking)
  - viewport_configured (derived from meta viewport tag)
  - is_mobile_friendly (derived from viewport content)
  - has_hreflang (for international SEO detection)
- All 45+ SEO columns now properly mapped for database upserts
- ON CONFLICT (company_id) DO UPDATE ensures idempotent operations
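A reduced illustration of the idempotent upsert, using an in-memory SQLite
database (which accepts the same ON CONFLICT syntax from version 3.24). The
three-column table and the seo_score column are simplifying assumptions; the
real table has 45+ columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_website_analysis ("
    "company_id INTEGER PRIMARY KEY, seo_score INTEGER, has_hreflang INTEGER)"
)

UPSERT = """
INSERT INTO company_website_analysis (company_id, seo_score, has_hreflang)
VALUES (:company_id, :seo_score, :has_hreflang)
ON CONFLICT (company_id) DO UPDATE SET
    seo_score = excluded.seo_score,
    has_hreflang = excluded.has_hreflang
"""

# Running the audit twice for the same company updates the row in place.
conn.execute(UPSERT, {"company_id": 1, "seo_score": 95, "has_hreflang": 0})
conn.execute(UPSERT, {"company_id": 1, "seo_score": 90, "has_hreflang": 1})
rows = conn.execute(
    "SELECT company_id, seo_score, has_hreflang FROM company_website_analysis"
).fetchall()
```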
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enhanced scripts/seo_audit.py with comprehensive CLI improvements:
CLI Arguments:
- --company-id: Audit single company by ID
- --company-ids: Audit multiple companies (comma-separated)
- --batch: Audit range of companies (e.g., 1-10)
- --all: Audit all companies
- --dry-run: Print results without database writes
- --verbose/-v: Debug output
- --quiet/-q: Suppress progress output
- --json: JSON output for scripting
- --database-url: Override DATABASE_URL env var
Progress Logging:
- ETA calculation based on average time per company
- Progress counter [X/Y] for each company
- Status indicators (SUCCESS/SKIPPED/FAILED/TIMEOUT)
Summary Reporting:
- Detailed breakdown by result category
- Edge case counts (no_website, unavailable, timeout, ssl_errors)
- PageSpeed API quota tracking (start/used/remaining)
- Visual score distribution with bar charts
- Failed audits listing with error messages
Error Handling:
- Proper exit codes (0-5) for different scenarios
- Categorization of errors (timeout, connection, SSL, unavailable)
- Database connection error handling
- Quota exceeded handling
- Batch argument validation with helpful error messages
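Validation of the --batch range could look roughly like this. The function
name and the error wording are assumptions, not the script's actual messages.

```python
from typing import Tuple


def parse_batch(value: str) -> Tuple[int, int]:
    """Parse 'START-END' (e.g. '1-10') into an inclusive ID range."""
    start_text, separator, end_text = value.partition("-")
    if not separator or not start_text.isdigit() or not end_text.isdigit():
        raise ValueError(f"Invalid batch '{value}': expected START-END, e.g. 1-10")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"Invalid batch '{value}': need 1 <= START <= END")
    return start, end
```

Raising ValueError with a concrete example of the expected format lets the CLI
layer turn the failure into a readable message and a non-zero exit code.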
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements SEOAuditor class following social_media_audit.py pattern:
- __init__: Initialize database connection and analysis components
- get_companies: Fetch companies by ID, batch, or all
- audit_company: Full SEO audit (PageSpeed, on-page, technical)
- save_audit_result: Upsert to company_website_analysis table
- run_audit: Orchestration with progress logging and summary
Features:
- Integrates GooglePageSpeedClient for Lighthouse scores
- Uses OnPageSEOAnalyzer for meta tags, headings, images, links
- Uses TechnicalSEOChecker for robots.txt, sitemap, canonical
- Calculates overall SEO score from weighted components
- CLI support: --company-id, --batch, --all, --dry-run, --json
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds TechnicalSEOChecker class that performs technical SEO audits:
- robots.txt: checks existence, parses directives (Disallow, Allow, Sitemap),
  detects whether it blocks Googlebot or all bots
- sitemap.xml: checks existence, validates XML, counts URLs, detects sitemap index
- Canonical URLs: detects canonical tag, checks if self-referencing or cross-domain
- Noindex tags: checks meta robots and X-Robots-Tag HTTP header
- Redirect chains: follows up to 10 redirects, detects loops, HTTPS upgrades,
www redirects, and mixed content issues
Includes:
- 8 dataclasses for structured results (RobotsTxtResult, SitemapResult, etc.)
- TechnicalSEOResult container for complete analysis
- check_technical_seo() convenience function
- CLI support: --technical/-t flag for technical-only analysis
- --all/-a flag for combined on-page and technical analysis
- --json/-j flag for JSON output
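The Googlebot check on robots.txt can be approximated with the standard
library parser. This sketch uses urllib.robotparser rather than the custom
parsing in TechnicalSEOChecker, and the helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str = "Googlebot", path: str = "/") -> bool:
    """True if the given robots.txt content disallows path for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


sample = """User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""
googlebot_blocked = blocks_agent(sample)
others_blocked = blocks_agent(sample, agent="OtherBot")
```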
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements Google PageSpeed Insights API client with:
- GooglePageSpeedClient class for making API calls
- Exponential backoff retry logic (3 retries, 1-60s backoff)
- RateLimiter class with daily quota tracking (25k req/day)
- Quota persistence to .pagespeed_quota.json
- Support for mobile/desktop strategies
- Core Web Vitals extraction (LCP, FCP, CLS, TTFB)
- Lighthouse audit scores (performance, accessibility, SEO, best-practices)
- Structured dataclasses for results (PageSpeedResult, PageSpeedScore, CoreWebVitals)
- Custom exceptions (QuotaExceededError, RateLimitError, PageSpeedAPIError)
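A sketch of the retry policy described above (3 retries, delay doubling from
1s and capped at 60s). RetryableError and the injectable sleep parameter are
assumptions for illustration; the client's own exception classes are listed
above.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a transient API failure."""


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RetryableError with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return func()
        except RetryableError:
            if attempt >= max_retries:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))
            attempt += 1


# Demo with a recorded (not real) sleep: fails twice, then succeeds.
delays: List[float] = []
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryableError("temporary failure")
    return "ok"


outcome = retry_with_backoff(flaky, sleep=delays.append)
```

Passing sleep as a parameter keeps the policy testable without waiting for
real delays.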
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>