Imports membership fee data from Norda Biznes Excel spreadsheet.
Matches company names, creates MembershipFee records per month,
handles different rates (200/300 PLN), new member mid-year start.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Downloads page text and images from external event URLs so Claude
can visually analyze posters/banners for location, times, and other
details not present in page text (e.g. venue address in graphics).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Uses Gemini AI to analyze each company's profile (PKD codes, services,
descriptions, AI insights) against 5 ZOPK projects and generate
relevance scores with collaboration descriptions in Polish.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Source servers return 503 (Cloudflare) for cross-origin image requests
from browsers. Solution: download and cache images server-side during
scraping, serve from /static/uploads/zopk/.
- Scraper now downloads og:image and stores locally during article
scraping (max 2MB, supports jpg/png/webp)
- Backfill script downloads images for all existing articles server-side
- Template fallback shows domain initial letter when image unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix broken news thumbnails by adding og:image extraction during content
scraping (replaces Brave proxy URLs that block hotlinking)
- Add image onerror fallback in templates showing domain favicon when
original image fails to load
- Decode Brave proxy image URLs to original source URLs before saving
- Enforce English-only entity types in AI extraction prompt to prevent
mixed Polish/English type names
- Add migration 083 to normalize 14 existing Polish entity types and
clean up 5 stale fetch jobs stuck in 'running' status
- Add backfill script for existing articles with broken image URLs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
External monitoring via UptimeRobot (free tier) with internal health
logger to differentiate ISP outages from server issues. Includes:
- 4 new DB models (UptimeMonitor, UptimeCheck, UptimeIncident, InternalHealthLog)
- Migration 082 with tables, indexes, and permissions
- Internal health logger script (cron */5 min)
- UptimeRobot sync script (cron hourly) with automatic cause correlation
- Admin dashboard /admin/uptime with uptime %, response time charts,
incident log with editable notes/causes, pattern analysis, monthly report
- SLA comparison table (99.9%/99.5%/99%)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add twitter_service.py using Twitter's internal GraphQL API with
guest token authentication (free, no API key needed). Fetches
followers, tweets, bio, location, media count, and more.
Integrated into social audit enrichment with _twitter_extra data
stored in content_types JSONB, displayed on audit detail cards.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- YouTubeService now fetches: subscribers, views, video count, description,
avatar, banner, country, creation date, recent 5 videos
- Enricher uses API first, falls back to scraping
- Extra YouTube data stored in content_types JSONB
- Audit detail shows view count, country, creation date, recent videos
- Requires enabling YouTube Data API v3 in Google Cloud Console
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3 attempts with 2-5s random delay between retries. Detects authwall
and rate limit (429/999) responses. Updated status message to explain
LinkedIn's inconsistent availability to users.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix checkmarks showing as ✓ by using Unicode ✓/✗ directly
- Decode HTML entities (' &) from og:meta in enricher results
- Replace native confirm()/alert() with styled modal dialogs and toasts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Scraper no longer overwrites API data (source priority hierarchy)
- Per-platform data provenance badges (API OAuth/Scraping/Manual/Unknown)
- Expandable field-level source breakdown (which fields from API vs scraping)
- OAuth status per platform with connect/renew/sync links
- "Run audit" button on dashboard (background enrichment for all companies)
- "Run audit" button on detail view (single company enrichment)
- Enrichment progress polling with real-time status updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move quota file from scripts/ to /tmp/ (writable by gunicorn process).
Add lxml to requirements for faster, more reliable HTML parsing in SEO audits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New email template with friendly tone and green CTA button for first-time
account activation. Script with --dry-run, --test-email, --user-id flags
and 72h token validity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Polish city name declensions to local keyword matcher
- Add openingHours string format alongside openingHoursSpecification
- Add Wejherowo to page title for city_in_title signal
- Add service+city keyword phrases in visible text (serwis, transport,
szkolenia, sklep, remonty, instalacje + Wejherowo/Rumia/Reda/Gdynia)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parsedate_to_datetime returns offset-aware datetime from Last-Modified
header, but datetime.now() is naive. Strip tzinfo before subtraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. seo_analyzer.py: Consider aria-label, title, img AND svg as valid
link text (SVG icon links were falsely counted as "without text")
2. routes_portal_seo.py: Calculate overall_seo score using
SEOAuditor._calculate_overall_score() before saving to DB
(was always None because stream route bypasses audit_company())
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add clickable field coverage bars to filter companies missing specific data
- Add quick-action buttons (Registry/SEO/GBP) per company in dashboard table
- Add stale data detection (>6 months) with yellow badges
- Implement weighted priority score (contacts 34%, audits 17%)
- Add data hints in admin company detail showing where to find missing data
- Add "Available data" section showing Google Business data ready to apply
- Add POST /api/company/<id>/apply-hint endpoint for one-click data fill
- Extend website content updater with phone/email extraction (AI + regex)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract 12-field completeness scoring to utils/data_quality.py service
- Auto-update data_quality_score and data_quality label on company data changes
- Add /admin/data-quality dashboard with field coverage stats, quality distribution, and sortable company table
- Add bulk enrichment with background processing, step selection, and progress tracking
- Flow GBP phone/website to Company record when company fields are empty
- Display Google opening hours on public company profile
- Add BulkEnrichmentJob model and migration 075
- Refactor arm_company.py to support selective steps and progress callbacks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The script was calling /firma?nip=X (wrong endpoint) instead of using
fetch_ceidg_by_nip() which does two-phase /firmy?nip=X then /firma/{id}.
Now uses the same service and field mapping as the admin panel button.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The script was calling auditor.audit_company() but not auditor.save_audit_result(),
causing profiles to be found but never persisted to the database.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- GBP: access .completeness_score attribute + call save_audit()
- Social: count saved DB records instead of parsing audit result dict
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- KRS: search_by_nip() returns dict, not just KRS number
- SEO: SEOAuditor(database_url) + audit_company(company_dict)
- Social: SocialMediaAuditor() + audit_company(company_dict)
- GBP: GBPAuditService(db) + audit_company(company_id)
- Support multiple company IDs in one invocation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows running the same enrichment workflow as the "Uzbrój firmę" button
directly from the command line, without needing browser/admin login.
Usage: python3 scripts/arm_company.py <company_id> [--force]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DocumentUploadService.get_file_path() resolves paths using uploaded_at,
so import scripts must store files in directories matching that date.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add document management routes (upload, download, soft-delete) to board blueprint,
link BoardDocument to BoardMeeting via meeting_id FK, add documents section to
meeting view template, and include import scripts for meeting 2/2026 data and PDFs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brave Search matched unrelated companies by name token (e.g. VINDOR matched
vindorclothing, vindormusic, beautybyneyador). Social media profiles are now
sourced only from website scraping and manual admin entry.
- Disabled BraveSearcher initialization and call in audit_company()
- Removed Brave Search step from audit progress animation
- Updated missing profile message with explanation and link to profile editor
- Added migration 071 to clean up existing brave_search entries
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
City tokens caused too many false positives (matching any business from
the same city). Reverted to name-only matching. The exclude fix
(checking handle instead of full URL substring) is preserved as it
fixes a genuine bug where 'p' in exclude list matched any URL.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 'p' exclude in SOCIAL_MEDIA_EXCLUDE matched any URL containing
the letter 'p' (e.g. ?locale=pl_PL). Now extracts handle first
and checks exclusion with exact match on first path segment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Company name tokens have weight 2, city tokens weight 1. City-only
matches accepted only for top-3 Brave results to prevent false positives.
Fixes detection of facebook.com/itwejherowo for Informatyk1 (Wejherowo).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Direct slug check (e.g. linkedin.com/company/waterm) can match
a different company with the same name. LinkedIn public metadata
is too minimal to verify location/industry without API access.
Rely on Brave Search with title/description validation instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add direct URL check for linkedin.com/company/{slug} before Brave Search
- Prioritize /company/ over /in/ in search result ranking
- Use targeted query "company_name linkedin.com/company" first
- Fall back to personal profile search only if company page not found
- Verify page title matches company name to avoid false positives
Fixes: WATERM showed employee's personal profile instead of existing
company page at linkedin.com/company/waterm
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace placeholder _search_brave() with real Brave API integration
- Fix LinkedIn URL construction: /in/ profiles were incorrectly built as /company/
- Add word-boundary matching to validate search results against company name
- Track source (website_scrape vs brave_search) per platform in audit results
- Increase search results from 5 to 10 for better coverage
Fixes: WATERM LinkedIn profile not detected (website has no LinkedIn link,
but Brave Search finds the personal /in/ profile)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DATABASE_URL and PAGESPEED_API_KEY are read at module level (import
time), so load_dotenv must run before third-party imports that
reference these variables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add load_dotenv() to seo_audit.py so it reads GOOGLE_PAGESPEED_API_KEY
and DATABASE_URL from .env without requiring manual env var passing.
Fixes PageSpeed 429 errors when running audit via SSH.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
seo_audit.py was missing SSL columns (has_ssl, ssl_expires_at,
ssl_issuer) in its INSERT/UPDATE query, causing all SEO-audited
companies to show has_ssl=false regardless of actual certificate status.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The scraper was matching facebook.com/tr (Meta Pixel tracking endpoint)
as a valid Facebook profile handle. Added 'tr', 'privacy', 'policies',
'ads', 'business', 'legal', 'flx' to the Facebook exclusion list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Display up to 3 next events with RSVP status instead of just one
- Add import script for WhatsApp Norda group data (Feb 2026):
events, company updates, Alter Energy, Croatia announcement
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix ~190 hardcoded Polish strings missing diacritical characters
across seo_audit.html, gbp_audit.html, social_audit.html
- Fix encoding issue in SEO scraper: requests defaults to ISO-8859-1
when server omits charset, causing mojibake for UTF-8 pages.
Now uses apparent_encoding detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Company settings page with 4 OAuth cards (GBP, Search Console, Facebook, Instagram)
- 3 API service clients: GBP Management, Search Console, Facebook Graph
- OAuth enrichment in GBP audit (owner responses, posts), social media (FB/IG Graph API),
and SEO prompt (Search Console data)
- Fix OAuth callback redirects to point to company settings page
- All integrations have graceful fallback when no OAuth credentials configured
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Google replaced First Input Delay (FID) with Interaction to Next Paint
(INP) as a Core Web Vital in March 2024. This renames the DB column
from first_input_delay_ms to interaction_to_next_paint_ms, updates the
PageSpeed client to prefer the INP audit key, and fixes all references
across routes, services, scripts, and report generators. Updated INP
thresholds: good ≤200ms, needs improvement ≤500ms.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GBP audit:
- Fix review_response_rate bug: check ownerResponse instead of authorAttribution.displayName
- Mark has_posts/has_products/has_qa as OAuth-dependent in AI prompt
- Add review_keywords and description_keywords to AI prompt
SEO audit:
- Replace deprecated FID with INP (Core Web Vital since March 2024)
- Pass 10 additional metrics to AI prompt: FCP, TTFB, TBT, Speed Index,
meta title/desc length, html lang, Schema.org field details
- Update templates with INP thresholds (200ms/500ms)
Social media audit:
- Calculate engagement_rate from industry base rates × activity multiplier
- Calculate posting_frequency_score (0-10 based on posts_count_30d)
- Enrich AI prompt with page_name, freq_score, engagement, last_post_date
- Add avg engagement rate and brand name consistency check to prompt
Completeness: 52% → ~68% (estimated)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>