Add twitter_service.py using Twitter's internal GraphQL API with
guest token authentication (free, no API key needed). Fetches
followers, tweets, bio, location, media count, and more.
Integrated into social audit enrichment with _twitter_extra data
stored in content_types JSONB, displayed on audit detail cards.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- YouTubeService now fetches: subscribers, views, video count, description,
avatar, banner, country, creation date, recent 5 videos
- Enricher uses API first, falls back to scraping
- Extra YouTube data stored in content_types JSONB
- Audit detail shows view count, country, creation date, recent videos
- Requires enabling YouTube Data API v3 in Google Cloud Console
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Retry LinkedIn fetches: 3 attempts with a random 2-5s delay between
retries. Detects authwall and rate-limit (429/999) responses. Updated the
status message to explain LinkedIn's inconsistent availability to users.
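A minimal sketch of the retry policy above. The `fetch` callable (returning
status code, final URL and body), `fetch_with_retries` and the authwall
heuristic are hypothetical stand-ins, not the enricher's actual code;
`sleep` is injectable so the delays can be checked without waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: returns (status_code, final_url, body).
Fetch = Callable[[str], Tuple[int, str, str]]

# 999 is the non-standard status LinkedIn returns when it blocks a request.
RATE_LIMIT_CODES = {429, 999}


def is_authwall(final_url: str, body: str) -> bool:
    # Assumed heuristic: redirected to /authwall, or the page mentions it.
    return "/authwall" in final_url or "authwall" in body.lower()


def fetch_with_retries(
    url: str,
    fetch: Fetch,
    attempts: int = 3,
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the page body, or None if every attempt was blocked."""
    for attempt in range(attempts):
        status, final_url, body = fetch(url)
        blocked = status in RATE_LIMIT_CODES or is_authwall(final_url, body)
        if status == 200 and not blocked:
            return body
        if attempt < attempts - 1:
            # Random 2-5s pause so retries don't arrive at a fixed cadence.
            sleep(random.uniform(min_delay, max_delay))
    return None
```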
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix checkmarks rendering as literal HTML entity text instead of ✓ by using
Unicode ✓/✗ directly
- Decode HTML entities (such as those for ' and &) in og:meta values returned
by the enricher
- Replace native confirm()/alert() with styled modal dialogs and toasts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Scraper no longer overwrites API data (source priority hierarchy)
- Per-platform data provenance badges (API OAuth/Scraping/Manual/Unknown)
- Expandable field-level source breakdown (which fields from API vs scraping)
- OAuth status per platform with connect/renew/sync links
- "Run audit" button on dashboard (background enrichment for all companies)
- "Run audit" button on detail view (single company enrichment)
- Enrichment progress polling with real-time status updates
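One way the source priority hierarchy could work, sketched under
assumptions: fields are stored as hypothetical `(value, source)` pairs, and
the ranking api_oauth > manual > scraping > unknown is a guess, not taken
from the code. Keeping the source next to each value is also what a
field-level breakdown needs.

```python
from typing import Any, Dict, Tuple

# Assumed ranking (hypothetical): a higher number wins.
SOURCE_PRIORITY = {"unknown": 0, "scraping": 1, "manual": 2, "api_oauth": 3}

Field = Tuple[Any, str]  # (value, source)


def merge_fields(existing: Dict[str, Field], incoming: Dict[str, Field]) -> Dict[str, Field]:
    """Merge new values without letting a lower-priority source overwrite a higher one."""
    merged = dict(existing)
    for name, (value, source) in incoming.items():
        if value is None:
            continue  # an empty result never overwrites stored data
        current = merged.get(name)
        if current is None or SOURCE_PRIORITY.get(source, 0) >= SOURCE_PRIORITY.get(current[1], 0):
            merged[name] = (value, source)
    return merged
```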
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brave Search matched unrelated companies by name token (e.g. VINDOR matched
vindorclothing, vindormusic, beautybyneyador). Social media profiles are now
sourced only from website scraping and manual admin entry.
- Disabled BraveSearcher initialization and call in audit_company()
- Removed Brave Search step from audit progress animation
- Updated missing profile message with explanation and link to profile editor
- Added migration 071 to clean up existing brave_search entries
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
City tokens caused too many false positives (matching any business from
the same city). Reverted to name-only matching. The exclude fix
(checking the handle instead of a full-URL substring) is preserved, as it
fixes a genuine bug where 'p' in the exclude list matched any URL containing
the letter 'p'.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 'p' exclude in SOCIAL_MEDIA_EXCLUDE matched any URL containing
the letter 'p' (e.g. ?locale=pl_PL). Now extracts handle first
and checks exclusion with exact match on first path segment.
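A minimal sketch of the handle-first check using `urllib.parse.urlparse`,
which keeps the query string out of the path. `facebook_handle` and the
exclusion subset are hypothetical; the set only borrows entries named
elsewhere in this log.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical subset of Facebook exclusions mentioned in this log.
FACEBOOK_EXCLUDE = {"p", "tr", "privacy", "policies", "ads", "business", "legal", "flx"}


def facebook_handle(url: str) -> Optional[str]:
    """Return the first path segment as the handle, or None if it is excluded."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return None
    handle = segments[0]
    # Exact match on the segment; the query string (?locale=pl_PL) is never inspected.
    if handle.lower() in FACEBOOK_EXCLUDE:
        return None
    return handle
```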
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Company name tokens have weight 2, city tokens weight 1. City-only
matches accepted only for top-3 Brave results to prevent false positives.
Fixes detection of facebook.com/itwejherowo for Informatyk1 (Wejherowo).
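A sketch of the weighting rule. `tokens`, `match_score` and the 0-based
`rank` argument are hypothetical names; everything beyond the 2/1 weights
and the top-3 rule for city-only matches is an assumption.

```python
import re
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    # Lowercased word tokens; \w is Unicode-aware in Python 3, so Polish letters survive.
    return set(re.findall(r"\w+", text.lower()))


def match_score(result_text: str, rank: int,
                name_tokens: Iterable[str], city_tokens: Iterable[str]) -> int:
    """Weighted match score for one search result; 0 means reject.

    rank is the 0-based position of the result in the Brave response.
    """
    found = tokens(result_text)
    name_score = 2 * sum(1 for t in name_tokens if t.lower() in found)
    city_score = sum(1 for t in city_tokens if t.lower() in found)
    if name_score == 0 and rank >= 3:
        return 0  # city-only evidence is trusted only within the top 3 results
    return name_score + city_score
```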
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Direct slug check (e.g. linkedin.com/company/waterm) can match
a different company with the same name. LinkedIn public metadata
is too minimal to verify location/industry without API access.
Rely on Brave Search with title/description validation instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add direct URL check for linkedin.com/company/{slug} before Brave Search
- Prioritize /company/ over /in/ in search result ranking
- Use targeted query "company_name linkedin.com/company" first
- Fall back to personal profile search only if company page not found
- Verify page title matches company name to avoid false positives
Fixes: WATERM showed an employee's personal profile instead of the existing
company page at linkedin.com/company/waterm
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace placeholder _search_brave() with real Brave API integration
- Fix LinkedIn URL construction: /in/ profiles were incorrectly built as /company/
- Add word-boundary matching to validate search results against company name
- Track source (website_scrape vs brave_search) per platform in audit results
- Increase search results from 5 to 10 for better coverage
Fixes: WATERM LinkedIn profile not detected (website has no LinkedIn link,
but Brave Search finds the personal /in/ profile)
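A minimal sketch of word-boundary matching with Python's `re` module;
`mentions_company` is a hypothetical helper. `\b` only behaves as expected
when the name starts and ends with a word character, so a name ending in,
say, a period would need a different anchor.

```python
import re


def mentions_company(text: str, company_name: str) -> bool:
    """True if company_name appears in text as a whole word (case-insensitive)."""
    # re.escape guards names containing regex metacharacters;
    # \b stops a short name from matching inside a longer word.
    pattern = r"\b" + re.escape(company_name) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```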
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The scraper was matching facebook.com/tr (Meta Pixel tracking endpoint)
as a valid Facebook profile handle. Added 'tr', 'privacy', 'policies',
'ads', 'business', 'legal', 'flx' to the Facebook exclusion list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Company settings page with 4 OAuth cards (GBP, Search Console, Facebook, Instagram)
- 3 API service clients: GBP Management, Search Console, Facebook Graph
- OAuth enrichment in GBP audit (owner responses, posts), social media (FB/IG Graph API),
and SEO prompt (Search Console data)
- Fix OAuth callback redirects to point to company settings page
- All integrations have graceful fallback when no OAuth credentials configured
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GBP audit:
- Fix review_response_rate bug: check ownerResponse instead of authorAttribution.displayName
- Mark has_posts/has_products/has_qa as OAuth-dependent in AI prompt
- Add review_keywords and description_keywords to AI prompt
SEO audit:
- Replace deprecated FID with INP (Core Web Vital since March 2024)
- Pass 10 additional metrics to AI prompt: FCP, TTFB, TBT, Speed Index,
meta title/desc length, html lang, Schema.org field details
- Update templates with INP thresholds (200ms/500ms)
Social media audit:
- Calculate engagement_rate from industry base rates × activity multiplier
- Calculate posting_frequency_score (0-10 based on posts_count_30d)
- Enrich AI prompt with page_name, freq_score, engagement, last_post_date
- Add avg engagement rate and brand name consistency check to prompt
Completeness: 52% → ~68% (estimated)
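The engagement and frequency calculations above, sketched in Python. Every
number here (the base rates, the one-point-per-three-posts scale, the 0.5 to
1.5 multiplier) is a hypothetical placeholder, not a value from the audit
code.

```python
# Hypothetical industry base engagement rates (percent).
INDUSTRY_BASE_RATES = {"retail": 1.5, "services": 1.0, "default": 1.2}


def posting_frequency_score(posts_count_30d: int) -> int:
    """Map posts in the last 30 days to 0-10 (assumed: one point per 3 posts, capped)."""
    return max(0, min(10, posts_count_30d // 3))


def activity_multiplier(freq_score: int) -> float:
    """Assumed linear multiplier: 0.5 for an inactive page up to 1.5 at score 10."""
    return 0.5 + freq_score / 10


def estimated_engagement_rate(industry: str, posts_count_30d: int) -> float:
    base = INDUSTRY_BASE_RATES.get(industry, INDUSTRY_BASE_RATES["default"])
    multiplier = activity_multiplier(posting_frequency_score(posts_count_30d))
    return round(base * multiplier, 2)
```

Because the rate is base × multiplier, it is an estimate derived from
posting activity rather than measured engagement.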
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add regex pattern for Facebook /p/PageName-ID/ multi-segment URLs
- Add 'p' to Facebook exclusion list (bare /p is always truncated)
- Add minimum length validation for extracted social handles
- Strip Instagram tracking params (?igsh=, &utm_source=) from handles
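A sketch of the /p/ pattern, the minimum-length check and tracking-param
stripping. The regex, the helper names and `MIN_HANDLE_LENGTH = 2` are
assumptions; the scraper's actual patterns are not shown here. It expects
absolute URLs (with `https://`), since `urlparse` only separates the host
from the path when the URL has a `//` authority part.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical pattern for URLs like facebook.com/p/Page-Name-100064123456789/
FB_P_PATTERN = re.compile(r"facebook\.com/p/([\w.\-]+-\d+)/?", re.IGNORECASE)

MIN_HANDLE_LENGTH = 2  # assumed threshold


def facebook_p_handle(url: str) -> Optional[str]:
    """Return 'PageName-ID' from a /p/ URL; a bare /p/ yields None."""
    match = FB_P_PATTERN.search(url)
    return match.group(1) if match else None


def instagram_handle(url: str) -> Optional[str]:
    """First path segment of an absolute Instagram URL, without tracking params."""
    # urlparse moves ?igsh=...&utm_source=... into .query, so the path is clean.
    handle = urlparse(url).path.strip("/").split("/")[0]
    if len(handle) < MIN_HANDLE_LENGTH:
        return None  # too short to be a usable handle
    return handle
```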
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The regex was capturing 'profile.php' as a username instead of extracting
the numeric ID from profile.php?id=XXX links. Added dedicated pattern for
profile.php URLs and added profile.php to exclusion list.
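A minimal sketch of the profile.php fix: a dedicated pattern runs first and
captures the numeric `id`, and 'profile.php' is excluded from the generic
username pattern. Both regexes and `facebook_identifier` are hypothetical.

```python
import re
from typing import Optional

# Dedicated pattern: the numeric id from profile.php?id=..., in any query position.
PROFILE_PHP = re.compile(r"facebook\.com/profile\.php\?(?:[^#\s]*&)?id=(\d+)", re.IGNORECASE)
# Generic username pattern; 'profile.php' is excluded so it never becomes a handle.
USERNAME = re.compile(r"facebook\.com/([\w.\-]+)", re.IGNORECASE)
EXCLUDE = {"profile.php"}


def facebook_identifier(url: str) -> Optional[str]:
    match = PROFILE_PHP.search(url)
    if match:
        return match.group(1)  # numeric profile id
    match = USERNAME.search(url)
    if match and match.group(1).lower() not in EXCLUDE:
        return match.group(1)
    return None
```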
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add normalize_social_url() function to database.py to prevent
www vs non-www duplicates in social media records
- Update update_social_media.py to normalize URLs before insert
- Update social_media_audit.py to normalize URLs before insert
- Add inline GBP Audit section to company profile
- Add inline Social Media Audit section to company profile
- Add inline IT Audit section to company profile
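A sketch of what `normalize_social_url()` might do, not the actual function
in database.py. Only the www stripping comes from this log; lowercasing the
host, forcing https, dropping the fragment and trimming a trailing slash are
assumptions. The query string is kept because profile.php?id= URLs depend
on it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_social_url(url: str) -> str:
    """Sketch: canonicalize an absolute profile URL so www/non-www variants match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    # Assumed choices: always https, keep the query, drop the fragment.
    return urlunsplit(("https", host, path, parts.query, ""))
```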
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add 'channel', 'c', 'user', '@' etc. to YouTube exclusion list
- Add 'bold_themes', 'boldthemes' to Twitter/Facebook exclusions (theme creators)
- Fix pattern matching loop to stop after first valid match per platform
- Prevents fallback pattern from overwriting correct channel ID with 'channel'
Fixes issue where youtube.com/channel/ID was being overwritten with
youtube.com/channel/channel by the second fallback pattern.
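A sketch of the first-valid-match loop. The two patterns and
`youtube_handle` are hypothetical; the point is that patterns are tried most
specific first and the function returns as soon as one yields a candidate
outside the exclusion list, so a later fallback cannot replace a channel ID
with 'channel'.

```python
import re
from typing import Optional

YOUTUBE_EXCLUDE = {"channel", "c", "user", "@"}  # entries named in this commit

# Hypothetical patterns, most specific first: channel-ID form, then a generic fallback.
YOUTUBE_PATTERNS = [
    re.compile(r"youtube\.com/channel/([\w-]+)"),
    re.compile(r"youtube\.com/([\w@.-]+)"),
]


def youtube_handle(html: str) -> Optional[str]:
    for pattern in YOUTUBE_PATTERNS:
        for candidate in pattern.findall(html):
            if candidate.lower() not in YOUTUBE_EXCLUDE:
                return candidate  # first valid match wins; later patterns never run
    return None
```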
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add detailed logging to SocialMediaAuditor (website scan, Brave search, results)
- Slow down progress bar animation (400ms instead of 200ms) for better readability
- Bold the "ZNALEZIONO" ("found") label for found platforms
- Display Google rating and review count in progress
- Increase wait time before modal close (4 seconds)
- Add console.log for debugging audit response
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixed a bug where google_opening_hours and google_photos_count were being
fetched from the Google Places API but not passed through to the result
dictionary correctly:
- Changed 'opening_hours' key to 'google_opening_hours' to match what
save_audit_result() expects
- Added 'google_photos_count' to the result dictionary
Verified with a dry run: the INPI company now shows its opening-hours
schedule and a photo count of 10 from its Google Business Profile.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added google_opening_hours and google_photos_count to INSERT column list
- Added corresponding placeholders to VALUES list
- Added to ON CONFLICT UPDATE SET clause
- Added to parameter dictionary reading from google_reviews result
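A runnable sketch of the four places a new column must appear, using
`sqlite3` so it runs without a server. The table, `save_google_data` and the
JSON encoding of opening hours are hypothetical; the real
`save_audit_result()` targets PostgreSQL, whose
`ON CONFLICT ... DO UPDATE SET col = EXCLUDED.col` syntax SQLite (3.24+)
shares.

```python
import json
import sqlite3

# Hypothetical table; only the two new columns matter here.
SCHEMA = """
CREATE TABLE audit_results (
    company_id INTEGER PRIMARY KEY,
    google_rating REAL,
    google_opening_hours TEXT,
    google_photos_count INTEGER
)
"""

# The new columns appear in the column list, VALUES and the SET clause.
UPSERT = """
INSERT INTO audit_results (company_id, google_rating, google_opening_hours, google_photos_count)
VALUES (:company_id, :google_rating, :google_opening_hours, :google_photos_count)
ON CONFLICT (company_id) DO UPDATE SET
    google_rating = excluded.google_rating,
    google_opening_hours = excluded.google_opening_hours,
    google_photos_count = excluded.google_photos_count
"""


def save_google_data(conn: sqlite3.Connection, company_id: int, google_reviews: dict) -> None:
    hours = google_reviews.get("google_opening_hours")
    # ...and in the parameter dict read from the google_reviews result.
    params = {
        "company_id": company_id,
        "google_rating": google_reviews.get("google_rating"),
        "google_opening_hours": json.dumps(hours) if hours is not None else None,
        "google_photos_count": google_reviews.get("google_photos_count"),
    }
    conn.execute(UPSERT, params)
```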
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add google_photos_count to result dictionary initialization
- Extract photos count from API response using len(place['photos'])
- Update logging to include photos count in output
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added 'photos' field to the fields list in get_place_details() method
to enable fetching business photos from Google Places API.
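A sketch of the Place Details request with `photos` in the field list, plus
the photo count from the commit above. The field names are those of the
legacy Places API Place Details endpoint; the exact list in
`get_place_details()` is not shown in this log, so treat it, and the helper
names, as assumptions.

```python
from urllib.parse import urlencode

DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"

# Assumed field list; 'photos' is the field this commit adds.
DETAIL_FIELDS = [
    "rating",
    "user_ratings_total",
    "opening_hours",
    "business_status",
    "formatted_phone_number",
    "website",
    "photos",
]


def place_details_url(place_id: str, api_key: str, language: str = "pl") -> str:
    params = {
        "place_id": place_id,
        "fields": ",".join(DETAIL_FIELDS),
        "language": language,
        "key": api_key,
    }
    return f"{DETAILS_URL}?{urlencode(params)}"


def photos_count(place: dict) -> int:
    # len(place['photos']), treating a missing key as zero photos.
    return len(place.get("photos", []))
```

If the legacy endpoint's documented limit of ten photos per Place Details
response applies, a count of 10 may mean "ten or more".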
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --company-slug argument to social_media_audit.py for easier testing
- Add get_company_id_by_slug() method to SocialMediaAuditor class
- Add python-dotenv support to load .env file from project root
- Create verify_google_places.py script for direct API testing
Note: full verification is blocked because the current API key (PageSpeed)
does not have the Places API enabled. Requires enabling the Places API in
Google Cloud Console for project NORDABIZNES (gen-lang-client-0540794446).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixed bug in social media exclusion logic that was too aggressive.
The substring check `any(ex in match.lower() for ex in excludes)`
was incorrectly excluding valid usernames containing exclusion
strings (e.g., 'testcompany' was excluded because it contained 'p').
Changed to exact match only to properly handle Instagram post URLs
(`instagram.com/p/...`) without false positives on valid usernames.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add google_places_searcher attribute to SocialMediaAuditor
- Initialize GooglePlacesSearcher if GOOGLE_PLACES_API_KEY env var is set
- Update audit_company() to use Places API directly when available
- Fallback to Brave Search when API key not configured
- Log which data source is being used for reviews
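A sketch of the data-source switch. `fetch_reviews` and the two fetcher
callables are hypothetical; the `env` parameter exists only so the choice
can be tested without touching the real environment.

```python
import logging
import os
from typing import Callable, Dict, Mapping, Optional

logger = logging.getLogger(__name__)

# Hypothetical fetcher signature: (company_name, city) -> review data or None.
Fetcher = Callable[[str, str], Optional[dict]]


def fetch_reviews(name: str, city: str, places_fetch: Fetcher, brave_fetch: Fetcher,
                  env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Prefer the Places API when GOOGLE_PLACES_API_KEY is set; otherwise use Brave Search."""
    env = os.environ if env is None else env
    if env.get("GOOGLE_PLACES_API_KEY"):
        source, data = "google_places", places_fetch(name, city)
    else:
        source, data = "brave_search", brave_fetch(name, city)
    logger.info("Google reviews data source: %s", source)
    return {"source": source, "data": data}
```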
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implemented actual Google reviews data collection in BraveSearcher class:
- Uses GooglePlacesSearcher to find company and get place details
- Returns google_rating, google_reviews_count, opening_hours, business_status
- Falls back to Brave Search API parsing when Google API key not available
- Added _search_brave_for_reviews() helper for fallback implementation
- Proper error handling and logging throughout
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements GooglePlacesSearcher class with:
- find_place() method: searches for business by name and city
using Google Places findplacefromtext API
- get_place_details() method: retrieves rating, review count,
opening hours, business status, phone, and website
Features:
- Uses GOOGLE_PLACES_API_KEY environment variable
- Comprehensive error handling (timeout, request errors)
- Polish language locale support
- Follows existing BraveSearcher class pattern
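A standalone sketch of the find_place() flow using only the standard
library. The endpoint, `inputtype=textquery` and the `candidates`/`status`
response fields belong to the legacy Find Place API; the helper names, the
requested fields and the 10 s timeout are assumptions rather than the
class's actual code.

```python
import json
import logging
import os
import urllib.request
from typing import Optional
from urllib.parse import urlencode

logger = logging.getLogger(__name__)

FIND_PLACE_URL = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"


def find_place_url(name: str, city: str, api_key: Optional[str] = None) -> str:
    """Build a Find Place request for '<name> <city>' with Polish-language results."""
    key = api_key or os.environ.get("GOOGLE_PLACES_API_KEY", "")
    params = {
        "input": f"{name} {city}",
        "inputtype": "textquery",
        "fields": "place_id,name,formatted_address",  # assumed field selection
        "language": "pl",
        "key": key,
    }
    return f"{FIND_PLACE_URL}?{urlencode(params)}"


def first_place_id(payload: dict) -> Optional[str]:
    """Top candidate's place_id; None for ZERO_RESULTS or error statuses."""
    if payload.get("status") != "OK":
        return None
    candidates = payload.get("candidates") or []
    return candidates[0].get("place_id") if candidates else None


def find_place_id(name: str, city: str, timeout: float = 10.0) -> Optional[str]:
    """Hypothetical wrapper: request, parse, and log instead of raising."""
    try:
        with urllib.request.urlopen(find_place_url(name, city), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:
        # URLError, HTTPError and socket timeouts are OSError subclasses; bad JSON is a ValueError.
        logger.warning("Find Place request failed for %s: %s", name, exc)
        return None
    return first_place_id(payload)
```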
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>