fix: expand directory domain blacklist based on production results

Added imsig.pl, monitorfirm.pb.pl, zwiazekpracodawcow.pl,
transfermarkt.pl, mapcarta.com, and other directories/portals
that returned false positives in the first production run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Maciej Pienczyn 2026-02-21 08:38:04 +01:00
parent 2835dea7d2
commit 70a0e1c557

@@ -24,12 +24,18 @@ logger = logging.getLogger(__name__)
# Domains to skip - business directories, social media, own portal
DIRECTORY_DOMAINS = {
    # Business directories & registries
    'panoramafirm.pl', 'aleo.com', 'rejestr.io', 'krs-pobierz.pl',
    'gowork.pl', 'oferteo.pl', 'pkt.pl', 'firmy.net', 'zumi.pl',
    'baza-firm.com.pl', 'e-krs.pl', 'krs-online.com.pl', 'regon.info',
    'infoveriti.pl', 'companywall.pl', 'findcompany.pl', 'owg.pl',
    'imsig.pl', 'monitorfirm.pb.pl', 'mojepanstwo.pl', 'biznes-polska.pl',
    'zwiazekpracodawcow.pl', 'notariuszepl.top', 'wypr.pl', 'mapcarta.com',
    'analizy.pl', 'transfermarkt.pl', 'mojewejherowo.pl', 'orlyjubilerstwa.pl',
    # Social media
    'facebook.com', 'linkedin.com', 'youtube.com', 'instagram.com',
    'twitter.com', 'x.com', 'tiktok.com',
    # Own portal & major sites
    'nordabiznes.pl', 'google.com', 'google.pl',
    'wikipedia.org', 'olx.pl', 'allegro.pl',
}
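
The hunk does not show how `DIRECTORY_DOMAINS` is consulted, so the following is only a minimal sketch of one plausible check. It assumes candidate URLs carry a scheme and that a host should be skipped when it equals a listed domain or is a subdomain of one (for example `pl.wikipedia.org`). The helper name `is_directory_domain` and the reduced set are hypothetical and are not taken from the repository.

```python
from urllib.parse import urlparse

# Illustrative subset of the set in the diff above (not the full list).
DIRECTORY_DOMAINS = {
    'panoramafirm.pl', 'monitorfirm.pb.pl', 'mapcarta.com',
    'facebook.com', 'wikipedia.org',
}


def is_directory_domain(url, blocked=DIRECTORY_DOMAINS):
    """Hypothetical helper: True if the URL's host is a blocked domain
    or a subdomain of a blocked domain."""
    # hostname is None when the URL has no scheme/netloc; treat as no match.
    host = (urlparse(url).hostname or '').lower()
    labels = host.split('.')
    # Test every suffix of the host: a.b.c, then b.c, then c.
    return any('.'.join(labels[i:]) in blocked for i in range(len(labels)))


if __name__ == '__main__':
    for candidate in (
        'https://www.panoramafirm.pl/firma',
        'https://pl.wikipedia.org/wiki/Wejherowo',
        'https://pb.pl/',
        'https://example-firma.pl',
    ):
        print(candidate, is_directory_domain(candidate))
```

Matching on whole label suffixes rather than with a plain substring test avoids skipping unrelated hosts such as `notfacebook.com`, and it keeps `pb.pl` allowed even though `monitorfirm.pb.pl` is listed.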