Commit Graph

12 Commits

601bd99559 feat: remember rejected candidates, skip in future bulk discovery
Some checks are pending
NordaBiz Tests / Unit & Integration Tests (push) Waiting to run
NordaBiz Tests / E2E Tests (Playwright) (push) Blocked by required conditions
NordaBiz Tests / Smoke Tests (Production) (push) Blocked by required conditions
NordaBiz Tests / Send Failure Notification (push) Blocked by required conditions
- Bulk discovery skips companies with any candidate (including rejected)
- Single discovery skips URLs from previously rejected domains
- Dashboard shows list of companies rejected by admin with note
  that they won't be re-searched in bulk mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 10:24:55 +01:00
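A minimal sketch of the two skip rules described in 601bd99559. It assumes a hypothetical `Candidate` record with a `status` field. It also assumes that "previously rejected domains" means domains an admin rejected for the same company, which the commit message does not state explicitly.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Candidate:
    company_id: int
    url: str
    status: str  # hypothetical values: "pending", "accepted", "rejected"


def domain_of(url: str) -> str:
    """Lower-cased host without a leading "www."."""
    host = (urlsplit(url).hostname or "").lower()
    return host.removeprefix("www.")


def companies_for_bulk(company_ids, candidates):
    """Bulk mode: skip every company that already has a candidate of any status."""
    has_candidate = {c.company_id for c in candidates}
    return [cid for cid in company_ids if cid not in has_candidate]


def drop_rejected_urls(company_id, urls, candidates):
    """Single mode: drop URLs on domains an admin already rejected for this company."""
    rejected = {
        domain_of(c.url)
        for c in candidates
        if c.company_id == company_id and c.status == "rejected"
    }
    return [u for u in urls if domain_of(u) not in rejected]
```

Comparing domains rather than full URLs means a rejected `https://www.foo.pl/` also suppresses `https://foo.pl/kontakt`.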
d8a0485986 feat: geographic proximity scoring for website discovery
Prioritize results from Wejherowo region (Norda Biznes home area):
- Wejherowo: +3 pts
- Powiat wejherowski (Reda, Rumia, Luzino...): +2 pts
- Województwo pomorskie: +1 pt
- Outside region: 0 pts

Dashboard shows colored geo badge per candidate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 10:15:40 +01:00
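A sketch of the tiered geo score from d8a0485986, where the closest matching region wins. The place sets are hypothetical and deliberately incomplete. The commit's "..." elides the rest of the powiat, and no list is given for the województwo, so the Pomeranian cities below are illustrative only. Inflected forms such as "w Wejherowie" are not handled here.

```python
import re

# Illustrative, incomplete place lists (hypothetical).
WEJHEROWO = {"wejherowo"}
POWIAT_WEJHEROWSKI = {"wejherowski", "reda", "rumia", "luzino"}
POMORSKIE = {"gdańsk", "gdynia", "sopot", "słupsk"}


def geo_score(text: str) -> int:
    """Return 3/2/1/0 for the closest region mentioned in a candidate's text."""
    # \w is Unicode-aware in Python 3, so Polish letters stay inside words.
    words = set(re.findall(r"\w+", text.lower()))
    if words & WEJHEROWO:
        return 3
    if words & POWIAT_WEJHEROWSKI:
        return 2
    if words & POMORSKIE:
        return 1
    return 0
```

This sketch matches whole words rather than substrings. Substring matching would let tokens like "zachodniopomorskie" leak into the wrong tier.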
ec8d419fad fix: NIP regex handles any separator format (588-20-15-465)
Previous regex only matched 3-3-2-2 format. New universal pattern
catches any 10-digit NIP with dashes/spaces in any position.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 09:31:09 +01:00
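One way to write the "any separator" NIP pattern from ec8d419fad: ten digits with at most one dash or space between any two of them. Requiring a preceding "NIP" label (with an optional "PL" prefix) is an assumption of this sketch, not something the commit states. Without it, 10-digit KRS numbers would also match.

```python
import re

# Ten digits with at most one dash or space between any two of them, so
# 588-20-15-465, 588-201-54-65, 588 201 54 65 and 5882015465 all match.
# The "NIP" label anchor is an assumption (KRS numbers are 10 digits too).
NIP_RE = re.compile(
    r"\bNIP\b\W*(?:PL)?\s*(\d(?:[ -]?\d){9})(?![ -]?\d)",
    re.IGNORECASE,
)


def find_nips(text: str) -> list[str]:
    """Return every labelled NIP in text, with separators stripped."""
    return [re.sub(r"[ -]", "", m.group(1)) for m in NIP_RE.finditer(text)]
```

The trailing `(?![ -]?\d)` rejects a match that is really the start of a longer digit run.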
409557ceab feat: evaluate top 5 candidates, early-exit on NIP/REGON/KRS match
Increase candidate pool from 3 to 5. Stop evaluating once a
candidate matches NIP/REGON/KRS (100% certainty).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 09:28:24 +01:00
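A sketch of the selection loop from 409557ceab. `evaluate` is a hypothetical callback that scrapes one URL and reports a soft score plus whether a registry identifier matched. Keeping the highest soft score when nothing matches is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Evaluation:
    url: str
    score: int          # soft signals, e.g. name in domain, email, phone
    strong_match: bool  # NIP, REGON or KRS on the page matched the company


def pick_best(
    urls: Sequence[str],
    evaluate: Callable[[str], Evaluation],
    pool_size: int = 5,
) -> Optional[Evaluation]:
    """Evaluate up to pool_size search results; stop at the first identifier match."""
    best: Optional[Evaluation] = None
    for url in urls[:pool_size]:
        ev = evaluate(url)
        if ev.strong_match:
            return ev  # registry identifier match is treated as certain
        if best is None or ev.score > best.score:
            best = ev
    return best
```

Early exit saves the scraping cost of the remaining candidates, which matters once each candidate may trigger several subpage fetches.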
11184c5a58 feat: scrape subpages (kontakt, o-nas) for NIP/REGON verification
Root page often lacks NIP/REGON. Now scrapes /kontakt/, /contact,
/o-nas, /o-firmie to find strong verification signals. Stops early
when NIP/REGON/KRS found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 09:23:50 +01:00
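A sketch of the subpage walk in 11184c5a58, using the paths named in the commit. `fetch` is a hypothetical helper that returns page text, or `None` on failure. The "strong signal" check is a simple label-plus-digit regex, assumed here for illustration.

```python
import re
from typing import Callable, List, Optional
from urllib.parse import urljoin

# Root page first, then the subpages named in the commit.
PATHS = ["/", "/kontakt/", "/contact", "/o-nas", "/o-firmie"]

# Hypothetical strong-signal test: a NIP/REGON/KRS label followed by digits.
STRONG_SIGNAL = re.compile(r"\b(?:NIP|REGON|KRS)\b\W*(?:PL)?\s*\d", re.IGNORECASE)


def collect_pages(root_url: str, fetch: Callable[[str], Optional[str]]) -> List[str]:
    """Fetch root and subpages in order, stopping once a strong signal appears."""
    texts: List[str] = []
    for path in PATHS:
        text = fetch(urljoin(root_url, path))
        if not text:
            continue  # missing subpage (e.g. 404): try the next one
        texts.append(text)
        if STRONG_SIGNAL.search(text):
            break  # identifier found; remaining subpages are not fetched
    return texts
```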
880f5a6715 fix: normalize discovery URLs to root domain
Strip paths from candidate URLs (e.g. /kontakt/, /about/) so the root
domain is always saved. Deduplicates results pointing to the same domain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 09:09:50 +01:00
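A minimal sketch of root-domain normalization and deduplication as described in 880f5a6715, using only `urllib.parse`. Function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def normalize_to_root(url: str) -> str:
    """Drop path, query, fragment and port: https://x.pl/kontakt/?a=1 -> https://x.pl/"""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit returns hostname lower-cased
    return f"{parts.scheme}://{host}/"


def dedupe_roots(urls: Iterable[str]) -> List[str]:
    """Normalize each URL and keep the first occurrence of each root, in order."""
    seen = set()
    roots: List[str] = []
    for url in urls:
        root = normalize_to_root(url)
        if root not in seen:
            seen.add(root)
            roots.append(root)
    return roots
```

As written, `www.a.pl` and `a.pl` (or `http` and `https`) remain distinct roots. Whether the real code merges them is not stated in the commit.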
026ec97fc5 fix: handle word reordering in domain name matching
"Jubiler Agat" now matches "agat-jubiler.pl" by checking individual
words in any order, not just concatenated substring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:59:22 +01:00
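A sketch of order-independent domain matching in the spirit of 026ec97fc5. This sketch makes three assumptions the commit does not state. Every significant word (3+ characters) must appear in the domain label. Polish diacritics are folded to ASCII. Only the first label after an optional "www." is checked.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lower-case and strip Polish diacritics so "Złoty Kłos" ~ "zloty-klos"."""
    # "ł" has no Unicode decomposition, so map it explicitly before NFKD.
    text = text.replace("ł", "l").replace("Ł", "L")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def domain_matches_name(company_name: str, domain: str, min_len: int = 3) -> bool:
    """True when every significant word of the name occurs in the domain label."""
    host = fold(domain).removeprefix("www.")
    label = re.sub(r"[^a-z0-9]", "", host.split(".")[0])  # "agat-jubiler" -> "agatjubiler"
    # Short tokens such as "sp", "z", "o" (from "Sp. z o.o.") are ignored.
    words = [w for w in re.findall(r"[a-z0-9]+", fold(company_name)) if len(w) >= min_len]
    return bool(words) and all(w in label for w in words)
```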
2e0c19d427 feat: multi-candidate scoring and domain name matching for website discovery
Evaluate top 3 Brave results instead of just taking the first one.
Add domain name matching signal (+2 pts when domain contains company name).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:52:43 +01:00
3340e176aa fix: expand blacklist, improve bulk progress UX
- Added norda-biznes.info, bizraport.pl, aplikuj.pl, lexspace.pl,
  drewnianeabc.pl, f-trust.pl, itspace.llc to directory blacklist
- Delay first poll by 3s so thread has time to populate total
- Better completion messages (show count, handle 0 remaining)
- Increase poll interval to 3s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:43:43 +01:00
70a0e1c557 fix: expand directory domain blacklist based on production results
Added imsig.pl, monitorfirm.pb.pl, zwiazekpracodawcow.pl,
transfermarkt.pl, mapcarta.com and other directories/portals
that returned false positives in first production run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:38:04 +01:00
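A sketch of a directory blacklist check covering the domains named in 70a0e1c557 and 3340e176aa. The real list is longer ("and other directories/portals"). Matching subdomains by a "." suffix is an assumption. It keeps `monitorfirm.pb.pl` blocked without blocking all of `pb.pl`.

```python
from urllib.parse import urlsplit

# Only the entries named in the two commits; the full list is not shown.
DIRECTORY_BLACKLIST = frozenset({
    "imsig.pl", "monitorfirm.pb.pl", "zwiazekpracodawcow.pl", "transfermarkt.pl",
    "mapcarta.com", "norda-biznes.info", "bizraport.pl", "aplikuj.pl",
    "lexspace.pl", "drewnianeabc.pl", "f-trust.pl", "itspace.llc",
})


def is_blacklisted(url: str) -> bool:
    """Match a blacklisted domain exactly or any of its subdomains."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in DIRECTORY_BLACKLIST)
```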
2835dea7d2 fix: add Brave API retry with backoff and increase bulk delay to 5s
Brave free tier rate limits aggressively (429 after ~1 req/s).
Added retry logic (3 attempts: 3s, 6s, 9s waits) and increased
inter-company delay from 2s to 5s. Error candidates are now
cleaned up before retry to allow re-discovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:36:36 +01:00
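A sketch of the 429 retry from 2835dea7d2. "3 attempts: 3s, 6s, 9s waits" is read here as three retries after the first 429, with linear waits. `send` is a hypothetical callable that performs one Brave request and returns `(status, body)`. `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Any, Callable, Tuple

Response = Tuple[int, Any]  # (HTTP status, body); hypothetical shape


def with_backoff(
    send: Callable[[], Response],
    retries: int = 3,
    base_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on HTTP 429 wait 3s, 6s, 9s (linear backoff) before each retry."""
    status, body = send()
    for attempt in range(1, retries + 1):
        if status != 429:
            break
        sleep(base_delay * attempt)
        status, body = send()
    return status, body
```

The 5s inter-company delay would sit in the bulk loop itself, outside this helper, so it applies even when no retry happens.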
126eff8af6 feat: add website discovery service for companies without websites
Automated discovery using Brave Search API to find company websites,
scrape verification data (NIP/REGON/KRS/email/phone), and present
candidates with match badges in the data quality dashboard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:27:13 +01:00