fix(nordagpt): catch plain-text company hallucinations (firma X, również X)

AI bypasses link/bold validation by mentioning companies as plain text
like "firma Baumar" or "również Pro-Bud". New regex catches these patterns
and removes them if the company name isn't in the database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Maciej Pienczyn 2026-03-28 06:55:23 +01:00
parent 87d4fde5c3
commit a1a64730e3

@@ -245,7 +245,25 @@ class NordaBizChatEngine:
         text = re.sub(r'\*\*([^*]{2,40})\*\*', replace_bold_company, text)
-        # 4. Clean up artifacts left by removals
+        # 4. Remove plain-text company name mentions that aren't linked
+        # Catches: "firma Baumar", "również Pro-Bud", "firmy Baumar i Pro-Bud"
+        def replace_plain_company(match):
+            prefix = match.group(1) # "firma", "również", etc.
+            name = match.group(2).strip().rstrip('.,;:')
+            if name.lower() in valid_names_set:
+                return match.group(0) # Valid company
+            for vn in valid_names_set:
+                if name.lower() in vn or vn in name.lower():
+                    return match.group(0) # Partial match
+            logger.warning(f"NordaGPT hallucination blocked: plain text '{name}' after '{prefix}' not in DB")
+            return ''
+        text = re.sub(
+            r'(firma|firmą|firmę|firmy|również|oraz)\s+([A-ZĄĘÓŁŹŻŚĆŃ][a-zA-ZąęółźżśćńĄĘÓŁŹŻŚĆŃ-]{2,25}(?:\s+[A-ZĄĘÓŁŹŻŚĆŃ][a-zA-ZąęółźżśćńĄĘÓŁŹŻŚĆŃ-]+)?)',
+            replace_plain_company, text
+        )
+        # 5. Clean up artifacts left by removals
         text = re.sub(r':\s*oraz\s*to\b', ': to', text) # ": oraz to" → ": to"
         text = re.sub(r':\s*,', ':', text) # ": ," → ":"
         text = re.sub(r'\*\s*\s*\n', '\n', text) # "* "
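
To make the behavior of the new step easier to check in isolation, here is a minimal standalone sketch of the same filter. It reuses the regex and the matching rules from the diff, but `valid_names_set` is a hypothetical one-entry stand-in for the names loaded from the database, `strip_unknown_companies` is an illustrative wrapper name that does not exist in the codebase, and the logging call is left out.

```python
import re

# Hypothetical stand-in for the lowercase company names loaded from the database.
valid_names_set = {"baumar"}

# Same pattern as in the diff: a trigger word, whitespace,
# then one or two capitalized words (Polish letters and hyphens allowed).
PLAIN_COMPANY_RE = re.compile(
    r'(firma|firmą|firmę|firmy|również|oraz)\s+'
    r'([A-ZĄĘÓŁŹŻŚĆŃ][a-zA-ZąęółźżśćńĄĘÓŁŹŻŚĆŃ-]{2,25}'
    r'(?:\s+[A-ZĄĘÓŁŹŻŚĆŃ][a-zA-ZąęółźżśćńĄĘÓŁŹŻŚĆŃ-]+)?)'
)


def replace_plain_company(match):
    """Keep the match if the name is known, otherwise drop trigger and name."""
    name = match.group(2).strip().rstrip('.,;:')
    lowered = name.lower()
    if lowered in valid_names_set:
        return match.group(0)  # exact match against a known company
    if any(lowered in vn or vn in lowered for vn in valid_names_set):
        return match.group(0)  # substring match in either direction
    return ''  # unknown name: remove the whole "trigger + name" span


def strip_unknown_companies(text):
    """Illustrative wrapper (not in the codebase) around the substitution."""
    return PLAIN_COMPANY_RE.sub(replace_plain_company, text)


if __name__ == "__main__":
    print(repr(strip_unknown_companies("Polecam firma Baumar.")))
    print(repr(strip_unknown_companies("Jest również Pro-Bud w ofercie.")))
    print(repr(strip_unknown_companies("firmy Baumar i Pro-Bud")))
```

In this sketch the known name is kept, while the unknown one is removed together with its trigger word, which leaves a double space behind for the cleanup step to handle. The pattern only inspects the word or two directly after a trigger, so in the third input "Pro-Bud" after "i" is not examined. Any capitalized word after a trigger counts as a candidate name, so a phrase such as "oraz Warszawa" is also removed unless it matches a database entry.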