fix: Reduce chunk size and truncate prompts for Gemini safety filters
Testing revealed that Gemini 2.5 safety filters block texts longer than ~2000 chars. Applied two fixes: 1. Truncate chunk text to 2000 chars in _extract_with_ai() as safety net 2. Reduce MAX_CHUNK_SIZE from 1000 to 500 tokens (~2000 chars) This ensures all AI extraction requests stay within Gemini's safe limits. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
4045106b3c
commit
ac0aceb30e
@ -42,9 +42,11 @@ logger = logging.getLogger(__name__)
|
||||
# ============================================================
|
||||
|
||||
# Chunk size settings
|
||||
# NOTE: Reduced from 1000 to 500 tokens due to Gemini safety filter issues
|
||||
# Long texts (~4000 chars) trigger safety blocks, ~2000 chars work reliably
|
||||
MIN_CHUNK_SIZE = 200 # tokens
|
||||
MAX_CHUNK_SIZE = 1000 # tokens
|
||||
CHUNK_OVERLAP = 100 # tokens overlap between chunks
|
||||
MAX_CHUNK_SIZE = 500 # tokens (~2000 chars for Polish text)
|
||||
CHUNK_OVERLAP = 50 # tokens overlap between chunks (reduced proportionally)
|
||||
APPROX_CHARS_PER_TOKEN = 4 # Polish text approximation
|
||||
|
||||
# AI extraction settings
|
||||
@ -328,9 +330,16 @@ class ZOPKKnowledgeService:
|
||||
Returns parsed JSON or None on error.
|
||||
"""
|
||||
try:
|
||||
# Truncate chunk to avoid Gemini safety filter issues with long texts
|
||||
# Testing showed ~4000 chars triggers safety blocks, ~2000 chars works
|
||||
MAX_PROMPT_CHARS = 2000
|
||||
chunk_text = chunk.content[:MAX_PROMPT_CHARS]
|
||||
if len(chunk.content) > MAX_PROMPT_CHARS:
|
||||
logger.debug(f"Truncated chunk from {len(chunk.content)} to {MAX_PROMPT_CHARS} chars")
|
||||
|
||||
# Simplified single prompt (system prompt removed to avoid safety filter issues)
|
||||
prompt = EXTRACTION_USER_PROMPT.format(
|
||||
chunk_text=chunk.content,
|
||||
chunk_text=chunk_text,
|
||||
source_name=source_name,
|
||||
published_date=published_date
|
||||
)
|
||||
|
||||
Loading…
Reference in New Issue
Block a user