llm_ticket3/prompts/test_prompt_ocr5.txt
2025-05-06 17:09:05 +02:00

You are tasked with performing a high-precision OCR extraction on a screenshot of a technical or administrative web interface.
GOAL: Extract all visible and partially visible text — no matter how small, faint, or cropped. Remain strictly factual. Do not interpret, guess, or reword.
📄 FORMAT THE OUTPUT USING THESE CATEGORIES:
---
1. PAGE STRUCTURE
* Page title(s)
* Section or interface headers
* Visible URLs, tabs, or menu paths
2. DATA & IDENTIFIERS
* Sample codes, test references, user/operator names
* Material or item descriptions
* Dates, times, unique identifiers
3. INTERFACE ELEMENTS
* Button labels
* Tab names
* Sidebar/menu content
* Field or dropdown labels
4. SYSTEM MESSAGES & ERRORS
* Status messages, warnings, or connection errors
* Domain names, IPs, server notices
5. METADATA
* Version numbers, standard references, document codes
* Any duplicated text or footer content
6. UNCLEAR OR CROPPED TEXT
* Logos, watermarks, truncated words or symbols
* Use “[...]” to mark incomplete or partially cropped text
---
RULES:
- Do not translate or paraphrase.
- Preserve original casing, spelling, punctuation.
- Include repeated elements as they appear.
- Report faint or background text if legible.
- Leave blank sections if no relevant text is found.
This prompt is designed to work across a wide range of web interfaces, dashboards, and structured forms. Output clearly grouped bullet points per section.
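The six categories above lend themselves to an automatic check of whether a response actually follows the requested layout. The following is a minimal sketch, assuming the model returns markdown-style `#` headings (as the recorded results in this file do); `normalize_heading`, `split_sections`, and `missing_sections` are hypothetical helper names, not part of AgentVisionOCR.

```python
import re

# Canonical section names, copied from the prompt's category list.
SECTIONS = [
    "PAGE STRUCTURE",
    "DATA & IDENTIFIERS",
    "INTERFACE ELEMENTS",
    "SYSTEM MESSAGES & ERRORS",
    "METADATA",
    "UNCLEAR OR CROPPED TEXT",
]

# A heading is any line that starts with one to six '#' characters.
HEADING_RE = re.compile(r"^\s*#{1,6}\s*(.+?)\s*$")


def normalize_heading(raw: str) -> str:
    """Strip markdown emphasis, list numbering and a trailing colon, then upper-case."""
    text = raw.replace("*", "")
    text = re.sub(r"^\s*\d+\s*[.)]\s*", "", text)  # drop "1) " or "1. "
    text = text.strip().rstrip(":").strip()
    return re.sub(r"\s+", " ", text).upper()


def split_sections(output: str) -> dict[str, list[str]]:
    """Group the non-empty lines of a response under their normalized heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = normalize_heading(match.group(1))
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


def missing_sections(sections: dict[str, list[str]]) -> list[str]:
    """Return the canonical section names that the response did not include."""
    return [name for name in SECTIONS if name not in sections]
```

Matching is exact after normalization, so a heading such as `Unclear/Cropped Text` would be reported as missing for this prompt's category list; that strictness is a design choice of the sketch and can be relaxed.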
Results:
🔵 Active LLM parameters:
{
"temperature": 1.3,
"top_p": 0.85,
"presence_penalty": 0.1,
"frequency_penalty": 0.15,
"stop": [],
"stream": false,
"n": 1,
"seed": 0,
"mirostat": 0,
"mirostat_eta": 0.1,
"mirostat_tau": 5.0,
"top_k": 35,
"min_p": 0.06,
"repeat_penalty": 1.15,
"repeat_last_n": 128,
"tfs_z": 1.0,
"num_keep": 0,
"num_predict": 2048,
"num_ctx": 16384,
"num_batch": 2048
}
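When prompt variants are compared, it helps to confirm that the sampling parameters did not change between runs, so that differences in the output are not caused by a configuration change. A minimal sketch, assuming each parameter block has been copied out of this log as a JSON string; `diff_params` is a hypothetical helper and the second run's temperature below is an invented value used only for illustration.

```python
import json


def diff_params(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key present in only one of the two dicts is reported with None
    on the side where it is absent.
    """
    absent = object()  # sentinel so that a missing key never equals a real value
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k), b.get(k))
        for k in keys
        if a.get(k, absent) != b.get(k, absent)
    }


# Subset of the parameters recorded above.
run_a = json.loads('{"temperature": 1.3, "top_p": 0.85, "top_k": 35, "seed": 0}')
# Hypothetical second run with a lower temperature (illustrative value only).
run_b = dict(run_a, temperature=0.2)

print(diff_params(run_a, run_a))  # {}
print(diff_params(run_a, run_b))  # {'temperature': (1.3, 0.2)}
```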
AgentVisionOCR: OCR extraction on image_145435.png
🔵 Result for image output/ticket_T11143/T11143_20250422_084617/attachments/image_145435.png:
Extracted text:
### 1) PAGE STRUCTURE
* **Page title:** Essai au bleu de méthylène (MB)
* **Section/Interface headers:**
* BRG-LAB
* **Visible URLs, tabs, menu paths:**
* No visible URLs are displayed in the image.
### 2) DATA & IDENTIFIERS
* **Sample codes/test references/user/operator names:**
* BOLLEE Victor
* Echantillon n° 25-00075 réceptionné le 02/04/2025 par BOLLEE Victor.
* **Material/item descriptions:**
* Blue Methylene MB Testing Report
**Note:** There is no mention of dates or times within the given page snippet. Additionally, there seems to be no unique identifiers on this web page interface screenshot provided.
### 3) INTERFACE ELEMENTS
This section cannot be determined based on the information provided in the image.
### 4) SYSTEM MESSAGES & ERRORS
There are no system messages or errors present in this snapshot as it only provides a view of what appears to be an itemized report rather than any kind of error message.
### 5) METADATA
No metadata elements can be identified from the image alone such as version numbers standard references document codes etcetera without having access full context surrounding these items listed under "BRG-LAB". As far we know all text shown here pertains directly related subject matter topic discussed above thus falling outside scope defined categories listed above according guidelines outlined prompt.
New prompt:
You are tasked with performing a high-precision OCR extraction on a screenshot of a technical or administrative web interface.
GOAL: Extract all visible and partially visible text — no matter how small, faint, or cropped. Remain strictly factual. Do not interpret, guess, or reword.
📄 FORMAT THE OUTPUT USING THESE CATEGORIES:
---
1. PAGE STRUCTURE
* Page titles (e.g., test names or document references)
* Interface headers, menus or section titles
* Navigation breadcrumbs, web paths, or visible URLs
2. DATA & IDENTIFIERS
* Sample numbers, user names, dates
* Material types or item descriptions
* Unique codes or test references
3. INTERFACE ELEMENTS
* Button labels (e.g., RESET, PRINT, SAVE)
* Tabs or navigation elements (e.g., RESULTS, MATERIAL, OBSERVATIONS)
* Form field labels or sidebar sections
4. SYSTEM MESSAGES & ERRORS
* Status or connection errors (e.g., “unable to connect”, IP/domain errors)
* Visible domain names, system paths
5. METADATA
* Document codes, version numbers, dates (e.g., standard formats like “NF EN ####-#”)
* Any repeated or footer elements
6. UNCLEAR / CROPPED TEXT
* Logos, watermarks, or faded UI elements
* Truncated words or text: use “[...]” to indicate partial content
---
RULES:
- Do not interpret or translate.
- Preserve all accents, casing, punctuation.
- Include all duplicates or repeated labels.
- Capture system messages even if incomplete or faded.
- Leave a section empty only if no corresponding text is found.
Output a clean, bullet-pointed list per section. Be exhaustive and literal.
Results:
🔵 Active LLM parameters:
{
"temperature": 1.3,
"top_p": 0.85,
"presence_penalty": 0.1,
"frequency_penalty": 0.15,
"stop": [],
"stream": false,
"n": 1,
"seed": 0,
"mirostat": 0,
"mirostat_eta": 0.1,
"mirostat_tau": 5.0,
"top_k": 35,
"min_p": 0.06,
"repeat_penalty": 1.15,
"repeat_last_n": 128,
"tfs_z": 1.0,
"num_keep": 0,
"num_predict": 2048,
"num_ctx": 16384,
"num_batch": 2048
}
AgentVisionOCR: OCR extraction on image_145435.png
🔵 Result for image output/ticket_T11143/T11143_20250422_084617/attachments/image_145435.png:
Extracted text:
### Page Structure
Page title: Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)
Navigation breadcrumbs/path: none visible
### Data & Identifiers
Sample numbers/usernames/dates/material types/item descriptions/unique codes/test references:
none are clearly readable in the provided image snippet; however, it appears there might be data within tables or sections not fully captured in this view.
### Interface Elements
Button labels/tabs/navigation elements/form field labels/sidebar sections: None explicitly identifiable from the given portion of the interface screenshot.
### System Messages and Errors
Status/connection errors/domain names/system paths: None evident from the visible part of the page.
### Metadata
Document codes/version numbers/dates/repeated/footer elements:
NF EN 933-9 is mentioned which suggests a European standard reference but lacks specifics like date without further context. No version number, repeated text, or footer content is visible due to cropping.
### Unclear/Cropped Text
Logos/watermarks/faded UI elements/truncated words/text with “[...]” indicating partial content:
Due to significant portions being cropped out by the edges of this image snippet, several pieces of information are truncated including what seems to be a document title/standard reference and possibly interface labels or data entries. The full text for these items cannot be determined from the provided visual alone.
Another prompt:
You are tasked with performing an exhaustive OCR extraction on a technical or administrative web interface screenshot.
GOAL: Extract **every legible piece of text**, even partially visible, faded, or cropped. Structure your output for clarity. Do not guess, but always report what is visible.
📄 FORMAT USING THESE CATEGORIES:
---
1. PAGE STRUCTURE
- Page titles
- Interface headers or section labels
- Navigation bars or visible URLs
2. IDENTIFIERS & DATA
- Operator or user names
- Sample IDs, test references
- Materials, dates, batch numbers
3. INTERFACE ELEMENTS (MANDATORY SCAN)
- Button labels (e.g., RAZ, SAVE)
- Tabs (e.g., MATERIAL, OBSERVATIONS)
- Sidebars, form field labels
4. SYSTEM MESSAGES
- Connection or server errors
- Domains, IP addresses, server notices
5. METADATA
- Standard references (e.g., "NF EN ####-#")
- Version numbers, document codes, timestamps
6. UNCLEAR / CROPPED TEXT
- Logos, partial lines (use “[...]” for truncated)
- Background/faded elements, labels not fully legible
---
RULES:
- Preserve punctuation, case, accents exactly.
- Include duplicates if text appears more than once.
- Never skip faint or partial text; use “[...]” if incomplete.
- Even if cropped, report as much as possible from any UI region.
This prompt is designed to generalize across all web portals, technical forms, or reports. Prioritize completeness over certainty. Do not ignore UI components or system messages.
Results:
🔵 Active LLM parameters:
{
"temperature": 1.3,
"top_p": 0.85,
"presence_penalty": 0.1,
"frequency_penalty": 0.15,
"stop": [],
"stream": false,
"n": 1,
"seed": 0,
"mirostat": 0,
"mirostat_eta": 0.1,
"mirostat_tau": 5.0,
"top_k": 35,
"min_p": 0.06,
"repeat_penalty": 1.15,
"repeat_last_n": 128,
"tfs_z": 1.0,
"num_keep": 0,
"num_predict": 2048,
"num_ctx": 16384,
"num_batch": 2048
}
AgentVisionOCR: OCR extraction on image_145435.png
🔵 Result for image output/ticket_T11143/T11143_20250422_084617/attachments/image_145435.png:
Extracted text:
### **Page Structure:**
* Page title: "Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)"
* Interface header: "RG-LAB"
* Navigation bar/visible URL: Not visible
* Sidebars/form field labels:
* "MATERIEL"
* "OBSERVATIONS"
### **Identifiers & Data:**
No legible identifiers/data present in the image.
### **Interface Elements:**
* Button labels: None fully visible. One partially cropped button appears to start with an ellipsis "...".
### **System Messages**
None are apparent from the interface elements shown, although partial text could suggest a server message or error code ("[...]", "[...]").
### **Metadata**
* Standard references: NF EN 933-9 (02-2022)
### **Unclear/Cropped Text**:
The lower section contains a faded URL and some metadata fields that appear not to be filled out or have been intentionally hidden for privacy/security reasons ("[...]"). The top left corner shows part of what might be another standard reference or version number ("RG-LAB") but is too cropped to interpret clearly.
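Each prompt variant surfaced different strings from the same image, so a simple coverage score makes the runs easier to compare. The sketch below assumes a hand-built reference list; the entries in `EXPECTED` are only strings that at least one of the runs in this file reported, and they have not been verified against the screenshot. `coverage` and `recall` are hypothetical helper names.

```python
# Hypothetical reference list: strings reported by at least one run above.
# It is NOT a verified ground truth for image_145435.png.
EXPECTED = [
    "BOLLEE Victor",
    "25-00075",
    "NF EN 933-9 (02-2022)",
    "MATERIEL",
    "OBSERVATIONS",
]


def coverage(output: str, expected: list[str]) -> dict[str, bool]:
    """Map each expected string to whether it appears verbatim in the output."""
    return {item: item in output for item in expected}


def recall(output: str, expected: list[str]) -> float:
    """Fraction of expected strings found verbatim; 0.0 for an empty list."""
    if not expected:
        return 0.0
    hits = coverage(output, expected)
    return sum(hits.values()) / len(expected)


# Short excerpt of one recorded result, used as sample input.
excerpt = "Echantillon n° 25-00075 réceptionné le 02/04/2025 par BOLLEE Victor."
print(coverage(excerpt, EXPECTED))
print(recall(excerpt, EXPECTED))
```

The comparison is case- and accent-sensitive on purpose, because the prompts require casing, accents, and punctuation to be preserved exactly.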
New prompt: