mirror of
https://github.com/Ladebeze66/llm_ticket3.git
synced 2025-12-13 10:46:51 +01:00
306 lines
10 KiB
Plaintext
You are tasked with performing a high-precision OCR extraction on a screenshot of a technical or administrative web interface.

GOAL: Extract all visible and partially visible text — no matter how small, faint, or cropped. Remain strictly factual. Do not interpret, guess, or reword.

📄 FORMAT THE OUTPUT USING THESE CATEGORIES:

---

1. PAGE STRUCTURE
* Page title(s)
* Section or interface headers
* Visible URLs, tabs, or menu paths

2. DATA & IDENTIFIERS
* Sample codes, test references, user/operator names
* Material or item descriptions
* Dates, times, unique identifiers

3. INTERFACE ELEMENTS
* Button labels
* Tab names
* Sidebar/menu content
* Field or dropdown labels

4. SYSTEM MESSAGES & ERRORS
* Status messages, warnings, or connection errors
* Domain names, IPs, server notices

5. METADATA
* Version numbers, standard references, document codes
* Any duplicated text or footer content

6. UNCLEAR OR CROPPED TEXT
* Logos, watermarks, truncated words or symbols
* Use “[...]” to mark incomplete or partially cropped text

---

RULES:
- Do not translate or paraphrase.
- Preserve original casing, spelling, punctuation.
- Include repeated elements as they appear.
- Report faint or background text if legible.
- Leave blank sections if no relevant text is found.

This prompt is designed to work across a wide range of web interfaces, dashboards, and structured forms. Output clearly grouped bullet points per section.

Results:
🔵 Active LLM parameters:
{
    "temperature": 1.3,
    "top_p": 0.85,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.15,
    "stop": [],
    "stream": false,
    "n": 1,
    "seed": 0,
    "mirostat": 0,
    "mirostat_eta": 0.1,
    "mirostat_tau": 5.0,
    "top_k": 35,
    "min_p": 0.06,
    "repeat_penalty": 1.15,
    "repeat_last_n": 128,
    "tfs_z": 1.0,
    "num_keep": 0,
    "num_predict": 2048,
    "num_ctx": 16384,
    "num_batch": 2048
}
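
The log does not name the backend that received these parameters. Several of the keys (`mirostat`, `num_ctx`, `num_predict`, `repeat_last_n`) resemble the options accepted by Ollama-style local runtimes, so the following is a minimal sketch, under that assumption, of how such a request body could be assembled for one image. The helper name `build_request`, the model name, and the split between top-level fields and `options` are hypothetical and may need adjusting for the actual runtime.

```python
import base64
import json

# Sampling options copied verbatim from the log above.
LLM_OPTIONS = {
    "temperature": 1.3, "top_p": 0.85,
    "presence_penalty": 0.1, "frequency_penalty": 0.15,
    "stop": [], "stream": False, "n": 1, "seed": 0,
    "mirostat": 0, "mirostat_eta": 0.1, "mirostat_tau": 5.0,
    "top_k": 35, "min_p": 0.06,
    "repeat_penalty": 1.15, "repeat_last_n": 128, "tfs_z": 1.0,
    "num_keep": 0, "num_predict": 2048, "num_ctx": 16384, "num_batch": 2048,
}


def build_request(model: str, prompt: str, image_bytes: bytes,
                  options: dict = LLM_OPTIONS) -> dict:
    """Assemble a JSON-serialisable request body (hypothetical helper).

    Assumption: the runtime expects base64-encoded images in an "images"
    list, "stream" at the top level, and sampling settings under "options".
    """
    sampling = {k: v for k, v in options.items() if k != "stream"}
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": options.get("stream", False),
        "options": sampling,
    }


if __name__ == "__main__":
    # "vision-model" is a placeholder; the log does not record the model name.
    body = build_request("vision-model", "Extract all visible text.", b"abc")
    print(json.dumps(body, indent=2)[:200])
```

Keeping the options in one dictionary makes it straightforward to hold the sampling settings fixed while only the prompt text changes between runs, which is what the three runs in this log do.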
AgentVisionOCR: OCR extraction on image_145435.png

🔵 Result for image output/ticket_T11143/T11143_20250422_084617/attachments/image_145435.png:
Extracted text:
### 1) PAGE STRUCTURE

* **Page title:** Essai au bleu de méthylène (MB)
* **Section/Interface headers:**
    * BRG-LAB
* **Visible URLs, tabs, menu paths:**
    * No visible URLs are displayed in the image.

### 2) DATA & IDENTIFIERS

* **Sample codes/test references/user/operator names:**
    * BOLLEE Victor
    * Echantillon n° 25-00075 réceptionné le 02/04/2025 par BOLLEE Victor.
* **Material/item descriptions:**
    * Blue Methylene MB Testing Report

**Note:** There is no mention of dates or times within the given page snippet. Additionally, there seems to be no unique identifiers on this web page interface screenshot provided.

### 3) INTERFACE ELEMENTS

This section cannot be determined based on the information provided in the image.

### 4) SYSTEM MESSAGES & ERRORS

There are no system messages or errors present in this snapshot as it only provides a view of what appears to be an itemized report rather than any kind of error message.

### 5) METADATA

No metadata elements can be identified from the image alone such as version numbers standard references document codes etcetera without having access full context surrounding these items listed under "BRG-LAB". As far we know all text shown here pertains directly related subject matter topic discussed above thus falling outside scope defined categories listed above according guidelines outlined prompt.
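
The heading style of the extracted text varies between runs (`### 1) PAGE STRUCTURE` here, `### Page Structure` and `### **Page Structure:**` in later runs), which makes the outputs awkward to compare line by line. The following is a minimal sketch of one way to split an extracted text into sections keyed by a normalized heading. The function names `normalize_heading` and `split_sections` are hypothetical and not part of the logged pipeline.

```python
import re

# A markdown-style heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^#{1,6}\s*(.*)$")


def normalize_heading(raw: str) -> str:
    """Drop emphasis markers, leading numbering and trailing colons."""
    text = raw.replace("*", "")
    text = re.sub(r"^\s*\d+[.)]\s*", "", text)   # "1) " or "1. "
    text = text.strip().rstrip(":").strip()
    return re.sub(r"\s+", " ", text).upper()


def split_sections(extracted: str) -> dict[str, list[str]]:
    """Group the non-empty lines of a model output under their heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in extracted.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            current = normalize_heading(match.group(1))
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return sections


if __name__ == "__main__":
    sample = (
        "### 1) PAGE STRUCTURE\n\n"
        "* **Page title:** Essai au bleu de méthylène (MB)\n\n"
        "### 3) INTERFACE ELEMENTS\n\n"
        "This section cannot be determined based on the information "
        "provided in the image.\n"
    )
    for name, lines in split_sections(sample).items():
        print(name, "->", lines)
```

With the headings normalized this way, the same section can be looked up across runs regardless of how each output formatted its titles.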

New prompt:
You are tasked with performing a high-precision OCR extraction on a screenshot of a technical or administrative web interface.

GOAL: Extract all visible and partially visible text — no matter how small, faint, or cropped. Remain strictly factual. Do not interpret, guess, or reword.

📄 FORMAT THE OUTPUT USING THESE CATEGORIES:

---

1. PAGE STRUCTURE
* Page titles (e.g., test names or document references)
* Interface headers, menus or section titles
* Navigation breadcrumbs, web paths, or visible URLs

2. DATA & IDENTIFIERS
* Sample numbers, user names, dates
* Material types or item descriptions
* Unique codes or test references

3. INTERFACE ELEMENTS
* Button labels (e.g., RESET, PRINT, SAVE)
* Tabs or navigation elements (e.g., RESULTS, MATERIAL, OBSERVATIONS)
* Form field labels or sidebar sections

4. SYSTEM MESSAGES & ERRORS
* Status or connection errors (e.g., “unable to connect”, IP/domain errors)
* Visible domain names, system paths

5. METADATA
* Document codes, version numbers, dates (e.g., standard formats like “NF EN ####-#”)
* Any repeated or footer elements

6. UNCLEAR / CROPPED TEXT
* Logos, watermarks, or faded UI elements
* Truncated words or text: use “[...]” to indicate partial content

---

RULES:
- Do not interpret or translate.
- Preserve all accents, casing, punctuation.
- Include all duplicates or repeated labels.
- Capture system messages even if incomplete or faded.
- Leave a section empty only if no corresponding text is found.

Output a clean, bullet-pointed list per section. Be exhaustive and literal.

Results:
🔵 Active LLM parameters:
{
    "temperature": 1.3,
    "top_p": 0.85,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.15,
    "stop": [],
    "stream": false,
    "n": 1,
    "seed": 0,
    "mirostat": 0,
    "mirostat_eta": 0.1,
    "mirostat_tau": 5.0,
    "top_k": 35,
    "min_p": 0.06,
    "repeat_penalty": 1.15,
    "repeat_last_n": 128,
    "tfs_z": 1.0,
    "num_keep": 0,
    "num_predict": 2048,
    "num_ctx": 16384,
    "num_batch": 2048
}
AgentVisionOCR: OCR extraction on image_145435.png

🔵 Result for image output/ticket_T11143/T11143_20250422_084617/attachments/image_145435.png:
Extracted text:
### Page Structure
Page title: Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)
Navigation breadcrumbs/path: none visible

### Data & Identifiers
Sample numbers/usernames/dates/material types/item descriptions/unique codes/test references:
none are clearly readable in the provided image snippet; however, it appears there might be data within tables or sections not fully captured in this view.

### Interface Elements
Button labels/tabs/navigation elements/form field labels/sidebar sections: None explicitly identifiable from the given portion of the interface screenshot.

### System Messages and Errors
Status/connection errors/domain names/system paths: None evident from the visible part of the page.

### Metadata
Document codes/version numbers/dates/repeated/footer elements:
NF EN 933-9 is mentioned which suggests a European standard reference but lacks specifics like date without further context. No version number, repeated text, or footer content is visible due to cropping.

### Unclear/Cropped Text
Logos/watermarks/faded UI elements/truncated words/text with “[...]” indicating partial content:
Due to significant portions being cropped out by the edges of this image snippet, several pieces of information are truncated including what seems to be a document title/standard reference and possibly interface labels or data entries. The full text for these items cannot be determined from the provided visual alone.
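
In this run the page title carries `NF EN 933-9 (02-2022)`, yet the Metadata section reports the reference without its parenthesised suffix. The following is a minimal sketch of pulling references of the `NF EN ####-#` shape out of an extracted text with a regular expression, so that such omissions can be spotted mechanically. The helper name `find_standard_refs` is hypothetical, and the suffix is kept as an opaque string: reading it as a month and year is an assumption the log does not confirm.

```python
import re

# "NF EN <number>-<part>", optionally followed by "(dd-dddd)".
STANDARD_REF = re.compile(r"\bNF EN (\d+)-(\d+)(?:\s*\((\d{2}-\d{4})\))?")


def find_standard_refs(text: str) -> list[dict]:
    """Return every standard reference found, with its suffix if present."""
    refs = []
    for match in STANDARD_REF.finditer(text):
        number, part, suffix = match.groups()
        refs.append({"reference": f"NF EN {number}-{part}", "suffix": suffix})
    return refs


if __name__ == "__main__":
    title = "Page title: Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)"
    metadata = "NF EN 933-9 is mentioned which suggests a European standard reference"
    print(find_standard_refs(title))
    print(find_standard_refs(metadata))
```

Comparing the two results shows whether a section repeated the reference in full or dropped part of it.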

Another prompt:
You are tasked with performing an exhaustive OCR extraction on a technical or administrative web interface screenshot.

GOAL: Extract **every legible piece of text**, even partially visible, faded, or cropped. Structure your output for clarity. Do not guess, but always report what is visible.

📄 FORMAT USING THESE CATEGORIES:

---

1. PAGE STRUCTURE
- Page titles
- Interface headers or section labels
- Navigation bars or visible URLs

2. IDENTIFIERS & DATA
- Operator or user names
- Sample IDs, test references
- Materials, dates, batch numbers

3. INTERFACE ELEMENTS (MANDATORY SCAN)
- Button labels (e.g., RAZ, SAVE)
- Tabs (e.g., MATERIAL, OBSERVATIONS)
- Sidebars, form field labels

4. SYSTEM MESSAGES
- Connection or server errors
- Domains, IP addresses, server notices

5. METADATA
- Standard references (e.g., "NF EN ####-#")
- Version numbers, document codes, timestamps

6. UNCLEAR / CROPPED TEXT
- Logos, partial lines (use “[...]” for truncated)
- Background/faded elements, labels not fully legible

---

RULES:
- Preserve punctuation, case, accents exactly.
- Include duplicates if text appears more than once.
- Never skip faint or partial text; use “[...]” if incomplete.
- Even if cropped, report as much as possible from any UI region.

This prompt is designed to generalize across all web portals, technical forms, or reports. Prioritize completeness over certainty. Do not ignore UI components or system messages.

Results:
🔵 Active LLM parameters:
{
    "temperature": 1.3,
    "top_p": 0.85,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.15,
    "stop": [],
    "stream": false,
    "n": 1,
    "seed": 0,
    "mirostat": 0,
    "mirostat_eta": 0.1,
    "mirostat_tau": 5.0,
    "top_k": 35,
    "min_p": 0.06,
    "repeat_penalty": 1.15,
    "repeat_last_n": 128,
    "tfs_z": 1.0,
    "num_keep": 0,
    "num_predict": 2048,
    "num_ctx": 16384,
    "num_batch": 2048
}
AgentVisionOCR: OCR extraction on image_145435.png

🔵 Result for image output/ticket_T11143/T11143_20250422_084617/attachments/image_145435.png:
Extracted text:
### **Page Structure:**

* Page title: "Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)"
* Interface header: "RG-LAB"
* Navigation bar/visible URL: Not visible
* Sidebars/form field labels:
    * "MATERIEL"
    * "OBSERVATIONS"

### **Identifiers & Data:**

No legible identifiers/data present in the image.

### **Interface Elements:**

* Button labels: None fully visible. One partially cropped button appears to start with an ellipsis "...".

### **System Messages**

None are apparent from the interface elements shown, although partial text could suggest a server message or error code ("[...]", "[...]").

### **Metadata**

* Standard references: NF EN 933-9 (02-2022)

### **Unclear/Cropped Text**:

The lower section contains a faded URL and some metadata fields that appear not to be filled out or have been intentionally hidden for privacy/security reasons ("[...]"). The top left corner shows part of what might be another standard reference or version number ("RG-LAB") but is too cropped to interpret clearly.
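
The three runs above each recover a different subset of strings from the same image. The following is a minimal sketch of tabulating which strings each prompt variant captured. The checklist is only the union of literal strings reported across the runs in this log, not a verified transcription of the image (for instance, the log alone does not settle whether the header reads `BRG-LAB` or `RG-LAB`). The function name `coverage` and the run labels are hypothetical.

```python
def coverage(outputs: dict[str, str], expected: list[str]) -> dict[str, list[str]]:
    """For each run, list the expected strings that appear verbatim."""
    return {
        name: [item for item in expected if item in text]
        for name, text in outputs.items()
    }


# Short excerpts of the three extracted texts logged above.
RUNS = {
    "prompt_1": (
        "BRG-LAB\nBOLLEE Victor\n"
        "Echantillon n° 25-00075 réceptionné le 02/04/2025 par BOLLEE Victor."
    ),
    "prompt_2": "Page title: Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)",
    "prompt_3": (
        'Page title: "Essai au bleu de méthylène (MB) - NF EN 933-9 (02-2022)"\n'
        'Interface header: "RG-LAB"\n"MATERIEL"\n"OBSERVATIONS"'
    ),
}

# Assumption: union of strings reported by at least one run, unverified.
CHECKLIST = [
    "BRG-LAB", "BOLLEE Victor", "25-00075",
    "NF EN 933-9 (02-2022)", "MATERIEL", "OBSERVATIONS",
]

if __name__ == "__main__":
    for run, found in coverage(RUNS, CHECKLIST).items():
        print(f"{run}: {len(found)}/{len(CHECKLIST)} {found}")
```

A plain substring check is deliberately strict: it counts `RG-LAB` as a miss for `BRG-LAB`, which flags the discrepancy for manual review against the screenshot rather than deciding which reading is right.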

New prompt: