Vision API
The Vision API provides advanced AI-powered content analysis and information extraction capabilities for documents, images, and videos.
Going beyond traditional OCR, our AI vision technology understands context, layout, and relationships within content, enabling intelligent data extraction and analysis.
Supported Languages
Currently supported languages:
- pt: Portuguese
- en: English
- es: Spanish (Coming soon)
Supported Formats
Documents:
- pdf: Portable Document Format
- doc: Microsoft Word (Coming soon)
- ppt: Microsoft PowerPoint (Coming soon)
- xls: Microsoft Excel (Coming soon)
Media:
- jpg: JPEG
- png: PNG
- mp4: MPEG-4
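Because several formats are still marked as coming soon, it can be useful to check a file's extension before uploading it. The following is a minimal sketch of such a check, not part of the SDK: `SUPPORTED_EXTENSIONS` and `isSupportedFormat` are hypothetical names, and the set only mirrors the formats listed above as currently available.

```javascript
// Extensions listed above as currently available ("Coming soon" formats are left out).
const SUPPORTED_EXTENSIONS = new Set(['pdf', 'jpg', 'png', 'mp4'])

// Hypothetical helper (not part of the SDK): returns true when the file name
// ends in one of the currently supported extensions.
function isSupportedFormat(fileName) {
  const dot = fileName.lastIndexOf('.')
  // Reject names with no extension, a leading dot only, or a trailing dot.
  if (dot <= 0 || dot === fileName.length - 1) return false
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase())
}

console.log(isSupportedFormat('./invoice.pdf')) // true
console.log(isSupportedFormat('slides.ppt'))    // false
```

The check is case-insensitive, so `SCAN.PNG` passes as well. It only inspects the file name and does not verify the file's actual contents.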
Extract Information
Extract structured data from documents using custom schemas. Ideal for automating data entry, invoice processing, and information retrieval.
Request body
- file (string): Media file to analyze (e.g., a PDF or an image).
- schema (object): A JSON schema describing the structure of the information to extract.
Response body
- data (object): The extracted information, structured according to the schema.
Request
// JavaScript (Node.js)
import { Vision } from '@regia-ai/js-sdk'
import { z } from 'zod'
// Initialize the Vision client
const vision = new Vision(process.env.REGIA_API_TOKEN)
// Define a schema with Zod
const schema = z.object({
  dueDate: z.string().describe("Invoice due date"),
  totalAmount: z.number().describe("Invoice total amount")
})
// Perform the extraction
const { dueDate, totalAmount } = await vision.extract('./invoice.pdf', { schema })
console.log(dueDate, totalAmount)
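The request body is described as taking a JSON schema, while the example above builds one with Zod. As a rough sketch, the same two fields could be written as a plain JSON Schema object and used to sanity-check an extraction result. Whether the SDK or the HTTP endpoint accepts this object directly is an assumption that is not confirmed here. `invoiceSchema`, `matchesSchema`, and the sample values are hypothetical and purely illustrative.

```javascript
// Plain JSON Schema with the same two fields as the Zod example above.
const invoiceSchema = {
  type: 'object',
  properties: {
    dueDate: { type: 'string', description: 'Invoice due date' },
    totalAmount: { type: 'number', description: 'Invoice total amount' }
  },
  required: ['dueDate', 'totalAmount']
}

// Hypothetical helper: a shallow check that `data` has every required key and
// that each known property has the declared type. It only handles flat objects
// with string, number, or boolean properties; it is not a full JSON Schema validator.
function matchesSchema(data, schema) {
  if (typeof data !== 'object' || data === null) return false
  for (const key of schema.required ?? []) {
    if (!(key in data)) return false
  }
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in data && typeof data[key] !== spec.type) return false
  }
  return true
}

// Made-up sample values for illustration, not real API output.
const sample = { dueDate: '2024-01-31', totalAmount: 1250.5 }
console.log(matchesSchema(sample, invoiceSchema)) // true
```

A shallow check like this can catch a missing or mistyped field before the data reaches downstream code. For nested schemas, a dedicated validator would be the better fit.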
Transcribe Information
Convert content from various file types into text:
- Images and documents: Extract text through vision processing
- Video files: Transcribe spoken content with visual descriptions
Request body
- file (string): Media file to transcribe (e.g., a PDF or an image).
Response body
- text (string): The transcribed text content from the document.
Request
// JavaScript (Node.js)
import { Vision } from '@regia-ai/js-sdk'
// Initialize the Vision client
const vision = new Vision(process.env.REGIA_API_TOKEN)
// Perform the transcription
const { text } = await vision.transcribe('./document.pdf')
console.log(text)
# cURL
curl -X POST https://api.regia.cloud/v1/vision/transcribe \
-H "Authorization: Bearer {token}" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.pdf"
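For environments where the SDK is not available, the cURL request above can be approximated with the `fetch`, `FormData`, and `Blob` globals built into Node.js 18 and later. This is a sketch under assumptions rather than an official client: `buildTranscribeRequest` and `transcribe` are hypothetical helpers, and the response is assumed to be JSON carrying the `text` field described under Response body.

```javascript
// Endpoint taken from the cURL example above.
const TRANSCRIBE_URL = 'https://api.regia.cloud/v1/vision/transcribe'

// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildTranscribeRequest(token, fileBytes, fileName) {
  const form = new FormData()
  // "file" is the form field name used in the request body description.
  form.append('file', new Blob([fileBytes]), fileName)
  return {
    url: TRANSCRIBE_URL,
    options: {
      method: 'POST',
      // Content-Type is left out on purpose: fetch sets the multipart
      // boundary itself when the body is a FormData instance.
      headers: { Authorization: `Bearer ${token}` },
      body: form
    }
  }
}

// Hypothetical helper: sends the request and returns the transcribed text.
// Assumes the response body is JSON containing a `text` field.
async function transcribe(token, fileBytes, fileName) {
  const { url, options } = buildTranscribeRequest(token, fileBytes, fileName)
  const response = await fetch(url, options)
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`)
  }
  const { text } = await response.json()
  return text
}
```

A caller would read the file first, for example with `fs.promises.readFile('./document.pdf')`, and pass the resulting bytes together with `process.env.REGIA_API_TOKEN`. Keeping request construction separate from sending makes the request shape easy to inspect without network access.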