Vision API

The Vision API provides advanced AI-powered content analysis and information extraction capabilities for documents, images, and videos.

Going beyond traditional OCR, our AI vision technology understands context, layout, and relationships within content, enabling intelligent data extraction and analysis.

Supported Languages

Currently supported languages:

pt Portuguese
en English
es Spanish (Coming soon)

Supported Formats

Documents:

pdf Portable Document Format
doc Microsoft Word (Coming soon)
ppt Microsoft PowerPoint (Coming soon)
xls Microsoft Excel (Coming soon)

Media:

jpg JPEG
png PNG
mp4 MPEG-4
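
A client can check a file's extension against the list above before uploading. The sketch below is illustrative only: `isSupportedFile` is a hypothetical helper, not part of the SDK, and it assumes the API identifies formats by the extensions shown here (variants such as `.jpeg` are not covered by this page).

```javascript
// Extensions listed above as currently supported (not marked "Coming soon").
const SUPPORTED_EXTENSIONS = new Set(['pdf', 'jpg', 'png', 'mp4'])

// Hypothetical helper: true if the file name ends in a supported extension.
function isSupportedFile(fileName) {
  const dot = fileName.lastIndexOf('.')
  if (dot === -1) return false
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase())
}

console.log(isSupportedFile('invoice.pdf')) // true
console.log(isSupportedFile('report.doc'))  // false
```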

POST /v1/vision/extract

Extract Information

Extract structured data from documents using custom schemas. Ideal for automating data entry, invoice processing, and information retrieval.

Request body

Name     Type     Description
file     string   Media file to analyze (e.g., a PDF or an image).
schema   object   A JSON schema describing the structure of the information to extract.
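
As a rough illustration of the `schema` field, a JSON Schema for an invoice might look like the sketch below. The property names are examples only, and this page does not state which JSON Schema draft or keywords the endpoint accepts, so treat the shape as an assumption.

```javascript
// Illustrative JSON Schema for an invoice; property names are examples only.
const invoiceSchema = {
  type: 'object',
  properties: {
    dueDate: { type: 'string', description: 'Invoice due date' },
    totalAmount: { type: 'number', description: 'Invoice total amount' }
  },
  required: ['dueDate', 'totalAmount']
}

// The schema is plain data, so it serializes directly to JSON.
const serialized = JSON.stringify(invoiceSchema)
console.log(serialized)
```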

Response body

Name   Type     Description
data   object   The extracted information, structured according to the schema.

Request

POST /v1/vision/extract

// JavaScript (Node.js)
import { Vision } from '@regia-ai/js-sdk'
import { z } from 'zod'

// Initialize the Vision client
const vision = new Vision(process.env.REGIA_API_TOKEN)

// Define a schema with Zod
const schema = z.object({
  dueDate: z.string().describe("Invoice due date"),
  totalAmount: z.number().describe("Invoice total amount")
})

// Perform the extraction
const { dueDate, totalAmount } = await vision.extract('./invoice.pdf', { schema })

console.log(dueDate, totalAmount)
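
Without the SDK, the same call can presumably be made over plain HTTP. The sketch below is not a documented usage. It assumes the `https://api.regia.cloud` host and bearer-token header used by the transcribe cURL example, a multipart body with `file` and `schema` parts, and that the schema is sent as a JSON string. `buildExtractRequest` and `extract` are hypothetical names. It relies on the global `fetch`, `FormData`, and `Blob` available in Node.js 18 and later.

```javascript
// Assumption: same host and bearer auth as the transcribe cURL example.
const EXTRACT_URL = 'https://api.regia.cloud/v1/vision/extract'

// Builds the fetch() arguments for an extract call.
// fileBytes: Buffer or Uint8Array holding the file contents.
// jsonSchema: plain JSON Schema object (sent as a JSON string; an assumption).
function buildExtractRequest(fileBytes, fileName, jsonSchema, token) {
  const form = new FormData()
  form.append('file', new Blob([fileBytes]), fileName)
  form.append('schema', JSON.stringify(jsonSchema))
  return {
    url: EXTRACT_URL,
    options: {
      method: 'POST',
      // fetch derives the multipart Content-Type (with boundary) from the FormData body.
      headers: { Authorization: `Bearer ${token}` },
      body: form
    }
  }
}

// Sends the request and returns the `data` field of the JSON response.
async function extract(fileBytes, fileName, jsonSchema, token) {
  const { url, options } = buildExtractRequest(fileBytes, fileName, jsonSchema, token)
  const response = await fetch(url, options)
  if (!response.ok) throw new Error(`Extract failed with status ${response.status}`)
  const { data } = await response.json()
  return data
}
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without contacting the API.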

POST /v1/vision/transcribe

Transcribe Information

Convert content from various file types into text:

Images and Documents: Extract text through vision processing
Video files: Transcribe spoken content with visual descriptions

Request body

Name   Type     Description
file   string   Media file to transcribe (e.g., a PDF, an image, or a video).

Response body

Name   Type     Description
text   string   The transcribed text content from the document.

Request

POST /v1/vision/transcribe

// JavaScript (Node.js)
import { Vision } from '@regia-ai/js-sdk'

// Initialize the Vision client
const vision = new Vision(process.env.REGIA_API_TOKEN)

// Perform the transcription
const { text } = await vision.transcribe('./document.pdf')

console.log(text)

# cURL
curl -X POST https://api.regia.cloud/v1/vision/transcribe \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"
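
When several files need transcribing, one possible pattern is to run the calls concurrently and collect a result or an error per file. In the sketch below, `transcribeAll` is a hypothetical helper, not an SDK function. It only assumes a client whose `transcribe(path)` resolves to an object with a `text` field, as in the SDK example above. Any rate limits on concurrent requests are not covered by this page.

```javascript
// Hypothetical helper: transcribes several files and collects per-file results.
// `client` is any object with a transcribe(path) method resolving to { text }.
async function transcribeAll(client, paths) {
  const settled = await Promise.allSettled(paths.map((p) => client.transcribe(p)))
  return settled.map((result, i) =>
    result.status === 'fulfilled'
      ? { path: paths[i], text: result.value.text }
      : { path: paths[i], error: String(result.reason) }
  )
}

// Usage with the SDK client from the example above:
// const results = await transcribeAll(vision, ['./document.pdf', './scan.png'])
```

Using `Promise.allSettled` means a failure on one file does not discard the transcriptions that succeeded.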