Vision API

The Vision API provides advanced AI-powered content analysis and information extraction capabilities for documents, images, and videos.

Going beyond traditional OCR, our AI vision technology understands context, layout, and relationships within content, enabling intelligent data extraction and analysis.

Supported Languages

Currently supported languages:

  • pt   Portuguese
  • en   English
  • es   Spanish (Coming soon)

Supported Formats

Documents:

  • pdf   Portable Document Format
  • doc   Microsoft Word (Coming soon)
  • ppt   Microsoft PowerPoint (Coming soon)
  • xls   Microsoft Excel (Coming soon)

Media:

  • jpg   JPEG
  • png   PNG
  • mp4   MPEG-4


POST /v1/vision/extract

Extract Information

Extract structured data from documents using custom schemas. Ideal for automating data entry, invoice processing, and information retrieval.

Request body

  • file     string   Media file to analyze (e.g., a PDF or an image).
  • schema   object   A JSON schema describing the structure of the information to extract.
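
To make the `schema` parameter concrete, here is a minimal sketch of an invoice schema written as a plain JSON Schema object, together with a small check of a result against it. The field names mirror the invoice example on this page; the helper `matchesSchema` and the sample values are hypothetical, and the helper only covers top-level primitive fields, so it is not a full JSON Schema validator.

```javascript
// Hypothetical invoice schema expressed as standard JSON Schema.
const invoiceSchema = {
  type: 'object',
  properties: {
    dueDate: { type: 'string', description: 'Invoice due date' },
    totalAmount: { type: 'number', description: 'Invoice total amount' },
  },
  required: ['dueDate', 'totalAmount'],
}

// Hypothetical helper: checks that every required top-level key is present
// and has the primitive type the schema declares. It does not handle nested
// objects, arrays, or other JSON Schema keywords.
function matchesSchema(data, schema) {
  return schema.required.every(
    (key) => typeof data[key] === schema.properties[key].type
  )
}

// Illustrative values only, not real API output.
const sample = { dueDate: '2024-01-31', totalAmount: 1250.5 }
console.log(matchesSchema(sample, invoiceSchema))
```

A check like this can catch a missing or mistyped field before the extracted data reaches downstream code.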

Response body

  • data   object   The extracted information, structured according to the schema.

Request

POST /v1/vision/extract

// JavaScript (Node.js)
import { Vision } from '@regia-ai/js-sdk'
import { z } from 'zod'

// Initialize the Vision client
const vision = new Vision(process.env.REGIA_API_TOKEN)

// Define a schema with Zod
const schema = z.object({
  dueDate: z.string().describe("Invoice due date"),
  totalAmount: z.number().describe("Invoice total amount")
})

// Perform the extraction
const { dueDate, totalAmount } = await vision.extract('./invoice.pdf', { schema })

console.log(dueDate, totalAmount)

POST /v1/vision/transcribe

Transcribe Information

Convert content from various file types into text:

  • Images and Documents: Extract text through vision processing
  • Video files: Transcribe spoken content with visual descriptions

Request body

  • file   string   Media file to transcribe (e.g., a PDF, an image, or a video).

Response body

  • text   string   The text transcribed from the file.

Request

POST /v1/vision/transcribe

// JavaScript (Node.js)
import { Vision } from '@regia-ai/js-sdk'

// Initialize the Vision client
const vision = new Vision(process.env.REGIA_API_TOKEN)

// Perform the transcription
const { text } = await vision.transcribe('./document.pdf')

console.log(text)

# cURL
curl -X POST https://api.regia.cloud/v1/vision/transcribe \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"
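
For callers who prefer not to use the SDK, the cURL call above can be approximated with the `fetch`, `FormData`, and `Blob` globals available in Node.js 18 and later. This is a sketch under assumptions: the function name `buildTranscribeRequest` is hypothetical, the form field name `file` is taken from the request body description, and the code only builds the request so it can be inspected before anything is sent.

```javascript
// Hypothetical helper: builds (but does not send) a multipart request
// for the transcribe endpoint, mirroring the cURL example.
function buildTranscribeRequest(token, fileBytes, fileName) {
  const form = new FormData()
  // Field name "file" follows the request body description on this page.
  form.append('file', new Blob([fileBytes]), fileName)

  return {
    url: 'https://api.regia.cloud/v1/vision/transcribe',
    options: {
      method: 'POST',
      // Content-Type is left unset so fetch can add the multipart boundary itself.
      headers: { Authorization: `Bearer ${token}` },
      body: form,
    },
  }
}

// Usage (performs a network call; assumes the response is JSON with a `text` field):
// const { readFile } = require('node:fs/promises')
// const bytes = await readFile('./document.pdf')
// const { url, options } = buildTranscribeRequest(process.env.REGIA_API_TOKEN, bytes, 'document.pdf')
// const { text } = await (await fetch(url, options)).json()
```

Separating request construction from sending keeps the helper easy to test without network access.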
