VisionaryAI Suite – Settings

Configure how VisionaryAI analyzes text, images, video and audio.

LLM Settings

The LLM settings control which language model is used to analyze text, transcriptions and summaries within VisionaryAI.

Users can select provider, specify model ID and configure API connections to local or external AI engines.

The system works seamlessly with tools such as LM Studio and other OpenAI-compatible APIs.

Image Analysis

VisionaryAI uses modern computer-vision models such as YOLO, BLIP and CLIP to detect objects, generate captions and create relevant tags.

  • object class selection
  • detection confidence thresholds
  • maximum objects per image
  • AI model selection

Semantic Memory

CLIP-based analysis enables VisionaryAI to understand images based on semantic similarity rather than only explicit object detection.

This allows the system to find visually similar images even if they do not contain identical objects.

VisionaryAI can also use a reference image library to generate more contextual tags and continuously improve analysis over time.

Prompt Settings

Prompt settings control how AI models are instructed to analyze different types of media.

VisionaryAI uses specialized prompt templates for video analysis, image analysis and audio transcription.

Audio Analysis

VisionaryAI uses Whisper models to convert speech to text with high accuracy.

  • voice activity detection
  • speaker identification (diarization)
  • word-level synchronization

Video Analysis

VisionaryAI can automatically extract key frames from video and analyze them using computer-vision models.

At the same time, the audio track can be transcribed and analyzed to create a complete multimodal analysis.

Text Analysis

Analyze documents, emails and reports to identify patterns, themes and recurring issues.

Export to PDF / HTML

Create structured reports directly from analyses and export them to PDF or HTML format.

Hardcoded Subtitles

Generate subtitles from speech and embed them directly into video files.

VTAG Metadata Editor

Edit and organize metadata connected to media files.