Configure how VisionaryAI analyzes text, images, video and audio.
The LLM settings control which language model VisionaryAI uses to analyze text and transcriptions and to generate summaries.
Users can select a provider, specify a model ID and configure API connections to local or external AI engines.
The system works with LM Studio and any other engine that exposes an OpenAI-compatible API.
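As a minimal sketch, a client pointing at LM Studio's local OpenAI-compatible endpoint might look like the following; the base URL, port and model ID are assumptions and should match whatever is configured in the LLM settings.

```python
# Minimal sketch: querying a local OpenAI-compatible server (e.g. LM Studio).
# The base URL and model ID below are assumptions; substitute the values
# configured in your VisionaryAI LLM settings.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="not-needed",                 # local servers usually ignore the key
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model ID
    messages=[
        {"role": "system", "content": "Summarize the transcription in two sentences."},
        {"role": "user", "content": "The meeting covered budget, hiring and the Q3 roadmap."},
    ],
)
print(response.choices[0].message.content)
```

The same code works against a hosted provider by changing the base URL, API key and model ID, which is why a single set of LLM settings can cover both local and external engines.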
VisionaryAI uses modern computer-vision models such as YOLO, BLIP and CLIP to detect objects, generate captions and create relevant tags.
CLIP-based analysis enables VisionaryAI to understand images based on semantic similarity rather than only explicit object detection.
This allows the system to find visually similar images even if they do not contain identical objects.
VisionaryAI can also use a reference image library to generate more contextual tags and continuously improve analysis over time.
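As an illustration of the underlying technique, a CLIP similarity check against a small reference library could be sketched like this; the checkpoint name, file names and library format are assumptions, not VisionaryAI's actual internals.

```python
# Minimal sketch of CLIP-based image similarity, assuming the widely used
# open "openai/clip-vit-base-patch32" checkpoint; VisionaryAI's actual model
# and reference-library format may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image_path: str) -> torch.Tensor:
    """Return a normalized CLIP embedding for one image."""
    image = Image.open(image_path)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)

# Compare a query image against a small reference library by cosine similarity.
# File names here are placeholders.
query = embed("query.jpg")
library = {name: embed(name) for name in ["ref_beach.jpg", "ref_city.jpg"]}
scores = {name: float(query @ vec.T) for name, vec in library.items()}
print(max(scores, key=scores.get), scores)
```

Because the comparison happens in embedding space, two photos of different beaches score as similar even though they contain no identical objects, which is what distinguishes this from plain object detection.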
Prompt settings control how AI models are instructed to analyze different types of media.
VisionaryAI uses specialized prompt templates for video analysis, image analysis and audio transcription.
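The exact templates VisionaryAI ships with are not shown here, but a per-media template table could, in spirit, look like this hypothetical sketch; every key, placeholder and instruction below is an assumption for illustration.

```python
# Hypothetical per-media prompt templates; the keys, placeholders and wording
# are assumptions, not VisionaryAI's shipped templates.
PROMPT_TEMPLATES = {
    "image": (
        "Describe this image in two sentences, then list up to five tags "
        "as comma-separated keywords."
    ),
    "video": (
        "You are given {frame_count} key frames from one video. Summarize "
        "the scene, note any changes between frames, and suggest tags."
    ),
    "audio": (
        "Clean up this raw transcription: fix punctuation and casing, but "
        "do not change the wording.\n\nTranscription:\n{transcript}"
    ),
}

def build_prompt(media_type, **fields):
    """Fill the template for a media type with runtime values."""
    return PROMPT_TEMPLATES[media_type].format(**fields)

print(build_prompt("video", frame_count=8))
```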
VisionaryAI uses Whisper models to convert speech to text with high accuracy.
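As a rough sketch of this step using the open-source openai-whisper package (the model size and file name are placeholders, and VisionaryAI may use a different Whisper variant or runtime):

```python
# Minimal sketch of speech-to-text with the openai-whisper package.
import whisper

model = whisper.load_model("base")           # larger models trade speed for accuracy
result = model.transcribe("interview.mp3")   # language is auto-detected by default

print(result["text"])                        # full transcript
for segment in result["segments"]:           # timestamped segments
    print(f'{segment["start"]:.1f}s - {segment["end"]:.1f}s: {segment["text"]}')
```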
VisionaryAI can automatically extract key frames from video and analyze them using computer-vision models.
At the same time, the audio track can be transcribed and analyzed to create a complete multimodal analysis.
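A simple version of the key-frame step, assuming fixed-interval sampling with OpenCV (VisionaryAI's actual selection strategy may use scene-change detection instead, and the file names are placeholders):

```python
# Minimal sketch of key-frame extraction with OpenCV, sampling one frame
# every N seconds; a real pipeline might select frames by scene changes.
import cv2

def extract_key_frames(video_path, every_seconds=5.0):
    """Yield (timestamp, frame) pairs sampled at a fixed interval."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS is unreadable
    step = max(1, int(fps * every_seconds))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, frame
        index += 1
    cap.release()

for timestamp, frame in extract_key_frames("clip.mp4"):
    cv2.imwrite(f"frame_{timestamp:.0f}s.jpg", frame)  # hand these to the vision models
```

The extracted frames feed the image models described above, while the audio track goes through Whisper, and the two result streams are combined into the final multimodal analysis.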
Beyond analyzing media, VisionaryAI also lets you:

- Analyze documents, emails and reports to identify patterns, themes and recurring issues.
- Create structured reports directly from analyses and export them to PDF or HTML format.
- Generate subtitles from speech and embed them directly into video files (see the sketch after this list).
- Edit and organize metadata connected to media files.
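For the subtitle feature above, a rough sketch of the general approach writes Whisper segments to an .srt file and muxes it into the video with ffmpeg; all file names are placeholders, ffmpeg must be installed separately, and VisionaryAI's own pipeline may differ.

```python
# Minimal sketch: turn Whisper segments into an .srt file and embed it as a
# soft subtitle track with ffmpeg. File names are placeholders.
import subprocess
import whisper

def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 00:01:02,345."""
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

result = whisper.load_model("base").transcribe("talk.mp4")
with open("talk.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n{srt_timestamp(seg['start'])} --> "
                f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n\n")

# Mux the subtitles into the MP4 without re-encoding the video or audio.
subprocess.run(["ffmpeg", "-i", "talk.mp4", "-i", "talk.srt",
                "-c", "copy", "-c:s", "mov_text", "talk_subtitled.mp4"],
               check=True)
```

Muxing the subtitles as a separate track keeps them editable and avoids re-encoding; burning them into the picture would instead require an ffmpeg subtitle filter pass.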