Ollama on Vitor Pontual | The VeePee Hub

The Daily Vee: Building a Personal AI Podcast

Wed, 01 Apr 2026 00:00:00 +0000

I wanted a morning news briefing that covered exactly the topics I care about — tech, science, crypto, AI, and the occasional sports story — without the ads, the hot takes, or the 45-minute runtime. So I built one.

The Daily Vee is a fully automated podcast that runs every day on my homelab. It pulls stories from my AI news aggregator (The VP Journal), writes a two-host script using a local LLM, voices it with neural text-to-speech, stitches the audio together, and publishes the episode to Telegram, Audiobookshelf, and right here on this site. No cloud APIs. No subscriptions. All local.

The February Shipping Spree

Wed, 18 Feb 2026 00:00:00 +0000

February is shaping up to be the month the commit graph caught fire. After a quiet summer and a slow ramp through the fall, January flipped a switch — and February just kept going. 429 contributions in the last year, and most of the green is piled into the last eight weeks.

The result? Four new services joining the three I launched earlier this month. That brings the total to seven self-hosted, AI-powered tools running on my home infrastructure — no cloud subscriptions, no API bills, all local.

Putting Claude to Work!

Wed, 11 Feb 2026 00:00:00 +0000

I’ve been busy building AI-powered tools on my self-hosted infrastructure. Today I’m adding three new services to the site:

Newsfeed

An AI-powered news aggregator inspired by Kevin Rose’s Nylon project. It pulls from 800+ RSS feeds, uses embeddings to cluster related articles into multi-source stories, and generates LLM headlines and syntheses. A Gravity Engine scores stories based on source count, author reputation, and my personal interests.
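To illustrate the clustering idea, here is a minimal sketch of grouping articles by embedding similarity. It is not the actual Newsfeed code: the greedy single-pass strategy, the `cluster_articles` name, and the 0.8 cosine threshold are all assumptions, and the embeddings are assumed to be precomputed vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_articles(embeddings, threshold=0.8):
    """Greedy clustering: attach each article to the most similar
    existing cluster centroid, or start a new cluster if none is
    at least `threshold` similar. Returns lists of article indices."""
    clusters = []  # each entry: {"centroid": [...], "members": [indices]}
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": list(vec), "members": [i]})
        else:
            # Update the running mean so the centroid tracks its members.
            n = len(best["members"])
            best["centroid"] = [(m * n + v) / (n + 1)
                                for m, v in zip(best["centroid"], vec)]
            best["members"].append(i)
    return [c["members"] for c in clusters]
```

Each resulting group of indices would correspond to one multi-source story. A real pipeline over 800+ feeds would likely need an approximate nearest-neighbor index rather than this linear scan.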

Ollama Fleet Manager

A dashboard and intelligent proxy for managing multiple Ollama GPU servers. It monitors model states, VRAM usage, and system metrics in real time. The proxy emulates the standard Ollama API and routes each request to the optimal server, prioritizing those with the model already loaded.

Ollama Fleet Manager

Mon, 09 Feb 2026 00:00:00 +0000

A dashboard and intelligent proxy for managing a fleet of Ollama GPU servers. Point any Ollama client at the proxy and it routes requests to the best available server automatically—no application changes needed.

Intelligent request routing — prioritizes servers with model already loaded, then model on disk, then most free VRAM
Real-time fleet monitoring — server status, loaded models, VRAM usage, CPU/GPU temperature, memory, disk, and uptime
Usage analytics — request volume, success rates, latency percentiles, and breakdowns by model, source, and server over 24h/7d/30d windows
Request aggregation — /api/tags, /api/ps, and /v1/models combine responses from all servers into a single unified list
Scheduled jobs — cron-based model scheduling with conflict detection across the fleet
Telegram alerts — server offline/online, overheating, low memory, and reboot notifications
Plugin system — extensible architecture for community plugins
OpenAI API compatible — supports /v1/* endpoints so OpenAI-compatible tools work out of the box
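
The routing priority and response aggregation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Fleet Manager's actual code: the `Server` fields, `pick_server`, and `merge_tags` are hypothetical names, free VRAM is treated as a tie-breaker within each tier, and deduplicating models by name is a guess at what "unified list" means. The only detail taken from Ollama itself is that `/api/tags` returns a JSON object with a `models` array whose entries have a `name`.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    # Hypothetical view of one fleet member as the proxy might track it.
    name: str
    online: bool
    free_vram_mb: int
    loaded_models: set = field(default_factory=set)  # models in VRAM
    disk_models: set = field(default_factory=set)    # models pulled to disk

def pick_server(servers, model):
    """Choose a server for `model`: loaded in VRAM first, then present
    on disk, then anything online; most free VRAM wins within a tier."""
    candidates = [s for s in servers if s.online]
    if not candidates:
        return None

    def rank(s):
        if model in s.loaded_models:
            tier = 0
        elif model in s.disk_models:
            tier = 1
        else:
            tier = 2
        # Lower tuples sort first, so negate VRAM to prefer more free memory.
        return (tier, -s.free_vram_mb)

    return min(candidates, key=rank)

def merge_tags(responses):
    """Combine per-server /api/tags payloads into one list, keeping the
    first entry seen for each model name (assumed dedup rule)."""
    seen = {}
    for resp in responses:
        for m in resp.get("models", []):
            seen.setdefault(m["name"], m)
    return {"models": list(seen.values())}
```

With this ranking, a server that already has the model in VRAM is chosen even when another online server has far more free memory, which avoids paying the model load time on every request.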