<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ollama on Vitor Pontual | The VeePee Hub</title>
    <link>https://vitorpontual.com/tags/ollama/</link>
    <description>Recent content in Ollama on Vitor Pontual | The VeePee Hub</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 01 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://vitorpontual.com/tags/ollama/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The Daily Vee: Building a Personal AI Podcast</title>
      <link>https://vitorpontual.com/posts/the-daily-vee/</link>
      <pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://vitorpontual.com/posts/the-daily-vee/</guid>
      <description>&lt;p&gt;I wanted a morning news briefing that covered exactly the topics I care about — tech, science, crypto, AI, and the occasional sports story — without the ads, the hot takes, or the 45-minute runtime. So I built one.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;The Daily Vee&lt;/strong&gt; is a fully automated podcast that runs every day on my homelab. It pulls stories from my AI news aggregator (&lt;a href=&#34;https://vitorpontual.com/posts/february-shipping-spree/&#34;&gt;The VP Journal&lt;/a&gt;), writes a two-host script using a local LLM, voices it with neural text-to-speech, stitches the audio together, and publishes the episode to Telegram, Audiobookshelf, and right here on this site. No cloud APIs. No subscriptions. All local.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The February Shipping Spree</title>
      <link>https://vitorpontual.com/posts/february-shipping-spree/</link>
      <pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://vitorpontual.com/posts/february-shipping-spree/</guid>
      <description>&lt;p&gt;February is shaping up to be the month the commit graph caught fire. After a quiet summer and a slow ramp through the fall, January flipped a switch — and February just kept going. 429 contributions in the last year, and most of the green is piled into the last eight weeks.&lt;/p&gt;&#xA;&lt;p&gt;The result? Four new services joining the three I &lt;a href=&#34;https://vitorpontual.com/posts/new-self-hosted-services/&#34;&gt;launched earlier this month&lt;/a&gt;. That brings the total to seven self-hosted, AI-powered tools running on my home infrastructure — no cloud subscriptions, no API bills, all local.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Putting Claude to Work!</title>
      <link>https://vitorpontual.com/posts/new-self-hosted-services/</link>
      <pubDate>Wed, 11 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://vitorpontual.com/posts/new-self-hosted-services/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been busy building AI-powered tools on my self-hosted infrastructure. Today I&amp;rsquo;m adding three new services to the site:&lt;/p&gt;&#xA;&lt;h2 id=&#34;newsfeed&#34;&gt;Newsfeed&lt;/h2&gt;&#xA;&lt;p&gt;An AI-powered news aggregator inspired by Kevin Rose&amp;rsquo;s Nylon project. It pulls from 800+ RSS feeds, uses embeddings to cluster related articles into multi-source stories, and uses an LLM to generate headlines and syntheses. A Gravity Engine scores stories based on source count, author reputation, and my personal interests.&lt;/p&gt;&#xA;&lt;h2 id=&#34;ollama-fleet-manager&#34;&gt;Ollama Fleet Manager&lt;/h2&gt;&#xA;&lt;p&gt;A dashboard and intelligent proxy for managing multiple Ollama GPU servers. It monitors model states, VRAM usage, and system metrics in real time. The proxy emulates the standard Ollama API, routing requests to the optimal server — prioritizing those with the model already loaded.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Ollama Fleet Manager</title>
      <link>https://vitorpontual.com/services/ollama-fleet-manager/</link>
      <pubDate>Mon, 09 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://vitorpontual.com/services/ollama-fleet-manager/</guid>
      <description>&lt;p&gt;A dashboard and intelligent proxy for managing a fleet of Ollama GPU servers. Point any Ollama client at the proxy and it routes requests to the best available server automatically, with no application changes needed.&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Intelligent request routing&lt;/strong&gt; — prioritizes servers with the model already loaded, then servers with the model on disk, then the server with the most free VRAM&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Real-time fleet monitoring&lt;/strong&gt; — server status, loaded models, VRAM usage, CPU/GPU temperature, memory, disk, and uptime&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Usage analytics&lt;/strong&gt; — request volume, success rates, latency percentiles, and breakdowns by model, source, and server over 24h/7d/30d windows&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Request aggregation&lt;/strong&gt; — &lt;code&gt;/api/tags&lt;/code&gt;, &lt;code&gt;/api/ps&lt;/code&gt;, and &lt;code&gt;/v1/models&lt;/code&gt; combine responses from all servers into a single list&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Scheduled jobs&lt;/strong&gt; — cron-based model scheduling with conflict detection across the fleet&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Telegram alerts&lt;/strong&gt; — server offline/online, overheating, low memory, and reboot notifications&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Plugin system&lt;/strong&gt; — extensible architecture for community plugins&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;OpenAI API compatible&lt;/strong&gt; — supports &lt;code&gt;/v1/*&lt;/code&gt; endpoints so OpenAI-compatible tools work out of the box&lt;/li&gt;&#xA;&lt;/ul&gt;</description>
    </item>
  </channel>
</rss>
