<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Piece by Piece]]></title><description><![CDATA[Nir Adler personal blog, tech related, and open source projects.]]></description><link>https://blog.niradler.com</link><image><url>https://cdn.hashnode.com/uploads/logos/5fa47dee3e634314b5179767/efe24754-e1b4-423a-bde7-91b47a2be6e7.png</url><title>Piece by Piece</title><link>https://blog.niradler.com</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 14 May 2026 14:06:48 GMT</lastBuildDate><atom:link href="https://blog.niradler.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Postgres Already Is Your Backend]]></title><description><![CDATA[AI agents are unreasonably good at writing SQL. They get specific error messages, fix their own mistakes in one retry, and can introspect query performance with EXPLAIN ANALYZE before you even ask. No]]></description><link>https://blog.niradler.com/postgres-already-is-your-backend</link><guid isPermaLink="true">https://blog.niradler.com/postgres-already-is-your-backend</guid><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Mon, 04 May 2026 20:56:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fa47dee3e634314b5179767/cfd83984-3782-45d3-9840-493d7271d795.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI agents are unreasonably good at writing SQL. They get specific error messages, fix their own mistakes in one retry, and can introspect query performance with EXPLAIN ANALYZE before you even ask. No other programming interface gives an agent that kind of feedback loop. 
So when you learn that PostgreSQL extensions now cover GraphQL APIs, job queues, cron scheduling, JWT auth, and HTTP webhooks, all through SQL, a question starts forming: what if the entire backend was just a Postgres database that an agent could build and operate?</p>
<p>That's not hypothetical. Supabase runs 1.5 million databases this way. Most of them power full production applications with auth, APIs, background jobs, and access control. The application server? There isn't one. It's PostgreSQL all the way down. And the extension ecosystem has quietly matured to the point where this approach isn't a prototype or a hack. It's a real architecture, and it happens to be the one AI agents are best equipped to work with.</p>
<h2>The Extension Stack</h2>
<p>The reference point here is <a href="https://github.com/supabase/postgres">supabase/postgres</a>, which packages a curated set of extensions into a single Postgres distribution. You don't have to use Supabase's hosted platform to benefit from this. The image is open source, and the extensions work on any Postgres instance. But looking at what they've bundled tells you exactly what a "Postgres as backend" stack looks like in practice.</p>
<p><strong>pg_graphql</strong> turns your database schema into a GraphQL API automatically. You define your tables, your foreign keys, your constraints, and pg_graphql reflects them as a fully queryable GraphQL schema, resolved inside the database through the <code>graphql.resolve()</code> SQL function. No resolvers, no schema stitching, no code generation step. Add a column, and the API updates. It handles filtering, pagination, ordering, and relationships out of the box. If you've ever spent three days writing boilerplate CRUD resolvers in a Node.js backend, this will feel like cheating. And because the entire schema is defined in SQL, an agent can create tables, add relationships, and immediately test the resulting API through the same database connection.</p>
<p>For REST, there's <strong>PostgREST</strong>, a standalone web server (not an extension) that does the same thing over HTTP. Define a table, get a REST API. It respects your Row Level Security policies, so authorization is handled at the database layer. The API is surprisingly complete: you get bulk inserts, upserts, resource embedding for joins, and content negotiation. It's been in production at serious scale for years.</p>
<p><strong>pg_cron</strong> gives you cron scheduling inside PostgreSQL. Schedule a function to run every night at 2 AM, every five minutes, every Monday. The syntax is standard cron. The jobs execute as SQL, which means they can do anything your database can do: aggregate data, clean up old records, refresh materialized views, call external services through pg_net. No separate scheduler service, no Lambda functions, no Kubernetes CronJobs. An agent can set up a scheduled job with a single <code>SELECT cron.schedule()</code> call, verify it's registered, and monitor its execution, all without leaving SQL.</p>
<p><strong>pgmq</strong> is a message queue built as a Postgres extension, originally created by Tembo. It gives you exactly what you'd use Redis or RabbitMQ for: publish messages, consume them, handle visibility timeouts, dead letter queues. The difference is that your queue lives in the same database as your data, participates in the same transactions, and doesn't require a separate infrastructure component. For most applications that aren't processing millions of messages per second, this is more than enough.</p>
<p><strong>pg_net</strong> lets Postgres make HTTP requests. This is the piece that closes the loop. Your database can now call external APIs, send webhooks, trigger serverless functions. Combine it with pg_cron and you have a fully autonomous system that can poll APIs, process data, and push results without any application code running anywhere.</p>
<h2>Auth Without an Auth Service</h2>
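<p>The most underappreciated part of the Postgres-as-backend approach is how naturally it handles authorization. Row Level Security (RLS) is a built-in Postgres feature, not an extension, that lets you define access policies directly on tables. Once RLS is enabled on a table, writing <code>CREATE POLICY user_data ON profiles USING (auth.uid() = user_id)</code> tells Postgres: no matter how a role subject to the policy queries this table, it only sees its own rows. (<code>auth.uid()</code> is a helper from the Supabase stack that returns the user ID from the request's JWT; it is not part of core Postgres.) This isn't application-level filtering that someone might forget to add. Postgres applies the policy to every query against the table, inside the database itself. Table owners and superusers bypass policies by default, so API traffic should run as a separate, unprivileged role.</p>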
<p>Pair RLS with <strong>pgjwt</strong>, which can generate and verify JSON Web Tokens inside PostgreSQL, and you have a complete auth flow. The JWT contains the user's role and ID, Postgres verifies it on every request, and RLS policies use that identity to filter data. PostgREST and pg_graphql both pass the JWT through, so your API layer is stateless and your authorization logic lives exactly where your data lives.</p>
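<p>To make the token mechanics concrete, here is a minimal Python sketch of HS256 signing and verification, the same kind of operation pgjwt performs inside the database. The function names, the secret, and the claims are illustrative assumptions, not pgjwt's API, and a real verifier would also check expiry and the declared algorithm.</p>
<pre><code class="language-python">import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(claims, secret):
    """Build an HS256-signed token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify_jwt(token, secret):
    """Return the claims if the signature checks out, otherwise None."""
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), signature):
        return None
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical claims and secret, for illustration only.
token = sign_jwt({"sub": "user-123", "role": "authenticated"}, "demo-secret")
print(verify_jwt(token, "demo-secret"))   # the claims dict
print(verify_jwt(token, "wrong-secret"))  # None
</code></pre>
<p>The <code>sub</code> claim is the identity an RLS policy would compare against <code>user_id</code>.</p>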
<p>This is the kind of architecture that would normally require an Express server, a Passport.js configuration, a middleware chain, and a few hundred lines of authorization code scattered across your route handlers. Here it's a few SQL statements. Statements that an agent can write, test, and iterate on with immediate feedback about whether the policy does what it should.</p>
<h2>Why This Is an Agent-Native Architecture</h2>
<p>The deeper you look at this stack, the more you realize it's not just convenient for AI agents. It's structurally ideal.</p>
<p>SQL has an exceptionally tight feedback loop. An agent writes a query, runs it, and immediately knows if it worked. The error messages are specific and actionable: <code>column "user_naem" does not exist</code> tells the agent exactly what to fix. Compare this to debugging a REST API integration where the failure mode might be a silent 200 response with unexpected data. An agent working with SQL can often self-correct in a single retry. It reads the error, fixes the typo or adjusts the join, and tries again. Most queries converge on the correct result within one or two iterations.</p>
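<p>The retry loop is easy to sketch. The example below uses Python's built-in <code>sqlite3</code> as a stand-in for Postgres so it runs anywhere; the table, the queries, and the <code>run_with_retry</code> helper are assumptions for illustration, and the list of candidate queries stands in for an agent revising its SQL after reading each error.</p>
<pre><code class="language-python">import sqlite3


def run_with_retry(conn, attempts):
    """Try candidate queries in order; return the rows plus every error seen."""
    errors = []
    for sql in attempts:
        try:
            return conn.execute(sql).fetchall(), errors
        except sqlite3.Error as exc:
            # The message names the offending column, which is what
            # lets the next attempt fix exactly the right thing.
            errors.append(str(exc))
    return None, errors


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users (user_name) VALUES ('ada')")

rows, errors = run_with_retry(conn, [
    "SELECT user_naem FROM users",   # first attempt, with a typo
    "SELECT user_name FROM users",   # corrected after reading the error
])
print(errors)
print(rows)
</code></pre>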
<p>Then there's <code>EXPLAIN ANALYZE</code>. An agent can not only write a query but immediately evaluate its performance, see the query plan, identify sequential scans, and rewrite for efficiency. This kind of introspection is rare in programming interfaces. You don't get an equivalent of "here's exactly how your code will execute and how long each step takes" in most languages. With Postgres, it's one command away.</p>
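<p>Here is a rough sketch of that introspection loop, again with <code>sqlite3</code> as a stand-in. SQLite's <code>EXPLAIN QUERY PLAN</code> only reports the plan, not the per-step timings that Postgres's <code>EXPLAIN ANALYZE</code> adds, but the workflow is the same: read the plan, spot the scan, add an index, check again. The table and index names are made up.</p>
<pre><code class="language-python">import sqlite3


def plan(conn, sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
query = "SELECT id FROM users WHERE email = 'ada@example.com'"

before = plan(conn, query)  # reports a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query)   # reports a search using the new index

print(before)
print(after)
</code></pre>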
<p>The extension ecosystem multiplies this advantage. An agent that can write SQL can also create RLS policies, schedule cron jobs, enqueue messages, define GraphQL schemas, and set up webhooks. All through the same interface. All with the same feedback loop. All correctable in the same way. You're not asking the agent to context-switch between a Terraform config, a YAML deployment manifest, a JavaScript middleware function, and a SQL migration. It's SQL the whole way through.</p>
<p>This matters because the bottleneck with AI agents isn't intelligence, it's interface friction. Every time an agent has to switch between tools, formats, or paradigms, error rates go up. A Postgres-native backend compresses the entire surface area into one language that agents already handle well.</p>
<h2>What This Actually Looks Like</h2>
<p>Picture a SaaS application. Users sign up, manage projects, get daily email digests, and can query their data through a dashboard. In a traditional stack, you'd need an API server (Express, FastAPI, Rails), a background job processor (Sidekiq, Celery, Bull), a message queue (Redis, SQS), an auth service (Auth0, Clerk, or hand-rolled), and a cron scheduler (CloudWatch Events, Kubernetes CronJobs).</p>
<p>With the Postgres extension stack, your migration files <em>are</em> your backend. You write <code>CREATE TABLE</code> for your schema. <code>CREATE POLICY</code> for your auth rules. <code>SELECT cron.schedule()</code> for your daily digest job. A <code>pgmq</code> queue for async processing. pg_graphql for your dashboard API. pg_net to call your email provider. The "application code" is SQL, and it lives in version-controlled migration files.</p>
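<p>As a rough illustration, a migration for this app might look like the list below, wrapped in a few lines of Python so it can be applied through any database driver. The table, policy, queue, and job names are invented for the example, <code>auth.uid()</code> assumes the Supabase helper, and the statements assume pg_cron and pgmq are installed. The sketch only records the statements instead of talking to a real database.</p>
<pre><code class="language-python"># Each entry is one SQL statement; together they sketch the "backend".
MIGRATION = [
    "CREATE TABLE projects (id bigserial PRIMARY KEY, user_id uuid NOT NULL, name text NOT NULL)",
    "ALTER TABLE projects ENABLE ROW LEVEL SECURITY",
    "CREATE POLICY own_projects ON projects USING (auth.uid() = user_id)",
    "SELECT pgmq.create('digest_jobs')",
    "SELECT cron.schedule('daily-digest', '0 7 * * *', "
    "$$SELECT pgmq.send('digest_jobs', '{\"kind\": \"digest\"}'::jsonb)$$)",
]


def apply_migration(execute, statements):
    """Run each statement in order through any execute-style callable."""
    for statement in statements:
        execute(statement)
    return len(statements)


# Stand-in for cursor.execute so the sketch runs without a database.
log = []
applied = apply_migration(log.append, MIGRATION)
print(applied, "statements applied")
</code></pre>
<p>With a real driver you would pass a cursor's <code>execute</code> method instead of <code>log.append</code>.</p>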
<p>Now point an AI agent at that database. It can scaffold the schema, write the security policies, set up the background jobs, and test everything, all in one session, all in one language. Try doing that across Express, Redis, Auth0, and CloudWatch.</p>
<p>Is this right for every application? No. If you're building a real-time multiplayer game or processing video streams, you need application servers. But for the vast majority of web applications, the ones that are fundamentally CRUD with some business logic and background processing, the Postgres extension ecosystem covers more ground than most developers realize.</p>
]]></content:encoded></item><item><title><![CDATA[Your Next Startup: The Agentic Workflow Engine]]></title><description><![CDATA[n8n has over 400 integrations. Zapier claims 7,000+. Every single one was hand-built, tested against a moving API, and will eventually break when that API ships a v2. The entire workflow automation in]]></description><link>https://blog.niradler.com/your-next-startup-the-agentic-workflow-engine</link><guid isPermaLink="true">https://blog.niradler.com/your-next-startup-the-agentic-workflow-engine</guid><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sun, 03 May 2026 22:28:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fa47dee3e634314b5179767/ac0da5eb-91b7-4825-87a4-caa3d463a722.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>n8n has over 400 integrations. Zapier claims 7,000+. Every single one was hand-built, tested against a moving API, and will eventually break when that API ships a v2. The entire workflow automation industry runs on a treadmill: build connectors, fix connectors, rebuild connectors. Meanwhile, the real shift in software, agents that reason and act autonomously, is happening outside these platforms entirely.</p>
<p>Here is a startup idea that kills the integration treadmill and builds for the agentic era at the same time.</p>
<h2>The Integration Tax</h2>
<p>Every workflow platform eventually becomes an integration company. n8n, Zapier, Make, Pipedream, Tray.io, they all spend enormous engineering effort building and maintaining connectors. Each connector is a custom adapter: someone reads the API docs for Salesforce or HubSpot or Stripe, writes the mapping code, handles auth, pagination, error states, rate limiting, and then maintains it forever. When Stripe changes a webhook payload or HubSpot deprecates an endpoint, that connector needs an update. Multiply that by thousands and you have a permanent tax on the business.</p>
<p>This made sense in a world where the consumer of these workflows was a human dragging boxes on a canvas. Humans need pre-built nodes with nice dropdown menus and labeled fields. But agents don't need any of that. An agent can read a spec.</p>
<h2>Specs Are the New Connectors</h2>
<p>MCP (Model Context Protocol) gives you a standardized way to describe tools that AI agents can call. OpenAPI specs already describe REST APIs in machine-readable format. GraphQL has introspection built into the protocol. Every serious API already publishes at least one of these.</p>
<p>Instead of building a Stripe connector, you point the engine at Stripe's OpenAPI spec. Instead of maintaining a Slack integration, you register Slack's MCP server. The engine reads the spec, understands the available operations, their parameters, their auth requirements, and generates step configurations automatically. You get instant support for any service that publishes a spec, which is effectively every service worth integrating with.</p>
<p>The step configuration layer works like this: given a spec (MCP, OpenAPI, or GraphQL schema), the engine extracts available operations, input/output schemas, and authentication flows. It builds typed step definitions that can be used in flows, both by agents constructing flows dynamically and by humans editing them in a UI. When the upstream API changes and publishes an updated spec, the step definitions update automatically. No connector code to maintain. No integration team racing to keep up.</p>
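<p>A minimal sketch of that extraction for the OpenAPI case might look like this. The step shape (<code>id</code>, <code>method</code>, <code>path</code>, <code>inputs</code>, <code>auth</code>) is an assumption about what such an engine could store, and the spec is a tiny hand-written fragment, not a real vendor spec.</p>
<pre><code class="language-python">HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def steps_from_openapi(spec):
    """Turn each OpenAPI operation into a flat step definition."""
    auth = list(spec.get("components", {}).get("securitySchemes", {}))
    steps = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            steps.append({
                "id": op.get("operationId", f"{method}_{path}"),
                "method": method.upper(),
                "path": path,
                "inputs": [p["name"] for p in op.get("parameters", [])],
                "auth": auth,
            })
    return steps


# Hand-written spec fragment for illustration.
spec = {
    "openapi": "3.0.0",
    "components": {"securitySchemes": {"bearerAuth": {"type": "http", "scheme": "bearer"}}},
    "paths": {
        "/charges": {
            "post": {"operationId": "createCharge", "parameters": []},
            "get": {"operationId": "listCharges",
                    "parameters": [{"name": "limit", "in": "query"}]},
        }
    },
}

steps = steps_from_openapi(spec)
for step in steps:
    print(step["id"], step["method"], step["path"], step["inputs"])
</code></pre>
<p>Re-running the same function against an updated spec is what "the step definitions update automatically" would amount to in practice.</p>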
<h2>Built for Enterprise Scale</h2>
<p>n8n runs as a single Node.js process. You can add workers, bolt on Redis, configure queue mode, but at its core you're scaling an application server. For a team running a few hundred workflows, that's fine. For an enterprise running tens of thousands of concurrent flows across dozens of teams, it's a ceiling you'll hit hard.</p>
<p>The engine should be Kubernetes-native from day one. Each step in a flow runs as its own container. The orchestrator schedules steps across the cluster, scales horizontally without configuration, retries failed steps independently, and handles massive parallelism because that's just what Kubernetes does. There's no single bottleneck, no shared process, no "upgrade to our enterprise tier for more workers." You scale by adding nodes to your cluster, the same way you scale everything else in your infrastructure.</p>
<p>This matters for enterprises specifically because their workflows aren't simple "if this then that" chains. They're complex graphs with fan-out, fan-in, conditional branching, human approval gates, and SLA requirements. Running these on a single process with a queue is like running your CI/CD pipeline on a cron job. It technically works until it doesn't.</p>
<h2>Agentic First, Not Agentic Bolted On</h2>
<p>Here's the part that makes this more than just "n8n with better infrastructure." The engine is designed from the ground up for flows where agents are participants, not just triggers.</p>
<p>A traditional workflow is fully deterministic. Step A calls an API, passes the result to Step B, which transforms it, passes it to Step C. Every input and output is known at design time. These workflows are valuable and this engine supports them through the spec-driven step system described above. An MCP client or HTTP client executes the call, the output schema is known, validation is straightforward.</p>
<p>But the interesting workflows combine deterministic and non-deterministic steps. An agent step takes context from previous steps, reasons about it, decides what to do, and produces output that might vary every time. Maybe an agent reads a customer support ticket, decides which internal team should handle it, drafts a response, and flags whether the issue needs escalation. The reasoning is non-deterministic. The actions it takes (filing a Jira ticket, sending a Slack message, updating a CRM record) are deterministic steps driven by specs.</p>
<p>The engine lets you compose these freely. You can follow an agent step with a validation step that checks the output against a schema or a set of business rules before the flow continues. You can have an agent step that dynamically constructs a sub-flow by selecting which spec-driven steps to execute based on its reasoning. You can insert human review gates after agent steps for high-stakes decisions. The flow graph becomes a mix of reliable, typed, spec-driven operations and flexible, reasoning-capable agent nodes, with explicit boundaries between them.</p>
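<p>The validation step is the simplest piece to sketch. Below, a hypothetical triage agent's output is checked against a small hand-rolled schema before the flow decides where to go next; the field names, allowed teams, and step names are all assumptions for illustration.</p>
<pre><code class="language-python">def validate_output(output, schema):
    """Return a list of problems; an empty list means the flow may continue."""
    problems = []
    for field, rule in schema.items():
        if field not in output:
            problems.append(f"missing field: {field}")
        elif not isinstance(output[field], rule["type"]):
            problems.append(f"wrong type for {field}")
        elif "allowed" in rule and output[field] not in rule["allowed"]:
            problems.append(f"unexpected value for {field}: {output[field]}")
    return problems


# Hypothetical schema for the support-ticket triage example.
TRIAGE_SCHEMA = {
    "team": {"type": str, "allowed": {"billing", "platform", "security"}},
    "draft_reply": {"type": str},
    "escalate": {"type": bool},
}

# Pretend this came back from the agent step.
agent_output = {"team": "marketing", "draft_reply": "Thanks for reaching out.", "escalate": False}

problems = validate_output(agent_output, TRIAGE_SCHEMA)
next_step = "human_review" if problems else "file_ticket"
print(next_step, problems)
</code></pre>
<p>The agent is free to reason however it likes, but only output that passes the check reaches the deterministic steps.</p>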
<p>This is what enterprises actually need. Not a chatbot that can call tools, but a governed system where agent behavior is observable, auditable, and constrained by the flow structure around it. The deterministic steps give you predictability and compliance. The agent steps give you flexibility and intelligence. The validation steps give you guardrails.</p>
<h2>Why Agents Should Build the Flows</h2>
<p>The last piece is that agents aren't just participants in flows, they're also builders. Because every integration is spec-driven, an agent can browse available specs, understand what operations are possible, and assemble a flow from them. A product manager describes what they need in natural language: "When a deal closes in HubSpot over $50k, create a Jira onboarding epic, notify the CS team in Slack, and schedule a kickoff in Google Calendar." The agent reads the relevant specs, constructs the flow graph with the right steps, parameters, and connections, and presents it for review.</p>
<p>This works because the specs are self-describing. The agent doesn't need to know how your custom n8n node for HubSpot works. It reads HubSpot's OpenAPI spec, finds the deal webhook and the relevant endpoints, and wires them up. The same agent can build flows against any service you've registered, even internal APIs, as long as they publish a spec.</p>
<p>For enterprises with hundreds of internal services, this is transformative. Instead of waiting for an integration team to build a connector for your internal billing API, you publish an OpenAPI spec for it and every agent in the system can immediately build flows against it.</p>
<h2>The Opportunity</h2>
<p>The workflow automation market is large and growing, but every incumbent is anchored to the connector model. They can't easily abandon it because their entire value proposition is "we integrate with everything, pre-built." Switching to a spec-driven approach would mean admitting that the thing they spent years building is now a liability.</p>
<p>A new entrant doesn't carry that baggage. You start with zero connectors and infinite integrations. You ship a platform that scales like infrastructure, governs agent behavior like an enterprise demands, and lets teams build workflows by describing what they need rather than dragging boxes around.</p>
<p>The wedge is any enterprise already deep in Kubernetes that has outgrown n8n or hit the limits of Zapier's enterprise tier. They already think in containers and specs. They just need a workflow engine that thinks the same way.</p>
<p>Build it.</p>
]]></content:encoded></item><item><title><![CDATA[Your Next Startup: The Auth Layer for the Agentic Era]]></title><description><![CDATA[Part of the "Your Next Startup" series, where I break down startup ideas I think are worth building.
Auth0 sold for $6.5B. Okta is worth $15B+. CyberArk, Delinea, BeyondTrust, all printing money fro]]></description><link>https://blog.niradler.com/your-next-startup-the-auth-layer-for-the-agentic-era</link><guid isPermaLink="true">https://blog.niradler.com/your-next-startup-the-auth-layer-for-the-agentic-era</guid><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sun, 03 May 2026 22:03:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fa47dee3e634314b5179767/02c8bca4-6536-4f03-bef2-dd7f52838732.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Part of the "Your Next Startup" series, where I break down startup ideas I think are worth building.</em></p>
<p>Auth0 sold for $6.5B. Okta is worth $15B+. CyberArk, Delinea, BeyondTrust, all printing money from managing who gets access to what. And yet none of them were built for what's actually happening right now.</p>
<p>Your Slack bot spins up a Claude agent that calls your billing API, which triggers a background job that queries production, which fires a webhook to Stripe, all because a user pressed a button twenty minutes ago and has since gone to lunch. Who authorized that database query? With what permissions? For how long? Can you prove it to an auditor?</p>
<p>Be honest. The answer at most companies is that there's probably a long-lived API key in an environment variable somewhere. It has way more permissions than it needs. Nobody remembers who created it. It never expires.</p>
<p>That's the startup.</p>
<h2>Human Auth is Cooked (in a Good Way)</h2>
<p>Here's something that was controversial two years ago but is obvious now. Traditional human authentication is a commodity. Login with email, OAuth, TOTP, passkeys, magic links. Solved problems with a dozen good implementations. In the age of AI-assisted development, spinning up solid human auth takes hours, not months. Claude Code can scaffold a complete auth integration with social login, MFA, and session management faster than you can read Auth0's docs.</p>
<p>The moat in auth is no longer "we make login work." The moat is everything that happens after login, and everything that happens without a login at all.</p>
<p>Look at how identity actually works at a modern company. Humans get proper auth. SSO, MFA, RBAC, audit trails. Well solved. Services get environment variables. An API key minted six months ago by someone who may have left the company. Permissions? Whatever the key has. Expiry? Never. AI agents get the worst deal of all. They inherit the human's complete credentials. Your AI assistant that only needs to read today's calendar can delete every event you've ever created. No scoping, no time-bounding, no containment.</p>
<p>The idea here is a single primitive that covers all three. Every access event, whether it's a human logging in, a service calling an API, or an agent performing a task, goes through the same thing: an Access Grant. Time-bound, scope-limited, auditable. Who wants access, what do they want to touch, for how long, who approved it, and what happened during the grant. That's the whole abstraction. HashiCorp Vault does credential minting. OPA does policy evaluation. Teleport does infrastructure access. Nobody unifies these into a single grant that works the same way for humans, agents, and services.</p>
<h2>Auth is the Best Guardrail You Have</h2>
<p>Here's the conceptual leap that makes this more than just another IAM startup. Auth and agent guardrails are the same problem viewed from different angles.</p>
<p>Everyone is building guardrails at the prompt layer. Don't say bad things, don't hallucinate, stay in character. But the most effective guardrail is the simplest one: if the agent doesn't have permission to do something, it can't do it. No prompt injection, no jailbreak, no cleverness overrides the fact that the credential simply doesn't exist yet. You constrain agent behavior at the infrastructure level, which is deterministic, instead of the language model level, which is fundamentally unreliable. The agent can say whatever it wants about what it intends to do. It can only do what the access layer allows.</p>
<p>When an agent hits a permission boundary, the system doesn't just return a 403 and kill the flow. It initiates an access request. This is what I'd call Human-in-the-Loop for Auth. Same pattern as "human-in-the-loop" for AI decisions, but for permissions. The agent needs to access customer PII? Instead of either blocking it or giving it a permanent key, the system pings a human approver in Slack. "Agent billing-reconciler wants read access to prod.customers for 5 minutes, triggered by monthly-reconcile, approve?" Engineer approves, a scoped credential is minted, the agent does its work, the credential dies, everything is logged.</p>
<p>The agent is contained, not killed. The work continues, just with appropriate oversight. This is radically different from today where the agent either has permanent access or no access at all.</p>
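<p>Here is one way the Access Grant and the approval step could fit together, as a minimal sketch. The <code>AccessGrant</code> fields mirror the questions above (who, what, how long, who approved); the <code>request_access</code> helper and the approver callback are hypothetical, with the callback standing in for a Slack prompt.</p>
<pre><code class="language-python">from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AccessGrant:
    principal: str        # who wants access
    resource: str         # what they want to touch
    scope: str            # e.g. "read"
    expires_at: datetime  # when the credential dies
    approved_by: str      # who signed off

    def is_active(self, now):
        # Active only while "now" is strictly earlier than the expiry.
        return min(now, self.expires_at) != self.expires_at


def request_access(principal, resource, scope, minutes, approver, now):
    """Ask a human approver; mint a time-bound grant only on approval."""
    question = (f"Agent {principal} wants {scope} access to {resource} "
                f"for {minutes} minutes, approve?")
    decision = approver(question)  # returns the approver's name, or None
    if decision is None:
        return None
    expires_at = now + timedelta(minutes=minutes)
    return AccessGrant(principal, resource, scope, expires_at, decision)


now = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
grant = request_access("billing-reconciler", "prod.customers", "read", 5,
                       lambda question: "alice", now)
print(grant.is_active(now + timedelta(minutes=4)))  # True
print(grant.is_active(now + timedelta(minutes=6)))  # False
</code></pre>
<p>A denied request returns <code>None</code>, so the agent pauses at the boundary without ever holding a credential.</p>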
<h2>MCP, Tool-Level Policies, and the Credential Gateway</h2>
<p>Think about MCP for a second. The Model Context Protocol is essentially a standard for agent tool use. An agent connects to an MCP server and gets access to a set of tools: read a database, search files, create a ticket, send an email. Today, when you connect an agent to an MCP server, it typically gets access to all the tools on that server. There's no per-tool policy. There's no "this agent can use read-ticket but not delete-ticket." There's no "this agent can use send-email only during business hours and only to internal addresses."</p>
<p>Per-tool-call policy enforcement is a natural extension of this platform. Every MCP tool invocation goes through the access grant system. The policy engine evaluates whether this agent has permission to call this specific tool, with these specific parameters, right now. The answer might be auto-approve, might require human approval, might be deny. This turns the platform into the control plane for agent capabilities. Not just "can this agent access this resource" but "can this agent perform this action through this tool with these parameters at this time." That's a complete permission boundary for agentic systems.</p>
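<p>A per-tool-call check can be sketched in a few lines. The policy table, agent name, and tool names below are hypothetical; the point is that the decision depends on the agent, the tool, and the parameters of this specific call, and that anything without an explicit rule is denied.</p>
<pre><code class="language-python">ALLOW, ASK, DENY = "auto-approve", "require-approval", "deny"

# Hypothetical rules: (agent, tool) pairs map to a decision function.
POLICIES = {
    ("support-bot", "read-ticket"): lambda params: ALLOW,
    ("support-bot", "send-email"): lambda params: (
        ALLOW if params.get("to", "").endswith("@example.com") else ASK
    ),
}


def evaluate(agent, tool, params):
    """Decide one tool call; anything without an explicit rule is denied."""
    rule = POLICIES.get((agent, tool))
    return rule(params) if rule else DENY


print(evaluate("support-bot", "read-ticket", {"id": 42}))
print(evaluate("support-bot", "send-email", {"to": "customer@elsewhere.test"}))
print(evaluate("support-bot", "delete-ticket", {"id": 42}))
</code></pre>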
<p>Now layer the Credential Gateway on top. When you give an AI agent an API key, even a short-lived one, that key exists in the agent's context. It can leak in a response, get extracted via prompt injection, get logged or cached. The moment a credential touches an agent's context window your security boundary is the LLM. And LLMs are not a security boundary.</p>
<p>The gateway ensures agents never hold real credentials at all. They get an opaque internal token that means nothing outside your system. When the agent needs to call Stripe or GitHub or your internal API, it goes through the gateway. The gateway validates the token, checks the policy engine, swaps it for the real external credential, forwards the request, and returns the response. The agent calls gateway.yourplatform.com/stripe/charges with its internal token. The gateway translates that to api.stripe.com/charges with the actual Stripe key. The agent never knows the key exists.</p>
<p>This gives you credential isolation, request-level enforcement, response filtering so you can strip sensitive fields before they reach the LLM, instant revocation without rotating external keys, full observability of what every agent does across all external services, and even sandbox mode where the gateway returns mock responses instead of hitting real services. Same agent code, same auth flow, zero risk in development. Think of it as a reverse proxy, but for auth. The same way a load balancer sits between the internet and your services, the Credential Gateway sits between your agents and the outside world.</p>
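<p>The token swap at the heart of the gateway can be sketched without any networking. The token store, the upstream table, and the placeholder secret below are assumptions for illustration; a real gateway would also run the policy check and forward the request.</p>
<pre><code class="language-python"># Hypothetical stores: tokens the gateway issued, and the real upstream keys.
INTERNAL_TOKENS = {
    "tok_internal_abc": {"agent": "billing-reconciler", "services": {"stripe"}},
}
UPSTREAMS = {
    "stripe": {"base_url": "https://api.stripe.com",
               "secret": "real-stripe-key-placeholder"},
}


def translate(path, internal_token):
    """Map a gateway request to the upstream request, or None if not allowed."""
    session = INTERNAL_TOKENS.get(internal_token)
    service, _, rest = path.lstrip("/").partition("/")
    if session is None or service not in session["services"]:
        return None  # unknown token, or a service outside the grant
    upstream = UPSTREAMS[service]
    return {
        "url": f"{upstream['base_url']}/{rest}",
        "headers": {"Authorization": f"Bearer {upstream['secret']}"},
    }


request = translate("/stripe/charges", "tok_internal_abc")
print(request["url"])
print(translate("/github/repos", "tok_internal_abc"))
</code></pre>
<p>The real key only ever appears in the outgoing request the gateway builds, never in anything handed back to the agent.</p>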
<h2>Delegated Mode, Background Mode, and Trust Escalation</h2>
<p>There's a subtle but important distinction in agent permissions. When a user triggers an agent directly ("check my calendar and find a meeting time") the agent is operating as a delegate of that user. It makes sense for it to have the user's calendar permissions, scoped to read-only, time-bound to that interaction.</p>
<p>But when that same agent runs a background job at 3 AM processing data or running reports, it's not acting on behalf of anyone. It shouldn't have any user's permissions. It should have its own identity with its own policy-governed grants. Today this distinction doesn't exist. Agents either run as the user, which means too much access in background mode, or as a service account, which means permanent unscoped access in interactive mode. Delegated mode versus background mode should be a first-class concept in the platform.</p>
<p>And then there's trust escalation. A new agent starts with zero trust. Every access request needs human approval. After 10 approved requests of the same type the policy auto-upgrades to auto-approve. The agent earns trust, like a new employee does. If something anomalous happens, trust resets.</p>
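<p>Trust escalation is mostly bookkeeping. A minimal sketch, assuming a threshold of 10 approvals per agent and request type and a blunt reset on anomaly; the <code>TrustLedger</code> class and its method names are hypothetical.</p>
<pre><code class="language-python">from collections import Counter


class TrustLedger:
    """Tracks approvals per (agent, request type) and decides who must approve."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.approved = Counter()

    def needs_human(self, agent, request_type):
        # Still on probation while the approval count is below the threshold.
        return self.approved[(agent, request_type)] in range(self.threshold)

    def record_approval(self, agent, request_type):
        self.approved[(agent, request_type)] += 1

    def reset(self, agent):
        """Drop all earned trust for an agent after an anomaly."""
        for key in [k for k in self.approved if k[0] == agent]:
            del self.approved[key]


ledger = TrustLedger()
for _ in range(10):
    ledger.record_approval("billing-reconciler", "read:prod.customers")
print(ledger.needs_human("billing-reconciler", "read:prod.customers"))  # False
ledger.reset("billing-reconciler")
print(ledger.needs_human("billing-reconciler", "read:prod.customers"))  # True
</code></pre>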
<h2>Where This Goes</h2>
<p>What excites me about this idea is how many directions it can expand. The starting point is narrow: give your AI agents secure time-bound access to your database with Slack approval. One integration, immediate value, clear before and after. But it naturally grows into MCP auth, per-tool policy enforcement, agent guardrails, credential gateways, sandbox environments, compliance automation, trust scoring. Each of these could be its own feature or its own product.</p>
<p>The architecture can lean on strong existing infrastructure. SPIFFE and SPIRE for workload identity, where agents get cryptographic identity based on what they are, not what secret they hold. Cedar or OPA for policy evaluation. Vault or OpenBao for credential minting. Temporal for durable approval workflows. Envoy for the gateway proxy. PostgreSQL for audit. What the startup builds is the orchestration layer, the UX, and the product intelligence on top.</p>
<p>The market straddles IAM (Okta, Auth0, Clerk at $20B+, focused on humans) and PAM (CyberArk, BeyondTrust at $5B+, built for enterprise IT). Agent Access Management as a category doesn't formally exist yet. The wedge is developer teams building AI agent products who need to answer one question: how do I give my agent access to production without giving it the keys to the kingdom?</p>
<p>Auth0 gave every app a login page. This startup gives every agent a permission boundary. Somebody should build it.</p>
<p><em>Ideas I think are worth building. Some I'll build, some I hope you will.</em></p>
]]></content:encoded></item><item><title><![CDATA[The Third Interface: Your Next User Doesn't Have Hands]]></title><description><![CDATA[For as long as software has existed, we've been building two doors into our systems. Door one: the UI, a carefully designed surface where humans point, click, and occasionally rage-quit. Door two: the]]></description><link>https://blog.niradler.com/the-third-interface-your-next-user-doesn-t-have-hands</link><guid isPermaLink="true">https://blog.niradler.com/the-third-interface-your-next-user-doesn-t-have-hands</guid><category><![CDATA[agentic AI]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sat, 02 May 2026 20:54:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fa47dee3e634314b5179767/16b57ae1-e025-4111-b14b-3ce92add1d41.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For as long as software has existed, we've been building two doors into our systems. Door one: the <strong>UI</strong>, a carefully designed surface where humans point, click, and occasionally rage-quit. Door two: the <strong>API</strong>, a structured contract where machines exchange JSON and get work done without a single pixel rendered.</p>
<p>These two paradigms served us well. But something happened in the last eighteen months that neither was designed for: a third kind of consumer showed up. Not a human navigating your dashboard. Not a script calling your REST endpoint with a hardcoded bearer token. Something in between, an <em>agent</em>, that reasons, improvises, discovers capabilities on the fly, and decides what to do next based on context it gathered three tool calls ago.</p>
<p>And we are, collectively, building for it all wrong.</p>
<h2>The Two Doors (and Why They're Not Enough)</h2>
<p><strong>UI-first</strong> solved friction for humans. You obsess over information hierarchy, button placement, loading states. The assumption: your user has eyes, a cursor, and the ability to look at something confusing and figure it out through visual context clues.</p>
<p><strong>API-first</strong> enabled programmatic access. You define schemas, version endpoints, write docs, publish SDKs. The assumption: the consumer is a developer who reads your documentation, writes integration code once, and maintains it.</p>
<p>Both share a hidden assumption: <strong>someone already knows what your system can do before they use it</strong>. The human read the onboarding flow. The developer read the API docs.</p>
<p>Agents don't do either of those things. An agent lands in your system like a new employee on day one, except this employee can read every document in the company in 200 milliseconds but can't find the bathroom unless someone labeled the door clearly.</p>
<h2>The Third Door Is Already Wide Open</h2>
<p>The agent interface isn't one technology. It's an entire surface area that emerged in parallel across protocols, runtimes, CLIs, and sandboxes, all converging on the same idea: software needs a machine-native entry point that isn't just "the API with a prompt wrapper."</p>
<p><strong>Protocols.</strong> MCP went from an Anthropic side project in late 2024 to an industry standard governed by the Linux Foundation, adopted by OpenAI, Google, Microsoft, and AWS, pulling nearly 100 million monthly SDK downloads. Meanwhile, researchers are already pushing past it. The ANX protocol out of Hangzhou combines CLI, skills, and MCP into a unified agent-native framework, reporting 47-55% token reduction over MCP-based skill approaches alone. The protocol layer is maturing fast.</p>
<p><strong>Skills and tool registries.</strong> Agents don't just call endpoints. They discover <em>capabilities</em>. OpenAI's updated Agents SDK introduced progressive disclosure via skills, custom instructions via <code>AGENTS.md</code> files, and standardized tool primitives. The pattern is the same everywhere: give agents a structured way to understand what your system can do, what parameters it expects, and when to use each capability. Skills are the new API docs, except the reader isn't human.</p>
<p><strong>CLIs.</strong> The command line became the agent's native habitat almost by accident. Claude Code, Codex CLI, Gemini CLI, Aider, Goose: these aren't IDE plugins. They're agent runtimes that happen to let humans watch. The CLI is structured, scriptable, parseable, and composable. It turns out that everything we built for power users over the last 30 years is exactly what agents need. Consistent <code>--help</code> output, JSON response formats, composable subcommands: this is agent UX.</p>
<p><strong>Sandboxes.</strong> When agents write and execute code, they need somewhere safe to do it. This went from a niche concern to a full market in under a year. E2B, Daytona, Blaxel, Cloudflare Dynamic Workers, Together Code Sandbox, OpenAI's sandbox agents: all shipping microVM or isolate-based infrastructure where agents get filesystems, shell access, and mounted data rooms. Cloudflare's approach is particularly telling. Their MCP server exposes the entire Cloudflare API through just two tools, <code>search</code> and <code>execute</code>, because agents write code against a typed API inside a sandbox instead of navigating hundreds of individual tool definitions.</p>
<p>The infrastructure exists. The question is whether your product is legible to the things using it.</p>
<h2>What Agents Actually Need</h2>
<p>Most teams think "building for agents" means exposing their existing API with a good system prompt. That's like thinking "building for mobile" meant making the desktop site smaller.</p>
<p><strong>Discoverability over documentation.</strong> Agents don't browse your developer portal. They receive a list of tools with names, descriptions, and parameter schemas, then decide what to call based on how well those descriptions match the user's intent. Your tool descriptions aren't metadata. They're your entire UX. If your tool is called <code>batchProcessResourceModification</code> with a description that says "Processes resources," you've built the agent equivalent of a button labeled "Click Here" that navigates to a 404.</p>
<p><strong>Composability over completeness.</strong> APIs mirror your internal domain model: Users, Invoices, clean RESTful nouns. Agents don't care about your domain model. They care about <em>tasks</em>. Don't expose your object model. Expose your capability surface. Think verbs, not nouns.</p>
<p><strong>Structured errors over status codes.</strong> When your API returns <code>422</code>, a developer reads the message, checks the docs, fixes the request. An agent needs to understand exactly what went wrong and what to try differently, in a structured format, not prose. A <code>suggestion</code> field in your error response isn't for humans. It's a hint that saves a retry and a reasoning cycle.</p>
<p><strong>Guardrails as a feature.</strong> Agents will try things humans never would, not out of malice, but out of optimization. An agent asked to "clean up staging" might decide the most efficient path is to delete everything and re-provision. Technically correct. Operationally catastrophic. OpenAI's sandbox architecture makes this explicit: the harness (approvals, audit logs, recovery) never trusts the sandbox (file ops, code execution). Your agent interface needs the same separation. Define what the agent can do. Define what it must ask about first. Make the boundary machine-readable.</p>
<p><strong>Context windows are the new rate limits.</strong> Every tool description, every schema, every response payload competes for space in the agent's context window. If your MCP server exposes 200 tools with verbose descriptions, you've burned thousands of tokens before the agent does anything useful. Fewer, smarter tools beat a sprawling catalog every time.</p>
]]></content:encoded></item><item><title><![CDATA[Your Database Already Has an Authorization System. You're Just Not Using It.]]></title><description><![CDATA[Every backend developer has written this line a thousand times:
SELECT * FROM orders WHERE user_id = $1

And this one:
DELETE FROM invoices WHERE id = $2 AND user_id = $1

And somewhere, deep in the]]></description><link>https://blog.niradler.com/your-database-already-has-an-authorization-system-you-re-just-not-using-it</link><guid isPermaLink="true">https://blog.niradler.com/your-database-already-has-an-authorization-system-you-re-just-not-using-it</guid><category><![CDATA[postgres]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[authorization]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sat, 02 May 2026 20:21:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fa47dee3e634314b5179767/1dacdd54-88a7-4956-a397-17b27def49c0.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every backend developer has written this line a thousand times:</p>
<pre><code class="language-sql">SELECT * FROM orders WHERE user_id = $1
</code></pre>
<p>And this one:</p>
<pre><code class="language-sql">DELETE FROM invoices WHERE id = $2 AND user_id = $1
</code></pre>
<p>And somewhere, deep in the codebase, someone forgot the <code>AND user_id = $1</code> part. Maybe it was a new hire. Maybe it was you at 11pm before a deadline. The query works fine in testing because your test user happens to own the right data. It ships. And three months later you find out that any logged-in user could read every invoice in the system.</p>
<p>This is not a hypothetical. Broken access control, the category this bug belongs to, ranks first in the OWASP Top 10 (2021). You're relying on every single query, in every single endpoint, written by every single developer, to always remember to filter by the right user. That's not a security model. That's a prayer.</p>
<p>PostgreSQL has a built-in feature that makes this entire class of bug impossible. It's called <strong>Row-Level Security (RLS)</strong>, and after reading this post, you'll wonder why you ever did it any other way.</p>
<hr />
<h2>Now Think About Who's Actually Writing Your Queries</h2>
<p>AI is generating a growing share of our backend code. Copilot autocompletes queries. Claude writes data access layers. Cursor refactors endpoints. The code works, tests pass, it ships.</p>
<p>But here's the reality: AI-generated queries are often complex. Joins across five tables, nested CTEs, multi-branch subqueries. The kind of SQL where a missing WHERE clause doesn't jump out at you during review. And the volume keeps growing. Reviewing every generated query for authorization correctness is a losing game.</p>
<p>And it's not just code generation. More and more teams are connecting AI agents directly to their databases, letting them build and run queries at runtime based on user input. That trend is only going to accelerate. These queries don't exist in your codebase. You can't review them in a PR. They're constructed on the fly.</p>
<p>Authorization is the one domain where "almost always correct" isn't good enough. One missed filter in one query is a data leak.</p>
<p>RLS takes this off your plate entirely. The database enforces access rules regardless of what the query looks like or where it came from. Define the rules once, and every query gets scoped automatically. Whether it was written by a developer, generated by Copilot, or constructed by an agent at runtime. It's one of those rare things you can genuinely delegate and stop thinking about.</p>
<pre><code class="language-sql">-- At the start of every agent turn, inside a transaction
BEGIN;
SELECT set_config('app.current_user_id', '&lt;uuid-from-jwt&gt;', true);
SET LOCAL ROLE authenticated;  -- LOCAL: the role reverts when the transaction ends

-- Now let the agent run whatever it generates
-- RLS makes it impossible to access unauthorized data
COMMIT;
</code></pre>
<p>The agent operates inside a sandbox it can't escape, not because you told it nicely, but because the database won't let it. That's a fundamentally different security posture than hoping your prompt engineering holds up.</p>
<hr />
<h2>How It Actually Works</h2>
<p>RLS has three building blocks: <strong>Roles</strong>, <strong>Grants</strong>, and <strong>Policies</strong>. They work as layers, each one narrowing what's possible.</p>
<h3>Roles: Who Are You?</h3>
<p>PostgreSQL doesn't separate "users" and "groups." It has one concept: the <strong>role</strong>. A role can log in (acts like a user), or it can be a container that other roles inherit from (acts like a group).</p>
<p>In practice, many RLS setups land on the same three roles:</p>
<p><code>anon</code> is the unauthenticated visitor. Almost no access. Exists so that requests without a valid token fail loudly instead of silently returning empty results.</p>
<p><code>authenticated</code> is any logged-in user (or any agent acting on behalf of a logged-in user). Can read and write tables, but every operation passes through RLS policies first.</p>
<p><code>service_role</code> bypasses RLS entirely. Reserved for migrations, background jobs, and admin scripts. Never exposed to end users. Never given to an agent.</p>
<pre><code class="language-sql">CREATE ROLE anon NOLOGIN;
CREATE ROLE authenticated NOLOGIN;
CREATE ROLE service_role NOLOGIN BYPASSRLS;

CREATE ROLE app_api LOGIN PASSWORD '...' NOINHERIT;
GRANT anon, authenticated, service_role TO app_api;
</code></pre>
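<p>That <code>NOINHERIT</code> is critical. Without it, <code>app_api</code> automatically inherits every privilege granted to the roles it belongs to the moment it connects, so a request could run with the combined grants of all three roles without ever choosing one. (Role attributes such as <code>BYPASSRLS</code> are never inherited; they only take effect after an explicit <code>SET ROLE</code> to the role that has them.) With <code>NOINHERIT</code>, your app must explicitly <code>SET ROLE authenticated</code> per request. This is the gate that makes the whole model work.</p>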
<h3>Grants: What Can You Do?</h3>
<p>Grants control which operations a role can perform on which objects. Can this role SELECT from this table at all? Can it INSERT?</p>
<pre><code class="language-sql">GRANT USAGE ON SCHEMA app TO authenticated;
GRANT SELECT, INSERT, UPDATE, DELETE ON app.documents TO authenticated;
-- No grants to anon = anon can't touch the table, period
</code></pre>
<p>If a role doesn't have SELECT on a table, RLS never even comes into play. Grants are the outer wall. RLS is the room-by-room access control inside the building.</p>
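<p>One way to confirm the outer wall is where you think it is: ask Postgres directly. This is a minimal sketch, assuming the <code>app.documents</code> table and the roles created above; <code>has_table_privilege</code> is a built-in function.</p>
<pre><code class="language-sql">-- Check table-level grants for each role before relying on RLS
SELECT has_table_privilege('anon', 'app.documents', 'SELECT')          AS anon_can_select,
       has_table_privilege('authenticated', 'app.documents', 'SELECT') AS authenticated_can_select,
       has_table_privilege('authenticated', 'app.documents', 'DELETE') AS authenticated_can_delete;
</code></pre>
<p>If <code>anon_can_select</code> comes back true, a grant to <code>anon</code> or to <code>PUBLIC</code> exists somewhere, and it is worth finding before any policy work.</p>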
<h3>Policies: Which Rows Can You See?</h3>
<p>This is the main event. A policy is a rule attached to a table that Postgres injects into every query as an invisible WHERE clause.</p>
<pre><code class="language-sql">ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

CREATE POLICY users_own_documents ON documents
  FOR SELECT TO authenticated
  USING (owner_id = app.current_user_id());
</code></pre>
<p>Now when any <code>authenticated</code> role runs <code>SELECT * FROM documents</code>, Postgres silently rewrites it to <code>SELECT * FROM documents WHERE owner_id = '&lt;current-user&gt;'</code>. Every time. Whether the query came from your application, a raw SQL console, or an AI agent that decided to get creative.</p>
<p>The <code>FORCE</code> part matters. Without it, the table owner bypasses RLS by default. Add it, or your migration user can see everything.</p>
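<p>Because a forgotten <code>ENABLE</code> or <code>FORCE</code> fails silently, an audit query can help. The sketch below reads the <code>pg_class</code> catalog and lists tables where either flag is off. The schema name <code>app</code> is an assumption; adjust it to wherever your tables live.</p>
<pre><code class="language-sql">-- List tables that are missing ENABLE or FORCE ROW LEVEL SECURITY
SELECT n.nspname             AS schema_name,
       c.relname             AS table_name,
       c.relrowsecurity      AS rls_enabled,
       c.relforcerowsecurity AS rls_forced
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')   -- ordinary and partitioned tables
  AND n.nspname = 'app'         -- assumption: application tables live here
  AND NOT (c.relrowsecurity AND c.relforcerowsecurity)
ORDER BY c.relname;
</code></pre>
<p>An empty result means every table in the schema has both flags set. Running it in CI after migrations is one option.</p>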
<hr />
<h2>USING vs WITH CHECK: The Two Guards</h2>
<p>Every policy has up to two expressions that do different jobs.</p>
<p><code>USING</code> filters which existing rows you can see or target. It's your read filter. Applied to SELECT, and to the "before" state of UPDATE and DELETE.</p>
<p><code>WITH CHECK</code> validates new or modified rows after a write. It catches sneaky mutations.</p>
<p>Why do you need both? Without WITH CHECK, someone (or some agent) could do:</p>
<pre><code class="language-sql">INSERT INTO documents (owner_id, title)
VALUES ('someone-elses-id', 'Gotcha');
</code></pre>
<p>USING wouldn't catch it because there's no existing row to filter. WITH CHECK rejects the write because the new row doesn't belong to the current user. For UPDATE, both work together: USING controls which rows you can touch, WITH CHECK ensures you can't mutate a row into something you shouldn't own.</p>
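<p>As a sketch of how the two guards pair up per command, here are write policies for the same <code>documents</code> table. The policy names are made up for illustration; the clauses are standard <code>CREATE POLICY</code> syntax.</p>
<pre><code class="language-sql">-- INSERT: only WITH CHECK applies, since there is no existing row to filter
CREATE POLICY users_insert_own_documents ON documents
  FOR INSERT TO authenticated
  WITH CHECK (owner_id = app.current_user_id());

-- UPDATE: USING selects the rows that may be touched,
-- WITH CHECK validates the row as it looks after the change
CREATE POLICY users_update_own_documents ON documents
  FOR UPDATE TO authenticated
  USING (owner_id = app.current_user_id())
  WITH CHECK (owner_id = app.current_user_id());

-- DELETE: only USING applies, since no new row is produced
CREATE POLICY users_delete_own_documents ON documents
  FOR DELETE TO authenticated
  USING (owner_id = app.current_user_id());
</code></pre>
<p>With these in place, the <code>INSERT</code> above fails with a row-level security violation instead of quietly creating a row for someone else.</p>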
<hr />
<h2>Permissive vs Restrictive: Building Walls and Doors</h2>
<p>Postgres has two flavors of policy, and they combine with different logic.</p>
<p><strong>Permissive</strong> (the default) policies are ORed together. If any one of them says yes, the row is visible. These are doors. Each one is a way in.</p>
<p><strong>Restrictive</strong> policies are ANDed on top of the permissive result. They can only narrow access, never widen it. These are walls. They hold no matter how many doors you add.</p>
<p>This is where multi-tenancy gets elegant:</p>
<pre><code class="language-sql">-- WALL: tenant isolation. Always enforced.
CREATE POLICY tenant_wall ON documents AS RESTRICTIVE
  FOR ALL TO authenticated
  USING (tenant_id = app.current_tenant_id())
  WITH CHECK (tenant_id = app.current_tenant_id());

-- DOOR: see your own docs within your tenant
CREATE POLICY own_docs ON documents
  FOR SELECT USING (owner_id = app.current_user_id());

-- DOOR: see public docs within your tenant
CREATE POLICY public_docs ON documents
  FOR SELECT USING (is_public = true);
</code></pre>
<p>The effective filter: <code>(own_docs OR public_docs) AND tenant_wall</code></p>
<p>You can add ten more permissive doors and the tenant wall still holds. A developer (or an agent) can never accidentally create a path that leaks data across tenants. The restrictive layer is a structural guarantee, not a convention.</p>
<p>One thing to watch: if you only have restrictive policies and zero permissive ones, nothing is visible. Restrictive narrows the permissive set. An empty set ANDed with anything is still empty. You always need at least one door.</p>
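<p>To see which walls and doors a table actually has, the built-in <code>pg_policies</code> view lists them. A small sketch, assuming the table is named <code>documents</code>:</p>
<pre><code class="language-sql">-- One row per policy; the permissive column reads PERMISSIVE or RESTRICTIVE
SELECT policyname,
       permissive,
       roles,
       cmd,          -- SELECT, INSERT, UPDATE, DELETE, or ALL
       qual,         -- the USING expression
       with_check    -- the WITH CHECK expression
FROM pg_policies
WHERE tablename = 'documents'
ORDER BY permissive, policyname;
</code></pre>
<p>If every row says <code>RESTRICTIVE</code>, you have walls and no doors, which explains an unexpectedly empty result set.</p>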
<hr />
<h2>The Plumbing</h2>
<p>Policies reference <code>app.current_user_id()</code>, but how does Postgres know who the current user is? Through session variables that your application sets at the start of each request.</p>
<pre><code class="language-sql">CREATE FUNCTION app.current_user_id() RETURNS uuid
LANGUAGE sql STABLE AS $$
  SELECT nullif(current_setting('app.current_user_id', true), '')::uuid;
$$;
</code></pre>
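<p>The tenant wall earlier calls <code>app.current_tenant_id()</code>, which can follow the same pattern. This is a sketch; the setting name <code>app.current_tenant_id</code> is an assumption, chosen to mirror the user setting.</p>
<pre><code class="language-sql">-- Same shape as app.current_user_id(): read a session setting, treat
-- a missing or empty value as NULL, and cast the rest to uuid
CREATE FUNCTION app.current_tenant_id() RETURNS uuid
LANGUAGE sql STABLE AS $$
  SELECT nullif(current_setting('app.current_tenant_id', true), '')::uuid;
$$;
</code></pre>
<p>When the setting is absent, the function returns NULL, the policy comparison evaluates to NULL, and no rows match. A request that forgets to set its tenant sees nothing rather than everything.</p>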
<p>Your backend (or your agent runtime) sets context before running any queries:</p>
<pre><code class="language-sql">BEGIN;
  SELECT set_config('app.current_user_id', '&lt;uuid-from-jwt&gt;', true);
  SET ROLE authenticated;

  -- queries go here, RLS is active

COMMIT;  -- settings revert, safe for connection pooling
</code></pre>
<p>The <code>true</code> in <code>set_config</code> means "transaction-local." When the transaction ends, the setting vanishes. No state leaks between requests, even with connection pooling. This is what makes it safe to share a single Postgres connection pool across thousands of users (or thousands of agent sessions).</p>
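<p>A quick way to convince yourself that nothing leaks is to run a throwaway transaction and look at the session afterwards. The sketch below uses a placeholder UUID and <code>SET LOCAL ROLE</code>, so the role is scoped to the transaction in the same way as the setting.</p>
<pre><code class="language-sql">BEGIN;
  -- Placeholder user id, for illustration only
  SELECT set_config('app.current_user_id', '00000000-0000-0000-0000-000000000001', true);
  SET LOCAL ROLE authenticated;

  SELECT current_user;             -- authenticated
  SELECT count(*) FROM documents;  -- counts only rows the policies allow
ROLLBACK;

-- After the transaction, the connection is back to its original state
SELECT current_user;                                  -- the login role again
SELECT current_setting('app.current_user_id', true);  -- NULL or an empty string
</code></pre>
<p>The last query is the reason <code>app.current_user_id()</code> wraps the setting in <code>nullif(..., '')</code>: once a custom setting has been used in a session, it can read back as an empty string rather than NULL.</p>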
<hr />
<h2>Teams and Shared Access</h2>
<p>Real apps aren't just "users own rows." You need teams, organizations, shared workspaces. The pattern is clean:</p>
<pre><code class="language-sql">CREATE FUNCTION app.my_team_ids() RETURNS SETOF uuid
LANGUAGE sql STABLE SECURITY DEFINER SET search_path = public AS $$
  SELECT team_id FROM team_members
  WHERE user_id = app.current_user_id();
$$;

CREATE POLICY team_access ON projects
  FOR SELECT TO authenticated
  USING (team_id IN (SELECT app.my_team_ids()));
</code></pre>
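<p><code>SECURITY DEFINER</code> means the function runs with the privileges of the function owner rather than the caller, so the lookup is not filtered by the caller's RLS policies on the <code>team_members</code> table itself (provided the owner is exempt from them). This avoids a circular dependency. Always set <code>search_path</code> explicitly on SECURITY DEFINER functions, so a caller can't redirect unqualified names to objects in a schema they control.</p>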
<p>Want role-based permissions? Layer it. Viewers can read. Members can write. Admins can delete. Same table, different policies, each checking the user's team role.</p>
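<p>One possible shape for that layering is sketched below. It assumes <code>team_members</code> has a text column called <code>member_role</code> holding <code>'viewer'</code>, <code>'member'</code>, or <code>'admin'</code>. That column, the helper function, and the policy names are all hypothetical.</p>
<pre><code class="language-sql">-- Teams where the current user holds one of the allowed roles
CREATE FUNCTION app.my_team_ids_with_role(allowed text[]) RETURNS SETOF uuid
LANGUAGE sql STABLE SECURITY DEFINER SET search_path = public AS $$
  SELECT team_id FROM team_members
  WHERE user_id = app.current_user_id()
    AND member_role = ANY (allowed);
$$;

-- Members and admins can update projects in their teams
CREATE POLICY team_members_update ON projects
  FOR UPDATE TO authenticated
  USING (team_id IN (SELECT app.my_team_ids_with_role(ARRAY['member', 'admin'])))
  WITH CHECK (team_id IN (SELECT app.my_team_ids_with_role(ARRAY['member', 'admin'])));

-- Only admins can delete
CREATE POLICY team_admins_delete ON projects
  FOR DELETE TO authenticated
  USING (team_id IN (SELECT app.my_team_ids_with_role(ARRAY['admin'])));
</code></pre>
<p>Viewers need nothing extra here: the <code>team_access</code> policy above already covers reads for every team member, and with no UPDATE or DELETE policy matching them, their writes are rejected by default.</p>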
<hr />
<h2>Why RLS Matters More Now Than Ever</h2>
<p>We're entering an era where the code accessing your database isn't fully written by humans anymore. Agents generate queries. Copilots autocomplete SQL. Low-code platforms abstract away the data layer. The surface area for "someone forgot the WHERE clause" is exploding.</p>
<p>The traditional approach of scattering authorization checks across application code was already fragile. In an agentic world, it's untenable. You can't review dynamically generated queries. You can't unit test every possible natural language input mapped to SQL. You can't guarantee that an LLM will always respect a system prompt that says "only access the current user's data."</p>
<p>But you can guarantee that the database won't return unauthorized rows. That's not a prompt. That's not a convention. That's a constraint enforced by the database engine on every query, from every source, every time.</p>
<p>RLS turns your database from a dumb store that trusts its callers into an active participant in your security model. It doesn't replace application logic. It catches everything your application logic misses. And when an AI agent is the one writing the application logic at runtime, that safety net isn't optional anymore.</p>
]]></content:encoded></item><item><title><![CDATA[A Brief History of Kubernetes Fleet Controllers & Essential Features]]></title><description><![CDATA[Building Scalable Multi-Cluster Systems with Reusable Components
Managing a handful of Kubernetes clusters is difficult but manageable. Managing hundreds or thousands of clusters is a fundamentally different problem. At that scale, Kubernetes stops b...]]></description><link>https://blog.niradler.com/a-brief-history-of-kubernetes-fleet-controllers-and-essential-features</link><guid isPermaLink="true">https://blog.niradler.com/a-brief-history-of-kubernetes-fleet-controllers-and-essential-features</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[containers]]></category><category><![CDATA[fleet management]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sat, 07 Feb 2026 14:47:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/oHgwYC79ARo/upload/d99b04df294cc0dfeb2898ef848d98ce.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-building-scalable-multi-cluster-systems-with-reusable-components">Building Scalable Multi-Cluster Systems with Reusable Components</h2>
<p>Managing a handful of Kubernetes clusters is difficult but manageable. Managing hundreds or thousands of clusters is a <strong>fundamentally different problem</strong>. At that scale, Kubernetes stops being "infrastructure you run" and becomes a <strong>system you must design</strong>.</p>
<p>This article is a companion to my <strong>ContainerDays</strong> talk. It walks through why Kubernetes fleets are inevitable, where most teams go wrong, how fleet controllers emerged, and how to evaluate and choose the right tools using a practical framework. We'll do a fast-paced review of five open-source tools - <strong>Clusternet</strong>, <strong>Karmada</strong>, <strong>Crossplane</strong>, <strong>Cluster API</strong>, and <strong>Rancher</strong> - and show how each addresses different multi-cluster management challenges.</p>
<hr />
<h2 id="heading-from-clusters-to-fleets-how-we-got-here">From Clusters to Fleets: How We Got Here</h2>
<p>Nobody sets out to build a Kubernetes fleet. Teams usually start with one cluster, one environment, and a small number of services. Then reality arrives: more regions, more environments, more teams, sometimes per-tenant clusters. Infrastructure costs drop. Automation improves. Kubernetes makes scale accessible.</p>
<p>Nothing breaks. And yet everything changes.</p>
<p>At some point, adding "just one more cluster" doesn't feel free anymore. Coordination overhead grows. Consistency erodes. Visibility fragments. Operational toil creeps in. This isn't a Kubernetes problem - it's a systems problem.</p>
<hr />
<h2 id="heading-why-fleet-problems-are-predictable">Why Fleet Problems Are Predictable</h2>
<p>The key insight is simple: <strong>infrastructure scales linearly, but complexity does not</strong>. Every new cluster adds policies, permissions, upgrades, observability pipelines, and human coordination. This is why fleets appear suddenly and feel overwhelming. Teams didn't fail - they <strong>reached the next stage of maturity</strong>.</p>
<hr />
<h2 id="heading-why-this-feels-familiar-to-developers">Why This Feels Familiar to Developers</h2>
<p>If you've ever worked on a large codebase, this story should sound familiar. <strong>Kubernetes fleets fail for the same reasons software systems fail</strong>.</p>
<p>Think of it this way:</p>
<ul>
<li><strong>Clusters</strong> behave like services</li>
<li><strong>YAML</strong> becomes an untyped API</li>
<li><strong>Helm charts</strong> act like shared libraries</li>
<li><strong>Configuration drift</strong> looks exactly like forked code</li>
<li><strong>Snowflake clusters</strong> are just technical debt by another name</li>
</ul>
<p>The core failure mode is always the same: <strong>too much reuse, too late, without abstraction</strong>. Teams copy YAML. They templatize configurations. They standardize Helm charts. And then every cluster needs "just one exception." Standardization slows divergence - it doesn't prevent it.</p>
<hr />
<h2 id="heading-the-shift-from-devops-to-platform-engineering">The Shift: From DevOps to Platform Engineering</h2>
<p>At fleet scale, <strong>infrastructure is no longer a task - it's a product</strong>. This is where many organizations shift from <strong>DevOps as operators</strong> to <strong>platform engineering as enablers</strong>. The goal changes from "manage clusters" to "enable teams through self-service, automation, and clear abstractions."</p>
<p>When the ratio of clusters to the engineers running them approaches 100:1, you can't afford to have engineers manually configuring each one. Standardization, observability, security, and access control become pressing issues that demand automation. <strong>Fleet controllers emerge as a direct response to this shift</strong>.</p>
<hr />
<h2 id="heading-understanding-controllers-in-kubernetes">Understanding Controllers in Kubernetes</h2>
<p>Before diving into fleet controllers, let's understand what a <strong>controller</strong> is in Kubernetes. A controller is a <strong>control loop</strong> that watches the state of your cluster through the API server and makes changes to move the current state toward the desired state.</p>
<p>Controllers work with <strong>Custom Resource Definitions (CRDs)</strong> to extend Kubernetes capabilities. For example, imagine you want to manage a fleet of speakers at a conference. You could define a CRD for <code>Speaker</code> resources, and a single <code>Speaker</code> object created from it might look like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">conference.example.com/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Speaker</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">john-doe</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">topic:</span> <span class="hljs-string">"Kubernetes Fleet Management"</span>
  <span class="hljs-attr">duration:</span> <span class="hljs-number">45</span>
  <span class="hljs-attr">room:</span> <span class="hljs-string">"main-hall"</span>
  <span class="hljs-attr">requiredEquipment:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">microphone</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">projector</span>
<span class="hljs-attr">status:</span>
  <span class="hljs-attr">assigned:</span> <span class="hljs-literal">false</span>
  <span class="hljs-attr">equipmentReady:</span> <span class="hljs-literal">false</span>
</code></pre>
<p>A speaker controller would continuously watch for <code>Speaker</code> resources and ensure the actual state matches the desired state. When a new <code>Speaker</code> is created, the controller might:</p>
<ol>
<li><p>Check room availability and assign the speaker</p>
</li>
<li><p>Verify required equipment is available</p>
</li>
<li><p>Send calendar invites to attendees</p>
</li>
<li><p>Update the status to reflect current state</p>
</li>
<li><p>Handle conflicts or resource constraints</p>
</li>
</ol>
<p>If the speaker's room changes, the controller detects the difference between desired state (<code>spec</code>) and current state (<code>status</code>), then takes action to reconcile them. This same pattern applies at fleet scale - <strong>fleet controllers watch cluster resources and continuously reconcile state across hundreds or thousands of clusters</strong>.</p>
<hr />
<h2 id="heading-what-is-a-fleet-controller-really">What Is a Fleet Controller (Really)?</h2>
<p>A fleet controller follows the exact same control-loop model as a standard Kubernetes controller - it just <strong>operates across clusters</strong> instead of inside a single cluster.</p>
<p>If a regular controller watches resources within one cluster and reconciles their state, a <strong>fleet controller watches many clusters and reconciles their collective state</strong> against a desired configuration.</p>
<p>Think of it as moving up one level:</p>
<p><strong>Controller:</strong>
“Given this desired state, make this cluster match it.”</p>
<p><strong>Fleet controller:</strong>
“Given this desired state, make all these clusters match it.”</p>
<p>Instead of reconciling Pods, Services, or custom resources, a fleet controller reconciles things like:</p>
<ul>
<li><p>Which clusters exist and are healthy</p>
</li>
<li><p>Which workloads should run on which clusters</p>
</li>
<li><p>Which policies apply globally vs regionally</p>
</li>
<li><p>How upgrades, failures, and drift are handled across the fleet</p>
</li>
</ul>
<p>From an implementation perspective, a fleet controller is <strong>not magic</strong> and <strong>not a single product</strong>. It is a higher-level control plane built from familiar Kubernetes primitives: APIs, CRDs, controllers, and reconciliation loops - just applied at fleet scale.</p>
<p>Common responsibilities include:</p>
<ul>
<li><p>Cluster registration and inventory</p>
</li>
<li><p>Declarative propagation of workloads and policies</p>
</li>
<li><p>Lifecycle automation (create, upgrade, decommission clusters)</p>
</li>
<li><p>Observability, governance, and drift detection</p>
</li>
<li><p>High availability and failover across clusters</p>
</li>
</ul>
<p>These are not “advanced” features.
<strong>Once you operate more than a handful of clusters, they become survival features</strong>.</p>
<hr />
<h2 id="heading-the-big-mistake-treating-tools-as-competitors">The Big Mistake: Treating Tools as Competitors</h2>
<p>One of the most common mistakes teams make is asking <strong>"which fleet management tool should we choose?"</strong> This is the <strong>wrong question</strong>. Fleet management is layered, and different tools solve different layers of the problem.</p>
<p>However, this doesn't mean you should adopt every tool that exists. There's a balance between <strong>composition</strong> and <strong>consolidation</strong>. Adding a new tool to your stack should be justified by real pain points and clear value. More importantly, your tools must <strong>work in synergy with each other</strong> - not in conflict.</p>
<p>Before adopting a new tool, ask: <strong>does this solve a problem we actually have?</strong> Does it integrate well with our existing stack? Are we introducing unnecessary complexity? The goal is a composed platform where each tool has a clear purpose and they work together cohesively, not a fragmented collection of competing solutions.</p>
<hr />
<h2 id="heading-the-three-layers-of-fleet-management">The Three Layers of Fleet Management</h2>
<h3 id="heading-layer-1-infrastructure-cluster-lifecycle">Layer 1: Infrastructure - Cluster Lifecycle</h3>
<p>This layer answers <strong>how clusters are created, upgraded, and how consistency is enforced</strong>.</p>
<p><strong>Cluster API</strong> gives you declarative, repeatable cluster lifecycle management. Instead of scripts and manual processes, clusters become versioned resources. Features like <code>ClusterClass</code> let organizations define reusable cluster templates - the same way you'd define a base class in software. Other teams rely on managed Kubernetes offerings combined with automation and policy tooling. The key shift is that <strong>clusters stop being snowflakes and start being managed resources</strong> with lifecycle, ownership, and consistency built in.</p>
<h3 id="heading-layer-2-platform-abstraction-and-reuse">Layer 2: Platform - Abstraction and Reuse</h3>
<p>This is where most teams struggle - and where the <strong>biggest wins exist</strong>. This layer answers <strong>how teams consume infrastructure, how complexity is hidden, and how intent is expressed</strong>.</p>
<p><strong>Crossplane</strong> lets you expose infrastructure and services as Kubernetes APIs, using compositions to define reusable intent. <strong>kro</strong> helps teams define Kubernetes-native abstractions for applications and environments. Many organizations also build lighter-weight internal platforms using CRDs, controllers, and policy engines like <strong>Kyverno</strong> or <strong>OPA</strong>. What matters is that <strong>teams consume intent - not raw YAML</strong>.</p>
<h3 id="heading-layer-3-application-delivery-and-distribution">Layer 3: Application - Delivery and Distribution</h3>
<p>This layer answers <strong>how workloads are packaged, deployed across clusters, and how changes are rolled out safely</strong>. Tools like <strong>Helm</strong>, <strong>Kustomize</strong>, <strong>Argo CD</strong>, and <strong>Flux</strong> are commonly used here. GitOps works extremely well at this layer, but it's important to be honest: <strong>GitOps delivers workloads - it does not define infrastructure abstractions or solve fleet architecture</strong>.</p>
<hr />
<h2 id="heading-five-open-source-fleet-tools-a-practical-review">Five Open-Source Fleet Tools: A Practical Review</h2>
<p>Let's look at five tools that address multi-cluster management from different angles. Each has unique strengths in provisioning, management, and application support.</p>
<h3 id="heading-clusternet">Clusternet</h3>
<p><strong>Clusternet</strong> is a lightweight, Kubernetes-native multi-cluster management platform. It focuses on managing clusters as a fleet by providing a hub-agent architecture where child clusters register with a parent hub. Its strength is <strong>workload distribution</strong> - you can define scheduling policies to deploy applications across clusters based on labels, regions, or custom rules. Clusternet treats multi-cluster workload orchestration as a first-class concern and is a good fit for teams that need to distribute applications across many clusters without heavy infrastructure investment.</p>
<h3 id="heading-karmada">Karmada</h3>
<p><strong>Karmada</strong> (Kubernetes Armada) is built specifically for multi-cloud and multi-cluster orchestration. It extends Kubernetes APIs to work across clusters, so you can use familiar resources like Deployments and Services while Karmada handles the propagation and scheduling across your fleet. Its <code>PropagationPolicy</code> and <code>OverridePolicy</code> resources give fine-grained control over where and how workloads land. Karmada shines when you need <strong>cross-cluster failover, replica scheduling, and policy-based distribution at scale</strong>. It's one of the most feature-complete open-source options for fleet-wide workload management.</p>
<h3 id="heading-crossplane">Crossplane</h3>
<p><strong>Crossplane</strong> takes a fundamentally different approach. Rather than managing clusters directly, it <strong>turns infrastructure into Kubernetes APIs</strong> through Compositions and Claims. Teams define what they need using custom resources, and Crossplane provisions the underlying infrastructure - cloud resources, databases, clusters, anything with a provider. At the fleet level, Crossplane is invaluable as a platform layer tool: it enables <strong>self-service infrastructure consumption</strong> and enforces organizational standards through compositions. It doesn't orchestrate workloads across clusters, but it's the <strong>best open-source option for building platform abstractions</strong>.</p>
<h3 id="heading-cluster-api">Cluster API</h3>
<p><strong>Cluster API</strong> (CAPI) focuses squarely on the <strong>lifecycle of Kubernetes clusters themselves</strong>. It lets you declaratively create, configure, upgrade, and destroy clusters using the Kubernetes API. With <code>ClusterClass</code>, you can define reusable cluster templates - think of it as <strong>inheritance for your infrastructure</strong>. This means new clusters are consistent by default, upgrades are version-controlled, and provisioning is repeatable across clouds. Cluster API is the <strong>go-to tool at the infrastructure layer</strong> when you need to manage cluster lifecycle at scale.</p>
<h3 id="heading-rancher">Rancher</h3>
<p><strong>Rancher</strong> by SUSE is the <strong>most complete platform in this list</strong> - it provides a full management plane for Kubernetes clusters across any infrastructure. It handles cluster provisioning, centralized authentication, monitoring, policy enforcement, and application catalog management through a single UI and API. Rancher is often the <strong>first fleet management tool organizations adopt</strong> because it offers immediate visibility and control. Its strength is <strong>breadth</strong>: it covers lifecycle, observability, security, and app delivery in one package, making it a pragmatic starting point for teams that need results quickly.</p>
<h3 id="heading-how-these-tools-map-to-the-three-layers">How These Tools Map to the Three Layers</h3>
<p>No single tool covers all three layers perfectly. Here's where each tool provides the most value:</p>
<p><strong>Infrastructure Layer (Lifecycle):</strong> <strong>Cluster API</strong> and <strong>Rancher</strong> are strongest here. CAPI for declarative lifecycle-as-code, Rancher for centralized management with a UI.</p>
<p><strong>Platform Layer (Abstraction):</strong> <strong>Crossplane</strong> dominates this space. It's purpose-built for turning infrastructure into consumable APIs.</p>
<p><strong>Application Layer (Distribution):</strong> <strong>Karmada</strong> and <strong>Clusternet</strong> focus on workload propagation and scheduling across clusters. Rancher also provides app catalog and deployment capabilities.</p>
<p>The practical takeaway: <strong>compose these tools based on your organizational needs rather than picking one and forcing it to do everything</strong>.</p>
<hr />
<h2 id="heading-compose-dont-consolidate">Compose, Don't Consolidate</h2>
<p>There is no single tool that solves fleet management end-to-end. <strong>Successful platforms are composed, layered, and intentional</strong>. Cluster lifecycle tools don't replace GitOps. Platform abstractions don't replace delivery pipelines. Some teams lean more heavily on managed services. Some invest deeply in platform tooling. Some do both. What successful teams have in common is that <strong>they align tools to layers instead of forcing one tool to do everything</strong>.</p>
<hr />
<h2 id="heading-reusable-components-are-the-unit-of-scale">Reusable Components Are the Unit of Scale</h2>
<p>This is the most important idea in fleet management: <strong>clusters are not the unit of scale - reusable components are</strong>. Cluster templates, platform APIs, application abstractions, and policy bundles are what allow organizations to grow without growing operational load. <strong>Clusters are execution environments. Components encode intent</strong>.</p>
<hr />
<h2 id="heading-the-interface-for-successful-fleet-management">The Interface for Successful Fleet Management</h2>
<p>There's a concept from the development world that applies directly to fleet management but is often overlooked: the <strong>developer portal</strong>. While fleet controllers manage the technical orchestration, teams still need a unified interface to discover, understand, and interact with their distributed infrastructure.</p>
<p>A developer portal should bring together all the operational context: <strong>documentation, service inventory, metrics dashboards, deployment URLs, regional endpoints, team ownership, dependencies, and SLOs</strong>. It's the <strong>human interface to your fleet</strong> - a single place where engineers can answer questions like "where is this service running?", "who owns this cluster?", "what's the health of services in us-west?", or "how do I deploy to production?"</p>
<p>These concepts are not new. Tools like <strong>Backstage</strong>, <strong>Port</strong>, and others have made developer portals mainstream in software engineering. However, they're not always treated as native components in Kubernetes fleet management - and they should be. A well-designed platform includes not just the <strong>control plane</strong> (fleet controllers, GitOps, policy engines) but also the <strong>developer plane</strong> (portals, service catalogs, observability interfaces).</p>
<p>Without this layer, you end up with infrastructure that works but that nobody knows how to use. Engineers waste time searching for information, duplicating work, or making changes blindly. A developer portal <strong>bridges the gap between powerful infrastructure and productive teams</strong> by making the fleet discoverable, understandable, and accessible.</p>
<hr />
<h2 id="heading-a-framework-for-evaluating-fleet-tools">A Framework for Evaluating Fleet Tools</h2>
<p>When choosing tools for your fleet, evaluate against these dimensions:</p>
<p><strong>Provisioning:</strong> Does the tool help you create and destroy clusters declaratively? Can you template and version cluster configurations?</p>
<p><strong>Management:</strong> Does it provide centralized inventory, health monitoring, and policy enforcement? Can you manage upgrades across the fleet?</p>
<p><strong>Application Support:</strong> Does it handle workload distribution, scheduling policies, and multi-cluster deployment? Does it integrate with your existing CI/CD and GitOps workflows?</p>
<p><strong>Abstraction Quality:</strong> Does it let you hide complexity without hiding capability? Can teams consume infrastructure through clean APIs rather than raw YAML?</p>
<p><strong>Composability:</strong> Does it play well with other tools in your stack, or does it demand to be the single pane of glass?</p>
<p>Use these questions as a repeatable framework for evaluating tools against your specific organizational needs, rather than relying on feature comparisons.</p>
<hr />
<h2 id="heading-principles-for-success">Principles for Success</h2>
<p>Across organizations, the same principles apply:</p>
<ul>
<li><strong>Start with Standards.</strong> Define clear conventions for clusters, networking, security, and observability before adopting tools.</li>
<li><strong>Layer Properly.</strong> Separate lifecycle, abstraction, and delivery concerns. Don't conflate them.</li>
<li><strong>Abstract Progressively.</strong> Hide complexity, not capability. Bad abstractions are worse than none.</li>
<li><strong>Compose Solutions.</strong> Align tools to problems instead of chasing a silver bullet.</li>
<li><strong>Measure Impact.</strong> Track reduced toil, faster onboarding, and fewer incidents - not just tool adoption.</li>
</ul>
<hr />
<h2 id="heading-where-to-start-practically">Where to Start (Practically)</h2>
<p>There is no universal starting point. Ask instead: <strong>where is the pain? Where is the toil? What breaks most often?</strong></p>
<p>If provisioning and upgrades are painful, focus on <strong>lifecycle automation</strong> with Cluster API or Rancher. If teams struggle with consistency, invest in <strong>platform abstractions</strong> with Crossplane. If workload distribution is complex, evaluate <strong>Karmada</strong> or <strong>Clusternet</strong>. If delivery is slow or risky, improve your <strong>GitOps workflows</strong>.</p>
<p>The reality is that most large companies end up building <strong>custom solutions</strong> - custom CRDs and controllers tailored to their specific workflows and constraints. This is reasonable and often necessary at scale when off-the-shelf tools can't fully capture your domain logic.</p>
<p>For smaller organizations or those just starting their fleet journey, my recommendation is different: <strong>start by composing a solution from existing tools</strong>. Extend and adapt them to fit your needs. Only move to fully custom solutions when it becomes a must - when the complexity or constraints of existing tools outweigh the cost of maintaining your own controllers.</p>
<p>Custom solutions give you ultimate flexibility, but they also come with <strong>maintenance burden, technical debt, and the need for deep Kubernetes expertise</strong>. Compose first, customize progressively, and build fully custom only when justified by clear requirements that cannot be met otherwise.</p>
<p>Pick one problem, solve it well, and build reusable components around it. <strong>Start small. Document it well. Expand intentionally</strong>.</p>
<hr />
<h2 id="heading-the-next-step-autonomous-operations">The Next Step: Autonomous Operations</h2>
<p>Today, my daily work focuses on building <strong>AI-driven SRE systems</strong>. The goal is simple: <strong>engineers become the bosses, not the on-call responders</strong>.</p>
<p>We're building <strong>autonomous AI SRE agents</strong> that monitor systems in real time, understand incidents as they happen, and actively heal issues before you even wake up. Imagine starting your morning with an incident report waiting for you - including a <strong>pull request that fixes the root cause</strong>, clear explanations of what happened, and concrete recommendations to prevent similar issues in the future.</p>
<p>That future only works if infrastructure is well-designed. If fleets are observable. If abstractions are clear. And if systems are built to be reasoned about - <strong>by humans and machines</strong>.</p>
<p>Fleet management, platform engineering, and reusable components are not just about scale - they are the <strong>foundation for autonomous operations</strong>.</p>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p><strong>Fleets are inevitable. Chaos is not.</strong></p>
<p>By applying software architecture principles to infrastructure - <strong>abstraction, reuse, composition</strong> - and by choosing the right tools for the right layers, teams can scale Kubernetes without losing control. The five tools we covered each solve a genuine piece of the puzzle, and understanding where they fit is more valuable than any feature comparison.</p>
<p>If any of this resonates with you, feel free to reach out. I'm always happy to talk.</p>
]]></content:encoded></item><item><title><![CDATA[AI on Microcontrollers: A Deep Dive into TinyML, ESP-Skainet, and the Embedded Intelligence Revolution]]></title><description><![CDATA[The convergence of artificial intelligence and microcontrollers represents one of the most exciting developments in embedded systems. TinyML enables devices to make smart decisions without needing to send data to the cloud, which is beneficial from b...]]></description><link>https://blog.niradler.com/ai-on-microcontrollers-a-deep-dive-into-tinyml-esp-skainet-and-the-embedded-intelligence-revolution</link><guid isPermaLink="true">https://blog.niradler.com/ai-on-microcontrollers-a-deep-dive-into-tinyml-esp-skainet-and-the-embedded-intelligence-revolution</guid><category><![CDATA[ESP32]]></category><category><![CDATA[ESP32-S3]]></category><category><![CDATA[AI]]></category><category><![CDATA[ML]]></category><category><![CDATA[#VoiceAI]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sat, 07 Feb 2026 13:17:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/MceA9kSze0U/upload/d03867f762f0e6caafe4038027b8c6b2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The convergence of artificial intelligence and microcontrollers represents one of the most exciting developments in embedded systems. TinyML enables devices to make smart decisions without needing to send data to the cloud, which is beneficial from both efficiency and privacy perspectives. This article explores the landscape of AI on microcontrollers, examining the tools, frameworks, and practical considerations for bringing intelligence to the edge.</p>
<h2 id="heading-what-is-tinyml"><strong>What is TinyML?</strong></h2>
<p>TinyML refers to the application of machine learning techniques on extremely resource-constrained devices, such as microcontrollers and other small embedded systems with limited memory, processing power, and energy resources. Unlike traditional machine learning that runs on powerful servers or cloud infrastructure, TinyML brings inference capabilities directly to devices that might have as little as 256KB of RAM and a clock speed of a few hundred MHz.</p>
<p>The appeal is compelling: TinyML helps tiny devices make decisions based on huge amounts of data without wasting time and energy transmitting it elsewhere, and with inference on-device, user privacy is protected since no audio or data ever needs to be sent to the cloud.</p>
<h2 id="heading-esp-skainet-voice-intelligence-for-esp32"><strong>ESP-Skainet: Voice Intelligence for ESP32</strong></h2>
<h3 id="heading-what-is-esp-skainet"><strong>What is ESP-Skainet?</strong></h3>
<p>ESP-Skainet enables convenient development of wake word detection and speech command recognition applications based on Espressif's ESP32 series chips, allowing developers to easily build wake word and command recognition solutions. It's Espressif's answer to bringing voice assistant capabilities to low-power microcontrollers.</p>
<h3 id="heading-core-components"><strong>Core Components</strong></h3>
<p>ESP-Skainet consists of several key engines:</p>
<p><strong>WakeNet (Wake Word Detection):</strong> WakeNet is designed for low-power embedded MCUs with a low memory usage of approximately 20KB and achieves a 97% wake-up performance within a one-meter distance in a quiet environment, and 95% within a three-meter distance. Espressif provides wake words such as "Hi, Lexin" and "Hi, ESP" for free, and also supports custom wake words.</p>
<p><strong>MultiNet (Speech Command Recognition):</strong> MultiNet is a lightweight model that lets the ESP32 perform offline speech recognition of multiple commands. It is built on Convolutional Recurrent Neural Networks (CRNN) and Connectionist Temporal Classification (CTC), takes an audio clip's Mel-Frequency Cepstral Coefficients (MFCC) as input, and outputs phonemes. Currently, MultiNet supports up to 200 Chinese or English speech commands such as "Turn on the air conditioner" and "Turn on the bedroom light", and users can easily add their own commands without retraining the model.</p>
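<p>To make the CTC part less abstract, here is a minimal sketch of greedy CTC decoding: take the most likely label per audio frame, merge consecutive repeats, then drop the blank symbol. This is not Espressif's implementation. The <code>BLANK</code> marker and the phoneme labels are made up for illustration.</p>

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank marker; real models reserve a class index for it


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame best labels into an output sequence (greedy CTC decoding)."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _run in groupby(frame_labels)]
    # Step 2: drop the blank symbol, which only separates genuine repeats.
    return [label for label in merged if label != blank]


# Made-up per-frame phoneme labels, one per audio frame.
frames = ["_", "t", "t", "_", "er", "er", "_", "n", "_", "_", "aa", "n", "n"]
print(ctc_greedy_collapse(frames))
```

<p>The blank is what lets a model emit the same phoneme twice in a row: "a, blank, a" decodes to two labels, while "a, a" collapses to one.</p>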
<p><strong>Audio Front-End (AFE):</strong> The Audio Front-End integrates AEC (Acoustic Echo Cancellation), VAD (Voice Activity Detection), BSS (Blind Source Separation), and NS (Noise Suppression), with Espressif's two-mic AFE qualified as a "Software Audio Front-End Solution" for Amazon Alexa Built-in devices.</p>
<h3 id="heading-what-you-can-do-with-esp-skainet"><strong>What You Can Do with ESP-Skainet</strong></h3>
<p>ESP-Skainet is ideal for AIoT and smart home applications that need local voice control of devices. Typical applications include smart-home devices like voice-controlled switches, outlets, lamps, thermostats, and security systems; smart-office equipment like voice-controlled displays and phones; and interactive products such as educational toys or assistants for the elderly.</p>
<h3 id="heading-hardware-requirements"><strong>Hardware Requirements</strong></h3>
<p>To run ESP-Skainet, you need an ESP32 or ESP32-S3 development board with an integrated audio input module. Popular boards include:</p>
<ul>
<li><p>ESP32-Korvo</p>
</li>
<li><p>ESP32-S3-Korvo-1 and Korvo-2</p>
</li>
<li><p>ESP-BOX</p>
</li>
<li><p>ESP32-S3-EYE</p>
</li>
</ul>
<p>The ESP32-Korvo includes an ESP32-WROVER-B module with 16MB SPI flash (comfortably above the 4MB minimum for MultiNet support) and 64Mbit (8MB) of pseudo-static RAM (PSRAM).</p>
<h3 id="heading-limitations"><strong>Limitations</strong></h3>
<ul>
<li><p>Language support is primarily Chinese and English</p>
</li>
<li><p>Requires specific hardware with adequate RAM (ESP32 or ESP32-S3)</p>
</li>
<li><p>Requires ESP-IDF v4.4 or ESP-IDF v5.0</p>
</li>
<li><p>The wake word detection must be active before command recognition begins</p>
</li>
</ul>
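<p>The last limitation is easier to see as a two-stage state machine: commands are ignored until the wake word fires, and the pipeline falls back to waiting after a command or a timeout. The sketch below only illustrates that flow. The class, its method names, and the three-frame timeout are hypothetical and are not part of the ESP-Skainet API.</p>

```python
from enum import Enum


class Stage(Enum):
    WAITING_FOR_WAKE_WORD = 1
    LISTENING_FOR_COMMAND = 2


class VoicePipeline:
    """Toy model of a wake-word-then-command flow (hypothetical, not the ESP-Skainet API)."""

    def __init__(self, command_window_frames=3):
        self.stage = Stage.WAITING_FOR_WAKE_WORD
        self.window = command_window_frames
        self.frames_left = 0

    def feed(self, wake_detected=False, command=None):
        """Process one frame of detector results; return a recognized command or None."""
        if self.stage is Stage.WAITING_FOR_WAKE_WORD:
            if wake_detected:
                self.stage = Stage.LISTENING_FOR_COMMAND
                self.frames_left = self.window
            return None  # commands are ignored until the wake word fires
        if command is not None:
            self.stage = Stage.WAITING_FOR_WAKE_WORD
            return command
        self.frames_left -= 1
        if self.frames_left == 0:
            self.stage = Stage.WAITING_FOR_WAKE_WORD  # timed out without a command
        return None


pipeline = VoicePipeline()
print(pipeline.feed(command="turn on the light"))   # ignored: no wake word yet
pipeline.feed(wake_detected=True)
print(pipeline.feed(command="turn on the light"))   # accepted after the wake word
```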
<h2 id="heading-tensorflow-lite-for-microcontrollers-the-foundation-of-tinyml"><strong>TensorFlow Lite for Microcontrollers: The Foundation of TinyML</strong></h2>
<h3 id="heading-overview"><strong>Overview</strong></h3>
<p>TensorFlow Lite for Microcontrollers is designed for the specific constraints of microcontroller development, allowing deployment of machine learning models on tiny microcontrollers to boost the intelligence of billions of devices in our lives, including household appliances and Internet of Things devices, without relying on expensive hardware or reliable internet connections.</p>
<h3 id="heading-how-it-works"><strong>How It Works</strong></h3>
<p>Because machine learning is computationally expensive, TensorFlow Lite for Microcontrollers requires a 32-bit processor, such as an ARM Cortex-M or ESP32, and the library is mostly written in C++, requiring a C++ compiler.</p>
<p>The workflow follows these steps:</p>
<ol>
<li><p><strong>Train the model</strong> on a computer or server</p>
</li>
<li><p><strong>Convert to FlatBuffer format</strong> (.tflite file)</p>
</li>
<li><p><strong>Convert to C array</strong> for embedding in firmware</p>
</li>
<li><p><strong>Deploy</strong> using the TensorFlow Lite for Microcontrollers library</p>
</li>
<li><p><strong>Run inference</strong> on the microcontroller</p>
</li>
</ol>
<p>On the microcontroller, the TensorFlow Lite for Microcontrollers library uses the model to perform inference, for example taking a photo it has never seen and deciding whether it contains a cat.</p>
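<p>The "convert to C array" step in the workflow is often done with a tool like <code>xxd -i</code>. As a rough sketch of what that step produces, the Python below renders raw bytes as a C source snippet. The function name, the <code>g_model</code> variable name, and the stand-in bytes are assumptions for illustration. A real build would read the bytes from your <code>.tflite</code> file and may also need an alignment attribute on the array.</p>

```python
def to_c_array(data: bytes, name: str = "g_model", per_line: int = 12) -> str:
    """Render raw model bytes as a C source snippet, similar in spirit to `xxd -i`."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


# Stand-in bytes; in practice: data = open("model.tflite", "rb").read()
fake_model = bytes(range(20))
print(to_c_array(fake_model))
```

<p>Because the array is declared <code>const</code>, the toolchain can keep it in flash instead of copying it into scarce RAM.</p>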
<h3 id="heading-what-you-can-build"><strong>What You Can Build</strong></h3>
<p>Example applications include a Hello World demonstration of the absolute basics, and person detection that captures camera data with an image sensor to detect the presence or absence of a person. TinyML applications include visual and audio wake words that trigger an action when a person is detected in an image or a keyword is spoken, predictive maintenance on industrial machines using sensors to continuously monitor for anomalous behavior, and gesture and activity detection for medical, consumer, and agricultural devices, such as gait analysis, fall detection or animal health monitoring.</p>
<h2 id="heading-edge-impulse-the-tinyml-development-platform"><strong>Edge Impulse: The TinyML Development Platform</strong></h2>
<h3 id="heading-what-is-edge-impulse"><strong>What is Edge Impulse?</strong></h3>
<p>Edge Impulse is a cloud-based machine learning operations (MLOps) platform that simplifies building, training, and deploying machine learning models on embedded and edge (TinyML) targets, from microcontrollers and sensors to boards like the Arduino and single-board computers like the Raspberry Pi. It addresses the fragmented software stacks and heterogeneous deployment hardware of this space by streamlining the TinyML design cycle with various software and hardware optimizations.</p>
<h3 id="heading-key-features"><strong>Key Features</strong></h3>
<p><strong>End-to-End Workflow:</strong> Edge Impulse makes it easy to collect a dataset, choose the right machine learning algorithm, train a production-grade model, and run tests to prove that it works, with the whole process quick enough to run through in a few minutes.</p>
<p><strong>Data Collection:</strong> Edge Impulse can easily collect data from any sensor and development board using the Data forwarder, a small application that reads data over serial and sends it to Edge Impulse.</p>
<p><strong>Model Optimization:</strong> The platform provides estimates of how the model will perform on the target device, including memory usage (RAM and flash) and latency, helping ensure the model fits within hardware constraints.</p>
<p><strong>Wide Hardware Support:</strong> Edge Impulse launched with the Arduino Nano 33 BLE Sense, but models can be exported as an Arduino library to run on any Arm-based Arduino platform including the Arduino MKR family or Arduino Nano 33 IoT, provided the board has enough RAM.</p>
<h3 id="heading-typical-workflow"><strong>Typical Workflow</strong></h3>
<ol>
<li><p><strong>Data acquisition</strong> - Connect your device and collect labeled sensor data</p>
</li>
<li><p><strong>Impulse design</strong> - Configure processing blocks (like MFCC for audio) and learning blocks (neural network)</p>
</li>
<li><p><strong>Feature generation</strong> - Extract features from raw data</p>
</li>
<li><p><strong>Training</strong> - Train the model with configurable parameters</p>
</li>
<li><p><strong>Testing</strong> - Validate accuracy and performance</p>
</li>
<li><p><strong>Deployment</strong> - Export as optimized library for your target hardware</p>
</li>
</ol>
<p>A model trained using Edge Impulse can be around 18KB in size, which is mind-blowingly small for something so sophisticated and leaves a lot of space for application code.</p>
<h2 id="heading-other-major-tools-and-frameworks"><strong>Other Major Tools and Frameworks</strong></h2>
<h3 id="heading-stm32cubeai-x-cube-ai"><strong>STM32Cube.AI / X-CUBE-AI</strong></h3>
<p>X-CUBE-AI is a package that extends the capabilities of STM32CubeMX, adding the possibility to convert a pre-trained neural network into an ANSI C library that is performance optimized for STM32 microcontrollers based on ARM Cortex-M4 and M7 processor cores. The tool has advantages for developers such as a graphical user interface, support for different deep learning frameworks such as Keras and TensorFlow Lite, 8-bit-quantization, and compatibility with different STM32 microcontroller series.</p>
<h3 id="heading-arm-cmsis-nn"><strong>ARM CMSIS-NN</strong></h3>
<p>The Common Microcontroller Software Interface Standard (CMSIS) includes a neural network library, CMSIS-NN, that was developed hand in hand with TensorFlow Lite engineers, so the operations it provides on ARM microcontrollers match the ones TensorFlow Lite supports. CMSIS-NN kernels are used at a low level by tools like STM32Cube.AI.</p>
<h2 id="heading-common-practices-and-optimization-techniques"><strong>Common Practices and Optimization Techniques</strong></h2>
<h3 id="heading-model-optimization"><strong>Model Optimization</strong></h3>
<p>The key to running ML on microcontrollers is aggressive optimization:</p>
<p><strong>Quantization:</strong> Converting 32-bit floating-point models to 8-bit integer representations, dramatically reducing memory footprint and computation requirements with minimal accuracy loss.</p>
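<p>As a rough illustration of what 8-bit quantization does to a tensor, here is a minimal affine quantization sketch in plain Python. It shows the general scale and zero-point idea, not the exact scheme any particular converter uses. The function names and example weights are made up.</p>

```python
def quant_params(values, qmin=-128, qmax=127):
    """Derive a scale and zero point that map the value range onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # keep 0.0 inside the range so it stays exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped 8-bit integers."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map 8-bit integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-1.0, -0.2, 0.0, 0.5, 1.0]
scale, zero_point = quant_params(weights)
q = quantize(weights, scale, zero_point)
print(q)
print(dequantize(q, scale, zero_point))
```

<p>Each weight shrinks from 4 bytes to 1 byte, and the round trip is off by at most about one quantization step, which is where the "minimal accuracy loss" comes from.</p>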
<p><strong>Pruning:</strong> Removing unnecessary connections in neural networks to reduce model size.</p>
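<p>One simple variant is magnitude pruning: zero out the fraction of weights with the smallest absolute value. The sketch below works on a flat list of weights and is only meant to show the idea. The function name is mine, and real frameworks typically prune gradually during training.</p>

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


print(magnitude_prune([0.9, -0.05, 0.3, 0.01], sparsity=0.5))
```

<p>Note that zeroed weights only save space once the model is stored in a sparse or compressed form; a dense array full of zeros is just as large as before.</p>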
<p><strong>Knowledge Distillation:</strong> Training smaller "student" models to mimic larger "teacher" models.</p>
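<p>A common way to express "mimic the teacher" is a loss that compares temperature-softened output distributions. The sketch below computes the KL divergence between teacher and student softmax outputs and scales it by the temperature squared, which is one widely used formulation and an assumption on my part, not something the tools in this article mandate. In real training this term is usually combined with the normal loss on the true labels.</p>

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # made-up logits from a large model
student = [2.5, 1.5, 0.2]  # made-up logits from a small model
print(distillation_loss(student, teacher))
```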
<h3 id="heading-hardware-selection"><strong>Hardware Selection</strong></h3>
<p>A 32-bit processor such as an ARM Cortex-M or ESP32 is required for TensorFlow Lite for Microcontrollers. Popular platforms include:</p>
<ul>
<li><p><strong>ARM Cortex-M4/M7</strong>: STM32 series, nRF52 series</p>
</li>
<li><p><strong>ESP32/ESP32-S3</strong>: For Wi-Fi/Bluetooth connectivity</p>
</li>
<li><p><strong>Arduino Nano 33 BLE Sense</strong>: Features an Arm Cortex-M4 microcontroller running at 64 MHz with 1MB Flash memory and 256 KB of RAM, with onboard sensors including a 9-axis IMU</p>
</li>
</ul>
<h3 id="heading-development-workflow"><strong>Development Workflow</strong></h3>
<p>On STM32 hardware, STM32Cube.AI is a common choice because the X-CUBE-AI expansion package covers the workflow end to end: automatic neural network model conversion, validation, and system performance measurements. That tooling is one reason ARM Cortex-M 32-bit STMicroelectronics microcontrollers are such a common TinyML platform.</p>
<h2 id="heading-what-you-can-and-cant-do"><strong>What You Can and Can't Do</strong></h2>
<h3 id="heading-what-works-well"><strong>What Works Well</strong></h3>
<p><strong>Audio Classification:</strong> Models can recognize household sounds like running water from a faucet, and be trained in just a few minutes with only a small amount of audio data.</p>
<p><strong>Simple Image Recognition:</strong> Person detection, basic object recognition.</p>
<p><strong>Gesture Detection:</strong> Using accelerometer/IMU data.</p>
<p><strong>Keyword Spotting:</strong> Wake word detection and simple voice commands.</p>
<p><strong>Anomaly Detection:</strong> Identifying unusual patterns in sensor data.</p>
<h3 id="heading-limitations-1"><strong>Limitations</strong></h3>
<p><strong>Memory Constraints:</strong> TinyML systems often have very flat memory hierarchies, due to small or non-existent caches and often no off-chip memory.</p>
<p><strong>Model Complexity:</strong> Large modern CNNs for complex vision tasks, large language models, and multi-modal systems typically won't fit.</p>
<p><strong>Computation Speed:</strong> Real-time video processing remains challenging.</p>
<p><strong>Training:</strong> With the tools covered here, training happens off-device; the microcontroller only runs inference.</p>
<p><strong>Accuracy Trade-offs:</strong> Smaller models generally mean lower accuracy compared to cloud-based alternatives.</p>
<h2 id="heading-how-tinyml-works-a-technical-overview"><strong>How TinyML Works: A Technical Overview</strong></h2>
<h3 id="heading-the-inference-engine"><strong>The Inference Engine</strong></h3>
<p>The main application performs real-time predictions in four steps. It loads the model from a header file, registers the operation resolvers the model needs (like Fully Connected and ReLU layers), allocates a tensor arena that provides memory for intermediate computations and tensor data during inference, and initializes the interpreter with the model, the operation resolver, the tensor arena, and the arena's size.</p>
<h3 id="heading-memory-management"><strong>Memory Management</strong></h3>
<p>Models are typically stored in flash memory as constant C arrays. During inference, a "tensor arena" in RAM holds intermediate calculations. The challenge is balancing model complexity with available resources.</p>
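<p>That balancing act is mostly arithmetic, so a quick back-of-the-envelope check can help before you flash anything. The sketch below uses the Arduino Nano 33 BLE Sense figures (1MB flash, 256KB RAM) and the roughly 18KB model size mentioned in this article. The tensor arena size and the firmware overheads are made-up placeholders, so measure your own.</p>

```python
from dataclasses import dataclass

KIB = 1024


@dataclass
class Board:
    name: str
    flash_bytes: int
    ram_bytes: int


def fits(board, model_bytes, arena_bytes, firmware_flash_bytes=0, firmware_ram_bytes=0):
    """Return (ok, flash_left, ram_left) for a model in flash plus a tensor arena in RAM."""
    flash_left = board.flash_bytes - model_bytes - firmware_flash_bytes
    ram_left = board.ram_bytes - arena_bytes - firmware_ram_bytes
    return (min(flash_left, ram_left) >= 0, flash_left, ram_left)


nano = Board("Arduino Nano 33 BLE Sense", flash_bytes=1024 * KIB, ram_bytes=256 * KIB)

# Arena and firmware sizes below are hypothetical placeholders.
ok, flash_left, ram_left = fits(
    nano,
    model_bytes=18 * KIB,
    arena_bytes=40 * KIB,
    firmware_flash_bytes=200 * KIB,
    firmware_ram_bytes=60 * KIB,
)
print(ok, flash_left // KIB, "KiB flash left,", ram_left // KIB, "KiB RAM left")
```

<p>In practice the tensor arena size is found by trial: start generous, then shrink it until allocation fails on the device.</p>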
<h3 id="heading-hardware-acceleration"><strong>Hardware Acceleration</strong></h3>
<p>Many modern microcontrollers include DSP instructions or dedicated ML accelerators. ARM Cortex-M processors, for example, have gained more capable hardware architectures and DSP capabilities while keeping cost and power consumption low.</p>
<h2 id="heading-getting-started-a-practical-guide"><strong>Getting Started: A Practical Guide</strong></h2>
<h3 id="heading-prerequisites"><strong>Prerequisites</strong></h3>
<ul>
<li><p>Basic programming skills (C/C++ and Python)</p>
</li>
<li><p>Understanding of machine learning fundamentals</p>
</li>
<li><p>Familiarity with embedded systems (helpful but not required)</p>
</li>
</ul>
<h3 id="heading-recommended-hardware"><strong>Recommended Hardware</strong></h3>
<p><strong>For Beginners:</strong></p>
<ul>
<li>ESP32 DevKit ($10-20) - great value, Wi-Fi/Bluetooth</li>
</ul>
<p><strong>For Voice Projects:</strong></p>
<ul>
<li><p>ESP32-S3-BOX or ESP32-Korvo boards</p>
</li>
<li><p>Boards with built-in microphones</p>
</li>
</ul>
<h3 id="heading-software-setup"><strong>Software Setup</strong></h3>
<ol>
<li><p><strong>Choose Your Path:</strong></p>
<ul>
<li><p><strong>Beginner-friendly:</strong> Edge Impulse (web-based, no local setup)</p>
</li>
<li><p><strong>ESP32 voice:</strong> Clone ESP-Skainet and install ESP-IDF v4.4 or v5.0</p>
</li>
<li><p><strong>General TinyML:</strong> Install TensorFlow, TensorFlow Lite, and your preferred IDE</p>
</li>
</ul>
</li>
<li><p><strong>First Project:</strong> Follow a tutorial to train a model that can recognize household sounds like running water from a faucet, using Edge Impulse to collect audio data, train a simple model, and export it as a C++ library</p>
</li>
</ol>
<h3 id="heading-learning-resources"><strong>Learning Resources</strong></h3>
<p><strong>Books:</strong></p>
<ul>
<li>TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers by Pete Warden and Daniel Situnayake, the standard textbook on embedded machine learning</li>
</ul>
<p><strong>Courses:</strong></p>
<ul>
<li>Harvard's Deploying TinyML course provides hands-on experience with deploying TinyML to physical devices, teaching programming in TensorFlow Lite for microcontrollers and featuring projects based on a TinyML Program Kit that includes an Arduino board with onboard sensors and an ARM Cortex-M4 microcontroller</li>
</ul>
<p><strong>Online Platforms:</strong></p>
<ul>
<li><p>Edge Impulse documentation and tutorials</p>
</li>
<li><p>TensorFlow Lite for Microcontrollers examples</p>
</li>
<li><p>ESP-Skainet GitHub repository</p>
</li>
</ul>
<h3 id="heading-sample-projects-to-start-with"><strong>Sample Projects to Start With</strong></h3>
<ol>
<li><p><strong>Audio wake word detection</strong> - Using ESP-Skainet or Edge Impulse</p>
</li>
<li><p><strong>Gesture recognition</strong> - Using accelerometer data</p>
</li>
<li><p><strong>Simple image classification</strong> - Person detection with a camera module</p>
</li>
<li><p><strong>Anomaly detection</strong> - Monitor sensor patterns for unusual behavior</p>
</li>
</ol>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>AI on microcontrollers represents a paradigm shift in how we think about embedded intelligence. Machine learning at the very edge enables valuable use of the 99% of sensor data that is discarded today due to cost, bandwidth or power constraints, with applications spanning health, white goods, mobility, industry, retail and agriculture.</p>
<p>While the technology is still maturing, the tools available today—from ESP-Skainet's voice capabilities to TensorFlow Lite's flexibility and Edge Impulse's user-friendly platform—make it possible for developers to add AI capabilities to their embedded projects without needing deep machine learning expertise.</p>
<p>The key is understanding the constraints: work within memory limits, optimize aggressively, and choose problems suited to edge inference. Most papers on TinyML have shown quite promising results in terms of accuracy, execution time, power consumption, and memory footprint, though edge computing is a relatively new research topic facing many new challenges.</p>
<p>Start small, experiment with existing tools and examples, and gradually build your understanding. The future of embedded intelligence is being written right now, and with the democratization of TinyML tools, anyone can contribute to it.</p>
<hr />
<p><strong>Sources:</strong></p>
<ul>
<li><p><a target="_blank" href="https://github.com/espressif/esp-skainet">ESP-Skainet GitHub Repository</a></p>
</li>
<li><p><a target="_blank" href="https://www.espressif.com/en/solutions/audio-solutions/esp-skainet/overview">Espressif ESP-Skainet Overview</a></p>
</li>
<li><p><a target="_blank" href="https://ai.google.dev/edge/litert/microcontrollers/overview">TensorFlow Lite for Microcontrollers Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://www.edgeimpulse.com">Edge Impulse Platform</a></p>
</li>
<li><p><a target="_blank" href="https://www.mdpi.com/2079-9292/11/16/2545">STM32Cube.AI and CMSIS-NN Research</a></p>
</li>
<li><p><a target="_blank" href="https://pll.harvard.edu/course/deploying-tinyml">Harvard TinyML Course</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Introducing a Modern Web Interface for USB Army Knife: Built for the Future]]></title><description><![CDATA[I've created a brand new web interface for the USB Army Knife project - a complete reimagination built with modern web technologies and client-side best practices. While the original USB Army Knife is a powerful ESP32-based security tool, I wanted to...]]></description><link>https://blog.niradler.com/introducing-a-modern-web-interface-for-usb-army-knife-built-for-the-future</link><guid isPermaLink="true">https://blog.niradler.com/introducing-a-modern-web-interface-for-usb-army-knife-built-for-the-future</guid><category><![CDATA[ESP32]]></category><category><![CDATA[hacking]]></category><category><![CDATA[web]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Wed, 28 Jan 2026 22:18:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769635595636/064f35aa-fb51-43e3-928b-497c9be092db.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've created a brand new web interface for the USB Army Knife project - a complete reimagination built with modern web technologies and client-side best practices. While the original USB Army Knife is a powerful ESP32-based security tool, I wanted to build a modern, maintainable UI that leverages contemporary web development standards and provides a better user experience.</p>
<h2 id="heading-what-is-usb-army-knife">What is USB Army Knife?</h2>
<p>USB Army Knife is an ESP32-based multitool designed for security research, penetration testing, and hardware hacking. It combines multiple capabilities into a single portable device, including BadUSB functionality, WiFi attacks, file management, and custom script execution.</p>
<h2 id="heading-why-a-new-modern-interface">Why a New Modern Interface?</h2>
<p>While the original USB Army Knife has a functional interface, I wanted to create something that embraced modern web development practices:</p>
<ul>
<li><p><strong>Modern Client Code</strong>: Built with current web standards and frameworks for better performance and maintainability</p>
</li>
<li><p><strong>Easier Maintenance</strong>: Clean, modular codebase that's straightforward to update and extend</p>
</li>
<li><p><strong>Better UX</strong>: Intuitive design with responsive layouts and smooth interactions</p>
</li>
<li><p><strong>Developer-Friendly</strong>: Well-structured code that other developers can easily contribute to</p>
</li>
</ul>
<p>This isn't meant to replace the official interface - rather, it's an alternative for users who prefer a more modern web application experience and developers who want cleaner code to work with.</p>
<h2 id="heading-the-new-web-interface">The New Web Interface</h2>
<p>The new web interface provides a comprehensive dashboard accessible through your browser, offering real-time device monitoring and full control over all device capabilities. The interface is designed with usability in mind, featuring a clean navigation structure and intuitive controls.</p>
<h3 id="heading-key-features">Key Features</h3>
<h4 id="heading-1-real-time-device-dashboard">1. <strong>Real-Time Device Dashboard</strong></h4>
<p>The dashboard provides an at-a-glance view of your device status:</p>
<ul>
<li><p>Device uptime and current status</p>
</li>
<li><p>USB mode indicator (Serial + HID)</p>
</li>
<li><p>Memory and heap usage monitoring</p>
</li>
<li><p>SD card storage status</p>
</li>
<li><p>Error tracking</p>
</li>
<li><p>Agent connection status</p>
</li>
<li><p>Hardware capabilities (SD, WiFi, TFT, Button, LED, Marauder)</p>
</li>
<li><p>System information including chip type and firmware version</p>
</li>
</ul>
<h4 id="heading-2-file-management">2. <strong>File Management</strong></h4>
<p>The file management system allows you to:</p>
<ul>
<li><p>Browse all files stored on the device's SD card</p>
</li>
<li><p>Upload new files directly through the browser</p>
</li>
<li><p>Download files from the device</p>
</li>
<li><p>Edit text-based configuration files</p>
</li>
<li><p>Execute scripts with a single click</p>
</li>
<li><p>View images stored on the device</p>
</li>
<li><p>Delete unwanted files</p>
</li>
</ul>
<p>The interface supports multiple file types including scripts (.ds), images (.png), text files (.txt), and configuration files (.json).</p>
<h4 id="heading-3-script-execution">3. <strong>Script Execution</strong></h4>
<p>One of the most powerful features is the script execution system:</p>
<ul>
<li><p>Run DuckyScript files and custom commands</p>
</li>
<li><p>Browse available scripts with easy-to-read listings</p>
</li>
<li><p>Access a comprehensive command reference</p>
</li>
<li><p>Execute raw commands directly</p>
</li>
<li><p>View execution results in real-time</p>
</li>
</ul>
<p>The built-in command reference includes support for delays, keyboard inputs, LED control, and display operations.</p>
<h4 id="heading-4-display-amp-led-control">4. <strong>Display &amp; LED Control</strong></h4>
<p>Control the device's display and LED indicators:</p>
<ul>
<li><p>Display custom text at specific coordinates</p>
</li>
<li><p>Show images from the SD card</p>
</li>
<li><p>Control RGB LEDs (Red, Green, Blue, Off)</p>
</li>
<li><p>Clear the display</p>
</li>
<li><p>Upload and display custom graphics</p>
</li>
</ul>
<h4 id="heading-5-esp32-marauder-integration">5. <strong>ESP32 Marauder Integration</strong></h4>
<p>For WiFi security testing, the interface includes full ESP32 Marauder support:</p>
<ul>
<li><p>Execute Marauder attack commands</p>
</li>
<li><p>Common commands readily available (attack, scan, sniff, beacon, deauth, probe, list, select, clear, help)</p>
</li>
<li><p>View command results in the device logs</p>
</li>
<li><p>Easy command input with helpful placeholders</p>
</li>
</ul>
<h4 id="heading-6-on-screen-keyboard">6. <strong>On-Screen Keyboard</strong></h4>
<p>Send keystrokes to target devices using the virtual keyboard:</p>
<ul>
<li><p>Full QWERTY layout with all standard keys</p>
</li>
<li><p>Support for modifier keys (Ctrl, Shift, Alt, Win)</p>
</li>
<li><p>Function keys (F1-F12)</p>
</li>
<li><p>Special keys (Tab, Caps, Enter, Backspace, etc.)</p>
</li>
<li><p>Navigation keys (Home, End, PgUp, PgDn, Insert, Delete)</p>
</li>
<li><p>Text input area for typing longer strings</p>
</li>
</ul>
<h4 id="heading-7-device-logs">7. <strong>Device Logs</strong></h4>
<p>Monitor all device activity through the comprehensive logging system:</p>
<ul>
<li><p>Real-time log viewing</p>
</li>
<li><p>Track script execution</p>
</li>
<li><p>Monitor command execution</p>
</li>
<li><p>View display operations</p>
</li>
<li><p>Debug device behavior</p>
</li>
<li><p>Refresh logs on demand</p>
</li>
<li><p>Clear logs when needed</p>
</li>
</ul>
<h4 id="heading-8-complete-api-documentation">8. <strong>Complete API Documentation</strong></h4>
<p>For developers, the interface includes full API documentation:</p>
<ul>
<li><p>REST API endpoints for all device functions</p>
</li>
<li><p>WebSocket connections for real-time updates</p>
</li>
<li><p>Device status and information endpoints</p>
</li>
<li><p>File management operations</p>
</li>
<li><p>Script and command execution</p>
</li>
<li><p>Agent operations</p>
</li>
<li><p>Display control</p>
</li>
<li><p>Marauder command execution</p>
</li>
<li><p>Audio capture and streaming</p>
</li>
<li><p>Settings configuration</p>
</li>
</ul>
<h2 id="heading-getting-it-running">Getting It Running</h2>
<h3 id="heading-the-cors-thing">The CORS Thing</h3>
<p>Okay, so there's this annoying browser security thing called CORS. Basically, browsers don't like it when web pages talk to random devices on your network (for good reason). The fix? Use <a target="_blank" href="https://github.com/niradler/corsy">Corsy</a>, a lightweight CORS proxy I made. It's super simple and solves the problem.</p>
<p>If you're running the interface locally (like, actually cloning the repo and running it on your machine), you don't need Corsy. Otherwise, yeah, you'll want it.</p>
<h3 id="heading-quick-start">Quick Start</h3>
<ol>
<li>Make sure your USB Army Knife is powered up and on your network</li>
<li>Set up Corsy if you need it (using the GitHub Pages version or accessing remotely)</li>
<li>Open the interface - either the <a target="_blank" href="https://niradler.github.io/USBArmyKnife-web/">live demo</a>, or clone it and run locally</li>
<li>Start playing around!</li>
</ol>
<h2 id="heading-what-can-you-do-with-this">What Can You Do With This?</h2>
<p>Honestly? Tons of stuff:</p>
<ul>
<li><strong>Security Research</strong>: Test WiFi networks, try out different attacks, see what works</li>
<li><strong>Penetration Testing</strong>: BadUSB payloads, automated keystroke injection, all that good stuff</li>
<li><strong>Hardware Hacking</strong>: Build ESP32 projects, mess with sensors, debug weird behavior</li>
<li><strong>Learning</strong>: Figure out how this hardware/software combo works, break things, fix things</li>
<li><strong>IoT Experiments</strong>: Test device security, poke at protocols, find vulnerabilities</li>
</ul>
<p>But really, the best use case is "I wonder what happens if I do this..." and then spending your weekend finding out.</p>
<h2 id="heading-links">Links</h2>
<ul>
<li><strong>This Modern Interface</strong>: <a target="_blank" href="https://github.com/niradler/USBArmyKnife-web">USBArmyKnife-web</a></li>
<li><strong>CORS Proxy</strong>: <a target="_blank" href="https://github.com/niradler/corsy">Corsy</a></li>
<li><strong>Live Demo</strong>: <a target="_blank" href="https://niradler.github.io/USBArmyKnife-web/">https://niradler.github.io/USBArmyKnife-web/</a></li>
<li><strong>Original USB Army Knife</strong>: <a target="_blank" href="https://github.com/">USBArmyKnife</a></li>
</ul>
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>This project is what happens when you combine curiosity, modern web dev, and a love for hardware that does cool stuff. It's not perfect, it's not trying to be corporate, and it's definitely not the "official" anything. It's just a fun alternative that works well and is easy to tinker with.</p>
<p>The whole point is learning and experimentation - mixing software and hardware, seeing what you can build, and maybe learning something along the way. If you want to add features, change things, or just see how it works, go for it. The code is there, the hardware is fun, and there's always something new to try.</p>
<p>Whether you're into security research, hardware hacking, or just like building things that do stuff, give it a shot. And if you break something, hey, that's half the fun.</p>
<h2 id="heading-contributing">Contributing</h2>
<p>Pull requests welcome! Found a bug? Cool. Want to add a feature? Even better. The codebase is clean enough that you won't want to cry when you look at it, which is always a plus.</p>
<hr />
<p><strong>Disclaimer</strong>: This is for learning, research, and authorized testing only. Don't be that person who uses this stuff without permission. Seriously, don't.</p>
]]></content:encoded></item><item><title><![CDATA[From Flipper Zero to DIY ESP32 Hack Lab]]></title><description><![CDATA[If you've been following the hardware hacking scene, you've probably heard of Flipper Zero. It's the Swiss Army knife of hacking gadgets packing Sub-GHz RF, NFC, RFID, IR, GPIO, and BLE into one sleek, pocketable device. For good reason, it became th...]]></description><link>https://blog.niradler.com/from-flipper-zero-to-diy-esp32-hack-lab</link><guid isPermaLink="true">https://blog.niradler.com/from-flipper-zero-to-diy-esp32-hack-lab</guid><category><![CDATA[hardware]]></category><category><![CDATA[ESP32]]></category><category><![CDATA[flipper-zero]]></category><category><![CDATA[pentesting]]></category><category><![CDATA[Security]]></category><category><![CDATA[DIY]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sun, 14 Dec 2025 22:37:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7gaxRnvdEAc/upload/dfa21b5c5a6898af8aecf8c997fd3198.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've been following the hardware hacking scene, you've probably heard of Flipper Zero. It's the Swiss Army knife of hacking gadgets packing Sub-GHz RF, NFC, RFID, IR, GPIO, and BLE into one sleek, pocketable device. For good reason, it became the go-to tool for security researchers and curious tinkerers alike.</p>
<h2 id="heading-the-flipper-zero-sweet-spot">The Flipper Zero Sweet Spot</h2>
<p>What makes Flipper Zero special is its polish. Everything works out of the box. The UI is intuitive, the form factor is perfect, and the community around it is massive. You can sniff NFC cards, replay Sub-GHz signals, control IR devices, and even write BadUSB payloads without breaking a sweat.</p>
<p>But here's where it gets interesting: Flipper Zero wasn't designed to be a WiFi powerhouse. Yes, you can add WiFi dev boards as modules, but the native capabilities are limited. If you want to dive into WPA3 attacks, 5GHz testing, or custom ESP-IDF exploits, you'll quickly hit a wall.</p>
<h2 id="heading-enter-the-esp32-ecosystem">Enter the ESP32 Ecosystem</h2>
<p>The DIY hardware community has been quietly building something remarkable around ESP32, CC1101, and M5Stack boards. We're talking about "Flipper-class" setups, sometimes even better ones, for a fraction of the cost.</p>
<p>Here's what caught my attention:</p>
<p><strong>ESP32Marauder</strong> - A WiFi Swiss Army knife that handles deauth attacks, PMKID/EAPOL capture, and beacon spam. It runs on inexpensive ESP32 boards and gives you capabilities that would otherwise require expensive WiFi Pineapple hardware.</p>
<p><strong>Ghost_ESP</strong> - Takes things further with 5GHz support and WPA3 flood testing on ESP32-C5 hardware. This is territory Flipper Zero doesn't touch without significant modifications.</p>
<p><strong>Bruce Firmware</strong> - This one's ambitious. Running on M5Stack Cardputer or T-Embed devices, it combines WiFi, BLE, Sub-GHz, NFC, IR, and BadUSB, plus a JavaScript engine for automation. Add a CC1101 module and PN532 NFC board, and you've essentially built your own Flipper Zero with better WiFi capabilities.</p>
<p><strong>EvilCrow RF</strong> - Focused purely on RF, the V2 version uses dual CC1101 transceivers for more sophisticated Sub-GHz experiments like rolljam attacks. More flexible than Flipper's stock firmware for RF research.</p>
<h2 id="heading-the-economics-are-compelling">The Economics Are Compelling</h2>
<p>Let's talk numbers. A Flipper Zero costs $199-299. A basic ESP32 dev board? $15-30. An M5Stack Cardputer with all the modules to match Flipper's capabilities? Around $60-90 total. Even if you build multiple specialized devices (one for WiFi, one for RF, one for NFC), you're still spending less than a single Flipper Zero.</p>
<p>This isn't about cheap knockoffs. These are purpose-built tools that often exceed Flipper's capabilities in their respective domains.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765752036538/33551f27-bd83-4016-a2ec-2cbe7f8b65e2.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p>I'm not saying Flipper Zero is obsolete. Far from it. If you want an all-in-one device with excellent build quality, great documentation, and a thriving community, Flipper Zero is still the best choice. It's plug-and-play security research.</p>
<p>But if you're comfortable with DIY electronics, if you want to deeply understand how these systems work, or if you need specific capabilities like advanced WiFi pentesting, the ESP32 ecosystem is incredibly powerful.</p>
<h2 id="heading-what-im-building">What I'm Building</h2>
<p>Over the next few posts, I'll be documenting my journey building a DIY security research lab using ESP32 hardware. I want to understand:</p>
<ul>
<li><p>How these tools compare in real-world scenarios</p>
</li>
<li><p>What the trade-offs are between convenience and capability</p>
</li>
<li><p>How to build a modular testing setup that's both powerful and portable</p>
</li>
<li><p>Whether the DIY approach actually delivers on its promise</p>
</li>
</ul>
<p>The goal is to explore what's possible when you combine modern microcontrollers, open-source firmware, and a willingness to get your hands dirty.</p>
<h2 id="heading-next-steps">Next Steps</h2>
<p>In the next post, I'll break down how to choose the right hardware stack for your needs.</p>
<p>I'll compare specific use cases: WiFi pentesting, RF research, NFC/RFID work, and multi-protocol scenarios. We'll look at what works, what doesn't, and what you should actually build.</p>
<p><em>Important Note: All the tools and techniques discussed in this series are for educational purposes and authorized security testing only. Always get explicit permission before testing any systems or networks you don't own.</em></p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><a target="_blank" href="https://flipper.net">Flipper Zero Official Site</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/justcallmekoko/ESP32Marauder">ESP32Marauder GitHub</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/Spooks4576/Ghost_ESP">Ghost_ESP Releases</a></p>
</li>
<li><p><a target="_blank" href="https://bruce.computer">Bruce Firmware</a></p>
</li>
<li><p><a target="_blank" href="https://www.bordergate.co.uk/evilcrowrf-v2/">EvilCrow RF Information</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Building a GitHub Dashboard on ESP32 E-Paper]]></title><description><![CDATA[I built this mostly because playing with hardware is fun. There's something satisfying about making a physical thing that sits on your desk and actually does something. Plus, let's be honest - having a little e-paper display showing live GitHub stats...]]></description><link>https://blog.niradler.com/building-a-github-dashboard-on-esp32-e-paper</link><guid isPermaLink="true">https://blog.niradler.com/building-a-github-dashboard-on-esp32-e-paper</guid><category><![CDATA[arduino]]></category><category><![CDATA[mcp]]></category><category><![CDATA[ESP32]]></category><category><![CDATA[LılyGo]]></category><category><![CDATA[GitHub API]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Tue, 18 Nov 2025 23:24:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763564602703/6b4516f0-bfd0-4cd6-928a-76a1611cff2c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I built this mostly because playing with hardware is fun. There's something satisfying about making a physical thing that sits on your desk and actually does something. Plus, let's be honest - having a little e-paper display showing live GitHub stats makes your desk look way more interesting.</p>
<p>Does it solve a problem? Kind of. I can glance at my notifications and contribution stats without opening a browser. But that's the excuse I gave myself. The real reason was wanting to build something cool with an e-paper display and ESP32.</p>
<h2 id="heading-screens">Screens</h2>
<p>Press the button on GPIO 39 to cycle through:</p>
<ol>
<li><p><strong>Notifications</strong> - Reviews, mentions, assignments</p>
</li>
<li><p><strong>Profile</strong> - Repos, stars, open PRs, followers</p>
</li>
<li><p><strong>PR Overview</strong> - Open, Waiting to review, Ready to merge, Request changes</p>
</li>
</ol>
<p>Each screen fetches fresh data when you switch to it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506858274/93307ebe-527a-4486-8140-8a92dc524dad.jpeg" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506866680/6f13df5f-c80f-445d-acad-7a18df806901.jpeg" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506873050/43055dd7-a97a-4c02-a726-83e72fdc327f.jpeg" alt /></p>
<h2 id="heading-hardware">Hardware</h2>
<p><strong>LILYGO T5 V2.3.1</strong> (~$15-20) - ESP32 with 2.13" e-paper and two built-in buttons:</p>
<ul>
<li><p>GPIO 39: Cycle screens</p>
</li>
<li><p>GPIO 0: Force refresh</p>
</li>
</ul>
<p>That's it. Flash and go.</p>
<h2 id="heading-how-it-works">How It Works</h2>
<p><strong>Data fetching:</strong></p>
<ul>
<li><p>REST API for notifications</p>
</li>
<li><p>GraphQL for profile/activity (smaller payloads)</p>
</li>
<li><p>Only fetches active screen data</p>
</li>
<li><p>Only refreshes display when data changes</p>
</li>
</ul>
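<p>The "only refreshes display when data changes" rule can be sketched as a per-screen comparison against the last payload that was drawn. This is a minimal illustration in TypeScript, not the firmware's actual code (the firmware is an Arduino sketch); the <code>lastDrawn</code> store, the <code>shouldRedraw</code> name, and the screen keys are assumptions made for the example.</p>
<pre><code class="lang-typescript">// Last payload drawn for each screen, kept as a JSON string (hypothetical store).
const lastDrawn: { [screen: string]: string } = {};

// Returns true only when the new data differs from what is already on the display.
function shouldRedraw(screen: string, data: unknown): boolean {
  const serialized = JSON.stringify(data);
  if (lastDrawn[screen] === serialized) {
    return false; // Same content: skip the slow e-paper refresh.
  }
  lastDrawn[screen] = serialized;
  return true;
}
</code></pre>
<p>On a device that deep-sleeps between updates, the stored value would have to survive the sleep, for example in RTC memory. How the real firmware handles that isn't covered here.</p>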
<p><strong>Power:</strong></p>
<ul>
<li><p>Deep sleep between updates (default 10 min)</p>
</li>
<li><p>Wakes on timer or button press</p>
</li>
<li><p>Web server stays up for 30 seconds after boot for config access</p>
</li>
</ul>
<h2 id="heading-web-interface">Web Interface</h2>
<p>First boot creates AP: "NotificationHub" (password: <code>configure</code>)<br />Connect and go to <code>192.168.4.1</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506896772/d9d701a3-3408-4718-b04f-686530807c94.png" alt /></p>
<p>Four tabs:</p>
<ul>
<li><p><strong>Dashboard</strong> - Status, refresh button</p>
</li>
<li><p><strong>WiFi</strong> - Network config, admin password</p>
</li>
<li><p><strong>Providers</strong> - GitHub token and username</p>
</li>
<li><p><strong>Settings</strong> - Update interval</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506901536/bab16d42-1bc4-4391-8cf9-31435a7f44f8.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506916311/fa3744ad-3551-4d3b-a300-08e0d5e9b8ed.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763506906405/e61fb79f-855b-4eb1-a83a-3e28cc3a977a.png" alt /></p>
<h2 id="heading-github-token">GitHub Token</h2>
<p>Generate a classic personal access token with these scopes:</p>
<ul>
<li><p><code>notifications</code></p>
</li>
<li><p><code>read:user</code></p>
</li>
</ul>
<p>Add it in the Providers tab.</p>
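<p>For reference, here is roughly what an authenticated request looks like with that token. It's a sketch in TypeScript, not the firmware's code: the helper names and the exact GraphQL fields are assumptions chosen for illustration, while the endpoints and the <code>Authorization</code> header are standard GitHub API usage.</p>
<pre><code class="lang-typescript">// Headers GitHub expects for token-authenticated API calls.
function authHeaders(token: string) {
  return {
    Authorization: "Bearer " + token,
    Accept: "application/vnd.github+json",
  };
}

// REST request for notifications (this is what the notifications scope is for).
function notificationsRequest(token: string) {
  return {
    url: "https://api.github.com/notifications",
    method: "GET",
    headers: authHeaders(token),
  };
}

// GraphQL request for a few profile counters.
// The selected fields are an illustrative guess at what a profile screen needs.
function profileRequest(token: string, login: string) {
  const query =
    "query($login: String!) { user(login: $login) { followers { totalCount } repositories { totalCount } } }";
  return {
    url: "https://api.github.com/graphql",
    method: "POST",
    headers: authHeaders(token),
    body: JSON.stringify({ query: query, variables: { login: login } }),
  };
}
</code></pre>
<p>Asking GraphQL for only the counters it needs is what keeps the payload small enough for the ESP32.</p>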
<h2 id="heading-setup">Setup</h2>
<ol>
<li><p>Flash firmware</p>
</li>
<li><p>Connect to "NotificationHub" (password: <code>configure</code>)</p>
</li>
<li><p>Go to <code>192.168.4.1</code> (or whatever your network assigns)</p>
</li>
<li><p>Configure WiFi and admin password</p>
</li>
<li><p>Add GitHub token and username</p>
</li>
<li><p>Reboot</p>
</li>
</ol>
<p>Updates every 10 minutes. Press buttons for manual control.</p>
<h2 id="heading-built-with-arduino-mcp">Built with Arduino MCP</h2>
<p>Used <a target="_blank" href="https://github.com/niradler/arduino-mcp">arduino-mcp</a>, which connects Claude/Cursor to the Arduino CLI, so the build can be driven with prompts like:</p>
<pre><code class="lang-bash"><span class="hljs-string">"Compile for ESP32"</span>
<span class="hljs-string">"Upload to board"</span>
<span class="hljs-string">"Convert this PNG to C array"</span>
</code></pre>
<p>Way faster than copy-pasting commands.</p>
<p>Claude Desktop config:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"arduino-uvx"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"uvx"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"arduino-mcp"</span>],
      <span class="hljs-attr">"env"</span>: {}
    }
  }
}
</code></pre>
<h2 id="heading-api-endpoints">API Endpoints</h2>
<pre><code class="lang-bash">GET /api/status          <span class="hljs-comment"># Current status</span>
POST /api/refresh        <span class="hljs-comment"># Force refresh</span>
POST /api/reset          <span class="hljs-comment"># Factory reset</span>
</code></pre>
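<p>A small client for these endpoints might look like the sketch below. The paths and methods come from the list above; the base URL (the setup-mode address), the helper names, and the choice to return the raw response text are assumptions, since the response format isn't documented here. If the device enforces the admin password on these routes, the request would also need credentials.</p>
<pre><code class="lang-typescript">// Assumed base URL: the address used during setup. Replace with your device's IP.
const DEVICE_URL = "http://192.168.4.1";

// Endpoint table taken from the list above.
const ENDPOINTS = {
  status: { method: "GET", path: "/api/status" },
  refresh: { method: "POST", path: "/api/refresh" },
  reset: { method: "POST", path: "/api/reset" },
};

// Builds the URL and method for one endpoint without sending anything.
function buildRequest(name: keyof typeof ENDPOINTS, baseUrl: string = DEVICE_URL) {
  const endpoint = ENDPOINTS[name];
  return { url: baseUrl + endpoint.path, method: endpoint.method };
}

// Sends the request and returns the body as text, throwing on HTTP errors.
async function callDevice(name: keyof typeof ENDPOINTS, baseUrl: string = DEVICE_URL) {
  const request = buildRequest(name, baseUrl);
  const response = await fetch(request.url, { method: request.method });
  if (!response.ok) {
    throw new Error(name + " failed with HTTP " + response.status);
  }
  return response.text();
}
</code></pre>
<p>For example, <code>callDevice("refresh")</code> should force a refresh, much like pressing the GPIO 0 button.</p>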
<h2 id="heading-memory-handling">Memory Handling</h2>
<p>ESP32 RAM is tight, so the firmware relies on:</p>
<ul>
<li><p>Paged API requests</p>
</li>
<li><p>Dynamic JSON buffers</p>
</li>
<li><p>Independent screen state</p>
</li>
<li><p>Display hibernation during sleep</p>
</li>
</ul>
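<p>The paged-request idea, sketched in TypeScript for readability (the firmware itself is Arduino code): fetch a small page, use it, drop it, and move on, so only one page is ever held in memory. <code>per_page</code> and <code>page</code> are GitHub's REST pagination parameters; the <code>countItems</code> helper, the injected <code>fetchPage</code> function, and the page sizes are assumptions for the example.</p>
<pre><code class="lang-typescript">const NOTIFICATIONS_URL = "https://api.github.com/notifications";

// A function that fetches one page and returns its parsed items.
// Injected so the loop can run against real HTTP or a fake.
interface PageFetcher {
  (url: string): unknown[];
}

// Builds the URL for one page of results.
function pageUrl(page: number, perPage: number): string {
  const query = new URLSearchParams({ per_page: String(perPage), page: String(page) });
  return NOTIFICATIONS_URL + "?" + query.toString();
}

// Counts items across pages while holding only one page at a time.
function countItems(fetchPage: PageFetcher, perPage: number, maxPages: number): number {
  let total = 0;
  let page = 1;
  while (true) {
    const items = fetchPage(pageUrl(page, perPage));
    total += items.length;
    // A short page means nothing is left; maxPages caps the work either way.
    if (items.length !== perPage || page === maxPages) {
      break;
    }
    page += 1;
  }
  return total;
}
</code></pre>
<p>A smaller <code>perPage</code> means more requests but a smaller peak buffer, which is usually the right trade on a board like this.</p>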
<h2 id="heading-whats-next">What's Next</h2>
<p>Could add:</p>
<ul>
<li><p>GitLab/Bitbucket support</p>
</li>
<li><p>Larger displays (4.2" or 7.5")</p>
</li>
<li><p>Historical graphs</p>
</li>
<li><p>Battery indicator</p>
</li>
<li><p>Webhooks for instant updates</p>
</li>
<li><p>3D printed <a target="_blank" href="https://www.printables.com/model/412141-lilygo-ttgo-t5-213-case">case</a></p>
</li>
</ul>
<h2 id="heading-stack">Stack</h2>
<p><strong>Hardware:</strong> <a target="_blank" href="https://lilygo.cc/products/t5-2-13inch-e-paper">LILYGO T5 V2.3.1</a></p>
<p><strong>Software:</strong> ESP32 Arduino, GxEPD2, ArduinoJson</p>
<p><strong>APIs:</strong> GitHub REST + GraphQL, NTP</p>
<h2 id="heading-links">Links</h2>
<p><strong>Code:</strong> <a target="_blank" href="https://github.com/niradler/github-dashboard-esp32-epaper">github.com/niradler/github-dashboard-esp32-epaper</a></p>
<p><strong>Arduino MCP:</strong> <a target="_blank" href="https://github.com/niradler/arduino-mcp">github.com/niradler/arduino-mcp</a></p>
<p><strong>License:</strong> MIT | <strong>Cost:</strong> ~$15-20 | <strong>Time:</strong> 1-2 hours</p>
]]></content:encoded></item><item><title><![CDATA[MCP Conductor: The Advantages of Equipping Your Agent with Code Execution Powers]]></title><description><![CDATA[TL;DR: MCP Conductor gives AI models a secure Deno sandbox to run TypeScript/JavaScript code with three powerful integration options: call MCP servers via mcpFactory, shell out to CLI tools (if permitted), or import npm/JSR packages directly.
This fl...]]></description><link>https://blog.niradler.com/mcp-conductor-the-advantages-of-equipping-your-agent-with-code-execution-powers</link><guid isPermaLink="true">https://blog.niradler.com/mcp-conductor-the-advantages-of-equipping-your-agent-with-code-execution-powers</guid><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[agentic ai development]]></category><category><![CDATA[#anthropic]]></category><category><![CDATA[code]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Tue, 11 Nov 2025 18:58:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762887452705/40f73f46-0f69-4149-9841-5be2eb0fdb2c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR:</strong> MCP Conductor gives AI models a secure Deno sandbox to run TypeScript/JavaScript code with three powerful integration options: call MCP servers via <code>mcpFactory</code>, shell out to CLI tools (if permitted), or import npm/JSR packages directly.</p>
<p>This flexibility lets you orchestrate, for example, GitHub through its MCP server, the GitHub CLI, and the Octokit SDK in a single execution, choosing the right tool for each subtask while keeping the model's context clean.</p>
<hr />
<h2 id="heading-the-problem-when-direct-tool-calls-hit-their-limits">The Problem: When Direct Tool Calls Hit Their Limits</h2>
<p>The Model Context Protocol excels at giving AI models structured access to external systems. Want to query GitHub? Use the GitHub MCP server. Need to process files? Use the filesystem MCP server. This works beautifully until it doesn't.</p>
<p><strong>Token waste from intermediate data.</strong> Fetch a 50KB document to extract three numbers? The entire document flows through the model's context. Query an API that returns 200 results when you need 5? All 200 results consume tokens.</p>
<p><strong>Sequential execution bottlenecks.</strong> Need data from three APIs? Make three separate MCP tool calls, waiting for each to complete before starting the next, even when they're independent operations.</p>
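<p>To make the cost concrete, here is a small sketch of the difference. <code>fakeApi</code> is a hypothetical stand-in for an independent API call (a timer, not a real MCP tool); the point is only that three 50 ms calls take roughly 150 ms when awaited one by one and roughly 50 ms when started together with <code>Promise.all</code>.</p>
<pre><code class="lang-typescript">// Resolves after the given number of milliseconds.
function sleep(ms: number) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// Hypothetical stand-in for one independent API call.
async function fakeApi(name: string, delayMs: number) {
  await sleep(delayMs);
  return name + " done";
}

// One after another: total time is the sum of the delays.
async function sequential() {
  const issues = await fakeApi("issues", 50);
  const pulls = await fakeApi("pulls", 50);
  const releases = await fakeApi("releases", 50);
  return [issues, pulls, releases];
}

// Started together: total time is roughly that of the slowest call.
async function parallel() {
  return Promise.all([
    fakeApi("issues", 50),
    fakeApi("pulls", 50),
    fakeApi("releases", 50),
  ]);
}
</code></pre>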
<p><strong>Limited data transformation.</strong> MCP servers return raw data. If you need to filter, aggregate, or transform it, the model must either do it in-context (expensive) or make additional tool calls (slow).</p>
<p><strong>Rigid interfaces.</strong> Sometimes the MCP server doesn't expose exactly what you need. Maybe you need a complex Git operation that requires multiple commands, or you want to use a specific npm package that's perfect for the job.</p>
<p>Consider automating a GitHub workflow:</p>
<ol>
<li><p>Check open PRs (MCP tool call → 10,000 tokens of PR data)</p>
</li>
<li><p>For each PR, check CI status (5 more MCP calls → 15,000 tokens)</p>
</li>
<li><p>Filter to PRs with passing CI (model does this in context)</p>
</li>
<li><p>Generate a summary report (more context processing)</p>
</li>
</ol>
<p>Total: 30,000+ tokens, 3-4 seconds of sequential execution, and the model is drowning in raw API responses.</p>
<hr />
<h2 id="heading-solution-execute-code-not-just-tools">Solution: Execute Code, Not Just Tools</h2>
<p>MCP Conductor provides a single MCP tool, <code>run_deno_code</code>, that executes TypeScript/JavaScript in a secure Deno sandbox. But here's what makes it powerful: <strong>it gives you three ways to integrate with external systems, and you can mix them freely:</strong></p>
<h3 id="heading-1-call-mcp-servers-via-mcpfactory-reuse-the-ecosystem">1. Call MCP Servers via <code>mcpFactory</code> (Reuse the Ecosystem)</h3>
<p>Want to use existing MCP servers? The global <code>mcpFactory</code> object is auto-injected into your code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Connect to GitHub MCP server</span>
<span class="hljs-keyword">const</span> github = <span class="hljs-keyword">await</span> mcpFactory.load(<span class="hljs-string">'github'</span>);

<span class="hljs-comment">// Call its tools directly from code</span>
<span class="hljs-keyword">const</span> prs = <span class="hljs-keyword">await</span> github.callTool(<span class="hljs-string">'list_pull_requests'</span>, {
  repo: <span class="hljs-string">'myorg/myrepo'</span>,
  state: <span class="hljs-string">'open'</span>
});

<span class="hljs-comment">// Process in execution environment - no tokens wasted</span>
<span class="hljs-keyword">const</span> passingPRs = prs.filter(<span class="hljs-function"><span class="hljs-params">pr</span> =&gt;</span> pr.ci_status === <span class="hljs-string">'passing'</span>);

<span class="hljs-comment">// Only final result returns to model</span>
<span class="hljs-keyword">return</span> { count: passingPRs.length, titles: passingPRs.map(<span class="hljs-function"><span class="hljs-params">pr</span> =&gt;</span> pr.title) };
</code></pre>
<p><strong>Benefit:</strong> Leverage the entire MCP ecosystem (100+ community servers) while processing data efficiently.</p>
<h3 id="heading-2-shell-out-to-cli-tools-maximum-flexibility">2. Shell Out to CLI Tools (Maximum Flexibility)</h3>
<p>Sometimes CLI tools are the best interface. With <code>--allow-run</code> permission, spawn subprocesses directly:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Use GitHub CLI for complex operations</span>
<span class="hljs-keyword">const</span> ghProcess = <span class="hljs-keyword">new</span> Deno.Command(<span class="hljs-string">'gh'</span>, {
  args: [<span class="hljs-string">'pr'</span>, <span class="hljs-string">'list'</span>, <span class="hljs-string">'--json'</span>, <span class="hljs-string">'number,title,state'</span>, <span class="hljs-string">'--limit'</span>, <span class="hljs-string">'100'</span>],
  stdout: <span class="hljs-string">'piped'</span>
});

<span class="hljs-keyword">const</span> { stdout } = <span class="hljs-keyword">await</span> ghProcess.output();
<span class="hljs-keyword">const</span> prs = <span class="hljs-built_in">JSON</span>.parse(<span class="hljs-keyword">new</span> TextDecoder().decode(stdout));

<span class="hljs-comment">// Use git CLI for repository operations</span>
<span class="hljs-keyword">const</span> gitLog = <span class="hljs-keyword">new</span> Deno.Command(<span class="hljs-string">'git'</span>, {
  args: [<span class="hljs-string">'log'</span>, <span class="hljs-string">'--oneline'</span>, <span class="hljs-string">'-10'</span>],
  stdout: <span class="hljs-string">'piped'</span>
});

<span class="hljs-keyword">const</span> commits = <span class="hljs-keyword">new</span> TextDecoder().decode((<span class="hljs-keyword">await</span> gitLog.output()).stdout);

<span class="hljs-keyword">return</span> { prs: prs.length, recentCommits: commits };
</code></pre>
<p><strong>Benefit:</strong> Full access to mature CLI tools and their rich feature sets.</p>
<h3 id="heading-3-import-npmjsr-packages-direct-api-access">3. Import npm/JSR Packages (Direct API Access)</h3>
<p>Need fine-grained control or want to use specialized packages? Import them directly:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { Octokit } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:@octokit/rest@^20'</span>;

<span class="hljs-keyword">const</span> octokit = <span class="hljs-keyword">new</span> Octokit({ auth: Deno.env.get(<span class="hljs-string">'GITHUB_TOKEN'</span>) });

<span class="hljs-comment">// Use full Octokit SDK</span>
<span class="hljs-keyword">const</span> { data: prs } = <span class="hljs-keyword">await</span> octokit.pulls.list({
  owner: <span class="hljs-string">'myorg'</span>,
  repo: <span class="hljs-string">'myrepo'</span>,
  state: <span class="hljs-string">'open'</span>,
  per_page: <span class="hljs-number">100</span>
});

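<span class="hljs-comment">// Count non-draft PRs per author (pr.user can be null for deleted accounts)</span>
<span class="hljs-keyword">const</span> analysis = prs
  .filter(<span class="hljs-function"><span class="hljs-params">pr</span> =&gt;</span> pr.draft === <span class="hljs-literal">false</span>)
  .reduce(<span class="hljs-function">(<span class="hljs-params">acc, pr</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> login = pr.user?.login ?? <span class="hljs-string">'unknown'</span>;
    acc[login] = (acc[login] || <span class="hljs-number">0</span>) + <span class="hljs-number">1</span>;
    <span class="hljs-keyword">return</span> acc;
  }, {} <span class="hljs-keyword">as</span> Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">number</span>&gt;);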

<span class="hljs-keyword">return</span> { totalPRs: prs.length, prsByAuthor: analysis };
</code></pre>
<p><strong>Benefit:</strong> Direct SDK access with full TypeScript support and comprehensive APIs.</p>
<hr />
<h2 id="heading-the-power-of-three-options-a-real-example">The Power of Three Options: A Real Example</h2>
<p>Here's what makes MCP Conductor powerful: <strong>you can mix all three approaches</strong> in a single execution:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Use MCP server for discovery (good for structured MCP operations)</span>
<span class="hljs-keyword">const</span> github = <span class="hljs-keyword">await</span> mcpFactory.load(<span class="hljs-string">'github'</span>);
<span class="hljs-keyword">const</span> repos = <span class="hljs-keyword">await</span> github.callTool(<span class="hljs-string">'list_repositories'</span>, { 
  org: <span class="hljs-string">'myorg'</span> 
});

<span class="hljs-comment">// Use CLI for complex git operations (best tool for the job)</span>
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> repo <span class="hljs-keyword">of</span> repos.slice(<span class="hljs-number">0</span>, <span class="hljs-number">5</span>)) {
  <span class="hljs-keyword">const</span> clone = <span class="hljs-keyword">new</span> Deno.Command(<span class="hljs-string">'git'</span>, {
    args: [<span class="hljs-string">'clone'</span>, <span class="hljs-string">'--depth'</span>, <span class="hljs-string">'1'</span>, repo.clone_url, <span class="hljs-string">`/tmp/<span class="hljs-subst">${repo.name}</span>`</span>],
  });
  <span class="hljs-keyword">await</span> clone.output();
}

<span class="hljs-comment">// Use npm package for analysis (specialized functionality)</span>
<span class="hljs-keyword">import</span> { analyze } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:code-complexity@^2'</span>;

<span class="hljs-keyword">const</span> results = [];
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> repo <span class="hljs-keyword">of</span> repos.slice(<span class="hljs-number">0</span>, <span class="hljs-number">5</span>)) {
  <span class="hljs-keyword">const</span> stats = <span class="hljs-keyword">await</span> analyze(<span class="hljs-string">`/tmp/<span class="hljs-subst">${repo.name}</span>`</span>);
  results.push({ repo: repo.name, complexity: stats.average });
}

<span class="hljs-comment">// Return only the summary</span>
<span class="hljs-keyword">return</span> results.sort(<span class="hljs-function">(<span class="hljs-params">a, b</span>) =&gt;</span> b.complexity - a.complexity);
</code></pre>
<p><strong>Why this matters:</strong> You choose the best tool for each subtask:</p>
<ul>
<li><p>MCP server for structured operations</p>
</li>
<li><p>CLI for mature tooling</p>
</li>
<li><p>npm packages for specialized logic</p>
</li>
<li><p>Process everything in the sandbox</p>
</li>
<li><p>Return only what the model needs</p>
</li>
</ul>
<hr />
<h2 id="heading-how-it-actually-works">How It Actually Works</h2>
<h3 id="heading-1-configuration">1. Configuration</h3>
<p>Configure MCP Conductor in your MCP client (Cursor, Claude Desktop, etc.):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"mcp-conductor"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"deno"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"run"</span>, <span class="hljs-string">"--allow-read"</span>, <span class="hljs-string">"--allow-write"</span>, <span class="hljs-string">"--allow-net"</span>, 
               <span class="hljs-string">"--allow-env"</span>, <span class="hljs-string">"--allow-run=deno"</span>, <span class="hljs-string">"jsr:@conductor/mcp"</span>, <span class="hljs-string">"stdio"</span>],
      <span class="hljs-attr">"env"</span>: {
        <span class="hljs-attr">"MCP_CONDUCTOR_WORKSPACE"</span>: <span class="hljs-string">"${userHome}/.mcp-conductor/workspace"</span>,
        <span class="hljs-attr">"MCP_CONDUCTOR_RUN_ARGS"</span>: <span class="hljs-string">"allow-read=/workspace,allow-write=/workspace,allow-net,allow-run=gh,git"</span>,
        <span class="hljs-attr">"MCP_CONDUCTOR_MCP_CONFIG"</span>: <span class="hljs-string">"${userHome}/.mcp-conductor/mcp.json"</span>
      }
    }
  }
}
</code></pre>
<h3 id="heading-2-optional-configure-mcp-servers-for-proxy">2. Optional: Configure MCP Servers for Proxy</h3>
<p>Create <code>~/.mcp-conductor/mcp.json</code> to expose MCP servers to executed code:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"github"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"-y"</span>, <span class="hljs-string">"@modelcontextprotocol/server-github"</span>],
      <span class="hljs-attr">"env"</span>: {
        <span class="hljs-attr">"GITHUB_PERSONAL_ACCESS_TOKEN"</span>: <span class="hljs-string">"ghp_..."</span>
      }
    },
    <span class="hljs-attr">"filesystem"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"-y"</span>, <span class="hljs-string">"@modelcontextprotocol/server-filesystem"</span>, <span class="hljs-string">"/allowed/path"</span>]
    }
  }
}
</code></pre>
<h3 id="heading-3-model-writes-code">3. Model Writes Code</h3>
<p>The model uses the <code>run_deno_code</code> tool:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"tool"</span>: <span class="hljs-string">"run_deno_code"</span>,
  <span class="hljs-attr">"arguments"</span>: {
    <span class="hljs-attr">"deno_code"</span>: <span class="hljs-string">"const github = await mcpFactory.load('github');\nconst prs = await github.callTool('list_pull_requests', { repo: 'myorg/myrepo' });\nreturn prs.filter(pr =&gt; pr.state === 'open').length;"</span>,
    <span class="hljs-attr">"timeout"</span>: <span class="hljs-number">30000</span>
  }
}
</code></pre>
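<p>To make the payload shape concrete, here is a minimal sketch of how a client might assemble that call from a multi-line script. The helper name <code>buildRunDenoCodeCall</code> and its default timeout are illustrative assumptions, not part of MCP Conductor; only the <code>tool</code>, <code>deno_code</code>, and <code>timeout</code> fields come from the example above.</p>

```typescript
// Hypothetical helper: wraps a script in the payload shape shown above.
interface RunDenoCodeCall {
  tool: "run_deno_code";
  arguments: { deno_code: string; timeout: number };
}

function buildRunDenoCodeCall(lines: string[], timeoutMs = 30000): RunDenoCodeCall {
  return {
    tool: "run_deno_code",
    // Lines are joined with "\n"; JSON.stringify escapes them when serialized.
    arguments: { deno_code: lines.join("\n"), timeout: timeoutMs },
  };
}

const call = buildRunDenoCodeCall([
  "const github = await mcpFactory.load('github');",
  "const prs = await github.callTool('list_pull_requests', { repo: 'myorg/myrepo' });",
  "return prs.filter(pr => pr.state === 'open').length;",
]);

console.log(JSON.stringify(call, null, 2));
```

<p>Building the script as an array of lines keeps it readable in the client while the serialized JSON still carries a single escaped string.</p>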
<h3 id="heading-4-secure-execution">4. Secure Execution</h3>
<p>Conductor spawns a <strong>fresh Deno subprocess</strong> for each execution:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762815692565/ce968db6-4a3c-491b-8b69-39dee0aba91e.png" alt class="image--center mx-auto" /></p>
<p><strong>Key security features:</strong></p>
<ul>
<li><p>Zero permissions by default</p>
</li>
<li><p>Admin-controlled via <code>MCP_CONDUCTOR_RUN_ARGS</code></p>
</li>
<li><p>Fresh process per execution (no state leakage)</p>
</li>
<li><p><code>--no-prompt</code> flag prevents permission escalation</p>
</li>
<li><p><code>--cached-only</code> by default (no dynamic package fetching)</p>
</li>
</ul>
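<p>As a rough illustration of how those defaults and the admin-granted permissions could combine into one <code>deno run</code> command line, consider the sketch below. The function names and the parsing rule (a bare token such as <code>git</code> continues the previous flag's value list) are assumptions made for this example; MCP Conductor's actual parsing may differ.</p>

```typescript
// Sketch only: MCP Conductor's real parsing may differ.
// Turns "allow-read=/workspace,allow-net,allow-run=gh,git" into Deno CLI flags.
function parseRunArgs(runArgs: string): string[] {
  const flags: string[] = [];
  for (const raw of runArgs.split(",")) {
    const token = raw.trim();
    if (token === "") continue;
    if (token.startsWith("allow-") || token.startsWith("deny-")) {
      // A new permission flag begins here.
      flags.push("--" + token);
    } else if (flags.length > 0) {
      // A bare token such as "git" extends the previous flag's value list.
      flags[flags.length - 1] += "," + token;
    } else {
      throw new Error("Value without a preceding permission flag: " + token);
    }
  }
  return flags;
}

function buildDenoArgs(runArgs: string, scriptPath: string): string[] {
  // Safe defaults come first; admin-granted permissions follow.
  return ["run", "--no-prompt", "--cached-only", ...parseRunArgs(runArgs), scriptPath];
}

const args = buildDenoArgs(
  "allow-read=/workspace,allow-write=/workspace,allow-net,allow-run=gh,git",
  "/workspace/script.ts",
);

console.log(args.join(" "));
```

<p>Because the model never supplies these flags, nothing in the submitted code can widen the permission set.</p>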
<h3 id="heading-5-return-results">5. Return Results</h3>
<p>Only the final expression/return value goes back to the model:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Execution processes 50KB of data internally</span>
<span class="hljs-comment">// Model receives 200 bytes of summary</span>
{
  <span class="hljs-string">"content"</span>: [
    {
      <span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, 
      <span class="hljs-string">"text"</span>: <span class="hljs-string">"Execution successful\nReturn value: {\"openPRs\": 12, \"passingCI\": 8}"</span>
    }
  ]
}
</code></pre>
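<p>The sketch below illustrates the idea with made-up data: a large intermediate payload stays inside the sandbox and only a small summary is wrapped into the response. The <code>toToolResult</code> helper and the <code>ToolResult</code> interface are hypothetical, modeled on the response shape shown above rather than taken from MCP Conductor's source.</p>

```typescript
// Hypothetical shape, modeled on the response shown above.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

function toToolResult(returnValue: unknown): ToolResult {
  // Only the serialized return value is sent back, not the intermediate data.
  const text = "Execution successful\nReturn value: " + JSON.stringify(returnValue);
  return { content: [{ type: "text", text }] };
}

// Stand-in for a large intermediate payload processed inside the sandbox.
const rawPrs = Array.from({ length: 12 }, (_, i) => ({
  number: i + 1,
  body: "x".repeat(4000),
  ci: i % 3 === 0 ? "failing" : "passing",
}));

const summary = {
  openPRs: rawPrs.length,
  passingCI: rawPrs.filter((pr) => pr.ci === "passing").length,
};
const result = toToolResult(summary);

console.log(JSON.stringify(rawPrs).length, "characters processed in the sandbox");
console.log(result.content[0].text.length, "characters returned to the model");
```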
<hr />
<h2 id="heading-why-this-approach-works">Why This Approach Works</h2>
<h3 id="heading-1-maximum-flexibility">1. Maximum Flexibility</h3>
<p>You're not locked into one integration method:</p>
<p><strong>Scenario: GitHub Automation</strong></p>
<ul>
<li><p>Use GitHub MCP server for straightforward operations</p>
</li>
<li><p>Use GitHub CLI for complex git workflows</p>
</li>
<li><p>Use Octokit SDK when you need fine-grained API control</p>
</li>
<li><p>Mix them in the same execution as needed</p>
</li>
</ul>
<h3 id="heading-2-token-efficiency">2. Token Efficiency</h3>
<p>Process data in the execution environment, not in the model's context:</p>
<p><strong>Before (Direct MCP calls):</strong></p>
<ul>
<li><p>Fetch 50KB document → 50KB in context</p>
</li>
<li><p>Extract 3 metrics → Model does this (more tokens)</p>
</li>
<li><p><strong>Total:</strong> ~15,000 tokens</p>
</li>
</ul>
<p><strong>After (Code execution):</strong></p>
<ul>
<li><p>Fetch + process in sandbox → 3 metrics returned</p>
</li>
<li><p><strong>Total:</strong> ~500 tokens</p>
</li>
</ul>
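<p>For a back-of-the-envelope check of numbers like these, a rough estimate is enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English text and not an exact tokenizer, and uses a generated stand-in document. The helper names and the metric values are illustrative only.</p>

```typescript
// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Percentage reduction, rounded to one decimal place.
function percentSaved(before: number, after: number): number {
  return Math.round((1 - after / before) * 1000) / 10;
}

// Stand-in for a 50KB document fetched through a direct MCP call.
const rawDocument = "lorem ipsum ".repeat(4267).slice(0, 51200);
// Stand-in for the three metrics a sandboxed script would return instead.
const metrics = JSON.stringify({ words: 8534, sections: 12, readingMinutes: 43 });

console.log("raw document:", estimateTokens(rawDocument), "tokens (estimated)");
console.log("summary:", estimateTokens(metrics), "tokens (estimated)");
console.log("saving for 15,000 vs 500 tokens:", percentSaved(15000, 500), "%");
```

<p>The raw-document estimate covers only the fetched text; the extraction work the model does in context adds further tokens on top.</p>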
<h3 id="heading-3-parallel-execution-your-code-your-control">3. Parallel Execution (Your Code, Your Control)</h3>
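<pre><code class="lang-typescript"><span class="hljs-comment">// Model writes parallel execution logic</span>
<span class="hljs-keyword">const</span> github = <span class="hljs-keyword">await</span> mcpFactory.load(<span class="hljs-string">'github'</span>);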
<span class="hljs-keyword">const</span> [repos, issues, prs] = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all([
  github.callTool(<span class="hljs-string">'list_repositories'</span>, { org: <span class="hljs-string">'myorg'</span> }),
  github.callTool(<span class="hljs-string">'search_issues'</span>, { query: <span class="hljs-string">'is:open'</span> }),
  github.callTool(<span class="hljs-string">'list_pull_requests'</span>, { state: <span class="hljs-string">'open'</span> })
]);

<span class="hljs-comment">// Up to ~3x faster than three sequential MCP calls of similar latency</span>
<span class="hljs-keyword">return</span> { repoCount: repos.length, openIssues: issues.length, openPRs: prs.length };
</code></pre>
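<p>How much <code>Promise.all</code> helps depends on how evenly latency is spread across the calls: the sequential cost is the sum of the latencies, while the parallel cost is roughly the slowest one. The small worked example below uses hypothetical latencies and ignores scheduling overhead.</p>

```typescript
// Wall-clock estimate for independent calls, ignoring scheduling overhead.
function sequentialMs(latencies: number[]): number {
  return latencies.reduce((sum, ms) => sum + ms, 0);
}

function parallelMs(latencies: number[]): number {
  // Promise.all settles when the slowest call settles.
  return Math.max(0, ...latencies);
}

function speedup(latencies: number[]): number {
  return sequentialMs(latencies) / parallelMs(latencies);
}

const evenCalls = [1000, 1000, 1000]; // hypothetical latencies in ms
const skewedCalls = [200, 300, 2500]; // one slow call dominates

console.log(speedup(evenCalls));   // 3
console.log(speedup(skewedCalls)); // 1.2
```

<p>Three calls of similar latency approach the 3x figure; a single slow call pulls the gain down toward 1x.</p>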
<h3 id="heading-4-rich-npmjsr-ecosystem">4. Rich npm/JSR Ecosystem</h3>
<p>Any package is available (if you grant <code>--allow-net</code> or pre-cache it):</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Data processing</span>
<span class="hljs-keyword">import</span> { parse } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:csv-parse@^5'</span>;
<span class="hljs-keyword">import</span> { z } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:zod@^3'</span>;

<span class="hljs-comment">// API clients</span>
<span class="hljs-keyword">import</span> { Octokit } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:@octokit/rest@^20'</span>;
<span class="hljs-keyword">import</span> Stripe <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:stripe@^14'</span>;

<span class="hljs-comment">// Utilities</span>
<span class="hljs-keyword">import</span> { format } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:date-fns@^3'</span>;
<span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> path <span class="hljs-keyword">from</span> <span class="hljs-string">'jsr:@std/path'</span>;
</code></pre>
<hr />
<h2 id="heading-security-model-admin-control-not-model-control">Security Model: Admin Control, Not Model Control</h2>
<p><strong>Critical design choice:</strong> The model writes code, but <strong>admins control all permissions</strong> via environment variables.</p>
<h3 id="heading-admin-configured-permissions">Admin-Configured Permissions</h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"env"</span>: {
    <span class="hljs-attr">"MCP_CONDUCTOR_RUN_ARGS"</span>: <span class="hljs-string">"allow-read=/workspace,allow-write=/workspace,allow-net=api.github.com,allow-run=gh,git"</span>
  }
}
</code></pre>
<h3 id="heading-two-process-isolation">Two-Process Isolation</h3>
<p><strong>Server Process</strong> (Trusted):</p>
<ul>
<li><p>Full permissions to manage workspace</p>
</li>
<li><p>Install dependencies</p>
</li>
<li><p>Spawn sandbox subprocesses</p>
</li>
</ul>
<p><strong>User Code Subprocess</strong> (Untrusted):</p>
<ul>
<li><p>Only admin-granted permissions</p>
</li>
<li><p>Fresh process (no state leakage)</p>
</li>
<li><p>Crashes don't affect server</p>
</li>
<li><p>Auto-killed after timeout</p>
</li>
</ul>
<h3 id="heading-security-by-default">Security by Default</h3>
<p>Every execution runs with:</p>
<ul>
<li><p><code>--no-prompt</code> - Fails fast, no permission dialogs</p>
</li>
<li><p><code>--cached-only</code> - No dynamic package fetching (unless admin allows)</p>
</li>
<li><p><code>--no-remote</code> - Blocks remote imports</p>
</li>
<li><p>Zero permissions + only admin-granted ones</p>
</li>
</ul>
<hr />
<h2 id="heading-real-world-example-the-github-automation-workflow">Real-World Example: The GitHub Automation Workflow</h2>
<p>Let's see all three integration methods working together:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">/**
 * Task: Generate a weekly team report from GitHub
 * - List all PRs merged this week (MCP server - simple structured call)
 * - Clone repos to analyze code quality (CLI - best tool for git operations)
 * - Calculate complexity metrics (npm package - specialized analysis)
 */</span>

<span class="hljs-comment">// 1. Use MCP server for structured GitHub operations</span>
<span class="hljs-keyword">const</span> github = <span class="hljs-keyword">await</span> mcpFactory.load(<span class="hljs-string">'github'</span>);

<span class="hljs-keyword">const</span> oneWeekAgo = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(<span class="hljs-built_in">Date</span>.now() - <span class="hljs-number">7</span> * <span class="hljs-number">24</span> * <span class="hljs-number">60</span> * <span class="hljs-number">60</span> * <span class="hljs-number">1000</span>).toISOString();

<span class="hljs-keyword">const</span> mergedPRs = <span class="hljs-keyword">await</span> github.callTool(<span class="hljs-string">'search_pull_requests'</span>, {
  query: <span class="hljs-string">`is:pr is:merged merged:&gt;<span class="hljs-subst">${oneWeekAgo}</span>`</span>,
  repo: <span class="hljs-string">'myorg/myrepo'</span>
});

<span class="hljs-comment">// 2. Use CLI for git operations (better than MCP for complex git workflows)</span>
<span class="hljs-keyword">const</span> cloneDir = <span class="hljs-string">'/workspace/analysis'</span>;
<span class="hljs-keyword">const</span> gitClone = <span class="hljs-keyword">new</span> Deno.Command(<span class="hljs-string">'git'</span>, {
  args: [<span class="hljs-string">'clone'</span>, <span class="hljs-string">'--depth'</span>, <span class="hljs-string">'1'</span>, <span class="hljs-string">'https://github.com/myorg/myrepo.git'</span>, cloneDir],
  stdout: <span class="hljs-string">'piped'</span>,
  stderr: <span class="hljs-string">'piped'</span>
});
<span class="hljs-keyword">await</span> gitClone.output();

<span class="hljs-comment">// 3. Use npm package for specialized analysis</span>
<span class="hljs-keyword">import</span> { analyze } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:code-complexity-analyzer@^2'</span>;
<span class="hljs-keyword">import</span> { format } <span class="hljs-keyword">from</span> <span class="hljs-string">'npm:date-fns@^3'</span>;

<span class="hljs-keyword">const</span> complexity = <span class="hljs-keyword">await</span> analyze(cloneDir);

<span class="hljs-comment">// 4. Generate report (all processing happens in sandbox)</span>
<span class="hljs-keyword">const</span> report = {
  week: format(<span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(), <span class="hljs-string">'MMM dd, yyyy'</span>),
  prsAnalyzed: mergedPRs.length,
  averageComplexity: complexity.average,
  highComplexityFiles: complexity.files
    .filter(<span class="hljs-function"><span class="hljs-params">f</span> =&gt;</span> f.score &gt; <span class="hljs-number">10</span>)
    .map(<span class="hljs-function"><span class="hljs-params">f</span> =&gt;</span> f.path),
  topContributors: <span class="hljs-built_in">Object</span>.entries(
    mergedPRs.reduce(<span class="hljs-function">(<span class="hljs-params">acc, pr</span>) =&gt;</span> {
      acc[pr.author] = (acc[pr.author] || <span class="hljs-number">0</span>) + <span class="hljs-number">1</span>;
      <span class="hljs-keyword">return</span> acc;
    }, {})
  )
  .sort(<span class="hljs-function">(<span class="hljs-params">[,a], [,b]</span>) =&gt;</span> b - a)
  .slice(<span class="hljs-number">0</span>, <span class="hljs-number">5</span>)
};

<span class="hljs-comment">// 5. Only the summary returns to the model (not raw PR data, git output, or analysis details)</span>
<span class="hljs-keyword">return</span> report;
</code></pre>
<p><strong>Token savings:</strong></p>
<ul>
<li><p>Without Conductor: ~50,000 tokens (all PR data, git output, analysis results in context)</p>
</li>
<li><p>With Conductor: ~800 tokens (just the summary report)</p>
</li>
</ul>
<p><strong>Execution time:</strong></p>
<ul>
<li><p>Without Conductor: ~8 seconds (sequential MCP calls)</p>
</li>
<li><p>With Conductor: ~3 seconds (one sandboxed execution, no model round trips between steps)</p>
</li>
</ul>
<hr />
<h3 id="heading-security-considerations">Security Considerations</h3>
<ul>
<li><p>Admin must carefully configure permissions</p>
</li>
<li><p>Code execution always carries risk (even in sandboxes)</p>
</li>
<li><p>Monitor executions and set reasonable timeouts</p>
</li>
<li><p>Use <code>--cached-only</code> in production (no dynamic package fetching)</p>
</li>
<li><p>Never use <code>--allow-all</code></p>
</li>
</ul>
<hr />
<h2 id="heading-getting-started">Getting Started</h2>
<h3 id="heading-1-install-no-installation-needed">1. Install (No Installation Needed!)</h3>
<p>MCP Conductor is available on JSR:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"mcp-conductor"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"deno"</span>,
      <span class="hljs-attr">"args"</span>: [
        <span class="hljs-string">"run"</span>, <span class="hljs-string">"--allow-read"</span>, <span class="hljs-string">"--allow-write"</span>, <span class="hljs-string">"--allow-net"</span>, 
        <span class="hljs-string">"--allow-env"</span>, <span class="hljs-string">"--allow-run=deno"</span>,
        <span class="hljs-string">"jsr:@conductor/mcp@^0.1"</span>, <span class="hljs-string">"stdio"</span>
      ]
    }
  }
}
</code></pre>
<h3 id="heading-2-configure-permissions">2. Configure Permissions</h3>
<p>Set admin-controlled permissions:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"env"</span>: {
    <span class="hljs-attr">"MCP_CONDUCTOR_WORKSPACE"</span>: <span class="hljs-string">"${userHome}/.mcp-conductor/workspace"</span>,
    <span class="hljs-attr">"MCP_CONDUCTOR_RUN_ARGS"</span>: <span class="hljs-string">"allow-read=/workspace,allow-write=/workspace,allow-net,allow-run=gh,git"</span>
  }
}
</code></pre>
<h3 id="heading-3-optional-configure-mcp-servers">3. Optional: Configure MCP Servers</h3>
<p>Create <code>~/.mcp-conductor/mcp-config.json</code> (or whichever path <code>MCP_CONDUCTOR_MCP_CONFIG</code> points to):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"github"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"-y"</span>, <span class="hljs-string">"@modelcontextprotocol/server-github"</span>],
      <span class="hljs-attr">"env"</span>: { <span class="hljs-attr">"GITHUB_PERSONAL_ACCESS_TOKEN"</span>: <span class="hljs-string">"..."</span> }
    }
  }
}
</code></pre>
<h3 id="heading-4-start-using">4. Start Using</h3>
<p>The model can now execute code with all three integration options available.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>MCP Conductor provides a <strong>secure Deno sandbox</strong> accessible via MCP's <code>run_deno_code</code> tool. What makes it powerful isn't complex orchestration or content-addressed storage; it's <strong>flexibility</strong>:</p>
<p><strong>Three ways to integrate:</strong></p>
<ol>
<li><p>Call MCP servers via <code>mcpFactory</code> (reuse ecosystem)</p>
</li>
<li><p>Shell out to CLI tools (use mature tooling)</p>
</li>
<li><p>Import npm/JSR packages (specialized functionality)</p>
</li>
</ol>
<p><strong>Mix them freely:</strong></p>
<ul>
<li><p>Use GitHub MCP server for simple queries</p>
</li>
<li><p>Use GitHub CLI for complex git operations</p>
</li>
<li><p>Use Octokit SDK for fine-grained API control</p>
</li>
<li><p>All in one execution, choosing the best tool for each subtask</p>
</li>
</ul>
<hr />
<h2 id="heading-sources-amp-further-reading">Sources &amp; Further Reading</h2>
<ol>
<li><p><strong>Anthropic Engineering: "Code execution with MCP: building more efficient AI agents"</strong><br /> <a target="_blank" href="https://www.anthropic.com/engineering/code-execution-with-mcp">anthropic.com/engineering/code-execution-with-mcp</a><br /> Introduces the pattern of code-based MCP orchestration for token efficiency.</p>
</li>
<li><p><strong>Pydantic:</strong> <code>mcp-run-python</code><br /> <a target="_blank" href="https://github.com/pydantic/mcp-run-python">github.com/pydantic/mcp-run-python</a><br /> Two-process security model for sandboxed code execution.</p>
</li>
<li><p><strong>MCP Conductor Repository</strong><br /> <a target="_blank" href="https://github.com/niradler/mcp-conductor">github.com/niradler/mcp-conductor</a><br /> Full documentation, examples, and security guidelines.</p>
</li>
<li><p><strong>Deno Security Model</strong><br /> <a target="_blank" href="https://docs.deno.com/runtime/fundamentals/security">docs.deno.com/runtime/fundamentals/security</a><br /> Capability-based permission system documentation.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[AG UI Protocol vs MCP UI: Which One Should You Use?]]></title><description><![CDATA[The way agents interact with users is changing fast. Some teams need real-time, production-grade collaboration between humans and agents. Others care more about embedding rich, interactive UI elements directly into agent responses.
That’s where AG UI...]]></description><link>https://blog.niradler.com/ag-ui-protocol-vs-mcp-ui-which-one-should-you-use</link><guid isPermaLink="true">https://blog.niradler.com/ag-ui-protocol-vs-mcp-ui-which-one-should-you-use</guid><category><![CDATA[mcp]]></category><category><![CDATA[UI]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Mon, 22 Sep 2025 15:19:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/gcHFXsdcmJE/upload/d74d2c0d0348a99f624badcd9041506d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The way agents interact with users is changing fast. Some teams need real-time, production-grade collaboration between humans and agents. Others care more about embedding rich, interactive UI elements directly into agent responses.</p>
<p>That’s where <strong>AG UI Protocol</strong> and <strong>MCP UI</strong> come in. They share a similar mission—making agent-user interaction smoother—but they take very different paths to get there.</p>
<h2 id="heading-two-philosophies-of-agentui-interaction">Two philosophies of agent–UI interaction</h2>
<p><strong>AG UI Protocol</strong> is built around <strong>event streams</strong>. Agents don’t send UI elements directly—they send events: lifecycle updates, tool calls, token-by-token messages, or state diffs. The frontend consumes those events with a client SDK (React, Vue, etc.) and re-renders the UI in real time. Think <em>“chatting with an agent that’s typing in front of you”</em> or <em>“watching tool calls unfold as they happen.”</em></p>
<p><strong>MCP UI</strong>, on the other hand, is all about <strong>embedding UI resources</strong>. Instead of events, the agent returns a <code>UIResource</code> that might be raw HTML, an external app URL, or a <strong>Shopify Remote DOM component tree</strong>. The client renders this in a sandbox—usually an iframe—or directly through Remote DOM. Think <em>“the agent gives you a form, a product gallery, or a media player right inside the chat window.”</em></p>
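<p>The difference between the two models can be sketched in a few lines. In the example below, the event names, the <code>ChatState</code> shape, and the <code>UiResource</code> union are simplified stand-ins invented for illustration; they are not the actual AG UI or MCP UI SDK types.</p>

```typescript
// Illustrative event names only; the real AG UI SDK defines its own event types.
type AgentEvent =
  | { type: "text_delta"; delta: string }
  | { type: "tool_call"; name: string }
  | { type: "state_patch"; key: string; value: unknown };

interface ChatState {
  message: string;
  toolCalls: string[];
  shared: { [key: string]: unknown };
}

// Event-stream style: the frontend folds each incoming event into UI state.
function applyEvent(state: ChatState, event: AgentEvent): ChatState {
  switch (event.type) {
    case "text_delta":
      return { ...state, message: state.message + event.delta };
    case "tool_call":
      return { ...state, toolCalls: [...state.toolCalls, event.name] };
    case "state_patch":
      return { ...state, shared: { ...state.shared, [event.key]: event.value } };
  }
}

// Embedded-resource style: the agent returns one renderable resource.
// Simplified, hypothetical shape; not the actual MCP UI UIResource type.
type UiResource =
  | { kind: "rawHtml"; html: string }
  | { kind: "externalUrl"; url: string }
  | { kind: "remoteDom"; script: string };

function renderTarget(resource: UiResource): string {
  return resource.kind === "remoteDom" ? "remote-dom host" : "sandboxed iframe";
}

const events: AgentEvent[] = [
  { type: "text_delta", delta: "Checking " },
  { type: "tool_call", name: "list_pull_requests" },
  { type: "text_delta", delta: "open PRs" },
  { type: "state_patch", key: "openPRs", value: 12 },
];
const initial: ChatState = { message: "", toolCalls: [], shared: {} };
const finalState = events.reduce(applyEvent, initial);

console.log(finalState);
console.log(renderTarget({ kind: "externalUrl", url: "https://example.com/widget" }));
```

<p>In the first style the UI is rebuilt incrementally from many small events; in the second, a single response carries a complete component for the client to render.</p>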
<h2 id="heading-what-it-feels-like-for-the-user">What it feels like for the user</h2>
<ul>
<li><p>With <strong>AG UI Protocol</strong>, the user experiences something like Google Docs with an AI collaborator: the UI updates continuously, users can intervene mid-process, and agents can hand off tasks between each other seamlessly.</p>
</li>
<li><p>With <strong>MCP UI</strong>, the user feels like they’re interacting with mini-apps spawned by the agent: a shopping widget, a chart viewer, a quiz form, all sandboxed for security.</p>
</li>
</ul>
<p>Both experiences are valuable—but very different.</p>
<h2 id="heading-where-ag-ui-protocol-shines">Where AG UI Protocol shines</h2>
<p>If your application depends on <strong>real-time collaboration</strong>, AG UI Protocol is hard to beat. It’s already running in production at enterprises and supports multi-agent orchestration, live state synchronization, and human-in-the-loop workflows.</p>
<p>Use it for things like:</p>
<ul>
<li><p>AI coding assistants that stream results as you type</p>
</li>
<li><p>Financial/trading dashboards where latency matters</p>
</li>
<li><p>Real-time customer support where agents and humans co-pilot a session</p>
</li>
<li><p>Any system that requires streaming state across multiple agents or clients</p>
</li>
</ul>
<p>The ecosystem is strong, with SDKs in TypeScript and Python, and integrations with LangGraph, CrewAI, Mastra, and others. If you’re aiming for production stability and scalability, AG UI Protocol is the pragmatic choice.</p>
<h2 id="heading-where-mcp-ui-makes-sense">Where MCP UI makes sense</h2>
<p>MCP UI is younger, but don’t underestimate it—especially with Shopify’s <strong>Remote DOM</strong> backing. It’s purpose-built for teams already inside the <strong>MCP ecosystem</strong> who want to enrich agent responses with actual UI components, not just text.</p>
<p>It’s ideal for:</p>
<ul>
<li><p>E-commerce experiences (product carousels, checkout forms)</p>
</li>
<li><p>Interactive data visualizations embedded directly in chat</p>
</li>
<li><p>Content management or education tools where interactivity is more important than raw speed</p>
</li>
<li><p>Teams experimenting with new UI paradigms inside MCP</p>
</li>
</ul>
<p>If your priority is <strong>visual richness</strong> and <strong>UI extensibility</strong>, and you’re comfortable with evolving APIs, MCP UI is the better fit.</p>
<h2 id="heading-quick-comparison-table">Quick Comparison Table</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Dimension</td><td><strong>AG UI Protocol</strong></td><td><strong>MCP UI</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Interaction model</strong></td><td>Streams JSON events (text, tool calls, state patches) → frontend SDK updates UI in real time</td><td>Returns <code>UIResource</code> objects (HTML, URL, Remote DOM) → rendered in sandboxed iframe or Shopify Remote DOM</td></tr>
<tr>
<td><strong>User experience</strong></td><td>Live, collaborative updates (like Google Docs with AI)</td><td>Embedded, interactive widgets inside agent responses</td></tr>
<tr>
<td><strong>Best for</strong></td><td>Real-time collaboration, multi-agent orchestration, production apps</td><td>Rich, visual components inside MCP ecosystem (forms, product catalogs, media players)</td></tr>
<tr>
<td><strong>Performance</strong></td><td>Sub-ms event processing, scalable, transport-agnostic (SSE, WS, HTTP/2)</td><td>Adds iframe/Remote DOM overhead, performance depends on component complexity</td></tr>
<tr>
<td><strong>Ecosystem maturity</strong></td><td>7,800+ stars, enterprise deployments, SDKs for multiple frameworks</td><td>280 stars, experimental, backed by Shopify’s Remote DOM, small but growing</td></tr>
<tr>
<td><strong>Ease of use</strong></td><td>Quick setup (<code>create-ag-ui-app</code>), intuitive event model, strong docs</td><td>Requires MCP infra, iframe security expertise, APIs still evolving</td></tr>
<tr>
<td><strong>Security model</strong></td><td>Transport + state sync; trust handled at app level</td><td>Strong sandboxing via iframe + Remote DOM, strict isolation</td></tr>
<tr>
<td><strong>When to choose</strong></td><td>You need <strong>production-ready, streaming, low-latency collaboration</strong></td><td>You want <strong>UI-rich, sandboxed components inside MCP responses</strong></td></tr>
</tbody>
</table>
</div><h2 id="heading-thinking-in-trade-offs">Thinking in trade-offs</h2>
<p>So how do you decide? It comes down to what matters more for your product:</p>
<ul>
<li><p><strong>Do you need streaming, low-latency collaboration with enterprise reliability?</strong> → Go with <strong>AG UI Protocol</strong>.</p>
</li>
<li><p><strong>Do you want to give users embedded, sandboxed components inside responses—forms, widgets, visual apps—especially if you’re already using MCP?</strong> → Try <strong>MCP UI</strong>.</p>
</li>
</ul>
<p>Some teams may even combine them: AG UI Protocol for the real-time backbone, MCP UI for rich component embedding where visual interactivity adds value.</p>
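<p>To make the “real-time backbone plus embedded components” idea a bit more concrete, here is a rough TypeScript sketch of a client that folds a stream of agent events into view state and sets aside any embedded UI resources for sandboxed rendering. The event names and fields (<code>text</code>, <code>state_patch</code>, <code>ui_resource</code>) and the sample URI are placeholders I made up for illustration; they are not the actual AG UI Protocol or MCP UI type definitions.</p>
<pre><code class="lang-typescript">// Hypothetical event shapes, for illustration only. These are NOT the real
// AG UI Protocol or MCP UI types.
type AgentEvent =
  | { kind: "text"; delta: string }
  | { kind: "state_patch"; key: string; value: string }
  | { kind: "ui_resource"; uri: string; html: string };

interface ChatView {
  transcript: string;
  state: { [key: string]: string };
  embeddedResources: string[]; // URIs the host would render in a sandbox
}

// Apply one streamed event to the current view and return the next view.
function applyEvent(view: ChatView, event: AgentEvent): ChatView {
  switch (event.kind) {
    case "text":
      return { ...view, transcript: view.transcript + event.delta };
    case "state_patch":
      return { ...view, state: { ...view.state, [event.key]: event.value } };
    case "ui_resource":
      return { ...view, embeddedResources: [...view.embeddedResources, event.uri] };
  }
}

const emptyView: ChatView = { transcript: "", state: {}, embeddedResources: [] };

// A made-up stream: two text deltas, one state patch, one embedded component.
const events: AgentEvent[] = [
  { kind: "text", delta: "Here are " },
  { kind: "text", delta: "three products." },
  { kind: "state_patch", key: "cartCount", value: "1" },
  { kind: "ui_resource", uri: "ui://catalog/carousel", html: "..." },
];

const finalView = events.reduce(applyEvent, emptyView);
console.log(finalView.transcript); // Here are three products.
console.log(finalView.embeddedResources.length); // 1
</code></pre>
<p>Text and state updates are applied as they arrive, while UI resources are only collected, which leaves the decision of how to sandbox and render them to the host application.</p>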
]]></content:encoded></item><item><title><![CDATA[Cursor as an AI SRE]]></title><description><![CDATA[Every week, we see a new “AI SRE” announcement. Some look like magic, others more like marketing. But here’s the real superpower: you can create your own AI SRE today, by letting an AI work with the same CLI tools you already use daily.
I built a Curs...]]></description><link>https://blog.niradler.com/cursor-as-an-ai-sre</link><guid isPermaLink="true">https://blog.niradler.com/cursor-as-an-ai-sre</guid><category><![CDATA[ai-sre]]></category><category><![CDATA[cursor]]></category><category><![CDATA[SRE]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sun, 24 Aug 2025 20:13:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1754654743428/d23305b8-0587-473e-a1df-57423cdee9f6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every week, we see a new “AI SRE” announcement. Some look like magic, others more like marketing.<br />But here’s the real superpower: <strong>you can create your own AI SRE today,</strong> by letting an AI work with the same CLI tools you already use daily.</p>
<p>I built a <strong>Cursor rule file</strong> that turns an AI into a <em>collaborative SRE copilot</em>, able to run <code>kubectl</code>, <code>helm</code>, <code>argocd</code>, <code>istioctl</code>, and more, all under your control.</p>
<hr />
<h2 id="heading-why-give-ai-your-sre-toolbox">Why Give AI Your SRE Toolbox?</h2>
<p>As human SREs, we already have the skills and instincts. But incidents eat time on repetitive steps:</p>
<ul>
<li><p>Checking service discovery &amp; endpoints</p>
</li>
<li><p>Inspecting Istio routes</p>
</li>
<li><p>Pulling Helm diffs</p>
</li>
<li><p>Reading logs &amp; events</p>
</li>
<li><p>Hunting down recent config changes</p>
</li>
</ul>
<p>AI can do those steps <strong>fast</strong>, and with our <strong>approval</strong>, so we can focus on judgment calls, not keystrokes.</p>
<hr />
<h2 id="heading-what-is-a-cursorrules-file">What is a <code>.cursorrules</code> File?</h2>
<p><code>.cursorrules</code> files are <strong>configuration files</strong> that provide Cursor AI with specific, always-on instructions.<br />Think of them as the <strong>blueprint</strong> for how the AI should behave, what tools it should use, and how it should structure its responses.</p>
<h2 id="heading-inside-the-cursor-rule-file">Inside the Cursor Rule File</h2>
<p>We designed the rule file to make the AI:</p>
<ol>
<li><p><strong>Context-Aware</strong></p>
<ul>
<li><p>First confirm cluster/context, namespace, and affected service/workload.</p>
</li>
<li><p>Detect which tools are installed in the cluster.</p>
</li>
</ul>
</li>
<li><p><strong>Structured in Reasoning</strong></p>
<ul>
<li><p>Always produce a <strong>Diagnostic Plan</strong> before running anything.</p>
</li>
<li><p>Steps cover service health, routing, workload status, logs/events, and recent changes.</p>
</li>
</ul>
</li>
<li><p><strong>Safe by Default</strong></p>
<ul>
<li><p>Read-only diagnostics first.</p>
</li>
<li><p>Approval gates before any change.</p>
</li>
<li><p>Rollback plans for every mitigation.</p>
</li>
</ul>
</li>
<li><p><strong>Multi-Tool Fluent</strong></p>
<ul>
<li><p>If Istio exists, check routing.</p>
</li>
<li><p>If ArgoCD is present, diff last known good state.</p>
</li>
<li><p>If Helm is installed, compare release histories.</p>
</li>
</ul>
</li>
<li><p><strong>Human in the Loop</strong></p>
<ul>
<li><p>You approve every action.</p>
</li>
<li><p>The AI explains what it’s doing, no black boxes.</p>
</li>
</ul>
</li>
</ol>
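<p>The rule file itself is plain-text instructions, but the behavior it asks for can be modeled as data. Here is a minimal TypeScript sketch of a diagnostic plan with the two approval gates described above. All the type and function names (<code>PlanStep</code>, <code>Approvals</code>, <code>runnableSteps</code>) and the sample commands are mine, chosen for illustration; nothing here is part of Cursor.</p>
<pre><code class="lang-typescript">// Illustrative model only; the real rule file is natural-language instructions.
type StepKind = "read-only" | "mutating";

interface PlanStep {
  command: string;   // the CLI command the AI proposes
  reason: string;    // why it wants to run it (kept for the audit trail)
  kind: StepKind;
  rollback?: string; // how to undo the step; expected for mutating steps
}

interface Approvals {
  diagnosticsApproved: boolean; // gate 1: read-only diagnostics
  changesApproved: boolean;     // gate 2: anything that changes the cluster
}

// Return the steps that may run right now under the given approvals.
function runnableSteps(plan: PlanStep[], approvals: Approvals): PlanStep[] {
  return plan.filter((step) => {
    if (step.kind === "read-only") {
      return approvals.diagnosticsApproved;
    }
    // Mutating steps need the second gate and a rollback plan.
    if (!approvals.changesApproved) {
      return false;
    }
    return step.rollback !== undefined;
  });
}

const plan: PlanStep[] = [
  { command: "kubectl get pods -n demo", reason: "check pod status", kind: "read-only" },
  { command: "kubectl logs deploy/my-app -n demo", reason: "read recent logs", kind: "read-only" },
  {
    command: "kubectl rollout restart deployment/my-app -n demo",
    reason: "restart after the config fix",
    kind: "mutating",
    rollback: "kubectl rollout undo deployment/my-app -n demo",
  },
];

const diagnosticsOnly = runnableSteps(plan, { diagnosticsApproved: true, changesApproved: false });
console.log(diagnosticsOnly.length); // 2
</code></pre>
<p>Keeping the <code>reason</code> next to each command is what makes the plan readable before you approve it and auditable afterwards.</p>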
<hr />
<h2 id="heading-why-its-structured-this-way">Why It’s Structured This Way</h2>
<ul>
<li><p><strong>Consistency</strong>: Every incident starts with the same high-quality checklist, so nothing gets missed.</p>
</li>
<li><p><strong>Transparency</strong>: The “Diagnostic Plan” makes the AI’s thought process visible, not hidden.</p>
</li>
<li><p><strong>Tool-Agnostic</strong>: The AI can dynamically use whatever’s in your cluster, from plain <code>kubectl</code> to full GitOps pipelines.</p>
</li>
<li><p><strong>Safety</strong>: Two approval gates (diagnostics, then changes) mean no accidental breakage.</p>
</li>
<li><p><strong>Auditability</strong>: Each command is logged with the reason it’s being run.</p>
</li>
</ul>
<hr />
<h2 id="heading-heres-the-ai-in-action">Here’s the AI in Action</h2>
<p>Below is an actual run from Cursor using our rule file, debugging a <strong>CrashLoopBackOff</strong> issue.</p>
<p><img src="https://raw.githubusercontent.com/niradler/local-ai-sre/refs/heads/master/cursor/cursor-demo.gif" alt class="image--center mx-auto" /></p>
<blockquote>
<p><strong>User:</strong> help me debug the issue<br /><code>kubectl get pods -n scenario-1753115617359</code><br /><em>(Output shows pod in CrashLoopBackOff)</em></p>
</blockquote>
<hr />
<p><strong>Cursor’s Structured Diagnosis</strong></p>
<ul>
<li><p>Found pod restarting 424 times over 35 hours.</p>
</li>
<li><p>Pulled logs &amp; inspected related secret.</p>
</li>
<li><p>Discovered <code>APP_MODE</code> was set to <code>broken</code> instead of <code>safe</code>.</p>
</li>
<li><p>Confirmed the app exits immediately if <code>APP_MODE</code> is not <code>safe</code>.</p>
</li>
</ul>
<hr />
<p><strong>Proposed Mitigation Options</strong></p>
<ol>
<li><p><strong>Patch Secret</strong> (recommended):</p>
<pre><code class="lang-bash"> kubectl patch secret availability-issue-bad-secret-change-a37a6159-secret \
   -n scenario-1753115617359 \
   -p <span class="hljs-string">'{"data":{"APP_MODE":"c2FmZQ=="}}'</span>
</code></pre>
</li>
<li><p>Create a new secret and update the deployment.</p>
</li>
<li><p>Update deployment to hardcode <code>APP_MODE</code> temporarily.</p>
</li>
</ol>
<p>Cursor included a rollback plan and waited for approval before changing anything.</p>
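<p>If the <code>c2FmZQ==</code> in the patch looks cryptic: Kubernetes stores Secret <code>data</code> values base64-encoded, and that string is simply <code>safe</code> after encoding. A small TypeScript sketch of how such a payload can be built (the <code>secretPatch</code> helper is a name I made up for this example):</p>
<pre><code class="lang-typescript">// Build the JSON body for `kubectl patch secret ... -p`.
// Secret `data` values must be base64-encoded, so the plain value is encoded first.
// Note: btoa only handles Latin-1 text, which is enough for a value like "safe".
function secretPatch(key: string, plainValue: string): string {
  return JSON.stringify({ data: { [key]: btoa(plainValue) } });
}

const patch = secretPatch("APP_MODE", "safe");
console.log(patch); // {"data":{"APP_MODE":"c2FmZQ=="}}
</code></pre>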
<p>The full chat is at <a target="_blank" href="https://github.com/niradler/local-ai-sre/blob/master/cursor/chat.md">chat.md</a>.</p>
<hr />
<h2 id="heading-the-takeaway">The Takeaway</h2>
<p>By letting AI use the same battle-tested tools you already rely on, but with a structured, safe, and transparent process, you turn it into a real teammate.</p>
<p>You still own the decisions.<br />You just get there faster.</p>
<hr />
<p>The full <a target="_blank" href="https://github.com/niradler/local-ai-sre/blob/master/cursor/.cursorrules">.cursorrules</a> file is on GitHub.</p>
<p>Feel free to visit <a target="_blank" href="http://komodor.com">komodor.com</a> for more info on AI SRE.</p>
]]></content:encoded></item><item><title><![CDATA[Effortlessly Manage MCP Configurations with Cross-Platform MCP Manager CLI]]></title><description><![CDATA[Managing MCP (Model Context Protocol) configurations across different environments like Cursor and Claude Desktop can be a real headache. Copying files, keeping track of updates, merging configs, or sharing them between machines often turns into a ma...]]></description><link>https://blog.niradler.com/effortlessly-manage-mcp-configurations-with-cross-platform-mcp-manager-cli</link><guid isPermaLink="true">https://blog.niradler.com/effortlessly-manage-mcp-configurations-with-cross-platform-mcp-manager-cli</guid><category><![CDATA[mcp]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[claude]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sun, 24 Aug 2025 20:06:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/yXrl2lkGJ-k/upload/cab03bc6d63c81bfe72a3064b8b732f8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Managing <strong>MCP (Model Context Protocol)</strong> configurations across different environments like <strong>Cursor</strong> and <strong>Claude Desktop</strong> can be a real headache. Copying files, keeping track of updates, merging configs, or sharing them between machines often turns into a manual, error-prone process.</p>
<p>That’s why we built <strong>Portable MCP Manager</strong> - an open-source, cross-platform CLI tool that makes managing, sharing, and syncing MCP configurations simple and reliable.</p>
<hr />
<h2 id="heading-why-portable-mcp-manager">✨ Why Portable MCP Manager?</h2>
<p>Portable MCP Manager solves a common problem for developers and power users who rely on MCP-based tools. Instead of juggling files and paths across environments, you now have a single, streamlined way to:</p>
<ul>
<li><p>🔄 <strong>Replace</strong> or <strong>merge</strong> configs from multiple sources</p>
</li>
<li><p>🔗 Fetch configs directly from <strong>URLs</strong> or <strong>GitHub Gists</strong> (even private ones)</p>
</li>
<li><p>📱 Manage configs for <strong>Cursor</strong> and <strong>Claude Desktop</strong></p>
</li>
<li><p>🖥️ Run seamlessly on <strong>Windows</strong> and <strong>macOS</strong></p>
</li>
<li><p>📤 Upload configs back to <strong>GitHub Gists</strong> with ease</p>
</li>
<li><p>🎯 Use an <strong>interactive mode</strong> if you prefer guided setup</p>
</li>
<li><p>🔀 Handle <strong>multi-file gists</strong> without hassle</p>
</li>
<li><p>🛠️ Take advantage of <strong>GitHub CLI</strong> if installed</p>
</li>
</ul>
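<p>The difference between <strong>replace</strong> and <strong>merge</strong> is easiest to see on the config object itself. The sketch below is only my illustration of the idea, assuming the usual <code>mcpServers</code> layout; in particular, letting the incoming entry win on a name clash is an assumption, and the tool may resolve conflicts differently.</p>
<pre><code class="lang-typescript">interface McpServer {
  command: string;
  args?: string[];
}

interface McpConfig {
  mcpServers: { [name: string]: McpServer };
}

// "Replace": the incoming config wins outright; the local one is discarded.
function replaceConfig(_local: McpConfig, incoming: McpConfig): McpConfig {
  return incoming;
}

// "Merge": keep local servers and add incoming ones. On a name clash the
// incoming entry wins (an assumption made for this sketch).
function mergeConfig(local: McpConfig, incoming: McpConfig): McpConfig {
  return { mcpServers: { ...local.mcpServers, ...incoming.mcpServers } };
}

const local: McpConfig = {
  mcpServers: { "dependency-checker": { command: "npx", args: ["dependency-mcp"] } },
};

// "example-mcp-server" is a placeholder package name, not a real recommendation.
const incoming: McpConfig = {
  mcpServers: { "example-server": { command: "npx", args: ["example-mcp-server"] } },
};

console.log(Object.keys(replaceConfig(local, incoming).mcpServers)); // [ 'example-server' ]
console.log(Object.keys(mergeConfig(local, incoming).mcpServers)); // [ 'dependency-checker', 'example-server' ]
</code></pre>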
<hr />
<h2 id="heading-getting-started">⚡ Getting Started</h2>
<p>Install it globally via npm:</p>
<pre><code class="lang-bash">npm install -g portable-mcp
</code></pre>
<p>Or just run it on-demand with npx:</p>
<pre><code class="lang-bash">npx portable-mcp --<span class="hljs-built_in">help</span>
</code></pre>
<hr />
<h2 id="heading-quick-start">🚦 Quick Start</h2>
<h3 id="heading-1-interactive-mode-beginner-friendly">1. Interactive Mode (Beginner-Friendly)</h3>
<pre><code class="lang-bash">portable-mcp
</code></pre>
<h3 id="heading-2-replace-a-configuration-from-gist">2. Replace a Configuration from Gist</h3>
<pre><code class="lang-bash">portable-mcp replace --<span class="hljs-built_in">type</span> cursor --gist 50007c6cd60db13cf8477b3b5caa96f0
</code></pre>
<h3 id="heading-3-merge-configurations">3. Merge Configurations</h3>
<pre><code class="lang-bash">portable-mcp merge --<span class="hljs-built_in">type</span> claude --gist abc123def456
</code></pre>
<h3 id="heading-4-upload-config-to-gist">4. Upload Config to Gist</h3>
<pre><code class="lang-bash">portable-mcp store --<span class="hljs-built_in">type</span> cursor --private
</code></pre>
<hr />
<h2 id="heading-example-workflows">🔍 Example Workflows</h2>
<h3 id="heading-complete-sync-cycle">Complete Sync Cycle</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Check default config path</span>
portable-mcp path --<span class="hljs-built_in">type</span> cursor  

<span class="hljs-comment"># Replace with a config from Gist</span>
portable-mcp replace --<span class="hljs-built_in">type</span> cursor --gist abc123def456  

<span class="hljs-comment"># Upload updated config back</span>
portable-mcp store --<span class="hljs-built_in">type</span> cursor --private
</code></pre>
<h3 id="heading-multi-file-gist-example">Multi-file Gist Example</h3>
<pre><code class="lang-bash">portable-mcp replace --<span class="hljs-built_in">type</span> claude --gist abc123def456/claude.json
</code></pre>
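<p>The <code>--gist</code> value above follows a “gist ID, optional slash, file name” pattern. As a rough sketch of how such a reference can be split (the <code>parseGistRef</code> helper is hypothetical and not taken from the tool’s source):</p>
<pre><code class="lang-typescript">interface GistRef {
  id: string;
  file?: string; // only set when the reference names a specific file
}

// Split "abc123def456/claude.json" into its gist ID and file name.
// A reference without a slash is treated as a bare gist ID.
function parseGistRef(ref: string): GistRef {
  const slash = ref.indexOf("/");
  if (slash === -1) {
    return { id: ref };
  }
  return { id: ref.slice(0, slash), file: ref.slice(slash + 1) };
}

console.log(parseGistRef("abc123def456/claude.json")); // { id: 'abc123def456', file: 'claude.json' }
console.log(parseGistRef("abc123def456")); // { id: 'abc123def456' }
</code></pre>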
<hr />
<h2 id="heading-github-integration">🔗 GitHub Integration</h2>
<ul>
<li><p>If you have <strong>GitHub CLI</strong> installed, the tool uses it automatically.</p>
</li>
<li><p>Otherwise, set a <code>GITHUB_TOKEN</code> for direct API uploads.</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> GITHUB_TOKEN=ghp_your_personal_access_token_here
</code></pre>
<hr />
<h2 id="heading-supported-config-paths">📁 Supported Config Paths</h2>
<ul>
<li><p><strong>Cursor</strong></p>
<ul>
<li><p>Windows: <code>C:\Users\&lt;username&gt;\.cursor\mcp.json</code></p>
</li>
<li><p>macOS: <code>~/Library/Application Support/Cursor/User/mcp.json</code></p>
</li>
</ul>
</li>
<li><p><strong>Claude Desktop</strong></p>
<ul>
<li><p>Windows: <code>C:\Users\&lt;username&gt;\AppData\Roaming\Claude\claude_desktop_config.json</code></p>
</li>
<li><p>macOS: <code>~/Library/Application Support/Claude/claude_desktop_config.json</code></p>
</li>
</ul>
</li>
</ul>
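<p>For scripting around these locations, the table above can be expressed as a small lookup. This is a sketch under the assumption that the paths are exactly the ones listed here; <code>defaultConfigPath</code> is a name I picked for the example, and the platform strings match the values Node.js reports in <code>process.platform</code>.</p>
<pre><code class="lang-typescript">type ClientType = "cursor" | "claude";
type Platform = "win32" | "darwin"; // Windows and macOS

// Build the default config path for a client on a platform.
// `home` is the user's home directory, e.g. C:\Users\alice or /Users/alice.
function defaultConfigPath(type: ClientType, platform: Platform, home: string): string {
  if (platform === "win32") {
    return type === "cursor"
      ? home + "\\.cursor\\mcp.json"
      : home + "\\AppData\\Roaming\\Claude\\claude_desktop_config.json";
  }
  const appSupport = home + "/Library/Application Support";
  return type === "cursor"
    ? appSupport + "/Cursor/User/mcp.json"
    : appSupport + "/Claude/claude_desktop_config.json";
}

console.log(defaultConfigPath("claude", "darwin", "/Users/alice"));
// /Users/alice/Library/Application Support/Claude/claude_desktop_config.json
</code></pre>
<p>Passing the platform and home directory in as arguments keeps the function easy to test on any machine.</p>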
<hr />
<p>Source code: <a target="_blank" href="https://github.com/niradler/portable-mcp">https://github.com/niradler/portable-mcp</a></p>
]]></content:encoded></item><item><title><![CDATA[Say Goodbye to Outdated Dependencies]]></title><description><![CDATA[One of the most frustrating things when building with AI coding assistants is when they happily write code for you… but the dependencies they suggest are outdated or simply wrong. You know the drill:

You ask your LLM to add a library.

It writes npm...]]></description><link>https://blog.niradler.com/say-goodbye-to-outdated-dependencies</link><guid isPermaLink="true">https://blog.niradler.com/say-goodbye-to-outdated-dependencies</guid><category><![CDATA[cursor]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[mcp]]></category><category><![CDATA[vibe coding]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Tue, 19 Aug 2025 08:58:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755593601367/0af09b39-b4aa-42a7-9348-dd7bc15a257f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of the most frustrating things when building with AI coding assistants is when they happily write code for you… but the dependencies they suggest are outdated or simply wrong. You know the drill:</p>
<ul>
<li><p>You ask your LLM to add a library.</p>
</li>
<li><p>It writes <code>npm install some-package@1.0.0</code>… but the latest version is actually <code>3.2.4</code>.</p>
</li>
<li><p>Or worse, it invents a version that doesn’t even exist.</p>
</li>
</ul>
<p>Suddenly, instead of building features, you’re wrestling with dependency mismatches.</p>
<p>I built the <strong>Dependency MCP Server</strong> to solve exactly this problem.</p>
<hr />
<h2 id="heading-what-it-does">What It Does</h2>
<p>The Dependency MCP is an <strong>MCP (Model Context Protocol) server</strong> that lets your AI development tools <strong>check dependencies across multiple registries in real time</strong>.</p>
<p>That means whenever the AI suggests a package, it can instantly verify:</p>
<ul>
<li><p>✅ What the <strong>latest version</strong> is</p>
</li>
<li><p>✅ Whether a <strong>specific version exists</strong></p>
</li>
<li><p>✅ Full <strong>package metadata</strong> including all available versions</p>
</li>
<li><p>✅ All of this in <strong>bulk</strong>, across your entire dependency list</p>
</li>
</ul>
<p>It supports all the major registries: <strong>npm, PyPI, Maven, NuGet, RubyGems, <a target="_blank" href="http://Crates.io">Crates.io</a>, and Go modules</strong>.</p>
<p>So whether you’re in Node.js, Python, Java, .NET, Ruby, Rust, or Go, the AI never needs to guess again.</p>
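<p>Taking npm as one example, the “latest version” and “does this version exist” checks come down to two fields in the document the registry returns for a package: <code>dist-tags.latest</code> and the keys of <code>versions</code>. The TypeScript below is a sketch of that logic over hand-written sample data. It is not the server’s actual implementation, and <code>checkVersion</code> is a name I chose for the example.</p>
<pre><code class="lang-typescript">// Subset of the document the npm registry returns for a package,
// e.g. from https://registry.npmjs.org/some-package
interface Packument {
  "dist-tags": { latest: string };
  versions: { [version: string]: unknown };
}

interface VersionCheck {
  requested: string;
  exists: boolean;
  latest: string;
  isLatest: boolean;
}

function checkVersion(doc: Packument, requested: string): VersionCheck {
  const latest = doc["dist-tags"].latest;
  return {
    requested,
    exists: Object.prototype.hasOwnProperty.call(doc.versions, requested),
    latest,
    isLatest: requested === latest,
  };
}

// Hand-written sample data standing in for a real registry response.
const sample: Packument = {
  "dist-tags": { latest: "3.2.4" },
  versions: { "1.0.0": {}, "3.2.4": {} },
};

console.log(checkVersion(sample, "1.0.0")); // exists, but not the latest
console.log(checkVersion(sample, "2.9.9")); // an invented version: exists is false

// A bulk check is the same function mapped over a list of requested versions.
const bulk = ["1.0.0", "2.9.9", "3.2.4"].map((v) => checkVersion(sample, v));
console.log(bulk.filter((r) => !r.exists).length); // 1
</code></pre>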
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755593654633/443b6645-f0c3-4d9c-b30f-d4a370cf96c7.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-why-it-matters">Why It Matters</h2>
<p>This isn’t just about convenience. Correct dependencies make a huge difference in:</p>
<ul>
<li><p><strong>Reducing errors</strong> – no more wasted time debugging phantom versions.</p>
</li>
<li><p><strong>Faster development</strong> – the AI can give you working install commands immediately.</p>
</li>
<li><p><strong>CI/CD reliability</strong> – bulk validation tools let you enforce correct versions across pipelines.</p>
</li>
<li><p><strong>Security audits</strong> – you can fetch full package metadata for reviews.</p>
</li>
</ul>
<p>Instead of trusting the model’s memory (which is always a little stale), you give it a direct way to <strong>ask the source of truth</strong>: the package registries themselves.</p>
<hr />
<h2 id="heading-fun-fact-how-i-use-it-with-cursor">Fun Fact: How I Use It with Cursor</h2>
<p>One of the coolest things about building MCP servers is that you can actually use <strong>Cursor’s own MCP integration</strong> to create an <em>automatic development feedback loop</em>.</p>
<p>The flow looks like this: you start by writing or updating your MCP server, then register it in Cursor’s configuration. Once it’s wired up, Cursor can immediately call your server and test each of the tools you’ve exposed. If something doesn’t work, Cursor surfaces the error right away - and even suggests fixes or improvements directly in your editor. That means you’re not just coding; you’re effectively building software that validates itself, with an AI co-pilot constantly reviewing and stress-testing your MCP server in real time.  </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755593737588/41367b7c-0f25-4427-afad-30d147141da9.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-try-it-out">Try It Out</h2>
<p>To wire it into your AI tooling (such as Claude Desktop or Cursor), drop this into your config:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"dependency-checker"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"dependency-mcp"</span>]
    }
  }
}
</code></pre>
<hr />
<h2 id="heading-closing-thoughts">Closing Thoughts</h2>
<p>Building with AI is amazing - but only if the code it writes actually runs. With the Dependency MCP server, you never have to worry about outdated or invalid dependencies again.</p>
<p>For me, this has completely changed the dev cycle: instead of correcting the AI, the AI corrects itself.</p>
]]></content:encoded></item><item><title><![CDATA[How Code Feedback MCP Enhances AI-Generated Code Quality]]></title><description><![CDATA[TL;DR: As most new code is now generated by LLMs, Code Feedback MCP provides the critical feedback loop that enables AI to automatically validate, fix, and improve its own code generation in real-time. It's the missing piece that transforms unreliabl...]]></description><link>https://blog.niradler.com/how-code-feedback-mcp-enhances-ai-generated-code-quality</link><guid isPermaLink="true">https://blog.niradler.com/how-code-feedback-mcp-enhances-ai-generated-code-quality</guid><category><![CDATA[Open Source]]></category><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[code]]></category><category><![CDATA[coding]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Sat, 28 Jun 2025 20:13:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/40XgDxBfYXM/upload/e2e39c3102a3e75496f6845dca3abc5f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR</strong>: As most new code is now generated by LLMs, Code Feedback MCP provides the critical feedback loop that enables AI to automatically validate, fix, and improve its own code generation in real-time. It's the missing piece that transforms unreliable AI code into production-ready, quality-assured software.</p>
<hr />
<p><strong>The reality of modern development has fundamentally shifted.</strong> A large and growing share of new code is now generated or co-written by AI assistants like Claude, GPT-4, and Copilot. But here's the problem: LLMs generate code without knowing if it actually compiles, passes tests, or meets quality standards.</p>
<p>This creates a dangerous gap between code generation and code validation that traditional development workflows weren't designed to handle.</p>
<p><strong>Code Feedback MCP Server</strong> bridges this gap by providing LLMs with the real-time feedback they need to generate better code, catch their own mistakes, and iteratively improve until the code meets production standards.</p>
<h2 id="heading-the-llm-code-generation-revolution-and-its-problem">The LLM Code Generation Revolution (And Its Problem)</h2>
<p>The shift to AI-generated code has been dramatic:</p>
<ul>
<li><p><strong>Volume</strong>: Developers report 40-60% of their code is now AI-generated</p>
</li>
<li><p><strong>Speed</strong>: What took hours now takes minutes with AI assistance</p>
</li>
<li><p><strong>Scope</strong>: LLMs can generate entire modules, APIs, and applications</p>
</li>
<li><p><strong>Languages</strong>: AI excels across TypeScript, Python, Go, and more</p>
</li>
</ul>
<p>But this revolution comes with a critical flaw:</p>
<h3 id="heading-llms-generate-code-blind"><strong>LLMs Generate Code Blind</strong></h3>
<p>When an LLM writes code, it has no way to know:</p>
<ul>
<li><p>❌ Does the code actually compile?</p>
</li>
<li><p>❌ Are there syntax or type errors?</p>
</li>
<li><p>❌ Do the tests pass?</p>
</li>
<li><p>❌ Does it follow project conventions?</p>
</li>
<li><p>❌ Are there security vulnerabilities?</p>
</li>
<li><p>❌ Is the performance acceptable?</p>
</li>
</ul>
<p><strong>The result?</strong> Developers spend significant time debugging and fixing AI-generated code, often losing the productivity gains that AI promised to deliver.</p>
<h2 id="heading-the-solution-real-time-ai-code-validation-amp-auto-correction">The Solution: Real-Time AI Code Validation &amp; Auto-Correction</h2>
<p>Code Feedback MCP Server creates the essential feedback loop for AI-generated code by providing:</p>
<h3 id="heading-llm-first-architecture">🤖 <strong>LLM-First Architecture</strong></h3>
<ul>
<li><p><strong>Instant feedback</strong>: LLMs get immediate validation results after code generation</p>
</li>
<li><p><strong>Structured responses</strong>: JSON format that LLMs can parse and act upon</p>
</li>
<li><p><strong>Error descriptions</strong>: Detailed explanations that help LLMs understand and fix issues</p>
</li>
<li><p><strong>Iterative improvement</strong>: Enable LLMs to generate → validate → fix → repeat until perfect</p>
</li>
</ul>
<h3 id="heading-the-ai-quality-loop">🔄 <strong>The AI Quality Loop</strong></h3>
<ul>
<li><p><strong>Generate</strong>: LLM creates code based on requirements</p>
</li>
<li><p><strong>Validate</strong>: Code Feedback MCP tests compilation, syntax, and quality</p>
</li>
<li><p><strong>Analyze</strong>: Advanced prompts provide detailed feedback and suggestions</p>
</li>
<li><p><strong>Iterate</strong>: LLM uses feedback to automatically improve the code</p>
</li>
<li><p><strong>Verify</strong>: Final validation ensures production readiness</p>
</li>
</ul>
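<p>Stripped to its core, the loop above is "generate, validate, feed the errors back, stop when clean or out of attempts". Here is a minimal TypeScript sketch of that control flow with a stubbed generator and validator. The names (<code>qualityLoop</code>, <code>ValidationResult</code>) and the attempt cap are my own choices for illustration, not the Code Feedback MCP API.</p>
<pre><code class="lang-typescript">interface ValidationResult {
  ok: boolean;
  errors: string[]; // messages the model can act on in the next attempt
}

type Generate = (task: string, feedback: string[]) => string;
type Validate = (code: string) => ValidationResult;

interface LoopOutcome {
  code: string;
  attempts: number;
  ok: boolean;
}

// Generate, validate, and retry with the validator's errors as feedback.
// The attempt cap stops a model that never converges from looping forever.
function qualityLoop(task: string, generate: Generate, validate: Validate, maxAttempts: number): LoopOutcome {
  let feedback: string[] = [];
  let code = "";
  for (let attempt = 1; maxAttempts >= attempt; attempt++) {
    code = generate(task, feedback);
    const result = validate(code);
    if (result.ok) {
      return { code, attempts: attempt, ok: true };
    }
    feedback = result.errors;
  }
  return { code, attempts: maxAttempts, ok: false };
}

// Stubs standing in for an LLM and a real compiler/linter run.
const fakeGenerate: Generate = (_task, feedback) =>
  feedback.length === 0 ? "req.body.parse()" : "await req.json()";

const fakeValidate: Validate = (code) =>
  code.indexOf(".parse(") !== -1
    ? { ok: false, errors: ["parse() does not exist on the request body; use req.json()"] }
    : { ok: true, errors: [] };

const outcome = qualityLoop("create an API handler", fakeGenerate, fakeValidate, 3);
console.log(outcome.attempts, outcome.ok); // 2 true
</code></pre>
<p>In a real setup the stubs would be replaced by a model call and by the validation tools the server exposes; the loop itself stays the same.</p>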
<h3 id="heading-multi-language-ai-validation">🧠 <strong>Multi-Language AI Validation</strong></h3>
<ul>
<li><p>TypeScript/JavaScript: Catch type errors that confuse LLMs</p>
</li>
<li><p>Python: Detect linting issues LLMs commonly miss</p>
</li>
<li><p>Go: Ensure compilation and formatting standards</p>
</li>
<li><p>Extensible for any language your LLMs work with</p>
</li>
</ul>
<h3 id="heading-auto-correction-capabilities">🔄 <strong>Auto-Correction Capabilities</strong></h3>
<ul>
<li><p><strong>Smart error reporting</strong>: LLMs understand exactly what went wrong</p>
</li>
<li><p><strong>Fix suggestions</strong>: Prompts provide specific guidance for improvements</p>
</li>
<li><p><strong>Iterative refinement</strong>: LLMs can automatically apply fixes and re-validate</p>
</li>
<li><p><strong>Quality enforcement</strong>: Ensure AI-generated code meets your standards</p>
</li>
</ul>
<h3 id="heading-developer-experience-first">⚡ <strong>Developer Experience First</strong></h3>
<ul>
<li><p>Cross-platform support (Windows, macOS, Linux)</p>
</li>
<li><p>Simple configuration with <code>mcp-config.json</code></p>
</li>
<li><p>Comprehensive error reporting with actionable feedback</p>
</li>
<li><p>Integration with popular editors and CI systems</p>
</li>
</ul>
<h2 id="heading-game-changing-ai-that-fixes-its-own-code">Game-Changing: AI That Fixes Its Own Code</h2>
<p><strong>The Old Way (Broken):</strong></p>
<pre><code class="lang-plaintext">Human: "Create a TypeScript API handler"
LLM: *Generates code with type errors*
Human: *Discovers errors during manual testing*
Human: "Fix these 5 compilation errors"  
LLM: *Attempts fixes, introduces new issues*
Human: *Repeats cycle multiple times*
</code></pre>
<p><strong>The New Way (Code Feedback MCP):</strong></p>
<pre><code class="lang-plaintext">Human: "Create a TypeScript API handler"
LLM: *Generates code*
LLM: *Automatically validates with Code Feedback MCP*
Code Feedback MCP: *Returns structured error feedback*
LLM: *Automatically fixes issues based on feedback*
LLM: *Re-validates until compilation succeeds*  
Human: *Receives working, tested, quality code*
</code></pre>
<p>Here's what this looks like in practice:</p>
<p><strong>Step 1: Initial Generation</strong></p>
<pre><code class="lang-typescript"><span class="hljs-comment">// LLM generates this code</span>
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleRequest</span>(<span class="hljs-params">req: Request</span>): <span class="hljs-title">Response</span> </span>{
  <span class="hljs-keyword">const</span> data = req.body.parse(); <span class="hljs-comment">// Error: parse() doesn't exist</span>
  <span class="hljs-keyword">return</span> { status: <span class="hljs-number">200</span>, data }; <span class="hljs-comment">// Error: wrong return type</span>
}
</code></pre>
<p><strong>Step 2: Automatic Validation</strong> The AI immediately checks if the code actually works by running it through the validation system.</p>
<p><strong>Step 3: Smart Feedback</strong> Instead of cryptic error messages, the AI gets clear, actionable feedback:</p>
<ul>
<li><p>"Hey, <code>parse()</code> doesn't exist on request bodies - try <code>json()</code> instead"</p>
</li>
<li><p>"This return type won't work - you need to return a proper Response object"</p>
</li>
</ul>
<p><strong>Step 4: LLM Auto-Correction</strong></p>
<pre><code class="lang-typescript"><span class="hljs-comment">// LLM automatically fixes based on feedback</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleRequest</span>(<span class="hljs-params">req: Request</span>): <span class="hljs-title">Promise</span>&lt;<span class="hljs-title">Response</span>&gt; </span>{
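  <span class="hljs-comment">// Request.json() parses the body; req.body is a ReadableStream and has no json() method</span>
  <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> req.json();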
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(<span class="hljs-built_in">JSON</span>.stringify({ data }), { 
    status: <span class="hljs-number">200</span>,
    headers: { <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/json'</span> }
  });
}
</code></pre>
<p><strong>Step 5: Success!</strong> The AI validates again and confirms everything works perfectly. No more broken code!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751141310802/14207482-9f5c-4bb8-8f42-700c97a49056.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-advanced-ai-code-intelligence-beyond-basic-validation">Advanced AI Code Intelligence: Beyond Basic Validation</h2>
<p>The real breakthrough is the <strong>AI-powered prompt system</strong> that enables LLMs to perform sophisticated code analysis and self-improvement:</p>
<h3 id="heading-intelligent-code-review">🔍 <strong>Intelligent Code Review</strong></h3>
<p>Think of this as having a senior developer review your AI's code instantly. The LLM can ask for detailed feedback on any code it generates, focusing on specific areas like performance, security, or maintainability.</p>
<h3 id="heading-automated-security-amp-bug-detection">🛡️ <strong>Automated Security &amp; Bug Detection</strong></h3>
<p>Your AI can now audit its own code for vulnerabilities and common mistakes - catching issues that even experienced developers sometimes miss.</p>
<h3 id="heading-performance-optimization">🚀 <strong>Performance Optimization</strong></h3>
<p>The LLM can analyze its own code for performance bottlenecks and automatically implement optimizations. It's like having a performance expert built right into your coding workflow.</p>
<h2 id="heading-transformative-use-cases-for-ai-development">Transformative Use Cases for AI Development</h2>
<h3 id="heading-1-autonomous-code-generation-amp-validation">1. <strong>Autonomous Code Generation &amp; Validation</strong></h3>
<p>LLMs can now generate complete, working features without human intervention:</p>
<pre><code class="lang-plaintext">Human: "Build a REST API for user management with TypeScript"

AI Process:
1. Generate initial code structure
2. Validate with Code Feedback MCP → Find compilation errors
3. Auto-fix type issues and re-validate
4. Run security audit → Detect missing input validation
5. Add validation and re-audit
6. Performance analysis → Optimize database queries
7. Final validation → All checks pass
</code></pre>
<p><strong>Result:</strong> Production-ready code delivered in minutes, not hours.</p>
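<p>The numbered process above is essentially a staged pipeline: cheap checks first, and later stages only run on code that passed the earlier ones. A small TypeScript sketch of that idea follows. The three checks are trivial stubs I wrote for illustration (a real setup would call a compiler, a security scanner, and so on), and <code>runPipeline</code> is a hypothetical name.</p>
<pre><code class="lang-typescript">interface CheckResult {
  name: string;
  passed: boolean;
  details: string;
}

type Check = (code: string) => CheckResult;

// Run checks in order and stop at the first failure, so later (often more
// expensive) stages only see code that already passed the earlier ones.
function runPipeline(code: string, checks: Check[]): CheckResult[] {
  const results: CheckResult[] = [];
  for (const check of checks) {
    const result = check(code);
    results.push(result);
    if (!result.passed) {
      break;
    }
  }
  return results;
}

// Stub stages; each one stands in for a real tool.
const compiles: Check = (code) => ({
  name: "compile",
  passed: code.trim().length > 0,
  details: "stub: only checks that the code is non-empty",
});

const security: Check = (code) => ({
  name: "security",
  passed: code.indexOf("eval(") === -1,
  details: "stub: flags any use of eval()",
});

const performance: Check = (_code) => ({
  name: "performance",
  passed: true,
  details: "stub: always passes",
});

const report = runPipeline("eval(userInput)", [compiles, security, performance]);
console.log(report.map((r) => r.name + ":" + r.passed)); // [ 'compile:true', 'security:false' ]
</code></pre>
<p>Because the run stops at the security stage, the failure report the model gets back is short and points at exactly one thing to fix.</p>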
<h3 id="heading-2-smart-code-improvement">2. <strong>Smart Code Improvement</strong></h3>
<p>Instead of just accepting the first code an AI generates, the LLM can continuously improve existing code by asking for refactoring suggestions, then automatically applying and testing improvements.</p>
<h3 id="heading-3-intelligent-problem-solving">3. <strong>Intelligent Problem Solving</strong></h3>
<p>When the AI hits an error (like a missing dependency), it can automatically diagnose and fix the issue - installing packages, updating configurations, or correcting code - then continue with the original task seamlessly.</p>
<h3 id="heading-4-full-stack-project-management">4. <strong>Full-Stack Project Management</strong></h3>
<p>The AI can work across different programming languages in the same project, ensuring everything works together. Generate a Python backend, TypeScript frontend, and Go microservice - all validated and tested as a complete system.</p>
<h2 id="heading-the-future-is-here-ai-that-actually-works">The Future is Here: AI That Actually Works</h2>
<p>Here's what's really exciting - LLMs can now handle the complete development cycle:</p>
<p>✅ <strong>Generate code</strong> from your ideas and requirements<br />✅ <strong>Test compilation</strong> and fix syntax errors instantly<br />✅ <strong>Run and validate tests</strong> to ensure functionality<br />✅ <strong>Check for security issues</strong> and patch vulnerabilities<br />✅ <strong>Optimize performance</strong> based on real analysis<br />✅ <strong>Maintain quality standards</strong> consistently<br />✅ <strong>Handle project setup</strong> and dependencies automatically</p>
<p>This isn't some far-off future - it's working right now.</p>
<h3 id="heading-why-this-changes-everything">Why This Changes Everything</h3>
<p>The biggest pain point in AI coding has always been the back-and-forth debugging dance:</p>
<ol>
<li><p>Ask AI to write code</p>
</li>
<li><p>Copy code and try to run it</p>
</li>
<li><p>Hit errors and spend time figuring out what's wrong</p>
</li>
<li><p>Go back to AI with error messages</p>
</li>
<li><p>Repeat until something works (maybe)</p>
</li>
</ol>
<p><strong>Code Feedback MCP cuts through all of that.</strong> The AI can now test, debug, and fix its own code before handing it over, so what reaches you has already compiled and passed its checks.</p>
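<p>To make the loop concrete, here is a minimal sketch of the generate, validate, retry cycle. It is not the Code Feedback MCP API: <code>check_syntax</code>, <code>generate_until_valid</code>, and <code>fake_model</code> are hypothetical names, the "model" is a stub, and the only check is Python's built-in <code>compile</code>.</p>

```python
from typing import Callable, Optional


def check_syntax(source: str) -> Optional[str]:
    """Return None if the source compiles, otherwise a short error message."""
    try:
        compile(source, "generated.py", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def generate_until_valid(
    generate: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> str:
    """Call `generate`, feeding the last error back in, until the code compiles."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        source = generate(error)
        error = check_syntax(source)
        if error is None:
            return source
    raise RuntimeError(f"still failing after {max_attempts} attempts: {error}")


# Hypothetical stand-in for an LLM: the first draft is missing a colon,
# the second draft is what it returns once it has seen the error message.
def fake_model(error: Optional[str]) -> str:
    if error is None:
        return "def add(a, b)\n    return a + b\n"
    return "def add(a, b):\n    return a + b\n"


fixed_source = generate_until_valid(fake_model)
```

<p>A real setup would swap the syntax check for compilers, linters, and test runners, but the shape stays the same: the error message goes straight back to the model instead of through you.</p>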
<h2 id="heading-ready-to-supercharge-your-ai-development">Ready to Supercharge Your AI Development?</h2>
<p>Code Feedback MCP Server is the missing infrastructure for reliable AI-generated code. Whether you're building with Claude, GPT-4, or any other LLM, this tool ensures your AI can generate production-ready code autonomously.</p>
<p><strong>Perfect for:</strong></p>
<ul>
<li><p>🤖 <strong>AI-First Development Teams</strong> seeking autonomous code generation</p>
</li>
<li><p>🚀 <strong>Startups</strong> moving fast with AI-generated features</p>
</li>
<li><p>🏢 <strong>Enterprise Teams</strong> needing quality assurance for AI code</p>
</li>
<li><p>👨‍💻 <strong>Individual Developers</strong> maximizing AI productivity</p>
</li>
<li><p>🔧 <strong>DevTools Builders</strong> creating intelligent development experiences</p>
</li>
</ul>
<p><strong>Get started today:</strong></p>
<ul>
<li>📚 <a target="_blank" href="https://github.com/niradler/code-feedback">GitHub Repository</a></li>
</ul>
<p><strong>Contributing is welcome!</strong> Add support for new languages, improve existing tools, or enhance the prompt system. Every contribution makes the tool better for the entire community.</p>
<hr />
<p><em>The era of unreliable AI-generated code is over. With Code Feedback MCP, your LLMs can generate, validate, and fix code autonomously — delivering production-ready solutions that just work. Join the autonomous development revolution today.</em></p>
<p><strong>Tags:</strong> #LLM #AICode #CodeGeneration #MCP #DevTools #TypeScript #Python #Go #Automation #CodeQuality #OpenSource</p>
]]></content:encoded></item><item><title><![CDATA[The Death of the Dashboard: Why Your Next SaaS Should Feel Like a Conversation]]></title><description><![CDATA[Part 1 of our "Chat-First SaaS" series
Remember the last time you tried to find something in your company's project management tool? You probably clicked through three different menus, filtered a table, exported a CSV, and then realized you were look...]]></description><link>https://blog.niradler.com/the-death-of-the-dashboard-why-your-next-saas-should-feel-like-a-conversation</link><guid isPermaLink="true">https://blog.niradler.com/the-death-of-the-dashboard-why-your-next-saas-should-feel-like-a-conversation</guid><category><![CDATA[AI]]></category><category><![CDATA[UX]]></category><category><![CDATA[chatbot]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Fri, 06 Jun 2025 22:12:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/JKUTrJ4vK00/upload/329c5a15003571d291df8ea3c2e8c328.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Part 1 of our "Chat-First SaaS" series</em></p>
<p>Remember the last time you tried to find something in your company's project management tool? You probably clicked through three different menus, filtered a table, exported a CSV, and then realized you were looking at last month's data. Sound familiar?</p>
<p>We've all been there. Modern business software has become a maze of tabs, dashboards, and dropdown menus that would confuse even a GPS. But what if I told you there's a better way, one that feels as natural as asking a colleague for help?</p>
<h2 id="heading-the-problem-with-todays-software">The Problem with Today's Software</h2>
<p>Think about your typical workday. You probably juggle 5 to 10 different software tools: your CRM, project manager, analytics dashboard, HR system, and who knows what else. Each one has its own logic, its own way of organizing information, and its own special quirks that you have to remember.</p>
<p>It's like having to learn a new language for every room in your house. Want to check the thermostat? Better remember the HVAC control system. Need to turn on the TV? Time to decipher that remote with 47 buttons.</p>
<p><strong>The result?</strong> Most people use maybe 20% of their software's capabilities. The other 80% remains hidden behind menus they've never clicked or features they don't know exist.</p>
<h2 id="heading-enter-the-chat-first-revolution">Enter the Chat-First Revolution</h2>
<p>Now imagine this: instead of clicking through endless menus, you simply type or say what you need, just like you would ask a knowledgeable coworker.</p>
<p><strong>Instead of:</strong></p>
<ol>
<li>Navigate to Reports → Customer Analytics → Filter by Date Range → Select Metrics → Export to Excel → Open Excel → Create Charts</li>
</ol>
<p><strong>You simply say:</strong>
"Show me this month's customer growth by region"</p>
<p>And boom—there's your answer, complete with charts, insights, and follow-up suggestions.</p>
<h2 id="heading-why-conversation-changes-everything">Why Conversation Changes Everything</h2>
<h3 id="heading-1-no-more-hide-and-seek-with-features">1. <strong>No More Hide-and-Seek with Features</strong></h3>
<p>Ever spent 10 minutes looking for a feature you used last week? With chat-first software, you don't need to remember where things are—you just ask for what you need. It's like having a super-smart assistant who knows every corner of your business tools.</p>
<h3 id="heading-2-context-that-actually-matters">2. <strong>Context That Actually Matters</strong></h3>
<p>Traditional software treats every click as a fresh start. But conversations have memory. When you ask about "those customers from yesterday's discussion," the system remembers. When you say "create a similar report," it knows what you're referring to.</p>
<h3 id="heading-3-learning-as-you-go">3. <strong>Learning as You Go</strong></h3>
<p>The best part? Chat-first tools get smarter the more you use them. They learn your patterns, suggest shortcuts, and even anticipate what you might need next. It's like having software that actually pays attention.</p>
<h2 id="heading-real-world-magic-how-it-actually-works">Real-World Magic: How It Actually Works</h2>
<p>Let's look at a practical example. Imagine you're a customer success manager dealing with a tricky situation:</p>
<p><strong>Traditional Way:</strong></p>
<ul>
<li>Log into CRM → Search customer → Open profile → Check ticket history → Switch to billing system → Verify payment status → Open communication tool → Draft email → Switch back to CRM → Update customer notes</li>
</ul>
<p><strong>Chat-First Way:</strong>
Simply type: "Customer TechCorp seems unhappy, help me understand what's going on"</p>
<p>The system responds with everything you need: recent tickets, payment history, communication timeline, and even suggests next steps—all in one conversation.</p>
<h2 id="heading-but-what-about-power-users">But What About Power Users?</h2>
<p>"This sounds nice for beginners," you might think, "but I'm fast with the current system."</p>
<p>Here's the thing: chat-first doesn't mean dumbed-down. Power users can create custom shortcuts, chain multiple commands together, and even build their own workflows through conversation. Think of it as upgrading from clicking buttons to having a conversation with an expert who never gets tired or forgets details.</p>
<p>You can still get your data fast—actually faster—but now your colleagues can too, without a 3-hour training session.</p>
<h2 id="heading-the-ripple-effect-on-teams">The Ripple Effect on Teams</h2>
<p>When software becomes conversational, something interesting happens to teams:</p>
<p><strong>Knowledge Sharing Gets Effortless:</strong> Instead of "How do I run that report again?" conversations, team members can just ask the system directly.</p>
<p><strong>Onboarding Becomes Natural:</strong> New hires don't need to memorize complex workflows—they learn by asking questions, just like they would with a mentor.</p>
<p><strong>Collaboration Improves:</strong> When everyone can access information through natural conversation, meetings become about decisions, not data-gathering.</p>
<h2 id="heading-beyond-the-hype-real-benefits">Beyond the Hype: Real Benefits</h2>
<p>This isn't just about being trendy or following the latest tech fad. Chat-first software addresses real business problems:</p>
<ul>
<li><strong>Reduced Training Costs:</strong> Less time teaching people where buttons are, more time on actual work</li>
<li><strong>Fewer Mistakes:</strong> When software guides you through processes conversationally, you're less likely to miss steps</li>
<li><strong>Better Decision Making:</strong> Faster access to information means decisions based on current data, not last week's report</li>
<li><strong>Happier Employees:</strong> Less time fighting with software, more time doing meaningful work</li>
</ul>
<h2 id="heading-what-this-means-for-your-business">What This Means for Your Business</h2>
<p>We're not talking about replacing every tool overnight. This is about reimagining how people interact with business software. Instead of adapting to rigid interfaces, software adapts to how people naturally communicate.</p>
<p>Think about it: when you need information from a colleague, you don't hand them a form to fill out—you have a conversation. Why should software be any different?</p>
<h2 id="heading-the-future-is-conversational">The Future Is Conversational</h2>
<p>This shift is already happening. The companies that recognize it early will have teams that are more efficient, more collaborative, and frankly, happier with their tools.</p>
<p>The question isn't whether chat-first software will become mainstream—it's whether your organization will be an early adopter or playing catch-up.</p>
<hr />
<p><em>Coming up next in our series: "From Dashboards to Dialogues: The Technical Magic Behind Chat-First Software" where we'll explore how this actually works behind the scenes.</em></p>
<p><strong>What do you think?</strong> Have you ever wished you could just ask your software what you need instead of hunting for it? Share your biggest software frustration in the comments—you might be surprised how many others share it.</p>
<hr />
<p><em>This is Part 1 of our Chat-First SaaS series. Follow along as we explore how conversation is transforming business software, one chat at a time.</em></p>
]]></content:encoded></item><item><title><![CDATA[GPU Inference Servers Comparison: Triton vs TGI vs vLLM vs Ollama]]></title><description><![CDATA[The landscape of GPU inference servers has evolved dramatically, with several powerful solutions competing for dominance in serving large language models (LLMs) and other AI workloads. As organizations scale their AI deployments, choosing the right i...]]></description><link>https://blog.niradler.com/gpu-inference-servers-comparison-triton-vs-tgi-vs-vllm-vs-ollama</link><guid isPermaLink="true">https://blog.niradler.com/gpu-inference-servers-comparison-triton-vs-tgi-vs-vllm-vs-ollama</guid><category><![CDATA[tgi]]></category><category><![CDATA[ollama]]></category><category><![CDATA[triton]]></category><category><![CDATA[GPU]]></category><category><![CDATA[NVIDIA]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Thu, 29 May 2025 10:04:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/nqCEPrvnLKQ/upload/57a154b210230a0bf8de9dc2bb5403d3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The landscape of GPU inference servers has evolved dramatically, with several powerful solutions competing for dominance in serving large language models (LLMs) and other AI workloads. As organizations scale their AI deployments, choosing the right inference gateway becomes critical for performance, cost efficiency, and developer experience.</p>
<p>This comprehensive analysis examines the leading GPU inference servers: NVIDIA Triton Inference Server, Text Generation Inference (TGI), vLLM, and Ollama.</p>
<h2 id="heading-what-are-inference-servers-a-primer-for-ai-practitioners">What Are Inference Servers? A Primer for AI Practitioners</h2>
<p>If you're working with AI models but haven't yet deployed them in production, you might wonder: "Why do I need an inference server when I can just run my model directly?" The answer lies in the gap between research/development and production deployment.</p>
<h3 id="heading-core-features-every-inference-server-provides">Core Features Every Inference Server Provides</h3>
<h4 id="heading-1-concurrent-request-handling">1. <strong>Concurrent Request Handling</strong> 🔄</h4>
<p><strong>What it does</strong>: Serves multiple users simultaneously instead of processing one request at a time.</p>
<p><strong>Why you need it</strong>: Your Jupyter notebook can't handle 1,000 users hitting your model at once. Inference servers use queuing, batching, and resource management to serve multiple requests efficiently.</p>
<p><strong>Real impact</strong>: Transform from serving 1 user to serving 1,000+ concurrent users.</p>
<h4 id="heading-2-dynamic-batching">2. <strong>Dynamic Batching</strong> 📦</h4>
<p><strong>What it does</strong>: Automatically groups individual requests into batches for more efficient GPU utilization.</p>
<p><strong>Why you need it</strong>: GPUs are designed for parallel processing. Processing requests one at a time leaves most of that parallel capacity idle, so you pay for an expensive GPU that is mostly waiting.</p>
<p><strong>Example</strong>: Instead of processing 10 text requests individually, the server batches them together, reducing inference time from 10 seconds to 2 seconds total.</p>
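<p>Here is a minimal sketch of the idea, not any particular server's implementation. The function name <code>collect_batch</code> and the parameters <code>max_batch_size</code> and <code>max_wait_s</code> are assumptions for illustration: wait for one request, then keep collecting until the batch is full or a short time window closes.</p>

```python
import queue
import time
from typing import List


def collect_batch(requests: "queue.Queue[str]", max_batch_size: int, max_wait_s: float) -> List[str]:
    """Block for the first request, then gather more until the batch is full or the window closes."""
    batch = [requests.get()]  # wait for at least one request
    deadline = time.monotonic() + max_wait_s
    while len(batch) != max_batch_size:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # window closed; run with what we have
    return batch


# Ten requests arrive at once; the batcher hands them to the GPU in two groups.
pending: "queue.Queue[str]" = queue.Queue()
for i in range(10):
    pending.put(f"prompt-{i}")

first = collect_batch(pending, max_batch_size=8, max_wait_s=0.01)
second = collect_batch(pending, max_batch_size=8, max_wait_s=0.01)
```

<p>The two knobs trade latency for throughput: a longer wait window fills batches more often, but every request in the batch pays for that wait.</p>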
<h4 id="heading-3-model-optimization">3. <strong>Model Optimization</strong> ⚡</h4>
<p><strong>What it does</strong>: Automatically optimizes your model for faster inference through quantization, kernel fusion, and memory layout optimization.</p>
<p><strong>Why you need it</strong>: Your research model might run fine on your laptop but be too slow/expensive for production. Inference servers can make models 2-10x faster without code changes.</p>
<p><strong>Techniques include</strong>:</p>
<ul>
<li><strong>Quantization</strong>: Converting FP32 models to FP16 or INT8 (2-4x memory reduction)</li>
<li><strong>Kernel Fusion</strong>: Combining operations to reduce GPU memory transfers</li>
<li><strong>Memory Layout Optimization</strong>: Reorganizing data for faster access</li>
</ul>
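<p>The memory numbers for quantization come from simple arithmetic on bytes per parameter. A rough sketch, counting weights only (activations and the KV cache are ignored, and the 7B model size is just an example):</p>

```python
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB (ignores activations and KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"7B params in {dtype}: {weight_memory_gib(7_000_000_000, dtype):.1f} GiB")
```

<p>Going from FP32 to FP16 halves the footprint, and INT8 halves it again, which is where the 2-4x range comes from.</p>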
<h4 id="heading-4-auto-scaling">4. <strong>Auto-Scaling</strong> 📈</h4>
<p><strong>What it does</strong>: Automatically spins up/down server instances based on demand.</p>
<p><strong>Why you need it</strong>: Your AI app might have 10 users at 3 AM but 10,000 users at peak hours. Scaling by hand cannot keep up with swings like that.</p>
<p><strong>Cost impact</strong>: Pay for resources only when needed, potentially reducing infrastructure costs by 60-80%.</p>
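<p>In practice the scaling decision is often made by the orchestrator around the server rather than by the server itself, but the core calculation is small. A sketch under assumed numbers (<code>capacity_per_replica</code> and the replica limits are made-up values you would measure and configure yourself):</p>

```python
import math


def desired_replicas(requests_per_second: float, capacity_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Replicas needed to serve the load, clamped to a configured range."""
    needed = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


night = desired_replicas(requests_per_second=2, capacity_per_replica=20)
peak = desired_replicas(requests_per_second=900, capacity_per_replica=20)
```

<p>The clamp matters as much as the division: the minimum keeps one warm instance for that 3 AM user, and the maximum caps the bill if traffic spikes unexpectedly.</p>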
<h4 id="heading-5-health-monitoring-amp-observability">5. <strong>Health Monitoring &amp; Observability</strong> 📊</h4>
<p><strong>What it does</strong>: Tracks model performance, latency, throughput, error rates, and resource usage.</p>
<p><strong>Why you need it</strong>: When your model starts giving wrong answers or becomes slow, you need to know immediately, not when users complain.</p>
<p><strong>Metrics tracked</strong>:</p>
<ul>
<li>Requests per second</li>
<li>Latency percentiles (P50, P95, P99)</li>
<li>Error rates</li>
<li>GPU/CPU utilization</li>
<li>Memory usage</li>
</ul>
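<p>Percentiles are tracked instead of a plain average because a few slow requests can hide behind, or badly distort, the mean. A small sketch using the nearest-rank method on made-up latency samples:</p>

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
mean = sum(latencies_ms) / len(latencies_ms)
```

<p>Here the typical request (P50) takes 13 ms while the mean is 125.6 ms, a number that describes almost no real request. P95 and P99 show what your unluckiest users actually experience.</p>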
<h4 id="heading-6-ab-testing-amp-model-versioning">6. <strong>A/B Testing &amp; Model Versioning</strong> 🧪</h4>
<p><strong>What it does</strong>: Allows you to test new model versions against existing ones with real traffic.</p>
<p><strong>Why you need it</strong>: You've trained a new model version that performs better in testing, but will it perform better with real users? Inference servers let you route 10% of traffic to the new model to compare performance.</p>
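<p>One common way to do that split is to hash a stable identifier so each user always lands on the same model version. This is a sketch of the technique, not a specific server's routing config; <code>pick_model</code> and the model labels are hypothetical.</p>

```python
import hashlib


def pick_model(user_id: str, canary_percent: int = 10) -> str:
    """Deterministically route a fixed share of users to the candidate model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "candidate" if bucket in range(canary_percent) else "stable"


routes = [pick_model(f"user-{i}") for i in range(10_000)]
canary_share = routes.count("candidate") / len(routes)
```

<p>Hashing instead of picking at random per request keeps a user's experience consistent across a session, which makes the comparison between the two versions cleaner.</p>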
<h4 id="heading-7-caching-amp-request-deduplication">7. <strong>Caching &amp; Request Deduplication</strong> 💾</h4>
<p><strong>What it does</strong>: Stores results of common requests and detects duplicate requests to avoid redundant computation.</p>
<p><strong>Why you need it</strong>: If 100 users ask "What's the weather like?", why run inference 100 times? Caching can reduce compute costs by 30-70% for many applications.</p>
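<p>A minimal sketch of exact-match caching, assuming the model's answer depends only on the prompt text. <code>CachedModel</code> is a hypothetical wrapper and the lambda stands in for a real inference call.</p>

```python
import hashlib
from typing import Callable, Dict


class CachedModel:
    """Wrap an inference function with an exact-match cache keyed on the normalized prompt."""

    def __init__(self, infer: Callable[[str], str]) -> None:
        self._infer = infer
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of times the underlying model actually ran

    def __call__(self, prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._infer(normalized)
        return self._cache[key]


# Hypothetical stand-in for a real model call.
model = CachedModel(lambda prompt: f"echo: {prompt}")
answers = [model("What's the weather like?") for _ in range(100)]
```

<p>One hundred identical questions trigger a single model call. This only makes sense when repeated prompts should get the same answer; with sampling enabled or per-user context, a cache like this would return stale or wrong results.</p>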
<h2 id="heading-inference-server-comparison">Inference Server Comparison</h2>
<h3 id="heading-1-text-generation-inference-tgi">1. Text Generation Inference (TGI) 🚀</h3>
<p><strong>Developer</strong>: Hugging Face<br /><strong>Specialty</strong>: Production-ready LLM serving with enterprise focus</p>
<h4 id="heading-key-strengths">Key Strengths:</h4>
<ul>
<li><strong>Hugging Face Ecosystem Integration</strong>: Seamless compatibility with HF model hub and datasets</li>
<li><strong>Production-Ready Architecture</strong>: Built for high-throughput, low-latency production environments</li>
<li><strong>Advanced Quantization</strong>: Supports FP16 and INT8 quantization for memory optimization</li>
<li><strong>Kubernetes-Native</strong>: Designed for cloud-scale deployments with auto-scaling capabilities</li>
<li><strong>Asynchronous Processing</strong>: Handles high-volume concurrent requests efficiently</li>
</ul>
<h4 id="heading-performance-profile">Performance Profile:</h4>
<ul>
<li><strong>Best For</strong>: Text generation, chatbots, customer support systems</li>
<li><strong>Memory Management</strong>: Excellent with FP16/INT8 quantization</li>
<li><strong>Batch Processing</strong>: Full support with dynamic batching</li>
<li><strong>Scaling</strong>: Enterprise-grade Kubernetes integration</li>
</ul>
<h4 id="heading-real-world-applications">Real-World Applications:</h4>
<p>TGI excels in customer support chatbots where consistent response times and automatic scaling based on demand fluctuations are crucial. Its tight integration with Hugging Face makes it ideal for teams already invested in the HF ecosystem.</p>
<h4 id="heading-benchmarking-results">Benchmarking Results:</h4>
<ul>
<li>MPT-30B achieved 35.43 tokens/second with a remarkable 36.23% performance increase over TensorRT-LLM in specific configurations</li>
</ul>
<hr />
<h3 id="heading-2-vllm-very-large-language-models">2. vLLM ⚡</h3>
<p><strong>Developer</strong>: UC Berkeley<br /><strong>Specialty</strong>: Memory-efficient inference with innovative architecture</p>
<h4 id="heading-revolutionary-features">Revolutionary Features:</h4>
<ul>
<li><strong>PagedAttention</strong>: Breakthrough memory management technique that optimizes GPU memory usage</li>
<li><strong>Continuous Batching</strong>: Schedules work one token step at a time, so new requests join a running batch as soon as earlier ones finish instead of waiting for the whole batch</li>
<li><strong>Dynamic Batching</strong>: Automatic batch size optimization based on available resources</li>
<li><strong>Multi-GPU Distribution</strong>: Efficient model distribution across multiple GPUs</li>
</ul>
<h4 id="heading-performance-profile-1">Performance Profile:</h4>
<ul>
<li><strong>Best For</strong>: Large-scale LLM inference in resource-constrained environments</li>
<li><strong>Memory Efficiency</strong>: Industry-leading memory optimization</li>
<li><strong>Cost Optimization</strong>: Ideal for educational and enterprise applications focused on cost efficiency</li>
<li><strong>GPU Utilization</strong>: Maximizes throughput while minimizing memory waste</li>
</ul>
<h4 id="heading-benchmarking-highlights">Benchmarking Highlights:</h4>
<ul>
<li><strong>SOLAR-10.7B</strong>: Peak performance of 57.86 tokens/second</li>
<li><strong>Qwen1.5-14B</strong>: 46.84 tokens/second, consistently outperforming Triton configurations</li>
<li>Strong performance across multiple model sizes, often matching or exceeding TensorRT-LLM</li>
</ul>
<h4 id="heading-why-its-gaining-traction">Why It's Gaining Traction:</h4>
<p>The Reddit community notes that "vLLM is catching up with TensorRT-LLM" in performance while maintaining superior user-friendliness and memory efficiency.</p>
<hr />
<h3 id="heading-3-nvidia-triton-inference-server">3. NVIDIA Triton Inference Server 🏢</h3>
<p><strong>Developer</strong>: NVIDIA<br /><strong>Specialty</strong>: Enterprise-grade multi-model inference platform</p>
<h4 id="heading-enterprise-grade-features">Enterprise-Grade Features:</h4>
<ul>
<li><strong>Framework Agnostic</strong>: Supports PyTorch, TensorFlow, ONNX, and custom backends</li>
<li><strong>Multi-Model Serving</strong>: Deploy multiple models simultaneously on a single server</li>
<li><strong>Model Ensembles</strong>: Chain models together for complex AI pipelines (e.g., text-to-vision workflows)</li>
<li><strong>NVIDIA Hardware Optimization</strong>: Maximum performance on NVIDIA GPU stack</li>
<li><strong>Dynamic Batching</strong>: Efficient GPU utilization through intelligent batching</li>
</ul>
<h4 id="heading-performance-profile-2">Performance Profile:</h4>
<ul>
<li><strong>Best For</strong>: Enterprise environments requiring diverse model deployments</li>
<li><strong>Versatility</strong>: Handles everything from recommendation engines to image classification</li>
<li><strong>Integration</strong>: Deep NVIDIA ecosystem integration</li>
<li><strong>Scalability</strong>: Multi-GPU scaling with model parallelism</li>
</ul>
<h4 id="heading-real-world-applications-1">Real-World Applications:</h4>
<p>Triton dominates enterprise settings where multiple AI models need deployment across diverse workloads. It's particularly strong in recommendation engines, image classification pipelines, and NLP applications requiring high throughput.</p>
<h4 id="heading-community-developments">Community Developments:</h4>
<p>Community projects such as Triton-co-pilot are under active development. They aim to streamline model deployment and conversion, making Triton more accessible to developers.</p>
<hr />
<h3 id="heading-4-ollama">4. Ollama 🛠️</h3>
<p><strong>Developer</strong>: Ollama Team<br /><strong>Specialty</strong>: Developer-friendly local LLM deployment</p>
<h4 id="heading-developer-centric-features">Developer-Centric Features:</h4>
<ul>
<li><strong>LLaMA Optimization</strong>: Specifically designed for LLaMA-based models</li>
<li><strong>Cross-Platform</strong>: Seamless operation on macOS, Windows, and Linux</li>
<li><strong>Zero-Setup Philosophy</strong>: Minimal configuration required for rapid prototyping</li>
<li><strong>CLI and API Support</strong>: User-friendly command-line interface with comprehensive API</li>
<li><strong>Local and Cloud Flexibility</strong>: Deploy models locally or scale to cloud environments</li>
</ul>
<h4 id="heading-performance-profile-3">Performance Profile:</h4>
<ul>
<li><strong>Best For</strong>: Rapid prototyping, small teams, solo developers</li>
<li><strong>Learning Curve</strong>: Extremely accessible for developers new to LLMs</li>
<li><strong>Deployment Speed</strong>: Fastest time-to-deployment for LLaMA models</li>
<li><strong>Resource Requirements</strong>: Optimized for resource-conscious environments</li>
</ul>
<h4 id="heading-ideal-use-cases">Ideal Use Cases:</h4>
<p>Perfect for developers creating language analysis tools, personal AI assistants, and research-focused applications. Its ease of use makes it popular among smaller teams and individual developers who need quick LLaMA model deployment.</p>
<hr />
<h2 id="heading-performance-comparison-matrix">Performance Comparison Matrix</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>TGI</td><td>vLLM</td><td>Triton</td><td>Ollama</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Primary Use Case</strong></td><td>Production text generation</td><td>Large-scale LLM inference</td><td>Multi-model enterprise deployment</td><td>Local LLaMA development</td></tr>
<tr>
<td><strong>Memory Efficiency</strong></td><td>Very Good (FP16/INT8)</td><td>Excellent (PagedAttention)</td><td>Good (Dynamic allocation)</td><td>Limited</td></tr>
<tr>
<td><strong>Multi-GPU Support</strong></td><td>Yes</td><td>Yes (Distribution)</td><td>Yes (Parallelism)</td><td>Limited</td></tr>
<tr>
<td><strong>Framework Support</strong></td><td>Hugging Face focus</td><td>LLM-optimized</td><td>Framework agnostic</td><td>LLaMA-specific</td></tr>
<tr>
<td><strong>Deployment Complexity</strong></td><td>Medium</td><td>Medium</td><td>High</td><td>Very Low</td></tr>
<tr>
<td><strong>Batch Processing</strong></td><td>Full support</td><td>Dynamic optimization</td><td>Advanced batching</td><td>Limited</td></tr>
<tr>
<td><strong>Enterprise Features</strong></td><td>Good</td><td>Moderate</td><td>Excellent</td><td>Basic</td></tr>
<tr>
<td><strong>Community Support</strong></td><td>Strong (HF ecosystem)</td><td>Growing rapidly</td><td>Mature</td><td>Active</td></tr>
</tbody>
</table>
</div><h2 id="heading-choosing-the-right-solution-decision-framework">Choosing the Right Solution: Decision Framework</h2>
<h3 id="heading-choose-vllm-if">Choose <strong>vLLM</strong> if:</h3>
<ul>
<li>Memory efficiency is critical</li>
<li>You're running large models (13B+ parameters)</li>
<li>Cost optimization is a primary concern</li>
<li>You need cutting-edge performance with user-friendly deployment</li>
</ul>
<h3 id="heading-choose-tgi-if">Choose <strong>TGI</strong> if:</h3>
<ul>
<li>You're heavily invested in the Hugging Face ecosystem</li>
<li>Production reliability and enterprise features are essential</li>
<li>You need robust quantization support</li>
<li>Kubernetes-native deployment is required</li>
</ul>
<h3 id="heading-choose-triton-if">Choose <strong>Triton</strong> if:</h3>
<ul>
<li>You're running diverse model types (not just LLMs)</li>
<li>Enterprise multi-model deployment is needed</li>
<li>You require model ensemble capabilities</li>
<li>NVIDIA hardware optimization is crucial</li>
</ul>
<h3 id="heading-choose-ollama-if">Choose <strong>Ollama</strong> if:</h3>
<ul>
<li>You're prototyping with LLaMA models</li>
<li>Rapid deployment with minimal setup is a priority</li>
<li>You're working in small teams or as an individual developer</li>
<li>Cross-platform compatibility is important</li>
</ul>
<h2 id="heading-performance-benchmarking-insights">Performance Benchmarking Insights</h2>
<h3 id="heading-key-findings">Key Findings</h3>
<ol>
<li><strong>vLLM's Rising Performance</strong>: Consistently competitive with TensorRT-LLM while maintaining superior usability</li>
<li><strong>TGI's Specialized Strength</strong>: Exceptional performance on specific model types (MPT-30B showed 36% improvement)</li>
<li><strong>Triton's Versatility</strong>: Strong across diverse workloads but slightly behind in pure LLM inference</li>
<li><strong>Memory Efficiency Leader</strong>: vLLM's PagedAttention provides the best memory utilization</li>
</ol>
<h2 id="heading-future-outlook-and-recommendations">Future Outlook and Recommendations</h2>
<p>The GPU inference server landscape is rapidly evolving, with each solution addressing different market needs:</p>
<p><strong>For Startups and Scale-ups</strong>: vLLM offers the best balance of performance, cost efficiency, and ease of use.</p>
<p><strong>For Enterprise Deployments</strong>: Triton provides the most comprehensive feature set for complex, multi-model environments.</p>
<p><strong>For Hugging Face-Centric Teams</strong>: TGI remains the natural choice with its ecosystem integration and production readiness.</p>
<p><strong>For Rapid Prototyping</strong>: Ollama continues to excel in developer velocity and accessibility.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>There's no universal "best" GPU inference server—the optimal choice depends on your specific requirements, technical constraints, and organizational context. The good news is that all major solutions are actively developed and continuously improving, ensuring robust options regardless of your chosen path.</p>
<p>Consider running your own benchmarks with your specific models and infrastructure to make the most informed decision. The performance landscape is dynamic, and what works best today may evolve as these technologies mature.</p>
]]></content:encoded></item><item><title><![CDATA[DevOps ❤️ Developers: How Kro Finally Made K8s Peace Possible 🤝]]></title><description><![CDATA[The Ancient War ⚔️
Picture this: It's 3 AM. Your Slack is blowing up. The deployment is broken. Again.
Developer: "I just need to deploy my app! Why do I need 47 YAML files?!"
DevOps: "You can't just wing it! Where's your ServiceAccount? Your network...]]></description><link>https://blog.niradler.com/devops-developers-how-kro-finally-made-k8s-peace-possible</link><guid isPermaLink="true">https://blog.niradler.com/devops-developers-how-kro-finally-made-k8s-peace-possible</guid><category><![CDATA[KRO]]></category><category><![CDATA[k8s]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Helm]]></category><category><![CDATA[helm chart]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Thu, 29 May 2025 09:00:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/n95VMLxqM2I/upload/dadd3972bad4b45302b5fdb764b43ad0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-ancient-war">The Ancient War ⚔️</h2>
<p>Picture this: It's 3 AM. Your Slack is blowing up. The deployment is broken. Again.</p>
<p><strong>Developer</strong>: "I just need to deploy my app! Why do I need 47 YAML files?!"</p>
<p><strong>DevOps</strong>: "You can't just wing it! Where's your ServiceAccount? Your network policies? Your monitoring setup?!"</p>
<p><strong>Developer</strong>: "I don't know what half of those words mean!"</p>
<p><strong>DevOps</strong>: <em>screams into the void</em></p>
<p>Sound familiar? If you've worked with Kubernetes, you've lived this nightmare. Developers want simplicity. DevOps wants control. Kubernetes gives you... complexity.</p>
<p>But what if I told you there's a new sheriff in town that's making everyone happy? 🤠</p>
<h2 id="heading-enter-kro-the-kubernetes-peacekeeper">Enter Kro: The Kubernetes Peacekeeper 🏴‍☠️</h2>
<p><strong>Kro</strong> (Kube Resource Orchestrator) is like that cool mediator friend who helps feuding roommates finally get along. It's an open-source, Kubernetes-native tool that lets you create custom APIs that are actually... <em>wait for it</em>... <strong>simple to use</strong>.</p>
<p>Here's the magic: DevOps teams define the complex stuff once, and developers get a clean, simple API to work with. Everyone wins!</p>
<h2 id="heading-how-kro-works-its-magic">How Kro Works Its Magic ✨</h2>
<h3 id="heading-for-devops-teams-the-platform-heroes">For DevOps Teams (The Platform Heroes):</h3>
<p>You create a <code>ResourceGraphDefinition</code> that bundles all your best practices:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># One definition to rule them all</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kro.run/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ResourceGraphDefinition</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-application</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">schema:</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>  <span class="hljs-comment"># This becomes your new simple API!</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">string</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">string</span> <span class="hljs-string">|</span> <span class="hljs-string">default="nginx"</span>
      <span class="hljs-attr">ingress:</span>
        <span class="hljs-attr">enabled:</span> <span class="hljs-string">boolean</span> <span class="hljs-string">|</span> <span class="hljs-string">default=false</span>

  <span class="hljs-attr">resources:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">deployment</span>    <span class="hljs-comment"># Your carefully crafted Deployment</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">service</span>      <span class="hljs-comment"># Your perfectly configured Service  </span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">ingress</span>      <span class="hljs-comment"># Your security-compliant Ingress</span>
    <span class="hljs-comment"># + all your monitoring, RBAC, policies, etc.</span>
</code></pre>
<p>You get to:</p>
<ul>
<li><p>✅ Enforce security best practices</p>
</li>
<li><p>✅ Standardize across teams</p>
</li>
<li><p>✅ Include all the "boring" stuff (monitoring, RBAC, etc.)</p>
</li>
<li><p>✅ Sleep better at night</p>
</li>
</ul>
<h3 id="heading-for-developers-the-feature-builders">For Developers (The Feature Builders):</h3>
<p>You get a beautifully simple API:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># That's it. That's the whole deployment.</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kro.run/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-awesome-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-awesome-app</span>
  <span class="hljs-attr">image:</span> <span class="hljs-string">my-app:v1.2.3</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>You get to:</p>
<ul>
<li><p>✅ Deploy with confidence</p>
</li>
<li><p>✅ Focus on your application logic</p>
</li>
<li><p>✅ Stop googling "kubernetes service yaml example" for the 847th time</p>
</li>
<li><p>✅ Actually ship features</p>
</li>
</ul>
<h2 id="heading-the-secret-sauce-cel-expressions">The Secret Sauce: CEL Expressions 🌶️</h2>
<p>Kro uses CEL (Common Expression Language) to wire everything together intelligently. It's like having a smart assistant that knows:</p>
<ul>
<li><p>"Oh, the service needs to point to the deployment? I got you."</p>
</li>
<li><p>"User wants ingress? Let me create that AND wire it to the service."</p>
</li>
<li><p>"Need to pass values between resources? On it."</p>
</li>
</ul>
<p>The best part? Kro builds a dependency graph from these references and creates the resources in the right order, so you never have to sequence manifests by hand.</p>
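<p>As a rough sketch of that wiring, the <code>resources</code> entries of the <code>my-application</code> definition might be filled in like this. The <code>${...}</code> references are CEL expressions; the port numbers, the <code>-service</code> and <code>-ingress</code> name suffixes, and the trimmed-down templates are illustrative assumptions, not a production setup:</p>
<pre><code class="lang-yaml"># Sketch only: goes under spec.resources of the ResourceGraphDefinition
resources:
  - id: deployment
    template:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: ${schema.spec.name}          # value from the user's Application
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: ${schema.spec.name}
        template:
          metadata:
            labels:
              app: ${schema.spec.name}
          spec:
            containers:
              - name: app
                image: ${schema.spec.image}
                ports:
                  - containerPort: 80      # assumed port
  - id: service
    template:
      apiVersion: v1
      kind: Service
      metadata:
        name: ${schema.spec.name}-service
      spec:
        # Referencing the deployment makes the Service depend on it
        selector: ${deployment.spec.selector.matchLabels}
        ports:
          - port: 80
            targetPort: 80
  - id: ingress
    includeWhen:
      - ${schema.spec.ingress.enabled}     # only created when the user opts in
    template:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: ${schema.spec.name}-ingress
      spec:
        rules:
          - http:
              paths:
                - path: /
                  pathType: Prefix
                  backend:
                    service:
                      name: ${service.metadata.name}   # wires Ingress to Service
                      port:
                        number: 80
</code></pre>
<p>The Service points at the Deployment and the Ingress points at the Service, so Kro can work out that the Deployment comes first and the Ingress last. The <code>includeWhen</code> condition skips the Ingress entirely unless <code>ingress.enabled</code> is true.</p>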
<h2 id="heading-what-this-actually-means">What This Actually Means 💡</h2>
<h3 id="heading-before-kro">Before Kro:</h3>
<p><strong>Developer</strong>: "I need to deploy my microservice"</p>
<p><strong>DevOps</strong>: "Okay, here's a 200-line YAML template. Don't forget the ServiceAccount, NetworkPolicy, PodDisruptionBudget, HorizontalPodAutoscaler, ServiceMonitor, and..."</p>
<p><strong>Developer</strong>: <em>eye twitching intensifies</em></p>
<h3 id="heading-after-kro">After Kro:</h3>
<p><strong>Developer</strong>: "I need to deploy my microservice"</p>
<p><strong>DevOps</strong>: "Cool, just set your app name and image in the Application API"</p>
<p><strong>Developer</strong>: "That's... it?"</p>
<p><strong>DevOps</strong>: "Yep! All the security and monitoring stuff is already baked in"</p>
<p><strong>Developer</strong>: "I... I love you"</p>
<h2 id="heading-why-this-changes-everything">Why This Changes Everything 🚀</h2>
<ol>
<li><p><strong>Developers Deploy Fearlessly</strong>: No more broken deployments because someone forgot a label</p>
</li>
<li><p><strong>DevOps Sleeps Soundly</strong>: Standards are enforced automatically</p>
</li>
<li><p><strong>Platform Teams Become Heroes</strong>: Instead of gatekeepers, they're enablers</p>
</li>
<li><p><strong>Compliance Teams Rejoice</strong>: Best practices are built-in, not bolt-on</p>
</li>
<li><p><strong>Everyone Ships Faster</strong>: Less time debugging YAML, more time building features</p>
</li>
</ol>
<h2 id="heading-but-what-about-helm">"But What About Helm?" 🎭</h2>
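<p><em>I see you, Helm users!</em> Yes, Helm charts can package resources together, but here's the thing: Helm is still about rendering and managing YAML templates. With Kro, you're creating actual <strong>Kubernetes APIs</strong>: each ResourceGraphDefinition becomes a custom resource type in the cluster, and Kro keeps every instance of it reconciled.</p>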
<p>Think of it this way:</p>
<ul>
<li><p><strong>Helm</strong>: "Here's a templated way to deploy complex YAML"</p>
</li>
<li><p><strong>Kro</strong>: "Here's a simple API that happens to create complex resources"</p>
</li>
</ul>
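<p>You can even use them together! Kro can manage custom resources too, so if a Helm controller such as Flux is running in your cluster, your ResourceGraphDefinition could create its <code>HelmRelease</code> objects if that's your jam. Kro plays nice with everyone. 🤝</p>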
<h2 id="heading-getting-started">Getting Started 🏁</h2>
<p>Want to try Kro? It's surprisingly simple:</p>
<ol>
<li><p>Install Kro in your cluster</p>
</li>
<li><p>Platform team creates ResourceGraphDefinitions</p>
</li>
<li><p>Developers use the shiny new APIs</p>
</li>
<li><p>Profit! (and inner peace)</p>
</li>
</ol>
<p>The best part? It works with your existing tools and processes. No need to rip and replace everything.</p>
<h2 id="heading-the-bottom-line">The Bottom Line 📝</h2>
<p>Kro doesn't just solve technical problems—it solves people problems. It gives developers the simplicity they crave while giving DevOps the control they need. It's like relationship therapy, but for Kubernetes teams.</p>
<p>#Kubernetes #DevOps #DeveloperExperience #Kro #PlatformEngineering #CloudNative #TeamWork #YAML #APIs</p>
]]></content:encoded></item><item><title><![CDATA[Training Our First Kubernetes Expert Model]]></title><description><![CDATA[Training the Model
We trained our first expert model in Kubernetes kubectl commands using a fine-tuning process on a specialized dataset. The dataset consists of structured prompt-command pairs where natural language queries are mapped to their respe...]]></description><link>https://blog.niradler.com/training-our-first-kubernetes-expert-model</link><guid isPermaLink="true">https://blog.niradler.com/training-our-first-kubernetes-expert-model</guid><category><![CDATA[k8s]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[models]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Nir Adler]]></dc:creator><pubDate>Mon, 24 Mar 2025 22:23:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/nGoCBxiaRO0/upload/e0be0f2aafe8c5c1e366cd858e18a2e5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-training-the-model">Training the Model</h2>
<p>We trained our first expert model for Kubernetes <code>kubectl</code> commands by fine-tuning a base model on a specialized dataset. The dataset consists of structured prompt-command pairs that map natural language queries to their corresponding <code>kubectl</code> commands. This lets the model generalize and generate accurate <code>kubectl</code> commands for a variety of Kubernetes management tasks.</p>
<h2 id="heading-why-fine-tune-a-model-for-kubectl">Why Fine-Tune a Model for <code>kubectl</code>?</h2>
<p>General-purpose AI models, especially small ones that run locally, often lack deep Kubernetes knowledge and struggle with domain-specific queries. Our goal is to fine-tune a model that can:</p>
<ul>
<li><p>Generate correct <code>kubectl</code> commands based on natural language input.</p>
</li>
<li><p>Explain command syntax and usage with step-by-step reasoning.</p>
</li>
<li><p>Identify potential errors and suggest fixes.</p>
</li>
</ul>
<h3 id="heading-our-decision-making-process">Our Decision-Making Process</h3>
<ol>
<li><p><strong>Model Selection</strong>: We use Llama 3.2 3B Instruct, which balances efficiency and accuracy. It runs efficiently with 4-bit quantization, which makes fine-tuning feasible on consumer-grade GPUs.</p>
</li>
<li><p><strong>Fine-Tuning with Unsloth</strong>: We streamline model training with LoRA adapters, reducing memory usage and improving efficiency.</p>
</li>
<li><p><strong>Dataset Curation</strong>: We use multiple datasets to cover different aspects of <code>kubectl</code> usage:</p>
<ul>
<li><p><strong>Basic Commands</strong>: ComponentSoft/k8s-kubectl</p>
</li>
<li><p><strong>Advanced Scenarios</strong>: ComponentSoft/k8s-kubectl-35k</p>
</li>
<li><p><strong>Chain-of-Thought (CoT) Explanations</strong>: ComponentSoft/k8s-kubectl-cot-20k</p>
</li>
<li><p><strong>Instruction-based Training</strong>: sozercan/k8s-instructions</p>
</li>
<li><p><strong>Troubleshooting &amp; Debugging</strong>: eliashasnat/k8s-qa</p>
</li>
</ul>
</li>
<li><p><strong>Deployment with Ollama</strong>: We chose Ollama for easy packaging and distribution, which makes the model simple to integrate into terminal-based applications.</p>
</li>
</ol>
<h3 id="heading-training-process">Training Process</h3>
<ol>
<li><p><strong>Dataset Collection</strong>: We compiled a comprehensive dataset of <code>kubectl</code> commands mapped to natural language queries, covering a wide range of Kubernetes operations.</p>
</li>
<li><p><strong>Preprocessing</strong>: We cleaned and structured the data to ensure consistency, removing redundant entries and normalizing formatting.</p>
</li>
<li><p><strong>Fine-Tuning</strong>: We optimized the model using instruction-tuning techniques, allowing it to generate precise <code>kubectl</code> commands from user queries.</p>
</li>
<li><p><strong>Validation</strong>: The trained model was rigorously tested against various Kubernetes scenarios to ensure accuracy and reliability.</p>
</li>
<li><p><strong>Deployment</strong>: The final model was packaged as <code>niradler/k8s-operator:latest</code> for easy use within the KasK application.</p>
</li>
</ol>
<h2 id="heading-integrating-the-model-into-kask">Integrating the Model into KasK</h2>
<h3 id="heading-kask-kubernetes-assistant-terminal-app">KasK - Kubernetes Assistant Terminal App</h3>
<p>KasK is an open-source terminal-based application designed to simplify your interaction with Kubernetes clusters. With KasK, you can ask natural language questions about your Kubernetes resources, and it will generate accurate <code>kubectl</code> commands to fetch the required details. The app also provides a JSON viewer to display and explore the command output in a structured and user-friendly way.</p>
<h2 id="heading-features">Features</h2>
<ul>
<li><p><strong>Natural Language Queries</strong>: Ask questions like "Show all running pods" or "List services in all namespaces," and KasK will generate the appropriate <code>kubectl</code> command.</p>
</li>
<li><p><strong>JSON Viewer</strong>: View the output of <code>kubectl</code> commands in a tree-like structure with search and filtering capabilities.</p>
</li>
<li><p><strong>Clipboard Integration</strong>: Copy selected JSON values to your clipboard for easy sharing or further use.</p>
</li>
<li><p><strong>Dark Mode Support</strong>: Enhanced readability with dark mode styles.</p>
</li>
<li><p><strong>Keyboard Shortcuts</strong>: Navigate and interact with the app efficiently using intuitive key bindings.</p>
</li>
</ul>
<p><img src="https://raw.githubusercontent.com/niradler/kask/refs/heads/main/kask.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-usage">Usage</h2>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/niradler/kask.git
<span class="hljs-built_in">cd</span> kask
pip install -r requirements.txt
python main.py
</code></pre>
<p>Write your query in the "Write your prompt here" text area. For example: Show all pods in the default namespace.</p>
<p>Click the "Prompt" button or press Enter to generate the <code>kubectl</code> command and view the output.</p>
<p>Use the JSON viewer to explore the output:</p>
<ul>
<li><p>Search for specific keys or values.</p>
</li>
<li><p>Expand or collapse nodes.</p>
</li>
<li><p>Copy selected values to your clipboard.</p>
</li>
</ul>
<p>Use keyboard shortcuts for quick actions:</p>
<ul>
<li><p><code>x</code>: Expand/Collapse all nodes.</p>
</li>
<li><p><code>s</code> or <code>/</code>: Focus on the search bar.</p>
</li>
<li><p><code>c</code>: Copy the selected value.</p>
</li>
<li><p><code>q</code>: Quit the application.</p>
</li>
</ul>
<h2 id="heading-requirements">Requirements</h2>
<ul>
<li><p>Python 3.8 or higher</p>
</li>
<li><p>Ollama server with the <code>niradler/k8s-operator:latest</code> <a target="_blank" href="https://huggingface.co/niradler/k8s_operator"><mark>model</mark></a>:</p>
<ul>
<li><p><a target="_blank" href="https://huggingface.co/niradler/k8s_operator/resolve/main/unsloth.Q4_K_M.gguf?download=true">Download model</a></p>
</li>
<li><p>Use the <code>Modelfile</code></p>
</li>
<li><p><code>ollama create k8s-operator -f Modelfile</code></p>
</li>
</ul>
</li>
<li><p><code>kubectl</code> installed and configured to access your Kubernetes cluster.</p>
</li>
</ul>
<h2 id="heading-features-for-the-future">Features for the future:</h2>
<ul>
<li><p>Limit commands (read-only) or review before execution.</p>
</li>
<li><p>Compact table view.</p>
</li>
<li><p>Tools integration.</p>
</li>
<li><p>Chat memory.</p>
</li>
<li><p>Model selector.</p>
</li>
<li><p>Ollama configuration.</p>
</li>
<li><p>UI/UX improvements.</p>
</li>
<li><p>Submit prompt with keyboard.</p>
</li>
</ul>
<p>With our first fine-tuned model powering KasK, we’re excited to continue improving Kubernetes management through AI-driven automation! 🚀</p>
]]></content:encoded></item></channel></rss>