
Architecting a Fault-Tolerant AI Agent: From Brittle Scripts to Self-Healing SQL Pipelines

Discover how to transform a fragile, hallucination-prone LLM wrapper into a highly resilient, agentic analytics engine capable of self-correcting database errors in real-time.

[Figure: hand-drawn system diagram of a self-healing agentic SQL pipeline]

In the current rush to integrate Generative AI into enterprise platforms, a dangerous architectural anti-pattern has emerged: the linear LLM wrapper.

Wrapping an API call around a Large Language Model and asking it to generate executable code—like database SQL—is trivial. However, relying on a non-deterministic AI to consistently output perfectly formatted, syntactically correct queries without any safety nets is an engineering disaster waiting to happen.

I recently took ownership of an AI-driven analytics pipeline suffering from this exact fragility. If the LLM wrapped its output in a markdown code fence (```sql), hallucinated a non-existent column, or made a slight syntax error, the Python execution thread crashed outright. The baseline system had a near 0% success rate on complex queries.

To solve this, I dismantled the linear script and engineered a Fault-Tolerant Agentic Workflow. By treating the LLM not just as a text generator, but as a reasoning engine capable of self-correction, we pushed the success rate to 100%. Here is a technical breakdown of how to build resilience, security, and token efficiency into production AI systems.


Phase 1: The Anti-Pattern of Linear Execution

To understand the solution, you must first look at why basic AI pipelines fail in production. A standard text-to-SQL pipeline usually follows a linear path:

  1. Accept User Prompt.
  2. Inject Database Schema.
  3. Generate SQL via LLM.
  4. Execute SQL.

This works flawlessly in a demo environment. In production, it shatters. If the user asks for a metric requiring complex multi-join logic, smaller models often hallucinate schema relationships. The database throws a Syntax Error, the application crashes, and the user receives a generic 500 Internal Server Error.
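A minimal sketch of this brittle pattern makes the failure mode concrete. The `llm` callable here is a hypothetical stand-in for a real model client; the point is that raw model output flows straight into the database with no safety net:

```python
import sqlite3


def linear_pipeline(question, llm, db_path=":memory:"):
    """Naive text-to-SQL: one shot, no validation, no retry."""
    sql = llm(question)                       # raw model output, unchecked
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()   # any bad SQL raises here
    finally:
        conn.close()
```

A model that merely wraps its answer in a markdown fence is enough to raise `sqlite3.OperationalError` and take down the request.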

Building a resilient system requires acknowledging that LLMs will make mistakes. You cannot architect around the assumption of perfect output; you must architect for graceful degradation and automated recovery.

Phase 2: The Agentic Retry Loop (Self-Healing)

To stop the crashes, the system needed an autonomic nervous system. I replaced the linear execution flow with a 3-Attempt Agentic Retry Loop.

Instead of allowing a database exception to crash the application, the pipeline actively catches the SQLite error (e.g., no such column: zodiac_sign). It then packages this exact error message, along with the failed SQL string, into a targeted retry payload and feeds it back to the LLM.

The Thought Process: The prompt effectively tells the model: "You attempted to run Query X, but the environment returned Error Y. Analyze your mistake and generate a corrected query." Because we utilized a high-reasoning model (a 72B parameter instruction-tuned model), the AI acts as its own debugger. If it hallucinates a column on Attempt 1, it realizes the mistake on Attempt 2, adjusts its aggregate logic, and successfully pulls the data. This closed-loop feedback mechanism transforms a static script into an autonomous, self-healing agent.
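The loop above can be sketched in a few lines. This is a simplified illustration, not the production implementation: `llm` is a hypothetical model callable, and the retry prompt mirrors the "Query X, Error Y" framing described above:

```python
import sqlite3

MAX_ATTEMPTS = 3


def run_with_retries(question, llm, conn):
    """Self-healing loop: feed the exact DB error back to the model."""
    prompt = f"Write a SQLite query for: {question}"
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        sql = llm(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            last_error = exc
            # Package the failed SQL and the exact error into a retry payload.
            prompt = (
                f"You attempted to run this query:\n{sql}\n"
                f"The database returned this error:\n{exc}\n"
                "Analyze your mistake and return a corrected query."
            )
    raise RuntimeError(f"Gave up after {MAX_ATTEMPTS} attempts: {last_error}")
```

Catching `sqlite3.Error` (rather than a bare `except`) keeps genuine application bugs visible while still capturing every syntax and schema error the model can cause.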

Phase 3: Zero-Trust Security Validation

When you allow an AI to generate and execute SQL directly against a database, you are inherently exposing your persistence layer to massive risk. Relying on system prompts (e.g., "Do not delete data") is insufficient; LLMs can be easily jailbroken via prompt injection.

To secure the environment, I implemented a strict, deterministic Regex-Based SQL Validator that acts as a hard firewall between the AI and the database.

The Thought Process: Before any generated SQL reaches the execution layer, it must pass through the validator.

  • It explicitly blocks destructive commands (DROP, DELETE, UPDATE, ALTER).
  • It accounts for injection bypass attempts, scanning for keywords hidden behind parentheses or newlines.
  • It blocks multi-statement piggyback attacks (; DROP TABLE users;).
  • It forces a strict whitelist, requiring the query to begin with SELECT or WITH.

If the validator flags a query, the system safely aborts the execution, logs the attempt, and returns a controlled "Invalid SQL" status. Security must always rely on deterministic code, never on AI compliance.
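A hedged sketch of such a validator is below. These are illustrative rules, not the exact production patterns; the keyword list and the single-statement check would need tuning for your dialect:

```python
import re

# Destructive or state-changing keywords, matched anywhere in the query
# (word-boundary anchored, so parentheses and newlines do not hide them).
BLOCKED = re.compile(
    r"\b(drop|delete|update|alter|insert|create|truncate|attach|pragma)\b",
    re.IGNORECASE,
)


def validate_sql(sql: str) -> bool:
    """Deterministic zero-trust gate: only single read-only statements pass."""
    stripped = sql.strip().rstrip(";").strip()
    # Whitelist: query must begin with SELECT or WITH.
    if not re.match(r"^(select|with)\b", stripped, re.IGNORECASE):
        return False
    # Blacklist: destructive commands, even hidden mid-query.
    if BLOCKED.search(stripped):
        return False
    # Multi-statement piggybacking ("SELECT 1; DROP TABLE users").
    if ";" in stripped:
        return False
    return True
```

Note the deliberate bias: a `SELECT` containing the literal string `'update'` would be rejected. In a zero-trust design, false positives that block a legitimate query are an acceptable cost; false negatives that let a `DROP` through are not.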

Phase 4: Intent Routing and Token Economics

For an AI analytics tool to be truly useful, it must support multi-turn conversations. If a user asks, "What is the average latency?" and follows up with, "What about just for enterprise clients?", the model needs context.

However, mindlessly appending the entire chat history to every single API request creates massive token bloat, slowing down responses and diluting the model's reasoning as the context window fills with noise.

To optimize token economics, I engineered an Intent Router. Before the main SQL generation begins, a lightweight LLM call acts as a traffic cop. It analyzes the user's prompt for ambiguous pronouns, superlatives, or continuation phrases.

  • If the prompt is a standalone question, the router ignores the chat history, saving thousands of tokens.
  • If the prompt is classified as a follow-up, the router injects a sliding window of the last three successful queries, instructing the main LLM to inherit the previous aggregations and simply modify the WHERE or ORDER BY clauses.

This dynamic routing keeps the context window lean, reducing API costs while preserving deep, conversational statefulness.
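The routing logic can be sketched as follows. In the real pipeline the classifier is a lightweight LLM call; here a hypothetical regex heuristic stands in for it so the control flow is visible:

```python
import re

# Stand-in for the lightweight LLM classifier: flag continuation phrases
# and ambiguous references that signal a follow-up question.
FOLLOW_UP = re.compile(
    r"\b(what about|those|them|that one|only for|just for)\b", re.IGNORECASE
)


def classify_intent(prompt: str) -> str:
    return "follow_up" if FOLLOW_UP.search(prompt) else "standalone"


def build_context(prompt, history, classify=classify_intent):
    """Inject history only when the prompt is a follow-up.

    `history` is a list of (question, sql) pairs from successful runs.
    """
    if classify(prompt) == "follow_up":
        window = history[-3:]  # sliding window: last three successful queries
        context = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in window)
        return f"Previous queries:\n{context}\n\nFollow-up: {prompt}"
    return prompt  # standalone: skip history entirely, saving tokens
```

Standalone prompts go to the main model untouched; follow-ups carry just enough prior SQL for the model to inherit the aggregation and modify the `WHERE` or `ORDER BY` clause.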


The Engineering Impact: Accuracy Over Speed

When architecting AI systems, you must often choose between raw speed and mathematical correctness. By implementing the agentic retry loop, the strict validator, and the intent router, the pipeline's latency increased from a few milliseconds to several seconds.

However, this was a deliberate, senior-level architectural trade-off. Speed without accuracy in a data analytics pipeline has zero value. By prioritizing a high-reasoning model and allowing the system the time it needs to self-correct, we achieved a 100% success rate on complex queries that previously crashed the system. The next phase of evolution involves taking this highly accurate, high-latency pipeline and using its successful outputs to fine-tune a smaller, lightning-fast 8B model, eventually achieving both perfect accuracy and sub-second latency.

Generative AI is not magic; it is just another component in your distributed system. When you wrap it in robust error handling, stateful memory management, and zero-trust security, you stop building fragile wrappers and start building true enterprise agents.


Wrestling with LLM hallucinations in production or looking to architect resilient agentic workflows? Let's Connect! I am Ankit Jaiswal, a Senior Full Stack AI Engineer specializing in the design, deployment, and optimization of highly resilient, cloud-agnostic SaaS platforms and intelligent, event-driven applications.

Get in Touch

Want to connect? Feel free to reach out with a direct question on LinkedIn, email, or X and I'll respond as soon as I can. You can also explore my code and latest projects on GitHub.