
In the current rush to integrate Generative AI into enterprise platforms, a dangerous architectural anti-pattern has emerged: the linear LLM wrapper.
Wrapping an API call around a Large Language Model and asking it to generate executable code, such as SQL, is trivial. However, relying on a non-deterministic AI to consistently output perfectly formatted, syntactically correct queries without any safety nets is an engineering disaster waiting to happen.
I recently took ownership of an AI-driven analytics pipeline suffering from this exact fragility. If the LLM wrapped its output in a markdown fence (```sql), hallucinated a non-existent column, or made a slight syntax error, the Python execution thread would crash entirely. The baseline system had a near 0% success rate on complex queries.
To solve this, I dismantled the linear script and engineered a Fault-Tolerant Agentic Workflow. By treating the LLM not just as a text generator, but as a reasoning engine capable of self-correction, we pushed the success rate to 100%. Here is a technical breakdown of how to build resilience, security, and token efficiency into production AI systems.
Phase 1: The Anti-Pattern of Linear Execution
To understand the solution, you must first look at why basic AI pipelines fail in production. A standard text-to-SQL pipeline usually follows a linear path:
- Accept User Prompt.
- Inject Database Schema.
- Generate SQL via LLM.
- Execute SQL.
This works flawlessly in a demo environment. In production, it shatters. If the user asks for a metric requiring complex multi-join logic, smaller models often hallucinate schema relationships. The database throws a Syntax Error, the application crashes, and the user receives a generic 500 Internal Server Error.
Building a resilient system requires acknowledging that LLMs will make mistakes. You cannot architect around the assumption of perfect output; you must architect for graceful degradation and automated recovery.
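To make the failure mode concrete, here is a minimal sketch of the fragile linear path. The `generate_sql` stub, the `metrics` table, and the canned output are all illustrative stand-ins (a real system would call an LLM API here); the point is that a single stray markdown fence in the model's reply is enough to crash the execution thread.

```python
import sqlite3

def generate_sql(prompt: str, schema: str) -> str:
    """Stand-in for the LLM call. A real model may return anything,
    including markdown fences or hallucinated columns."""
    return "```sql\nSELECT AVG(latency) FROM metrics;\n```"  # typical raw LLM output

def linear_pipeline(prompt: str) -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (latency REAL)")
    schema = "metrics(latency REAL)"
    sql = generate_sql(prompt, schema)   # step 3: generate SQL
    return conn.execute(sql).fetchall()  # step 4: execute -- no safety net

# The markdown fence alone is enough to kill the thread:
try:
    linear_pipeline("What is the average latency?")
except sqlite3.OperationalError as e:
    print(f"Pipeline crashed: {e}")
```

Nothing between generation and execution inspects or repairs the output, so every model quirk becomes an unhandled database exception.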
Phase 2: The Agentic Retry Loop (Self-Healing)
To stop the crashes, the system needed an autonomic nervous system. I replaced the linear execution flow with a 3-Attempt Agentic Retry Loop.
Instead of allowing a database exception to crash the application, the pipeline actively catches the SQLite error (e.g., no such column: zodiac_sign). It then packages this exact error message, along with the failed SQL string, into a targeted retry payload and feeds it back to the LLM.
The Thought Process: The prompt effectively tells the model: "You attempted to run Query X, but the environment returned Error Y. Analyze your mistake and generate a corrected query." Because we utilized a high-reasoning model (a 72B parameter instruction-tuned model), the AI acts as its own debugger. If it hallucinates a column on Attempt 1, it realizes the mistake on Attempt 2, adjusts its aggregate logic, and successfully pulls the data. This closed-loop feedback mechanism transforms a static script into an autonomous, self-healing agent.
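The loop above can be sketched in a few lines. The `llm_generate` stub simulates the model's behavior (hallucinating a column on attempt 1, correcting itself once the error is fed back); the schema and table names are hypothetical, and a production version would make a real chat-completion call where the stub sits.

```python
import sqlite3

def llm_generate(prompt: str) -> str:
    """Stand-in for the chat-completion call. The real system sends the
    schema plus, on retries, the failed SQL and the exact database error."""
    if "no such column" in prompt:
        return "SELECT AVG(latency_ms) FROM metrics"  # corrected after feedback
    return "SELECT AVG(latency) FROM metrics"         # attempt 1: hallucinated column

def run_with_retries(question: str, conn, max_attempts: int = 3):
    prompt = f"Schema: metrics(latency_ms REAL)\nQuestion: {question}"
    for attempt in range(1, max_attempts + 1):
        sql = llm_generate(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.OperationalError as err:
            # Feed the failed SQL and the exact error back for self-correction
            prompt += f"\nYou ran: {sql}\nThe environment returned: {err}\nFix it."
    raise RuntimeError(f"Gave up after {max_attempts} attempts")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (latency_ms REAL)")
conn.execute("INSERT INTO metrics VALUES (120.0), (80.0)")
print(run_with_retries("What is the average latency?", conn))
```

The key design choice is that the exception is data, not a crash: it becomes part of the next prompt, which is what lets the model debug itself.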
Phase 3: Zero-Trust Security Validation
When you allow an AI to generate and execute SQL directly against a database, you are inherently exposing your persistence layer to massive risk. Relying on system prompts (e.g., "Do not delete data") is insufficient; LLMs can be easily jailbroken via prompt injection.
To secure the environment, I implemented a strict, deterministic Regex-Based SQL Validator that acts as a firewall between the AI and the database.
The Thought Process: Before any generated SQL reaches the execution layer, it must pass through the validator.
- It explicitly blocks destructive commands (`DROP`, `DELETE`, `UPDATE`, `ALTER`).
- It accounts for injection bypass attempts, scanning for keywords hidden behind parentheses or newlines.
- It blocks multi-statement piggyback attacks (`; DROP TABLE users;`).
- It enforces a strict whitelist, requiring the query to begin with `SELECT` or `WITH`.
If the validator flags a query, the system safely aborts the execution, logs the attempt, and returns a controlled "Invalid SQL" status. Security must always rely on deterministic code, never on AI compliance.
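A condensed sketch of such a validator follows. It is illustrative, not the production implementation: a real deny-list would cover more keywords (and ideally be paired with a read-only database role), but it demonstrates the four checks described above in deterministic code.

```python
import re

# Deny-list of destructive/dangerous keywords (illustrative, not exhaustive)
DESTRUCTIVE = re.compile(
    r"\b(DROP|DELETE|UPDATE|ALTER|INSERT|TRUNCATE|EXEC|PRAGMA)\b", re.IGNORECASE
)

def validate_sql(sql: str) -> bool:
    # Collapse whitespace/newlines and opening parens so keywords
    # cannot hide behind them (e.g. "(\nDROP")
    flat = re.sub(r"[\s(]+", " ", sql).strip()
    # Whitelist: the statement must begin with SELECT or WITH
    if not re.match(r"^(SELECT|WITH)\b", flat, re.IGNORECASE):
        return False
    # Block destructive keywords anywhere in the statement
    if DESTRUCTIVE.search(flat):
        return False
    # Block multi-statement piggybacking (one trailing ';' is allowed)
    if ";" in flat.rstrip(" ;"):
        return False
    return True
```

Because the check is plain regex over the raw string, no amount of prompt injection can talk its way past it; the LLM's compliance is never part of the security model.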
Phase 4: Intent Routing and Token Economics
For an AI analytics tool to be truly useful, it must support multi-turn conversations. If a user asks, "What is the average latency?" and follows up with, "What about just for enterprise clients?", the model needs context.
However, mindlessly appending the entire chat history to every single API request creates massive token bloat, slowing down response times and degrading the LLM's reasoning capabilities over time.
To optimize token economics, I engineered an Intent Router. Before the main SQL generation begins, a lightweight LLM call acts as a traffic cop. It analyzes the user's prompt for ambiguous pronouns, superlatives, or continuation phrases.
- If the prompt is a standalone question, the router ignores the chat history, saving thousands of tokens.
- If the prompt is classified as a follow-up, the router injects a sliding window of the last three successful queries, instructing the main LLM to inherit the previous aggregations and simply modify the `WHERE` or `ORDER BY` clauses.
This dynamic routing keeps the context window lean, reducing API costs while preserving deep, conversational statefulness.
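The routing logic can be sketched as follows. Note the hedge: the article's router is a lightweight LLM call, whereas this sketch substitutes a simple keyword heuristic for classification, and the cue list and prompt wording are invented for illustration. The sliding-window mechanics, however, match the design above.

```python
from collections import deque

# Heuristic stand-in for the lightweight router LLM call; the cues are illustrative
FOLLOW_UP_CUES = ("what about", "and for", "just for", "those", "them")

def classify_intent(prompt: str) -> str:
    lowered = prompt.lower()
    return "follow_up" if any(cue in lowered for cue in FOLLOW_UP_CUES) else "standalone"

def build_context(prompt: str, history: deque) -> str:
    if classify_intent(prompt) == "standalone":
        return prompt  # skip the chat history entirely, saving the tokens
    # Sliding window: only the last three successful queries are injected
    window = "\n".join(history)
    return (f"Previous successful queries:\n{window}\n"
            f"Inherit their aggregations; modify only WHERE/ORDER BY.\n"
            f"Question: {prompt}")

history = deque(maxlen=3)  # automatically evicts beyond the last three queries
history.append("SELECT AVG(latency_ms) FROM metrics")
print(build_context("Show revenue per region", history))                  # standalone
print(build_context("What about just for enterprise clients?", history))  # follow-up
```

Using `deque(maxlen=3)` keeps the window bounded for free: older queries fall out as new ones are appended, so the context payload never grows with conversation length.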
The Engineering Impact: Accuracy Over Speed
When architecting AI systems, you must often choose between raw speed and mathematical correctness. By implementing the agentic retry loop, the strict validator, and the intent router, the pipeline's latency increased from a few milliseconds to several seconds.
However, this was a deliberate, senior-level architectural trade-off. Speed without accuracy in a data analytics pipeline has zero value. By prioritizing a high-reasoning model and allowing the system the time it needs to self-correct, we achieved a 100% success rate on complex queries that previously crashed the system. The next phase of evolution involves taking this highly accurate, high-latency pipeline and using its successful outputs to fine-tune a smaller, lightning-fast 8B model, eventually achieving both perfect accuracy and sub-second latency.
Generative AI is not magic; it is just another component in your distributed system. When you wrap it in robust error handling, stateful memory management, and zero-trust security, you stop building fragile wrappers and start building true enterprise agents.
Wrestling with LLM hallucinations in production or looking to architect resilient agentic workflows? Let's Connect! I am Ankit Jaiswal, a Senior Full Stack AI Engineer specializing in the design, deployment, and optimization of highly resilient, cloud-agnostic SaaS platforms and intelligent, event-driven applications.