Tags: System Design, Architecture, Cost Optimization, Cloud Computing, AI Economics, Scalability

Cost Evaluation: The Art of Back-of-the-Envelope Calculations in System Design

Stop deploying blindly. Learn how to mathematically validate your architecture's financial viability and cloud infrastructure costs before writing a single line of code.

[Figure: Hand-drawn system diagram of a cost evaluation pipeline]

In the modern era of cloud computing and managed AI services, a dangerous shift has occurred in software engineering. We have traded upfront capital expenditure for infinite, opaque operational costs.

A junior engineer looks at a system design and asks, "Can we build it?" A senior engineer looks at the exact same diagram and asks, "Can we afford to run it at scale?"

Architecting a highly resilient, event-driven SaaS platform is an incredible feat, but if your AWS bill eclipses your Monthly Recurring Revenue (MRR), your architecture has failed the business. To prevent this, engineering leaders rely on a critical skill during the system design phase: Back-of-the-Envelope (BotE) Cost Evaluation.

Here is a technical breakdown of how to mathematically model your system's traffic, storage, and compute costs—ensuring your platform remains financially viable as it scales from zero to enterprise levels.


Phase 1: The Anti-Pattern of Infinite Scalability

To understand the necessity of cost evaluation, we must first recognize the trap of "infinite scalability."

When you build a feature using Serverless functions (like AWS Lambda), managed databases (like DynamoDB), or external APIs (like OpenAI's LLMs), the infrastructure scales automatically to meet demand. This is brilliant for developer velocity, but it is financially hazardous.

If you introduce a highly unoptimized database query or an overly verbose LLM prompt, the system will not crash. Instead, the cloud provider will silently provision more resources to handle the inefficiency, and you will only discover the architectural flaw 30 days later when a massive invoice arrives.

To prevent this, you must build a mathematical model of your architecture before implementation. You need to transition from guessing to calculating.

Phase 2: Establishing the Baseline Metrics

Every back-of-the-envelope calculation starts by defining the scale of the system. You need concrete numbers for user behavior and data throughput. In system design, we typically round a day to 100,000 seconds (instead of 86,400) to make the mental math significantly easier.

Let's evaluate a hypothetical feature: A new AI-driven image generation tool within our SaaS platform.

The Assumptions:

  • Daily Active Users (DAU): 50,000 users.
  • Usage Pattern: Each active user generates 4 images per day.
  • Total Daily Requests: 200,000 generation requests per day.
  • Requests Per Second (RPS): 200,000 requests / 100,000 seconds = 2 RPS on average.
  • Note: Peak RPS is usually 2x to 3x the average, so we must architect for a peak of ~6 RPS.
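The assumptions above reduce to a few multiplications. A minimal sketch of that arithmetic follows; the function name `estimate_rps` and the `peak_factor` parameter are illustrative choices rather than part of any library, and the peak factor of 3 is simply the upper end of the 2x to 3x rule of thumb.

```python
# Back-of-the-envelope traffic estimate using the article's assumptions.
SECONDS_PER_DAY_ROUNDED = 100_000  # rounded up from 86,400 for easy mental math


def estimate_rps(dau: int, actions_per_user: int, peak_factor: float = 3.0) -> dict:
    """Return daily requests, average RPS, and peak RPS for a feature."""
    daily_requests = dau * actions_per_user
    average_rps = daily_requests / SECONDS_PER_DAY_ROUNDED
    return {
        "daily_requests": daily_requests,
        "average_rps": average_rps,
        "peak_rps": average_rps * peak_factor,
    }


traffic = estimate_rps(dau=50_000, actions_per_user=4)
print(traffic)
# {'daily_requests': 200000, 'average_rps': 2.0, 'peak_rps': 6.0}
```

Using the exact 86,400 seconds would give roughly 2.3 RPS on average, which is why the rounded figure is acceptable for an order-of-magnitude estimate.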

With these baseline metrics established, we can begin evaluating the actual infrastructure costs across three primary dimensions: Compute, Storage, and Bandwidth.

Phase 3: The Storage and Bandwidth Equation

Storing and moving data are often the silent killers of SaaS profitability. Let's calculate the financial impact of our AI image generator.

1. Storage Costs (The Persistence Layer)

  • Assume each generated image is 5MB in size.
  • Daily Storage Growth: 200,000 images * 5MB = 1,000,000 MB (1 TB per day).
  • Monthly Storage: 30 TB added per month.
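  • If standard AWS S3 pricing is roughly $0.023 per GB per month, 30 TB (30,000 GB) costs approximately $690/month just to store. Because nothing is ever deleted, the bill does not stay flat: each month adds another 30 TB, so month two costs about $1,380, month three about $2,070, and so on.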
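To see how the storage bill grows when nothing is deleted, the sketch below projects the monthly charge over a year. It assumes decimal units (1 GB = 1,000 MB), a flat 30-day month, and the rough $0.023 per GB-month figure from the text; `monthly_storage_bills` is a hypothetical helper name.

```python
S3_PRICE_PER_GB_MONTH = 0.023  # rough S3 Standard price quoted in the text, USD


def monthly_storage_bills(images_per_day, image_mb, months, days_per_month=30):
    """Storage bill for each month, assuming every image is kept forever."""
    gb_added_per_month = images_per_day * image_mb * days_per_month / 1_000
    # In month m, the bucket holds m months' worth of images.
    return [
        round(gb_added_per_month * m * S3_PRICE_PER_GB_MONTH, 2)
        for m in range(1, months + 1)
    ]


bills = monthly_storage_bills(images_per_day=200_000, image_mb=5, months=12)
print(f"month 1:  ${bills[0]:,.0f}")
print(f"month 12: ${bills[-1]:,.0f}")
print(f"year one total: ${sum(bills):,.0f}")
```

Under these assumptions the twelfth monthly bill is twelve times the first, so the first-year total is far larger than 12 x $690.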

The Architectural Pivot: A storage bill that grows by $690 every month forces an architectural decision. We cannot simply dump raw 5MB files into S3 forever. We must introduce an asynchronous background worker (using a tool like sharp) to compress these images to WebP (reducing size to ~500KB, a roughly 90% cut in stored volume) and implement an S3 Lifecycle Rule to move assets older than 30 days to S3 Glacier, which lowers the cost of long-lived assets further.
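The effect of the compression step alone can be checked with the same arithmetic. This sketch compares one month of new raw 5 MB images against ~500 KB WebP versions at the same storage price; it deliberately leaves the Glacier transition out, since that would require an archive price the article does not quote. The helper name is illustrative.

```python
S3_PRICE_PER_GB_MONTH = 0.023  # rough S3 Standard price quoted in the text, USD


def monthly_storage_added_cost(images_per_day, image_mb, days_per_month=30):
    """Cost of storing one month's worth of new images for one month."""
    gb_added = images_per_day * image_mb * days_per_month / 1_000
    return gb_added * S3_PRICE_PER_GB_MONTH


raw_cost = monthly_storage_added_cost(200_000, image_mb=5.0)   # raw output
webp_cost = monthly_storage_added_cost(200_000, image_mb=0.5)  # after WebP
savings = 1 - webp_cost / raw_cost

print(f"raw:  ${raw_cost:,.0f}/month")
print(f"WebP: ${webp_cost:,.0f}/month")
print(f"reduction from compression alone: {savings:.0%}")
```

Because cost is linear in stored bytes, a tenfold size reduction yields a tenfold cost reduction before any lifecycle rule is applied.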

2. Bandwidth Costs (Network Egress)

Cloud providers usually do not charge you to bring data in, but they charge a premium to send data out to the internet.

  • If users download half of the images they generate, that is 100,000 downloads * 5MB = 500 GB of daily egress.
  • Monthly Egress: 15 TB per month.
  • At roughly $0.09 per GB for AWS egress, 15,000 GB costs $1,350/month.
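The egress estimate can be sketched the same way. The `download_ratio` parameter (the share of generated images that users actually download) is the assumption that matters most here, so it is exposed explicitly; the function name is hypothetical.

```python
EGRESS_PRICE_PER_GB = 0.09  # rough AWS internet egress price quoted in the text


def monthly_egress(images_per_day, image_mb, download_ratio, days_per_month=30):
    """Return (monthly egress in GB, monthly egress cost in USD)."""
    daily_gb = images_per_day * download_ratio * image_mb / 1_000
    monthly_gb = daily_gb * days_per_month
    return monthly_gb, monthly_gb * EGRESS_PRICE_PER_GB


egress_gb, egress_cost = monthly_egress(200_000, image_mb=5, download_ratio=0.5)
print(f"{egress_gb:,.0f} GB/month -> ${egress_cost:,.0f}/month")
```

Re-running it with a smaller `image_mb` shows that the compression step from the storage section also shrinks the egress bill proportionally.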

The Architectural Pivot: To mitigate this, we must introduce a Content Delivery Network (CDN) like Cloudflare or AWS CloudFront in front of our storage layer. A CDN serves repeat downloads from its edge cache instead of the origin and, depending on the provider's pricing, can lower the effective per-GB cost of outbound traffic.

Phase 4: Intent Routing and Token Economics

If you are integrating Generative AI into your architecture, computing standard server costs is no longer enough. You must calculate Token Economics. In an AI-driven platform, every single API call directly impacts your profit margins.

Let's assume our feature utilizes a flagship reasoning model for the prompt processing (e.g., GPT-4 class or Claude 3.5 Sonnet) before generating the image.

  • Input Context: We send the user's prompt + system instructions (avg. 1,000 tokens per request).
  • Output Context: The model returns a highly refined prompt (avg. 200 tokens per request).
  • Daily Token Volume: 200,000 requests * 1,200 tokens = 240 Million tokens per day.

If the pricing is $5.00 per 1M input tokens and $15.00 per 1M output tokens, the daily cost scales rapidly:

  • Input Cost: 200M tokens * $5.00 per 1M tokens = $1,000/day
  • Output Cost: 40M tokens * $15.00 per 1M tokens = $600/day
  • Total AI Compute: $1,600 per day (roughly $48,000 over a 30-day month).
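The token math above is easy to get wrong by a factor of a million, so it helps to write it down once. A minimal sketch, using the per-request token counts and per-1M-token prices stated above; `daily_llm_cost` is an illustrative name, not a vendor API.

```python
def daily_llm_cost(requests, in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Return (input cost, output cost, total) in USD per day."""
    input_cost = requests * in_tokens / 1_000_000 * in_price_per_m
    output_cost = requests * out_tokens / 1_000_000 * out_price_per_m
    return input_cost, output_cost, input_cost + output_cost


input_cost, output_cost, total_per_day = daily_llm_cost(
    requests=200_000,
    in_tokens=1_000,
    out_tokens=200,
    in_price_per_m=5.00,
    out_price_per_m=15.00,
)
print(f"input:  ${input_cost:,.0f}/day")
print(f"output: ${output_cost:,.0f}/day")
print(f"total:  ${total_per_day:,.0f}/day, ${total_per_day * 30:,.0f} per 30-day month")
```

Note that output tokens are only one sixth of the volume but more than a third of the bill, because they are priced three times higher.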

The Architectural Pivot: This calculation shows that routing every request to a flagship model is financially ruinous. As an engineer, you must implement an Intent Router: a fast, inexpensive small model (for example, an 8B-parameter model) classifies the user's intent. Only highly complex requests are routed to the expensive model, while standard requests are handled by the cheaper, fine-tuned alternative. Depending on how few requests truly need the flagship model and on the price gap between the two, this single architectural decision can reduce AI API costs by as much as 90%.
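How much an intent router saves depends on two numbers the article does not specify: the small model's price and the share of requests that still go to the flagship model. The sketch below makes both explicit. The `SMALL` prices ($0.10 / $0.30 per 1M tokens), the 5% complex share, and the 5-token classification output are hypothetical placeholders, not quoted vendor rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def request_cost(self, in_tokens: int, out_tokens: int) -> float:
        return (in_tokens * self.input_per_m + out_tokens * self.output_per_m) / 1_000_000


FLAGSHIP = ModelPrice(5.00, 15.00)  # prices used in the article
SMALL = ModelPrice(0.10, 0.30)      # hypothetical small-model prices


def routed_daily_cost(requests, complex_share, in_tokens=1_000, out_tokens=200,
                      classify_out_tokens=5):
    """Daily cost when a small model classifies every request first."""
    # Every request pays for one classification call on the small model.
    classify = requests * SMALL.request_cost(in_tokens, classify_out_tokens)
    complex_requests = requests * complex_share
    simple_requests = requests - complex_requests
    return (
        classify
        + complex_requests * FLAGSHIP.request_cost(in_tokens, out_tokens)
        + simple_requests * SMALL.request_cost(in_tokens, out_tokens)
    )


baseline = 200_000 * FLAGSHIP.request_cost(1_000, 200)
routed = routed_daily_cost(200_000, complex_share=0.05)
savings = 1 - routed / baseline
print(f"baseline: ${baseline:,.2f}/day, routed: ${routed:,.2f}/day, saved: {savings:.0%}")
```

With these placeholder numbers the saving is a little over 90%; raising `complex_share` to 0.10 drops it below 90%, which shows that the routing ratio, not the classifier cost, dominates the result.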


The Engineering Impact: Designing for the Bottom Line

Software architecture is fundamentally an exercise in resource allocation. A system is only beautifully designed if it aligns with the business model it supports.

By executing a back-of-the-envelope cost evaluation, you transition from being a developer who simply writes code to an architect who engineers leverage. You uncover the hidden bottlenecks—like compounding S3 costs or explosive LLM token volumes—before they bankrupt the project.

The goal isn't to calculate costs down to the exact penny. The goal is to determine the order of magnitude. Is this feature going to cost us $10 a month, $1,000 a month, or $100,000 a month? Once you know the math, you can design the resilient, fault-tolerant infrastructure required to handle it efficiently.
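As a closing sketch, the three estimates from this article can be rolled up to answer the order-of-magnitude question directly. The line-item values are the article's own first-month figures before any of the architectural pivots; `order_of_magnitude` is an illustrative helper.

```python
import math


def order_of_magnitude(monthly_usd: float) -> int:
    """Largest power of ten that does not exceed the cost (cost must be > 0)."""
    return 10 ** math.floor(math.log10(monthly_usd))


# First-month estimates from the sections above, in USD per month.
monthly_costs = {
    "storage_month_1": 690,
    "egress": 1_350,
    "llm_tokens": 48_000,
}

total = sum(monthly_costs.values())
dominant = max(monthly_costs, key=monthly_costs.get)

print(f"total: ~${total:,}/month (order of magnitude: ${order_of_magnitude(total):,})")
print(f"dominant line item: {dominant} ({monthly_costs[dominant] / total:.0%} of total)")
```

The roll-up makes the priority obvious: under these assumptions the LLM token bill accounts for almost all of the spend, so the intent router matters far more than the storage or egress optimizations.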


Wrestling with compounding cloud costs or looking to architect highly optimized, cost-efficient SaaS pipelines? Let's Connect! I am Ankit Jaiswal, a Senior Full Stack AI Engineer specializing in the design, deployment, and optimization of highly resilient, cloud-agnostic SaaS platforms and intelligent, event-driven applications.
