Week 3 Retro: Breaking Bottlenecks and Hardening the Pipeline
2026-02-21

7 min read · Engineering · Retrospective · AI Engineering · Build in Public · System Architecture · Next.js · Devlog

A technical retrospective on Week 3 of the build. Covering pipeline optimization, resolving async bottlenecks, and the shift from feature shipping to system stability.

The Messy Middle

They say the start of a project is pure adrenaline and the end is pure relief. But Week 3? Week 3 is usually the trench. It’s the point where the initial "new project energy" fades, the coffee doesn't hit as hard, and the reality of your architectural decisions starts to punch back.

For the past two weeks, I’ve been sprinting. I pushed an MVP, hooked up basic agents, and got the frontend talking to the backend. It worked—technically. But "working" isn't the same as "production-ready."

This week wasn't about shiny new features. It was about Proof. Proving that the system can handle friction, proving that the logic holds up when inputs get weird, and proving to myself that this isn't just a toy project. This week, we fixed the pipeline.


The Build: What Actually Shipped

If Week 1 was Hello World and Week 2 was MVP, Week 3 was Refactor & Hardening. Here is the breakdown of the montage:

1. The Event-Driven Shift

My initial implementation for the AI agent workflow was synchronous. User sends request -> Server waits for OpenAI -> Server waits for database -> User gets response.

The problem: It worked fine for me. It worked terribly for three people at once. Vercel serverless functions were timing out on complex reasoning chains.

The fix: I moved the heavy lifting to a background worker setup using Redis and BullMQ. Now the API acknowledges the request instantly, returns a job ID, and the frontend polls (or listens via WebSocket) for updates. It’s unsexy work, but it’s the difference between a timeout error and a working product.
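For reference, the polling side is just a small status endpoint keyed by the job ID. Here is a minimal sketch, assuming the same BullMQ queue and a Next.js route handler; the route path and response fields are illustrative, not the exact production code:

// app/api/jobs/[id]/route.ts (illustrative path): lets the frontend poll job state.
import { Queue } from 'bullmq';

const jobQueue = new Queue('agent-task', { connection: { host: 'localhost', port: 6379 } });

export async function GET(_req: Request, { params }: { params: { id: string } }) {
  const job = await jobQueue.getJob(params.id);
  if (!job) return Response.json({ status: 'not_found' }, { status: 404 });

  return Response.json({
    jobId: job.id,
    status: await job.getState(), // 'waiting' | 'active' | 'completed' | 'failed' | ...
    progress: job.progress,       // whatever the worker last reported via job.updateProgress()
  });
}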

2. Structured Logging (Observability)

You can't fix what you can't see. I spent Tuesday implementing proper structured logging. I stopped relying on `console.log('here')` and started logging context, trace IDs, and token usage per step. This immediately revealed that 30% of my latency wasn't the LLM—it was inefficient vector database queries.
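The post doesn't pin down a specific logger, so take this as a minimal sketch of the shape; pino is my assumption, and `runStep` and its fields are illustrative:

// Minimal structured-logging wrapper per pipeline step (pino is an assumption, not the actual stack).
import pino from 'pino';

const logger = pino();

async function runStep<T extends { tokens?: number }>(
  traceId: string,
  step: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  const out = await fn();
  logger.info(
    { traceId, step, durationMs: Date.now() - start, tokens: out.tokens ?? 0 },
    'pipeline step completed'
  );
  return out;
}

Once every step emits the same shape, finding the slow 30% becomes a query over logs instead of a guess.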

3. The UI Feedback Loop

Since I moved to async processing, the UI needed to reflect that. I shipped a new "Status Stream" component. Instead of a spinning loader, the user now sees: "Searching knowledge base..." -> "Synthesizing answer..." -> "Formatting output." This is a psychology hack as much as a UI update; users are far more patient when they can see the brain working.
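The shipped component isn't shown here, but the idea is easy to sketch: poll the status endpoint and render whatever human-readable step the worker last reported. This is a rough illustration, not the real component, and it assumes the hypothetical `/api/jobs/[id]` route sketched above:

'use client';
// Rough sketch of the "Status Stream" idea (illustrative, not the shipped component).
// Assumes the status endpoint returns { status, progress: { step } }.
import { useEffect, useState } from 'react';

export function StatusStream({ jobId }: { jobId: string }) {
  const [label, setLabel] = useState('Queued...');

  useEffect(() => {
    const timer = setInterval(async () => {
      const res = await fetch(`/api/jobs/${jobId}`);
      const data = await res.json();
      if (data.progress?.step) setLabel(data.progress.step); // e.g. "Synthesizing answer..."
      if (data.status === 'completed' || data.status === 'failed') clearInterval(timer);
    }, 1500);
    return () => clearInterval(timer);
  }, [jobId]);

  return <p>{label}</p>;
}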


The Bottlenecks: Where the Pipeline Leaked

Reviewing the pipeline revealed two major choke points this week.

Bottleneck 1: Context Window Overflow

I got ambitious with the RAG (Retrieval-Augmented Generation) implementation. I was retrieving too many chunks of data and stuffing them into the context window. Not only did this spike costs, but it also confused the model (GPT-4o), leading to hallucinated dependencies.

The Solution: I implemented a Reranking step. Instead of taking the top 10 matches from the vector store, I take the top 50, pass them through a lightweight Cross-Encoder (Cohere Rerank), and only send the top 5 most relevant ones to the LLM. Higher precision, lower cost.
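Here is roughly what that retrieve-wide, send-narrow step looks like. I'm sketching it against the Cohere Node SDK; the model name, topN, and the `vectorStore.search` interface are stand-ins, not my exact setup:

// Retrieve wide from the vector store, rerank, send only the best chunks to the LLM.
// (Sketch: model name, topN, and the vectorStore interface are illustrative.)
import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });

async function retrieveContext(
  query: string,
  vectorStore: { search: (q: string, k: number) => Promise<string[]> }
): Promise<string[]> {
  const candidates = await vectorStore.search(query, 50); // cast a wide net

  const reranked = await cohere.rerank({
    model: 'rerank-english-v3.0',
    query,
    documents: candidates,
    topN: 5, // only the most relevant chunks reach the context window
  });

  return reranked.results.map((r) => candidates[r.index]);
}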

Bottleneck 2: The "Zombie" Agents

I had an issue where if a browser tab was closed mid-generation, the agent on the server would keep running, burning API credits for a ghost user.

The Solution: I implemented `AbortController` signals on the backend and stricter timeouts on the job queues. If the socket disconnects, the job is killed immediately.
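The shape of that fix, sketched with illustrative names; it assumes `heavyAgentLogic` accepts an abort signal, and the real wiring goes through the socket layer and the queue's cancellation hooks:

// Sketch: tie each agent job's lifetime to its client connection (names are illustrative).
const controllers = new Map<string, AbortController>();

export async function runAgentJob(jobId: string, input: unknown) {
  const controller = new AbortController();
  controllers.set(jobId, controller);
  try {
    // The signal has to be threaded through every LLM / fetch call,
    // otherwise abort() only stops the outer function, not the token spend.
    await heavyAgentLogic(input, { signal: controller.signal });
  } finally {
    controllers.delete(jobId);
  }
}

// Called from the socket disconnect handler for this job's client.
export function cancelAgentJob(jobId: string) {
  controllers.get(jobId)?.abort();
}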


Metrics: The Truth Data

As part of the "Proof" pillar, we look at the numbers. Feelings lie; metrics don't.

  • Deployments: 12 (Down from 20 last week, but higher quality commits).
  • Average Latency (P95): Reduced from 8.5s to 3.2s via the reranking optimization.
  • Token Spend: Down 15% despite higher usage (thanks to better context management).
  • Critical Bugs Found: 3 (2 resolved, 1 deferred).

Technical Deep Dive: The Async Pattern

For the developers reading this, here is the simplified pattern that saved the week. Moving from a request/response model to a job queue model is the biggest level-up you can make in AI engineering right now.


// The Old Way (The Week 1 Way): do everything inside the HTTP request.
export async function POST(req: Request) {
  const result = await heavyAgentLogic(await req.json());
  return Response.json(result); // Timeout risk on long reasoning chains!
}

// The Week 3 Way (The System Way): enqueue the work and respond immediately.
import { Queue } from 'bullmq';

const jobQueue = new Queue('agent-task', { connection: { host: 'localhost', port: 6379 } });

export async function POST(req: Request) {
  const job = await jobQueue.add('agent-task', await req.json());
  return Response.json({ jobId: job.id, status: 'processing' });
}

// Worker.ts: a separate process pulls jobs off Redis and does the heavy lifting.
import { Worker } from 'bullmq';

const worker = new Worker('agent-task', async (job) => {
  await heavyAgentLogic(job.data);
  await updateDbStatus(job.id, 'completed');
}, { connection: { host: 'localhost', port: 6379 } });

It adds complexity, sure. You have to manage state. But it decouples your application logic from the HTTP layer. This is how you scale.


Retrospective: What I Learned

1. Systems over Features.
It is tempting to keep adding new buttons or new AI capabilities. But a feature that fails 10% of the time is a bug, not a feature. Week 3 taught me to stop building out and start building deep.

2. The "Works on Local" Trap.
My local environment has zero network latency and infinite patience. Production does not. Testing directly in the staging environment earlier would have saved me about 6 hours of debugging on Wednesday.

Next Week's Focus

Next week, we pivot back to the user. The system is stable (enough). Now I need to refine the output quality. I'll be implementing an evaluation framework (evals) to automatically grade the agent's responses against a golden dataset.
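That harness doesn't exist yet, but the rough shape is simple: a golden dataset of prompts with expected properties, run the agent over each, and score with a rule-based (and eventually LLM-based) grader. Something like this, purely a sketch of the plan with placeholder names:

// Sketch of next week's eval loop (nothing here is built yet; names are placeholders).
type GoldenCase = { prompt: string; mustMention: string[] };

async function runEvals(cases: GoldenCase[], agent: (prompt: string) => Promise<string>) {
  let passed = 0;
  for (const c of cases) {
    const answer = await agent(c.prompt);
    // Cheap rule-based grade first; an LLM grader can layer on top later.
    const ok = c.mustMention.every((term) => answer.toLowerCase().includes(term.toLowerCase()));
    if (ok) passed++;
  }
  console.log(`evals: ${passed}/${cases.length} passed`);
  return passed / cases.length;
}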

The build continues.
