
Week 3 Retro: Breaking Bottlenecks and Hardening the Pipeline
A technical retrospective on Week 3 of the build. Covering pipeline optimization, resolving async bottlenecks, and the shift from feature shipping to system stability.
The Messy Middle
They say the start of a project is pure adrenaline and the end is pure relief. But Week 3? Week 3 is usually the trench. It's the point where the initial "new project energy" fades, the coffee doesn't hit as hard, and the reality of your architectural decisions starts to punch back.
For the past two weeks, I've been sprinting. I pushed an MVP, hooked up basic agents, and got the frontend talking to the backend. It worked, technically. But "working" isn't the same as "production-ready."
This week wasn't about shiny new features. It was about Proof. Proving that the system can handle friction, proving that the logic holds up when inputs get weird, and proving to myself that this isn't just a toy project. This week, we fixed the pipeline.
The Build: What Actually Shipped
If Week 1 was Hello World and Week 2 was MVP, Week 3 was Refactor & Hardening. Here is the breakdown of the montage:
1. The Event-Driven Shift
My initial implementation for the AI agent workflow was synchronous. User sends request -> Server waits for OpenAI -> Server waits for database -> User gets response.
The problem: It worked fine for me. It worked terribly for three people at once. Vercel serverless functions were timing out on complex reasoning chains.
The fix: I moved the heavy lifting to a background worker setup using Redis and BullMQ. Now, the API acknowledges the request instantly, spins up a job ID, and the frontend polls (or listens via WebSocket) for updates. It's unsexy work, but it's the difference between a timeout error and a working product.
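For reference, the polling side is just a thin status endpoint sitting in front of the queue. A minimal sketch (the route path, queue name, and Redis config here are placeholders, not the actual code):

```ts
// app/api/jobs/[id]/route.ts -- hypothetical status endpoint the frontend polls.
import { Queue } from 'bullmq';

const jobQueue = new Queue('agent-queue', {
  connection: { host: 'localhost', port: 6379 },
});

export async function GET(_req: Request, { params }: { params: { id: string } }) {
  const job = await jobQueue.getJob(params.id);
  if (!job) {
    return Response.json({ error: 'job not found' }, { status: 404 });
  }

  const state = await job.getState(); // 'waiting' | 'active' | 'completed' | 'failed' | ...
  return Response.json({
    jobId: job.id,
    status: state,
    result: state === 'completed' ? job.returnvalue : null,
  });
}
```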
2. Structured Logging (Observability)
You can't fix what you can't see. I spent Tuesday implementing proper structured logging. I dropped the `console.log('here')` habit and started logging context, trace IDs, and token usage per step. This immediately revealed that 30% of my latency wasn't the LLM; it was inefficient vector database queries.
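The shape of the change, sketched with pino (my choice of logger and the field names are illustrative; `queryVectorStore` and `callLLM` stand in for the real pipeline steps):

```ts
import pino from 'pino';
import { randomUUID } from 'crypto';

// Placeholders for the real retrieval and generation steps.
declare function queryVectorStore(q: string): Promise<{ text: string }[]>;
declare function callLLM(
  q: string,
  chunks: { text: string }[]
): Promise<{ usage: { prompt_tokens: number; completion_tokens: number } }>;

const logger = pino();

export async function runPipeline(input: string) {
  // One child logger per request, so every line carries the same trace ID.
  const log = logger.child({ traceId: randomUUID() });

  const t0 = Date.now();
  const chunks = await queryVectorStore(input);
  log.info({ step: 'retrieval', ms: Date.now() - t0, chunks: chunks.length }, 'vector search done');

  const t1 = Date.now();
  const completion = await callLLM(input, chunks);
  log.info(
    {
      step: 'generation',
      ms: Date.now() - t1,
      promptTokens: completion.usage.prompt_tokens,
      completionTokens: completion.usage.completion_tokens,
    },
    'llm call done'
  );

  return completion;
}
```

Once every step reports its own duration, the slow query stands out in the logs instead of hiding inside one big request timing.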
3. The UI Feedback Loop
Since I moved to async processing, the UI needed to reflect that. I shipped a new "Status Stream" component. Instead of a spinning loader, the user now sees: "Searching knowledge base..." -> "Synthesizing answer..." -> "Formatting output." This is a psychology hack as much as a UI update; users are far more patient when they can see the brain working.
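Client-side, the component boils down to polling the status endpoint and mapping whatever stage the worker reports to user-facing copy. A sketch (the `stage` field is an assumption about what the job publishes; adapt it to your own payload):

```ts
import { useEffect, useState } from 'react';

// User-facing copy for each pipeline stage.
const STAGE_LABELS: Record<string, string> = {
  retrieval: 'Searching knowledge base...',
  synthesis: 'Synthesizing answer...',
  formatting: 'Formatting output.',
};

export function useJobStatus(jobId: string): string {
  const [label, setLabel] = useState('Queued...');

  useEffect(() => {
    const timer = setInterval(async () => {
      const res = await fetch(`/api/jobs/${jobId}`);
      const data = await res.json();
      if (data.status === 'completed' || data.status === 'failed') {
        clearInterval(timer);
      }
      setLabel(STAGE_LABELS[data.stage] ?? 'Working...');
    }, 1000);

    return () => clearInterval(timer);
  }, [jobId]);

  return label;
}
```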
The Bottlenecks: Where the Pipeline Leaked
Reviewing the pipeline revealed two major choke points this week.
Bottleneck 1: Context Window Overflow
I got ambitious with the RAG (Retrieval-Augmented Generation) implementation. I was retrieving too many chunks of data and stuffing them into the context window. Not only did this spike costs, but it also confused the model (GPT-4o), leading to hallucinated dependencies.
The Solution: I implemented a Reranking step. Instead of taking the top 10 matches from the vector store, I take the top 50, pass them through a lightweight Cross-Encoder (Cohere Rerank), and only send the top 5 most relevant ones to the LLM. Higher precision, lower cost.
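In sketch form, assuming the cohere-ai SDK (the model name, vector store client, and chunk shape are placeholders, not my exact setup):

```ts
import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });

// Placeholder for whatever vector store client is in play.
declare const vectorStore: {
  query: (q: string, opts: { topK: number }) => Promise<{ text: string }[]>;
};

export async function retrieveContext(query: string) {
  // Over-fetch from the vector store, then let the cross-encoder pick the winners.
  const candidates = await vectorStore.query(query, { topK: 50 });

  const reranked = await cohere.rerank({
    model: 'rerank-english-v3.0',
    query,
    documents: candidates.map((c) => c.text),
    topN: 5,
  });

  // Keep only the five chunks the reranker scored highest.
  return reranked.results.map((r) => candidates[r.index]);
}
```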
Bottleneck 2: The "Zombie" Agents
I had an issue where if a browser tab was closed mid-generation, the agent on the server would keep running, burning API credits for a ghost user.
The Solution: I implemented `AbortController` signals on the backend and stricter timeouts on the job queues. If the socket disconnects, the job is killed immediately.
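In spirit, the wiring looks like this (a sketch; the socket hookup and queue name are assumptions about my setup rather than exact code):

```ts
import { Queue } from 'bullmq';

const jobQueue = new Queue('agent-queue', {
  connection: { host: 'localhost', port: 6379 },
});

// One AbortController per running job, so in-flight LLM calls can be cancelled.
const controllers = new Map<string, AbortController>();

export function registerJob(jobId: string): AbortSignal {
  const controller = new AbortController();
  controllers.set(jobId, controller);
  return controller.signal; // threaded into the fetch/OpenAI calls inside the agent
}

// Called by the WebSocket layer when the client disappears mid-generation.
export async function onSocketDisconnect(jobId: string) {
  controllers.get(jobId)?.abort(); // stop whatever the agent is doing right now
  controllers.delete(jobId);

  const job = await jobQueue.getJob(jobId);
  if (job && (await job.isWaiting())) {
    await job.remove(); // never started: drop it from the queue entirely
  }
}
```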
Metrics: The Truth Data
As part of the "Proof" pillar, we look at the numbers. Feelings lie; metrics don't.
- Deployments: 12 (Down from 20 last week, but higher quality commits).
- Average Latency (P95): Reduced from 8.5s to 3.2s via the reranking optimization.
- Token Spend: Down 15% despite higher usage (thanks to better context management).
- Critical Bugs Found: 3 (2 resolved, 1 deferred).
Technical Deep Dive: The Async Pattern
For the developers reading this, here is the pseudo-code pattern that saved the week. Moving from a request/response model to a job queue model is the biggest level-up you can make in AI engineering right now.
```ts
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const jobQueue = new Queue('agent-queue', { connection });

// The Old Way (The Week 1 Way): do all the work inside the request.
export async function POST(req: Request) {
  const input = await req.json();
  const result = await heavyAgentLogic(input);
  return Response.json(result); // Timeout risk!
}

// The Week 3 Way (The System Way): enqueue the work and return immediately.
export async function POST(req: Request) {
  const input = await req.json();
  const { id } = await jobQueue.add('agent-task', input);
  return Response.json({ jobId: id, status: 'processing' });
}

// Worker.ts: a BullMQ worker picks the job up off the queue.
const worker = new Worker('agent-queue', async (job) => {
  await heavyAgentLogic(job.data);
  await updateDbStatus(job.id, 'completed');
}, { connection });
```
It adds complexity, sure. You have to manage state. But it decouples your application logic from the HTTP layer. This is how you scale.
Retrospective: What I Learned
1. Systems over Features.
It is tempting to keep adding new buttons or new AI capabilities. But a feature that fails 10% of the time is a bug, not a feature. Week 3 taught me to stop building out and start building deep.
2. The "Works on Local" Trap.
My local environment has zero network latency and infinite patience. Production does not. Testing directly in the staging environment earlier would have saved me about 6 hours of debugging on Wednesday.
Next Week's Focus
Next week, we pivot back to the user. The system is stable (enough). Now I need to refine the output quality. I'll be implementing an evaluation framework (evals) to automatically grade the agent's responses against a golden dataset.
The build continues.