EchoCart: An AI Agent That Actually Buys Things For You
Tell it what you want — by text or voice — and it finds the best deal and buys it for you. Users set a budget, fund a wallet, and EchoCart handles the rest. From search to checkout, an AI shopping agent that doesn't just recommend — it completes purchases.
Overview
Shopping by Conversation
EchoCart is an AI shopping agent that takes a fundamentally different approach to e-commerce: instead of browsing catalogues and comparing tabs, you simply tell the agent what you want to buy. You can type "I need running shoes under £80" or say it out loud using voice input powered by ElevenLabs. The agent then searches multiple retailers, compares prices, identifies the best deal within your constraints, and completes the purchase on your behalf.
The system is built around the principle of autonomous purchasing. Most AI shopping tools stop at search and recommendation — they show you products and expect you to click "buy." EchoCart goes further: it manages a funded wallet, executes the transaction, handles shipping address selection, and confirms the order. The user's involvement is limited to setting their budget and reviewing the agent's choice before it completes the purchase.
Built as a project by Temisan Gerrard, a London-based Solutions Architect, EchoCart demonstrates what's possible when systems architecture meets retail: a system that doesn't just suggest products but acts on the user's behalf end-to-end.
The Challenge
Most AI Shopping Tools Just Search
The current generation of AI shopping assistants share a common limitation: they search, they recommend, but they don't purchase. You ask an AI to find you a laptop under £1,000 and it returns a curated list with comparison tables and affiliate links. You still have to navigate to the retailer's website, create an account, enter payment details, select shipping options, and complete checkout manually. The AI did 20% of the work and left you with the other 80%.
The challenge with EchoCart was building a system that could handle the full purchase lifecycle autonomously. This meant solving four distinct technical problems: natural language understanding that captures shopping intent from both text and voice input; real-time web scraping across multiple retailer websites that don't want to be scraped; intelligent product comparison that accounts for price, shipping, availability, and user preferences; and autonomous payment execution that, once the user approves the purchase, completes the transaction without manual checkout steps while staying within the user's budget.
"The gap between 'AI that finds products' and 'AI that buys products' is the entire checkout flow — payment, shipping, order confirmation. Most AI shopping projects stop at search because the last mile is where it gets technically hard."
Voice input added another layer of complexity. When a user says "find me a blue North Face puffer jacket size medium," the system has to transcribe the speech accurately, parse the intent into structured product criteria (brand: North Face, colour: blue, type: puffer jacket, size: medium), and execute the search across retailers. The margin for error in voice-to-purchase is low — a misunderstood size or colour means buying the wrong product with real money.
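Because a misparsed size or colour costs real money, the structured criteria are worth validating before any search runs. The following is a minimal sketch, not EchoCart's actual code: the ProductCriteria shape and the parseCriteria helper are hypothetical names, and it assumes the LLM has been asked to reply with a JSON object.

```typescript
// Hypothetical shape for the structured criteria the LLM is asked to return.
interface ProductCriteria {
  type: string;
  brand?: string;
  colour?: string;
  size?: string;
  maxPriceGbp?: number;
}

// Validate untrusted model output before it drives a search.
// Returns null instead of guessing when anything is malformed.
function parseCriteria(raw: string): ProductCriteria | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  const type = obj["type"];
  if (typeof type !== "string" || type.trim() === "") return null;
  const criteria: ProductCriteria = { type: type.trim().toLowerCase() };

  // Optional attributes must be strings when present.
  for (const key of ["brand", "colour", "size"] as const) {
    const value = obj[key];
    if (value === undefined) continue;
    if (typeof value !== "string") return null;
    criteria[key] = value.trim();
  }

  const maxPrice = obj["maxPriceGbp"];
  if (maxPrice !== undefined) {
    if (typeof maxPrice !== "number" || !(maxPrice > 0)) return null;
    criteria.maxPriceGbp = maxPrice;
  }
  return criteria;
}
```

A null result is a natural point to ask the user to repeat or confirm the request rather than continue with a guess.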
Cross-platform deployment was also a requirement. The system needed to work on the web and as a native iOS app, requiring a shared codebase that could run in both environments while maintaining access to platform-specific features like biometric authentication and push notifications for order updates.
The Solution
Voice + Scraping + LLM + Payments
The architecture follows the AI Operating Stack, with each layer handling a specific part of the shopping agent's workflow.
Voice Input via ElevenLabs
ElevenLabs handles speech-to-text conversion with high accuracy for product-specific vocabulary. The voice pipeline captures the user's shopping request, converts it to text, and passes it to the LLM for intent parsing. Users speak naturally: "I need a birthday gift for my mum, she likes gardening, budget is £50." The system handles the nuance without requiring structured input.
Web Scraping via Firecrawl
Firecrawl provides real-time product data from retailer websites. Unlike static product APIs, Firecrawl renders JavaScript-heavy e-commerce pages and extracts structured product data including price, availability, shipping costs, and product specifications. The scraping pipeline runs in parallel across multiple retailers to build a real-time comparison matrix.
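The parallel fan-out can be sketched as below. This is an illustration under assumptions: the Offer shape, the Scraper signature, and scrapeAll are hypothetical names, and the scraper is passed in as a function so that Firecrawl's real client API is not misrepresented here.

```typescript
interface Offer {
  retailer: string;
  title: string;
  pricePence: number;
  shippingPence: number;
  inStock: boolean;
}

// Hypothetical scraper signature; in EchoCart this role is played by Firecrawl.
type Scraper = (retailer: string, query: string) => Promise<Offer[]>;

async function scrapeAll(
  retailers: string[],
  query: string,
  scrape: Scraper,
): Promise<{ offers: Offer[]; failed: string[] }> {
  // Start every scrape at once. Each one catches its own error so that a
  // single blocked or slow retailer cannot sink the whole comparison.
  const results = await Promise.all(
    retailers.map(async (retailer) => {
      try {
        return { retailer, offers: await scrape(retailer, query) };
      } catch {
        return { retailer, offers: null };
      }
    }),
  );

  const offers: Offer[] = [];
  const failed: string[] = [];
  for (const result of results) {
    if (result.offers === null) failed.push(result.retailer);
    else offers.push(...result.offers);
  }
  return { offers, failed };
}
```

Returning the failed retailers alongside the offers lets the caller tell the user which stores were not included in the comparison.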
Qwen LLM for Decision-Making
Qwen, accessed via OpenRouter, serves as the reasoning engine. It parses user intent from text or voice, generates search queries for Firecrawl, evaluates and ranks product results based on the user's stated preferences and budget, and formats the final recommendation with a clear rationale. The LLM is constrained to only recommend products within the user's budget.
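One way to back the budget constraint with code rather than prompt wording alone is a deterministic filter over the candidates. This is a sketch under assumptions, not the project's implementation: Candidate, totalCostPence, and rankWithinBudget are hypothetical names, amounts are held in integer pence to avoid floating-point drift, and the tie-break on delivery time is an illustrative choice.

```typescript
interface Candidate {
  retailer: string;
  pricePence: number;
  shippingPence: number;
  deliveryDays: number;
}

// The total the user actually pays counts against the budget, not the sticker price.
const totalCostPence = (c: Candidate): number => c.pricePence + c.shippingPence;

// Drop anything over budget, then order by total cost, breaking ties on delivery time.
function rankWithinBudget(candidates: Candidate[], budgetPence: number): Candidate[] {
  return candidates
    .filter((c) => totalCostPence(c) <= budgetPence)
    .sort((a, b) => totalCostPence(a) - totalCostPence(b) || a.deliveryDays - b.deliveryDays);
}
```

With a check like this in place, an over-budget suggestion from the model is discarded before the user ever sees it.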
Base USDC + Stripe Issuing
Payment execution uses a dual-rail approach. Base USDC handles crypto-native payments for users who fund their wallet with stablecoins. Stripe Issuing generates virtual cards for purchases at retailers that don't accept crypto directly. The system automatically selects the appropriate payment method based on the retailer's accepted payment types.
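The rail selection can be sketched as a small routing function. The type names and selectRail are hypothetical, and preferring USDC when a retailer accepts both methods is an assumption made for illustration; the source only states that the choice depends on what the retailer accepts.

```typescript
type PaymentMethod = "usdc-base" | "card";
type PaymentRail = "base-usdc" | "stripe-issuing-virtual-card";

// Pick a rail from the payment methods a retailer's checkout accepts.
function selectRail(accepted: readonly PaymentMethod[]): PaymentRail | null {
  // Assumption: prefer the crypto rail when both are available.
  if (accepted.some((m) => m === "usdc-base")) return "base-usdc";
  if (accepted.some((m) => m === "card")) return "stripe-issuing-virtual-card";
  // No supported rail: the caller should skip this retailer rather than attempt checkout.
  return null;
}
```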
The purchase flow operates in three phases. In the discovery phase, the user states what they want via text or voice. The LLM parses the request into structured product criteria. In the comparison phase, Firecrawl scrapes multiple retailers in parallel, and the LLM ranks results based on price, shipping cost, delivery time, and user preferences. In the execution phase, the user reviews the top recommendation and approves the purchase. The system then completes the transaction using the appropriate payment rail and confirms the order.
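The three phases map naturally onto a small state machine in which payment is reachable only through an explicit approval event. The sketch below uses hypothetical phase and event names and is not taken from the EchoCart codebase.

```typescript
type Phase = "discovery" | "comparison" | "execution" | "completed" | "cancelled";
type FlowEvent = "criteria_parsed" | "offers_ranked" | "user_approved" | "user_declined";

// Allowed transitions. "completed" can only be reached from "execution"
// via "user_approved", so no code path pays without the user's consent.
const transitions: Record<Phase, Partial<Record<FlowEvent, Phase>>> = {
  discovery: { criteria_parsed: "comparison" },
  comparison: { offers_ranked: "execution" },
  execution: { user_approved: "completed", user_declined: "cancelled" },
  completed: {},
  cancelled: {},
};

function advance(phase: Phase, event: FlowEvent): Phase {
  const next = transitions[phase][event];
  if (next === undefined) {
    throw new Error(`Event "${event}" is not valid in phase "${phase}"`);
  }
  return next;
}
```

Rejecting out-of-order events turns a bug such as an approval arriving before ranking has finished into a loud failure instead of a silent purchase.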
WebAuthn provides passwordless authentication, allowing users to approve purchases with biometric verification (fingerprint or face recognition) on both web and iOS. This replaces the traditional login-and-password flow with a single tap, which is critical for a voice-driven shopping experience where the user expects speed over security theatre.
Tech Stack
What It's Built With
Next.js 16
Next.js 16 provides the React framework, with server components, route handlers, and server actions keeping voice processing, scraping orchestration, and payment execution on the server side. The App Router structure keeps the shopping flow organised as a series of well-defined server actions.
ElevenLabs
Speech-to-text and text-to-speech for the voice shopping experience. ElevenLabs handles the full voice pipeline: capturing natural speech, transcribing it with product-aware accuracy, and reading back purchase confirmations to the user in a natural voice.
Firecrawl
Real-time web scraping that renders JavaScript-heavy e-commerce sites and extracts structured product data. Firecrawl copes with many of the anti-bot measures that retailers deploy, which keeps data extraction across multiple stores workable even though individual scrapes can still fail.
Base USDC + Stripe Issuing
Dual payment rails for maximum retailer coverage. Base USDC for crypto-accepting merchants. Stripe Issuing generates disposable virtual cards for traditional retailers. The system selects the appropriate rail automatically based on the checkout requirements.
WebAuthn
Passwordless biometric authentication for purchase approval. Users verify their identity and approve transactions with a fingerprint or face scan — no passwords to remember, no 2FA codes to type. Critical for a voice-first experience where the interaction should be hands-free until the final approval.
Qwen via OpenRouter
Qwen serves as the core reasoning engine for intent parsing, product comparison, and purchase recommendation. Accessed through OpenRouter for model routing and fallback. The LLM is constrained with structured output schemas so that every recommendation includes a price, retailer, and delivery estimate, and stays within the user's budget.
Capacitor for iOS
The web application is wrapped with Capacitor for native iOS deployment. This provides access to platform-specific features like push notifications for order updates, native biometric authentication integration, and the iOS share sheet. The shared codebase means features built for the web automatically ship to iOS without a separate React Native or Swift codebase.
Decisions
Key Decisions
Dual payment rails over crypto-only
Building a shopping agent that only pays with crypto limits it to a tiny fraction of online retailers. By adding Stripe Issuing as a second payment rail, EchoCart can purchase from virtually any online store. The virtual card is generated per-transaction with the exact purchase amount, limiting exposure if a retailer's payment system is compromised. This decision tripled the useful scope of the agent overnight.
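Capping a virtual card at the exact purchase amount might look like the following. This sketch only builds the request parameters: the field names are shaped after Stripe Issuing's card-creation parameters as an assumption to verify against the current Stripe documentation, the "all_time" interval is one way to express a single-purchase cap, and toPence, buildVirtualCardParams, and the cardholder ID are hypothetical.

```typescript
// Convert a decimal pound amount to integer pence, the minor unit card APIs expect.
function toPence(amountGbp: number): number {
  return Math.round(amountGbp * 100);
}

// Build parameters for a virtual card whose lifetime spend is capped
// at the purchase amount. No network call is made here.
function buildVirtualCardParams(cardholderId: string, amountGbp: number) {
  if (!(amountGbp > 0)) throw new Error("Amount must be positive");
  return {
    cardholder: cardholderId,
    currency: "gbp",
    type: "virtual" as const,
    spending_controls: {
      // A lifetime limit equal to the order total means the card
      // cannot be reused for anything beyond this purchase.
      spending_limits: [{ amount: toPence(amountGbp), interval: "all_time" as const }],
    },
  };
}
```

Rounding to pence before building the limit avoids floating-point artefacts in the decimal amount leaking into the cap.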
Human approval before purchase over full autonomy
The agent finds and compares products autonomously, but the final purchase requires explicit user approval via biometric verification. This wasn't just a safety decision — it's a trust decision. Users won't fund a wallet and give an AI spending authority unless they retain the final say. The approval step is minimal (a single biometric tap) but gives users the confidence to let the agent handle everything else.
Firecrawl over product APIs
Product APIs (Amazon, eBay, etc.) are rate-limited, require developer accounts, and often return stale pricing. Firecrawl provides real-time data from the actual product pages customers see — including current sale prices, stock levels, and shipping estimates. The trade-off is higher latency and more brittle scraping, but the data quality is significantly better for price comparison use cases.
Capacitor over native iOS development
A native Swift app would offer better performance and deeper iOS integration, but at the cost of maintaining two separate codebases. Capacitor wraps the existing Next.js application for iOS, providing 90% of the native experience with 10% of the maintenance burden. For an MVP, this trade-off is clear: ship one product everywhere rather than two products for two platforms.
Results
What Was Shipped
EchoCart is deployed at echocart.xyz and available as an iOS app via Capacitor. The MVP demonstrates the complete purchase lifecycle: voice or text input, multi-retailer product search, intelligent comparison and ranking, user approval via biometric authentication, and autonomous payment execution through the appropriate payment rail.
The project's 30 commits reflect focused development on the core shopping pipeline rather than broad feature expansion. Each commit added a specific capability (voice input processing, Firecrawl integration, product comparison logic, payment execution), building the system incrementally from a simple text search to a full voice-to-purchase pipeline. The system demonstrates that AI shopping agents can move beyond recommendation into actual transaction execution, which is the fundamental capability gap in current AI e-commerce tools.
Lessons
Lessons Learned
Web scraping is fragile by nature. E-commerce sites change their HTML structure frequently, deploy new anti-bot measures, and A/B test different page layouts. Firecrawl handles most of this, but no scraping solution is 100% reliable. Building fallback paths — if the primary retailer's page can't be scraped, try the next one — is essential for a system that users depend on for real purchases.
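The fallback path described here can be sketched as a loop that tries retailers in priority order and moves on when one fails. The helper name firstAvailable and its signature are hypothetical; the attempt function stands in for whatever scrape call the system makes.

```typescript
// Try each retailer in order and return the first successful result.
async function firstAvailable<T>(
  retailers: string[],
  attempt: (retailer: string) => Promise<T>,
): Promise<{ retailer: string; value: T } | null> {
  for (const retailer of retailers) {
    try {
      return { retailer, value: await attempt(retailer) };
    } catch {
      // Scrape failed (layout change, bot block, timeout): try the next retailer.
    }
  }
  // Every retailer failed; the caller decides how to tell the user.
  return null;
}
```

Returning null rather than throwing keeps the "nothing could be scraped" case an ordinary outcome that the agent can explain to the user.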
Voice input requires loose matching. Users don't describe products the way retailers categorise them. Someone asking for "a warm jacket" might mean a puffer, a parka, or a fleece. The LLM needs to expand the user's intent into multiple search queries and let the product results speak for themselves. Exact keyword matching on voice input fails constantly.
Payment trust is the hardest problem. The technical challenge of executing a payment is straightforward. The trust challenge of convincing a user to fund a wallet and let an AI spend from it is the real obstacle. Biometric approval helps, but the broader lesson is that autonomous financial agents need to earn trust through transparent behaviour — showing the user exactly what the agent found, why it chose this product, and what it's about to do before it does it.
Dual payment rails are worth the complexity. Supporting both Base USDC and Stripe Issuing virtual cards means EchoCart can purchase from virtually any online retailer, not just the small subset that accepts cryptocurrency. The implementation complexity is manageable, since it comes down to a routing decision at checkout time, and the coverage benefit is enormous.
Building an AI agent that takes real actions?
EchoCart proves AI agents can move beyond recommendations into autonomous transactions. If you're building an AI agent that interacts with the real world — buying, selling, scheduling — let's talk about your architecture.
Available for Q2 2026 consulting engagements.