The Six-Layer Stack
Most ecommerce search platforms share a common architecture: a text embedding model (often generic and shared across all customers), session-level behavioral data, and a re-ranker trained on aggregated click data. Marqo operates differently at every layer.

1. Multimodal Domain-Specific Base Models
Marqo’s embedding models are multimodal from the ground up — they encode images and text into the same semantic space, so a query for “floral summer dress” retrieves products based on their visual appearance, not just their text descriptions. This matters because a significant portion of ecommerce discovery is visual: shoppers often recognise what they want before they can describe it precisely. Beyond multimodality, Marqo develops domain-specific models trained on ecommerce data for specific verticals. Marqo Fashion SigLIP, for example, is trained on fashion imagery and product data — it understands style, silhouette, colour, and pattern in ways a general-purpose model cannot. These domain-specific models form the foundation everything else builds on.
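To make the shared-space idea concrete, here is a minimal sketch of why it works. The vectors and product IDs below are invented for illustration; in a real system they would come from a multimodal encoder such as a SigLIP-style model that maps product images and query text into one space.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed vectors (toy 3-d stand-ins for real embeddings).
query_vec = [0.9, 0.1, 0.3]                        # text: "floral summer dress"
product_image_vecs = {
    "floral-dress-041":    [0.88, 0.15, 0.28],     # visually floral
    "plain-black-tee-007": [0.05, 0.90, 0.10],     # visually unrelated
}

# Comparing an image vector against a text vector is only meaningful
# because both modalities are encoded into the same embedding space.
ranked = sorted(product_image_vecs,
                key=lambda pid: cosine(query_vec, product_image_vecs[pid]),
                reverse=True)
print(ranked[0])  # the visually floral product ranks first
```

The point of the sketch is the ranking step: no text description of the dress is consulted, only its image embedding.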
2. Per-Customer Embedding Model Finetuning, Aligned to Your Business Objectives

Generic models represent the average of their training data, not your customers or your business. Marqo finetunes its embedding models on a per-customer basis, using each retailer’s own clickstream and purchase data — and critically, with that retailer’s specific business objectives built into the training objective. This means the model doesn’t just learn what shoppers click — it learns what drives revenue, what products convert at the margins you care about, and what catalogue relationships matter for your specific business. If you optimise for gross margin over raw conversion, the model learns that. If certain product categories are strategic priorities, the model encodes that. The platform manages periodic full retraining automatically. As your catalogue evolves, new products launch, and seasonal patterns shift, the models are retrained on updated data without requiring intervention from your team. No other ecommerce search provider finetunes embedding models per-customer, let alone with customer-specific business objectives built into training. This is the deepest source of differentiation in the stack.
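One common way to fold a business objective into embedding training is to weight each training pair's loss by a business signal. The sketch below shows this with a toy contrastive loss; the weighting scheme and all numbers are illustrative assumptions, not Marqo's actual training objective.

```python
from math import exp, log

def objective_weighted_loss(sim_pos, sim_negs, weight):
    """Toy InfoNCE-style contrastive loss where each (query, purchased
    product) pair is scaled by a business signal such as gross margin.
    `weight` is a hypothetical per-pair business weight."""
    denom = exp(sim_pos) + sum(exp(s) for s in sim_negs)
    nll = -(sim_pos - log(denom))   # standard contrastive term
    return weight * nll             # high-margin pairs pull harder on the model

# Identical similarities, different margin weights: the high-margin sale
# contributes proportionally more to the training signal.
low  = objective_weighted_loss(0.8, [0.2, 0.1], weight=0.5)
high = objective_weighted_loss(0.8, [0.2, 0.1], weight=2.0)
print(high > low)  # True
```

Under a scheme like this, "optimise for gross margin over raw conversion" reduces to choosing how `weight` is computed from each transaction.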
3. Per-Search Clickstream Tracking

Most platforms track clickstream at the session level — they record that a user clicked a product during a session but lose the context of which specific search produced that click. Marqo tracks at the individual search level: every click, add-to-cart, and purchase is attributed to the exact query that generated it. This produces substantially higher-signal training data. The model learns not just “this product converts” but “this product converts for this specific query” — a far richer signal for both embedding finetuning and ranking optimization. This more granular data is also what enables the continuous optimization described below.
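The difference between session-level and per-search attribution is easiest to see in the shape of the data. The record structure and field names below are hypothetical, chosen to illustrate the idea rather than mirror Marqo's internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class SearchEvent:
    """Hypothetical per-search event record: every downstream interaction
    carries the id of the exact search that produced it, so training
    examples are (query, product, outcome) rather than (session, product)."""
    search_id: str
    query: str
    clicks: list = field(default_factory=list)
    purchases: list = field(default_factory=list)

events = {
    "s-001": SearchEvent("s-001", "floral summer dress"),
    "s-002": SearchEvent("s-002", "running shoes"),
}
events["s-001"].clicks.append("floral-dress-041")
events["s-001"].purchases.append("floral-dress-041")

# Session-level tracking only says the user bought floral-dress-041;
# per-search tracking says they bought it *for* "floral summer dress".
training_pairs = [(e.query, p) for e in events.values() for p in e.purchases]
print(training_pairs)  # [('floral summer dress', 'floral-dress-041')]
```

That query-conditioned pair is the "far richer signal" the paragraph describes: it feeds both the finetuning objective and the ranking layer.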
4. LLM Query Understanding

Before a query reaches the embedding or ranking layers, Marqo runs it through an LLM-based query understanding pipeline:

- Intent detection — distinguishing brand queries, category queries, attribute queries, and use-case queries, and routing each appropriately
- Query expansion — enriching the query with semantically related terms to improve recall across the catalogue
- Automated faceting — dynamically determining which facets are relevant for a given query, without requiring merchandisers to configure rules manually
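A pipeline like this typically asks the LLM for structured output and routes on it. The sketch below shows the kind of JSON such a step might emit and how downstream code could consume it; the field names and values are illustrative assumptions, not Marqo's actual schema.

```python
import json

# Hypothetical structured output from an LLM query-understanding step.
llm_response = json.dumps({
    "query": "waterproof hiking boots",
    "intent": "attribute",  # one of: brand | category | attribute | use-case
    "expansions": ["rain-resistant trekking boots", "gore-tex hiking footwear"],
    "facets": {"category": "footwear", "feature": "waterproof"},
})

parsed = json.loads(llm_response)

# Route by detected intent; expansions widen recall across the catalogue,
# and facets are applied automatically with no merchandiser-configured rules.
assert parsed["intent"] in {"brand", "category", "attribute", "use-case"}
search_terms = [parsed["query"], *parsed["expansions"]]
print(len(search_terms))  # 3: the original query plus two expansions
```

Each bullet above maps to one field: `intent` drives routing, `expansions` drives recall, and `facets` drives automated faceting.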
5. Multimodal Embedding-Based Personalization
Marqo’s personalization layer is built on its multimodal embedding models rather than collaborative filtering or behavioural segments. Each user’s interaction history is encoded as a multimodal embedding capturing their preferences across visual and semantic dimensions simultaneously: style, colour, category, price sensitivity, and brand affinity, all in a single representation. This embedding steers search and collection results toward the individual shopper’s preferences while preserving relevance to the query. A search for “trainers” returns different products for a shopper whose history suggests premium minimalist preferences versus one who consistently engages with brightly coloured performance gear — without sacrificing result quality. Because personalization operates in the same multimodal embedding space as the search layer, it integrates into the retrieval process rather than being applied as an afterthought.
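A minimal sketch of embedding-based personalization, assuming the simplest possible scheme: the user vector is the mean of the embeddings of products the shopper engaged with, blended into the query vector. The vectors, blend weight, and catalogue are all invented for illustration.

```python
from math import sqrt

def norm(v):
    m = sqrt(sum(x * x for x in v))
    return [x / m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical user embedding: mean of embeddings of products the shopper
# has engaged with, in the same space the search index uses.
history = [[0.90, 0.05, 0.10],   # premium minimalist trainers
           [0.85, 0.10, 0.05]]
user_vec = norm([sum(col) / len(history) for col in zip(*history)])

query_vec = norm([0.5, 0.5, 0.5])  # "trainers": neutral across styles

# Blend query and preference; alpha keeps query relevance dominant so
# personalization steers results without overriding the query.
alpha = 0.7
blended = norm([alpha * q + (1 - alpha) * u
                for q, u in zip(query_vec, user_vec)])

catalog = {
    "minimal-trainer":  norm([0.9, 0.1, 0.1]),
    "neon-performance": norm([0.1, 0.9, 0.3]),
}
top = max(catalog, key=lambda p: dot(blended, catalog[p]))
print(top)  # the minimalist shopper sees the minimalist trainer first
```

Because the user vector lives in the same space as the index, the blend happens inside retrieval itself, which is the "not an afterthought" point above.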
6. Continuous Conversion-Optimized Re-Ranking

The final layer uses an LLM trained to maximize conversion, revenue, and margin — not semantic similarity. It operates on the candidate set retrieved by the embedding search and re-ranks it based on predicted business outcomes for the specific user, query, and business context. Critically, this layer updates continuously as new interaction data arrives from the Marqo pixel. There is no manual retraining cycle or scheduled batch job for ranking optimization — the system learns in near-real-time from every search and purchase, so rankings improve automatically as your business runs.
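The continuous-update property can be sketched with a deliberately tiny stand-in: an online re-ranker that keeps a running revenue estimate per (query, product) pair and incorporates every new interaction as it arrives. This is a toy model of the behaviour described above, not Marqo's actual ranking system.

```python
from collections import defaultdict

class OnlineReranker:
    """Toy conversion-optimized re-ranker: updates its estimates with each
    observed interaction -- no batch retraining step, mirroring the
    'learns from every search and purchase' behaviour described above."""
    def __init__(self):
        self.revenue = defaultdict(float)  # running mean revenue per pair
        self.count = defaultdict(int)

    def observe(self, query, product, revenue):
        key = (query, product)
        self.count[key] += 1
        # Incremental mean: the estimate shifts the moment an event arrives.
        self.revenue[key] += (revenue - self.revenue[key]) / self.count[key]

    def rerank(self, query, candidates):
        # Candidates come from embedding retrieval; reorder them by
        # predicted business outcome, not semantic similarity.
        return sorted(candidates,
                      key=lambda p: self.revenue[(query, p)], reverse=True)

rr = OnlineReranker()
rr.observe("trainers", "sku-a", 0.0)    # clicked, no purchase
rr.observe("trainers", "sku-b", 120.0)  # purchased, high revenue
print(rr.rerank("trainers", ["sku-a", "sku-b"]))  # sku-b moves to the top
```

A production system would predict outcomes from user, query, and context features rather than a lookup table, but the update-on-every-event loop is the same shape.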
How the Layers Compound

Each layer makes every layer above it more effective:

| Layer | Business Outcome |
|---|---|
| Domain-specific base models | Better baseline relevance for your product category |
| Per-customer finetuning with business objectives | Embeddings aligned to your revenue and margin goals, not a generic average |
| Per-search clickstream | Richer training signal for every subsequent layer |
| LLM query understanding | Better candidate sets, long-tail coverage without manual rules |
| Multimodal personalization | Results pre-filtered for individual shopper preferences |
| Continuous re-ranking | Rankings that improve automatically with every interaction |
| Agentic interfaces | Conversational discovery and multi-turn shopping experiences beyond the search bar |