Changelog
The full build history of Praxiom AI — every feature, fix, and infrastructure milestone since day one.
Human-in-the-Loop & Deep Integrations
Every high-stakes agent write now goes through a review queue before it lands. GitHub, Linear, and Google Drive gain full read/write agent toolkits — not just OAuth push buttons.
- @gated decorator — any tool that writes to an external system queues a pending action for human approval before executing. Users can approve, edit, or reject each proposed action.
- Batch approval — a mission that proposes 5 Linear issue updates shows one card; approve them all in a single click.
- GitHub Grounding (Pillar A) — 7 read tools: read_repo_file, list_repo_tree, search_repo_code, list_recent_commits, list_open_pull_requests, get_readme, get_package_manifest. The agent can now ground recommendations in actual code.
- Linear Deep Integration — 9 tools including query_linear_issues, get_linear_cycle, update_linear_issue, create_linear_comment, and search. The agent becomes a full Linear workspace partner.
- Google Drive Deep Integration — 8 tools for search, read, write, folder browse, and batch import. Research folders can be imported in one command; documents can be drafted and written back to Drive.
- Drive Import UI — browse Drive folders, search files, and ingest selected items directly into the workspace research library.
- HITL REST API + Executor — 5 endpoints: list, get, approve, reject, and batch-approve. Execution is fire-and-forget with full error recording.
- Real-time HITL inbox — pending action cards stream into the conversation panel; a toast fires when a new action is queued while the user is in a different tab.
- Linear 'Create Issue' button in Recommendations view — one-click issue creation without opening a conversation.
- OAuth callback error handling hardened across GitHub, Linear, and Google Drive — stale state tokens, revoked access, and clock skew now produce clear error messages instead of failing silently.
- Alembic migration chain repaired (068 → 062 ordering conflict).
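A minimal Python sketch of how a @gated-style decorator could queue a write for human review, assuming an in-memory queue. PENDING, PendingAction, approve, and reject are illustrative names, not Praxiom's actual API, and the update_linear_issue body is a stand-in for a real external write.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable, Dict, Optional
from uuid import uuid4

# Hypothetical in-memory review queue; a real system would persist this.
PENDING: Dict[str, "PendingAction"] = {}


@dataclass
class PendingAction:
    tool: Callable[..., Any]
    kwargs: Dict[str, Any]
    status: str = "pending"
    id: str = field(default_factory=lambda: uuid4().hex)


def gated(tool: Callable[..., Any]) -> Callable[..., PendingAction]:
    """Queue the call for review instead of executing it immediately."""
    @wraps(tool)
    def wrapper(**kwargs: Any) -> PendingAction:
        action = PendingAction(tool=tool, kwargs=kwargs)
        PENDING[action.id] = action
        return action
    return wrapper


def approve(action_id: str, edits: Optional[Dict[str, Any]] = None) -> Any:
    """Run a pending action, applying any reviewer edits first."""
    action = PENDING[action_id]
    if action.status != "pending":
        raise ValueError(f"action already {action.status}")
    action.kwargs.update(edits or {})
    action.status = "approved"
    return action.tool(**action.kwargs)


def reject(action_id: str) -> None:
    """Mark a pending action as rejected; the tool never runs."""
    action = PENDING[action_id]
    if action.status != "pending":
        raise ValueError(f"action already {action.status}")
    action.status = "rejected"


@gated
def update_linear_issue(issue_id: str, title: str) -> str:
    # Stand-in for a real external write.
    return f"{issue_id}: {title}"
```

Calling update_linear_issue(issue_id="ENG-1", title="Draft") returns a PendingAction rather than performing the write. The write only happens once approve() is called with that action's id, optionally with edited arguments.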
Validation Kit, Overnight Cycles & Durable Task Queue
Idea-stage founders get a structured validation triad generated in one command. Long-running research cycles get a durable Redis/Celery queue with live SSE progress.
- Validation Kit — generates three linked documents from a single prompt: Assumption Map (what you're betting on), Research Plan (how to test it), Interview Guide (the questions). Workspace context (product name, target users, stage) is injected automatically.
- Overnight Research Cycles — Power plan users can enable nightly autonomous experiments. The agent explores different research angles at 2 AM UTC, scores results, and delivers a morning digest.
- Research Cycle History — browse past overnight runs, drill into individual experiments, see per-cycle quality scores and artifact counts.
- Deep Research Consent Dialog — pre-flight cost estimate shown before any deep research run; user must explicitly approve.
- Redis/Celery Durable Task Queue — long-running background tasks are now backed by Redis + Celery workers. Tasks survive process restarts; retries are automatic on worker failure.
- Real-time Task Progress SSE — subscribe to GET /api/tasks/{task_id}/progress for live phase changes, progress %, and messages backed by Redis pub/sub.
- Prompt Registry with Tier-Based A/B Testing — system prompts are stored in a versioned registry. Power vs. Growth vs. Trial plans receive different prompt variants; experiments are tracked with analytics.
- Eval Framework — L0 (static fixtures), L1 (deterministic modules), and L2 (live pipeline) evaluation tiers. 5 synthetic PM workspaces provide realistic eval coverage.
- Docs site Mintlify-level UI polish — sidebar icons, collapsible sections, code block line numbers + copy, TOC IntersectionObserver, search keyboard shortcuts.
- Trial credits reduced to 15 (3-tranche milestone model) — more predictable trial economics with a clear upgrade signal.
- Trial milestone unlock and upgrade plan path fixed.
- GitHub OAuth: repo search chip view, stale token handling, callback error surfacing.
- Per-tab source and insight count badges in the research view.
- Onboarding labels, status casing, and AlertDialog UX polish.
- PostgreSQL migration fixes: UUID column types and boolean defaults corrected.
- Integration environment variables fully documented for staging.
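For readers consuming the task progress stream, here is a small Python sketch of decoding Server-Sent Events frames into progress payloads. It assumes each event carries a JSON object in its data field; the keys shown in the usage note (phase, progress) are guesses based on the description above, not a documented schema.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment or keep-alive line
        if line == "":
            if data:  # a blank line terminates the current event
                yield json.loads("\n".join(data))
                data = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one optional leading space from the value.
            data.append(value[1:] if value.startswith(" ") else value)
```

Feeding it the lines of a response body from GET /api/tasks/{task_id}/progress would yield dictionaries such as {"phase": "search", "progress": 25}, again assuming that payload shape.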
Mission System, Agent Harness Completion & Workspace Maturity
Complex queries are now decomposed into parallelizable multi-agent missions with a full command center. The Agent Harness completes all 22 quality features including self-healing retry and the workspace maturity milestone system.
- Full-stack Mission System — a complex query triggers a MissionProposal: the planner decomposes it into subtasks, runs them in dependency order (with parallelism where possible), and aggregates artifacts. DB, API, SSE, and state management all shipped.
- CommandCenter UI — mission fleet view shows all running agents, their status, artifact counts, and duration. Individual agent threads can be inspected.
- Agent Harness Tiers 4 & 5 complete — all 22 planned quality features shipped: post-flight contract, Haiku verifier, stream alerts, retry with self-healing, RQS/RecQS/DocQS scoring, analytics dashboard, quality feedback collection, entropy surface, workspace profile-aware plan selection.
- Workspace Maturity Milestone SSE — the first time a workspace reaches a maturity threshold (e.g. first synthesis, first 10 insights), a milestone_unlocked SSE event fires and a toast appears with the milestone name and credits granted.
- Harness Quality Surface — health cards, quality badges (RQS, RecQS, DocQS), and entropy banner visible in the workspace dashboard.
- Agent Self-Improvement Loop (BP-18–22) — the harness logs quality signals to an improvement queue; failed contracts and low verifier scores are analysed to surface prompt improvements.
- Agentic Billing Gates — synthesis swarm, priority execution, overnight cycles, and deep web research are now gated behind plan features and enforced at the agent level.
- PM Agent Eval Harness — L0/L1/L2 pipeline with 4 annotated fixtures for measuring agent quality regression.
- Optimistic mutations and hover prefetch across all views — instant perceived performance on all CRUD operations with rollback on failure.
- Pipeline-centric sidebar redesign — the 4 core pipeline stages (Research → Insights → Recommendations → Documents) are visually connected with a vertical spine; Missions appears as a tier-2 entry point.
- Research Graph: 40× faster force layout with React rendering optimisation and DB query index coverage.
- Settings and Agent UI: 38-fix design system alignment pass.
- Workspace Settings API — typed foundation for workspace-level preferences.
- Harness engineering audit — 19 fixes including billing correction, stream event ordering, and retry backoff miscalculation.
- PLG, referral, and pricing production fixes.
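Running subtasks in dependency order with parallelism where possible can be sketched with Python's standard graphlib module. This is an illustration of the general technique, not the planner's actual code; execution_waves and the subtask names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group subtasks into waves; every subtask in a wave can run in parallel.

    `deps` maps each subtask to the set of subtasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With deps = {"interviews": set(), "competitors": set(), "synthesis": {"interviews", "competitors"}, "prd_draft": {"synthesis"}}, the function returns three waves: the two independent research subtasks together, then synthesis, then prd_draft.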
Cost-Proportional Credits v2, Workspace Invitations & Vision
Credits now map directly to real AI API cost (1 credit = $0.08). Workspaces become collaborative with a full invitation system. The agent gains eyes — Claude Vision lets users drop images and documents into chat.
- Cost-proportional credit engine — credits are calculated from actual Anthropic API spend (input + output tokens at model-specific rates) rather than static per-run weights. 1 credit = $0.08 USD. A ceiling function rounds up to the next 0.5 credit.
- Credit System v2 — trial tranches (signup: 5 credits, first run: 5, engagement: 5), mission-level credit aggregation across subtasks, and a daily soft cap (monthly_quota / members / 30 × 2).
- Credit Depletion Screen — when credits hit zero, users see a summary of what they accomplished: research briefs, recommendations, and documents created. CTAs for upgrade and credit packs.
- Workspace Invitations — invite colleagues by email. Invitees get a branded email, land on a join page, and are added to the workspace with role-based access. Admins can revoke or resend.
- Claude Vision — attach images, screenshots, and documents directly in chat. The agent processes visual content (wireframes, charts, PDFs) using Claude's multimodal capabilities.
- Admin Conversation Inspector — admins can browse any workspace's conversation history, inspect tool calls, and see per-message quality scores.
- Pricing page — 3-plan layout (Pro / Growth / Power) with monthly/yearly toggle, feature comparison table, and credit pack add-ons.
- User-scoped subscription architecture — one Stripe subscription covers all workspaces a user belongs to. Seat limits removed.
- Admin Portal: Conversations + Quality Panels — aggregate quality metrics across all workspaces visible to admins.
- PostHog analytics integration overhauled — funnel events, feature flag exposure, and credit consumption tracked.
- Categorised attach menu in chat — files, images, and Drive imports are organised into a clean picker.
- Social media brand asset pack + SVG-to-PNG exports.
- Billing: seat limits removed, billing period end-date fix.
- Agent timeout and processing race condition fixes.
- Billing and feature gate production fixes.
- 11 stale tests repaired.
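The credit arithmetic in this release can be worked through in a few lines of Python. This sketch assumes the ceiling rounds up to the next half credit and takes the run's dollar cost as an input rather than deriving it from token counts, since per-model rates are not listed here. The function names are illustrative.

```python
from decimal import Decimal, ROUND_CEILING

USD_PER_CREDIT = Decimal("0.08")  # 1 credit = $0.08, per the changelog
STEP = Decimal("0.5")             # assumed billing granularity: half credits


def credits_for_cost(cost_usd: Decimal) -> Decimal:
    """Convert API spend to credits, rounding up to the next 0.5."""
    steps = (cost_usd / USD_PER_CREDIT / STEP).to_integral_value(rounding=ROUND_CEILING)
    return steps * STEP


def daily_soft_cap(monthly_quota: int, members: int) -> Decimal:
    """monthly_quota / members / 30 x 2, as stated in the changelog."""
    return Decimal(monthly_quota) / members / 30 * 2
```

For example, a run costing $0.09 is 1.125 credits before rounding and is charged as 1.5 credits, and a single-member workspace with a 300-credit monthly quota has a daily soft cap of 20 credits. Decimal is used instead of float so that exact multiples of $0.04 do not round up by accident.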
Agent Harness, Power Tier & Praxiom Brand Identity
The Agent Harness wraps every agent run with pre/post-flight quality control. The Power tier launches. Praxiom gets a full visual identity — logo, favicon, and the Atlas agent mark.
- AgentHarness — every agent run is wrapped by: (1) ComplexityClassifier routes to the right workflow, (2) WorkflowContract checks minimum output requirements, (3) Haiku VerifierAgent gives an independent quality score with fresh context, (4) HarnessTelemetryRecord captures all signals for analysis. Zero SSE latency impact.
- Self-healing retry — harness triggers automatic retry on contract failure; exponential backoff with per-workflow max attempts.
- Workspace Entropy Health Scanner — detects stale insights (no citations for 30 days), orphaned recommendations (no linked insight), and conflicting severity signals across sources.
- Domain-aware context compaction — before summarising a long conversation, the compactor extracts artifact IDs, open decisions, and working state so nothing important is lost.
- Power Tier — $149/mo plan with 500 credits, synthesis swarm, autonomous web research, overnight cycles, unlimited API access, 1-year chat history, and full agent harness.
- Praxiom Brand Identity — SVG logo with the 'Orbital Signal' mark, Atlas agent identity (Pleiades constellation mark), favicon, and a full brand export package. All transactional emails updated.
- Agent Harness Sprints 0–8 complete — 22 features shipped: post-flight + self-healing + RQS/RecQS/DocQS scoring + analytics dashboard + feedback collection + entropy surface.
- KV-cache optimised system prompt — token-stable prompt structure for better Anthropic cache hit rates.
- Billing sync: a plan upgrade is now reflected immediately after payment (a timestamp guard had been skipping the post-payment sync every time).
- Transactional email brand corrected from Axiom → Praxiom AI.
- Chat UX audit — 45 micro-fixes shipped across 8 phases.
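The self-healing retry described above (retry on contract failure, exponential backoff, a per-workflow attempt limit) could look roughly like the following Python sketch. ContractFailure, run_with_retry, and the base and factor defaults are assumptions for illustration; the changelog does not state the actual delays.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ContractFailure(Exception):
    """Hypothetical error raised when output misses the workflow contract."""


def backoff_delays(max_attempts: int, base: float = 1.0, factor: float = 2.0) -> List[float]:
    """Delays between attempts: base, base*factor, base*factor^2, ..."""
    return [base * factor**i for i in range(max_attempts - 1)]


def run_with_retry(
    run: Callable[[], T],
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `run` until it succeeds or the attempt budget is spent."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return run()
        except ContractFailure:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(delays[attempt])
    raise ValueError("max_attempts must be at least 1")
```

Passing sleep as a parameter keeps the function testable without real waiting. A per-workflow limit would simply be a different max_attempts value for each workflow type.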
PM IDE, Conversation Intelligence & Trust Indicators
Documents become a structured engineering handoff tool. The agent gains full conversation intelligence — briefing, persona-shifting, working state. Every AI claim gets a trust badge.
- PM IDE — Execution Panel: generate scoped tickets from document blocks. Each ticket gets a title, description, acceptance criteria, effort estimate, and priority. Push to Jira, GitHub, or Linear in bulk.
- Document block actions — expand, challenge, and simulate any block with AI. TipTap ↔ Block bidirectional adapter means edits in the rich editor sync with the structured block model and vice versa.
- Conversation Intelligence (27/27 complete) — session briefing (agent reads prior conversation state on resume), 5-persona auto-shift (PM Agent, Research Analyst, Strategy Advisor, Writer, Explorer), working state tracking, and continuity chains across compaction boundaries.
- Trust Indicators — validation engine runs citation checks, cross-source validation, and severity analysis on every agent response. TrustBadge appears next to each response; VerificationPanel shows full evidence breakdown.
- 6 Agent Infrastructure tracks — credit-weighted metering, Linear/GitHub sync-back (issue status → recommendation status), conversation access tools, RQS quality re-synthesis, and vision input scaffolding.
- Navigation restructure — sidebar reorganised around the product pipeline; chat panel polished with improved message threading and block-level interactions.
- Google Drive Integration — OAuth connect, file search and read, agent tool, and research source ingestion.
- Prometheus /metrics endpoint — operational observability for latency, throughput, error rates, and credit consumption.
- Docs site content expansion — 38 MDX pages covering all API endpoints and feature guides.
- Full developer documentation site (Next.js 14 + MDX) deployed to Vercel with Clerk auth gate.
- Production hardening — 6 waves: P0 IndexError guards on stream/block_agent/execution, P1 Mermaid DOMPurify XSS fix + auth bypass + concurrent Stripe guard, P2 DB index + scoped fallback queries.
- Billing round 4 — 32 critical fixes including feature flag enforcement, quota gates on all agent endpoints, role-gated billing mutations, and race condition resolution.
Skills Platform, SuperMemory & PLG Growth Engine
The agent becomes extensible with installable skills and persistent cross-session memory. The PLG flywheel — waitlist, access codes, referrals, and trial credits — goes live.
- Skills System v1 — 10 official skills in the library. Each skill has trigger keywords, custom instructions, workspace context injection, and output templates. Install from the library or write a custom skill.
- Skills Platform v2 — Haiku-powered intent scoring selects the right skill for each query. Skills see workspace context (product name, stage, recent insights). Quality feedback loop scores skill output.
- SuperMemory — after every conversation turn, the agent writes an episodic memory record (non-blocking). On the next session, the briefing event includes relevant past decisions and context. The agent now remembers across sessions.
- Structured compaction — before compressing a long conversation, the compactor preserves artifact IDs, open decisions, working state, and colleague-recall memory in a structured handoff block.
- PLG System — waitlist with live 'X people ahead' counter, access code redemption, referral credits (both referrer and referee earn), admin controls for code generation and tracking.
- Credits-first billing — Pro (100 credits), Growth (300 credits), Power (500 credits). Credits consumed proportionally to actual AI spend. Compute boost packs (Starter/Plus/Max) top up without changing plans.
- Research Quality Score (RQS) engine — 5-component scoring (coherence, coverage, citation density, cross-source triangulation, actionability). Score pill appears on every synthesis result. Feedback loop re-synthesises low-RQS content.
- Admin Portal Phase 3 — funnel analytics, MRR dashboard, user notes, retry queue, and CSV export.
- Workspace Hub — activity feed, workspace cards grid, lifecycle management (archive, transfer ownership).
- Linear OAuth integration — create Linear issues directly from recommendations.
- Access code delivery tracking — sent_to_email, sent_at, and redeemed_by_email recorded for every code.
- Settings sidebar redesigned — 6 sections (Account, Workspace, Integrations, Notifications, Billing, Advanced) with a collapsible rail.
- Model capability matrix — each model has documented effort floors per workflow type. Context-1M beta unlocked for Sonnet/Opus on synthesis and drafting.
- Agent loop detection — detects infinite tool-call loops and fires a pre-exhaustion warning before credits run out.
- Multi-query context search + thin context mode for lower-latency conversational exchanges.
- Pre-beta security hardening — CSP headers, HSTS, XSS guards, JWT auth hardening.
- PLG hardening — 5 broken onboarding flows repaired (access code redemption, trial provisioning, referral credit grant, plan limit update, waitlist join).
- Billing pre-beta audit — 30+ fixes across 5 rounds.
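One simple way to detect a tool-call loop is to check whether the most recent calls are all identical. The Python sketch below shows that heuristic only; the changelog does not describe the real detection rule, and detect_loop and its threshold are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def detect_loop(calls: List[Tuple[str, Dict]], threshold: int = 3) -> bool:
    """Return True if the last `threshold` tool calls are identical.

    Each call is a (tool_name, arguments) pair. Arguments are serialised
    with sorted keys so that key order does not affect the comparison.
    """
    if len(calls) < threshold:
        return False
    recent = {
        (name, json.dumps(args, sort_keys=True))
        for name, args in calls[-threshold:]
    }
    return len(recent) == 1
```

A harness could run this check after every tool call and emit a warning event as soon as it returns True, before further credits are spent.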
Stripe Billing, Web Search & Agent Observability
Praxiom becomes a paid product. The agent gains internet access via Tavily. Full agent observability — thinking blocks, cost tracking, tool icons — ships.
- Stripe billing — subscription management (Pro/Growth plans), Stripe Checkout, Customer Portal, webhook idempotency, and automatic plan enforcement.
- Compute boost packs — one-time credit top-ups purchasable without changing plans. Boost credits consumed after monthly quota.
- Feature gating + usage enforcement — require_feature() rejects the request with HTTP 403; require_usage() rejects it with HTTP 402. Every agent endpoint checks the plan before executing.
- Web search via Tavily API — the agent can search the internet for real-time information. Search results are cited inline.
- Agent Observability Sprint 1 — expandable thinking blocks (chain-of-thought visible to users), per-run progress bar, credit cost displayed after each run, rich tool call cards with input summary.
- Agent Observability Sprint 2 — tool icons per tool type, collapsible tool sections, contextual status messages during execution.
- Session history — full conversation replay with tool call metadata, cost breakdown, and model used. Sessions are auto-titled, searchable, and renameable inline.
- Context compaction (Module-32 Tier 1) — long conversations are automatically summarised before the context window fills. Compacted history is injected on resume.
- GitHub integration — create GitHub issues directly from recommendations. Returns issue_url, issue_number, and a link_id that tracks the recommendation → issue relationship.
- Multi-file upload in agent chat — attach up to 10 files per message (PDF, CSV, images, audio).
- Claude Code-level markdown — full GFM rendering in the chat panel: tables, code blocks with syntax highlighting, nested lists, task lists.
- Light theme support.
- Railway + Vercel production deployment — app deployed to production infrastructure with GitHub Actions CI/CD.
- API rate limiting — 20 req/min on streaming endpoints with 429 responses.
- SQLite WAL mode to eliminate DB lock errors under concurrent load.
- Rebrand from Axiom → Praxiom AI. PostHog analytics and user feedback button added.
- Session-persist-last — workspace remembers and restores the last active conversation.
- Workspace-aware suggested prompts — prompt suggestions reflect the workspace's research sources and recent activity.
- Recommendation session isolation fix — recommendations from different synthesis runs no longer contaminate each other.
- Usage triple-count bug resolved — some runs were deducting credits 3× due to a race on the credit transaction row.
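The gating and rate-limiting behaviour listed above can be illustrated with a framework-free Python sketch. It assumes that 403 means the plan lacks the feature and 402 means the credit balance is too low. HTTPError, the function signatures, and the sliding-window strategy are all assumptions; only the status codes and the limit of 20 requests per minute come from the changelog.

```python
from collections import deque
from typing import Set


class HTTPError(Exception):
    """Hypothetical stand-in for a web framework's HTTP exception."""

    def __init__(self, status: int, detail: str):
        super().__init__(detail)
        self.status = status


def require_feature(plan_features: Set[str], feature: str) -> None:
    if feature not in plan_features:
        raise HTTPError(403, f"plan does not include {feature}")


def require_usage(credits_left: float, cost: float) -> None:
    if credits_left < cost:
        raise HTTPError(402, "not enough credits")


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window_s`-second window."""

    def __init__(self, limit: int = 20, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits = deque()  # timestamps of accepted requests

    def check(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window_s:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            raise HTTPError(429, "rate limit exceeded")
        self.hits.append(now)
```

In a real service there would be one limiter per user or API key, and check() would receive time.monotonic(). The explicit now argument is used here so the behaviour is easy to verify.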
Intelligence Panel, Research Graph & Document Editor
Every document gets 5 AI tabs for deep analysis. The Research Context Graph visualises relationships between all workspace entities. TipTap powers the document editor.
- Intelligence Panel — 5 AI tabs available on every document: Citations (evidence tracing), Gap Analysis (what's missing), Proofread (grammar + clarity), Summary (executive brief), Verify Claims (fact-checking against sources).
- TipTap rich document editor — full WYSIWYG editing with slash commands, floating toolbar, command palette, and table of contents.
- Document Versioning + Review Workflow — version history, change diffing, and a structured review/approval workflow.
- Research Context Graph — D3-powered knowledge graph visualising relationships between research sources, insights, recommendations, and documents. Cluster-force layout with convex hulls and LOD rendering for large workspaces.
- Chat modes + reasoning strategies — choose between research, analysis, and writing modes. Reasoning depth (fast/thorough/deep) adjusts model effort.
- Global search — search across all workspace entities (sources, insights, recommendations, documents) from a single input.
- Advanced filters — filter recommendations by impact score, effort, status; filter insights by severity, source, date.
- Document export — PDF and DOCX export from any document.
- AI-powered editor collaboration — suggest, expand, and rewrite document sections with inline AI.
- 100MB file upload limit + expanded file types (audio transcripts, CSVs, images, Notion exports).
- Expanded system prompt — agent adopts 5 distinct personas based on task context (PM Agent, Research Analyst, Strategy Advisor, Writer, Explorer).
- Agent activity inline — tool calls appear as expandable cards directly in the conversation thread.
- First-run onboarding — guided wow experience for new workspaces with sample research and a walkthrough.
Full-Stack AI Pipeline & Multi-Agent Architecture
The core research-to-product loop is complete end-to-end. Four specialized MCP servers handle synthesis, recommendations, drafting, and data access. The app goes from prototype to real working product.
- Clerk authentication + multi-workspace — users can create and switch between multiple workspaces, each with isolated data and settings.
- Claude Agent SDK + 4 MCP servers — Synthesis (extract insights from research), Recommendations (prioritise features), Drafting (generate documents), Data Access (read workspace context). All wired to real Claude API.
- Full REST API — all entities (workspaces, research sources, insights, recommendations, documents, conversations) have complete CRUD endpoints.
- SSE streaming pipeline — agent responses stream token by token to the frontend via Server-Sent Events. Thinking blocks, tool calls, and progress events are typed and rendered live.
- CI/CD pipeline — GitHub Actions for test, lint, and deploy. Sentry for error tracking. PostHog for analytics.
- DIMSE (Dynamic Intelligent Model Switching Engine) — selects the optimal Claude model (Haiku/Sonnet/Opus) based on query complexity and workspace context.
- Language + notification settings — response language configurable per workspace; notification preferences per user.
- Keyboard shortcuts dialog — complete hotkey reference accessible from any view.
- Monorepo restructure — backend/ (FastAPI + SQLAlchemy + Alembic) and frontend/ (Vite + React + Tailwind) as distinct packages with shared types.
- Alembic auto-migrations on startup — schema migrations run automatically on deploy; no manual migration step.
- Dev environment hardening — dependency auto-sync, environment variable validation on startup, SQLite WAL mode.
Design System & Visual Identity v1
Praxiom gets its first coherent visual language — typography, color tokens, and component animations that set the foundation for everything built after.
- DM Sans + Space Grotesk typography system — headings in Space Grotesk, body in DM Sans, mono in JetBrains Mono.
- HSL-based color token system — semantic tokens (--primary, --muted-foreground, --card) mapped to a dark-first palette, with light theme support built in from day one.
- ScrollReveal animation system — component-level entrance animations with staggered delays.
- All core views redesigned — Dashboard, Research, Insights, Recommendations, and Documents all get the new design system applied.
- AppSidebar and ChatPanel rebuilt — consistent spacing, visual hierarchy, and interaction patterns.
Day One — Product Scaffolded
Praxiom goes from zero to a working 3-pane PM tool with a streaming AI chat panel in one day.
- 3-pane layout — sidebar navigation, main content area, and floating AI chat panel.
- Dark theme — Space Grotesk headings, Inter body text, dark-first color palette.
- Streaming AI chat panel — mock streaming responses wired to the UI; real Claude integration comes next sprint.
- Navigation scaffolding — routes for Dashboard, Research, Insights, Recommendations, and Documents.
- Vite + React + Tailwind CSS + TypeScript baseline.
- Shadcn/UI component library integrated.
- Project structure established.