Enterprise-focused autonomous agent with strong (no longer #1) benchmark standing and a broad governance/interface surface. Differentiator: LLM-agnostic design (GPT-5.x, Claude, Gemini, o-series), agent scaffolding that beats OpenAI's own Codex agent by 2.2pts on the same model on Terminal-Bench 2.0, Custom Droids for specialized subagents, and 40+ MCP integrations.
Benchmark note (May 2026): Droid + GPT-5.3-Codex scores 77.3% on Terminal-Bench 2.0 but has been overtaken — Codex CLI + GPT-5.5 (~82%), Claude Mythos (~82%), and others now lead. The #1 claim from late-2025 (63.1%) is retired.
Agent Readiness framework provides systematic codebase assessment across 8 technical pillars and 5 maturity levels, with automated remediation to improve agent effectiveness.
Enterprises that want async agent capability with a strong compliance posture should evaluate Factory. Teams that need multi-model flexibility (switching between GPT-5, Claude, and Gemini mid-task) and interface-agnostic operation will find it a compelling option. Pricing is transparent, and the Free tier with BYOK (bring your own key) makes evaluation accessible. BAA availability would enable healthcare-sector adoption, though HIPAA/BAA status was not confirmed this cycle.
Adoption & Proof Points
- Customers: Nvidia, Morgan Stanley, Adobe, EY, Palo Alto Networks, Adyen, MongoDB, Bayer, Zapier; hundreds of thousands of developers use Droids daily.
- Benchmarks:
  - Terminal-Bench 2.0: 77.3% (Droid + GPT-5.3-Codex), ~#6 as of May 2026; leadership lost to Codex CLI + GPT-5.5 (~82%) and Claude Mythos (~82%)
  - Beats OpenAI's own Codex agent by 2.2 pts on the same model (scaffolding advantage)
- Growth: revenue doubling MoM for 6 months pre-Series C; $129M revenue cited for 2025
- Funding: $150M Series C at a $1.5B valuation (Apr 2026, Khosla-led), 5x the $300M valuation of the $50M Series B (Sep 2025). Board: Keith Rabois (Khosla).
- Pricing: Free (BYOK), Pro ($20/mo), Max ($200/mo), Ultra ($2k/mo), Enterprise (contact sales); overage ~$2.70/1M standard tokens, cached 90% cheaper.
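The overage rates in the pricing bullet can be turned into a rough budget check. The sketch below is illustrative only: it assumes "cached 90% cheaper" means cached tokens are billed at 10% of the approximate $2.70 per 1M standard-token rate, and the function name and example token counts are hypothetical, not taken from Factory's billing documentation.

```python
# Rates taken from the pricing bullet above; both are approximate.
STANDARD_RATE_PER_MILLION = 2.70  # USD per 1M standard tokens (approximate)
CACHED_DISCOUNT = 0.90            # assumption: cached tokens cost 90% less


def estimate_overage_usd(standard_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate overage cost in USD for tokens used beyond the plan allowance."""
    cached_rate = STANDARD_RATE_PER_MILLION * (1 - CACHED_DISCOUNT)
    cost = (standard_tokens / 1_000_000) * STANDARD_RATE_PER_MILLION
    cost += (cached_tokens / 1_000_000) * cached_rate
    return round(cost, 2)


if __name__ == "__main__":
    # Hypothetical long-running task: 40M uncached tokens plus 200M cached tokens.
    print(estimate_overage_usd(40_000_000, 200_000_000))
```

Under these assumptions the hypothetical task comes to about $162 of overage, which shows how a long-running session can exceed the monthly price of the Pro tier.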
Recommended Use Cases
- Enterprises wanting multi-model flexibility (GPT-5, Claude, Gemini, o3 in one subscription)
- Teams with compliance requirements (ISO 42001, SOC 2 Type I, HIPAA with BAA; see Risks & Limitations for confirmation status)
- Organizations needing interface-agnostic agents (CLI/IDE/Slack/Browser)
- Large-scale migration, refactoring, and CI/CD automation
- Teams wanting to capture tribal knowledge as Custom Droids
- Organizations seeking to improve codebase agent-readiness systematically
- Healthcare organizations requiring HIPAA-compliant AI development tools
Risks & Limitations
- Benchmark leadership lost: Terminal-Bench 2.0 #1 ceded — Droid+GPT-5.3-Codex (77.3%) now ~#6 vs ~82% leaders (Codex CLI+GPT-5.5, Claude Mythos)
- SOC 2 Type I only: Type II (which attests operational effectiveness over time) is unconfirmed at the primary source
- No FedRAMP; HIPAA/BAA unconfirmed this cycle
- Token-cost unpredictability: the token-billing model can produce surprise charges on large-context, long-running, or multi-model tasks (reported by 5+ independent sources)
- Reliability/support complaints: documented stuck-session and slow-support reports (e.g., 2-month resolution)
- Code quality depends on repo discipline: best results require a strong CI and review culture
- Always-on/background agents on roadmap, not shipped
- Hands-on testing needed: assessment remains documentation/review-based
- Customer-reported metrics (31x faster, 96.1% shorter migrations) need independent verification
Capabilities & Integration
Agentic depth: Droid + GPT-5.3-Codex scores 77.3% on Terminal-Bench 2.0 (May 2026, ~#6). Leadership has passed to Codex CLI + GPT-5.5 (~82%) and Claude Mythos (~82%), though Factory still beats OpenAI's own Codex agent by 2.2 pts on the same model. Missions orchestrates long-running, up to multi-day work (median ~2 hr, 14% longer than 24 hr, longest 16 days) through an orchestrator/workers/validators structure with 10+ Droids in parallel. Custom Droids let teams create specialized subagents with custom prompts, tool access, and model selection. Headless mode (droid exec) supports CI/CD, migrations, and batch scripts. Factory claims 31x faster feature delivery and 96.1% shorter migrations (vendor-reported, unverified).
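The orchestrator/workers/validators structure can be pictured with a generic fan-out-and-validate loop. This is a minimal sketch of the general pattern, not Factory's implementation: the function name, the retry policy, and the use of a thread pool are all assumptions made for illustration, and the worker and validator are plain callables standing in for agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_mission(
    subtasks: List[str],
    worker: Callable[[str], str],
    validator: Callable[[str, str], bool],
    max_parallel: int = 10,
    max_attempts: int = 2,
) -> Dict[str, object]:
    """Fan subtasks out to parallel workers, validate outputs, retry rejects.

    Hypothetical orchestrator: the retry policy and return shape are
    illustrative assumptions, not a documented Factory interface.
    """
    accepted: Dict[str, str] = {}
    pending = list(subtasks)
    for _ in range(max_attempts):
        if not pending:
            break
        # Workers run in parallel, capped at max_parallel at a time.
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            outputs = list(pool.map(worker, pending))
        rejected = []
        for task, output in zip(pending, outputs):
            # Each output must pass validation before it is accepted.
            if validator(task, output):
                accepted[task] = output
            else:
                rejected.append(task)
        pending = rejected
    return {"accepted": accepted, "rejected": pending}
```

The validator acts as a gate between worker output and the final result, so a failing subtask is retried or reported rather than silently merged.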
Agent Readiness (Jan 2026): Framework for measuring and improving codebase readiness for autonomous development. It covers 8 technical pillars (Style/Validation, Build System, Testing, Documentation, Dev Environment, Debugging/Observability, Security, Task Discovery) and 5 maturity levels with gated progression (80% threshold). Available via CLI (/readiness-report), Web Dashboard, and API. Automated remediation fixes failing criteria. Applied to open-source repos: CockroachDB (L4, 74%), FastAPI (L3, 53%), Express (L2, 28%). Factory reports that evaluation variance fell from 7% to 0.6% through its grounding methodology.
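Gated progression with an 80% threshold can be sketched as follows. The exact gating rule is not spelled out here, so the code assumes that a repository reaches level N only if at least 80% of the criteria pass at every level from 1 through N. The function name, input shape, and example pass rates are hypothetical.

```python
from typing import Mapping

GATE_THRESHOLD = 0.80  # the 80% threshold from the framework description
MAX_LEVEL = 5          # the framework defines 5 maturity levels


def maturity_level(pass_rates: Mapping[int, float]) -> int:
    """Return the highest level reached under an assumed gating rule.

    pass_rates maps a level number (1..5) to the fraction of that level's
    criteria that pass. Assumption: level N counts only if every level from
    1 through N meets GATE_THRESHOLD. Returns 0 if level 1 is not met.
    """
    level = 0
    for candidate in range(1, MAX_LEVEL + 1):
        if pass_rates.get(candidate, 0.0) < GATE_THRESHOLD:
            break  # a failed gate blocks all higher levels
        level = candidate
    return level


if __name__ == "__main__":
    # Hypothetical repo: strong at levels 1, 2, and 4, weak at level 3.
    print(maturity_level({1: 1.0, 2: 0.85, 3: 0.60, 4: 0.90}))
```

In this example the repository stops at level 2: the 90% pass rate at level 4 does not count because the level 3 gate is not met, which is the point of gating.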
Context handling: Org-wide codebase "mental model" with real-time indexing. 40+ MCP integrations with OAuth registry for one-click setup. Session persistence across interfaces. Native integrations: GitHub/GitLab, Jira, Slack, PagerDuty, Datadog, Sentry, Google Drive.
Integration surface: CLI, VS Code, JetBrains, Vim, Slack, Linear, Browser. Multi-interface continuity: the same context follows the user across terminal, IDE, browser, and productivity tools.
Extensibility: LLM-agnostic (GPT-5, Claude Sonnet 4, OpenAI o3, Gemini 2.5 Pro, Claude Opus 4.1, GLM-4.6). Custom triggers and scripts via headless mode. Custom Droids as version-controlled team knowledge.