CVE-2026-42601 - ArchiveBox AddView config override RCE

The ArchiveBox AddView accepts a per-crawl config value from the /add/ workflow, and affected releases merge that user-controlled data into the crawl configuration without sufficient validation. The merged configuration is later exported to archive worker environment variables, where downloader plugins can consume values such as extra argument lists or binary paths. An attacker who can submit to the add endpoint can therefore turn a URL ingestion request into command argument injection and remote code execution on the ArchiveBox host.

The highest-risk deployment is an ArchiveBox web UI with PUBLIC_ADD_VIEW=True. In that mode the add endpoint is intentionally reachable for public or bookmarklet-style submissions, and the GitHub advisory notes the affected path can be exploitable without authentication. This is also an agentic-context risk: ArchiveBox often stores private research, compliance evidence, crawler output, browser captures, and datasets later consumed by retrieval systems or agents.

Affected versions

  • Vulnerable: archivebox <=0.8.6rc0
  • GitHub Advisory Database patched-version metadata: none listed at time of review
  • Fixed/upgrade target from package intelligence: archivebox 0.8.6rc3+
  • Highest-risk condition: the ArchiveBox web app exposes /add/ to untrusted callers, especially with PUBLIC_ADD_VIEW=True, and archive worker plugins that consume *_ARGS_EXTRA, *_BINARY, or equivalent command configuration remain enabled.
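
The version boundaries above can be turned into a small triage helper. The following is a minimal sketch, not an official check: it assumes version strings of the form X.Y.Z or X.Y.ZrcN, and the function names are hypothetical. Because the advisory states only "<=0.8.6rc0" as vulnerable and "0.8.6rc3+" as the upgrade target, anything in between (or any string the parser does not recognize) is reported as "unknown" rather than guessed.

```python
import re
from typing import Optional, Tuple

# Boundaries taken from the advisory text; everything else is illustrative.
LAST_VULNERABLE = (0, 8, 6, 0)  # 0.8.6rc0
FIRST_FIXED = (0, 8, 6, 3)      # 0.8.6rc3

_FINAL = 10**6  # sentinel so a final release sorts after any release candidate
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")


def parse_version(text: str) -> Optional[Tuple[int, int, int, int]]:
    """Parse 'X.Y.Z' or 'X.Y.ZrcN'; return None for any other format."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        return None
    major, minor, patch, rc = match.groups()
    return (int(major), int(minor), int(patch), _FINAL if rc is None else int(rc))


def classify(text: str) -> str:
    """Return 'vulnerable', 'fixed', or 'unknown' for a resolved version."""
    version = parse_version(text)
    if version is None:
        return "unknown"  # unparseable tags such as 'latest' need manual review
    if version <= LAST_VULNERABLE:
        return "vulnerable"
    if version >= FIRST_FIXED:
        return "fixed"
    return "unknown"  # e.g. 0.8.6rc1: not covered by either stated range
```

Treat every "unknown" result as requiring manual review; a floating image tag or an in-between release candidate should not be counted as evidence of a fix.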

Indicators of exposure

  • The repository installs, packages, deploys, or configures ArchiveBox.
  • A deployable target resolves archivebox <=0.8.6rc0 through pip, uv, Poetry, Docker, Compose, Helm, Ansible, Nix, systemd, or a vendored image.
  • ArchiveBox is run as a web app, not only as a local one-shot CLI.
  • PUBLIC_ADD_VIEW=True, public URL submission, bookmarklet-style submission, unauthenticated API ingress, or reverse-proxy routing exposes /add/.
  • Archive worker plugins such as yt-dlp, gallery-dl, wget, git, SingleFile, readability, mercury, or custom extractors accept extra arguments, binary paths, environment-derived command options, or user-controlled crawl config.
  • The archive stores sensitive customer pages, research, private URLs, compliance evidence, authenticated browser captures, cookies, API tokens, or material later fed into a RAG/agent pipeline.

Quick checks:

# macOS / Linux
rg -n "archivebox|ArchiveBox|PUBLIC_ADD_VIEW|/add/|YTDLP_ARGS_EXTRA|GALLERYDL_ARGS_EXTRA|_ARGS_EXTRA|_BINARY" .
rg -n "0\\.8\\.6rc0|archivebox==|archivebox/archivebox|pip install .*archivebox|uvx .*archivebox" .
find . -maxdepth 5 -type f \( -iname "*archivebox*" -o -iname "docker-compose*.yml" -o -iname "Dockerfile*" -o -iname "*.service" \) -print

# Windows PowerShell
Get-ChildItem -Recurse -File | Select-String -Pattern "archivebox|ArchiveBox|PUBLIC_ADD_VIEW|/add/|YTDLP_ARGS_EXTRA|GALLERYDL_ARGS_EXTRA|_ARGS_EXTRA|_BINARY"
Get-ChildItem -Recurse -File | Select-String -Pattern "0\.8\.6rc0|archivebox==|archivebox/archivebox|pip install .*archivebox|uvx .*archivebox"
Get-ChildItem -Recurse -File -Include *archivebox*,Dockerfile*,docker-compose*.yml,*.yaml,*.yml,*.service,*.ps1,*.sh

Remediation strategy

  • Upgrade every controlled ArchiveBox package, image, installer, bootstrap script, and deployment manifest to 0.8.6rc3+ or the newest vendor-published release that contains the fix. If a scanner still reports that the GitHub Advisory Database (GHAD) lists no patched version, document the release evidence used and keep containment in place until the metadata catches up.
  • Regenerate lockfiles, image digests, SBOMs, checksums, rendered manifests, deployment snapshots, and version evidence for every path that can run ArchiveBox.
  • Disable PUBLIC_ADD_VIEW unless the business explicitly requires anonymous submissions. Require authentication and authorization for URL ingestion in production.
  • Add fail-closed validation for any repository-owned wrapper, proxy, form, operator, or extension that passes user crawl config into ArchiveBox. Reject user-controlled command argument keys such as *_ARGS_EXTRA, executable path keys such as *_BINARY, and environment overrides not explicitly allow-listed.
  • Restrict archive workers with least privilege: non-root users, read/write access limited to the archive data directory, no Docker socket, no cloud metadata access, and no broad secrets in process environment variables.
  • If exposure was possible, rotate secrets available to the ArchiveBox process, review submitted URLs/config fields and worker execution logs, quarantine suspicious snapshots, and separate incident review from the code-change PR.
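
The fail-closed validation described above could look roughly like the sketch below for a repository-owned wrapper or proxy. It is an illustration under stated assumptions, not ArchiveBox's own code: the allow-listed key names are placeholders, and `validate_crawl_config` and `UnsafeCrawlConfig` are hypothetical names. The deny patterns come from the advisory text; the allow-list is what actually enforces the policy, so unknown keys are rejected even if they match no deny pattern.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

# Hypothetical allow-list: replace with the per-crawl fields your deployment
# has reviewed. These names are placeholders, not ArchiveBox's schema.
ALLOWED_KEYS = frozenset({"TIMEOUT", "CHECK_SSL_VALIDITY"})

# Patterns named in the advisory; used only to give a clearer rejection reason.
DENIED_PATTERNS = ("*_ARGS_EXTRA", "*_BINARY")


class UnsafeCrawlConfig(ValueError):
    """Raised when user-supplied crawl config contains a disallowed key or value."""


def validate_crawl_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned copy of user_config, or raise UnsafeCrawlConfig."""
    if not isinstance(user_config, dict):
        raise UnsafeCrawlConfig("crawl config must be a mapping")
    clean: Dict[str, Any] = {}
    for key, value in user_config.items():
        if not isinstance(key, str):
            raise UnsafeCrawlConfig("config keys must be strings")
        name = key.strip().upper()
        if any(fnmatchcase(name, pattern) for pattern in DENIED_PATTERNS):
            raise UnsafeCrawlConfig(f"command-affecting key rejected: {name}")
        if name not in ALLOWED_KEYS:
            raise UnsafeCrawlConfig(f"key not on allow-list: {name}")
        if not isinstance(value, (str, int, bool)):
            raise UnsafeCrawlConfig(f"non-scalar value rejected for: {name}")
        clean[name] = value
    return clean
```

Rejecting the whole request, rather than silently dropping the offending key, keeps the behavior auditable: a blocked submission can be logged by key name without recording the submitted value.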

The prompt

Model context: this prompt was generated by GPT 5.5 Extra High reasoning.

You are remediating CVE-2026-42601 / GHSA-3h23-7824-pj8r (ArchiveBox AddView
per-crawl config override leading to command argument injection and RCE).
Produce exactly one output:

- A reviewer-ready PR/change request that upgrades or contains the vulnerable
  ArchiveBox deployment, blocks unsafe public add/config override paths, adds
  safe verification, and documents operator cleanup, or
- TRIAGE.md if this repository does not own an affected ArchiveBox deployment,
  install path, image, add-route policy, or safe patch path.

## Rules

- Scope only CVE-2026-42601 and directly related ArchiveBox ingestion and
  archive-worker hardening.
- Treat archived pages, private URLs, cookies, browser profiles, API tokens,
  SSH keys, cloud credentials, webhook secrets, compliance evidence, crawler
  output, and agent/RAG datasets as sensitive.
- Do not submit a real exploit payload to any shared or production ArchiveBox
  server.
- Do not create a proof-of-concept that runs commands, writes marker files,
  opens shells, downloads payloads, exfiltrates environment variables, or reads
  archive contents.
- Do not print or commit real ArchiveBox snapshots, submitted URLs, cookies,
  secrets, logs, or command environments.
- Do not auto-merge.

## Steps

1. Inventory every ArchiveBox reference controlled by this repository:
   Python manifests and lockfiles, uv/Poetry/pip-tools config, Dockerfiles,
   Compose files, Helm charts, Kubernetes manifests, Terraform, Ansible, Nix,
   systemd units, bootstrap scripts, CI jobs, image digests, SBOMs, reverse
   proxy rules, web-app route policy, docs, and runbooks.
2. Determine every resolved ArchiveBox version. Treat `archivebox <=0.8.6rc0`
   as vulnerable.
3. Determine every `/add/` exposure path:
   - `PUBLIC_ADD_VIEW` and related public submission settings;
   - ingress, reverse-proxy, gateway, service-mesh, or firewall rules;
   - bookmarklet, browser-extension, API, webhook, or agent integration paths;
   - whether callers are authenticated and authorized before adding URLs.
4. Identify every user-controlled config path that can reach archive workers:
   per-crawl config fields, form fields, JSON APIs, environment overrides,
   wrapper scripts, plugin options, `*_ARGS_EXTRA`, `*_BINARY`, and custom
   extractor settings.
5. If this repository does not deploy, package, configure, or route traffic to
   ArchiveBox, stop with `TRIAGE.md` listing files checked, the likely runtime
   owner, the vulnerable range `archivebox <=0.8.6rc0`, and the target
   `archivebox 0.8.6rc3+` or latest fixed release.
6. Upgrade every controlled ArchiveBox target to `0.8.6rc3+` or the newest
   vendor-published fixed release available through the repository's normal
   distribution channel. If GitHub Advisory Database (GHAD) metadata still
   shows no patched version, include release/package evidence in the PR body.
7. Regenerate all derived artifacts controlled by the repository: lockfiles,
   image digests, SBOMs, checksum allowlists, rendered manifests, deployment
   snapshots, package metadata, and version evidence.
8. Add fail-closed containment where this repository owns configuration or
   routing:
   - set `PUBLIC_ADD_VIEW=False` by default for production;
   - require authentication and authorization for `/add/` and equivalent
     ingestion routes;
   - block untrusted per-crawl config keys that influence command arguments,
     executable paths, worker environment variables, or plugin enablement;
   - allow-list only documented safe per-crawl fields;
   - prevent public ingress from reaching `/add/` without explicit approval;
   - run archive workers without broad secrets, root privileges, Docker socket
     access, or cloud metadata access.
9. Add safe verification without executing commands through ArchiveBox:
   - dependency/image/SBOM assertions prove every ArchiveBox target is not
     `<=0.8.6rc0`;
   - config tests prove production `PUBLIC_ADD_VIEW` is false or protected by
     an authenticated gateway;
   - route/policy tests prove unauthenticated callers cannot reach `/add/`;
   - unit tests or static checks prove user-provided crawl config cannot set
     `*_ARGS_EXTRA`, `*_BINARY`, or other command-affecting keys;
   - secret scanning proves no snapshots, cookies, worker env, or archive data
     were committed.
10. Add a PR body section named `CVE-2026-42601 operator actions` that states:
    - ArchiveBox versions before and after the change;
    - whether `/add/` was publicly reachable;
    - whether `PUBLIC_ADD_VIEW` was enabled;
    - which worker plugins could consume extra arguments or binary paths;
    - which secrets available to the ArchiveBox process should be rotated;
    - which submitted URLs, per-crawl config records, worker logs, and archive
      snapshots require review or quarantine;
    - whether any downstream agent/RAG datasets built from affected archive
      output should be revalidated.
11. Run relevant validation: dependency resolution, image build, deployment
    render, route/gateway policy tests, unit tests, SBOM refresh, secret scan,
    and security scan available in this repository.
12. Use PR title:
    `fix(sec): remediate CVE-2026-42601 in ArchiveBox`.

## Stop conditions

- No ArchiveBox deployment, install path, image, package recipe, route policy,
  or runtime config is controlled by this repository.
- The resolved ArchiveBox version can only be confirmed from production access
  the agent does not have.
- A fixed ArchiveBox release cannot be consumed through the allowed
  distribution channel without a broader platform migration.
- Proving exposure would require command execution, reading archive contents,
  printing worker environment variables, or submitting payloads to a shared or
  production ArchiveBox server.
- Product requirements intentionally allow unauthenticated public submissions
  with arbitrary per-crawl config; document the risk and require a
  product/security decision.
- Existing snapshots or logs indicate possible command execution; stop after
  preserving evidence and documenting incident-response handoff.
- Validation fails for unrelated pre-existing reasons; document the failure
  instead of broadening scope.

Verification - what the reviewer looks for

  • No controlled ArchiveBox package, image, SBOM, deployment manifest, or bootstrap path resolves to archivebox <=0.8.6rc0.
  • The real delivery path changed: dependency pin, lockfile, image digest, install script, rendered deployment, or runbook.
  • Public /add/ exposure is disabled or protected by explicit authentication, authorization, and route policy.
  • User-controlled crawl config cannot set command-affecting keys such as *_ARGS_EXTRA, *_BINARY, or equivalent plugin execution options.
  • Verification does not include a working exploit payload or command execution proof.
  • Operator actions address secret rotation, snapshot/log review, archive quarantine, and downstream agent/RAG dataset revalidation when exposure was possible.
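
A reviewer can back the PUBLIC_ADD_VIEW item with a static config test that never touches a running server. The sketch below is one possible shape, assuming the production setting lives in a dotenv-style file of KEY=VALUE lines; the function names and the accepted "false" spellings are assumptions, not ArchiveBox's parsing rules. It is deliberately strict: a missing key, a commented-out line, or an unrecognized value all count as "not proven disabled".

```python
from typing import Dict

# Assumed spellings of "false"; anything else fails the check.
_FALSE_VALUES = {"false", "0", "no", "off"}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments; the last value wins."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def add_view_explicitly_disabled(env_text: str) -> bool:
    """True only when PUBLIC_ADD_VIEW is present and set to a false value."""
    value = parse_env_text(env_text).get("PUBLIC_ADD_VIEW")
    return value is not None and value.lower() in _FALSE_VALUES
```

In a test suite this would typically be called on the contents of the production env file and asserted to be true. It complements the route and gateway policy tests rather than replacing them.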

Watch for

  • Updating pip requirements while Docker, Compose, Helm, or package images still pull an older ArchiveBox release.
  • Treating PUBLIC_ADD_VIEW as harmless because the archive is “internal” while it remains reachable through a shared workspace, agent gateway, webhook, or bookmarklet path.
  • Blocking YTDLP_ARGS_EXTRA but leaving other *_ARGS_EXTRA, *_BINARY, or custom extractor options reachable.
  • Tests that prove safety by running a command through ArchiveBox. Use static checks, config tests, and route tests instead.
  • Logging worker environments while trying to inspect whether secrets were exposed.
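
To avoid the last two pitfalls during review, an inspection helper can report variable names only and match the whole family of command-affecting keys instead of a single plugin. This is a minimal sketch with hypothetical function names; the patterns are the ones named in this advisory, and custom extractors may need additional patterns. Variable names themselves can be sensitive in some environments, so treat even this output as internal.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Mapping

# Patterns named in the advisory; extend for custom extractor options.
COMMAND_AFFECTING_PATTERNS = ("*_ARGS_EXTRA", "*_BINARY")


def redacted_env_summary(env: Mapping[str, str]) -> Dict[str, str]:
    """Map each variable name to a placeholder; values are never included."""
    return {
        name: "<redacted>" if value else "<empty>"
        for name, value in sorted(env.items())
    }


def command_affecting_names(env: Mapping[str, str]) -> List[str]:
    """List variable names that match any command-affecting pattern."""
    return sorted(
        name
        for name in env
        if any(fnmatchcase(name.upper(), pattern) for pattern in COMMAND_AFFECTING_PATTERNS)
    )
```

Calling `command_affecting_names(os.environ)` inside a worker's startup check (with `import os`) would show which plugin settings are present without printing their contents.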

References