Repo Radar: Local AI Inference, Agent Memory, Anti-Slop

Anthropic, OpenAI, and Cursor each shipped an official skill or plugin directory this week, pulling the agent-skills scramble out of scattered GitHub gists and into vendor-curated catalogs. The five repositories below work the layer underneath and the layer on top, from where a model runs to what an agent finally ships.

Honcho

A FastAPI memory service for stateful agents. You store messages and events on per-peer sessions, and it reasons in the background to build queryable representations of users, agents, and groups over time. Unlike chunk-matching vector stores, it extracts conclusions. It runs as a managed API or self-hosted via Docker, with Python and TypeScript SDKs.

⭐ 4,464 · Python · AGPL-3.0 · May 27, 2026

Difficulty 4/5

Best fit: Teams adding cross-session user memory to an LLM product that want reasoning over conversations rather than raw retrieval, starting with the managed API before self-hosting.

Watch out: It is AGPL-3.0, and self-hosting wires in your own Gemini, Anthropic, and OpenAI keys plus a Postgres/pgvector backend, so the reasoning pipeline carries ongoing model costs.

Stop Slop

A skill file that teaches Claude or any LLM to catch and remove AI writing patterns, including throat-clearing openers, banned phrases, em dashes, binary contrasts, passive voice, and metronomic rhythm. It ships SKILL.md plus reference lists and a 1-10 scoring rubric, loading them on demand inside Claude Code, Projects, or an API system prompt.

⭐ 6,684 · Markdown · MIT · Mar 17, 2026

Difficulty 1/5

Best fit: Editorial or content teams running an LLM drafting pipeline that want a shared, version-controlled house rule set for stripping AI tells before human review.

Watch out: It is a prompt-rules artifact with no tests, so its rubric is the model's own self-judgment, not a measured pass against detectors like Pangram or ZeroGPT, and results vary by model.
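Because the rubric is self-scored, a deterministic pre-check can give a drafting pipeline one number that does not depend on the model. The sketch below is not part of Stop Slop. The phrase list and the `find_tells` and `passes` names are made up for illustration, and the skill's own reference lists would take their place.

```python
# Hypothetical rule set for illustration only; Stop Slop ships its own lists.
BANNED_PHRASES = ["in conclusion", "worth noting that", "dive into"]
EM_DASH = "\u2014"


def find_tells(text: str) -> dict[str, int]:
    """Count a few mechanical AI-writing tells in a draft."""
    lowered = text.lower()
    counts = {"em_dash": text.count(EM_DASH)}
    for phrase in BANNED_PHRASES:
        counts[phrase] = lowered.count(phrase)
    return counts


def passes(text: str, max_hits: int = 0) -> bool:
    """Return True when the total number of tells is within the budget."""
    return sum(find_tells(text).values()) <= max_hits
```

A check like this only covers patterns that can be matched literally. Rhythm, binary contrasts, and passive voice still need the model or a human reviewer.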

LiteParse

A Rust document parser from the LlamaIndex team that runs locally with no cloud calls. It extracts spatial text and bounding boxes from PDFs through PDFium, offers selective Tesseract or HTTP OCR, and generates page screenshots. The lit CLI plus Node, Python, and WASM bindings target developers feeding clean text into LLM pipelines.

⭐ 6,682 · Rust · Apache-2.0 · May 28, 2026

Difficulty 2/5

Best fit: A retrieval or document-ingestion team that wants local, dependency-light PDF text extraction with bounding boxes before paying for a cloud parser.

Watch out: Parsing DOCX, XLSX, PPTX, or images relies on external LibreOffice and ImageMagick binaries that are not bundled, and the README steers hard documents toward the paid cloud LlamaParse product.

bumblebee

Perplexity's read-only inventory collector reads on-disk lockfiles, package-manager metadata, editor and browser extension manifests, and MCP host configs on macOS and Linux developer machines. It emits structured NDJSON records and, given an exposure catalog, flags exact matches, answering which laptops show a named compromised package right now. It runs no package managers and reads no source.

⭐ 3,851 · Go · Apache-2.0 · May 28, 2026

Difficulty 2/5

Best fit: A security or incident-response team that needs to sweep developer laptops against a fresh supply-chain advisory, plugging scans into an existing cron, launchd, or MDM runner.

Watch out: It is a v0.1.x release with stated gaps, including Codex and Continue MCP configs that are not parsed yet, so tool-config coverage is incomplete and the output schema may still shift.
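The exact-match step is simple enough to sketch for anyone post-processing the output themselves. The field names (`host`, `ecosystem`, `name`, `version`) and the catalog shape below are assumptions for illustration, not bumblebee's actual schema, which the project warns may still change.

```python
import json
from typing import Iterable, Iterator

# Hypothetical catalog entry: (ecosystem, package name, exact version).
CatalogKey = tuple[str, str, str]


def flag_exposures(
    ndjson_lines: Iterable[str], catalog: set[CatalogKey]
) -> Iterator[dict]:
    """Yield inventory records whose package matches a catalog entry exactly."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        key = (record.get("ecosystem"), record.get("name"), record.get("version"))
        if key in catalog:
            yield record


def exposed_hosts(ndjson_lines: Iterable[str], catalog: set[CatalogKey]) -> list[str]:
    """Return the sorted, de-duplicated hosts that show any catalog match."""
    return sorted({r.get("host", "unknown") for r in flag_exposures(ndjson_lines, catalog)})
```

Exact matching keeps false positives low but misses version ranges, so an advisory that names a range would need to be expanded into explicit versions first.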

DwarfStar

Written by Redis creator Salvatore Sanfilippo, DwarfStar is a self-contained C engine that runs DeepSeek V4 Flash on Apple Silicon Metal or CUDA, with a 512GB path for the larger PRO model. Unlike generic GGUF runners, it bundles model-specific loading, tool calling, an on-disk KV cache, an HTTP server, and a coding agent into one binary.

⭐ 12,413 · C · MIT · May 29, 2026

Difficulty 5/5

Best fit: An applied-ML or local-inference team with a 96GB-or-more Apple Silicon machine or a CUDA box that wants to run DeepSeek V4 Flash end to end with on-disk KV cache and an HTTP API.

Watch out: The author labels it beta-quality code with experimental PRO support and warns that current macOS versions carry a virtual-memory bug that can crash the kernel if you run the CPU code path.

⭐ Repo of the Week

DwarfStar

Honcho, LiteParse, and bumblebee all assume the model itself runs somewhere else, usually behind a vendor API. DwarfStar inverts that assumption. Salvatore Sanfilippo, who created Redis, spent May writing a single-file C engine that runs DeepSeek V4 Flash on a 96GB personal machine, bundling tool calling, an on-disk KV cache, and an HTTP server into the binary. The README credits heavy GPT-5.5 assistance for the code, and the project already carries 122 open issues against 12,400 stars three weeks after its first commit on May 6.

Treat it as a benchmark of your own hardware rather than a production runtime. Clone it onto a high-memory Apple Silicon laptop or a CUDA box, pick the matching make target, pull Sanfilippo's quantized weights, and measure tokens per second and memory headroom on a real prompt against whatever API you pay for today. Success is a clear read on what a frontier-class open model actually costs to run in-house. The CPU path stays off the table until the macOS virtual-memory bug flagged in the README is fixed.
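A minimal harness for that measurement might look like the sketch below. `generate` is a hypothetical callable you write yourself to wrap either the local server or your vendor API and return the number of tokens produced. Nothing here reflects DwarfStar's actual endpoints.

```python
import time
from typing import Callable


def measure_throughput(
    generate: Callable[[str], int],
    prompt: str,
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return tokens per second, aggregated over several runs.

    `generate` must return how many tokens it produced for `prompt`.
    `clock` is injectable so the function can be checked without real timing.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = clock()
        total_tokens += generate(prompt)
        total_seconds += clock() - start
    if total_seconds <= 0:
        raise ValueError("elapsed time was zero; use a longer prompt or more runs")
    return total_tokens / total_seconds
```

Dividing total tokens by total time, rather than averaging per-run rates, keeps one unusually fast or slow run from skewing the result. Run the same prompt through both backends and record peak memory separately, since this only captures speed.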

Frequently Asked Questions

How were these projects selected?

Current GitHub metadata, recent activity, README clarity, practical setup path, and relevance to builders working with AI systems.

Are stars enough?

No. Stars measure attention. Push dates, license, issues, docs, and whether the project solves a specific workflow decide usefulness.

What does the difficulty score mean?

It estimates how hard the project is to test or adapt, not how impressive the underlying engineering is.

Which repo should readers try first?

Stop Slop is the easiest test, since it is a skill file with nothing to build. DwarfStar is the more strategic experiment for teams weighing local inference against vendor APIs.

What should teams check before production use?

License, data retention, credential access, update speed, maintainer responsiveness, and whether the repo has a realistic rollback path.

AI-generated summary, reviewed by an editor.

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
