Description
One-V LLM Serve makes every public page on your WordPress site available as clean Markdown at the same URL with a .md extension — zero configuration required.
https://example.com/about/ → HTML page for humans
https://example.com/about.md → clean Markdown for AI
AI systems — ChatGPT, Perplexity, ClaudeBot, Google AI Overviews, and most RAG pipelines — parse Markdown far more efficiently than HTML. When these systems encounter an HTML page, they must strip navigation, headers, footers, sidebars, scripts, and tracking pixels before they can read the actual content. This noise introduces errors, increases token cost, and leads to lower-quality outputs.
The Markdown file contains a configurable YAML frontmatter block followed by the page title, headings in correct hierarchy, and the body text. Nothing else.
Core features
- Zero-config Markdown endpoint for every public post, page, and custom post type
- YAML frontmatter with configurable fields (`title`, `date`, `modified`, `url`, `description`, `image`, `tags`, `categories`, `lang`, `type`)
- `/llms.txt` discovery file at the site root following the llmstxt.org convention
- Taxonomy archives as Markdown: `/category/news.md`, `/tag/foo.md`, custom taxonomies
- `?format=markdown` query parameter as an alternative to the `.md` URL on any singular page
- Per-post exclude via a sidebar checkbox on the post editor
- Works with Classic Editor and Gutenberg via the `the_content` filter
- ACF integration, opt-in per post: pick which text, textarea, WYSIWYG, URL, email, or link fields to append below the body
- Filterable AI analytics — per-hit events with full denormalised dimensions (UA bucket, referrer host, language, post type, response code), sticky filter bar that drives every chart and table live, six KPI tiles, a stacked-area time chart, three composition donuts (UA bucket / referrer source / language), four Top tables, a User-Agent classifier transparency table, and a Recent Activity stream. Referrers are tracked by hostname only — paths and query strings are stripped before storage so no PII is retained. Forward-compatible classification: when the bot or referrer catalogue is updated in a future release, historical rows are reclassified automatically — no Reset Analytics required.
- Browser-bucket sub-classification: anything that looks like a browser visit is split into four kinds based on the `Sec-Fetch-Site`, `Sec-Fetch-User`, and `Sec-CH-UA` request headers a real browser sends. The kinds are verified user (top-level navigation triggered by a click or address-bar Enter in a recognised browser), headed agent (real Chromium driven programmatically, e.g. Playwright, Puppeteer, Selenium), script agent (bare HTTP client imitating a browser UA, e.g. `requests`, `httpx`, LangChain, custom agents), and spoofer (UA shape that no real browser would emit, like modern Chrome with a non-reduced UA). Visible as a stacked-bar breakdown on the User-Agents subpage so you can see at a glance how much of your “human” traffic is actually automation, and rendered inline as colour-coded slugs on every browser-bucket row in the Recent Activity table on the Analytics page. Detection is server-side fingerprinting of the request itself: no cookies, no JS, no IP.
Discoverability
- `Link: rel="alternate"; type="text/markdown"` HTTP header on every HTML page
- `<link rel="alternate">` tag in `<head>` for HTML-based discovery
- `Allow: /*.md$` directive in `robots.txt`
- CORS `Access-Control-Allow-Origin: *` on `.md` and `/llms.txt` so browser-based AI clients can fetch them
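A client could use the `Link` header above to locate the Markdown alternate of an HTML page. The Python sketch below is illustrative only: it assumes the header carries the target URL in angle brackets (the usual `Link` header syntax, which is not shown in the text above), and `markdown_alternate` is a hypothetical helper name, not part of the plugin.

```python
import re


def markdown_alternate(link_header):
    """Return the URL of the text/markdown alternate in a Link header, or None.

    Assumes the usual form: <url>; rel="alternate"; type="text/markdown"
    """
    # Each link value starts with <url>; its parameters run until the next "<".
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header):
        url, raw_params = match.group(1), match.group(2)
        params = {
            name.lower(): value.strip().lower()
            for name, value in re.findall(r'(\w+)\s*=\s*"?([^";,]+)"?', raw_params)
        }
        is_alternate = "alternate" in params.get("rel", "").split()
        if is_alternate and params.get("type") == "text/markdown":
            return url
    return None
```

A crawler would read the `Link` header from the HTML response and pass its value to this function before fetching the returned URL.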
Operations
- Transient caching with automatic invalidation on `save_post`, on ACF field value saves, on any ACF field group change, and on plugin settings save
- “Clear cache” button in the settings page
- Admin notice on fallback HTTP fetch failures
- “Settings” link in the plugin row on the Plugins screen
- “View .md” row action in the Posts and Pages list tables
Developer hooks
- `ovls_markdown` filter for the final Markdown output
- `ovls_frontmatter` filter for adding, removing, or modifying frontmatter fields
- `ovls_content_queries` filter for the HTML extraction XPath cascade
How it works
Each request to /about.md is captured by a WordPress rewrite rule and routed through the plugin’s content generator. The generator runs the post through apply_filters( 'the_content', ... ) — the same pipeline WordPress uses on the front end — so Classic Editor, Gutenberg, and shortcodes all work without separate code paths. The rendered HTML is converted to Markdown via league/html-to-markdown, then cached in a WordPress transient.
The cache is invalidated automatically on save_post, on ACF field/group changes, and whenever plugin settings are saved. A manual Clear cache button is also available on the settings page.
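The generate-once, invalidate-on-change behaviour described above can be sketched in a few lines of Python. This is a minimal illustration that assumes an in-memory dictionary in place of WordPress transients; `MarkdownCache` and its method names are hypothetical and do not come from the plugin.

```python
class MarkdownCache:
    """Caches generated Markdown per post until it is invalidated."""

    def __init__(self, generate):
        self._generate = generate  # callable: post_id -> Markdown string
        self._store = {}

    def get(self, post_id):
        # Generate on the first request only; later requests hit the cache.
        if post_id not in self._store:
            self._store[post_id] = self._generate(post_id)
        return self._store[post_id]

    def invalidate(self, post_id):
        # Corresponds to a single post being saved.
        self._store.pop(post_id, None)

    def clear(self):
        # Corresponds to a settings save or the manual "Clear cache" button.
        self._store.clear()
```

The design point is that generation cost is paid once per post per change, not once per request.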
Access methods
There are three ways to reach the Markdown version of a page:
- `.md` extension: `https://example.com/about.md`
- `?format=markdown` query: `https://example.com/about/?format=markdown`
- `Link: rel="alternate"` header: returned by every HTML page
The .md URL is the recommended canonical form.
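As a rough illustration, the two URL forms can be derived from an HTML permalink like this. The sketch is in Python and uses only the mapping shown in the examples above; `markdown_urls` is a hypothetical helper, and the home page is deliberately left out because its mapping is not covered here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def markdown_urls(html_url):
    """Return the .md-extension and ?format=markdown forms of a page URL."""
    parts = urlsplit(html_url)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("home page mapping is not covered by this sketch")
    # Form 1: drop the trailing slash and append .md to the path.
    extension_form = urlunsplit(
        (parts.scheme, parts.netloc, path + ".md", parts.query, "")
    )
    # Form 2: keep the path and add format=markdown to the query string.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("format", "markdown"))
    query_form = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )
    return {"extension": extension_form, "query": query_form}
```

For `https://example.com/about/` this yields `https://example.com/about.md` and `https://example.com/about/?format=markdown`.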
ACF integration
When Advanced Custom Fields is active, ACF field rendering is opt-in at two levels:
- Site defaults per post type: at Settings → One-V LLM Serve → ACF Defaults, tick fields that should be appended to every post of a given post type.
- Per-post override: the One-V LLM Serve metabox on each post editor lists every supported ACF field applicable to that post. Tick fields to replace the site defaults for that one post.
Supported ACF types: text, textarea, wysiwyg, url, email, link. Each selected field is rendered under a ## Field Label heading. Empty fields are skipped.
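The rendering rule for selected fields (one `## Field Label` heading per field, empty fields skipped) might look like the following Python sketch. It assumes each field value has already been reduced to a string; `render_acf_sections` is a hypothetical name and the exact spacing the plugin emits is not documented here.

```python
def render_acf_sections(fields):
    """Render (label, value) pairs as Markdown sections, skipping empty values."""
    sections = []
    for label, value in fields:
        text = (value or "").strip()
        if not text:
            continue  # empty fields are skipped
        sections.append(f"## {label}\n\n{text}")
    return "\n\n".join(sections)
```

The result would be appended below the post body.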
Disclaimer
This plugin is provided “as is”, without warranty of any kind, express or implied, in accordance with the GNU General Public License v2 or later. The authors and contributors are not liable for any direct, indirect, incidental, special, or consequential damages — including but not limited to data loss, lost profits, business interruption, search-ranking changes, or third-party claims — arising from the use of, or inability to use, this software, even if advised of the possibility of such damages.
By installing and activating the plugin you acknowledge that:
- You are responsible for testing the plugin in a staging environment before deploying to production.
- You are responsible for the content this plugin exposes as Markdown: `.md` URLs and `/llms.txt` serve the same content as their HTML counterparts and are intended to be crawled and consumed by AI systems and third-party LLMs.
- The plugin does not transmit data to any external service. All Markdown generation, caching, and file writes happen on your own server.
Nothing in this disclaimer is intended to exclude or limit liability for matters that cannot lawfully be excluded under the consumer-protection laws of your jurisdiction. For the full legal terms see the GPLv2 license at https://www.gnu.org/licenses/gpl-2.0.html.
Installation
- Upload the `one-v-llm-serve` folder to `/wp-content/plugins/`, or install via Plugins → Add New → Upload Plugin.
- Activate the plugin through the Plugins screen in WordPress.
- Visit Settings → One-V LLM Serve to configure post types, frontmatter fields, and ACF defaults.
Rewrite rules are flushed automatically on activation. If `.md` URLs return 404 immediately after activation, go to Settings → Permalinks and click Save Changes.
Frequently Asked Questions
Does activating the plugin change my existing pages?
No. The plugin only responds to `.md` URLs, `/llms.txt`, and the `?format=markdown` query parameter. All existing HTML URLs are unaffected.
Will the `.md` URLs hurt my SEO?
No. For traditional search engines, `.md` responses are served with `X-Robots-Tag: noindex, follow`, so they do not index the Markdown variants and the canonical HTML page remains the sole entry in Google/Bing/etc. Known AI crawlers receive `index, follow` instead. The `Link: rel="alternate"; type="text/markdown"` header on each HTML page advertises the Markdown alternate to AI consumers without exposing it to SERPs.
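The User-Agent-conditional header described in this FAQ can be sketched as a small Python function. This is an illustration, not the plugin's code: the token list covers only the three AI crawlers named in this document, and treating every other User-Agent as `noindex, follow` is an assumption.

```python
# Subset of AI crawler tokens named in this FAQ; the plugin's real list is longer.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "perplexitybot")


def x_robots_tag(user_agent, allow_search_indexing=False):
    """Pick the X-Robots-Tag value for a .md response."""
    ua = user_agent.lower()
    # The "Allow search engines to index .md (advanced)" toggle opts everyone in.
    if allow_search_indexing or any(token in ua for token in AI_CRAWLER_TOKENS):
        return "index, follow"
    # Assumption: anything not recognised as an AI crawler gets noindex.
    return "noindex, follow"
```

Because the value depends on the request's User-Agent, a shared cache has to key on it as well, which is why the response also carries `Vary: User-Agent`.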
Does it work with password-protected posts?
No. Password-protected and private posts return 404 on the `.md` URL. Only published posts are served.
What Markdown flavour is used?
CommonMark-compatible Markdown via `league/html-to-markdown`, with ATX-style headings (`#`), inline links (`[text](url)`), and fenced code blocks.
Where is the Markdown cached?
In WordPress transients (database by default, or your object cache). Entries are invalidated when the post is saved, when ACF fields or settings change, or when you click Clear cache. A long safety-net expiry also lets any orphaned entry clear itself on object-cache setups.
The `.md` URL returns 404 after activation.
Go to Settings → Permalinks and click Save Changes to flush rewrite rules.
Can I disable Markdown for specific posts?
Yes. Two ways:
- Check Exclude from Markdown in the One-V LLM Serve metabox on the post editor.
- Return `''` from an `ovls_markdown` filter callback.
Does it work with page builders like Elementor or Divi?
Yes. Any builder that hooks into `the_content` is supported (Elementor, Divi, WPBakery, Beaver Builder). For builders that bypass `the_content`, the plugin falls back to fetching the rendered frontend HTML and extracting the main content area.
Is it compatible with caching plugins?
Yes. Markdown is stored in WordPress transients. Object caches (Redis, Memcached) work transparently, and Clear cache correctly invalidates them, not just the database. Full-page caching layers (WP Rocket, W3 Total Cache, LiteSpeed Cache) serve fresh Markdown on the next request after a save.
/llms.txt returns 404 on WPEngine / Kinsta / managed nginx hosts
Some managed WordPress hosts configure their nginx to serve static file extensions (`.txt`, `.xml`, …) directly from disk without passing the request to WordPress. When the file is generated dynamically by a plugin, that produces a 404 because nothing exists on disk.
Fix: enable Settings → One-V LLM Serve → Write llms.txt to disk. The plugin then maintains a real `/llms.txt` file at the site root, regenerating it on every post save, ACF change, or settings update. The file carries a marker comment on the first line; the plugin refuses to overwrite a `/llms.txt` it did not create. On plugin deletion the managed file is removed via `uninstall.php`.
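The "refuse to overwrite a file it did not create" rule can be illustrated with a short Python sketch. The marker wording and its `#` comment form are assumptions made for the example (the document only says a marker comment sits on the first line), and `write_managed_file` is a hypothetical helper.

```python
from pathlib import Path

# Assumed marker text and comment syntax; the real first line may differ.
MARKER = "# Generated by One-V LLM Serve"


def write_managed_file(path, body):
    """Write body to path unless a foreign file already exists there.

    Returns True if the file was written, False if it was left untouched.
    """
    path = Path(path)
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            first_line = handle.readline().rstrip("\n")
        if not first_line.startswith(MARKER):
            return False  # someone else's llms.txt: do not overwrite
    path.write_text(f"{MARKER}\n{body}", encoding="utf-8")
    return True
```

The same first-line check would also be the natural guard before deleting the file on uninstall.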
Does disk-mode work on WordPress installed in a subdirectory or on multisite?
Not currently. The disk-mode writer assumes WordPress is installed at the site root (`ABSPATH` is the public root). Subdirectory installs (`/wp/`) and multisite are not supported by disk-mode yet. For those, the dynamic rewrite-rule path is still available on hosts where nginx forwards `.txt` requests to PHP (most hosts other than WPEngine/Kinsta).
My .md endpoint returns the same X-Robots-Tag to every bot. Why?
The plugin sends a User-Agent-conditional `X-Robots-Tag`: AI crawlers (GPTBot, ClaudeBot, PerplexityBot, …) get `index, follow` so they will use the Markdown variant, while traditional search engines (Bingbot, Googlebot, …) get `noindex, follow` so the canonical HTML page remains the sole entry in SERPs. To keep shared caches from collapsing the two variants into one, the response carries `Vary: User-Agent` and `Cache-Control: private, max-age=0, must-revalidate`.
These signals work on standard WordPress hosting. They can be overridden by edge layers in specific hosting / CDN configurations:
- Some managed WordPress hosts (Kinsta, WPEngine, Pressable, SiteGround, and others) ship default edge caching that treats static-looking file extensions like `.md` as long-lived static assets and rewrites the plugin’s `Cache-Control` header to a public, long-`max-age` value.
- Some CDNs (most notably Cloudflare on its default cache key) ignore `Vary: User-Agent` entirely: they cache one variant per URL and serve it to every visitor regardless of UA.
When one of these is in front of your site, the first `.md` request to reach the edge caches the response for everyone afterwards. The plugin is still doing the right thing at origin, but visitors only ever see the cached copy.
Diagnosing it. Open a terminal and run:
`curl -skI -A "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" https://example.com/your-page.md`
Headers to look for:
- `cf-cache-status: HIT` (Cloudflare), `x-kinsta-cache: HIT`, `x-cache: HIT` (generic): the response is coming from the edge cache.
- `age: <large number>`: the response has been sitting in cache for that many seconds.
- `cache-control: public, max-age=<large>` instead of the plugin’s `private, max-age=0`: your host or CDN has rewritten it.
Then add a cache-busting query string and try again:
`curl -skI -A "Mozilla/5.0 (compatible; bingbot/2.0; …)" "https://example.com/your-page.md?cb=12345"`
If the headers are now correct (`x-robots-tag: noindex, follow` for Bingbot, `index, follow` for an AI UA), the plugin is fine and the edge is the source of the problem.
Fixing it. The fix has to be applied at the layer that is caching, not in WordPress. Common remedies:
- On the CDN, exclude `*.md` URLs from any “Cache Everything” rule, or add a Bypass Cache rule for them.
- On managed hosts, contact support and ask them to exempt `*.md` from the host’s edge cache (so the plugin’s `Cache-Control: private` is honoured).
- If neither is available, enable Allow search engines to index .md (advanced) at One-V LLM Serve → Settings; the plugin then sends `index, follow` to every UA. The behavior is consistent, at the cost of allowing search engines to index the Markdown variants alongside the HTML pages.
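The header checks from the diagnosis steps can also be automated. The Python sketch below takes response headers as a dictionary (for example, collected from the `curl -I` output) and reports the edge-cache symptoms listed above. `diagnose_edge_cache` is a hypothetical helper, and treating any positive `age` as suspicious is an assumption, since the text only says "large".

```python
def diagnose_edge_cache(headers):
    """Return a list of edge-cache symptoms found in .md response headers."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    findings = []
    # Cache-status headers named in the FAQ; values such as "HIT from ..." also count.
    for name in ("cf-cache-status", "x-kinsta-cache", "x-cache"):
        if h.get(name, "").upper().startswith("HIT"):
            findings.append(f"{name}: served from an edge cache")
    age = h.get("age", "")
    if age.isdigit() and int(age) > 0:
        findings.append(f"age: cached for {age} seconds")
    directives = {
        part.strip().split("=")[0].lower()
        for part in h.get("cache-control", "").split(",")
    }
    if "public" in directives:
        findings.append("cache-control: rewritten to a public value")
    return findings
```

An empty list suggests the response came from origin with the plugin's own headers intact.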
What does the AI bot analytics feature collect?
For each `.md` request the plugin stores: the timestamp, the User-Agent string (deduplicated via a small dictionary table, one row per unique UA), the referrer hostname only (path and query string are stripped), the requested post or term and its post type / taxonomy, the post language, and the HTTP response code (200 / 304 / 404).
Never stored: IP addresses, cookies, session identifiers, geolocation, full referrer URLs with query strings, or any user-account data. Only visits to `.md` URLs are counted; the regular HTML pages are not tracked.
Detailed per-hit events are kept for the number of days you choose in Settings (default: 365). Older events can optionally be rolled up into a daily aggregate table for long-term trend charts; the aggregate retains only the bucket / language / referrer-bucket / response-code dimensions, with no per-UA or per-post detail. The daily WP-cron event `ovls_events_cleanup` enforces the retention.
Analytics is enabled by default and can be turned off at One-V LLM Serve → Settings. All stored data can be wiped at One-V LLM Serve → Analytics → Reset analytics. Uninstalling the plugin drops the three analytics tables (`ovls_events`, `ovls_ua_dict`, `ovls_events_archive`) completely.
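Reducing a referrer to its hostname before storage is straightforward. A minimal Python sketch follows, using the standard library's URL parser; `referrer_host` is a hypothetical name and the plugin's own normalisation may differ.

```python
from urllib.parse import urlsplit


def referrer_host(referrer):
    """Return only the lower-cased hostname of a referrer URL, or None."""
    if not referrer:
        return None
    # .hostname drops the scheme, port, path, query string and fragment.
    return urlsplit(referrer).hostname
```

Anything encoded in the path or query string, such as utm parameters or an email address, is discarded before it could be written to the database.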
Why doesn’t my analytics show every AI bot that visits?
The plugin classifies crawlers by their User-Agent header. Almost all major AI companies (OpenAI, Anthropic, Perplexity, Google, Apple, Amazon, Meta, ByteDance, Cohere, Mistral, Common Crawl, …) honestly self-identify because they have publicly committed to respecting `robots.txt`. Those are detected accurately.
However, some traffic is invisible to User-Agent-based detection:
- Stealth crawlers that spoof a regular browser User-Agent (some training-data brokers, certain less-ethical scrapers).
- Agentic browsers like OpenAI Operator or Claude/Claude-Browse running an actual headless Chrome, which is technically indistinguishable from a human visit at the header level.
- AI assistants that ingest a page through an integrated third-party fetcher (e.g. a `python-requests` script). These show up under “Other bot” rather than the underlying model.
For these the plugin records what it can: they will appear in the “Other bot” bucket or the “Browser” bucket with a sub-classification that flags the request as a Playwright-class headed agent, a script agent, or a UA-shape spoofer rather than a real user. The plugin’s bot signature list is updated each release as new identifiers are publicly documented.
How do you tell a real human from a scraper that imitates a browser User-Agent?
By looking at request headers that real browsers emit automatically and bare HTTP clients usually don’t. Specifically:
- `Sec-Fetch-User: ?1` is present only when the navigation was triggered by user activation (link click, address-bar Enter, tap-out from an AI app to the system browser). Programmatic navigation in Playwright/Puppeteer doesn’t set it.
- `Sec-CH-UA` brand list: a real downstream browser (Chrome, Edge, Brave, Opera, Vivaldi, Yandex, Samsung Internet, Arc, DuckDuckGo) announces its brand here. The open-source Chromium build that Playwright runs by default does not; it identifies as bare `"Chromium"`, which is a detectable difference.
- `Sec-Fetch-Site` is sent by every modern browser since Safari 16.4 (March 2023). Absence indicates the request didn’t come from a browser engine at all.
- User-Agent shape: Chrome ≥ 110 must report `Chrome/X.0.0.0` (User-Agent Reduction); a UA claiming `Chrome/133.0.6943.141` with non-zero minor digits is impossible for a real browser and flags the request as a copy-pasted scraper UA.
Combined, these four signals split the “Browser” bucket into verified user, headed agent (real Chromium under automation), script agent (curl/httpx/requests imitating a UA), and spoofer (impossible UA shape). None of this requires JavaScript or cookies; it is all server-side inspection of the HTTP request the client already sent.
A motivated scraper can manually set these headers to bypass detection. The classifier doesn’t claim to catch every bot ever. It catches default-configuration tools, which is the vast majority.
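To make the four signals concrete, here is a rough Python sketch of how they could be combined. It is not the plugin's classifier: the order of the checks, the treatment of a branded browser without `Sec-Fetch-User` as a headed agent, and the "starts with Not" heuristic for placeholder brands are all assumptions made for illustration, and `classify_browser_request` is a hypothetical name.

```python
import re

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")
BRAND = re.compile(r'"([^"]*)"\s*;\s*v=')


def classify_browser_request(headers):
    """Classify a browser-looking request into one of the four kinds."""
    h = {name.lower(): value for name, value in headers.items()}

    # Signal 4: Chrome >= 110 reports Chrome/X.0.0.0, so non-zero parts are impossible.
    match = CHROME_VERSION.search(h.get("user-agent", ""))
    if match:
        major, *rest = (int(part) for part in match.groups())
        if major >= 110 and any(rest):
            return "spoofer"

    # Signal 3: no Sec-Fetch-Site means no browser engine sent the request.
    if "sec-fetch-site" not in h:
        return "script agent"

    # Signal 2: a brand list naming only bare "Chromium" suggests automation.
    brands = [brand.strip() for brand in BRAND.findall(h.get("sec-ch-ua", ""))]
    named = [b for b in brands if not b.lower().startswith("not")]  # drop placeholder brands
    if named and all(b == "Chromium" for b in named):
        return "headed agent"

    # Signal 1: user activation marks a verified user.
    if h.get("sec-fetch-user", "").strip() == "?1":
        return "verified user"
    return "headed agent"  # assumption: browser engine without user activation
```

Browsers that send no `Sec-CH-UA` at all fall through to the `Sec-Fetch-User` check in this sketch.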
Is the analytics feature GDPR-compliant?
The events store no personal data: no IP addresses, no user identifiers, no cookies, no full referrer URLs (the path and query string are stripped before storage, so utm parameters or any PII that might be encoded in a URL never reach the database). User-Agent strings, Fetch Metadata Request Headers (`Sec-Fetch-Site`, `Sec-Fetch-User`), and User-Agent Client Hints (`Sec-CH-UA`) are technical request-level metadata used to classify automated crawlers and tell browser-class clients apart from script-class clients. They carry strictly less information than the User-Agent and don’t, on their own or in combination, identify a person. The plugin relies on legitimate interest (Art. 6(1)(f)) as the lawful basis: server-side bot analytics is widely accepted as a legitimate interest, and the suggested privacy text at Tools → Privacy discloses precisely what is collected so it can be copied into your site policy.
If your jurisdiction requires explicit user disclosure for any server-side analytics, you can turn the feature off at One-V LLM Serve → Settings.
Changelog
1.1.0
Major feature release: AI-traffic analytics and stronger crawler-facing HTTP semantics. Highlights:
- Added: AI traffic analytics, with a new top-level “One-V LLM Serve” admin menu, an Analytics subpage, and a WP-Admin dashboard widget. Per-hit events record the UA bucket (AI / search / other-bot / browser / unknown), referrer source, language, target post or term, and response code, with a sticky filter bar that drives every chart and table live, KPI tiles, a time chart, composition donuts, Top tables, a User-Agent classifier transparency table, and a Recent Activity stream. Referrers are stored by hostname only; no IPs, cookies, or full URLs are ever recorded.
- Added: Browser-bucket sub-classification. Traffic that looks like a browser is split into verified user / headed agent (Playwright-class automation) / script agent (curl, httpx, requests, LangChain) / spoofer, using server-side fingerprinting of the Sec-Fetch and Sec-CH-UA request headers. No JavaScript, no cookies, no IP.
- Added: Referrer attribution with a six-bucket catalogue (search / chatbot / social / direct / internal / other). The “chatbot” bucket surfaces when an LLM cited your page in an answer and the reader clicked through.
- Added: Forward-compatible classification. When the bot or referrer catalogue is updated in a future release, historical rows are reclassified automatically; no Reset Analytics required. `wp ovls reclassify` runs the same pass on demand.
- Changed: User-Agent-conditional `X-Robots-Tag`. Known AI crawlers receive `index, follow` so they ingest the Markdown variant, while search engines receive `noindex, follow` so the canonical HTML page stays the sole SERP entry. A new “Allow search engines to index .md (advanced)” toggle opts every User-Agent into `index, follow`.
- Added: Conditional GET (`ETag` / `If-None-Match` and `Last-Modified` / `If-Modified-Since`) returning 304 without a body, plus `X-Content-Type-Options: nosniff`, `Referrer-Policy: no-referrer`, `Vary: User-Agent, Accept-Encoding`, and `Cache-Control: private, max-age=0, must-revalidate`.
- Added: `X-OVLS-Version` HTTP header on every `.md` response and in the `/llms.txt` marker, so operators can confirm the deployed build with a single `curl -I`.
- Added: WP-CLI namespace `wp ovls …` with `flush`, `regenerate`, `list`, `warm`, and `reclassify`.
- Added: Multilingual slug-fallback for WPML and Polylang so language-prefixed `.md` URLs resolve to the correct translation instead of the default-language post.
- Fixed: Unaddressable permalinks. Off-host “Page Links To” links, non-viewable custom post types, and missing permalinks are now skipped everywhere a `.md` link is built and return 404 on direct request, instead of polluting `/llms.txt`.
- Fixed: Clearing the cache now reliably invalidates entries on sites running a persistent object cache (Redis / Memcached), and cached Markdown no longer loads into memory on every request.
- Changed: The admin menu moved to its own top-level “One-V LLM Serve” entry (Settings + Analytics). The settings page slug is unchanged, so existing deep-links keep working.
- Added: Edge-cache guidance for hosts and CDNs (Kinsta, Cloudflare) that cache `.md` across User-Agents and defeat the conditional headers; Kinsta installs get an in-dashboard notice.
- Added: `/llms.txt` now carries the “Generated by One-V LLM Serve” marker in both dynamic and disk-mode delivery, and lists posts of every language on multilingual sites.
- Hardening: PHP 8.0 runtime guard, a generator lock plus 30-second wall-clock cap against bot storms, 508 loop detection, dbDelta-failure admin notices, a `/llms.txt` size cap, self-healing rewrite flush on upgrade, and safer HTML/YAML handling.
- Added: Suggested Privacy Policy text at Tools → Privacy describing exactly what the analytics feature collects and how to opt out.
1.0.2
- Added: `rel="canonical"` Link HTTP header on `.md` responses pointing to the HTML permalink, which consolidates SEO signals and avoids duplicate-content indexing. Index.md points at the homepage; taxonomy term .md points at the term archive.
- Added: Disclaimer section in readme covering warranty, liability, and data-transmission stance per GPLv2.
1.0.1
- Fixed: disk-mode for `/llms.txt` now detects multisite and “WordPress in a subdirectory” installs and refuses to write to `ABSPATH` when it does not map to the public docroot. Settings page surfaces a clear “unsupported install layout” state instead of silently writing the file to the wrong location. The dynamic rewrite-rule path keeps working on all install layouts.
- Fixed: uninstall script applies the same layout check before attempting to delete the managed `/llms.txt`.
1.0.0
- Initial release.



