Our Methodology

Last updated: May 2026

Learning Whistle publishes learning paths at two tiers, and we build them differently. This page is a transparent account of how each one is made — and exactly what you can trust it for.

The two tiers, at a glance

Standard paths are quality-controlled AI explainers. Every Standard path is written to a target reading level, checked against a structured quality gate before it is published, and labelled as AI-generated on every station. Standard paths are not individually source-cited — they are an accessible, well-structured starting point for learning, not a citable reference. Community event paths are a kind of Standard path, with added collaboration, mileage, and collectible rewards; they are built the same way.

Premium paths go much further: each one is assembled from verified, open-access sources, with its distinctive factual claims cited to those sources and independently checked. The rest of this page documents that Premium pipeline in full.

See it in practice. Our Premium path on The Immune System is a fully-sourced example — 15 stations grounded in 72 citations to PubMed, the NIH, OpenStax, and peer-reviewed journals, each shown in a Verified Sources panel on the station it supports.

How Standard paths are quality-controlled

Standard paths do not go through source retrieval, license filtering, or per-claim verification, and they carry no citations. What they do go through is a structured quality gate: content is calibrated to the reading level you choose using Flesch-Kincaid grade targets, swept for structural completeness across every station, and scored before publication — paths that fall short are regenerated rather than released. Every Standard station also carries a visible AI-Generated label, so you always know what you are reading. The result is consistent, readable, well-organised explainers; for material you intend to cite or quote, use a Premium path.
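As a rough illustration of what a Flesch-Kincaid grade check involves, the sketch below applies the standard published grade-level formula with a naive vowel-group syllable counter. The function names, the syllable heuristic, and the one-grade tolerance are assumptions made for this example; they are not a description of the production quality gate.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic (assumption): one syllable per run of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Standard formula: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


def within_target(text: str, target_grade: float, tolerance: float = 1.0) -> bool:
    """Hypothetical gate: pass when the measured grade is within `tolerance` of the target."""
    return abs(flesch_kincaid_grade(text) - target_grade) <= tolerance
```

Very short, simple text can score below grade 0 under this formula, which is one reason a real gate would likely combine it with other checks.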

Reading levels

You choose the reading level a path is written at, and each level is described in plain language. Standard paths span 3rd grade through 12th grade. Premium Learning Paths reach beyond 12th grade, into college and graduate-level depth. The full ladder:

Standard tier — no cost

3rd Grade: Simple words and short sentences for young or new readers
4th Grade: Plain language with a little more detail
5th Grade: Clear explanations in everyday vocabulary
6th Grade: Builds basic concepts step by step
7th Grade: Introduces subject terms with explanations
8th Grade: Connects ideas with moderate vocabulary
9th Grade (General Public): Accessible to most adults; the recommended starting point
10th Grade: More nuance and subject depth
11th Grade: Assumes growing familiarity with the topic
12th Grade: Full high-school depth; the top of the Standard tier

Premium Learning Paths — beyond Grade 12

College level: Key theories and frameworks
Upper college level: Assumes subject fluency
Bachelor's degree: Connects ideas across fields, evaluates sources critically
Master's degree: Graduate level; academic writing conventions
Expert / professional: Technical precision for a professional audience
Research lead: Research-focused, engages with original literature
PhD candidate: Specialist level; assumes deep domain expertise

1. Source Retrieval

Before any content is written, our pipeline queries dozens of authoritative open-access databases. Our topic decomposition layer also identifies cross-domain connectors relevant to each station automatically, so a path on portrait photography may draw from optics physics (arXiv), colorimetry (NIST), and visual perception research (PubMed), not just art archives. The core databases depend on the learning category:

Science & Medicine: PubMed Central, Europe PMC, NIH Reporter, openFDA, OpenAlex, StatPearls clinical handbook (via NCBI Bookshelf), ClinicalTrials.gov, OpenStax textbooks
Physics & Mathematics: arXiv, NIST, OpenAlex, OpenStax textbooks
Engineering & Computer Science: arXiv, IETF RFCs, NIST, OpenAlex, NASA Technical Reports, OpenStax Introduction to Computer Science
Chemistry: PubChem, NIST, NIH Reporter, OpenAlex, OpenStax textbooks
Law & Policy: CourtListener (Free Law Project), Congressional Research Service, OpenStax Business Law
History, Arts & Humanities: Europeana, Library of Congress, Smithsonian, Project Gutenberg, OpenStax U.S./World History
Earth Sciences: USGS Publications Warehouse, NASA Technical Reports, OpenAlex, OpenStax
Economics & Politics: World Bank Open Data, Congressional Research Service, OpenAlex, OpenStax Macroeconomics & Microeconomics
Philosophy: Stanford Encyclopedia of Philosophy, Project Gutenberg, OpenAlex, OpenStax Introduction to Philosophy
Literature & Language: Library of Congress, Project Gutenberg, OpenAlex, OpenStax Writing Guide

All sources are retrieved via official public APIs. We do not scrape paywalled content. The OpenStax chapter index is built from the publicly published table of contents of each CC BY textbook and refreshed when the catalog updates.

2. License Enforcement

Only sources with the following licenses are admitted into the generation pipeline:

CC0 (Public Domain Dedication)
CC BY (Creative Commons Attribution)
Public Domain
U.S. Government Work

Sources with CC BY-SA, CC BY-NC, or unknown license terms are automatically excluded. This is a hard rule with no exceptions, regardless of source quality.
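A hard allowlist of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming each source is a dictionary with an optional "license" string; the normalisation rules and the names normalize_license, is_admissible, and admit_sources are hypothetical, not the pipeline's actual code.

```python
import re

# The four admitted license families, in a normalised lowercase form.
ALLOWED_LICENSES = {"cc0", "cc by", "public domain", "us government work"}
_VERSION = re.compile(r"\d+(\.\d+)*")


def normalize_license(raw):
    """Lowercase, unify separators, and drop a trailing version such as '4.0'."""
    if not raw:
        return "unknown"
    tokens = raw.lower().replace("-", " ").replace("_", " ").split()
    if tokens and _VERSION.fullmatch(tokens[-1]):
        tokens = tokens[:-1]
    return " ".join(tokens).replace(".", "") or "unknown"


def is_admissible(raw) -> bool:
    """Anything not on the allowlist, including a missing license, is rejected."""
    return normalize_license(raw) in ALLOWED_LICENSES


def admit_sources(sources):
    """Keep only sources whose license is on the allowlist."""
    return [s for s in sources if is_admissible(s.get("license"))]
```

Using an allowlist rather than a blocklist means an unrecognised or missing license string is excluded by default, which matches the rule that unknown terms are never admitted.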

3. Relevance and Coverage Selection

For each station, we cast a wide net and then narrow it. Candidate sources are scored for relevance to that station's specific learning objective, and a cross-encoder reranking model orders them by how well they actually answer it — sources that score as off-topic are dropped before generation begins, so tangential material never reaches a citation.

We don't simply take the top-ranked sources, though. The selection step also balances coverage: a good source set explains the topic from more than one angle — a definition, a worked example, a primary artefact, a real-world case — rather than five sources that all say the same thing. The pipeline favours sources that add a missing perspective and penalises ones that merely repeat what is already selected, so each station rests on a varied, non-redundant evidence base. Authority (peer-reviewed and government sources), full-text availability, recency, and license permissiveness all feed this ranking.
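One common way to trade relevance against redundancy is maximal marginal relevance (MMR). This page does not say which method the pipeline uses, so the sketch below is a stand-in: it assumes each candidate already carries a relevance score between 0 and 1 (for example from a reranker), and it approximates redundancy with word overlap (Jaccard similarity). The thresholds and the trade_off weight are illustrative assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_sources(candidates, k=3, min_relevance=0.3, trade_off=0.7):
    """Pick up to k source ids, balancing relevance against redundancy.

    candidates: list of (source_id, relevance, text) tuples.
    min_relevance and trade_off are hypothetical values for illustration.
    """
    # Drop off-topic candidates before any selection happens.
    pool = [
        (sid, rel, set(text.lower().split()))
        for sid, rel, text in candidates
        if rel >= min_relevance
    ]
    selected = []
    while pool and len(selected) < k:

        def mmr(item):
            _, rel, tokens = item
            # Redundancy is the highest overlap with anything already chosen.
            redundancy = max((jaccard(tokens, t) for _, _, t in selected), default=0.0)
            return trade_off * rel - (1 - trade_off) * redundancy

        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return [sid for sid, _, _ in selected]
```

With two near-identical high-scoring sources and one lower-scoring source on a different angle, this scoring picks one of the duplicates first and then prefers the different angle over the second duplicate.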

4. Source-Grounded Generation

Gemini Pro writes each station with the retrieved sources held in front of it, and is required to ground its distinctive factual claims (specific data, named theories, dates, statistics, and direct quotations) in those sources, citing them with inline tokens (e.g. [S1], [S2]) that map back to the retrieved documents. Connecting explanation, analogies, and framing are written from the model's own knowledge and held to your chosen reading level; they are not citation-bearing claims. In other words, the parts of a station that assert a checkable fact are tied to a real source, while the prose that explains and connects those facts is the model's own: the kind of writing a knowledgeable author produces, with no invented citations attached.
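To make the token format concrete, the sketch below groups the sentences of a station by the [S1]-style tokens they carry, so each source can be listed next to the claims it is meant to support. The sentence splitter and the function name claims_by_source are assumptions for illustration only.

```python
import re
from collections import defaultdict

CITATION_TOKEN = re.compile(r"\[S(\d+)\]")


def claims_by_source(station_text: str) -> dict:
    """Map each cited source id (e.g. 'S1') to the sentences that cite it."""
    claims = defaultdict(list)
    # Simple splitter (assumption): sentences end with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", station_text.strip())
    for sentence in sentences:
        # Remove the tokens so the stored claim reads as plain prose.
        plain = re.sub(r"\s*\[S\d+\]", "", sentence).strip()
        for number in CITATION_TOKEN.findall(sentence):
            claims[f"S{number}"].append(plain)
    return dict(claims)
```

Sentences without a token, such as analogies and connecting explanation, simply do not appear in the result, which mirrors the split between citation-bearing claims and uncited prose.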

5. Independent Verification

After generation, each station is checked by a separate verification pass — a different AI model, with no shared context with the writer — that reads the station against the same retrieved sources and flags:

Invalid citations: a citation token that doesn't point to a real retrieved source
Unsupported claims: a cited source that doesn't actually support the claim it's attached to
Misquotes: a quoted passage that doesn't match the source word-for-word

Invalid citations and misquotes are treated as zero-tolerance defects: the offending citation or quote is corrected or removed before the station is delivered, never shipped as-is. The verification report is stored with the path.
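Two of the three checks are mechanical enough to sketch without a model. The example below assumes the retrieved sources are available as a dictionary from token id (such as "S1") to source text, and that a direct quotation is written in straight double quotes immediately before its citation token. Both conventions are assumptions for this illustration. Judging whether a source actually supports a claim needs a reader or a model and is not attempted here.

```python
import re

QUOTE_WITH_CITATION = re.compile(r'"([^"]+)"\s*\[(S\d+)\]')


def find_invalid_citations(station_text: str, sources: dict) -> list:
    """Return cited ids that do not correspond to any retrieved source."""
    cited = re.findall(r"\[(S\d+)\]", station_text)
    return sorted({sid for sid in cited if sid not in sources})


def find_misquotes(station_text: str, sources: dict) -> list:
    """Return (source_id, quote) pairs where the quote is not verbatim in the source."""
    problems = []
    for quote, sid in QUOTE_WITH_CITATION.findall(station_text):
        source_text = sources.get(sid)
        # Unknown ids are reported by find_invalid_citations, not here.
        if source_text is not None and quote not in source_text:
            problems.append((sid, quote))
    return problems
```

The verbatim test is a plain substring match, so even a single changed word in a quotation is flagged.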

6. Wikidata Cross-Check

When a Premium path is made public, each station also undergoes an independent fact audit against the Wikidata knowledge graph (CC0), the same structured database that underpins Wikipedia's infoboxes. An AI model extracts up to five atomic, verifiable claims per station (specific dates, numerical values, named relationships, scientific constants) and queries Wikidata's SPARQL endpoint for each. Any discrepancy is logged, classified by severity, and stored in the path audit record. This is an additional layer of fact-checking beyond source verification: our content is checked against a second, independently maintained knowledge base before it is published to a public audience.
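A single claim check against Wikidata might look like the sketch below. It assumes the claim has already been reduced to a Wikidata entity id, a property id, and an expected value (for example Q937, P569, and "1879-03-14" for Albert Einstein's date of birth). The helper names and the three outcome labels are hypothetical; the severity scheme used in the audit record is not described on this page.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_claim_query(entity_id: str, property_id: str) -> str:
    """SPARQL asking for the direct ('truthy') values of one property on one entity."""
    return f"SELECT ?value WHERE {{ wd:{entity_id} wdt:{property_id} ?value . }} LIMIT 10"


def fetch_values(entity_id: str, property_id: str) -> list:
    """Run the query over HTTP. Performs a network call, so it only runs when invoked."""
    params = urllib.parse.urlencode(
        {"query": build_claim_query(entity_id, property_id), "format": "json"}
    )
    request = urllib.request.Request(
        f"{WIKIDATA_SPARQL}?{params}",
        headers={"User-Agent": "methodology-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [binding["value"]["value"] for binding in data["results"]["bindings"]]


def classify(claimed: str, wikidata_values: list) -> str:
    """Hypothetical outcome labels: 'no_data', 'match', or 'discrepancy'."""
    if not wikidata_values:
        return "no_data"
    # Prefix match so a date like '1879-03-14' matches '1879-03-14T00:00:00Z'.
    if any(value.startswith(claimed) for value in wikidata_values):
        return "match"
    return "discrepancy"
```

A call such as classify("1879-03-14", fetch_values("Q937", "P569")) would compare the claimed date with whatever Wikidata currently returns for that property.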

7. Quality Gate

Every Premium path is scored against a 100-point quality gate before delivery, covering source grounding, structure, reading-level accuracy, and content completeness. A path that fails the gate for a genuinely blocking fault (one we cannot repair in-process) is not released, and the Gold Ticket cost is refunded in full.
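The shape of a 100-point gate can be shown with a small weighted sum. The four components come from the description above, but the point split (40/20/20/20), the pass mark of 80, and the function names are invented for this sketch; the actual weights and threshold are not published here.

```python
# Hypothetical point allocation across the four named components (sums to 100).
WEIGHTS = {
    "source_grounding": 40,
    "structure": 20,
    "reading_level": 20,
    "completeness": 20,
}
PASS_MARK = 80  # hypothetical threshold


def gate_score(component_scores: dict) -> float:
    """Combine per-component scores in [0, 1] into a 0-100 total."""
    missing = WEIGHTS.keys() - component_scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(
        weight * min(max(component_scores[name], 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def gate_decision(component_scores: dict) -> str:
    """Return 'release' when the total reaches the pass mark, otherwise 'hold'."""
    return "release" if gate_score(component_scores) >= PASS_MARK else "hold"
```

Requiring every component to be present before scoring avoids silently passing a path whose grounding or reading-level check never ran.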

8. Verified Source Databases

The following databases are queried during Premium path generation. Every database listed is free, open-access, and operated by a government agency, academic institution, or established non-profit. No paywalled or commercially licensed content is ever used.

Science & Medicine

PubMed Central (U.S. National Library of Medicine) — peer-reviewed biomedical and life sciences literature; open-access full text
Europe PMC (European Bioinformatics Institute) — biomedical literature including preprints; CC-BY and CC0 content
NIH Reporter (National Institutes of Health) — federally funded research project abstracts and plain-language summaries
openFDA (U.S. Food & Drug Administration) — FDA-approved drug labels including clinical pharmacology and mechanism-of-action text
PubChem (National Center for Biotechnology Information) — curated chemical compound database with biological activity data
StatPearls (via NCBI Bookshelf) — 100,000+ peer-reviewed clinical handbook chapters covering diagnosis, treatment, and management; CC BY 4.0. Also includes NICE Guidelines (UK NHS), GeneReviews, and AHRQ evidence syntheses.
ClinicalTrials.gov (U.S. National Library of Medicine) — comprehensive registry of clinical studies worldwide, with study design, interventions, eligibility, and primary outcomes; public domain

Physical Sciences, Engineering & Mathematics

arXiv (Cornell University) — open-access preprints in physics, mathematics, computer science, and engineering
NIST (National Institute of Standards and Technology) — government measurement science publications and the NIST Digital Library of Mathematical Functions
IETF RFCs (Internet Engineering Task Force) — internet and networking standards documents

Space & Earth Sciences

NASA Technical Reports (National Aeronautics and Space Administration) — mission publications, research reports, and technical documents
USGS Publications Warehouse (U.S. Geological Survey) — earth science, geology, hydrology, and geography publications

Law, Policy & Economics

CourtListener (Free Law Project) — U.S. federal and state court opinions, statutes, and legal documents; CC0
Congressional Research Service Reports (U.S. Congress) — nonpartisan policy research and analysis on every area of public policy; government work
World Bank Open Data (World Bank Group) — global economic, financial, and development data and research publications

History, Arts & Humanities

Library of Congress (U.S. Congress) — digitised historical collections, manuscripts, maps, photographs, and published works; government work and CC0
Europeana (European Union) — cultural heritage collections from 3,000+ European museums, libraries, and archives; CC0 and CC-BY
Smithsonian Open Access (Smithsonian Institution) — digitised collections from 19 Smithsonian museums and research centres; CC0

Philosophy

Stanford Encyclopedia of Philosophy (Stanford University) — peer-reviewed reference work with expert-authored entries on every area of philosophy; CC-BY

Free Open Textbooks

OpenStax (Rice University) — 127 free, peer-reviewed college and high-school textbooks across biology, chemistry, physics, anatomy & physiology, microbiology, nursing, anthropology, psychology, sociology, history, economics, statistics, calculus, business, computer science, philosophy, writing, and more. Approximately 12,000 chapter-level sources indexed. CC BY 4.0.

Universal Scholarly Discovery

OpenAlex (OurResearch, non-profit) — open index of 250 million+ scholarly works, authors, institutions, and citations across all disciplines; CC0

Public Domain Primary Texts

Project Gutenberg — 70,000+ public domain books in full text, including primary philosophical works, literary classics, historical documents, and early scientific writing; all content is public domain (copyright: false)

Limitations

Open-access coverage varies by field. Some specialized topics may have fewer available sources, particularly in areas where most authoritative research is behind paywalls. When fewer than two verified sources are found for too many stations in a path, generation is halted before any content is written and your Gold Tickets are refunded in full. We will never generate Premium content without sufficient source grounding.
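The halting rule can be read as a simple pre-generation check. In the sketch below, the two-source minimum comes from the text above, while the share of thin stations that counts as "too many" (here one quarter) and the function name should_halt are assumptions, since the actual threshold is not stated.

```python
def should_halt(sources_per_station, min_sources=2, max_thin_fraction=0.25):
    """Decide whether to stop before generation.

    sources_per_station: verified source count for each station in the path.
    max_thin_fraction is a hypothetical stand-in for "too many stations".
    """
    if not sources_per_station:
        # Nothing to ground a path on, so do not generate.
        return True
    thin = sum(1 for count in sources_per_station if count < min_sources)
    return thin / len(sources_per_station) > max_thin_fraction
```

Because the check only needs source counts, it can run before any content is written, which is what allows a full refund with nothing generated.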

Arts, humanities, and cultural heritage categories (Music, Architecture, Visual Arts, History, Literature) draw from Europeana, the Smithsonian, and the Library of Congress. These databases return collection metadata — catalogue records, accession notes, digitised manuscripts — rather than explanatory scholarly prose. This content functions as primary source context: it grounds claims in real artefacts and documented history, but it is not the same as a peer-reviewed analysis of technique or style. Paths in these categories are informed by authentic primary sources; they are not synthesised from academic literature in the same way as science or engineering paths.

Our verification system reduces but does not eliminate errors. If you find a factual inaccuracy, please report it via our errata page.

Browse learning paths by category

Learning Whistle publishes paths across 20 subject areas. Each category page lists all published paths, with links to read, quiz, and earn miles.

Learning Whistle — Your Ticket to Knowledge · learn@learningwhistle.com