Benefits: Our data is based on qualitative signals in the text itself, not on metadata. We indicate how likely a piece of content is to be computer-generated or human-written.

Format & attributes: CSV with UID + score + confidence.

Coverage: Currently ~4,000 news outlets (can be customised/expanded on request). English and Spanish available.

Scale: From 1 to 100,000 rows at a time.

Content: Any text in English or Spanish, from a single sentence to articles thousands of words in length.

Data uniqueness: We use custom-built and trained NLP algorithms to assess human-effort metrics that are inherent in text content. We focus on what's in the text, ignoring metadata such as author, publication, date, word count, shares and so on, to provide a clean and maximally unbiased assessment of how much human effort has been invested in content. Our AI algorithms are co-created by NLP and journalism experts, and our datasets have all been human-reviewed and labeled.

Dataset: Provided in CSV/RSS/JSON format; one row = one scored article. Each row contains the URL and/or body text, with the attributed score as an integer and model confidence as a percentage. Integrity indicators are provided as integers on a 1–5 scale. Custom models with 35 categories can be added on request.

Data sourcing: Public websites, crawlers, scrapers and other partnerships where available. We can generally assess content both behind and outside paywalls. We source from ~4,000 news outlets, including Bloomberg, CNN and the BBC.

Countries: All English-speaking markets worldwide, including English-language content from non-English-majority regions such as Germany, Scandinavia and Japan. Also available in Spanish on request.

Use cases: Assessing the implicit integrity and reliability of an article.
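To illustrate the dataset layout described above, here is a minimal sketch of consuming one of the CSV deliveries: one row per scored article, with URL and/or body text, an integer score, model confidence as a percentage, and a 1–5 integrity indicator. The column names and sample values are illustrative assumptions, not the vendor's actual schema.

```python
import csv
import io

# Hypothetical sample in the described format; headers and values are
# assumptions for illustration only, not the real delivery schema.
sample = """uid,url,score,confidence,integrity
a1b2,https://example.com/feature-article,87,96,5
c3d4,https://example.com/wire-copy,12,91,2
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    score = int(row["score"])             # attributed score as an integer
    confidence = int(row["confidence"])   # model confidence, percentage
    integrity = int(row["integrity"])     # integrity indicator, 1-5 scale
    print(row["uid"], score, confidence, integrity)
```

Because each row is a self-contained scored article, batches of anywhere from 1 to 100,000 rows can be processed with the same loop.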
There is a correlation between integrity and human value: we have shown that articles scoring highly on our scales sustain higher end-user engagement over time. Clients also use this to assess journalistic output and publication relevance, and to create datasets of 'quality' journalism. Overtone provides a range of qualitative metrics for journalistic, newsworthy and long-form content. We find, highlight and synthesise content that shows added human effort and, by extension, added human value.