Brain Language Metrics on Company Filings for 6000+ US Stocks

The Brain Language Metrics on Company Filings (BLMCF) dataset monitors several language metrics on the 10-K and 10-Q company reports of more than 6000 US stocks. Recent literature claims that the market responds inefficiently to the information in company filings because of the increased complexity and length of such reports (see, for example, “Lazy Prices”, Cohen et al. 2018, or “The Positive Similarity of Company Filings and the Cross-Section of Stock Returns”, M. Padysak 2020). Our dataset is made of two parts; the first one includes the language metrics of the most recent 10-K or 10-Q report for each firm, namely:
1. Financial sentiment
2. Percentage of words belonging to the financial domain, classified by language type, e.g. “Constraining” language or “Litigious” language
Factsheet: https://braincompany.co/assets/files/BLM_CF_V2_summary.pdf
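As a rough illustration of what a "percentage of words by language type" metric can look like, the sketch below counts how many tokens of a text fall in a word list. The word list, the tokenizer and the function name are assumptions made for this example only; the listing does not state which dictionaries or preprocessing Brain actually uses.

```python
import re

# Hypothetical toy word list, for illustration only; the dictionaries
# behind the real dataset are not given in this listing.
CONSTRAINING_WORDS = {"required", "obligation", "covenant", "restrict", "must"}


def language_type_percentage(text: str, word_list: set) -> float:
    """Return the share of tokens in `text` found in `word_list`, as a percentage."""
    # Simple lowercase tokenizer (an assumption, not the vendor's method).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return 100.0 * hits / len(tokens)


sample = "The company must meet every covenant and obligation under the loan."
print(round(language_type_percentage(sample, CONSTRAINING_WORDS), 1))
```

A production metric would likely need stemming or an explicit list of inflected forms; this sketch matches exact tokens only.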

N/A

countries

15%

popularity

Brain Language Metrics on Company Filings (History Trial)

The Brain Language Metrics on Company Filings (BLMCF) dataset monitors several language metrics on the 10-K and 10-Q company reports of more than 6000 US stocks. Examples of metrics are financial sentiment, the percentage of a specific language type in the document (e.g. litigious language) and the similarity among documents. This extended version provides additional language metrics and an analysis of the whole report together with specific report sections (e.g. Risk Factors).

N/A

countries

76%

popularity

Brain Language Metrics on Earnings Calls Transcripts - Live Feed

The Brain Language Metrics on Earnings Calls Transcripts (BLMECT) dataset monitors several language metrics for the quarterly earnings call transcripts of 4500+ US stocks. It contains historical data from January 2012 and live data updated daily by 12pm UTC.

DATASET STRUCTURE AND KEY FIELDS

The dataset consists of a single schema, "LANGUAGE_METRICS_EARNINGS_CALLS", and can be logically divided into two parts. For both parts the metrics are reported separately for the following sections of the earnings call:
a. Management Discussion (MD)
b. Analysts’ Questions (AQ)
c. Management Answers to Analysts’ Questions (MA)

Part one includes several language metrics for each section of the most recent earnings call transcript of each stock. It is saved in the table "METRICS_EARNINGS_CALL". The key metrics of part one are:
1. Financial sentiment; e.g. the field "MD_SENTIMENT" refers to the financial sentiment of section MD.
2. Percentage of words belonging to the financial domain, classified by language type:
- “Constraining” language; e.g. the field "MD_SCORE_CONSTRAINING" refers to the percentage of financial domain constraining language in section MD of the last available transcript;
- “Litigious” language; e.g. the field "MD_SCORE_LITIGIOUS" refers to the percentage of financial domain litigious language in section MD of the last available transcript;
- “Uncertainty” language; e.g. the field "MD_SCORE_UNCERTAINTY" refers to the percentage of financial domain uncertainty language in section MD of the last available transcript.
3. Readability score; e.g. the field "MD_READABILITY" refers to the reading grade level of the MD section of the last available transcript.
4. Lexical metrics such as lexical density and richness of text; e.g. the field "MD_LEXICAL_RICHNESS" refers to the lexical richness of the MD section of the last available transcript.
5. Text statistics such as the transcript length; e.g. the field "MD_N_CHARACTERS" refers to the length of the section “Management Discussion” measured in number of characters.

Part two includes the differences between the most recent earnings call transcript and the previous one. It is saved in the table "DIFFERENCES_EARNINGS_CALLS". The key metrics of part two are:
1. Difference of the various language metrics; e.g. the field "MD_DELTA_SENTIMENT" refers to the difference in financial sentiment between the MD section of the last available transcript and the same section of the previous transcript.
2. Similarity metrics between documents, also with respect to a specific language type, for example similarity with respect to “litigious” language or “uncertainty” language; e.g. the field "MD_SIMILARITY_UNCERTAINTY" refers to the similarity in terms of financial domain “uncertainty” language between the MD section of the last available transcript and the same section of the previous transcript.

FACTSHEET

Link to factsheet: https://braincompany.co/assets/files/BLM_ECT_summary.pdf

DISCLAIMER

The content of this dataset is not intended as investment advice. The material is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Brain. Brain makes no guarantees regarding the accuracy and completeness of the information expressed in the dataset.
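The field names quoted under DATASET STRUCTURE AND KEY FIELDS all start with a section prefix (MD in every example given). A minimal sketch of working with that naming pattern follows. It assumes the AQ and MA sections use the same `<SECTION>_<METRIC>` pattern, which the description suggests but does not spell out; the helper names and the values in `example_row` are invented for illustration.

```python
# Section prefixes as listed in the dataset description.
SECTIONS = {
    "MD": "Management Discussion",
    "AQ": "Analysts' Questions",
    "MA": "Management Answers to Analysts' Questions",
}


def field_name(section: str, metric: str) -> str:
    """Build a column name such as MD_SENTIMENT (pattern assumed for AQ and MA)."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section prefix: {section!r}")
    return f"{section}_{metric}"


def section_metrics(row: dict, section: str) -> dict:
    """Return the metrics of one section, with the section prefix stripped."""
    prefix = field_name(section, "")  # e.g. "MD_"
    return {
        key[len(prefix):]: value
        for key, value in row.items()
        if key.startswith(prefix)
    }


# Made-up values, not real data.
example_row = {"MD_SENTIMENT": 0.25, "MD_SCORE_UNCERTAINTY": 1.5, "AQ_SENTIMENT": -0.1}
print(field_name("MD", "SENTIMENT"))
print(section_metrics(example_row, "MD"))
```

The data dictionary is the place to confirm the exact column names before relying on this pattern.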

N/A

countries

69%

popularity

Brain Language Metrics on Earnings Calls - 4500+ US Stocks

The exploitation of unstructured textual content (news, company filings, earnings calls etc.) in financial analysis is quickly expanding across both quantitative and discretionary strategies, as demonstrated by the growing number of academic papers and products in this domain. The Brain Language Metrics on Earnings Calls Transcripts (BLMECT) dataset monitors several language metrics on the quarterly earnings call transcripts of 4500+ US stocks.

The dataset is made of two parts. The first includes the language metrics for the most recent earnings call transcript of each stock, namely:
1) Financial sentiment
2) Percentage of words belonging to the financial domain, classified by language type:
- “Constraining” language
- “Litigious” language
- “Uncertainty” language
3) Readability score
4) Lexical metrics such as lexical density and richness
5) Text statistics such as the report length and the average sentence length

The second part includes the differences between the most recent earnings call transcript and the previous one:
1) Difference of the various language metrics (e.g. delta sentiment, delta readability score, delta percentage of a specific language type etc.)
2) Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)

The metrics are reported separately for the following sections of the transcript:
a) Management Discussion
b) Analysts’ Questions
c) Management Answers to Analysts’ Questions

The dataset is updated daily, since new earnings call transcripts are published every day for some of the stocks in the universe. The data for an individual stock changes on a quarterly basis, when its new earnings call is published. The historical dataset is available from 2012.

Factsheet: https://braincompany.co/assets/files/BLM_ECT_summary.pdf
Data dictionary: https://braincompany.co/assets/files/BLM_ECT_data_dictionary.pdf
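The "difference" metrics described above can be pictured as a per-metric subtraction between two consecutive transcripts. The sketch below assumes the sign convention latest minus previous and uses invented metric names and values; the listing does not specify how Brain computes the deltas, and the similarity metrics are not covered here because their definition is not given.

```python
def metric_deltas(latest: dict, previous: dict) -> dict:
    """Return latest - previous for every metric present in both transcripts.

    The sign convention (latest minus previous) is an assumption.
    """
    return {
        f"delta_{name}": latest[name] - previous[name]
        for name in latest
        if name in previous
    }


# Made-up values for two consecutive transcripts of the same stock.
previous_call = {"sentiment": 0.5, "readability": 10.5}
latest_call = {"sentiment": 0.25, "readability": 11.0, "n_characters": 42000}

print(metric_deltas(latest_call, previous_call))
```

Metrics missing from either transcript are skipped rather than treated as zero, so a newly added metric does not produce a misleading delta.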

N/A

countries

15%

popularity

Brain Language Metrics on Earnings Calls Transcripts (History Trial)

The Brain Language Metrics on Earnings Calls Transcripts (BLMECT) dataset monitors several language metrics for the quarterly earnings call transcripts of 4500+ US stocks. With this dataset we aim to provide asset managers with additional building blocks for investment strategies based on alternative data.

N/A

countries

77%

popularity

Brain Language Metrics on Company Filings - Live Feed

Market Insights

Financial Data

Security Data
