The Brain Language Metrics on Earnings Calls Transcripts (BLMECT) dataset monitors several language metrics for the quarterly earnings call transcripts of 4500+ US stocks.
The dataset contains historical data from January 2012 and live data updated daily by 12pm UTC.
DATASET STRUCTURE AND KEY FIELDS
The dataset consists of a single schema, "LANGUAGE_METRICS_EARNINGS_CALLS", and can be logically divided into two parts.
For both parts, the metrics are reported separately for the following sections of the earnings call:
a. Management Discussion (MD)
b. Analysts’ Questions (AQ)
c. Management Answers to Analysts’ Questions (MA)
Part one includes several language metrics for each section of the most recent earnings call transcript for each stock and it is saved in the table "METRICS_EARNINGS_CALL".
The key metrics of part one are:
1. Financial sentiment; e.g. the field "MD_SENTIMENT" refers to the financial sentiment of section MD.
2. Percentage of words belonging to the financial domain, classified by language type:
- “Constraining” language (e.g. the field "MD_SCORE_CONSTRAINING" refers to the percentage of financial domain constraining language in section MD of the last available transcript);
- “Litigious” language (e.g. the field "MD_SCORE_LITIGIOUS" refers to the percentage of financial domain litigious language in section MD of the last available transcript);
- “Uncertainty” language (e.g. the field "MD_SCORE_UNCERTAINTY" refers to the percentage of financial domain uncertainty language in section MD of the last available transcript).
3. Readability score (e.g. the field MD_READABILITY refers to the reading grade level for the MD section of the last available transcript).
4. Lexical metrics such as lexical density and richness of text (e.g. the field MD_LEXICAL_RICHNESS refers to the lexical richness of the MD section of the last available transcript).
5. Text statistics such as the transcript length (e.g. the field MD_N_CHARACTERS refers to the length of the section “Management Discussion” measured in number of characters).
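The example fields above suggest a naming pattern of section prefix plus metric name. The following Python sketch builds field names on that basis. It assumes that the AQ and MA sections use the same pattern as the MD examples, which the description does not state explicitly; the helper `field_name` and the constants `SECTIONS` and `METRICS` are hypothetical names introduced only for illustration.

```python
# Section prefixes listed in the dataset description.
SECTIONS = {
    "MD": "Management Discussion",
    "AQ": "Analysts' Questions",
    "MA": "Management Answers to Analysts' Questions",
}

# Metric suffixes taken from the MD example fields in the description.
# The full list of metrics in the table may be longer.
METRICS = (
    "SENTIMENT",
    "SCORE_CONSTRAINING",
    "SCORE_LITIGIOUS",
    "SCORE_UNCERTAINTY",
    "READABILITY",
    "LEXICAL_RICHNESS",
    "N_CHARACTERS",
)


def field_name(section: str, metric: str) -> str:
    """Build a column name, assuming the pattern <SECTION>_<METRIC>."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    return f"{section}_{metric}"


# Field names for the Management Discussion section.
md_fields = [field_name("MD", metric) for metric in METRICS]
print(md_fields)
```

Field names generated for AQ and MA should be checked against the actual columns of the table before use.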
Part two includes the differences between the most recent earnings call transcript and the previous one: it is saved in the table "DIFFERENCES_EARNINGS_CALLS".
The key metrics of part two are:
1. Difference of the various language metrics; e.g. the field MD_DELTA_SENTIMENT refers to the difference of financial sentiment between the MD section of the last available transcript and the same section of the previous transcript.
2. Similarity metrics between documents, also with respect to a specific language type such as “litigious” or “uncertainty” language; e.g. the field MD_SIMILARITY_UNCERTAINTY refers to the similarity in terms of financial domain “uncertainty” language between the MD section of the last available transcript and the same section of the previous transcript.
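As a rough illustration of how a difference field relates to the part one metrics, the Python sketch below derives MD_DELTA_SENTIMENT from two sets of section metrics. It assumes the difference is computed as latest minus previous, which the description does not specify; the function `delta_fields` and the numeric values are hypothetical and are not taken from the dataset.

```python
def delta_fields(latest: dict, previous: dict, section: str = "MD",
                 metrics: tuple = ("SENTIMENT",)) -> dict:
    """Compute <SECTION>_DELTA_<METRIC> values from two transcripts.

    Assumption: delta = latest value minus previous value.
    """
    deltas = {}
    for metric in metrics:
        source_key = f"{section}_{metric}"
        deltas[f"{section}_DELTA_{metric}"] = latest[source_key] - previous[source_key]
    return deltas


# Hypothetical sentiment values for two consecutive earnings calls.
latest_call = {"MD_SENTIMENT": 0.5}
previous_call = {"MD_SENTIMENT": 0.25}

deltas = delta_fields(latest_call, previous_call)
print(deltas)
```

Under this sign convention, a positive delta would indicate that the metric increased relative to the previous call.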
FACTSHEET
Link to factsheet: https://braincompany.co/assets/files/BLM_ECT_summary.pdf
DISCLAIMER
The content of this dataset is not intended as investment advice. The material is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Brain. Brain makes no guarantees regarding the accuracy and completeness of the information expressed in the dataset.