ZENPULSAR’s data-centric AI platform “PUMP” monitors multiple social media networks in real time to track and analyse activity related to financial and crypto assets. It detects emerging viral narratives that are likely to form trends and affect financial assets. PUMP is designed to filter out social media noise quickly and accurately.
It identifies viral narratives related to the assets you track, providing early signals that can be spotted and acted on ahead of the crowd. ZENPULSAR’s technology is also used by a variety of clients to manage critical events such as product launches, policy platform developments, reputation crises, and disinformation campaigns.
We provide time-series social media data relevant to selected assets. The data is extracted from Twitter, Reddit, Seeking Alpha and Telegram.
The data provided can be split into four categories:
1. Data describing sentiment of social media posts
1a. Number of social media posts with bullish/bearish sentiment towards a target asset per period
1b. Number of upvotes/downvotes, likes, replies, comments, cross-posts of the posts with bullish/bearish sentiment towards a target asset per period
2. Data describing activity of social media accounts
2a. Number of social media posts per period
3. Data describing engagement of social media accounts
3a. Number of likes and upvotes/downvotes per period
3b. Number of replies and comments to the posts per period
3c. Number of retweets and cross-posts per period
4. Data describing credibility of social media accounts
4a. Number of social media posts made by accounts identified as bots/non-bots per period
4b. Number of upvotes/downvotes, likes, replies, comments, cross-posts of the posts made by accounts identified as bots/non-bots per period
4c. Number of social media posts made by accounts identified as influencers/market analysts per period
4d. Number of upvotes/downvotes, likes, replies, comments, cross-posts of the posts made by accounts identified as influencers/market analysts per period
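As an illustration of how per-period series like those in categories 1 and 4 could be derived from raw posts, here is a minimal sketch. The record fields (`ts`, `sentiment`, `likes`, `is_bot`), the hourly period and the sample values are assumptions made for this example, not the actual PUMP schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post records; field names and values are illustrative only.
posts = [
    {"ts": "2023-01-01T10:15:00", "sentiment": "bullish", "likes": 12, "is_bot": False},
    {"ts": "2023-01-01T10:45:00", "sentiment": "bearish", "likes": 3, "is_bot": True},
    {"ts": "2023-01-01T11:05:00", "sentiment": "bullish", "likes": 7, "is_bot": False},
]

def hourly_bucket(ts: str) -> str:
    """Truncate an ISO timestamp to the start of its hour (the assumed period)."""
    dt = datetime.fromisoformat(ts)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()

def aggregate(posts):
    """Build per-period counters keyed by (period, attribute)."""
    post_counts = Counter()  # (period, sentiment) -> number of posts (cf. 1a)
    like_counts = Counter()  # (period, sentiment) -> total likes (cf. 1b)
    bot_counts = Counter()   # (period, is_bot) -> number of posts (cf. 4a)
    for p in posts:
        period = hourly_bucket(p["ts"])
        post_counts[(period, p["sentiment"])] += 1
        like_counts[(period, p["sentiment"])] += p["likes"]
        bot_counts[(period, p["is_bot"])] += 1
    return post_counts, like_counts, bot_counts

post_counts, like_counts, bot_counts = aggregate(posts)
```

The same pattern extends to the other engagement measures (replies, comments, cross-posts) by adding one counter per measure.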
Data analytics methodology
Selection of asset-relevant social media posts:
This task is performed by iteratively applying information retrieval methods such as keyword extraction and topic modelling (LDA, BERTopic, etc.). For each asset, we extract the keywords that people commonly use to refer to it. Anyone who wants to influence public opinion on an asset must name that asset explicitly, for example with its ticker code or a common name, so these keywords help us to identify their posts. We also use fine-tuned models to help verify that a post genuinely concerns a financial topic. By combining these methods and models, we can focus on the data that is useful for seeking alpha or for identifying critical events driven by different influencers.
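The keyword-matching step can be sketched as follows. This is a simplified illustration only: the keyword lists are hand-written assumptions, whereas the approach described above derives them through keyword extraction and topic modelling.

```python
import re

# Hypothetical keyword lists for two assets (tickers, cashtags, common names).
ASSET_KEYWORDS = {
    "BTC": ["bitcoin", "btc", "$btc"],
    "TSLA": ["tesla", "tsla", "$tsla"],
}

def compile_patterns(keywords):
    """Compile one case-insensitive pattern per asset."""
    patterns = {}
    for asset, words in keywords.items():
        # Longest keywords first so "$btc" is preferred over "btc".
        ordered = sorted(words, key=len, reverse=True)
        alternatives = "|".join(re.escape(w) for w in ordered)
        # Lookarounds instead of \b so keywords starting with "$" still match,
        # while substrings inside longer words are rejected.
        patterns[asset] = re.compile(
            rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE
        )
    return patterns

PATTERNS = compile_patterns(ASSET_KEYWORDS)

def relevant_assets(text):
    """Return the sorted list of assets mentioned in a post."""
    return sorted(asset for asset, p in PATTERNS.items() if p.search(text))
```

For example, `relevant_assets("Is $BTC about to break out?")` returns `["BTC"]`, while a word that merely contains the letters "btc" is not matched.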
Financial-related classification:
To filter the key samples out of large volumes of posts and news, we employ a state-of-the-art NLP model (XLM-RoBERTa). Pre-trained models already exist for news about traditional assets such as bonds, FX, and stocks. By using weakly supervised learning and additional internal data on less traditional assets such as crypto (added via techniques such as pseudo-labelling), our fine-tuned classifier achieves high accuracy and precision. This is a binary classifier that predicts whether or not a post is related to finance.
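Pseudo-labelling, mentioned above, generally means keeping only the unlabelled examples on which the current classifier is confident and adding them to the training set with the predicted label. A minimal sketch of that selection step follows. The `toy_predict_proba` scorer, its word list and the 0.9 threshold are stand-ins invented for this example; in practice the probability would come from the fine-tuned transformer classifier.

```python
def pseudo_label(unlabelled, predict_proba, threshold=0.9):
    """Return (text, label) pairs the classifier is confident about.

    Label 1 means finance-related, label 0 means not finance-related.
    Posts with a probability between the two cut-offs are left unlabelled.
    """
    selected = []
    for text in unlabelled:
        p = predict_proba(text)  # probability that the post is finance-related
        if p >= threshold:
            selected.append((text, 1))
        elif p <= 1 - threshold:
            selected.append((text, 0))
    return selected

def toy_predict_proba(text):
    """Stand-in scorer (an assumption): counts a few finance terms."""
    finance_terms = {"token", "yield", "staking", "price"}
    hits = len(set(text.lower().split()) & finance_terms)
    return min(1.0, 0.05 + 0.45 * hits)

candidates = ["staking yield on this token", "nice weather today", "price maybe"]
pseudo_labelled = pseudo_label(candidates, toy_predict_proba)
```

Here the first post is kept as finance-related, the second as not finance-related, and the third is skipped because the scorer is not confident either way.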
Account classification:
To classify an account as a bot or as an authentic user, we apply a combination of the following techniques:
- NLP-based content analysis - we employ transformer models such as Google mT5 or XLM-RoBERTa trained on bot post datasets.
- Heuristics-based features (speed of posting, statistical characteristics based on NER analysis results, etc.). These features are fed to a support vector machine (SVM) classifier.
- The format of recent posts from the same user. Many bots build their posts from templates, assembling and transforming pieces of text. Features extracted from the post format help improve the classifier.
- Analysis of network topology (bot accounts have a different topology from human accounts), specifically the centrality characteristics of an account within the account network (betweenness centrality, Katz centrality, PageRank).
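Two of the signals listed above, posting speed and templated post formats, can be turned into numeric features along the following lines. This is a rough sketch using only the standard library. The feature definitions, and the use of `difflib` as a similarity measure, are assumptions for illustration rather than the features actually fed to the SVM.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean, pstdev

def posting_features(timestamps):
    """Gap statistics for sorted posting times in seconds (needs >= 2 posts).

    Very short or very regular gaps are a common sign of automation.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {"mean_gap": mean(gaps), "gap_std": pstdev(gaps)}

def template_similarity(posts):
    """Mean pairwise similarity (0..1) of recent posts (needs >= 2 posts).

    Values near 1 suggest the posts were generated from a shared template.
    """
    pairs = combinations(posts, 2)
    return mean(SequenceMatcher(None, a, b).ratio() for a, b in pairs)

regular = posting_features([0, 60, 120, 180])
templated = template_similarity(["Buy XYZ now! 10% up", "Buy ABC now! 12% up"])
organic = template_similarity(["Buy XYZ now! 10% up", "lovely walk in the park today"])
```

An account posting exactly every 60 seconds yields a gap standard deviation of zero, and the templated pair scores a higher similarity than the unrelated pair.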
To classify an account as an influencer, a market analyst, or an abnormal user, we apply a combination of the following techniques:
- NLP-based content analysis - transformer models such as Google mT5 or XLM-RoBERTa trained on influencer post datasets.
- Analysis of an account's position within the follower network, specifically its centrality characteristics (betweenness centrality, Katz centrality, PageRank, eigenvector centrality).
- Thresholds on the number of followers/Reddit karma.
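To show how a centrality score and a follower threshold might be combined, here is a minimal PageRank sketch by power iteration on a small follower graph. The graph, the `min_followers` and `min_rank` cut-offs, and the helper name `is_influencer_candidate` are hypothetical. A production system would more likely use a graph library and tuned thresholds.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """PageRank by power iteration.

    follows maps each account to the accounts it follows, so an edge
    follower -> followed passes rank to the followed account.
    """
    nodes = set(follows) | {v for targets in follows.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = follows.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling account (follows nobody): spread its rank uniformly.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

def is_influencer_candidate(account, rank, followers,
                            min_followers=10_000, min_rank=0.3):
    """Combine a follower-count threshold with a centrality threshold."""
    return followers.get(account, 0) >= min_followers and rank[account] >= min_rank

# Toy follower graph: three accounts all follow "hub".
follows = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": []}
followers = {"hub": 50_000, "a": 120}
rank = pagerank(follows)
```

In this toy graph "hub" receives the highest rank and passes both thresholds, while "a" fails both.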
Sentiment detection:
We use transformer-based models (FinBERT, CryptoBERT and CryptoRoBERTa) fine-tuned on our internal datasets. The models were trained on cryptocurrency and stock data collected from social media, and the classifier outputs one of three classes: bearish, neutral, or bullish.
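A three-class classifier of this kind typically emits one raw score (logit) per class, which is converted to probabilities with a softmax before the most likely class is chosen. The sketch below shows that final step only. The label order and the example logits are assumptions, and no actual model is loaded.

```python
import math

# Assumed label order; the real models may order their outputs differently.
LABELS = ("bearish", "neutral", "bullish")

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return the most likely sentiment label and its probability."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = classify([0.1, 0.2, 2.5])
```

With these example logits the third class dominates, so `label` is `"bullish"`.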