ZENPULSAR’s PUMP tracks mentions of assets on social media and evaluates their popularity. This dataset provides information about trending assets across multiple social media platforms, as well as how popularity changes among different groups of users: influencers, bots, and retail investors.

ZENPULSAR’s data-centric AI platform “PUMP” monitors multiple social media networks in real time to track and analyse activity related to financial and crypto assets. It detects emerging viral narratives that are likely to form trends and impact financial assets. PUMP filters out social media noise with speed and accuracy, and identifies viral narratives related to the assets you track: early signals you can spot and act on before the crowd.

ZENPULSAR’s technology is also used by a variety of clients to manage critical events such as product launches, policy platform developments, reputation crisis management, and disinformation campaigns.

ZENPULSAR’s PUMP Social Media Momentum provides data about the popularity of monitored assets from different classes (Equities, Crypto, Commodities, FX, and Fixed Income) on seven major social media platforms: Twitter, Reddit, Seeking Alpha, Facebook, LinkedIn, Telegram, and Weibo. Popularity is measured using several metrics: audience reach and the number of posts, comments, likes, and reposts. The dataset makes it possible to rank assets not only by number of mentions, but also by sentiment and by how strongly the audience engages with narratives related to the asset.

The dataset can be filtered by the following parameters:
● Asset name/Ticker;
● Social media networks (Twitter, Reddit, Seeking Alpha, and Telegram);
● Type of accounts (bots or influencers);
● Sentiment (bullish or bearish);
● Time frame.
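The filter parameters listed above, and the idea of ranking by engagement rather than raw mention counts, can be illustrated with a minimal sketch. The actual PUMP schema and delivery format are not described here, so the `MentionRecord` fields, the `filter_records` and `rank_by_engagement` helpers, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record layout; the real dataset schema may differ.
@dataclass(frozen=True)
class MentionRecord:
    ticker: str
    network: str        # e.g. "Twitter", "Reddit"
    account_type: str   # e.g. "bot", "influencer"
    sentiment: str      # "bullish" or "bearish"
    timestamp: datetime
    mentions: int
    engagement: int     # assumed here: comments + likes + reposts


def filter_records(records, ticker=None, networks=None, account_type=None,
                   sentiment=None, start=None, end=None):
    """Keep records that match every filter that is not None.

    The time frame is treated as half-open: start <= timestamp < end.
    """
    selected = []
    for r in records:
        if ticker is not None and r.ticker != ticker:
            continue
        if networks is not None and r.network not in networks:
            continue
        if account_type is not None and r.account_type != account_type:
            continue
        if sentiment is not None and r.sentiment != sentiment:
            continue
        if start is not None and r.timestamp < start:
            continue
        if end is not None and r.timestamp >= end:
            continue
        selected.append(r)
    return selected


def rank_by_engagement(records):
    """Sum engagement per ticker and return (ticker, total) pairs, highest first."""
    totals = defaultdict(int)
    for r in records:
        totals[r.ticker] += r.engagement
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample rows, for illustration only.
SAMPLE = [
    MentionRecord("BTC", "Twitter", "influencer", "bullish", datetime(2023, 1, 1, 9), 120, 900),
    MentionRecord("BTC", "Reddit", "bot", "bullish", datetime(2023, 1, 1, 10), 300, 150),
    MentionRecord("TSLA", "Twitter", "influencer", "bearish", datetime(2023, 1, 1, 11), 80, 1200),
    MentionRecord("ETH", "Telegram", "influencer", "bullish", datetime(2023, 1, 2, 9), 60, 400),
]

influencer_day = filter_records(
    SAMPLE,
    account_type="influencer",
    start=datetime(2023, 1, 1),
    end=datetime(2023, 1, 2),
)
print(rank_by_engagement(influencer_day))
```

In this made-up sample, BTC has the most mentions overall, but TSLA ranks first among influencer posts once engagement is used as the ranking key, which is the distinction the dataset description draws.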
Data analytics methodology

Selection of asset-relevant social media posts: This task is done through iterative use of information retrieval methods such as keyword extraction and topic modelling (LDA, BERTopic, etc.). For each asset, we extract the keywords that people commonly use to refer to it. Anyone who wants to influence public opinion on an asset has to name it, whether by a relevant code or by a common name, so the keywords they choose help us identify those posts. We also use fine-tuned models that help us verify content on financial topics. By combining these methods and models, we can focus on the data needed to seek alpha or to identify critical events from different influencers.

Financial-related classification: To filter the key samples out of large volumes of posts and news, we employ a state-of-the-art NLP model (XLM-RoBERTa). Pre-trained models already exist for news about traditional assets such as bonds, FX, and stocks. Using weakly supervised learning and additional internal data on less traditional assets such as crypto (added via techniques such as pseudo-labelling), we fine-tune the classifier to achieve high accuracy and precision. The classifier is binary: it predicts whether or not a post is related to finance.

Account classification: To classify an account as a bot or as an authentic user, we apply a combination of the following techniques:
● NLP-based content analysis: we employ transformer models such as Google mT5 or XLM-RoBERTa trained on bot post datasets.
● Heuristics-based features (speed of posting, statistical characteristics based on NER analysis results, etc.). These features are fed to a support vector machine (SVM) classifier.
● The format of recent posts from the same user. Many bots build their posts from templates, assembling text and then transforming it; the model extracts features from these patterns to improve classification.
● Analysis of network topology (bot accounts have a different topology from human accounts), specifically the centrality characteristics of an account within the account network (betweenness centrality, Katz centrality, PageRank).

To classify an account as an influencer, a market analyst, or an abnormal user, we apply a combination of the following techniques:
● NLP-based content analysis: transformer models such as Google mT5 or XLM-RoBERTa trained on influencer post datasets.
● Analysis of the account's position in the follower network, specifically its centrality characteristics (betweenness centrality, Katz centrality, PageRank, eigenvector centrality).
● Number of followers/Reddit karma thresholds.

Sentiment detection: We utilise transformer-based models (FinBERT, CryptoBERT, and CryptoRoBERTa) fine-tuned on our internal datasets. The models are trained on cryptocurrency and stock data collected from social media, and the classifier outputs one of three classes: bearish, neutral, or bullish.
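To make the three-class sentiment output concrete, the sketch below shows one way per-post class probabilities could be turned into labels and then into a single score per asset. It does not call any of the models named above: the probability tuples are made up, and the `predict_label` and `net_sentiment` helpers, along with the net-score formula (bullish share minus bearish share), are assumptions chosen for illustration rather than ZENPULSAR's method.

```python
from collections import Counter

# Class order is an assumption; a real classifier defines its own label mapping.
LABELS = ("bearish", "neutral", "bullish")


def predict_label(probs):
    """Return the label with the highest probability (argmax over three classes)."""
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best]


def net_sentiment(labels):
    """Illustrative score in [-1, 1]: (bullish - bearish) / total posts."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["bullish"] - counts["bearish"]) / total


# Made-up classifier outputs for four posts about one asset.
post_probs = [
    (0.10, 0.20, 0.70),
    (0.60, 0.30, 0.10),
    (0.20, 0.50, 0.30),
    (0.05, 0.15, 0.80),
]

post_labels = [predict_label(p) for p in post_probs]
print(post_labels)                 # ['bullish', 'bearish', 'neutral', 'bullish']
print(net_sentiment(post_labels))  # 0.25
```

Neutral posts count towards the denominator in this sketch, so an asset with many neutral mentions scores closer to zero than one with the same bullish/bearish split and fewer neutral posts.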