Data Processing and Transformation
Once data is ingested, we perform a series of transformations to normalize, clean, and enrich it.
Normalization & Enrichment
Symbol Mapping: Reconciling identifiers across sources, e.g., the ticker “AAPL” from one provider and the listing “Apple Inc.” from another both resolve to the same canonical entity. Crypto tickers can also differ across exchanges (e.g., “XBT” vs. “BTC” for Bitcoin), so we unify them under standard references.
Timestamp Alignment: Market data can arrive in different time zones or with slight delays; aligning all feeds to a single reference clock keeps cross-source correlations accurate.
Missing Data Handling: Forward-fill or interpolation for minor gaps. If entire data blocks are missing, we might mark them with special flags to avoid misleading results. (A combined sketch of these steps follows this list.)
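A minimal sketch of these normalization steps, assuming pandas, a simple provider-ticker map, UTC as the canonical time zone, and illustrative column names ("timestamp", "symbol", "price"); the forward-fill gap threshold is likewise an assumption.

```python
# Sketch of symbol mapping, timestamp alignment, and gap handling with pandas.
# The ticker map, column names, and gap threshold are illustrative assumptions.
import pandas as pd

# Hypothetical mapping from provider-specific tickers to canonical symbols.
TICKER_MAP = {
    "AAPL": "AAPL.US",       # equities provider A
    "APPLE INC": "AAPL.US",  # equities provider B
    "XBT": "BTC",            # some exchanges list Bitcoin as XBT
    "BTC": "BTC",
}

def normalize(frame: pd.DataFrame) -> pd.DataFrame:
    """Unify symbols, align timestamps to UTC, and handle minor gaps."""
    out = frame.copy()

    # Symbol mapping: collapse provider-specific tickers onto one reference.
    upper = out["symbol"].str.upper()
    out["symbol"] = upper.map(TICKER_MAP).fillna(upper)

    # Timestamp alignment: parse, convert everything to UTC, and sort.
    out["timestamp"] = pd.to_datetime(out["timestamp"], utc=True)
    out = out.sort_values("timestamp").set_index("timestamp")

    # Missing data handling: forward-fill short gaps only, then flag whatever
    # remains missing instead of interpolating across long outages.
    out["price"] = out["price"].ffill(limit=3)
    out["data_gap"] = out["price"].isna()

    return out
```

Flagging long outages rather than filling across them keeps downstream anomaly detection from treating synthetic values as real observations.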
Feature Engineering
Feature engineering is critical for anomaly detection and correlation analysis. Typical transformations might include the following (a short sketch of a few of them appears after the list):
Time-Series Indicators: Moving averages, RSI, Bollinger Bands, implied volatility surfaces for options, etc.
Fundamental Ratios: P/E, P/B, free cash flow, debt ratios.
Sentiment Scores: Calculated by applying NLP to news articles and social media posts. We may use LLM-based sentiment classifiers to gauge positivity/negativity on specific tickers.
Volatility Metrics: Historical volatility, intraday volatility spikes, put/call ratios.
Clustering Features: Grouping correlated assets together to find sector-level or factor-level anomalies.
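As a sketch, a few of the time-series and volatility features above could be computed with pandas as follows; the window lengths and the "close" column name are assumptions, not fixed choices.

```python
# Sketch of common time-series and volatility features from a price series.
# Window lengths (20, 14, 30) and the "close" column name are assumptions.
import numpy as np
import pandas as pd

def add_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Append moving-average, RSI, Bollinger, and historical-volatility columns."""
    out = prices.copy()
    close = out["close"]

    # Simple and exponential moving averages.
    out["sma_20"] = close.rolling(20).mean()
    out["ema_20"] = close.ewm(span=20, adjust=False).mean()

    # 14-period RSI using Wilder-style exponential smoothing.
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # Bollinger Bands: 20-period mean +/- 2 standard deviations.
    std_20 = close.rolling(20).std()
    out["bb_upper"] = out["sma_20"] + 2 * std_20
    out["bb_lower"] = out["sma_20"] - 2 * std_20

    # Historical volatility: annualized standard deviation of daily log returns.
    log_ret = np.log(close / close.shift(1))
    out["hist_vol_30"] = log_ret.rolling(30).std() * np.sqrt(252)

    return out
```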
Real-Time vs. Batch Processing
Real-Time Stream Processing: For immediate signal generation (e.g., “unusual options activity,” “social media mentions spike”).
Batch Processing: For deeper analytics, like training large ML models on historical data or generating monthly factor models. (A minimal sketch contrasting the two paths follows.)
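A rough sketch of how the two paths differ in shape. The in-memory queue, the event fields, and the detect_unusual_options_activity rule are hypothetical placeholders, not the production pipeline.

```python
# Contrasting sketch of streaming vs. batch processing paths.
# Event fields, the queue, and the detection rule are hypothetical.
import queue

events: "queue.Queue[dict]" = queue.Queue()

def detect_unusual_options_activity(event: dict) -> bool:
    """Placeholder rule: flag trades whose option volume exceeds an assumed threshold."""
    return event.get("option_volume", 0) > 10_000

def stream_worker() -> None:
    """Real-time path: react to each event as it arrives."""
    while True:
        event = events.get()  # blocks until a new tick or mention arrives
        if detect_unusual_options_activity(event):
            print(f"signal: unusual options activity in {event['symbol']}")

def nightly_batch_job(history: list[dict]) -> dict:
    """Batch path: aggregate a full history, e.g. as input to model training."""
    volumes = [e.get("option_volume", 0) for e in history]
    return {"mean_option_volume": sum(volumes) / max(len(volumes), 1)}
```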
The output from these transformations populates our Analytical Store (a NoSQL database), which the AI Agents can query.
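Since the store is only specified as a NoSQL database, the following write/query sketch assumes MongoDB via pymongo, with hypothetical database and collection names.

```python
# Hypothetical write path into the analytical store. MongoDB (via pymongo)
# and the "analytics"/"features" names are assumptions for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
features = client["analytics"]["features"]

def persist_features(records: list[dict]) -> None:
    """Store enriched feature rows so AI Agents can query them later."""
    if records:
        features.insert_many(records)

# Agents could then query by symbol and time range, e.g.:
# features.find({"symbol": "AAPL.US", "timestamp": {"$gte": start, "$lt": end}})
```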