From data-integration
Designs and manages market data infrastructure for financial trading: real-time/delayed feeds, Level 1/2/3 depth, SIP vs direct feeds, vendor selection (Bloomberg, Refinitiv), licensing, entitlements, ticker plants, and data quality.
npx claudepluginhub joellewis/finance_skills --plugin data-integration
This skill uses the workspace's default tool permissions.
Guide the design and management of market data infrastructure for financial services firms. Covers real-time and delayed market data, depth of book levels, consolidated tape and direct feeds, data vendor selection and management, market data licensing and entitlements, data distribution architecture, and market data quality management. Enables building or evaluating market data infrastructure that delivers accurate, timely data to trading, portfolio management, and client-facing systems.
13 — Data Integration (Reference Data & Integration)
both
Level 1 — Top of Book: NBBO, last sale, volume, daily OHLC. Sufficient for portfolio management, client reporting, and order entry. Lowest cost and bandwidth.
Level 2 — Market Depth: Multiple price levels with aggregate size (top 5-20 levels per side). Reveals liquidity beyond the NBBO. Essential for active trading, market impact assessment, and algorithmic execution (TWAP, VWAP). Higher cost and bandwidth.
Level 3 — Full Order Book: Individual order detail (price, size, order ID) enabling complete book reconstruction and order lifecycle tracking. Provided by direct feeds (Nasdaq ITCH, NYSE Arca). Required for market making, HFT, and queue position modeling. Highest cost — hundreds of thousands of messages per second per exchange.
| Use Case | Level | Rationale |
|---|---|---|
| Portfolio management / reporting | Level 1 | NBBO and last sale sufficient for valuation |
| Active equity trading desk | Level 2 | Traders assess depth before large orders |
| Algorithmic execution | Level 2 | Algorithms adapt pace based on available liquidity |
| Market making / HFT | Level 3 | Requires queue position and order flow modeling |
| Client-facing app (delayed) | Level 1 (delayed) | Display only, 15-minute delay acceptable |
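To illustrate how the three levels relate, here is a minimal sketch that rebuilds a book from order-level (Level 3) events and derives the Level 2 and Level 1 views from it. The `OrderBook` class, its method names, and the event shape are assumptions made for illustration; they are not the ITCH wire format, and prices are floats only for readability (production systems typically use integer ticks).

```python
from collections import defaultdict

class OrderBook:
    """Toy single-symbol book built from order-level events (hypothetical shape)."""

    def __init__(self):
        self.orders = {}  # order_id -> [side, price, remaining_size]

    def add(self, order_id, side, price, size):
        self.orders[order_id] = [side, price, size]

    def reduce(self, order_id, size):
        """Apply a cancel or execution against a resting order."""
        order = self.orders[order_id]
        order[2] -= size
        if order[2] <= 0:
            del self.orders[order_id]

    def depth(self, side, levels=5):
        """Level 2 view: aggregate size per price, best price first."""
        totals = defaultdict(int)
        for s, price, size in self.orders.values():
            if s == side:
                totals[price] += size
        # Bids ("B") are best at the highest price, offers ("S") at the lowest.
        return sorted(totals.items(), reverse=(side == "B"))[:levels]

    def top_of_book(self):
        """Level 1 view: (best bid, best offer), each (price, size) or None."""
        bids, asks = self.depth("B", 1), self.depth("S", 1)
        return (bids[0] if bids else None, asks[0] if asks else None)

book = OrderBook()
book.add(1, "B", 100.00, 300)
book.add(2, "B", 100.00, 200)
book.add(3, "B", 99.99, 500)
book.add(4, "S", 100.02, 400)
book.reduce(1, 300)  # order 1 fully executed and removed
```

Only the Level 3 representation knows that order 2 is now alone at 100.00; the Level 2 and Level 1 views see just the aggregate size.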
Securities Information Processors (SIPs): CTA/CQS for Tape A (NYSE-listed) and Tape B (securities listed on other exchanges, such as NYSE American and NYSE Arca), UTP for Tape C (Nasdaq-listed), OPRA for options. SIPs collect quotes and trades from all exchanges, compute the NBBO, and disseminate a consolidated stream. Under Reg NMS, the SIP NBBO is the regulatory benchmark for best execution.
Direct exchange feeds: Proprietary feeds from individual exchanges (NYSE Arca, Nasdaq TotalView/ITCH, Cboe PITCH, IEX DEEP) delivering order-by-order data with lower latency than the SIP. A firm must subscribe to multiple feeds and compute NBBO internally. Each exchange uses different protocols requiring per-exchange parsers.
| Dimension | SIP (Consolidated) | Direct Feeds |
|---|---|---|
| Latency | Higher (~10-50 microseconds SIP processing) | Lower (bypasses SIP) |
| NBBO | Provided directly | Must compute from multiple feeds |
| Data depth | Level 1 (NBBO + last sale) | Level 2/3 (full depth, order-by-order) |
| Cost | Lower, predictable | Higher, scales with exchange count |
| Normalization | Pre-normalized | Requires per-exchange parsers |
| Typical consumer | Buy-side, advisory, retail | Prop trading, market making, HFT |
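The "must compute from multiple feeds" row can be sketched as follows. This is a simplified illustration, not a compliant NBBO calculator: it assumes each exchange's top of book is already normalized to integer cents, and it ignores odd lots, locked or crossed markets, protected-quote rules, and timing differences between feeds. The function name and the input layout are hypothetical.

```python
def compute_nbbo(quotes):
    """quotes: dict of exchange -> (bid_px, bid_sz, ask_px, ask_sz), prices in cents.

    Returns (best_bid, size_at_best_bid, best_ask, size_at_best_ask),
    summing size across every exchange quoting at the best price.
    """
    best_bid = max(q[0] for q in quotes.values())
    best_ask = min(q[2] for q in quotes.values())
    bid_size = sum(q[1] for q in quotes.values() if q[0] == best_bid)
    ask_size = sum(q[3] for q in quotes.values() if q[2] == best_ask)
    return best_bid, bid_size, best_ask, ask_size

quotes = {
    "NYSE":   (10000, 300, 10002, 500),
    "NASDAQ": (10001, 200, 10002, 100),
    "ARCA":   (10001, 400, 10003, 600),
}
nbbo = compute_nbbo(quotes)
```

Here the best bid of 100.01 is shared by two venues and the best offer of 100.02 by two others, so sizes are aggregated at each price.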
Bloomberg: Terminal ($20K-$25K/user/year), B-PIPE (enterprise real-time feed), Data License (bulk EOD/reference data), BEAP (cloud API).
Refinitiv (LSEG): Eikon (desktop, lower cost than Bloomberg, strong FX/FI), Elektron/LSEG Real-Time (enterprise feed), DataScope (bulk EOD), Tick History (historical ticks).
ICE Data Services: Consolidated feeds, evaluated fixed income pricing (widely used for NAV and regulatory reporting), ICE Benchmark Administration.
FactSet: Research-oriented, flexible API delivery, competitive pricing for smaller buy-side, strong Excel/portfolio management integration.
S&P Capital IQ / Market Intelligence: Comprehensive fundamentals, credit ratings, company filings.
Morningstar: Fund/ETF data, ratings, Morningstar Direct for research.
Free/open sources: Exchange websites and financial portals provide delayed (15-min) quotes. Useful for non-time-sensitive display but limited reliability and coverage.
Vendor selection criteria: Asset class coverage, latency, reliability/uptime SLA, API quality, total licensing cost (including exchange fees), historical data depth, support, data quality handling.
License categories: Non-professional (retail, personal use, lower fees) vs professional (business use, significantly higher). Display (human views on screen) vs non-display (automated systems: algorithms, risk engines, pricing — fees based on application type, not per-user). Derived data (substantially transformed; redistribution may be permitted if original data cannot be reverse-engineered; policies vary by exchange).
Licensing models: Per-user/per-device (exact monthly count required), enterprise (flat fee covering a defined entity), usage-based non-display (fees by application category: trading, risk, valuation).
Reporting obligations: Monthly/quarterly subscriber counts submitted to each exchange or via data vendor. Under-reporting triggers back-billing, penalties, and contract termination.
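A minimal sketch of generating subscriber counts from entitlement records rather than estimating them. The record fields (`user_id`, `exchange`, `user_type`) are hypothetical, and the sketch assumes a per-user agreement in which one person with several devices counts once; per-device agreements would need to count differently.

```python
from collections import Counter

def subscriber_counts(entitlements):
    """Count unique users per (exchange, user_type) for one reporting period."""
    unique = {(e["exchange"], e["user_type"], e["user_id"]) for e in entitlements}
    return Counter((exchange, user_type) for exchange, user_type, _ in unique)

entitlements = [
    {"user_id": "u1", "exchange": "NYSE", "user_type": "professional"},
    {"user_id": "u1", "exchange": "NYSE", "user_type": "professional"},  # second device
    {"user_id": "u2", "exchange": "NYSE", "user_type": "professional"},
    {"user_id": "u3", "exchange": "NASDAQ", "user_type": "non-professional"},
]
counts = subscriber_counts(entitlements)
```

Reconciling these counts against what was actually reported each period is one way to surface under-reporting before an exchange audit does.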
Redistribution: Raw exchange data requires explicit redistribution agreements and additional fees for client-facing display. Vendors typically handle redistribution for data consumed through their platforms.
Cost management: Audit usage periodically to eliminate unused subscriptions. Use delayed data where real-time is unnecessary. Track non-display use — many firms discover unreported non-display obligations only during exchange audits.
Ticker plant: Central ingestion and normalization layer. Parses exchange protocols (ITCH, PITCH, FIX), normalizes to unified schema, maps symbology, caches latest values, applies conflation, and monitors feed health.
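The normalization and symbology steps might look like the sketch below. The two venue formats, their field names, the price scaling, and the symbol map are all invented for illustration; real feed handlers decode binary protocols and use a maintained symbology service.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quote:
    """Unified internal quote schema (hypothetical)."""
    symbol: str
    bid: float
    ask: float
    venue: str

# Hypothetical mapping from venue-specific symbols to one internal symbol.
SYMBOL_MAP = {"BRK.B": "BRK B", "BRK/B": "BRK B"}

def normalize(raw, venue):
    """Convert a venue-specific quote dict into the unified Quote schema."""
    if venue == "VENUE_A":      # assumed: integer prices in 1/10000 dollars
        sym, bid, ask = raw["stock"], raw["bid"] / 10_000, raw["ask"] / 10_000
    elif venue == "VENUE_B":    # assumed: decimal strings
        sym, bid, ask = raw["sym"], float(raw["bp"]), float(raw["ap"])
    else:
        raise ValueError(f"no parser for venue {venue!r}")
    return Quote(SYMBOL_MAP.get(sym, sym), bid, ask, venue)

a = normalize({"stock": "BRK.B", "bid": 4_500_000, "ask": 4_500_100}, "VENUE_A")
b = normalize({"sym": "BRK/B", "bp": "450.00", "ap": "450.01"}, "VENUE_B")
```

Both inputs end up with the same symbol and prices, so downstream consumers never see venue-specific formats.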
Fan-out patterns: Topic-based pub-sub (dominant pattern; middleware: Solace, TIBCO, 29West for low-latency distribution, Kafka where latency requirements are looser), request-reply (REST for on-demand lookups), multicast (network-level fan-out for ultra-low-latency co-located environments).
Conflation: Throttles update rates for slower consumers. Time-based (deliver latest value every N ms), change-based (suppress duplicates), priority-based (never conflate trades; conflate quotes for slower consumers).
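A minimal sketch combining the time-based and priority-based rules: quotes are held and only the latest per symbol is delivered once per interval, while trades always pass through. The `Conflator` class and message fields are hypothetical, and for simplicity the flush is checked only when a message arrives; a real implementation would also flush on a timer.

```python
class Conflator:
    """Time-based quote conflation for a slow consumer; trades are never held."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.pending = {}          # symbol -> latest undelivered quote
        self.last_flush_ms = None

    def on_message(self, now_ms, msg):
        """Return the messages to deliver to the consumer right now."""
        if self.last_flush_ms is None:
            self.last_flush_ms = now_ms
        out = []
        if msg["type"] == "trade":
            out.append(msg)                      # never conflate trades
        else:
            self.pending[msg["symbol"]] = msg    # newer quote replaces older
        if now_ms - self.last_flush_ms >= self.interval_ms:
            out.extend(self.pending.values())
            self.pending.clear()
            self.last_flush_ms = now_ms
        return out

conflator = Conflator(interval_ms=100)
feed = [
    (0,   {"type": "quote", "symbol": "AAPL", "bid": 100.00}),
    (10,  {"type": "quote", "symbol": "AAPL", "bid": 100.01}),
    (20,  {"type": "trade", "symbol": "AAPL", "price": 100.01}),
    (120, {"type": "quote", "symbol": "AAPL", "bid": 100.02}),
]
delivered = []
for now_ms, msg in feed:
    delivered += conflator.on_message(now_ms, msg)
```

Of the three quotes, only the last one reaches the consumer; the trade is delivered immediately.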
APIs: REST for historical/reference data, WebSocket for real-time streaming to web/mobile applications, proprietary binary APIs for ultra-low-latency consumers.
Cloud services: AWS Data Exchange, Google Cloud Marketplace, Azure Data Share. Adds network latency (unsuitable for latency-sensitive trading) but appropriate for analytics, portfolio management, and client-facing applications.
EOD databases: Daily OHLCV. Sufficient for portfolio analytics and long-horizon backtesting.
Tick-level data: Every trade/quote with microsecond timestamps. Required for intraday backtesting and microstructure research. A single day of U.S. equity ticks may exceed 10-20 TB. Providers: Refinitiv Tick History, NYSE TAQ, LOBSTER.
Adjusted vs unadjusted prices: Unadjusted for trade-level analysis and regulatory records. Split-adjusted and fully adjusted (splits + dividends) for return calculations.
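As a worked illustration of split adjustment, the sketch below divides every price before a split's ex-date by the split ratio so the series is comparable across the split. It handles splits only (dividend adjustment is omitted), and the function name and input layout are assumptions.

```python
def split_adjust(prices, splits):
    """prices: list of (iso_date, close); splits: dict of ex_date -> ratio.

    A ratio of 2.0 means a 2-for-1 split. Prices dated before an ex-date
    are divided by that split's ratio; later prices are left unchanged.
    """
    adjusted = []
    for date, close in prices:
        factor = 1.0
        for ex_date, ratio in splits.items():
            if date < ex_date:      # ISO date strings compare chronologically
                factor *= ratio
        adjusted.append((date, close / factor))
    return adjusted

prices = [("2024-01-02", 200.0), ("2024-01-03", 204.0), ("2024-01-04", 101.0)]
adjusted = split_adjust(prices, {"2024-01-04": 2.0})
```

Unadjusted, the series appears to fall from 204 to 101; adjusted, the move is from 102 to 101, which is the figure a return calculation needs.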
Survivorship bias: Databases including only current listings inflate backtested returns. Point-in-time databases (showing the universe as it existed historically) are required for unbiased research. Point-in-time data also applies to fundamentals: initial earnings reports may be restated; using restated data introduces look-ahead bias.
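A point-in-time universe query can be sketched as below. The listing records and symbols are made up; the point is that a backtest dated 2008 must include the name that was later delisted and exclude the one not yet listed.

```python
def universe_as_of(listings, as_of):
    """listings: list of (symbol, listed_date, delisted_date_or_None), ISO dates.

    Returns the symbols that were listed on the as_of date.
    """
    return sorted(
        symbol
        for symbol, listed, delisted in listings
        if listed <= as_of and (delisted is None or as_of < delisted)
    )

listings = [
    ("AAA", "2001-05-01", None),
    ("BBB", "2003-02-10", "2009-03-15"),  # later delisted
    ("CCC", "2015-07-01", None),          # not yet listed in 2008
]
universe_2008 = universe_as_of(listings, "2008-01-01")
universe_2020 = universe_as_of(listings, "2020-01-01")
```

A database holding only current listings would return the 2020 universe for every historical date, silently dropping `BBB` from the 2008 backtest.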
Stale data detection: Flag quotes not updated within expected timeframes during market hours. Suppress stale data from trading and valuation decisions.
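A minimal staleness check, assuming the ticker plant keeps a last-update timestamp per symbol. The single `max_age_s` threshold is a simplification; illiquid instruments legitimately update less often, so production checks usually need per-instrument thresholds.

```python
def stale_symbols(last_update, now, max_age_s, market_open=True):
    """last_update: dict of symbol -> last quote time in epoch seconds.

    Returns symbols whose last update is older than max_age_s.
    Nothing is flagged outside market hours.
    """
    if not market_open:
        return []
    return sorted(s for s, ts in last_update.items() if now - ts > max_age_s)

last_update = {"AAPL": 1000.0, "XYZ": 940.0}
stale = stale_symbols(last_update, now=1005.0, max_age_s=30)
```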
Gap detection: Feed-level (sequence number gaps in ITCH/PITCH) and application-level (expected vs actual data frequency).
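Feed-level gap detection reduces to tracking the next expected sequence number. The sketch below reports missing ranges and ignores duplicates or late arrivals; requesting retransmission or recovering from a snapshot is out of scope here.

```python
def find_gaps(seq_numbers):
    """Return (first_missing, last_missing) ranges seen in a sequence stream."""
    gaps = []
    expected = None
    for seq in seq_numbers:
        if expected is not None and seq > expected:
            gaps.append((expected, seq - 1))
        if expected is None or seq >= expected:
            expected = seq + 1   # duplicates and late arrivals do not move this
    return gaps

gaps = find_gaps([1, 2, 3, 6, 7, 10])
```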
Erroneous tick filtering: Process exchange trade-bust messages. Filter outlier prints (prices far from NBBO, adjusted for spread and volatility). Distinguish legitimate unusual trades (blocks, auctions, after-hours) from errors.
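One simple form of outlier filter flags trades priced more than a few spreads outside the prevailing quote. This sketch uses the spread only; the volatility adjustment mentioned above, and the use of sale-condition information to exempt blocks, auctions, and after-hours prints, are omitted. The parameters `k` and `min_band` are illustrative assumptions, not recommended values.

```python
def is_suspect_print(trade_px, bid, ask, k=5.0, min_band=0.01):
    """Flag a trade more than k spreads (at least min_band) outside the quote."""
    band = max(k * (ask - bid), min_band)
    return trade_px < bid - band or trade_px > ask + band

inside = is_suspect_print(100.01, bid=100.00, ask=100.02)
far_above = is_suspect_print(105.00, bid=100.00, ask=100.02)
far_below = is_suspect_print(95.00, bid=100.00, ask=100.02)
```

Flagged prints are better routed to review than silently dropped, since a legitimate block trade can also fall outside the band.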
Monitoring and alerting: Feed health dashboards, latency tracking (exchange-to-receipt), volume monitoring against baselines, automated alerts for disconnections, latency spikes, staleness, and gaps.
Failover: Primary/secondary feed architecture with automatic failover on disconnection, excessive latency, or quality breach. Downstream systems must handle graceful degradation (e.g., losing Level 3 depth when failing from direct feed to SIP).
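The failover decision and the resulting degradation can be sketched as a small selector. The class, thresholds, and depth labels are hypothetical, and the sketch has no hysteresis; a production selector would normally require the primary to stay healthy for some period before failing back, to avoid flapping.

```python
class FeedSelector:
    """Chooses between a primary (direct) feed and a secondary (SIP) feed."""

    def __init__(self, max_latency_ms, max_silence_ms):
        self.max_latency_ms = max_latency_ms
        self.max_silence_ms = max_silence_ms
        self.active = "primary"

    def evaluate(self, primary_latency_ms, primary_silence_ms):
        """Update and return the active feed from the primary's health readings."""
        healthy = (primary_latency_ms <= self.max_latency_ms
                   and primary_silence_ms <= self.max_silence_ms)
        self.active = "primary" if healthy else "secondary"
        return self.active

    def available_depth(self):
        """What downstream consumers can rely on while this feed is active."""
        return "full depth" if self.active == "primary" else "Level 1 only"

selector = FeedSelector(max_latency_ms=10, max_silence_ms=1000)
first = selector.evaluate(primary_latency_ms=2, primary_silence_ms=50)
second = selector.evaluate(primary_latency_ms=2, primary_silence_ms=5000)
depth_after_failover = selector.available_depth()
```

Exposing `available_depth` lets consumers such as execution algorithms change behavior explicitly instead of treating a thinner book as real liquidity loss.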
| Metric | Target |
|---|---|
| Feed uptime (trading hours) | > 99.95% |
| Median latency | < 1ms (direct), < 50ms (SIP) |
| 99th percentile latency | < 10ms (direct), < 100ms (SIP) |
| Staleness rate | < 0.1% of instruments |
| Gap rate | < 0.01% of expected messages |
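Checking measured latency against targets like these could be sketched as follows. The percentile uses the nearest-rank method, which is one of several common definitions and an assumption here; the sample latencies are invented.

```python
import statistics

def latency_report(latencies_ms, median_target_ms, p99_target_ms):
    """Compare median and nearest-rank 99th percentile latency with targets."""
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    rank = -(-99 * len(ordered) // 100)   # ceil(0.99 * n) in integer arithmetic
    p99 = ordered[rank - 1]
    return {
        "median_ms": median,
        "p99_ms": p99,
        "meets_target": median < median_target_ms and p99 < p99_target_ms,
    }

# 100 invented samples from a direct feed: mostly 0.2 ms with two slower ones.
samples = [0.2] * 98 + [0.5, 12.0]
report = latency_report(samples, median_target_ms=1.0, p99_target_ms=10.0)
```

The single 12 ms sample sits above the 99th percentile, so this window still meets the direct-feed targets; tracking the maximum separately would catch it.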
Scenario: A $2B RIA with 3,000 client accounts needs: real-time quotes for 15 portfolio managers/traders, delayed data for 40 client service associates, EOD data for portfolio accounting and performance, historical data for research, and a client portal with current market values.
Data level assessment: Level 1 is sufficient. The firm places client orders; it does not engage in market making or HFT. This significantly reduces cost and infrastructure complexity.
Vendor evaluation:
| Option | Est. Annual Cost | Key Trade-off |
|---|---|---|
| Bloomberg (15 Terminals + Data License) | $375K-$425K | Deep analytics but expensive per-terminal model |
| Refinitiv Eikon (15 seats) + DataScope | $200K-$275K | Lower cost but smaller user community |
| FactSet (15 seats) + EOD package | $150K-$225K | Flexible pricing, strong API, less real-time trading depth |
FactSet offers the best balance for this firm: real-time quotes and screening for portfolio managers, historical data and factor tools for research, and API access for internal systems.
Client portal data strategy: Real-time redistribution would add $100K-$200K/year in exchange fees for 3,000 non-professional users. The firm instead selects 15-minute delayed data, which eliminates redistribution fees, and clearly labels portal prices as delayed.
Exchange licensing: 15 professional users for real-time Level 1. 40 associates on delayed data (no exchange license). Client portal on delayed data (no redistribution fees). One administrator handles monthly subscriber reporting through the vendor.
Analysis: Total cost is approximately $175K-$250K, versus $400K+ for a Bloomberg-centric approach. The architecture separates real-time (licensed professionals) from delayed (everyone else), minimizing licensing complexity. Annual vendor reviews and usage audits keep reporting compliant.
Scenario: A broker-dealer building an institutional equity platform with real-time market data display, smart order routing, execution algorithms (TWAP, VWAP), and post-trade TCA. Must balance latency, completeness, cost, and Reg NMS compliance.
The firm needs both SIP and direct feeds: SIP provides the authoritative NBBO for best execution compliance. Direct feeds from major exchanges provide the per-exchange depth that smart order routing and algorithms require.
Feed selection: Direct feeds from NYSE Arca, Nasdaq TotalView (ITCH), NYSE (Pillar), Cboe BZX/EDGX (PITCH), and IEX DEEP — covering the majority of volume. Lower-volume exchanges added later if routing analysis indicates missed liquidity.
Ticker plant design: (1) Feed handlers per exchange with kernel bypass networking, (2) NBBO calculator comparing internal NBBO against SIP for validation, (3) Book builder maintaining per-exchange and consolidated order books, (4) Pub-sub publishing layer with full-rate feeds for algorithms and conflated feeds for client displays, (5) Historical capture for TCA, regulatory records, and strategy research.
Redistribution licensing: Displaying real-time data to institutional clients requires redistribution agreements with each exchange, monthly professional user reporting, and per-user fees — or enterprise redistribution pricing if economical.
Analysis: Total market data cost is substantial: direct feeds ($300K-$500K/year), SIP ($50K-$100K/year), ticker plant build ($200K-$400K initial), redistribution fees ($100K-$500K/year). Market data is one of the largest operating costs for an electronic platform. Budget for annual exchange fee increases.
Scenario: A 200-employee multi-strategy hedge fund (New York, London, Hong Kong) receives an NYSE audit notification. Subscriber counts have been estimated rather than tracked, and the fund is uncertain whether its risk system's use of NYSE pricing constitutes non-display use.
Data consumption inventory: The fund catalogs all NYSE data consumers: (1) Display users — every Bloomberg Terminal, Eikon desktop, and internal dashboard showing NYSE real-time data. Result: 120 professional display users found vs 95 previously reported. (2) Non-display applications — algorithmic trading, risk (VaR/Greeks), portfolio valuation, OMS, pricing engines. Result: 8 unreported non-display applications identified. (3) Derived data — a daily position file with NYSE closing prices sent to the prime broker requires review against NYSE's derived data policy.
Remediation: File amended subscriber reports (expect back-billing). Register non-display applications by category (A: trading, B: internal non-trading, C: derived/redistribution). Deploy an entitlement management platform (Bloomberg EMRS, Refinitiv DACS, or dedicated tools like TRG Screen). Establish provisioning/deprovisioning policies. Automate monthly subscriber count generation and reconciliation.
Financial exposure:
| Gap | Estimated Back-Billing |
|---|---|
| Display under-reporting (25 users x 12 months) | $75K-$150K |
| Non-display applications (8, some Category A) | $200K-$500K |
| Potential redistribution (1 flow under review) | $0-$100K |
| Total exposure | $275K-$750K |
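The total row is the sum of the low and high ends of each gap's range. A short check of that arithmetic, using the figures from the table above (the dictionary keys are just labels):

```python
# (low, high) back-billing estimates in dollars, taken from the table above.
exposure = {
    "display_under_reporting": (75_000, 150_000),
    "non_display_applications": (200_000, 500_000),
    "potential_redistribution": (0, 100_000),
}
total_low = sum(low for low, _ in exposure.values())
total_high = sum(high for _, high in exposure.values())
```

Non-display applications account for roughly two thirds of the exposure at both ends of the range, which is why the inventory step matters more than the display recount.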
Analysis: Remediation cost ($100K-$200K for entitlement system + ongoing administration) is modest vs audit exposure. Market data entitlement management must be a formal compliance function. Conduct internal audits annually before exchanges audit externally.
Conflating SIP NBBO with direct feed best prices. The SIP NBBO is the Reg NMS regulatory benchmark. A firm's internally computed NBBO from direct feeds may differ due to latency. For best execution compliance, the SIP NBBO is authoritative.
Under-reporting exchange subscribers. Estimating rather than counting professional users and non-display applications risks material back-billing during exchange audits.
Ignoring non-display use fees. Any system consuming exchange data for automated purposes (algorithms, risk, pricing) typically requires a separate non-display license.
Treating delayed data as free. Vendor delivery costs still apply, and some platforms charge professional-user fees even for delayed data. Verify terms per use case.
Over-subscribing to market data. Firms accumulate unused subscriptions over time. Periodic usage audits identify significant cost savings.
Neglecting data quality monitoring. Consuming data without staleness, gap, and erroneous tick monitoring exposes the firm to silent failures. VaR computed on stale prices is dangerously misleading.
Failing to plan for peak data rates. Volumes spike during market events. Size infrastructure for 2-3x typical peak volumes to avoid failures when data matters most.
Ignoring survivorship bias in historical data. Use point-in-time, survivorship-free databases for strategy research to avoid inflated backtest returns.
Distributing raw exchange data without redistribution licenses. Client-facing real-time quotes require explicit redistribution agreements. Violations risk license termination and legal liability.