Guides feature extraction from marketplace listing photos, listing metadata, and sitter wizard responses for item-to-item (i2i), user-to-item (u2i), and user-to-user (u2u) recommenders. Covers vision and text encoding, compositions, governance, and validation.
Install with `npx claudepluginhub joshuarweaver/cascade-code-general-misc-1 --plugin pproenca-dot-skills-1`. This skill uses the workspace's default tool permissions.
Comprehensive first-principles guide for deriving usable recommender features from the raw assets of a two-sided trust marketplace — listing photos, owner-supplied listing metadata, and sitter wizard responses — for item-to-item, user-to-item, and user-to-user solutions. Contains 44 rules across 8 categories ordered by cascade impact on the feature-engineering lifecycle, plus one playbook that composes the rules into an end-to-end feature discovery workflow.
This skill is the upstream precursor to marketplace-personalisation (AWS Personalize) and marketplace-search-recsys-planning (OpenSearch retrieval). Those skills treat features as inputs they already have; this skill is about deciding what features to build from the raw assets, which decisions they serve, and how to prove each one is worth its maintenance cost.
Reference this skill when:
This skill has no user-specific configuration — it is self-contained. References are live URLs to engineering blogs from Airbnb, Pinterest, DoorDash, Uber, Netflix, and Google, to open-source libraries (Feast, Sentence-Transformers, Hugging Face CLIP, H3), to foundational academic papers (Airbnb KDD 2018, Pinterest ItemSage, YouTube Semantic IDs, PinSage), and to Google's Rules of Machine Learning.
Categories are ordered by cascade impact on the feature-engineering lifecycle: auditing mistakes build features on data that does not exist, first-principles mistakes produce features that do not map to real decisions, extraction mistakes poison everything downstream, and so on. Fix earlier-stage problems before later-stage problems.
| # | Category | Prefix | Impact |
|---|---|---|---|
| 1 | Asset Audit and Inventory | audit- | CRITICAL |
| 2 | First-Principles Feature Decomposition | firstp- | CRITICAL |
| 3 | Image Feature Extraction | vision- | HIGH |
| 4 | Listing Text and Metadata Extraction | listing- | HIGH |
| 5 | Sitter Wizard and Profile Extraction | wizard- | HIGH |
| 6 | Derived Similarity and Affinity | derive- | MEDIUM-HIGH |
| 7 | Feature Quality and Governance | quality- | MEDIUM-HIGH |
| 8 | Incremental Rollout and Value Proof | prove- | MEDIUM |
1. Asset Audit and Inventory (`audit-`)
   - `audit-measure-coverage-before-modelling`: reject fields below 80% coverage from the feature plan
   - `audit-sample-every-asset-type-end-to-end`: pull 100 real instances through the real fetch path before planning
   - `audit-verify-rights-and-privacy-before-extraction`: check ToS, GDPR, consent, and face blur before encoding
   - `audit-quantify-freshness-per-asset`: age distribution, expiry, and refresh bucket
   - `audit-separate-raw-assets-from-derived-features`: raw assets immutable in the object store, derived features versioned in the feature store
2. First-Principles Feature Decomposition (`firstp-`)
   - `firstp-start-from-the-decision-not-the-algorithm`: decision first, sub-judgments second, tools last
   - `firstp-ask-what-signal-a-human-uses`: interview 8-12 owners and sitters; features trace back to quotes
   - `firstp-tie-every-feature-to-a-specific-solution`: no feature without a named i2i/u2i/u2u consumer
   - `firstp-prefer-directly-observed-over-learned`: observed columns first, learned embeddings second
   - `firstp-reject-features-you-cannot-serve-at-inference`: training-serving parity starts at design time
   - `firstp-kill-features-a-popularity-baseline-already-captures`: correlation screen before registration
3. Image Feature Extraction (`vision-`)
   - `vision-use-clip-for-zero-shot-listing-embeddings`: zero-shot CLIP ships in a week
   - `vision-detect-room-types-before-detecting-amenities`: the room prior conditions the amenity threshold
   - `vision-quantify-image-quality-separately-from-content`: blur, lighting, and aesthetics as their own features
   - `vision-extract-per-object-counts-not-just-presence`: `n_bed = 4` beats `has_bed = true`
   - `vision-pool-embeddings-across-a-listings-photo-set`: pooled listing vector, with per-photo vectors stored alongside
   - `vision-fine-tune-on-your-domain-when-clip-underperforms`: contrastive fine-tuning only after zero-shot performance plateaus
4. Listing Text and Metadata Extraction (`listing-`)
   - `listing-declare-categorical-fields-for-bounded-vocabularies`: bounded vocabulary → categorical, validated on write
   - `listing-multi-hot-encode-amenity-lists`: fixed amenity vocabulary → multi-hot vector
   - `listing-hash-geo-to-hierarchies-not-raw-lat-lon`: H3 at multiple resolutions
   - `listing-embed-description-with-pretrained-sentence-encoder`: all-MiniLM-L6-v2 for cheap semantic text features
   - `listing-extract-stay-duration-shape-not-just-length`: bin + holiday overlap + flexibility, not raw day count
   - `listing-encode-pet-requirements-as-structured-triples`: (species, count, special_needs) triples, with free text kept alongside
5. Sitter Wizard and Profile Extraction (`wizard-`)
   - `wizard-order-questions-by-information-gain`: discriminative questions first, narrative last
   - `wizard-prefer-multiple-choice-over-free-text`: categorical features by construction
   - `wizard-make-skips-genuine-and-log-them`: a skip is signal; defaults destroy it
   - `wizard-capture-experience-as-counts-and-dates`: numbers, not adjectives; platform history overrides self-declaration
   - `wizard-separate-hard-constraints-from-soft-preferences`: filters vs ranking features
6. Derived Similarity and Affinity (`derive-`)
   - `derive-precompute-i2i-nearest-neighbours-offline`: ANN shelf built nightly, served from a KV store in <5 ms
   - `derive-fuse-modalities-before-item-similarity`: vision + text + structured, weighted and normalised
   - `derive-use-two-tower-for-user-item-affinity`: dual encoder trained on interactions; ANN-retrieval-ready
   - `derive-score-u2u-as-symmetric-mutual-fit`: min(P(owner), P(sitter)); one-sided scoring produces wasted requests
   - `derive-decompose-affinity-into-interpretable-subscores`: fit/safety/logistics/price subscores + blend
   - `derive-cache-user-embedding-with-short-ttl`: session-level cache, 60-300 s TTL
7. Feature Quality and Governance (`quality-`)
   - `quality-version-feature-definitions-in-one-registry`: one name, one implementation, one owner
   - `quality-serve-training-and-inference-from-one-store`: feature store as the single source of truth
   - `quality-gate-features-on-coverage-and-drift`: coverage floor + PSI alarm
   - `quality-scrub-pii-before-features-leave-secure-zone`: face blur and regex scrubbing before encoding
   - `quality-freeze-feature-schemas-per-model-version`: schema hash pinned to the model artifact
8. Incremental Rollout and Value Proof (`prove-`)
   - `prove-ship-one-feature-at-a-time`: one feature, one experiment, one decision
   - `prove-measure-lift-against-feature-ablated-variant`: ablation isolates the feature from incidental changes
   - `prove-kill-features-that-dont-earn-maintenance`: quarterly kill review on attributed lift
   - `prove-dedicate-random-exploration-slice-to-new-features`: a 3-5% slice catches winners that were nearly tied offline
   - `prove-retain-feature-free-baseline-permanently`: popularity baseline as drift anchor

One playbook composes the rules into an end-to-end workflow:
- `references/playbooks/discovering.md`: discover new features from raw marketplace assets. This seven-step workflow starts with an asset audit and a decision decomposition. It ends with a shipped A/B test against a feature-ablated baseline. Use it when the task is "what should we build next?" rather than "fix this specific feature."

Read the playbook first when the task is an open-ended "how do we extract more signal from X?" Read individual rules when a specific implementation question arises.
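The following is a minimal sketch of the coverage check behind `audit-measure-coverage-before-modelling`. It assumes listings arrive as plain Python dicts. The 80% floor comes from the rule. Treating `None`, empty strings, and empty collections as missing is an assumption. The function names (`is_filled`, `field_coverage`, `plan_fields`) are hypothetical and do not come from this skill's reference files.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping

# Floor taken from audit-measure-coverage-before-modelling.
COVERAGE_FLOOR = 0.80


def is_filled(value: Any) -> bool:
    """Assumed definition of 'present': not None and not an empty string/collection."""
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, set, dict)):
        return len(value) > 0
    return True  # numbers (including 0) and booleans count as present


def field_coverage(records: Iterable[Mapping[str, Any]], fields: Iterable[str]) -> dict[str, float]:
    """Fraction of records in which each field is filled."""
    rows = list(records)
    names = list(fields)
    if not rows:
        return {name: 0.0 for name in names}
    return {name: sum(is_filled(row.get(name)) for row in rows) / len(rows) for name in names}


def plan_fields(
    records: Iterable[Mapping[str, Any]],
    fields: Iterable[str],
    floor: float = COVERAGE_FLOOR,
) -> tuple[list[str], list[str]]:
    """Split candidate fields into (keep, reject) by the coverage floor."""
    coverage = field_coverage(records, fields)
    keep = sorted(name for name, c in coverage.items() if c >= floor)
    reject = sorted(name for name, c in coverage.items() if c < floor)
    return keep, reject
```

Under this definition, a legitimate zero such as `pet_count = 0` counts as covered. That keeps genuine zeros from being confused with missing answers.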
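The following sketches `listing-multi-hot-encode-amenity-lists` using a small hypothetical amenity vocabulary (`AMENITY_VOCAB`). Amenity strings outside the vocabulary are returned separately rather than silently dropped, so they can be logged and reviewed as candidate vocabulary additions. That handling is an assumption; the rule does not prescribe it.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Hypothetical fixed vocabulary; a real one would be versioned alongside the feature definition.
AMENITY_VOCAB: tuple[str, ...] = ("wifi", "garden", "parking", "pool", "dishwasher")
AMENITY_INDEX: dict[str, int] = {name: i for i, name in enumerate(AMENITY_VOCAB)}


def multi_hot(
    amenities: Iterable[str],
    index: Mapping[str, int] = AMENITY_INDEX,
) -> tuple[list[int], list[str]]:
    """Encode an amenity list as a fixed-length 0/1 vector.

    Returns (vector, unknown), where `unknown` holds raw strings outside the vocabulary.
    """
    vector = [0] * len(index)
    unknown: list[str] = []
    for raw in amenities:
        key = raw.strip().lower()  # light normalisation; duplicates collapse to a single 1
        position = index.get(key)
        if position is None:
            unknown.append(raw)
        else:
            vector[position] = 1
    return vector, unknown
```

For example, `multi_hot(["WiFi", "pool", "hot tub"])` returns `[1, 0, 0, 1, 0]` with `["hot tub"]` as unknown. The vector length is pinned to the vocabulary, so adding an amenity changes the feature schema. `quality-freeze-feature-schemas-per-model-version` is designed to catch exactly that kind of change.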
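The following sketches the drift half of `quality-gate-features-on-coverage-and-drift`. It uses the standard Population Stability Index over fixed bins: the sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the expected (training-time) and actual (serving-time) shares in bin i. Several details are assumptions: the bin edges, the epsilon that floors empty bins, and the 0.2 alarm threshold. The threshold is a common rule of thumb, not a value from this skill. In practice the edges would usually be derived from the training distribution, for example its deciles.

```python
from __future__ import annotations

import bisect
import math
from typing import Sequence

PSI_ALARM = 0.2  # assumed threshold; tune per feature


def bin_shares(values: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Share of values per bin; `edges` are sorted inner cut points, giving len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    total = max(len(values), 1)
    # Floor each share at eps so empty bins do not make the log term infinite.
    return [max(c / total, eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], edges: Sequence[float]) -> float:
    """Population Stability Index of `actual` against the `expected` (training-time) sample."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def drift_alarm(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    threshold: float = PSI_ALARM,
) -> bool:
    """True when the serving distribution has drifted past the threshold."""
    return psi(expected, actual, edges) > threshold
```

Each term is non-negative, so PSI is zero only when the binned shares match. A feature whose serving distribution collapses into one bin trips the alarm immediately.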
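The following sketches `derive-score-u2u-as-symmetric-mutual-fit`. The two acceptance probabilities are assumed to come from upstream models, under the hypothetical field names `p_owner_accepts` and `p_sitter_accepts`. The sketch shows only the min() combination, which ranks each owner-sitter pair by its weaker side.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CandidatePair:
    sitter_id: str
    p_owner_accepts: float   # hypothetical upstream score: owner side
    p_sitter_accepts: float  # hypothetical upstream score: sitter side


def mutual_fit(pair: CandidatePair) -> float:
    """Symmetric fit: min() upper-bounds the chance that both sides accept."""
    return min(pair.p_owner_accepts, pair.p_sitter_accepts)


def rank_by_mutual_fit(pairs: Iterable[CandidatePair]) -> list[CandidatePair]:
    """Best mutual fit first."""
    return sorted(pairs, key=mutual_fit, reverse=True)
```

Take a sitter with scores (0.9, 0.1). Ranked on the owner side alone, it would come first. Its mutual fit of 0.1 instead places it below sitters scoring (0.6, 0.7) and (0.5, 0.5). That is the wasted request the rule warns about.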
Files to read:
- `references/_sections.md` for category structure and cascade rationale
- `gotchas.md` for accumulated diagnostic lessons before suggesting interventions
- `references/playbooks/discovering.md` to plan a new feature discovery cycle
- individual rule files in `references/` when a specific task matches the rule title
- `assets/templates/_template.md` to author new rules as the skill grows

Related skills:
- `marketplace-personalisation`: post-extraction personalisation on AWS Personalize (event tracking, schema design, two-sided matching, cold start, feedback loops). Hand off once your features are in the store and you are ready to train a ranker.
- `marketplace-search-recsys-planning`: OpenSearch retrieval planning (query understanding, index design, ranking, search-plus-recs blending). Hand off when the bottleneck is retrieval rather than feature availability.
- `marketplace-pre-member-personalisation`: the pre-member journey from anonymous visit to paid membership (anonymous signal inference, onboarding intent capture, pre-member measurement). Hand off at the paid-member boundary.

Reference files:

| File | Description |
|---|---|
| references/_sections.md | Category definitions, impact ordering, cascade rationale |
| references/playbooks/discovering.md | End-to-end feature discovery playbook |
| gotchas.md | Accumulated feature-engineering diagnostic lessons (living) |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version, discipline, authoritative references |