Identify statistical anomalies, outliers, and unusual patterns in datasets. Use when users ask to find anomalies, detect outliers, identify unusual patterns, spot irregularities, or analyze data for unexpected behavior. Supports time-series analysis, distribution-based detection, and pattern recognition for numerical and categorical data.
npx claudepluginhub ki-kyvos/kyvos-plugins --plugin kyvos

This skill uses the workspace's default tool permissions.
This skill identifies anomalies in data using multiple statistical methods. It can detect unusual values in numerical data, unexpected shifts in time-series data, and rare occurrences in categorical data.
For numeric columns, anomalies are typically values that fall far from the central tendency of the data.
This method is best for data that is approximately normally distributed. It measures how many standard deviations a data point is from the mean.
# Assumes data is in a pandas DataFrame 'df' and we're checking 'value' column
z_scores = (df['value'] - df['value'].mean()) / df['value'].std()
anomalies = df[abs(z_scores) > 3]
This method is robust to outliers and does not assume a normal distribution, making it suitable for skewed data. An anomaly is a value that falls outside the range defined by Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
# Assumes data is in a pandas DataFrame 'df' and we're checking 'value' column
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
anomalies = df[(df['value'] < lower_bound) | (df['value'] > upper_bound)]
A simple method to identify extreme values by defining anomalies as values that fall in the top or bottom X% of the data.
# Identify values in the bottom 1% or top 1%
anomalies = df[(df['value'] < df['value'].quantile(0.01)) |
(df['value'] > df['value'].quantile(0.99))]
For time-series data, anomalies can be sudden spikes/dips or deviations from a recurring pattern (seasonality).
This method identifies values that deviate significantly from a rolling average, which helps smooth out short-term noise.
# Assumes 'df' has a datetime index and a 'value' column
# Calculate 7-period moving average
df['moving_average'] = df['value'].rolling(window=7).mean()
# Calculate deviation from moving average
df['deviation'] = df['value'] - df['moving_average']
# Identify points with a large deviation (e.g., > 3 standard deviations of the deviation)
anomaly_threshold = df['deviation'].std() * 3
anomalies = df[abs(df['deviation']) > anomaly_threshold]
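The seasonality case mentioned above can also be checked by comparing each point to the typical value for its position in the cycle. This is a minimal sketch, assuming daily data with a weekly cycle and a 'value' column (the sample data and cycle length are illustrative assumptions, not part of the skill):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with a weekly pattern (assumption for illustration)
rng = pd.date_range("2024-01-01", periods=56, freq="D")
values = np.tile([10, 12, 11, 13, 12, 30, 31], 8).astype(float)
values[20] = 45.0  # inject a value that breaks the recurring pattern
df = pd.DataFrame({"value": values}, index=rng)

# Compare each point to the median for its weekday (its position in the cycle)
weekday_median = df.groupby(df.index.dayofweek)["value"].transform("median")
residual = df["value"] - weekday_median
threshold = residual.std() * 3
anomalies = df[residual.abs() > threshold]
```

Unlike a moving average, this flags values that are unremarkable in absolute terms but wrong for their position in the cycle (e.g., weekend-level traffic on a weekday).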
For categorical data, anomalies are often categories that appear with unusually low frequency.
Identify categories that are rare compared to others.
# Assumes 'df' has a 'category' column
frequency = df['category'].value_counts(normalize=True)
# Identify categories that make up less than 1% of the data
rare_categories = frequency[frequency < 0.01].index.tolist()
anomalies = df[df['category'].isin(rare_categories)]
Do NOT automatically remove anomalies. Instead, flag them and surface them for review. Always report how anomalies were identified and handled.
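As a minimal sketch of such a report, using the IQR method from above (the sample data, column name, and report fields are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 9, 10, 12, 200]})  # hypothetical data

# IQR-based detection, then a summary of what was found and how
Q1, Q3 = df["value"].quantile(0.25), df["value"].quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
anomalies = df[(df["value"] < lower) | (df["value"] > upper)]

report = {
    "method": "IQR (1.5 * IQR fences)",
    "bounds": (float(lower), float(upper)),
    "n_anomalies": int(len(anomalies)),
    "action": "flagged for review, not removed",
}
```

The point is that the detection method, the thresholds used, and the disposition of each anomaly all travel with the result, rather than silently altering the data.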