Glossary

This glossary explains technical terms used in mtdata with simple language and real-world trading examples.


Basic Trading Concepts (Start Here)

If you're new to trading, skim this section first. It's enough to follow the examples in this repo.

Long vs Short

  • Long: you profit if price goes up (buy now, sell later).
  • Short: you profit if price goes down (sell now, buy back later).

TP/SL (Take Profit / Stop Loss)

  • Take Profit (TP): close a position when price reaches your profit target.
  • Stop Loss (SL): close a position to cap the loss if price moves against you.

Bid/Ask and Spread

  • Ask: the price you pay to buy.
  • Bid: the price you receive when you sell.
  • Spread: ask - bid (a cost you pay to enter/exit). See also: Spread.

Pip (and Pipette)

  • A pip is a standard unit of FX price movement (often 0.0001, or 0.01 for JPY pairs). See also: Pip.
  • Some brokers quote an extra digit (“pipettes”): 1 pip = 10 pipettes.

Timeframe and Candles (Bars)

  • A timeframe (e.g., M5, H1, D1) controls how data is grouped.
  • A candle/bar summarizes that period (open/high/low/close and volume).

Lot Size

Forex position sizes are often expressed in lots (see: Lot Size).


Core Concepts

Time Series

A sequence of data points indexed by time. In trading, this is typically OHLCV data (Open, High, Low, Close, Volume) at regular intervals.

Example: 500 hourly candles of EURUSD form a time series.

Horizon

How many bars into the future a forecast predicts.

Example: --horizon 12 with H1 timeframe means "predict the next 12 hours."

Lookback

How many historical bars a model uses to make its prediction.

Example: --lookback 500 tells the model to learn from the last 500 candles.


Forecasting Methods

Theta Method

A simple, robust forecasting technique that splits a time series into "theta lines" (one capturing the long-term trend, one emphasizing short-term movement), extrapolates each, and combines the results.

When to use: As a baseline for short-to-medium horizons. Often surprisingly accurate.

Example:

mtdata-cli forecast_generate EURUSD --timeframe H1 --horizon 12 --method theta

ARIMA (AutoRegressive Integrated Moving Average)

A classical statistical model that predicts future values based on:

  • AR (AutoRegressive): Past values of the series
  • I (Integrated): Differencing to make data stationary
  • MA (Moving Average): Past forecast errors

When to use: Short-term forecasting when data shows clear autocorrelation patterns.

Interpretation: ARIMA works best on stationary data (constant mean/variance). Raw prices aren't stationary, but returns usually are.
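The step from prices to returns can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the prices are made-up numbers and `log_returns` is a hypothetical helper, not part of mtdata.

```python
import math

def log_returns(prices):
    """Convert a price series to log returns: r_t = ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Illustrative closing prices (made-up values, not real market data).
closes = [1.1000, 1.1011, 1.1005, 1.1022]
returns = log_returns(closes)
print(returns)

# Log returns add up: their sum equals the log of the total price ratio.
print(sum(returns), math.log(closes[-1] / closes[0]))
```

Taking the difference of log prices is the same operation as one round of differencing (the "I" in ARIMA) applied to the log series.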

ETS (Error, Trend, Seasonality)

Exponential smoothing that weighs recent observations more heavily than older ones.

When to use: Data with clear seasonal patterns (e.g., higher volatility during market opens).

Monte Carlo Simulation

Generates thousands of possible future price paths by randomly sampling from historical return distributions.

When to use: When you need a range of outcomes rather than a single prediction. Essential for risk sizing and barrier analysis.

Example:

mtdata-cli forecast_generate EURUSD --timeframe H1 --horizon 12 \
  --method mc_gbm --params "n_sims=2000"

Interpretation: Instead of one forecast line, you get percentile bands showing where price might land.
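To show how percentile bands come out of simulated paths, here is a minimal sketch of a geometric Brownian motion simulation in plain Python. The function names, the per-bar drift and volatility values, and the simple index-based percentile are assumptions made for illustration; they are not mtdata's implementation of mc_gbm.

```python
import math
import random

def simulate_gbm_terminal(start_price, mu, sigma, horizon, n_sims, seed=42):
    """Simulate terminal prices under GBM with per-bar drift mu and volatility sigma."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    terminals = []
    for _ in range(n_sims):
        log_price = math.log(start_price)
        for _ in range(horizon):
            # Log-price step: drift correction plus a normal shock.
            log_price += (mu - 0.5 * sigma**2) + sigma * rng.gauss(0.0, 1.0)
        terminals.append(math.exp(log_price))
    return terminals

def percentile(sorted_values, q):
    """Pick the value at fraction q (0..1) of a pre-sorted list (simple index rule)."""
    idx = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[idx]

# Assumed inputs: start at 1.1000, zero drift, 0.06% per-bar volatility, 12 bars.
terminals = sorted(
    simulate_gbm_terminal(1.1000, mu=0.0, sigma=0.0006, horizon=12, n_sims=2000)
)
p05, p50, p95 = (percentile(terminals, q) for q in (0.05, 0.50, 0.95))
print(f"5%: {p05:.5f}  50%: {p50:.5f}  95%: {p95:.5f}")
```

The 5% and 95% values form a 90% band around the median path end point. A wider band means more uncertainty about where price lands.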

Chronos

A foundation model for time series (like GPT for text). Pre-trained on millions of time series, then applied to your data.

When to use: When you want state-of-the-art accuracy without tuning. Requires chronos and torch packages.

Example:

mtdata-cli forecast_generate EURUSD --timeframe H1 --horizon 24 \
  --library pretrained --method chronos2

Uncertainty & Intervals

Confidence Interval

A range around the forecast indicating uncertainty. A 95% CI means "if model assumptions hold, the true value will fall within this range 95% of the time."

Caution: Financial data often violates model assumptions (fat tails, regime changes), so intervals may be too narrow.

Conformal Intervals

A distribution-free method to create prediction intervals. Instead of assuming a bell curve, it uses historical forecast errors to calibrate the band width.

How it works:

  1. Run a small rolling backtest
  2. Collect actual errors at each horizon step
  3. Use error quantiles to set band width

When to use: When you don't trust model-based intervals. Conformal intervals are empirically calibrated.

Example:

mtdata-cli forecast_conformal_intervals EURUSD --timeframe H1 \
  --method theta --horizon 12 --steps 25 --ci-alpha 0.1
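The three steps listed under "How it works" can be sketched as follows. This assumes a split-conformal style rule (the finite-sample quantile of absolute errors per horizon step); the helper name and the residual values are hypothetical, and mtdata's actual calibration may differ.

```python
import math

def conformal_half_widths(abs_errors_by_step, alpha=0.1):
    """For each horizon step, return the calibrated half-width of the band."""
    widths = []
    for errors in abs_errors_by_step:
        ordered = sorted(errors)
        n = len(ordered)
        # Finite-sample conformal rank: ceil((n + 1) * (1 - alpha)), capped at n.
        rank = min(n, math.ceil((n + 1) * (1 - alpha)))
        widths.append(ordered[rank - 1])
    return widths

# Hypothetical backtest residuals |actual - forecast| for a 2-step horizon.
abs_errors = [
    [0.0004, 0.0002, 0.0007, 0.0003, 0.0005, 0.0006, 0.0001, 0.0008, 0.0004, 0.0003],
    [0.0009, 0.0005, 0.0012, 0.0006, 0.0010, 0.0011, 0.0003, 0.0015, 0.0008, 0.0007],
]
forecast = [1.1010, 1.1015]  # made-up point forecasts for steps 1 and 2

widths = conformal_half_widths(abs_errors, alpha=0.1)
bands = [(f - w, f + w) for f, w in zip(forecast, widths)]
print(widths)
print(bands)
```

With only 10 errors per step and alpha = 0.1, the rule falls back to the largest observed error, which is why more backtest steps give tighter, more informative bands.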

Regime Detection

Regime

The current "behavior mode" of the market. Different regimes require different strategies.

Common regimes:

| Regime | Characteristics | Strategy Implication |
| --- | --- | --- |
| Low Volatility / Ranging | Price oscillates in a narrow band | Mean reversion works well |
| Trending | Price moves directionally | Trend-following works well |
| High Volatility / Crisis | Large, unpredictable swings | Reduce position sizes |

Hidden Markov Model (HMM)

An algorithm that assumes the market switches between hidden "states" (regimes). It estimates which state generated the current data based on observed returns and volatility.

Output: State ID (0, 1, 2...) and confidence probability.

Important: The model doesn't know "Bull" or "Bear." You must interpret what each state means by examining its mean return and volatility.

Example:

mtdata-cli regime_detect EURUSD --timeframe H1 --method hmm --params "n_states=2"

Interpretation:

  • State 0: Low volatility (σ = 0.0003) → Ranging market
  • State 1: High volatility (σ = 0.0008) → Trending or volatile market

Change-Point Detection (BOCPD)

Bayesian Online Change Point Detection. Answers: "Did the market's underlying behavior just change?"

Output: Probability (0-1) that each bar represents a regime shift.

When to use:

  • If probability > 0.5, the previous pattern may be breaking down
  • Useful for detecting breakouts or structural changes

Example:

mtdata-cli regime_detect EURUSD --timeframe H1 --method bocpd --threshold 0.5

Volatility

Volatility

A measure of how much price typically moves. Higher volatility = larger price swings.

Types:

| Type | Definition |
| --- | --- |
| Realized | How much price did move (historical) |
| Forecasted | How much we expect it will move |
| Implied | Market's expectation (from options prices) |

EWMA (Exponentially Weighted Moving Average)

A volatility estimator that gives more weight to recent observations.

Parameter: lambda (0.94 is standard). Higher = slower adaptation; lower = faster reaction to new data.

Example:

mtdata-cli forecast_volatility_estimate EURUSD --timeframe H1 \
  --horizon 12 --method ewma --params "lambda=0.94"

Interpretation: If output is sigma_bar_return: 0.0006, expect hourly returns to have ~0.06% standard deviation.
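The effect of lambda is easiest to see in the recursion itself. Below is a minimal sketch of the standard EWMA variance update in plain Python; the return values are made up, and seeding the variance with the first squared return is one common choice rather than something this glossary specifies.

```python
import math

def ewma_volatility(returns, lam=0.94):
    """EWMA recursion: var_t = lam * var_{t-1} + (1 - lam) * r_t**2."""
    variance = returns[0] ** 2  # seed with the first squared return (assumption)
    for r in returns[1:]:
        variance = lam * variance + (1.0 - lam) * r * r
    return math.sqrt(variance)

calm = [0.0005, -0.0004, 0.0006, -0.0005, 0.0004]  # made-up hourly returns
shock = calm + [0.0030]                            # same series plus one large move

print(ewma_volatility(calm), ewma_volatility(shock))
# A lower lambda puts more weight on the latest bar, so it reacts faster.
print(ewma_volatility(shock, lam=0.80), ewma_volatility(shock, lam=0.94))
```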

GARCH (Generalized Autoregressive Conditional Heteroskedasticity)

A model that captures volatility clustering—the tendency for high-volatility periods to follow high-volatility periods.

When to use: When you observe that big moves cluster together. Common in equity and FX markets.

HAR-RV (Heterogeneous AutoRegressive Realized Volatility)

Uses realized volatility at multiple time scales (daily, weekly, monthly) to forecast future volatility.

When to use: When you have access to high-frequency intraday data and want more accurate volatility forecasts.


Barrier Analysis

Barrier

A price level that triggers an event when touched:

  • Take Profit (TP): Close position at profit target
  • Stop Loss (SL): Close position to limit loss

Triple-Barrier Labeling

A method to label historical data based on which barrier was hit first:

  • +1 (Win): TP hit first
  • -1 (Loss): SL hit first
  • 0 (Neutral): Neither hit within horizon

For triple-barrier labeling in high_low mode, if one bar touches both TP and SL, the tie is resolved conservatively as a loss (-1) because intrabar order is unknown.

Use case: Creating labels for machine learning models.
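The labeling rule, including the conservative tie-break, can be sketched for a single long trade as below. The function name and signature are hypothetical; only the +1 / -1 / 0 convention and the tie rule come from the description above.

```python
def triple_barrier_label(entry, highs, lows, tp, sl):
    """Label a long trade: +1 if TP is hit first, -1 if SL is hit first, 0 otherwise.

    highs/lows are the bars after entry, up to the horizon; tp and sl are
    absolute price levels with sl < entry < tp.
    """
    if not sl < entry < tp:
        raise ValueError("expected sl < entry < tp for a long trade")
    for high, low in zip(highs, lows):
        if low <= sl:
            # Covers SL-only bars and bars that touch both barriers:
            # intrabar order is unknown, so the tie counts as a loss.
            return -1
        if high >= tp:
            return 1
    return 0

# Made-up bars around an entry at 1.1000 with TP 1.1020 and SL 1.0980.
print(triple_barrier_label(1.1000, [1.1010, 1.1025], [1.0995, 1.1005], 1.1020, 1.0980))
print(triple_barrier_label(1.1000, [1.1030], [1.0970], 1.1020, 1.0980))
print(triple_barrier_label(1.1000, [1.1010], [1.0990], 1.1020, 1.0980))
```

The three calls cover a win, a same-bar tie resolved as a loss, and an unresolved trade.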


Barrier Optimization Objectives

When optimizing TP/SL levels, you must choose what to maximize. Each objective answers a different trading question.

Edge

What it measures: The raw probability advantage of winning vs losing.

Formula: Edge = P(TP first) - P(SL first)

Example: Edge = 0.20 means the win probability is 20 percentage points higher than the loss probability (e.g., 55% wins vs 35% losses).

When to use:

  • General-purpose objective for consistent advantage
  • When you want high win rates regardless of payoff size
  • Good starting point for most strategies

Limitation: Ignores how much you win/lose. A setup with 60% wins but tiny TP and large SL may have positive edge but negative EV.
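A quick numeric check makes this limitation concrete. The sketch below applies the Edge formula and the EV formula from this glossary to a hypothetical setup; the probabilities and the TP/SL sizes (in percent) are made up.

```python
def edge(p_tp, p_sl):
    """Edge = P(TP first) - P(SL first)."""
    return p_tp - p_sl

def expected_value(p_tp, p_sl, tp, sl):
    """EV = P(win) * TP - P(loss) * SL."""
    return p_tp * tp - p_sl * sl

# Hypothetical setup: 60% wins, 30% losses, 10% unresolved; TP = 0.10%, SL = 0.50%.
p_tp, p_sl, tp, sl = 0.60, 0.30, 0.10, 0.50

print("edge:", edge(p_tp, p_sl))                    # positive
print("ev:", expected_value(p_tp, p_sl, tp, sl))    # negative
```

The setup wins twice as often as it loses, yet each loss is five times the size of a win, so the average trade loses money.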


Kelly Criterion

What it measures: The optimal fraction of your capital to bet for maximum long-term growth.

Formula: Kelly = P(win) - (P(loss) / (TP/SL))

Example: Kelly = 0.25 means risking 25% of capital per trade maximizes long-term growth, provided the estimated probabilities are accurate.

When to use:

  • Position sizing decisions
  • When you want to grow capital as fast as possible
  • Long-term systematic trading

Limitation: Full Kelly is aggressive and leads to large drawdowns. Most traders use "fractional Kelly" (e.g., 0.25 × Kelly).

Interpretation of output:

  • kelly: Raw Kelly fraction based on all paths
  • kelly_cond: Kelly fraction based only on paths that resolved (TP or SL hit)
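As a rough illustration, the formula above and a fractional-Kelly adjustment can be written as follows. The inputs are hypothetical, and the conditional variant is an assumption about what "resolved paths only" means (renormalizing the win and loss probabilities so they sum to 1); mtdata may compute kelly_cond differently.

```python
def kelly_fraction(p_win, p_loss, tp, sl):
    """Kelly = P(win) - P(loss) / (TP / SL)."""
    payoff_ratio = tp / sl
    return p_win - p_loss / payoff_ratio

def kelly_conditional(p_tp, p_sl, tp, sl):
    """Assumed variant: renormalize probabilities over resolved paths only."""
    resolved = p_tp + p_sl
    return kelly_fraction(p_tp / resolved, p_sl / resolved, tp, sl)

# Hypothetical inputs: 55% wins, 45% losses, TP twice the size of SL.
full = kelly_fraction(0.55, 0.45, tp=0.40, sl=0.20)
quarter = 0.25 * full  # fractional Kelly
print(full, quarter)

# With 20% of paths unresolved (50% TP, 30% SL), the two variants differ.
print(kelly_fraction(0.50, 0.30, 0.40, 0.20), kelly_conditional(0.50, 0.30, 0.40, 0.20))
```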

EV (Expected Value)

What it measures: The average profit per trade, accounting for both probability and payoff size.

Formula: EV = P(win) × TP - P(loss) × SL

Example: EV = 0.15 means you expect to gain 0.15% per trade on average.

When to use:

  • When payoff asymmetry matters (small wins, big losses or vice versa)
  • Comparing setups with different TP/SL ratios
  • Maximizing total profit over many trades

Variants:

  • ev: Based on all paths (includes no-hit scenarios as 0)
  • ev_cond: Based only on resolved paths (ignores trades that never hit TP or SL)

EV Per Bar

What it measures: Expected value normalized by how long the trade takes.

Formula: EV_per_bar = EV / mean_time_to_resolution

Example: Setup A has EV=0.30 but takes 20 bars; Setup B has EV=0.15 but takes 5 bars. Setup B has higher EV per bar (0.03 vs 0.015).

When to use:

  • When capital turnover matters (you want fast trades to reinvest)
  • Comparing setups across different timeframes
  • Scalping and high-frequency strategies

Limitation: Favors fast trades, which may have higher transaction costs.


Prob Resolve

What it measures: The probability that the trade closes (hits either TP or SL) within the horizon.

Formula: Prob_resolve = P(TP first) + P(SL first) = 1 - P(no hit)

Example: Prob_resolve = 0.85 means 85% of trades close before the time limit.

When to use:

  • When you want trades to actually complete
  • Avoiding setups where most trades expire without hitting targets
  • Day trading where you must close by end of session

Limitation: High resolve probability doesn't mean profitable—could resolve at a loss.


Profit Factor

What it measures: Ratio of expected gains to expected losses.

Formula: Profit_factor = (P(win) × TP) / (P(loss) × SL)

Example: Profit factor = 2.0 means you expect $2 in wins for every $1 in losses.

When to use:

  • When you want a risk/reward focused metric
  • Comparing profitability of different strategies
  • Common in backtesting reports

Interpretation:

  • Profit factor > 1: Profitable
  • Profit factor > 2: Very good
  • Profit factor < 1: Losing money

Min Loss Prob

What it measures: The probability of losing (SL hit first). The optimizer selects the setup that minimizes it.

Formula: Minimize P(SL first)

Example: Setup with min_loss_prob = 0.15 means only 15% chance of hitting stop loss.

When to use:

  • Capital preservation is priority
  • Conservative trading styles
  • When you can't afford drawdowns (e.g., trading client money)

Limitation: May result in very small TP targets or trades that rarely resolve.


Utility (Log Utility)

What it measures: Risk-adjusted expected value using logarithmic utility (diminishing returns on gains).

Formula: Utility = P(win) × log(1 + TP) + P(loss) × log(1 - SL)

Example: Log utility penalizes large losses more than it rewards equivalent gains.

When to use:

  • Risk-averse traders
  • When you want to avoid ruin (log utility naturally avoids bets that could wipe you out)
  • Academic/theoretical contexts

Interpretation: Higher utility = better risk-adjusted outcome. Negative utility means the trade is worse than doing nothing.
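The asymmetry is visible in a small calculation. This sketch applies the formula above with TP and SL expressed as fractions of capital (an assumption; the glossary does not state the units), using made-up probabilities.

```python
import math

def log_utility(p_win, p_loss, tp, sl):
    """Utility = P(win) * log(1 + TP) + P(loss) * log(1 - SL)."""
    return p_win * math.log(1.0 + tp) + p_loss * math.log(1.0 - sl)

# A 50/50 bet with equal TP and SL has zero EV but negative log utility.
fair_bet = log_utility(0.5, 0.5, tp=0.10, sl=0.10)

# A hypothetical favorable setup: 60% wins, TP 2%, SL 1%.
good_bet = log_utility(0.6, 0.4, tp=0.02, sl=0.01)

print(fair_bet, good_bet)
```

The first result is below zero because losing 10% hurts more in log terms than gaining 10% helps. As SL approaches 1 (the whole stake), the loss term goes to negative infinity, which is why log utility steers away from ruin.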


Objective Selection Guide

| Your Priority | Use Objective | Why |
| --- | --- | --- |
| Win rate | edge | Maximizes probability advantage |
| Total profit | ev | Accounts for both probability and payoff |
| Fast trades | ev_per_bar | Optimizes profit per unit time |
| Position sizing | kelly | Tells you how much to bet |
| Avoid losses | min_loss_prob | Minimizes chance of stop-loss hit |
| Trade completion | prob_resolve | Ensures trades actually close |
| Risk-adjusted | utility | Penalizes large losses |
| Gross profit ratio | profit_factor | Gains vs losses ratio |

Backtesting Metrics

MAE (Mean Absolute Error)

What it measures: Average size of forecast errors, ignoring direction.

Formula: MAE = average(|actual - predicted|)

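Example: MAE = 0.0015 on a return forecast means forecasts are off by 0.15 percentage points on average; on a EURUSD price forecast, the same value means about 15 pips.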

Interpretation:

  • Lower is better
  • Easy to interpret (same units as forecast)
  • Doesn't penalize large errors more than small ones

RMSE (Root Mean Squared Error)

What it measures: Average size of forecast errors, penalizing large errors more.

Formula: RMSE = sqrt(average((actual - predicted)²))

Example: RMSE = 0.0020 with MAE = 0.0015 indicates that errors vary in size, with some noticeably larger than the typical error.

Interpretation:

  • Lower is better
  • Always ≥ MAE (equality means all errors are same size)
  • RMSE much larger than MAE → outliers are a problem

Directional Accuracy

What it measures: How often the forecast predicts the correct direction (up/down).

Example: Directional accuracy = 0.58 means 58% of forecasts got the direction right.

Interpretation:

  • 0.50 = random guessing
  • 0.55+ is generally useful for trading
  • High directional accuracy with high MAE: right direction, wrong magnitude
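The three metrics above can be computed side by side. This is a minimal sketch on made-up return forecasts; the function names are illustrative and do not mirror mtdata's backtest output.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def directional_accuracy(actual, predicted):
    """Share of forecasts whose sign matches the sign of the actual return."""
    hits = sum(1 for a, p in zip(actual, predicted) if a * p > 0)
    return hits / len(actual)

# Made-up actual and predicted returns for four bars.
actual = [0.0010, -0.0005, 0.0020, -0.0015]
predicted = [0.0005, 0.0005, 0.0010, -0.0005]

print("MAE: ", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("DA:  ", directional_accuracy(actual, predicted))
```

Here three of four directions are right, while every magnitude is off, which is the "right direction, wrong magnitude" case.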

Monte Carlo Simulation Terms

n_sims (Number of Simulations)

How many random price paths to generate. More simulations = more stable results but slower.

Guidelines:

  • Quick checks: 1,000
  • Normal use: 2,000-5,000
  • Publication/final decisions: 10,000+

seed

Random number generator seed. Set this for reproducible results.

Example: --params "seed=42" ensures the same random paths each time.


Drift (μ)

The expected average return per period. In GBM simulations, drift represents the trend component.


Diffusion (σ)

The volatility component of price movement. Represents random fluctuation around the drift.


Probability Terms

prob_tp_first

Probability that take-profit is hit before stop-loss.

Example: prob_tp_first = 0.62 means 62% of simulated paths hit TP first.


prob_sl_first

Probability that stop-loss is hit before take-profit.

Example: prob_sl_first = 0.28 means 28% of simulated paths hit SL first.


prob_no_hit

Probability that neither TP nor SL is hit within the horizon.

Example: prob_no_hit = 0.10 means 10% of paths expire without hitting either barrier.

Important: High prob_no_hit means your barriers may be too far from the current price or the horizon too short.


prob_resolve

Probability that the trade completes (hits TP or SL).

Formula: prob_resolve = prob_tp_first + prob_sl_first = 1 - prob_no_hit
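The four probabilities are just counts over simulated paths. The sketch below classifies a handful of hand-written paths for a long trade; the helper name and the prices are hypothetical, and a real run would use thousands of simulated paths instead.

```python
def barrier_probabilities(paths, tp, sl):
    """Classify each path (long trade) by which barrier it touches first."""
    tp_first = sl_first = 0
    for path in paths:
        for price in path:
            if price <= sl:
                sl_first += 1
                break
            if price >= tp:
                tp_first += 1
                break
    n = len(paths)
    p_tp, p_sl = tp_first / n, sl_first / n
    return {
        "prob_tp_first": p_tp,
        "prob_sl_first": p_sl,
        "prob_no_hit": 1.0 - p_tp - p_sl,
        "prob_resolve": p_tp + p_sl,
    }

# Four made-up paths starting near 1.1000.
paths = [
    [1.1005, 1.1012, 1.1021],  # reaches TP
    [1.0996, 1.0989, 1.0979],  # reaches SL
    [1.1003, 1.0998, 1.1006],  # touches neither barrier
    [1.1008, 1.1020, 1.0975],  # TP first; the later SL touch is ignored
]
probs = barrier_probabilities(paths, tp=1.1020, sl=1.0980)
print(probs)
```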


Grid Optimization Terms

grid_style

How to generate the TP/SL combinations to test:

| Style | Description | Use When |
| --- | --- | --- |
| fixed | Regular grid from min to max | You know the range to search |
| volatility | Scales barriers to current volatility | Adaptive to market conditions |
| ratio | Varies risk/reward ratios | You want specific RR profiles |
| preset | Pre-configured for trading styles | Quick setup for scalp/swing/position |

refine

Two-stage optimization: coarse search, then fine search around the best result.

Example: --refine true --refine-radius 0.3 searches ±30% around the best initial candidate.


top_k

Number of best candidates to return from optimization.

Example: --top-k 5 returns the top 5 TP/SL combinations instead of just the best.


Forecast Parameters

Quantity

The target variable for a forecast: price (raw closing prices), return (log returns), or volatility (predicted variance). Most users want price; use return for stationarity or volatility for risk sizing.

CI Alpha

Miscoverage rate for confidence intervals. --ci-alpha 0.1 produces a 90% interval; --ci-alpha 0.05 produces 95%.

Library

Selects which forecasting backend to use: native (built-in), statsforecast, sktime, mlforecast, or pretrained (foundation models like Chronos).

As-Of Date

The --as-of parameter lets you generate a retrospective forecast as if running at a past point in time. Useful for comparison and auditing.


Signal Processing

Denoising

Removing random fluctuations ("noise") to reveal the underlying trend ("signal").

Trade-off: More smoothing = clearer trend but more lag (delay).

Common methods:

| Method | Best For |
| --- | --- |
| ema | General smoothing |
| median | Spike removal |
| kalman | Adaptive filtering |
| wavelet | Frequency-based separation |

Example:

mtdata-cli data_fetch_candles EURUSD --timeframe H1 --limit 500 \
  --denoise ema --denoise-params "alpha=0.2"
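The smoothing versus lag trade-off shows up clearly on a step change. This sketch uses the textbook EMA recursion; whether mtdata's alpha parameter follows exactly this definition is an assumption.

```python
def ema_smooth(values, alpha=0.2):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed

# A flat series followed by a step up (made-up values).
series = [1.0] * 3 + [2.0] * 5

heavy = ema_smooth(series, alpha=0.2)  # more smoothing, more lag
light = ema_smooth(series, alpha=0.8)  # less smoothing, less lag
print([round(v, 4) for v in heavy])
print([round(v, 4) for v in light])
```

Five bars after the step, the heavily smoothed series is still well short of the new level, while the lightly smoothed one has almost caught up.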

Stationarity

A statistical property where mean and variance don't change over time. Most forecasting models assume stationarity.

Problem: Raw prices are not stationary (they trend up or down). Solution: Use returns (percent change) instead, which are typically stationary.


Technical Indicators

Moving Average

The average price over the last N bars. Smooths out short-term fluctuations.

  • SMA: Simple Moving Average (equal weights)
  • EMA: Exponential Moving Average (recent bars weighted more)

RSI (Relative Strength Index)

A momentum oscillator measuring speed and change of price movements. Range: 0-100.

Interpretation:

  • RSI > 70: Potentially overbought (consider selling)
  • RSI < 30: Potentially oversold (consider buying)
  • RSI = 50: Neutral
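For reference, a simplified RSI can be computed as below. This sketch uses plain averages of gains and losses over the last N price changes; the original RSI uses Wilder's smoothing instead, so values from charting platforms (and possibly from mtdata) will differ somewhat.

```python
def rsi(closes, period=14):
    """Simplified RSI using simple averages of gains and losses."""
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    changes = [b - a for a, b in zip(closes, closes[1:])][-period:]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_loss == 0:
        # No losses in the window: fully overbought, or flat if no gains either.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

print(rsi([1.0, 2.0, 3.0, 4.0]))        # only gains
print(rsi([4.0, 3.0, 2.0, 1.0]))        # only losses
print(rsi([1.0, 2.0, 1.0, 2.0, 1.0]))   # gains and losses balance
```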

MACD (Moving Average Convergence Divergence)

Shows relationship between two moving averages.

Components:

  • MACD line: Fast EMA - Slow EMA
  • Signal line: EMA of MACD line
  • Histogram: MACD - Signal

Interpretation: A cross of the MACD line above the Signal line suggests bullish momentum; a cross below suggests bearish momentum.

ATR (Average True Range)

Measures volatility by averaging the true range over N bars. The true range is the bar's high-low range, widened to include any gap from the previous close.

Use case: Setting stop-loss distance. A common rule is SL = 2 × ATR.
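A minimal ATR calculation and the 2 × ATR stop rule might look like this. The sketch averages the true range with a simple mean (Wilder's original ATR uses a smoothed average), and the bar values are made up to include a gap.

```python
def average_true_range(highs, lows, closes, period=14):
    """Simple-average ATR; needs at least two bars.

    True range = max(high - low, |high - prev_close|, |low - prev_close|).
    """
    true_ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        true_ranges.append(max(
            highs[i] - lows[i],
            abs(highs[i] - prev_close),
            abs(lows[i] - prev_close),
        ))
    window = true_ranges[-period:]
    return sum(window) / len(window)

# Made-up bars; the third bar gaps up from the previous close of 10.5.
highs = [10.0, 11.0, 13.0]
lows = [9.0, 10.0, 12.0]
closes = [9.5, 10.5, 12.5]

atr = average_true_range(highs, lows, closes, period=2)
stop_distance = 2 * atr  # the SL = 2 x ATR rule of thumb
print(atr, stop_distance)  # 2.0 4.0
```

Each bar's high-low range is only 1.0, but the gaps push the true ranges to 1.5 and 2.5, so the ATR is 2.0.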


Pattern Recognition

Candlestick Patterns

Visual patterns formed by one or more candles that historically precede certain price moves.

Examples:

  • Engulfing: Large candle completely covers previous candle (reversal signal)
  • Doji: Open ≈ Close (indecision)
  • Hammer: Small body with long lower wick (potential bottom)

Chart Patterns

Larger-scale geometric shapes formed over multiple candles.

Examples:

  • Head and Shoulders: Three peaks, middle highest (reversal)
  • Double Top/Bottom: Two peaks/troughs at similar level
  • Triangle: Converging trendlines (often precedes a breakout)

Data & Execution

OHLCV

Standard candle data format:

  • Open: First price of the period
  • High: Highest price
  • Low: Lowest price
  • Close: Last price
  • Volume: Trading activity

Market Depth (DOM)

Order book showing pending buy/sell orders at various price levels.

Slippage

Difference between expected execution price and actual fill price. Occurs due to market movement or insufficient liquidity.

Example: You place a market buy at 1.1750, but get filled at 1.1752. Slippage = 2 pips.

Why it matters: Slippage reduces profits and increases losses. Account for it when backtesting.

Pip

Standard unit of price movement for a currency pair. For most pairs: 0.0001. For JPY pairs: 0.01.

Example: EURUSD moves from 1.1750 to 1.1775 = 25 pips.

Note: Some brokers show "pipettes" (5 decimal places), where 1 pip = 10 pipettes.

Spread

Difference between bid (sell) and ask (buy) price. This is a transaction cost.

Example: Bid 1.1748, Ask 1.1750 → Spread = 2 pips.

Impact: You start every trade 1 spread behind. Tight spreads are crucial for scalping.

Lot Size

Standard position size in forex:

  • Standard lot: 100,000 units
  • Mini lot: 10,000 units
  • Micro lot: 1,000 units

Example: 0.1 lots of EURUSD = 10,000 EUR notional.

Risk/Reward Ratio (RR)

Ratio of potential profit to potential loss.

Formula: RR = TP distance / SL distance

Example: TP = 50 pips, SL = 25 pips → RR = 2.0 (you risk 25 to make 50).

Rule of thumb: Higher RR means you can be wrong more often and still profit.
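The rule of thumb follows from a break-even calculation: if every trade ends at TP or SL and costs are ignored (both simplifying assumptions), the win rate needed to break even is 1 / (1 + RR).

```python
def breakeven_win_rate(rr):
    """Win rate at which p * TP equals (1 - p) * SL, given RR = TP / SL."""
    return 1.0 / (1.0 + rr)

for rr in (0.5, 1.0, 2.0, 3.0):
    print(f"RR = {rr}: break-even win rate = {breakeven_win_rate(rr):.1%}")
```

At RR = 2.0 a strategy can lose roughly two trades out of three and still break even, before spread and slippage.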

Drawdown

Peak-to-trough decline in account equity.

Example: Account peaks at $10,000, drops to $8,500 → Drawdown = 15%.

Maximum drawdown: Largest historical drawdown. Key risk metric.
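Maximum drawdown can be found in one pass by tracking the running peak. A minimal sketch on a made-up equity curve:

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst

# Made-up equity curve: dips to 9,600, peaks at 10,200, then falls to 8,670.
equity = [10_000, 9_600, 10_200, 8_670, 9_500]
print(f"{max_drawdown(equity):.1%}")
```

The result is measured from the 10,200 peak, not from the starting balance, so the later drop counts as about 15% even though the account is only 13.3% below where it started.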


External Tools and Techniques

Finviz

A financial visualization platform providing fundamental data, stock screening, insider trading activity, analyst ratings, and market news for US equities. Data is delayed 15–20 minutes.

In mtdata: The finviz_* commands fetch data from Finviz. See FINVIZ.md.

QuantLib

An open-source C++ library (with Python bindings) for quantitative finance, providing pricing engines for exotic options, yield curves, and calibration routines.

In mtdata: Used for barrier option pricing (options_barrier_price) and Heston model calibration (options_heston_calibrate). See OPTIONS_QUANTLIB.md.

Heston Model

A stochastic volatility model where the asset price and its variance follow correlated stochastic processes. Characterized by five parameters: v0 (initial variance), kappa (mean reversion speed), theta (long-run variance), sigma (vol of vol), and rho (correlation).

When to use: Pricing barrier options and exotic derivatives where constant-volatility (Black-Scholes) assumptions are inadequate.

Optuna

A Bayesian hyperparameter optimization framework supporting TPE, CMA-ES, and random sampling with pruning (early stopping of unpromising trials).

In mtdata: Used by forecast_tune_optuna for automated parameter tuning. See FORECAST.md.

Barrier Option

A financial derivative whose payoff depends on whether the underlying asset's price reaches a specified barrier level. Types include knock-in (activated when barrier is hit) and knock-out (extinguished when barrier is hit).

In mtdata: Barrier analysis is central to TP/SL optimization. See BARRIER_FUNCTIONS.md.

Support and Resistance

Price levels where buying pressure (support) or selling pressure (resistance) tends to concentrate, causing price to pause or reverse.

In mtdata: Detected via support_resistance_levels and the Web API /api/support-resistance endpoint. See WEB_API.md.

Fundamental Analysis

Evaluating a security by examining its intrinsic value through financial statements, earnings, revenue, P/E ratios, and other economic data — as opposed to technical analysis which focuses on price/volume patterns.

In mtdata: The Finviz commands provide fundamental data for US equities. See FINVIZ.md.

Temporal Analysis

Analyzing how a symbol's behavior varies across time dimensions — day of week, hour of day, month of year — to identify recurring seasonal patterns.

In mtdata: The temporal_analyze command groups returns by time dimension. See TEMPORAL.md.


See Also