Stochastic Multi-Echelon Planning Engine

By Nabil El Bachiri

Most planning failures are not caused by one bad forecast; they come from delayed response across a constrained multi-echelon system. This engine models that system end-to-end and lets policy decisions be tested before execution, including reorder behaviour, backlog recovery, and capacity pressure. The output is decision-ready case-study data generated under fixed assumptions, so outcomes are measurable, reproducible, and directly linked to planning choices.

How the scenarios are run

Scenarios run weekly across Retailer, Distributor, Manufacturer, and Supplier nodes, using one shared event-driven timeline where demand, orders, shipments, and planning decisions are processed in sequence. Policies are evaluated under the same demand paths and system conditions, so outcome differences come from decisions rather than changing assumptions.

What the model actually captures

What behaviour emerges

How this feeds the case studies

Technical Deep Dive

0. Modelling Scope and Unit of Analysis

Before describing the runtime loop, it is important to define the model's scope. A node corresponds to one planning stage in the network: Retailer, Distributor, Manufacturer, or Supplier. The engine tracks state at node-SKU-week level, rather than only at aggregate network level. In practice, this means the simulation evolves many node-SKU states in parallel within a single chronological event timeline.

Figure: network topology and simulation granularity. Orders and signals move upstream across Retailer, Distributor, Manufacturer, and Supplier, while shipments move downstream; execution runs on one global event timeline ordered by (week, priority, sequence), state is tracked in one node-SKU object per pair, and every node and SKU is scheduled in each weekly planning cycle.

The event loop therefore does not run independently per node. There is one global queue, and each event references a specific node and SKU; the corresponding state is then updated.

1. Code Architecture and Execution Flow

At runtime, the simulation is easier to read as a two-stage process: an initialization stage executed once, followed by a weekly event-processing loop repeated until the time horizon ends. This two-stage structure governs the whole engine.

Figure: runtime flow split into one-time initialization and a repeated weekly event loop. Initialization (executed once) loads the configuration (network, policies, costs), seeds the initial state (stock, pipeline, histories), generates demand paths for all SKUs over the full horizon, and builds the initial event queue. The weekly loop (repeated for t = 0, 1, ..., T-1) pops the next event from the min-heap keyed by (t, p, s), applies the transition (arrivals, demand, planning), updates state and metrics (stock, backlog, cost signals), and pushes future events (arrivals, orders), continuing while the queue is not empty and t < T.

Diagram Walkthrough

Each rectangle corresponds to a concrete operation in the engine. Initialization prepares the world once, then the loop evolves that world event by event.

Initialization Stage (Executed Once)

Weekly Loop Stage (Repeated Until End)

The loop stops when the queue is empty or the configured horizon is reached.

\[ \text{event key} = (t, p, s),\quad \text{processed in lexicographic order} \]

The queue uses Python's heapq (min-heap), which gives deterministic ordering and efficient retrieval.
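As a minimal sketch of that queue (class and field names are illustrative, not the engine's actual API), events can be pushed onto a heapq with a (week, priority, sequence) key so they pop in deterministic order:

```python
import heapq
import itertools

class EventQueue:
    """Global event queue ordered by (week, priority, sequence)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that keeps ordering deterministic

    def push(self, week, priority, payload):
        # Events with equal (week, priority) keep insertion order thanks to the sequence number.
        heapq.heappush(self._heap, (week, priority, next(self._seq), payload))

    def pop(self):
        week, priority, _, payload = heapq.heappop(self._heap)
        return week, priority, payload

    def __bool__(self):
        return bool(self._heap)

# Usage: schedule an arrival and a planning event for the same week, then process in order.
queue = EventQueue()
queue.push(week=3, priority=1, payload={"type": "arrival", "node": "Distributor", "sku": "SKU-1"})
queue.push(week=3, priority=2, payload={"type": "planning", "node": "Distributor", "sku": "SKU-1"})
while queue:
    week, priority, event = queue.pop()
    print(week, priority, event["type"])
```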

Additional libraries used are numpy for stochastic processes and vector operations, and statistics.NormalDist for inverse-normal service factors in classical policy parameterization.

2. Demand Model

In this model, customer demand is generated at the retailer node. Upstream nodes do not observe customer sales directly; they observe demand through orders and forecast signals passed upstream. Material then moves back downstream as shipments once orders are released and lead times elapse. This is the core information/material split in the simulator.

Figure: information travels upstream; material travels downstream. Orders and forecast signals move upstream from Retailer towards Supplier, shipments move downstream after lead time, and only the retailer sees customer demand directly; upstream nodes infer demand from incoming orders and signals.

A week works as follows. Customer demand is posted at retailer level first. The retailer updates inventory and backlog, then places replenishment orders to the distributor based on its policy state. The distributor does the same one level up, followed by manufacturer and supplier. Released orders are scheduled as future arrivals, so physical stock appears downstream only after transport and processing delays. This is how the model captures both propagation and lag.

Demand is decomposed because one mechanism cannot represent the mix of behaviours seen in planning portfolios. The base layer captures ordinary weekly variation around a profile-specific mean and coefficient of variation. Weekly draws are rounded and truncated at zero so demand cannot become negative (for example, a random draw of \(-3\) units is converted to 0):

\[ D_t = \max\left(0,\; \left\lfloor \mathcal{N}(\mu_t,\sigma_t^2) \right\rceil\right), \qquad \sigma_t = \mu_t \cdot cv \]
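A minimal sketch of this base layer with numpy, assuming a constant profile mean over the horizon (function and parameter names are illustrative):

```python
import numpy as np

def base_demand_path(mu, cv, horizon, seed=0):
    """Weekly base demand: normal draws, rounded to integers, truncated at zero."""
    rng = np.random.default_rng(seed)
    sigma = mu * cv
    draws = rng.normal(loc=mu, scale=sigma, size=horizon)
    return np.maximum(0, np.rint(draws)).astype(int)

# Example: mu = 20 units/week, cv = 0.35, one-year horizon.
path = base_demand_path(mu=20, cv=0.35, horizon=52, seed=42)
```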

Seasonal and intermittent behaviour are then applied as separate layers, because they represent different realities. Seasonality introduces recurrent amplitude shifts, while intermittency governs whether demand occurs at all in a given week. Finally, family-level and SKU-level shock processes add temporary regime shifts. This structure preserves local noise while still producing non-stationary episodes that drive service risk, replenishment pressure, and inventory exposure.

Figure: the model is built as a pipeline - realistic demand, then operational forecast, then policy decision. Demand generation combines baseline variation by SKU profile with a seasonality layer, an intermittency gate, and family-level and SKU-level shocks, producing a weekly demand path per SKU for the full horizon. Forecast publication applies a smoothed baseline for regular SKUs, an occurrence-size split for intermittent SKUs, frozen near-term buckets, and more flexibility further in the horizon, producing the planning forecast used as a policy input, not as truth. The replenishment signal blends the short-term and baseline signals, stabilises the response, adds a backlog pressure term, and computes the target stock and order cap, producing the planned order quantity that feeds future queue events.

In plain terms, the simulator first creates demand behaviour, then applies forecast rules, and then executes replenishment logic. Keeping those blocks separate makes it possible to explain outcomes with a clean causal chain instead of mixing all effects in one opaque signal.

3. Forecast Model

Forecasting is modelled as a planning process, not as a perfect prediction layer. For each node-SKU pair, the engine updates forecast state weekly and then publishes the version used by replenishment logic. At retailer level, updates are anchored to realised customer demand; upstream, updates are driven by the order signal received from downstream nodes. This keeps each echelon consistent with the information it would realistically observe.

For non-intermittent SKUs, the baseline signal is generated through exponential smoothing:

\[ L_t = \alpha D_t + (1-\alpha)L_{t-1},\qquad \alpha=0.3 \]

Intermittent SKUs are handled differently: occurrence and size are separated so sparse demand is not misread as small continuous demand. After the raw update, the engine applies publication rules by horizon. Near-term buckets are more constrained, while later buckets are more flexible. This reflects how weekly replanning usually works in practice: teams can adjust plans every cycle, but near-term changes are still limited by execution commitments and operational frictions.
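As a minimal sketch of these update rules (names are illustrative; the occurrence-size split is shown in a simple Croston-style form, which is one common way to keep occurrence and size separate):

```python
ALPHA = 0.3  # smoothing constant from the baseline model above

def update_regular(level, observed):
    """Exponential smoothing of the baseline level for non-intermittent SKUs."""
    return ALPHA * observed + (1 - ALPHA) * level

def update_intermittent(size_level, interval_level, observed, weeks_since_demand):
    """Smooth demand size and inter-demand interval separately,
    and only in weeks where demand actually occurs."""
    if observed > 0:
        size_level = ALPHA * observed + (1 - ALPHA) * size_level
        interval_level = ALPHA * weeks_since_demand + (1 - ALPHA) * interval_level
    # Expected demand per week = size / interval; zero-demand weeks leave the state unchanged.
    forecast = size_level / max(interval_level, 1.0)
    return size_level, interval_level, forecast
```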

4. Dynamic Signal Construction and Base-Stock Logic

This is the decision layer used each week at each node-SKU pair. The node takes its current forecast signal, inventory position, and backlog state, then computes a replenishment order to its upstream node. In practice, the logic must react to pressure without creating unstable order swings. For that reason, short-term movement is blended with baseline level, and backlog enters the signal through a sublinear term:

\[ C(B)=\rho\sqrt{B} \]

After the signal step, the policy computes a target stock level, compares it to inventory position, and determines the replenishment quantity released upstream (here \(S\) is the stabilised demand signal, \(W\) the coverage in weeks, \(SS\) the safety stock, and \(IP\) the inventory position):

\[ T = S\cdot W + SS,\quad q=\max(0, T-IP) \]

Once \(q\) is released, it becomes incoming demand for the upstream node in the same planning cycle and a future inbound shipment for the downstream node after lead-time delay. This is where information propagation and material propagation connect. The core trade-off remains the same: recover service where pressure is real, but avoid unstable week-to-week order swings.
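A minimal sketch of this weekly decision step, assuming illustrative blending weights and following the symbols above (\(S\) signal, \(W\) coverage, \(SS\) safety stock, \(IP\) inventory position):

```python
import math

def weekly_order(short_term, baseline, backlog, weeks_cover, safety_stock,
                 inventory_position, rho=0.5, blend=0.3):
    """Blend short-term and baseline signals, add sublinear backlog pressure,
    then apply the base-stock rule T = S*W + SS, q = max(0, T - IP)."""
    signal = blend * short_term + (1 - blend) * baseline        # stabilised signal S
    signal += rho * math.sqrt(max(backlog, 0))                   # backlog pressure C(B)
    target = signal * weeks_cover + safety_stock                 # target stock T
    return max(0, round(target - inventory_position))            # released quantity q

# Example: modest backlog pressure lifts the order above the pure forecast level.
q = weekly_order(short_term=26, baseline=20, backlog=16, weeks_cover=3,
                 safety_stock=12, inventory_position=45)
```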

5. Classical Policies and Parameterization

At policy level, the simulator currently supports four replenishment approaches: a dynamic base-stock policy, plus three classical inventory policies \((sS, RQ, RS)\). They are all implemented inside the same simulation environment, so each policy is exposed to the same demand paths, backlog behaviour, lead-time uncertainty, and capacity constraints. This is important because it makes the comparison meaningful: differences in performance come from the replenishment logic itself, not from changes in the operating conditions.

The dynamic base-stock policy is intended as a structured approximation of how a planner would react in a weekly operating cycle. It does not assume perfect foresight. Instead, it responds to the current signal environment: recent demand or order movement, inventory already in the pipeline, backlog pressure, and forecast updates. In other words, it is meant to behave like a planner-facing control rule rather than like a pure textbook formula.

The other three policies provide more classical points of comparison: a reorder-point policy with an order-up-to level (sS), a reorder-point policy with a fixed order quantity (RQ), and a periodic-review order-up-to policy (RS).

The dynamic policy is calibrated directly from the signal-building logic described in the previous section, so its behaviour is driven by observed operating conditions inside the simulation. The classical policies, by contrast, are parameterized from estimated demand statistics rather than from hidden true parameters. This is deliberate: they are meant to reflect the kind of imperfect planning view a real business would use when sizing reorder points and safety stock.

\[ \mu^{est}=\mu^{true}\cdot b_\mu,\quad b_\mu\sim\mathcal{U}(0.97,1.03), \qquad cv^{est}=cv^{true}\cdot b_{cv},\quad b_{cv}\sim\mathcal{U}(0.90,1.10) \]

From there, the classical policies are sized using standard inventory-theory components. Service level is mapped into a normal safety factor, and safety stock scales with lead-time exposure, variability, and SKU pattern sensitivity:

\[ z = \Phi^{-1}(\tau), \qquad SS(L)= z\,\kappa\,\sigma\sqrt{L} \]

Fixed-lot logic also uses an EOQ-style quantity so order size responds to ordering and holding cost rather than being inserted as an arbitrary constant:

\[ EOQ = \sqrt{\frac{2DK}{H}} \]
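A minimal sketch of this parameterization using numpy and statistics.NormalDist; the pattern-sensitivity factor \(\kappa\) and the example values are illustrative assumptions:

```python
import math
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(7)

def estimated_params(mu_true, cv_true):
    """Planner's imperfect view: true parameters distorted by small uniform biases."""
    mu_est = mu_true * rng.uniform(0.97, 1.03)
    cv_est = cv_true * rng.uniform(0.90, 1.10)
    return mu_est, cv_est

def safety_stock(service_level, mu_est, cv_est, lead_time, kappa=1.0):
    """SS(L) = z * kappa * sigma * sqrt(L), with z = Phi^{-1}(tau)."""
    z = NormalDist().inv_cdf(service_level)
    sigma = mu_est * cv_est
    return z * kappa * sigma * math.sqrt(lead_time)

def eoq(period_demand, order_cost, holding_cost):
    """Economic order quantity used by the fixed-lot (RQ) policy."""
    return math.sqrt(2 * period_demand * order_cost / holding_cost)
```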

The policy definitions are therefore explicit:

\[ \text{Dynamic base-stock: } T = S\cdot W + SS,\qquad q=\max(0,T-IP) \]
\[ \text{sS: } ROP = \mu L + SS(L),\qquad S = \mu(L+R)+SS(L+R) \]
\[ \text{RQ: } ROP = \mu L + SS(L),\qquad Q = \max(Q_{min}, EOQ) \]
\[ \text{RS: order every }R\text{ weeks to }S=\mu(L+R)+SS(L+R) \]
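As a compact sketch, the four decision rules can be read side by side; the function names and the review-week convention are assumptions, and the sizing inputs (safety stocks, EOQ) are taken as given from the step above:

```python
def order_dynamic(signal, weeks_cover, ss, ip):
    """Dynamic base-stock: order up to T = S*W + SS."""
    return max(0, signal * weeks_cover + ss - ip)

def order_sS(mu, lead, review, ss_L, ss_LR, ip):
    """sS: when IP falls to the reorder point, order up to S."""
    rop = mu * lead + ss_L
    order_up_to = mu * (lead + review) + ss_LR
    return max(0, order_up_to - ip) if ip <= rop else 0

def order_RQ(mu, lead, ss_L, q_min, eoq_qty, ip):
    """RQ: when IP falls to the reorder point, release a fixed lot."""
    rop = mu * lead + ss_L
    return max(q_min, eoq_qty) if ip <= rop else 0

def order_RS(mu, lead, review, ss_LR, ip, week):
    """RS: order only on review weeks, up to S covering lead time plus review period."""
    if week % review != 0:
        return 0
    order_up_to = mu * (lead + review) + ss_LR
    return max(0, order_up_to - ip)
```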

Taken together, these four policies let the simulator ask a practical question: when does a planner-like adaptive rule outperform simpler reorder logic, and when are the simpler rules already good enough? Because all four live inside the same environment, that trade-off can be evaluated under one consistent set of operational frictions rather than as an abstract policy exercise.

6. Backlog, Cancellation, and Obsolescence

Backlog handling is applied in each weekly planning cycle, after the model first tries to serve demand with available stock and capacity. Any remaining unserved demand is then aged, and cancellation logic is applied bucket by bucket.

The model uses two different regimes on purpose: Retailer uses a deterministic hard-window rule, while Distributor, Manufacturer, and Supplier use probabilistic ageing. This split matches the business meaning in the simulator: downstream unmet demand is treated as customer loss if it exceeds the service window, whereas upstream unmet orders behave like cancellable internal demand.

\[ \text{Hard-window: }\quad \text{cancel if } a > w \]

Upstream nodes use a hazard-style cancellation process after a grace period:

\[ h(a)=\lambda_{0}\,\kappa_{pattern},\qquad P_{cancel}(a)=\min\left(0.95,\;1-e^{-h(a)\cdot(a-g)}\right),\quad a>g \]
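A minimal sketch of both regimes, with illustrative parameter values:

```python
import math

def retailer_cancel(age_weeks, window_weeks):
    """Hard-window rule at the Retailer: cancel (lost sale) once age exceeds the window."""
    return age_weeks > window_weeks

def upstream_cancel_prob(age_weeks, grace_weeks, base_hazard, pattern_factor):
    """Hazard-style cancellation after a grace period, capped at 0.95."""
    if age_weeks <= grace_weeks:
        return 0.0
    hazard = base_hazard * pattern_factor
    return min(0.95, 1.0 - math.exp(-hazard * (age_weeks - grace_weeks)))

# Example: a 5-week-old upstream backlog bucket, 2-week grace, base hazard 0.15.
p = upstream_cancel_prob(age_weeks=5, grace_weeks=2, base_hazard=0.15, pattern_factor=1.2)
```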

Lost sales and cancellations are recorded differently by node. At Retailer, cancelled backlog is counted as lost sales. Upstream, cancelled backlog is treated as cancelled internal demand and tracked as potential excess-risk volume. If that risk remains and inventory stays above target, it is progressively recognised as excess stock and can later become obsolete stock after the configured ageing threshold.

7. Capacity Calibration and Allocation

Capacity is explicitly constrained at upstream stages (Manufacturer and Supplier), not across every node. Weekly capacity is calibrated by product family from expected portfolio demand, adjusted family load, target utilization, and an additional variability buffer.

\[ Cap_{f} = \left\lfloor \frac{\mu^{exp}_{f}\cdot m_f}{u^*}\,(1+\beta) \right\rceil \]

Within each week and family, capacity is then allocated across SKUs by service pressure, not by simple FIFO. The pressure score combines immediately serviceable backlog, backlog age, and due-now urgency:

\[ \pi_i = q^{srv}_i\left(1+0.25\bar{a}_i\right)+0.5q^{due}_i, \qquad share_i = Cap_f\frac{\pi_i}{\sum_j \pi_j} \]

Implementation-wise, allocation is recalculated once per family at the start of each week, rounded down to integers, and any remaining units are reassigned by largest remainder to avoid systematic bias. If a SKU does not consume its reserved share, unused units are released back into the common family pool so later-planned SKUs in the same week can still use them.
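A minimal sketch of this allocation step for one family, assuming each SKU carries its immediately serviceable backlog, mean backlog age, and due-now quantity (field names are illustrative):

```python
def allocate_family_capacity(cap_f, skus):
    """Allocate weekly family capacity across SKUs by pressure score,
    using largest-remainder rounding to avoid systematic bias."""
    pressure = [s["q_srv"] * (1 + 0.25 * s["age_mean"]) + 0.5 * s["q_due"] for s in skus]
    total = sum(pressure)
    if total == 0:
        return [0] * len(skus)
    raw = [cap_f * p / total for p in pressure]
    shares = [int(r) for r in raw]                      # round down first
    leftover = cap_f - sum(shares)
    # Hand the remaining units to the largest fractional remainders.
    order = sorted(range(len(skus)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares

# Example: 100 units of family capacity over three SKUs under different pressure.
shares = allocate_family_capacity(100, [
    {"q_srv": 40, "age_mean": 2.0, "q_due": 10},
    {"q_srv": 15, "age_mean": 0.5, "q_due": 0},
    {"q_srv": 60, "age_mean": 1.0, "q_due": 30},
])
```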

8. Transport and Lead-Time Realism

Shipment timing is modelled as the nominal lead time plus a stochastic transport delay. For each shipment, the engine samples delay weeks from a lane-specific discrete distribution, then schedules the arrival event at the resulting ETA. This is applied to both internal node-to-node shipments and supplier-source replenishment shipments.

\[ \Delta \sim \text{Categorical}(p_0,p_1,\dots,p_k), \qquad ETA = t_{ship} + L_{nominal} + \Delta \]

Here, \(t_{ship}\) is ship week, \(L_{nominal}\) is configured lane lead time, and \(\Delta\) is sampled transport delay in weeks. Because the delay distribution is lane-specific, the model can represent asymmetric delay risk (for example many on-time arrivals with a long late tail) without changing nominal lead-time settings.
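A minimal sketch of this sampling step with numpy; the example lane distribution is an assumption chosen to show a long late tail:

```python
import numpy as np

rng = np.random.default_rng(11)

def sample_eta(ship_week, lead_nominal, delay_weeks, delay_probs):
    """ETA = ship week + nominal lane lead time + sampled transport delay."""
    delta = rng.choice(delay_weeks, p=delay_probs)
    return ship_week + lead_nominal + int(delta)

# Example lane: mostly on time, occasionally 1-3 weeks late.
eta = sample_eta(ship_week=10, lead_nominal=2,
                 delay_weeks=[0, 1, 2, 3], delay_probs=[0.75, 0.15, 0.07, 0.03])
```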

Operationally, this is important because replenishment decisions can be correct yet still arrive late due to transport variability.

9. Cost Model and KPI Layer

The cost layer is computed from weekly planning snapshots at node-SKU level, using node-specific economic parameters. Costs are split so each movement can be traced to an operational cause rather than hidden inside one aggregate value.

\[ C_t = C_t^{hold}+C_t^{backlog}+C_t^{lost}+C_t^{cancel}+C_t^{obs}+C_t^{order}+C_t^{trans} \]
\[ C_t^{hold}=OH_t\,c_h, \quad C_t^{backlog}=B_t\,c_b, \quad C_t^{lost}=LS_t\,c_{ls}, \quad C_t^{cancel}=CN_t\,c_{cn} \]
\[ C_t^{obs}=OB_t\,c_{ob}, \quad C_t^{order}=\mathbf{1}_{q_t>0}K, \quad C_t^{trans}=SH_t\,c_{tr} \]

Bullwhip is reported at node-family level as variance amplification between weekly ordered volume and weekly incoming demand (population variance over the reporting window):

\[ BW = \frac{\mathrm{Var}(Orders_{upstream})}{\mathrm{Var}(Inflow)} \]

To keep the metric stable, bullwhip is set to 0 when inflow variance is effectively zero in the reporting horizon.
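A minimal sketch of the bullwhip metric with the zero-variance guard (the tolerance value is an assumption):

```python
import numpy as np

def bullwhip(orders_upstream, inflow, eps=1e-9):
    """Variance amplification of upstream orders relative to incoming demand.
    Returns 0 when inflow variance is effectively zero in the reporting window."""
    var_in = np.var(inflow)            # population variance (ddof=0)
    var_out = np.var(orders_upstream)
    return 0.0 if var_in < eps else float(var_out / var_in)

# Example: amplified ordering relative to a fairly smooth inflow.
bw = bullwhip([30, 10, 45, 5, 50, 12], [22, 20, 24, 19, 23, 21])
```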

10. Validation Strategy

Validation is split into two test families because they answer two different questions. The first asks whether the simulation mechanics are internally consistent under controlled worlds. The second asks whether classical policy formulas move in the expected economic direction when inputs change.

Model sanity tests check execution integrity and dynamic behaviour, while policy-theory tests check the directional correctness of classical policy sizing. In short, sanity tests catch bugs and impossible model behaviour, and theory tests confirm that the policy formulas move in the expected economic direction when inputs change.

11. Current technical limits and next upgrades

Current limitations, beyond modelling assumptions (which depend on the specific supply chain context), are mainly technical within the scope of this project.

The next upgrades are targeted at those gaps.
