Stochastic Multi-Echelon Planning Engine

By Nabil El Bachiri

Most planning failures are not caused by one bad forecast; they come from delayed response across a constrained multi-echelon system. This engine models that system end-to-end and lets policy decisions be tested before execution, including reorder behaviour, backlog recovery, and capacity pressure. The output is decision-ready case-study data generated under fixed assumptions, so outcomes are measurable, reproducible, and directly linked to planning choices.

How the scenarios are run

Scenarios run weekly across Retailer, Distributor, Manufacturer, and Supplier nodes, using one shared event-driven timeline where demand, orders, shipments, and planning decisions are processed in sequence. Policies are evaluated under the same demand paths and system conditions, so outcome differences come from decisions rather than changing assumptions.

What the model actually captures

What behaviour emerges

How this feeds the case studies

Technical Deep Dive

0. Modelling Scope and Unit of Analysis

Before describing the runtime loop, it is important to define the model's scope. A node corresponds to one planning stage in the network: Retailer, Distributor, Manufacturer, or Supplier. The engine tracks state at node-SKU-week level, rather than only at aggregate network level. In practice, this means the simulation evolves many node-SKU states in parallel within a single chronological event timeline.

Figure: network topology and simulation granularity. Orders and signals move upstream across Retailer, Distributor, Manufacturer, and Supplier, while shipments move downstream; execution runs on one global event timeline ordered by (week, priority, sequence), state is tracked in one node-SKU object per pair, and every node and SKU is scheduled in each weekly planning cycle.

The event loop therefore does not run independently per node. There is one global queue, and each event references a specific node and SKU; the corresponding state is then updated.

1. Code Architecture and Execution Flow

At runtime, the simulation is easier to read as a two-stage process: an initialization stage executed once, followed by a weekly event-processing loop repeated until the time horizon ends. This two-stage structure governs the whole engine.

Figure: runtime flow split into one-time initialization and a repeated weekly event loop. Initialization (executed once) loads the configuration (network, policies, costs), seeds the initial state (stock, pipeline, histories), generates demand paths for all SKUs over the full horizon, and builds the initial event queue. The weekly loop (repeated for t = 0, 1, ..., T-1) pops the next event from the min-heap keyed by (t, p, s), applies the transition (arrivals, demand, planning), updates state and metrics (stock, backlog, cost signals), and pushes future events (arrivals, orders), continuing while the queue is not empty and t < T.

Diagram Walkthrough

Each rectangle corresponds to a concrete operation in the engine. Initialization prepares the world once, then the loop evolves that world event by event.

Initialization Stage (Executed Once)

Weekly Loop Stage (Repeated Until End)

The loop stops when the queue is empty or the configured horizon is reached.

\[ \text{event key} = (t, p, s),\quad \text{processed in lexicographic order} \]

The queue uses Python's heapq (min-heap), which gives deterministic ordering and efficient retrieval.
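As a minimal sketch of that queue (class and field names are illustrative, not the engine's actual API), events can be pushed onto a heapq with a (week, priority, sequence) key so they pop in deterministic order:

```python
import heapq
import itertools

class EventQueue:
    """Global event queue ordered by (week, priority, sequence)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that keeps ordering deterministic

    def push(self, week, priority, payload):
        # Events with equal (week, priority) keep insertion order thanks to the sequence number.
        heapq.heappush(self._heap, (week, priority, next(self._seq), payload))

    def pop(self):
        week, priority, _, payload = heapq.heappop(self._heap)
        return week, priority, payload

    def __bool__(self):
        return bool(self._heap)

# Usage: schedule an arrival and a planning event for the same week, then process in order.
queue = EventQueue()
queue.push(week=3, priority=1, payload={"type": "arrival", "node": "Distributor", "sku": "SKU-1"})
queue.push(week=3, priority=2, payload={"type": "planning", "node": "Distributor", "sku": "SKU-1"})
while queue:
    week, priority, event = queue.pop()
    print(week, priority, event["type"])
```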

Additional libraries used are numpy for stochastic processes and vector operations, and statistics.NormalDist for inverse-normal service factors in classical policy parameterization.

2. Demand Model

In this model, customer demand is generated at the retailer node. Upstream nodes do not observe customer sales directly; they observe demand through orders and forecast signals passed upstream. Material then moves back downstream as shipments once orders are released and lead times elapse. This is the core information/material split in the simulator.

Figure: information travels upstream; material travels downstream. Orders and forecast signals move upstream from Retailer towards Supplier, shipments move downstream after lead time, and only the retailer sees customer demand directly; upstream nodes infer demand from incoming orders and signals.

A week works as follows. Customer demand is posted at retailer level first. The retailer updates inventory and backlog, then places replenishment orders to the distributor based on its policy state. The distributor does the same one level up, followed by manufacturer and supplier. Released orders are scheduled as future arrivals, so physical stock appears downstream only after transport and processing delays. This is how the model captures both propagation and lag.

Demand is decomposed because one mechanism cannot represent the mix of behaviours seen in planning portfolios. The base layer captures ordinary weekly variation around a profile-specific mean and coefficient of variation. Weekly draws are rounded and truncated at zero so demand cannot become negative (for example, a random draw of \(-3\) units is converted to 0):

\[ D_t = \max\left(0,\; \left\lfloor \mathcal{N}(\mu_t,\sigma_t^2) \right\rceil\right), \qquad \sigma_t = \mu_t \cdot cv \]
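A minimal sketch of this base layer with numpy, assuming a constant profile mean over the horizon (function and parameter names are illustrative):

```python
import numpy as np

def base_demand_path(mu, cv, horizon, seed=0):
    """Weekly base demand: normal draws, rounded to integers, truncated at zero."""
    rng = np.random.default_rng(seed)
    sigma = mu * cv
    draws = rng.normal(loc=mu, scale=sigma, size=horizon)
    return np.maximum(0, np.rint(draws)).astype(int)

# Example: mu = 20 units/week, cv = 0.35, one-year horizon.
path = base_demand_path(mu=20, cv=0.35, horizon=52, seed=42)
```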

Seasonal and intermittent behaviour are then applied as separate layers, because they represent different realities. Seasonality introduces recurrent amplitude shifts, while intermittency governs whether demand occurs at all in a given week. Finally, family-level and SKU-level shock processes add temporary regime shifts. This structure preserves local noise while still producing non-stationary episodes that drive service risk, replenishment pressure, and inventory exposure.

Figure: the model is built as a pipeline - realistic demand, then operational forecast, then policy decision. Demand generation combines baseline variation by SKU profile with a seasonality layer, an intermittency gate, and family-level and SKU-level shocks, producing a weekly demand path per SKU for the full horizon. Forecast publication applies a smoothed baseline for regular SKUs, an occurrence-size split for intermittent SKUs, frozen near-term buckets, and more flexibility further in the horizon, producing the planning forecast used as a policy input, not as truth. The replenishment signal blends the short-term and baseline signals, stabilises the response, adds a backlog pressure term, and computes the target stock and order cap, producing the planned order quantity that feeds future queue events.

In plain terms, the simulator first creates demand behaviour, then applies forecast rules, and then executes replenishment logic. Keeping those blocks separate makes it possible to explain outcomes with a clean causal chain instead of mixing all effects in one opaque signal.

3. Forecast Model

Forecasting is modelled as a planning process, not as a perfect prediction layer. For each node-SKU pair, the engine updates forecast state weekly and then publishes the version used by replenishment logic. At retailer level, updates are anchored to realised customer demand; upstream, updates are driven by the order signal received from downstream nodes. This keeps each echelon consistent with the information it would realistically observe.

For non-intermittent SKUs, the baseline signal is generated through exponential smoothing:

\[ L_t = \alpha D_t + (1-\alpha)L_{t-1},\qquad \alpha=0.3 \]

Intermittent SKUs are handled differently: occurrence and size are separated so sparse demand is not misread as small continuous demand. After the raw update, the engine applies publication rules by horizon. Near-term buckets are more constrained, while later buckets are more flexible. This reflects how weekly replanning usually works in practice: teams can adjust plans every cycle, but near-term changes are still limited by execution commitments and operational frictions.
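As a minimal sketch of these update rules (names are illustrative; the occurrence-size split is shown in a simple Croston-style form, which is one common way to keep occurrence and size separate):

```python
ALPHA = 0.3  # smoothing constant from the baseline model above

def update_regular(level, observed):
    """Exponential smoothing of the baseline level for non-intermittent SKUs."""
    return ALPHA * observed + (1 - ALPHA) * level

def update_intermittent(size_level, interval_level, observed, weeks_since_demand):
    """Smooth demand size and inter-demand interval separately,
    and only in weeks where demand actually occurs."""
    if observed > 0:
        size_level = ALPHA * observed + (1 - ALPHA) * size_level
        interval_level = ALPHA * weeks_since_demand + (1 - ALPHA) * interval_level
    # Expected demand per week = size / interval; zero-demand weeks leave the state unchanged.
    forecast = size_level / max(interval_level, 1.0)
    return size_level, interval_level, forecast
```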

4. Dynamic Signal Construction and Base-Stock Logic

This is the decision layer used each week at each node-SKU pair. The node takes its current forecast signal, inventory position, and backlog state, then computes a replenishment order to its upstream node. In practice, the logic must react to pressure without creating unstable order swings. For that reason, short-term movement is blended with baseline level, and backlog enters the signal through a sublinear term:

\[ C(B)=\rho\sqrt{B} \]

After the signal step, the policy computes a target stock level, compares it to inventory position, and determines the replenishment quantity released upstream (here \(S\) is the stabilised demand signal, \(W\) the coverage in weeks, \(SS\) the safety stock, and \(IP\) the inventory position):

\[ T = S\cdot W + SS,\quad q=\max(0, T-IP) \]

Once \(q\) is released, it becomes incoming demand for the upstream node in the same planning cycle and a future inbound shipment for the downstream node after lead-time delay. This is where information propagation and material propagation connect. The core trade-off remains the same: recover service where pressure is real, but avoid unstable week-to-week order swings.
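A minimal sketch of this weekly decision step, assuming illustrative blending weights and following the symbols above (\(S\) signal, \(W\) coverage, \(SS\) safety stock, \(IP\) inventory position):

```python
import math

def weekly_order(short_term, baseline, backlog, weeks_cover, safety_stock,
                 inventory_position, rho=0.5, blend=0.3):
    """Blend short-term and baseline signals, add sublinear backlog pressure,
    then apply the base-stock rule T = S*W + SS, q = max(0, T - IP)."""
    signal = blend * short_term + (1 - blend) * baseline        # stabilised signal S
    signal += rho * math.sqrt(max(backlog, 0))                   # backlog pressure C(B)
    target = signal * weeks_cover + safety_stock                 # target stock T
    return max(0, round(target - inventory_position))            # released quantity q

# Example: modest backlog pressure lifts the order above the pure forecast level.
q = weekly_order(short_term=26, baseline=20, backlog=16, weeks_cover=3,
                 safety_stock=12, inventory_position=45)
```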

5. Classical Policies and Parameterization

At policy level, the simulator currently supports four replenishment approaches: a dynamic base-stock policy, plus three classical inventory policies \((sS, RQ, RS)\). They are all implemented inside the same simulation environment, so each policy is exposed to the same demand paths, backlog behaviour, lead-time uncertainty, and capacity constraints. This is important because it makes the comparison meaningful: differences in performance come from the replenishment logic itself, not from changes in the operating conditions.

The dynamic base-stock policy is intended as a structured approximation of how a planner would react in a weekly operating cycle. It does not assume perfect foresight. Instead, it responds to the current signal environment: recent demand or order movement, inventory already in the pipeline, backlog pressure, and forecast updates. In other words, it is meant to behave like a planner-facing control rule rather than like a pure textbook formula.

The other three policies provide more classical points of comparison: a reorder-point policy with an order-up-to level (sS), a reorder-point policy with a fixed order quantity (RQ), and a periodic-review order-up-to policy (RS).

The dynamic policy is calibrated directly from the signal-building logic described in the previous section, so its behaviour is driven by observed operating conditions inside the simulation. The classical policies, by contrast, are parameterized from estimated demand statistics rather than from hidden true parameters. This is deliberate: they are meant to reflect the kind of imperfect planning view a real business would use when sizing reorder points and safety stock.

\[ \mu^{est}=\mu^{true}\cdot b_\mu,\quad b_\mu\sim\mathcal{U}(0.97,1.03), \qquad cv^{est}=cv^{true}\cdot b_{cv},\quad b_{cv}\sim\mathcal{U}(0.90,1.10) \]

From there, the classical policies are sized using standard inventory-theory components. Service level is mapped into a normal safety factor, and safety stock scales with lead-time exposure, variability, and SKU pattern sensitivity:

\[ z = \Phi^{-1}(\tau), \qquad SS(L)= z\,\kappa\,\sigma\sqrt{L} \]

Fixed-lot logic also uses an EOQ-style quantity so order size responds to ordering and holding cost rather than being inserted as an arbitrary constant:

\[ EOQ = \sqrt{\frac{2DK}{H}} \]
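A minimal sketch of this parameterization using numpy and statistics.NormalDist; the pattern-sensitivity factor \(\kappa\) and the example values are illustrative assumptions:

```python
import math
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(7)

def estimated_params(mu_true, cv_true):
    """Planner's imperfect view: true parameters distorted by small uniform biases."""
    mu_est = mu_true * rng.uniform(0.97, 1.03)
    cv_est = cv_true * rng.uniform(0.90, 1.10)
    return mu_est, cv_est

def safety_stock(service_level, mu_est, cv_est, lead_time, kappa=1.0):
    """SS(L) = z * kappa * sigma * sqrt(L), with z = Phi^{-1}(tau)."""
    z = NormalDist().inv_cdf(service_level)
    sigma = mu_est * cv_est
    return z * kappa * sigma * math.sqrt(lead_time)

def eoq(period_demand, order_cost, holding_cost):
    """Economic order quantity used by the fixed-lot (RQ) policy."""
    return math.sqrt(2 * period_demand * order_cost / holding_cost)
```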

The policy definitions are therefore explicit:

\[ \text{Dynamic base-stock: } T = S\cdot W + SS,\qquad q=\max(0,T-IP) \]
\[ \text{sS: } ROP = \mu L + SS(L),\qquad S = \mu(L+R)+SS(L+R) \]
\[ \text{RQ: } ROP = \mu L + SS(L),\qquad Q = \max(Q_{min}, EOQ) \]
\[ \text{RS: order every }R\text{ weeks to }S=\mu(L+R)+SS(L+R) \]
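As a compact sketch, the four decision rules can be read side by side; the function names and the review-week convention are assumptions, and the sizing inputs (safety stocks, EOQ) are taken as given from the step above:

```python
def order_dynamic(signal, weeks_cover, ss, ip):
    """Dynamic base-stock: order up to T = S*W + SS."""
    return max(0, signal * weeks_cover + ss - ip)

def order_sS(mu, lead, review, ss_L, ss_LR, ip):
    """sS: when IP falls to the reorder point, order up to S."""
    rop = mu * lead + ss_L
    order_up_to = mu * (lead + review) + ss_LR
    return max(0, order_up_to - ip) if ip <= rop else 0

def order_RQ(mu, lead, ss_L, q_min, eoq_qty, ip):
    """RQ: when IP falls to the reorder point, release a fixed lot."""
    rop = mu * lead + ss_L
    return max(q_min, eoq_qty) if ip <= rop else 0

def order_RS(mu, lead, review, ss_LR, ip, week):
    """RS: order only on review weeks, up to S covering lead time plus review period."""
    if week % review != 0:
        return 0
    order_up_to = mu * (lead + review) + ss_LR
    return max(0, order_up_to - ip)
```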

Taken together, these four policies let the simulator ask a practical question: when does a planner-like adaptive rule outperform simpler reorder logic, and when are the simpler rules already good enough? Because all four live inside the same environment, that trade-off can be evaluated under one consistent set of operational frictions rather than as an abstract policy exercise.

6. Backlog, Cancellation, and Obsolescence

Backlog handling is applied in each weekly planning cycle, after the model first tries to serve demand with available stock and capacity. Any remaining unserved demand is then aged, and cancellation logic is applied bucket by bucket.

The model uses two different regimes on purpose: Retailer uses a deterministic hard-window rule, while Distributor, Manufacturer, and Supplier use probabilistic ageing. This split matches the business meaning in the simulator: downstream unmet demand is treated as customer loss if it exceeds the service window, whereas upstream unmet orders behave like cancellable internal demand.

\[ \text{Hard-window: }\quad \text{cancel if } a > w \]

Upstream nodes use a hazard-style cancellation process after a grace period:

\[ h(a)=\lambda_{0}\,\kappa_{pattern},\qquad P_{cancel}(a)=\min\left(0.95,\;1-e^{-h(a)\cdot(a-g)}\right),\quad a>g \]
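A minimal sketch of both regimes, with illustrative parameter values:

```python
import math

def retailer_cancel(age_weeks, window_weeks):
    """Hard-window rule at the Retailer: cancel (lost sale) once age exceeds the window."""
    return age_weeks > window_weeks

def upstream_cancel_prob(age_weeks, grace_weeks, base_hazard, pattern_factor):
    """Hazard-style cancellation after a grace period, capped at 0.95."""
    if age_weeks <= grace_weeks:
        return 0.0
    hazard = base_hazard * pattern_factor
    return min(0.95, 1.0 - math.exp(-hazard * (age_weeks - grace_weeks)))

# Example: a 5-week-old upstream backlog bucket, 2-week grace, base hazard 0.15.
p = upstream_cancel_prob(age_weeks=5, grace_weeks=2, base_hazard=0.15, pattern_factor=1.2)
```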

Lost sales and cancellations are recorded differently by node. At Retailer, cancelled backlog is counted as lost sales. Upstream, cancelled backlog is treated as cancelled internal demand and tracked as potential excess-risk volume. If that risk remains and inventory stays above target, it is progressively recognised as excess stock and can later become obsolete stock after the configured ageing threshold.

7. Capacity Calibration and Allocation

Capacity is explicitly constrained at upstream stages (Manufacturer and Supplier), not across every node. Weekly capacity is calibrated by product family from expected portfolio demand, adjusted family load, target utilization, and an additional variability buffer.

\[ Cap_{f} = \left\lfloor \frac{\mu^{exp}_{f}\cdot m_f}{u^*}\,(1+\beta) \right\rceil \]

Within each week and family, capacity is then allocated across SKUs by service pressure, not by simple FIFO. The pressure score combines immediately serviceable backlog, backlog age, and due-now urgency:

\[ \pi_i = q^{srv}_i\left(1+0.25\bar{a}_i\right)+0.5q^{due}_i, \qquad share_i = Cap_f\frac{\pi_i}{\sum_j \pi_j} \]

Implementation-wise, allocation is recalculated once per family at the start of each week, rounded down to integers, and any remaining units are reassigned by largest remainder to avoid systematic bias. If a SKU does not consume its reserved share, unused units are released back into the common family pool so later-planned SKUs in the same week can still use them.
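A minimal sketch of this allocation step for one family, assuming each SKU carries its immediately serviceable backlog, mean backlog age, and due-now quantity (field names are illustrative):

```python
def allocate_family_capacity(cap_f, skus):
    """Allocate weekly family capacity across SKUs by pressure score,
    using largest-remainder rounding to avoid systematic bias."""
    pressure = [s["q_srv"] * (1 + 0.25 * s["age_mean"]) + 0.5 * s["q_due"] for s in skus]
    total = sum(pressure)
    if total == 0:
        return [0] * len(skus)
    raw = [cap_f * p / total for p in pressure]
    shares = [int(r) for r in raw]                      # round down first
    leftover = cap_f - sum(shares)
    # Hand the remaining units to the largest fractional remainders.
    order = sorted(range(len(skus)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares

# Example: 100 units of family capacity over three SKUs under different pressure.
shares = allocate_family_capacity(100, [
    {"q_srv": 40, "age_mean": 2.0, "q_due": 10},
    {"q_srv": 15, "age_mean": 0.5, "q_due": 0},
    {"q_srv": 60, "age_mean": 1.0, "q_due": 30},
])
```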

8. Transport and Lead-Time Realism

Shipment timing is modelled as the nominal lead time plus a stochastic transport delay. For each shipment, the engine samples delay weeks from a lane-specific discrete distribution, then schedules the arrival event at the resulting ETA. This is applied to both internal node-to-node shipments and supplier-source replenishment shipments.

\[ \Delta \sim \text{Categorical}(p_0,p_1,\dots,p_k), \qquad ETA = t_{ship} + L_{nominal} + \Delta \]

Here, \(t_{ship}\) is ship week, \(L_{nominal}\) is configured lane lead time, and \(\Delta\) is sampled transport delay in weeks. Because the delay distribution is lane-specific, the model can represent asymmetric delay risk (for example many on-time arrivals with a long late tail) without changing nominal lead-time settings.
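A minimal sketch of this sampling step with numpy; the example lane distribution is an assumption chosen to show a long late tail:

```python
import numpy as np

rng = np.random.default_rng(11)

def sample_eta(ship_week, lead_nominal, delay_weeks, delay_probs):
    """ETA = ship week + nominal lane lead time + sampled transport delay."""
    delta = rng.choice(delay_weeks, p=delay_probs)
    return ship_week + lead_nominal + int(delta)

# Example lane: mostly on time, occasionally 1-3 weeks late.
eta = sample_eta(ship_week=10, lead_nominal=2,
                 delay_weeks=[0, 1, 2, 3], delay_probs=[0.75, 0.15, 0.07, 0.03])
```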

Operationally, this is important because replenishment decisions can be correct yet still arrive late due to transport variability.

9. Cost Model and KPI Layer

The cost layer is computed from weekly planning snapshots at node-SKU level, using node-specific economic parameters. Costs are split so each movement can be traced to an operational cause rather than hidden inside one aggregate value.

\[ C_t = C_t^{hold}+C_t^{backlog}+C_t^{lost}+C_t^{cancel}+C_t^{obs}+C_t^{order}+C_t^{trans} \]
\[ C_t^{hold}=OH_t\,c_h, \quad C_t^{backlog}=B_t\,c_b, \quad C_t^{lost}=LS_t\,c_{ls}, \quad C_t^{cancel}=CN_t\,c_{cn} \]
\[ C_t^{obs}=OB_t\,c_{ob}, \quad C_t^{order}=\mathbf{1}_{q_t>0}K, \quad C_t^{trans}=SH_t\,c_{tr} \]

Bullwhip is reported at node-family level as variance amplification between weekly ordered volume and weekly incoming demand (population variance over the reporting window):

\[ BW = \frac{\mathrm{Var}(Orders_{upstream})}{\mathrm{Var}(Inflow)} \]

To keep the metric stable, bullwhip is set to 0 when inflow variance is effectively zero in the reporting horizon.
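A minimal sketch of the bullwhip metric with the zero-variance guard (the tolerance value is an assumption):

```python
import numpy as np

def bullwhip(orders_upstream, inflow, eps=1e-9):
    """Variance amplification of upstream orders relative to incoming demand.
    Returns 0 when inflow variance is effectively zero in the reporting window."""
    var_in = np.var(inflow)            # population variance (ddof=0)
    var_out = np.var(orders_upstream)
    return 0.0 if var_in < eps else float(var_out / var_in)

# Example: amplified ordering relative to a fairly smooth inflow.
bw = bullwhip([30, 10, 45, 5, 50, 12], [22, 20, 24, 19, 23, 21])
```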

10. Validation Strategy

Validation is split into two test families because they answer two different questions. The first asks whether the simulation mechanics are internally consistent under controlled worlds. The second asks whether classical policy formulas move in the expected economic direction when inputs change.

Model sanity tests check execution integrity and dynamic behaviour, while policy-theory tests check the directional correctness of classical policy sizing. In short, sanity tests catch bugs and impossible model behaviour, and theory tests confirm that the policy formulas move in the expected economic direction when inputs change.

11. Current technical limits and next upgrades

Current limitations, beyond modelling assumptions (which depend on the specific supply chain context), are mainly technical within the scope of this project.

The next upgrades are targeted at those gaps.
