Last active 1 month ago


tire.kicker revised this gist 1 month ago

8 files changed, 2516 insertions

AGENTS.md(file created)

@@ -0,0 +1,57 @@
1 + # Python Environment And Package Policy
2 +
3 + Use `uv` for all Python workflows in this repository.
4 +
5 + ## Rules
6 + - Always run Python commands with `uv run`.
7 + - Always add dependencies with `uv add`.
8 + - Use `uv venv` for virtual environment setup/management.
9 + - Do not use `pip`, `pip3`, `python -m pip`, `virtualenv`, or `python -m venv`.
10 + - Do not install dependencies outside `uv`.
11 +
12 + ## Examples
13 + - Run app: `uv run python main.py`
14 + - Add runtime dependency: `uv add requests`
15 + - Add dev dependency: `uv add --dev pytest`
16 +
17 + # Test Coverage Policy
18 +
19 + Use `pytest-cov` via `uv` for coverage checks.
20 +
21 + ## Rules
22 + - If coverage flags are needed and `pytest-cov` is missing, install it with `uv add --dev pytest-cov`.
23 + - For the workspace package, install coverage tooling with `uv add --package geomrf-engine --dev pytest-cov`.
24 + - Run coverage with `uv run pytest ... --cov=...`.
25 + - Because both root and `geomrf-engine` use a `tests` package name, run coverage in two passes and combine reports instead of a single mixed pytest invocation.
26 +
27 + ## Coverage command pattern
28 + - Orchestrator pass: `COVERAGE_FILE=.coverage.orch uv run pytest tests --cov=satsim_orch --cov-report=`
29 + - Geo engine pass: `COVERAGE_FILE=.coverage.geomrf uv run --package geomrf-engine pytest geomrf-engine/tests --cov=geomrf_engine --cov-report=`
30 + - Combine/report: `uv run coverage combine .coverage.orch .coverage.geomrf && uv run coverage report -m`
31 +
32 + # OMNeT++/INET Environment Policy
33 +
34 + Use `opp_env` for OMNeT++/INET install and environment management in this repository.
35 +
36 + ## Rules
37 + - Use `opp_env` to install/manage OMNeT++ and INET versions.
38 + - Do not rely on ad-hoc/manual OMNeT++ or INET installs for project workflows.
39 + - Keep OMNeT++/INET version selection pinned and reproducible across dev and CI.
40 +
41 + ## Scope boundary
42 + - Python dependency and execution workflows remain `uv`-managed.
43 + - OMNeT++/INET toolchain workflows are `opp_env`-managed.
44 +
45 + # Legacy Code Reuse Policy
46 +
47 + If functionality from `old_code/` is needed:
48 + - Do not import or execute from `old_code/` directly.
49 + - Copy the required snippet(s) into a new file under the active codebase.
50 + - Adapt and maintain the copied code in the active module only.
51 +
52 + # Sandbox / Network Restriction Policy
53 +
54 + If a task is blocked by sandbox or network restrictions:
55 + - Stop immediately and do not spend tokens repeatedly trying to bypass restrictions.
56 + - Clearly tell the user that we are in a sandbox-restricted environment.
57 + - Ask the user for permission before attempting any sandbox breakout or elevated access.

ARCHITECTURE.md(file created)

@@ -0,0 +1,342 @@
1 +
2 + ```markdown
3 + # ARCHITECTURE.md — SatSim System Overview
4 +
5 + This document gives the **system-level architecture** for SatSim. It is intended to provide a complete “sight picture” for anyone implementing a subproject (e.g., the Geometry/RF Engine) so they understand how their component fits into the larger simulator.
6 +
7 + ---
8 +
9 + ## 1) Purpose and guiding idea
10 +
11 + SatSim is a **hybrid satellite networking simulator** that combines:
12 +
13 + 1) A swappable **Geometry/RF/Link-Budget Engine** (physics + propagation + link feasibility)
14 + 2) A **packet-level discrete-event simulation lane** (OMNeT++/INET) for scale + protocol behavior
15 + 3) A **real SDN emulation lane** (Mininet/OVS) for controller-in-the-loop + real Linux networking
16 + 4) An **Orchestrator** that provides a single scenario/timebase and keeps all parts consistent
17 +
18 + The fundamental design choice is that SatSim is **layered and composable**: we reuse mature simulators/emulators and treat satellite physics as an external service with a stable interface.
19 +
20 + ---
21 +
22 + ## 2) Key design decisions (why this looks the way it does)
23 +
24 + ### 2.1 Why two lanes (simulation vs emulation)
25 + We intentionally run two different lanes because they answer different questions:
26 +
27 + - **OMNeT++/INET lane (Discrete-Event Simulation)**
28 + - Best for: scaling up to many nodes, protocol studies, routing and congestion behavior, reproducibility.
29 + - Not best for: running real SDN controllers and real Linux TCP stacks.
30 +
31 + - **Mininet/OVS lane (Network Emulation)**
32 + - Best for: real SDN controllers (ONOS/Ryu), real forwarding behavior (OpenFlow/OVS), real apps/traffic tools.
33 + - Not best for: scaling to thousands of nodes with full protocol stacks.
34 +
35 + Trying to “pipe packets” between them is possible but usually not worth it early, because it introduces hard time synchronization problems (DES time vs wall-clock time) and packet bridging complexity. Instead we connect both lanes to the same **state oracle** (the Geo/RF engine) through the Orchestrator.
36 +
37 + ### 2.2 Where the lanes *do* meet today
38 + They meet at:
39 + - **Scenario definition** (same nodes, same constraints, same time window)
40 + - **LinkState/Event timeline** (same “truth” about which links exist and their properties)
41 + - **Metrics and artifacts** (comparable outputs; shared logging/PCAP strategy)
42 +
43 + Optionally, they also meet via:
44 + - **Shared SDN decision logic** (same ONOS/Ryu app used to compute routes, then applied in both lanes through adapters)
45 +
46 + ### 2.3 Future “stacking” (OMNeT++ feeding Mininet)
47 + In the future, OMNeT++ may “feed” Mininet in two practical ways:
48 +
49 + 1) **Trace-driven replay (recommended future path)**
50 + - OMNeT++ generates a curated set of traces (topology/failure schedules, traffic demands, baseline routing decisions).
51 + - Mininet replays those traces in real-time to validate controller behavior under identical conditions.
52 +
53 + 2) **Hard co-simulation / packet bridging (advanced, optional)**
54 + - Some nodes simulated in OMNeT++, others emulated in Mininet at the same time.
55 + - Requires strict time coupling and a gateway that transforms/timeshifts packets.
56 + - Not a v1 target.
57 +
58 + ### 2.4 Locked decisions (2026-02-18)
59 + - **Tick authority:** `StreamLinkDeltas` is the control-plane source of truth for lane updates.
60 + - **Events contract:** event streaming is retained, but aligned to the same requested `dt` and `selector`, and each event carries `tick_index`.
61 + - **Orchestrator behavior:** streaming-driven execution is canonical; any scheduler is pacing-only.
62 + - **Scenario translation:** orchestrator must fail-fast when it cannot produce a valid Geo/RF `ScenarioSpec`.
63 + - **Python tooling:** Python workflows use `uv`; OMNeT++/INET workflows use `opp_env`.
64 +
65 + ---
66 +
67 + ## 3) Top-level components
68 +
69 + ### 3.1 Geometry/RF/Link-Budget Engine (black box, replaceable)
70 + **Role:** The authoritative “physics layer” that translates orbital/propagation reality into network-usable link state.
71 +
72 + **Key properties**
73 + - Replaceable implementation (Skyfield + ITU-R today, could be STK import or other later)
74 + - Stable interface (the rest of SatSim depends only on its API)
75 + - Produces time-indexed:
76 + - Link feasibility (up/down)
77 + - Link properties (delay/capacity/loss proxies)
78 + - Discrete events (link up/down, handover, failures if modeled)
79 +
80 + **Location:** `geomrf-engine/`
81 +
82 + ---
83 +
84 + ### 3.2 Orchestrator (system conductor)
85 + **Role:** Owns the simulation lifecycle and timebase. It is the “brain” that coordinates all lanes.
86 +
87 + **Responsibilities**
88 + - Load scenario config → create/initialize the Geo/RF engine scenario
89 + - Choose execution mode:
90 + - OMNeT-only, Mininet-only, or both in parallel
91 + - Drive execution pacing:
92 + - offline apply-fast or real-time apply-paced, while consuming authoritative engine stream ticks
93 + - Consume Geo/RF LinkState stream and distribute it to:
94 + - OMNeT adapter
95 + - Mininet adapter
96 + - logging/metrics
97 + - Collect artifacts (PCAPs, timeseries metrics, configs, run manifests)
98 + - Provide reproducible run IDs and version stamping
99 +
100 + **Location:** `orchestrator/`
101 +
102 + ---
103 +
104 + ### 3.3 OMNeT++/INET Lane (packet-level discrete-event)
105 + **Role:** Packet-level simulation of protocols, queuing, routing, traffic at scale.
106 +
107 + **Responsibilities**
108 + - Build the network node models (routers, hosts, queues) using INET components
109 + - Apply dynamic link updates (delay/capacity/loss/up-down) based on Geo/RF output
110 + - Run deterministic experiments rapidly (sweeps)
111 + - Export artifacts:
112 + - logs + metrics
113 + - optional PCAP outputs (where supported)
114 +
115 + **Custom SatSim additions**
116 + - A lightweight **LinkState Adapter Module** that subscribes to orchestrator/Geo output
117 + - A mechanism to apply link changes at simulation timestamps
118 +
119 + **Environment and install management**
120 + - Use `opp_env` as the standard way to install/manage OMNeT++ and INET.
121 + - Avoid ad-hoc/manual OMNeT++/INET installs in project workflows.
122 +
123 + **Location:** `lanes/omnet/`
124 +
125 + ---
126 +
127 + ### 3.4 Mininet/OVS Lane (SDN emulation)
128 + **Role:** Real SDN controller + real forwarding plane under dynamic link conditions.
129 +
130 + **Responsibilities**
131 + - Build an emulated topology with Mininet (or Containernet)
132 + - Use OVS as the dataplane switch/router substrate
133 + - Run a real SDN controller (ONOS or Ryu)
134 + - Apply dynamic link shaping based on Geo/RF output:
135 + - `tc/netem` for delay/loss/jitter
136 + - `tbf/htb` for rate control
137 + - interface up/down to emulate link drops
138 + - Generate traffic using real tools:
139 + - iperf3, D-ITG, SIPp, tcpreplay, custom apps
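The shaping responsibilities above reduce to emitting `tc` commands per interface. A minimal sketch, assuming a `tbf` + nested `netem` layering and an illustrative Mininet interface name — not the adapter's actual API:

```python
# Hypothetical sketch: translate one link update into tc commands for a
# Mininet interface. Field set and qdisc layering are assumptions.
def tc_commands(iface: str, delay_s: float, capacity_bps: float, loss_rate: float) -> list[str]:
    delay_ms = delay_s * 1e3
    rate_kbit = max(int(capacity_bps / 1e3), 1)
    return [
        # rate control via token bucket filter at the root
        f"tc qdisc replace dev {iface} root handle 1: tbf rate {rate_kbit}kbit burst 32kbit latency 400ms",
        # delay/loss via netem, nested under the tbf qdisc
        f"tc qdisc replace dev {iface} parent 1:1 handle 10: netem delay {delay_ms:.1f}ms loss {loss_rate * 100:.2f}%",
    ]
```

For example, `tc_commands("s1-eth1", 0.012, 10e6, 0.01)` yields two commands the driver would run via the Mininet node's shell.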
140 +
141 + **Location:** `lanes/mininet/`
142 +
143 + ---
144 +
145 + ### 3.5 Observability, artifacts, and visualization
146 + **Role:** Make runs inspectable, comparable, and reproducible.
147 +
148 + **Artifacts**
149 + - Scenario config snapshot + run manifest (versions, seeds, git SHAs)
150 + - LinkState/Event traces (optional export)
151 + - Metrics time-series (throughput/delay/loss/path changes)
152 + - PCAP captures (Mininet tcpdump; OMNeT if enabled)
153 +
154 + **Tools**
155 + - Prometheus + Grafana for dashboards
156 + - Wireshark for PCAP analysis
157 +
158 + **Location:** `observability/` and `artifacts/`
159 +
160 + ---
161 +
162 + ## 4) System boundaries and data ownership
163 +
164 + ### 4.1 The Geo/RF engine owns *physics truth*
165 + - It is the source of truth for which links can exist and their physical/network properties.
166 + - Other components must not invent geometry/RF state; they only consume the engine’s output.
167 +
168 + ### 4.2 The Orchestrator owns *time and execution*
169 + - It defines run window requests, pacing mode, and synchronization rules.
170 + - For v1/v1.1, tick production comes from Geo/RF stream output rather than orchestrator-generated ticks.
171 + - It routes updates to the lanes and standardizes artifacts.
172 +
173 + ### 4.3 Each lane owns *packet/control behavior*
174 + - OMNeT owns packet-level behavior inside DES.
175 + - Mininet owns real SDN and Linux networking behavior.
176 +
177 + ---
178 +
179 + ## 5) Core data flows (end-to-end)
180 +
181 + ### 5.1 Initialization flow
182 + 1. User provides `ScenarioConfig` (YAML/JSON).
183 + 2. Orchestrator validates config and creates a new run ID.
184 + 3. Orchestrator calls Geo/RF engine:
185 + - `CreateScenario` (returns scenario ref)
186 + 4. Orchestrator initializes selected lane(s):
187 + - OMNeT: compile/load model, start run
188 + - Mininet: build topology, start controller
189 + 5. Orchestrator subscribes to Geo/RF streaming output for LinkState and Events.
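The five steps above can be sketched as a single driver function; the engine/lane objects and method names here are placeholders for illustration, not the orchestrator's real API:

```python
import uuid

# Sketch of the initialization flow; `engine` and `lanes` are stand-ins.
def initialize_run(config: dict, engine, lanes: list) -> str:
    run_id = str(uuid.uuid4())                     # 2. new run ID
    scenario_ref = engine.create_scenario(config)  # 3. CreateScenario
    for lane in lanes:                             # 4. init selected lane(s)
        lane.initialize(scenario_ref, run_id)
    engine.subscribe(scenario_ref)                 # 5. subscribe to LinkState/Events
    return run_id
```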
190 +
191 + ### 5.2 Runtime (parallel lane mode)
192 + At each time tick:
193 + 1. Geo/RF produces `LinkDeltaBatch` + optional events.
194 + 2. Orchestrator receives it and distributes:
195 + - OMNeT adapter: update channel/link state in simulator time
196 + - Mininet adapter: apply tc/netem shaping and link toggles
197 + - (optional) Event recorder: store aligned `EngineEvent` stream for analysis/observability
198 + - Observability: record metrics and store link traces
199 + 3. Lanes generate traffic and produce metrics/PCAPs.
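A minimal sketch of the per-tick fan-out, assuming a batch shape of `tick_index` plus a `deltas` list and adapters as plain callables (both are illustrative, not the real interfaces):

```python
# Sketch: distribute one LinkDeltaBatch to all registered adapters
# (OMNeT adapter, Mininet adapter, observability sinks, ...).
def distribute_tick(batch: dict, adapters: dict) -> dict:
    tick = batch["tick_index"]
    counts = {}
    for name, apply_fn in adapters.items():
        apply_fn(tick, batch["deltas"])   # e.g. OMNeT channel update, Mininet tc
        counts[name] = len(batch["deltas"])
    return counts
```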
200 +
201 + ### 5.3 Completion flow
202 + 1. Orchestrator stops lane processes.
203 + 2. Orchestrator closes the Geo/RF scenario.
204 + 3. All artifacts are written under the run ID.
205 +
206 + ---
207 +
208 + ## 6) Timebase and execution modes
209 +
210 + SatSim supports multiple execution modes, controlled by the Orchestrator:
211 +
212 + ### Mode A — OMNeT-only (offline DES)
213 + - Orchestrator consumes Geo/RF ticks and applies them to OMNeT without wall-clock pacing.
214 + - Highest scalability and repeatability.
215 +
216 + ### Mode B — Mininet-only (real-time emulation)
217 + - Orchestrator consumes Geo/RF ticks and applies wall-clock pacing while updating Mininet shaping.
218 + - Best for SDN/controller realism and app-level testing.
219 +
220 + ### Mode C — Parallel (OMNeT + Mininet simultaneously)
221 + - Both lanes consume the same LinkState stream.
222 + - Used to compare “simulated protocol outcomes” vs “real controller outcomes” under the same link dynamics.
223 +
224 + ### Mode D — Trace-driven replay (future/optional)
225 + - Geo/RF and/or OMNeT exports a trace.
226 + - Mininet replays trace deterministically.
227 +
228 + ---
229 +
230 + ## 7) Interfaces between components (high-level)
231 +
232 + ### 7.1 Geo/RF Engine interface (v1)
233 + - gRPC service, Protobuf messages
234 + - Scenario lifecycle + streaming link deltas/events
235 + - Output is **NetworkView** link properties (up/down, delay, capacity, loss proxy)
236 + - Optional debug scalars for validation (SNR margin, elevation, range)
237 + - Event-stream alignment target: events use same requested window/selector/dt semantics as deltas and expose `tick_index`.
238 +
239 + ### 7.2 Orchestrator ↔ OMNeT interface
240 + - OMNeT subscribes to orchestrator updates via:
241 + - gRPC client inside a C++ adapter module, OR
242 + - file/trace ingestion for offline runs
243 + - Applies updates to INET channel/link parameters and toggles connectivity
244 +
245 + ### 7.3 Orchestrator ↔ Mininet interface
246 + - Orchestrator controls Mininet via:
247 + - Python Mininet API calls
248 + - Linux `tc` and interface management commands
249 + - SDN controller is external (ONOS/Ryu), connected in the standard Mininet way
250 +
251 + ---
252 +
253 + ## 8) Reproducibility rules
254 +
255 + Each run must record:
256 + - ScenarioConfig snapshot
257 + - Seeds
258 + - Engine versions:
259 + - Geo/RF engine version + schema version
260 + - orchestrator version
261 + - OMNeT/INET versions and model git SHA
262 + - `opp_env` environment definition/metadata used for OMNeT/INET
263 + - controller version and app git SHA
264 + - LinkState trace hash (if stored)
265 + - Toolchain/container image tags (if containerized)
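A sketch of assembling such a run manifest; the field names are illustrative, not a fixed schema:

```python
import hashlib
import json

# Sketch: build a manifest.json body from the items the run must record.
def build_manifest(run_id: str, scenario_yaml: str, versions: dict, seeds: list) -> str:
    manifest = {
        "run_id": run_id,
        "seeds": seeds,
        "versions": versions,  # engine/orchestrator/OMNeT/controller versions etc.
        "scenario_sha256": hashlib.sha256(scenario_yaml.encode()).hexdigest(),
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```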
266 +
267 + ---
268 +
269 + ## 9) Suggested monorepo layout
270 +
271 +  ```
272 +
273 +  satsim/
274 +    ARCHITECTURE.md
275 +
276 +    orchestrator/
277 +      ...
278 +
279 +    subprojects/
280 +      geomrf-engine/
281 +        ARCHITECTURE.md   # subproject-specific (the streaming API spec lives here)
282 +        proto/
283 +        src/
284 +        tests/
285 +
286 +    lanes/
287 +      omnet/
288 +        models/
289 +        adapter/
290 +        scripts/
291 +
292 +      mininet/
293 +        topo/
294 +        driver/
295 +        controllers/
296 +        scripts/
297 +
298 +    observability/
299 +      grafana/
300 +      prometheus/
301 +      dashboards/
302 +
303 +    artifacts/
304 +      runs/
305 +        <run_id>/
306 +          scenario.yaml
307 +          manifest.json
308 +          linkstate.parquet   (optional)
309 +          metrics/
310 +          pcaps/
311 +          logs/
312 +
313 +  ```
316 +
317 + ---
318 +
319 + ## 10) What an implementer of the Geo/RF engine must know
320 +
321 + - The Geo/RF engine must be treated as **the physics oracle**.
322 + - Its output must be:
323 + - time-indexed
324 + - sparse (selector-driven)
325 + - stable and deterministic
326 + - expressed in consistent units
327 + - The Orchestrator will use it in both:
328 + - offline sampling (for OMNeT)
329 + - real-time streaming (for Mininet)
330 + - The lanes do not need to know how link budgets are computed—only how to consume the streaming LinkDelta/Event outputs.
331 +
332 + ---
333 +
334 + ## 11) Roadmap hooks (explicit future extensions)
335 +
336 + - Add a richer PHY view (optional fields) without breaking NetworkView consumers.
337 + - Add trace import/replay for deterministic Mininet runs.
338 + - Add “shared SDN decision interface” so ONOS/Ryu path computation can be applied inside OMNeT.
339 + - Add advanced co-simulation only if required (packet bridging).
340 +
341 + ---
342 + ```

How I AI.md (file created)

@@ -0,0 +1,63 @@
1 +
2 +
3 + # Smol Example:
4 +
5 + ### ChatGPT Web:
6 + _<After workshopping the specs/requirements for a while...>_
7 +
8 + > Write an extremely detailed implementation doc for the streaming version and the APIs. We are only looking at the geometry RF engine right now. We need to be able to hand the doc you make off to an LLM to code the geometry engine API up from the existing Python code. Make it so. Extremely detailed. Markdown checkboxes for each discrete task `- [ ]` so the AI can mark them off with an X as they complete.
9 +
10 +
11 + The codex-cli can then start working on the output `TASKS_GEO_ENGINE.md`.
12 +
13 +
14 + # Implementing component 3 (of 4):
15 +
16 +
17 + ### ChatGPT Web:
18 +
19 + I pasted in all the `TASKS_*.md` the AI coder has produced so far, so the web planner has an idea of what was done. (If we didn't have access restrictions, the web version could just access the git repo).
20 +
21 + > Okay, here are the markdown task files of what we've got done so far. We built the geo engine and orchestrator. <extraneous details omitted> I need you to write a very detailed TASKS_IMNET.md file that deals with the OMNeT++/INET part and how it interacts with the other two systems as they exist. Write it as a markdown checklist `- [ ]` so the AI can cross items off as it writes them. Go.
22 +
23 +
24 + This produces `TASKS_ORCHESTRATOR.md`.
25 +
26 +
27 + ### Codex CLI:
28 +
29 + > Check over TASKS_ORCHESTRATOR.md. It was written without knowledge of the actual current implementation. Let me know if it is consistent with ARCHITECTURE.md and if it is compatible with what has been done so far in TASKS_GEO_ENGINE.md. We have only tried to make a geo engine so far. No edits, just check that TASKS_ORCHESTRATOR.md makes sense.
31 +
32 +
33 + ```
34 + Overall verdict: it is mostly aligned with ARCHITECTURE.md and broadly compatible with the Geo engine work, but the 3 medium/high items above should be clarified before implementation so the orchestrator doesn’t encode wrong assumptions.
35 + ```
36 +
37 + This step is super important! I need to make sure the plan we thought of actually makes sense with what is happening on the ground.
38 +
39 + > Can the misaligned items be easily solved? Is there a mostly obvious right answer to them? If so, tell me the obvious solutions, or show me my options if there's a fork or hard choice.
40 +
41 + _<More workshopping of differences between web and implementation. The web version has the PDF source documents and better web search, and has unlimited usage, so I did planning there.>_
42 +
43 +
44 + ### Codex CLI:
45 +
46 + > @AGENTS.md Your task is to implement the geometry engine defined in @TASKS_GEO_ENGINE.md. After completing each task, mark off its markdown checkbox with an x (`- [x]`) so there is an external record of what has been done. If plans change, then modify the task list appropriately. The overall high-level architecture of the program is in @ARCHITECTURE.md. Go.
47 +
48 + This is where the magic happens! We have thought through our API and developed a test plan and a development plan. Now the AI can develop the code and exercise the API via the test suite to ensure it's correct.
49 +
50 + ---
51 +
52 + Overall Advice:
53 +
54 + - Think about your inputs/outputs/dependencies beforehand.
55 + - When the AI screws up, hallucinates, or does something silly, you can make a note in AGENTS.md to do the right thing instead.
56 + - Force the AI to use as many deterministic static tools as possible:
57 +    - Strict Type Checking (Use Rust instead of C, TypeScript instead of JavaScript, Python with Type Annotations instead of without)
58 + - Linters
59 + - Code Format Tools
60 + - Unit/Integration tests
61 + - Have the AI write as many tests as possible of what you want the program to do
62 +
63 + - If you want the AI to "one-shot" (i.e., autonomously code something complex for a while without supervision and get a good result), then you need to give it as much test input/output behaviour as possible, so it can keep checking against the "proper" results without your guidance.

README.md(file created)

@@ -0,0 +1,17 @@
1 + # SatSim docs index
2 +
3 + Primary design and task documents:
4 +
5 + - `ARCHITECTURE.md`
6 + - `TASKS_GEO_ENGINE.md`
7 + - `TASKS_IMNET.md`
8 + - `TASKS_ORCHESTRATOR.md`
9 + - `TASKS_TESTSUITE_GEOENGINE.md`
10 +
11 + Locked design decisions (2026-02-18):
12 +
13 + - Control-plane tick authority is `StreamLinkDeltas` (streaming-driven orchestration).
14 + - Event stream alignment is being standardized with request `dt`/`selector` and event `tick_index`.
15 + - Orchestrator error handling includes `NOT_FOUND`, `INVALID_ARGUMENT`, `FAILED_PRECONDITION`, and `RESOURCE_EXHAUSTED`.
16 + - Scenario translation to Geo/RF `ScenarioSpec` is fail-fast.
17 + - Python workflows are `uv`-managed; OMNeT++/INET workflows are `opp_env`-managed.

TASKS_GEO_ENGINE.md(file created)

@@ -0,0 +1,973 @@
1 + # Geometry/RF Engine v1/v1.1 Streaming API — Implementation Specification
2 +
3 + This document specifies a **complete, implementable** Geometry/RF Engine API and server for **streaming link-state deltas + events**. It is written so another LLM can generate the code from existing Python geometry/RF/link-budget code (Skyfield + ITU-R + your models) with minimal guesswork.
4 +
5 + Status note:
6 + - v1 baseline is implemented.
7 + - v1.1 alignment updates (for orchestrator compatibility) are now specified below, especially for `StreamEvents` tick alignment.
8 +
9 + ---
10 +
11 + ## 0) Deliverables
12 +
13 + ### What must exist at the end
14 +
15 + * A runnable **Python gRPC server** that implements:
16 +
17 + * Scenario creation/closure
18 + * Capabilities/version endpoints
19 + * **Streaming link deltas**
20 + * **Streaming events**
21 + * A Protobuf schema with:
22 +
23 + * Stable IDs, time semantics, units
24 + * Selector logic (which links/nodes to compute)
25 + * Delta semantics (what counts as a “change”)
26 + * A reference Python client demonstrating:
27 +
28 + * Create scenario → stream deltas/events → close scenario
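The client lifecycle can be sketched independently of the generated stubs. The stub method names below mirror the service defined later, but treat the exact call shapes as assumptions until the gRPC code exists:

```python
# Sketch: create scenario -> consume delta stream -> always close the scenario.
def run_client(stub, spec, max_batches: int = 10) -> int:
    ref = stub.CreateScenario(spec)
    batches = 0
    try:
        for batch in stub.StreamLinkDeltas(ref):
            batches += 1
            if batches >= max_batches:
                break
    finally:
        stub.CloseScenario(ref)   # release server-side scenario state
    return batches
```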
29 +
30 + ---
31 +
32 + ## 1) Core design constraints
33 +
34 + ### 1.1 Contract invariants
35 +
36 + * **Scenario-scoped**: All computation happens inside a `ScenarioRef`.
37 + * **Time-indexed**: All output is keyed by a timestamp and tick index.
38 + * **Selector-driven**: Never compute “all links” unless explicitly requested.
39 + * **Streaming-first**: Primary runtime interface is server→client stream.
40 + * **Deterministic**: Given identical inputs (scenario + seed + engine version), output is replayable.
41 +
42 + ### 1.2 What the engine outputs (NetworkView)
43 +
44 + For each directed link `(src → dst)` at each tick, the engine provides:
45 +
46 + * `up` (boolean)
47 + * `one_way_delay_s` (float; seconds)
48 + * `capacity_bps` (float; bits per second)
49 + * `loss_rate` (float; [0,1] packet loss proxy OR PER proxy)
50 + * optional debug scalar(s): `snr_margin_db`, `elevation_deg`, `range_m`
51 +
52 + **Everything else** can be exposed later via an optional “debug view”; v1 focuses on network-usable state.
53 +
54 + ---
55 +
56 + ## 2) Repository layout (recommended)
57 +
58 + ```
59 + geomrf-engine/
60 + proto/
61 + geomrf/v1/geomrf.proto
62 + src/geomrf_engine/
63 + __init__.py
64 + server.py
65 + config_schema.py
66 + scenario_store.py
67 + timebase.py
68 + selectors.py
69 + compute/
70 + __init__.py
71 + ephemeris.py
72 + geometry.py
73 + rf_models.py
74 + link_budget.py
75 + adaptation.py
76 + streaming/
77 + __init__.py
78 + delta.py
79 + events.py
80 + backpressure.py
81 + util/
82 + ids.py
83 + units.py
84 + logging.py
85 + metrics.py
86 + examples/
87 + client_stream.py
88 + tests/
89 + test_proto_roundtrip.py
90 + test_delta_thresholds.py
91 + test_selectors.py
92 + test_determinism.py
93 + ```
94 +
95 + ---
96 +
97 + ## 3) Implementation tasks checklist
98 +
99 + ### 3.1 Project & build system
100 +
101 + - [x] Create repo structure as above
102 + - [x] Add `pyproject.toml` with dependencies:
103 +
104 + - [x] `grpcio`, `grpcio-tools`, `protobuf`
105 + - [x] `pydantic` (scenario validation)
106 + - [x] `pyyaml` (YAML scenario input)
107 + - [x] `numpy`, `scipy` (if used)
108 + - [x] `skyfield`, `sgp4`
109 + - [x] your ITU-R package(s)
110 + - [x] `prometheus-client` (optional but recommended)
111 + - [x] Add a `Makefile` or task runner:
112 +
113 + - [x] `uv run python -m grpc_tools.protoc ...` compiles `.proto` to Python
114 + - [x] `uv run python -m geomrf_engine.server ...` starts server
115 + - [x] `uv run pytest` runs tests
116 +
117 + ### 3.2 Protobuf + gRPC schema
118 +
119 + - [x] Write `proto/geomrf/v1/geomrf.proto` (spec below)
120 + - [x] Generate Python stubs
121 + - [x] Add schema version constants and embed in responses
122 +
123 + ### 3.3 Server skeleton
124 +
125 + - [x] Implement async gRPC server (`grpc.aio`)
126 + - [x] Wire servicer methods:
127 +
128 + - [x] `GetVersion`
129 + - [x] `GetCapabilities`
130 + - [x] `CreateScenario`
131 + - [x] `CloseScenario`
132 + - [x] `StreamLinkDeltas`
133 + - [x] `StreamEvents`
134 + - [x] Add structured logging and request correlation IDs
135 +
136 + ### 3.4 Scenario lifecycle
137 +
138 + - [x] Implement scenario validation (Pydantic)
139 + - [x] Implement scenario store (in-memory for v1)
140 + - [x] Implement scenario ID generation (UUIDv4)
141 + - [x] Snapshot `ScenarioSpec` + resolved assets into a `ScenarioRuntime`
142 +
143 + ### 3.5 Compute pipeline
144 +
145 + - [x] Implement ephemeris loader (TLE list initially)
146 + - [x] Implement geometry evaluation (positions + visibility + elevation + range)
147 + - [x] Implement RF/link budget mapping to `NetworkLinkState`
148 + - [x] Implement adaptation mapping (SNR → capacity/loss) with a default policy
149 + - [x] Implement per-tick evaluation returning sparse link set
150 +
151 + ### 3.6 Streaming + deltas/events
152 +
153 + - [x] Implement tick loop (timebase)
154 + - [x] Implement delta computation with thresholds
155 + - [x] Implement event emission (link up/down, handover optional)
156 + - [x] Implement backpressure-safe streaming
157 + - [x] Add stream cancellation handling and cleanup
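The delta-computation task above amounts to per-field threshold suppression: emit a link only when a field moved by more than its threshold since the last emitted state. A sketch with illustrative default thresholds (the engine's configured values may differ):

```python
# Illustrative thresholds; up/down flips and new links always emit.
THRESHOLDS = {"one_way_delay_s": 1e-4, "capacity_bps": 1e5, "loss_rate": 1e-3}

def compute_delta(prev: dict, curr: dict) -> dict:
    changed = {}
    for link, state in curr.items():
        last = prev.get(link)
        if last is None or state["up"] != last["up"]:
            changed[link] = state          # new link or up/down flip
            continue
        for field, eps in THRESHOLDS.items():
            if abs(state[field] - last[field]) > eps:
                changed[link] = state      # moved beyond threshold
                break
    return changed
```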
158 +
159 + ### 3.7 Tests + examples
160 +
161 + - [x] Determinism test (same scenario+seed → identical deltas)
162 + - [x] Selector test (only requested links computed)
163 + - [x] Threshold test (small changes suppressed)
164 + - [x] Example client script (prints updates, counts links)
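The determinism test reduces to comparing two runs of the same seeded entry point; `run_engine` below is a stand-in for the real tick-evaluation function, not the engine's actual API:

```python
import random

# Stand-in: all stochastic state must derive from the scenario seed.
def run_engine(seed: int, ticks: int) -> list:
    rng = random.Random(seed)
    return [rng.random() for _ in range(ticks)]

def test_determinism():
    # identical scenario + seed -> identical output stream
    assert run_engine(42, 100) == run_engine(42, 100)
    # a different seed must produce a different stream
    assert run_engine(42, 100) != run_engine(43, 100)
```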
165 +
166 + ---
167 +
168 + ## 4) gRPC/Protobuf specification (v1)
169 +
170 + ### 4.1 `.proto` (authoritative spec)
171 +
172 + Create `proto/geomrf/v1/geomrf.proto`:
173 +
174 + ```proto
175 + syntax = "proto3";
176 +
177 + package geomrf.v1;
178 +
179 + import "google/protobuf/timestamp.proto";
180 + import "google/protobuf/duration.proto";
181 +
182 + option go_package = "geomrf/v1;geomrfv1"; // harmless for other langs
183 +
184 + // ---------------------------
185 + // Service
186 + // ---------------------------
187 + service GeometryRfEngine {
188 + rpc GetVersion(GetVersionRequest) returns (GetVersionResponse);
189 + rpc GetCapabilities(GetCapabilitiesRequest) returns (GetCapabilitiesResponse);
190 +
191 + rpc CreateScenario(CreateScenarioRequest) returns (CreateScenarioResponse);
192 + rpc CloseScenario(CloseScenarioRequest) returns (CloseScenarioResponse);
193 +
194 + // Primary: stream sparse deltas per tick.
195 + rpc StreamLinkDeltas(StreamLinkDeltasRequest) returns (stream LinkDeltaBatch);
196 +
197 + // Primary: stream discrete events (optional separate channel for clean consumers).
198 + rpc StreamEvents(StreamEventsRequest) returns (stream EngineEvent);
199 + }
200 +
201 + // ---------------------------
202 + // Version / capabilities
203 + // ---------------------------
204 + message GetVersionRequest {}
205 +
206 + message GetVersionResponse {
207 + string engine_name = 1; // e.g., "geomrf-engine"
208 + string engine_version = 2; // semver, e.g., "1.0.0"
209 + string schema_version = 3; // e.g., "geomrf.v1"
210 + string build_git_sha = 4; // optional
211 + }
212 +
213 + message GetCapabilitiesRequest {}
214 +
215 + message GetCapabilitiesResponse {
216 + string schema_version = 1;
217 +
218 + // Limits
219 + uint32 max_links_per_tick = 2;
220 + uint32 max_nodes = 3;
221 + uint32 max_streams_per_scenario = 4;
222 +   google.protobuf.Duration min_dt = 5;
223 +   google.protobuf.Duration max_dt = 6;
224 +
225 + // Supported outputs
226 + bool supports_loss_rate = 10;
227 + bool supports_capacity_bps = 11;
228 + bool supports_delay_s = 12;
229 + bool supports_snr_margin_db = 13;
230 +
231 + // Supported selectors/features (advertise so clients can adapt)
232 + bool supports_only_visible = 20;
233 + bool supports_min_elevation_deg = 21;
234 + bool supports_max_degree = 22;
235 + bool supports_link_types = 23; // GS-SAT, SAT-SAT, etc.
236 + }
237 +
238 + // ---------------------------
239 + // Scenario lifecycle
240 + // ---------------------------
241 + message CreateScenarioRequest {
242 + ScenarioSpec spec = 1;
243 + }
244 +
245 + message CreateScenarioResponse {
246 + string scenario_ref = 1; // UUID string
247 + string schema_version = 2;
248 + }
249 +
250 + message CloseScenarioRequest {
251 + string scenario_ref = 1;
252 + }
253 +
254 + message CloseScenarioResponse {
255 + bool ok = 1;
256 + }
257 +
258 + // ---------------------------
259 + // Scenario specification (v1)
260 + // ---------------------------
261 +
262 + message ScenarioSpec {
263 + // Reproducibility
264 + uint64 seed = 1;
265 +
266 + // Time model
267 +   google.protobuf.Timestamp t0 = 2; // UTC
268 +   google.protobuf.Timestamp t1 = 3; // UTC
269 +   google.protobuf.Duration default_dt = 4;
270 +
271 + // Nodes
272 + repeated NodeSpec nodes = 10;
273 +
274 + // Eligibility rules (which links can exist)
275 + LinkPolicy link_policy = 20;
276 +
277 + // Mapping PHY -> network outputs (can be simplistic in v1)
278 + AdaptationPolicy adaptation = 30;
279 +
280 + // Optional: engine-side caching hints
281 + CacheHints cache_hints = 40;
282 + }
283 +
284 + enum NodeRole {
285 + NODE_ROLE_UNSPECIFIED = 0;
286 + SATELLITE = 1;
287 + GROUND_STATION = 2;
288 + USER_TERMINAL = 3;
289 + }
290 +
291 + message NodeSpec {
292 + string node_id = 1; // stable ID used everywhere
293 + NodeRole role = 2;
294 +
295 + // One of the following depending on role
296 + SatelliteOrbit orbit = 10;
297 + GroundFixedSite fixed_site = 11;
298 +
299 + // Radio/terminal model parameters (minimal v1)
300 + TerminalModel terminal = 20;
301 +
302 + // Arbitrary tags for selectors/grouping
303 + map<string,string> tags = 30;
304 + }
305 +
306 + message SatelliteOrbit {
307 + // v1: only TLE supported. Later: OEM/SP3/etc.
308 + string tle_line1 = 1;
309 + string tle_line2 = 2;
310 + }
311 +
312 + message GroundFixedSite {
313 + double lat_deg = 1;
314 + double lon_deg = 2;
315 + double alt_m = 3;
316 + }
317 +
318 + message TerminalModel {
319 + // Minimal knobs to compute link budgets consistently.
320 + // Units: dBW, dBi, Hz, K, etc.
321 + double tx_power_dbw = 1;
322 + double tx_gain_dbi = 2; // can be treated as peak gain in v1
323 + double rx_gain_dbi = 3; // can be treated as peak gain in v1
324 + double rx_noise_temp_k = 4;
325 + double bandwidth_hz = 5;
326 + double frequency_hz = 6;
327 +
328 + // Optional: simple pointing/antenna pattern loss approximation
329 + double pointing_loss_db = 10; // default constant loss if you don’t model patterns yet
330 + }
331 +
332 + enum LinkType {
333 + LINK_TYPE_UNSPECIFIED = 0;
334 + GS_TO_SAT = 1;
335 + SAT_TO_GS = 2;
336 + SAT_TO_SAT = 3;
337 + UT_TO_SAT = 4;
338 + SAT_TO_UT = 5;
339 + }
340 +
341 + message LinkPolicy {
342 + // Which link types are allowed at all
343 + repeated LinkType allowed_types = 1;
344 +
345 + // Dynamic feasibility thresholds
346 + double min_elevation_deg = 2; // default 0 if unused
347 + bool only_visible = 3; // if true, return only visible/feasible links
348 +
349 + // Degree constraints (optional)
350 + uint32 max_out_degree = 10; // 0 means unlimited
351 + uint32 max_in_degree = 11; // 0 means unlimited
352 +
353 + // Optional: limit candidates by distance for scalability
354 + double max_range_m = 20; // 0 means unlimited
355 + }
356 +
357 + message AdaptationPolicy {
358 + // v1: a simple mapping mode.
359 + // Future: full MCS tables, ACM, coding gains, etc.
360 + enum Mode {
361 + MODE_UNSPECIFIED = 0;
362 + FIXED_RATE = 1; // constant capacity if link is up, else 0
363 + SNR_TO_RATE = 2; // rate from snr_margin (simple piecewise)
364 + SNR_TO_LOSS = 3; // loss from snr_margin (simple logistic)
365 + SNR_TO_BOTH = 4;
366 + }
367 + Mode mode = 1;
368 +
369 + // v1 defaults
370 + double fixed_capacity_bps = 2;
371 + double fixed_loss_rate = 3;
372 +
373 + // Parameters for simple SNR->rate/loss mappings (implementation defined but deterministic)
374 + double snr_margin_min_db = 10;
375 + double snr_margin_max_db = 11;
376 + }
377 +
378 + message CacheHints {
379 + bool precompute_positions = 1;
380 + bool precompute_visibility = 2;
381 + uint32 max_cache_ticks = 3; // 0 = engine default
382 + }
383 +
384 + // ---------------------------
385 + // Streaming requests
386 + // ---------------------------
387 +
388 + message StreamLinkDeltasRequest {
389 + string scenario_ref = 1;
390 +
391 + // Time range for this stream. If empty, use scenario t0..t1.
392 + google.protobuf.Timestamp t_start = 2;
393 + google.protobuf.Timestamp t_end = 3;
394 +
395 + // If unset, use scenario default_dt.
396 + google.protobuf.Duration dt = 4;
397 +
398 + // Which links to consider/return.
399 + LinkSelector selector = 10;
400 +
401 + // Delta emission thresholds
402 + DeltaThresholds thresholds = 20;
403 +
404 + // Behavior knobs
405 + bool emit_full_snapshot_first = 30; // recommended true for simpler clients
406 + bool include_debug_fields = 31; // if true, fill debug fields in updates
407 + }
408 +
409 + message StreamEventsRequest {
410 + string scenario_ref = 1;
411 + google.protobuf.Timestamp t_start = 2;
412 + google.protobuf.Timestamp t_end = 3;
413 + // If unset, use scenario default_dt. Must satisfy capabilities bounds.
414 + google.protobuf.Duration dt = 4;
415 +
416 + EventFilter filter = 10;
417 + // Apply the same selection surface as StreamLinkDeltas for deterministic alignment.
418 + LinkSelector selector = 11;
419 + }
420 +
421 + message LinkSelector {
422 + // v1 supports:
423 + // - explicit pairs
424 + // - by link type
425 + // - by node role sets
426 + repeated LinkPair explicit_pairs = 1;
427 + repeated LinkType link_types = 2;
428 +
429 + // If non-empty, only consider links where src in set AND dst in set
430 + repeated string src_node_ids = 10;
431 + repeated string dst_node_ids = 11;
432 +
433 + // Optional tag filters (exact match)
434 + map<string,string> src_tags = 12;
435 + map<string,string> dst_tags = 13;
436 +
437 + // If true, apply scenario LinkPolicy.only_visible behavior
438 + bool only_visible = 20;
439 +
440 + // Optional override thresholds (0 uses scenario policy)
441 + double min_elevation_deg = 21;
442 + double max_range_m = 22;
443 + }
444 +
445 + message DeltaThresholds {
446 + // Only emit update if absolute change exceeds threshold.
447 + // 0 means "emit on any change" for that field.
448 + double delay_s = 1;
449 + double capacity_bps = 2;
450 + double loss_rate = 3;
451 + double snr_margin_db = 4;
452 +
453 + // Emit if link up/down changes always (implicit).
454 + }
455 +
456 + // ---------------------------
457 + // Streaming output
458 + // ---------------------------
459 +
460 + message LinkDeltaBatch {
461 + string scenario_ref = 1;
462 + string schema_version = 2;
463 +
464 + google.protobuf.Timestamp time = 3; // tick time
465 + uint64 tick_index = 4;
466 +
467 + // If emit_full_snapshot_first=true, first batch may be a full snapshot.
468 + bool is_full_snapshot = 5;
469 +
470 + // Sparse updates (add/update)
471 + repeated LinkUpdate updates = 10;
472 +
473 + // Links to remove from active set (no longer selected/visible/allowed)
474 + repeated LinkKey removals = 11;
475 +
476 + // Optional: server stats
477 + TickStats stats = 20;
478 + }
479 +
480 + message LinkUpdate {
481 + LinkKey key = 1;
482 +
483 + // Core NetworkView outputs
484 + bool up = 2;
485 + double one_way_delay_s = 3;
486 + double capacity_bps = 4;
487 + double loss_rate = 5;
488 +
489 + // Optional debug fields (filled if include_debug_fields=true)
490 + double snr_margin_db = 10;
491 + double elevation_deg = 11;
492 + double range_m = 12;
493 +
494 + // Extension space for later (avoid breaking schema)
495 + map<string,string> extra = 30;
496 + }
497 +
498 + message LinkKey {
499 + string src = 1;
500 + string dst = 2;
501 + LinkType type = 3;
502 + }
503 +
504 + message LinkPair {
505 + string src = 1;
506 + string dst = 2;
507 + LinkType type = 3;
508 + }
509 +
510 + message TickStats {
511 + uint32 links_computed = 1;
512 + uint32 links_emitted = 2;
513 + double compute_ms = 3;
514 + }
515 +
516 + // ---------------------------
517 + // Events
518 + // ---------------------------
519 +
520 + enum EventType {
521 + EVENT_TYPE_UNSPECIFIED = 0;
522 + LINK_UP = 1;
523 + LINK_DOWN = 2;
524 + HANDOVER_START = 3;
525 + HANDOVER_COMPLETE = 4;
526 + NODE_FAILURE = 5;
527 + NODE_RECOVERY = 6;
528 + }
529 +
530 + message EngineEvent {
531 + string scenario_ref = 1;
532 + string schema_version = 2;
533 +
534 + EventType type = 3;
535 + google.protobuf.Timestamp time = 4;
536 + uint64 tick_index = 5;
537 +
538 + // Which entities are involved (optional depending on event)
539 + string node_id = 10;
540 + LinkKey link = 11;
541 +
542 + map<string,string> meta = 20;
543 + }
544 +
545 + message EventFilter {
546 + repeated EventType types = 1;
547 + repeated string node_ids = 2;
548 + }
549 + ```
550 +
551 + ---
552 +
553 + ## 5) Server behavior specification (streaming semantics)
554 +
555 + ### 5.1 Timebase rules
556 +
557 + * `ScenarioSpec.t0/t1` define the canonical simulation window.
558 + * Stream requests may override with `t_start/t_end`:
559 +
560 + * If unset → default to scenario window.
561 + * Engine must clamp requests to `[t0, t1]` unless explicitly configured otherwise.
562 + * `dt`:
563 +
564 + * If unset → use `ScenarioSpec.default_dt`.
565 + * Must be within `[Capabilities.min_dt, Capabilities.max_dt]`; otherwise return `INVALID_ARGUMENT`.
566 +
567 + ### 5.2 Tick indexing
568 +
569 + * Tick `0` corresponds to `t_start`.
570 + * Tick `k` corresponds to `t_start + k*dt`.
571 + * Engine must emit `tick_index` and `time` on every batch.
572 +
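A minimal Python sketch of these timebase rules (helper names like `tick_to_time` are illustrative, not part of the API). Note the multiply-instead-of-accumulate form, which avoids drift over long runs:

```python
from datetime import datetime, timedelta, timezone

def tick_to_time(t_start: datetime, dt: timedelta, k: int) -> datetime:
    # Tick k maps to exactly t_start + k*dt; multiply rather than
    # accumulate so long streams do not drift.
    return t_start + k * dt

def num_ticks(t_start: datetime, t_end: datetime, dt: timedelta) -> int:
    # Count of ticks whose time falls in [t_start, t_end] inclusive.
    span = (t_end - t_start).total_seconds()
    return int(span // dt.total_seconds()) + 1

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
step = timedelta(seconds=10)
assert tick_to_time(t0, step, 3) == t0 + timedelta(seconds=30)
assert num_ticks(t0, t0 + timedelta(minutes=1), step) == 7  # ticks 0..6
```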
573 + ### 5.3 Active link set and removals
574 +
575 + Each delta stream implies a client-side “active link table” that the consumer maintains.
576 +
577 + * `updates[]` means: **create or replace** link entry keyed by `(src,dst,type)`.
578 + * `removals[]` means: delete that link entry (no longer in selection or no longer feasible under policy).
579 +
580 + This is required for sparse streams when visibility causes links to appear/disappear.
581 +
582 + ### 5.4 First message behavior
583 +
584 + If `emit_full_snapshot_first=true`:
585 +
586 + * The first emitted `LinkDeltaBatch` at tick 0 must have:
587 +
588 + * `is_full_snapshot=true`
589 + * `updates[]` containing **all currently selected/feasible links**
590 + * `removals[]` empty
591 +
592 + This drastically simplifies consumers (no special “initialization” logic).
593 +
594 + ### 5.5 Delta emission thresholds
595 +
596 + For ticks after the initial snapshot:
597 +
598 + * A link is emitted in `updates[]` if:
599 +
600 + * it is newly added, OR
601 + * its `up` changed, OR
602 + * `abs(new.delay - old.delay) > thresholds.delay_s` (if threshold > 0), OR
603 + * `abs(new.capacity - old.capacity) > thresholds.capacity_bps` (if threshold > 0), OR
604 + * `abs(new.loss - old.loss) > thresholds.loss_rate` (if threshold > 0), OR
605 + * (optional debug) changes exceed debug thresholds if included.
606 +
607 + If a threshold is **0**, treat it as “emit on any change”.
608 +
609 + ### 5.6 Event stream behavior (v1.1 alignment)
610 +
611 + `StreamEvents` emits events in chronological order within `[t_start, t_end]`:
612 +
613 + * Minimum set in v1:
614 +
615 + * `LINK_UP`, `LINK_DOWN`
616 + * Optional:
617 +
618 + * `HANDOVER_START`, `HANDOVER_COMPLETE` if you can detect “best-sat changed” for a GS/UT.
619 + * Alignment requirements:
620 +
621 + * `StreamEventsRequest.dt` must use the same semantics/rules as delta streams (`default_dt` when unset, cap-validated).
622 + * `StreamEventsRequest.selector` must use the same link candidate filtering semantics as delta streams.
623 + * Every emitted event includes `tick_index`, where tick `k` corresponds to time `t_start + k*dt`.
624 + * Events should be consistent with LinkDelta stream:
625 +
626 + * If a link transitions from `up=false` to `up=true` at tick k, emit a `LINK_UP` event at that tick’s `time`.
627 +
628 + ### 5.7 Backpressure and cancellation
629 +
630 + * Use `grpc.aio` streaming and `yield` messages.
631 + * If the client is slow, await on send; do not build unbounded queues.
632 + * On cancellation (`context.cancelled()`):
633 +
634 + * stop computation promptly
635 + * release scenario references held by the stream
636 + * record a log entry with reason
637 +
638 + ---
639 +
640 + ## 6) Internal engine architecture (recommended)
641 +
642 + ### 6.1 Modules and responsibilities
643 +
644 + **`scenario_store.py`**
645 +
646 + * Holds `ScenarioRuntime` objects keyed by `scenario_ref`
647 + * Contains:
648 +
649 + * validated `ScenarioSpec`
650 + * pre-parsed skyfield satellite objects
651 + * node dictionaries and role sets
652 + * cached computed data (positions/visibility per tick if enabled)
653 + * RNG seeded from `ScenarioSpec.seed`
654 +
655 + **`timebase.py`**
656 +
657 + * Converts timestamps to ticks and vice versa
658 + * Handles rounding rules (recommend: tick times exactly `t_start + k*dt`)
659 +
660 + **`selectors.py`**
661 +
662 + * Applies `LinkSelector` + `LinkPolicy` to yield candidate link pairs
663 + * Must support:
664 +
665 + * explicit pairs (exact)
666 + * link types
667 + * src/dst id filters
668 + * tag filters
669 + * only_visible/min_elevation/max_range constraints
670 +
671 + **`compute/ephemeris.py`**
672 +
673 + * Builds skyfield `EarthSatellite` objects from TLE
674 + * Provides `get_sat_ecef(t)` or `get_sat_eci(t)` depending on your implementation
675 +
676 + **`compute/geometry.py`**
677 +
678 + * Computes:
679 +
680 + * range (m)
681 + * elevation (deg) from ground site to satellite (and vice versa if needed)
682 + * visibility boolean: elevation >= min_elev, range <= max_range
683 +
684 + **`compute/link_budget.py`**
685 +
686 + * Computes:
687 +
688 + * FSPL from range + frequency
689 + * atmospheric attenuation (via ITU-R), optional
690 + * noise power from bandwidth + noise temp
691 + * received power, C/N0, SNR margin, etc.
692 + * Returns a `PhySummary` (internal dataclass)
693 +
694 + **`compute/adaptation.py`**
695 +
696 + * Maps `PhySummary` → `NetworkLinkState`:
697 +
698 + * `capacity_bps` and/or `loss_rate`
699 + * If v1, implement a deterministic piecewise mapping:
700 +
701 + * clamp snr_margin_db into [min,max]
702 + * map linearly to capacity between [0, terminal.bandwidth * eff_max] (or use fixed)
703 + * map snr_margin_db to loss via logistic or fixed thresholds
704 +
705 + **`streaming/delta.py`**
706 +
707 + * Maintains per-stream “previous link table”
708 + * Computes `updates` and `removals` each tick
709 +
710 + **`streaming/events.py`**
711 +
712 + * Detects link up/down transitions and yields `EngineEvent`
713 +
714 + ---
715 +
716 + ## 7) Scenario validation rules (must be enforced)
717 +
718 + * `t0 < t1`
719 + * `default_dt > 0`
720 + * Node IDs unique
721 + * Satellites must include valid TLE lines
722 + * Fixed sites must have valid lat/lon ranges
723 + * Terminal model must include:
724 +
725 + * `frequency_hz > 0`
726 + * `bandwidth_hz > 0`
727 + * `rx_noise_temp_k > 0`
728 + * `LinkPolicy.allowed_types` must be non-empty OR default to all valid types for provided roles
729 +
730 + Return gRPC status `INVALID_ARGUMENT` with a descriptive error message if validation fails.
731 +
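A sketch of the validation pass, using plain dicts as stand-ins for the decoded `ScenarioSpec` (collecting all errors before failing makes the `INVALID_ARGUMENT` message more useful):

```python
def validate_scenario(spec: dict) -> list[str]:
    # Returns [] when valid; otherwise a list of human-readable errors.
    errors = []
    if not spec["t0"] < spec["t1"]:
        errors.append("t0 must be before t1")
    if spec["default_dt"] <= 0:
        errors.append("default_dt must be > 0")
    ids = [n["node_id"] for n in spec["nodes"]]
    if len(ids) != len(set(ids)):
        errors.append("node IDs must be unique")
    for n in spec["nodes"]:
        terminal = n.get("terminal", {})
        for field in ("frequency_hz", "bandwidth_hz", "rx_noise_temp_k"):
            if terminal.get(field, 0) <= 0:
                errors.append(f"{n['node_id']}: {field} must be > 0")
    return errors
```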
732 + ---
733 +
734 + ## 8) Performance requirements (practical targets)
735 +
736 + These are engineering targets; adjust later.
737 +
738 + * Tick compute should scale with the **number of candidate links**, not with N² over all node pairs.
739 + * Implement at least one of:
740 +
741 + * pre-filter by link type and role sets
742 + * max_range cutoff
743 + * max_degree pruning (keep best K neighbors by range or SNR)
744 +
745 + ### Recommended optimizations (v1)
746 +
747 + * Cache satellite positions per tick if `precompute_positions=true`.
748 + * Cache ground station ECEF once.
749 + * Vectorize range computations where possible (NumPy arrays).
750 +
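A sketch of max-degree pruning as described above, keeping the best K neighbors per source by range (function name and tuple layout are illustrative):

```python
import heapq

def prune_max_degree(candidates: list[tuple[str, str, float]], k: int):
    # candidates are (src, dst, range_m); keep the K nearest dsts per src.
    by_src: dict[str, list[tuple[float, str]]] = {}
    for src, dst, range_m in candidates:
        by_src.setdefault(src, []).append((range_m, dst))
    kept = []
    for src, neighbors in by_src.items():
        for range_m, dst in heapq.nsmallest(k, neighbors):
            kept.append((src, dst, range_m))
    return kept
```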
751 + ---
752 +
753 + ## 9) Determinism requirements
754 +
755 + Determinism must include:
756 +
757 + * Ordering: Always sort link keys before emitting for stable output
758 +
759 + * sort by `(src, dst, type)`
760 + * RNG: Use `numpy.random.Generator(PCG64(seed))` attached to scenario
761 + * Floating point: Do not over-round, but be consistent in computations (same order of operations)
762 +
763 + Test: Run the same stream twice and ensure byte-equivalent serialized output (or field-wise equal within tolerance where appropriate).
764 +
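The ordering and RNG rules can be sketched directly (`emission_order`/`scenario_rng` are illustrative names; the `PCG64` generator is the one named above):

```python
import numpy as np

def emission_order(keys: list[tuple[str, str, int]]):
    # Stable output ordering: sort link keys by (src, dst, type).
    return sorted(keys)

def scenario_rng(seed: int) -> np.random.Generator:
    # One generator per scenario; identical seeds yield identical draws.
    return np.random.Generator(np.random.PCG64(seed))

keys = [("sat1", "gs2", 2), ("gs1", "sat1", 1), ("gs1", "sat1", 0)]
assert emission_order(keys)[0] == ("gs1", "sat1", 0)
assert scenario_rng(7).random() == scenario_rng(7).random()
```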
765 + ---
766 +
767 + ## 10) Error handling (gRPC status codes)
768 +
769 + Implement these consistent statuses:
770 +
771 + * `NOT_FOUND`: unknown `scenario_ref`
772 + * `INVALID_ARGUMENT`: bad time range, dt, selector, scenario validation failure
773 + * `RESOURCE_EXHAUSTED`: too many active streams for a scenario; or too many links per tick requested
774 + * `FAILED_PRECONDITION`: scenario closed
775 + * `INTERNAL`: unexpected exceptions (log stack trace server-side)
776 +
777 + Add a stable error message prefix, e.g. `GEOMRF_ERR:<CODE>:<details>` for easier parsing.
778 +
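A sketch of the error-prefix convention from both sides (formatting on the server, parsing on the client); helper names are illustrative:

```python
import re

_ERR = re.compile(r"^GEOMRF_ERR:(?P<code>[A-Z_]+):(?P<details>.*)$")

def format_error(code: str, details: str) -> str:
    # Stable prefix so clients can parse errors mechanically.
    return f"GEOMRF_ERR:{code}:{details}"

def parse_error(message: str):
    # Returns (code, details), or None for non-conforming messages.
    m = _ERR.match(message)
    return (m["code"], m["details"]) if m else None
```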
779 + ---
780 +
781 + ## 11) Reference streaming algorithm (server-side)
782 +
783 + ### Pseudocode for `StreamLinkDeltas`
784 +
785 + 1. Resolve scenario and compute effective `t_start/t_end/dt`.
786 + 2. Build selector state (resolved node sets, tag filters, link types).
787 + 3. Initialize:
788 +
789 + * `prev_links = {}` (LinkKey → LinkUpdate-like internal struct)
790 + * `active_keys = set()`
791 + 4. For tick k from 0..:
792 +
793 + * Compute `t = t_start + k*dt`; stop when `t > t_end`.
794 + * Determine candidate link pairs from selector+policy.
795 + * For each candidate link:
796 +
797 + * compute geometry (range/elev/visibility)
798 + * if not feasible and only_visible: skip (will cause removal if previously active)
799 + * compute PHY summary
800 + * compute NetworkLinkState (up/delay/capacity/loss)
801 + * assemble internal current map `curr_links[key] = state`
802 + * Compute removals = keys in prev_links but not in curr_links
803 + * Compute updates:
804 +
805 + * if first tick and emit_full_snapshot_first: all curr_links become updates
806 + * else: apply delta thresholds comparing curr vs prev
807 + * Emit `LinkDeltaBatch` (emitting a batch every tick, even with empty updates/removals, is optional but recommended for simpler clients)
808 + * Update prev_links = curr_links
809 +
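The per-tick update/removal computation in the loop above can be sketched as follows; the `changed` predicate stands in for the threshold logic, and keys map LinkKey tuples to link state:

```python
def diff_tick(prev: dict, curr: dict, first_tick: bool, changed):
    # One iteration of the loop above: compute sorted updates and removals.
    removals = sorted(k for k in prev if k not in curr)
    if first_tick:
        updates = sorted(curr)  # full snapshot: emit everything
    else:
        updates = sorted(k for k, state in curr.items()
                         if k not in prev or changed(prev[k], state))
    return updates, removals
```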
810 + ### Pseudocode for `StreamEvents`
811 +
812 + * Either:
813 +
814 + * derive from `StreamLinkDeltas` logic (shared per-stream evaluator), OR
815 + * implement as separate evaluation loop that only checks transitions
816 + * Resolve and validate `t_start/t_end/dt` exactly as in delta streams.
817 + * Build selector from request and apply identical candidate filtering.
818 + * Emit event when `(prev.up != curr.up)` for any link in the selected set.
819 + * Populate both `time` and `tick_index`.
820 +
821 + ---
822 +
823 + ## 12) Client expectations (contract for consumers)
824 +
825 + A correct consumer must:
826 +
827 + * Start with the first `LinkDeltaBatch` (full snapshot)
828 + * Maintain `active_table[LinkKey] = LinkUpdate`
829 + * Apply each tick:
830 +
831 + * delete removals
832 + * upsert updates
833 + * Use `time` and `tick_index` as authoritative time
834 + * Optionally also subscribe to events; events are primarily observability data, not control-plane truth
835 +
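The consumer's apply step is small; a sketch using dict stand-ins for the proto messages:

```python
def apply_batch(table: dict, batch: dict) -> None:
    # Per the contract above: delete removals, then upsert updates.
    for rem in batch.get("removals", []):
        table.pop((rem["src"], rem["dst"], rem["type"]), None)
    for upd in batch.get("updates", []):
        table[(upd["src"], upd["dst"], upd["type"])] = upd
```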
836 + ---
837 +
838 + ## 13) Example client (must be included)
839 +
840 + Create `examples/client_stream.py`:
841 +
842 + * Connect to server
843 + * `CreateScenario` from an inline scenario object (or YAML file)
844 + * Start `StreamLinkDeltas` and print:
845 +
846 + * tick index, number of updates/removals, sample link
847 + * Optionally start `StreamEvents` concurrently
848 + * Close scenario at end
849 +
850 + Checklist:
851 +
852 + - [x] Implement `examples/client_stream.py`
853 + - [x] Add README usage snippet:
854 +
855 + - [x] start server
856 + - [x] run client
857 + - [x] expected output format
858 +
859 + ---
860 +
861 + ## 14) Minimal “default” adaptation mapping (v1, deterministic)
862 +
863 + If your existing code already outputs a usable throughput and PER proxy, use it. If not, implement a deterministic fallback:
864 +
865 + ### v1 fallback policy
866 +
867 + * `up = visibility && snr_margin_db > 0` (or >= threshold)
868 + * `delay = range_m / c` (c = 299792458 m/s)
869 + * `capacity_bps`:
870 +
871 + * FIXED_RATE: `fixed_capacity_bps` when up else 0
872 + * SNR_TO_RATE:
873 +
874 + * normalize `x = clamp((snr_margin_db - min)/(max-min), 0..1)`
875 + * `capacity = x * capacity_max`, where `capacity_max = bandwidth_hz * eff_max`
876 + * choose `eff_max` constant (e.g., 4 bits/s/Hz) in v1; document it
877 + * `loss_rate`:
878 +
879 + * FIXED: `fixed_loss_rate` when up else 1
880 + * SNR_TO_LOSS:
881 +
882 + * logistic: `loss = 1 / (1 + exp(a*(snr_margin_db - b)))` with fixed a,b
883 + * clamp to [0,1]
884 +
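A sketch of the fallback policy above; `eff_max`, the logistic parameters `a`/`b`, and the SNR clamp range are the assumed v1 constants still to be decided:

```python
import math

C_MPS = 299_792_458.0  # speed of light, m/s

def v1_fallback(visible: bool, snr_margin_db: float, range_m: float,
                bandwidth_hz: float, snr_min_db: float = 0.0,
                snr_max_db: float = 20.0, eff_max: float = 4.0,
                a: float = 1.0, b: float = 3.0) -> dict:
    # eff_max (bits/s/Hz) and logistic params a, b are placeholder constants.
    up = visible and snr_margin_db > 0.0
    delay_s = range_m / C_MPS
    x = min(max((snr_margin_db - snr_min_db) / (snr_max_db - snr_min_db), 0.0), 1.0)
    capacity_bps = x * bandwidth_hz * eff_max if up else 0.0
    loss = 1.0 / (1.0 + math.exp(a * (snr_margin_db - b))) if up else 1.0
    return {"up": up, "delay_s": delay_s, "capacity_bps": capacity_bps,
            "loss_rate": min(max(loss, 0.0), 1.0)}
```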
885 + Checklist:
886 +
887 + - [x] Decide v1 constants (`eff_max`, logistic params, up-threshold)
888 + - [x] Put them in `adaptation.py` and record them in logs/version
889 +
890 + ---
891 +
892 + ## 15) Observability (recommended even in v1)
893 +
894 + * gRPC access logs including:
895 +
896 + * scenario_ref, stream type, time range, dt, selector summary
897 + * Prometheus counters (optional but easy):
898 +
899 + * streams active, ticks computed, links computed, mean compute time
900 + * TickStats in stream payload (already specified)
901 +
902 + Checklist:
903 +
904 + - [x] Add `TickStats` computation
905 + - [x] Add server-side metrics (optional)
906 + - [x] Add structured logging with correlation IDs
907 +
908 + ---
909 +
910 + ## 16) Security and robustness (v1 minimum)
911 +
912 + * Bind address configurable (`0.0.0.0:50051` default)
913 + * Optional TLS later; v1 can be plaintext for local lab use
914 + * Enforce limits:
915 +
916 + * max nodes
917 + * max links per tick
918 + * max active streams per scenario
919 +
920 + Checklist:
921 +
922 + - [x] Enforce link/node limits with `RESOURCE_EXHAUSTED`
923 + - [x] Enforce max concurrent streams per scenario
924 +
925 + ---
926 +
927 + ## 17) Acceptance criteria (definition of done)
928 +
929 + ### Functional
930 +
931 + - [x] Server starts and responds to `GetVersion` and `GetCapabilities`
932 + - [x] `CreateScenario` returns a scenario_ref and validates inputs
933 + - [x] `StreamLinkDeltas` emits:
934 +
935 + - [x] a full snapshot first (when enabled)
936 + - [x] then sparse deltas/removals per tick
937 + - [x] `StreamEvents` emits link up/down events consistent with deltas
938 + - [x] `CloseScenario` frees scenario resources and blocks further streams
939 +
940 + ### Correctness
941 +
942 + - [x] Determinism test passes (same inputs → same outputs)
943 + - [x] Selector tests pass (only requested links emitted)
944 + - [x] Delta threshold tests pass (small changes suppressed)
945 +
946 + ### Usability
947 +
948 + - [x] Example client runs end-to-end against server and prints reasonable output
949 + - [x] README explains how to run locally and how to pass a scenario YAML
950 +
951 + ---
952 +
953 + ## 18) Optional but high-value extension hooks (safe to leave stubbed)
954 +
955 + These can exist as placeholders in code (no API changes needed later):
956 +
957 + - [x] “Debug fields” population (`include_debug_fields=true`)
958 + - [ ] Additional events (handover start/complete)
959 + - [ ] More orbit formats (OEM/SP3) behind `SatelliteOrbit` oneof later
960 + - [ ] Better antenna pattern modeling behind `TerminalModel`
961 +
962 + ---
963 +
964 + ## 19) v1.1 orchestrator-alignment tasks (new)
965 +
966 + - [ ] Bump schema to `geomrf.v1.1` (or equivalent versioning plan) for event-alignment fields.
967 + - [ ] Update `StreamEventsRequest` implementation to honor request `dt` and `selector`.
968 + - [ ] Populate `EngineEvent.tick_index` from the same tick loop semantics as deltas.
969 + - [ ] Add tests:
970 + - [ ] events and deltas requested with same window/selectors produce aligned tick grids
971 + - [ ] invalid event `dt` returns `INVALID_ARGUMENT`
972 + - [ ] event selector filtering mirrors delta selector behavior
973 + - [ ] Keep backward compatibility plan explicit (version gate or dual-field behavior) for existing v1 clients.

TASKS_IMNET.md(file created)

@@ -0,0 +1,406 @@
1 + # TASKS_IMNET.md — SatSim IMNET Lane (OMNeT++/INET) Implementation Plan
2 +
3 + This is a task-driven implementation plan for the **IMNET lane** (OMNeT++/INET). It covers:
4 + - Building and running an OMNeT++/INET simulation project under **opp_env**
5 + - Integrating IMNET into the existing **uv-managed** SatSim workspace (without mixing concerns)
6 + - Consuming **orchestrator-produced LinkState traces** (v1: trace-first) to apply dynamic link changes
7 + - Producing artifacts compatible with SatSim run directories / manifests
8 +
9 + > Policy reminder (already in repo): Python workflows are `uv`-managed; OMNeT++/INET toolchain is `opp_env`-managed.
10 +
11 + ---
12 +
13 + ## -1) Locked v1 decisions for this plan update
14 +
15 + - Execution model is **post-stream replay**:
16 + - Orchestrator records the OMNeT trace during streaming, then runs OMNeT after stream completion.
17 + - Topology strategy is **Strategy 3**:
18 + - stable NED template + orchestrator-generated `node_map.json` and `link_map.json`.
19 + - Orchestrator ↔ OMNeT runtime interface is **typed and orchestrator-owned**:
20 + - no implicit "just pass arbitrary `run_args`" contract for required parameters.
21 + - Canonical trace key fields use `src`, `dst`, `link_type` (not mixed `type` naming).
22 + - Repo split is intentional:
23 + - OMNeT assets under `lanes/omnet/`, Python lane adapter/runner under `satsim_orch/lanes/omnet_lane/`.
24 + - `opp_env` default path uses Nix, but `--nixless-workspace` is a supported fallback mode.
25 +
26 + ---
27 +
28 + ## 0) Deliverables (what “done” means)
29 +
30 + - [x] `lanes/omnet/` exists and contains:
31 + - [x] a reproducible `opp_env` workspace definition (pinned OMNeT++ + INET versions)
32 + - [x] an OMNeT++ project (“satsim-imnet”) that compiles and runs headless
33 + - [x] a **LinkState trace ingestion + applier** module that updates delay/rate/(optional loss) per tick
34 + - [x] a minimal demo scenario (2–10 nodes) that:
35 + - [x] runs via orchestrator in `--mode omnet`
36 + - [x] reads the trace written by orchestrator
37 + - [x] produces artifacts under the run directory (logs + .vec/.sca; optional pcap)
38 + - [x] One-command dev flow:
39 + - [x] `uv run satsim run <scenario.yaml> --mode omnet ...` executes IMNET via opp_env and stores artifacts in the standard run folder.
40 + - [x] Existing OMNeT lane blockers in current orchestrator code are closed before IMNET C++ work:
41 + - [x] trace writer removal serialization does not use `__dict__` on slots dataclasses
42 + - [x] trace line includes `is_full_snapshot`
43 + - [x] OMNeT launch is not silently skipped when `--mode omnet` is selected
44 +
45 + ---
46 +
47 + ## 1) Repo layout for IMNET lane
48 +
49 + - [x] Create directory structure:
50 + - [x] `lanes/omnet/`
51 + - [x] `lanes/omnet/WORKSPACE.md` (how to install + run with opp_env; pinned versions)
52 + - [x] `lanes/omnet/opp_env/` (workspace init and pinned selection)
53 + - [x] `lanes/omnet/satsim-imnet/` (the OMNeT++ project)
54 + - [x] `src/` (C++ modules)
55 + - [x] `ned/` (NED definitions)
56 + - [x] `omnetpp.ini` (baseline config; orchestrator may override with -f/-c)
57 + - [x] `Makefile` (generated by opp_makemake; committed only if desired, otherwise generated)
58 + - [x] `README.md` (how to build/run inside opp_env)
59 + - [x] `lanes/omnet/scripts/` (helper wrappers used by orchestrator)
60 + - [x] `install.py` (optional convenience; calls `opp_env install ...`)
61 + - [x] `build.py` (build IMNET project inside opp_env)
62 + - [x] `run.py` (run IMNET headless inside opp_env, accepts args from orchestrator)
63 + - [x] Keep orchestrator Python integration in package code:
64 + - [x] `satsim_orch/lanes/omnet_lane/adapter.py` remains the lane entrypoint
65 + - [x] `satsim_orch/lanes/omnet_lane/runner.py` owns command construction and process launch
66 + - [x] `lanes/omnet/WORKSPACE.md` documents how these package paths map to `lanes/omnet/` assets
67 +
68 + ---
69 +
70 + ## 2) Version pinning and opp_env workspace (reproducible toolchain)
71 +
72 + ### 2.1 Pick and pin OMNeT++ + INET versions
73 + - [x] Pin versions (v1 recommendation; can be adjusted later):
74 + - [x] OMNeT++: `omnetpp-6.3.0`
75 + - [x] INET: `inet-4.5.4`
76 + - [x] Document the pin in:
77 + - [x] `lanes/omnet/WORKSPACE.md`
78 + - [x] orchestrator run manifest fields (already exists; ensure it records these exact strings)
79 +
80 + ### 2.2 Initialize an opp_env workspace outside git working trees
81 + Goal: keep installs reproducible while avoiding committing huge toolchains.
82 +
83 + - [x] Decide where the opp_env workspace lives:
84 + - [x] default: `~/.cache/satsim/opp_env/workspace` (outside git tree)
85 + - [x] store only small config/metadata in git, not the compiled artifacts
86 + - [x] Add `.gitignore` entries:
87 + - [x] ignore optional repo-local workspace path `lanes/omnet/opp_env/workspace/` if used for local experimentation
88 + - [x] ignore `lanes/omnet/**/out/` (OMNeT outputs)
89 + - [x] ignore `lanes/omnet/**/results/` (if used)
90 +
91 + ### 2.3 Provide canonical commands (must work from uv venv)
92 + - [x] Ensure `opp_env` is invoked via uv:
93 + - [x] `uv run opp_env --version`
94 + - [x] `uv run opp_env list`
95 + - [x] Implement `lanes/omnet/scripts/install.py` that performs:
96 + - [x] workspace init (idempotent)
97 + - [x] install pinned INET (which pulls matching OMNeT++)
98 + - [x] verify the installed packages exist
99 + - [x] Add workspace mode guidance:
100 + - [x] default mode: Nix-backed `opp_env` workspace
101 + - [x] fallback mode: `opp_env --nixless-workspace` (document prerequisites and reproducibility caveats)
102 + - [x] orchestrator preflight must emit a clear error only when the selected workspace mode requirements are unmet
103 +
104 + ---
105 +
106 + ## 3) uv ↔ opp_env integration (clean boundary)
107 +
108 + ### 3.1 Keep responsibility boundaries strict
109 + - [x] Confirm and document:
110 + - [x] `uv` manages Python deps + orchestrator execution
111 + - [x] `opp_env` manages OMNeT++/INET toolchain and the shell/run environment
112 + - [x] Orchestrator calls `opp_env run ...` (or `opp_env shell -c ...`) rather than assuming OMNeT binaries are on PATH
113 +
114 + ### 3.2 Add orchestrator-side “IMNET preflight”
115 + (Only if not already present; keep it minimal.)
116 +
117 + - [x] In orchestrator’s OMNeT lane runner:
118 + - [x] verify `opp_env` is available (`uv run opp_env --version`)
119 + - [x] verify required runtime mode dependencies:
120 + - [x] Nix-backed mode: verify `nix` is available
121 + - [x] nixless mode: verify required toolchain binaries are present
122 + - [x] verify the IMNET scripts exist (install/build/run)
123 + - [x] if missing dependencies:
124 + - [x] fail with a single actionable message (no partial runs)
125 + - [x] Close current OMNeT-lane correctness blockers first:
126 + - [x] fix trace writer removal serialization for `LinkKey` (slots dataclass)
127 + - [x] include `is_full_snapshot` in the OMNeT trace JSONL tick payload
128 + - [x] remove `ini_path`-gated silent skip; in omnet mode runner must launch or raise an explicit error
129 +
130 + ### 3.3 Standardize how orchestrator launches IMNET
131 + - [x] Replace ad-hoc argument passing with a typed runner contract (orchestrator-owned):
132 + - [x] required fields:
133 + - [x] `workspace_path`
134 + - [x] `inet_version` (pinned string)
135 + - [x] `project_path`
136 + - [x] `ini_path`
137 + - [x] `trace_path`
138 + - [x] `dt_seconds`
139 + - [x] `outdir`
140 + - [x] `seed`
141 + - [x] optional fields:
142 + - [x] `config_name`
143 + - [x] `sim_time_limit_s`
144 + - [x] `extra_args` (non-critical escape hatch only)
145 + - [x] Define one canonical `opp_env` invocation generated by runner code:
146 + - [x] `opp_env run inet-<PINNED> --init -w <WORKSPACE> --chdir -c "<COMMAND>"`
147 + - [x] `runner.py` converts typed fields into OMNeT CLI args; callers do not handcraft command strings
148 +
149 + ---
150 +
151 + ## 4) IMNET contract with the other SatSim systems (as they exist)
152 +
153 + ### 4.1 Relationship to Geo/RF engine
154 + - [x] IMNET does **not** talk to Geo/RF directly in v1
155 + - [x] IMNET consumes orchestrator-produced trace derived from:
156 + - [x] Geo/RF `StreamLinkDeltas` batches (tick_index + time + updates/removals)
157 +
158 + ### 4.2 Relationship to Orchestrator (v1 trace-first)
159 + - [x] Orchestrator responsibilities (assumed existing):
160 + - [x] create scenario in Geo/RF engine
161 + - [x] consume LinkDeltaBatch stream
162 + - [x] write a deterministic LinkState trace file for OMNeT lane
163 + - [x] launch OMNeT lane runner with correct typed parameters **after stream completion** (post-stream replay)
164 + - [x] IMNET responsibilities:
165 + - [x] parse the trace deterministically
166 + - [x] schedule link updates in simulation time aligned to tick_index
167 + - [x] apply updates to the simulated links (delay/rate/(optional loss), and “up/down” semantics)
168 +
169 + ### 4.3 Trace file format (IMNET must support this)
170 + Pick one and lock it. If orchestrator already emits a format, IMNET must match it.
171 +
172 + - [x] Decide and document the v1 trace format (JSONL, locked)
173 + - [x] One JSON object per tick line
174 + - [x] Fields required:
175 + - [x] `tick_index` (uint64)
176 + - [x] `time` (ISO8601 or unix seconds; used for logging only; simtime comes from tick_index * dt)
177 + - [x] `is_full_snapshot` (bool)
178 + - [x] `updates` (list)
179 + - [x] `removals` (list)
180 + - [x] Each `update` contains:
181 + - [x] `src` (string)
182 + - [x] `dst` (string)
183 + - [x] `link_type` (string enum name)
184 + - [x] `up` (bool)
185 + - [x] `one_way_delay_s` (float)
186 + - [x] `capacity_bps` (float)
187 + - [x] `loss_rate` (float; optional if IMNET v1 ignores loss)
188 + - [x] Each `removal` contains:
189 + - [x] `src`, `dst`, `link_type` (same key fields)
190 + - [x] Determinism requirements:
191 + - [x] stable ordering of `updates` and `removals` within each tick
192 + - [x] no mixed key names (`type` vs `link_type`) in v1 output
193 + - [x] Define dt semantics for IMNET:
194 + - [x] Lock v1 approach: IMNET gets `dt` via typed runner argument (`dt_seconds`)
195 + - [x] trace `time` is informational/logging only; tick scheduling uses `tick_index * dt_seconds`
196 +
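A minimal sketch of the locked JSONL shape, with a reader that enforces the monotonic `tick_index` requirement. Node names and values are invented for illustration; field names follow the checklist above:

```python
import json

SAMPLE_TRACE = """\
{"tick_index": 0, "time": 1700000000.0, "is_full_snapshot": true, "updates": [{"src": "sat0", "dst": "gs0", "link_type": "SAT_GROUND", "up": true, "one_way_delay_s": 0.012, "capacity_bps": 50e6, "loss_rate": 0.0}], "removals": []}
{"tick_index": 1, "time": 1700000001.0, "is_full_snapshot": false, "updates": [], "removals": [{"src": "sat0", "dst": "gs0", "link_type": "SAT_GROUND"}]}
"""

def read_trace(lines):
    """Parse JSONL ticks, rejecting any non-monotonic tick_index."""
    last = -1
    for line in lines:
        tick = json.loads(line)
        if tick["tick_index"] <= last:
            raise ValueError(f"non-monotonic tick_index {tick['tick_index']}")
        last = tick["tick_index"]
        yield tick

ticks = list(read_trace(SAMPLE_TRACE.splitlines()))
print(len(ticks), ticks[1]["removals"][0]["link_type"])
```

Note that `time` is parsed but never used for scheduling; simtime is always `tick_index * dt_seconds`.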
197 + ---
198 +
199 + ## 5) OMNeT++/INET project scaffolding (“satsim-imnet”)
200 +
201 + ### 5.1 Create a minimal INET-based network model
202 + Goal: minimal, not fancy, but real packets flow and link parameters can change.
203 +
204 + - [x] Create `lanes/omnet/satsim-imnet/ned/SatSimNetwork.ned`
205 + - [x] Use INET `StandardHost` (or `Router`) modules for nodes
206 + - [x] Include:
207 + - [x] one traffic source app and one sink app (UDP is fine for v1)
208 + - [x] optional intermediate router (for a 2-hop demo)
209 + - [x] Strategy 3 mapping is required in v1:
210 + - [x] orchestrator generates `node_map.json` (`node_id` ↔ module path)
211 + - [x] orchestrator generates `link_map.json` (LinkKey ↔ mutable channel/shim path)
212 + - [x] LinkStateApplier consumes map files in strict mode and fails on unknown keys
213 + - [x] Create `lanes/omnet/satsim-imnet/omnetpp.ini`
214 + - [x] include INET paths and defaults
215 + - [x] configure:
216 + - [x] IP address assignment (INET configurator)
217 + - [x] app endpoints and start times
218 + - [x] disable GUI by default (`Cmdenv`) for orchestrator runs
219 +
220 + ### 5.2 Add a LinkState ingestion + application component
221 + - [x] Add `src/LinkTraceReader.{h,cc}`
222 + - [x] reads the trace file
223 + - [x] validates ordering (tick_index monotonic)
224 + - [x] provides an in-memory list of per-tick updates (or streaming reader)
225 + - [x] Add `src/LinkStateApplier.{h,cc}` as a `cSimpleModule`
226 + - [x] parameters:
227 + - [x] `string tracePath`
228 + - [x] `double dtSeconds`
229 + - [x] `bool strict` (fail-fast on unknown link keys)
230 + - [x] `bool applyLoss` (optional)
231 + - [x] behavior:
232 + - [x] on init:
233 + - [x] load/validate trace header/dt
234 + - [x] build a mapping from LinkKey → simulation link object(s)
235 + - [x] schedule first self-message at tick 0
236 + - [x] on each tick:
237 + - [x] apply updates/removals for that tick
238 + - [x] schedule next tick if present
239 + - [x] on finish:
240 + - [x] write summary scalars (num updates applied, unknown keys, etc.)
241 +
242 + ---
243 +
244 + ## 6) How IMNET represents links (v1 minimal approach)
245 +
246 + You need a concrete, implementable mapping from LinkKey → something mutable in OMNeT/INET.
247 +
248 + ### 6.1 Choose v1 link representation
249 + Pick one approach and implement it end-to-end:
250 +
251 + **Option A (preferred v1): mutate OMNeT channels**
252 + - [x] Use a channel type that supports:
253 + - [x] delay updates (`delay`)
254 + - [x] datarate updates (`datarate`)
255 + - [x] “up/down” via disabling or forcing drop
256 + - [x] Build a stable topology at startup containing all candidate links
257 + - [x] Map each LinkKey to a channel pointer
258 + - [x] On update:
259 + - [x] set channel delay
260 + - [x] set channel datarate
261 + - [x] if `up=false` or “removal”: disable channel / set drop mode
262 +
263 + **Option B: insert a small “LinkShim” module per edge**
264 + - [x] Create a `LinkShim` `cSimpleModule` that:
265 + - [x] applies propagation delay (schedule) *(not selected for v1 Option A path)*
266 + - [x] applies serialization delay from capacity (packet_bits / capacity_bps) *(not selected for v1 Option A path)*
267 + - [x] drops packets by loss_rate *(not selected for v1 Option A path)*
268 + - [x] drops everything when `up=false` *(not selected for v1 Option A path)*
269 + - [x] Connect hosts via `LinkShim` modules instead of relying on channel mutability *(not selected for v1 Option A path)*
270 +
271 + > v1 recommendation: start with Option A if channel mutability is adequate; fall back to Option B if not.
272 +
273 + ### 6.2 Define “up/down/removal” semantics for v1
274 + - [x] Lock v1 semantics (must match orchestrator trace writer):
275 + - [x] `up=false` means the link exists but is currently unavailable (drop all / disable)
276 + - [x] `removal` means the link is not in the current active set
277 + - [x] v1 handling recommendation: treat removal as `up=false` (do not delete topology)
278 + - [x] allow a later `update` to re-enable it
279 +
280 + ### 6.3 Unit conversion rules (lock them)
281 + - [x] `one_way_delay_s` → OMNeT `simtime_t` delay
282 + - [x] `capacity_bps` → channel datarate (bps)
283 + - [x] `loss_rate`:
284 + - [x] if supported natively by chosen link representation, apply directly
285 + - [x] otherwise implement drop probability in `LinkShim` or a per-interface dropper
286 +
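The conversion rules above reduce to simple arithmetic; a sketch (function names hypothetical, serialization delay only relevant if the Option B `LinkShim` path is taken):

```python
def to_simtime_s(one_way_delay_s: float) -> float:
    """Propagation delay passes straight through as seconds of simtime."""
    return float(one_way_delay_s)

def serialization_delay_s(packet_bits: int, capacity_bps: float) -> float:
    """Serialization delay of one packet at the current link capacity."""
    if capacity_bps <= 0:
        raise ValueError("capacity_bps must be positive")
    return packet_bits / capacity_bps

# 1500-byte frame over a 10 Mbps link: 12000 bits / 10e6 bps = 1.2 ms
print(serialization_delay_s(1500 * 8, 10e6))
```

With Option A, OMNeT's datarate channel performs the serialization-delay computation itself; only `delay` and `datarate` need to be set.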
287 + ---
288 +
289 + ## 7) Topology generation strategy (v1: small but consistent)
290 +
291 + ### 7.1 Keep topology simple in v1
292 + - [x] v1 target: 2–10 nodes, with:
293 + - [x] at least one satellite-like router node
294 + - [x] at least one ground station-like host node
295 + - [x] at least one dynamic link changing delay/rate/up/down over time
296 +
297 + ### 7.2 How the topology is created
298 + Use Strategy 3 for v1:
299 +
300 + - [x] Commit a stable NED topology template under `lanes/omnet/satsim-imnet/ned/`
301 + - [x] Orchestrator writes `node_map.json` and `link_map.json` per run into the run artifacts
302 + - [x] IMNET loads maps at startup and applies all updates by LinkKey lookup through the map
303 + - [x] In strict mode, any unmapped LinkKey is a hard failure
304 + - [x] Generated NED per run (Strategy 2) is explicitly deferred beyond v1
305 +
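Strict-mode map lookup can be sketched as below. The `src|dst|link_type` key encoding and the channel path are illustrative assumptions; the real `link_map.json` schema is whatever the orchestrator writer and IMNET loader agree on:

```python
import json

# stand-in for a per-run link_map.json written by the orchestrator
LINK_MAP = json.loads("""{
  "sat0|gs0|SAT_GROUND": "SatSimNetwork.sat0.out_channel_to_gs0"
}""")

def resolve_link(src: str, dst: str, link_type: str, strict: bool = True):
    """LinkKey -> mutable channel path via link_map.json; hard failure in strict mode."""
    key = f"{src}|{dst}|{link_type}"
    path = LINK_MAP.get(key)
    if path is None and strict:
        raise KeyError(f"unmapped LinkKey: {key}")
    return path

print(resolve_link("sat0", "gs0", "SAT_GROUND"))
```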
306 + ---
307 +
308 + ## 8) Orchestrator ↔ IMNET runtime interface (exact parameters)
309 +
310 + ### 8.1 Standardize the runner arguments
311 + - [x] Define a single typed runner entrypoint (called by adapter/runner code) that accepts:
312 + - [x] `--trace <path>`
313 + - [x] `--dt <seconds>`
314 + - [x] `--outdir <path>`
315 + - [x] `--seed <int>`
316 + - [x] `--tend <seconds>` (optional; can be inferred from last tick)
317 + - [x] `--config <omnet_config_name>` (optional)
318 + - [x] Runner converts these to OMNeT args:
319 + - [x] `-u Cmdenv`
320 + - [x] `-n <NEDPATHS including INET and satsim-imnet/ned>`
321 + - [x] `-l` (load required libraries if needed)
322 + - [x] `--output-dir=<outdir>`
323 + - [x] `--seed-set=<seed>` (or equivalent OMNeT seed setting)
324 + - [x] `--sim-time-limit=<tend>s` (if used)
325 + - [x] pass `tracePath` and `dtSeconds` as module parameters
326 + - [x] Runner invocation semantics:
327 + - [x] if lane mode is `omnet` or `parallel`, omnet runner invocation is mandatory
328 + - [x] missing required runner inputs is a hard error, never a silent no-op
329 +
330 + ### 8.2 Ensure artifacts land in the SatSim run directory
331 + - [x] Orchestrator passes `outdir = artifacts/<run_id>/omnet/`
332 + - [x] IMNET writes:
333 + - [x] OMNeT results (.vec/.sca) to `outdir/results/` (or directly under outdir)
334 + - [x] stdout/stderr to `outdir/logs/omnet.log` (or orchestrator captures it)
335 + - [x] a small `imnet_runinfo.json` containing:
336 + - [x] versions (inet, omnetpp)
337 + - [x] trace hash
338 + - [x] config name
339 + - [x] seed
340 + - [x] start/end ticks processed
341 +
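One possible shape for `imnet_runinfo.json`, with the trace hash computed over the raw trace bytes. Exact field names beyond the checklist (`trace_sha256`, `ticks`) are illustrative, and the version strings are left as placeholders:

```python
import hashlib
import json

def trace_sha256(trace_bytes: bytes) -> str:
    """Content hash of the trace file, recorded for reproducibility checks."""
    return hashlib.sha256(trace_bytes).hexdigest()

runinfo = {
    "versions": {"inet": "<pinned>", "omnetpp": "<pinned>"},
    "trace_sha256": trace_sha256(b'{"tick_index": 0}\n'),
    "config_name": "General",
    "seed": 42,
    "ticks": {"start": 0, "end": 59},
}
print(json.dumps(runinfo, indent=2))
```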
342 + ---
343 +
344 + ## 9) Observability for IMNET (minimal but useful)
345 +
346 + - [x] Configure OMNeT to export:
347 + - [x] scalar summary stats (packets sent/received, drops)
348 + - [x] vector time series for throughput/delay (where feasible)
349 + - [x] Add LinkStateApplier scalars:
350 + - [x] `ticksProcessed`
351 + - [x] `updatesApplied`
352 + - [x] `removalsApplied`
353 + - [x] `unknownLinkKeys` (must be 0 in strict mode)
354 + - [x] Optional (v1.1): PCAP output
355 + - [x] If using INET features that can emit pcap:
356 + - [x] enable and write into `outdir/pcap/` *(deferred; not enabled in v1)*
357 + - [x] Otherwise: skip; rely on Mininet lane for PCAPs
358 +
359 + ---
360 +
361 + ## 10) Testing and validation (must exist for v1)
362 +
363 + ### 10.1 Offline smoke test (no orchestrator)
364 + - [x] Add `lanes/omnet/satsim-imnet/tests/` (or scripts) that:
365 + - [x] runs a short simulation with a tiny trace
366 + - [x] verifies results files created
367 + - [x] verifies LinkStateApplier processed N ticks
368 + - [x] Provide `lanes/omnet/scripts/smoke.py`:
369 + - [x] installs toolchain (if needed)
370 + - [x] builds project
371 + - [x] runs a headless sim for ~5–20 seconds of simtime
372 +
373 + ### 10.2 End-to-end smoke test (with orchestrator)
374 + - [x] Add a minimal `scenarios/omnet_smoke.yaml`:
375 + - [x] small node set
376 + - [x] dt ~ 1s
377 + - [x] short window (e.g., 60s)
378 + - [x] explicit link selector to keep link key set stable
379 + - [x] Add a single command documented in root README or WORKSPACE.md:
380 + - [x] `uv run satsim run scenarios/omnet_smoke.yaml --mode omnet`
381 + - [x] Verify artifacts:
382 + - [x] orchestrator run folder exists
383 + - [x] omnet subfolder contains .vec/.sca
384 + - [x] logs show LinkStateApplier applying ticks in order
385 +
386 + ---
387 +
388 + ## 11) Known v1 constraints (explicitly accepted)
389 +
390 + - [x] v1 uses **trace-first** ingestion (no live gRPC inside OMNeT)
391 + - [x] v1 may ignore directional asymmetry:
392 + - [x] if LinkKey is directional but the chosen link model is bidirectional,
393 + document how direction is collapsed (or restrict scenarios accordingly)
394 + - [x] v1 focuses on:
395 + - [x] correctness of tick alignment
396 + - [x] correctness of delay/rate updates
397 + - [x] reproducible, scripted build/run under opp_env
398 +
399 + ---
400 +
401 + ## 12) v1.1+ hooks (implemented as opt-in features)
402 +
403 + - [x] Live streaming adapter hook inside OMNeT++ (`ingest_mode=live_stream`, runtime trace refresh + `grpcTarget` hook parameter)
404 + - [x] Dynamic topology construction from ScenarioSpec maps (generated NED at run-time in `run.py`)
405 + - [x] Proper per-direction link modeling hook (`directional_links=true` creates separate directional channels)
406 + - [x] Integration hook for SDN decision traces (`sdn_trace_path` + per-tick channel enable/disable decisions in LinkStateApplier)

TASKS_ORCHESTRATOR.md(file created)

@@ -0,0 +1,596 @@
1 + # ORCHESTRATOR_IMPLEMENTATION.md — SatSim Orchestrator (Python) Detailed Plan
2 +
3 + This document is a task-driven implementation plan for the **SatSim Orchestrator**. It is intended to be handed to an LLM to implement the orchestrator in Python. It focuses on **architecture, module boundaries, data flow, and concrete tasks** (not deep RPC wire details).
4 +
5 + The orchestrator is responsible for:
6 + - Loading a scenario
7 + - Starting and coordinating subcomponents (Geo/RF engine, OMNeT++ lane, Mininet lane)
8 + - Driving a unified timebase
9 + - Fanning out LinkState/Event updates to lane adapters
10 + - Recording reproducible artifacts (manifest, logs, optional traces, metrics)
11 +
12 + ---
13 +
14 + ## Decision lock (2026-02-18)
15 +
16 + These ambiguities are now resolved and should be treated as fixed v1 design:
17 +
18 + - **Authoritative tick source**: `StreamLinkDeltas` is the only control-plane tick source. Lanes apply link state from deltas, not from events.
19 + - **Event stream alignment (Option B accepted)**: evolve Geo/RF event API so `StreamEventsRequest` carries `dt` + `selector`, and `EngineEvent` carries `tick_index`, aligned to the same tick grid as deltas.
20 + - **Error contract handling**: orchestrator must handle `NOT_FOUND`, `INVALID_ARGUMENT`, `FAILED_PRECONDITION`, and `RESOURCE_EXHAUSTED` as first-class engine responses.
21 + - **Scenario translation strictness**: translation to Geo/RF `ScenarioSpec` is fail-fast and must satisfy all required engine fields/constraints.
22 + - **Tooling policy**: Python workflows use `uv` (`uv run`, `uv add`); do not rely on `pip`.
23 +
24 + ---
25 +
26 + ## 1) Repository layout (recommended)
27 +
28 + ```
29 +
30 + satsim/
31 + orchestrator/
32 + pyproject.toml
33 + README.md
34 + ORCHESTRATOR_IMPLEMENTATION.md
35 +
36 +
37 +     satsim_orch/
38 +       __init__.py
39 +       cli.py
40 +       main.py
41 +
42 +       config/
43 +         __init__.py
44 +         schema.py
45 +         loader.py
46 +         defaults.py
47 +         normalize.py
48 +
49 +       runtime/
50 +         __init__.py
51 +         run_manager.py
52 +         manifest.py
53 +         artifact_store.py
54 +         logging.py
55 +         versioning.py
56 +         process.py
57 +
58 +       timebase/
59 +         __init__.py
60 +         clock.py
61 +         modes.py
62 +         scheduler.py
63 +
64 +       bus/
65 +         __init__.py
66 +         messages.py
67 +         queues.py
68 +         fanout.py
69 +
70 +       geomrf/
71 +         __init__.py
72 +         client.py
73 +         translate.py
74 +         health.py
75 +
76 +       lanes/
77 +         __init__.py
78 +         base.py
79 +         registry.py
80 +
81 +         mininet_lane/
82 +           __init__.py
83 +           adapter.py
84 +           topo.py
85 +           shaping.py
86 +           controller.py
87 +           capture.py
88 +
89 +         omnet_lane/
90 +           __init__.py
91 +           adapter.py
92 +           trace_ingest.py
93 +           runner.py
94 +
95 +       metrics/
96 +         __init__.py
97 +         prom.py
98 +         records.py
99 +         exporters.py
100 +
101 +       util/
102 +         __init__.py
103 +         ids.py
104 +         units.py
105 +         asyncx.py
106 +         errors.py
107 +
108 +       tests/
109 +         test_config_validation.py
110 +         test_timebase_scheduler.py
111 +         test_bus_fanout.py
112 +         test_run_manifest.py
113 +         test_lane_adapter_contract.py
114 +         test_geomrf_client_smoke.py
115 +
116 +
117 + subprojects/
118 + geomrf-engine/ # separate project; orchestrator consumes it via gRPC
119 + lanes/
120 + omnet/
121 + mininet/
122 + observability/
123 + artifacts/
124 +
125 + ```
126 +
127 + ---
128 +
129 + ## 2) Orchestrator design summary (targets)
130 +
131 + ### 2.1 Orchestrator responsibilities
132 + - Scenario loading & validation
133 + - Run directory + manifest creation
134 + - Geo/RF engine lifecycle (create scenario; start streams; close)
135 + - Lane lifecycle:
136 + - `prepare()` (build topology / start processes)
137 + - `apply_tick()` (apply link deltas / events)
138 + - `finalize()` (stop processes, collect outputs)
139 + - Unified runtime pacing:
140 + - offline apply-fast
141 + - real-time apply-paced (wall-clock aligned)
142 + - parallel lane fanout (same incoming ticks feed multiple lanes)
143 + - Artifact collection:
144 + - config snapshot, manifest
145 + - optional LinkState trace logging
146 + - metrics export
147 + - PCAP capture (Mininet lane)
148 +
149 + ### 2.2 Key architectural choices
150 + - Python 3.11+ with **asyncio**
151 + - gRPC async client (`grpc.aio`) for Geo/RF streaming
152 + - In-process **async fanout bus** using bounded queues (v1)
153 + - Pluggable lane adapters via a registry
154 + - Everything stamped with versions/seeds for reproducibility
155 +
156 + ---
157 +
158 + ## 3) Implementation checklist (extremely detailed)
159 +
160 + ## 3.1 Project bootstrap and build
161 + - [x] Create `orchestrator/pyproject.toml`
162 + - [x] Define package name (e.g., `satsim-orchestrator`)
163 + - [x] Set Python version (>=3.11)
164 + - [x] Add dependencies:
165 + - [x] `pydantic`
166 + - [x] `pyyaml`
167 + - [x] `grpcio`, `grpcio-tools`, `protobuf`
168 + - [x] `rich` (optional, for CLI UX)
169 + - [x] `prometheus-client` (optional)
170 + - [x] `aiofiles` (optional, async file writes)
171 + - [x] Add dev dependencies:
172 + - [x] `pytest`, `pytest-asyncio`
173 + - [x] `ruff` / `black`
174 + - [x] `mypy` (optional)
175 + - [x] Add task runner and `uv` commands:
176 + - [x] `uv run pytest`
177 + - [x] `uv run ruff check .`
178 + - [x] `uv run python -m satsim_orch.cli run <scenario.yaml> ...`
179 +
180 + ## 3.2 CLI and entrypoints
181 + - [x] Implement `satsim_orch/cli.py` with commands:
182 + - [x] `run <scenario.yaml> --mode {omnet|mininet|parallel} --dt 1s --t0 ... --t1 ...`
183 + - [x] `validate <scenario.yaml>`
184 + - [x] `list-runs`
185 + - [x] `show-run <run_id>`
186 + - [x] Implement `satsim_orch/main.py`
187 + - [x] Parse CLI args
188 + - [x] Load scenario
189 + - [x] Create `RunContext`
190 + - [x] Run orchestrator loop
191 + - [x] Define exit codes and error messages:
192 + - [x] Invalid config → exit 2
193 + - [x] Missing dependency/lane binary → exit 3
194 + - [x] Runtime error → exit 1
195 +
196 + ---
197 +
198 + ## 4) Configuration system
199 +
200 + ### 4.1 Scenario schema (Pydantic)
201 + - [x] Implement `config/schema.py` with a canonical `ScenarioConfig`
202 + - [x] Global:
203 + - [x] `name`
204 + - [x] `seed`
205 + - [x] `time: {t0, t1, dt, mode}`
206 + - [x] `execution: {lane_mode, strict_reproducible, record_trace, record_pcap}`
207 + - [x] `paths: {artifacts_root}`
208 + - [x] Geo/RF engine connection:
209 + - [x] `geomrf: {grpc_target, request_dt, selector_defaults, thresholds_defaults}`
210 + - [x] Geo/RF scenario payload:
211 + - [x] `geomrf.scenario_spec` maps 1:1 to Geo/RF `ScenarioSpec` required fields (`nodes`, `terminal`, orbit/site, link/adaptation policy)
212 + - [x] optional high-level shorthand may exist, but must compile deterministically to valid `ScenarioSpec`
213 + - [x] Lane configs:
214 + - [x] `mininet: {controller: {type, addr}, topo: {...}, shaping: {...}}`
215 + - [x] `omnet: {project_path, ini_path, run_args, trace_mode}`
216 + - [x] Add validators:
217 + - [x] `t0 < t1`
218 + - [x] `dt > 0`
219 + - [x] `seed >= 0`
220 + - [x] lane configs exist for chosen mode
221 + - [x] fail-fast if engine-required scenario fields are missing/invalid
222 + - [x] fail-fast if `request_dt` is outside engine capabilities (`min_dt`, `max_dt`)
223 + - [x] if `mode=mininet` require Linux + OVS checks (soft validate with warnings)
224 + - [x] Add defaulting rules in `config/defaults.py`
225 + - [x] dt default (e.g., 1s)
226 + - [x] thresholds default (delay/capacity/loss)
227 + - [x] artifacts root default `./artifacts/runs`
228 +
229 + ### 4.2 Loader and normalization
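The validator rules above can be sketched as follows. The real schema would use Pydantic per the dependency list; a stdlib dataclass keeps this example self-contained, and `tick_count` illustrates a derived field computed during normalization:

```python
from dataclasses import dataclass

@dataclass
class TimeConfig:
    t0: float
    t1: float
    dt: float
    mode: str  # "offline" | "realtime"

    def __post_init__(self) -> None:
        # Validators from the checklist: t0 < t1, dt > 0.
        if not self.t0 < self.t1:
            raise ValueError("t0 must be < t1")
        if self.dt <= 0:
            raise ValueError("dt must be > 0")

    @property
    def tick_count(self) -> int:
        # Derived field: number of whole ticks in [t0, t1).
        return int((self.t1 - self.t0) // self.dt)

tc = TimeConfig(t0=0.0, t1=60.0, dt=1.0, mode="offline")
print(tc.tick_count)
```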
230 + - [x] Implement `config/loader.py`
231 + - [x] load YAML/JSON
232 + - [x] environment variable expansion (optional)
233 + - [x] include/merge support (optional)
234 + - [x] Implement `config/normalize.py`
235 + - [x] produce a normalized config (canonical types, timezone normalization)
236 + - [x] compute derived fields (run duration, tick count)
237 + - [x] Implement `config/normalize.py` to build:
238 + - [x] `GeomrfScenarioSpec` (engine-facing) from `ScenarioConfig`
239 + - [x] `LaneScenarioSpec` (lane-facing) from `ScenarioConfig`
240 +
241 + ---
242 +
243 + ## 5) Run manager and artifacts
244 +
245 + ### 5.1 Run context and directory structure
246 + - [x] Implement `runtime/run_manager.py`
247 + - [x] Generate `run_id` (timestamp + short random, or UUID)
248 + - [x] Create run directory:
249 + - [x] `artifacts/runs/<run_id>/`
250 + - [x] `logs/`, `metrics/`, `pcaps/`, `traces/`, `manifests/`
251 + - [x] Save copies of:
252 + - [x] raw scenario file
253 + - [x] normalized scenario JSON
254 + - [x] Implement `runtime/manifest.py`
255 + - [x] manifest fields:
256 + - [x] run_id, scenario name, timestamps
257 + - [x] seeds
258 + - [x] component versions (orchestrator, geomrf engine, lanes)
259 + - [x] execution mode, dt, tick count
260 + - [x] git SHAs if available
261 + - [x] host info (OS, python version) (optional)
262 + - [x] Implement `runtime/versioning.py`
263 + - [x] orchestrator version string
264 + - [x] best-effort git SHA discovery
265 +
266 + ### 5.2 Logging
267 + - [x] Implement `runtime/logging.py`
268 + - [x] structured JSON logs to file
269 + - [x] human-readable console logs
270 + - [x] include `run_id` and correlation IDs
271 + - [x] Implement log rotation policy (optional)
272 + - [x] Implement `util/errors.py` with typed exceptions:
273 + - [x] `ScenarioError`, `GeomrfError`, `LaneError`, `TimebaseError`
274 +
275 + ### 5.3 Artifact store helpers
276 + - [x] Implement `runtime/artifact_store.py`
277 + - [x] `write_text(path, text)`
278 + - [x] `write_json(path, obj)`
279 + - [x] `append_jsonl(path, obj)`
280 + - [x] atomic writes (write temp then rename)
281 + - [x] Implement trace recording option:
282 + - [x] If `record_trace=true`, append received LinkDeltaBatch to JSONL/Parquet later
283 +
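The atomic-write helper is small enough to sketch fully (write to a temp file in the same directory, fsync, then rename over the target):

```python
import json
import os
import pathlib
import tempfile

def write_json_atomic(path: str, obj) -> None:
    """Write JSON via temp-file-then-rename so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise

p = os.path.join(tempfile.mkdtemp(), "manifest.json")
write_json_atomic(p, {"run_id": "demo", "seed": 42})
print(pathlib.Path(p).read_text())
```

Creating the temp file in the target directory (not `/tmp`) matters: `os.replace` is only atomic within one filesystem.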
284 + ---
285 +
286 + ## 6) Timebase and pacing
287 +
288 + ### 6.1 Time modes
289 + - [x] Implement `timebase/modes.py` enum:
290 + - [x] `OFFLINE` (apply incoming ticks as fast as possible; no sleeping)
291 + - [x] `REALTIME` (apply incoming ticks at wall-clock pace)
292 + - [x] `PARALLEL` (lane selection mode; both lanes consume the same incoming ticks)
293 + - [x] Implement `timebase/clock.py`
294 + - [x] `SimulationTime` type for formatting/validation of incoming stream ticks
295 + - [x] conversions and formatting
296 + - [x] Implement `timebase/scheduler.py`
297 + - [x] implement **pacing**, not tick generation
298 + - [x] for REALTIME: sleep until expected wall-clock for next received tick
299 + - [x] for OFFLINE: apply each received tick immediately
300 + - [x] Add drift handling for REALTIME:
301 + - [x] if late by > 1 tick, either skip ticks or catch up (configurable)
302 + - [x] default: never skip control-plane ticks; warn if drift accumulates
303 +
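The REALTIME pacing and drift rule can be sketched as a pure function (no sleeping here; the scheduler would `await asyncio.sleep(delay)` on the returned value). `pacing_delay_s` is a hypothetical helper name:

```python
def pacing_delay_s(t_start_wall: float, now_wall: float, tick_index: int,
                   dt: float, max_drift_ticks: int = 1) -> tuple[float, bool]:
    """Return (sleep_before_applying, drift_warning) for a received tick.

    Expected wall-clock time for tick k is t_start_wall + k*dt. A negative
    raw delay means we are late; past max_drift_ticks we flag drift but,
    per the default policy above, never skip control-plane ticks.
    """
    expected = t_start_wall + tick_index * dt
    delay = expected - now_wall
    drifted = -delay > max_drift_ticks * dt
    return max(0.0, delay), drifted

# tick 5 with dt=1s, arriving 3.5s behind schedule: apply immediately, warn
print(pacing_delay_s(t_start_wall=100.0, now_wall=108.5, tick_index=5, dt=1.0))
```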
304 + ---
305 +
306 + ## 7) Internal bus and message contracts
307 +
308 + ### 7.1 Canonical internal messages
309 + - [x] Implement `bus/messages.py` dataclasses:
310 + - [x] `TickUpdate`:
311 + - [x] run_id, scenario_ref
312 + - [x] tick_index, time
313 + - [x] link_updates: list
314 + - [x] link_removals: list
315 + - [x] events: list
316 + - [x] stats: compute timing, counts
317 + - [x] `RunControl` messages:
318 + - [x] start/pause/resume/stop
319 + - [x] `LaneStatus` messages:
320 + - [x] ready/running/error/stopped
321 + - [x] Implement `bus/queues.py`
322 + - [x] bounded asyncio queues
323 + - [x] per-lane queue limits (configurable)
324 + - [x] Implement `bus/fanout.py`
325 + - [x] one producer (Geo/RF stream consumer)
326 + - [x] N consumers (lane adapters + recorder)
327 + - [x] backpressure policy:
328 + - [x] default: block producer when any lane queue is full (strict sync)
329 + - [x] option: drop trace recorder only (never drop lane updates)
330 + - [x] Add message ordering rules:
331 + - [x] tick updates delivered in increasing tick_index
332 + - [x] within a tick: removals applied before updates by consumers (documented)
333 +
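The fanout and backpressure behavior can be sketched with bounded `asyncio.Queue`s: `await put()` blocks the single producer whenever any lane queue is full, which is exactly the strict-sync default above. Names here are illustrative, not the real `bus/fanout.py` API:

```python
import asyncio

async def fanout(ticks, lane_queues: dict[str, asyncio.Queue]) -> None:
    """Deliver every tick to every lane queue, in tick order."""
    for tick in ticks:
        for q in lane_queues.values():
            await q.put(tick)  # blocks on a full queue: strict-sync backpressure
    for q in lane_queues.values():
        await q.put(None)  # sentinel: stream complete

async def lane_consumer(name: str, q: asyncio.Queue, applied: list) -> None:
    while (tick := await q.get()) is not None:
        applied.append((name, tick))

async def main() -> list:
    queues = {"mininet": asyncio.Queue(maxsize=2), "omnet": asyncio.Queue(maxsize=2)}
    applied: list = []
    await asyncio.gather(
        fanout([0, 1, 2], queues),
        *(lane_consumer(n, q, applied) for n, q in queues.items()),
    )
    return applied

applied = asyncio.run(main())
print(applied)
```

Each lane sees ticks 0, 1, 2 in order, regardless of how the two consumers interleave.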
334 + ---
335 +
336 + ## 8) Geo/RF engine client integration
337 +
338 + ### 8.1 gRPC client
339 + - [x] Implement `geomrf/client.py`
340 + - [x] gRPC channel creation (`grpc.aio.insecure_channel(target)`)
341 + - [x] stub creation from generated proto
342 + - [x] `get_version()`, `get_capabilities()`
343 + - [x] `create_scenario(scenario_spec) -> scenario_ref`
344 + - [x] `close_scenario(scenario_ref)`
345 + - [x] `stream_link_deltas(request) -> async iterator`
346 + - [x] `stream_events(request) -> async iterator`
347 + - [x] Implement `geomrf/health.py`
348 + - [x] connect + health check on startup
349 + - [x] gate event-consumer features on engine schema/version support
350 + - [x] Implement `geomrf/translate.py`
351 + - [x] translate orchestrator ScenarioConfig to Geo/RF ScenarioSpec (engine-facing)
352 + - [x] enforce deterministic key ordering where needed for reproducible payloads
353 + - [x] validate all required proto fields before RPC call; reject locally on mismatch
354 + - [x] translate Geo/RF `LinkDeltaBatch` into internal `TickUpdate`
355 + - [x] Implement robust error mapping:
356 + - [x] map `NOT_FOUND`, `INVALID_ARGUMENT`, `FAILED_PRECONDITION`, `RESOURCE_EXHAUSTED` to typed `GeomrfError`
357 + - [x] define retry policy for `UNAVAILABLE`/`DEADLINE_EXCEEDED` (bounded retries + backoff)
358 + - [x] include scenario_ref and tick_index in error logs
359 +
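The error-mapping policy can be sketched as below. To keep the example stdlib-only it keys on gRPC status-code *names* rather than `grpc.StatusCode` values, and the `GeomrfError` shape is illustrative:

```python
# First-class engine responses (fail fast) vs. transient transport failures (retry).
FATAL = {"NOT_FOUND", "INVALID_ARGUMENT", "FAILED_PRECONDITION", "RESOURCE_EXHAUSTED"}
RETRYABLE = {"UNAVAILABLE", "DEADLINE_EXCEEDED"}

class GeomrfError(Exception):
    def __init__(self, code: str, detail: str, scenario_ref=None, tick_index=None):
        # scenario_ref and tick_index land in the message, per the logging rule above
        super().__init__(f"{code}: {detail} (scenario={scenario_ref}, tick={tick_index})")
        self.code = code
        self.retryable = code in RETRYABLE

def map_rpc_error(code: str, detail: str, scenario_ref=None, tick_index=None):
    """Map a gRPC status-code name to the typed orchestrator error."""
    if code in FATAL or code in RETRYABLE:
        return GeomrfError(code, detail, scenario_ref, tick_index)
    return RuntimeError(f"unexpected gRPC status {code}: {detail}")

err = map_rpc_error("UNAVAILABLE", "engine restarting", scenario_ref="s-1", tick_index=7)
print(type(err).__name__, err.retryable)
```

Retries for the retryable codes would then wrap the RPC call with bounded attempts and backoff.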
360 + ### 8.2 Stream consumption tasks
361 + - [x] Implement `geomrf` stream consumer coroutine:
362 + - [x] starts `StreamLinkDeltas` with `emit_full_snapshot_first=true`
363 + - [x] reads batches and pushes `TickUpdate` to bus producer
364 + - [x] Implement event stream consumer coroutine (optional in v1):
365 + - [x] call `StreamEvents` with same `t_start/t_end/dt/selector` used for deltas
366 + - [x] consume `EngineEvent.tick_index` directly (no nearest-tick heuristics)
367 + - [x] record events to trace/metrics channel for observability
368 + - [x] if connected engine does not support aligned event schema, disable event consumer and warn once
369 + - [x] Merge/control strategy:
370 + - [x] lane control path uses `TickUpdate` from `StreamLinkDeltas` only
371 + - [x] event stream is informational and must not mutate lane state
372 +
373 + ---
374 +
375 + ## 9) Lane adapter architecture
376 +
377 + ### 9.1 Adapter base contract
378 + - [x] Implement `lanes/base.py`:
379 + - [x] `class LaneAdapter(Protocol)` or ABC with:
380 + - [x] `name: str`
381 + - [x] `async prepare(run_context, scenario_config) -> None`
382 + - [x] `async apply_tick(tick: TickUpdate) -> None`
383 + - [x] `async finalize(run_context) -> None`
384 + - [x] `async health() -> dict` (optional)
385 + - [x] Implement `lanes/registry.py`
386 + - [x] register adapters by name
387 + - [x] instantiate chosen adapters based on `lane_mode`
388 + - [x] Implement `tests/test_lane_adapter_contract.py` for interface compliance
389 +
390 + ### 9.2 Mininet lane adapter (detailed tasks)
391 + - [x] Implement `lanes/mininet_lane/adapter.py`
392 + - [x] `prepare()`:
393 + - [x] validate Linux prereqs (`ovs-vsctl`, `tc`, `ip`)
394 + - [x] start controller (ONOS/Ryu) if configured
395 + - [x] create Mininet topology (delegate to `topo.py`)
396 + - [x] start Mininet network
397 + - [x] start PCAP capture if enabled (delegate to `capture.py`)
398 + - [x] `apply_tick()`:
399 + - [x] apply removals (links down) first
400 + - [x] apply updates:
401 + - [x] for each link: set up/down state
402 + - [x] apply delay/loss/rate using shaping module
403 + - [x] `finalize()`:
404 + - [x] stop captures
405 + - [x] stop Mininet
406 + - [x] stop controller if orchestrator started it
407 +
408 + - [x] Implement `lanes/mininet_lane/topo.py`
409 + - [x] create a Mininet graph from ScenarioConfig node roles
410 + - [x] map SatSim node IDs to Mininet host/switch names
411 + - [x] define OVS switches and host attachments
412 + - [x] decide representation:
413 + - [x] v1 recommended: represent satellites as OVS switches; GS/UT as hosts
414 + - [x] allow optional SAT as hosts if needed
415 + - [x] create links but keep them initially “neutral” (shaping applied per tick)
416 +
417 + - [x] Implement `lanes/mininet_lane/shaping.py`
418 + - [x] provide functions:
419 + - [x] `set_link_up(link_id)` / `set_link_down(link_id)`
420 + - [x] `apply_netem(link_id, delay_ms, loss_pct)`
421 + - [x] `apply_rate(link_id, rate_mbps)`
422 + - [x] `clear_shaping(link_id)`
423 + - [x] implement using:
424 + - [x] `tc qdisc replace dev <if> root netem delay ... loss ...`
425 + - [x] `tc qdisc ... tbf/htb` for rate
426 + - [x] ensure idempotency (repeated calls safe)
427 + - [x] log every applied shaping change with tick_index
428 +
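Building the shaping commands as argument lists (rather than shelling out with strings) keeps them loggable and idempotent; `tc qdisc replace` overwrites any prior qdisc, so repeated calls are safe. Note that in practice delay/loss and rate must live in one qdisc tree (e.g. netem with a tbf child); the two builders are shown in isolation, and the tbf burst/latency values are illustrative defaults:

```python
def netem_cmd(ifname: str, delay_ms: float, loss_pct: float) -> list[str]:
    """`tc qdisc replace ... netem` for delay and loss on one interface."""
    return ["tc", "qdisc", "replace", "dev", ifname, "root", "netem",
            "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%"]

def rate_cmd(ifname: str, rate_mbps: float) -> list[str]:
    """`tc qdisc replace ... tbf` for a rate limit on one interface."""
    return ["tc", "qdisc", "replace", "dev", ifname, "root", "tbf",
            "rate", f"{rate_mbps}mbit", "burst", "32kbit", "latency", "400ms"]

print(" ".join(netem_cmd("s1-eth1", 12.5, 0.1)))
```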
429 + - [x] Implement `lanes/mininet_lane/controller.py`
430 + - [x] support controller options:
431 + - [x] external controller address (already running)
432 + - [x] orchestrator-launched controller container/process (optional v1)
433 + - [x] store controller version info in manifest
434 +
435 + - [x] Implement `lanes/mininet_lane/capture.py`
436 + - [x] start tcpdump for relevant interfaces
437 + - [x] rotate PCAP per time or per run (v1: one PCAP per run)
438 + - [x] store PCAP path in manifest
439 +
440 + ### 9.3 OMNeT lane adapter (trace-first v1)
441 + - [x] Implement `lanes/omnet_lane/adapter.py`
442 + - [x] v1 assumption: OMNeT consumes a **trace file** (offline) rather than live streaming
443 + - [x] Implement `lanes/omnet_lane/trace_ingest.py`
444 + - [x] orchestrator writes a LinkState trace file suitable for OMNeT adapter
445 + - [x] define a simple trace format:
446 + - [x] JSONL per tick containing updates/removals
447 + - [x] or CSV-like with (tick, src, dst, up, delay, rate, loss)
448 + - [x] ensure deterministic ordering of entries
449 + - [x] Implement `lanes/omnet_lane/runner.py`
450 + - [x] launch OMNeT simulation via subprocess:
451 + - [x] capture stdout/stderr to run logs
452 + - [x] exit code handling
453 + - [x] place outputs into artifacts directory
454 +
455 + ---
456 +
457 + ## 10) Orchestrator main runtime loop
458 +
459 + ### 10.1 Lifecycle coordination
460 + - [x] Implement `satsim_orch/main.py` orchestration steps:
461 + - [x] Create run context + artifact directories
462 + - [x] Log environment + versions
463 + - [x] Initialize Geo/RF client and fetch version/capabilities
464 + - [x] Create Geo/RF scenario
465 + - [x] Instantiate chosen lane adapters (mininet/omnet/parallel)
466 + - [x] Call `prepare()` for each lane
467 + - [x] Start stream consumer tasks
468 + - [x] Start realtime pacing task only when `time.mode=REALTIME`
469 + - [x] Await completion conditions:
470 + - [x] reached t_end
471 + - [x] user stop signal (CTRL+C)
472 + - [x] error in any task
473 + - [x] Finalize lanes
474 + - [x] Close Geo/RF scenario
475 + - [x] Write final manifest + summary
476 +
477 + ### 10.2 Streaming-driven execution (locked)
478 + - Geo/RF stream is the authoritative tick source.
479 + - Orchestrator does not generate ticks; it consumes them and fans out.
480 +
481 + Tasks:
482 + - [x] In streaming consumer, for each LinkDeltaBatch:
483 + - [x] translate to `TickUpdate`
484 + - [x] push to fanout bus
485 +
486 + ### 10.3 Fanout to lanes
487 + - [x] For each lane, run a consumer task:
488 + - [x] `while True: tick = await queue.get(); await lane.apply_tick(tick)`
489 + - [x] handle cancellation and lane errors
490 + - [x] Implement strict ordering:
491 + - [x] do not allow lane to process tick k+1 before tick k
492 + - [x] Implement shutdown handshake:
493 + - [x] send `RunControl(STOP)` to lanes on exit
494 + - [x] drain queues if configured
495 +
496 + ---
497 +
498 + ## 11) Error handling and shutdown
499 +
500 + ### 11.1 Exception strategy
501 + - [x] Any uncaught exception in:
502 + - [x] Geo/RF stream consumer
503 + - [x] any lane consumer
504 + - [x] any lane adapter method
505 + triggers a coordinated shutdown.
506 +
507 + - [x] Implement `runtime/process.py`:
508 + - [x] subprocess management with kill/terminate escalation
509 + - [x] collect exit codes and stderr tails
510 + - [x] Add SIGINT/SIGTERM handling:
511 + - [x] first CTRL+C: graceful stop
512 + - [x] second CTRL+C: immediate stop
513 +
514 + ### 11.2 Cleanup correctness
515 + - [x] Always attempt:
516 + - [x] `finalize()` lanes
517 + - [x] `close_scenario()` Geo/RF
518 + even when errors occur.
519 + - [x] Write final manifest including failure reason.
520 +
521 + ---
522 +
523 + ## 12) Metrics and run summaries
524 +
525 + ### 12.1 Metric recording
526 + - [x] Implement `metrics/records.py`
527 + - [x] standard metric record format for:
528 + - [x] tick compute time
529 + - [x] links emitted
530 + - [x] lane apply times (optional)
531 + - [x] Implement `metrics/exporters.py`
532 + - [x] JSONL writer to `metrics/`
533 + - [x] optional Prometheus exporter
534 + - [x] Implement per-tick timing:
535 + - [x] time spent translating batches
536 + - [x] time spent applying to each lane
537 +
538 + ### 12.2 Summary report (v1)
539 + - [x] Write a `summary.json` at end of run:
540 + - [x] total ticks, total links emitted, mean compute time, runtime duration
541 + - [x] lane success/failure states
542 + - [x] artifact paths (pcaps, traces, logs)
543 +
544 + ---
545 +
546 + ## 13) Integration tests (practical, not huge)
547 +
548 + ### 13.1 Smoke tests
549 + - [x] `test_geomrf_client_smoke.py`
550 + - [x] connect to Geo/RF engine on localhost
551 + - [x] create a tiny scenario (1 GS + 1 SAT)
552 + - [x] stream first 3 ticks and assert non-empty output
553 + - [x] `test_event_alignment_smoke.py`
554 + - [x] request deltas/events with identical `t_start/t_end/dt/selector`
555 + - [x] assert each event has `tick_index` and maps to existing/expected delta tick
556 +
557 + ### 13.2 Bus correctness
558 + - [x] `test_bus_fanout.py`
559 + - [x] ensure ticks delivered to all lanes in order
560 + - [x] ensure backpressure blocks producer when lane queue is full
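The fanout/backpressure behavior under test can be sketched with one bounded queue per lane. `TickBus` is an illustrative name; a real producer would block on `put()`, while this sketch uses `put_nowait()` so the backpressure point is observable:

```python
import queue


class TickBus:
    """Illustrative fan-out bus: each lane owns a bounded queue, and a full
    lane queue pushes back on the producer."""

    def __init__(self, lanes, maxsize=2):
        self.queues = {lane: queue.Queue(maxsize=maxsize) for lane in lanes}

    def publish(self, tick):
        # Deliver the tick to every lane, in publish order.
        for q in self.queues.values():
            q.put_nowait(tick)


bus = TickBus(["mininet", "omnet"], maxsize=2)
bus.publish(0)
bus.publish(1)
try:
    bus.publish(2)  # both lane queues are full
    blocked = False
except queue.Full:
    blocked = True

order = [bus.queues["mininet"].get_nowait() for _ in range(2)]
```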
561 +
562 + ### 13.3 Run manifest correctness
563 + - [x] `test_run_manifest.py`
564 + - [x] run manager writes expected keys
565 + - [x] manifest includes versions and config snapshot
566 +
567 + ---
568 +
569 + ## 14) Minimum viable orchestrator (v1) — acceptance criteria
570 +
571 + - [x] Can run `satsim run scenario.yaml --mode mininet`
572 + - [x] Geo/RF scenario created
573 + - [x] Link deltas streamed and applied via `tc/netem`
574 + - [x] PCAP recorded (optional)
575 + - [x] run artifacts written (logs, manifest)
576 +
577 + - [x] Can run `satsim run scenario.yaml --mode omnet`
578 + - [x] Geo/RF stream recorded to trace
579 + - [x] OMNeT launched consuming trace (or stubbed with clear TODO if not ready)
580 + - [x] run artifacts written
581 +
582 + - [x] Can run `satsim run scenario.yaml --mode parallel`
583 + - [x] both lane adapters receive identical tick updates
584 + - [x] lane adapters derive control only from link deltas
585 + - [x] optional events are captured in artifacts without driving lane state
586 + - [x] orchestrator shuts down cleanly on completion or CTRL+C
587 +
588 + ---
589 +
590 + ## 15) Optional but valuable v1.1 tasks (safe additions)
591 + - [ ] Orchestrator exposes its own gRPC stream `StreamTickUpdates` so lanes can subscribe remotely
592 + - [ ] Add NATS internal bus option for multi-process fanout
593 + - [ ] Add replay command: `satsim replay <run_id>` (use stored trace)
594 + - [ ] Add sweep runner: parameter grid search with repeated runs and consolidated summary
595 +
596 + ---

TASKS_TESTSUITE_GEOENGINE.md(file created)

@@ -0,0 +1,62 @@
1 + # Geometry/RF Engine Test Suite Plan
2 +
3 + This checklist tracks the work to build and verify a comprehensive RPC-focused test suite for `geomrf-engine`.
4 +
5 + ## 0) Deliverables
6 +
7 + - [x] Add a dedicated gRPC service test module that exercises all six RPCs.
8 + - [x] Validate success + error-path behavior for lifecycle and streaming RPCs.
9 + - [x] Produce an updated coverage report and capture gaps.
10 + - [x] Keep this checklist updated as tasks are completed.
11 +
12 + ## 1) Baseline and scope
13 +
14 + - [x] Confirm current tests/coverage baseline before adding new RPC tests.
15 + - [x] Confirm test scenario strategy (deterministic helper scenario; compatible with 027 overhead-pass style TLE + GS setup).
16 +
17 + ## 2) Test infrastructure
18 +
19 + - [x] Add an in-process gRPC test harness (ephemeral port, async channel/stub, clean teardown).
20 + - [x] Add shared helpers for creating/closing scenarios from tests.
21 +
22 + ## 3) RPC lifecycle tests
23 +
24 + - [x] `GetVersion` returns expected identity/schema metadata.
25 + - [x] `GetCapabilities` returns expected limits and feature flags.
26 + - [x] `CreateScenario` success path returns `scenario_ref`.
27 + - [x] `CreateScenario` invalid spec path returns `INVALID_ARGUMENT`.
28 + - [x] `CloseScenario` success path returns `ok=true`.
29 + - [x] `CloseScenario` unknown scenario path returns `NOT_FOUND`.
30 +
31 + ## 4) Streaming RPC tests
32 +
33 + - [x] `StreamLinkDeltas` success path returns ordered batches with snapshot metadata.
34 + - [x] `StreamLinkDeltas` unknown scenario path returns `NOT_FOUND`.
35 + - [x] `StreamLinkDeltas` closed scenario path returns `FAILED_PRECONDITION`.
36 + - [x] `StreamLinkDeltas` invalid time parameters return `INVALID_ARGUMENT`.
37 + - [x] `StreamEvents` success path returns well-formed events for the test scenario.
38 + - [x] `StreamEvents` filtered path validates event filtering behavior.
39 + - [x] `StreamEvents` unknown scenario path returns `NOT_FOUND`.
40 + - [x] `StreamEvents` closed scenario path returns `FAILED_PRECONDITION`.
41 +
42 + ## 5) Execution and coverage
43 +
44 + - [x] Run full test suite and ensure all tests pass.
45 + - [x] Run coverage scoped to `geomrf_engine`.
46 + - [x] Verify `server.py` and stream/event modules are covered by tests.
47 + - [x] Document final coverage numbers and remaining gaps.
48 +
49 + ## 6) Results summary
50 +
51 + - [x] Test count: `20 passed`.
52 + - [x] Coverage total (`geomrf_engine`): `85%` (`820` statements, `120` missed).
53 + - [x] Core RPC implementation coverage: `server.py` at `79%`, `streaming/events.py` at `96%`, `streaming/backpressure.py` at `80%`, `util/logging.py` at `92%`.
54 + - [x] Remaining notable gaps captured for follow-up: evaluator branch coverage (`56%`) and delta-threshold branch coverage (`71%`).
55 +
56 + ## 7) v1.1 follow-up (event alignment)
57 +
58 + - [ ] Add `StreamEvents` alignment tests for request `dt` semantics (`default_dt` fallback + invalid-range rejection).
59 + - [ ] Add selector-parity tests ensuring event selection mirrors `StreamLinkDeltas` selection.
60 + - [ ] Add assertions that every emitted `EngineEvent` carries `tick_index`.
61 + - [ ] Add cross-stream alignment test: same window/dt/selector for events+deltas yields consistent tick mapping.
62 + - [ ] Extend error-path coverage for new event request fields.
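The cross-stream alignment test above reduces to mapping event timestamps onto the delta tick grid. A sketch of the expected mapping, assuming ticks fall at `t_start + k*dt` and `tick_index = floor((t_event - t_start) / dt)`; that grid rule is an assumption about the engine's semantics, not taken from its documentation:

```python
import math


def expected_tick_index(t_event, t_start, dt):
    """Map an event timestamp onto the delta tick grid.

    Assumes the grid rule tick_index = floor((t_event - t_start) / dt);
    out-of-range inputs mirror the INVALID_ARGUMENT paths tested above.
    """
    if dt <= 0 or t_event < t_start:
        raise ValueError("invalid range")
    return math.floor((t_event - t_start) / dt)


# Events and deltas requested with the same window/dt should agree.
t_start, dt = 0.0, 10.0
events = [{"t": 0.0}, {"t": 19.9}, {"t": 20.0}]
for ev in events:
    ev["tick_index"] = expected_tick_index(ev["t"], t_start, dt)

try:
    expected_tick_index(-1.0, t_start, dt)
    rejected = False
except ValueError:
    rejected = True
```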