
000 How I AI.md

Smol Example:

ChatGPT Web:

<After workshopping the specs/requirements for a while...>

Write an extremely detailed implementation doc for the streaming version and the APIs. We are only looking at the geometry/RF engine right now. We need to be able to hand the doc you make off to an LLM to code the geometry engine API up from the existing python code. Make it so. Extremely detailed. Markdown checkboxes for each discrete task - [ ] so the AI can mark them off with an X as they complete

The codex-cli can start working on the output TASKS_GEOENGINE.md.

Implementing component 3 (of 4):

ChatGPT Web:

I pasted in all the TASKS_*.md the AI coder has produced so far, so the web planner has an idea of what was done. (If we didn't have access restrictions, the web version could just access the git repo).

Okay, here are the markdown taskfiles of what we've got done so far. We built the geo engine and orchestrator. I need you to write a very detailed TASKS_IMNET.md file that deals with the OMNeT++/INET part and how it interacts with the other two systems as they exist. Write it as a markdown checklist - [ ] so the AI can cross items off as it writes them. Go.

This produces TASKS_ORCHESTRATOR.md

Codex CLI:

Check over TASKS_ORCHESTRATOR.md. It was written without knowledge of the actual current implementation. Let me know if it looks good for ARCHITECTURE.md and if it is compatible with what has been done so far in TASKS_GEO_ENGINE.md. We have only tried to make a Geo engine so far. No edits, just check that TASKS_ORCHESTRATOR.md makes sense.

  Overall verdict: it is mostly aligned with ARCHITECTURE.md and broadly compatible with the Geo engine work, but the 3 medium/high items above should be clarified before implementation so the orchestrator doesn’t encode wrong assumptions.

This step is super important! I need to make sure the plan we thought of actually makes sense with what is happening on the ground.

Can the misaligned items be easily solved? Is there a mostly obvious right answer to them? If so, tell me the obvious solutions, or show me my options if there's a fork or a hard choice.

<More workshopping of differences between web and implementation. The web has the pdf source documents and better web search, and has unlimited usage, so I did planning there>

Codex CLI:

@AGENTS.md Your task is to implement the geometry engine defined in @TASKS_GEO_ENGINE.md. After completing each task, mark off its markdown checkbox with an x (- [x]) so there is an external record of what has been done. If plans change, then modify the task list appropriately. The overall high level architecture of the program is in @ARCHITECTURE.md. Go.

This is where the magic happens! We have thought through our API and have developed a test plan and a development plan. Now the AI can develop the code and run the test suite against the API to ensure it's correct.


Overall Advice:

  • Think about your inputs/outputs/dependencies beforehand.

  • When the AI screws up, hallucinates, or does something silly, you can make a note in AGENTS.md to do the right thing instead.

  • Force the AI to use as many deterministic static tools as possible:

    • Strict Type Checking (Use Rust instead of C, TypeScript instead of JavaScript, Python with Type Annotations instead of without)
    • Linters
    • Code Format Tools
    • Unit/Integration tests
  • Have the AI write as many tests as possible of what you want the program to do

  • If you want the AI to "one-shot" (i.e., autonomously code something complex for a while without supervision and get a good result), then you need to give it as much test input/output behaviour as possible, so it can keep checking against the "proper" results without your guidance.
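The "one-shot" advice above can be made concrete with a test file that pins down expected input/output behaviour. A minimal sketch — the function and the numbers are illustrative stand-ins, not from this repo:

```python
# Hypothetical example of giving the AI ground truth to check against:
# concrete input/output pairs expressed as plain pytest-style tests.

def snr_margin_db(received_power_dbw: float, noise_power_dbw: float,
                  required_snr_db: float) -> float:
    """Link-budget-style margin: achieved SNR minus required SNR."""
    return (received_power_dbw - noise_power_dbw) - required_snr_db

def test_snr_margin_examples():
    # Each assertion is a "proper" result the AI can re-verify without guidance.
    assert snr_margin_db(-120.0, -130.0, 6.0) == 4.0
    assert snr_margin_db(-125.0, -130.0, 6.0) == -1.0  # negative margin: link infeasible
```

The more of these pairs exist before the autonomous run starts, the less the AI has to guess about what "correct" means.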

AGENTS.md

Python Environment And Package Policy

Use uv for all Python workflows in this repository.

Rules

  • Always run Python commands with uv run.
  • Always add dependencies with uv add.
  • Use uv venv for virtual environment setup/management.
  • Do not use pip, pip3, python -m pip, virtualenv, or python -m venv.
  • Do not install dependencies outside uv.

Examples

  • Run app: uv run python main.py
  • Add runtime dependency: uv add requests
  • Add dev dependency: uv add --dev pytest

Test Coverage Policy

Use pytest-cov via uv for coverage checks.

Rules

  • If coverage flags are needed and pytest-cov is missing, install it with uv add --dev pytest-cov.
  • For the workspace package, install coverage tooling with uv add --package geomrf-engine --dev pytest-cov.
  • Run coverage with uv run pytest ... --cov=....
  • Because both root and geomrf-engine use a tests package name, run coverage in two passes and combine reports instead of a single mixed pytest invocation.

Coverage command pattern

  • Orchestrator pass: COVERAGE_FILE=.coverage.orch uv run pytest tests --cov=satsim_orch --cov-report=
  • Geo engine pass: COVERAGE_FILE=.coverage.geomrf uv run --package geomrf-engine pytest geomrf-engine/tests --cov=geomrf_engine --cov-report=
  • Combine/report: uv run coverage combine .coverage.orch .coverage.geomrf && uv run coverage report -m

OMNeT++/INET Environment Policy

Use opp_env for OMNeT++/INET install and environment management in this repository.

Rules

  • Use opp_env to install/manage OMNeT++ and INET versions.
  • Do not rely on ad-hoc/manual OMNeT++ or INET installs for project workflows.
  • Keep OMNeT++/INET version selection pinned and reproducible across dev and CI.

Scope boundary

  • Python dependency and execution workflows remain uv-managed.
  • OMNeT++/INET toolchain workflows are opp_env-managed.

Legacy Code Reuse Policy

If functionality from old_code/ is needed:

  • Do not import or execute from old_code/ directly.
  • Copy the required snippet(s) into a new file under the active codebase.
  • Adapt and maintain the copied code in the active module only.

Sandbox / Network Restriction Policy

If a task is blocked by sandbox or network restrictions:

  • Stop immediately and do not spend tokens repeatedly trying to bypass restrictions.
  • Clearly tell the user that we are in a sandbox-restricted environment.
  • Ask the user for permission before attempting any sandbox breakout or elevated access.
ARCHITECTURE.md
# ARCHITECTURE.md — SatSim System Overview

This document gives the **system-level architecture** for SatSim. It is intended to provide a complete “sight picture” for anyone implementing a subproject (e.g., the Geometry/RF Engine) so they understand how their component fits into the larger simulator.

---

## 1) Purpose and guiding idea

SatSim is a **hybrid satellite networking simulator** that combines:

1) A swappable **Geometry/RF/Link-Budget Engine** (physics + propagation + link feasibility)
2) A **packet-level discrete-event simulation lane 'OMNeT++/INET'** (scale + protocol behavior)
3) A **real SDN emulation lane 'Mininet/OVS'** (controller-in-the-loop + real Linux networking)
4) An **Orchestrator** that provides a single scenario/timebase and keeps all parts consistent

The fundamental design choice is that SatSim is **layered and composable**: we reuse mature simulators/emulators and treat satellite physics as an external service with a stable interface.

---

## 2) Key design decisions (why this looks the way it does)

### 2.1 Why two lanes (simulation vs emulation)
We intentionally run two different lanes because they answer different questions:

- **OMNeT++/INET lane (Discrete-Event Simulation)**
  - Best for: scaling up to many nodes, protocol studies, routing and congestion behavior, reproducibility.
  - Not best for: running real SDN controllers and real Linux TCP stacks.

- **Mininet/OVS lane (Network Emulation)**
  - Best for: real SDN controllers (ONOS/Ryu), real forwarding behavior (OpenFlow/OVS), real apps/traffic tools.
  - Not best for: scaling to thousands of nodes with full protocol stacks.

Trying to “pipe packets” between them is possible but usually not worth it early, because it introduces hard time synchronization problems (DES time vs wall-clock time) and packet bridging complexity. Instead we connect both lanes to the same **state oracle** (the Geo/RF engine) through the Orchestrator.

### 2.2 Where the lanes *do* meet today
They meet at:
- **Scenario definition** (same nodes, same constraints, same time window)
- **LinkState/Event timeline** (same “truth” about which links exist and their properties)
- **Metrics and artifacts** (comparable outputs; shared logging/PCAP strategy)

Optionally, they also meet via:
- **Shared SDN decision logic** (same ONOS/Ryu app used to compute routes, then applied in both lanes through adapters)

### 2.3 Future “stacking” (OMNeT++ feeding Mininet)
In the future, OMNeT++ may “feed” Mininet in two practical ways:

1) **Trace-driven replay (recommended future path)**
   - OMNeT++ generates a curated set of traces (topology/failure schedules, traffic demands, baseline routing decisions).
   - Mininet replays those traces in real-time to validate controller behavior under identical conditions.

2) **Hard co-simulation / packet bridging (advanced, optional)**
   - Some nodes simulated in OMNeT++, others emulated in Mininet at the same time.
   - Requires strict time coupling and a gateway that transforms/timeshifts packets.
   - Not a v1 target.

### 2.4 Locked decisions (2026-02-18)
- **Tick authority:** `StreamLinkDeltas` is the control-plane source of truth for lane updates.
- **Events contract:** event streaming is retained, but aligned to the same requested `dt` and `selector`, and each event carries `tick_index`.
- **Orchestrator behavior:** streaming-driven execution is canonical; any scheduler is pacing-only.
- **Scenario translation:** orchestrator must fail-fast when it cannot produce a valid Geo/RF `ScenarioSpec`.
- **Python tooling:** Python workflows use `uv`; OMNeT++/INET workflows use `opp_env`.

---

## 3) Top-level components

### 3.1 Geometry/RF/Link-Budget Engine (black box, replaceable)
**Role:** The authoritative “physics layer” that translates orbital/propagation reality into network-usable link state.

**Key properties**
- Replaceable implementation (Skyfield + ITU-R today, could be STK import or other later)
- Stable interface (the rest of SatSim depends only on its API)
- Produces time-indexed:
  - Link feasibility (up/down)
  - Link properties (delay/capacity/loss proxies)
  - Discrete events (link up/down, handover, failures if modeled)

**Location:** `geomrf-engine/`

---

### 3.2 Orchestrator (system conductor)
**Role:** Owns the simulation lifecycle and timebase. It is the “brain” that coordinates all lanes.

**Responsibilities**
- Load scenario config → create/initialize the Geo/RF engine scenario
- Choose execution mode:
  - OMNeT-only, Mininet-only, or both in parallel
- Drive execution pacing:
  - offline apply-fast or real-time apply-paced, while consuming authoritative engine stream ticks
- Consume Geo/RF LinkState stream and distribute it to:
  - OMNeT adapter
  - Mininet adapter
  - logging/metrics
- Collect artifacts (PCAPs, timeseries metrics, configs, run manifests)
- Provide reproducible run IDs and version stamping

**Location:** `orchestrator/`

---

### 3.3 OMNeT++/INET Lane (packet-level discrete-event)
**Role:** Packet-level simulation of protocols, queuing, routing, traffic at scale.

**Responsibilities**
- Build the network node models (routers, hosts, queues) using INET components
- Apply dynamic link updates (delay/capacity/loss/up-down) based on Geo/RF output
- Run deterministic experiments rapidly (sweeps)
- Export artifacts:
  - logs + metrics
  - optional PCAP outputs (where supported)

**Custom SatSim additions**
- A lightweight **LinkState Adapter Module** that subscribes to orchestrator/Geo output
- A mechanism to apply link changes at simulation timestamps

**Environment and install management**
- Use `opp_env` as the standard way to install/manage OMNeT++ and INET.
- Avoid ad-hoc/manual OMNeT++/INET installs in project workflows.

**Location:** `lanes/omnet/`

---

### 3.4 Mininet/OVS Lane (SDN emulation)
**Role:** Real SDN controller + real forwarding plane under dynamic link conditions.

**Responsibilities**
- Build an emulated topology with Mininet (or Containernet)
- Use OVS as the dataplane switch/router substrate
- Run a real SDN controller (ONOS or Ryu)
- Apply dynamic link shaping based on Geo/RF output:
  - `tc/netem` for delay/loss/jitter
  - `tbf/htb` for rate control
  - interface up/down to emulate link drops
- Generate traffic using real tools:
  - iperf3, D-ITG, SIPp, tcpreplay, custom apps

**Location:** `lanes/mininet/`
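
As an illustration of the shaping responsibilities above (not the project's actual adapter), one tick's link properties could be translated into `ip`/`tc` commands like this; the device name and numbers are hypothetical, and netem's `rate` option assumes a reasonably modern kernel (older setups would hang a `tbf`/`htb` qdisc underneath instead):

```python
# Hypothetical helper: map one link's NetworkView state onto Linux shaping
# commands. Interface up/down emulates link drops; netem carries delay,
# loss, and rate.

def shaping_cmds(dev: str, delay_s: float, capacity_bps: float,
                 loss_rate: float, up: bool) -> list[str]:
    if not up:
        return [f"ip link set {dev} down"]  # link drop
    delay_ms = delay_s * 1000.0
    loss_pct = loss_rate * 100.0
    rate_kbit = capacity_bps / 1000.0
    return [
        f"ip link set {dev} up",
        f"tc qdisc replace dev {dev} root netem "
        f"delay {delay_ms:.3f}ms loss {loss_pct:.2f}% rate {rate_kbit:.0f}kbit",
    ]

cmds = shaping_cmds("s1-eth1", delay_s=0.025, capacity_bps=10_000_000,
                    loss_rate=0.01, up=True)
# cmds[1] applies 25 ms delay, 1% loss, and a 10 Mbit/s rate in one qdisc
```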

---

### 3.5 Observability, artifacts, and visualization
**Role:** Make runs inspectable, comparable, and reproducible.

**Artifacts**
- Scenario config snapshot + run manifest (versions, seeds, git SHAs)
- LinkState/Event traces (optional export)
- Metrics time-series (throughput/delay/loss/path changes)
- PCAP captures (Mininet tcpdump; OMNeT if enabled)

**Tools**
- Prometheus + Grafana for dashboards
- Wireshark for PCAP analysis

**Location:** `observability/` and `artifacts/`

---

## 4) System boundaries and data ownership

### 4.1 The Geo/RF engine owns *physics truth*
- It is the source of truth for which links can exist and their physical/network properties.
- Other components must not invent geometry/rf; they only consume the engine’s output.

### 4.2 The Orchestrator owns *time and execution*
- It defines run window requests, pacing mode, and synchronization rules.
- For v1/v1.1, tick production comes from Geo/RF stream output rather than orchestrator-generated ticks.
- It routes updates to the lanes and standardizes artifacts.

### 4.3 Each lane owns *packet/control behavior*
- OMNeT owns packet-level behavior inside DES.
- Mininet owns real SDN and Linux networking behavior.

---

## 5) Core data flows (end-to-end)

### 5.1 Initialization flow
1. User provides `ScenarioConfig` (YAML/JSON).
2. Orchestrator validates config and creates a new run ID.
3. Orchestrator calls Geo/RF engine:
   - `CreateScenario` (returns scenario ref)
4. Orchestrator initializes selected lane(s):
   - OMNeT: compile/load model, start run
   - Mininet: build topology, start controller
5. Orchestrator subscribes to Geo/RF streaming output for LinkState and Events.

### 5.2 Runtime (parallel lane mode)
At each time tick:
1. Geo/RF produces `LinkDeltaBatch` + optional events.
2. Orchestrator receives it and distributes:
   - OMNeT adapter: update channel/link state in simulator time
   - Mininet adapter: apply tc/netem shaping and link toggles
   - (optional) Event recorder: store aligned `EngineEvent` stream for analysis/observability
   - Observability: record metrics and store link traces
3. Lanes generate traffic and produce metrics/PCAPs.
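
The fan-out in step 2 can be sketched as a simple publish-to-sinks loop. This is a hedged illustration — the dataclasses stand in for the generated protobuf messages, and adapter registration is an assumed design, not the mandated one:

```python
# Illustrative orchestrator fan-out: every registered sink (OMNeT adapter,
# Mininet adapter, metrics recorder) sees the same authoritative tick.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class LinkUpdate:          # stand-in for the protobuf LinkUpdate
    src: str
    dst: str
    up: bool
    one_way_delay_s: float
    capacity_bps: float
    loss_rate: float

@dataclass
class LinkDeltaBatch:      # stand-in for the protobuf LinkDeltaBatch
    tick_index: int
    updates: list[LinkUpdate] = field(default_factory=list)

class Orchestrator:
    def __init__(self) -> None:
        self.sinks: list[Callable[[LinkDeltaBatch], None]] = []

    def register(self, sink: Callable[[LinkDeltaBatch], None]) -> None:
        self.sinks.append(sink)

    def on_batch(self, batch: LinkDeltaBatch) -> None:
        for sink in self.sinks:   # distribute one tick to every lane/recorder
            sink(batch)

seen: list[int] = []
orch = Orchestrator()
orch.register(lambda b: seen.append(b.tick_index))  # stand-in OMNeT adapter
orch.register(lambda b: seen.append(b.tick_index))  # stand-in Mininet adapter
orch.on_batch(LinkDeltaBatch(tick_index=7))         # both sinks see tick 7
```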

### 5.3 Completion flow
1. Orchestrator stops lane processes.
2. Orchestrator closes the Geo/RF scenario.
3. All artifacts are written under the run ID.

---

## 6) Timebase and execution modes

SatSim supports multiple execution modes, controlled by the Orchestrator:

### Mode A — OMNeT-only (offline DES)
- Orchestrator consumes Geo/RF ticks and applies them to OMNeT without wall-clock pacing.
- Highest scalability and repeatability.

### Mode B — Mininet-only (real-time emulation)
- Orchestrator consumes Geo/RF ticks and applies wall-clock pacing while updating Mininet shaping.
- Best for SDN/controller realism and app-level testing.

### Mode C — Parallel (OMNeT + Mininet simultaneously)
- Both lanes consume the same LinkState stream.
- Used to compare “simulated protocol outcomes” vs “real controller outcomes” under the same link dynamics.

### Mode D — Trace-driven replay (future/optional)
- Geo/RF and/or OMNeT exports a trace.
- Mininet replays trace deterministically.

---

## 7) Interfaces between components (high-level)

### 7.1 Geo/RF Engine interface (v1)
- gRPC service, Protobuf messages
- Scenario lifecycle + streaming link deltas/events
- Output is **NetworkView** link properties (up/down, delay, capacity, loss proxy)
- Optional debug scalars for validation (SNR margin, elevation, range)
- Event-stream alignment target: events use same requested window/selector/dt semantics as deltas and expose `tick_index`.

### 7.2 Orchestrator ↔ OMNeT interface
- OMNeT subscribes to orchestrator updates via:
  - gRPC client inside a C++ adapter module, OR
  - file/trace ingestion for offline runs
- Applies updates to INET channel/link parameters and toggles connectivity

### 7.3 Orchestrator ↔ Mininet interface
- Orchestrator controls Mininet via:
  - Python Mininet API calls
  - Linux `tc` and interface management commands
- SDN controller is external (ONOS/Ryu), connected in the standard Mininet way

---

## 8) Reproducibility rules

Each run must record:
- ScenarioConfig snapshot
- Seeds
- Engine versions:
  - Geo/RF engine version + schema version
  - orchestrator version
  - OMNeT/INET versions and model git SHA
  - `opp_env` environment definition/metadata used for OMNeT/INET
  - controller version and app git SHA
- LinkState trace hash (if stored)
- Toolchain/container image tags (if containerized)

---

## 9) Suggested monorepo layout

satsim/
  ARCHITECTURE.md
  orchestrator/
    ...
  subprojects/
    geomrf-engine/
      ARCHITECTURE.md      # subproject-specific (the streaming API spec lives here)
      proto/
      src/
      tests/
  lanes/
    omnet/
      models/
      adapter/
      scripts/
    mininet/
      topo/
      driver/
      controllers/
      scripts/
  observability/
    grafana/
    prometheus/
    dashboards/
  artifacts/
    runs/
      <run_id>/
        scenario.yaml
        manifest.json
        linkstate.parquet  # optional
        metrics/
        pcaps/
        logs/


---

## 10) What an implementer of the Geo/RF engine must know

- The Geo/RF engine must be treated as **the physics oracle**.
- Its output must be:
  - time-indexed
  - sparse (selector-driven)
  - stable and deterministic
  - expressed in consistent units
- The Orchestrator will use it in both:
  - offline sampling (for OMNeT)
  - real-time streaming (for Mininet)
- The lanes do not need to know how link budgets are computed—only how to consume the streaming LinkDelta/Event outputs.

---

## 11) Roadmap hooks (explicit future extensions)

- Add a richer PHY view (optional fields) without breaking NetworkView consumers.
- Add trace import/replay for deterministic mininet runs.
- Add “shared SDN decision interface” so ONOS/Ryu path computation can be applied inside OMNeT.
- Add advanced co-simulation only if required (packet bridging).

---
README.md

SatSim docs index

Primary design and task documents:

  • ARCHITECTURE.md
  • TASKS_GEO_ENGINE.md
  • TASKS_IMNET.md
  • TASKS_ORCHESTRATOR.md
  • TASKS_TESTSUITE_GEOENGINE.md

Locked design decisions (2026-02-18):

  • Control-plane tick authority is StreamLinkDeltas (streaming-driven orchestration).
  • Event stream alignment is being standardized with request dt/selector and event tick_index.
  • Orchestrator error handling includes NOT_FOUND, INVALID_ARGUMENT, FAILED_PRECONDITION, and RESOURCE_EXHAUSTED.
  • Scenario translation to Geo/RF ScenarioSpec is fail-fast.
  • Python workflows are uv-managed; OMNeT++/INET workflows are opp_env-managed.
TASKS_GEO_ENGINE.md

Geometry/RF Engine v1/v1.1 Streaming API — Implementation Specification

This document specifies a complete, implementable Geometry/RF Engine API and server for streaming link-state deltas + events. It is written so another LLM can generate the code from existing Python geometry/RF/link-budget code (Skyfield + ITU-R + your models) with minimal guesswork.

Status note:

  • v1 baseline is implemented.
  • v1.1 alignment updates (for orchestrator compatibility) are now specified below, especially for StreamEvents tick alignment.

0) Deliverables

What must exist at the end

  • A runnable Python gRPC server that implements:

    • Scenario creation/closure
    • Capabilities/version endpoints
    • Streaming link deltas
    • Streaming events
  • A Protobuf schema with:

    • Stable IDs, time semantics, units
    • Selector logic (which links/nodes to compute)
    • Delta semantics (what counts as a “change”)
  • A reference Python client demonstrating:

    • Create scenario → stream deltas/events → close scenario

1) Core design constraints

1.1 Contract invariants

  • Scenario-scoped: All computation happens inside a ScenarioRef.
  • Time-indexed: All output is keyed by a timestamp and tick index.
  • Selector-driven: Never compute “all links” unless explicitly requested.
  • Streaming-first: Primary runtime interface is server→client stream.
  • Deterministic: Given identical inputs (scenario + seed + engine version), output is replayable.

1.2 What the engine outputs (NetworkView)

For each directed link (src → dst) at each tick, the engine provides:

  • up (boolean)
  • one_way_delay_s (float; seconds)
  • capacity_bps (float; bits per second)
  • loss_rate (float; [0,1] packet loss proxy OR PER proxy)
  • optional debug scalar(s): snr_margin_db, elevation_deg, range_m

Everything else can be exposed later via an optional “debug view”; v1 focuses on network-usable state.


2) Repository layout (recommended)

geomrf-engine/
  proto/
    geomrf/v1/geomrf.proto
  src/geomrf_engine/
    __init__.py
    server.py
    config_schema.py
    scenario_store.py
    timebase.py
    selectors.py
    compute/
      __init__.py
      ephemeris.py
      geometry.py
      rf_models.py
      link_budget.py
      adaptation.py
    streaming/
      __init__.py
      delta.py
      events.py
      backpressure.py
    util/
      ids.py
      units.py
      logging.py
      metrics.py
  examples/
    client_stream.py
  tests/
    test_proto_roundtrip.py
    test_delta_thresholds.py
    test_selectors.py
    test_determinism.py

3) Implementation tasks checklist

3.1 Project & build system

  • Create repo structure as above

  • Add pyproject.toml with dependencies:

    • grpcio, grpcio-tools, protobuf
    • pydantic (scenario validation)
    • pyyaml (YAML scenario input)
    • numpy, scipy (if used)
    • skyfield, sgp4
    • your ITU-R package(s)
    • prometheus-client (optional but recommended)
  • Add a Makefile or task runner:

    • uv run python -m grpc_tools.protoc ... compiles .proto to Python
    • uv run python -m geomrf_engine.server ... starts server
    • uv run pytest runs tests

3.2 Protobuf + gRPC schema

  • Write proto/geomrf/v1/geomrf.proto (spec below)
  • Generate Python stubs
  • Add schema version constants and embed in responses

3.3 Server skeleton

  • Implement async gRPC server (grpc.aio)

  • Wire servicer methods:

    • GetVersion
    • GetCapabilities
    • CreateScenario
    • CloseScenario
    • StreamLinkDeltas
    • StreamEvents
  • Add structured logging and request correlation IDs

3.4 Scenario lifecycle

  • Implement scenario validation (Pydantic)
  • Implement scenario store (in-memory for v1)
  • Implement scenario ID generation (UUIDv4)
  • Snapshot ScenarioSpec + resolved assets into a ScenarioRuntime

3.5 Compute pipeline

  • Implement ephemeris loader (TLE list initially)
  • Implement geometry evaluation (positions + visibility + elevation + range)
  • Implement RF/link budget mapping to NetworkLinkState
  • Implement adaptation mapping (SNR → capacity/loss) with a default policy
  • Implement per-tick evaluation returning sparse link set

3.6 Streaming + deltas/events

  • Implement tick loop (timebase)
  • Implement delta computation with thresholds
  • Implement event emission (link up/down, handover optional)
  • Implement backpressure-safe streaming
  • Add stream cancellation handling and cleanup
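
The delta-threshold rule (spelled out by `DeltaThresholds` in section 4) can be sketched in a few lines. This is an illustrative function, not the required implementation; it shows the contract: up/down flips always emit, a threshold of 0 means "emit on any change", and otherwise a field emits only when its absolute change exceeds the threshold:

```python
# Illustrative delta-emission check matching DeltaThresholds semantics.
def should_emit(prev: dict, cur: dict, thresholds: dict) -> bool:
    if prev["up"] != cur["up"]:
        return True  # up/down changes always emit (implicit rule)
    for field in ("one_way_delay_s", "capacity_bps", "loss_rate"):
        delta = abs(cur[field] - prev[field])
        thr = thresholds.get(field, 0.0)
        if thr == 0.0:
            if delta != 0.0:     # 0 threshold: any change emits
                return True
        elif delta > thr:        # otherwise: emit only past the threshold
            return True
    return False

prev = {"up": True, "one_way_delay_s": 0.020, "capacity_bps": 1e7, "loss_rate": 0.0}
cur  = {"up": True, "one_way_delay_s": 0.0201, "capacity_bps": 1e7, "loss_rate": 0.0}
# A 0.1 ms delay wobble is suppressed by a 1 ms threshold:
suppressed = not should_emit(prev, cur, {"one_way_delay_s": 0.001})
```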

3.7 Tests + examples

  • Determinism test (same scenario+seed → identical deltas)
  • Selector test (only requested links computed)
  • Threshold test (small changes suppressed)
  • Example client script (prints updates, counts links)
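
The determinism test has a simple shape: run the same scenario with the same seed twice and require identical delta streams. The toy "engine" below just draws seeded pseudo-random values to show the pattern; the real test would call the actual compute pipeline instead:

```python
# Shape of the determinism test: same scenario + seed => identical output.
import random

def toy_delta_stream(seed: int, ticks: int) -> list[tuple[int, float]]:
    rng = random.Random(seed)  # all randomness must flow from the seed
    return [(k, rng.uniform(0.0, 1.0)) for k in range(ticks)]

def test_determinism():
    assert toy_delta_stream(seed=42, ticks=100) == toy_delta_stream(seed=42, ticks=100)
    assert toy_delta_stream(seed=42, ticks=100) != toy_delta_stream(seed=43, ticks=100)
```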

4) gRPC/Protobuf specification (v1)

4.1 .proto (authoritative spec)

Create proto/geomrf/v1/geomrf.proto:

syntax = "proto3";

package geomrf.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

option go_package = "geomrf/v1;geomrfv1"; // harmless for other langs

// ---------------------------
// Service
// ---------------------------
service GeometryRfEngine {
  rpc GetVersion(GetVersionRequest) returns (GetVersionResponse);
  rpc GetCapabilities(GetCapabilitiesRequest) returns (GetCapabilitiesResponse);

  rpc CreateScenario(CreateScenarioRequest) returns (CreateScenarioResponse);
  rpc CloseScenario(CloseScenarioRequest) returns (CloseScenarioResponse);

  // Primary: stream sparse deltas per tick.
  rpc StreamLinkDeltas(StreamLinkDeltasRequest) returns (stream LinkDeltaBatch);

  // Primary: stream discrete events (optional separate channel for clean consumers).
  rpc StreamEvents(StreamEventsRequest) returns (stream EngineEvent);
}

// ---------------------------
// Version / capabilities
// ---------------------------
message GetVersionRequest {}

message GetVersionResponse {
  string engine_name = 1;            // e.g., "geomrf-engine"
  string engine_version = 2;         // semver, e.g., "1.0.0"
  string schema_version = 3;         // e.g., "geomrf.v1"
  string build_git_sha = 4;          // optional
}

message GetCapabilitiesRequest {}

message GetCapabilitiesResponse {
  string schema_version = 1;

  // Limits
  uint32 max_links_per_tick = 2;
  uint32 max_nodes = 3;
  uint32 max_streams_per_scenario = 4;
  google.protobuf.duration min_dt = 5;
  google.protobuf.duration max_dt = 6;

  // Supported outputs
  bool supports_loss_rate = 10;
  bool supports_capacity_bps = 11;
  bool supports_delay_s = 12;
  bool supports_snr_margin_db = 13;

  // Supported selectors/features (advertise so clients can adapt)
  bool supports_only_visible = 20;
  bool supports_min_elevation_deg = 21;
  bool supports_max_degree = 22;
  bool supports_link_types = 23; // GS-SAT, SAT-SAT, etc.
}

// ---------------------------
// Scenario lifecycle
// ---------------------------
message CreateScenarioRequest {
  ScenarioSpec spec = 1;
}

message CreateScenarioResponse {
  string scenario_ref = 1; // UUID string
  string schema_version = 2;
}

message CloseScenarioRequest {
  string scenario_ref = 1;
}

message CloseScenarioResponse {
  bool ok = 1;
}

// ---------------------------
// Scenario specification (v1)
// ---------------------------

message ScenarioSpec {
  // Reproducibility
  uint64 seed = 1;

  // Time model
  google.protobuf.timestamp t0 = 2;      // UTC
  google.protobuf.timestamp t1 = 3;      // UTC
  google.protobuf.duration default_dt = 4;

  // Nodes
  repeated NodeSpec nodes = 10;

  // Eligibility rules (which links can exist)
  LinkPolicy link_policy = 20;

  // Mapping PHY -> network outputs (can be simplistic in v1)
  AdaptationPolicy adaptation = 30;

  // Optional: engine-side caching hints
  CacheHints cache_hints = 40;
}

enum NodeRole {
  NODE_ROLE_UNSPECIFIED = 0;
  SATELLITE = 1;
  GROUND_STATION = 2;
  USER_TERMINAL = 3;
}

message NodeSpec {
  string node_id = 1;     // stable ID used everywhere
  NodeRole role = 2;

  // One of the following depending on role
  SatelliteOrbit orbit = 10;
  GroundFixedSite fixed_site = 11;

  // Radio/terminal model parameters (minimal v1)
  TerminalModel terminal = 20;

  // Arbitrary tags for selectors/grouping
  map<string,string> tags = 30;
}

message SatelliteOrbit {
  // v1: only TLE supported. Later: OEM/SP3/etc.
  string tle_line1 = 1;
  string tle_line2 = 2;
}

message GroundFixedSite {
  double lat_deg = 1;
  double lon_deg = 2;
  double alt_m = 3;
}

message TerminalModel {
  // Minimal knobs to compute link budgets consistently.
  // Units: dBW, dBi, Hz, K, etc.
  double tx_power_dbw = 1;
  double tx_gain_dbi = 2;          // can be treated as peak gain in v1
  double rx_gain_dbi = 3;          // can be treated as peak gain in v1
  double rx_noise_temp_k = 4;
  double bandwidth_hz = 5;
  double frequency_hz = 6;

  // Optional: simple pointing/antenna pattern loss approximation
  double pointing_loss_db = 10;    // default constant loss if you don’t model patterns yet
}

enum LinkType {
  LINK_TYPE_UNSPECIFIED = 0;
  GS_TO_SAT = 1;
  SAT_TO_GS = 2;
  SAT_TO_SAT = 3;
  UT_TO_SAT = 4;
  SAT_TO_UT = 5;
}

message LinkPolicy {
  // Which link types are allowed at all
  repeated LinkType allowed_types = 1;

  // Dynamic feasibility thresholds
  double min_elevation_deg = 2;     // default 0 if unused
  bool only_visible = 3;            // if true, return only visible/feasible links

  // Degree constraints (optional)
  uint32 max_out_degree = 10;       // 0 means unlimited
  uint32 max_in_degree = 11;        // 0 means unlimited

  // Optional: limit candidates by distance for scalability
  double max_range_m = 20;          // 0 means unlimited
}

message AdaptationPolicy {
  // v1: a simple mapping mode.
  // Future: full MCS tables, ACM, coding gains, etc.
  enum Mode {
    MODE_UNSPECIFIED = 0;
    FIXED_RATE = 1;        // constant capacity if link is up, else 0
    SNR_TO_RATE = 2;       // rate from snr_margin (simple piecewise)
    SNR_TO_LOSS = 3;       // loss from snr_margin (simple logistic)
    SNR_TO_BOTH = 4;
  }
  Mode mode = 1;

  // v1 defaults
  double fixed_capacity_bps = 2;
  double fixed_loss_rate = 3;

  // Parameters for simple SNR->rate/loss mappings (implementation defined but deterministic)
  double snr_margin_min_db = 10;
  double snr_margin_max_db = 11;
}

message CacheHints {
  bool precompute_positions = 1;
  bool precompute_visibility = 2;
  uint32 max_cache_ticks = 3; // 0 = engine default
}

// ---------------------------
// Streaming requests
// ---------------------------

message StreamLinkDeltasRequest {
  string scenario_ref = 1;

  // Time range for this stream. If empty, use scenario t0..t1.
  google.protobuf.timestamp t_start = 2;
  google.protobuf.timestamp t_end = 3;

  // If unset, use scenario default_dt.
  google.protobuf.duration dt = 4;

  // Which links to consider/return.
  LinkSelector selector = 10;

  // Delta emission thresholds
  DeltaThresholds thresholds = 20;

  // Behavior knobs
  bool emit_full_snapshot_first = 30;  // recommended true for simpler clients
  bool include_debug_fields = 31;      // if true, fill debug fields in updates
}

message StreamEventsRequest {
  string scenario_ref = 1;
  google.protobuf.timestamp t_start = 2;
  google.protobuf.timestamp t_end = 3;
  // If unset, use scenario default_dt. Must satisfy capabilities bounds.
  google.protobuf.duration dt = 4;

  EventFilter filter = 10;
  // Apply the same selection surface as StreamLinkDeltas for deterministic alignment.
  LinkSelector selector = 11;
}

message LinkSelector {
  // v1 supports:
  // - explicit pairs
  // - by link type
  // - by node role sets
  repeated LinkPair explicit_pairs = 1;
  repeated LinkType link_types = 2;

  // If non-empty, only consider links where src in set AND dst in set
  repeated string src_node_ids = 10;
  repeated string dst_node_ids = 11;

  // Optional tag filters (exact match)
  map<string,string> src_tags = 12;
  map<string,string> dst_tags = 13;

  // If true, apply scenario LinkPolicy.only_visible behavior
  bool only_visible = 20;

  // Optional override thresholds (0 uses scenario policy)
  double min_elevation_deg = 21;
  double max_range_m = 22;
}

message DeltaThresholds {
  // Only emit update if absolute change exceeds threshold.
  // 0 means "emit on any change" for that field.
  double delay_s = 1;
  double capacity_bps = 2;
  double loss_rate = 3;
  double snr_margin_db = 4;

  // Emit if link up/down changes always (implicit).
}

// ---------------------------
// Streaming output
// ---------------------------

message LinkDeltaBatch {
  string scenario_ref = 1;
  string schema_version = 2;

  google.protobuf.timestamp time = 3; // tick time
  uint64 tick_index = 4;

  // If emit_full_snapshot_first=true, first batch may be a full snapshot.
  bool is_full_snapshot = 5;

  // Sparse updates (add/update)
  repeated LinkUpdate updates = 10;

  // Links to remove from active set (no longer selected/visible/allowed)
  repeated LinkKey removals = 11;

  // Optional: server stats
  TickStats stats = 20;
}

message LinkUpdate {
  LinkKey key = 1;

  // Core NetworkView outputs
  bool up = 2;
  double one_way_delay_s = 3;
  double capacity_bps = 4;
  double loss_rate = 5;

  // Optional debug fields (filled if include_debug_fields=true)
  double snr_margin_db = 10;
  double elevation_deg = 11;
  double range_m = 12;

  // Extension space for later (avoid breaking schema)
  map<string,string> extra = 30;
}

message LinkKey {
  string src = 1;
  string dst = 2;
  LinkType type = 3;
}

message LinkPair {
  string src = 1;
  string dst = 2;
  LinkType type = 3;
}

message TickStats {
  uint32 links_computed = 1;
  uint32 links_emitted = 2;
  double compute_ms = 3;
}

// ---------------------------
// Events
// ---------------------------

enum EventType {
  EVENT_TYPE_UNSPECIFIED = 0;
  LINK_UP = 1;
  LINK_DOWN = 2;
  HANDOVER_START = 3;
  HANDOVER_COMPLETE = 4;
  NODE_FAILURE = 5;
  NODE_RECOVERY = 6;
}

message EngineEvent {
  string scenario_ref = 1;
  string schema_version = 2;

  EventType type = 3;
  google.protobuf.Timestamp time = 4;
  uint64 tick_index = 5;

  // Which entities are involved (optional depending on event)
  string node_id = 10;
  LinkKey link = 11;

  map<string,string> meta = 20;
}

message EventFilter {
  repeated EventType types = 1;
  repeated string node_ids = 2;
}

5) Server behavior specification (streaming semantics)

5.1 Timebase rules

  • ScenarioSpec.t0/t1 define the canonical simulation window.

  • Stream requests may override with t_start/t_end:

    • If unset → default to scenario window.
    • Engine must clamp requests to [t0, t1] unless explicitly configured otherwise.
  • dt:

    • If unset → use ScenarioSpec.default_dt.
    • Must be within [Capabilities.min_dt, Capabilities.max_dt]; otherwise return INVALID_ARGUMENT.

5.2 Tick indexing

  • Tick 0 corresponds to t_start.
  • Tick k corresponds to t_start + k*dt.
  • Engine must emit tick_index and time on every batch.
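A minimal sketch of these timebase rules (helper names are illustrative, not from the engine code):

```python
from datetime import datetime, timedelta, timezone

def tick_to_time(t_start: datetime, dt: timedelta, k: int) -> datetime:
    """Tick k corresponds exactly to t_start + k*dt (no rounding drift)."""
    return t_start + k * dt

def num_ticks(t_start: datetime, t_end: datetime, dt: timedelta) -> int:
    """Number of ticks whose time falls within [t_start, t_end]; tick 0 is t_start."""
    return int((t_end - t_start) / dt) + 1

t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
dt = timedelta(seconds=10)
assert tick_to_time(t0, dt, 3) == t0 + timedelta(seconds=30)
assert num_ticks(t0, t0 + timedelta(seconds=60), dt) == 7  # ticks at 0..60 s
```

Computing tick times as `t_start + k*dt` (rather than accumulating `t += dt`) avoids floating-point drift over long windows.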

5.3 Active link set and removals

The stream protocol is defined against a client-side “active link table”.

  • updates[] means: create or replace link entry keyed by (src,dst,type).
  • removals[] means: delete that link entry (no longer in selection or no longer feasible under policy).

This is required for sparse streams when visibility causes links to appear/disappear.

5.4 First message behavior

If emit_full_snapshot_first=true:

  • The first emitted LinkDeltaBatch at tick 0 must have:

    • is_full_snapshot=true
    • updates[] containing all currently selected/feasible links
    • removals[] empty

This drastically simplifies consumers (no special “initialization” logic).

5.5 Delta emission thresholds

For ticks after the initial snapshot:

  • A link is emitted in updates[] if:

    • it is newly added, OR
    • its up changed, OR
    • abs(new.delay - old.delay) > thresholds.delay_s (if threshold > 0), OR
    • abs(new.capacity - old.capacity) > thresholds.capacity_bps (if threshold > 0), OR
    • abs(new.loss - old.loss) > thresholds.loss_rate (if threshold > 0), OR
    • (optional debug) changes exceed debug thresholds if included.

If a threshold is 0, treat it as “emit on any change”.
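The emission rule can be sketched as a single predicate. Field names here (`delay_s`, `capacity_bps`, `loss_rate`) mirror DeltaThresholds; the state objects are assumed internal structs, not the wire messages:

```python
from types import SimpleNamespace

def should_emit(prev, curr, thresholds) -> bool:
    """Decide whether a link belongs in updates[] on a post-snapshot tick.

    A threshold of 0 means 'emit on any change' for that field.
    """
    if prev is None or prev.up != curr.up:   # newly added link, or up/down transition
        return True
    for name in ("delay_s", "capacity_bps", "loss_rate"):
        thr = getattr(thresholds, name)
        delta = abs(getattr(curr, name) - getattr(prev, name))
        if delta > thr or (thr == 0 and delta != 0):
            return True
    return False

thr = SimpleNamespace(delay_s=0.001, capacity_bps=1e5, loss_rate=0)
a = SimpleNamespace(up=True, delay_s=0.0100, capacity_bps=1e6, loss_rate=0.0)
b = SimpleNamespace(up=True, delay_s=0.0105, capacity_bps=1e6, loss_rate=0.0)
assert should_emit(None, a, thr)       # newly added links always emit
assert not should_emit(a, b, thr)      # 0.5 ms change is under the 1 ms threshold
```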

5.6 Event stream behavior (v1.1 alignment)

StreamEvents emits events in chronological order within [t_start, t_end]:

  • Minimum set in v1:

    • LINK_UP, LINK_DOWN
  • Optional:

    • HANDOVER_START, HANDOVER_COMPLETE if you can detect “best-sat changed” for a GS/UT.
  • Alignment requirements:

    • StreamEventsRequest.dt must use the same semantics/rules as delta streams (default_dt when unset, cap-validated).
    • StreamEventsRequest.selector must use the same link candidate filtering semantics as delta streams.
    • Every emitted event includes tick_index, where tick k = t_start + k*dt.
  • Events should be consistent with LinkDelta stream:

    • If link transitions from up=false to up=true at tick k, emit a LINK_UP event at that tick’s time.

5.7 Backpressure and cancellation

  • Use grpc.aio streaming and yield messages.

  • If the client is slow, await on send; do not build unbounded queues.

  • On cancellation (context.cancelled()):

    • stop computation promptly
    • release scenario references held by the stream
    • record a log entry with reason
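The shape of a cancellation-aware stream loop, sketched here without the grpc dependency (`StreamContext` is a stand-in for `grpc.aio.ServicerContext`; in the real servicer the framework awaits each yielded message, which is what provides backpressure):

```python
import asyncio

class StreamContext:
    """Minimal stand-in for grpc.aio.ServicerContext cancellation checks."""
    def __init__(self):
        self._cancelled = False
    def cancelled(self) -> bool:
        return self._cancelled
    def cancel(self) -> None:
        self._cancelled = True

async def stream_link_deltas(ticks, compute_batch, context, on_release):
    """Yield one batch per tick; stop promptly on cancel and release references."""
    try:
        for k in ticks:
            if context.cancelled():
                break
            yield compute_batch(k)   # grpc.aio awaits the send here (backpressure)
    finally:
        on_release()                 # always drop the stream's scenario reference

async def demo():
    ctx, released, out = StreamContext(), [], []
    async for b in stream_link_deltas(range(5), lambda k: k, ctx,
                                      lambda: released.append(True)):
        out.append(b)
        if b == 2:
            ctx.cancel()             # simulate client cancellation mid-stream
    return out, released

out, released = asyncio.run(demo())
assert out == [0, 1, 2] and released == [True]
```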

6) Internal engine architecture (recommended)

6.1 Modules and responsibilities

scenario_store.py

  • Holds ScenarioRuntime objects keyed by scenario_ref

  • Contains:

    • validated ScenarioSpec
    • pre-parsed skyfield satellite objects
    • node dictionaries and role sets
    • cached computed data (positions/visibility per tick if enabled)
    • RNG seeded from ScenarioSpec.seed

timebase.py

  • Converts timestamps to ticks and vice versa
  • Handles rounding rules (recommend: tick times exactly t_start + k*dt)

selectors.py

  • Applies LinkSelector + LinkPolicy to yield candidate link pairs

  • Must support:

    • explicit pairs (exact)
    • link types
    • src/dst id filters
    • tag filters
    • only_visible/min_elevation/max_range constraints

compute/ephemeris.py

  • Builds skyfield EarthSatellite objects from TLE
  • Provides get_sat_ecef(t) or get_sat_eci(t) depending on your implementation

compute/geometry.py

  • Computes:

    • range (m)
    • elevation (deg) from ground site to satellite (and vice versa if needed)
    • visibility boolean: elevation >= min_elev, range <= max_range
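A sketch of the geometry computation, assuming ECEF inputs and approximating local “up” by the geocentric site direction (the real module would use geodetic up via skyfield; function names are illustrative):

```python
import math

def range_and_elevation(site_ecef, sat_ecef):
    """Slant range (m) and elevation (deg) from a ground site to a satellite.

    Both inputs are ECEF (x, y, z) tuples in metres. Elevation is the angle
    between the line of sight and the local horizontal; 'up' is approximated
    by the geocentric site direction, which is adequate for a sketch.
    """
    los = [s - g for s, g in zip(sat_ecef, site_ecef)]
    rng = math.sqrt(sum(c * c for c in los))
    site_r = math.sqrt(sum(c * c for c in site_ecef))
    up = [c / site_r for c in site_ecef]                  # geocentric up unit vector
    sin_el = sum(l * u for l, u in zip(los, up)) / rng    # LOS projected onto up
    return rng, math.degrees(math.asin(sin_el))

def visible(rng, elev_deg, min_elev_deg, max_range_m):
    """Visibility boolean per the spec: elevation >= min_elev, range <= max_range."""
    return elev_deg >= min_elev_deg and rng <= max_range_m

# Satellite 500 km directly overhead: elevation is 90 degrees.
rng, elev = range_and_elevation((6371e3, 0.0, 0.0), (6871e3, 0.0, 0.0))
assert abs(rng - 500e3) < 1e-6 and abs(elev - 90.0) < 1e-9
```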

compute/link_budget.py

  • Computes:

    • FSPL from range + frequency
    • atmospheric attenuation (via ITU-R), optional
    • noise power from bandwidth + noise temp
    • received power, C/N0, SNR margin, etc.
  • Returns a PhySummary (internal dataclass)
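The core of the budget is standard; a minimal sketch (the real module's contract comes from the existing Python code, and the parameter names here are assumptions):

```python
import math

C = 299_792_458.0          # speed of light, m/s
BOLTZMANN_DBW = -228.6     # Boltzmann constant, dBW/K/Hz

def fspl_db(range_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB."""
    return 20 * math.log10(4 * math.pi * range_m * freq_hz / C)

def snr_db(eirp_dbw, rx_gain_db, range_m, freq_hz, bandwidth_hz,
           noise_temp_k, atmos_loss_db=0.0):
    """C/N in dB: received power minus kTB noise power, both in dBW."""
    rx_dbw = eirp_dbw + rx_gain_db - fspl_db(range_m, freq_hz) - atmos_loss_db
    noise_dbw = (BOLTZMANN_DBW + 10 * math.log10(noise_temp_k)
                 + 10 * math.log10(bandwidth_hz))
    return rx_dbw - noise_dbw

# 1000 km at 12 GHz is roughly 174 dB of free-space loss.
assert abs(fspl_db(1_000_000, 12e9) - 174.03) < 0.1
```

SNR margin would then be this C/N minus the required C/N of the modcod, and the result folded into the internal PhySummary dataclass.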

compute/adaptation.py

  • Maps PhySummary → NetworkLinkState:

    • capacity_bps and/or loss_rate

    • If v1, implement a deterministic piecewise mapping:

      • clamp snr_margin_db into [min,max]
      • map linearly to capacity between [0, terminal.bandwidth * eff_max] (or use fixed)
      • map snr_margin_db to loss via logistic or fixed thresholds

streaming/delta.py

  • Maintains per-stream “previous link table”
  • Computes updates and removals each tick

streaming/events.py

  • Detects link up/down transitions and yields EngineEvent

7) Scenario validation rules (must be enforced)

  • t0 < t1

  • default_dt > 0

  • Node IDs unique

  • Satellites must include valid TLE lines

  • Fixed sites must have valid lat/lon ranges

  • Terminal model must include:

    • frequency_hz > 0
    • bandwidth_hz > 0
    • rx_noise_temp_k > 0
  • LinkPolicy.allowed_types must be non-empty OR default to all valid types for provided roles

Return gRPC status INVALID_ARGUMENT with a descriptive error message if validation fails.


8) Performance requirements (practical targets)

These are engineering targets; adjust later.

  • Tick compute should scale with number of candidate links, not N² nodes.

  • Implement at least one of:

    • pre-filter by link type and role sets
    • max_range cutoff
    • max_degree pruning (keep best K neighbors by range or SNR)

Recommended optimizations (v1)

  • Cache satellite positions per tick if precompute_positions=true.
  • Cache ground station ECEF once.
  • Vectorize range computations where possible (NumPy arrays).

9) Determinism requirements

Determinism must include:

  • Ordering: Always sort link keys before emitting for stable output

    • sort by (src, dst, type)
  • RNG: Use numpy.random.Generator(PCG64(seed)) attached to scenario

  • Floating-point rounding: Do not over-round, but be consistent in computations (same order of operations)

Test: Run the same stream twice and ensure byte-equivalent serialized output (or field-wise equal within tolerance where appropriate).
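The ordering and determinism requirements can be expressed in a couple of lines (these helpers are illustrative):

```python
from collections import namedtuple

LinkKey = namedtuple("LinkKey", "src dst type")

def emission_order(link_keys):
    """Canonical stable output order: sort by (src, dst, type) before emitting."""
    return sorted(link_keys, key=lambda k: (k.src, k.dst, k.type))

def assert_deterministic(run_stream):
    """run_stream() -> list of serialized batches; two runs must be identical."""
    assert run_stream() == run_stream()

keys = {LinkKey("sat2", "gs1", 1), LinkKey("sat1", "gs1", 1)}
assert emission_order(keys)[0].src == "sat1"
assert_deterministic(lambda: [b"batch0", b"batch1"])
```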


10) Error handling (gRPC status codes)

Implement these consistent statuses:

  • NOT_FOUND: unknown scenario_ref
  • INVALID_ARGUMENT: bad time range, dt, selector, scenario validation failure
  • RESOURCE_EXHAUSTED: too many active streams for a scenario; or too many links per tick requested
  • FAILED_PRECONDITION: scenario closed
  • INTERNAL: unexpected exceptions (log stack trace server-side)

Add a stable error message prefix, e.g. GEOMRF_ERR:<CODE>:<details> for easier parsing.


11) Reference streaming algorithm (server-side)

Pseudocode for StreamLinkDeltas

  1. Resolve scenario and compute effective t_start/t_end/dt.

  2. Build selector state (resolved node sets, tag filters, link types).

  3. Initialize:

    • prev_links = {} (LinkKey → LinkUpdate-like internal struct)
    • active_keys = set()
  4. For tick k from 0..:

    • Compute t = t_start + k*dt; stop when t > t_end.

    • Determine candidate link pairs from selector+policy.

    • For each candidate link:

      • compute geometry (range/elev/visibility)
      • if not feasible and only_visible: skip (will cause removal if previously active)
      • compute PHY summary
      • compute NetworkLinkState (up/delay/capacity/loss)
      • assemble internal current map curr_links[key] = state
    • Compute removals = keys in prev_links but not in curr_links

    • Compute updates:

      • if first tick and emit_full_snapshot_first: all curr_links become updates
      • else: apply delta thresholds comparing curr vs prev
    • Emit a LinkDeltaBatch every tick, even when updates/removals are empty (optional, but recommended for consumer simplicity)

    • Update prev_links = curr_links
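The StreamLinkDeltas steps above as a compact Python sketch, where `candidates`, `compute_state`, and `should_emit` are assumed helpers (selector/policy resolution, geometry→PHY→link-state computation, and the delta-threshold predicate, respectively):

```python
def stream_link_deltas(t_start, t_end, dt, candidates, compute_state,
                       should_emit, emit_full_snapshot_first=True):
    """Yield (tick_index, t, updates, removals) per tick of the reference loop."""
    prev_links = {}
    k = 0
    while t_start + k * dt <= t_end:
        t = t_start + k * dt
        curr_links = {}
        for key in candidates(t):
            state = compute_state(key, t)   # geometry -> PHY -> NetworkLinkState
            if state is not None:           # None => infeasible under only_visible
                curr_links[key] = state
        removals = sorted(set(prev_links) - set(curr_links))
        if k == 0 and emit_full_snapshot_first:
            updates = dict(curr_links)      # tick 0: full snapshot
        else:
            updates = {key: s for key, s in curr_links.items()
                       if should_emit(prev_links.get(key), s)}
        yield k, t, updates, removals       # emit every tick, even if empty
        prev_links = curr_links
        k += 1

batches = list(stream_link_deltas(
    0.0, 2.0, 1.0,
    candidates=lambda t: ["gs1-sat1"] if t < 2 else [],
    compute_state=lambda key, t: t,
    should_emit=lambda prev, curr: prev is None or curr != prev))
assert len(batches) == 3
assert batches[2][3] == ["gs1-sat1"]   # link leaves the candidate set -> removal
```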

Pseudocode for StreamEvents

  • Either:

    • derive from StreamLinkDeltas logic (shared per-stream evaluator), OR
    • implement as separate evaluation loop that only checks transitions
  • Resolve and validate t_start/t_end/dt exactly as in delta streams.

  • Build selector from request and apply identical candidate filtering.

  • Emit event when (prev.up != curr.up) for any link in the selected set.

  • Populate both time and tick_index.


12) Client expectations (contract for consumers)

A correct consumer must:

  • Start with the first LinkDeltaBatch (full snapshot)

  • Maintain active_table[LinkKey] = LinkUpdate

  • Apply each tick:

    • delete removals
    • upsert updates
  • Use time and tick_index as authoritative time

  • Optionally also subscribe to events; events are primarily observability data, not control-plane truth
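The consumer contract reduces to one small apply function (dict-based batches stand in for the protobuf messages here):

```python
def apply_batch(active_table: dict, batch: dict) -> dict:
    """Apply one LinkDeltaBatch to the client-side active link table:
    deletions first, then upserts (order matters if a key appears in both)."""
    for key in batch["removals"]:
        active_table.pop(key, None)
    for key, state in batch["updates"].items():
        active_table[key] = state
    return active_table

table = {}
apply_batch(table, {"updates": {("sat1", "gs1"): {"up": True}}, "removals": []})
apply_batch(table, {"updates": {}, "removals": [("sat1", "gs1")]})
assert table == {}
```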


13) Example client (must be included)

Create examples/client_stream.py:

  • Connect to server

  • CreateScenario from an inline scenario object (or YAML file)

  • Start StreamLinkDeltas and print:

    • tick index, number of updates/removals, sample link
  • Optionally start StreamEvents concurrently

  • Close scenario at end

Checklist:

  • Implement examples/client_stream.py

  • Add README usage snippet:

    • start server
    • run client
    • expected output format

14) Minimal “default” adaptation mapping (v1, deterministic)

If your existing code already outputs a usable throughput and PER proxy, use it. If not, implement a deterministic fallback:

v1 fallback policy

  • up = visibility && snr_margin_db > 0 (or >= threshold)

  • delay = range_m / c (c = 299792458 m/s)

  • capacity_bps:

    • FIXED_RATE: fixed_capacity_bps when up else 0

    • SNR_TO_RATE:

      • normalize x = clamp((snr_margin_db - min)/(max-min), 0..1)
      • capacity = x * capacity_max, where capacity_max = bandwidth_hz * eff_max
      • choose eff_max constant (e.g., 4 bits/s/Hz) in v1; document it
  • loss_rate:

    • FIXED: fixed_loss_rate when up else 1

    • SNR_TO_LOSS:

      • logistic: loss = 1 / (1 + exp(a*(snr_margin_db - b))) with fixed a,b
      • clamp to [0,1]
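The fallback policy is small enough to sketch end-to-end; the constants below (eff_max, logistic a/b, SNR normalization bounds) are placeholders for the values the checklist asks you to decide and document:

```python
import math

EFF_MAX_BPS_PER_HZ = 4.0            # v1 constant; document in adaptation.py
LOGISTIC_A, LOGISTIC_B = 1.0, 3.0   # illustrative logistic loss parameters

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def snr_to_rate(snr_margin_db, bandwidth_hz, snr_min_db=0.0, snr_max_db=20.0):
    """Normalize margin into [0,1], then map linearly to capacity_max."""
    x = clamp((snr_margin_db - snr_min_db) / (snr_max_db - snr_min_db), 0.0, 1.0)
    return x * bandwidth_hz * EFF_MAX_BPS_PER_HZ

def snr_to_loss(snr_margin_db):
    """Logistic loss mapping, clamped to [0,1]."""
    loss = 1.0 / (1.0 + math.exp(LOGISTIC_A * (snr_margin_db - LOGISTIC_B)))
    return clamp(loss, 0.0, 1.0)

def delay_s(range_m):
    """One-way propagation delay at the speed of light."""
    return range_m / 299_792_458.0

assert snr_to_rate(10.0, 1e6) == 2_000_000.0   # halfway up the ramp
assert abs(snr_to_loss(LOGISTIC_B) - 0.5) < 1e-9
```

Because every branch is a pure function of the inputs and fixed constants, the mapping is deterministic by construction, which is what the determinism tests rely on.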

Checklist:

  • Decide v1 constants (eff_max, logistic params, up-threshold)
  • Put them in adaptation.py and record them in logs/version

15) Observability (recommended even in v1)

  • gRPC access logs including:

    • scenario_ref, stream type, time range, dt, selector summary
  • Prometheus counters (optional but easy):

    • streams active, ticks computed, links computed, mean compute time
  • TickStats in stream payload (already specified)

Checklist:

  • Add TickStats computation
  • Add server-side metrics (optional)
  • Add structured logging with correlation IDs

16) Security and robustness (v1 minimum)

  • Bind address configurable (0.0.0.0:50051 default)

  • Optional TLS later; v1 can be plaintext for local lab use

  • Enforce limits:

    • max nodes
    • max links per tick
    • max active streams per scenario

Checklist:

  • Enforce link/node limits with RESOURCE_EXHAUSTED
  • Enforce max concurrent streams per scenario

17) Acceptance criteria (definition of done)

Functional

  • Server starts and responds to GetVersion and GetCapabilities

  • CreateScenario returns a scenario_ref and validates inputs

  • StreamLinkDeltas emits:

    • a full snapshot first (when enabled)
    • then sparse deltas/removals per tick
  • StreamEvents emits link up/down events consistent with deltas

  • CloseScenario frees scenario resources and blocks further streams

Correctness

  • Determinism test passes (same inputs → same outputs)
  • Selector tests pass (only requested links emitted)
  • Delta threshold tests pass (small changes suppressed)

Usability

  • Example client runs end-to-end against server and prints reasonable output
  • README explains how to run locally and how to pass a scenario YAML

18) Optional but high-value extension hooks (safe to leave stubbed)

These can exist as placeholders in code (no API changes needed later):

  • “Debug fields” population (include_debug_fields=true)
  • Additional events (handover start/complete)
  • More orbit formats (OEM/SP3) behind SatelliteOrbit oneof later
  • Better antenna pattern modeling behind TerminalModel

19) v1.1 orchestrator-alignment tasks (new)

  • Bump schema to geomrf.v1.1 (or equivalent versioning plan) for event-alignment fields.
  • Update StreamEventsRequest implementation to honor request dt and selector.
  • Populate EngineEvent.tick_index from the same tick loop semantics as deltas.
  • Add tests:
    • events and deltas requested with same window/selectors produce aligned tick grids
    • invalid event dt returns INVALID_ARGUMENT
    • event selector filtering mirrors delta selector behavior
  • Keep backward compatibility plan explicit (version gate or dual-field behavior) for existing v1 clients.
TASKS_IMNET.md

TASKS_IMNET.md — SatSim IMNET Lane (OMNeT++/INET) Implementation Plan

This is a task-driven implementation plan for the IMNET lane (OMNeT++/INET). It covers:

  • Building and running an OMNeT++/INET simulation project under opp_env
  • Integrating IMNET into the existing uv-managed SatSim workspace (without mixing concerns)
  • Consuming orchestrator-produced LinkState traces (v1: trace-first) to apply dynamic link changes
  • Producing artifacts compatible with SatSim run directories / manifests

Policy reminder (already in repo): Python workflows are uv-managed; OMNeT++/INET toolchain is opp_env-managed.


-1) Locked v1 decisions for this plan update

  • Execution model is post-stream replay:
    • Orchestrator records the OMNeT trace during streaming, then runs OMNeT after stream completion.
  • Topology strategy is Strategy 3:
    • stable NED template + orchestrator-generated node_map.json and link_map.json.
  • Orchestrator ↔ OMNeT runtime interface is typed and orchestrator-owned:
    • no implicit "just pass arbitrary run_args" contract for required parameters.
  • Canonical trace key fields use src, dst, link_type (not mixed type naming).
  • Repo split is intentional:
    • OMNeT assets under lanes/omnet/, Python lane adapter/runner under satsim_orch/lanes/omnet_lane/.
  • opp_env default path uses Nix, but --nixless-workspace is a supported fallback mode.

0) Deliverables (what “done” means)

  • lanes/omnet/ exists and contains:
    • a reproducible opp_env workspace definition (pinned OMNeT++ + INET versions)
    • an OMNeT++ project (“satsim-imnet”) that compiles and runs headless
    • a LinkState trace ingestion + applier module that updates delay/rate/(optional loss) per tick
    • a minimal demo scenario (2–10 nodes) that:
      • runs via orchestrator in --mode omnet
      • reads the trace written by orchestrator
      • produces artifacts under the run directory (logs + .vec/.sca; optional pcap)
  • One-command dev flow:
    • uv run satsim run <scenario.yaml> --mode omnet ... executes IMNET via opp_env and stores artifacts in the standard run folder.
  • Existing OMNeT lane blockers in current orchestrator code are closed before IMNET C++ work:
    • trace writer removal serialization does not use __dict__ on slots dataclasses
    • trace line includes is_full_snapshot
    • OMNeT launch is not silently skipped when --mode omnet is selected

1) Repo layout for IMNET lane

  • Create directory structure:
    • lanes/omnet/
      • lanes/omnet/WORKSPACE.md (how to install + run with opp_env; pinned versions)
      • lanes/omnet/opp_env/ (workspace init and pinned selection)
      • lanes/omnet/satsim-imnet/ (the OMNeT++ project)
        • src/ (C++ modules)
        • ned/ (NED definitions)
        • omnetpp.ini (baseline config; orchestrator may override with -f/-c)
        • Makefile (generated by opp_makemake; committed only if desired, otherwise generated)
        • README.md (how to build/run inside opp_env)
      • lanes/omnet/scripts/ (helper wrappers used by orchestrator)
        • install.py (optional convenience; calls opp_env install ...)
        • build.py (build IMNET project inside opp_env)
        • run.py (run IMNET headless inside opp_env, accepts args from orchestrator)
  • Keep orchestrator Python integration in package code:
    • satsim_orch/lanes/omnet_lane/adapter.py remains the lane entrypoint
    • satsim_orch/lanes/omnet_lane/runner.py owns command construction and process launch
    • lanes/omnet/WORKSPACE.md documents how these package paths map to lanes/omnet/ assets

2) Version pinning and opp_env workspace (reproducible toolchain)

2.1 Pick and pin OMNeT++ + INET versions

  • Pin versions (v1 recommendation; can be adjusted later):
    • OMNeT++: omnetpp-6.3.0
    • INET: inet-4.5.4
  • Document the pin in:
    • lanes/omnet/WORKSPACE.md
    • orchestrator run manifest fields (already exists; ensure it records these exact strings)

2.2 Initialize an opp_env workspace outside git working trees

Goal: keep installs reproducible while avoiding committing huge toolchains.

  • Decide where the opp_env workspace lives:
    • default: ~/.cache/satsim/opp_env/workspace (outside git tree)
    • store only small config/metadata in git, not the compiled artifacts
  • Add .gitignore entries:
    • ignore optional repo-local workspace path lanes/omnet/opp_env/workspace/ if used for local experimentation
    • ignore lanes/omnet/**/out/ (OMNeT outputs)
    • ignore lanes/omnet/**/results/ (if used)

2.3 Provide canonical commands (must work from uv venv)

  • Ensure opp_env is invoked via uv:
    • uv run opp_env --version
    • uv run opp_env list
  • Implement lanes/omnet/scripts/install.py that performs:
    • workspace init (idempotent)
    • install pinned INET (which pulls matching OMNeT++)
    • verify the installed packages exist
  • Add workspace mode guidance:
    • default mode: Nix-backed opp_env workspace
    • fallback mode: opp_env --nixless-workspace (document prerequisites and reproducibility caveats)
    • orchestrator preflight must emit a clear error only when the selected workspace mode requirements are unmet

3) uv ↔ opp_env integration (clean boundary)

3.1 Keep responsibility boundaries strict

  • Confirm and document:
    • uv manages Python deps + orchestrator execution
    • opp_env manages OMNeT++/INET toolchain and the shell/run environment
    • Orchestrator calls opp_env run ... (or opp_env shell -c ...) rather than assuming OMNeT binaries are on PATH

3.2 Add orchestrator-side “IMNET preflight”

(Only if not already present; keep it minimal.)

  • In orchestrator’s OMNeT lane runner:
    • verify opp_env is available (uv run opp_env --version)
    • verify required runtime mode dependencies:
      • Nix-backed mode: verify nix is available
      • nixless mode: verify required toolchain binaries are present
    • verify the IMNET scripts exist (install/build/run)
    • if missing dependencies:
      • fail with a single actionable message (no partial runs)
  • Close current OMNeT-lane correctness blockers first:
    • fix trace writer removal serialization for LinkKey (slots dataclass)
    • include is_full_snapshot in the OMNeT trace JSONL tick payload
    • remove ini_path-gated silent skip; in omnet mode runner must launch or raise an explicit error

3.3 Standardize how orchestrator launches IMNET

  • Replace ad-hoc argument passing with a typed runner contract (orchestrator-owned):
    • required fields:
      • workspace_path
      • inet_version (pinned string)
      • project_path
      • ini_path
      • trace_path
      • dt_seconds
      • outdir
      • seed
    • optional fields:
      • config_name
      • sim_time_limit_s
      • extra_args (non-critical escape hatch only)
  • Define one canonical opp_env invocation generated by runner code:
    • opp_env run inet-<PINNED> --init -w <WORKSPACE> --chdir -c "<COMMAND>"
  • runner.py converts typed fields into OMNeT CLI args; callers do not handcraft command strings
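A sketch of the typed contract and the canonical invocation (dataclass and function names are illustrative; the authoritative version lives in satsim_orch/lanes/omnet_lane/runner.py):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple

@dataclass(frozen=True)
class OmnetRunSpec:
    """Orchestrator-owned runner contract; no ad-hoc run_args for required fields."""
    workspace_path: Path
    inet_version: str               # pinned string, e.g. "inet-4.5.4"
    project_path: Path
    ini_path: Path
    trace_path: Path
    dt_seconds: float
    outdir: Path
    seed: int
    config_name: Optional[str] = None
    sim_time_limit_s: Optional[float] = None
    extra_args: Tuple[str, ...] = ()   # non-critical escape hatch only

def opp_env_command(spec: OmnetRunSpec, inner_cmd: str) -> list:
    """The one canonical opp_env invocation generated by runner code."""
    return ["opp_env", "run", spec.inet_version, "--init",
            "-w", str(spec.workspace_path), "--chdir", "-c", inner_cmd]

spec = OmnetRunSpec(Path("/ws"), "inet-4.5.4", Path("/proj"),
                    Path("/proj/omnetpp.ini"), Path("/run/trace.jsonl"),
                    1.0, Path("/run/omnet"), 42)
cmd = opp_env_command(spec, "./build-and-run.sh")
assert cmd[:3] == ["opp_env", "run", "inet-4.5.4"]
```

A frozen dataclass makes missing required fields a construction-time error, which is exactly the "hard error, never a silent no-op" behavior the plan requires.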

4) IMNET contract with the other SatSim systems (as they exist)

4.1 Relationship to Geo/RF engine

  • IMNET does not talk to Geo/RF directly in v1
  • IMNET consumes orchestrator-produced trace derived from:
    • Geo/RF StreamLinkDeltas batches (tick_index + time + updates/removals)

4.2 Relationship to Orchestrator (v1 trace-first)

  • Orchestrator responsibilities (assumed existing):
    • create scenario in Geo/RF engine
    • consume LinkDeltaBatch stream
    • write a deterministic LinkState trace file for OMNeT lane
    • launch OMNeT lane runner with correct typed parameters after stream completion (post-stream replay)
  • IMNET responsibilities:
    • parse the trace deterministically
    • schedule link updates in simulation time aligned to tick_index
    • apply updates to the simulated links (delay/rate/(optional loss), and “up/down” semantics)

4.3 Trace file format (IMNET must support this)

Pick one and lock it. If orchestrator already emits a format, IMNET must match it.

  • Decide and document the v1 trace format (JSONL, locked)
    • One JSON object per tick line
    • Fields required:
      • tick_index (uint64)
      • time (ISO8601 or unix seconds; used for logging only; simtime comes from tick_index * dt)
      • is_full_snapshot (bool)
      • updates (list)
      • removals (list)
    • Each update contains:
      • src (string)
      • dst (string)
      • link_type (string enum name)
      • up (bool)
      • one_way_delay_s (float)
      • capacity_bps (float)
      • loss_rate (float; optional if IMNET v1 ignores loss)
    • Each removal contains:
      • src, dst, link_type (same key fields)
    • Determinism requirements:
      • stable ordering of updates and removals within each tick
      • no mixed key names (type vs link_type) in v1 output
  • Define dt semantics for IMNET:
    • Lock v1 approach: IMNET gets dt via typed runner argument (dt_seconds)
    • trace time is informational/logging only; tick scheduling uses tick_index * dt_seconds
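One tick of the locked JSONL format, plus a reader sketch that enforces the monotonicity and canonical-key rules (the field values are made up; the field names follow the spec above):

```python
import json

TICK_LINE = json.dumps({
    "tick_index": 0, "time": "2026-01-01T00:00:00Z", "is_full_snapshot": True,
    "updates": [{"src": "sat1", "dst": "gs1", "link_type": "SAT_TO_GROUND",
                 "up": True, "one_way_delay_s": 0.00234,
                 "capacity_bps": 5.0e7, "loss_rate": 0.0}],
    "removals": [],
})

def read_trace(lines):
    """Parse JSONL tick lines; fail fast on non-monotonic ticks or bad keys."""
    last_tick = -1
    for line in lines:
        tick = json.loads(line)
        if tick["tick_index"] <= last_tick:
            raise ValueError(f"non-monotonic tick_index {tick['tick_index']}")
        last_tick = tick["tick_index"]
        for entry in tick["updates"] + tick["removals"]:
            if not {"src", "dst", "link_type"} <= entry.keys():
                raise ValueError("entry missing canonical key fields")
        yield tick

ticks = list(read_trace([TICK_LINE]))
assert ticks[0]["is_full_snapshot"] is True
```

The C++ LinkTraceReader must apply the same validation; keeping a Python twin like this in orchestrator tests makes round-trip checks cheap.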

5) OMNeT++/INET project scaffolding (“satsim-imnet”)

5.1 Create a minimal INET-based network model

Goal: minimal, not fancy, but real packets flow and link parameters can change.

  • Create lanes/omnet/satsim-imnet/ned/SatSimNetwork.ned
    • Use INET StandardHost (or Router) modules for nodes
    • Include:
      • one traffic source app and one sink app (UDP is fine for v1)
      • optional intermediate router (for a 2-hop demo)
    • Strategy 3 mapping is required in v1:
      • orchestrator generates node_map.json (node_id ↔ module path)
      • orchestrator generates link_map.json (LinkKey ↔ mutable channel/shim path)
      • LinkStateApplier consumes map files in strict mode and fails on unknown keys
  • Create lanes/omnet/satsim-imnet/omnetpp.ini
    • include INET paths and defaults
    • configure:
      • IP address assignment (INET configurator)
      • app endpoints and start times
      • disable GUI by default (Cmdenv) for orchestrator runs

5.2 Add a LinkState ingestion + application component

  • Add src/LinkTraceReader.{h,cc}
    • reads the trace file
    • validates ordering (tick_index monotonic)
    • provides an in-memory list of per-tick updates (or streaming reader)
  • Add src/LinkStateApplier.{h,cc} as a cSimpleModule
    • parameters:
      • string tracePath
      • double dtSeconds
      • bool strict (fail-fast on unknown link keys)
      • bool applyLoss (optional)
    • behavior:
      • on init:
        • load/validate trace header/dt
        • build a mapping from LinkKey → simulation link object(s)
        • schedule first self-message at tick 0
      • on each tick:
        • apply updates/removals for that tick
        • schedule next tick if present
      • on finish:
        • write summary scalars (num updates applied, unknown keys, etc.)

6) How IMNET represents links (v1 minimal approach)

You need a concrete, implementable mapping from LinkKey → something mutable in OMNeT/INET.

6.1 Choose v1 link representation

Pick one approach and implement it end-to-end:

Option A (preferred v1): mutate OMNeT channels

  • Use a channel type that supports:
    • delay updates (delay)
    • datarate updates (datarate)
    • “up/down” via disabling or forcing drop
  • Build a stable topology at startup containing all candidate links
  • Map each LinkKey to a channel pointer
  • On update:
    • set channel delay
    • set channel datarate
    • if up=false or “removal”: disable channel / set drop mode

Option B: insert a small “LinkShim” module per edge

  • Create a LinkShim cSimpleModule that:
    • applies propagation delay (schedule)
    • applies serialization delay from capacity (packet_bits / capacity_bps)
    • drops packets by loss_rate
    • drops everything when up=false
  • Connect hosts via LinkShim modules instead of relying on channel mutability
  • Note: Option B is not selected for the v1 path; it is the documented fallback if channel mutability proves inadequate

v1 recommendation: start with Option A if channel mutability is adequate; fall back to Option B if not.

6.2 Define “up/down/removal” semantics for v1

  • Lock v1 semantics (must match orchestrator trace writer):
    • up=false means the link exists but currently unavailable (drop all / disable)
    • removal means the link is not in the current active set
      • v1 handling recommendation: treat removal as up=false (do not delete topology)
      • allow a later update to re-enable it

6.3 Unit conversion rules (lock them)

  • one_way_delay_s → OMNeT simtime_t delay
  • capacity_bps → channel datarate (bps)
  • loss_rate:
    • if supported natively by chosen link representation, apply directly
    • otherwise implement drop probability in LinkShim or a per-interface dropper

7) Topology generation strategy (v1: small but consistent)

7.1 Keep topology simple in v1

  • v1 target: 2–10 nodes, with:
    • at least one satellite-like router node
    • at least one ground station-like host node
    • at least one dynamic link changing delay/rate/up/down over time

7.2 How the topology is created

Use Strategy 3 for v1:

  • Commit a stable NED topology template under lanes/omnet/satsim-imnet/ned/
  • Orchestrator writes node_map.json and link_map.json per run into the run artifacts
  • IMNET loads maps at startup and applies all updates by LinkKey lookup through the map
  • In strict mode, any unmapped LinkKey is a hard failure
  • Generated NED per run (Strategy 2) is explicitly deferred beyond v1

8) Orchestrator ↔ IMNET runtime interface (exact parameters)

8.1 Standardize the runner arguments

  • Define a single typed runner entrypoint (called by adapter/runner code) that accepts:
    • --trace <path>
    • --dt <seconds>
    • --outdir <path>
    • --seed <int>
    • --tend <seconds> (optional; can be inferred from last tick)
    • --config <omnet_config_name> (optional)
  • Runner converts these to OMNeT args:
    • -u Cmdenv
    • -n <NEDPATHS including INET and satsim-imnet/ned>
    • -l (load required libraries if needed)
    • --output-dir=<outdir>
    • --seed-set=<seed> (or equivalent OMNeT seed setting)
    • --sim-time-limit=<tend>s (if used)
    • pass tracePath and dtSeconds as module parameters
  • Runner invocation semantics:
    • if lane mode is omnet or parallel, omnet runner invocation is mandatory
    • missing required runner inputs is a hard error, never a silent no-op
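A sketch of the typed-fields-to-argv mapping (the flag spellings follow the list above; the `*.applier.*` module-parameter paths are illustrative and depend on the NED template):

```python
def omnet_argv(spec: dict) -> list:
    """Convert typed runner fields into OMNeT CLI args; runner.py owns this."""
    argv = ["-u", "Cmdenv",
            "-n", ":".join(spec["ned_paths"]),          # INET + satsim-imnet/ned
            f"--output-dir={spec['outdir']}",
            f"--seed-set={spec['seed']}"]
    if spec.get("config"):
        argv += ["-c", spec["config"]]
    if spec.get("tend_s") is not None:
        argv.append(f"--sim-time-limit={spec['tend_s']}s")
    # trace path and dt are passed as module parameters, never handcrafted by callers
    argv += [f"--*.applier.tracePath={spec['trace_path']}",
             f"--*.applier.dtSeconds={spec['dt_seconds']}"]
    return argv

argv = omnet_argv({"ned_paths": ["inet/src", "satsim-imnet/ned"],
                   "outdir": "artifacts/r1/omnet", "seed": 7,
                   "trace_path": "trace.jsonl", "dt_seconds": 1.0})
assert "--seed-set=7" in argv
```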

8.2 Ensure artifacts land in the SatSim run directory

  • Orchestrator passes outdir = artifacts/<run_id>/omnet/
  • IMNET writes:
    • OMNeT results (.vec/.sca) to outdir/results/ (or directly under outdir)
    • stdout/stderr to outdir/logs/omnet.log (or orchestrator captures it)
    • a small imnet_runinfo.json containing:
      • versions (inet, omnetpp)
      • trace hash
      • config name
      • seed
      • start/end ticks processed

9) Observability for IMNET (minimal but useful)

  • Configure OMNeT to export:
    • scalar summary stats (packets sent/received, drops)
    • vector time series for throughput/delay (where feasible)
  • Add LinkStateApplier scalars:
    • ticksProcessed
    • updatesApplied
    • removalsApplied
    • unknownLinkKeys (must be 0 in strict mode)
  • Optional (v1.1): PCAP output
    • If using INET features that can emit pcap:
      • enable and write into outdir/pcap/ (deferred; not enabled in v1)
    • Otherwise: skip; rely on Mininet lane for PCAPs

10) Testing and validation (must exist for v1)

10.1 Offline smoke test (no orchestrator)

  • Add lanes/omnet/satsim-imnet/tests/ (or scripts) that:
    • runs a short simulation with a tiny trace
    • verifies results files created
    • verifies LinkStateApplier processed N ticks
  • Provide lanes/omnet/scripts/smoke.py:
    • installs toolchain (if needed)
    • builds project
    • runs headless sim for ~5–20 seconds simtime

10.2 End-to-end smoke test (with orchestrator)

  • Add a minimal scenarios/omnet_smoke.yaml:
    • small node set
    • dt ~ 1s
    • short window (e.g., 60s)
    • explicit link selector to keep link key set stable
  • Add a single command documented in root README or WORKSPACE.md:
    • uv run satsim run scenarios/omnet_smoke.yaml --mode omnet
  • Verify artifacts:
    • orchestrator run folder exists
    • omnet subfolder contains .vec/.sca
    • logs show LinkStateApplier applying ticks in order

11) Known v1 constraints (explicitly accepted)

  • v1 uses trace-first ingestion (no live gRPC inside OMNeT)
  • v1 may ignore directional asymmetry:
    • if LinkKey is directional but the chosen link model is bidirectional, document how direction is collapsed (or restrict scenarios accordingly)
  • v1 focuses on:
    • correctness of tick alignment
    • correctness of delay/rate updates
    • reproducible, scripted build/run under opp_env

12) v1.1+ hooks (implemented as opt-in features)

  • Live streaming adapter hook inside OMNeT++ (ingest_mode=live_stream, runtime trace refresh + grpcTarget hook parameter)
  • Dynamic topology construction from ScenarioSpec maps (generated NED at run-time in run.py)
  • Proper per-direction link modeling hook (directional_links=true creates separate directional channels)
  • Integration hook for SDN decision traces (sdn_trace_path + per-tick channel enable/disable decisions in LinkStateApplier)
TASKS_ORCHESTRATOR.md

ORCHESTRATOR_IMPLEMENTATION.md — SatSim Orchestrator (Python) Detailed Plan

This document is a task-driven implementation plan for the SatSim Orchestrator. It is intended to be handed to an LLM to implement the orchestrator in Python. It focuses on architecture, module boundaries, data flow, and concrete tasks (not deep RPC wire details).

The orchestrator is responsible for:

  • Loading a scenario
  • Starting and coordinating subcomponents (Geo/RF engine, OMNeT++ lane, Mininet lane)
  • Driving a unified timebase
  • Fanning out LinkState/Event updates to lane adapters
  • Recording reproducible artifacts (manifest, logs, optional traces, metrics)

Decision lock (2026-02-18)

These ambiguities are now resolved and should be treated as fixed v1 design:

  • Authoritative tick source: StreamLinkDeltas is the only control-plane tick source. Lanes apply link state from deltas, not from events.
  • Event stream alignment (Option B accepted): evolve Geo/RF event API so StreamEventsRequest carries dt + selector, and EngineEvent carries tick_index, aligned to the same tick grid as deltas.
  • Error contract handling: orchestrator must handle NOT_FOUND, INVALID_ARGUMENT, FAILED_PRECONDITION, and RESOURCE_EXHAUSTED as first-class engine responses.
  • Scenario translation strictness: translation to Geo/RF ScenarioSpec is fail-fast and must satisfy all required engine fields/constraints.
  • Tooling policy: Python workflows use uv (uv run, uv add); do not rely on pip.

1) Repository layout (recommended)


```
satsim/
  orchestrator/
    pyproject.toml
    README.md
    ORCHESTRATOR_IMPLEMENTATION.md
    satsim_orch/
      __init__.py  cli.py  main.py
      config/
        __init__.py  schema.py  loader.py  defaults.py  normalize.py
      runtime/
        __init__.py  run_manager.py  manifest.py  artifact_store.py
        logging.py  versioning.py  process.py
      timebase/
        __init__.py  clock.py  modes.py  scheduler.py
      bus/
        __init__.py  messages.py  queues.py  fanout.py
      geomrf/
        __init__.py  client.py  translate.py  health.py
      lanes/
        __init__.py  base.py  registry.py
        mininet_lane/
          __init__.py  adapter.py  topo.py  shaping.py  controller.py  capture.py
        omnet_lane/
          __init__.py  adapter.py  trace_ingest.py  runner.py
      metrics/
        __init__.py  prom.py  records.py  exporters.py
      util/
        __init__.py  ids.py  units.py  asyncx.py  errors.py
    tests/
      test_config_validation.py  test_timebase_scheduler.py  test_bus_fanout.py
      test_run_manifest.py  test_lane_adapter_contract.py  test_geomrf_client_smoke.py
  subprojects/
    geomrf-engine/   # separate project; orchestrator consumes it via gRPC
  lanes/
    omnet/
    mininet/
  observability/
  artifacts/
```


2) Orchestrator design summary (targets)

2.1 Orchestrator responsibilities

  • Scenario loading & validation
  • Run directory + manifest creation
  • Geo/RF engine lifecycle (create scenario; start streams; close)
  • Lane lifecycle:
    • prepare() (build topology / start processes)
    • apply_tick() (apply link deltas / events)
    • finalize() (stop processes, collect outputs)
  • Unified runtime pacing:
    • offline apply-fast
    • real-time apply-paced (wall-clock aligned)
    • parallel lane fanout (same incoming ticks feed multiple lanes)
  • Artifact collection:
    • config snapshot, manifest
    • optional LinkState trace logging
    • metrics export
    • PCAP capture (Mininet lane)

2.2 Key architectural choices

  • Python 3.11+ with asyncio
  • gRPC async client (grpc.aio) for Geo/RF streaming
  • In-process async fanout bus using bounded queues (v1)
  • Pluggable lane adapters via a registry
  • Everything stamped with versions/seeds for reproducibility

3) Implementation checklist (extremely detailed)

3.1 Project bootstrap and build

  • Create orchestrator/pyproject.toml
    • Define package name (e.g., satsim-orchestrator)
    • Set Python version (>=3.11)
    • Add dependencies:
      • pydantic
      • pyyaml
      • grpcio, grpcio-tools, protobuf
      • rich (optional, for CLI UX)
      • prometheus-client (optional)
      • aiofiles (optional, async file writes)
  • Add dev dependencies:
    • pytest, pytest-asyncio
    • ruff / black
    • mypy (optional)
  • Add task runner and uv commands:
    • uv run pytest
    • uv run ruff check .
    • uv run python -m satsim_orch.cli run <scenario.yaml> ...

3.2 CLI and entrypoints

  • Implement satsim_orch/cli.py with commands:
    • run <scenario.yaml> --mode {omnet|mininet|parallel} --dt 1s --t0 ... --t1 ...
    • validate <scenario.yaml>
    • list-runs
    • show-run <run_id>
  • Implement satsim_orch/main.py
    • Parse CLI args
    • Load scenario
    • Create RunContext
    • Run orchestrator loop
  • Define exit codes and error messages:
    • Invalid config → exit 2
    • Missing dependency/lane binary → exit 3
    • Runtime error → exit 1
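As a sketch, the subcommands and exit-code contract above could be wired with argparse; everything beyond the commands and codes listed in the checklist (flag defaults, constant names) is an assumption, not the final CLI:

```python
# Sketch of satsim_orch/cli.py: subcommand surface plus the exit-code
# contract (2 = invalid config, 3 = missing dependency, 1 = runtime error).
import argparse

EXIT_RUNTIME, EXIT_BAD_CONFIG, EXIT_MISSING_DEP = 1, 2, 3

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="satsim")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run a scenario")
    run.add_argument("scenario")
    run.add_argument("--mode", choices=["omnet", "mininet", "parallel"], default="mininet")
    run.add_argument("--dt", default="1s")

    validate = sub.add_parser("validate", help="validate a scenario file")
    validate.add_argument("scenario")

    sub.add_parser("list-runs", help="list recorded runs")
    show = sub.add_parser("show-run", help="show one run")
    show.add_argument("run_id")
    return parser
```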

4) Configuration system

4.1 Scenario schema (Pydantic)

  • Implement config/schema.py with a canonical ScenarioConfig
    • Global:
      • name
      • seed
      • time: {t0, t1, dt, mode}
      • execution: {lane_mode, strict_reproducible, record_trace, record_pcap}
      • paths: {artifacts_root}
    • Geo/RF engine connection:
      • geomrf: {grpc_target, request_dt, selector_defaults, thresholds_defaults}
    • Geo/RF scenario payload:
      • geomrf.scenario_spec maps 1:1 to Geo/RF ScenarioSpec required fields (nodes, terminal, orbit/site, link/adaptation policy)
      • optional high-level shorthand may exist, but must compile deterministically to valid ScenarioSpec
    • Lane configs:
      • mininet: {controller: {type, addr}, topo: {...}, shaping: {...}}
      • omnet: {project_path, ini_path, run_args, trace_mode}
  • Add validators:
    • t0 < t1
    • dt > 0
    • seed >= 0
    • lane configs exist for chosen mode
    • fail-fast if engine-required scenario fields are missing/invalid
    • fail-fast if request_dt is outside engine capabilities (min_dt, max_dt)
    • if mode=mininet require Linux + OVS checks (soft validate with warnings)
  • Add defaulting rules in config/defaults.py
    • dt default (e.g., 1s)
    • thresholds default (delay/capacity/loss)
    • artifacts root default ./artifacts/runs
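A dependency-free sketch of these validation rules follows; the real config/schema.py would express them as Pydantic models, and the field names here only mirror the checklist:

```python
# Sketch of the 4.1 validators (t0 < t1, dt > 0, seed >= 0, lane mode known)
# as plain dataclasses; the production schema would be Pydantic.
from dataclasses import dataclass

@dataclass
class TimeConfig:
    t0: float   # scenario start, seconds
    t1: float   # scenario end, seconds
    dt: float   # tick size, seconds
    mode: str   # "offline" | "realtime"

@dataclass
class ScenarioConfig:
    name: str
    seed: int
    time: TimeConfig
    lane_mode: str  # "omnet" | "mininet" | "parallel"

    def validate(self) -> None:
        # fail-fast, mirroring the checklist rules above
        if not self.time.t0 < self.time.t1:
            raise ValueError("t0 must be < t1")
        if self.time.dt <= 0:
            raise ValueError("dt must be > 0")
        if self.seed < 0:
            raise ValueError("seed must be >= 0")
        if self.lane_mode not in {"omnet", "mininet", "parallel"}:
            raise ValueError(f"unknown lane_mode {self.lane_mode!r}")

    @property
    def tick_count(self) -> int:
        # derived field computed during normalization (section 4.2)
        return int((self.time.t1 - self.time.t0) / self.time.dt)
```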

4.2 Loader and normalization

  • Implement config/loader.py
    • load YAML/JSON
    • environment variable expansion (optional)
    • include/merge support (optional)
  • Implement config/normalize.py
    • produce a normalized config (canonical types, timezone normalization)
    • compute derived fields (run duration, tick count)
  • Implement config/normalize.py to build:
    • GeomrfScenarioSpec (engine-facing) from ScenarioConfig
    • LaneScenarioSpec (lane-facing) from ScenarioConfig

5) Run manager and artifacts

5.1 Run context and directory structure

  • Implement runtime/run_manager.py
    • Generate run_id (timestamp + short random, or UUID)
    • Create run directory:
      • artifacts/runs/<run_id>/
      • logs/, metrics/, pcaps/, traces/, manifests/
    • Save copies of:
      • raw scenario file
      • normalized scenario JSON
  • Implement runtime/manifest.py
    • manifest fields:
      • run_id, scenario name, timestamps
      • seeds
      • component versions (orchestrator, geomrf engine, lanes)
      • execution mode, dt, tick count
      • git SHAs if available
      • host info (OS, python version) (optional)
  • Implement runtime/versioning.py
    • orchestrator version string
    • best-effort git SHA discovery

5.2 Logging

  • Implement runtime/logging.py
    • structured JSON logs to file
    • human-readable console logs
    • include run_id and correlation IDs
  • Implement log rotation policy (optional)
  • Implement util/errors.py with typed exceptions:
    • ScenarioError, GeomrfError, LaneError, TimebaseError

5.3 Artifact store helpers

  • Implement runtime/artifact_store.py
    • write_text(path, text)
    • write_json(path, obj)
    • append_jsonl(path, obj)
    • atomic writes (write temp then rename)
  • Implement trace recording option:
    • If record_trace=true, append received LinkDeltaBatch to JSONL/Parquet later
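The atomic-write pattern can be sketched as follows: write to a temp file in the target directory, then `os.replace`, which is atomic on POSIX within one filesystem. Function names match the checklist; everything else is illustrative.

```python
# Sketch of runtime/artifact_store.py helpers: atomic JSON writes plus
# JSONL appends for trace records such as received LinkDeltaBatch objects.
import json
import os
import tempfile
from pathlib import Path

def write_json(path: Path, obj) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f, indent=2, sort_keys=True)
        os.replace(tmp, path)  # readers never observe a partial file
    except BaseException:
        os.unlink(tmp)
        raise

def append_jsonl(path: Path, obj) -> None:
    # one JSON object per line; sort_keys keeps output deterministic
    with path.open("a") as f:
        f.write(json.dumps(obj, sort_keys=True) + "\n")
```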

6) Timebase and pacing

6.1 Time modes

  • Implement timebase/modes.py enum:
    • OFFLINE (apply incoming ticks as fast as possible; no sleeping)
    • REALTIME (apply incoming ticks at wall-clock pace)
    • PARALLEL (lane selection mode; both lanes consume the same incoming ticks)
  • Implement timebase/clock.py
    • SimulationTime type for formatting/validation of incoming stream ticks
    • conversions and formatting
  • Implement timebase/scheduler.py
    • implement pacing, not tick generation
    • for REALTIME: sleep until expected wall-clock for next received tick
    • for OFFLINE: apply each received tick immediately
  • Add drift handling for REALTIME:
    • if late by > 1 tick, either skip ticks or catch up (configurable)
    • default: never skip control-plane ticks; warn if drift accumulates
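A sketch of the pacing rules above, assuming a `Pacer` class name and monotonic-clock deadlines (both assumptions, not the final API). Note it paces received ticks rather than generating them:

```python
# Sketch of timebase/scheduler.py pacing: OFFLINE applies immediately,
# REALTIME sleeps until the received tick's wall-clock deadline and
# reports drift instead of skipping control-plane ticks.
import asyncio
import time

class Pacer:
    def __init__(self, dt_s: float, realtime: bool, max_drift_ticks: int = 1):
        self.dt_s = dt_s
        self.realtime = realtime
        self.max_drift_ticks = max_drift_ticks
        self._start = None  # wall-clock anchor, set on first tick

    async def wait_for(self, tick_index: int) -> float:
        """Return drift in seconds (positive = late); sleeps in REALTIME mode."""
        if not self.realtime:
            return 0.0
        now = time.monotonic()
        if self._start is None:
            self._start = now
        deadline = self._start + tick_index * self.dt_s
        if now < deadline:
            await asyncio.sleep(deadline - now)
            return 0.0
        drift = now - deadline
        if drift > self.max_drift_ticks * self.dt_s:
            # default policy: never skip ticks, only warn as drift accumulates
            print(f"warning: tick {tick_index} late by {drift:.3f}s")
        return drift
```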

7) Internal bus and message contracts

7.1 Canonical internal messages

  • Implement bus/messages.py dataclasses:
    • TickUpdate:
      • run_id, scenario_ref
      • tick_index, time
      • link_updates: list
      • link_removals: list
      • events: list
      • stats: compute timing, counts
    • RunControl messages:
      • start/pause/resume/stop
    • LaneStatus messages:
      • ready/running/error/stopped
  • Implement bus/queues.py
    • bounded asyncio queues
    • per-lane queue limits (configurable)
  • Implement bus/fanout.py
    • one producer (Geo/RF stream consumer)
    • N consumers (lane adapters + recorder)
    • backpressure policy:
      • default: block producer when any lane queue is full (strict sync)
      • option: drop trace recorder only (never drop lane updates)
  • Add message ordering rules:
    • tick updates delivered in increasing tick_index
    • within a tick: removals applied before updates by consumers (documented)
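The fanout and backpressure policy above can be sketched with bounded asyncio queues (class and method names are illustrative):

```python
# Sketch of bus/fanout.py: one producer, N bounded per-lane queues.
# publish() awaits on a full lane queue (strict sync backpressure), while a
# queue marked droppable (the trace recorder) is skipped instead of stalling.
import asyncio

class Fanout:
    def __init__(self):
        self._queues: dict[str, asyncio.Queue] = {}
        self._droppable: set[str] = set()

    def subscribe(self, name: str, maxsize: int = 16, droppable: bool = False) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=maxsize)
        self._queues[name] = q
        if droppable:
            self._droppable.add(name)
        return q

    async def publish(self, tick) -> None:
        for name, q in self._queues.items():
            if name in self._droppable and q.full():
                continue        # drop for the recorder only, never for lanes
            await q.put(tick)   # blocks producer when a lane queue is full
```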

8) Geo/RF engine client integration

8.1 gRPC client

  • Implement geomrf/client.py
    • gRPC channel creation (grpc.aio.insecure_channel(target))
    • stub creation from generated proto
    • get_version(), get_capabilities()
    • create_scenario(scenario_spec) -> scenario_ref
    • close_scenario(scenario_ref)
    • stream_link_deltas(request) -> async iterator
    • stream_events(request) -> async iterator
  • Implement geomrf/health.py
    • connect + health check on startup
    • gate event-consumer features on engine schema/version support
  • Implement geomrf/translate.py
    • translate orchestrator ScenarioConfig to Geo/RF ScenarioSpec (engine-facing)
    • enforce deterministic key ordering where needed for reproducible payloads
    • validate all required proto fields before RPC call; reject locally on mismatch
    • translate Geo/RF LinkDeltaBatch into internal TickUpdate
  • Implement robust error mapping:
    • map NOT_FOUND, INVALID_ARGUMENT, FAILED_PRECONDITION, RESOURCE_EXHAUSTED to typed GeomrfError
    • define retry policy for UNAVAILABLE/DEADLINE_EXCEEDED (bounded retries + backoff)
    • include scenario_ref and tick_index in error logs
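A dependency-free sketch of the retry/fail split follows. A real geomrf/client.py would catch `grpc.aio.AioRpcError` and inspect `e.code()`; here the status arrives as its canonical name string so the policy itself stays testable:

```python
# Sketch of the 8.1 error policy: fatal engine statuses become typed
# GeomrfError immediately; transient statuses get bounded retries + backoff.
import asyncio

class GeomrfError(Exception):
    def __init__(self, status: str, detail: str = "", scenario_ref=None, tick_index=None):
        # scenario_ref and tick_index are carried for error logs, per checklist
        super().__init__(f"{status}: {detail} (scenario={scenario_ref}, tick={tick_index})")
        self.status = status

FATAL = {"NOT_FOUND", "INVALID_ARGUMENT", "FAILED_PRECONDITION", "RESOURCE_EXHAUSTED"}
RETRYABLE = {"UNAVAILABLE", "DEADLINE_EXCEEDED"}

async def with_retries(call, retries: int = 3, base_delay: float = 0.01):
    """Retry only transient statuses, with exponential backoff; fail fast otherwise."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except GeomrfError as e:
            if e.status not in RETRYABLE or attempt == retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```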

8.2 Stream consumption tasks

  • Implement geomrf stream consumer coroutine:
    • starts StreamLinkDeltas with emit_full_snapshot_first=true
    • reads batches and pushes TickUpdate to bus producer
  • Implement event stream consumer coroutine (optional in v1):
    • call StreamEvents with same t_start/t_end/dt/selector used for deltas
    • consume EngineEvent.tick_index directly (no nearest-tick heuristics)
    • record events to trace/metrics channel for observability
    • if connected engine does not support aligned event schema, disable event consumer and warn once
  • Merge/control strategy:
    • lane control path uses TickUpdate from StreamLinkDeltas only
    • event stream is informational and must not mutate lane state

9) Lane adapter architecture

9.1 Adapter base contract

  • Implement lanes/base.py:
    • class LaneAdapter(Protocol) or ABC with:
      • name: str
      • async prepare(run_context, scenario_config) -> None
      • async apply_tick(tick: TickUpdate) -> None
      • async finalize(run_context) -> None
      • async health() -> dict (optional)
  • Implement lanes/registry.py
    • register adapters by name
    • instantiate chosen adapters based on lane_mode
  • Implement tests/test_lane_adapter_contract.py for interface compliance
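The adapter contract and registry might look like the sketch below; Protocol vs ABC is the checklist's own open choice, and the registry helper names are assumptions:

```python
# Sketch of lanes/base.py + lanes/registry.py: the async adapter contract
# from 9.1 as a Protocol, and a registry keyed by lane name.
from typing import Protocol, runtime_checkable

@runtime_checkable
class LaneAdapter(Protocol):
    name: str
    async def prepare(self, run_context, scenario_config) -> None: ...
    async def apply_tick(self, tick) -> None: ...
    async def finalize(self, run_context) -> None: ...

_REGISTRY: dict[str, type] = {}

def register(cls: type) -> type:
    """Class decorator: register an adapter under its `name` attribute."""
    _REGISTRY[cls.name] = cls
    return cls

def adapters_for(lane_mode: str) -> list[type]:
    # parallel mode instantiates both lanes; otherwise exactly one
    names = ["mininet", "omnet"] if lane_mode == "parallel" else [lane_mode]
    return [_REGISTRY[n] for n in names]
```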

9.2 Mininet lane adapter (detailed tasks)

  • Implement lanes/mininet_lane/adapter.py

    • prepare():
      • validate Linux prereqs (ovs-vsctl, tc, ip)
      • start controller (ONOS/Ryu) if configured
      • create Mininet topology (delegate to topo.py)
      • start Mininet network
      • start PCAP capture if enabled (delegate to capture.py)
    • apply_tick():
      • apply removals (links down) first
      • apply updates:
        • for each link: set up/down state
        • apply delay/loss/rate using shaping module
    • finalize():
      • stop captures
      • stop Mininet
      • stop controller if orchestrator started it
  • Implement lanes/mininet_lane/topo.py

    • create a Mininet graph from ScenarioConfig node roles
    • map SatSim node IDs to Mininet host/switch names
    • define OVS switches and host attachments
    • decide representation:
      • v1 recommended: represent satellites as OVS switches; GS/UT as hosts
      • allow optional SAT as hosts if needed
    • create links but keep them initially “neutral” (shaping applied per tick)
  • Implement lanes/mininet_lane/shaping.py

    • provide functions:
      • set_link_up(link_id) / set_link_down(link_id)
      • apply_netem(link_id, delay_ms, loss_pct)
      • apply_rate(link_id, rate_mbps)
      • clear_shaping(link_id)
    • implement using:
      • tc qdisc replace dev <if> root netem delay ... loss ...
      • tc qdisc ... tbf/htb for rate
    • ensure idempotency (repeated calls safe)
    • log every applied shaping change with tick_index
  • Implement lanes/mininet_lane/controller.py

    • support controller options:
      • external controller address (already running)
      • orchestrator-launched controller container/process (optional v1)
    • store controller version info in manifest
  • Implement lanes/mininet_lane/capture.py

    • start tcpdump for relevant interfaces
    • rotate PCAP per time or per run (v1: one PCAP per run)
    • store PCAP path in manifest
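The shaping functions in 9.2 can be sketched as argv builders separated from execution, which keeps the tc invocations unit-testable; interface names like `s1-eth1` are hypothetical Mininet names, and combining netem delay with a tbf/htb rate on the same device needs a qdisc hierarchy that this sketch omits:

```python
# Sketch of lanes/mininet_lane/shaping.py: build tc command argv lists;
# in the adapter, `run` would be subprocess.run with root privileges.
def netem_cmd(ifname: str, delay_ms: float, loss_pct: float) -> list[str]:
    # `replace` rather than `add` keeps repeated per-tick calls idempotent
    return ["tc", "qdisc", "replace", "dev", ifname, "root", "netem",
            "delay", f"{delay_ms:g}ms", "loss", f"{loss_pct:g}%"]

def tbf_cmd(ifname: str, rate_mbps: float) -> list[str]:
    return ["tc", "qdisc", "replace", "dev", ifname, "root", "tbf",
            "rate", f"{rate_mbps:g}mbit", "burst", "32kbit", "latency", "400ms"]

def clear_cmd(ifname: str) -> list[str]:
    return ["tc", "qdisc", "del", "dev", ifname, "root"]

def apply_shaping(run, ifname: str, delay_ms: float, loss_pct: float, tick_index: int):
    run(netem_cmd(ifname, delay_ms, loss_pct))
    # every shaping change is logged with its tick_index, per checklist
    print(f"tick={tick_index} {ifname}: delay={delay_ms}ms loss={loss_pct}%")
```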

9.3 OMNeT lane adapter (trace-first v1)

  • Implement lanes/omnet_lane/adapter.py
    • v1 assumption: OMNeT consumes a trace file (offline) rather than live streaming
  • Implement lanes/omnet_lane/trace_ingest.py
    • orchestrator writes a LinkState trace file suitable for OMNeT adapter
    • define a simple trace format:
      • JSONL per tick containing updates/removals
      • or CSV-like with (tick, src, dst, up, delay, rate, loss)
    • ensure deterministic ordering of entries
  • Implement lanes/omnet_lane/runner.py
    • launch OMNeT simulation via subprocess:
      • capture stdout/stderr to run logs
      • exit code handling
    • place outputs into artifacts directory
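The JSONL trace format above might be sketched like this, with deterministic ordering so two identical runs produce byte-identical traces (field names are illustrative, not a fixed schema):

```python
# Sketch of lanes/omnet_lane/trace_ingest.py output: one JSON object per
# tick, removals and updates sorted, keys sorted, so output is reproducible.
import json

def tick_record(tick_index: int, time_s: float, updates, removals) -> str:
    rec = {
        "tick": tick_index,
        "time_s": time_s,
        "removals": sorted(removals),  # e.g. ["sat2->gs1", ...]
        "updates": sorted(updates, key=lambda u: (u["src"], u["dst"])),
    }
    return json.dumps(rec, sort_keys=True)

def write_trace(path, ticks) -> None:
    # ticks: iterable of (tick_index, time_s, updates, removals)
    with open(path, "w") as f:
        for t in ticks:
            f.write(tick_record(*t) + "\n")
```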

10) Orchestrator main runtime loop

10.1 Lifecycle coordination

  • Implement satsim_orch/main.py orchestration steps:
    • Create run context + artifact directories
    • Log environment + versions
    • Initialize Geo/RF client and fetch version/capabilities
    • Create Geo/RF scenario
    • Instantiate chosen lane adapters (mininet/omnet/parallel)
    • Call prepare() for each lane
    • Start stream consumer tasks
    • Start realtime pacing task only when time.mode=REALTIME
    • Await completion conditions:
      • reached t_end
      • user stop signal (CTRL+C)
      • error in any task
    • Finalize lanes
    • Close Geo/RF scenario
    • Write final manifest + summary

10.2 Streaming-driven execution (locked)

  • Geo/RF stream is the authoritative tick source.
  • Orchestrator does not generate ticks; it consumes them and fans out.

Tasks:

  • In streaming consumer, for each LinkDeltaBatch:
    • translate to TickUpdate
    • push to fanout bus

10.3 Fanout to lanes

  • For each lane, run a consumer task:
    • while True: tick = await queue.get(); await lane.apply_tick(tick)
    • handle cancellation and lane errors
  • Implement strict ordering:
    • do not allow lane to process tick k+1 before tick k
  • Implement shutdown handshake:
    • send RunControl(STOP) to lanes on exit
    • drain queues if configured
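The per-lane consumer task with strict ordering might be sketched as follows (using `None` as a shutdown sentinel is an assumption; the real loop would also handle RunControl messages):

```python
# Sketch of the 10.3 consumer: one task per lane drains that lane's queue in
# order, so tick k+1 is never applied before tick k.
import asyncio

async def lane_consumer(lane, queue: asyncio.Queue) -> None:
    expected = None
    while True:
        tick = await queue.get()
        if tick is None:  # shutdown sentinel pushed during finalize
            break
        if expected is not None and tick.tick_index != expected:
            raise RuntimeError(
                f"out-of-order tick {tick.tick_index}, expected {expected}")
        await lane.apply_tick(tick)
        expected = tick.tick_index + 1
        queue.task_done()
```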

11) Error handling and shutdown

11.1 Exception strategy

  • Any uncaught exception in:

    • the Geo/RF stream consumer
    • any lane consumer
    • any lane adapter method

    triggers a coordinated shutdown.
  • Implement runtime/process.py:

    • subprocess management with kill/terminate escalation
    • collect exit codes and stderr tails
  • Add SIGINT/SIGTERM handling:

    • first CTRL+C: graceful stop
    • second CTRL+C: immediate stop
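The two-stage CTRL+C escalation can be isolated in a small policy object so it is testable without a live event loop (names assumed); `main()` would attach it via `loop.add_signal_handler(signal.SIGINT, policy.handle)`:

```python
# Sketch of the signal policy: first signal requests a graceful stop,
# any further signal forces an immediate exit.
import asyncio

class StopPolicy:
    def __init__(self, stop_event: asyncio.Event):
        self.stop_event = stop_event
        self.count = 0

    def handle(self) -> None:
        self.count += 1
        if self.count == 1:
            self.stop_event.set()  # let the run loop finalize lanes cleanly
        else:
            raise SystemExit(1)    # operator insisted: stop immediately
```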

11.2 Cleanup correctness

  • Always attempt, even when errors occur:
    • finalize() on all lanes
    • close_scenario() on the Geo/RF engine
  • Write final manifest including failure reason.

12) Metrics and run summaries

12.1 Metric recording

  • Implement metrics/records.py
    • standard metric record format for:
      • tick compute time
      • links emitted
      • lane apply times (optional)
  • Implement metrics/exporters.py
    • JSONL writer to metrics/
    • optional Prometheus exporter
  • Implement per-tick timing:
    • time spent translating batches
    • time spent applying to each lane

12.2 Summary report (v1)

  • Write a summary.json at end of run:
    • total ticks, total links emitted, mean compute time, runtime duration
    • lane success/failure states
    • artifact paths (pcaps, traces, logs)

13) Integration tests (practical, not huge)

13.1 Smoke tests

  • test_geomrf_client_smoke.py
    • connect to Geo/RF engine on localhost
    • create a tiny scenario (1 GS + 1 SAT)
    • stream first 3 ticks and assert non-empty output
  • test_event_alignment_smoke.py
    • request deltas/events with identical t_start/t_end/dt/selector
    • assert each event has tick_index and maps to existing/expected delta tick

13.2 Bus correctness

  • test_bus_fanout.py
    • ensure ticks delivered to all lanes in order
    • ensure backpressure blocks producer when lane queue is full

13.3 Run manifest correctness

  • test_run_manifest.py
    • run manager writes expected keys
    • manifest includes versions and config snapshot

14) Minimum viable orchestrator (v1) — acceptance criteria

  • Can run satsim run scenario.yaml --mode mininet

    • Geo/RF scenario created
    • Link deltas streamed and applied via tc/netem
    • PCAP recorded (optional)
    • run artifacts written (logs, manifest)
  • Can run satsim run scenario.yaml --mode omnet

    • Geo/RF stream recorded to trace
    • OMNeT launched consuming trace (or stubbed with clear TODO if not ready)
    • run artifacts written
  • Can run satsim run scenario.yaml --mode parallel

    • both lane adapters receive identical tick updates
    • lane adapters derive control only from link deltas
    • optional events are captured in artifacts without driving lane state
    • orchestrator shuts down cleanly on completion or CTRL+C

15) Optional but valuable v1.1 tasks (safe additions)

  • Orchestrator exposes its own gRPC stream StreamTickUpdates so lanes can subscribe remotely
  • Add NATS internal bus option for multi-process fanout
  • Add replay command: satsim replay <run_id> (use stored trace)
  • Add sweep runner: parameter grid search with repeated runs and consolidated summary

TASKS_TESTSUITE_GEOENGINE.md

Geometry/RF Engine Test Suite Plan

This checklist tracks the work to build and verify a comprehensive RPC-focused test suite for geomrf-engine.

0) Deliverables

  • Add a dedicated gRPC service test module that exercises all six RPCs.
  • Validate success + error-path behavior for lifecycle and streaming RPCs.
  • Produce an updated coverage report and capture gaps.
  • Keep this checklist updated as tasks are completed.

1) Baseline and scope

  • Confirm current tests/coverage baseline before adding new RPC tests.
  • Confirm test scenario strategy (deterministic helper scenario; compatible with 027 overhead-pass style TLE + GS setup).

2) Test infrastructure

  • Add an in-process gRPC test harness (ephemeral port, async channel/stub, clean teardown).
  • Add shared helpers for creating/closing scenarios from tests.

3) RPC lifecycle tests

  • GetVersion returns expected identity/schema metadata.
  • GetCapabilities returns expected limits and feature flags.
  • CreateScenario success path returns scenario_ref.
  • CreateScenario invalid spec path returns INVALID_ARGUMENT.
  • CloseScenario success path returns ok=true.
  • CloseScenario unknown scenario path returns NOT_FOUND.

4) Streaming RPC tests

  • StreamLinkDeltas success path returns ordered batches with snapshot metadata.
  • StreamLinkDeltas unknown scenario path returns NOT_FOUND.
  • StreamLinkDeltas closed scenario path returns FAILED_PRECONDITION.
  • StreamLinkDeltas invalid time parameters return INVALID_ARGUMENT.
  • StreamEvents success path returns well-formed events for the test scenario.
  • StreamEvents filtered path validates event filtering behavior.
  • StreamEvents unknown scenario path returns NOT_FOUND.
  • StreamEvents closed scenario path returns FAILED_PRECONDITION.

5) Execution and coverage

  • Run full test suite and ensure all tests pass.
  • Run coverage scoped to geomrf_engine.
  • Verify server.py and stream/event modules are covered by tests.
  • Document final coverage numbers and remaining gaps.

6) Results summary

  • Test count: 20 passed.
  • Coverage total (geomrf_engine): 85% (820 statements, 120 missed).
  • Core RPC implementation coverage: server.py at 79%, streaming/events.py at 96%, streaming/backpressure.py at 80%, util/logging.py at 92%.
  • Remaining notable gaps captured for follow-up: evaluator branch coverage (56%) and delta-threshold branch coverage (71%).

7) v1.1 follow-up (event alignment)

  • Add StreamEvents alignment tests for request dt semantics (default_dt fallback + invalid-range rejection).
  • Add selector-parity tests ensuring event selection mirrors StreamLinkDeltas selection.
  • Add assertions that every emitted EngineEvent carries tick_index.
  • Add cross-stream alignment test: same window/dt/selector for events+deltas yields consistent tick mapping.
  • Extend error-path coverage for new event request fields.