Scorch

Scorch — SCenario ORCHestration — is an automated scenario orchestration framework within phenix. It is included in phenix as a core app. The development of the Scorch framework was motivated by the need to facilitate rigorous experimentation. Some advantages of Scorch include the ability to run many repeated scenarios on an experiment with consistency and minimal overhead. Scorch also provides the ability to efficiently capture experimental data for retrieval and analysis.

A phenix scenario configuration file is used to define and configure the Scorch app for use on a topology. The Scorch app stages Scorch components so that they execute in sequence against a running experiment. When applied to a given topology, the Scorch app appears in the Scorch table, where users can execute it and then observe, in some cases manipulate, and review the output of the components in each stage of the Scorch pipeline.

The screenshots and configuration file in the rest of this document are from an example Scorch app, scorch-demo.

Scorch Components

A Scorch component is simply an executable available to be called by the Scorch app within phenix. A component is expected to implement any or all of the various stages in the Scorch pipeline.

For an executable to be considered a Scorch component, it must meet the following requirements:

  1. Follow the phenix-scorch-component-<type> naming convention, where <type> is the component type used in the Scorch app configuration. An example would be phenix-scorch-component-tcpdump.
  2. Be an executable file.
  3. Be in the PATH of the user running phenix.
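
The three requirements above can be checked with a few lines of Python. This is a minimal sketch, not code from phenix: the helper name find_component is hypothetical, and it relies only on the standard library's shutil.which, which returns a path only for files that exist on the search path and are executable.

```python
import shutil

# Naming convention from requirement 1 above.
PREFIX = "phenix-scorch-component-"


def find_component(component_type, path=None):
    """Return the full path of the executable for a component type, or None.

    shutil.which searches the given path (or the PATH environment variable
    when path is None) and only returns executable files, which covers
    requirements 2 and 3.
    """
    return shutil.which(PREFIX + component_type, path=path)
```

For example, find_component("tcpdump") returns None when no executable named phenix-scorch-component-tcpdump is in the PATH of the current user.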

When the Scorch app executes a Scorch component, it will pass a number of positional arguments to the component via the command line, as well as the JSON representation of the experiment the component is to be executed against via STDIN. The positional arguments passed are as follows:

  1. run stage (configure, start, stop, or cleanup)
  2. component name (name given to component type in Scorch app configuration)
  3. run ID (integer >= 0 representing the array index of the Scorch run in the app configuration being executed)
  4. loop (integer >= 0 representing the current run loop being executed)
  5. count (integer >= 0 representing the current loop count being executed)
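
A component written in Python might unpack this invocation as sketched below. The names Invocation and parse_invocation are hypothetical and are not part of phenix or of the phenix-apps helper classes; the sketch only assumes the five positional arguments listed above and a JSON document on STDIN.

```python
import json
from collections import namedtuple

STAGES = ("configure", "start", "stop", "cleanup")

# Hypothetical container for everything Scorch hands to a component.
Invocation = namedtuple("Invocation", "stage name run_id loop count experiment")


def parse_invocation(argv, stdin):
    """Parse the positional arguments and the experiment JSON from STDIN."""
    if len(argv) < 6:
        raise ValueError("expected 5 positional arguments, got %d" % (len(argv) - 1))
    stage, name = argv[1], argv[2]
    if stage not in STAGES:
        raise ValueError("unknown run stage: %r" % stage)
    run_id, loop, count = (int(value) for value in argv[3:6])
    if min(run_id, loop, count) < 0:
        raise ValueError("run ID, loop, and count must be >= 0")
    experiment = json.load(stdin)  # JSON representation of the experiment
    return Invocation(stage, name, run_id, loop, count, experiment)
```

A component would typically call parse_invocation(sys.argv, sys.stdin) once at startup.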

During component execution, the Scorch app assumes anything written to STDOUT by a component is intended to be relayed to the user. Thus, when Scorch is run via the web UI, anything written to STDOUT gets streamed to the UI for viewing. Any error messages generated by a component should be written to a log file or to STDERR, unless they are also meant to be relayed to the user directly.

The Scorch app expects a component executable to exit with a value of 0 upon completion if the component was successful, and exit with any other value otherwise. An exit value of anything other than 0 will result in Scorch halting execution of the current stage and jumping to the next appropriate stage to complete the Scorch run.
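
The output and exit-code conventions above could be honored with a small dispatcher like the following. It is a sketch under stated assumptions: run_stage, HANDLERS, and the handler functions are hypothetical names, and treating an unimplemented stage as success is one possible choice rather than something Scorch requires.

```python
import sys


def configure():
    # Anything printed to STDOUT is relayed to the user by Scorch.
    print("configuring component")


def start():
    print("starting component")


# Hypothetical mapping of run stages to the handlers this component implements.
HANDLERS = {"configure": configure, "start": start}


def run_stage(stage, handlers, err=None):
    """Run the handler for a stage and return the process exit code."""
    err = sys.stderr if err is None else err
    handler = handlers.get(stage)
    if handler is None:
        return 0  # stage not implemented: report success so the run continues
    try:
        handler()
    except Exception as exc:
        # Errors go to STDERR; a non-zero exit code tells Scorch the stage failed.
        print("%s failed: %s" % (stage, exc), file=err)
        return 1
    return 0
```

A component script would typically end with sys.exit(run_stage(sys.argv[1], HANDLERS)) so that the return value becomes the exit code Scorch inspects.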

Automated Component Data Collection

The Scorch app is capable of generating a configuration file for and starting an instance of Filebeat in the background before execution of each Scorch run. As each Scorch component is executed, any data it generates and collects can be configured to be automatically processed by Filebeat for indexing in Elasticsearch. At a minimum, this requires the following:

  1. Filebeat to be enabled and configured in the Scorch app configuration.
  2. A Filebeat input to be configured for each component generating and collecting data.
  3. The filebeat executable installed and in the PATH of the user running phenix.

See the example configuration below for examples of how Filebeat and Filebeat inputs are configured in the Scorch app configuration.
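
On the component side, producing data that such a Filebeat input can pick up may be as simple as appending one JSON object per line to a file. The sketch below is illustrative only: append_record is a hypothetical helper, the cpu_load field in the usage note is made up, and the file name and millisecond timestamp mirror the hoststats input in the example configuration (host_stats.jsonl with a UNIX_MS layout). Where a real component writes its output is not covered here.

```python
import json
import time


def append_record(path, stats, now=None):
    """Append one JSON object per line, which is what line-based JSON decoding expects."""
    record = dict(stats)
    # Milliseconds since the epoch, matching a UNIX_MS timestamp layout.
    record["timestamp"] = int((time.time() if now is None else now) * 1000)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Calling append_record("host_stats.jsonl", {"cpu_load": 0.5}) adds a single line to the file each time it is invoked.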

Built-in Components

The following Scorch component types are considered core components, in that they are included in the main phenix repository and are available for use in Scorch app configurations by default.

  • break
  • pause
  • soh
  • tap

break Component

The break component is comparable to a source code break point when debugging an application in that it pauses execution of the current Scorch run until a user exits the break. While the break component is running, users have access to a shell on the server running phenix as the user running phenix. The first user to access the shell via the terminal modal in the UI will have read-write access. If other users access the shell, they will have read-only access but will get live updates as the user with read-write access uses the terminal.

It's possible to configure the break component in the Scorch app configuration to create a minimega tap when the component is executed. When the break is exited, the tap will be deleted. In addition to the tap, external network access can also be configured (e.g., Internet access).

An example of configuring a break component to create a tap and configure external network access during the configure stage is as follows. The break component can be configured to run in any stage.

spec:
  apps:
  - name: scorch
    metadata:
      components:
      - name: break-tap
        type: break
        metadata:
          tap:
            bridge: phenix
            vlan: MGMT
            ip: 172.16.33.25/16
            internetAccess: true
      runs:
      - configure: ["break-tap"]

pause Component

The pause component is similar to the break component in that it pauses execution of the current Scorch run, but instead of waiting for user intervention it simply pauses for a predefined duration.

A simple example is as follows. The pause component can be configured to run in any stage. The value used for the duration key should be a valid Golang duration string.

spec:
  apps:
  - name: scorch
    metadata:
      components:
      - name: brief-pause
        type: pause
        metadata:
          duration: 2s
      runs:
      - start: ["brief-pause"]
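
To sanity-check a duration value before putting it in a configuration, the Go duration syntax can be approximated in Python. This is a rough sketch, not the parser phenix uses: parse_go_duration is a hypothetical helper that follows the format accepted by Go's time.ParseDuration (an optional sign, then one or more decimal numbers each followed by a unit of ns, us, ms, s, m, or h) and ignores Go's overflow checks.

```python
import re

# Nanoseconds per unit; both spellings of the micro sign are accepted.
_UNIT_NS = {
    "ns": 1, "us": 1_000, "\u00b5s": 1_000, "\u03bcs": 1_000,
    "ms": 1_000_000, "s": 1_000_000_000,
    "m": 60_000_000_000, "h": 3_600_000_000_000,
}
_UNITS = "|".join(sorted(_UNIT_NS, key=len, reverse=True))
_NUMBER = r"[0-9]+\.?[0-9]*|\.[0-9]+"
_TOKEN = re.compile("(" + _NUMBER + ")(" + _UNITS + ")")
_FULL = re.compile("[+-]?(?:(?:" + _NUMBER + ")(?:" + _UNITS + "))+")


def parse_go_duration(text):
    """Return the duration in seconds, or raise ValueError if the string is invalid."""
    if text in ("0", "+0", "-0"):
        return 0.0  # zero is the only value allowed without a unit
    if not _FULL.fullmatch(text):
        raise ValueError("invalid duration: %r" % text)
    total_ns = sum(float(num) * _UNIT_NS[unit] for num, unit in _TOKEN.findall(text))
    seconds = total_ns / 1e9
    return -seconds if text.startswith("-") else seconds
```

With this helper, values such as 2s, 300ms, and 1m30s are accepted, while a bare 2 or a string such as 2sec raises ValueError.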

soh Component

The soh component allows users to execute the State of Health app at scheduled times throughout a Scorch run. This is handy when, for example, other Scorch components might cause nodes in the experiment to misbehave or fail. The component can be configured to limit which health checks are run, and can also be configured to fail if any of the health checks fail. The log level can also be configured, which will limit what logs get sent to the component's UI modal while the component is running.

spec:
  apps:
  - name: scorch
    metadata:
      components:
      - name: health-check
        type: soh
        metadata:
          c2Timeout: 5s # if provided, will update the C2 timeout setting when this component runs the state of health app
          checks:       # default is to run all the following checks
            - network-config       # Ensure all nodes still have network configured per the topology. Will use `skipInitialNetworkConfigTests` setting in soh app config.
            - reachability         # Basic ICMP-based reachability testing. Will use `testReachability` setting in soh app config.
            - custom-reachability  # IP-based reachability testing (TCP or UDP). Will use `testCustomReachability` setting in soh app config.
            - processes            # Ensure processes are running in nodes. Will use `hostProcesses` setting in soh app config.
            - ports                # Ensure listeners are running in nodes. Will use `hostListeners` setting in soh app config.
            - custom               # Run custom tests in nodes. Will use `hostCustomTests` setting in soh app config.
            - cpu-load             # Gather CPU load stats from nodes.
            - flows                # Gather packet flows from Elasticsearch server. Requires `packetCapture` setting to be configured in soh app config.
          failOnError: true # default is false
          logLevel: debug   # default is info
      runs:
      - start: ["health-check"]

tap Component

The tap component implements the exact same functionality described above in the break component for creating a minimega tap and, optionally, external network access, but allows for the tap (and external network access, if configured) to exist while other components are executed (as opposed to only existing for the duration of the break component).

An example of configuring a tap component to create a tap and configure external network access is as follows. The tap component can only be configured to run in the start stage (create the tap) and the stop stage (delete the tap).

spec:
  apps:
  - name: scorch
    metadata:
      components:
      - name: tap-inet
        type: tap
        metadata:
          bridge: phenix
          vlan: MGMT
          ip: 172.16.33.25/16
          internetAccess: true
      runs:
      - start: ["tap-inet"]
        stop: ["tap-inet"]

NOTE: In deployments where minimega is running in a container on the headnode, and Docker networking is in use (e.g., the minimega container is not configured to use host networking), users will need to execute the following commands if access to the minimega tap created by the tap (or break) component from the Docker host is required.

ovs-docker add-port phenix TDN minimega
docker exec -it minimega ovs-vsctl add-port phenix TDN
ovs-vsctl add-port phenix temp-tap tag=<vlan ID> -- set interface temp-tap type=internal

The above commands assume the name of the minimega container is minimega. The name of the local tap created (in this case, temp-tap) can be whatever, but the value for the VLAN tag must match the numerical ID of the VLAN that's mapped to the VLAN alias used in the tap (or break) component configuration.

Using host networking mode for the minimega container allows for all the above nonsense to be skipped.

User-defined Components

The following Scorch component types have been developed external to the main phenix repository and are available in the phenix-apps repository, which also includes README-based documentation for each. They are all developed in Python, and leverage common helper classes that ease the development of user components.

  • art
  • cc
  • ettercap
  • hoststats
  • snort
  • tcpdump
  • vmstats

Scorch Table

The Scorch table, accessible as one of the tab selections within the phenix UI, lists all possible Scorch apps available based on the experiments established in phenix. The following columns or functions are available:

  • Experiment name
  • Experiment status: this reports on the status of the experiment — an experiment must be running for a Scorch app to start
  • Scorch app status: this will report the running or stopped status of the Scorch app itself
  • Terminal: if the Scorch app has reached a break point, a terminal will be available; clicking it opens a terminal dialog running on the phenix host system
  • Find an Experiment: similar to the search fields in other tables within the phenix UI, it is possible to filter experiment names based on terms entered here

Scorch Table

Scorch Pipeline

Scorch pipelines are available from the Scorch table in the phenix UI. The table is sorted by Experiment name by default. Only those experiments with the Scorch app configured in the scenario configuration will be listed in the table. It is possible to start or stop an experiment, as well as start or stop a Scorch app. Finally, if a terminal is available when a break point is reached in a running Scorch app, it can be accessed from the table.

The Scorch pipeline provides a graphical representation of the Scorch app, including the configure, start, stop, and cleanup stages. If the Scorch app provides output for a given step, or component, users can click into the component and receive the output. A user can access the terminal if a break point is reached by clicking on the component. As with the terminal access described above, a dialog will be presented with a terminal running on the phenix host system.

Scorch Pipeline

The following functions are also available in the Scorch pipeline UI:

  • Return to the table: a button that will return to the Scorch table
  • Scorch app status: a button that will allow a running Scorch app to be stopped or started depending on the current status

Return to Scorch table

Stopped or Started button

Stages

For a given Scorch pipeline, there are four stages of execution:

  1. configure
  2. start
  3. stop
  4. cleanup

A Scorch component may implement any or all of the various stages, and Scorch will execute each stage inside the components in order. Each component can be configured in the Scorch app scenario configuration file.

* There is an additional done stage in the UI, reached once cleanup has completed. It is meant to report the completion of all stages in the pipeline UI.

It's completely up to the component developer if and how an execution stage is implemented and handled by the component. If a component is configured in the Scorch app to be executed as part of a stage, but the component does not implement said stage, then the Scorch app will happily continue on to the next component in the stage (unless the component errors out if the current stage is not implemented, in which case the Scorch run will fail).

Pipeline Stages

The following indicators are presented for each component of the Scorch app:

  • Uninitialized: a component has not yet been reached or initialized; if the Scorch app has not yet been run, all components will be identified as uninitialized
  • Running: the component is currently running and has not yet been completed
  • Success: the component has completed successfully
  • Break Point: a break component has been reached — a terminal should be accessible by clicking on the component
  • Backgrounded: a component is running in the background
  • Failure: a component has failed for some reason; reporting on the failure may be accessible by clicking on the component

Legend

Loops are also available within a Scorch app and are included in the configuration file. A loop supports looping within a given component through additional components. Once a loop is completed, the next component is executed. There is no limit on the number of loops or on how deeply they are nested. The Scorch pipeline UI supports access to each loop and will report the depth in the pipeline's title. The first loop in the scorch-demo app is presented in the following example; clicking the return button moves the display one level up in the loop chain, ending at the parent Scorch app.

Example Loop within Scorch app

In addition to loops, multi-run is supported within a Scorch app. Unlike loops, multi-run allows for individual Scorch runs, each containing the four separate stages. Runs are not nested in each other but are independent. A use case for multi-run could be executing multiple independent portions of an experiment against a topology in any order, or executing a run multiple times. There are two runs depicted in the configuration below.
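
To make the relationship between runs, stages, and component invocations concrete, the sketch below expands the runs list of a Scorch app configuration into the command lines a component would receive. It is a simplification, not Scorch's implementation: component_commands is a hypothetical helper, loops are ignored, runs are assumed to execute in array order, and the loop and count arguments are assumed to be 0 for components outside a loop.

```python
STAGES = ("configure", "start", "stop", "cleanup")


def component_commands(app_metadata):
    """List the argv for every component of every run, in stage order."""
    types = {c["name"]: c["type"] for c in app_metadata["components"]}
    commands = []
    for run_id, run in enumerate(app_metadata["runs"]):  # run ID is the array index
        for stage in STAGES:
            for name in run.get(stage) or []:  # a stage may be missing or null
                commands.append([
                    "phenix-scorch-component-" + types[name],
                    stage, name, str(run_id),
                    "0", "0",  # loop and count: assumed values outside of loops
                ])
    return commands
```

Given a configuration with two runs, the components of the second run receive a run ID of 1, regardless of which stages that run defines.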

As described above, each component will provide a modal for output reporting. The output could be a fairly straightforward report, nothing at all, or logging output for a given component in the stage. The output is streamed as it is received while the component is running. If the component has finished running, the output will be static.

Output of Snort component

A terminal modal is available from the Scorch table when a break point component is reached; it is also available in the Scorch pipeline. There are two types of terminal modals: read-write and read-only. The first user to open the terminal for a given component gets read-write access; any user who opens it afterward gets a read-only view. The following are examples of each. In the first example, a read-only terminal, the user viewing the modal is only observing what another user is executing. In the second example, a read-write terminal, the user ran two simple commands on the phenix host system.

Read-only Terminal

Read-write Terminal

If a component fails, the Scorch app will skip the remaining components involved in the current execution stage and jump to the next appropriate stage to complete the Scorch run. For example, if a component fails in the Configure stage the Scorch app will jump to the Cleanup stage, and if a component fails in the Start stage Scorch will jump to the Stop stage (skipping execution of any configured loops in either case).
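
The jump-on-failure behavior can be modeled in a few lines. This sketch is not taken from the Scorch source: execute_run is a hypothetical function, loops are left out, and while the configure-to-cleanup and start-to-stop jumps come from the description above, the handling of failures in the stop and cleanup stages is an assumption.

```python
# Normal stage order.
NEXT = {"configure": "start", "start": "stop", "stop": "cleanup", "cleanup": None}

# Where a run jumps when a component fails in a stage.
ON_FAILURE = {
    "configure": "cleanup",  # from the description above
    "start": "stop",         # from the description above
    "stop": "cleanup",       # assumption
    "cleanup": None,         # assumption: nothing left to run
}


def execute_run(run, fails=frozenset()):
    """Return the (stage, component) pairs executed when the given pairs fail."""
    executed = []
    stage = "configure"
    while stage is not None:
        failed = False
        for component in run.get(stage) or []:
            executed.append((stage, component))
            if (stage, component) in fails:
                failed = True
                break  # skip the remaining components in this stage
        stage = ON_FAILURE[stage] if failed else NEXT[stage]
    return executed
```

For a run whose start stage lists hoststats and then vmstats, a failure of hoststats means vmstats is never started, and execution continues with the components of the stop stage.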

Failed component — trafficgen in the Configure stage

Example Configuration

apiVersion: phenix.sandia.gov/v2
kind: Scenario
metadata:
  name: scorch-demo
spec:
  scenario:
    apps:
      - name: mirror
        metadata:
          directGRE:
            enabled: true
            mirrorBridge: phenix
            mirrorNet: 172.30.0.0/16
            mirrorVLAN: mirror
        hosts:
        - hostname: detector
          metadata:
            interface: IF0
            vlans:
            - EXP
      - name: scorch
        metadata:
          components:
          - name: vmstats
            type: vmstats
            metadata:
              filebeat.inputs:
              - enabled: true
                type: log
                json.add_error_key: true
                paths:
                - vm_stats.jsonl
                processors:
                - copy_fields:
                    fields:
                    - from: json
                      to: scorch.vmstats
                - drop_fields:
                    fields:
                    - json
                - timestamp:
                    field: scorch.vmstats.UTC
                    layouts:
                    - '2006-01-02 15:04:05'
          - name: hoststats
            type: hoststats
            background: true
            metadata:
              filebeat.inputs:
              - enabled: true
                type: log
                json.add_error_key: true
                paths:
                - host_stats.jsonl
                processors:
                - copy_fields:
                    fields:
                    - from: json
                      to: scorch.hoststats
                - drop_fields:
                    fields:
                    - json
                - timestamp:
                    field: scorch.hoststats.timestamp
                    layouts:
                    - UNIX_MS
          - name: trafficgen
            type: trafficgen
            metadata:
              scripts:
                backgroundGen: /phenix/topologies/scorch-demo/scripts/background-gen.py
                malwareGen: /phenix/topologies/scorch-demo/scripts/malware-gen.py
                trafficServer: /phenix/topologies/scorch-demo/scripts/traffic-server.py
              targets:
              - backgroundClient:
                  hostname: background-gen
                  probability: 0.01
                  rate: 10000
                duration: 30
                hostname: traffic-server
                interface: IF0
                malwareClient:
                  hostname: malware-gen
                  probability: 1.25
                  rate: 20
          - name: break
            type: break
            metadata: {}
          - name: tcpdump
            type: tcpdump
            metadata:
              convertToJSON: true
              filebeat.inputs:
              - enabled: true
                type: log
                paths:
                - tcpdump.pcap.json
                processors:
                - copy_fields:
                    fields:
                    - from: json
                      to: scorch.tcpdump
                - drop_fields:
                    fields:
                    - json
              vms:
                detector: eth0
          - name: snort
            type: snort
            metadata:
              configs:
              - dst: /etc/snort/snort.conf
                name: snort
                src: /phenix/topologies/scorch-demo/configs/snort.conf
              - dst: /etc/snort/rules/emotet.rules
                name: emotet
                src: /phenix/topologies/scorch-demo/configs/emotet.rules
              filebeat.inputs:
              - enabled: true
                type: log
                json.add_error_key: true
                paths:
                - snort-stats.jsonl
                processors:
                - copy_fields:
                    fields:
                    - from: json
                      to: scorch.snort
                - drop_fields:
                    fields:
                    - json
                - timestamp:
                    field: scorch.snort.timestamp
                    layouts:
                    - UNIX
              hostname: detector
              scripts:
                configSnort:
                  executor: bash
                  script: /phenix/topologies/scorch-demo/scripts/configure-snort.sh
              sniffInterface: eth0
              waitDuration: 5
          runs:
          - configure:
            - trafficgen
            - snort
            start:
            - hoststats
            - vmstats
            loop:
              execute:
                configure: null
                start:
                - tcpdump
                - snort
                - trafficgen
                stop:
                - trafficgen
                - snort
                - tcpdump
                cleanup: null
            stop:
            - vmstats
            - hoststats
            - break
          - start:
            - tcpdump
            - trafficgen
            stop:
            - trafficgen
            - tcpdump
          filebeat:
            enabled: true
            config:
              output.elasticsearch:
                hosts:
                - es:9200
              setup.dashboards.enabled: true
              setup.kibana.host: http://kibana:5601