VIRO//TAXON — Virologically-Aligned Malware Classification Framework

00 // Abstract

// Abstract

No universally adopted, biologically-coherent malware taxonomy exists. Existing schemes — MAEC's attribute vocabulary, STIX 2.1's malware object model, vendor naming conventions, and academic lifecycle frameworks — provide valuable but fragmented foundations. This paper proposes VIRO//TAXON, a polythetic malware classification framework modeled explicitly on the International Committee on Taxonomy of Viruses (ICTV) methodology. Rather than replacing existing standards, the framework extends MAEC as its canonical attribute vocabulary and STIX 2.1 as its transport and interoperability layer, adding a seven-level hierarchical classifier (Domain → Strain) and seven primary discriminators drawn from structural, ecological, replicative, tropism, pathogenic, evasion, and phylogenetic characteristics. We introduce the Taxonomic Virulence Score (TVS), a composite metric quantifying taxonomic severity across all seven discriminators, and demonstrate the framework's application to canonical malware families spanning ransomware, APT implants, wormable exploits, and supply-chain malware. The framework is designed for integration with ATT&CK-mapped CTI workflows and the KLEPT//VIZ cryptovirological assessment platform.

01 // Prior Art

// Prior Art Survey

Three standards families and a substantial academic literature form the prior art for malware classification. None achieves the combination of hierarchical stability, lineage-awareness, and virological coherence that ICTV provides for biological viruses.

1.1 Existing Standards Landscape

Framework	Scope	Hierarchical?	Lineage-Aware?	Virological Analog	Gap
MAEC 5.0	Malware attribute encoding: behaviors, artifacts, relationships	PARTIAL	LIMITED	Phenotypic characterization	No formal rank hierarchy; no phylogenetic discriminator
STIX 2.1	CTI interoperability; malware SDO with capability vocab	NO	NO	Transport / exchange layer	Classification logic entirely absent; naming is vendor-defined
ICTV (biological)	Biological virus taxonomy: 15 ranks (realm to species), polythetic, sequence-primary	YES	YES	Reference model	Biological substrate; not directly applicable to software
Malware Evaluator	Feature-based classification across malware lifecycle stages	PARTIAL	NO	Lifecycle staging	No rank hierarchy; platform-centric features dominate
IoT Lifecycle	Infection vectors, payload properties, persistence, C2 for IoT	PARTIAL	NO	Transmission + tropism	Domain-restricted; no generalizable hierarchy
Vendor Naming	De facto labels (WannaCry, Emotet, BlackCat)	NO	NO	Common name (pre-ICTV chaos)	No stability; no discriminating criteria; vendor-inconsistent

Assessment: The field has the attribute vocabulary (MAEC) and the transport layer (STIX). What is absent is the classification logic — the decision procedure that takes a bundle of attributes and places a sample into a stable, reproducible position within a ranked hierarchy. That is precisely what ICTV provides for biology, and precisely what VIRO//TAXON contributes for malware.

1.2 Why ICTV is the Right Model

ICTV's core methodological insight is polythetic classification: no single character is either necessary or sufficient to define a taxon. Membership is determined by possessing a sufficient number of characters from a defined set. This suits malware, where no single feature cleanly separates all members of a family (ransomware families vary widely in their cryptographic schemes, propagation methods, and target environments). The polythetic model accommodates this natural variance without per-rule exceptions.
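The polythetic rule can be illustrated with a minimal Python sketch. The character names, the defining set, and the minimum count of three are hypothetical choices made for illustration; the framework's own criterion (Definition 1) is a weighted threshold rather than a plain count.

```python
# Hypothetical defining character set for an illustrative ransomware taxon.
DEFINING_CHARACTERS = frozenset({
    "encrypts_files",
    "drops_ransom_note",
    "deletes_shadow_copies",
    "lateral_movement",
    "data_exfiltration",
})
MIN_SHARED = 3  # assumed "sufficient number" of shared characters


def is_member(observed: set[str],
              defining: frozenset[str] = DEFINING_CHARACTERS,
              min_shared: int = MIN_SHARED) -> bool:
    """Polythetic test: enough shared characters, no single required one."""
    return len(observed & defining) >= min_shared


# sample_b lacks "encrypts_files" yet still shares three defining characters.
sample_a = {"encrypts_files", "drops_ransom_note", "deletes_shadow_copies"}
sample_b = {"deletes_shadow_copies", "lateral_movement", "data_exfiltration"}
sample_c = {"encrypts_files", "keylogging"}

for name, sample in (("a", sample_a), ("b", sample_b), ("c", sample_c)):
    print(name, is_member(sample))
```

Samples a and b both qualify even though only one of them encrypts files, which is the property a monothetic (single necessary character) rule cannot express.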

ICTV also prioritizes sequence similarity as a primary discriminator at lower ranks (genus, species), which maps naturally to code-similarity analysis tools (BinDiff, DeepBinDiff, TLSH clustering) already in use by malware analysts. The phylogenetic tier of VIRO//TAXON directly inherits this approach.

02 // Framework Design

// Framework Design Principles

VIRO//TAXON is designed around four explicit principles inherited from ICTV methodology and adapted for the malware domain:

1. Polythetic membership. No single discriminator gates inclusion in a taxon. A specimen is classified at a given rank by scoring above threshold on a weighted combination of discriminators, not by satisfying a necessary condition. This prevents classification failure when one feature is absent or unobservable.

2. Rank stability over variant instability. Higher ranks (Domain, Class, Order) should be stable across years; lower ranks (Species, Strain) may evolve rapidly. This mirrors ICTV, where realm-level assignments rarely change but species designations update annually. Analysts can rely on Order-level classifications for long-horizon threat modeling without chasing per-campaign updates.

3. Machine-readable + human-readable naming. Each taxon receives both a formal VIRO//TAXON identifier (e.g., VT-ORD-WORM-001) and a common name derived from dominant behavioral character. The identifier is stable; the common name may be updated.

4. MAEC-grounded, STIX-transported. VIRO//TAXON does not replace MAEC attribute vocabulary. It adds a classification layer on top: given a MAEC-encoded sample, VIRO//TAXON provides the decision procedure to assign it to a rank and generate a TVS score. The result is serialized as a STIX 2.1 extension object for CTI pipeline integration.
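The identifier scheme in principle 3 might be handled along the following lines. The text gives only one example identifier (VT-ORD-WORM-001), so the pattern below (a fixed VT prefix, a three-letter rank code, an uppercase label, and a three-digit serial) is an assumption inferred from that single example, not a published grammar.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from the one example in the text: VT-ORD-WORM-001.
_ID_PATTERN = re.compile(
    r"^VT-(?P<rank>[A-Z]{3})-(?P<label>[A-Z0-9]+)-(?P<serial>\d{3})$"
)


@dataclass(frozen=True)
class TaxonId:
    rank_code: str  # e.g. "ORD"; other rank codes are hypothetical
    label: str      # e.g. "WORM"
    serial: int     # e.g. 1

    def __str__(self) -> str:
        return f"VT-{self.rank_code}-{self.label}-{self.serial:03d}"


def parse_taxon_id(text: str) -> TaxonId:
    """Parse an identifier, raising ValueError if it does not fit the assumed layout."""
    match = _ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised taxon identifier: {text!r}")
    return TaxonId(match["rank"], match["label"], int(match["serial"]))


taxon = parse_taxon_id("VT-ORD-WORM-001")
print(taxon, taxon.rank_code, taxon.serial)
```

Keeping the stable identifier as a structured value and the common name as a separate, mutable field matches the stated intent that only the common name may be updated.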

VIRO//TAXON intentionally avoids the word "virus" as a rank name to prevent confusion with biological virology terminology. The word "malware" is used at the Domain level; the ranks below reuse generic taxonomic rank names (Class through Strain) with malware-specific definitions, signaling that the framework is an analogy, not an ontological claim.

03 // Rank Hierarchy

// The Seven-Rank Hierarchy

VIRO//TAXON defines seven ranks organized from the most general (Domain) to the most specific (Strain). Each rank maps to both a biological virology analog and a set of primary discriminating characters.

Rank	Virology Analog	Taxon Concept	Definition and Discriminators	Examples / Members
DOMAIN	Realm / Kingdom	Malicious Code	All intentionally harmful software artifacts. Discriminator: purposive harmful functionality.	Singleton: all malware exists in one domain.
CLASS	Class / Phylum	Execution Ecology	Primary execution substrate and dependency model. Discriminator I (Genomic Substrate).	Win-PE · Linux-ELF · macOS-MachO · Script/Macro · Mobile · Firmware · Container
ORDER	Order	Replication Strategy	Propagation model and replication dependency. Discriminator III (Replication Strategy).	Self-Replicating · User-Executed · Supply-Chain · Lateral-Movement · Drive-By
FAMILY	Family	Shared Architectural Lineage	Common codebase ancestry, shared build toolchain, packer heritage, C2 protocol family. Discriminator VII (Phylogenetic Lineage).	Emotet-lineage · Mirai-lineage · REvil-lineage · Cobalt-Strike-based
GENUS	Genus	Shared Capability Pattern	Dominant payload type + primary evasion category + host tropism cluster. Discriminators II, IV, V, VI combined.	Ransomware-Enterprise · Wiper-OT · Stealer-Browser · Backdoor-Hypervisor
SPECIES	Species	Operationally Distinct Family	Named, operationally distinct malware family with stable, reproducible behavioral signature and shared code heritage. ≥65% code similarity threshold to genus exemplar.	LockBit · BlackCat/ALPHV · Conti · WannaCry · Industroyer
STRAIN	Strain / Variant	Campaign-Specific Variant	Builder-derived or operator-customized variant of a Species. Differs in config, obfuscation layer, or minor payload modification. ≥85% code similarity to parent species.	LockBit 3.0 · LockBit Green · BlackCat v2 · WannaCry-NSA-variant

[Figure: VIRO//TAXON hierarchy tree, example classification of LockBit 3.0]
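A classification path through the seven ranks can be represented as a simple ordered mapping. The sketch below fills in LockBit 3.0 using only values that appear in this document (Win-PE, Lateral-Movement, Ransomware-Enterprise, LockBit, LockBit 3.0); the Family entry is left unassigned because the text does not state it. The `Rank` enum and `lineage` helper are illustrative names, not part of any published schema.

```python
from enum import IntEnum
from typing import Optional


class Rank(IntEnum):
    """The seven ranks, ordered from most general to most specific."""
    DOMAIN = 1
    CLASS = 2
    ORDER = 3
    FAMILY = 4
    GENUS = 5
    SPECIES = 6
    STRAIN = 7


# Values taken from the rank table and the TVS comparison table in this document.
lockbit_3: dict[Rank, Optional[str]] = {
    Rank.DOMAIN: "Malicious Code",
    Rank.CLASS: "Win-PE",
    Rank.ORDER: "Lateral-Movement",
    Rank.FAMILY: None,  # not stated in the text, so left unassigned
    Rank.GENUS: "Ransomware-Enterprise",
    Rank.SPECIES: "LockBit",
    Rank.STRAIN: "LockBit 3.0",
}


def lineage(path: dict[Rank, Optional[str]]) -> str:
    """Render a classification path from Domain down to Strain."""
    return " > ".join(
        f"{rank.name.title()}: {path.get(rank) or 'unassigned'}" for rank in Rank
    )


print(lineage(lockbit_3))
```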

04 // Discriminators

// Seven Primary Discriminators

Each discriminator maps a biological virology classification character to a malware-observable attribute. All seven are assessed for every specimen being classified. Higher ranks require agreement on fewer discriminators; lower ranks require high agreement on all seven.

DISC I · GENOMIC SUBSTRATE

Code Architecture & Form

Bio: RNA/DNA type, sense, topology

The malware's internal code form: native binary, bytecode, script, macro, polyglot, fileless/in-memory, packed/protected, modular loader+plugins. Determines execution dependency chain and static analysis surface.

PE Binary · ELF · Fileless · Macro · Shellcode · Container Image

DISC II · STRUCTURAL MORPHOLOGY

Mutation & Polymorphism Class

Bio: capsid symmetry, envelope, morphology

Obfuscation and morphological strategy: static binary, packed, polymorphic engine (substitution/transposition), metamorphic (full rewrite), AI-assisted variant generation. Controls detection surface width.

Static · Packed · Polymorphic · Metamorphic · AI-Mutating

DISC III · REPLICATION STRATEGY

Propagation & Transmission

Bio: replication strategy, transmission route

How the malware acquires new hosts: user execution, self-replication (worm), lateral movement via credential abuse, supply-chain injection, drive-by exploitation, removable media, phishing delivery.

Worm · Phishing · Lateral Move · Supply Chain · Drive-By

DISC IV · HOST TROPISM

Target Environment Specificity

Bio: host range, tissue tropism, receptor binding

Preferred target environment and the degree of tropism: broad-spectrum vs. narrow. Consumer endpoints, enterprise servers, OT/ICS, IoT, cloud workloads, hypervisors, identity systems, SaaS tenants, mobile.

Enterprise · OT/ICS · IoT · Cloud · Mobile · Hypervisor

DISC V · PATHOGENESIS

Effect on Host & Operational Impact

Bio: pathogenicity, virulence, cytopathic effect

Primary operational impact: espionage, credential theft, persistence, encryption/extortion, destructive wipe, botnet enrollment, DDoS, proxying, cryptomining, data staging/exfiltration, sabotage.

Ransomware · Wiper · Stealer · Backdoor · Botnet · Miner

DISC VI · IMMUNE EVASION

Host Defense Suppression

Bio: immune escape, immunosuppression, latency

Active defense suppression: AV/EDR kill, ETW patching, process injection, anti-VM/debug, signed-driver abuse, LOLBin execution, dormant staging, environment-aware behavior, memory-only residency.

Anti-AV · ETW Patch · Injection · LOLBin · Dormancy · BYOVD

DISC VII · PHYLOGENETIC LINEAGE

Code Heritage & Evolutionary Ancestry

Bio: sequence similarity, phylogenetic distance

Code reuse, shared functions, build toolchain similarity, C2 protocol semantics, config structure overlap, shared packer heritage. Primary discriminator at Species/Strain level. Measured via TLSH/BinDiff similarity scores.

TLSH >85% · Shared C2 · Build Artifacts · Config Heritage

05 // Interactive Classifier

// Interactive Specimen Classifier

Step through the seven discriminators to classify a malware specimen within the VIRO//TAXON hierarchy and compute its Taxonomic Virulence Score.

VIRO//TAXON SPECIMEN CLASSIFIER

DISC I — What is the primary code substrate?

Native Binary (PE/ELF)

Fileless / In-Memory

Script / Macro

Packed / Protected Binary

Firmware / Bootkit

Modular Loader + Plugins

DISC II — What is the mutation / obfuscation class?

Static (no obfuscation)

Packed / Encrypted

Polymorphic Engine

Metamorphic (full rewrite)

AI-Assisted Variant Generation

DISC III — What is the primary propagation strategy?

User-Executed / Phishing

Self-Replicating Worm

Lateral Movement (credential)

Supply Chain Injection

Drive-By / Browser Exploit

DISC IV — What is the primary host environment (tropism)?

Consumer Endpoint

Enterprise Server / DC

OT / ICS / SCADA

Cloud / Container

Hypervisor / ESXi

Mobile / IoT

DISC V — What is the primary operational impact (pathogenesis)?

Ransomware / Extortion

Destructive Wiper

Espionage / Credential Theft

Botnet / DDoS Enrollment

Persistence / Backdoor

Cryptomining

DISC VI — What is the primary immune evasion mechanism?

None / Minimal

Packing / Encryption only

Process Injection / Hollowing

BYOVD / Signed Driver Abuse

ETW Patch + LOLBin + Fileless

DISC VII — What is the phylogenetic lineage confidence?

Novel / No known lineage (emergent)

Distantly related (<50% code similarity)

Related species (50–85% similarity)

Known strain (>85% similarity)

Shared build / C2 / config heritage

06 // Mathematical Basis

// Mathematical Formalization

6.1 Polythetic Membership Criterion

For a malware specimen $M$ to be assigned to taxon $T$ at rank $r$, it must satisfy a weighted discriminator threshold. Let $\mathbf{d}(M) = (d_1, \ldots, d_7) \in [0,10]^7$ be the discriminator scores for $M$, and let $\mathbf{w}_r$ be the rank-specific weight vector.

Definition 1 — Polythetic Membership Function

$$M \in T_r \iff \mathbf{w}_r \cdot \mathbf{d}(M) \geq \theta_r \quad \text{and} \quad \|T_r\text{-exemplar} - \mathbf{d}(M)\|_2 \leq \delta_r$$

$\theta_r$ — rank-specific inclusion threshold (higher ranks have lower thresholds)
$\delta_r$ — maximum Euclidean distance from taxon exemplar in discriminator space
$\|T_r\text{-exemplar} - \mathbf{d}(M)\|_2$ — L2 distance from the defining exemplar of taxon $T$ at rank $r$
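Definition 1 can be sketched in a few lines of Python. The weight vector is the Species row of the matrix in Definition 3; the exemplar vector, the specimen scores, and the values chosen for $\theta_r$ and $\delta_r$ are hypothetical, since the text does not publish calibrated thresholds.

```python
import math
from typing import Sequence

# Species row of the weight matrix W (Definition 3).
W_SPECIES = (0.05, 0.18, 0.05, 0.05, 0.05, 0.20, 0.42)


def is_member(d: Sequence[float], exemplar: Sequence[float],
              weights: Sequence[float], theta: float, delta: float) -> bool:
    """Definition 1: weighted score at least theta AND L2 distance to exemplar at most delta."""
    weighted_score = sum(w * x for w, x in zip(weights, d))
    distance = math.dist(exemplar, d)  # Euclidean (L2) distance
    return weighted_score >= theta and distance <= delta


# Hypothetical exemplar, specimens, and thresholds for illustration only.
exemplar = (8, 8, 7, 9, 9, 9, 9)
close_specimen = (8, 7, 7, 9, 9, 9, 8)
distant_specimen = (8, 2, 7, 9, 9, 9, 2)
THETA, DELTA = 7.0, 2.0

print(is_member(close_specimen, exemplar, W_SPECIES, THETA, DELTA))
print(is_member(distant_specimen, exemplar, W_SPECIES, THETA, DELTA))
```

The two conditions play different roles: the weighted score checks that the specimen is strong on the characters that matter at this rank, while the distance bound keeps it near the taxon's defining exemplar in discriminator space.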

6.2 Jaccard Code Similarity for Phylogenetic Rank Assignment

At Species and Strain level, phylogenetic lineage (Discriminator VII) is quantified using set-theoretic similarity over function-hash fingerprint sets, analogous to ICTV's sequence identity thresholds.

Definition 2 — Phylogenetic Jaccard Similarity

$$J(A, B) = \frac{|F(A) \cap F(B)|}{|F(A) \cup F(B)|}$$

$F(A)$ — set of function-level hash fingerprints for sample $A$ (e.g., TLSH fuzzy hashes, MinHash sketches)
Species assignment requires $J(M, \text{exemplar}) \geq 0.65$
Strain assignment requires $J(M, \text{parent species}) \geq 0.85$
Below $J = 0.30$: novel genus candidate; below $J = 0.10$: novel family candidate.
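Definition 2 and its thresholds translate directly into code. In this sketch the fingerprints are plain strings standing in for function-level hashes, and a single reference sample is used for both the species and strain comparisons, which is a simplifying assumption. The band between 0.30 and 0.65 is reported as unresolved because the text does not assign it a meaning.

```python
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """J(A, B) = |F(A) & F(B)| / |F(A) | F(B)|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def lineage_call(j: float) -> str:
    """Map a similarity score onto the thresholds stated in Definition 2."""
    if j >= 0.85:
        return "strain"
    if j >= 0.65:
        return "species"
    if j < 0.10:
        return "novel family candidate"
    if j < 0.30:
        return "novel genus candidate"
    return "unresolved"  # 0.30 <= J < 0.65 is not specified in the text


# Hypothetical fingerprint sets: the sample shares 9 of the exemplar's 10 functions
# and adds one function of its own, so |intersection| = 9 and |union| = 11.
exemplar = frozenset(f"h{i:02d}" for i in range(10))
sample = frozenset(f"h{i:02d}" for i in range(9)) | {"h_new"}

score = jaccard(sample, exemplar)
print(round(score, 3), lineage_call(score))
```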

6.3 Rank-Specific Weight Vectors

Definition 3 — Discriminator Weight Matrix by Rank

$$W = \begin{pmatrix} w_{\text{Domain}} \\ w_{\text{Class}} \\ w_{\text{Order}} \\ w_{\text{Family}} \\ w_{\text{Genus}} \\ w_{\text{Species}} \\ w_{\text{Strain}} \end{pmatrix} = \begin{pmatrix} 0.20 & 0.14 & 0.14 & 0.13 & 0.13 & 0.13 & 0.13 \\ 0.35 & 0.14 & 0.12 & 0.10 & 0.10 & 0.10 & 0.09 \\ 0.15 & 0.12 & 0.35 & 0.12 & 0.10 & 0.08 & 0.08 \\ 0.10 & 0.10 & 0.12 & 0.15 & 0.12 & 0.12 & 0.29 \\ 0.10 & 0.18 & 0.10 & 0.12 & 0.18 & 0.15 & 0.17 \\ 0.05 & 0.18 & 0.05 & 0.05 & 0.05 & 0.20 & 0.42 \\ 0.05 & 0.14 & 0.12 & 0.33 & 0.12 & 0.12 & 0.12 \end{pmatrix}$$

Rows = ranks; columns = discriminators I–VII. Each row sums to 1.0. Higher-rank assignments weight broader ecological and genomic characters; lower-rank assignments weight phylogenetic similarity and structural morphology more heavily.
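A short Python check confirms that each rank's weights are normalized and shows which discriminator carries the largest weight at each rank. The values are copied from Definition 3; the variable names are illustrative.

```python
import math

RANKS = ("Domain", "Class", "Order", "Family", "Genus", "Species", "Strain")
NUMERALS = ("I", "II", "III", "IV", "V", "VI", "VII")

# Weight matrix W from Definition 3: one row per rank, one column per discriminator.
W = (
    (0.20, 0.14, 0.14, 0.13, 0.13, 0.13, 0.13),
    (0.35, 0.14, 0.12, 0.10, 0.10, 0.10, 0.09),
    (0.15, 0.12, 0.35, 0.12, 0.10, 0.08, 0.08),
    (0.10, 0.10, 0.12, 0.15, 0.12, 0.12, 0.29),
    (0.10, 0.18, 0.10, 0.12, 0.18, 0.15, 0.17),
    (0.05, 0.18, 0.05, 0.05, 0.05, 0.20, 0.42),
    (0.05, 0.14, 0.12, 0.33, 0.12, 0.12, 0.12),
)

# Row sums, compared with a tolerance because of floating-point rounding.
row_sums = {rank: sum(row) for rank, row in zip(RANKS, W)}
normalized = all(math.isclose(total, 1.0, abs_tol=1e-9) for total in row_sums.values())

# First discriminator holding the largest weight in each row (ties resolve to the earlier column).
dominant = {rank: NUMERALS[row.index(max(row))] for rank, row in zip(RANKS, W)}

print("all rows normalized:", normalized)
for rank in RANKS:
    print(f"{rank:8s} dominant discriminator: {dominant[rank]}")
```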

6.4 Taxonomic Virulence Score

Definition 4 — Taxonomic Virulence Score (TVS)

$$\text{TVS}(M) = 10 \sum_{i=1}^{7} \omega_i \, d_i(M)$$

where $\boldsymbol{\omega} = (0.18, 0.12, 0.16, 0.14, 0.18, 0.14, 0.08)$ is the global severity weight vector, whose components sum to 1.0,
$d_i(M) \in [0, 10]$ is the score on discriminator $i$, and therefore TVS $\in [0, 100]$.

Severity tiers: TVS 0–29: Low · 30–54: Moderate · 55–74: High · 75–89: Critical · 90–100: Catastrophic
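Definition 4 and the severity tiers can be sketched as follows. The weights and tier boundaries come from the text; the specimen scores are hypothetical. Because the tiers are listed as integer ranges, this sketch assumes each tier begins at its lower bound (for example, 29.5 is still Low and 30.0 is Moderate).

```python
from typing import Sequence

# Global severity weight vector from Definition 4.
OMEGA = (0.18, 0.12, 0.16, 0.14, 0.18, 0.14, 0.08)


def tvs(scores: Sequence[float]) -> float:
    """Taxonomic Virulence Score: 10 times the omega-weighted sum of seven scores in [0, 10]."""
    if len(scores) != len(OMEGA):
        raise ValueError("expected exactly seven discriminator scores")
    if any(not 0.0 <= s <= 10.0 for s in scores):
        raise ValueError("each discriminator score must lie in [0, 10]")
    return 10.0 * sum(w * s for w, s in zip(OMEGA, scores))


def severity_tier(score: float) -> str:
    """Map a TVS onto the tiers in the text (assumed lower-bound-inclusive)."""
    if score < 30:
        return "Low"
    if score < 55:
        return "Moderate"
    if score < 75:
        return "High"
    if score < 90:
        return "Critical"
    return "Catastrophic"


# Hypothetical specimen scores for discriminators I to VII.
example_scores = (6, 4, 7, 5, 8, 6, 3)
example_tvs = tvs(example_scores)
print(round(example_tvs, 1), severity_tier(example_tvs))
```

For the hypothetical scores above the weighted sum is 5.90, so the score comes out at 59.0, which falls in the High tier.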

07 // Phylogenetic Map

// Phylogenetic Family Map

The radial tree below visualizes the VIRO//TAXON family-level relationships across documented malware lineages. Branch length encodes phylogenetic distance (Jaccard distance, $1 - J$); color encodes the dominant pathogenesis class.

[Figure: Malware phylogenetic tree, family level (illustrative corpus)]
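The input to such a tree is a pairwise distance matrix. The sketch below computes one, taking branch length as the Jaccard distance $1 - J(A, B)$; the lineage names and fingerprint sets are hypothetical placeholders, and tree construction itself (for example by hierarchical clustering) is left out.

```python
from itertools import combinations


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - J(A, B); two empty sets are treated as identical (distance 0.0)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical function-hash fingerprints per lineage.
fingerprints = {
    "lineage_a": frozenset({"f1", "f2", "f3", "f4"}),
    "lineage_b": frozenset({"f3", "f4", "f5", "f6"}),
    "lineage_c": frozenset({"f7", "f8"}),
}

# Upper triangle of the symmetric distance matrix, keyed by sorted name pairs.
distances = {
    (x, y): jaccard_distance(fingerprints[x], fingerprints[y])
    for x, y in combinations(sorted(fingerprints), 2)
}

for (x, y), dist in distances.items():
    print(f"{x} <-> {y}: {dist:.3f}")
```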

08 // Integration Layer

// MAEC + STIX 2.1 Integration Mapping

VIRO//TAXON is not a replacement for MAEC or STIX — it is a classification logic layer on top. The mapping below shows which existing MAEC attributes and STIX SDO properties feed each discriminator, and which new properties the VIRO//TAXON extension contributes.

MAEC 5.0 / STIX 2.1 Attributes (Source)

Source attribute	Content
maec:BehaviorCollection	behaviors, action-collections, object-refs
maec:Package / analysis.tool	static analysis results, entropy, packer id
stix2:malware.capabilities	exfiltrates-data, accesses-remote-machines, evades-av, degrades-security-software
stix2:malware.architecture_execution_envs	x86, ARM, MIPS, JavaScript, PowerShell
stix2:malware.operating_system_refs	Windows, Linux, Android, iOS, QNX
stix2:relationship(variant-of, derived-from)	lineage edges in STIX graph
maec:Malware.obfuscation_methods	packing, polymorphic-code, anti-analysis

VIRO//TAXON Extensions (Output)

Extension property	Values
vt:discriminator[V] pathogenesis_class	Ransomware · Wiper · Stealer · Backdoor · Botnet
vt:discriminator[I,II] substrate + morphology_class	Fileless · Metamorphic · AI-Mutating · Firmware
vt:discriminator[VI] evasion_tier	0–4 (None → BYOVD + ETW + Fileless)
vt:discriminator[I] ecology_class	WinPE · LinuxELF · Script · Mobile · Container
vt:discriminator[IV] tropism_profile	Enterprise · OT · Hypervisor · Cloud · IoT
vt:discriminator[VII] + vt:rank.family/species/strain	J(A,B) score + formal rank assignment
vt:discriminator[III] replication_strategy	Worm · Phishing · SupplyChain · LateralMove

The VIRO//TAXON extension object is designed to be serializable as a STIX 2.1 custom extension (x-viro-taxon) on the existing malware SDO. This means zero breaking changes to existing CTI pipelines — VIRO//TAXON classification data travels as an optional extension alongside existing STIX malware objects, and ATT&CK technique mappings remain unchanged.
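A rough sketch of what such an object might look like, built with the Python standard library only. The property names inside the extension (`vt_rank`, `vt_tvs`) and the specimen values are hypothetical. This sketch also assumes the extension is registered as a STIX 2.1 extension definition and keyed by that definition's identifier, with x-viro-taxon serving as its human-readable name; the UUIDs are generated placeholders.

```python
import json
import uuid
from datetime import datetime, timezone

# Placeholder identifier for a hypothetical "x-viro-taxon" extension definition.
EXTENSION_ID = f"extension-definition--{uuid.uuid4()}"


def build_malware_sdo(name: str, rank_path: dict, tvs: float) -> dict:
    """Build a STIX 2.1 malware object carrying classification data as a property extension."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "malware",
        "spec_version": "2.1",
        "id": f"malware--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "is_family": False,
        "extensions": {
            EXTENSION_ID: {
                "extension_type": "property-extension",
                "vt_rank": rank_path,  # hypothetical property name
                "vt_tvs": tvs,         # hypothetical property name
            }
        },
    }


# Hypothetical specimen; the rank values reuse labels from the rank hierarchy.
sdo = build_malware_sdo(
    "example-specimen",
    {"class": "Linux-ELF", "order": "Drive-By"},
    59.0,
)
print(json.dumps(sdo, indent=2))
```

Consumers that do not understand the extension can ignore the `extensions` entry and still process the rest of the malware object, which is what keeps the change non-breaking.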

09 // TVS Scoring

// TVS Comparative Scoring

Taxonomic Virulence Scores for five canonical malware specimens, illustrating the discriminator profiles across the severity spectrum.

TVS PROFILE: LOCKBIT 3.0

Discriminator	Score
I · Genomic Substrate	8.0
II · Morphology	8.0
III · Replication	7.0
IV · Host Tropism	9.0
V · Pathogenesis	9.0
VI · Immune Evasion	9.0
VII · Phylogeny	7.0

TVS (LockBit 3.0): 82.2 / 100. Applying Definition 4 to the scores above gives 10 × (0.18·8 + 0.12·8 + 0.16·7 + 0.14·9 + 0.18·9 + 0.14·9 + 0.08·7) = 10 × 8.22 = 82.2.

Specimen	Class / Order	Genus	TVS	Severity Tier
LockBit 3.0	WinPE · Lateral-Move	Ransomware-Enterprise	82.2	CRITICAL
NotPetya	WinPE · Self-Replicating	Wiper-OT	92.1	CATASTROPHIC
Industroyer 2	WinPE · Targeted Deploy	Sabotage-ICS	88.4	CRITICAL
Emotet (epoch 5)	WinPE · Phishing	Loader-BotEnrollment	72.3	HIGH
XMRig-based miner	Linux-ELF · Drive-By	Miner-Cloud	38.7	MODERATE

References

// References

ICTV (2022). The ICTV Report on Virus Classification. International Committee on Taxonomy of Viruses. https://ictv.global/report
MITRE (2022). MAEC Version 5.0 Specification. MITRE Corporation. https://maecproject.github.io
OASIS (2021). STIX Version 2.1 Committee Specification. https://docs.oasis-open.org/cti/stix/v2.1/
Ye, Y. et al. (2017). A Survey on Malware Detection Using Data Mining Techniques. ACM Computing Surveys, 50(3), 1–40.
Canzanese, R. et al. (2015). Toward an Automatic, Online Behavioral Malware Classification System. IEEE SRDS 2015.
Vadrevu, P. & Perdisci, R. (2016). MAXS: Scaling Malware Execution with Sequential Multi-Hypothesis Testing. AsiaCCS 2016.
Cimitile, A. et al. (2017). Talos: No More Ransomware Victims. J. Universal Computer Science, 23(9), 862–876.
Karbab, E. & Debbabi, M. (2021). MalDy: Portable Data-Driven Malware Detection Using Natural Language Processing. Digital Investigation, 36.
Nataraj, L. et al. (2011). Malware Images: Visualization and Automatic Classification. VizSec 2011.
Souri, A. & Hosseini, R. (2018). A State-of-the-Art Survey of Malware Detection Approaches Using Data Mining Techniques. Human-centric Computing and Information Sciences, 8(1), 1–22.
Miramirkhani, N. et al. (2017). Spotless Sandboxes: Evading Malware Analysis Systems Using Wear-and-Tear Artifacts. IEEE S&P 2017.
Bayer, U. et al. (2009). Scalable, Behavior-Based Malware Clustering. NDSS 2009.