VIRO//TAXON  ·  Kulprit Studios Research  ·  2025

Virologically-Aligned
Malware Classification
// A Polythetic Taxonomy

ICTV-Inspired Hierarchical Characterization Extended from MAEC + STIX 2.1
S. Murphy
CISO / DPO, Center for Internet Security
DSc Candidate, Marymount University
Working Paper · Kulprit Studios Research
viro.kulpritstudios.com
malware taxonomy · polythetic classification · ICTV analogy · MAEC extension · phylogenetic lineage · TVS index · STIX 2.1 · host tropism
00 // Abstract

// Abstract

No universally adopted, biologically-coherent malware taxonomy exists. Existing schemes — MAEC's attribute vocabulary, STIX 2.1's malware object model, vendor naming conventions, and academic lifecycle frameworks — provide valuable but fragmented foundations. This paper proposes VIRO//TAXON, a polythetic malware classification framework modeled explicitly on the International Committee on Taxonomy of Viruses (ICTV) methodology. Rather than replacing existing standards, the framework extends MAEC as its canonical attribute vocabulary and STIX 2.1 as its transport and interoperability layer, adding a seven-level hierarchical classifier (Domain → Strain) and seven primary discriminators drawn from structural, ecological, replicative, tropism, pathogenic, evasion, and phylogenetic characteristics. We introduce the Taxonomic Virulence Score (TVS), a composite metric quantifying taxonomic severity across all seven discriminators, and demonstrate the framework's application to canonical malware families spanning ransomware, APT implants, wormable exploits, and supply-chain malware. The framework is designed for integration with ATT&CK-mapped CTI workflows and the KLEPT//VIZ cryptovirological assessment platform.
01 // Prior Art

// Prior Art Survey

Three standards families and a substantial academic literature form the prior art for malware classification. None achieves the combination of hierarchical stability, lineage-awareness, and virological coherence that ICTV provides for biological viruses.

1.1 Existing Standards Landscape

| Framework | Scope | Hierarchical? | Lineage-Aware? | Virological Analog | Gap |
|---|---|---|---|---|---|
| MAEC 5.0 | Malware attribute encoding: behaviors, artifacts, relationships | PARTIAL | LIMITED | Phenotypic characterization | No formal rank hierarchy; no phylogenetic discriminator |
| STIX 2.1 | CTI interoperability; malware SDO with capability vocab | NO | NO | Transport / exchange layer | Classification logic entirely absent; naming is vendor-defined |
| ICTV (biological) | Biological virus taxonomy: 15 ranks (realm to species), polythetic, sequence-primary | YES | YES | Reference model | Biological substrate; not directly applicable to software |
| Malware Evaluator | Feature-based classification across malware lifecycle stages | PARTIAL | NO | Lifecycle staging | No rank hierarchy; platform-centric features dominate |
| IoT Lifecycle | Infection vectors, payload properties, persistence, C2 for IoT | PARTIAL | NO | Transmission + tropism | Domain-restricted; no generalizable hierarchy |
| Vendor Naming | De facto labels (WannaCry, Emotet, BlackCat) | NO | NO | Common name (pre-ICTV chaos) | No stability; no discriminating criteria; vendor-inconsistent |
Assessment: The field has the attribute vocabulary (MAEC) and the transport layer (STIX). What is absent is the classification logic — the decision procedure that takes a bundle of attributes and places a sample into a stable, reproducible position within a ranked hierarchy. That is precisely what ICTV provides for biology, and precisely what VIRO//TAXON contributes for malware.

1.2 Why ICTV is the Right Model

ICTV's core methodological insight is polythetic classification: no single character is either necessary or sufficient to define a taxon. Membership is determined by possessing a sufficient number of characters from a defined set. This suits malware, where no single feature cleanly separates all members of a family: ransomware families, for example, vary widely in their cryptographic schemes, propagation methods, and target environments. The polythetic model accommodates this variance without per-rule exception overrides.

ICTV also prioritizes sequence similarity as a primary discriminator at lower ranks (genus, species), which maps naturally to code-similarity analysis tools (BinDiff, DeepBinDiff, TLSH clustering) already in use by malware analysts. The phylogenetic tier of VIRO//TAXON directly inherits this approach.

02 // Framework Design

// Framework Design Principles

VIRO//TAXON is designed around four explicit principles inherited from ICTV methodology and adapted for the malware domain:

1. Polythetic membership. No single discriminator gates inclusion in a taxon. A specimen is classified at a given rank by scoring above threshold on a weighted combination of discriminators, not by satisfying a necessary condition. This prevents classification failure when one feature is absent or unobservable.

2. Rank stability over variant instability. Higher ranks (Domain, Class, Order) should be stable across years; lower ranks (Species, Strain) may evolve rapidly. This mirrors ICTV, where realm-level assignments rarely change but species designations update annually. Analysts can rely on Order-level classifications for long-horizon threat modeling without chasing per-campaign updates.

3. Machine-readable + human-readable naming. Each taxon receives both a formal VIRO//TAXON identifier (e.g., VT-ORD-WORM-001) and a common name derived from dominant behavioral character. The identifier is stable; the common name may be updated.

4. MAEC-grounded, STIX-transported. VIRO//TAXON does not replace the MAEC attribute vocabulary. It adds a classification layer on top: given a MAEC-encoded sample, VIRO//TAXON provides the decision procedure to assign it to a rank and compute its TVS. The result is serialized as a STIX 2.1 extension object for CTI pipeline integration.

VIRO//TAXON intentionally avoids the word "virus" as a rank name to prevent confusion with biological virology terminology. The word "malware" is used at the Domain level; ranks below use novel terminology to signal the framework is an analogy, not an ontological claim.
03 // Rank Hierarchy

// The Seven-Rank Hierarchy

VIRO//TAXON defines seven ranks organized from the most general (Domain) to the most specific (Strain). Each rank maps to both a biological virology analog and a set of primary discriminating characters.

| Rank | Biological Analog | Taxon Character | Definition and Discriminator | Examples |
|---|---|---|---|---|
| DOMAIN | Realm / Kingdom | Malicious Code | All intentionally harmful software artifacts. Discriminator: purposive harmful functionality. | Singleton: all malware exists in one domain. |
| CLASS | Class / Phylum | Execution Ecology | Primary execution substrate and dependency model. Discriminator I (Genomic Substrate). | Win-PE · Linux-ELF · macOS-MachO · Script/Macro · Mobile · Firmware · Container |
| ORDER | Order | Replication Strategy | Propagation model and replication dependency. Discriminator III (Replication Strategy). | Self-Replicating · User-Executed · Supply-Chain · Lateral-Movement · Drive-By |
| FAMILY | Family | Shared Architectural Lineage | Common codebase ancestry, shared build toolchain, packer heritage, C2 protocol family. Discriminator VII (Phylogenetic Lineage). | Emotet-lineage · Mirai-lineage · REvil-lineage · Cobalt-Strike-based |
| GENUS | Genus | Shared Capability Pattern | Dominant payload type + primary evasion category + host tropism cluster. Discriminators II, IV, V, VI combined. | Ransomware-Enterprise · Wiper-OT · Stealer-Browser · Backdoor-Hypervisor |
| SPECIES | Species | Operationally Distinct Family | Named, operationally distinct malware family with stable, reproducible behavioral signature and shared code heritage. ≥65% code similarity threshold to genus exemplar. | LockBit · BlackCat/ALPHV · Conti · WannaCry · Industroyer |
| STRAIN | Strain / Variant | Campaign-Specific Variant | Builder-derived or operator-customized variant of a Species. Differs in config, obfuscation layer, or minor payload modification. ≥85% code similarity to parent species. | LockBit 3.0 · LockBit Green · BlackCat v2 · WannaCry-NSA-variant |
[Figure: VIRO//TAXON hierarchy tree, example classification of LockBit 3.0]
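To make the rank ordering concrete, the following is a minimal Python sketch of how a classification path through the seven ranks might be represented. The `Rank` and `Classification` names are hypothetical, not part of the framework. The LockBit 3.0 values are taken from the rank descriptions and the comparative scoring table in this paper; the Family rank is left unassigned because the text does not state it.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class Rank(IntEnum):
    """The seven VIRO//TAXON ranks, ordered from most general to most specific."""
    DOMAIN = 1
    CLASS = 2
    ORDER = 3
    FAMILY = 4
    GENUS = 5
    SPECIES = 6
    STRAIN = 7


@dataclass(frozen=True)
class Classification:
    """A specimen's position in the hierarchy; ranks without a value stay None."""
    path: Dict[Rank, Optional[str]]

    def lineage(self) -> str:
        # Walk the ranks top-down and mark unassigned ranks explicitly.
        return " > ".join(
            f"{rank.name.title()}: {self.path.get(rank) or 'unassigned'}"
            for rank in Rank
        )

    def deepest_assigned(self) -> Rank:
        # The most specific rank that carries a value.
        return max(rank for rank in Rank if self.path.get(rank))


lockbit3 = Classification(path={
    Rank.DOMAIN: "Malicious Code",
    Rank.CLASS: "Win-PE",
    Rank.ORDER: "Lateral-Movement",
    Rank.FAMILY: None,  # not stated in the source text
    Rank.GENUS: "Ransomware-Enterprise",
    Rank.SPECIES: "LockBit",
    Rank.STRAIN: "LockBit 3.0",
})
print(lockbit3.lineage())
```

Using an ordered enumeration keeps "higher rank" and "lower rank" comparisons explicit, which matters once thresholds and weights vary by rank.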
04 // Discriminators

// Seven Primary Discriminators

Each discriminator maps a biological virology classification character to a malware-observable attribute. All seven are assessed for every specimen being classified. Higher ranks require agreement on fewer discriminators; lower ranks require high agreement on all seven.

DISC I · GENOMIC SUBSTRATE
Code Architecture & Form
Bio: RNA/DNA type, sense, topology
The malware's internal code form: native binary, bytecode, script, macro, polyglot, fileless/in-memory, packed/protected, modular loader+plugins. Determines execution dependency chain and static analysis surface.
PE Binary · ELF · Fileless · Macro · Shellcode · Container Image
DISC II · STRUCTURAL MORPHOLOGY
Mutation & Polymorphism Class
Bio: capsid symmetry, envelope, morphology
Obfuscation and morphological strategy: static binary, packed, polymorphic engine (substitution/transposition), metamorphic (full rewrite), AI-assisted variant generation. Controls detection surface width.
Static · Packed · Polymorphic · Metamorphic · AI-Mutating
DISC III · REPLICATION STRATEGY
Propagation & Transmission
Bio: replication strategy, transmission route
How the malware acquires new hosts: user execution, self-replication (worm), lateral movement via credential abuse, supply-chain injection, drive-by exploitation, removable media, phishing delivery.
Worm · Phishing · Lateral Move · Supply Chain · Drive-By
DISC IV · HOST TROPISM
Target Environment Specificity
Bio: host range, tissue tropism, receptor binding
Preferred target environment and the degree of tropism: broad-spectrum vs. narrow. Consumer endpoints, enterprise servers, OT/ICS, IoT, cloud workloads, hypervisors, identity systems, SaaS tenants, mobile.
Enterprise · OT/ICS · IoT · Cloud · Mobile · Hypervisor
DISC V · PATHOGENESIS
Effect on Host & Operational Impact
Bio: pathogenicity, virulence, cytopathic effect
Primary operational impact: espionage, credential theft, persistence, encryption/extortion, destructive wipe, botnet enrollment, DDoS, proxying, cryptomining, data staging/exfiltration, sabotage.
Ransomware · Wiper · Stealer · Backdoor · Botnet · Miner
DISC VI · IMMUNE EVASION
Host Defense Suppression
Bio: immune escape, immunosuppression, latency
Active defense suppression: AV/EDR kill, ETW patching, process injection, anti-VM/debug, signed-driver abuse, LOLBin execution, dormant staging, environment-aware behavior, memory-only residency.
Anti-AV · ETW Patch · Injection · LOLBin · Dormancy · BYOVD
DISC VII · PHYLOGENETIC LINEAGE
Code Heritage & Evolutionary Ancestry
Bio: sequence similarity, phylogenetic distance
Code reuse, shared functions, build toolchain similarity, C2 protocol semantics, config structure overlap, shared packer heritage. Primary discriminator at Species/Strain level. Measured via TLSH/BinDiff similarity scores.
TLSH >85% · Shared C2 · Build Artifacts · Config Heritage
05 // Interactive Classifier

// Interactive Specimen Classifier

Step through the seven discriminators to classify a malware specimen within the VIRO//TAXON hierarchy and compute its Taxonomic Virulence Score.

VIRO//TAXON SPECIMEN CLASSIFIER
DISC I — What is the primary code substrate?
Native Binary (PE/ELF)
Fileless / In-Memory
Script / Macro
Packed / Protected Binary
Firmware / Bootkit
Modular Loader + Plugins
DISC II — What is the mutation / obfuscation class?
Static (no obfuscation)
Packed / Encrypted
Polymorphic Engine
Metamorphic (full rewrite)
AI-Assisted Variant Generation
DISC III — What is the primary propagation strategy?
User-Executed / Phishing
Self-Replicating Worm
Lateral Movement (credential)
Supply Chain Injection
Drive-By / Browser Exploit
DISC IV — What is the primary host environment (tropism)?
Consumer Endpoint
Enterprise Server / DC
OT / ICS / SCADA
Cloud / Container
Hypervisor / ESXi
Mobile / IoT
DISC V — What is the primary operational impact (pathogenesis)?
Ransomware / Extortion
Destructive Wiper
Espionage / Credential Theft
Botnet / DDoS Enrollment
Persistence / Backdoor
Cryptomining
DISC VI — What is the primary immune evasion mechanism?
None / Minimal
Packing / Encryption only
Process Injection / Hollowing
BYOVD / Signed Driver Abuse
ETW Patch + LOLBin + Fileless
DISC VII — What is the phylogenetic lineage confidence?
Novel / No known lineage (emergent)
Distantly related (<50% code similarity)
Related species (50–85% similarity)
Known strain (>85% similarity)
Shared build / C2 / config heritage
06 // Mathematical Basis

// Mathematical Formalization

6.1 Polythetic Membership Criterion

For a malware specimen $M$ to be assigned to taxon $T$ at rank $r$, it must satisfy two conditions: a weighted discriminator threshold and a bound on its distance from the taxon exemplar. Let $\mathbf{d}(M) = (d_1, \ldots, d_7) \in [0,10]^7$ be the discriminator scores for $M$, and let $\mathbf{w}_r$ be the rank-specific weight vector.

Definition 1 — Polythetic Membership Function
$$M \in T_r \iff \mathbf{w}_r \cdot \mathbf{d}(M) \geq \theta_r \quad \text{and} \quad \|T_r\text{-exemplar} - \mathbf{d}(M)\|_2 \leq \delta_r$$
$\theta_r$ — rank-specific inclusion threshold (higher ranks have lower thresholds)
$\delta_r$ — maximum Euclidean distance from taxon exemplar in discriminator space
$\|T_r\text{-exemplar} - \mathbf{d}(M)\|_2$ — L2 distance from the defining exemplar of taxon $T$ at rank $r$
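Definition 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, and the exemplar vector, $\theta_r$, and $\delta_r$ values below are made up, since the paper does not publish them. The weights are the Order row of the weight matrix in Definition 3, and the score vector is the LockBit 3.0 profile from the TVS scoring section.

```python
import math
from typing import Sequence


def weighted_score(weights: Sequence[float], scores: Sequence[float]) -> float:
    """Dot product w_r . d(M) over the seven discriminators."""
    if len(weights) != 7 or len(scores) != 7:
        raise ValueError("expected seven weights and seven scores")
    return sum(w * d for w, d in zip(weights, scores))


def exemplar_distance(exemplar: Sequence[float], scores: Sequence[float]) -> float:
    """Euclidean (L2) distance between the taxon exemplar and the specimen."""
    if len(exemplar) != len(scores):
        raise ValueError("exemplar and scores must have the same length")
    return math.sqrt(sum((e - d) ** 2 for e, d in zip(exemplar, scores)))


def is_member(weights, scores, exemplar, theta: float, delta: float) -> bool:
    """Both conditions of Definition 1 must hold: threshold AND distance bound."""
    return (weighted_score(weights, scores) >= theta
            and exemplar_distance(exemplar, scores) <= delta)


ORDER_WEIGHTS = (0.15, 0.12, 0.35, 0.12, 0.10, 0.08, 0.08)  # Order row, Definition 3
LOCKBIT3 = (8.0, 8.0, 7.0, 9.0, 9.0, 9.0, 7.0)              # profile from the scoring section
EXEMPLAR = (8.0, 7.0, 7.0, 9.0, 9.0, 8.0, 7.0)              # hypothetical exemplar vector
THETA, DELTA = 7.0, 2.0                                      # hypothetical thresholds

print(is_member(ORDER_WEIGHTS, LOCKBIT3, EXEMPLAR, THETA, DELTA))
```

Because the weights in each row sum to 1.0 and every score lies in $[0, 10]$, the weighted score also lies in $[0, 10]$, so $\theta_r$ is naturally chosen on that scale.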

6.2 Jaccard Code Similarity for Phylogenetic Rank Assignment

At Species and Strain level, phylogenetic lineage (Discriminator VII) is quantified using set-theoretic similarity over function-hash fingerprint sets, analogous to ICTV's sequence identity thresholds.

Definition 2 — Phylogenetic Jaccard Similarity
$$J(A, B) = \frac{|F(A) \cap F(B)|}{|F(A) \cup F(B)|}$$
$F(A)$: set of function-level hash fingerprints for sample $A$ (e.g., TLSH 128-bucket fuzzy hashes, MinHash sketches)
Species assignment requires $J(M, \text{exemplar}) \geq 0.65$
Strain assignment requires $J(M, \text{parent species}) \geq 0.85$
Below $J = 0.30$: novel genus candidate; below $J = 0.10$: novel family candidate.
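The thresholds above translate directly into code. The sketch below assumes fingerprints are compared by exact equality (for example, hashes of normalized function bodies), because set intersection needs exact matches; fuzzy hashes would first have to be bucketed or clustered. The function names and the toy fingerprint strings are hypothetical. The paper does not say how to treat $0.30 \leq J < 0.65$, so that band is reported as unresolved rather than guessed.

```python
from typing import Set

STRAIN_MIN = 0.85          # J(M, parent species) for Strain assignment
SPECIES_MIN = 0.65         # J(M, exemplar) for Species assignment
NOVEL_GENUS_BELOW = 0.30   # below this: novel genus candidate
NOVEL_FAMILY_BELOW = 0.10  # below this: novel family candidate


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|F(A) ∩ F(B)| / |F(A) ∪ F(B)| over exact-match fingerprint sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets carry no lineage evidence
    return len(a & b) / len(union)


def lineage_verdict(j: float) -> str:
    """Map a similarity score to the verdicts listed in Definition 2.

    Simplification: one reference sample stands in for both the species
    exemplar and the parent species.
    """
    if j >= STRAIN_MIN:
        return "strain"
    if j >= SPECIES_MIN:
        return "species"
    if j < NOVEL_FAMILY_BELOW:
        return "novel family candidate"
    if j < NOVEL_GENUS_BELOW:
        return "novel genus candidate"
    return "unresolved"  # 0.30 <= J < 0.65 is not specified in the source


# Toy fingerprints: the sample shares 9 of 11 distinct functions with the reference.
reference = {f"h{i}" for i in range(10)}
sample = {f"h{i}" for i in range(1, 11)}
score = jaccard(reference, sample)
print(round(score, 3), lineage_verdict(score))
```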

6.3 Rank-Specific Weight Vectors

Definition 3 — Discriminator Weight Matrix by Rank
$$W = \begin{pmatrix} w_{\text{Domain}} \\ w_{\text{Class}} \\ w_{\text{Order}} \\ w_{\text{Family}} \\ w_{\text{Genus}} \\ w_{\text{Species}} \\ w_{\text{Strain}} \end{pmatrix} = \begin{pmatrix} 0.20 & 0.14 & 0.14 & 0.13 & 0.13 & 0.13 & 0.13 \\ 0.35 & 0.14 & 0.12 & 0.10 & 0.10 & 0.10 & 0.09 \\ 0.15 & 0.12 & 0.35 & 0.12 & 0.10 & 0.08 & 0.08 \\ 0.10 & 0.10 & 0.12 & 0.15 & 0.12 & 0.12 & 0.29 \\ 0.10 & 0.18 & 0.10 & 0.12 & 0.18 & 0.15 & 0.17 \\ 0.05 & 0.18 & 0.05 & 0.05 & 0.05 & 0.20 & 0.42 \\ 0.05 & 0.14 & 0.12 & 0.33 & 0.12 & 0.12 & 0.12 \end{pmatrix}$$
Rows = ranks; columns = discriminators I–VII. Each row sums to 1.0. Higher-rank assignments weight broader ecological and genomic characters; lower-rank assignments weight phylogenetic similarity and structural morphology more heavily.
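A small sanity check on the matrix is useful before it is used for scoring. The sketch below transcribes $W$ as printed in Definition 3, verifies that every row is a proper weight vector, and reports which discriminator carries the most weight at a given rank. The helper names are hypothetical.

```python
import math

DISCRIMINATORS = (
    "I Genomic Substrate", "II Structural Morphology", "III Replication Strategy",
    "IV Host Tropism", "V Pathogenesis", "VI Immune Evasion", "VII Phylogenetic Lineage",
)

# Rows transcribed from Definition 3; columns follow DISCRIMINATORS.
W = {
    "Domain":  (0.20, 0.14, 0.14, 0.13, 0.13, 0.13, 0.13),
    "Class":   (0.35, 0.14, 0.12, 0.10, 0.10, 0.10, 0.09),
    "Order":   (0.15, 0.12, 0.35, 0.12, 0.10, 0.08, 0.08),
    "Family":  (0.10, 0.10, 0.12, 0.15, 0.12, 0.12, 0.29),
    "Genus":   (0.10, 0.18, 0.10, 0.12, 0.18, 0.15, 0.17),
    "Species": (0.05, 0.18, 0.05, 0.05, 0.05, 0.20, 0.42),
    "Strain":  (0.05, 0.14, 0.12, 0.33, 0.12, 0.12, 0.12),
}


def check_rows(matrix) -> None:
    """Raise if any row has the wrong length or does not sum to 1.0."""
    for rank, row in matrix.items():
        if len(row) != len(DISCRIMINATORS):
            raise ValueError(f"{rank}: expected {len(DISCRIMINATORS)} weights")
        if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise ValueError(f"{rank}: weights sum to {sum(row)}, not 1.0")


def dominant_discriminator(rank: str) -> str:
    """Name of the most heavily weighted discriminator (first one on ties)."""
    row = W[rank]
    return DISCRIMINATORS[row.index(max(row))]


check_rows(W)
for rank in W:
    print(rank, "->", dominant_discriminator(rank))
```

For example, the Order row is dominated by Discriminator III (0.35) and the Species row by Discriminator VII (0.42).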

6.4 Taxonomic Virulence Score

Definition 4 — Taxonomic Virulence Score (TVS)
$$\text{TVS}(M) = 10 \sum_{i=1}^{7} \omega_i \, d_i(M)$$
where $\boldsymbol{\omega} = (0.18, 0.12, 0.16, 0.14, 0.18, 0.14, 0.08)$ is the global severity weight vector,
$d_i(M) \in [0, 10]$ is the score on discriminator $i$, and TVS $\in [0, 100]$.

Severity tiers: TVS 0–29: Low · 30–54: Moderate · 55–74: High · 75–89: Critical · 90–100: Catastrophic
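As a worked example, the sketch below implements Definition 4 and the severity tiers. The function names are hypothetical. The tier boundaries are given in the text as integer ranges, so the handling of fractional scores (for example, 29.5 falling into Low) is an assumption. The input is the LockBit 3.0 discriminator profile from the TVS scoring section; with the weights as printed in Definition 4 it evaluates to 82.2.

```python
from typing import Sequence

# Global severity weights from Definition 4 (discriminators I-VII).
OMEGA = (0.18, 0.12, 0.16, 0.14, 0.18, 0.14, 0.08)


def tvs(scores: Sequence[float]) -> float:
    """Taxonomic Virulence Score: 10 * sum(omega_i * d_i), in [0, 100]."""
    if len(scores) != len(OMEGA):
        raise ValueError("expected one score per discriminator (seven)")
    if any(not 0.0 <= d <= 10.0 for d in scores):
        raise ValueError("each discriminator score must lie in [0, 10]")
    return 10.0 * sum(w * d for w, d in zip(OMEGA, scores))


def severity_tier(score: float) -> str:
    """Map a TVS to its tier; half-open intervals are an assumption."""
    if score < 30:
        return "Low"
    if score < 55:
        return "Moderate"
    if score < 75:
        return "High"
    if score < 90:
        return "Critical"
    return "Catastrophic"


lockbit3 = (8.0, 8.0, 7.0, 9.0, 9.0, 9.0, 7.0)
score = tvs(lockbit3)
print(round(score, 1), severity_tier(score))
```

Since the weights sum to 1.0, a specimen scoring 10 on every discriminator reaches exactly the upper bound of 100.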
07 // Phylogenetic Map

// Phylogenetic Family Map

The radial tree below visualizes the VIRO//TAXON family-level relationships across documented malware lineages. Branch length encodes phylogenetic distance (inverse Jaccard similarity); color encodes the dominant pathogenesis class.

[Figure: malware phylogenetic tree, family level (illustrative corpus)]
08 // Integration Layer

// MAEC + STIX 2.1 Integration Mapping

VIRO//TAXON is not a replacement for MAEC or STIX — it is a classification logic layer on top. The mapping below shows which existing MAEC attributes and STIX SDO properties feed each discriminator, and which new properties the VIRO//TAXON extension contributes.

MAEC 5.0 Attributes (Source)

| Source property | Content |
|---|---|
| maec:BehaviorCollection | behaviors, action-collections, object-refs |
| maec:Package / analysis.tool | static analysis results, entropy, packer id |
| stix2:malware.capabilities | exfil, remote-access, evades-av, kills-processes |
| stix2:malware.architecture_execution_envs | x86, ARM, MIPS, JavaScript, PowerShell |
| stix2:malware.operating_system_refs | Windows, Linux, Android, iOS, QNX |
| stix2:relationship(variant-of, derived-from) | lineage edges in STIX graph |
| maec:Malware.obfuscation_methods | packing, polymorphic-code, anti-analysis |

VIRO//TAXON Extensions (Output)

| Extension property | Values |
|---|---|
| vt:discriminator[V] pathogenesis_class | Ransomware · Wiper · Stealer · Backdoor · Botnet |
| vt:discriminator[I,II] substrate + morphology_class | Fileless · Metamorphic · AI-Mutating · Firmware |
| vt:discriminator[VI] evasion_tier | 0–4 (None→BYOVD+ETW+Fileless) |
| vt:discriminator[I] ecology_class | WinPE · LinuxELF · Script · Mobile · Container |
| vt:discriminator[IV] tropism_profile | Enterprise · OT · Hypervisor · Cloud · IoT |
| vt:discriminator[VII] + vt:rank.family/species/strain | J(A,B) score + formal rank assignment |
| vt:discriminator[III] replication_strategy | Worm · Phishing · SupplyChain · LateralMove |
The VIRO//TAXON extension object is designed to be serializable as a STIX 2.1 custom extension (x-viro-taxon) on the existing malware SDO. This means zero breaking changes to existing CTI pipelines — VIRO//TAXON classification data travels as an optional extension alongside existing STIX malware objects, and ATT&CK technique mappings remain unchanged.
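One way the extension could look on the wire is sketched below using only the Python standard library. It assumes the STIX 2.1 property-extension mechanism, in which the `extensions` dictionary of an object is keyed by the id of an extension-definition object. The extension-definition UUID is a placeholder, and the property names inside the extension (`rank`, `discriminator_scores`) are hypothetical; the paper does not specify the schema or how the x-viro-taxon name maps onto a registered extension definition.

```python
import json
import uuid
from datetime import datetime, timezone

# Placeholder id: a real deployment would publish one extension-definition
# object and reuse its id. This UUID is made up for illustration.
EXTENSION_ID = "extension-definition--00000000-0000-4000-8000-000000000000"


def build_malware_sdo(name: str, is_family: bool, taxon: dict) -> dict:
    """Return a STIX 2.1 malware SDO carrying taxonomy data as a property extension."""
    # STIX timestamps are UTC; microseconds are truncated to milliseconds here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "malware",
        "spec_version": "2.1",
        "id": f"malware--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "is_family": is_family,
        "extensions": {
            EXTENSION_ID: {
                "extension_type": "property-extension",
                **taxon,  # hypothetical VIRO//TAXON properties
            }
        },
    }


sdo = build_malware_sdo(
    name="LockBit 3.0",
    is_family=False,
    taxon={
        "rank": {
            "class": "Win-PE",
            "order": "Lateral-Movement",
            "genus": "Ransomware-Enterprise",
            "species": "LockBit",
            "strain": "LockBit 3.0",
        },
        "discriminator_scores": [8.0, 8.0, 7.0, 9.0, 9.0, 9.0, 7.0],
    },
)
print(json.dumps(sdo, indent=2))
```

Consumers that do not recognize the extension id can ignore the `extensions` entry and still process the object as an ordinary malware SDO, which is the non-breaking behavior described above.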
09 // TVS Scoring

// TVS Comparative Scoring

Taxonomic Virulence Scores for five canonical malware specimens, illustrating the discriminator profiles across the severity spectrum.

TVS PROFILE — LOCKBIT 3.0
I · Genomic Substrate: 8.0
II · Morphology: 8.0
III · Replication: 7.0
IV · Host Tropism: 9.0
V · Pathogenesis: 9.0
VI · Immune Evasion: 9.0
VII · Phylogeny: 7.0
TVS — LOCKBIT 3.0
82.2 / 100

| Specimen | Class / Order | Genus | TVS | Severity Tier |
|---|---|---|---|---|
| LockBit 3.0 | WinPE · Lateral-Move | Ransomware-Enterprise | 82.2 | CRITICAL |
| NotPetya | WinPE · Self-Replicating | Wiper-OT | 92.1 | CATASTROPHIC |
| Industroyer 2 | WinPE · Targeted Deploy | Sabotage-ICS | 88.4 | CRITICAL |
| Emotet (epoch 5) | WinPE · Phishing | Loader-BotEnrollment | 72.3 | HIGH |
| XMRig-based miner | Linux-ELF · Drive-By | Miner-Cloud | 38.7 | MODERATE |
// References

  1. ICTV (2022). The ICTV Report on Virus Classification. International Committee on Taxonomy of Viruses. https://ictv.global/report
  2. MITRE (2022). MAEC Version 5.0 Specification. MITRE Corporation. https://maecproject.github.io
  3. OASIS (2021). STIX Version 2.1 Committee Specification. https://docs.oasis-open.org/cti/stix/v2.1/
  4. Ye, Y. et al. (2017). A Survey on Malware Detection Using Data Mining Techniques. ACM Computing Surveys, 50(3), 1–40.
  5. Canzanese, R. et al. (2015). Toward an Automatic, Online Behavioral Malware Classification System. IEEE SRDS 2015.
  6. Vadrevu, P. & Perdisci, R. (2016). MAXS: Scaling Malware Execution with Sequential Multi-Hypothesis Testing. AsiaCCS 2016.
  7. Cimitile, A. et al. (2017). Talos: No More Ransomware Victims. J. Universal Computer Science, 23(9), 862–876.
  8. Karbab, E. & Debbabi, M. (2021). MalDy: Portable Data-Driven Malware Detection Using Natural Language Processing. Digital Investigation, 36.
  9. Nataraj, L. et al. (2011). Malware Images: Visualization and Automatic Classification. VizSec 2011.
  10. Souri, A. & Hosseini, R. (2018). A State-of-the-Art Survey of Malware Detection Approaches Using Data Mining Techniques. Human-centric Computing and Information Sciences, 8(1), 1–22.
  11. Miramirkhani, N. et al. (2017). Spotless Sandboxes: Evading Malware Analysis Systems Using Wear-and-Tear Artifacts. IEEE S&P 2017.
  12. Bayer, U. et al. (2009). Scalable, Behavior-Based Malware Clustering. NDSS 2009.