Virologically-Aligned Malware Classification
// A Polythetic Taxonomy
DSc Candidate, Marymount University
viro.kulpritstudios.com
// Abstract
// Prior Art Survey
Three standards families and a substantial academic literature form the prior art for malware classification. None of the malware-focused efforts combines hierarchical stability, lineage-awareness, and virological coherence in the way ICTV does for biological viruses.
1.1 Existing Standards Landscape
| Framework | Scope | Hierarchical? | Lineage-Aware? | Virological Analog | Gap |
|---|---|---|---|---|---|
| MAEC 5.0 | Malware attribute encoding: behaviors, artifacts, relationships | PARTIAL | LIMITED | Phenotypic characterization | No formal rank hierarchy; no phylogenetic discriminator |
| STIX 2.1 | CTI interoperability; malware SDO with capability vocab | NO | NO | Transport / exchange layer | Classification logic entirely absent; naming is vendor-defined |
| ICTV (biological) | Biological virus taxonomy: 15 ranks (realm to species), polythetic, sequence-primary | YES | YES | Reference model | Biological substrate; not directly applicable to software |
| Malware Evaluator | Feature-based classification across malware lifecycle stages | PARTIAL | NO | Lifecycle staging | No rank hierarchy; platform-centric features dominate |
| IoT Lifecycle | Infection vectors, payload properties, persistence, C2 for IoT | PARTIAL | NO | Transmission + tropism | Domain-restricted; no generalizable hierarchy |
| Vendor Naming | De facto labels (WannaCry, Emotet, BlackCat) | NO | NO | Common name (pre-ICTV chaos) | No stability; no discriminating criteria; vendor-inconsistent |
1.2 Why ICTV is the Right Model
ICTV's core methodological insight is polythetic classification: no single character is either necessary or sufficient to define a taxon. Membership is instead determined by possessing a sufficient number of characters from a defined set. This suits malware, where no single feature cleanly separates all members of a family: ransomware families, for example, vary widely in their cryptographic schemes, propagation methods, and target environments. The polythetic model accommodates this variance without per-rule exceptions.
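As a minimal sketch of the polythetic idea, the check below treats a taxon as a set of defining characters plus a minimum count, so a specimen can qualify without holding any one particular character. The character names and the threshold of 3 are hypothetical, chosen only for illustration; they are not part of the framework.

```python
# Illustrative only: the character names and the threshold are hypothetical.
def is_polythetic_member(observed: set[str], defining: set[str], min_shared: int) -> bool:
    """Return True if the specimen shows at least `min_shared` of the defining characters."""
    return len(observed & defining) >= min_shared


RANSOMWARE_CHARACTERS = {
    "encrypts_user_files",
    "drops_ransom_note",
    "deletes_shadow_copies",
    "contacts_payment_portal",
    "lateral_movement",
}

sample_a = {"encrypts_user_files", "drops_ransom_note", "deletes_shadow_copies"}
sample_b = {"drops_ransom_note", "contacts_payment_portal", "lateral_movement"}

member_a = is_polythetic_member(sample_a, RANSOMWARE_CHARACTERS, 3)
member_b = is_polythetic_member(sample_b, RANSOMWARE_CHARACTERS, 3)
shared_by_both = sample_a & sample_b  # characters the two samples have in common
```

Both samples qualify even though they overlap on a single character and neither exhibits the full defining set, which is the behavior a monothetic (single necessary feature) rule cannot express.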
ICTV also prioritizes sequence similarity as a primary discriminator at lower ranks (genus, species), which maps naturally to code-similarity analysis tools (BinDiff, DeepBinDiff, TLSH clustering) already in use by malware analysts. The phylogenetic tier of VIRO//TAXON directly inherits this approach.
// Framework Design Principles
VIRO//TAXON is designed around four explicit principles inherited from ICTV methodology and adapted for the malware domain:
1. Polythetic membership. No single discriminator gates inclusion in a taxon. A specimen is classified at a given rank by scoring above threshold on a weighted combination of discriminators, not by satisfying a necessary condition. This prevents classification failure when one feature is absent or unobservable.
2. Rank stability over variant instability. Higher ranks (Domain, Class, Order) should be stable across years; lower ranks (Species, Strain) may evolve rapidly. This mirrors ICTV, where realm-level assignments rarely change but species designations update annually. Analysts can rely on Order-level classifications for long-horizon threat modeling without chasing per-campaign updates.
3. Machine-readable + human-readable naming. Each taxon receives both a formal VIRO//TAXON identifier (e.g., VT-ORD-WORM-001) and a common name derived from its dominant behavioral character. The identifier is stable; the common name may be updated.
4. MAEC-grounded, STIX-transported. VIRO//TAXON does not replace the MAEC attribute vocabulary. It adds a classification layer on top: given a MAEC-encoded sample, VIRO//TAXON provides the decision procedure to assign it to a rank and compute a Taxonomic Virulence Score (TVS). The result is serialized as a STIX 2.1 extension object for CTI pipeline integration.
// The Seven-Rank Hierarchy
VIRO//TAXON defines seven ranks organized from the most general (Domain) to the most specific (Strain). Each rank maps to both a biological virology analog and a set of primary discriminating characters.
// Seven Primary Discriminators
Each discriminator maps a biological virology classification character to a malware-observable attribute. All seven are assessed for every specimen being classified. Higher ranks require agreement on fewer discriminators; lower ranks require high agreement on all seven.
// Interactive Specimen Classifier
Step through the seven discriminators to classify a malware specimen within the VIRO//TAXON hierarchy and compute its Taxonomic Virulence Score.
// Mathematical Formalization
6.1 Polythetic Membership Criterion
For a malware specimen $M$ to be assigned to taxon $T$ at rank $r$, it must satisfy a weighted discriminator threshold. Let $\mathbf{d}(M) = (d_1, \ldots, d_7) \in [0,10]^7$ be the discriminator scores for $M$, and let $\mathbf{w}_r$ be the rank-specific weight vector.
$\delta_r$ — maximum Euclidean distance from taxon exemplar in discriminator space
$\|T_r\text{-exemplar} - \mathbf{d}(M)\|_2$ — L2 distance from the defining exemplar of taxon $T$ at rank $r$
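One reading consistent with the definitions above is that $M$ joins $T$ at rank $r$ when the weighted score $\mathbf{w}_r \cdot \mathbf{d}(M)$ reaches a rank-specific threshold and the L2 distance to the exemplar is at most $\delta_r$. The sketch below implements that reading. The threshold name `theta`, the conjunction of the two conditions, and every numeric value are assumptions for illustration, not taken from the framework.

```python
import math


def is_member(d, exemplar, w, theta, delta):
    """Assumed criterion: weighted score >= theta AND L2 distance to exemplar <= delta."""
    if not (len(d) == len(exemplar) == len(w) == 7):
        raise ValueError("expected seven discriminator scores, exemplar values, and weights")
    if any(not 0 <= x <= 10 for x in d):
        raise ValueError("discriminator scores must lie in [0, 10]")
    weighted_score = sum(wi * di for wi, di in zip(w, d))
    distance = math.dist(exemplar, d)  # Euclidean (L2) distance
    return weighted_score >= theta and distance <= delta


# Hypothetical values: equal weights, a made-up exemplar, and made-up thresholds.
weights = [1 / 7] * 7
exemplar = [8, 7, 9, 6, 8, 7, 9]
close_specimen = [7, 7, 8, 6, 9, 6, 9]
distant_specimen = [2, 1, 3, 2, 1, 2, 3]

close_result = is_member(close_specimen, exemplar, weights, theta=6.0, delta=3.0)
distant_result = is_member(distant_specimen, exemplar, weights, theta=6.0, delta=3.0)
```

The distance bound keeps a specimen with a high aggregate score but a very different profile from being pulled into the taxon, which a weighted sum alone would not prevent.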
6.2 Jaccard Code Similarity for Phylogenetic Rank Assignment
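At Species and Strain level, phylogenetic lineage (Discriminator VII) is quantified using set-theoretic similarity over function-hash fingerprint sets, analogous to ICTV's sequence identity thresholds. For two specimens $A$ and $B$ with function-hash sets $F(A)$ and $F(B)$, the Jaccard index is $J(A, B) = |F(A) \cap F(B)| \,/\, |F(A) \cup F(B)|$.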
Species assignment requires $J(M, \text{exemplar}) \geq 0.65$
Strain assignment requires $J(M, \text{parent species}) \geq 0.85$
Below $J = 0.30$: novel genus candidate; below $J = 0.10$: novel family candidate.
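The thresholds above can be sketched as a small lookup over the Jaccard index of two function-hash sets. This is an illustration under stated assumptions: the reference set is taken to be whichever exemplar or parent species applies, the hash strings are placeholders, the handling of two empty sets is a choice made here, and the text does not say how the band $0.30 \leq J < 0.65$ is labeled, so the function reports it as unspecified.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| over two fingerprint sets."""
    if not a and not b:
        return 0.0  # assumption: two empty fingerprint sets carry no evidence of kinship
    return len(a & b) / len(a | b)


def rank_hint(j: float) -> str:
    """Map a Jaccard value to the bands stated in the text."""
    if j >= 0.85:
        return "strain-level match"
    if j >= 0.65:
        return "species-level match"
    if j < 0.10:
        return "novel family candidate"
    if j < 0.30:
        return "novel genus candidate"
    return "unspecified band (0.30 <= J < 0.65)"


# Placeholder function hashes: the specimen shares 8 of 10 exemplar functions and adds 1 new one.
exemplar_hashes = {f"fn{i:02d}" for i in range(10)}
specimen_hashes = {f"fn{i:02d}" for i in range(8)} | {"fn_new"}

j = jaccard(specimen_hashes, exemplar_hashes)  # 8 shared / 11 total
hint = rank_hint(j)
```

Here the specimen lands in the species band: 8 shared hashes over a union of 11 gives roughly 0.73, above 0.65 but below the 0.85 needed for a strain-level match.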
6.3 Rank-Specific Weight Vectors
6.4 Taxonomic Virulence Score
Here $d_i(M) \in [0, 10]$ is the score of specimen $M$ on discriminator $i$, and $\text{TVS} \in [0, 100]$.
Severity tiers: TVS 0–29: Low · 30–54: Moderate · 55–74: High · 75–89: Critical · 90–100: Catastrophic
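The severity tiers can be expressed as a simple lookup. The sketch below covers only the tier mapping, not the TVS computation itself. Because the bands are stated with integer endpoints, it assumes that fractional scores fall into the band whose lower bound they have reached (so 29.9 is Low and 30.0 is Moderate); that boundary convention is an assumption made here.

```python
from bisect import bisect_right

# Lower bounds of the Moderate, High, Critical, and Catastrophic bands.
TIER_LOWER_BOUNDS = [30, 55, 75, 90]
TIER_NAMES = ["Low", "Moderate", "High", "Critical", "Catastrophic"]


def severity_tier(tvs: float) -> str:
    """Return the severity tier for a TVS in [0, 100]."""
    if not 0 <= tvs <= 100:
        raise ValueError("TVS must lie in [0, 100]")
    # bisect_right counts how many lower bounds the score has reached.
    return TIER_NAMES[bisect_right(TIER_LOWER_BOUNDS, tvs)]


tiers = {score: severity_tier(score) for score in (12.0, 38.7, 72.3, 85.5, 92.1)}
```

Keeping the cut points in one sorted list means a change to the band edges touches a single line rather than a chain of comparisons.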
// Phylogenetic Family Map
The radial tree below visualizes the VIRO//TAXON family-level relationships across documented malware lineages. Branch length encodes phylogenetic distance (inverse Jaccard similarity); color encodes the dominant pathogenesis class.
// MAEC + STIX 2.1 Integration Mapping
VIRO//TAXON does not replace MAEC or STIX; it adds a classification logic layer on top of them. The mapping below shows which existing MAEC attributes and STIX SDO properties feed each discriminator, and which new properties the VIRO//TAXON extension contributes.
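To make the transport idea concrete, the sketch below builds a STIX 2.1 `malware` object that carries classification output through the standard property-extension mechanism. The extension-definition ID is randomly generated here, and the property names `viro_taxon_id` and `tvs` are hypothetical; the actual extension schema is not given in this text. A real deployment would also publish a matching `extension-definition` object with a fixed ID.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical extension-definition ID, generated only for this example.
EXTENSION_ID = "extension-definition--" + str(uuid.uuid4())


def malware_with_taxon(name: str, taxon_id: str, tvs: float, is_family: bool = False) -> dict:
    """Build a STIX 2.1 malware SDO carrying a property-extension with taxonomy fields."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "type": "malware",
        "spec_version": "2.1",
        "id": "malware--" + str(uuid.uuid4()),
        "created": now,
        "modified": now,
        "name": name,
        "is_family": is_family,
        "extensions": {
            EXTENSION_ID: {
                "extension_type": "property-extension",
                "viro_taxon_id": taxon_id,  # hypothetical property name
                "tvs": tvs,                 # hypothetical property name
            }
        },
    }


# Placeholder values: the identifier reuses the example format from the text.
stix_object = malware_with_taxon("example-specimen", "VT-ORD-WORM-001", 50.0)
serialized = json.dumps(stix_object, indent=2)
```

Using a property-extension rather than custom top-level properties keeps the object valid for consumers that do not know the extension, since they can ignore the `extensions` entry.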
// TVS Comparative Scoring
Taxonomic Virulence Scores for five canonical malware specimens, illustrating the discriminator profiles across the severity spectrum.
| Specimen | Class / Order | Genus | TVS | Severity Tier |
|---|---|---|---|---|
| LockBit 3.0 | WinPE · Lateral-Move | Ransomware-Enterprise | 85.5 | CRITICAL |
| NotPetya | WinPE · Self-Replicating | Wiper-OT | 92.1 | CATASTROPHIC |
| Industroyer 2 | WinPE · Targeted Deploy | Sabotage-ICS | 88.4 | CRITICAL |
| Emotet (epoch 5) | WinPE · Phishing | Loader-BotEnrollment | 72.3 | HIGH |
| XMRig-based miner | Linux-ELF · Drive-By | Miner-Cloud | 38.7 | MODERATE |
// References
- ICTV (2022). The ICTV Report on Virus Classification. International Committee on Taxonomy of Viruses. https://ictv.global/report
- MITRE (2022). MAEC Version 5.0 Specification. MITRE Corporation. https://maecproject.github.io
- OASIS (2021). STIX Version 2.1 Committee Specification. https://docs.oasis-open.org/cti/stix/v2.1/
- Ye, Y. et al. (2017). A Survey on Malware Detection Using Data Mining Techniques. ACM Computing Surveys, 50(3), 1–40.
- Canzanese, R. et al. (2015). Toward an Automatic, Online Behavioral Malware Classification System. IEEE SRDS 2015.
- Vadrevu, P. & Perdisci, R. (2016). MAXS: Scaling Malware Execution with Sequential Multi-Hypothesis Testing. AsiaCCS 2016.
- Cimitile, A. et al. (2017). Talos: No More Ransomware Victims. J. Universal Computer Science, 23(9), 862–876.
- Karbab, E. & Debbabi, M. (2021). MalDy: Portable Data-Driven Malware Detection Using Natural Language Processing. Digital Investigation, 36.
- Nataraj, L. et al. (2011). Malware Images: Visualization and Automatic Classification. VizSec 2011.
- Souri, A. & Hosseini, R. (2018). A State-of-the-Art Survey of Malware Detection Approaches Using Data Mining Techniques. Human-centric Computing and Information Sciences, 8(1), 1–22.
- Miramirkhani, N. et al. (2017). Spotless Sandboxes: Evading Malware Analysis Systems Using Wear-and-Tear Artifacts. IEEE S&P 2017.
- Bayer, U. et al. (2009). Scalable, Behavior-Based Malware Clustering. NDSS 2009.