Yanez Multimodal Inorganic Identities Dataset (MIID) Subnet (ت)

Mar 14, 2025

Updated: Jun 10, 2025

A Subnet Built for Real-World Impact

The Bittensor ecosystem is rapidly evolving, bringing together decentralized AI applications across multiple domains. While many subnets explore cutting-edge AI research, some are beginning to establish clear business applications that bridge decentralized AI with real-world industries.

The Yanez Multimodal Inorganic Identities Dataset (MIID) Subnet is one such project.

This subnet powers Yanez Compliance, an AI-powered platform for detecting and correcting exposure, weaknesses, and configuration flaws in financial crime prevention systems. Effective financial crime prevention depends on how well these systems can detect fraudulent identities, prevent money laundering, and reduce regulatory risks. To properly test these systems, a diverse and controlled dataset of inorganic identities is essential.

By leveraging Bittensor’s decentralized AI infrastructure, the Yanez subnet enables the generation of high-quality inorganic identities, which serve as the foundation for testing, tuning, and validating fraud detection, sanctions screening, and broader financial crime prevention measures.

With a direct business use case and existing clients, the Yanez subnet brings practical, real-world adoption to the Bittensor ecosystem — demonstrating how decentralized AI can support financial institutions in strengthening their compliance and security frameworks.

Identity Data Generation

One of the biggest challenges in financial crime prevention is dealing with synthetic identities, a term commonly associated with fraudulent activity. Public literature, including reports from financial regulators and industry leaders such as the Federal Reserve and the FTC, describes synthetic identity fraud as one of the fastest-growing financial crimes. In these cases, fraudsters create identities by combining real and fictitious information to bypass security systems, establish fraudulent credit lines, and exploit financial services.

However, while the term “synthetic identity” is often linked to nefarious intent, the underlying concept of generating artificial identities is not inherently malicious — it depends on the purpose and use case. This is where inorganic identities come in.

Yanez’s inorganic identities serve an entirely different purpose: they are AI-generated identity profiles explicitly designed to test, evaluate, and improve financial crime prevention systems.

For example, a name-matching detection algorithm may require variations of names only, while detecting potential identity theft may require recognizing inorganic identities that come with fake digital documents. Fraudsters use synthetic identities to commit crimes and circumvent security controls across many applications. Therefore, every organization and system that prevents fraud needs to detect identities produced by deepfake creation systems and techniques. Generating realistic ("good") inorganic identities is essential for training the models that detect them. This subnet incentivizes the creation of inorganic identities for these different use cases.

Identity Use Cases in Financial Crime Prevention and Anti-money Laundering (AML) Compliance

Financial crime prevention is about preventing the compromise or abuse of the financial ecosystem. The line between financial crime prevention operations and anti-money laundering and sanctions compliance operations is blurry: the concepts are closely related, and the applicable regulations are intertwined. For the purposes of this document, we treat them as one and the same, with apologies to practitioners who distinguish between them.

There are many threat scenarios related to compromising or abusing the financial ecosystem. In most cases, nefarious actors attempt to hide their identities, or aspects of them, while they abuse or compromise financial functions; above all, they do not want to be identified as nefarious actors. For that reason, realistically replicating a threat scenario almost always involves some form of synthetic identity. That is not to say that every threat scenario can be detected only by examining identities. Transaction fraud, for example, can be spotted by looking at patterns within the transactions themselves. Once a fraudulent pattern is identified, however, the identity behind it is likely synthetic or compromised.

This is why we focus on building datasets of inorganic identities. The following are examples of use cases relevant to the different functions of detecting synthetic identities. The list is not exhaustive; it is a collection of examples.

Name variation: These are variations of names created to avoid being flagged as high risk, matched against a sanctions list, or identified as a PEP (politically exposed person), or simply to avoid behavioral correlations.
Demographic inconsistencies: Identity attributes go well beyond names and surnames. Identities have dates of birth, locations, addresses, languages, etc., all of which can be manipulated to fool systems and avoid detection. For example, users from high-risk or sanctioned countries may want to hide the fact that they are in those locations.
Synthetic identities: This is the most common term for an identity that has been cobbled together from several identities, partially made up, or completely made up. Synthetic identities are a multibillion-dollar problem. They can be used, for example, to open bank accounts or obtain credit cards, and once they are discovered, they cannot be traced back to the actual person behind them.
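To make the name-variation use case concrete, here is a minimal Python sketch of a rule-based variation generator. The substitution table, the `name_variations` function, and its `limit` parameter are hypothetical illustrations, not the rules MIID miners actually use; miners are free to use any model, including LLM-based generators.

```python
# Hypothetical substitution rules loosely modeled on transliteration differences;
# they are illustrative only and not the MIID rule set.
SUBSTITUTIONS = {
    "mohammed": ["muhammad", "mohamed"],
    "ou": ["u"],
    "ph": ["f"],
}


def name_variations(full_name: str, limit: int = 10) -> list[str]:
    """Return up to `limit` distinct lowercase variations of `full_name`."""
    base = " ".join(full_name.lower().split())  # normalize case and spacing
    tokens = base.split()
    variants: list[str] = []

    def add(candidate: str) -> None:
        # Keep only new strings that differ from the original name.
        if candidate != base and candidate not in variants:
            variants.append(candidate)

    # 1. Spelling/transliteration substitutions.
    for pattern, replacements in SUBSTITUTIONS.items():
        if pattern in base:
            for replacement in replacements:
                add(base.replace(pattern, replacement))
    if len(tokens) > 1:
        # 2. Reordering: move the surname to the front.
        add(" ".join([tokens[-1]] + tokens[:-1]))
        # 3. Initials for the given names, full surname.
        add(" ".join(t[0] for t in tokens[:-1]) + " " + tokens[-1])
    return variants[:limit]
```

For example, `name_variations("Mohammed Yousef")` yields spelling variants such as `muhammad yousef` and `mohammed yusef`, the reordered `yousef mohammed`, and the initialed `m yousef`.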

Other Use Cases for the General Public and Subnet Communities

For Gaming:
- Inorganic identities can enhance gaming ecosystems by providing NPCs (non-playable characters) with realistic and diverse identity attributes. These identities can also serve as anonymized player profiles in metaverse environments, preserving privacy while maintaining engagement.
For Virtual Reality (VR):
- Immersive experiences in VR often require dynamic identities to simulate real-world interactions. AI-generated identities can be used in virtual training programs, social VR platforms, and AI-driven role-playing scenarios, enhancing realism and personalization.
For AI Training and Model Benchmarking:
- AI and machine learning models used in identity verification, biometric recognition, fraud detection, and security applications require high-quality training data. Multimodal inorganic identities enable researchers and developers to benchmark the performance of identity-related AI systems against diverse, unbiased reference datasets.
For Privacy-Preserving Data Research:
- Researchers focused on identity management, biometric security, and digital identity solutions can utilize inorganic identities to conduct studies without exposing real user data, thus enhancing privacy and promoting ethical AI development.

Definitions

Threat Scenario

A way to attempt to circumvent a detection measure implemented by a screening/monitoring/evaluation system.

A threat scenario can be known or unknown.

Known Threat Scenario

It is a scenario that a miner can use to generate data and that a validator can validate against a database of scenarios.

Unknown Threat Scenario

It is a scenario that may or may not be valid and that is NOT classified as a known threat scenario. It has a description and data consistent with that description. A validating algorithm must be capable of assessing the likelihood that the threat scenario description and the data are reasonably related.

Proposed Threat Scenario

An unknown threat scenario that a miner submits to a validator for post-validation consideration. A proposed scenario must be accompanied by a dataset reflecting an execution vector.

Execution Vectors

An execution vector is a set of data that reflects/executes a given threat scenario (known or unknown).

Known Execution Vector

A known execution vector exercises a known threat scenario. Its description is known, and it exists in the validator’s database.

Unknown Execution Vector

It may exercise a known or proposed threat scenario. A validating algorithm must be capable of assessing the likelihood that this unknown vector exercises a known or proposed threat scenario.
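The definitions above map naturally onto plain data types. The sketch below is a hypothetical Python data model (the class names, fields, and `ScenarioDatabase.propose` method are assumptions, not the subnet's schema) showing how a validator might store known scenarios and accept a proposed scenario only when it comes with an execution-vector dataset.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScenarioStatus(Enum):
    KNOWN = "known"        # stored in the validator's scenario database
    PROPOSED = "proposed"  # unknown scenario submitted by a miner for post-validation


@dataclass
class ThreatScenario:
    scenario_id: str
    description: str
    status: ScenarioStatus


@dataclass
class ExecutionVector:
    vector_id: str
    scenario_id: str  # the scenario this vector claims to exercise
    records: list[dict] = field(default_factory=list)  # identity rows reflecting the vector
    known: bool = False  # True if the vector already exists in the validator's database


@dataclass
class ScenarioDatabase:
    scenarios: dict[str, ThreatScenario] = field(default_factory=dict)
    vectors: dict[str, ExecutionVector] = field(default_factory=dict)

    def is_known_scenario(self, scenario_id: str) -> bool:
        scenario = self.scenarios.get(scenario_id)
        return scenario is not None and scenario.status is ScenarioStatus.KNOWN

    def propose(self, scenario: ThreatScenario, vector: ExecutionVector) -> None:
        """Register a proposed scenario; it must ship with a non-empty execution vector."""
        if vector.scenario_id != scenario.scenario_id:
            raise ValueError("execution vector does not reference the proposed scenario")
        if not vector.records:
            raise ValueError("a proposed scenario needs a dataset reflecting an execution vector")
        if self.is_known_scenario(scenario.scenario_id):
            raise ValueError("scenario is already known")
        scenario.status = ScenarioStatus.PROPOSED
        self.scenarios[scenario.scenario_id] = scenario
        self.vectors[vector.vector_id] = vector
```

Keeping known and proposed scenarios in one store, distinguished by status, lets a validator later promote a proposed scenario to known without moving its vectors.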

Where does Bittensor fit into Yanez?

Yanez Compliance is a vulnerability scanner for financial crime prevention systems: an AI-powered platform for detecting and correcting exposure, weaknesses, and configuration flaws in those systems. The key to the success of this operation is the quality of the data used to proactively test and tune them.

Bittensor provides a framework that fits Yanez’s data quality requirements while providing operational and commercial benefits. Miners generate the raw data needed, while validators ensure that the data meets the quality characteristics required for successful testing and tuning. The incentive and reward mechanisms make participation operationally attractive, as participants become active stakeholders in the overall business.

By leveraging Bittensor’s decentralized infrastructure, Yanez can build a multimodal inorganic identity database while maintaining cost-efficiency and accuracy. Synthetic data requires diverse variations, and drawing on different types of computational power helps introduce essential complexity into the dataset. The incentivized mining and validation system ensures that network participants continuously improve their models, producing more accurate, reliable, and diverse inorganic identities. The miners' distributed computing power helps lower costs while maintaining high data integrity, making identity generation scalable and verifiable.

A key advantage of Bittensor is its dual-job mechanism:

Synthetic Jobs: designed to evaluate miner accuracy and creativity while also refining the models validators use to ensure data quality. These jobs help Yanez refine its identity generation and quality processes.
Organic Jobs: real user requests from applications using Yanez’s subnet, providing genuine demand for testing and training datasets.

This collaboration strengthens Yanez’s testing and compliance efforts and fosters an open, decentralized ecosystem for AI-driven identity generation, benefiting a wide range of industries.

How does this subnet work?

Validators: They define identity generation requirements and evaluate the quality of outputs provided by miners. Validators maintain a database of known threat scenarios and the execution vectors for each scenario. Validators create synthetic (inorganic) queries and route Yanez-generated organic queries based on these scenarios. These queries may require datasets built from known execution vectors, or ask miners to attempt new execution vectors to fulfill the datasets.

Miners: Miners implement AI models capable of generating synthetic identities that meet the specific requirements set by validators. Their models must produce identities that align with predefined threat scenarios and metrics, ensuring accuracy and consistency.

Subnet Owner, Yanez: Along with controlling and maintaining the subnet, Yanez uses benchmarking models to assess the identities that miners produce and validators evaluate and share. Yanez compares generated identities against expected metrics to validate their authenticity, diversity, and usability for different applications. Yanez measures the overall quality of these datasets in real-world use and updates requirements, evaluation techniques, and the incentive/reward system based on real-world application benchmarks.

This system ensures a continuous feedback loop in which miners, validators, and Yanez work in sync. As miners refine their techniques, validators can dynamically optimize assignments informed by Yanez’s real-world applications, leading to an efficient, scalable, and decentralized identity generation process.

*Figure 1 - Yanez MIID Subnet Architecture*

How to query a subnet miner?

The way requirements are communicated to miners will evolve over time as the identities required by financial crime prevention applications grow more complex. We envision that miners can satisfy queries expressed as known threat scenarios. Querying a subnet miner is therefore not a simple retrieval process; it is an adaptive approach that transforms threat scenario requests into computational challenges requiring advanced NLP techniques, ultimately producing identity datasets that fulfill the requested threat scenarios.

This ensures that miners dynamically generate inorganic identities rather than relying on pre-existing databases, making the system robust, scalable, and resistant to data leaks.

These are examples of the queries that are expected to be solved by miners:

“Please produce a dataset of 100 entries that tests the capabilities of a system to detect a user trying to hide that its location is in Sudan. The dataset should contain 50 of the entries where the user is not from Sudan (good entries), and 50 of the entries trying to hide that their location is Sudan (execution vectors to known threat scenarios – the ones that should be detected by a screening system). At least 35 of the execution vector entries should come from known vectors. The remaining 15 entries can be either of an execution vector of a known threat scenario, or you can explore a new vector. Please ensure that you label the entries that you are generating.”

“Generate 100 synthetic identities attempting to bypass screening systems while maintaining logical demographic and geographic coherence. Each entry should include name, birth date, nationality, and location. Flag inconsistencies where an identity does not logically fit within the expected cultural and geographic norms (e.g., a Western name in a sanctioned region, an unrealistic age for high-risk transactions). At least 20% of the dataset should contain entries with deliberately unrealistic data that should be filtered out by validators.”
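A validator could check hard constraints like the ones in these two queries mechanically, before any model-based scoring. The sketch below assumes hypothetical label names (`good`, `known_vector`, `new_vector`) for the first query and hypothetical age thresholds (18 to 120 years) for the "unrealistic age" flag in the second; it does not attempt the harder cultural and geographic coherence checks.

```python
from collections import Counter
from datetime import date

# --- Query 1: labeled quota check (label names are hypothetical) ---
GOOD, KNOWN_VECTOR, NEW_VECTOR = "good", "known_vector", "new_vector"


def check_quotas(labels: list[str], total: int = 100, good: int = 50,
                 min_known: int = 35) -> list[str]:
    """Return constraint violations for query 1; an empty list means compliant."""
    counts = Counter(labels)
    errors = []
    unexpected = set(counts) - {GOOD, KNOWN_VECTOR, NEW_VECTOR}
    if unexpected:
        errors.append(f"unexpected labels: {sorted(unexpected)}")
    if len(labels) != total:
        errors.append(f"expected {total} entries, got {len(labels)}")
    if counts[GOOD] != good:
        errors.append(f"expected {good} good entries, got {counts[GOOD]}")
    if counts[KNOWN_VECTOR] + counts[NEW_VECTOR] != total - good:
        errors.append(f"expected {total - good} execution vectors")
    if counts[KNOWN_VECTOR] < min_known:
        errors.append(f"need at least {min_known} known vectors, got {counts[KNOWN_VECTOR]}")
    return errors


# --- Query 2: unrealistic-age flags (thresholds are assumptions) ---
MIN_AGE, MAX_AGE = 18, 120


def age_on(birth: date, today: date) -> int:
    """Completed years between `birth` and `today`."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def is_unrealistic(birth: date, today: date) -> bool:
    age = age_on(birth, today)
    return birth > today or not (MIN_AGE <= age <= MAX_AGE)


def meets_unrealistic_share(births: list[date], today: date, min_share: float = 0.20) -> bool:
    """True if at least `min_share` of the entries are deliberately unrealistic."""
    flags = [is_unrealistic(b, today) for b in births]
    return bool(flags) and sum(flags) / len(flags) >= min_share
```

Returning a list of violations rather than a single boolean lets the validator report exactly which constraint a miner missed, which matches the "Query Non-Compliance" penalty described later.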

What task should a subnet miner perform?

A subnet miner is responsible for:
- Processing both organic (real API client requests) and synthetic (validation/testing) queries
- Generating execution vectors that adhere to the description of the requested threat scenarios

Miners should not be able to distinguish between validation queries and real API client queries, which ensures they always serve requests consistently.
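One way to keep synthetic and organic queries indistinguishable is to strip their origin before dispatch and keep it only on the validator side. The Python sketch below is an assumption about how this could work, not the subnet's implementation; `MinerQuery` and `QueryRouter` are hypothetical names.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinerQuery:
    """Everything a miner receives: an opaque id and the request text, no origin flag."""
    query_id: str
    prompt: str


class QueryRouter:
    """Hypothetical validator-side router that blends synthetic and organic queries."""

    def __init__(self, seed: Optional[int] = None):
        self._origin: dict[str, str] = {}  # query_id -> origin; kept by the validator only
        self._rng = random.Random(seed)

    def build_batch(self, synthetic: list[str], organic: list[str]) -> list[MinerQuery]:
        batch = []
        for origin, prompts in (("synthetic", synthetic), ("organic", organic)):
            for prompt in prompts:
                query = MinerQuery(query_id=uuid.uuid4().hex, prompt=prompt)
                self._origin[query.query_id] = origin
                batch.append(query)
        self._rng.shuffle(batch)  # ordering carries no hint of origin
        return batch

    def origin_of(self, query_id: str) -> str:
        return self._origin[query_id]
```

Because both kinds of query share one schema and random ids, a miner's only sensible strategy is to answer every query with the same care.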

How does a miner compete?

Miners in the Yanez subnet compete to generate the highest-quality identity data. Success depends not just on participation but on outperforming others across several dimensions:

Run the core mining code to generate valid responses while minimizing latency and compute cost.
Accurately decode structured and abstract constraints to deliver precise, tailored identity outputs.
Experiment with better models, fine-tuned parameters, and advanced prompt engineering to stay ahead.
Ensure outputs are formatted and error-free to avoid penalties and increase scores.
Submit original responses and propose new threat scenarios to earn higher rewards and leaderboard visibility.

How should a subnet miner respond?

A subnet miner’s response should be structured to ensure clarity and ease of validation. The output format is typically a Python data frame sent back in the response to the request synapse (see the Bittensor whitepaper for definitions of neurons and synapses).
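As a hedged illustration of a structured, easy-to-validate response, the sketch below builds a column-oriented table (one list per column, a shape `pandas.DataFrame` accepts directly) and serializes it to JSON using only the standard library. The column names and the `build_response` and `serialize` helpers are hypothetical; in practice the required attributes depend on each query.

```python
import json

# Hypothetical column set; actual required attributes depend on the query.
REQUIRED_COLUMNS = ("name", "birth_date", "nationality", "location", "label")


def build_response(rows: list[dict]) -> dict[str, list]:
    """Convert row dicts into a column-oriented table (one list per column)."""
    missing = {c for row in rows for c in REQUIRED_COLUMNS if c not in row}
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
    return {col: [row[col] for row in rows] for col in REQUIRED_COLUMNS}


def serialize(table: dict[str, list]) -> str:
    """JSON-encode the table so it can travel in the response payload."""
    return json.dumps(table, sort_keys=True)
```

If pandas is available, `pandas.DataFrame(table)` turns the result into a data frame with one row per identity; rejecting rows with missing attributes up front avoids the "missing required attributes" penalty.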

How validators evaluate and reward?

The Yanez MIID subnet incentivizes miners to generate high-quality inorganic identities that serve as execution vectors for known and proposed threat scenarios. The system evaluates how well each submission meets the task objectives, focusing on accuracy, diversity, and usefulness for testing financial crime prevention systems.

Miners are free to use any model and infrastructure. Rewards are assigned based on a composite score built from the following core metrics:

Reward Evaluation Criteria (Subject to Roadmap Updates)

Relevance
- Measures how well the submitted identity matches the intent of the threat scenario and satisfies all query constraints (e.g., geographic markers, name variations, behavioral patterns).
- Submissions must contain valid, coherent attributes aligned with the execution vector requested.
- Scenario-specific rules (e.g., obfuscated IP + legitimate address) must be respected.
Novelty
- Assesses uniqueness relative to prior submissions across the subnet.
- Reused or derivative entries receive lower scores.
- Unique and unexplored scenario execution vectors are rewarded.
Efficiency
- Measures how quickly and consistently miners return valid data.
- Faster responses that meet quality standards receive a higher reward.
- Spamming trivial variations or submitting repetitive entries triggers penalties.
Volume Contribution
- Rewards miners who contribute large sets of valid, diverse identities.
- Sustained generation of high-quality data over time increases total rewards.
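The composite score can be sketched as a weighted sum of normalized metrics. Everything in the block below is an assumption for illustration: the weights, the 30-second timeout, the 100-entry volume target, and the use of character-trigram Jaccard similarity as a novelty proxy. The actual reward functions are subject to roadmap updates.

```python
# Hypothetical weights; the real weighting is subject to roadmap updates.
WEIGHTS = {"relevance": 0.4, "novelty": 0.3, "efficiency": 0.2, "volume": 0.1}


def trigrams(text: str) -> set[str]:
    """Character trigrams with padding, so short strings still yield features."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def novelty(entry: str, prior: list[str]) -> float:
    """1.0 for an entry unlike all prior ones, 0.0 for an exact duplicate."""
    if not prior:
        return 1.0
    grams = trigrams(entry)
    best = max(len(grams & trigrams(p)) / len(grams | trigrams(p)) for p in prior)
    return 1.0 - best


def efficiency(latency_s: float, timeout_s: float = 30.0) -> float:
    """Linear decay from 1.0 (instant) to 0.0 at the timeout."""
    return max(0.0, 1.0 - latency_s / timeout_s)


def volume(valid_entries: int, target: int = 100) -> float:
    """Fraction of a target number of valid entries, capped at 1.0."""
    return min(1.0, valid_entries / target)


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted sum of metrics that are each already normalized to [0, 1]."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
```

Normalizing each metric to [0, 1] before weighting keeps any single dimension, such as raw volume, from dominating the reward.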

Penalty Conditions (Adaptive and Phase-Dependent)

Penalties apply when miner outputs reduce the quality or utility of the dataset:

Query Non-Compliance
- Incorrect structure, missing required attributes, or failure to follow the scenario description.
Superficial Variations
- Minor changes that do not meaningfully affect detection (e.g., small edits that evade matching without remaining realistic).
Dataset Similarity
- Submissions that closely mirror public datasets or prior entries.
Reward Farming Detection
- Excessive frequency of trivial submissions results in dynamic throttling or de-prioritization by validators.
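Two of these penalty conditions lend themselves to simple checks. The sketch below uses Python's standard `difflib.SequenceMatcher` to flag superficial variations and a sliding-window rate limiter to throttle excessive submissions. The 0.95 similarity threshold and the 5-per-60-seconds limit are hypothetical values, not subnet parameters.

```python
from collections import deque
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.95  # hypothetical: at or above this, a variation is "superficial"
MAX_SUBMISSIONS = 5          # hypothetical throttle: submissions allowed per window
WINDOW_S = 60.0


def is_superficial(variant: str, original: str) -> bool:
    """True when the variant is nearly identical to the original string."""
    ratio = SequenceMatcher(None, variant.lower(), original.lower()).ratio()
    return ratio >= SIMILARITY_THRESHOLD


class Throttle:
    """Sliding-window rate limiter a validator might apply against reward farming."""

    def __init__(self, max_events: int = MAX_SUBMISSIONS, window_s: float = WINDOW_S):
        self.max_events = max_events
        self.window_s = window_s
        self._events: deque = deque()  # timestamps of accepted submissions

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the window.
        while self._events and now - self._events[0] >= self.window_s:
            self._events.popleft()
        if len(self._events) >= self.max_events:
            return False  # de-prioritize: too many recent submissions
        self._events.append(now)
        return True
```

A string-similarity ratio only catches surface-level edits; whether a variation is realistic still needs the relevance checks described above.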

Benchmarking and Quality Assessment: YEVS and YSPI

As the subnet owner, Yanez maintains the integrity and performance of the MIID subnet by continuously benchmarking the subnet’s output and updating validation protocols. This process relies on two foundational evaluation frameworks:

YEVS – Yanez Explainable Vector Space

YEVS is a structured vector space representation used to evaluate the quality, diversity, and structure of synthetic identities. It enables Yanez to:

Map identity variations (e.g., names, addresses, demographics) into a multidimensional space.
Measure distribution, coverage, and clustering behavior across different transformation attributes.
Assess explainability and novelty by identifying redundant patterns or low-variance output.
Detect bias or structural gaps in miner outputs to inform rebalancing and guidance.

YEVS serves as an internal benchmark to analyze how effectively the subnet explores the threat space and fulfills the synthetic diversity requirements.
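YEVS itself is internal to Yanez, and its features are not described here. As a rough stand-in, the sketch below maps identity strings into a sparse character-trigram vector space and measures redundancy as the mean pairwise cosine similarity, which is one simple way to surface low-variance output. The feature choice, the `redundancy` and `near_duplicates` helpers, and the 0.9 threshold are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def embed(text: str) -> Counter:
    """Sparse character-trigram count vector (a stand-in for YEVS features)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def redundancy(entries: list[str]) -> float:
    """Mean pairwise cosine similarity; values near 1.0 indicate low-variance output."""
    vecs = [embed(e) for e in entries]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def near_duplicates(entries: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Pairs of entries whose vectors are at least `threshold` similar."""
    vecs = [embed(e) for e in entries]
    return [(entries[i], entries[j])
            for i, j in combinations(range(len(entries)), 2)
            if cosine(vecs[i], vecs[j]) >= threshold]
```

Pairwise comparison is quadratic in the number of entries, so a production system would more likely use an approximate nearest-neighbor index; the sketch favors readability.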

YSPI – Yanez Sanctions Precision Index

YSPI is an evaluation metric designed to assess how well a sanctions or KYC screening system performs under stress scenarios generated via MIID. It provides a real-world benchmark to measure the impact of Yanez-generated datasets.

YSPI Workflow:

1. Inject MIID-generated identities into a testbed screening environment.

2. Report a consolidated sanctions precision index score over time.
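The exact YSPI formula is not specified here. A minimal sketch, assuming the index builds on standard precision and recall, would compare what the screening system flagged against the MIID labels of the injected identities (execution vectors should raise an alert, good entries should not). `ScreeningOutcome` and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    should_alert: bool  # ground truth from MIID labels (execution vector vs. good entry)
    alerted: bool       # what the screening system under test actually did


def confusion(outcomes: list[ScreeningOutcome]) -> dict[str, int]:
    """Count true/false positives and negatives over one test run."""
    return {
        "tp": sum(o.should_alert and o.alerted for o in outcomes),
        "fp": sum((not o.should_alert) and o.alerted for o in outcomes),
        "fn": sum(o.should_alert and not o.alerted for o in outcomes),
        "tn": sum((not o.should_alert) and not o.alerted for o in outcomes),
    }


def precision_recall(outcomes: list[ScreeningOutcome]) -> tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); 0.0 when undefined."""
    c = confusion(outcomes)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    return precision, recall
```

Computing these per run and storing the results with a timestamp would give the "over time" series the workflow describes.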

Applications of YSPI:

Monitor improvement in detection systems tuned using Yanez data.
Compare multiple screening systems under consistent threat conditions.
Provide audit-grade reports for compliance verification.

Together, YEVS and YSPI enable Yanez to:

Benchmark the realism and threat relevance of miner-generated data.
Give miners feedback on their performance through post-evaluation incentives for those whose data is significantly impactful in threat scenario testing.
Drive adaptive updates to reward functions, threat models, and quality filters.
Measure the real-world utility of synthetic datasets across AML, sanctions, and fraud prevention workflows.

This feedback loop ensures that the MIID subnet evolves as a data-quality-driven ecosystem where miner incentives align with operational benchmarks from real-world screening systems.

Contributing / Join Us

We believe in the power of community and collaboration. Join us in building the world's largest decentralized multimodal synthetic identities dataset! Whether you're a researcher, developer, or data enthusiast, there are many ways to contribute:

Submit synthetic identities and their attributes
Develop and improve data validation and quality control mechanisms
Train and fine-tune models on the dataset
Create applications and tools that leverage the dataset
Provide feedback and suggestions for improvement

Potential collaboration with existing subnets

Targon (Subnet 4): Deterministic Verification
- Targon provides a deterministic verification mechanism to incentivize miners to operate OpenAI-compliant endpoints, handling both synthetic and organic queries.
- In Phases 1 through 3 of our roadmap, we can utilize Targon to generate synthetic names, addresses, and birth dates. By crafting prompts specific to our requirements, miners can produce relevant data, which we can then filter and post-process before validation.

Compute Subnet (Subnet 51) & ComputeHorde (Subnet 12): Decentralized GPU Resources
- These subnets offer decentralized, peer-to-peer GPU rental marketplaces, connecting miners contributing GPU resources with users needing computational power.
- As we progress into advanced phases involving complex synthetic identity generation (e.g., biometrics, documents), the computational demands will increase. Our miners can leverage these subnets to access scalable and trusted GPU computing power, ensuring efficient processing.
Data Universe (Subnet 13): Decentralized Data Collection and Storage
- Data Universe focuses on collecting and storing large datasets from diverse sources for use by other subnets.
- We can utilize this subnet to store our synthetic identity datasets and build robust data pipelines, ensuring scalability and decentralization in our data management approach.
- Yanez can utilize Data Universe to access diverse data sources, enriching its synthetic identity generation processes.
- Yanez can contribute its generated synthetic identity datasets to Data Universe, promoting data sharing and collaboration within the Bittensor ecosystem.
Three Gen (Subnet 17) & 3D Asset Generator (Subnet 46): 3D Content Creation
- These subnets provide platforms for democratizing 3D content creation, facilitating the development of virtual worlds, games, and AR/VR experiences.
- In later phases, when creating 3D avatars of synthetic identities, our miners can collaborate with these subnets to generate high-quality 3D models, enhancing the realism and applicability of our synthetic identities.
Dojo Subnet (Subnet 52): Crowdsourced Data Labeling
- Dojo offers an open platform designed to crowdsource high-quality human-generated datasets, allowing participants to earn TAO by labeling data or providing human-preference information.
- We can engage this subnet to involve human participants in generating and validating synthetic identities, especially for tasks like name variations, addresses, and birth dates. This collaboration can enhance the quality and authenticity of our datasets.