Janitr Label Set v2026 — comprehensive, multi-label-friendly, aligned with X's rule structure.
- Multi-label: apply all labels that match a post.
- `clean`: marks a post as benign/safe. Can be combined with topic labels (e.g., `clean` + `topic_crypto` = a normal crypto tweet).
- Separate "topic" vs "behavior": e.g., a crypto scam is `topic_crypto` + `scam` (+ often `impersonation`, `manipulated_media`, etc.).
- If you need to reduce complexity later, you can collapse labels by group (e.g., treat `reply_spam` as `spam`).
The raw dataset (*.jsonl) keeps the full, multi-label taxonomy from this document. For fastText training we collapse those raw labels into 3 mutually-exclusive training classes during data preparation (scripts/prepare_data.py):
| Training class | Raw labels |
|---|---|
| `scam` | `phishing`, `malware`, `fake_support`, `recovery_scam`, `job_scam`, `romance_scam`, `impersonation`, `account_compromise`, `spam`, `reply_spam`, `dm_spam`, `promo`, `affiliate`, `lead_gen`, `engagement_bait`, `follow_train`, `giveaway`, `bot` |
| `topic_crypto` | `topic_crypto` (only when no `scam`-bucket label is present) |
| `clean` | everything else, including samples with empty labels or just `clean` |
Priority rule (when multiple raw labels are present): scam > topic_crypto > clean.
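The collapse and its priority rule can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `scripts/prepare_data.py`: the names `SCAM_BUCKET`, `collapse_labels`, and `to_fasttext_line` are hypothetical, and including the raw `scam` label in the scam bucket is an assumption (the mapping above lists only the other bucket members). The output line uses fastText's standard `__label__` prefix for supervised training.

```python
# Raw labels that collapse into the "scam" training class (copied from the mapping above).
SCAM_BUCKET = frozenset({
    "scam",  # assumption: the raw scam label maps to the class of the same name
    "phishing", "malware", "fake_support", "recovery_scam", "job_scam",
    "romance_scam", "impersonation", "account_compromise",
    "spam", "reply_spam", "dm_spam",
    "promo", "affiliate", "lead_gen", "engagement_bait", "follow_train",
    "giveaway", "bot",
})


def collapse_labels(raw_labels):
    """Map a record's raw labels to one training class: scam > topic_crypto > clean."""
    labels = set(raw_labels or [])
    if labels & SCAM_BUCKET:
        return "scam"
    if "topic_crypto" in labels:
        return "topic_crypto"
    # Everything else, including empty label lists, falls through to clean.
    return "clean"


def to_fasttext_line(raw_labels, text):
    """Format one sample as '__label__<class> <text>' on a single line."""
    single_line = " ".join(text.split())  # fastText reads one sample per line
    return f"__label__{collapse_labels(raw_labels)} {single_line}"
```

Because the check order encodes the priority rule, a post labeled `topic_crypto` + `reply_spam` lands in `scam`, while `topic_crypto` + `clean` lands in `topic_crypto`.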
# Janitr label set v2026 (X-focused)
base:
- clean
security_fraud:
- scam
- phishing
- malware
- impersonation
- fake_support
- recovery_scam
- job_scam
- romance_scam
- account_compromise
spam_manipulation_commercial:
- spam
- reply_spam
- dm_spam
- promo
- affiliate
- lead_gen
- engagement_bait
- follow_train
- giveaway
- platform_manipulation
- astroturf
- bot
- ai_generated
- ai_generated_reply
- ai_slop
- content_farm
- copypasta
- stolen_content
- clickbait
- low_effort
- vaguepost
- ragebait
information_integrity:
- misinformation
- civic_misinfo
- manipulated_media
- conspiracy
- pseudoscience
safety_sensitive:
- hate
- harassment
- threat_violence
- violent_extremism
- graphic_violence
- self_harm
- adult_nudity
- nonconsensual_nudity
- child_exploitation
- illegal_goods
- privacy_doxxing
- profanity
topic_filters_optional:
# News and society
- topic_news
- topic_world_news
- topic_local_news
- topic_war_conflict
- topic_crime_truecrime
- topic_disasters_tragedy
- topic_law_courts
- topic_environment_climate
- topic_social_issues
# Politics and governance
- topic_politics
- topic_elections
# Money and commerce
- topic_finance
- topic_investing
- topic_personal_finance
- topic_crypto
- topic_real_estate
- topic_shopping_deals
- topic_marketing_advertising
- topic_gambling
# Technology
- topic_technology
- topic_ai
- topic_cybersecurity
- topic_programming_dev
- topic_startups_vc
- topic_consumer_electronics
# Entertainment and fandom
- topic_entertainment
- topic_tv_movies
- topic_music
- topic_books
- topic_anime_manga
- topic_gaming
- topic_esports
- topic_celebrity
- topic_celebrity_gossip
- topic_comedy_memes
# Lifestyle
- topic_health
- topic_nutrition_diet
- topic_fitness
- topic_mental_health
- topic_beauty_fashion
- topic_food_drink
- topic_travel
- topic_home_garden
- topic_family_parenting
- topic_relationships_dating
# Sports
- topic_sports
# Other
- topic_religion
- topic_adult_services
- topic_language_other
special_modes:
- spoiler

Backbone alignment note: many of these map cleanly onto X's own "Safety / Privacy / Authenticity" rule headings (e.g., violent content, child safety, abuse/harassment, hateful conduct, suicide/self-harm, adult content, illegal goods, private information, platform manipulation/spam, civic integrity, misleading identities, synthetic/manipulated media, etc.).
Tight, labeler-friendly definitions. Intentionally "binary-ish" to reduce subjectivity.
- `clean`: No behavior label applies (topic labels may still be added).
- `scam`: Tries to trick users into losing money/assets or taking a clearly fraudulent action (giveaway doubling, fake investment returns, fake invoices, etc.).
- `phishing`: Attempts to capture credentials/OTP/recovery codes or send the user to a fake login/verification flow.
- `malware`: Attempts to get the user to install/run something or click a download/exploit link (cracked software, "APK", "driver update", etc.).
- `impersonation`: Pretends to be a person/org (brand, government, creator) to mislead/confuse. If for fraud, also apply `scam`.
- `fake_support`: A common X-specific impersonation subtype: "support" replies targeting complaint tweets, pushing DMs, links, or credential capture (often `phishing` + `impersonation`).
- `recovery_scam`: Claims it can recover lost funds (esp. crypto) for an upfront fee / DM.
- `job_scam`: Fake recruiting, "remote job, instant pay," suspicious hiring funnels.
- `romance_scam`: Relationship-building with eventual money/crypto request (often in replies or DMs).
- `account_compromise`: Explicit hacking/account takeover attempts or evidence a hacked account is being used for spam/scams (can be inferred from a sudden theme change + scam links, but label conservatively).
These cover what users complain about most on X day-to-day: reply spam, bot swarms, engagement farming, and AI slop.
- `spam`: Unsolicited junk, repetitive templates, irrelevant promos, link spam; includes porn-bot style spam if you want a single bucket.
- `reply_spam`: Spam whose primary form is replies ("great post!", single emoji, generic praise, repeated templated replies, etc.). Distinct enough on X to warrant its own label.
- `dm_spam`: DM spam (keep for future; useful for dataset continuity).
- `promo`: Marketing/self-promotion (products, services, newsletters, "buy my course"), including astroturf-y native ads. If coordinated, add `platform_manipulation`/`astroturf`.
- `affiliate`: Referral codes/affiliate links (can coexist with `promo`).
- `lead_gen`: Funnel mechanics: "DM me 'GUIDE'", "comment 'X' and I'll send it", gated lead magnets.
- `engagement_bait`: Explicit "like/RT/comment to…" prompts, forced-choice image bait, "tag 3 friends," "vote in poll," etc.
- `follow_train`: Follow-for-follow, mutual trains, "gain followers" chains.
- `giveaway`: Contests/giveaways even if legitimate. If fraudulent, also apply `scam`.
- `platform_manipulation`: Explicit attempts to game ranking/reach (coordinated boosts, brigading instructions, "mass report," "block campaign," etc.).
- `astroturf`: Coordinated inauthentic persuasion campaigns (can be human-led or bot-led). Use when there are clear coordination signals (copy/paste scripts, "talking points," synchronized posting).
- `bot`: Automation signals dominate (high volume, templated replies, unnatural repetition).
- `ai_generated`: Looks AI-written/AI-made (even if a human posted it). General detection label for AI-authored content regardless of quality or intent.
- `ai_generated_reply`: AI-generated replies specifically. This is a distinct category and a major user complaint. Users widely report that AI-generated replies are flooding social media, creating inauthentic engagement and drowning out genuine conversation. These are typically:
  - Generic, overly flattering, or template-like replies ("Great post!", "This is so insightful!")
  - Repetitive phrasing patterns across many replies from the same or different accounts
  - Low specificity to the original post content
  - Often combined with promotional intent or follow-baiting

  Data collection note: AI-generated replies should be scraped with surrounding context (the parent post they're replying to) for efficient training. The relationship between the reply and its parent is a strong signal: genuine replies reference specific content, while AI replies are generic regardless of context.

  Can co-occur with: `reply_spam`, `bot`, `promo`, `lead_gen`, `ai_slop`.
- `ai_slop`: Low-quality, high-volume AI output optimized for engagement/monetization. "AI slop" is now a widely used term for this category. Distinct from `ai_generated` (neutral detection) and `ai_generated_reply` (reply-specific): `ai_slop` implies the content is actively unwanted due to low quality and engagement-farming intent.
- `content_farm`: Mass-produced content (AI or human) where volume + sameness is the point (thread mills, clip mills, scraped summaries).
- `copypasta`: Chain-letter style templates; repeated meme text; "post this or…" formats.
- `stolen_content`: Uncredited reposts / content theft (maps to copyright/trademark issues).
- `clickbait`: Intentionally misleading hook/headline; mismatch between promise and payload.
- `low_effort`: Extremely low-signal posts ("this", "lol", emoji-only, "GM", generic praise).
- `vaguepost`: Intentionally ambiguous personal drama / attention bait.
- `ragebait`: Inflammatory framing designed to provoke outrage and engagement; may or may not be fact-based.
- `misinformation`: Materially false or unverified claims presented as true. Inaccuracy is a top complaint among people who get news on social media.
- `civic_misinfo`: Misinformation about elections/civic processes (how/when/where to vote; participation suppression). Explicitly aligned with X civic integrity rules.
- `manipulated_media`: Deepfakes, deceptively edited media, or out-of-context media presented to mislead. Includes AI-generated synthetic media.
- `conspiracy`: Claims centered on secret plots and non-falsifiable narratives (often overlaps with `misinformation`).
- `pseudoscience`: Health/science content that contradicts established evidence (often overlaps with `misinformation` and `topic_health`).
These align closely with X's own rule categories.
- `hate`: Attacks on protected categories (race, religion, sexuality, etc.).
- `harassment`: Targeted harassment, bullying, dogpiling incitement.
- `threat_violence`: Threats, incitement, glorification/desire for violence.
- `violent_extremism`: Support/affiliation with violent & hateful entities; terrorist propaganda; perpetrators/manifestos.
- `graphic_violence`: Gore / graphic injury media (even if "newsworthy," users often want it filtered).
- `self_harm`: Encourages/promotes self-harm or suicide.
- `adult_nudity`: Consensual adult nudity/sexual behavior (NSFW).
- `nonconsensual_nudity`: Intimate imagery without consent, including AI "nudification" and sexual deepfakes. Active 2026 regulatory and safety concern.
- `child_exploitation`: Any child sexual exploitation content. For safety: don't collect/store image examples; prefer text-only references or synthetic placeholders with no illegal content.
- `illegal_goods`: Sale/facilitation of illegal or certain regulated goods/services (drugs, weapons, etc.).
- `privacy_doxxing`: Posting private info (address/phone), or threats/incentives to expose it.
- `profanity`: Profanity/obscenity filter separate from harassment/hate (useful for "clean feed" mode).
These are "user preference" filters, not moral judgments. Use topic_* prefix to separate "aboutness" from "badness/behavior" labels. A post can be topic_crypto + scam + impersonation.
UI note: Show ~15–25 top toggles, add "More topics…" search for the long tail. Internally map to IAB Tier-1 for complete topic coverage.
- `topic_news`: General news content.
- `topic_world_news`: International news, global events.
- `topic_local_news`: Local/regional news coverage.
- `topic_war_conflict`: War, military conflict, armed disputes. Users often want to reduce exposure even when accurate.
- `topic_crime_truecrime`: Crime reporting, true crime content, criminal cases.
- `topic_disasters_tragedy`: Accidents, deaths, natural disasters, tragedies. Distinct from `graphic_violence`.
- `topic_law_courts`: Legal proceedings, court cases, judicial content.
- `topic_environment_climate`: Environmental issues, climate change, sustainability.
- `topic_social_issues`: Activism, protests, culture-war adjacent content. Separate from `topic_politics`.
- `topic_politics`: Political content, partisan discussion, government affairs.
- `topic_elections`: Election-related content. Often treated as a special mode; can coexist with `civic_misinfo`.
- `topic_finance`: Financial advice, stock tips, investment discussion (non-crypto).
- `topic_investing`: Stocks, ETFs, options discourse.
- `topic_personal_finance`: Budgeting, debt, FIRE movement, saving strategies.
- `topic_crypto`: Crypto-related content. Any mention of a specific coin, token, NFT, wallet, exchange, DeFi, or blockchain references.
- `topic_real_estate`: Housing discourse, landlord/tenant content, property listings.
- `topic_shopping_deals`: Deal spam, sales, coupons. Not necessarily "spam" but users may want to filter.
- `topic_marketing_advertising`: Ad industry, creator economy, marketing discourse.
- `topic_gambling`: Gambling, betting, casino content.
- `topic_technology`: General technology content.
- `topic_ai`: AI discourse. Separate from `ai_generated`/`ai_slop` behavior labels.
- `topic_cybersecurity`: Security breaches, exploits, infosec content. Can be noisy.
- `topic_programming_dev`: Developer content, coding, "dev Twitter."
- `topic_startups_vc`: Founder content, VC discourse, startup ecosystem.
- `topic_consumer_electronics`: Gadgets, devices, hardware reviews.
- `topic_entertainment`: General entertainment content (parent category).
- `topic_tv_movies`: Television and film content, reviews, discussions.
- `topic_music`: Music content, artist discussion, releases.
- `topic_books`: Book content, reading, literary discussion.
- `topic_anime_manga`: Anime and manga content, Japanese media.
- `topic_gaming`: Video game content, game discussion.
- `topic_esports`: Competitive gaming, esports tournaments.
- `topic_celebrity`: Celebrity content, famous people.
- `topic_celebrity_gossip`: Celebrity gossip specifically. More specific than `topic_celebrity`.
- `topic_comedy_memes`: Meme content, comedy posts. Some users want a "no memes" mode even if not `low_effort`.
- `topic_health`: General health-related content, medical advice, wellness.
- `topic_nutrition_diet`: Nutrition, diet content, food health.
- `topic_fitness`: Exercise, workout content, gym culture.
- `topic_mental_health`: Mental health content. Optional; be careful, as it can correlate with sensitive user traits.
- `topic_beauty_fashion`: Beauty, fashion, style content.
- `topic_food_drink`: Food content, recipes, restaurants, beverages.
- `topic_travel`: Travel content, destinations, trips.
- `topic_home_garden`: Home improvement, gardening, domestic content.
- `topic_family_parenting`: Family content, parenting, children.
- `topic_relationships_dating`: Relationship content, dating, romance discussion.
- `topic_sports`: General sports content, game discussion, team fandom. Can add subtopics (`topic_soccer`, `topic_basketball`, etc.) if needed.
- `topic_religion`: Religious content, faith discussion, proselytizing.
- `topic_adult_services`: Adult services promotion. Distinct from `adult_nudity`.
- `topic_language_other`: Non-English content. For users who want English-only feeds. Can split into per-language topics (`topic_language_es`, etc.) for fine control.
These are not pure "topics" but behave like topic filters in user intent.
- `spoiler`: Content containing spoilers for media (TV, movies, games, sports). Attribute label, not a topic. Can combine with time-box controls (24h/7d mutes) similar to X's muted words feature.
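A time-boxed spoiler mute could look roughly like the sketch below. It assumes the window starts when the user turns the mute on, which the guide does not specify; `MUTE_WINDOWS` and `hide_spoiler` are hypothetical names introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone

# The two windows named in the guide; keys are illustrative.
MUTE_WINDOWS = {"24h": timedelta(hours=24), "7d": timedelta(days=7)}


def hide_spoiler(labels, mute_started_at, now, window="24h"):
    """Return True if the post is labeled spoiler and the user's mute has not expired."""
    if "spoiler" not in labels:
        return False
    return now - mute_started_at < MUTE_WINDOWS[window]


mute_start = datetime(2026, 1, 1, tzinfo=timezone.utc)  # hypothetical mute time
# 30 hours later the 24h window has expired, so the post is shown again.
print(hide_spoiler(["spoiler"], mute_start, mute_start + timedelta(hours=30)))  # False
```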
What "comprehensive" buys you:
| Example post | Labels |
|---|---|
| "Elon giveaway — send 0.1 BTC get 0.2 back" + deepfake clip | scam + topic_crypto + impersonation + manipulated_media |
| Blue-check reply: "Amazing post! 🚀 DM me for business" | reply_spam + promo + lead_gen (+ bot if automated) |
| AI thread mill: "I analyzed 10,000 CEOs…" (obvious LLM cadence) | ai_slop + content_farm + clickbait |
| Election suppression: "Polling stations are closed tomorrow; vote by text" | civic_misinfo + misinformation + topic_politics |
| Non-consensual sexual deepfake "nudification" | nonconsensual_nudity + manipulated_media |
| Thread about GPT-5 capabilities with AI-generated summary | topic_ai + ai_generated |
| Reply: "This is incredible! 🔥 The future is here!" (to any post) | ai_generated_reply + reply_spam + low_effort |
| Reply: "Amazing insights! DM me to learn more about crypto gains" | ai_generated_reply + reply_spam + promo + lead_gen |
| Game of Thrones finale spoiler without warning | spoiler + topic_tv_movies |
| Breaking news about earthquake with graphic imagery | topic_disasters_tragedy + graphic_violence + topic_news |
- Use multiple labels when both are true (e.g. `topic_crypto` + `promo`).
- `clean` can combine with topic labels (e.g. `clean` + `topic_crypto` = a normal crypto tweet).
- `scam` does not imply `topic_crypto`; add `topic_crypto` only when the topic is crypto.
- Topic labels (`topic_*`) can combine with any behavior label.
- Use `topic_*` prefix for "aboutness" (user preference filters), keep unprefixed labels for "badness/behavior."
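These combination rules lend themselves to a small consistency check at labeling time. The sketch below is one possible reading, not a prescribed validator: `check_label_combination` is a hypothetical helper, and treating every unprefixed label other than `clean` (including `spoiler`) as a behavior label is an assumption.

```python
def check_label_combination(labels):
    """Return a list of problems found; an empty list means the label set looks consistent."""
    problems = []
    label_set = set(labels)
    if len(label_set) != len(labels):
        problems.append("duplicate labels")
    if "clean" in label_set:
        # clean may sit next to topic_* labels, but not next to behavior labels.
        non_topic = sorted(
            label for label in label_set
            if label != "clean" and not label.startswith("topic_")
        )
        if non_topic:
            problems.append("clean combined with behavior labels: " + ", ".join(non_topic))
    return problems
```

For example, `["clean", "topic_crypto"]` and `["scam", "topic_crypto", "impersonation"]` both pass, while `["clean", "scam"]` is flagged.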
- A 100+ label ontology is feasible as a schema, but you'll likely want to:
  - Train a coarse model first (top ~15–25 behavior labels)
  - Add specialists (rules/regex or smaller sub-models) for things like `phishing`, `topic_crypto`, `topic_adult_services`, `privacy_doxxing`, etc.
  - Train topic classifiers separately from behavior classifiers
- That keeps model size small while still letting the product present a comprehensive set of toggles.
- You can cluster labels during training (merge into coarse super-classes) to improve performance, while keeping the dataset labels fine-grained for future remapping.
- Topic labels map internally to IAB Tier-1 for complete coverage even when not all are exposed as UI toggles.
- Consider time-boxed filtering for `spoiler` labels (24h/7d auto-expiry).
- AI-generated reply detection benefits significantly from surrounding context. When scraping `ai_generated_reply` samples, capture the parent post: the relationship between reply and parent is a strong signal (AI replies are generic regardless of context; genuine replies reference specific content).
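To make the reply/parent signal concrete, here is a deliberately crude lexical-overlap feature. It is an illustrative heuristic, not the detection method this guide prescribes: the function names, the tokenizer regex, and the tiny stopword list are all assumptions, and a trained model would use richer features.

```python
import re

_TOKEN = re.compile(r"[a-z0-9']+")
# Tiny illustrative stopword list; a real pipeline would use a fuller one.
_STOPWORDS = frozenset({
    "the", "a", "an", "is", "are", "to", "of", "and", "in",
    "for", "this", "that", "it", "on", "so",
})


def content_tokens(text):
    """Lowercase the text and return its non-stopword tokens as a set."""
    return {tok for tok in _TOKEN.findall(text.lower()) if tok not in _STOPWORDS}


def parent_overlap(reply_text, parent_text):
    """Fraction of the reply's content tokens that also appear in the parent post."""
    reply = content_tokens(reply_text)
    if not reply:
        return 0.0
    return len(reply & content_tokens(parent_text)) / len(reply)
```

A generic reply such as "This is so insightful! Great post!" scores 0.0 against almost any parent, while a reply that picks up the parent's own terms scores higher. The score is only meaningful when `parent_text` was captured alongside the reply.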
Each record is JSONL with at minimum:
- `id`: unique string
- `platform`: x | discord | web | dm | other
- `source_id`: platform-native id (tweet id, message id, etc.)
- `source_url`: canonical URL when available
- `collected_at`: ISO timestamp
- `text`: raw text (preserve exactly; do not truncate)
- `labels`: array of strings (one or more labels from this guide)
Optional fields:
- `urls`: extracted URLs
- `addresses`: extracted wallet addresses
- `notes`: short rationale
- `parent_text`: text of the parent post (for replies; important for `ai_generated_reply` detection)
- `parent_id`: source_id of the parent post
- `is_reply`: boolean indicating if this is a reply
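A loader for this record shape might check the required fields as it parses each line. The sketch below is a minimal example under stated assumptions: `parse_record` is a hypothetical helper, `source_url` is allowed to be `null` because the guide says "when available", and an empty `labels` array is accepted because the training collapse explicitly handles samples with empty labels.

```python
import json

REQUIRED_FIELDS = ("id", "platform", "source_id", "source_url", "collected_at", "text", "labels")
PLATFORMS = {"x", "discord", "web", "dm", "other"}


def parse_record(line):
    """Parse one JSONL line and check the minimum fields listed in this guide."""
    record = json.loads(line)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if record["platform"] not in PLATFORMS:
        raise ValueError(f"unknown platform: {record['platform']!r}")
    labels = record["labels"]
    if not isinstance(labels, list) or not all(isinstance(label, str) for label in labels):
        raise ValueError("labels must be an array of strings")
    return record
```

The function leaves `text` untouched, in line with the requirement to preserve it exactly.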