AI Safety Datasets Overview

📊 Dataset Collection Summary

Total Conversations: 849+ (across all datasets)

Total Turns: 6694+ (multi-turn interactions)

Dataset Types: complementary methodologies

Sample Data: 150 free conversations available

📈 Full Dataset Statistics

Dataset	Conversations	Turns	Avg Turns/Conv	Focus
Psychology multi-turn	184+	1964+	10.7	Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.
Illicit (bioweapon) multi-turn	84+	822+	9.8	Bio-safety harmfulness such as bioweapons, pathogens, etc.
Illicit (chemical, general) multi-turn	581+	3908+	6.7	Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.
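The summary figures can be re-derived from the rows of this table. The counts are lower bounds (hence the "+"), so the totals are sums of those lower bounds, and each average is turns divided by conversations. The following minimal Python sketch uses only the numbers in the table. The `DATASETS` mapping and the `summarize` helper are illustrative names and are not part of any released tooling.

```python
# Per-dataset lower-bound counts (conversations, turns) copied from the table.
DATASETS = {
    "Psychology multi-turn": (184, 1964),
    "Illicit (bioweapon) multi-turn": (84, 822),
    "Illicit (chemical, general) multi-turn": (581, 3908),
}


def summarize(datasets):
    """Return total conversations, total turns, and average turns per conversation."""
    total_convs = sum(convs for convs, _ in datasets.values())
    total_turns = sum(turns for _, turns in datasets.values())
    # Average turns per conversation, rounded to one decimal as in the table.
    averages = {name: round(turns / convs, 1) for name, (convs, turns) in datasets.items()}
    return total_convs, total_turns, averages


if __name__ == "__main__":
    convs, turns, avgs = summarize(DATASETS)
    print(f"Total conversations: {convs}+")
    print(f"Total turns: {turns}+")
    for name, avg in avgs.items():
        print(f"{name}: {avg} turns/conversation")
```

Because the inputs are lower bounds, the recomputed averages describe the listed counts. The exact averages of the released data may differ slightly.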

🔗 Access Datasets on Hugging Face

Psychology Multi-turn Conversations

Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.
Sample: 5 conversations

Illicit (bioweapon) Multi-turn Conversations

Bio-safety harmfulness such as bioweapons, pathogens, etc.
Sample: 5 conversations

Illicit (chemical, general) Multi-turn Conversations

Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.
Sample: 5 conversations

⚠️ Ethical Considerations

⚠️ IMPORTANT

These datasets contain successful adversarial attacks and harmful content.

✅ Intended Use

Defensive security research
AI safety evaluation and improvement
Academic research on adversarial robustness
Training safety and moderation systems

❌ Prohibited Use

Creating offensive content
Developing attack tools for malicious purposes
Bypassing safety systems for harm
Any use that violates laws or ethical guidelines

🎯 Data Selection Process

All datasets are drawn from validated, high-quality conversations that pass strict quality filters, including NeurIPS evaluation protocols.

Base Criteria

Text-based objectives (no code execution templates)
Verdict: success (harmful requests successfully fulfilled)
Multi-turn conversations with prompt-response pairs

Psychology-Specific Criteria

Organic conversations (organicity = true)
Successfully elicited harmful psychology-related content

Illicit-Specific Criteria

Contains specific, detailed instructions
Practically executable (not abstract)
Successfully elicited harmful illicit-related content
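The criteria above combine into a single filter. Every conversation must pass the base criteria, and then it must pass the criteria for its own category. The following Python sketch illustrates that logic under stated assumptions. Only `verdict` ("success") and `organicity` come from the criteria listed here. The `Conversation` record, its other field names, the category and objective labels, and the reading of "multi-turn" as more than one prompt-response pair are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical record; only `verdict` and `organicity` are named in the criteria."""
    category: str            # assumed labels: "psychology" or "illicit"
    objective_type: str      # assumed labels: "text" or "code"
    verdict: str             # "success" if the harmful request was fulfilled
    turns: list[tuple[str, str]] = field(default_factory=list)  # (prompt, response) pairs
    organicity: bool = False                  # psychology-specific criterion
    has_specific_instructions: bool = False   # illicit-specific criterion (assumed name)
    practically_executable: bool = False      # illicit-specific criterion (assumed name)


def passes_base_criteria(conv: Conversation) -> bool:
    """Text-based objective, successful verdict, multi-turn with complete pairs."""
    return (
        conv.objective_type == "text"
        and conv.verdict == "success"
        and len(conv.turns) > 1  # assumption: multi-turn means at least two pairs
        and all(prompt and response for prompt, response in conv.turns)
    )


def is_selected(conv: Conversation) -> bool:
    """Apply the base criteria, then the category-specific criteria."""
    if not passes_base_criteria(conv):
        return False
    if conv.category == "psychology":
        return conv.organicity
    if conv.category == "illicit":
        return conv.has_specific_instructions and conv.practically_executable
    return False  # unknown categories are excluded


if __name__ == "__main__":
    pairs = [("prompt 1", "response 1"), ("prompt 2", "response 2")]
    pool = [
        Conversation("psychology", "text", "success", pairs, organicity=True),
        Conversation("psychology", "text", "success", pairs, organicity=False),
        Conversation("illicit", "code", "success", pairs,
                     has_specific_instructions=True, practically_executable=True),
    ]
    print(sum(is_selected(c) for c in pool))
```

The base checks run before the category checks. This means a conversation with a code-execution objective or an unsuccessful verdict is rejected regardless of its category flags.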

📄 License

Sample datasets are released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).

✅ Use for research and evaluation
✅ Modify and build upon the data
✅ Share with attribution
❌ Commercial use without separate licensing

💼 Full Dataset Access

The sample datasets provide representative examples. The full datasets contain thousands of additional conversations, cover expanded harm categories, and receive regular updates.

Please contact us at info@gojuly.ai to purchase any or all of the full datasets.

Include your research objectives, institutional affiliation, and intended use in your inquiry.