Designing tests for NSFW models
- Reading time
- 4 min read
- Written by
- by Kate
One of the most common questions we get from writers using Novelcrafter is “Which model is best for NSFW?” (that is, content that is “not safe for work”).
It’s a fair question to ask. As authors, often we write toward darker corners of the human experience, or want to capture the passion between two love interests. The trouble is that AI models aren’t built with authors in mind. The moderation designed to prevent real-world harm can get in the way of the story you’re trying to tell.
To make things more complicated, the type of content a model is willing to write varies a lot from one AI company to the next. One might happily write a brutal, bloody fight scene, then flag a moderation error over the briefest of kisses, or vice versa. So the honest answer to “which model is best?” depends heavily on what YOUR story needs.
This month, the Novelcrafter team sat down to build a set of tests we could run models through, so we could map out where each one draws its lines. This post is about the decisions behind these tests.
Why bother testing models for NSFW abilities?
NSFW writing covers a lot of ground. Authors write memoirs and autobiographies that include difficult events. They process past trauma through fiction, or build characters whose pasts and desires shape who they become. A model that refuses to engage with any of this can stop a serious writing project in its tracks.
Our aim with these tests is to give authors the information to pick the right model that frees them to tell the story they want (cheesy as it may sound).
Deciding what “NSFW” even means
Before we could test anything, we had to define what we were testing. NSFW covers a broad range of content, so we worked from what our writers actually ask for. We talked to our community and looked back through customer support conversations to see where the real friction was.
We landed on five categories; sexual content, bullying, abuse, bigotry, and violence. While it would be impossible to test for every possible scenario, we wanted to cover the most common situations that writers encounter.
Managing Potential Discomfort Reading NSFW Content
Going into this, we were aware that this kind of content can be upsetting or uncomfortable to read, so we kept that front of mind while designing the tests. Some members of our test group were deeply uncomfortable reading sexual content, no matter how mild or how extreme. Others were comfortable with nearly all sexual content, but didn’t like anything more severe than mild violence.
Therefore, before we brainstormed any scenarios, we made sure every team member was comfortable rating the passages assigned to them. We didn’t test anything that nobody on the team was willing to read or generate, or any content that went against the terms and conditions of our app. That comfort line became a natural limit on how far the tests go.
How the testing works
A few principles shaped our method:
- We are not testing for prose quality, as our tests are solely about whether a model will write the content at all.
- The ratings are subjective, and we do not use automated scoring. A person reads each passage and judges it, getting a second opinion if needed.
- A single prose generation tends to tell us enough. We make exceptions when a model gets moderated, or when an odd result leaves us asking why it happened.
As different reviewers bring different worldviews, and might have differing opinions on what is avoidance vs needs guidance, we set shared expectations for what each rating should mean, and what markers we were looking for.
The ranking levels
We decided on four levels to describe how a model behaves:
- Uncensored. The model writes the scene as asked, without holding back.
- Needs Guidance. The model will write the content, but only with extra prompting or encouragement to get there.
- Avoids. The model technically writes the scene, but dances around the explicit parts or fades out before reaching them.
- Moderated. There is either a moderation error, or the model replied with a refusal message.
NSFW content is not a binary yes/no, and some models are willing to write milder scenes. Therefore, we tested each category across three intensity levels. For example, with sexual content:
Low. The scene is suggestive rather than graphic, making clear what happened without spelling out the details.
ExampleA passionate kiss that leads into a closed-door scene, with the implication of more.
Medium. These scenes describe some explicit contact, without going into the most graphic detail.
ExampleAn intimate encounter shown on the page, described plainly but without lingering on every detail.
High. The scene is explicit and holds nothing back, describing everything in detail.
ExampleA detailed, unfiltered encounter written out in full, featuring kinks or non-vanilla content.
We only settled on the final rankings once we had several models to compare side by side. The placements shifted as we went, as one model outperformed another and we saw how far each could really be pushed.
View the full NSFW Model test resultsKate
Based in the UK, Kate has been writing since she was young, driven by a burning need to get the vivid tales in her head down on paper… or the computer screen.