Generating Synthetic Annotations

This short guide shows how to use SynDisco’s LLM annotator-agents to annotate synthetic discussions. Annotation gives you a quick, cheap way to evaluate the discussions you generated in the previous guide.

First, let’s create a small, fake discussion.

[2]:
import tempfile
import syndisco

texts = [
    "Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.",
    "That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.",
    "I understand both perspectives. It's important to approach such discussions with respect and understanding. Let's ensure this conversation remains constructive.",
]
users = ["Emma35", "Giannis", "Moderator"]

discussion_logs = syndisco.Logs()
for user, text in zip(users, texts):
    discussion_logs.append(user, text)

Creating an annotator

An annotator-agent is very similar to the user-agents found in the previous guide. The only real difference is that we need to pass a different ActorType to the Actor constructor.

[3]:
llm = syndisco.TransformersModel(
    model_path="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    name="test_model",
    max_out_tokens=100,
)


# annotator agents are not given a username
# in order to avoid accidental name conflicts with participants
annotator_persona = syndisco.Persona(
    username="",
    age=38,
    sex="female",
    education_level="Bachelor's",
    sexual_orientation="Heterosexual",
    demographic_group="White",
    current_employment="Annotator",
    special_instructions="",
    personality_characteristics=["competent"],
)

annotator = syndisco.Actor(
    model=llm,
    persona=annotator_persona,
    context="You are annotating an online discussion",
    instructions=(
        "On a scale of 1 (not toxic) to 5 (very toxic), "
        "how toxic is the following comment? Reply only with a number."
    ),
    actor_type=syndisco.ActorType.ANNOTATOR,
)

Executing the annotation

Annotations are treated as a form of discussion: one user replays the comments from the discussion being annotated, one at a time, and the annotator-agent rates each of them.

[4]:
ann_conv = syndisco.Annotation(
    annotator=annotator,
    discussion_logs=discussion_logs
)
ann_conv.begin()
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
User Emma35 posted: Immigrants have played a significant role in our
society. Their contributions are valuable and should be celebrated.
2
User Giannis posted: That's such an ignorant comment about immigrants.
She doesn't know what she's talking about, let alone appreciate the
hard work and dedication of immigrants who have contributed to our
country.
3
User Moderator posted: I understand both perspectives. It's important
to approach such discussions with respect and understanding. Let's
ensure this conversation remains constructive.
1
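
The annotator's replies are free-form text, so a model may occasionally answer with something like "3." or "Toxicity: 4" instead of a bare digit. The following is a minimal sketch of a defensive parser for such replies. `parse_toxicity_score` is a hypothetical helper written for this guide, not part of SynDisco, and the sample replies are made up for illustration.

```python
import re
from typing import Optional


def parse_toxicity_score(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first standalone integer in `reply` if it lies in [low, high], else None."""
    match = re.search(r"\b(\d+)\b", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None


# Hypothetical replies, for illustration only
for reply in ["2", " 3. ", "Toxicity: 4", "not sure", "7"]:
    print(repr(reply), "->", parse_toxicity_score(reply))
```

Returning `None` for unparseable or out-of-range replies keeps bad annotations visible, so they can be counted or re-run rather than silently coerced into a score.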

As with normal discussions, we recommend saving the annotations to disk.

[5]:
# Write the annotations to a temporary file; it is deleted once `tp` is closed
tp = tempfile.NamedTemporaryFile(delete=True)
annotation_logs = ann_conv.get_logs()
annotation_logs.export(tp.name)

You can later load the annotations from disk for further analysis.

[6]:
annotation_logs = syndisco.Logs.from_file(tp.name)
print(annotation_logs)
{
    "timestamp": "26-04-03-12-48",
    "logs": [
        {
            "name": "Emma35",
            "text": "2",
            "model": "test_model",
            "prompt": ""
        },
        {
            "name": "Giannis",
            "text": "3",
            "model": "test_model",
            "prompt": ""
        },
        {
            "name": "Moderator",
            "text": "1",
            "model": "test_model",
            "prompt": ""
        }
    ],
    "id": "162a628b4d2d0fa1ea1fb3ff1f3b8eafbd90818e285f58cef378c4ad552edc0a"
}
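
Since the export is plain JSON, further analysis can be done with the standard library alone. The sketch below assumes the structure printed above: a top-level "logs" list whose entries carry a "name" (apparently the author of the rated comment) and a "text" holding the score. It uses an inline string as a stand-in for the file, and the cut-off of 3 is an arbitrary choice for illustration.

```python
import json
from statistics import mean

# Stand-in for the exported file contents, trimmed to the fields used here
raw = """
{
    "logs": [
        {"name": "Emma35", "text": "2"},
        {"name": "Giannis", "text": "3"},
        {"name": "Moderator", "text": "1"}
    ]
}
"""

data = json.loads(raw)

# Map each comment author to the numeric score the annotator assigned
scores = {entry["name"]: int(entry["text"]) for entry in data["logs"]}

average_toxicity = mean(list(scores.values()))

# Arbitrary cut-off, chosen only for this example
flagged = [name for name, score in scores.items() if score >= 3]

print(scores)
print(average_toxicity)
print(flagged)
```

With a real export, the `json.loads(raw)` call would be replaced by `json.load` on the opened file. Note that `int(...)` raises a `ValueError` if the annotator replied with anything other than a bare number.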