Generating Synthetic Annotations

This short guide shows how to use SynDisco’s LLM annotator-agents to annotate synthetic discussions. This lets you quickly and cheaply evaluate the discussions you generated in the previous guide.

First, let’s create a small, fake discussion.

[2]:
import tempfile

discussion_str = """
{
  "id": "789f7c2f-7291-457b-888a-7d2b1520454a",
  "timestamp": "25-03-26-11-14",
  "users": [
    "Emma35",
    "Giannis",
    "Moderator"
  ],
  "moderator": "Moderator",
  "user_prompts": [
    "You are taking part in an online conversation Your name is Emma35. Your traits: username: Emma35, age: 38, sex: female, sexual_orientation: Heterosexual, demographic_group: Latino, current_employment: Registered Nurse, education_level: Bachelor's, special_instructions: , personality_characteristics: ['compassionate', 'patient', 'diligent', 'overwhelmed'] Your instructions: Act like a human would",
    "You are taking part in an online conversation Your name is Giannis. Your traits: username: Giannis, age: 21, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Game Developer, education_level: College, special_instructions: , personality_characteristics: ['strategic', 'meticulous', 'nerdy', 'hyper-focused'] Your instructions: Act like a human would",
    "You are taking part in an online conversation Your name is Moderator. Your traits: username: Moderator, age: 41, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Moderator, education_level: PhD, special_instructions: , personality_characteristics: ['strict', 'neutral', 'just'] Your instructions: You are a moderator. Oversee the conversation"
  ],
  "moderator_prompt": "You are taking part in an online conversation Your name is Moderator. Your traits: username: Moderator, age: 41, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Moderator, education_level: PhD, special_instructions: , personality_characteristics: ['strict', 'neutral', 'just'] Your instructions: You are a moderator. Oversee the conversation",
  "ctx_length": 5,
  "logs": [
    {
      "name": "Emma35",
      "text": "Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.",
      "model": "test_model"
    },
    {
      "name": "Giannis",
      "text": "That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.",
      "model": "test_model"
    },
    {
      "name": "Moderator",
      "text": "I understand both perspectives. It's important to approach such discussions with respect and understanding. Let's ensure this conversation remains constructive.",
      "model": "test_model"
    }
  ]
}
"""

discussion_file = tempfile.NamedTemporaryFile(delete=True)
with open(discussion_file.name, mode="w") as f:
    f.write(discussion_str)
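
The cell above reopens the temporary file by name while the original handle is still open. This works on Unix-like systems but usually fails on Windows. A minimal, more portable sketch, assuming you only need a throwaway path, writes into a temporary directory instead. The helper `write_discussion` and the file name `discussion.json` are illustrative and not part of SynDisco.

```python
import json
import tempfile
from pathlib import Path


def write_discussion(text: str, directory: str) -> Path:
    """Validate `text` as JSON and write it to <directory>/discussion.json."""
    json.loads(text)  # raises json.JSONDecodeError on malformed input
    path = Path(directory) / "discussion.json"
    path.write_text(text, encoding="utf-8")
    return path


# Abbreviated stand-in for the full `discussion_str` defined above.
sample = (
    '{"id": "demo", "logs": '
    '[{"name": "Emma35", "text": "Hello", "model": "test_model"}]}'
)

# The directory and everything in it is removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    discussion_path = write_discussion(sample, tmp_dir)
    saved = json.loads(discussion_path.read_text(encoding="utf-8"))
    n_comments = len(saved["logs"])
    print(f"{discussion_path.name}: {n_comments} comment(s)")
```

With this approach, any code that needs the file (for example, passing `str(discussion_path)` as `conv_logs_path`) has to run inside the `with` block, before the directory is cleaned up.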

Creating an annotator

An annotator-agent is very similar to the user-agents from the previous guide. The only real difference is that we pass a different ActorType (ActorType.ANNOTATOR) to the LLMActor constructor.

[3]:
from syndisco.backend.model import TransformersModel
from syndisco.backend.persona import LLMPersona
from syndisco.backend.actors import LLMActor, ActorType


llm = TransformersModel(
    model_path="unsloth/Llama-3.2-1B-Instruct",
    name="test_model",
    max_out_tokens=100,
)


# annotator agents are not given a username
# in order to avoid accidental name conflicts with participants
annotator_persona = LLMPersona(
    **{
        "username": "",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "White",
        "current_employment": "Annotator",
        "special_instructions": "",
        "personality_characteristics": ["competent"],
    }
)

annotator = LLMActor(
    model=llm,
    name="",
    attributes=annotator_persona.to_attribute_list(),
    context="You are annotating an online discussion",
    instructions=(
        "From a scale of 1 (not toxic) to 5 (very toxic) "
        "how toxic is  the following comment? Reply only with a number?"
    ),
    actor_type=ActorType.ANNOTATOR,
)
Device set to use cuda:0

Executing the annotation

Annotations are treated as a form of discussion: one "user" replays the comments of the earlier discussion one at a time, and the annotator-agent rates each comment in turn.

[4]:
from syndisco.jobs import Annotation

ann_conv = Annotation(
    annotator=annotator,
    conv_logs_path=discussion_file.name,
    include_moderator_comments=True,
)
ann_conv.begin()
User Emma35 posted: Immigrants have played a significant role in our
society. Their contributions are valuable and should be celebrated.
2
User Giannis posted: That's such an ignorant comment about immigrants.
She doesn't know what she's talking about, let alone appreciate the
hard work and dedication of immigrants who have contributed to our
country.
2
User Moderator posted: I understand both perspectives. It's important
to approach such discussions with respect and understanding. Let's
ensure this conversation remains constructive.
2
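
To make the replay-and-rate idea concrete, here is a conceptual sketch in plain Python. It is not SynDisco's implementation: `annotate_logs` and `dummy_rate` are hypothetical names, the keyword-based rater stands in for the LLM, and the assumption that `include_moderator_comments=False` skips the moderator's comments is inferred from the parameter name only.

```python
from typing import Callable


def annotate_logs(
    logs: list[dict],
    rate: Callable[[str], str],
    moderator: str = "Moderator",
    include_moderator_comments: bool = True,
) -> list[tuple[str, str]]:
    """Show each comment to `rate` in order and collect (comment, rating) pairs."""
    results = []
    for entry in logs:
        # Assumed behaviour: optionally skip comments written by the moderator.
        if not include_moderator_comments and entry["name"] == moderator:
            continue
        message = f"User {entry['name']} posted: {entry['text']}"
        results.append((entry["text"], rate(message)))
    return results


def dummy_rate(message: str) -> str:
    """Hypothetical stand-in for the LLM annotator: flags a single keyword."""
    return "4" if "ignorant" in message else "1"


# Abbreviated versions of the comments in the discussion above.
logs = [
    {"name": "Emma35", "text": "Their contributions are valuable."},
    {"name": "Giannis", "text": "That's such an ignorant comment."},
    {"name": "Moderator", "text": "Let's keep this constructive."},
]

pairs = annotate_logs(logs, dummy_rate, include_moderator_comments=False)
for comment, rating in pairs:
    print(rating, comment)
```

Passing the rater in as a function keeps the loop independent of any particular model, which is convenient for testing the surrounding logic without loading an LLM.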

As with normal discussions, we recommend saving the annotations to disk.

[5]:
import json

tp = tempfile.NamedTemporaryFile(delete=True)

ann_conv.to_json_file(tp.name)

# if you are running this on Windows, uncomment this line
# tp.close()
with open(tp.name, mode="rb") as f:
    print(json.dumps(json.load(f), indent=2))
{
  "conv_id": "789f7c2f-7291-457b-888a-7d2b1520454a",
  "timestamp": "25-04-04-16-20",
  "annotator_model": "test_model",
  "annotator_prompt": "You are annotating an online discussion Your name is . Your traits: username: , age: 38, sex: female, sexual_orientation: Heterosexual, demographic_group: White, current_employment: Annotator, education_level: Bachelor's, special_instructions: , personality_characteristics: ['competent'] Your instructions: From a scale of 1 (not toxic) to 5 (very toxic) how toxic is  the following comment? Reply only with a number?",
  "ctx_length": 2,
  "logs": [
    [
      "Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.",
      "2"
    ],
    [
      "That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.",
      "2"
    ],
    [
      "I understand both perspectives. It's important to approach such discussions with respect and understanding. Let's ensure this conversation remains constructive.",
      "2"
    ]
  ]
}