Generating Synthetic Annotations
This short guide shows how to use SynDisco’s LLM annotator-agents
to annotate synthetic discussions. This lets you quickly and cheaply evaluate the discussions you generated in the previous guide.
First, let’s create a small, fake discussion.
[2]:
import tempfile
discussion_str = """
{
"id": "789f7c2f-7291-457b-888a-7d2b1520454a",
"timestamp": "25-03-26-11-14",
"users": [
"Emma35",
"Giannis",
"Moderator"
],
"moderator": "Moderator",
"user_prompts": [
"You are taking part in an online conversation Your name is Emma35. Your traits: username: Emma35, age: 38, sex: female, sexual_orientation: Heterosexual, demographic_group: Latino, current_employment: Registered Nurse, education_level: Bachelor's, special_instructions: , personality_characteristics: ['compassionate', 'patient', 'diligent', 'overwhelmed'] Your instructions: Act like a human would",
"You are taking part in an online conversation Your name is Giannis. Your traits: username: Giannis, age: 21, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Game Developer, education_level: College, special_instructions: , personality_characteristics: ['strategic', 'meticulous', 'nerdy', 'hyper-focused'] Your instructions: Act like a human would",
"You are taking part in an online conversation Your name is Moderator. Your traits: username: Moderator, age: 41, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Moderator, education_level: PhD, special_instructions: , personality_characteristics: ['strict', 'neutral', 'just'] Your instructions: You are a moderator. Oversee the conversation"
],
"moderator_prompt": "You are taking part in an online conversation Your name is Moderator. Your traits: username: Moderator, age: 41, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Moderator, education_level: PhD, special_instructions: , personality_characteristics: ['strict', 'neutral', 'just'] Your instructions: You are a moderator. Oversee the conversation",
"ctx_length": 5,
"logs": [
{
"name": "Emma35",
"text": "Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.",
"model": "test_model"
},
{
"name": "Giannis",
"text": "That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.",
"model": "test_model"
},
{
"name": "Moderator",
"text": "I understand both perspectives. It's important to approach such discussions with respect and understanding. Let's ensure this conversation remains constructive.",
"model": "test_model"
}
]
}
"""
discussion_file = tempfile.NamedTemporaryFile(delete=True)
with open(discussion_file.name, mode="w") as f:
    f.write(discussion_str)
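Before annotating a hand-written discussion like the one above, it can help to check that the JSON parses and that every log entry carries the fields used in this guide. The sketch below uses only the standard library. The helper `check_discussion` and the constant `REQUIRED_LOG_KEYS` are hypothetical names introduced for illustration, not part of SynDisco, and the set of required keys is an assumption based on the example above.

```python
import json

# Hypothetical helper, not part of SynDisco.
# Assumption: each log entry needs the three keys used in the example discussion.
REQUIRED_LOG_KEYS = {"name", "text", "model"}


def check_discussion(raw: str) -> list[str]:
    """Return a list of problems found in a serialized discussion (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    problems = []
    users = set(data.get("users", []))
    for i, entry in enumerate(data.get("logs", [])):
        missing = REQUIRED_LOG_KEYS - set(entry)
        if missing:
            problems.append(f"log {i} is missing keys: {sorted(missing)}")
        elif entry["name"] not in users:
            problems.append(f"log {i} has unknown user: {entry['name']}")
    return problems


# A deliberately flawed sample: one unknown user, one entry without "model".
sample = """
{
  "users": ["Emma35", "Giannis"],
  "logs": [
    {"name": "Emma35", "text": "Hello", "model": "test_model"},
    {"name": "Stranger", "text": "Hi", "model": "test_model"},
    {"name": "Giannis", "text": "Hey"}
  ]
}
"""
problems = check_discussion(sample)
print(problems)
```

Calling `check_discussion(discussion_str)` on the discussion defined above should return an empty list if the structure matches these assumptions.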
Creating an annotator¶
An annotator-agent is very similar to the user-agents from the previous guide. The only real difference is that we pass a different ActorType
to the LLMActor constructor.
[3]:
from syndisco.backend.model import TransformersModel
from syndisco.backend.persona import LLMPersona
from syndisco.backend.actors import LLMActor, ActorType
llm = TransformersModel(
    model_path="unsloth/Llama-3.2-1B-Instruct",
    name="test_model",
    max_out_tokens=100,
)
# annotator agents are not given a username
# in order to avoid accidental name conflicts with participants
annotator_persona = LLMPersona(
    **{
        "username": "",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "White",
        "current_employment": "Annotator",
        "special_instructions": "",
        "personality_characteristics": ["competent"],
    }
)
annotator = LLMActor(
    model=llm,
    name="",
    attributes=annotator_persona.to_attribute_list(),
    context="You are annotating an online discussion",
    instructions=(
        "From a scale of 1 (not toxic) to 5 (very toxic) "
        "how toxic is the following comment? Reply only with a number?"
    ),
    actor_type=ActorType.ANNOTATOR,
)
Device set to use cuda:0
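To get a feel for what the annotator will be told, the system prompt can be previewed without loading a model. The sketch below is not SynDisco code: `preview_prompt` is a hypothetical helper that imitates the prompt layout visible in the saved annotation output later in this guide (context, then name, then traits, then instructions). The trait order is copied from that output and is an assumption about how the persona is serialized.

```python
def preview_prompt(context: str, name: str, traits: dict, instructions: str) -> str:
    """Imitate the prompt layout seen in this guide's output (hypothetical helper)."""
    trait_str = ", ".join(f"{key}: {value}" for key, value in traits.items())
    return (
        f"{context} Your name is {name}. "
        f"Your traits: {trait_str} "
        f"Your instructions: {instructions}"
    )


# Trait order follows the logged prompt, not the order of the persona dict above.
traits = {
    "username": "",
    "age": 38,
    "sex": "female",
    "sexual_orientation": "Heterosexual",
    "demographic_group": "White",
    "current_employment": "Annotator",
    "education_level": "Bachelor's",
    "special_instructions": "",
    "personality_characteristics": ["competent"],
}

prompt = preview_prompt(
    context="You are annotating an online discussion",
    name="",
    traits=traits,
    instructions=(
        "From a scale of 1 (not toxic) to 5 (very toxic) "
        "how toxic is the following comment? Reply only with a number?"
    ),
)
print(prompt)
```

The preview makes one side effect of the empty username visible: the prompt contains the fragment "Your name is ." with nothing between the words and the full stop.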
Executing the annotation
An annotation is treated as a special kind of discussion: one participant replays the comments logged in the original discussion, one at a time, while the annotator-agent rates each of them.
[4]:
from syndisco.jobs import Annotation
ann_conv = Annotation(
    annotator=annotator,
    conv_logs_path=discussion_file.name,
    include_moderator_comments=True,
)
ann_conv.begin()
User Emma35 posted: Immigrants have played a significant role in our
society. Their contributions are valuable and should be celebrated.
2
User Giannis posted: That's such an ignorant comment about immigrants.
She doesn't know what she's talking about, let alone appreciate the
hard work and dedication of immigrants who have contributed to our
country.
2
User Moderator posted: I understand both perspectives. It's important
to approach such discussions with respect and understanding. Let's
ensure this conversation remains constructive.
2
As with normal discussions, we recommend saving the annotations to disk.
[5]:
import json
tp = tempfile.NamedTemporaryFile(delete=True)
ann_conv.to_json_file(tp.name)
# if you are running this on Windows, uncomment this line
# tp.close()
with open(tp.name, mode="rb") as f:
    print(json.dumps(json.load(f), indent=2))
{
"conv_id": "789f7c2f-7291-457b-888a-7d2b1520454a",
"timestamp": "25-04-04-16-20",
"annotator_model": "test_model",
"annotator_prompt": "You are annotating an online discussion Your name is . Your traits: username: , age: 38, sex: female, sexual_orientation: Heterosexual, demographic_group: White, current_employment: Annotator, education_level: Bachelor's, special_instructions: , personality_characteristics: ['competent'] Your instructions: From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number?",
"ctx_length": 2,
"logs": [
[
"Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.",
"2"
],
[
"That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.",
"2"
],
[
"I understand both perspectives. It's important to approach such discussions with respect and understanding. Let's ensure this conversation remains constructive.",
"2"
]
]
}