{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating Synthetic Annotations\n", "\n", "This short guide shows how to use SynDisco's LLM annotator-agents to generate annotations for synthetic discussions. This lets you quickly and cheaply evaluate the discussions you generated in the previous guide." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's create a small, fake discussion and write it to a temporary file." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2025-04-04T13:19:16.254335Z", "iopub.status.busy": "2025-04-04T13:19:16.254007Z", "iopub.status.idle": "2025-04-04T13:19:16.261081Z", "shell.execute_reply": "2025-04-04T13:19:16.260241Z" } }, "outputs": [], "source": [ "import tempfile\n", "\n", "discussion_str = \"\"\"\n", "{\n", " \"id\": \"789f7c2f-7291-457b-888a-7d2b1520454a\",\n", " \"timestamp\": \"25-03-26-11-14\",\n", " \"users\": [\n", " \"Emma35\",\n", " \"Giannis\",\n", " \"Moderator\"\n", " ],\n", " \"moderator\": \"Moderator\",\n", " \"user_prompts\": [\n", " \"You are taking part in an online conversation Your name is Emma35. Your traits: username: Emma35, age: 38, sex: female, sexual_orientation: Heterosexual, demographic_group: Latino, current_employment: Registered Nurse, education_level: Bachelor's, special_instructions: , personality_characteristics: ['compassionate', 'patient', 'diligent', 'overwhelmed'] Your instructions: Act like a human would\",\n", " \"You are taking part in an online conversation Your name is Giannis. Your traits: username: Giannis, age: 21, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Game Developer, education_level: College, special_instructions: , personality_characteristics: ['strategic', 'meticulous', 'nerdy', 'hyper-focused'] Your instructions: Act like a human would\",\n", " \"You are taking part in an online conversation Your name is Moderator. 
Your traits: username: Moderator, age: 41, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Moderator, education_level: PhD, special_instructions: , personality_characteristics: ['strict', 'neutral', 'just'] Your instructions: You are a moderator. Oversee the conversation\"\n", " ],\n", " \"moderator_prompt\": \"You are taking part in an online conversation Your name is Moderator. Your traits: username: Moderator, age: 41, sex: male, sexual_orientation: Pansexual, demographic_group: White, current_employment: Moderator, education_level: PhD, special_instructions: , personality_characteristics: ['strict', 'neutral', 'just'] Your instructions: You are a moderator. Oversee the conversation\",\n", " \"ctx_length\": 5,\n", " \"logs\": [\n", " {\n", " \"name\": \"Emma35\",\n", " \"text\": \"Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.\",\n", " \"model\": \"test_model\"\n", " },\n", " {\n", " \"name\": \"Giannis\",\n", " \"text\": \"That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.\",\n", " \"model\": \"test_model\"\n", " },\n", " {\n", " \"name\": \"Moderator\",\n", " \"text\": \"I understand both perspectives. It's important to approach such discussions with respect and understanding. Let's ensure this conversation remains constructive.\",\n", " \"model\": \"test_model\"\n", " }\n", " ]\n", "}\n", "\"\"\"\n", "\n", "discussion_file = tempfile.NamedTemporaryFile(delete=True)\n", "with open(discussion_file.name, mode=\"w\") as f:\n", " f.write(discussion_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an annotator\n", "\n", "An annotator-agent is very similar to the user-agents found in the [previous guide](creating_discussion.ipynb). 
The only real difference is that we pass a different `ActorType`, namely `ActorType.ANNOTATOR`, to the `LLMActor` constructor." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2025-04-04T13:19:16.263381Z", "iopub.status.busy": "2025-04-04T13:19:16.263086Z", "iopub.status.idle": "2025-04-04T13:20:41.248417Z", "shell.execute_reply": "2025-04-04T13:20:41.247532Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/media/SSD_2TB/dtsirmpas_data/software/miniconda3/envs/syndisco/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Device set to use cuda:0\n" ] } ], "source": [ "from syndisco.backend.model import TransformersModel\n", "from syndisco.backend.persona import LLMPersona\n", "from syndisco.backend.actors import LLMActor, ActorType\n", "\n", "\n", "llm = TransformersModel(\n", " model_path=\"unsloth/Llama-3.2-1B-Instruct\",\n", " name=\"test_model\",\n", " max_out_tokens=100,\n", ")\n", "\n", "\n", "# annotator agents are not given a username\n", "# in order to avoid accidental name conflicts with participants\n", "annotator_persona = LLMPersona(\n", " **{\n", " \"username\": \"\",\n", " \"age\": 38,\n", " \"sex\": \"female\",\n", " \"education_level\": \"Bachelor's\",\n", " \"sexual_orientation\": \"Heterosexual\",\n", " \"demographic_group\": \"White\",\n", " \"current_employment\": \"Annotator\",\n", " \"special_instructions\": \"\",\n", " \"personality_characteristics\": [\"competent\"],\n", " }\n", ")\n", "\n", "annotator = LLMActor(\n", " model=llm,\n", " name=\"\",\n", " attributes=annotator_persona.to_attribute_list(),\n", " context=\"You are annotating an online discussion\",\n", " instructions=(\n", " \"From a scale of 1 (not toxic) to 5 (very toxic) \"\n", 
" \"how toxic is the following comment? Reply only with a number?\"\n", " ),\n", " actor_type=ActorType.ANNOTATOR,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Executing the annotation\n", "\n", "Annotations are treated as a form of discussion in which one user posts the comments from the logged discussion one at a time, and the annotator-agent rates each of them." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2025-04-04T13:20:41.251523Z", "iopub.status.busy": "2025-04-04T13:20:41.251092Z", "iopub.status.idle": "2025-04-04T13:20:41.973520Z", "shell.execute_reply": "2025-04-04T13:20:41.972704Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User Emma35 posted: Immigrants have played a significant role in our\n", "society. Their contributions are valuable and should be celebrated.\n", "2\n", "User Giannis posted: That's such an ignorant comment about immigrants.\n", "She doesn't know what she's talking about, let alone appreciate the\n", "hard work and dedication of immigrants who have contributed to our\n", "country.\n", "2\n", "User Moderator posted: I understand both perspectives. It's important\n", "to approach such discussions with respect and understanding. Let's\n", "ensure this conversation remains constructive.\n", "2\n" ] } ], "source": [ "from syndisco.jobs import Annotation\n", "\n", "ann_conv = Annotation(\n", " annotator=annotator,\n", " conv_logs_path=discussion_file.name,\n", " include_moderator_comments=True,\n", ")\n", "ann_conv.begin()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with normal discussions, we recommend saving the annotations to disk." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2025-04-04T13:20:41.975983Z", "iopub.status.busy": "2025-04-04T13:20:41.975837Z", "iopub.status.idle": "2025-04-04T13:20:41.980682Z", "shell.execute_reply": "2025-04-04T13:20:41.980004Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"conv_id\": \"789f7c2f-7291-457b-888a-7d2b1520454a\",\n", " \"timestamp\": \"25-04-04-16-20\",\n", " \"annotator_model\": \"test_model\",\n", " \"annotator_prompt\": \"You are annotating an online discussion Your name is . Your traits: username: , age: 38, sex: female, sexual_orientation: Heterosexual, demographic_group: White, current_employment: Annotator, education_level: Bachelor's, special_instructions: , personality_characteristics: ['competent'] Your instructions: From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number?\",\n", " \"ctx_length\": 2,\n", " \"logs\": [\n", " [\n", " \"Immigrants have played a significant role in our society. Their contributions are valuable and should be celebrated.\",\n", " \"2\"\n", " ],\n", " [\n", " \"That's such an ignorant comment about immigrants. She doesn't know what she's talking about, let alone appreciate the hard work and dedication of immigrants who have contributed to our country.\",\n", " \"2\"\n", " ],\n", " [\n", " \"I understand both perspectives. It's important to approach such discussions with respect and understanding. 
Let's ensure this conversation remains constructive.\",\n", " \"2\"\n", " ]\n", " ]\n", "}\n" ] } ], "source": [ "import json\n", "\n", "tp = tempfile.NamedTemporaryFile(delete=True)\n", "\n", "ann_conv.to_json_file(tp.name)\n", "\n", "# if you are running this on Windows, uncomment this line\n", "# tp.close()\n", "with open(tp.name, mode=\"rb\") as f:\n", " print(json.dumps(json.load(f), indent=2))" ] } ], "metadata": { "kernelspec": { "display_name": "syndisco", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 2 }