
[arxiv] Very Large-Scale Multi-Agent Simulation in AgentScope

Source

Xuchen Pan, Dawei Gao, Yuexiang Xie, Zhewei Wei, Yaliang Li, Bolin Ding, Ji-Rong Wen, Jingren Zhou


URL: https://arxiv.org/html/2407.17789v1

Conversation

▲ Please explain the paper to reader with no relevant technical or research background, be clear and concise.

This research paper explores how to use large language models (LLMs), like ChatGPT, to create realistic simulations of human behavior. Imagine a virtual world where AI agents act like real people, making decisions and interacting with each other.

The paper focuses on a platform called AgentScope, which is designed to make these simulations easier and more efficient. The authors highlight three key challenges:

  1. Scalability: Simulating a large number of agents (think thousands or millions) is computationally demanding.
  2. Diversity: Agents need to be unique, with different backgrounds, personalities, and motivations.
  3. Management: Keeping track of and controlling so many agents can be overwhelming.

To address these challenges, the paper proposes several improvements to AgentScope:

  • Distributed Mechanism: This allows agents to run on multiple computers simultaneously, making simulations much faster.
  • Agent-Environment Interactions: Agents can interact with a virtual environment, like a city or a social network, making the simulations more dynamic.
  • Heterogeneous Configurations: Users can easily create agents with diverse backgrounds and personalities.
  • Management Server: This provides a user-friendly interface to monitor and control the agents.

The paper then demonstrates these improvements through a series of simulations of a classic game called "guess 2/3 of the average." The results show that AgentScope can successfully simulate a large number of agents with diverse behaviors, and that the agents can learn and adapt their strategies over time.

Overall, this research shows the potential of LLMs to create more realistic and complex simulations, which could be used for a variety of purposes, such as studying social dynamics, testing new technologies, or even creating immersive entertainment experiences.

▲ Tell me more about this "guess 2/3 of the average" game

The "guess 2/3 of the average" game is a classic example of a game theory problem that demonstrates how individual rationality can lead to counterintuitive collective outcomes. Here's how it works:

The Rules:

  1. Players: A group of people (or in this case, AI agents) participate.
  2. Guesses: Each player secretly chooses a number between 0 and 100.
  3. Average: The average of all the numbers is calculated.
  4. Winner: The player who guesses the closest to 2/3 of the average wins.

The Logic:

  • Intuitive Strategy: A player might initially think, "If everyone guesses 100, the average will be 100, and 2/3 of that is 66.67. So I should guess 66.67."
  • Iterative Reasoning: However, if everyone thinks this way, the average will be around 66.67, and 2/3 of that is about 44.44. Players will then adjust their guesses downward. This process continues, with each player trying to anticipate the average and guessing lower and lower (see the sketch after this list).
  • Nash Equilibrium: The game eventually reaches a point where everyone guesses 0. This is called the Nash Equilibrium, where no player can improve their outcome by changing their strategy, assuming everyone else keeps theirs the same.
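To make the iterative reasoning concrete, here is a minimal Python sketch (illustrative only) that starts from the naive anchor of 100 and repeatedly applies the 2/3 factor; the guesses shrink geometrically toward the Nash Equilibrium at 0:

```python
# Iterated reasoning in "guess 2/3 of the average":
# each level of reasoning multiplies the previous best guess by 2/3,
# so guesses shrink geometrically toward the Nash Equilibrium at 0.
guess = 100.0  # naive anchor: assume everyone picks the maximum
for level in range(1, 6):
    guess *= 2 / 3
    print(f"level {level}: guess ~= {guess:.2f}")
# level 1: guess ~= 66.67
# level 2: guess ~= 44.44
# level 3: guess ~= 29.63
# level 4: guess ~= 19.75
# level 5: guess ~= 13.17
```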

Why It's Interesting:

  • Individual vs. Collective: While it's rational for each player to guess lower and lower, the collective outcome is that guesses spiral downward until the average converges to 0, leaving no single player with an advantage.
  • Bounded Rationality: The game highlights the concept of bounded rationality, where people are not always perfectly rational and may not be able to fully anticipate the actions of others.
  • Social Dynamics: The game can be used to model social phenomena, such as price bubbles or herd behavior, where individual actions can lead to unexpected collective outcomes.

In the context of the research paper:

The authors use this game to test the capabilities of their AgentScope platform. They want to see if AI agents can learn to play the game strategically, understand the iterative reasoning process, and eventually converge to the Nash Equilibrium. The results show that the agents are indeed capable of this, demonstrating the potential of LLMs for simulating complex social dynamics.


² Corresponding authors.

Very Large-Scale Multi-Agent Simulation in AgentScope

Xuchen Pan (Alibaba Group), Dawei Gao (Alibaba Group), Yuexiang Xie (Alibaba Group), Zhewei Wei (Renmin University of China), Yaliang Li (Alibaba Group), Bolin Ding (Alibaba Group), Ji-Rong Wen (Renmin University of China), Jingren Zhou (Alibaba Group)

Abstract

Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfactory agent diversity, and effort-intensive management processes. To address these challenges, we develop several new features and components for AgentScope, a user-friendly multi-agent platform, enhancing its convenience and flexibility for supporting very large-scale multi-agent simulations. Specifically, we propose an actor-based distributed mechanism as the underlying technological infrastructure towards great scalability and high efficiency, and provide flexible environment support for simulating various real-world scenarios, which enables parallel execution of multiple agents, centralized workflow orchestration, and both inter-agent and agent-environment interactions among agents. Moreover, we integrate an easy-to-use configurable tool and an automatic background generation pipeline in AgentScope, simplifying the process of creating agents with diverse yet detailed background settings. Last but not least, we provide a web-based interface for conveniently monitoring and managing a large number of agents that might be deployed across multiple devices. We conduct a comprehensive simulation to demonstrate the effectiveness of the proposed enhancements in AgentScope, and provide detailed observations and discussions to highlight the great potential of applying multi-agent systems in large-scale simulations. The source code is released on GitHub (https://github.com/modelscope/agentscope) to inspire further research and development in large-scale multi-agent simulations.

1 Introduction

Large language models (LLMs), such as GPT-4 (OpenAI, 2023), Claude3.5 (ANTHROP, 2024), Qwen2 (Yang et al., 2024), Llama3 (Meta, 2024), and so on (Anil et al., 2023; GLM et al., 2024; MistralAI, 2024; Yang et al., 2023), notable for their vast number of parameters and extensive training on diverse large-scale datasets, demonstrate remarkable capabilities in understanding, generating, and interacting with human language. Recent advancements in LLMs have sparked a revolution in natural language processing and relevant fields, paving the way for novel applications that were previously inconceivable.

Building on the capabilities of LLMs, there is a growing interest in the development of intelligent agents that are empowered to resolve practical tasks (Hong et al., 2023; Ren et al., 2024). As the scope of these intelligent agents spans a wide array of applications, their potential to redefine the landscape of simulations becomes increasingly evident (Matsumoto et al., 2024; Sorokovikova et al., 2024; Sreedhar and Chilton, 2024; Yue et al., 2024). Traditional simulations heavily rely on predefined rules and sophisticated mechanisms to generate simulated scenarios, necessitating substantial expertise and human intervention (Macal and North, 2010). With the incorporation of LLM-empowered agents, simulations are expected to become more interactive, adaptive, and realistic, while requiring substantially fewer human efforts.

Recently, several platforms (Hong et al., 2023; Wu et al., 2023; Team, 2023) have been proposed to streamline the development of multi-agent systems, providing some fundamental functionalities including unified LLM services, various tools, and advanced reasoning algorithms. Despite significant progress, we identify several challenges in conducting simulations with multi-agent platforms, particularly when the number of agents becomes extremely large. We summarize these challenges below.

(i) Scalability and Efficiency Limitations. The scale of involved agents can be critical when conducting certain simulations, since simulations at a small scale run the risk of inaccurately representing real-world complexities, making simulations less realistic and reliable (Macal and North, 2010; Macal, 2016). However, increasing the scale of agents brings challenges to the simulation platform in terms of scalability and efficiency. Specifically, it is non-trivial to efficiently organize agents to execute their tasks and communications following an appropriate order, with the aim of reducing the running time while ensuring accurate results. Moreover, the simulation platform should be capable of handling high-frequency access to support both inter-agent and agent-environment interactions in large-scale agent-based simulations.

(ii) Unsatisfactory Population Distributions and Agent Diversity. For a large-scale simulation, it is essential that the involved agents exhibit diverse behaviors while generally following a specific population distribution (Gao et al., 2023; Ren et al., 2024). Assigning agents simple backgrounds may result in a significant number of highly homogenized agents, making it difficult to derive meaningful insights. Besides, existing studies rarely consider how to specify population distributions of agents from various perspectives, such as age, education, and occupation, which reduces the realism of the simulations.

(iii) Difficult Management Processes. As the scale of agents increases, it becomes rather effort-intensive to manage the simulations, including initialization, execution, and termination of a large number of agents spread across multiple devices, as well as monitoring their status, behaviors, and interactions (Mou et al., 2024). Such management difficulties make it challenging to promptly identify valuable group-level and individual-level behaviors, which can further hinder the discovery of critical insights for optimizing simulations and advancing research. Therefore, an easy-to-use tool for managing large-scale agents is a necessary functionality that should be provided by agent-based simulation platforms.

To tackle these challenges, we adopt a user-friendly multi-agent platform, named AgentScope (Gao et al., 2024), as the foundation framework to provide the basic functionalities, and further develop several new features and components upon it to improve its usability, convenience, and flexibility for supporting very large-scale multi-agent simulations.

To be specific, we propose a distributed mechanism based on the actor model (Agha, 1985), featuring automatic parallel execution and centralized workflow orchestration to provide great scalability and high efficiency for multi-agent-based simulations. The proposed actor-based distributed mechanism enables us to further expand the scale of agents in the simulation with a limited number of devices, and provides a near-linear reduction in running time as devices are added. We support both inter-agent and agent-environment interactions in the simulations. Agents can proactively communicate with other agents and the environment, and can respond to messages from other agents and changes in the environment. Furthermore, a multi-layer environment structure is proposed to support group-wise information synchronization, enhancing the flexibility of AgentScope for simulating various real-world scenarios.

Besides, to satisfy the requirements of population distribution and agent diversity, we integrate a configuration tool in AgentScope and provide automatic background generation. Users only need to specify the distributions of the population along several aspects, and a large number of agents with detailed and diverse characteristics can be generated accordingly. These agents can be managed and monitored conveniently through AgentScope-Manager, a proposed module for simplifying the organization and observation of large-scale agent-based simulations. Using a web-based visual interface, AgentScope-Manager provides a comprehensive overview of all agents across multiple devices, allowing users to efficiently configure, launch, and terminate these agents.

With such an agent-based simulation platform, we conduct a comprehensive simulation on the classic “guess 2/3 of the average” game (Nagel, 1995; Camerer et al., 2004) to demonstrate the improvements and advances brought by the infrastructure introduced above. Firstly, we conduct agent-based simulations involving 1 million agents using only 4 devices, showing the scalability and efficiency of the platform. Then, we incorporate agents using different LLMs of different sizes, equipped with different prompts and diverse background settings, resulting in various and realistic behaviors in the simulations. We provide comprehensive observations on both collective and individual behaviors, drawing meaningful and valuable insights from a series of simulation experiments, along with further discussions on helpful tips and open questions. These experimental results confirm the feasibility and great potential of conducting large-scale agent-based simulations in AgentScope. We have released the source code at https://github.com/modelscope/agentscope for future research.

2 Related Works

LLM-Empowered Agent Platforms

With the advances of LLMs, a significant number of agent platforms have been developed to integrate LLMs into real-world applications and assist humans in problem-solving (Wu et al., 2023; Hong et al., 2023; Li et al., 2023b; Significant-Gravitas, 2023; Team, 2023; Gao et al., 2024). These platforms can be categorized into single-agent platforms and multi-agent platforms. The single-agent platforms include AutoGPT (Significant-Gravitas, 2023), LangChain (langchain ai, 2024a), ModelScope-Agent (Li et al., 2023a), and Transformers Agents (Wolf et al., 2020), which are proposed to resolve practical tasks using LLMs. On the other hand, multi-agent platforms like MetaGPT (Hong et al., 2023), AutoGen (Wu et al., 2023), CAMEL (Li et al., 2023b), and LangSmith (langchain ai, 2024b) employ multi-agent collaboration to tackle more complex challenges, including software programming (Hong et al., 2023; Qian et al., 2023), data science (Hong et al., 2024), social simulation (Park et al., 2023), game-playing (Gao et al., 2024), etc. Although remarkable progress has been made, applications built on these platforms can currently be limited in the scale of agents and suffer from low efficiency, hindering their potential for large-scale simulations. To address these limitations, we enhance AgentScope with new features and components to support very large-scale agent-based applications and improve the running efficiency of these applications.

Agent-Based Simulation Frameworks

Due to the ability of LLMs to imitate human behaviors, agent-based simulation has become an attractive topic in the research community (Gürcan, 2024; Matsumoto et al., 2024; Sorokovikova et al., 2024; Sreedhar and Chilton, 2024; Ye and Gao, 2024; Team et al., 2024; Park et al., 2022). Previous studies have explored the integration of LLMs in various fields, including education (Yue et al., 2024), economics (Matsumoto et al., 2024), societal studies (Park et al., 2023; Ye and Gao, 2024; Gao et al., 2023; Ren et al., 2024), transportation (Jin et al., 2023), healthcare (Zhang et al., 2023), etc. Recently, researchers have built several LLM-based or agent-based simulation frameworks. For instance, Vidur (Agrawal et al., 2024) is a simulation framework that focuses on providing high-throughput LLM services, the framework of Ataei et al. (2024) is proposed for design requirement elicitation, that of Cheng et al. (2023) is developed to evaluate the level of caricature, and that of Ren et al. (2024) is designed to simulate the behaviors of web search users. However, these existing frameworks are domain-specific, making it challenging for users to conduct large-scale agent-based simulations for a wide variety of applications. To tackle this, we build a multi-agent-based simulation platform upon AgentScope and provide easy-to-use configurable tools to ease the heavy workload associated with conducting various large-scale simulations.

3 Infrastructure

To provide the basic functionalities required for conducting agent-based simulations, including LLM services, memory management, and agent interactions, we adopt AgentScope, a user-friendly multi-agent platform designed for flexible Standard Operating Procedure (SOP) tasks, as our foundation framework. We further develop several new features and components, making it more convenient and feasible to support very large-scale simulations involving multiple agents.

Specifically, we first design an actor-based distributed mechanism (Sec. 3.1) that serves as the underlying technological infrastructure for conducting large-scale simulations, providing great scalability and high efficiency. Building upon such an infrastructure, we enable both inter-agent interactions and agent-environment interactions (Sec. 3.2) to facilitate multi-agent simulations, which forms the core components that drive the simulated dynamics. To improve the diversity of agents involved in the simulations, we allow users to set heterogeneous configurations for agents by specifying their population distributions and detailed background settings (Sec. 3.3). Furthermore, we build a graphical user interface to monitor and manage the distributed agents on different devices, making it easy to observe and organize large-scale agents in the simulations (Sec. 3.4). In the following subsections, we elaborate on the details of these proposed enhancements.

3.1 Actor-based Distributed Mechanism

The actor model is a mathematical model of concurrent computation, where each actor acts as a basic computing unit, receives messages, and computes independently (Agha, 1985). Based on the actor model, we build a distributed mechanism to provide great scalability and high efficiency for agent-based simulation. The proposed actor-based distributed mechanism serves as the basic technological infrastructure, featuring automatic parallel execution and centralized workflow orchestration.

Automatic Parallel Execution

In a multi-agent simulation, the interactions between agents follow an atomized pattern, where interactions occur within small isolated cliques (Matsumoto et al., 2024; Sorokovikova et al., 2024). Such a pattern holds significant potential for parallelization, leading to substantial gains in efficiency. However, achieving parallelization is non-trivial as it requires the formal definition of interactions, the identification of potential parallelizable computations, and subsequent optimization.

To provide automatic parallelization in AgentScope, we first format the inter-agent interactions in the simulation as a communication graph, where each vertex v represents an agent, and each directed edge e represents a message passing from one agent to another. Note that the communication graph can be a directed cyclic graph that might be highly dynamic and uncertain, which differs from the concept of a computational graph in machine learning (Ansel et al., 2024) and thus makes providing automatic parallelization much more challenging.

With the communication graph, we can dynamically identify the agents that are ready to be executed, which indicates that they do not depend on the outputs of others or that all their dependencies have been fulfilled. These agents are executed automatically in parallel, utilizing the maximum available resources. As they finish one by one, more blocked agents become active and begin their executions.

Such a design of automatic parallel execution is well-suited to the actor-based distributed mechanism. Note that we employ the multi-process mode rather than asynchronous programming to further improve efficiency, wherein each agent runs in a separate process and is treated as an independent node in the communication graph. As a result, an agent’s internal computations are triggered only when it receives the required messages. In this way, each agent relies only on those agents from which it receives the necessary messages, and the communication graph automatically divides into independent subgraphs based on these dependencies. With these designs and implementations in AgentScope, users only need to specify the message-passing paths among agents, and the multi-agent simulation is executed in parallel automatically, without additional effort.

An example of automatic parallel execution is shown in Fig. 1, where agent-B and agent-C are independent of each other but rely on the messages from agent-A, allowing them to be executed in parallel once the execution of agent-A is completed. In contrast, agent-D and agent-E cannot be executed in parallel, as agent-E depends on messages from both agent-B and agent-D.
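The following is a simplified sketch of this scheduling idea (not AgentScope's actual implementation), using the Fig. 1 topology with a thread pool standing in for distributed agent processes; treating agent-D as having no dependencies is our assumption for illustration:

```python
import concurrent.futures

# Dependency graph matching Fig. 1: B and C depend on A; E depends on
# B and D. (A and D have no dependencies here.)
deps = {"A": [], "B": ["A"], "C": ["A"], "D": [], "E": ["B", "D"]}

def run_agent(name, inputs):
    # Stand-in for an agent's computation; a real agent would call an LLM.
    return f"{name}({','.join(inputs)})"

results, pending = {}, dict(deps)
with concurrent.futures.ThreadPoolExecutor() as pool:
    while pending:
        # Agents whose dependencies are all fulfilled can run in parallel.
        ready = [a for a, d in pending.items() if all(x in results for x in d)]
        futures = {a: pool.submit(run_agent, a, [results[x] for x in pending[a]])
                   for a in ready}
        for a, fut in futures.items():
            results[a] = fut.result()
            del pending[a]

print(results["E"])  # -> "E(B(A()),D())"
```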

Centralized Workflow Orchestration

In addition to the proposed automatic parallel execution, the workload associated with simulation orchestration is also important for supporting very large-scale simulations, as users need to set up and manage distributed agents across multiple devices. These requirements might not be well satisfied when there is a lack of a complete and explicit view of the entire workflow during the development and execution phases.

To resolve these issues, we further enhance the proposed actor-based distributed mechanism by allowing users to orchestrate the simulation explicitly and centrally. For simplicity, we refer to the central process that runs the simulation as the center. For each distributed agent, we adopt a proxy in the center that serves as the substitute for it, hiding the remote calculation from users and being responsible for invoking functions and receiving messages from the corresponding distributed agent. With the proxy, users can specify the communications between distributed agents in the center.

However, when implementing such centralized workflow orchestration, a question arises: how can sequential execution at the center be compatible with automatic parallel execution? To tackle this, we introduce the concept of a placeholder, which ensures that the workflow execution in the center is not blocked by the calculations in distributed agents. In the central process, once an agent proxy receives a message, it immediately returns a placeholder and forwards the message to its corresponding distributed agent. When a distributed agent receives a message, it first checks whether the message contains a placeholder to decide whether it needs to request and wait for the actual value from the agents indicated in the placeholder. Agents execute their inner computations as soon as all placeholders have been resolved into actual values.

Fig. 2 shows how the placeholder works. In the example, agent-A receives a greeting message x and sends a response message y to agent-B. After that, agent-B replies with a message z when it receives y. From the computational perspective, the center sends message x to agent-A, provides the placeholder of y to agent-B, and requests the real value of z to construct the workflow. Upon receiving the placeholder of y, agent-B sends a request and waits for agent-A to finish calculating y. When agent-B receives y, it begins to calculate z and return it to the center. Note that the construction of the workflow graph (the green part) and the computation (the blue part) start at the same time, while the construction of the workflow graph is instantaneous. In this way, independent distributed agents can run in parallel.
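A minimal sketch of the placeholder idea, using Python futures as stand-ins; AgentScope's actual placeholders carry routing information for RPC rather than local thread handles:

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor()

def resolve(msg):
    # A distributed agent resolves a placeholder only at the point where
    # the actual value is needed, blocking just for that dependency.
    return msg.result() if isinstance(msg, Future) else msg

def agent_a(x):
    return f"y[reply to {resolve(x)}]"

def agent_b(y):
    return f"z[reply to {resolve(y)}]"

# The center orchestrates sequentially but never blocks: each submit()
# immediately returns a placeholder (here, a Future) while the agents'
# computations proceed in the background.
x = "greeting"
y = pool.submit(agent_a, x)  # placeholder for y, returned at once
z = pool.submit(agent_b, y)  # y's placeholder is forwarded to agent-B
print(z.result())            # block only when the real value of z is needed
# -> z[reply to y[reply to greeting]]
```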

In summary, enhanced by the proposed actor-based distributed mechanism, AgentScope offers high efficiency in inter-agent communications, enables parallel execution of multi-agents, and allows centralized workflow orchestration to simplify usage.

3.2 Agent-Environment Interactions

Alongside the inter-agent communication described above, agent-environment interactions are crucial in simulations, allowing agents to access the environment, respond to changes in it, and alter it if needed. Considering the large number of agents, the agent-environment interactions should be capable of handling high-frequency access. For example, in a social simulation like AI town (Park et al., 2023), the environment includes a timeline and a map with interactive items on it. Agents might frequently check and interact with the items on the map, and changes to these items should also trigger reactions from agents.

To satisfy such requirements, we abstract the environment operations into registering, querying, updating, removing, and monitoring by providing a base class, which can be adapted to various underlying storage backends, including key-value stores, relational databases, NoSQL databases, and so on. We provide two dimensions for users to perform agent-environment interactions, i.e., timeline and location. For the timeline, users can set specific triggers to make agents access the global time and adjust their behaviors accordingly. For the location, the environment serves as a map maintaining the locations of agents and providing hook functions to trigger interactions with agents or items nearby. Such a design provides flexible environment support for simulating various real-world scenarios.
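A hypothetical sketch of such an environment base class; the method names mirror the five operations above but are illustrative, not AgentScope's actual API:

```python
from abc import ABC, abstractmethod
from typing import Any, Callable

class BaseEnv(ABC):
    """Abstract environment interface; concrete subclasses may back
    these operations with key-value stores, relational databases,
    NoSQL databases, and so on."""

    @abstractmethod
    def register(self, key: str, value: Any) -> None: ...

    @abstractmethod
    def query(self, key: str) -> Any: ...

    @abstractmethod
    def update(self, key: str, value: Any) -> None: ...

    @abstractmethod
    def remove(self, key: str) -> None: ...

    @abstractmethod
    def monitor(self, key: str, hook: Callable[[Any], None]) -> None:
        """Invoke `hook` whenever `key` changes, so that changes in the
        environment can trigger the reactions of nearby agents."""
```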

We implement the environment module as a distributed node, which communicates with agents through Remote Procedure Calls (RPC) to allow high-frequency access. When receiving requests from agents, the environment first categorizes these requests into different queues according to the data they access, and then processes the queues simultaneously to avoid data conflicts. Users can set the maximum concurrency based on available hardware resources.

Furthermore, we provide a multi-layer environment structure to support group-wise information synchronization in the simulation, as illustrated in Fig. 3. Different environments can be established for different groups of agents to provide interactive items and shared information. Upon these environments, a global environment can be set up for global synchronization. Such a multi-layer environment structure can be employed in simulations that necessitate intra-group collaborations and inter-group information differentiation. A use case of the multi-layer environment can be found in Sec. 4.6.

3.3 Heterogeneous Configurations

In a simulation, agents are expected to act as humans with diverse backgrounds, including different ages, genders, careers, nationalities, education, experiences, etc. An intuitive approach is to add these background settings of agents in their system prompts, providing guidance for agents on the roles to play and actions to take. However, for large-scale simulations, providing diverse, heterogeneous, and reasonable background settings for agents can be laborious and time-consuming, especially when precise control of different population distributions is required in certain simulations. This problem motivates us to provide tools in AgentScope to assist users in effortlessly setting up large-scale agents with diverse background settings.

Configurable Tool

Specifically, users can begin by defining the total population of the simulation, and then specify the distributions of the population from various perspectives. We provide some widely-used distribution templates in AgentScope for convenient usage, from the aspects of age, gender, occupation, nationality, and education. Besides, the proposed configurable tool allows for easy extension of new aspects, enhancing its flexibility to meet diverse requirements. Listing 1 in Appendix A shows an example configuration file for a group of people with different educational levels, in which a distribution can be specified by the proportions of its different components.
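For illustration, a configuration along these lines might look as follows; the field names are assumptions on our part, since the paper's actual format appears in Listing 1 of its appendix:

```python
# Hypothetical population configuration; treat these field names as
# placeholders rather than AgentScope's real schema.
population_config = {
    "total_population": 1000,
    "distributions": {
        "education": {           # a distribution is given by proportions
            "elementary school": 0.10,
            "high school": 0.30,
            "bachelor": 0.35,
            "master": 0.15,
            "ph.d.": 0.10,
        },
    },
}
```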

Automatic Background Generation Pipeline

After configurations have been provided via the above tool, more detailed and heterogeneous background settings can be automatically generated to instantiate the agents. Specifically, when users start a simulation, we draw specific values from the distributions based on the configurations, convert them into a JSON format, and fill them into a meta prompt to produce the completed instructions for background generation tasks. These instructions are utilized by LLMs to generate heterogeneous background settings. To introduce more diversity, the generation process involves adjusting the random seed and the temperature used by LLMs. Several examples of the generated background settings can be found in Sec. 4.5, along with the results and analysis of the simulations involving diverse agents. The adopted meta prompt can be found in Appendix B.
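A minimal sketch of this pipeline under the same assumed configuration format; the meta prompt here is a stand-in for the paper's actual template in Appendix B:

```python
import json
import random

# Assumed distribution format (see the configuration sketch above).
distributions = {
    "education": {"high school": 0.3, "bachelor": 0.4, "ph.d.": 0.3},
    "occupation": {"artist": 0.5, "economist": 0.5},
}
META_PROMPT = ("Generate a detailed character background for a person "
               "with the following attributes:\n{attributes}")

def sample_profile():
    # Draw one value per aspect according to the configured proportions.
    return {aspect: random.choices(list(dist), weights=list(dist.values()))[0]
            for aspect, dist in distributions.items()}

instruction = META_PROMPT.format(attributes=json.dumps(sample_profile()))
# `instruction` is then sent to an LLM (with varied random seed and
# temperature) to generate one agent's detailed background setting.
print(instruction)
```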

In summary, by providing the easy-to-use configurable tool and the automatic background generation pipeline in AgentScope, we simplify the configuration process for conducting large-scale agent-based simulations, and further promote its research and applications.

3.4 Management for Large-scale Agents

In a simulation, users need to manage and monitor a large number of agents distributed across different devices, which might become intractable to handle manually as the scale and complexity of the simulation increase. To tackle this, we introduce AgentScope-Manager, a module for advanced agent management and monitoring. Specifically, when users start a simulation, servers are first launched on all the remote devices, providing resident services to remotely create, monitor, and stop distributed agents. These servers are responsible for managing the lifecycle of distributed agents and synchronizing their information to a web-based visual interface, as illustrated in Fig. 4. The web-based visual interface provides a comprehensive overview of all registered servers and all deployed agents on different devices, from which users can view each server’s identity, IP address, running status, and utilization of computing resources.

The AgentScope-Manager also simplifies the management and monitoring processes for conducting multiple simulations. Since the servers can be reused in different simulations, users don’t need to restart the distributed servers between two simulations. Users can efficiently configure, launch, and terminate servers and agents during the simulations as needed. With such a design, we streamline the management process by focusing on servers rather than individual agents, thereby improving the efficiency and effectiveness of managing large-scale agent systems in AgentScope.

In a nutshell, based on AgentScope, we implement an actor-based distributed mechanism that serves as the underlying technological infrastructure, which is well-designed for both inter-agent and agent-environment interactions, and provides great scalability and high efficiency in conducting large-scale agent-based simulations. Building on this infrastructure, we provide heterogeneous configurations and the management server, enhancing the diversity of agents and simplifying the observation and organization of the simulation process.

4 Experiments

In this section, we conduct large-scale simulations to show the improvements and advances brought by the proposed infrastructure and components in AgentScope. Meanwhile, we provide detailed observations and in-depth discussions on the agents’ collective and individual behaviors, drawing valuable insights.

4.1 Settings

We set up a large number of agents to participate in the classic game guess the 2/3 of the average, where each agent reports a real number between 0 and 100 and the agent who reports a number closest to 2/3 of the average of all the reported numbers wins the game. In this game, intuitively the highest possible average is 100. Therefore, to win the game, agents tend to report a number no larger than 100 × 2/3 = 66 2/3. Once all agents adopt this strategy, 66 2/3 becomes the new highest possible average, and thus they tend to report a number no larger than 66 2/3 × 2/3 = 44 4/9. This process continues until the average becomes 0 and all agents report 0, indicating that the game has reached its Nash equilibrium. However, considering that agents may not always be rational, those agents who report 0 cannot always win the game, since the average does not converge to 0 immediately. Agents should carefully take into account the possible actions of others before reporting their numbers. Meanwhile, agents can adjust their strategies in a multi-round game according to the average reported numbers in previous rounds.

Note that all the experiments in this section follow the aforementioned settings. With this game, we aim to demonstrate the capabilities of AgentScope in supporting large-scale agent-based simulations, and show how agents perform considerations and reasoning concerning their system prompts, background settings, and other information obtained in the simulations.

Devices & LLMs

The experiments are conducted on a cluster containing multiple devices, each of which is equipped with 8 A100-80G GPUs, a 64-core CPU, and 1 TB of memory. We adopt vLLM (Kwon et al., 2023) as the LLM inference engine to handle highly concurrent service requests. We utilize six powerful and popular open-source LLMs of different sizes. We adopt their instruction versions due to their enhanced ability to follow instructions. The details of the adopted LLMs are provided below:

Llama3-8B / Llama3-70B (Meta, 2024): A series of open-source LLMs developed by Meta, which have been pre-trained and fine-tuned on a massive corpus.

Qwen2-7B / Qwen2-72B (Yang et al., 2024): The second generation of Qwen open-source LLMs, developed by Alibaba.

MistralAI-8x7B / MistralAI-8x22B (MistralAI, 2024): The open-source mixture-of-experts (MoE) LLMs released by MistralAI, where each MoE LLM consists of eight 7B/22B expert models.

Due to the limited GPU memory, unless otherwise specified, we deploy eight Qwen2-7B / Llama3-8B models, two Qwen2-72B / Llama3-70B / MistralAI-8x7B models, or one MistralAI-8x22B model on each device. The generation temperature for all LLMs is set to 1.0 to promote the diversity of responses. Besides, to prevent errors in response format, each agent executes two LLM server calls in every game round. The first call is used to generate the response, including its thought process and the reported number, while the second call is made to extract the reported number correctly.
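As an illustration of this two-call pattern, the sketch below assumes vLLM is serving an OpenAI-compatible endpoint; the URL, model name, and extraction prompt are placeholders of our own, not the paper's exact prompts:

```python
from openai import OpenAI

# Placeholder endpoint and model name for a local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen2-7B-Instruct"

def play_round(system_prompt: str) -> float:
    # Call 1: generate the full response (thought process + number).
    response = client.chat.completions.create(
        model=MODEL, temperature=1.0,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": "Report your number."}],
    ).choices[0].message.content
    # Call 2: extract the reported number from the free-form response.
    extracted = client.chat.completions.create(
        model=MODEL, temperature=0.0,
        messages=[{"role": "user", "content":
                   "Extract only the final reported number from this "
                   f"response, with no other text:\n{response}"}],
    ).choices[0].message.content
    return float(extracted)
```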

System Prompts

We provide system prompts for agents to guide them in defining their dialogue style, background knowledge, task requirements, and so on. To be more specific, for playing this game, the system prompt incorporates the game rules and response formats, as illustrated in Prompt 3.4. Besides, we can include further behavioral guidance in the system prompts to encourage behaviors that more closely resemble those of real human beings. For example, inspired by “chain-of-thought” studies (Wei et al., 2022; Wang et al., 2023), we ask agents to think step by step before reporting their numbers, producing the system prompt shown in Prompt 3.4.

Further explorations of system prompts, providing detailed instructions to enhance the performance and diversity of agents, can be found in Section 4.4.

4.2 Scalability and Efficiency

First of all, we conduct a series of experiments to show the scalability and efficiency of the agent-based simulations supported by the proposed actor-based distributed mechanism (see Sec. 3.1). Specifically, we illustrate how the overall simulation running time changes as the number of participating agents grows when using LLMs of different sizes, including Llama3-8B and Llama3-70B. In addition to the model sizes, the system prompt provided to agents is also a factor that can influence the running time, since some prompts (e.g., Prompt 3.4) may encourage agents to generate longer responses and thereby lead to longer response time. From the experimental results shown in Fig. 5, we can obtain the following observations and insights.

(i) We support an agent-based simulation involving 1 million agents, which can be completed in 12 minutes using 4 devices. In Fig. 5(a), we fix the device number to 4 and record the simulation running time as the number of agents grows from 100 to 1M. It can be observed that the simulation involving 1 million agents finishes in 12 minutes when using Llama3-8B with Prompt 3.4, while it takes 85 minutes if we choose Prompt 3.4, as the number of averaged response tokens grows by more than 150-fold (the numbers of response tokens when using different LLMs and system prompts are summarized in Fig. 16 in Appendix C). For the heaviest inference workload, i.e., when agents adopt Llama3-70B and Prompt 3.4, it takes around 10.6 hours to complete the simulation.

(ii) The proposed actor-based distributed mechanism significantly improves the efficiency of large-scale agent-based simulations. To better demonstrate the improvements brought by the proposed actor-based distributed mechanism, we adopt a dummy model request (i.e., agents sleep for 1 second and generate random numbers rather than posting the requests) in the simulation to remove the impact of the LLM inference speed. The experimental results summarized in Fig. 5(b) show that, completing an agent-based simulation with the proposed actor-based distributed mechanism involving 1 million agents only takes 40 seconds, whereas simulations using serial execution or asynchronous mode in Python (adopted by existing works (Wu et al., 2023; Hong et al., 2023)) require around 12 days and 8.6 hours, respectively.

(iii) Increasing the number of devices can proportionally reduce the simulation running time. As shown in Fig. 5(c), we maintain the number of agents at 10,000 and vary the number of devices used in the simulation. For Llama3-70B with Prompt 3.4, the simulation running time decreases from 22 minutes to 5.6 minutes as the number of devices increases from 1 to 4. Such a phenomenon can be attributed to a reduction in the number of agents served within one device. As a comparison, we increase the number of devices from 1 to 4, and deploy 10,000 agents on each device, respectively. As illustrated in Fig. 5(d), the running time remains nearly the same as the number of devices and agents increases, which demonstrates the horizontal scalability of AgentScope.

In summary, the proposed actor-based distributed mechanism in AgentScope enhances the efficiency in conducting very large-scale agent-based simulations, and offers great scalability by allowing users to expand the scale of agents from the addition of devices.

4.3 Simulation Results and Analysis

In this section, we present simulations conducted with six LLMs and two system prompts, and analyze the results in detail. We summarize the experimental results in Fig. 6, from which we derive the insights below, and provide more detailed results (e.g., distributions and statistics of the reported numbers) and individual-level observations in Appendix D and Appendix E.1, respectively.

From the comparisons in the figures, we observe that when utilizing the basic system prompt (Prompt 3.4) with most LLMs, agents generally tend to report numbers around 50. However, it is worth noting that agents with MistralAI-8x7B and MistralAI-8x22B report smaller numbers (36.63 and 31.69 on average, respectively) than other agents. These results indicate that without specific instructions in system prompts, the performance of agents can differ depending on the LLMs they adopt, influenced by factors such as model sizes and model architectures.

When we change the system prompt to a chain-of-thought prompt (i.e., Prompt 3.4), the numbers reported by agents move markedly toward zero, with the winning numbers (i.e., 2/3 of the average) being much smaller than those using Prompt 3.4, decreasing from 33.70 to 12.76. Meanwhile, we notice that more than 30% of agents using MistralAI-8x22B report around 0, reaching the Nash equilibrium of this game. These observations demonstrate the effectiveness and importance of providing suitable system prompts to guide agents through their thought processes.

Going a step further, we expand the simulation to multiple rounds. We inform agents of the winning number of the previous round at the beginning of each round except the first, and request each agent to report a number in every round. Informing agents of these winning numbers enables them to adjust their strategies accordingly. Such a process is implemented based on the agent-environment interaction mechanism in AgentScope (see Sec. 3.2), with the winning number being set as an interactive item in the environment.
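A simplified sketch of this multi-round loop; the agents' report() interface is hypothetical, and a plain dict stands in for the shared environment node:

```python
def run_game(agents, num_rounds=5, ratio=2 / 3):
    env = {}  # stands in for the shared environment node
    for round_id in range(1, num_rounds + 1):
        hint = env.get("winning_number")  # None in the first round
        reports = [agent.report(previous_winner=hint) for agent in agents]
        winning = ratio * sum(reports) / len(reports)
        env["winning_number"] = winning  # interactive item in the environment
        print(f"round {round_id}: winning number = {winning:.2f}")
```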

The simulation results are demonstrated in Fig. 7. From these results, we can observe that as the game progresses from round to round, the reported numbers of agents gradually converge to 0, indicating that agents have a good understanding of this game and are capable of considering other agents’ behaviors and making rational decisions. Some case studies on the behaviors of agents are summarized in Appendix E.2. Similarly, the chain-of-thought prompts can accelerate the game to reach its Nash equilibrium. For example, in the fifth round, the average reported number of agents using Qwen2-72B with Prompt 3.4 is 2.02, which is significantly smaller than that of those using Prompt 3.4, who report 25.16.

It is worth noting that these experimental results are consistent with previous studies (Nagel, 1995; Camerer et al., 2004) in social simulation, which confirms the reliability and significant potential of multi-agent-based simulations. In the following subsections, we continue to explore how different configurations affect the simulation results, such as system prompts, background settings, mixtures of LLMs, and so on.

4.4 Detailed Instructions in System Prompts

To further explore the impact of behavioral guidance on agents, we incorporate more detailed instructions tailored for this game in the system prompts. Specifically, we remind agents that all their competitors are rational and will try to adjust their reported numbers by analyzing others’ strategies, resulting in Prompt 4.3 and Prompt 4.3, respectively. By adding such behavioral guidance to the system prompts, we expect agents to engage in more thoughtful and diverse considerations before reporting their numbers, thereby making simulations more practical, meaningful, and interesting.

The comparisons among different system prompts are illustrated in Fig. 8. In general, from the figure we can observe that the reported numbers are closer to 0 when using Prompt 4.3 and Prompt 4.3 than those of Prompt 3.4 and Prompt 3.4. These experimental results indicate that detailed instructions are more effective than general guidance (e.g., “think step by step”) in encouraging agents to perform thoughtful considerations and take rational actions. Several case studies shown in Appendix E.3 confirm the improvements brought by adding detailed instructions in the system prompts.

Furthermore, in the multi-round game illustrated in Fig. 7, agents using Prompt 4.3 and Prompt 4.3 converge to the Nash equilibrium faster than those using Prompt 3.4 and Prompt 3.4. For example, agents with Qwen2-72B report 35.30, 6.11, 1.55, and 1.69 on average in the third round when using Prompt 3.4, Prompt 3.4, Prompt 4.3, and Prompt 4.3, respectively, while in the fifth round, the averages of the reported numbers become 25.16, 2.02, 0.14, and 0.15.

It is worth noting that the impact of the system prompts can differ across LLMs. For example, from the perspective of the range of the reported numbers (i.e., the maximum and minimum values among all agents), employing Prompt 4.3 and Prompt 4.3 with Qwen2-72B can significantly reduce the maximum number, while that of MistralAI-8x22B remains unchanged. Besides, using detailed instructions in system prompts might increase the number of response tokens, as summarized in Appendix C, because agents are likely to consider multiple aspects before providing an answer.

4.5 Diverse Background Settings

The diversity of agents is a critical factor in agent-based simulations. In Sec. 3.3, we introduce the proposed configurable tool and background generation pipeline designed to automatically instantiate agents with diverse background settings. In this subsection, we conduct simulation experiments involving diverse agents, considering their educational levels and occupations.

Specifically, we divide the agents into several groups, each of which consists of 200 agents. For each group, we manually provide a basic configuration and utilize LLMs to generate a detailed description for each agent, thereby further enhancing the diversity of the agents. These generated background settings are added to the system prompts and labeled as “character background”, using Prompt 4.5.

Agents with Different Educational Levels

Firstly, we set up a simulation experiment involving agents with different educational levels, in which we assign five educational levels to agents: elementary school, high school, bachelor, master, and Ph.D. An example of the generated background settings can be found in Appendix E.4.

The simulation results are illustrated in Fig. 9, from which we can derive the following insights. In general, the higher the educational level of agents, the lower the average reported numbers, indicating more rational behaviors in this game. For example, when using Qwen2-72B, the average numbers reported by agents with a Ph.D. education are much lower than those reported by agents characterized as elementary school students. Meanwhile, from observations of individual-level behaviors, we notice that agents can effectively perform reasoning processes and make corresponding decisions according to their assigned roles. We provide some case studies in Appendix E.5. Besides, different LLMs demonstrate varying sensitivities to the educational levels in the background settings. For example, MistralAI-8x22B is the least sensitive, with the largest difference in average reported numbers being 3.49 (32.73 vs. 29.24), while those of Llama3-70B and Qwen2-72B are 5.72 and 8.24, respectively.

Agents with Different Occupations

We also conduct a simulation experiment involving agents with different occupations, in which we assign six different occupations to agents, including professors in game theory, economists, psychologists, athletes, artists, and writers. An example of the generated background settings can be found in Appendix E.4.

As demonstrated in Fig. 10, the experimental results confirm the impact of the different occupational descriptions assigned to the agents. It can be concluded that agents characterized as professors in game theory and economists tend to report smaller numbers than other agents. Some case studies on individual-level behaviors, as shown in Appendix E.5, further confirm that agents’ considerations and actions are consistent with their respective occupations. For example, agents characterized as professors in game theory tend to adjust their behaviors under the assumption that others may not be rational enough, while those characterized as artists might adopt a straightforward strategy.

In summary, in large-scale simulations involving multiple agents, the diversity of agents can be enhanced by providing different descriptions of their background settings. Serving as examples, we provide simulation experiments involving agents with different educational levels and occupations, which show that the agents’ behaviors are consistent with their background settings. Meanwhile, we provide empirical usage examples and case studies of the proposed heterogeneous configurations introduced in Sec. 3.3, enabling users to effortlessly set up a large number of agents with diverse background settings.

4.6 Mixture of LLMs

In this subsection, we conduct a simulation experiment involving agents employing a mixture of LLMs. Specifically, we configure agents employing Llama3-70B, MistralAI-8x22B, and Qwen2-72B, with 500 agents assigned to each LLM. We conduct both individual-level simulations, where each agent plays the game independently, and group-level simulations, where agents using the same LLM form a group.

Individual-Level Simulation

The simulation results are illustrated in Fig. 11. In the first round of the game, we observe that agents with Llama3-70B exhibit similar behaviors, tending to report numbers around 33, while agents with MistralAI-8x22B consistently report 0. On the other hand, agents with Qwen2-72B exhibit more diverse behaviors, reporting a wider range of numbers, with most of them falling between 0 and 50. These behaviors can be attributed to the preferences of the LLMs, which may be related to their architectures, training corpora, etc.

As the game progresses round by round, agents are informed of the winning number from the previous round and adjust their strategies accordingly. As shown in Fig. 11(b), the majority of agents report numbers close to the winning number of the previous round, with approximately 59.7% reporting numbers smaller than the previous winning number. We present a typical response in Appendix E.6, where an agent adopts a conservative strategy and chooses a number slightly smaller than the winning number 15.90.

In the pie chart of Fig. 11, we further show the winners of each round in the simulation, grouped by their employed LLMs. To reduce the randomness in the simulation, we regard agents whose reported numbers fall within ±0.5 of 2/3 of the average as the winners. The figure shows that in the first and fifth rounds, agents equipped with Qwen2-72B outperform other agents, while agents equipped with MistralAI-8x22B emerge as winners in the second, third, and fourth rounds. Notably, almost all agents tend to report numbers near 0 in the final round, indicating that agents can perform reasonable considerations and behaviors that drive the game toward its Nash equilibrium.

Group-Level Simulation

In the group-level simulation, agents are divided into three groups. Each agent reports a number, and the average among the agents within each group is regarded as the reported number of that group. Finally, the group that reports the number closest to 2/3 of the average of the groups’ reported numbers wins the game. The system prompt adopted, which specifies the above game rules, is shown in Prompt 4.6.
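The group-level winning rule can be summarized in a short sketch; the group names and member numbers below are made up for illustration:

```python
def group_winner(groups: dict[str, list[float]], ratio: float = 2 / 3) -> str:
    # Each group's report is the average of its members' numbers.
    reports = {name: sum(nums) / len(nums) for name, nums in groups.items()}
    # The target is 2/3 of the average of the groups' reported numbers.
    target = ratio * sum(reports.values()) / len(reports)
    return min(reports, key=lambda g: abs(reports[g] - target))

groups = {"Llama3-70B": [10.0, 12.0],
          "MistralAI-8x22B": [0.0, 1.0],
          "Qwen2-72B": [20.0, 8.0]}
print(group_winner(groups))  # -> MistralAI-8x22B
```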

Meanwhile, we modify the information announced between different rounds in a multi-round game. Starting from the second round, in addition to the winning number of the previous round, all agents are informed of the reported numbers from all three groups in the previous round. This provides additional guidance for agents to adjust their strategies. Such a group-wise synchronization is implemented based on the agent-environment interaction mechanism (see Sec. 3.2), allowing agents within the same group to share an interactive environment for synchronization.

The simulation results are shown in Fig. 12. From the figures, it can be observed that agents within the same group quickly converge to similar behaviors by the second round, as indicated by their reported numbers falling within a narrow range. Agents using Qwen2-72B and Llama3-70B exhibit relatively consistent behaviors, while some agents using MistralAI-8×22B might exhibit different behaviors, such as reporting larger numbers. For better understanding, we provide several examples of agents' behaviors in Appendix E.7, showing how agents consider the reported numbers from other groups and strategically choose their own reported numbers to benefit their group. Such a phenomenon confirms that agents can produce reasonable thoughts and actions in pursuit of a collective goal.

4.7 Further Discussions

In this subsection, we provide further discussions on usage tips and open questions when conducting large-scale agent-based simulations.

Impact of the Prior Knowledge of LLMs

As “guess the 2/3 of the average” is a classic game, it is not surprising that LLMs might have acquired prior knowledge from their training corpus. To measure the impact of this prior knowledge, we change the ratio from 2/3 to 1/2 and 51/100, and then conduct the simulation experiments, respectively. Note that changing the ratios does not alter the fundamental nature of the game, and as a result, the behaviors (such as the reasoning process) of the participating agents are expected to remain similar if the LLMs indeed understand the game.

The experimental results are shown in Fig. 13, from which we can observe significant differences in the agents’ performance between the ratios 1/2 and 51/100, although the two cases should be very similar. More agents tend to report large numbers (e.g., around 50) when the ratio is 51/100 compared to the scenario with a ratio of 1/2, which indicates that some agents might not follow the game when the ratio is set to 51/100. In response, we add a note to the system prompt to encourage LLMs to draw on the classic game, stating: This game is a variation of the famous “guess the 2/3 of the average” game. The results summarized in Fig. 13, denoted as “51/100 + note”, show that the winning number decreases from 11.85 to 6.46, aligning more closely with that of the 1/2 ratio, which is 6.21. These simulation experiments highlight the impact of LLMs’ prior knowledge, and the effect of using a prompt to explicitly guide the agents and help them understand the settings of the simulation.
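A small sketch of how the ratio and the optional hint can be swapped into the system prompt; the template wording here is an assumption that paraphrases the paper's setup rather than its exact prompt:

```python
# Build a system prompt for a given ratio, optionally appending the note
# that links the variant back to the classic game.
def build_prompt(ratio: str, with_note: bool = False) -> str:
    prompt = (
        f"Report a number between 0 and 100. The player whose number is "
        f"closest to {ratio} of the average of all reported numbers wins."
    )
    if with_note:
        prompt += (
            ' This game is a variation of the famous '
            '"guess the 2/3 of the average" game.'
        )
    return prompt

print(build_prompt("51/100", with_note=True))
```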

When the Nash Equilibrium Is Not 0

We also set up another variant of the game to validate the agents’ capabilities in understanding and reasoning. Specifically, we modify the winning criteria so that the Nash equilibrium becomes 10, instead of 0 as in the classic game. As a result, LLMs might not exhibit reasonable thoughts and behaviors if they have a limited understanding of the game or poor reasoning ability. The adopted system prompt is shown in Prompt 4.7.
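The exact modified rule is specified in Prompt 4.7; as an illustration of how such a variant shifts the equilibrium, consider a rule (our assumption, for exposition only) where the target adds a constant offset to the scaled average:

```latex
% Illustrative variant: target t adds a constant offset to the scaled average.
t = \tfrac{2}{3}\,\bar{x} + \tfrac{10}{3},
\qquad
x^{*} = \tfrac{2}{3}\,x^{*} + \tfrac{10}{3}
\;\Longrightarrow\;
x^{*} = 10 .
```

Here the symmetric equilibrium satisfies x* = x̄ = t, so locating the new equilibrium requires solving a one-line fixed-point equation rather than recalling the classic answer of 0.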

As shown in Fig. 14, we observe that some agents using Qwen2-72B and MistralAI-8×22B are able to reason out the Nash equilibrium point in the first round. As the game progresses, the numbers reported by these agents gradually approach 10, demonstrating their understanding of the game and ability to make reasonable decisions. These findings are further confirmed by our observations of individual-level behaviors, as detailed in Appendix E.8.

Besides, from observations of individual-level behaviors, we can identify some typical mistakes made by agents. For example, as shown in Appendix E.9, some agents make simple calculation errors, such as computing a wrong average value, leading to incorrect results despite a correct reasoning process. Other agents follow a logical step-by-step process (e.g., calculating the Nash equilibrium) but still make incorrect decisions (e.g., directly reporting 0). Even so, these mistakes are infrequent and do not affect the overall conclusions drawn from group-level observations, as demonstrated above.

These observations indicate that while agents may make simple calculation errors (especially in decimal calculations), they exhibit powerful reasoning abilities and show great potential for use in simulations.

Impact of Temperature

The temperature serves as a hyperparameter for controlling the diversity of LLM generation. In this subsection, we set up 1,000 agents equipped with Qwen2-72B and Prompt 3.4, and vary the temperature within the range of {0.05, 0.25, 0.50, 1.00}. The experimental results are presented in Fig. 15, from which we can observe that as the temperature increases, the standard deviation rises from 6.50 to 8.20 while the average number shows only slight changes. These findings indicate that the generation temperature can impact the distributions of reported numbers, but may not significantly alter the overall average when the number of participating agents is large.
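Such a sweep can be expressed as a set of model configs; the sketch below assumes an AgentScope-style config where sampling parameters are passed through `generate_args`, with placeholder config names:

```python
# One model config per temperature; each batch of agents is then created
# with the matching config name, keeping everything else fixed.
TEMPERATURES = [0.05, 0.25, 0.50, 1.00]

model_configs = [
    {
        "config_name": f"qwen2-72b-t{t}",
        "model_type": "openai_chat",
        "model_name": "qwen2-72b",
        "generate_args": {"temperature": t},
    }
    for t in TEMPERATURES
]
```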

Playing as a Seven-Year-Old Child

In the simulation involving agents with diverse background settings (see Sec. 4.5), we attempt to assign an agent an extreme role: playing as a seven-year-old child in the game. From this agent’s generations, as shown in Appendix E.5, we can see that its behaviors do not align with those expected of a seven-year-old child, as it demonstrates abilities such as performing calculations and engaging in multi-step reasoning. These findings highlight the importance for users to evaluate the model’s capability to play certain characters and to carefully design the background settings before conducting simulations. Although various background settings can be assigned to agents, they may not exhibit consistent behaviors in some extreme cases.

5 Conclusions

In this paper, we first discuss several key factors of concern for conducting large-scale agent-based simulations, including scalability and efficiency, population distribution and agent diversity, and ease of management. Motivated by these factors, we propose and implement several enhancements in AgentScope, including an actor-based distributed mechanism that provides automatic parallel execution and centralized workflow orchestration, flexible environment support for simulating various real-world scenarios, heterogeneous configurations that allow users to specify population distributions and automatically generate agents with diverse background settings, and a web-based interface that simplifies the management of large-scale agents. These enhancements make AgentScope more flexible and convenient for supporting large-scale agent-based simulations. We conduct a series of simulation experiments with AgentScope and provide detailed observations on the diverse and realistic behaviors of agents, highlighting its great potential to further advance research and applications in agent-based simulations.

Ethics Statement

This study concentrates on proposing a novel multi-agent framework designed to assist researchers in conducting various simulation experiments and drawing valuable behavioral observations and insights to promote the development of related fields. We are committed to upholding principles of equity and fairness, and firmly reject any form of prejudicial discrimination based on age, education level, occupation, or any other characteristics.


Appendix A Example of Configuration File

In Listing 1, we show an example of the configuration file for a group of people with different educational levels.
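For readers without access to Listing 1, a hypothetical configuration in the same spirit might look as follows; the field names and proportions are illustrative assumptions, not the paper's actual schema:

```python
# A hypothetical population config: 1,000 agents whose "education"
# attribute is sampled from a user-specified distribution.
population_config = {
    "total_agents": 1000,
    "attributes": {
        "education": {
            "elementary school": 0.2,
            "high school": 0.3,
            "bachelor's degree": 0.3,
            "master's degree or above": 0.2,
        },
    },
}
```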

Appendix B Meta Prompt for Generating Heterogeneous Background Settings

The meta prompt used for generating heterogeneous background settings can be found in Prompt B.

Appendix C The Number of Response Tokens

The statistics of the response tokens of agents when using different LLMs and system prompts can be found in Fig. 16.

Appendix D Responses of Agents with Different LLMs and System Prompts

The distributions of the reported numbers of agents with different LLMs and system prompts are shown in Fig. 17, which includes the average (AVG), minimum (MIN), maximum (MAX), standard deviation (STD), median (MED), and mode (MOD). Besides, in Fig. 18, Fig. 19, and Fig. 20, we report the distributions of the reported numbers of agents with different LLMs and prompts in a multi-round game.

Appendix E Case Studies

In this section, we provide individual-level case studies for a better understanding of how agents perform considerations and reasoning.

E.1 Case Studies on Agents with Different LLMs and System Prompts

For better understanding, we provide examples of the responses of different LLMs with Prompt 3.4 and Prompt 3.4, as shown in Example E.1 and Example E.1, respectively.

E.2 Case Studies on Agent Behaviors in A Multi-round Game

In a multi-round game, we observe that agents can consider the winning number from the previous round when reporting numbers in the current round. For example, as shown in Example E.2, some agents tend to report a number slightly smaller than the winning number, while others choose to report 2/3 of the winning number from the previous round, as illustrated in Example E.2.

E.3 Case Studies on Detailed Instructions in System Prompts

Several case studies shown in Example E.3 and Example E.3 confirm the effectiveness of providing detailed instructions in system prompts. These instructions encourage agents to perform reasonable behaviors by guiding them to regard other agents as rational players (as instructed by Prompt 4.3) and to guess other agents’ strategies (as instructed by Prompt 4.3).

E.4 Case Studies on Generated Background Settings

We provide two generated background settings for agents, including a character who has a bachelor’s degree (Example E.4) and a character who is an artist (Example E.4). We use GPT-4o to generate these diverse yet detailed background settings.

E.5 Case Studies on Agents Characterized with Different Background Settings

We provide case studies on individual-level behaviors when providing detailed yet diverse background settings. For example, we show the responses of agents characterized as an elementary school student in Example E.5 and as a character who holds a Ph.D. degree in Example E.5. From these cases, we observe that agents can effectively perform reasoning processes and make decisions according to their assigned roles.

Besides, as shown in Example E.5, we provide an agent with the profile of a professor in game theory, and find that the agent tends to assume that others are not rational enough and adjusts its behavior accordingly. However, we notice that agents might exhibit inconsistent behaviors when asked to play as a seven-year-old child, as shown in Example E.5.

E.6 Case Studies on Individual-level Simulations Involving Agents with A Mixture of LLMs

Example E.6 shows a typical response in the individual-level simulation involving a mixture of LLMs, where an agent chooses a number slightly smaller than the winning number of the previous round.

E.7 Case Studies on Group-level Simulations Involving Agents with A Mixture of LLMs

We provide three examples in Example E.7, Example E.7, and Example E.7 to show the behaviors of agents in different groups. We observe that agents can reason about the behaviors of other groups and about how to strategically contribute to their own group.

E.8 Case Studies on Agent Behaviors When Changing the Game Rules

In the variant game where we change the Nash equilibrium to 10, we observe that agents can demonstrate a good understanding of the game and correctly calculate the Nash equilibrium, as shown in Example E.8, indicating their powerful reasoning ability.

E.9 Case Studies on Mistakes in Agent Behaviors

We summarize some typical mistakes made by agents, including simple calculation errors, as shown in Example E.9, and decisions that are inconsistent with their stated reasoning, as shown in Example E.9. Although agents might make mistakes, they exhibit powerful reasoning abilities and show great potential for use in simulations.
