Introduction

I've battled with this one for a while, probably just about long enough for Google to have thought I'd quit the blogging game. I'm still here, still battling.

There are plenty of examples at João Moura's crewAI-examples GitHub repository. Having a trip planner is pretty cool, but I wouldn't call it personally useful. Of course, these are designed as starter projects and they're great for that, but I wanted to do something a little more useful for me, which might be useful to others, and which would also serve as a learning experience and maybe unlock more ideas about what could be possible.

In this search for something useful, and the battles I've had with it, I've wasted a lot of time. I crashed my old computer a few times and spun up a buggy Ubuntu machine on an M1 Mac, which kinda worked for a while but was difficult to live with: it crashed a few times per day, lost its network connection if the Mac ever went to sleep, and any apps had to work on arm64. Not a great experience.

That's when I knew I needed a more powerful system. I didn't have thousands of pounds to throw at a high-spec AI/gaming rig, but I had enough to create something that runs quiet and is fairly efficient, enough to run a local 7B model.

The Plan

I write the odd README when it's required, but I'm not as diligent as I could be; often I'm so caught up with smashing stuff out that a README is an afterthought. So the idea I settled on is something to index a local folder, inspect the files and directories in it, and have a crew summarise the contents, talk among themselves and pump out a README file written in Markdown.

The Project

There seem to be a few ways to lay things out. To be honest, when I was first getting my head around all this I used a single file and built it up bit by bit; only once I had that working did I split it out. In the end the project looks like this:

.
├── agents
│   ├── analyse_files_agent.py
│   ├── __init__.py
│   ├── order_files_agent.py
│   └── readme_generator_agent.py
├── config.json
├── main.py
├── requirements.txt
├── tasks
│   ├── analyse_files_task.py
│   ├── generate_readme_task.py
│   ├── __init__.py
│   └── order_files_task.py
├── tools
│   ├── analyse_files.py
│   ├── __init__.py
│   ├── list_files.py
│   ├── search.py
│   └── write_readme.py
└── utils
    ├── file_utils.py
    └── __init__.py

The Crew

Here's a quick breakdown of what the files do with some examples:

  • main.py kicks it all off. There are a few standard imports, and it imports all 3 agents, all 3 tasks and the 2 utils defined in the file_utils.py file. It defines the crew and specifies the agents and tasks. Then there's some Python that gets user inputs, stores them in the config.json file (so it can be run again without making you type things out again), and lists out the files of the given path, which are written out to ./all_files.json. A sketch of the crew wiring is shown below.
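
Leaving out the user-input and config.json handling, the crew wiring in main.py looks something like this (a minimal sketch, not the file verbatim):

from crewai import Crew

from agents.analyse_files_agent import AnalyseFilesAgent
from agents.order_files_agent import OrderFilesAgent
from agents.readme_generator_agent import ReadMeGeneratorAgent
from tasks.analyse_files_task import analyse_files_task
from tasks.generate_readme_task import generate_readme_task
from tasks.order_files_task import order_files_task

# Assemble the crew: ordering runs first, then analysis,
# then README generation
crew = Crew(
  agents=[OrderFilesAgent, AnalyseFilesAgent, ReadMeGeneratorAgent],
  tasks=[order_files_task, analyse_files_task, generate_readme_task],
  verbose=True
)

result = crew.kickoff()
print(result)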

Seeing as this is all about "Agentic AI", it would seem logical to start with the Agents, but if you follow the trail of imports, it's actually the Tasks that are at the top of the tree; it's just that the Agents do the communicating. So here we go in a possibly unconventional, but to my brain logical, order...

Tasks

There's a one-to-one relationship between tasks and agents in this project. I'm guessing that, because a task defines which agent it should use, you could have multiple tasks calling on the same agent, but not the other way around?
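
If that's right, nothing should stop two tasks from pointing at the same agent. Neither of these tasks exists in my project; they're purely hypothetical, just to illustrate the shape:

from crewai import Task
from agents.analyse_files_agent import AnalyseFilesAgent

# Two hypothetical tasks sharing one agent
summarise_task = Task(
  description="Summarise the purpose of each file in the project.",
  expected_output="A short summary per file.",
  agent=AnalyseFilesAgent
)

dependency_task = Task(
  description="Map the dependencies between files in the project.",
  expected_output="A list of inter-file dependencies.",
  agent=AnalyseFilesAgent
)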

This is what the analyse_files_task.py file looks like:

from crewai import Task
from agents.analyse_files_agent import AnalyseFilesAgent

analyse_files_task = Task(
  description="""Analyse the content of specific files in the project to extract key information and insights.
  Files should be analysed one at a time to prevent hitting LLM limits""",
  expected_output="""For each file, you should output the name and a brief explanation of any functions or similar constructs in the file.
                  Check for any dependencies that may be present, including references to elements in other files in the project.
                  Include a summary of the file as a whole.""",
  agent=AnalyseFilesAgent
)

I also have an order_files_task and a generate_readme_task; I'll explain how these all hook together below.
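
For a flavour of the other two, here's roughly what the generate_readme_task looks like (a sketch; the description and expected output wording is paraphrased, not the file verbatim):

from crewai import Task
from agents.readme_generator_agent import ReadMeGeneratorAgent

generate_readme_task = Task(
  description="""Generate a README for the project using the analysis
  gathered by the other tasks.""",
  expected_output="""A complete README, written in Markdown, covering the
                  project's purpose, structure and key files.""",
  agent=ReadMeGeneratorAgent
)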

Agents

  • The OrderFilesAgent has a go at ordering the files, without any context as to their content by the way, just from the file names. I'm not sure this step is actually necessary, but I think it's good to prove that you can define an agent and force it to stay within its boundaries.

I swear that when I first ran this thing, the OrderFilesAgent was asking the AnalyseFilesAgent to share the file contents so it could perform its task, which was doubling up on the files being read, all adding to the tokens being used by the AI API.

This is the code in order_files_agent.py:

import os
from crewai import Agent
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from tools.list_files import list_files

# Pull ANTHROPIC_API_KEY (and anything else) in from .env
load_dotenv()

# Define the Claude models: Haiku for the cheaper work,
# Sonnet for anything that needs more horsepower
ClaudeHaiku = ChatAnthropic(
  model="claude-3-haiku-20240307"
)
ClaudeSonnet = ChatAnthropic(
  model="claude-3-sonnet-20240229"
)

# Not strictly needed here; ChatAnthropic picks the key up from the environment
ANTHROPIC_API_KEY = os.environ.get('ANTHROPIC_API_KEY')

OrderFilesAgent = Agent(
  role='File Prioritise Agent',
  goal='Order files based on importance and relevance.',
  backstory="""You are a file prioritisation expert with a keen eye for detail.
    Your task is to order the files in the project directory based on their importance and relevance.
    You should rely solely on the `list_files` tool to list the project files out, based on their name and file type, you should recognise the type of project and order them accordingly.
    Your output should be in the same format as the input list, including all files and their paths, if that was what was passed to you.
    Pass this information on to the `AnalyseFilesAgent` for further analysis.""",
  verbose=True,
  allow_delegation=True,
  llm=ClaudeHaiku,
  max_iter=5,
  memory=True,
  tools=[list_files]
)

As you can see, among other things, in each of the agent files we're importing the CrewAI Agent and ChatAnthropic from langchain_anthropic, and defining the Claude models to use. We also explicitly tell the agent that it should only use the list_files tool, which just reads ./all_files.json; the agent should then do its best to process that list.
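
For reference, the list_files tool itself is tiny; a minimal sketch of what it could look like (my actual implementation may differ slightly):

import json
from crewai_tools import tool

@tool('list_files')
def list_files():
  """List the project files by reading the pre-generated ./all_files.json."""
  with open('all_files.json', 'r') as f:
    return json.load(f)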

  • The other agents, AnalyseFilesAgent and ReadMeGeneratorAgent, use the same format, while being given different tools to use.
  • One thing that surprised me was that I could tell the Agent, in plain English, that when it uses a tool it should call it in a certain way. When I first built this thing, I didn't have that in the backstory and the information wasn't being passed to the tools.

This is a snippet from the AnalyseFilesAgent backstory:

backstory="""A specialist in file analysis, adept at extracting key insights from code and documentation.
    Your role is to provide valuable information from the project files for the README compilation.
    Only use the search function if you need to look up additional information.
    You should be particularly interested in any dependencies between files and any complex data manipulation that requires explanation.
    Your output should be a summary of key information extracted from the project files.
    You should look at one file at a time, commit the output of the `analyse_files_task` to memory before moving on to the next file.
    To use the analyse_files tool, you should call it with `analyse_files(file_name)` where `file_name` is the name of the file you want to analyse.
    Once collated, pass this information on to the `ReadMeGeneratorAgent` for README generation.""",

Tools

The tools, in this project, are the things that let the agents read files, write out to files and, if needed, search the internet.

I found it interesting that, in an early iteration of this, just handing the Agents a tool without any instructions on how to use it meant they were not actually able to read the files, or write out to files. Adding instructions to their backstory like this seemed to make it work: To use the analyse_files tool, you should call it with `analyse_files(file_name)` where `file_name` is the name of the file you want to analyse.

This is my analyse_files.py tool:

import json
import time
from crewai_tools import tool, FileReadTool
from pathlib import Path


@tool('analyse_files')
def analyse_files(file_name):
  """Analyse a single file in the project by fetching the base path from config and appending it to the filename."""
  try:
    # Load the base path from the config file
    with open('config.json', 'r') as config_file:
      config = json.load(config_file)
    base_path = config.get('last_path', '')

    # Construct the full path to the file
    full_path = Path(base_path) / file_name

    # Introduce a delay to manage rate limits
    time.sleep(8)

    # Perform the file analysis
    file_tool = FileReadTool(file_path=str(full_path))
    content = file_tool.run()
    return {file_name: content}
  except FileNotFoundError:
    print(f"Error: 'config.json' not found or '{file_name}' does not exist.")
    return {file_name: None}
  except json.JSONDecodeError:
    print("Error: JSON decoding failed. Check the format of 'config.json'.")
    return {file_name: None}
  except Exception as e:
    # Use file_name here; full_path may not be defined yet if we failed early
    print(f"Error analysing file {file_name}: {e}")
    return {file_name: None}

It's a simple tool: it reads the base path from the config file, appends the file name to it, reads the file and returns the content. There's a sleep in there to manage rate limits; I found I was hitting the API limits when I first built this and had to slow things down.
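
The write_readme tool works along the same lines, just in the other direction. Here's a minimal sketch of the shape of it (my actual version may differ; it assumes the same last_path key in config.json):

import json
from pathlib import Path
from crewai_tools import tool

@tool('write_readme')
def write_readme(content):
  """Write the generated Markdown out to README.md under the configured project path."""
  # Reuse the same base path the other tools read from config.json
  with open('config.json', 'r') as config_file:
    config = json.load(config_file)
  base_path = config.get('last_path', '')

  readme_path = Path(base_path) / 'README.md'
  with open(readme_path, 'w') as readme_file:
    readme_file.write(content)
  return f"README written to {readme_path}"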

The Result

I ran this against a fairly small amount of Terraform code from my Advanced Terraform Wrap-up. The results were... OK, nothing more. It was a starting point to be built on, but it was far from being a useful README.

This is what it gave me:

# Project README

## Overview

This project is managed using Terraform and deploys resources across multiple AWS regions, including `eu-west-1` (Dublin), `us-east-1` (Virginia), and `ap-south-1` (Mumbai).

## Key Components

### Terraform Configuration

- The `main.tf` file sets the required Terraform version (1.7.5 or higher) and configures multiple AWS provider blocks, each with a different region and profile.

### Variables and Local Values

- The `variables.tf` file defines a single variable, `environment`, with a default value of `"dev"`. It also sets up a set of global tags in the `locals` block, including `Environment`, `Department`, and `Owner`.

### EC2 Instances

- The `ec2.tf` file is responsible for creating EC2 instances in the different regions. It looks up the latest Ubuntu 22.04 ARM64 AMI in each region and uses a `for_each` loop to create the instances based on the `local.ec2_map` variable.
- The instances are configured with various attributes, such as the AMI, instance type, subnet, IAM instance profile, user data, key pair, and root block device settings.
- The instances are tagged with the `local.global_tags` and additional tags, such as `Name`, `ShutdownTime`, and `StartTime`.

### Virtual Private Clouds (VPCs)

- The `vpc.tf` file manages the creation of VPCs in the different regions. It uses a `for_each` loop to create the VPCs based on the `local.vpc[var.environment]` variable.
- Each VPC is configured with a CIDR block, DNS support, and DNS hostnames.
- The VPCs are tagged with the `local.global_tags` and a `Name` tag.

### Public Subnets and Route Tables

- The `subnets.tf` file creates public subnets and route tables in the different regions.
- It uses a `for_each` loop to create the public subnets based on the `local.subnet_map` variable, with each subnet configured with a VPC ID, CIDR block, and availability zone.
- The file also creates public route tables and associates them with the public subnets.

## Conclusion

This Terraform project manages the deployment of resources, including EC2 instances, VPCs, and public subnets, across multiple AWS regions. The configuration is driven by various variables and local values, providing a flexible and reusable solution.

The project's unusual or complex aspects include:

1. **Multi-region Deployment**: The project deploys resources across three different AWS regions: Dublin, Virginia, and Mumbai. This requires careful configuration of the AWS providers, VPCs, and subnets to ensure consistent and reliable infrastructure.

2. **Dynamic EC2 Instance Creation**: The `ec2.tf` file uses a `for_each` loop to dynamically create EC2 instances based on the `local.ec2_map` variable. This allows for easy scaling and modification of the EC2 infrastructure without having to manually update the Terraform configuration.

3. **Dynamic VPC and Subnet Creation**: Similar to the EC2 instances, the `vpc.tf` and `subnets.tf` files use `for_each` loops to dynamically create VPCs and subnets based on the `local.vpc` and `local.subnet_map` variables. This adds flexibility and reusability to the infrastructure deployment.

4. **AMI Lookup**: The project uses the `aws_ami` data source to lookup the latest Ubuntu 22.04 ARM64 AMI in each region. This ensures that the EC2 instances are always provisioned with the most up-to-date base image.

Overall, this Terraform project demonstrates a robust and flexible multi-region infrastructure deployment, with several advanced features that may require additional explanation or documentation for users unfamiliar with the project.

Conclusion

This was my first attempt at using any Agentic AI. It was an interesting experiment, and using Claude Haiku cost next to nothing; my balance would go down by 2 or 3 cents per run.