The New Butterfly Pipeline

I use a Python script to help generate code, data, and textures for Bok’s Banging Butterflies. However, this script was starting to get unwieldy. At more than 500 lines it was getting harder to find certain sections when modifications needed to be made.

I’ve deleted the file from the project now, but if you look in the project’s history, you can see how long the file was getting. So, I decided that things would be a lot nicer if I separated the code out into files that made logical sense. What I ended up with was the Pipeline, a set of scripts each generating a different part of the data needed for the mod.

With this separation of concerns, it is much easier to find parts of the script doing specific tasks and modify them if needed. This is especially useful for version-specific data and code generation, as it makes it obvious where there are differences in each script.

The entire set of pipeline scripts can be found under the pipeline folder in my GitHub repository. I’m going to go through each file and explain what its responsibilities are, as well as any useful features it has.

Config

The Config script is where most of the constants and logging are set up for the pipeline. It lists colours, flowers, and paths used in various parts of the script. This is a useful file, as many of the data and asset paths changed in 1.21.4, so we can just use a slightly modified config file for versions after that.

One thing you may notice is the use of Path instead of raw strings, as in this snippet:

        # === File and Directory Paths ===
        self.ACHIEVEMENTS = Path("resources/data/butterflies/advancements/butterfly/")
        self.BIOME_MODIFIERS = Path("resources/data/butterflies/forge/biome_modifier/")
        self.BUTTERFLY_DATA = Path("resources/data/butterflies/butterfly_data/")

All the scripts in the pipeline now use the pathlib library to determine file/folder paths. This is a much more robust solution than using strings, as it will intelligently handle any concatenations for us, ensuring the path is always correct.
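
To illustrate the point about concatenation, here is a minimal sketch of joining paths with pathlib. The species_file() helper, the "butterflies" subfolder and the "admiral" species name are placeholders for illustration, not names taken from the real Config.

```python
from pathlib import Path

# Base folder as declared in the Config snippet above.
BUTTERFLY_DATA = Path("resources/data/butterflies/butterfly_data/")


def species_file(folder: str, species: str) -> Path:
    """Build the path to one species' JSON file (hypothetical layout)."""
    # The / operator joins segments with exactly one separator, so the
    # trailing slash on BUTTERFLY_DATA never produces a doubled "//".
    return BUTTERFLY_DATA / folder / f"{species}.json"


path = species_file("butterflies", "admiral")
print(path.as_posix())  # resources/data/butterflies/butterfly_data/butterflies/admiral.json
print(path.stem)        # admiral
```

With raw strings, the same join would need manual checks for missing or doubled separators.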

Data Generation

The Data Generation script handles the generation of JSON data for all butterflies. There are three main functions in this file: one to generate a list of butterflies, one to generate data files, and one to tag species as frog food.

Butterfly Lists

The generate_butterfly_list() method iterates over files in a folder and generates a list of species based on the filenames. This is intended to be called on the folders within butterfly_data, so we can generate lists of butterfly species, moth species, special species and variant species.
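
A rough sketch of what such a method might look like is shown here. It is not the real implementation: the .json extension and the alphabetical sort are assumptions, and the actual script may order species differently.

```python
from pathlib import Path


def generate_butterfly_list(folder: Path) -> list[str]:
    """Return species names derived from the JSON filenames in a folder."""
    # Path.stem drops the extension, so "admiral.json" becomes "admiral".
    # Sorting (an assumption) keeps the result stable across operating systems.
    return sorted(p.stem for p in folder.glob("*.json") if p.is_file())
```

Calling this once per folder inside butterfly_data would give one list per category.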

Data Files

The generate_data_files() method can be used with these lists in order to generate data files for all entities in the mod. The way this method works is that it takes the base species (first in the list), and copies any data files it has for the other species. It never overwrites anything, so if a data file is modified after the fact, the changes are kept.

Since loot tables are only used for a couple of species, this script specifically ignores any loot tables that it may come across when generating data.
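
The copy-without-overwrite behaviour can be sketched roughly as follows. This is an illustration under assumptions: the one-file-per-species layout, the "loot_tables" folder name and the plain file copy are guesses, and the real script may also rewrite the contents of each copy.

```python
import shutil
from pathlib import Path

# Assumed folder name for loot tables; adjust to the real layout.
LOOT_TABLE_FOLDER = "loot_tables"


def generate_data_files(data_root: Path, species: list[str]) -> list[Path]:
    """Copy the base species' data files for every other species in the list."""
    base, *others = species  # the first entry is treated as the base species
    created = []
    # sorted() materialises the matches first, so new files cannot affect the walk.
    for template in sorted(data_root.rglob(f"{base}.json")):
        if LOOT_TABLE_FOLDER in template.relative_to(data_root).parts:
            continue  # loot tables are skipped entirely
        for name in others:
            target = template.with_name(f"{name}.json")
            if target.exists():
                continue  # never overwrite, so manual edits are kept
            shutil.copyfile(template, target)
            created.append(target)
    return created
```

Returning the list of created files makes it easy to log exactly what a run added.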

Frog Food

The generate_frog_food() method adds all species to the frog_food.json file, tagging them as frog food. This method also checks the butterflies for the INEDIBLE trait, and skips over them if they have it.
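
As a hedged sketch, the filtering step could look something like this. The "traits" key, the "butterflies:<name>" ID format and the function signature are assumptions made for the example; only the INEDIBLE trait and the frog_food.json file come from the description above.

```python
import json
from pathlib import Path


def generate_frog_food(species_data: dict[str, dict], tag_file: Path) -> list[str]:
    """Write a tag file listing every species that lacks the INEDIBLE trait."""
    values = [
        f"butterflies:{name}"  # hypothetical entity ID format
        for name, data in sorted(species_data.items())
        if "INEDIBLE" not in data.get("traits", [])  # "traits" key is assumed
    ]
    tag_file.parent.mkdir(parents=True, exist_ok=True)
    tag = {"replace": False, "values": values}
    tag_file.write_text(json.dumps(tag, indent=2), encoding="utf-8")
    return values
```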

Localisation

The Localisation script generates localisation strings for all butterfly species. It only supports English at the moment, but if there is interest more languages could be added.

The script iterates over a species list created using the Data Generation script, and adds the needed localisation strings. Like the data generation, it doesn’t overwrite strings that already exist so the output can be modified without risk of losing changes.
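
The "add only what is missing" idea can be sketched with dict.setdefault(). The key format and the way display names are derived from species IDs are assumptions for illustration, not the mod's actual strings.

```python
import json
from pathlib import Path


def add_localisation(lang_file: Path, species: list[str]) -> dict[str, str]:
    """Add a default English name for each species without touching existing keys."""
    strings = {}
    if lang_file.exists():
        strings = json.loads(lang_file.read_text(encoding="utf-8"))
    for name in species:
        key = f"entity.butterflies.{name}"  # hypothetical key format
        # setdefault() only writes when the key is absent, so edited strings survive.
        strings.setdefault(key, name.replace("_", " ").title())
    lang_file.write_text(json.dumps(strings, indent=2, sort_keys=True), encoding="utf-8")
    return strings
```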

Advancements

The Advancements script generates advancements based on templates in the folder declared in the Config. This means that advancements that require all butterflies/moths, or one of any species, will work without any manual editing. Unlike with data and localisation, advancements will always be overwritten, meaning any changes need to be made in the templates before regeneration.
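
One way template expansion like this could work is sketched here. The template format is entirely hypothetical: it assumes a single placeholder criterion stored under the key "SPECIES", which is cloned once per species. The real templates may be structured quite differently.

```python
import json
from pathlib import Path


def generate_advancement(template: Path, output: Path, species: list[str]) -> dict:
    """Expand a placeholder criterion once per species and write the result."""
    advancement = json.loads(template.read_text(encoding="utf-8"))
    # Hypothetical convention: the template holds one criterion named "SPECIES".
    criterion_text = json.dumps(advancement["criteria"]["SPECIES"])
    advancement["criteria"] = {
        name: json.loads(criterion_text.replace("SPECIES", name))
        for name in species
    }
    output.parent.mkdir(parents=True, exist_ok=True)
    # No existence check here: the output is always overwritten.
    output.write_text(json.dumps(advancement, indent=2), encoding="utf-8")
    return advancement
```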

Code Generation

The Code Generation script generates constants that are used in code rather than read in through a datapack. This is useful for certain things that happen before a datapack is loaded, such as registering items and entities.

The _load_species_data() method preloads the butterfly data so that it doesn’t need to be loaded multiple times. It’s already on my TODO list to move this out of this class, as the butterfly data is loaded in multiple places across multiple files, and it would be more efficient to only load it once.

The _write_enum_array() method helps when writing out an array. While it is only used once in the main branch, it is reused in 1.18.2 to write constants for spawn data, since biome modifiers didn’t exist in that version. It’s also reused in 1.21.4, since some systems have been changed to load before datapacks, meaning the data isn’t available as it was in prior versions.

Both of these methods are used by generate_code() to write out a simple ButterflyInfo class that contains all the constants needed to build code to support all species in the mod.
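
To give a feel for this kind of helper, here is a minimal sketch that renders a Java array constant from a list of values. The signature, the Rarity type and the RARITIES name in the usage line are made up for the example and are not taken from the real ButterflyInfo class.

```python
def write_enum_array(name: str, enum_type: str, values: list[str], indent: str = "    ") -> str:
    """Render a Java array constant containing one enum value per entry."""
    lines = [f"{indent}public static final {enum_type}[] {name} = {{"]
    # Java permits a trailing comma in array initialisers, which keeps diffs small.
    lines += [f"{indent}{indent}{enum_type}.{value.upper()}," for value in values]
    lines.append(f"{indent}}};")
    return "\n".join(lines)


# Hypothetical usage: prints a three-line Java declaration.
print(write_enum_array("RARITIES", "Rarity", ["common", "rare"]))
```

Returning a string rather than writing to a file directly makes the helper easy to reuse for other constants.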

Biome Modifiers

The Biome Modifiers script generates biome modifiers from templates similar to how advancements work. Like advancements, this overwrites any files that are already there, so changes need to be made in the templates first.

The add_spawns() method converts habitats to tags, and then adds spawns to all the relevant biome modifiers based on the result.
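
The habitat-to-tag conversion could be sketched along these lines. The habitat names and the mapping table are hypothetical, chosen only to show the shape of the lookup, and this version returns a grouping instead of editing biome modifier files.

```python
# Hypothetical habitat-to-tag table; the real names and tags may differ.
HABITAT_TAGS = {
    "FORESTS": ["minecraft:is_forest"],
    "JUNGLES": ["minecraft:is_jungle"],
    "FORESTS_AND_PLAINS": ["minecraft:is_forest", "forge:is_plains"],
}


def add_spawns(species_habitats: dict[str, str]) -> dict[str, list[str]]:
    """Group species by the biome tags their habitat maps to."""
    spawns: dict[str, list[str]] = {}
    for species, habitat in sorted(species_habitats.items()):
        # Unknown habitats map to no tags, so those species get no spawns.
        for tag in HABITAT_TAGS.get(habitat, []):
            spawns.setdefault(tag, []).append(species)
    return spawns
```

Each key of the result would then correspond to one biome modifier file to fill in.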

In earlier versions of Minecraft, references to the Cherry Grove are removed, and in 1.18.2 this script is completely removed, since spawns are set up in code rather than through a datapack.

Entry Point

Bringing it all together we have the Main script, which is also the entry point into our pipeline. This generates the lists of species, then runs through all the steps to generate code and data from the other scripts detailed above.

Future Plans

That’s all the scripts contained in the pipeline so far. These scripts save a lot of time, as sometimes there are over 200 files that need creating, and this would be a monumental task to do by hand. There are still some improvements I want to make in the future, however.

Pre-load Species Data

In the Code Generation script I preload the species data, which is great as it saves loading the data multiple times within that script. However, the same data is also loaded in many other places, so it could instead be preloaded in the main script and passed to the others. Doing so would mean each data file is only read once per run.
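
A minimal sketch of that idea: load everything once in the entry point and hand the same dictionary to each step. Both function names and the one-JSON-file-per-species layout are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_species_data(folder: Path) -> dict[str, dict]:
    """Read every species JSON file in a folder, keyed by species name."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.json"))
    }


def run_pipeline(folder: Path, steps: list[Callable[[dict[str, dict]], Any]]) -> list[Any]:
    """Load the data once, then pass the same dictionary to every step."""
    species_data = load_species_data(folder)  # the only place that touches disk
    return [step(species_data) for step in steps]
```

Each generation step then becomes a function of the preloaded data, which also makes the steps easier to test in isolation.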

Image Generation

Last week I wrote up how I generated images for all the spawn eggs in the game using Python. Whenever I add new species, they will need new images generated too. I plan to move these image generation scripts into the main pipeline, so that all generation is in the same place.

Automation

This one will need a bit of testing, but it’s possible that these scripts could be run automatically through GitHub Actions. Doing so would mean that I wouldn’t have to run them manually when I port the game, and would also prevent releasing a version of the game with incorrect data.

What started as a simple script to transform some data has turned into a project all of its own. I find it fun to write stuff like this, and I love how powerful the results can be. If you have any suggestions for improving these scripts, or use scripts like these in your own pipeline, I’d love to hear about them in the comments below!