The Joy of Data Generation

A while ago I was talking about my Python-based data generation in the NeoForge Discord when someone asked me why I didn’t use Forge’s data generation instead. The answer was simple: I didn’t know about it. I created a ticket to remind me to look into it at some point, and now I’ve finally gotten around to it.

I should have done this sooner. It would have saved a lot of headaches further down the line. Honestly, I didn’t because I already had a solution for data generation, and I didn’t think there would be much benefit to rewriting the whole thing. As it turns out, I was wrong.

Benefits


As I implemented Forge’s data generation, I realised that doing things this way has a few major benefits. They make the switch worthwhile, even though it means rewriting code I had already written in Python.

Standard Practices

A major benefit is in following the standard way of doing things. Standards exist, in part, to help with collaboration and this is no exception. By using Forge’s native data generation, other programmers can expand upon or learn from the code I commit to my repository.

It also helps if I ever want to get help with my mod, or help other mods. Other Forge/NeoForge developers will be able to help improve/fix my code, and I’ll be able to do the same for others.

To me, this is a major benefit as it means I have learned how to be a better Minecraft modder.

Data Integrity

Forge’s data generation includes checks that help ensure your data is complete. If you create a blockstate that references models that don’t exist, or you reference missing textures, the data generation will throw an error. This is powerful, as you can discover errors in your data before you even run the game.

Given that there have been bugs with missing textures that have slipped through to actual releases of Bok’s Banging Butterflies, this is a tool that can help prevent these bugs from appearing in the first place.
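
To make the idea concrete, here is a minimal Python sketch of the kind of check involved. It is not Forge’s implementation, just an illustration of the principle: collect every model a blockstate refers to and report the ones that don’t exist. The function names, the `examplemod` namespace and the model ids are all made up for the example, and it only handles the simple `variants` form of a blockstate.

```python
from __future__ import annotations


def referenced_models(blockstate: dict) -> set[str]:
    """Collect every model id referenced by a 'variants'-style blockstate."""
    models: set[str] = set()
    for variant in blockstate.get("variants", {}).values():
        # A variant is either a single model entry or a list of entries.
        entries = variant if isinstance(variant, list) else [variant]
        for entry in entries:
            models.add(entry["model"])
    return models


def missing_models(blockstate: dict, known_models: set[str]) -> list[str]:
    """Return the referenced model ids that are not in known_models, sorted."""
    return sorted(referenced_models(blockstate) - known_models)


# Hypothetical data: one variant points at a model that was never created.
blockstate = {
    "variants": {
        "facing=north": {"model": "examplemod:block/bottle"},
        "facing=south": {"model": "examplemod:block/bottle_full", "y": 180},
    }
}
known = {"examplemod:block/bottle"}

for model in missing_models(blockstate, known):
    print(f"Blockstate references missing model: {model}")
```

Running a check like this as part of the build is what turns a missing model from an in-game surprise into a failed task.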

Future Proofing

Minecraft’s data format changes as new versions of the game are released. These changes are usually minor, but some versions introduce sweeping changes, and you have to ensure your data conforms to them. With my Python code, I’ve had to add version-specific code to update the data properly.

With Forge’s data generation I don’t need to do that, because it handles it for me. When the format changes between versions, I can simply run the data generation task again and the changes are applied to the generated data. This means less code, and less data for me to maintain in the long run.

Sunk Cost


The work I’ve done up to this point isn’t a complete loss. I’ve left some of the more complex models and terrain generation out of the new data generation code, since it’s easier to just write that data by hand. I also left the localisation code out, since that really needs some human intervention to ensure the strings are correct.

The Python pipeline I’ve developed is still useful. I’ve stripped out the parts I no longer need, but it still handles code generation and image generation. It also generates placeholder strings for butterflies if they don’t already exist. It’s sad to lose the code I don’t need anymore, but in the long run this will make development a lot smoother.
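
As a rough illustration of the placeholder step, the sketch below adds a default string for each butterfly that has no entry yet and leaves existing, hand-edited strings untouched. It is an assumption about how such a step could look, not the actual pipeline code: the `add_placeholders` function, the `entity.examplemod.` key prefix and the butterfly ids are all hypothetical.

```python
from __future__ import annotations


def add_placeholders(lang: dict[str, str], butterfly_ids: list[str]) -> dict[str, str]:
    """Return a copy of lang with a placeholder for every missing butterfly."""
    merged = dict(lang)  # copy, so the caller's dictionary is not modified
    for butterfly_id in butterfly_ids:
        key = f"entity.examplemod.{butterfly_id}"  # hypothetical key layout
        # setdefault only writes when the key is absent, so strings that a
        # human has already corrected are never overwritten.
        merged.setdefault(key, butterfly_id.replace("_", " ").title())
    return merged


# Hypothetical data: one string already written by hand, one missing.
existing = {"entity.examplemod.monarch": "Monarch Butterfly"}
updated = add_placeholders(existing, ["monarch", "common_blue"])

for key, value in sorted(updated.items()):
    print(f"{key} = {value}")
```

Because the step never overwrites an existing key, it is safe to run on every build, and the human review of the strings can happen afterwards.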

No Release


While this is a pretty big change for the mod, I’m not planning a release for it just yet. CurseForge has rules against releasing mods with no new features. While technically this is a new feature, it is a rewrite of underlying code rather than a new butterfly or a new item. From a user’s perspective nothing has changed: the mod still plays the same as the last version.

So for now, I’m holding off on releasing this code. I’ll wait until there’s something worth seeing for players before I create another release.
