Synthetic Dataset Generator!

Kicking off this channel with this Synthetic Dataset Generator Notebook!

I think one of the most common things I do week over week is build notebooks for different use cases, and all of them need some sort of dataset or file to work from to simulate that use case.

I’ve tried looking for Kaggle datasets, which typically works alright, but the datasets are usually really old or don’t quite hit what I need. So I ended up putting together a notebook that helps generate fairly decent synthetic datasets! https://julius.ai/s/notebooks/40e8c146-64fd-42f3-b8ff-f9b7bd861920

  • First section - explain your use case to Julius, and it will ask you some clarifying questions to fit the dataset to your needs
  • Middle sections - these run various tests / evaluations on your dataset, looking for artifacts, irregularities, and biases, and suggest ways to fix them
  • Final section - select the output format you need and you’ll be good to go!
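
For a rough idea of what the middle-section checks are looking for, here is a minimal sketch in plain Python. To be clear, this is not the notebook's actual code: the column names (plan, monthly_spend), the thresholds, and the helper names make_rows and check_dataset are all made up for illustration. It builds a small toy dataset with a few planted problems, then flags exact duplicate rows (an artifact), a z-score outlier (an irregularity), and under-represented categories (a possible bias).

```python
import random
from collections import Counter
from statistics import mean, stdev


def make_rows(n=200, seed=0):
    """Build a toy dataset (hypothetical columns) with planted problems."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        # Deterministic, deliberately skewed category mix.
        if i % 50 == 0:
            plan = "enterprise"
        elif i % 10 == 0:
            plan = "pro"
        else:
            plan = "free"
        rows.append({
            "customer_id": i,
            "plan": plan,
            "monthly_spend": round(rng.gauss(50, 10), 2),
        })
    rows.append(dict(rows[0]))          # planted exact duplicate of row 0
    rows[5]["monthly_spend"] = 5000.0   # planted extreme value
    return rows


def check_dataset(rows, category_col, numeric_col, z_limit=3.0, min_share=0.05):
    """Return a small report of duplicates, rare categories, and outliers."""
    report = {}

    # Artifacts: count rows that exactly repeat an earlier row.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    report["duplicate_rows"] = sum(c - 1 for c in seen.values() if c > 1)

    # Bias: categories whose share of rows falls below min_share.
    counts = Counter(r[category_col] for r in rows)
    report["rare_categories"] = sorted(
        k for k, c in counts.items() if c / len(rows) < min_share
    )

    # Irregularities: row indexes more than z_limit standard deviations
    # away from the mean of the numeric column.
    values = [r[numeric_col] for r in rows]
    mu, sigma = mean(values), stdev(values)
    report["outlier_rows"] = [
        i for i, v in enumerate(values)
        if sigma > 0 and abs(v - mu) / sigma > z_limit
    ]
    return report


rows = make_rows()
print(check_dataset(rows, "plan", "monthly_spend"))
```

A simple z-score check like this gets thrown off by very large outliers, so for real datasets a median-based rule is usually a safer starting point.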

Give it a try! Let me know if you have any questions or feedback, and share the synthetic datasets you’re making – would love to check them out! :100:
