Kicking off this channel with a Synthetic Dataset Generator notebook!
I think one of the most common things I do week to week is build notebooks for different use cases, and every one of them needs some sort of dataset or file to work from to simulate that use case.
I’ve tried using Kaggle datasets. That typically works alright, but the datasets are often pretty old or don’t quite fit what I need. So I ended up putting together a notebook that generates fairly decent synthetic datasets! https://julius.ai/s/notebooks/40e8c146-64fd-42f3-b8ff-f9b7bd861920
- First section: just work with Julius to explain your use case, and it will ask you some clarifying questions to tailor the dataset to your needs
- Middle sections: these run various tests and evaluations on your dataset, looking for artifacts, irregularities, and biases, and suggest ways to fix them
- Final section: choose the kind of output you need and you’re good to go!
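For anyone curious what generate, check, and export steps like these can look like in principle, here’s a minimal Python sketch using only the standard library. It is not the notebook’s actual code: the customer schema (`age`, `plan`, `monthly_spend`), the plan weights, the thresholds, and the function names are all hypothetical, and the checks in the notebook likely go much further.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical schema for illustration only: id, age, plan, monthly_spend
PLANS = ["basic", "pro", "enterprise"]
PLAN_WEIGHTS = [0.6, 0.3, 0.1]
BASE_SPEND = {"basic": 20.0, "pro": 80.0, "enterprise": 400.0}


def generate_rows(n, seed=0):
    """Generate n synthetic customer rows; a fixed seed makes output reproducible."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        plan = rng.choices(PLANS, weights=PLAN_WEIGHTS)[0]
        # Spend depends on plan tier, with Gaussian noise clipped at zero
        base = BASE_SPEND[plan]
        spend = max(0.0, rng.gauss(base, base * 0.25))
        rows.append({"id": i, "age": rng.randint(18, 80),
                     "plan": plan, "monthly_spend": round(spend, 2)})
    return rows


def audit(rows, max_share=0.8):
    """Return a list of human-readable issues found in the rows."""
    issues = []
    ids = [r["id"] for r in rows]
    if len(ids) != len(set(ids)):
        issues.append("duplicate ids")
    for r in rows:
        # Irregularities: missing values and out-of-range ages
        if any(v is None for v in r.values()):
            issues.append(f"missing value in row {r['id']}")
        elif not 18 <= r["age"] <= 100:
            issues.append(f"implausible age in row {r['id']}")
    if rows:
        # Possible bias: one category dominating the dataset
        counts = Counter(r["plan"] for r in rows)
        top, top_n = counts.most_common(1)[0]
        if top_n / len(rows) > max_share:
            issues.append(f"category '{top}' is {top_n / len(rows):.0%} of rows")
    # Generator artifact: identical records apart from the id
    seen = Counter(tuple(sorted((k, v) for k, v in r.items() if k != "id"))
                   for r in rows)
    dupes = sum(c - 1 for c in seen.values() if c > 1)
    if dupes:
        issues.append(f"{dupes} duplicate record(s)")
    return issues


def to_csv(rows):
    """Serialize rows to CSV text (one possible output format)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, call `generate_rows(1000, seed=7)`, look over `audit(rows)`, then write `to_csv(rows)` to a file. Fixing the seed makes it easy to regenerate the exact same dataset after tweaking the schema.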
Give it a try! Let me know if you have any questions or feedback, and share the synthetic datasets you’re making. I’d love to check them out!