Advice for Data Cleaning - Standardizing and aggregating names

Hey guys,

I have a question about constructing a prompt for identifying and aggregating data.
I have an Excel file with a list of raw names and want to add a new column with standardized names. Here’s a snapshot of the ideal result that Julius might help with: it would check adjacent entries to identify names that can be standardized/aggregated, then write the result to a new column. For example, “albright, benjamin b.” has variations like “albright, benjamin” and “albright, benjamin b”, and I hope Julius could compare them and output the most complete version, which is “albright, benjamin b.”. If a name doesn’t need aggregation, like “agrawal, arpana” in the snapshot, it should be copied to the new column as is. Does anyone have any good ideas for constructing an effective prompt for this?
Screenshot 2024-06-27 at 2.27.49 PM
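
For reference, the rule described above (compare adjacent entries, keep the most complete variant, copy everything else as is) can be sketched in plain Python. This is only a rough illustration and not what Julius does internally. The helper names name_tokens, is_variant and standardize are made up for this example, and it assumes the list is already sorted so that variants of the same name sit next to each other.

```python
import re

def name_tokens(name):
    """Lowercase a name and split it into words, ignoring commas and periods."""
    return [t for t in re.split(r"[,.\s]+", name.lower()) if t]

def is_variant(a, b):
    """True when one name is a word-for-word prefix of the other.

    At least a last name and one more word must match, so two people who
    only share a last name are never treated as the same person.
    """
    shorter, longer = sorted((name_tokens(a), name_tokens(b)), key=len)
    return len(shorter) >= 2 and longer[:len(shorter)] == shorter

def most_complete(group):
    """Pick the variant with the most words, then the most characters."""
    return max(group, key=lambda n: (len(name_tokens(n)), len(n)))

def standardize(names):
    """Return one standardized name per input name, comparing adjacent entries."""
    standardized, group = [], []
    for name in names:
        # Close the current group when the next name is not a variant of it.
        if group and not is_variant(most_complete(group), name):
            standardized.extend([most_complete(group)] * len(group))
            group = []
        group.append(name)
    if group:
        standardized.extend([most_complete(group)] * len(group))
    return standardized

raw = [
    "agrawal, arpana",
    "albright, benjamin",
    "albright, benjamin b",
    "albright, benjamin b.",
]
for original, cleaned in zip(raw, standardize(raw)):
    print(f"{original} -> {cleaned}")
```

The prefix rule is deliberately conservative: it will not merge an initial with a full first name (for example "adams, m" and "adams, marie"), so cases like that would still need a manual look.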

Hi, can you share the dataset on Google Drive, please?

You actually may have a good prompt in this post: “…where it could identify if names can be standardized/aggregated by checking adjacent entries and adding to a new column.”
Have you tried asking Julius to do this with this prompt? For example, “can you identify names that can be standardized/aggregated by checking adjacent entries and adding them to a new column?”.

Absolutely! Here’s the link: Loading Google Sheets
Thanks a lot!

Yes, I’ve tried that, but it ended up incorrectly aggregating all names with the same last name into one specific name. Here are some of the wrong entries it returned (raw name -> standardized name):
adams, j.c. -> adams, elizabeth troutman
adams, leslie b. -> adams, elizabeth troutman
adams, marci -> adams, elizabeth troutman
adams, marie -> adams, elizabeth troutman
adams, molly -> adams, elizabeth troutman
adams, ursula -> adams, elizabeth troutman
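
A quick way to catch this kind of over-aggregation in any returned file is to flag rows where the standardized name is not simply a more complete form of the raw name. The snippet below is a minimal sketch of that check, using the wrong entries above plus the albright example from the first post. The function names name_tokens and is_suspicious are hypothetical, and the check assumes a correct standardized name always starts with the same words as the raw name.

```python
import re

def name_tokens(name):
    """Lowercase a name and split it into words, ignoring commas and periods."""
    return [t for t in re.split(r"[,.\s]+", name.lower()) if t]

def is_suspicious(raw, standardized):
    """True when the standardized name does not begin with every word of the raw name."""
    raw_t, std_t = name_tokens(raw), name_tokens(standardized)
    return std_t[:len(raw_t)] != raw_t

# (raw name, standardized name) pairs taken from this thread
pairs = [
    ("adams, j.c.", "adams, elizabeth troutman"),
    ("adams, leslie b.", "adams, elizabeth troutman"),
    ("adams, marci", "adams, elizabeth troutman"),
    ("adams, marie", "adams, elizabeth troutman"),
    ("adams, molly", "adams, elizabeth troutman"),
    ("adams, ursula", "adams, elizabeth troutman"),
    ("albright, benjamin", "albright, benjamin b."),
]

flagged = [pair for pair in pairs if is_suspicious(*pair)]
for raw, standardized in flagged:
    print(f"check: {raw} -> {standardized}")
```

With these inputs, only the adams rows are flagged; the albright row passes because the standardized name just extends the raw one.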

Please ignore my previous file and use this one instead:
WeTransfer - Send Large Files & Share Photos Online - Up to 2GB Free

I clicked the link and it directed me to the registration page for this website…

Just click “Agree” and proceed to load the file.

Wow, it worked perfectly! Would you mind sharing how you formulated the prompt?
Many thanks!

Hey Mahmed,

Thank you for quickly providing this standardized dataset. I’d appreciate tips on constructing effective prompts to achieve this output.

Thanks for sharing this information, Mahmed! This is super useful. I didn’t know that something like this existed.

You’re most welcome, always.
