Baseball Data Visualizations with Julius

This one dedicated to HoopsGPT :basketball: . Real ones know. :fist_left: :fist_left: :v:

This post is a detailed account of my conversation with Julius about analyzing Ronald Acuña Jr.'s batting performance in the 2023 season. Here's a technical walkthrough, including code snippets and methodologies.

Querying the Top Batter of 2023

My first task for Julius was straightforward: identify the top batter of 2023. Julius used the batting_stats function from the pybaseball library and ranked hitters by expected batting average (the xBA column), pinpointing Ronald Acuña Jr. as the standout performer.

from pybaseball import batting_stats

# Fetch batting statistics for 2023
batting_stats_2023 = batting_stats(2023)
# Identify the top batter
top_batter = batting_stats_2023.loc[batting_stats_2023['xBA'].idxmax()]
print(top_batter[['Name', 'xBA']])

Visualizing Batting Event Distribution

To delve deeper, I requested a pie chart showing the distribution of Acuña Jr.'s batting events—singles, doubles, triples, and home runs. Julius provided two iterations: first, a basic distribution by percentages, and second, an enhanced version with actual counts alongside percentages.

I did verify this data, and he did indeed hit 41 home runs in the 2023 season.
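For anyone who wants to try the second version of the chart (counts alongside percentages), here is a minimal sketch of how it could be built. This is not the code Julius generated; the helper names `hit_distribution` and `plot_hit_pie` are hypothetical, and the sample events are placeholders rather than real stats.

```python
from collections import Counter

HIT_TYPES = ["single", "double", "triple", "home_run"]


def hit_distribution(events):
    """Return (sizes, labels) for a pie chart of hit types.

    Each label carries the raw count and the share of all hits.
    Events that are not hits (outs, walks, ...) are ignored.
    """
    counts = Counter(e for e in events if e in HIT_TYPES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no hit events found")
    sizes = [counts[hit] for hit in HIT_TYPES]
    labels = [
        f"{hit} ({counts[hit]}, {counts[hit] / total:.1%})" for hit in HIT_TYPES
    ]
    return sizes, labels


def plot_hit_pie(events, title="Hit distribution"):
    """Draw the pie chart and return the figure."""
    # Imported here so hit_distribution works without matplotlib installed
    import matplotlib.pyplot as plt

    sizes, labels = hit_distribution(events)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(sizes, labels=labels, startangle=90)
    ax.set_title(title)
    return fig


# Placeholder events for illustration only, NOT Acuña Jr.'s real numbers
sample_events = (
    ["single"] * 6 + ["double"] * 2 + ["triple"] + ["home_run"] + ["strikeout"] * 3
)
sizes, labels = hit_distribution(sample_events)
print(labels)
```

With real data, the `events` column of a Statcast DataFrame could be passed in place of `sample_events`.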

Advanced Analysis: Hit Landing Locations

The next challenge was to calculate and visualize where hits landed. Because accurately modeling baseball trajectories is complex, Julius proposed a simplified model using distances and angles taken directly from the Statcast data. The formula is a basic projection that assumes a straight-line trajectory, so it is not accurate: it ignores factors such as air resistance and spin.

import numpy as np
import matplotlib.pyplot as plt

# statcast_data: Acuña Jr.'s 2023 Statcast DataFrame (assumed to be already loaded)
# Filter for rows where a hit occurred (copy to avoid pandas' SettingWithCopyWarning)
hit_events = ['single', 'double', 'triple', 'home_run']
hits_data = statcast_data[statcast_data['events'].isin(hit_events)].copy()

# Simplified calculation of landing locations (assuming straight trajectory)
# For simplicity, we'll use hit_distance_sc directly as the distance
# Caveat: launch_angle is the vertical angle off the bat, so these are not true field coordinates
angle_rad = np.radians(hits_data['launch_angle'])
hits_data['landing_x'] = np.cos(angle_rad) * hits_data['hit_distance_sc']
hits_data['landing_y'] = np.sin(angle_rad) * hits_data['hit_distance_sc']

# Plotting
plt.figure(figsize=(15, 10), facecolor='white')
for event in hit_events:
    subset = hits_data[hits_data['events'] == event]
    plt.scatter(subset['landing_x'], subset['landing_y'], label=event, s=50)

plt.axhline(0, color='black')
plt.axvline(0, color='black')
plt.title('Hit Landing Locations for Ronald Acuña Jr. in 2023')
plt.xlabel('Distance from Home Plate (feet)')
plt.ylabel('Lateral Distance (feet)')
plt.legend()
plt.show()

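Note: This simplified calculation omits the physics of ball flight, a limitation that was necessary for our analysis but worth acknowledging. It also relies on launch_angle, which Statcast records as the vertical angle of the ball off the bat rather than its horizontal direction. That is why this plot seems off, and fixing it requires additional trajectory modeling work or actual hit coordinates.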
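To make the problem concrete: placing a hit on the field needs a horizontal direction, not a vertical one. Here is a minimal sketch of the projection I would expect to work if a horizontal spray angle were available. The `spray_angle_deg` input and its sign convention (0 = straight over second base, positive toward right field) are my assumptions; it is not a column used anywhere in the conversation above.

```python
import math


def landing_point(distance_ft, spray_angle_deg):
    """Project a hit onto the field plane, with home plate at (0, 0).

    distance_ft:     horizontal distance travelled (e.g. hit_distance_sc)
    spray_angle_deg: HORIZONTAL angle, 0 = toward second base,
                     positive toward right field (assumed convention)
    """
    phi = math.radians(spray_angle_deg)
    x = distance_ft * math.sin(phi)  # left/right of the home-to-second line
    y = distance_ft * math.cos(phi)  # depth toward center field
    return x, y


# A ball hit 300 ft down the right-field line (45 degrees from center)
x, y = landing_point(300.0, 45.0)
print(round(x, 1), round(y, 1))
```

The point of the sketch is only that the angle splitting the distance into x and y has to be the horizontal one; it still ignores drag, spin, and wind.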

Enhancing Visualization with a Baseball Diamond

Realizing the value of context, I suggested overlaying a baseball diamond on the hit location plot. Julius adapted quickly, adding a function to draw the diamond, thereby providing a clearer visualization of hit distances relative to standard baseball field positions.

import math

# Function to draw a baseball diamond
def draw_baseball_diamond(ax):
    # Bases are 90 ft apart; with the diamond rotated 45 degrees,
    # each base sits 90 / sqrt(2) (about 63.6 ft) along each axis
    d = 90 / math.sqrt(2)
    home_plate = (0, 0)
    first_base = (d, d)
    second_base = (0, 2 * d)
    third_base = (-d, d)
    bases = [home_plate, first_base, second_base, third_base, home_plate]

    # Draw the diamond
    x, y = zip(*bases)
    ax.plot(x, y, 'k-', linewidth=2)

Refinement: Using hc_x and hc_y for Accuracy

To further refine our analysis, I asked Julius to recreate the hit spray chart using the hc_x and hc_y columns from the data, aiming for a more accurate representation of hit locations. This adjustment led to a significantly improved visualization.
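One detail worth knowing if you try this yourself: hc_x and hc_y are chart coordinates, not feet, and the y axis runs downward. Here is a rough sketch of the conversion to feet relative to home plate. The home plate position (125.42, 198.27) and the 2.5 feet-per-unit scale are commonly quoted approximations, not values from my conversation with Julius, so treat them as assumptions and adjust as needed. The function name `hc_to_feet` is made up for this example.

```python
def hc_to_feet(hc_x, hc_y, home_x=125.42, home_y=198.27, feet_per_unit=2.5):
    """Convert Statcast hit coordinates to approximate feet from home plate.

    home_x, home_y and feet_per_unit are approximate, assumed constants.
    Returns (x_ft, y_ft): x grows toward right field, y toward center field.
    """
    x_ft = (hc_x - home_x) * feet_per_unit
    # hc_y increases toward home plate, so flip it to make the outfield positive
    y_ft = (home_y - hc_y) * feet_per_unit
    return x_ft, y_ft


# A ball charted 100 units straight "above" home plate on the diagram
x_ft, y_ft = hc_to_feet(125.42, 98.27)
print(round(x_ft, 1), round(y_ft, 1))
```

Because it only uses arithmetic, the same function should also accept whole pandas columns, for example `hc_to_feet(hits_data['hc_x'], hits_data['hc_y'])`, and the results can be scattered on the same axes as the diamond.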

Wrapping Up

This deep dive with Julius, from extracting top player statistics to intricately plotting hit locations, underscores the incredible potential of leveraging AI tools for sports analytics. The ability to generate custom analysis and visualizations without writing extensive code myself highlights the evolving landscape of data analysis, where technology enhances our ability to uncover insights.

Despite the technical challenges and simplifications required, gathering this data and building both simple and more complicated visualizations has never been easier, which shows how data analysis tools like Julius are reshaping our approach to sports analytics.

In conclusion, our interaction not only offered detailed insights into Acuña Jr.'s 2023 season but also demonstrated the practical application of AI in simplifying and advancing the field of baseball analytics.

Keywords: AI, GPT 4, Claude 3, Julius, Data Analysis, Data Visualization, Sports, Baseball


I didn’t realize you could pull in baseball stats using Python — really cool!


Oh yeah definitely. And they go in incredible detail too, plus Julius is very familiar with this library so it’s very easy to use.


I didn’t even know about this Python module. It’s amazing how powerful the Python module ecosystem is that you can have such a niche module.

Thanks for sharing, I learned something new today :100:


yo same didnt know about that one. theres fr so many good python libraries for AIs to play with. a lot of times i have julius use


to generate me snippets for text embeddings


imagine if brad pitt had julius in moneyball