Analysing poker data

I’d like to analyze a bunch of poker hand histories and ask AI to look for patterns (e.g. player A bluffs more with pot-sized bets, player B is value-heavy when betting 2x pot, player C is bluffing 70% of the time when acting within 2 seconds, etc.).

Any recommendations on what approach I should take here? Which model should I use, and how should I train it?

First, could you tell me a little bit about the data you collected? What variables have you collected in your dataset?

It’s hand history in txt format (I have hundreds of thousands of hands). For example, here are 3 different hands:
PokerStars Hand #250705826750: Tournament #3740602279, $100+$9 USD Hold’em No Limit - Level I (50/100) - 2024/05/26 18:48:08 CET [2024/05/26 12:48:08 ET]
Table ‘3740602279 759’ 8-max Seat #2 is the button
Seat 1: letty2412 (3646 in chips)
Seat 2: mextrex (27901 in chips)
Seat 3: fabianod12 (22586 in chips)
Seat 4: litvinis (25546 in chips)
Seat 5: Aiden1623 (23909 in chips)
Seat 6: Pujîto111 (22909 in chips)
Seat 7: rossyrm (27442 in chips)
Seat 8: sosickPL (25000 in chips)
letty2412: posts the ante 12
mextrex: posts the ante 12
fabianod12: posts the ante 12
litvinis: posts the ante 12
Aiden1623: posts the ante 12
Pujîto111: posts the ante 12
rossyrm: posts the ante 12
sosickPL: posts the ante 12
fabianod12: posts small blind 50
litvinis: posts big blind 100
*** HOLE CARDS ***
Dealt to sosickPL [5s 9c]
Aiden1623: folds
Pujîto111: raises 150 to 250
rossyrm: folds
sosickPL: folds
letty2412: folds
mextrex: folds
fabianod12: raises 878 to 1128
litvinis: folds
Pujîto111: raises 2572 to 3700
fabianod12: calls 2572
*** FLOP *** [7s 6c As]
fabianod12: checks
Pujîto111: bets 2507
fabianod12: calls 2507
*** TURN *** [7s 6c As] [4h]
fabianod12: bets 4161
Pujîto111: calls 4161
*** RIVER *** [7s 6c As 4h] [Tc]
fabianod12: bets 12206 and is all-in
Pujîto111: folds
Uncalled bet (12206) returned to fabianod12
fabianod12 collected 20932 from pot
fabianod12: doesn’t show hand
*** SUMMARY ***
Total pot 20932 | Rake 0
Board [7s 6c As 4h Tc]
Seat 1: letty2412 folded before Flop (didn’t bet)
Seat 2: mextrex (button) folded before Flop (didn’t bet)
Seat 3: fabianod12 (small blind) collected (20932)
Seat 4: litvinis (big blind) folded before Flop
Seat 5: Aiden1623 folded before Flop (didn’t bet)
Seat 6: Pujîto111 folded on the River
Seat 7: rossyrm folded before Flop (didn’t bet)
Seat 8: sosickPL folded before Flop (didn’t bet)

PokerStars Hand #250705890645: Tournament #3740602279, $100+$9 USD Hold’em No Limit - Level II (60/120) - 2024/05/26 18:51:16 CET [2024/05/26 12:51:16 ET]
Table ‘3740602279 759’ 8-max Seat #3 is the button
Seat 1: letty2412 (3634 in chips)
Seat 2: mextrex (27889 in chips)
Seat 3: fabianod12 (33138 in chips)
Seat 4: litvinis (25434 in chips)
Seat 5: Aiden1623 (23897 in chips)
Seat 6: Pujîto111 (12529 in chips)
Seat 7: rossyrm (27430 in chips)
Seat 8: sosickPL (24988 in chips)
letty2412: posts the ante 15
mextrex: posts the ante 15
fabianod12: posts the ante 15
litvinis: posts the ante 15
Aiden1623: posts the ante 15
Pujîto111: posts the ante 15
rossyrm: posts the ante 15
sosickPL: posts the ante 15
litvinis: posts small blind 60
Aiden1623: posts big blind 120
*** HOLE CARDS ***
Dealt to sosickPL [Kd Kh]
Pujîto111: folds
rossyrm: folds
sosickPL: raises 240 to 360
letty2412: folds
mextrex: folds
fabianod12: folds
litvinis: folds
Aiden1623: folds
Uncalled bet (240) returned to sosickPL
sosickPL collected 420 from pot
sosickPL: doesn’t show hand
*** SUMMARY ***
Total pot 420 | Rake 0
Seat 1: letty2412 folded before Flop (didn’t bet)
Seat 2: mextrex folded before Flop (didn’t bet)
Seat 3: fabianod12 (button) folded before Flop (didn’t bet)
Seat 4: litvinis (small blind) folded before Flop
Seat 5: Aiden1623 (big blind) folded before Flop
Seat 6: Pujîto111 folded before Flop (didn’t bet)
Seat 7: rossyrm folded before Flop (didn’t bet)
Seat 8: sosickPL collected (420)

PokerStars Hand #250705903452: Tournament #3740602279, $100+$9 USD Hold’em No Limit - Level II (60/120) - 2024/05/26 18:51:54 CET [2024/05/26 12:51:54 ET]
Table ‘3740602279 759’ 8-max Seat #4 is the button
Seat 1: letty2412 (3619 in chips)
Seat 2: mextrex (27874 in chips)
Seat 3: fabianod12 (33123 in chips)
Seat 4: litvinis (25359 in chips)
Seat 5: Aiden1623 (23762 in chips)
Seat 6: Pujîto111 (12514 in chips)
Seat 7: rossyrm (27415 in chips)
Seat 8: sosickPL (25273 in chips)
letty2412: posts the ante 15
mextrex: posts the ante 15
fabianod12: posts the ante 15
litvinis: posts the ante 15
Aiden1623: posts the ante 15
Pujîto111: posts the ante 15
rossyrm: posts the ante 15
sosickPL: posts the ante 15
Aiden1623: posts small blind 60
Pujîto111: posts big blind 120
*** HOLE CARDS ***
Dealt to sosickPL [4s 9c]
rossyrm: folds
sosickPL: folds
letty2412: folds
mextrex: folds
fabianod12: folds
litvinis: raises 120 to 240
Aiden1623: raises 1020 to 1260
Pujîto111: folds
litvinis: calls 1020
*** FLOP *** [7s 5d 2c]
Aiden1623: checks
litvinis: checks
*** TURN *** [7s 5d 2c] [3c]
Aiden1623: checks
litvinis: bets 1380
Aiden1623: folds
Uncalled bet (1380) returned to litvinis
litvinis collected 2760 from pot
*** SUMMARY ***
Total pot 2760 | Rake 0
Board [7s 5d 2c 3c]
Seat 1: letty2412 folded before Flop (didn’t bet)
Seat 2: mextrex folded before Flop (didn’t bet)
Seat 3: fabianod12 folded before Flop (didn’t bet)
Seat 4: litvinis (button) collected (2760)
Seat 5: Aiden1623 (small blind) folded on the Turn
Seat 6: Pujîto111 (big blind) folded before Flop
Seat 7: rossyrm folded before Flop (didn’t bet)
Seat 8: sosickPL folded before Flop (didn’t bet)

Oh wow, thanks for the snippet. Okay, the first step would probably be data parsing and preparation: extract the relevant info from each hand history (actions, bet sizes, pot sizes, outcomes) and structure it into a dataset/data frame.
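Here is a rough sketch of what that parsing step could look like, using only the standard library. The regular expressions and the `parse_actions` helper are my own assumptions based on the three sample hands above, so treat this as a starting point rather than a complete PokerStars parser (it ignores antes, blinds, showdowns and the summary block).

```python
import re

# Patterns inferred from the sample hands only; other hand formats may differ.
HAND_RE = re.compile(r"^PokerStars Hand #(\d+):")
STREET_RE = re.compile(r"^\*\*\* (HOLE CARDS|FLOP|TURN|RIVER) \*\*\*")
ACTION_RE = re.compile(
    r"^(?P<player>.+?): (?P<action>folds|checks|calls|bets|raises)"
    r"(?: (?P<amount>\d+))?(?: to (?P<to>\d+))?"
)


def parse_actions(text):
    """Return one dict per player action found in a hand-history string."""
    rows = []
    hand_id, street = None, None
    for line in text.splitlines():
        line = line.strip()
        m = HAND_RE.match(line)
        if m:
            hand_id, street = m.group(1), None
            continue
        m = STREET_RE.match(line)
        if m:
            street = "PREFLOP" if m.group(1) == "HOLE CARDS" else m.group(1)
            continue
        if line.startswith("*** SUMMARY"):
            street = None  # stop collecting until the next hand starts
            continue
        m = ACTION_RE.match(line) if street else None
        if m:
            rows.append({
                "hand_id": hand_id,
                "street": street,
                "player": m["player"],
                "action": m["action"],
                "amount": int(m["amount"]) if m["amount"] else None,
                "raise_to": int(m["to"]) if m["to"] else None,
            })
    return rows


# Excerpt of the third sample hand from the question.
SAMPLE = """\
PokerStars Hand #250705903452: Tournament #3740602279, $100+$9 USD Hold’em No Limit - Level II (60/120) - 2024/05/26 18:51:54 CET [2024/05/26 12:51:54 ET]
*** HOLE CARDS ***
litvinis: raises 120 to 240
Aiden1623: raises 1020 to 1260
Pujîto111: folds
litvinis: calls 1020
*** FLOP *** [7s 5d 2c]
Aiden1623: checks
litvinis: checks
*** TURN *** [7s 5d 2c] [3c]
Aiden1623: checks
litvinis: bets 1380
Aiden1623: folds
*** SUMMARY ***
"""

rows = parse_actions(SAMPLE)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` to get one row per action.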

The next step I would take is to create features from that raw data frame that capture important aspects of the players’ behaviour and the game dynamics. Some example features:

  1. bet size percentage: the size of the player’s bet relative to the pot at the moment of the bet. Large values can indicate aggressive betting.
  2. time to act: how quickly the player makes a decision, which can reveal confidence or hesitation. Note that the hand histories above only carry one timestamp per hand, so per-action timing would have to come from another source.
  3. player position: encode the player’s position relative to the dealer button.
  4. previous actions: encode the sequence of actions taken by the player in the current hand (check, bet, fold, etc.).
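To make the bet size feature concrete, here is a minimal sketch that tracks the pot through a hand and reports each bet as a fraction of the pot at that moment. The event tuples and the `bet_fractions` helper are hypothetical names of mine; the numbers are taken from the first sample hand. It assumes the "raises X to Y" lines mean the player's total for the street becomes Y, which is how the sample totals add up.

```python
def bet_fractions(events):
    """Return ([(street, player, bet / pot_before_bet), ...], final_pot)."""
    pot = 0
    committed = {}  # chips each player has put in on the current street
    street = "PREFLOP"
    fractions = []
    for ev in events:
        kind = ev[0]
        if kind == "street":
            street = ev[1]
            committed = {}  # street commitments reset on each new street
        elif kind == "ante":
            pot += ev[2]  # antes go to the pot but do not count as a bet
        elif kind in ("blind", "call", "bet"):
            _, player, amount = ev
            if kind == "bet":
                fractions.append((street, player, amount / pot))
            pot += amount
            committed[player] = committed.get(player, 0) + amount
        elif kind == "raise":
            _, player, raise_to = ev
            pot += raise_to - committed.get(player, 0)
            committed[player] = raise_to
        # folds and checks do not change the pot, so they are ignored
    return fractions, pot


players = ["letty2412", "mextrex", "fabianod12", "litvinis",
           "Aiden1623", "Pujîto111", "rossyrm", "sosickPL"]
events = [("ante", p, 12) for p in players] + [
    ("blind", "fabianod12", 50),
    ("blind", "litvinis", 100),
    ("raise", "Pujîto111", 250),
    ("raise", "fabianod12", 1128),
    ("raise", "Pujîto111", 3700),
    ("call", "fabianod12", 2572),
    ("street", "FLOP"),
    ("bet", "Pujîto111", 2507),
    ("call", "fabianod12", 2507),
    ("street", "TURN"),
    ("bet", "fabianod12", 4161),
    ("call", "Pujîto111", 4161),
    ("street", "RIVER"),
    ("bet", "fabianod12", 12206),
]

fractions, pot_before_river_bet_resolves = bet_fractions(events)
for street, player, frac in fractions:
    print(f"{street:6s} {player:12s} {frac:.2f}")
```

Because the river bet is uncalled and returned, the pot just before it (20932) matches the "Total pot 20932" line in the hand history, which is a handy sanity check for the tracker.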

Model selection would be next. There are a couple of model families to choose from, depending on what you want. For example:
Recurrent Neural Networks (RNNs) or LSTMs: these are good for sequence modelling and can capture patterns over the series of actions within a hand.
Decision Trees, Random Forests, or Gradient Boosting Models: these work well on tabular features for classification tasks (such as detecting bluffing behaviour) and regression tasks (such as predicting bet sizes).
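If you go the sequence-model route, the actions have to be turned into fixed-length integer sequences first. A small sketch of one way to do that, assuming a made-up vocabulary (`ACTION_IDS`) and left-padding to a fixed length; the example sequence is what fabianod12 did in the first sample hand.

```python
# Hypothetical vocabulary; id 0 is reserved for padding.
ACTION_IDS = {"<pad>": 0, "fold": 1, "check": 2, "call": 3, "bet": 4, "raise": 5}


def encode_actions(actions, max_len=8):
    """Map action names to ids, keep the last max_len, left-pad with 0."""
    ids = [ACTION_IDS[a] for a in actions][-max_len:]
    return [ACTION_IDS["<pad>"]] * (max_len - len(ids)) + ids


# fabianod12 in the first hand: 3-bet, call the 4-bet, check-call, bet, bet.
encoded = encode_actions(["raise", "call", "check", "call", "bet", "bet"])
print(encoded)
```

Tree-based models do not need this step; they can work from per-action summary features such as the bet size percentage.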

Once you have selected a model, you have to train it on your prepared data. First split the data into training and validation sets; you can ask Julius to do this split for you. Then train the chosen model on the training set, feeding it the engineered features you created. You should be able to ask Julius to do this as well, and it will run the training for you.
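One thing to watch with the split: if your data frame has one row per action, rows from the same hand are not independent, and a purely random split can put parts of one hand in both sets. A sketch of a hand-level split with scikit-learn's `GroupShuffleSplit`, using made-up hand ids as placeholder data:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: 10 hypothetical hands with 3 action rows each.
hand_ids = np.repeat(np.arange(10), 3)
X = np.arange(len(hand_ids)).reshape(-1, 1)  # stand-in for the feature matrix

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(X, groups=hand_ids))

train_hands = set(hand_ids[train_idx])
val_hands = set(hand_ids[val_idx])
print("train hands:", sorted(train_hands))
print("validation hands:", sorted(val_hands))
```

The same idea works with player names as the group if you want to test how well the model generalises to players it has never seen.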

Finally, you would evaluate the performance on the validation set. You can do this by examining accuracy, precision, or metrics specific to poker analysis (e.g. bluff detection accuracy).
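As a small illustration of why it helps to look beyond accuracy, here is a sketch with scikit-learn's metric functions on ten hand-made labels (1 = bluff). The labels and predictions are invented for the example, not real results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Invented toy labels: 3 bluffs out of 10 bets.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)  # of predicted bluffs, how many were real
rec = recall_score(y_true, y_pred)      # of real bluffs, how many were caught
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
print(cm)
```

Here accuracy is 0.80 while precision and recall are both about 0.67. When bluffs are rare, a model that never predicts a bluff can still score high accuracy, so precision and recall on the bluff class are usually the more informative numbers.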

Below is a more detailed example in Python:

# Assuming you have parsed and structured your data into a DataFrame `poker_data`
import pandas as pd

# Feature engineering (example features)
poker_data['bet_size_pct'] = poker_data['bet_size'] / poker_data['pot_size']
# Assumes both time columns are datetimes; the difference is converted to seconds
poker_data['time_to_act'] = (
    poker_data['action_end_time'] - poker_data['action_start_time']
).dt.total_seconds()

# Define features and target variable
# (player_position and previous_action are assumed to be numerically encoded already)
features = ['bet_size_pct', 'time_to_act', 'player_position', 'previous_action']
target = 'bluffing_label'  # Example: binary label (0 or 1) indicating bluffing behavior

# Split data into train and validation sets
from sklearn.model_selection import train_test_split

train_data, val_data = train_test_split(poker_data, test_size=0.2, random_state=42)

# Example model training (using RandomForestClassifier as an example)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(train_data[features], train_data[target])

# Predict on validation set
predictions = model.predict(val_data[features])

# Evaluate model performance
accuracy = accuracy_score(val_data[target], predictions)
print(f"Accuracy: {accuracy}")

# Further analyze feature importance
feature_importance = pd.Series(model.feature_importances_, index=features)
print("Feature Importance:")
print(feature_importance)

I hope this helps clear things up a bit on how to move forward with your data.