I have used Julius to clean data, to generate Python code that I can tweak or adjust, and to produce some initial visualisations, and even just to see what approach Julius suggests. Julius has performed amazingly well on many things, but has struggled with complex analyses that take some time to fit (it often times out or has package compatibility issues). My workaround has been to take the code generated by Julius and run it myself in a Python environment outside Julius, tweaking the code if necessary. At other times I have translated the Python code to R and run it in an R environment. What workarounds have others used to navigate these difficulties?
Hi Tony, thanks a lot for the question. Glad to hear you’re liking Julius. I would love to get more details on the issues you’re running into, to see if we can propose a workaround or build some of those things into the product for you:
- What packages do you face compatibility issues with? You should be able to tell Julius to install a certain version of a package or to install a completely new package. One caveat is that the AI models have a knowledge cutoff date (for example, August 2023), so they have only seen packages up to a certain date. We try our best to pre-install popular packages in Julius’s code environments at versions that are compatible both with the model’s knowledge cutoff and with the other installed packages.
- We are looking into making an increased timeout a configurable option. How intensive are the workloads that you are running? Would a 15-minute timeout instead of a 10-minute timeout be sufficient?
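When reporting compatibility issues like these, it may help to include the versions that are actually installed in the session. A minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the package list is just an example taken from this thread:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package names only; replace with the ones your analysis imports.
report = installed_versions(["pandas", "matplotlib", "pymc3", "genieclust"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the traceback makes it easier to tell a missing package from a version mismatch.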
In the meantime, your workaround of running Julius’s generated code in your own Python environment makes sense.
Looking forward to hearing more details.
Hi Rahul,
Thank you very much for your reply. I will give some examples of models Julius struggled to fit and the errors generated in case this helps.
1) After a frequentist/standard regression, I ran the following prompt:
“Could you perform a Bayesian regression on the same data and provide direct probability statements for the slope and 95% credible intervals”
I gave no priors as I wished to keep it as simple as possible with software defaults. This is a relatively straightforward analysis in R with rstan, brms, rstanarm, etc. I had not run a Bayesian analysis in Python prior to this, but there are a number of suitable packages, e.g. pymc3, pystan, ArviZ.
Julius used pymc3:
import pymc3 as pm
import theano.tensor as tt

# Define the model
with pm.Model() as bayesian_model:
    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    # Expected value of outcome
    mu = alpha + beta * X_train.values.squeeze()
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=y_train.values)
    # Posterior distribution
    trace = pm.sample(2000, return_inferencedata=False)

# Summary of the posterior distribution for each parameter
summary = pm.summary(trace)
# 95% credible intervals
credible_intervals = pm.hpd(trace, hdi_prob=0.95)
print(summary)
print('\n95% Credible Intervals:\n', credible_intervals)
Errors
ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_45/1761239941.py in <module>
----> 1 import pymc3 as pm
      2 import theano.tensor as tt
      3
      4 # Define the model
      5 with pm.Model() as bayesian_model:
ModuleNotFoundError: No module named 'pymc3'
Also, evaluate what went wrong and consider options for how to fix it.
!pip install pymc3==3.11.4
The model ran again with the following error:
Errors
ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_45/3402853092.py in <module>
      1 import pandas as pd
      2 import numpy as np
----> 3 import pymc3 as pm
      4 import theano.tensor as tt
      5
ModuleNotFoundError: No module named 'pymc3'
Also, evaluate what went wrong and consider options for how to fix it.
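A ModuleNotFoundError straight after a `pip install` can mean either that pip failed (for example on a dependency conflict) or that it installed into a different interpreter from the one the notebook kernel runs. Which of these happened here is not known. The sketch below is one way to rule out both, assuming the session allows subprocesses and network access; `pip_install_command` and `install_and_import` are hypothetical helper names:

```python
import importlib
import subprocess
import sys


def pip_install_command(requirement):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", requirement]


def install_and_import(requirement, module_name):
    """Install `requirement`, then import `module_name`, surfacing pip errors."""
    result = subprocess.run(
        pip_install_command(requirement), capture_output=True, text=True
    )
    if result.returncode != 0:
        # Show the tail of pip's own error output rather than a later import error.
        raise RuntimeError(f"pip failed for {requirement}:\n{result.stderr[-2000:]}")
    importlib.invalidate_caches()  # let the import system see the new files
    return importlib.import_module(module_name)


# Example call (needs network access, so it is left commented out):
# pm = install_and_import("pymc3==3.11.4", "pymc3")
print(" ".join(pip_install_command("pymc3==3.11.4")))
```

If pip itself reports an error, that message is usually more informative than the ModuleNotFoundError that follows.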
I ran the model in a different environment.
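For a simple linear regression, the quantities asked for in the prompt can also be obtained without any Bayesian package. This is a minimal sketch, not the model Julius generated: it assumes the standard noninformative prior p(a, b, sigma^2) ∝ 1/sigma^2 rather than the Normal and HalfNormal priors in the code above, and it uses toy data in place of `X_train` and `y_train`. The function names are hypothetical.

```python
import math
import random


def slope_posterior_draws(x, y, n_draws=4000, seed=0):
    """Draw from the posterior of the slope b in y = a + b*x + noise.

    Under the prior p(a, b, sigma^2) ∝ 1/sigma^2:
      sigma^2 | data   ~ SSE / chi2(n - 2)
      b | sigma^2, data ~ Normal(b_hat, sigma^2 / Sxx)
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                      # least-squares slope
    a_hat = y_bar - b_hat * x_bar          # least-squares intercept
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        chi2 = rng.gammavariate((n - 2) / 2, 2.0)  # chi-square, n - 2 dof
        sigma2 = sse / chi2
        draws.append(rng.gauss(b_hat, math.sqrt(sigma2 / sxx)))
    return draws


def credible_interval(draws, prob=0.95):
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)
    tail = (1 - prob) / 2
    low = ordered[int(tail * len(ordered))]
    high = ordered[int((1 - tail) * len(ordered)) - 1]
    return low, high


# Toy data standing in for X_train / y_train (true slope close to 2).
x = list(range(20))
y = [1 + 2 * xi + (0.5 if xi % 2 == 0 else -0.5) for xi in x]

draws = slope_posterior_draws(x, y)
low, high = credible_interval(draws)
prob_positive = sum(d > 0 for d in draws) / len(draws)
print(f"95% credible interval for the slope: ({low:.3f}, {high:.3f})")
print(f"P(slope > 0 | data) = {prob_positive:.3f}")
```

With informative priors or more predictors this shortcut no longer applies, and a sampler such as the one in pymc3 or Stan is the appropriate tool.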
2) I had performed cluster analysis on this data in R and Python previously but wished to explore more options. I used the following prompt: “perform cluster analysis on V_Value, W_Value, X_Value, Y_Value, Z_Value columns”. (Variable names have been modified here.) The dataset was quite large.
Errors
ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_45/1894133855.py in <module>
      1 import pandas as pd
      2 from sklearn.preprocessing import StandardScaler
----> 3 from genieclust import Genie
      4 from sklearn.metrics import silhouette_score
      5 import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'genieclust'
Also, evaluate what went wrong and consider options for how to fix it.
!pip install genieclust==0.9.8
Errors
TypeError Traceback (most recent call last)
/tmp/ipykernel_45/1894133855.py in <module>
      1 import pandas as pd
      2 from sklearn.preprocessing import StandardScaler
----> 3 from genieclust import Genie
      4 from sklearn.metrics import silhouette_score
      5 import matplotlib.pyplot as plt
/opt/conda/lib/python3.9/site-packages/genieclust/__init__.py in <module>
     25
     26
---> 27 from . import plots
     28 from . import inequality
     29 from . import tools
/opt/conda/lib/python3.9/site-packages/genieclust/plots.py in <module>
     31 # module globals:
     32 col = ["#000000", "#DF536B", "#61D04F", "#2297E6", "#28E2E5", "#CD0BBC", "#F5C710"] + \
---> 33     [matplotlib.colors.to_hex(c) for c in plt.colormaps["tab10"].colors] + \
     34     [matplotlib.colors.to_hex(c) for c in plt.colormaps["tab20"].colors] + \
     35     [matplotlib.colors.to_hex(c) for c in plt.colormaps["tab20b"].colors] + \
TypeError: 'function' object is not subscriptable
Also, evaluate what went wrong and consider options for how to fix it.
from genieclust.genie import Genie
TypeError Traceback (most recent call last)
/tmp/ipykernel_45/2188656598.py in <module>
----> 1 from genieclust.genie import Genie
/opt/conda/lib/python3.9/site-packages/genieclust/__init__.py in <module>
     25
     26
---> 27 from . import plots
     28 from . import inequality
     29 from . import tools
/opt/conda/lib/python3.9/site-packages/genieclust/plots.py in <module>
     31 # module globals:
     32 col = ["#000000", "#DF536B", "#61D04F", "#2297E6", "#28E2E5", "#CD0BBC", "#F5C710"] + \
---> 33     [matplotlib.colors.to_hex(c) for c in plt.colormaps["tab10"].colors] + \
     34     [matplotlib.colors.to_hex(c) for c in plt.colormaps["tab20"].colors] + \
     35     [matplotlib.colors.to_hex(c) for c in plt.colormaps["tab20b"].colors] + \
TypeError: 'function' object is not subscriptable
Also, evaluate what went wrong and consider options for how to fix it.
The errors continued.
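The `TypeError: 'function' object is not subscriptable` raised on `plt.colormaps["tab10"]` suggests that the installed matplotlib is older than the one this genieclust release expects. As far as I know, `plt.colormaps` became a subscriptable registry in matplotlib 3.5 and was a plain function before that; treat the 3.5 threshold as an assumption to verify against the matplotlib release notes. A minimal sketch of a version check, with hypothetical helper names:

```python
import re
from importlib import metadata


def version_tuple(version, parts=2):
    """Leading numeric components of a version string, e.g. '3.4.3' -> (3, 4)."""
    numbers = []
    for piece in version.split(".")[:parts]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def has_colormap_registry(matplotlib_version):
    """True if this matplotlib version should support plt.colormaps['name'].

    The (3, 5) threshold is an assumption; check the release notes.
    """
    return version_tuple(matplotlib_version) >= (3, 5)


try:
    installed = metadata.version("matplotlib")
except metadata.PackageNotFoundError:
    print("matplotlib is not installed in this environment")
else:
    if has_colormap_registry(installed):
        print(f"matplotlib {installed}: plt.colormaps[...] should work")
    else:
        print(f"matplotlib {installed}: upgrade matplotlib or pin an older genieclust")
```

If the check reports an older matplotlib, the two options are to upgrade it (which may affect other pre-installed packages) or to install a genieclust release that predates the `plt.colormaps[...]` usage.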
Similar errors have arisen with other packages, e.g.:
Errors
AttributeError Traceback (most recent call last)
/tmp/ipykernel_45/2149710847.py in <module>
----> 1 import pandas as pd
      2 from tqdm import tqdm
      3
      4 tqdm.pandas()
      5
/opt/conda/lib/python3.9/site-packages/pandas/__init__.py in <module>
    133 )
    134
--> 135 from pandas import api, arrays, errors, io, plotting, testing, tseries
    136 from pandas.util._print_versions import show_versions
    137
/opt/conda/lib/python3.9/site-packages/pandas/testing.py in <module>
      4
      5
----> 6 from pandas._testing import (
      7     assert_extension_array_equal,
      8     assert_frame_equal,
/opt/conda/lib/python3.9/site-packages/pandas/_testing/__init__.py in <module>
    977
    978
--> 979 cython_table = pd.core.common._cython_table.items()
    980
    981
AttributeError: partially initialized module 'pandas' has no attribute 'core' (most likely due to a circular import)
HINT: Print out the columns and check for spelling or type issues.
Also, evaluate what went wrong and consider options for how to fix it.
Errors
AttributeError Traceback (most recent call last)
/tmp/ipykernel_45/692730887.py in <module>
      1 get_ipython().run_line_magic('reset', '-f')
----> 2 import pandas as pd
      3 from tqdm import tqdm
/opt/conda/lib/python3.9/site-packages/pandas/__init__.py in <module>
    133 )
    134
--> 135 from pandas import api, arrays, errors, io, plotting, testing, tseries
    136 from pandas.util._print_versions import show_versions
    137
/opt/conda/lib/python3.9/site-packages/pandas/testing.py in <module>
      4
      5
----> 6 from pandas._testing import (
      7     assert_extension_array_equal,
      8     assert_frame_equal,
/opt/conda/lib/python3.9/site-packages/pandas/_testing/__init__.py in <module>
    977
    978
--> 979 cython_table = pd.core.common._cython_table.items()
    980
    981
AttributeError: partially initialized module 'pandas' has no attribute 'core' (most likely due to a circular import)
HINT: Print out the columns and check for spelling or type issues.
Also, evaluate what went wrong and consider options for how to fix it.
I apologize for the inconvenience. It seems there is a persistent issue with the Python environment that is preventing the pandas library from functioning correctly. This is unusual and might be due to a deeper issue with the environment setup.
Unfortunately, I am unable to resolve this issue at the moment. I recommend trying to run the code in a different Python environment or checking the system for potential conflicts that might be causing this issue.
I used the code produced and ran the code successfully in a different environment.
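One possible reading of the pandas error: an earlier import failed part-way and left a half-initialised `pandas` entry in Python's module cache, and `%reset -f` clears user variables but does not unload modules. That is a guess from the traceback, not a confirmed diagnosis. Restarting the kernel is the more reliable fix; where that is not possible, the sketch below clears the cached entries before retrying the import. `purge_modules` is a hypothetical helper, and compiled extension modules may not reload cleanly this way.

```python
import sys


def purge_modules(prefix, modules=None):
    """Remove `prefix` and its submodules from a module cache.

    Defaults to sys.modules; returns the sorted names that were removed.
    """
    modules = sys.modules if modules is None else modules
    doomed = [
        name for name in modules
        if name == prefix or name.startswith(prefix + ".")
    ]
    for name in doomed:
        del modules[name]
    return sorted(doomed)


# Demonstration on a stand-in dictionary rather than the real sys.modules.
fake_cache = {"pandas": 1, "pandas.core": 2, "pandas_gbq": 3, "numpy": 4}
print(purge_modules("pandas", fake_cache))
print(sorted(fake_cache))

# In a live session one would call purge_modules("pandas") and then
# run `import pandas as pd` again.
```

If the error persists after a purge or a kernel restart, the installation itself is probably inconsistent, and running the code in a separate environment, as above, is the practical workaround.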
I hope these examples are of some help, Rahul. Do keep up the great work you and your team are doing.