Idea: In this project, I have to forecast sales for the 122nd month based on the past 120 months (10 years) of sales data. This is a time series dataset with some extreme outliers. I tried several times with Julius, but it did not give me the desired result: the forecast deviates very far from the actual figure.
The dataset you have, with just two columns (Date and Sales_total_in_BDT), is limited for forecasting purposes. Here’s a breakdown of the limitations and some suggestions for improvement:
Limitations of this dataset:
Lack of historical data: Forecasting relies on identifying trends and patterns in past data. With only dates and corresponding total sales, it’s difficult to uncover these patterns. Ideally, you would need daily, weekly, monthly, or quarterly sales figures over a longer period.
Seasonality: Sales might fluctuate based on seasonality (e.g., holidays, promotions). Without further data points, you can’t account for these trends in your forecast. But unfortunately, the company does not have this data.
External factors: There could be external factors impacting sales, like economic conditions, competitor activity, or marketing campaigns. The current dataset doesn’t capture these influences. So please don’t ask for them.
Suggestions I had already figured out, but I can’t act on them since the data is lacking:
Include more granular sales data: Break down the sales data by day, week, month, or quarter. This allows you to identify trends and seasonality.
Incorporate additional data points: Consider including factors that might influence sales, such as marketing spend, promotional activity, competitor prices, or economic indicators.
Expand the date range: The more historical data you have, the better you can identify patterns and predict future trends.
Considering all this, any concrete help, rather than only advice, would be highly appreciated.
I took your data and added another column from what I could find. Given your currency is BDT, I assumed this is sales data relevant to the Bangladeshi economy, so I went to their government website and pulled some nominal GDP values from here and here. I really couldn’t find anything for 2014, and that might be something you should search for as someone more familiar with this economy and language.
But then I proceeded to talk with Julius about your project and asked it to build an XGBoost model that attempts to predict all data from July 2023 to June 2024 (half of that period overlaps sales data already in the dataset, so I can compare visually).
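For readers who want to reproduce the evaluation setup rather than the exact model, here is a minimal sketch of the idea: hold back the last 6 months of known sales, build lag features from the rest, and score a forecast against the held-back months. This is not the model Julius built. XGBoost is swapped out for a seasonal-naive baseline so the block runs with no dependencies, and the `sales` series, the lag choices, and all function names are placeholders I made up for illustration.

```python
def make_lag_rows(series, lags=(1, 2, 3, 12)):
    """Turn a monthly series into (features, target) rows using lagged values."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - k] for k in lags]
        rows.append((features, series[t]))
    return rows


def seasonal_naive(train, horizon, season=12):
    """Forecast each future month with the value from the same month one year earlier."""
    history = list(train)
    forecast = []
    for _ in range(horizon):
        value = history[-season]
        forecast.append(value)
        history.append(value)
    return forecast


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic placeholder: 120 months with a mild trend and a year-end bump.
sales = [1000 + 5 * t + 100 * ((t % 12) in (10, 11)) for t in range(120)]

holdout = 6  # months of known sales kept back for comparison
train, test = sales[:-holdout], sales[-holdout:]

rows = make_lag_rows(train)  # the kind of table a tree model such as XGBoost would be fit on
baseline = seasonal_naive(train, holdout)
print(f"lag rows: {len(rows)}, baseline MAPE: {mape(test, baseline):.2f}%")
```

Any model, XGBoost included, is worth keeping only if it beats this kind of baseline on the held-back months.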
If you check how the GDP data compares with your sales data, it’s not that helpful to look at it directly like this:
The result is really not that impressive because the downturn in 2023 is not predicted well, so the model sort of continues the trend from right before that, when sales were considerably higher. However, this all might recover.
How to improve this further:
More data (I understand you might not have access to that)
More features (this needs to be done carefully, with industry knowledge - what is being sold and what drives demand? What products are being used as inputs in production and what are their price changes month to month? How elastic is user demand to price fluctuations? What products act as substitutes for what is being sold so they can be tracked too? etc)
Since the GDP does seem to correlate nicely in some way with this data, you could use World Bank forecasts for 2024 and use them to adjust your predictions further.
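As a rough sketch of that last point, one option is to estimate how strongly sales have moved with GDP in the past and scale a baseline forecast by the expected GDP change. The block below fits a log-log least-squares slope (an elasticity). All numbers, the `gdp_growth_forecast` value, and the helper names are made up for illustration; they are not World Bank figures or values from the dataset.

```python
import math


def fit_elasticity(gdp, sales):
    """Least-squares slope of log(sales) on log(gdp):
    percent change in sales per percent change in GDP."""
    x = [math.log(g) for g in gdp]
    y = [math.log(s) for s in sales]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    var = sum((xi - x_mean) ** 2 for xi in x)
    return cov / var


def adjust_forecast(base_forecast, elasticity, gdp_growth):
    """Scale a baseline forecast by the sales change implied by a GDP growth rate."""
    return base_forecast * (1 + gdp_growth) ** elasticity


# Made-up annual figures purely for illustration.
gdp = [100.0, 110.0, 121.0, 133.1]
annual_sales = [50.0, 57.5, 66.125, 76.04375]  # grows 15% whenever GDP grows 10%

elasticity = fit_elasticity(gdp, annual_sales)
gdp_growth_forecast = 0.05  # hypothetical 5% growth; replace with a published forecast
adjusted = adjust_forecast(80.0, elasticity, gdp_growth_forecast)
print(f"elasticity: {elasticity:.3f}, adjusted forecast: {adjusted:.2f}")
```

With only a handful of annual GDP values the slope will be noisy, so an adjustment like this is best treated as a sanity check on the model output rather than a correction to trust blindly.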
As far as I understand it, here is some context for this dataset:
Company background: The company is a content writing company that works for international clients. As I understand it, GDP is less impactful here, since GDP mostly reflects domestic economic conditions. Maybe I am missing the link you have in mind.
I can see there are some extreme outliers in the dataset. For that reason, I think forecasting from only the last 2 or 3 years of data would be more appropriate.
What is your suggestion on this?
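To make the question concrete, the idea of forecasting from only the last 2 or 3 years while limiting the effect of extreme outliers could be sketched like this. It clips values that sit far from the median of the recent window (using the median absolute deviation) and then extrapolates a straight-line trend. The 36-month window, the clipping threshold `k`, the synthetic series, and the function names are all assumptions for illustration, and clipping around a single median only makes sense when the trend inside the window is mild compared with the size of the outliers.

```python
import statistics


def clip_outliers(values, k=3.0):
    """Clip values further than k scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)
    limit = k * 1.4826 * mad  # 1.4826 makes the MAD comparable to a standard deviation
    return [min(max(v, med - limit), med + limit) for v in values]


def fit_line(y):
    """Ordinary least-squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def recent_window_forecast(series, window=36, steps_ahead=1, k=3.0):
    """Fit a trend to the clipped last `window` months and extrapolate."""
    recent = clip_outliers(series[-window:], k)
    slope, intercept = fit_line(recent)
    return intercept + slope * (len(recent) - 1 + steps_ahead)


# Synthetic placeholder: a gentle upward trend with one extreme spike.
series = [1000 + 2 * t for t in range(120)]
series[100] = 50000

forecast = recent_window_forecast(series, window=36, steps_ahead=2)
print(f"forecast two months past the end of the data: {forecast:.1f}")
```

Running the same fit with a very large `k` (so nothing is clipped) shows how much a single spike can drag the trend line, which is the effect described above.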
Have you tried talking to Julius about what other kind of data can be included to improve your predictions, given that it is a Content Writing company?