PROPHET_PREDICT
Run a Prophet time series prediction model on an incoming dataframe. The DataContainer input type must be a dataframe, and the first column (or index) of the dataframe must be of a datetime type.
This node always returns a DataContainer of a dataframe type. It will also always return an 'extra' field with a key 'prophet' of which the value is the JSONified Prophet model.
The model can be restored from this JSON with model_from_json from prophet.serialize.
Params:
run_forecast : bool
    If True (default case), the dataframe of the returning DataContainer
    ('m' parameter of the DataContainer) will be the forecasted dataframe.
    It will also have an 'extra' field with the key 'original', which is
    the original dataframe passed in.
    If False, the returning dataframe will be the original data.
    This node will also always have an 'extra' field, run_forecast, which
    matches the parameter passed in. This lets later nodes
    know whether a forecast has already been run.
    Default = True
periods : int
    The number of periods to predict out. Only used if run_forecast is True.
    Default = 365
Returns:
out : DataFrame
    With parameter as df.
    Either the original df passed in or the forecasted df
    (depending on whether run_forecast is True).
out : DataContainer
    With parameter as 'extra'.
    Contains the key run_forecast, which matches the input parameter,
    and the key 'original' when run_forecast is True.
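The JSON string stored under the 'prophet' key can be turned back into a fitted model with Prophet's own serialization helper, model_from_json. The sketch below assumes the 'extra' field is available as a plain dict. The helper name load_prophet_model is hypothetical and not part of Flojoy.

```python
def load_prophet_model(extra: dict):
    """Rebuild a fitted Prophet model from a node's 'extra' field.

    Hypothetical helper for illustration; not part of Flojoy.
    """
    if "prophet" not in extra:
        raise KeyError("'extra' has no 'prophet' entry; was PROPHET_PREDICT run?")
    # Imported lazily so the key check above works even without prophet installed.
    from prophet.serialize import model_from_json

    return model_from_json(extra["prophet"])
```

Given the 'extra' dict from this node's output, load_prophet_model(extra) should return a model that can be passed to Prophet's predict and plotting functions without refitting.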
Python Code
from flojoy import DataFrame, flojoy, run_in_venv


@flojoy
@run_in_venv(
    pip_dependencies=[
        "prophet==1.1.5",
    ]
)
def PROPHET_PREDICT(
    default: DataFrame, run_forecast: bool = True, periods: int = 365
) -> DataFrame:
    """Run a Prophet time series prediction model on an incoming dataframe.

    The DataContainer input type must be a dataframe, and the first column (or index) of the dataframe must be of a datetime type.

    This node always returns a DataContainer of a dataframe type. It will also always return an 'extra' field with a key 'prophet' of which the value is the JSONified Prophet model.
    The model can be restored from this JSON with `model_from_json` from `prophet.serialize`.

    Parameters
    ----------
    run_forecast : bool
        If True (default case), the dataframe of the returning DataContainer
        ('m' parameter of the DataContainer) will be the forecasted dataframe.
        It will also have an 'extra' field with the key 'original', which is
        the original dataframe passed in.

        If False, the returning dataframe will be the original data.

        This node will also always have an 'extra' field, run_forecast, which
        matches the parameter passed in. This lets later nodes
        know whether a forecast has already been run.

        Default = True
    periods : int
        The number of periods to predict out. Only used if run_forecast is True.
        Default = 365

    Returns
    -------
    DataFrame
        With parameter as df.
        Either the original df passed in or the forecasted df
        (depending on whether run_forecast is True).
    DataContainer
        With parameter as 'extra'.
        Contains the key run_forecast, which matches the input parameter,
        and the key 'original' when run_forecast is True.
    """

    import os
    import sys

    import numpy as np
    import pandas as pd
    import prophet
    from prophet.serialize import model_to_json

    def _make_dummy_dataframe_for_prophet():
        """Generate random time series data to test if prophet works"""
        start_date = pd.Timestamp("2023-01-01")
        end_date = pd.Timestamp("2023-07-20")
        num_days = (end_date - start_date).days + 1
        timestamps = pd.date_range(start=start_date, end=end_date, freq="D")
        data = np.random.randn(num_days)  # Random data points
        # PROPHET model expects first column to be `ds` and second to be `y`
        return pd.DataFrame({"ds": timestamps, "y": data})

    def _apply_macos_prophet_hotfix():
        """This is a hotfix for MacOS. See https://github.com/facebook/prophet/issues/2250#issuecomment-1559516328 for more detail"""
        if sys.platform != "darwin":
            return

        # Test if prophet works (i.e. if the hotfix had already been applied)
        try:
            _dummy_df = _make_dummy_dataframe_for_prophet()
            prophet.Prophet().fit(_dummy_df)
        except RuntimeError:
            print("Could not run prophet, applying hotfix...")
        else:
            return

        prophet_dir = prophet.__path__[0]  # type: ignore
        # Get stan dir
        stan_dir = os.path.join(prophet_dir, "stan_model")
        # Find cmdstan-xxxxx dir
        cmdstan_basename = [x for x in os.listdir(stan_dir) if x.startswith("cmdstan")]
        assert len(cmdstan_basename) == 1, "Could not find cmdstan dir"
        cmdstan_basename = cmdstan_basename[0]
        # Run (from stan_dir) : install_name_tool -add_rpath @executable_path/<CMDSTAN_BASENAME>/stan/lib/stan_math/lib/tbb prophet_model.bin
        cmd = f"install_name_tool -add_rpath @executable_path/{cmdstan_basename}/stan/lib/stan_math/lib/tbb prophet_model.bin"
        cwd = os.getcwd()
        os.chdir(stan_dir)
        return_code = os.system(cmd)
        os.chdir(cwd)
        if return_code != 0:
            raise RuntimeError("Could not apply hotfix")

    _apply_macos_prophet_hotfix()

    df = default.m
    first_col = df.iloc[:, 0]
    if not pd.api.types.is_datetime64_any_dtype(first_col):
        raise ValueError(
            "First column must be of datetime type data for PROPHET_PREDICT!"
        )
    # PROPHET model expects first column to be `ds` and second to be `y`.
    # Note that this renames the columns of the incoming dataframe in place.
    df.rename(columns={df.columns[0]: "ds", df.columns[1]: "y"}, inplace=True)

    model = prophet.Prophet()
    model.fit(df)
    extra = {"prophet": model_to_json(model), "run_forecast": run_forecast}

    # If run_forecast, the return df is the forecast, otherwise the original
    return_df = df.copy()
    if run_forecast:
        future = model.make_future_dataframe(periods)
        forecast = model.predict(future)
        extra["original"] = df
        return_df = forecast

    return DataFrame(df=return_df, extra=extra)
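The node validates that the first column holds datetime values and then names the first two columns ds and y before fitting. The sketch below reproduces that preparation step with pandas alone, so an input dataframe can be checked before it reaches the node. The helper name prepare_for_prophet is hypothetical, and unlike the node it returns a renamed copy instead of renaming in place.

```python
import pandas as pd


def prepare_for_prophet(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with its first two columns named 'ds' and 'y'.

    Hypothetical helper mirroring the node's validation; it leaves the
    caller's dataframe untouched.
    """
    if df.shape[1] < 2:
        raise ValueError("Expected at least two columns: timestamps and values")
    if not pd.api.types.is_datetime64_any_dtype(df.iloc[:, 0]):
        raise ValueError("First column must be of datetime type")
    # rename without inplace=True returns a new dataframe
    return df.rename(columns={df.columns[0]: "ds", df.columns[1]: "y"})


raw = pd.DataFrame(
    {
        "when": pd.date_range("2023-01-01", periods=5, freq="D"),
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
ready = prepare_for_prophet(raw)
print(list(ready.columns))
print(list(raw.columns))
```

If the timestamps arrive as strings, converting the first column with pd.to_datetime beforehand should make the check pass.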
Example
In this example, the TIMESERIES node generates random time series data.
This dataframe is then passed to the PROPHET_PREDICT node with the default parameters run_forecast=True and periods=365. This node trains a Prophet model and runs a forecast over 365 periods.
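To make "365 periods" concrete, the sketch below builds the dates such a forecast would cover using pandas alone. It assumes daily data and the daily frequency that Prophet's make_future_dataframe uses by default. The date range is the one used by the dummy data in the node source, chosen here only for illustration.

```python
import pandas as pd

# Observed history: one point per day (illustrative range).
history = pd.date_range("2023-01-01", "2023-07-20", freq="D")
periods = 365

# Daily steps strictly after the last observed date (assumed daily frequency).
future_only = pd.date_range(history[-1], periods=periods + 1, freq="D")[1:]

# The forecast covers the history plus the future steps.
all_dates = history.append(future_only)

print(len(history), len(future_only), len(all_dates))
print(future_only[0].date(), future_only[-1].date())
```

With data sampled at another interval, the horizon would need a matching frequency, which this node does not expose as a parameter.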
It returns a DataContainer with the following fields:
    type: dataframe
    m: The forecasted dataframe
    extra:
        run_forecast: True (because that’s what was passed in)
        prophet: The trained Prophet model, serialized as JSON
        original: The dataframe passed into the node
Finally, this is passed to two nodes, PROPHET_PLOT and PROPHET_COMPONENTS, where the forecast and the trend components are plotted in Plotly. Because a forecast was already run, the PROPHET_PLOT and PROPHET_COMPONENTS nodes know to use the already predicted dataframe.
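The run_forecast flag is what makes this possible. As a rough sketch of the decision a downstream node could make (not the actual PROPHET_PLOT implementation), the hypothetical function below inspects a plain dict standing in for the 'extra' field.

```python
def needs_forecast(extra: dict) -> bool:
    """Return True when a downstream node must run the forecast itself.

    Hypothetical illustration; a plain dict stands in for the
    DataContainer's 'extra' field.
    """
    # When run_forecast is True, 'm' already holds the forecast and the
    # input data is available under extra['original'].
    return not extra.get("run_forecast", False)


print(needs_forecast({"run_forecast": True}))
print(needs_forecast({"run_forecast": False}))
```

When the function returns True, a node would have to rebuild the model from the JSON under the 'prophet' key and predict before plotting.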