ETL Overview

Welcome to Flojoy’s ETL Blocks page. Here you can find all the information on how to handle ETL tasks using Flojoy.

EXTRACT

DATAFRAME

EXTRACT_COLUMNS: Take an input dataframe/matrix and return a dataframe/matrix with only the specified columns.

FILE

OPEN_IMAGE: Load an image file from disk and return a DataContainer of type 'image'.
OPEN_PARQUET: Load a local parquet file, then return the data as a dataframe.
READ_CSV: Read a .csv file from disk or a URL, and then return it as a dataframe.
READ_S3: Take an S3 key name, S3 bucket name, and file name as input, then extract the file from the specified bucket.
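The READ_CSV and EXTRACT_COLUMNS descriptions above can be illustrated with a minimal standard-library sketch. This is not Flojoy's implementation: the sample data, the `read_rows` and `extract_columns` helpers, and the list-of-dicts stand-in for a dataframe are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a .csv file on disk.
CSV_TEXT = """time,voltage,current
0.0,1.2,0.10
0.1,1.3,0.11
0.2,1.1,0.09
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows (one dict per record)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_columns(rows, columns):
    """Return new rows holding only the requested columns, in the given order."""
    return [{name: row[name] for name in columns} for row in rows]


rows = read_rows(CSV_TEXT)
subset = extract_columns(rows, ["time", "voltage"])
print(subset[0])
```

Note that `csv.DictReader` yields every value as a string; a real dataframe loader would normally infer numeric types as well.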

LOAD

CLOUD_DATABASE

FLOJOY_CLOUD_DOWNLOAD: Download a DataContainer from Flojoy Cloud (beta).
FLOJOY_CLOUD_UPLOAD: Upload a DataContainer to Flojoy Cloud (beta).

LOCAL_FILE_SYSTEM

BATCH_PROCESSOR: Glob-match a pattern in the given input directory, iterate (in a LOOP) over all of the files found, then return each file path as a TextBlob.
LOCAL_FILE: Load a local file from disk, infer the type, and convert it to a DataContainer class.
OPEN_MATLAB: The OPEN_MATLAB node loads a local file in the .mat file format.
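The pattern matching that BATCH_PROCESSOR describes can be sketched with `pathlib`, assuming the pattern is a glob-style pattern such as `*.csv`. The `matching_files` helper and the file names are hypothetical, and the loop integration with Flojoy's LOOP block is not shown.

```python
import tempfile
from pathlib import Path


def matching_files(directory, pattern):
    """Return the sorted paths in `directory` whose names match the glob `pattern`."""
    return sorted(Path(directory).glob(pattern))


# Build a throwaway directory with a few placeholder files to match against.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder")
    # Collect only the file names, since the directory is removed on exit.
    found = [path.name for path in matching_files(tmp, "*.csv")]

print(found)
```

Sorting the matches makes the iteration order deterministic, which is usually desirable when each file is processed in turn.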

REMOTE_FILE_SYSTEM

REMOTE_FILE: Load a remote file from an HTTP URL endpoint, infer the type, and convert it to a DataContainer class.

TRANSFORM

MATRIX_MANIPULATION

DOT_PRODUCT: Take two input matrices, multiply them (by dot product), and return the result.
INVERT: Invert a Matrix or OrderedPair.
MATMUL: Take two input matrices, multiply them, and return the result.
SHUFFLE_MATRIX: Return a matrix that is randomly shuffled by the first axis.
SORT_MATRIX: Take an input matrix and sort it along the chosen axis.
TRANSPOSE_MATRIX: Take an input 2D matrix and transpose it.
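To make the matrix operations above concrete, here is a minimal pure-Python sketch of transposition, matrix multiplication, and sorting along an axis. Matrices are represented as nested lists, and the helper names and the axis convention (0 sorts each column, 1 sorts each row) are assumptions for illustration rather than Flojoy's own API.

```python
def transpose(matrix):
    """Swap rows and columns of a 2D matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two 2D matrices; the columns of `a` must match the rows of `b`."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


def sort_matrix(matrix, axis=1):
    """Sort each row (axis=1) or each column (axis=0) independently."""
    if axis == 1:
        return [sorted(row) for row in matrix]
    return transpose([sorted(column) for column in transpose(matrix)])


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul(a, b)
flipped = transpose(a)
by_column = sort_matrix([[3, 1], [2, 4]], axis=0)
print(product, flipped, by_column)
```

For two 2D inputs, a dot product and a matrix multiplication give the same result, so the single `matmul` helper covers both descriptions in this sketch.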

ORDERED_PAIR_MANIPULATION

ORDERED_PAIR_XY_INVERT: Return an OrderedPair with the axes inverted.

TEXT_MANIPULATION

TEXT_CONCAT: Concatenate two strings given by two TextBlob DataContainers.

TYPE_CASTING

BOOLEAN_2_SCALAR: Take boolean data and convert it to the scalar data type.
DF_2_NP: Convert a DataFrame DataContainer to a Matrix DataContainer.
DF_2_ORDERED_TRIPLE: Convert a DataFrame DataContainer to an OrderedTriple DataContainer.
MATRIX_2_VECTOR: Convert a Matrix DataContainer to a Vector DataContainer.
MAT_2_DF: Convert a Matrix DataContainer to a DataFrame DataContainer.
NP_2_DF: Infer the type of an array-like DataContainer, then convert it to a DataFrame DataContainer.
ORDERED_PAIR_2_VECTOR: Return the split components (x, y) of an ordered pair as Vectors.
ORDERED_TRIPLE_2_SURFACE: Convert an OrderedTriple DataContainer to a Surface DataContainer.
VECTOR_2_MATRIX: Convert a Vector DataContainer to a Matrix DataContainer.
VECTOR_2_ORDERED_PAIR: Convert a Vector DataContainer to an OrderedPair DataContainer.
VECTOR_2_SCALAR: Take a vector and transform it into the scalar data type.
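Two of the conversions above can be sketched in plain Python. The `OrderedPair` dataclass here is a hypothetical stand-in, not Flojoy's DataContainer class, and flattening the matrix in row-major order is an assumption about how a matrix-to-vector conversion might behave.

```python
from dataclasses import dataclass


@dataclass
class OrderedPair:
    """Hypothetical stand-in for an ordered pair of x and y values."""
    x: list
    y: list


def ordered_pair_to_vectors(pair):
    """Split an ordered pair into separate x and y vectors (as new lists)."""
    return list(pair.x), list(pair.y)


def matrix_to_vector(matrix):
    """Flatten a 2D matrix into a 1D vector, row by row (assumed order)."""
    return [value for row in matrix for value in row]


pair = OrderedPair(x=[0, 1, 2], y=[10, 20, 30])
xs, ys = ordered_pair_to_vectors(pair)
flat = matrix_to_vector([[1, 2], [3, 4]])
print(xs, ys, flat)
```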

VECTOR_MANIPULATION

DECIMATE_VECTOR: The DECIMATE_VECTOR node returns the input vector by reducing the
INTERLEAVE_VECTOR: The INTERLEAVE_VECTOR node combines multiple vectors into a single vector by interleaving their elements.
REMOVE_DUPLICATES_VECTOR: The REMOVE_DUPLICATES_VECTOR node returns a vector with only unique elements.
REPLACE_SUBSET: The REPLACE_SUBSET node returns a new Vector with a subset of elements replaced.
REVERSE_VECTOR: The REVERSE_VECTOR node returns a vector equal to the input vector but reversed.
SHIFT_VECTOR: The SHIFT_VECTOR node shifts the elements in the vector by the amount specified
SHUFFLE_VECTOR: The SHUFFLE_VECTOR node returns a vector that is randomly shuffled.
SORT_VECTOR: The SORT_VECTOR node returns the input Vector that is sorted
SPLIT_VECTOR: The SPLIT_VECTOR node returns a vector that is split at a given index.
VECTOR_DELETE: The VECTOR_DELETE node returns a new Vector with elements deleted from the requested indices.
VECTOR_INDEXING: The VECTOR_INDEXING node returns the value of the vector at the requested index.
VECTOR_INSERT: The VECTOR_INSERT node inserts a value to the Vector at the
VECTOR_LENGTH: The VECTOR_LENGTH node returns the length of the input vector.
VECTOR_MAX: The VECTOR_MAX node returns the maximum value from the Vector.
VECTOR_MIN: The VECTOR_MIN node returns the minimum value from the Vector.
VECTOR_SUBSET: The VECTOR_SUBSET node returns the subset of values from the requested indices.
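Several of the vector operations listed above map onto short list manipulations. The sketch below uses plain Python lists and hypothetical helper names; in particular, treating the shift as circular (elements pushed off the end wrap around to the front) and keeping the first occurrence when removing duplicates are assumptions, not documented Flojoy behaviour.

```python
def interleave(*vectors):
    """Combine equal-length vectors by taking one element from each in turn."""
    return [value for group in zip(*vectors) for value in group]


def remove_duplicates(vector):
    """Keep only the first occurrence of each element, preserving order."""
    return list(dict.fromkeys(vector))


def shift(vector, amount):
    """Circularly shift elements to the right by `amount` positions (assumed)."""
    if not vector:
        return []
    k = amount % len(vector)
    return vector[-k:] + vector[:-k] if k else list(vector)


def split(vector, index):
    """Split a vector into the parts before and from the given index."""
    return vector[:index], vector[index:]


mixed = interleave([1, 3, 5], [2, 4, 6])
unique = remove_duplicates([3, 1, 3, 2, 1])
shifted = shift([1, 2, 3, 4], 1)
head, tail = split([1, 2, 3, 4], 1)
print(mixed, unique, shifted, head, tail)
```

Each helper returns a new list and leaves its input unchanged, which mirrors the "returns a new Vector" wording used in several of the descriptions.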