Corteza Envoy
Corteza Envoy is a data transportation framework that we use when working with complex data that has many dependencies. The goal is to keep data-related operations as simple as possible.
Envoy consists of a series of layers, which keeps it flexible while leaving the basic implementation fairly simple.
@startuml
' ---------------
' Decoder layer
' ---------------
package "Decoder layer" as dc {
  [yaml] as dcyml
  [csv] as dccsv
  [store] as dcstore
  note as ndc
    A decoder takes in a set of sources and returns a set of resource and template nodes.
  end note
}
interface "Decoded data source" as dcDataSrc
dcyml -down-> dcDataSrc
dccsv -down-> dcDataSrc
dcstore -down-> dcDataSrc

' ---------------
' Data shaping
' ---------------
package "Data shaping layer" as shaperLayer {
  [Shaper] as shaper
  note as nshaperLayer
    The data shaping layer takes in raw datasets and resource template and produces actual resources.
  end note
}
interface "Envoy resource set" as eNodeSet
dcDataSrc -down-> shaper
shaper -down-> eNodeSet

' ---------------
' Graph builder
' ---------------
package "Graph builder" as g {
  [Graph builder] as gbld
  note as ng
    The graph builder performs additional data transformations and pre-processing in order to construct a final dependency graph.
  end note
}
interface "Dependency graph" as depG
eNodeSet -down-> gbld
gbld -down-> depG

' ---------------
' Encoder layer
' ---------------
package "Encoder" as enc {
  [yaml] as encyml
  [csv] as enccsv
  [store] as encstore
  note as nenc
    The encoder layer takes in a dependency graph and encodes the data to the specified destination.
  end note
}
depG -down-> encyml
depG -down-> enccsv
depG -down-> encstore
@enduml
Resources
A resource is a generic intermediate structure that represents the provided source data, together with additional context that enables further processing (such as dependency management via the dependency graph).
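To make the idea of "source data plus context" more concrete, here is a minimal Go sketch of what such a structure could look like. The `Resource` and `Ref` types, their fields, and the `DependsOn` helper are all hypothetical names chosen for illustration; they are not Envoy's actual types.

```go
package main

import "fmt"

// Ref points at another resource by type and identifier.
// Hypothetical type, used only for this illustration.
type Ref struct {
	ResourceType string
	Identifier   string
}

// Resource is a hypothetical intermediate structure: the decoded payload
// plus the context (type, identifier, references) needed for further
// processing such as dependency management.
type Resource struct {
	ResourceType string
	Identifier   string
	Payload      map[string]string
	Refs         []Ref
}

// DependsOn reports whether r holds a reference to other.
func (r Resource) DependsOn(other Resource) bool {
	for _, ref := range r.Refs {
		if ref.ResourceType == other.ResourceType && ref.Identifier == other.Identifier {
			return true
		}
	}
	return false
}

func main() {
	ns := Resource{ResourceType: "namespace", Identifier: "crm"}
	mod := Resource{
		ResourceType: "module",
		Identifier:   "account",
		Payload:      map[string]string{"name": "Account"},
		Refs:         []Ref{{ResourceType: "namespace", Identifier: "crm"}},
	}

	fmt.Println(mod.DependsOn(ns)) // true
	fmt.Println(ns.DependsOn(mod)) // false
}
```

Keeping the references next to the payload is what would let a later layer work out processing order without re-reading the original source.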
Data shaping
Data shaping lets us define templates that describe how an input source should be shaped (processed) before it is encoded.
Data shaping is useful for unstructured sources (such as .csv) where we can’t automatically extract contextual information (such as what module and namespace to use).
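As a rough illustration of shaping an unstructured source, the sketch below applies a template to CSV rows so that each row gains the module and namespace it cannot express by itself. `Template`, `Record`, and `Shape` are hypothetical names for this example and should not be read as Envoy's API; only the standard library `encoding/csv` package is real.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Template carries the context a CSV file cannot provide by itself.
// Hypothetical type, used only for this illustration.
type Template struct {
	Namespace string
	Module    string
}

// Record is a hypothetical shaped resource: row values plus template context.
type Record struct {
	Namespace string
	Module    string
	Values    map[string]string
}

// Shape reads CSV text (first row is the header) and applies tpl to every
// data row. csv.Reader rejects rows whose field count differs from the
// header by default, so indexing row[i] below is safe.
func Shape(src string, tpl Template) ([]Record, error) {
	rows, err := csv.NewReader(strings.NewReader(src)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, nil
	}

	header := rows[0]
	out := make([]Record, 0, len(rows)-1)
	for _, row := range rows[1:] {
		vals := make(map[string]string, len(header))
		for i, col := range header {
			vals[col] = row[i]
		}
		out = append(out, Record{Namespace: tpl.Namespace, Module: tpl.Module, Values: vals})
	}
	return out, nil
}

func main() {
	src := "name,email\nAda,ada@example.com\n"
	recs, err := Shape(src, Template{Namespace: "crm", Module: "contact"})
	if err != nil {
		panic(err)
	}
	for _, r := range recs {
		fmt.Println(r.Namespace, r.Module, r.Values["name"])
	}
}
```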
Source decoder
The source decoding layer takes in different data formats (such as JSON and YAML) and outputs a set of resource structures.
The source decoder is usually the first layer, but we can omit it if we construct the resource structures manually.
Source encoder
The resource encoding layer takes in resource structures and encodes them to different destinations (such as the store layer).
Different encoders (YAML, JSON, and CSV) are on their way.
Dependency graph
The dependency graph is the heart of the system, as it lets us determine the order in which the resources should be processed.
It is the result of the graph building flow, which also lets you pre-process the raw data.
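Determining a processing order from a dependency graph is commonly done with a topological sort. The sketch below uses a depth-first variant over a simple adjacency map; the `Graph` type, the `Order` function, and the identifier format are hypothetical, and rejecting cycles outright is a simplification of this example rather than a statement about how Envoy treats them.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Graph maps a resource identifier to the identifiers it depends on.
// Hypothetical representation, used only for this illustration.
type Graph map[string][]string

// Order returns the identifiers so that every dependency comes before the
// resources that depend on it (depth-first topological sort). It returns an
// error for cycles and for dependencies missing from the graph.
func Order(g Graph) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int, len(g))
	out := make([]string, 0, len(g))

	var visit func(id string) error
	visit = func(id string) error {
		switch state[id] {
		case done:
			return nil
		case visiting:
			return errors.New("dependency cycle at " + id)
		}
		deps, ok := g[id]
		if !ok {
			return errors.New("unknown resource " + id)
		}
		state[id] = visiting
		for _, dep := range deps {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[id] = done
		out = append(out, id)
		return nil
	}

	// Sort the keys so the result is deterministic despite map iteration order.
	ids := make([]string, 0, len(g))
	for id := range g {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	for _, id := range ids {
		if err := visit(id); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	g := Graph{
		"record:1":       {"module:account"},
		"module:account": {"namespace:crm"},
		"namespace:crm":  nil,
	}
	order, err := Order(g)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [namespace:crm module:account record:1]
}
```

With this ordering, an encoder can process the namespace first, then the module that belongs to it, and only then the record that belongs to the module.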