


Data Sources

Data can be user-generated (e.g., user input, clicks) or system-generated (e.g., logs, model predictions).



Data Format

Different data formats exist:
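
As a small illustration (not an exhaustive survey), the sketch below serializes the same hypothetical records with three formats from the Python standard library. JSON and CSV are text formats, while pickle is a Python-specific binary format. It also shows that CSV keeps no type information: values come back as strings.

```python
import csv
import io
import json
import pickle

# Hypothetical toy records, for illustration only.
records = [
    {"user_id": 1, "country": "FR", "clicks": 12},
    {"user_id": 2, "country": "US", "clicks": 7},
]

# JSON: text, human-readable; field names are repeated in every record.
json_text = json.dumps(records)

# CSV: text, one line per record; the header is written once.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user_id", "country", "clicks"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Pickle: Python-specific binary format.
pickle_bytes = pickle.dumps(records)

for name, payload in [("json", json_text), ("csv", csv_text), ("pickle", pickle_bytes)]:
    print(name, type(payload).__name__, len(payload))

# Round trips: JSON and pickle restore the original types; CSV returns strings.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0])
```

Pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.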


Row-major vs Column-major

Here is a representation of row-major vs column-major data layouts:
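
A minimal NumPy sketch of the two layouts, assuming a small 2x3 matrix: `ravel(order="C")` flattens it row by row (row-major) and `ravel(order="F")` column by column (column-major). The helpers `row_major_offset` and `col_major_offset` are illustrative names that compute where element (i, j) lands in each flat buffer.

```python
import numpy as np

# A 2x3 matrix holding the values 0..5:
# [[0 1 2]
#  [3 4 5]]
m = np.arange(6).reshape(2, 3)

# Row-major ("C" order): elements of the same row are adjacent.
row_major = m.ravel(order="C")  # [0 1 2 3 4 5]
# Column-major ("F" order): elements of the same column are adjacent.
col_major = m.ravel(order="F")  # [0 3 1 4 2 5]

def row_major_offset(i, j, n_cols):
    """Position of element (i, j) in a row-major flat buffer."""
    return i * n_cols + j

def col_major_offset(i, j, n_rows):
    """Position of element (i, j) in a column-major flat buffer."""
    return j * n_rows + i

# Check both formulas against the flattened buffers.
n_rows, n_cols = m.shape
for i in range(n_rows):
    for j in range(n_cols):
        assert row_major[row_major_offset(i, j, n_cols)] == m[i, j]
        assert col_major[col_major_offset(i, j, n_rows)] == m[i, j]
```

In a row-major layout, reading a whole row touches contiguous memory, which suits workloads that write or read whole records (CSV is row-major). In a column-major layout, reading one column is contiguous, which suits analytical queries over a few columns (Parquet is column-major).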


Example of DataFrame and np.array

Here is an example of the difference in access time:
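
A rough timing sketch, assuming NumPy and pandas are installed. The array size is arbitrary, and absolute timings vary by machine and library version, so none are shown. A square array keeps the number of Python-level iterations equal in every loop. Rows of a default NumPy array are contiguous (C order), while `df.iloc[i]` has to assemble a new Series for each row, which typically makes row-wise iteration over a DataFrame much slower than column-wise iteration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical size, small enough to run quickly; square so every loop
# performs the same number of iterations.
n = 500
arr = np.random.default_rng(0).random((n, n))  # C order (row-major) by default
df = pd.DataFrame(arr)

def df_by_column():
    # Each df[col] returns that column as a Series.
    return sum(df[col].sum() for col in df.columns)

def df_by_row():
    # Each df.iloc[i] builds a new Series from one row.
    return sum(df.iloc[i].sum() for i in range(len(df)))

def arr_by_row():
    # Each arr[i, :] is a contiguous slice of memory.
    return sum(arr[i, :].sum() for i in range(arr.shape[0]))

def arr_by_column():
    # Each arr[:, j] is a strided view: its elements are n floats apart.
    return sum(arr[:, j].sum() for j in range(arr.shape[1]))

for fn in (df_by_column, df_by_row, arr_by_row, arr_by_column):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:>14}: {seconds:.4f} s")
```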


Data model: Relational vs Non Relational

The relational data model can be summarized as ‘data stored in tables’:
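
A minimal sketch with Python's built-in `sqlite3` module, using a hypothetical `books` table: each row is a record (a tuple), and each column holds one attribute with a declared type.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, format TEXT, price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("Book A", "Author X", "Paperback", 20.0),
        ("Book A", "Author X", "Ebook", 10.0),
        ("Book B", "Author Y", "Paperback", 15.0),
    ],
)

# Each result row is a tuple with one value per selected column.
rows = conn.execute("SELECT title, format, price FROM books ORDER BY price").fetchall()
for row in rows:
    print(row)
```

In the relational model the rows of a table have no inherent order, which is why the query asks for `ORDER BY` explicitly.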



Relational model: normalization

Normalization replaces repeated values with keys (indices) that point into a separate table where the key-to-value mapping is stored, which reduces redundancy:
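
A sketch of normalization with `sqlite3`, assuming hypothetical `books` and `publishers` tables. Instead of repeating the publisher's name and country on every book row, each book stores a `publisher_id`, and the id-to-publisher mapping lives in its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized design: publishers get their own table; books refer to them by id.
conn.executescript(
    """
    CREATE TABLE publishers (
        publisher_id INTEGER PRIMARY KEY,
        name TEXT,
        country TEXT
    );
    CREATE TABLE books (
        title TEXT,
        format TEXT,
        publisher_id INTEGER REFERENCES publishers(publisher_id)
    );
    INSERT INTO publishers VALUES (1, 'Publisher P', 'US'), (2, 'Publisher Q', 'UK');
    INSERT INTO books VALUES
        ('Book A', 'Paperback', 1),
        ('Book A', 'Ebook', 1),
        ('Book B', 'Paperback', 2);
    """
)

# Changing a publisher's details is a single-row update...
conn.execute("UPDATE publishers SET country = 'CA' WHERE publisher_id = 1")

# ...and a JOIN rebuilds the combined view when it is needed.
joined = conn.execute(
    """
    SELECT b.title, b.format, p.name, p.country
    FROM books AS b JOIN publishers AS p ON b.publisher_id = p.publisher_id
    ORDER BY b.title, b.format
    """
).fetchall()
for row in joined:
    print(row)
```

The trade-off: updates touch one row and cannot leave duplicated copies inconsistent, but reading the combined view requires a join, which can be costly on large tables.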



Relational model and SQL
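
SQL is the standard query language of relational databases, and it is declarative: a query states what result is wanted, and the database's query optimizer chooses how to compute it. The sketch below, on a hypothetical `events` table, contrasts a `GROUP BY` query with the equivalent step-by-step Python loop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical click log: (user_id, page) pairs.
events = [("u1", "home"), ("u2", "home"), ("u1", "search"), ("u1", "home")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, page TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)

# Declarative: describe the result; the engine picks the execution plan.
declarative = conn.execute(
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()

# Imperative: spell out every step of the computation by hand.
counts = defaultdict(int)
for user_id, _page in events:
    counts[user_id] += 1
imperative = sorted(counts.items())

print(declarative)
print(imperative)
```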


NoSQL: Not Only SQL

JSON is a typical format for NoSQL document databases, which store data as self-contained documents.
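
A minimal sketch of the document model using Python's `json` module, with hypothetical documents. Each document is self-contained (nested objects and lists included), and two documents in the same collection need not share fields. Because nothing enforces a schema when the data is written, the reading code has to handle missing fields itself.

```python
import json

# Hypothetical documents: self-contained, and not required to share the same fields.
raw = """
[
  {"id": 1, "title": "Book A", "formats": ["Paperback", "Ebook"],
   "publisher": {"name": "Publisher P", "country": "US"}},
  {"id": 2, "title": "Book B", "rating": 4.5}
]
"""
documents = json.loads(raw)

# No schema was enforced on write, so the reader supplies defaults for missing fields.
for doc in documents:
    formats = doc.get("formats", [])
    publisher = doc.get("publisher", {}).get("name", "unknown")
    print(doc["id"], doc["title"], formats, publisher)
```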


Structured vs Unstructured
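
A hedged illustration of the distinction. Structured data follows a predefined schema, here a hypothetical `Purchase` dataclass that checks field types when a record is created. Unstructured data such as free text has no schema, so structure must be extracted when it is read, here with a regular expression that only handles this one sentence pattern.

```python
import re
from dataclasses import dataclass

@dataclass
class Purchase:
    """Hypothetical schema: every structured record must have these typed fields."""
    user_id: int
    item: str
    price: float

    def __post_init__(self):
        # Validate on write: reject records that do not match the schema.
        if not isinstance(self.user_id, int) or not isinstance(self.item, str):
            raise TypeError("record does not match the Purchase schema")
        if not isinstance(self.price, (int, float)):
            raise TypeError("price must be a number")

structured = Purchase(user_id=42, item="book", price=12.5)

# Unstructured: free text with no schema; structure has to be extracted on read.
unstructured = "User 42 bought a book yesterday for $12.50, loved it!"
match = re.search(r"User (\d+) bought an? (\w+).*?\$(\d+(?:\.\d+)?)", unstructured)
extracted = (
    Purchase(int(match.group(1)), match.group(2), float(match.group(3)))
    if match
    else None
)
print(structured)
print(extracted)
```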



Data Storage Engines and Processing

ETL
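
ETL stands for Extract, Transform, Load. A minimal sketch of the three stages, assuming a hypothetical source of JSON lines and an in-memory SQLite table as the destination; the function names `extract`, `transform` and `load` are illustrative, not a library API.

```python
import json
import sqlite3

# Hypothetical raw source: one JSON record per line, with inconsistent values.
RAW = """\
{"user": "u1", "amount": "12.50", "country": "fr"}
{"user": "u2", "amount": "bad", "country": "US"}
{"user": "u3", "amount": "7", "country": "us"}
"""

def extract(raw):
    # Extract: read records from the source.
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

def transform(records):
    # Transform: cast types, normalize values, drop records that fail validation.
    clean = []
    for r in records:
        try:
            amount = float(r["amount"])
        except (KeyError, ValueError):
            continue  # skip malformed rows (a real pipeline might log or quarantine them)
        clean.append((r["user"], amount, r["country"].upper()))
    return clean

def load(rows, conn):
    # Load: write the cleaned rows into the destination table.
    conn.execute("CREATE TABLE IF NOT EXISTS purchases (user TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
totals = conn.execute(
    "SELECT country, SUM(amount) FROM purchases GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```

Loading the raw data first and transforming it inside the destination store is known as ELT.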

