Helper Functions

mdml_client.create_schema(d, title, descr, required_keys=None, add_time=False)

Creates a schema for use with a kafka_mdml_producer object. The schema is built from an example of the data object that the producer will send, so such an example must be supplied.

Parameters:
  • d (dict) – Data object to translate into a schema
  • title (str) – Title of the schema
  • descr (str) – Description of the schema
  • required_keys (list(str)) – Keys that are required in the schema
Returns:
  Schema dictionary compatible with kafka_mdml_producer
Return type:
  dict
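To illustrate what "translating a data object into a schema" involves, the following is a minimal sketch that infers a schema from the top-level keys and value types of an example dictionary. It assumes the schema follows JSON Schema conventions; the function name sketch_create_schema and the type mapping are hypothetical and are not mdml_client's actual implementation. The add_time parameter is not modelled because its behavior is not documented here.

```python
import json

# Hypothetical mapping from Python value types to JSON Schema type names.
_TYPE_NAMES = {
    bool: "boolean",
    int: "integer",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
}


def sketch_create_schema(d, title, descr, required_keys=None):
    """Illustrative only: infer a JSON Schema-style dict from an example object."""
    properties = {}
    for key, value in d.items():
        # Exact-type lookup, so True maps to "boolean" rather than "integer".
        type_name = _TYPE_NAMES.get(type(value))
        # Unknown types (e.g. None) get an empty schema, which accepts any value.
        properties[key] = {"type": type_name} if type_name else {}
    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": title,
        "description": descr,
        "type": "object",
        "properties": properties,
    }
    if required_keys:
        schema["required"] = list(required_keys)
    return schema


example = {"time": 1.5, "value": 3, "label": "run1"}
schema = sketch_create_schema(
    example, "Sensor", "Example sensor data", required_keys=["time", "value"]
)
print(json.dumps(schema, indent=2))
```

Only top-level keys are inspected here; a fuller version would recurse into nested dictionaries and lists.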

mdml_client.chunk_file(fn, chunk_size, use_b64=True, encoding='utf-8', file_id=None)

Chunks a file into parts. Yields dictionaries containing the file contents, which are encoded in base64 by default. Base64 is used because the Kafka producer requires a string, and some files must be opened in binary mode.

Parameters:
  • fn (str) – Path to the file
  • chunk_size (int) – Size of each chunk
  • use_b64 (bool) – True to return the file bytes as a base64 encoded string
  • encoding (str) – Encoding used to open the file when use_b64 is False
  • file_id (str) – File ID to use when chunking if the fn parameter is not suitable as an identifier
Yields:
  • Dictionary containing a chunk of data and the metadata required to piece all of the chunks back together.
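The chunk-and-reassemble round trip can be sketched as follows. This is a minimal illustration assuming a simple metadata layout; the function names sketch_chunk_file and sketch_reassemble, and the dictionary keys "file_id", "part", "last" and "data", are hypothetical and do not necessarily match the dictionaries that mdml_client.chunk_file actually yields.

```python
import base64
import os
import tempfile


def sketch_chunk_file(fn, chunk_size, use_b64=True, encoding="utf-8", file_id=None):
    """Illustrative generator: yields one dict per chunk plus reassembly metadata.
    Key names are hypothetical, not mdml_client's actual format."""
    if file_id is None:
        file_id = os.path.basename(fn)  # fall back to the file name as the ID
    # Binary mode when base64-encoding, text mode (with encoding) otherwise.
    mode = "rb" if use_b64 else "r"
    kwargs = {} if use_b64 else {"encoding": encoding}
    with open(fn, mode, **kwargs) as f:
        part = 0
        chunk = f.read(chunk_size)
        while chunk:
            # Read one chunk ahead so the final chunk can be flagged.
            next_chunk = f.read(chunk_size)
            data = base64.b64encode(chunk).decode("ascii") if use_b64 else chunk
            yield {"file_id": file_id, "part": part, "last": not next_chunk, "data": data}
            part += 1
            chunk = next_chunk


def sketch_reassemble(chunks, use_b64=True):
    """Order chunks by part number and join their decoded contents."""
    ordered = sorted(chunks, key=lambda c: c["part"])
    if use_b64:
        return b"".join(base64.b64decode(c["data"]) for c in ordered)
    return "".join(c["data"] for c in ordered)


# Usage: write a 19-byte temporary file, chunk it, and rebuild it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"hello mdml chunking")
    path = tmp.name

chunks = list(sketch_chunk_file(path, chunk_size=8))
restored = sketch_reassemble(chunks)
os.remove(path)
print(len(chunks), restored)
```

Because each dictionary carries a part number, a consumer can reassemble the file even if the chunks arrive out of order. Note that this sketch yields nothing for an empty file.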