MDML S3 Client

This client is used for “coat-checking” large files: rather than pushing a large file through Kafka itself, the file is uploaded to an S3 location and a small message describing where to retrieve it is sent on a Kafka topic.

class mdml_client.kafka_mdml_s3_client(topic, s3_endpoint=None, s3_access_key=None, s3_secret_key=None, kafka_host='merf.egs.anl.gov', kafka_port=9092, schema_host='merf.egs.anl.gov', schema_port=8081, schema=None)

Creates an MDML producer for sending files larger than 1 MB to an S3 location. Simultaneously, the MDML sends the upload information along a Kafka topic so that a consuming client can retrieve the file.

Parameters:
  • topic (str) – Kafka topic to send upload messages under
  • s3_endpoint (str) – Host of the S3 service
  • s3_access_key (str) – S3 access key
  • s3_secret_key (str) – S3 secret key
  • kafka_host (str) – Host name of the kafka broker
  • kafka_port (int) – Port used for the kafka broker
  • schema_host (str) – Host name of the kafka schema registry
  • schema_port (int) – Port of the kafka schema registry
  • schema (dict or str) – Schema of the messages sent on the supplied topic. The default schema sends a dictionary containing the time of upload and the location for retrieval. If a dict is given, it is used as the schema directly. If a string is given, it is treated as a file path to a JSON file containing the schema.
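
As a sketch, a client pointed at the default MERF Kafka broker and schema registry; the S3 endpoint, credentials, and topic name below are placeholders, not real values:

    import mdml_client

    # S3 endpoint, credentials, and topic are illustrative placeholders.
    client = mdml_client.kafka_mdml_s3_client(
        topic="mdml-example-s3-topic",
        s3_endpoint="https://s3.example.org",
        s3_access_key="EXAMPLE_ACCESS_KEY",
        s3_secret_key="EXAMPLE_SECRET_KEY",
        kafka_host="merf.egs.anl.gov",
        kafka_port=9092,
        schema_host="merf.egs.anl.gov",
        schema_port=8081,
    )

Leaving schema unset keeps the default, which sends the time of upload and the retrieval location for each file.
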
consume(bucket, object_name, save_filepath)

Retrieves a file from an S3 bucket. Depending on save_filepath, it either returns the bytes of the file or saves the file to the specified path.

Parameters:
  • bucket (str) – Name of the bucket the object is saved in
  • object_name (str) – Name/key of the object to retrieve from the bucket
  • save_filepath (str) – Path at which to save the downloaded file. Passing None returns the bytes of the file instead of saving it to disk
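
For example, both retrieval modes, assuming a client constructed as above (the bucket and object names are hypothetical):

    # Save the object to a local file.
    client.consume(
        bucket="mdml-example-bucket",
        object_name="experiment_042/scan.tiff",
        save_filepath="/tmp/scan.tiff",
    )

    # Or fetch the raw bytes by passing save_filepath=None.
    data = client.consume(
        bucket="mdml-example-bucket",
        object_name="experiment_042/scan.tiff",
        save_filepath=None,
    )
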
produce(filepath, obj_name, payload=None)

Uploads the supplied file to the S3 endpoint and produces a corresponding message on the Kafka topic.

Parameters:
  • filepath (str) – Path of the file to upload to the S3 bucket
  • obj_name (str) – Name to store the file under
  • payload (dict) – Payload for the message sent on the Kafka topic. Only used when the default schema has been overridden.
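
A sketch of an upload under the default schema, using a placeholder file path and object name; payload is omitted because it only applies when a custom schema was passed to the constructor:

    # Upload the file to S3; a message with the upload time and
    # retrieval location is sent on the Kafka topic (default schema).
    client.produce(
        filepath="/data/experiment_042/scan.tiff",
        obj_name="experiment_042/scan.tiff",
    )
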