Comparing SerDe Modules in Python

SerDe (serialization and deserialization) is the process of converting an object into a byte stream and back again, so that the object can be reused in the same script or across different scripts and environments. Python frameworks such as PySpark use SerDe to convert DataFrame partitions into a serialized format that can be distributed across nodes.

Python serialization

Python's serialization libraries perform SerDe operations on Python objects, converting them into a serialized format. They can handle anything from simple data structures such as arrays, lists, and dictionaries to complex class objects and machine learning models. For simple use cases, text formats such as JSON or YAML are usually enough, but pickling is useful for saving a complex object or one that took a long time to build (such as an ML model).

Why Serialization

  • Object persistence: storing the object's state in a permanent storage mechanism such as a database or blob storage. The persisted file can be used in another Python script or transferred to another location via TCP. With pickle, the state of the object is preserved in a .pickle file.
  • Preserve state: serialization allows the developer to save the state of an object and recreate it as needed.
  • Remote object access: serialized objects can be sent to and used by another program in a remote location.
  • Distributed data processing: data partitions in serialized format can be spread across nodes so that the data is processed in parallel.

Unpickling

Unpickling is the process of loading a pickled file back into a Python object so that it can be reused in the same script or a different one. A common use case is pickling a machine learning model after training to avoid retraining on the same data set. The pickled model is later loaded and used for prediction, for example behind a REST API.
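
The pickle and unpickle round trip described above can be sketched as follows. This is a minimal illustration, not a real training pipeline: `model_state` is a hypothetical stand-in for a trained model, and the file name `model.pickle` is an assumption chosen for the example.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model, built from plain built-in types.
model_state = {
    "weights": [0.25, -1.5, 3.0],
    "labels": {"cat", "dog"},   # a set, which JSON cannot represent directly
    "trained_epochs": 10,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pickle"

    # Pickling: write the object's state to disk as a byte stream.
    with open(model_path, "wb") as f:
        pickle.dump(model_state, f)

    # Unpickling: rebuild an equal object, possibly in another script.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == model_state)
```

The restored object is a new object with the same contents, so a prediction service could load it at startup instead of retraining.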

A few Python Serializers

In Python, there are three commonly used serializers, all in the standard library.

  1. Pickle Serializer
  2. Marshal Serializer
  3. JSON Serializer

Marshal Module

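The marshal module is the most primitive of the three serializers, and it can convert only a limited set of built-in Python types. Python uses it internally to read and write compiled code in .pyc files, and its format is not guaranteed to stay the same between Python versions. In the broader sense of the term, marshaling converts both the state of an object and its codebase into a byte array or remote object format: serialization is one part of marshaling, and marshaling adds the codebase and data type information to the serialized object. A typical use case for marshaling in that broader sense is a remote method call from another network.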
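
As a rough sketch of what the marshal module accepts and rejects, the example below round-trips a dictionary and a compiled code object, then tries a type that marshal does not support. The payload contents and the `"<sketch>"` file name are made up for illustration.

```python
import datetime
import marshal

# Built-in types such as dict, list, str, int, and float are supported.
payload = {"name": "partition-0", "rows": [1, 2, 3], "ratio": 0.5}
restored_payload = marshal.loads(marshal.dumps(payload))

# Code objects are supported too, which is how .pyc files store bytecode.
code = compile("result = 6 * 7", "<sketch>", "exec")
restored_code = marshal.loads(marshal.dumps(code))
namespace = {}
exec(restored_code, namespace)   # namespace["result"] is now 42

# Most other objects, such as a date, are rejected with ValueError.
try:
    marshal.dumps(datetime.date(2020, 1, 1))
    marshal_error = None
except ValueError as exc:
    marshal_error = exc

print(restored_payload == payload, namespace["result"], type(marshal_error).__name__)
```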

Pickle Module

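The pickle module converts a Python object into a byte stream. Unlike marshal, pickle is designed for sharing and transferring objects, and it handles a far wider range of types, including instances of user-defined classes. Speed is not the main difference between the two: marshal is a narrow internal format, and how the two compare depends on the data and the pickle protocol used. Python's multiprocessing library uses pickling to pass objects between processes running in parallel. Keep in mind that pickled data is not human-readable and that unpickling can execute arbitrary code, so only load pickles from a trusted source.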
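
The trusted-source warning is easier to appreciate with a small demonstration. In the sketch below, `Sneaky` is a hypothetical class whose `__reduce__` method tells pickle to call a function at load time. The function here is the harmless built-in `sorted`, but a malicious pickle could reference something destructive instead.

```python
import pickle


class Sneaky:
    """Hypothetical class whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # (callable, args): pickle stores a reference to `sorted` plus the
        # arguments, and pickle.loads calls sorted([3, 1, 2]) while loading.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Sneaky())
loaded = pickle.loads(blob)   # the result of the call, not a Sneaky instance

print(type(loaded).__name__, loaded)
```

Nothing in `blob` looks suspicious to a casual reader, which is why the source of a pickle matters more than its contents.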

JSON Module

JSON is a text-based, human-readable format, which makes it a lightweight choice for data interchange. For simple data it is often fast enough, though how it compares with pickle depends on the data and the pickle protocol used. JSON serialization suits data without strict schema requirements, and it can serialize only a limited set of Python types by default: dictionaries, lists and tuples, strings, numbers, booleans, and None.
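
The type limitations mentioned above can be seen in a short sketch. The `record` data and the `encode_set` helper are hypothetical names introduced for this example; `default=` is the standard hook of `json.dumps` for types it does not know.

```python
import json

record = {"id": 7, "tags": ["a", "b"], "score": 0.5, "active": True, "note": None}

# Serialization produces readable text, not bytes.
text = json.dumps(record)
round_tripped = json.loads(text)

# A set is not a JSON type, so json.dumps raises TypeError.
try:
    json.dumps({"tags": {"a", "b"}})
    json_error = None
except TypeError as exc:
    json_error = exc


def encode_set(obj):
    """Hypothetical fallback: turn sets into sorted lists, reject the rest."""
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


text_with_set = json.dumps({"tags": {"b", "a"}}, default=encode_set)

print(text)
print(text_with_set)
```

The workaround is lossy: after loading, the set comes back as a list, so the receiving side has to know which fields to convert.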

Compression

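Compressing serialized output reduces its size, which speeds up transfer across networks and can shorten the time needed to write the file. The trade-off is the extra CPU time spent compressing and decompressing, so the net gain depends on the data and on whether I/O or CPU is the bottleneck.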
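
One way to combine compression with a serializer is to pass the serialized bytes through gzip, as in the sketch below. The `rows` data is made up and deliberately repetitive, so it compresses well; real data may shrink much less.

```python
import gzip
import pickle

# Hypothetical, highly repetitive records.
rows = [{"id": i, "status": "active"} for i in range(1000)]

raw = pickle.dumps(rows)          # serialize
compressed = gzip.compress(raw)   # then compress the byte stream

# Reverse the steps to get the data back: decompress, then unpickle.
restored_rows = pickle.loads(gzip.decompress(compressed))

print(len(raw), len(compressed))
```

The same pattern works with JSON by encoding the text to bytes before compressing it.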

Conclusion

Loading serialized data from untrusted sources is a security concern with both pickle and marshal. For that reason, JSON is generally preferred over pickle: it is fast enough, human-readable, reduces the security risk, and is supported by other programming languages. Compared with JSON, pickle is insecure, Python-specific, and can be slower for simple data. The main reason to use pickle is that it can serialize nearly arbitrary Python objects, whereas JSON and marshal both have limitations on the types of data they can serialize. I would like to hear other comments and concerns from the readers.
