Migration notes

To petastorm 0.5.0

Petastorm 0.5.0 has some breaking changes from previous versions. These include:

  • Users should call make_reader() to create a new reader instance, instead of instantiating Reader directly.
  • It is still possible (although discouraged in most cases) to instantiate Reader directly. Some of its arguments have changed.

Use make_reader() to instantiate a reader instance

Use make_reader() to create a new instance of a reader. make_reader() takes arguments that are similar, but not identical, to the constructor arguments of Reader. The following list enumerates the differences:

  • reader_pool_type: takes one of the strings 'thread', 'process' or 'dummy' (instead of ThreadPool(), ProcessPool() and DummyPool() object instances). Pass the number of workers using the workers_count argument.
  • training_partition and num_training_partitions were renamed to cur_shard and shard_count, respectively.
  • shuffle and shuffle_options were replaced by the shuffle_row_groups=True and shuffle_row_drop_partitions=1 arguments.
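
The renames in the list above are mechanical, so existing keyword arguments can be translated with a small helper. The following is a minimal sketch, not part of petastorm: translate_reader_kwargs, RENAMED_ARGS and MANUAL_ARGS are hypothetical names, and reader_pool is assumed to be the name of the old pool argument. It only renames keys and refuses arguments that need a manual decision.

```python
# Hypothetical helper, not part of petastorm: renames pre-0.5.0 Reader keyword
# arguments to their make_reader() equivalents.

# Old Reader argument name -> make_reader() argument name.
RENAMED_ARGS = {
    'training_partition': 'cur_shard',
    'num_training_partitions': 'shard_count',
}

# Old arguments without a one-to-one replacement. 'reader_pool' is an assumed
# name for the old pool-object argument; adjust it to match your code.
MANUAL_ARGS = {'reader_pool', 'shuffle', 'shuffle_options'}


def translate_reader_kwargs(old_kwargs):
    """Return a new dict of keyword arguments suitable for make_reader().

    Raises ValueError if an argument cannot be renamed mechanically.
    """
    manual = sorted(MANUAL_ARGS.intersection(old_kwargs))
    if manual:
        raise ValueError('Migrate these arguments by hand: ' + ', '.join(manual))
    # Rename known keys; pass every other key through unchanged.
    return {RENAMED_ARGS.get(name, name): value
            for name, value in old_kwargs.items()}


new_kwargs = translate_reader_kwargs(
    {'training_partition': 1, 'num_training_partitions': 5})
print(new_kwargs)  # {'cur_shard': 1, 'shard_count': 5}
```

The helper raises an error for the pool and shuffle arguments rather than guessing, because their replacements (a pool type string plus workers_count, and the two shuffle_* arguments) depend on how the old objects were configured.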

Before 0.5.0:

from petastorm.reader import Reader
reader = Reader(dataset_url,
                training_partition=1, num_training_partitions=5)

With 0.5.0:

from petastorm import make_reader
reader = make_reader(dataset_url,
                     cur_shard=1, shard_count=5)