Release notes
Release 0.4.3
New features and bug fixes
- A command line utility
petastorm-copy-dataset.py
. Makes it easier to create subsets (columns/rows) of existing petastorm datasets.
- Add option to use custom pyarrow filesystem when materializing datasets.
- Limit memory usage correctly when using
Reader
with ProcessPool
.
- Added
--pyarrow-serialize
switch to petastorm-throughput.py
benchmarking command line utility.
- Faster serialization (using
pyarrow.serialize
) in ProcessPool
implementation. Now decimal types are supported.
- More information in reader.diagnostics property.
- Check if a –unischema string passed to petastorm-generate-metadata is actually a
Unischema
instance.
- Fixed race condition in
ProcessPool
resulting in indefinite wait on ProcessPool
shutdown.
- Force loading
pyarrow
before torch
. Helps to avoid a segfault (documented in docs/toubleshoot.rst
)
- Fixed mnist training examples.
- Make dependency on opencv optional in
codecs.py
Release 0.4.2
New features and bug fixes
- Making
decimal.Decimal
to be decoded as decimal
, as in before 0.4.0.
- Adding a
benchmark
module with a petastorm-throughput.py
command line utility.
Release 0.4.0, 0.4.1
Breaking changes
- All
decimal.Decimal
fields are now decoded as strings
PredicateBase
moved from petastorm
package to petastorm.predicates
RowGroupSelectorBase
moved from petastorm
package to petastorm.selectors
New features and bug fixes
- Added
WeightedSamplingReader
: aggregates multiple Reader
output by sampling them with a specified probabilityWeightedSamplingReader see documentation.
- Add option for driver memory in regenerating metadata
- petastorm-generate-metadata command line tool renamed to petastorm-generate-metadata.py
- pytorch support (
petastorm.pytorch.DataLoader
class)
- pytorch and tensorflow mnist model training
- Added
CompressedNdarrayCodec
codec
- Support passing pyarrow filesystem as
Reader
construction argument
- Speedup serialization (use
pyarrow.serialize
) when ProcessPool is used.
- New, experimental, implementation of reader:
ReaderV2
.