You can use the idx2numpy package available on PyPI. It is easy to use and converts the data directly into NumPy arrays. Here is what you should do:
Data loading
Download the MNIST dataset from the official website.
If you are using Linux, you can use wget to get it from the command line. Just run:
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Data unpacking
Unzip the data. On Linux, you can use gunzip (or gzip -d).
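If you would rather unpack from Python (for example on a system without gzip on the command line), here is a minimal sketch using only the standard library. The helper name gunzip_file is my own, not part of any package, and it assumes the output should sit next to the archive with the .gz suffix dropped.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dst=None):
    """Decompress a .gz file and return the path of the unpacked file.

    gunzip_file is a hypothetical helper name. By default the output is
    written next to the archive, with the trailing .gz removed.
    """
    src = Path(src)
    # "train-images-idx3-ubyte.gz" -> "train-images-idx3-ubyte"
    dst = src.with_suffix("") if dst is None else Path(dst)
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Stream the data so the whole file is never held in memory at once.
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling gunzip_file("data/train-images-idx3-ubyte.gz") would then write data/train-images-idx3-ubyte, and the same call works for the other three archives.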
Ultimately, you should have the following files:
data/train-images-idx3-ubyte
data/train-labels-idx1-ubyte
data/t10k-images-idx3-ubyte
data/t10k-labels-idx1-ubyte
The data/ prefix is only there because I extracted them to a folder called data. Your question looks like you've done everything so far, so keep reading.
Using idx2numpy
Here is some simple Python code to read an unpacked file as an array (the other three files are read the same way).
import idx2numpy
import numpy as np
file = 'data/train-images-idx3-ubyte'
arr = idx2numpy.convert_from_file(file)
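If the loading step fails or the array looks wrong, it can help to check that the file really is an unpacked IDX file. Below is a rough sketch that reads only the IDX header with the standard library; it is not how idx2numpy is implemented, just a sanity check, and read_idx_header is a name I made up for illustration.

```python
import struct


def read_idx_header(path):
    """Return (type_code, dims) read from the header of an IDX file.

    read_idx_header is a hypothetical helper, not part of idx2numpy.
    """
    with open(path, "rb") as f:
        # Magic number: two zero bytes, a data-type code, the number of dimensions.
        zero1, zero2, type_code, ndim = struct.unpack(">BBBB", f.read(4))
        if zero1 != 0 or zero2 != 0:
            raise ValueError("not an IDX file (is it still gzipped?)")
        # Each dimension size is a 4-byte big-endian unsigned integer.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return type_code, dims
```

For data/train-images-idx3-ubyte this should give a type code of 8 (unsigned byte) and dimensions (60000, 28, 28), which should also match arr.shape from the code above.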
Now you can use the array with OpenCV just as you would display any other image, using something like the following (here cv is cv2 imported as cv; follow it with cv.waitKey(0) so the window stays open):
cv.imshow("Image", arr[4])
To install idx2numpy, you can use pip, which fetches the package from PyPI. Just run the command:
pip install idx2numpy