How can I asynchronously load data from large files into Qt?

I am using Qt 5.2.1 to implement a program that reads data from a file (anywhere from a few bytes to several GB) and visualizes it in a way that depends on every byte. My working example is a hex viewer.

One object reads the file and emits a dataRead() signal whenever it has read a new block of data. The signal carries a pointer to a QByteArray:

filereader.cpp

    void FileReader::startReading()
    {
        /* Object state code here... */
        {
            QFile inFile(fileName);
            if (!inFile.open(QIODevice::ReadOnly)) {
                changeState(STARTED, State(ERROR, QString()));
                return;
            }
            while (!inFile.atEnd()) {
                QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
                qDebug() << "emitting dataRead()";
                emit dataRead(qa);
            }
        }
        /* Emit EOF signal */
    }

The viewer has a loadData slot connected to this signal, and this is a function that displays data:

hexviewer.cpp

    void HexViewer::loadData(QByteArray *data)
    {
        QString hexString = data->toHex();
        for (int i = 0; i < hexString.length(); i += 2) {
            _ui->hexTextView->insertPlainText(hexString.at(i));
            _ui->hexTextView->insertPlainText(hexString.at(i+1));
            _ui->hexTextView->insertPlainText(" ");
        }
        delete data;
    }

The first problem: as it stands, the GUI thread becomes completely unresponsive. All dataRead() signals are emitted before the GUI ever gets a chance to redraw.

(The full code compiles and runs; with any file larger than about 1 KB you will see this behavior.)

The answer to my earlier question Non-blocking local file IO in Qt5 and the answer to another question How to do async file IO in Qt? both boil down to: use threads. But neither answer gives any detail on how to move the data itself across threads, or how to avoid common mistakes and pitfalls.

If the data were small (around 100 bytes), I would just emit it with the signal. But when the file is gigabytes in size, or sits on a network file system (NFS, a Samba share, etc.), I don't want the user interface to block just because reading a block of the file takes a while.

The second problem is that the mechanics of new in the emitter and delete in the receiver seem a bit naive: I am effectively using the heap as a cross-thread queue.

Question 1: Does Qt have a better / more idiomatic way to move data across threads while limiting memory usage? Does it have a thread-safe queue or other structures that would simplify all this?

Question 2: Do I have to implement the threading myself? I'm not a big fan of reinventing the wheel, especially with regard to memory management and locking. Are there higher-level constructions that already do this, for example ones meant for network transport?
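
For reference, here is a minimal sketch of the kind of bounded, thread-safe queue Question 1 asks about. Qt ships no public container exactly like this (QMutex, QWaitCondition and QSemaphore are the primitives you would build one from), so this Qt-free version uses only the standard library; the class and method names are mine, not from any Qt API:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// A bounded blocking queue: push() blocks when the queue is full, so a
// fast producer (the file reader) cannot flood a slow consumer (the GUI)
// with an unbounded number of heap-allocated blocks.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(value));
        notEmpty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
};
```

With a capacity of, say, 4 blocks of DATA_SIZE bytes each, memory usage stays bounded no matter how much faster the reader is than the viewer.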

+7
4 answers

First of all, you do not actually have any multithreading in your application. Your FileReader class is a subclass of QThread, but that does not mean all FileReader methods execute in another thread. In fact, all of your operations are performed in the main (GUI) thread.

FileReader should be a subclass of QObject, not QThread. You then create a plain QThread object and move your worker (the reader) to it with QObject::moveToThread. You can read about this approach here.

Make sure you register the type FileReader::State with qRegisterMetaType. This is necessary for queued signal-slot connections across threads to work.

Example:

    HexViewer::HexViewer(QWidget *parent) :
        QMainWindow(parent),
        _ui(new Ui::HexViewer),
        _fileReader(new FileReader())
    {
        qRegisterMetaType<FileReader::State>("FileReader::State");

        QThread *readerThread = new QThread(this);
        readerThread->setObjectName("ReaderThread");

        // Delete the reader once its thread finishes.
        connect(readerThread, SIGNAL(finished()),
                _fileReader, SLOT(deleteLater()));

        _fileReader->moveToThread(readerThread);
        readerThread->start();

        _ui->setupUi(this);
        ...
    }

    void HexViewer::on_quitButton_clicked()
    {
        _fileReader->thread()->quit();
        _fileReader->thread()->wait();
        qApp->quit();
    }

There is also no need to allocate the data on the heap:

    while (!inFile.atEnd()) {
        QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
        qDebug() << "emitting dataRead()";
        emit dataRead(qa);
    }

QByteArray uses implicit sharing. That means its contents are not copied over and over when you pass a QByteArray object by value between functions that only read it.

Change the code above and forget about manual memory management:

    while (!inFile.atEnd()) {
        QByteArray qa = inFile.read(DATA_SIZE);
        qDebug() << "emitting dataRead()";
        emit dataRead(qa);
    }

But in any case, multithreading is not the main problem. The problem is that QTextEdit::insertPlainText is not cheap, especially when you have a huge amount of data. FileReader reads the file data very quickly and then floods your widget with new chunks to display.

It should also be noted that your implementation of HexViewer::loadData is very inefficient. You insert the text data character by character, which forces QTextEdit to constantly redraw its contents and freezes the GUI.

You should first prepare the whole resulting hex string (note that the data parameter is no longer a pointer):

    void HexViewer::loadData(QByteArray data)
    {
        QString tmp = data.toHex();
        QString hexString;
        hexString.reserve(tmp.size() * 1.5);

        const int hexLen = 2;
        for (int i = 0; i < tmp.size(); i += hexLen) {
            hexString.append(tmp.mid(i, hexLen) + " ");
        }

        _ui->hexTextView->insertPlainText(hexString);
    }
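
The formatting step itself is plain string work and can be checked independently of any widget. A Qt-free sketch of the same batched transformation (the function name is mine), building the output in one pass with the final size reserved up front:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build the "aa bb cc " spaced-hex representation in a single pass,
// reserving the final size up front: 3 output chars per input byte.
std::string toSpacedHex(const std::vector<unsigned char> &data)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(data.size() * 3);
    for (std::size_t i = 0; i < data.size(); ++i) {
        out += digits[data[i] >> 4];    // high nibble
        out += digits[data[i] & 0x0F];  // low nibble
        out += ' ';
    }
    return out;
}
```

The point is the same as in the Qt version above: one string append per block, one insertPlainText call per block, instead of three widget calls per byte.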

In any case, the bottleneck of your application is not file reading but QTextEdit. Loading the data in chunks and then appending them to the widget with QTextEdit::insertPlainText will not speed anything up. For files smaller than 1 MB it is faster to read the entire file in one go and then set the resulting text on the widget in a single step.

I suspect you cannot easily display huge texts (more than a few megabytes) with the default Qt widgets. That task calls for some non-trivial work which, in general, has nothing to do with multithreading or asynchronous data loading. It is all about creating a custom widget that does not try to render its entire huge content at once.
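
The core trick such a widget needs, rendering only the slice that is on screen, is just offset arithmetic. A hypothetical helper sketching the mapping (all names mine, not from any Qt API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Map a scroll position to the half-open byte range [first, last) that the
// viewer actually needs, so only visibleRows * bytesPerRow bytes are ever
// formatted, regardless of the total file size.
std::pair<std::int64_t, std::int64_t>
visibleByteRange(std::int64_t firstVisibleRow, int visibleRows,
                 int bytesPerRow, std::int64_t fileSize)
{
    std::int64_t first = firstVisibleRow * bytesPerRow;
    std::int64_t last =
        first + static_cast<std::int64_t>(visibleRows) * bytesPerRow;
    first = std::min(first, fileSize);
    last = std::min(last, fileSize);
    return std::make_pair(first, last);
}
```

A paint handler would then read and format only that range, which is what keeps a multi-gigabyte file viewable.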

+7

It sounds like what you want is a producer-consumer setup with semaphores. There is a very specific example in the Qt documentation that will help you implement it correctly. You need a second thread so this work happens apart from the main thread.

The setup should be:

  • Thread A runs your FileReader as the producer.
  • The GUI thread runs the HexViewer widget, which consumes the data on specific events. Before calling QSemaphore::acquire(), check QSemaphore::available() to avoid blocking the GUI.
  • FileReader and HexViewer both access a third class, e.g. DataClass, where data is placed by the producer and retrieved by the consumer. It is this class that the semaphores guard.
  • There is no need to emit a signal carrying the data, or to notify at all.
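
A Qt-free sketch of that setup, with a minimal semaphore standing in for QSemaphore (class and variable names are mine; Qt's own Semaphores Example follows the same freeBytes/usedBytes pattern over a shared ring buffer):

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore standing in for QSemaphore.
class Semaphore {
public:
    explicit Semaphore(int n = 0) : count_(n) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++count_;
        cv_.notify_one();
    }
    int available() {  // what the bullet above suggests checking first
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};

const int kBufferSize = 64;        // shared ring buffer size
char buffer[kBufferSize];
Semaphore freeBytes(kBufferSize);  // slots the producer may still fill
Semaphore usedBytes(0);            // slots the consumer may read

// Producer: the file-reader side.
void produce(const std::vector<char> &data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        freeBytes.acquire();
        buffer[i % kBufferSize] = data[i];
        usedBytes.release();
    }
}

// Consumer: the viewer side.
std::vector<char> consume(std::size_t n) {
    std::vector<char> out;
    for (std::size_t i = 0; i < n; ++i) {
        usedBytes.acquire();
        out.push_back(buffer[i % kBufferSize]);
        freeBytes.release();
    }
    return out;
}
```

The fixed-size ring buffer is what bounds memory: the producer can never get more than kBufferSize bytes ahead of the consumer.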

This largely covers getting the data read by FileReader over to your widget, but not how to actually draw it. To do that, you can consume the data in paintEvent() by overriding HexViewer's paint event and reading whatever has been queued. A more sophisticated approach would be to write an event filter.

On top of this, you may want to cap the maximum number of bytes read ahead, after which HexViewer explicitly signals that it has consumed data.

Note that this solution is completely asynchronous, thread-safe and lean, since no data is pushed to HexViewer; HexViewer consumes it only when it needs to put it on screen.

+2
  • If you plan to view or edit 10 GB files, forget about QTextEdit. ui->hexTextView->insertPlainText will eat all your memory before you have read a tenth of the file. IMO you should use a QTableView to present and edit the data. To do that, subclass QAbstractTableModel. Each row should present 16 bytes: the first 16 columns in hexadecimal and one more column in ASCII form. It does not have to be complicated; just read the (admittedly scary-looking) QAbstractTableModel documentation. Caching the data is the important part here. If I find the time, I will add a code example.
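
The model geometry this bullet describes is easy to pin down. A hedged, Qt-free sketch of the row/column arithmetic such a QAbstractTableModel subclass would use (names and the extra ASCII column are my reading of the bullet, not code from the answer):

```cpp
#include <cassert>
#include <cstdint>

const int kBytesPerRow = 16;

// 16 hex columns plus one trailing column holding the ASCII rendering.
int columnCount() { return kBytesPerRow + 1; }

// Rows needed to show fileSize bytes, 16 per row, rounded up.
std::int64_t rowCount(std::int64_t fileSize)
{
    return (fileSize + kBytesPerRow - 1) / kBytesPerRow;
}

// Byte offset behind a hex cell, or -1 for the ASCII column.
std::int64_t cellToOffset(std::int64_t row, int column)
{
    if (column >= kBytesPerRow)
        return -1;
    return row * kBytesPerRow + column;
}
```

The model's data() then only ever touches the bytes behind the cells the view asks for, which is why this scales to huge files.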

  • Forget about using multiple threads here. This is a bad use case for them, and you will most likely just create a lot of synchronization problems.

OK, I spent some time on this, and here is code that works (I tested it informally):

    // largefilecache.h
    #include <QObject>
    #include <QFile>
    #include <QQueue>

    class LargeFileCache : public QObject
    {
        Q_OBJECT
    public:
        explicit LargeFileCache(QObject *parent = 0);

        char geByte(qint64 pos);
        qint64 FileSize() const;

    signals:

    public slots:
        void SetFileName(const QString& filename);

    private:
        static const int kPageSize;

        struct Page {
            qint64 offset;
            QByteArray data;
        };

    private:
        int maxPageCount;
        qint64 fileSize;
        QFile file;
        QQueue<Page> pages;
    };

    // lagefiledatamodel.h
    #include <QAbstractTableModel>

    class LargeFileCache;

    class LageFileDataModel : public QAbstractTableModel
    {
        Q_OBJECT
    public:
        explicit LageFileDataModel(QObject *parent);

        // QAbstractTableModel
        int rowCount(const QModelIndex &parent) const;
        int columnCount(const QModelIndex &parent) const;
        QVariant data(const QModelIndex &index, int role) const;

    signals:

    public slots:
        void setFileName(const QString &fileName);

    private:
        LargeFileCache *cachedData;
    };

    // lagefiledatamodel.cpp
    #include "lagefiledatamodel.h"
    #include "largefilecache.h"

    static const int kBytesPerRow = 16;

    LageFileDataModel::LageFileDataModel(QObject *parent)
        : QAbstractTableModel(parent)
    {
        cachedData = new LargeFileCache(this);
    }

    int LageFileDataModel::rowCount(const QModelIndex &parent) const
    {
        if (parent.isValid())
            return 0;
        return (cachedData->FileSize() + kBytesPerRow - 1)/kBytesPerRow;
    }

    int LageFileDataModel::columnCount(const QModelIndex &parent) const
    {
        if (parent.isValid())
            return 0;
        return kBytesPerRow;
    }

    QVariant LageFileDataModel::data(const QModelIndex &index, int role) const
    {
        if (index.parent().isValid())
            return QVariant();
        if (index.isValid()) {
            if (role == Qt::DisplayRole) {
                qint64 pos = index.row()*kBytesPerRow + index.column();
                if (pos >= cachedData->FileSize())
                    return QString();
                return QString::number((unsigned char)cachedData->geByte(pos), 0x10);
            }
        }
        return QVariant();
    }

    void LageFileDataModel::setFileName(const QString &fileName)
    {
        beginResetModel();
        cachedData->SetFileName(fileName);
        endResetModel();
    }

    // largefilecache.cpp
    #include "largefilecache.h"

    const int LargeFileCache::kPageSize = 1024*4;

    LargeFileCache::LargeFileCache(QObject *parent)
        : QObject(parent)
        , maxPageCount(1024)
    {
    }

    char LargeFileCache::geByte(qint64 pos)
    {
        if (pos >= fileSize)
            return 0;

        // Search the cached pages; on a hit, move the page to the back
        // of the queue (most recently used).
        for (int i = 0, n = pages.size(); i < n; ++i) {
            int k = pos - pages.at(i).offset;
            if (k >= 0 && k < pages.at(i).data.size()) {
                pages.enqueue(pages.takeAt(i));
                return pages.back().data.at(k);
            }
        }

        // Miss: load the page containing pos and evict when over capacity.
        Page newPage;
        newPage.offset = (pos/kPageSize)*kPageSize;
        file.seek(newPage.offset);
        newPage.data = file.read(kPageSize);
        pages.push_front(newPage);
        while (pages.count() > maxPageCount)
            pages.dequeue();
        return newPage.data.at(pos - newPage.offset);
    }

    qint64 LargeFileCache::FileSize() const
    {
        return fileSize;
    }

    void LargeFileCache::SetFileName(const QString &filename)
    {
        file.close();
        file.setFileName(filename);
        file.open(QFile::ReadOnly);
        fileSize = file.size();
    }

It is shorter than I expected, and it needs some improvement, but it should be a good base.
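
The caching strategy above can be distilled into a Qt-free sketch over an in-memory "file" to check the eviction order (all names here are mine). One subtlety: the listing above prepends new pages (push_front) while eviction (dequeue) also removes from the front; this sketch appends new pages instead, so eviction always removes the least recently used page:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Page cache over an arbitrary byte source: a hit moves the page to the
// back of the queue (most recently used); when over capacity, the front
// (least recently used) page is evicted.
class PageCache {
public:
    PageCache(const std::string &source, std::size_t pageSize,
              std::size_t maxPages)
        : source_(source), pageSize_(pageSize), maxPages_(maxPages) {}

    char byteAt(std::size_t pos) {
        if (pos >= source_.size())
            return 0;
        for (std::size_t i = 0; i < pages_.size(); ++i) {
            if (pos >= pages_[i].offset &&
                pos - pages_[i].offset < pages_[i].data.size()) {
                Page hit = pages_[i];
                pages_.erase(pages_.begin() + i);
                pages_.push_back(hit);            // promote to MRU
                return hit.data[pos - hit.offset];
            }
        }
        Page p;
        p.offset = (pos / pageSize_) * pageSize_;
        p.data = source_.substr(p.offset, pageSize_);
        pages_.push_back(p);
        while (pages_.size() > maxPages_)
            pages_.pop_front();                   // evict LRU
        return p.data[pos - p.offset];
    }

    std::size_t cachedPages() const { return pages_.size(); }

private:
    struct Page { std::size_t offset; std::string data; };
    std::string source_;
    std::size_t pageSize_, maxPages_;
    std::deque<Page> pages_;
};
```

Swapping std::string for QFile reads gives back the structure of LargeFileCache; the queue discipline is what keeps hot pages resident.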

0

For a hex viewer, I don't think you are on the right track, unless you expect it to be used mostly on systems with fast SCSI or RAID arrays. Why load gigabytes of data at all? Accessing the file to fill the text box is pretty fast these days. Notepad++, for example, has an excellent hex-view plugin, and there you do have to load the file first; but that is because the file can be edited, and that is simply how the plugin works.

I think what you probably want is to subclass the text widget and fetch just enough data to fill it, or, going further, load, say, 500 KB before and after the current position. Say you start at byte zero: load enough data for your display, possibly with some extra, but set the scroll bar policy to always visible. Then intercept scroll events, again by subclassing, and write your own scrollContentsBy() and changeEvent() and/or paintEvent() handlers.

Even simpler: create the text widget without scroll bars, and put a separate vertical QScrollBar next to it. Set its range and initial value, then respond to its valueChanged() signal and replace the contents of the text widget. That way the user does not have to wait for disk reads before starting to work, and it is much lighter on resources (memory, for one, so with many applications open nothing gets swapped out to disk). Subclassing these things sounds difficult, but it is often easier than it seems, and there are usually good examples of someone doing something similar already.

If you have several threads reading the same file on a spinning disk, by contrast, say one reading from the beginning, another from the middle and another near the end, the single read head will jump around trying to satisfy all the requests and will work less efficiently. On an SSD, non-linear reads won't hurt you, but they won't help either. If you accept the trade-off of a perhaps noticeable load time in exchange for the user being able to scroll freely afterwards (filling a text box with already-read data does not take long), you can have one thread read the file in the background while the main thread keeps servicing the event loop. Simpler still: read the file in blocks of n megabytes at a time instead of loading the whole file at once, and call qApp->processEvents() after each block so the GUI can respond to any events that occurred in the meantime.
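
The block-at-a-time approach in that last sentence can be sketched without Qt: read fixed-size chunks and invoke a callback between chunks, exactly where the real code would append to the viewer and call qApp->processEvents() (function name and chunk size are mine):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Read a stream in fixed-size blocks, yielding to `onChunk` after each
// block; in the real application the callback body would append the chunk
// to the viewer and call qApp->processEvents().
std::size_t readInChunks(std::istream &in, std::size_t chunkSize,
                         const std::function<void(const std::string &)> &onChunk)
{
    std::size_t total = 0;
    std::string chunk(chunkSize, '\0');
    while (in) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunkSize));
        std::streamsize got = in.gcount();
        if (got <= 0)
            break;
        onChunk(chunk.substr(0, static_cast<std::size_t>(got)));
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Because control returns to the caller between blocks, the event loop (and thus the GUI) stays responsive even for very large inputs.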

If you are sure it will mostly be used with a SCSI or RAID array, then multithreaded reading might make sense. A SCSI drive can have multiple read heads, and some RAID arrays are configured to stripe data across multiple drives to increase speed. Note that you are better off with a single reading thread if the RAID array is instead configured to keep several identical copies of the data for redundancy. When I started with multithreading, I found the lightweight model proposed in "QThread: You were not doing so wrong" the most useful. I had to add Q_DECLARE_METATYPE for my result structure, make sure it had a constructor, destructor and move operator defined (I used memmove), and call qRegisterMetaType() for both the structure and the vector holding the results, so the results would be returned correctly. You pay a price for locking the vector to return the results, but it is actually not much. Shared memory might be worth considering in this context, but if each thread has its own result vector, you do not need to lock against other threads writing their results in order to write yours.

0

Source: https://habr.com/ru/post/1239698/

