What is the fastest way to load a large CSV file into Core Data

Conclusion
The problem is solved, I think.
It seems that the problem had nothing to do with the methodology: Xcode did not properly clean the project between builds.
After all these tests, it seems that the sqlite file being used was still the very first one, which was not indexed...
Beware of Xcode 4.3.2. I have had nothing but problems with cleaning that does not actually clean, and with files added to the project that are not automatically added to the bundle resources...
Thanks for the different answers.

Update 3
Since I am asking people to try the same steps and see whether they get the same results, let me describe in detail what I did:
I start with a clean project.
I define a data model with one entity and 3 attributes (2 strings, 1 float).
The first string is indexed.

In didFinishLaunchingWithOptions, I call:

[self performSelectorInBackground:@selector(populateDB) withObject:nil]; 

The code for populateDB is below:

    - (void)populateDB {
        // Runs on a background thread (performSelectorInBackground:),
        // so it needs its own autorelease pool.
        @autoreleasepool {
            NSLog(@"start");
            NSPersistentStoreCoordinator *coordinator = [self persistentStoreCoordinator];
            NSManagedObjectContext *context = nil;
            if (coordinator != nil) {
                context = [[NSManagedObjectContext alloc] init];
                [context setPersistentStoreCoordinator:coordinator];
            }
            NSString *filePath = [[NSBundle mainBundle] pathForResource:@"input" ofType:@"txt"];
            if (filePath) {
                NSString *myText = [[NSString alloc] initWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:nil];
                if (myText) {
                    __block int count = 0;
                    [myText enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
                        // Fields are separated by tabs or spaces; normalize to spaces.
                        line = [line stringByReplacingOccurrencesOfString:@"\t" withString:@" "];
                        NSArray *lineComponents = [line componentsSeparatedByString:@" "];
                        if ([lineComponents count] == 3) {
                            float f = [[lineComponents objectAtIndex:0] floatValue];
                            NSNumber *number = [NSNumber numberWithFloat:f];
                            NSString *string1 = [lineComponents objectAtIndex:1];
                            NSString *string2 = [lineComponents objectAtIndex:2];
                            NSManagedObject *object = [NSEntityDescription insertNewObjectForEntityForName:@"Bigram" inManagedObjectContext:context];
                            [object setValue:number forKey:@"number"];
                            [object setValue:string1 forKey:@"string1"];
                            [object setValue:string2 forKey:@"string2"];
                            count++;
                            // Save in batches of 1000 objects.
                            if (count >= 1000) {
                                NSError *error = nil;
                                if (![context save:&error]) {
                                    NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]);
                                }
                                count = 0;
                            }
                        }
                    }];
                    NSLog(@"done importing");
                    // Save whatever is left over from the last partial batch.
                    NSError *error = nil;
                    if (![context save:&error]) {
                        NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]);
                    }
                }
            }
            NSLog(@"end");
        }
    }

Everything else is the standard Core Data template code; nothing has been added.
I run this in the simulator.
I go to ~/Library/Application Support/iPhone Simulator/5.1/Applications//Documents
The sqlite file has been created there.

I take it and copy it into my bundle.

I comment out the call to populateDB.

I edit persistentStoreCoordinator to copy the sqlite file from the bundle to Documents on first run:

    - (NSPersistentStoreCoordinator *)persistentStoreCoordinator {
        @synchronized (self) {
            if (__persistentStoreCoordinator != nil)
                return __persistentStoreCoordinator;

            NSString *defaultStorePath = [[NSBundle mainBundle] pathForResource:@"myProject" ofType:@"sqlite"];
            NSString *storePath = [[[self applicationDocumentsDirectory] path] stringByAppendingPathComponent:@"myProject.sqlite"];
            NSError *error = nil;

            // On first run, copy the pre-populated store from the bundle to Documents.
            if (![[NSFileManager defaultManager] fileExistsAtPath:storePath]) {
                if (defaultStorePath == nil)
                    NSLog(@"No myProject.sqlite found in the bundle");
                else if ([[NSFileManager defaultManager] copyItemAtPath:defaultStorePath toPath:storePath error:&error])
                    NSLog(@"Copied starting data to %@", storePath);
                else
                    NSLog(@"Error copying default DB to %@ (%@)", storePath, error);
            }

            NSURL *storeURL = [NSURL fileURLWithPath:storePath];
            __persistentStoreCoordinator = [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:[self managedObjectModel]];
            NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                                     [NSNumber numberWithBool:YES], NSMigratePersistentStoresAutomaticallyOption,
                                     [NSNumber numberWithBool:YES], NSInferMappingModelAutomaticallyOption,
                                     nil];
            if (![__persistentStoreCoordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil URL:storeURL options:options error:&error]) {
                NSLog(@"Unresolved error %@, %@", error, [error userInfo]);
                abort();
            }
            return __persistentStoreCoordinator;
        }
    }


I remove the application from the simulator and check that its folder under ~/Library/Application Support/iPhone Simulator/5.1/Applications/ is now deleted.
I rebuild and run again.
As expected, the sqlite file is copied to ~/Library/Application Support/iPhone Simulator/5.1/Applications//Documents

However, the file is significantly smaller than the one in the bundle! Also, running a simple query with a predicate like predicate = [NSPredicate predicateWithFormat:@"string1 == %@", string1]; clearly shows that string1 is no longer indexed.

After that, I create a new version of the data model with a meaningless change, just to force a lightweight migration. When run on the simulator, the migration takes several seconds, the database doubles in size, and the same query now takes less than a second instead of minutes.
Forcing a migration would solve my problem, but the same migration takes 3 minutes on the iPad and happens in the foreground.
So where I stand right now: the best solution for me would be to prevent the indexes from being dropped; any other solution that imports at launch time would take too long.
Let me know if you need further clarification...

Update 2
So far, the best result I have obtained is to seed my Core Data database with a sqlite file produced by a quick tool with a similar data model, but with no indexes set when the sqlite file is created. I then import this sqlite file into the Core Data app with the indexes set, allowing a lightweight migration. For 2 million records on the new iPad, this migration takes 3 minutes. The final app should have 5 times that number of records, so we are still looking at a long processing time. If I go down this route, the new question becomes: can a lightweight migration be performed in the background?
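On that last question, one possible approach is sketched below, under assumptions: since the automatic migration runs inside addPersistentStoreWithType:configuration:URL:options:error:, that call can be moved off the main thread with GCD. The helper name `AddStoreInBackground` and its completion block are hypothetical; the options are the same two used in persistentStoreCoordinator in Update 3. Nothing should touch the coordinator's store until the completion block has run.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper (not part of Core Data): adds the SQLite store, and
// therefore runs any automatic lightweight migration, on a background queue.
// `completion` is called on the main queue with the store, or nil plus an error.
static void AddStoreInBackground(NSPersistentStoreCoordinator *coordinator,
                                 NSURL *storeURL,
                                 void (^completion)(NSPersistentStore *store, NSError *error)) {
    NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                             [NSNumber numberWithBool:YES], NSMigratePersistentStoresAutomaticallyOption,
                             [NSNumber numberWithBool:YES], NSInferMappingModelAutomaticallyOption,
                             nil];
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        NSError *error = nil;
        // The potentially slow part: migration happens inside this call.
        NSPersistentStore *store = [coordinator addPersistentStoreWithType:NSSQLiteStoreType
                                                             configuration:nil
                                                                       URL:storeURL
                                                                   options:options
                                                                     error:&error];
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(store, error);
        });
    });
}
```

The app could show a progress or tutorial screen while waiting and only create its managed object contexts inside the completion block. This does not make the migration any faster; it only keeps the main thread responsive.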

Update
My question is not how to create a tool that populates a Core Data database and then import the resulting sqlite file into my app. I know how to do this; I have done it countless times.
But until now I had not realized that this method could have a side effect: in my case, an indexed attribute in the resulting database clearly became "unindexed" when the sqlite file was imported this way.
If you have been able to verify that indexed data is still indexed after such a transfer, I would like to know how you do it, or otherwise, what the best strategy would be for seeding such a database efficiently.
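One way to check this directly, sketched here as an assumption rather than an established procedure, is to open the store file read-only with the SQLite C API and list the indexes recorded in its sqlite_master table. The helper name `IndexNamesInStore` is hypothetical, and the target must link against libsqlite3. The exact index names Core Data generates are not assumed; the function simply returns whatever index names the file contains.

```objc
#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Hypothetical helper: returns the names of all indexes stored in the SQLite
// file at `path`, or nil if the file cannot be opened or queried.
static NSArray *IndexNamesInStore(NSString *path) {
    sqlite3 *db = NULL;
    if (sqlite3_open_v2([path fileSystemRepresentation], &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK) {
        sqlite3_close(db); // safe even if db is NULL
        return nil;
    }
    sqlite3_stmt *stmt = NULL;
    const char *sql = "SELECT name FROM sqlite_master WHERE type = 'index'";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
        sqlite3_close(db);
        return nil;
    }
    NSMutableArray *names = [NSMutableArray array];
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        const unsigned char *name = sqlite3_column_text(stmt, 0);
        if (name != NULL) {
            [names addObject:[NSString stringWithUTF8String:(const char *)name]];
        }
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return names;
}
```

Calling it once on the file inside the bundle and once on the copy in Documents, and logging both results, shows whether the two files carry the same indexes. It also reveals whether the bundled file is the one you think it is.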

Original

I have a large CSV file (millions of rows) with 4 columns of strings and floats. This is for an iOS app.

I need this to be loaded into Core Data the first time the app is launched.

The app is practically non-functional until the data is available, so loading time matters: a first-time user obviously does not want to wait 20 minutes before being able to use the app.

Right now, my current code takes 20 minutes on the new iPad to process a 2-million-row CSV file. I use a background context so as not to block the UI, and I save the context every 1000 records.

The first idea I had was to generate the database on the simulator and then copy it into the Documents folder on first launch, as this is the common unofficial way of seeding a large database. Unfortunately, the indexes do not seem to survive such a transfer, and although the database was available after a few seconds, performance is terrible because my indexes were lost. I have already asked a question about the indexes, but there seems to be no good answer to it.
So what I'm looking for is either:

  • A way to improve performance when loading millions of records into Core Data.
  • If the database is pre-loaded and moved on first start, a way to keep my indexes.
  • Recommendations for handling this kind of scenario. I don’t remember using any app that made me wait x minutes before first use (except maybe The Daily, and that was a terrible experience).
  • Any creative way to make the user wait without being aware of it: importing in the background while they go through a tutorial, etc.
  • Not using Core Data?
  • ...
+6
2 answers

Pre-generate your database using an offline application (say, a command-line utility) written in Cocoa that runs on OS X and uses the same Core Data model as the iOS app. You don’t need to worry about indexes "surviving" or anything like that: the output is a Core Data-generated .sqlite database file, directly and immediately usable by the iOS app.

As long as you can generate the database offline, this is the best solution by far. I have successfully used this method to pre-generate databases for iOS deployment myself. Check out my previous questions and answers for more detail.
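As a rough illustration of what the core of such a seeding tool might look like (a sketch under assumptions, not this answer's actual code): the function below takes a managed object model, an output URL and the input text, and writes a SQLite store using the "Bigram" entity and the keys from the question. The name `SeedStore`, the batch size and the -1 error convention are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical core of a Mac command-line seeding tool. Creates a SQLite store
// at `storeURL` and inserts one Bigram per well-formed line of `text`.
// Returns the number of objects inserted, or -1 on error.
static NSInteger SeedStore(NSManagedObjectModel *model, NSURL *storeURL, NSString *text) {
    NSPersistentStoreCoordinator *psc =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    NSError *error = nil;
    if (![psc addPersistentStoreWithType:NSSQLiteStoreType configuration:nil
                                     URL:storeURL options:nil error:&error]) {
        NSLog(@"Could not create store: %@", error);
        return -1;
    }
    // Same style of context creation as in the question's code.
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
    [context setPersistentStoreCoordinator:psc];
    [context setUndoManager:nil]; // no undo tracking needed for a bulk import

    __block NSInteger inserted = 0;
    __block BOOL failed = NO;
    NSCharacterSet *separators = [NSCharacterSet characterSetWithCharactersInString:@" \t"];
    [text enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        @autoreleasepool {
            NSArray *fields = [line componentsSeparatedByCharactersInSet:separators];
            if ([fields count] != 3) return; // skip malformed lines
            NSManagedObject *object =
                [NSEntityDescription insertNewObjectForEntityForName:@"Bigram"
                                              inManagedObjectContext:context];
            [object setValue:[NSNumber numberWithFloat:[[fields objectAtIndex:0] floatValue]]
                      forKey:@"number"];
            [object setValue:[fields objectAtIndex:1] forKey:@"string1"];
            [object setValue:[fields objectAtIndex:2] forKey:@"string2"];
            inserted++;
            if (inserted % 1000 == 0) {
                NSError *saveError = nil;
                if (![context save:&saveError]) {
                    NSLog(@"Save failed: %@", saveError);
                    failed = YES;
                    *stop = YES;
                    return;
                }
                [context reset]; // drop saved objects to keep memory flat
            }
        }
    }];
    NSError *saveError = nil;
    if (failed || ![context save:&saveError]) return -1;
    return inserted;
}
```

In a real tool, main would load the same compiled model the iOS app ships with (for example via NSManagedObjectModel's initWithContentsOfURL:), read the input file, and pass both in. Using the very same model is what lets the app open the resulting file without a migration.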

+6

I am just getting started with SQLite, and I need to integrate a database into one of my apps that will hold a lot of indexed data. I was hoping to find some method where I could insert my information into an SQLite file and add that file to my project. After finding and reading your question, the answer provided, and the numerous comments, I decided to look at the SQLite source to see if I could make heads or tails of this problem.

My initial thought was that the iOS implementation of SQLite is, in fact, throwing away your indexes. The reason is that you initially create your DB index on an x86/x64 system. iOS devices use ARM processors, and numbers are handled differently. If you want your indexes to be fast, you should generate them in a way that is optimized for the processor they will be searched on.

Since SQLite is built for multiple platforms, it would make sense for it to drop any indexes that were created on another architecture and rebuild them. However, since no one wants to wait for an index to be rebuilt the first time it is accessed, the SQLite developers most likely decided to simply drop the index.

After digging into the SQLite code, I have come to the conclusion that this is most likely what is happening. Even if the processor architecture is not the reason, I found code (see analyze.c and other meta information in sqliteint.h) where indexes are deleted if they were generated in an unexpected context. My guess is that the context driving this process is how the underlying b-tree data structure was built for the existing key. If the current instance of SQLite cannot use the key, it deletes it.

It is worth mentioning that the iOS Simulator is just that: a simulator. It is not an emulator of the hardware. As such, your app runs on a pseudo-iOS device running on an x86/x64 processor.

When your app and SQLite DB are loaded onto your iOS device, the ARM-compiled version is loaded, which also links against the ARM-compiled libraries in iOS. I could not find ARM-specific code associated with SQLite, so I assume Apple had to modify it to suit their needs. That could also be part of the problem. It may not be an issue with the stock SQLite code; it could be an issue with the Apple/ARM-compiled version.

The only reasonable solution I can come up with is to create a generator application that you run on your iOS device. Run the application, build the keys, and then pull the SQLite file off the device. I would imagine such a file would work on all devices, since all ARM processors used by iOS are 32-bit.

Again, this answer is a bit of an educated guess. I am going to retag your question with SQLite. Hopefully a guru will find it and be able to weigh in on this issue. I would really like to know the truth for my own benefit.

0

Source: https://habr.com/ru/post/914867/

