Answer
CL. The answer is a trick! I ended up using a Python script (which can be viewed in the next section down under “Updates: working on the answer”) and after I set up my database correctly so that the identifier columns are set as integer keys (or, if that’s not possible, numeric), and the Name columns were set as text, then it will work!
Updates: working on the answer
I tried running the .py file, which looks like this:
import sqlite3 conn = sqlite3.connect('data.db') c = conn.cursor() c.executescript(""" UPDATE CorpData SET OperationID4Counter = (SELECT COUNT(*) FROM PlantData JOIN OperationData ON PlantName LIKE '%' || OperationName WHERE OperationID IN (SELECT OperationID FROM ServiceData WHERE ServiceID = 512) AND CorpID = CorpData.CorpID) """)
and get this error: sqlite3.OperationalError: ambiguous column name: OperationID . I assume this is due to the fact that we joined PlantData and OperationData, both of which have a column named OperationID . When I change this line of code to read WHERE OperationData.OperationID IN (SELECT OperationID or WHERE PlantData.OperationID IN (SELECT OperationID , it starts, but in the end I get zero in all rows of the CorpData table in the OperationID4Counter column.
I think we are close, but there is no cigar. I think something is wrong with the ON StationName LIKE '%' || OperationName ON StationName LIKE '%' || OperationName , because when I change it to ON StationName LIKE '%house' (that, if I understand this correctly, I get everything that ends in the "house", which will include the Warehouse), I still get all zeros for OperationID4Counter (although it should at least reckon with repositories that have OperationID4.)
CL. requested some .dump information to find out what types are used in this database. I did not specify anything, so he just used defaults. Also note that in different tables there are more columns than those that I showed in my examples (but we also note that these columns are not relevant to this issue, since they relate to data not related to the issue under consideration). For example, one .dump piece for a PlantData table looks like this:
INSERT INTO "PlantData" VALUES('60015145','0','0','50000000','10000','15','386 8','1000181','30003830','20000560','10000048','Anytown 334 - Unit 3 - Widgit Corp Logistics Center','-1.444E+12','-71312793600','-9.25528E+11','0.5','0.025','4 ');
The A.dump file from OperationData is as follows:
INSERT INTO "OperationData" VALUES('20','45','Manufacturing','','0','0','0','0', '0','','','','','');
And the .dump detail from CorpData looks like this:
INSERT INTO "CorpData" VALUES(NULL,0,'1000158','Shapeset',' S',' N',' 500005',' XYZ Consortium',' 20','6','7','1','5','0');
Background and data image
I have 4 tables - 3 of which I want to extract data to increase the counter under certain conditions, and then add this counter as a new column to the fourth. This 4th table, let her call her CorpData (to which I want to add more data) currently looks like this and usually has between 10-50 rows (note that I use commas to display column separators):
CorpID, CorpName, Size, Type, PlantCount, OtherCounter1, OtherCounter2, OtherCounter3, OtherCounter4, OtherCounter5 100002, Widgit Corp, G, R, 25, 1, 5, 4, 3, 0 100004, ACME Corp, G, S, 15, 15, 4, 25, 28, 1
The significant parts are CorpID (a unique key) and PlantCount, which is a counter of the number of plants (i.e. objects) of this corporation.
The first of these additional data source tables, let it call it OperationData has such data and has about 50 rows:
OperationID, OperationName, Description 1, Warehouse, This facility stores items 2, Distribution Center, Items are brought her from Warehouses to be distributed 3, Factory, Goods are manufactured here
Second, ServiceData has about 700 lines and looks something like this:
OperationID, ServiceID 1, 4 1, 25 1, 33 1, 105 1, 19505 1, 32590 2, 4 2, 25 2, 55 2, 199 2, 19505 2, 335679 2, 529934 3, 2 3, 105 3, 55 3, 170 3, 48907
Each ServiceID is explained in another table, but I want to find one or two ServiceIDs, which I will indicate as 4 and 55.
The last of the note data tables, let it be called PlantData , contains parts for all plants for all corporations, so it has about 5200 rows and looks like this:
PlantID, CorpID, CityID, CountryID, PlantName 60000004, 100002, 74900, 34590, Somewhereville 123 - Widgit Corp Warehouse 60000007, 100002, 74878, 34590, Anytown 334 - Unit 3 - Widgit Corp Distribution Center 60000023, 100002, 56799, 23487, Quietville 532 - Unit 4 - Widgit Corp Warehouse 60000027, 100004, 74900, 34590, Somewhereville 544 - Unit 3 - ACME Corp Distribution Center 60000150, 100004, 56799, 23487, Quietville 312 - Unit 2 - ACME Corp Factory 60000155, 100004, 56799, 23487, Quietville 312 - Unit 4 - ACME Corp Warehouse
Please note the following: 1) The CorpID in this table corresponds to the CorpID in my start table. 2) CorpName for this CorpID will always be displayed in PlantName 3). PlantName also contains one OperationName 4) One CityID can have several corporations as well as several of the same corporations. 4) As a side element, this is just a small part of this table, and if you think that all the data indicated by the CorpID data in this table will be the same as the PlantCount for this CorpID (so it can be used to check some view to make sure the plants have not been missed.)
Question
I want to add two columns to the CorpData table, both of them will be counted - the first will count how many plants has a case with ServiceID 4, and in the second - the number of corp has ServiceID 55. To do this, I need to look at the long PlantData table, analyze the name Operation ( from the OperationData table) from each PlantName name, check that this OperationIame corresponds to the Operational identifier (in the OperationData table), and see if this OperationID is specified in the ServiceID (from the ServiceData table) (4 in the first case and 55 in the second).
I intend to do this using sqlite3 with my 4 tables stored in a .db file, but I could be open to other options if you can make a solid argument why I should use this option over sqlite3.
goal
My final goal, given the examples given here, would have a CorpData table that looks like this:
CorpID, CorpName, Size, Type, PlantCount, OtherCounter1, OtherCounter2, OtherCounter3, OtherCounter4, OtherCounter5, OperationID4Counter, OperationID55Counter 100002, Widgit Corp, G, R, 25, 1, 5, 4, 3, 0, 3, 1 100004, ACME Corp, G, S, 15, 15, 4, 25, 28, 1, 2, 2
This is because Widgit Corp has two warehouses, and the distribution center, storage and distribution centers have OperationID 4, but only the Distribution Center and Factory have OperationID 55 and ACME Corp have 1 each of Factory, Warehouse, and distribution center and factories do not have OperationID 4, but they have OperationID 55.
Other notes
Here are some things that I think can make this harder:
- PlantName contains several words and may or may not have a
Unit X - . OperationName does not always have only one word, and the root name can also be more (or less) than two words. Thus, searching for an OperationName in a PlantName will probably have to look at everything, rather than trying to break it into parts, to somehow find only the part that probably contains OperationName. - When we work on
PlantData and count count tables, we need to check the other two tables to see if the row should be counted in PlantData or not. I am worried that if this code is not built properly, it may turn out to be very slow. - It’s hard for me to at least ponder all this, so it would be easy to go to the wrong table or skip the search step from things like OperationName to the corresponding OperationID.