Convert a list of lists to a nested dictionary

I have a data file that looks like this:

["Arts & Entertainment", "Arts & Entertainment / Animation & Comics", "Arts & Entertainment / Books & Literature", "Arts & Entertainment / Celebrity/Gossip", "Arts & Entertainment / Fine Art", "Arts & Entertainment / Humor", "Arts & Entertainment / Movies", "Arts & Entertainment / Movies / Action", "Arts & Entertainment / Movies / Comedy", "Arts & Entertainment / Movies / Documentary", "Arts & Entertainment / Movies / Drama", "Arts & Entertainment / Movies / Horror", "Arts & Entertainment / Music", "Arts & Entertainment / Music / Alternative Music", "Arts & Entertainment / Music / Blues", "Arts & Entertainment / Music / Christian Music", "Arts & Entertainment / Music / Classic Rock", "Arts & Entertainment / Music / Classical Music", "Arts & Entertainment / Music / Country Music", "Arts & Entertainment / Music / Electronic Dance Music", "Arts & Entertainment / Music / Heavy Metal", "Arts & Entertainment / Music / Pop Music", "Arts & Entertainment / Music / Rap", "Arts & Entertainment / Radio Stations", "Arts & Entertainment / Television", "Arts & Entertainment / Television / Game Show", "Arts & Entertainment / Television / Kids", "Arts & Entertainment / Television / News", "Arts & Entertainment / Television / Reality", "Arts & Entertainment / Television / Science", "Arts & Entertainment / Television / Sitcom", "Arts & Entertainment / Television / Soap Opera", "Arts & Entertainment / Television / Talk Show", "Autos", "Autos / 4-Wheel Drive/SUVs", "Autos / Buying/Selling Cars", "Autos / Certified Pre-Owned", "Autos / Convertible", "Autos / Coupe", "Autos / Crossover", "Autos / Diesel", "Autos / Electric Vehicles", "Autos / Hatchback", "Autos / Hybrid", "Autos / Luxury", "Autos / Maintenance", "Autos / Maintenance / Parts", "Autos / Maintenance / Repair", "Autos / MiniVan", "Autos / Motorcycles", "Autos / Off-Road Vehicles", "Autos / Road-Side Assistance", "Autos / Sedan", "Autos / Trucks", "Autos / Trucks / Pickup", "Autos / Vintage Cars", "Autos / Wagon", "Business & 
Industry", "Business & Industry / Advertising", "Business & Industry / Agriculture", "Business & Industry / Biotech/Biomedical", "Business & Industry / Business Software", "Business & Industry / Construction", "Business & Industry / Construction / Composites & Plastics", "Business & Industry / Forestry", "Business & Industry / Government", "Business & Industry / Green Solutions", "Business & Industry / Human Resources", "Business & Industry / Logistics", "Business & Industry / Marketing", "Business & Industry / Metals", "Business & Industry / Non-Profit Organizations", "Business & Industry / Power Industry", "Business & Industry / Public Services", "Business & Industry / Public Services / Emergency Services", "Business & Industry / Public Services / Waste Management", "Business & Industry / Purchasing", "Business & Industry / Retail Industry", "Business & Industry / Small Business", "Business & Industry / Telecom", "Career", "Career / Career Planning", "Career / Job Search", "Career / Job Search / Resume Writing/Advice", "Career / Telecommuting", "Career / US Military", "Education", "Education / Business School", "Education / College Education", "Education / College Education / Admissions", "Education / College Education / College Life", "Education / Continuing Education", "Education / Distance Learning", "Education / Financial Aid", "Education / Financial Aid / Scholarships", "Education / Graduate School", "Education / Homeschooling", "Education / Language Learning", "Education / Language Learning / English as a 2nd Language", "Education / Primary Education", "Education / Secondary Education", "Education / Special Education", "Finance & Money", "Finance & Money / Credit/Debt & Loans", "Finance & Money / Day Trading", "Finance & Money / Exchange Traded Funds", "Finance & Money / Financial News", "Finance & Money / Financial Planning", "Finance & Money / Financial Planning / Retirement Planning", "Finance & Money / Financial Planning / Tax Planning", "Finance & 
Money / Foreign Exchange Trading", "Finance & Money / Hedge Fund", "Finance & Money / Insurance", "Finance & Money / Investing", "Finance & Money / Mutual Funds", "Finance & Money / Options", "Finance & Money / Stocks", "Food & Drink", "Food & Drink / Barbecues & Grilling", "Food & Drink / Beverages", "Food & Drink / Beverages / Cocktails/Beer", "Food & Drink / Beverages / Coffee/Tea", "Food & Drink / Beverages / Wine", "Food & Drink / Cuisine-Specific", "Food & Drink / Cuisine-Specific / American Cusine", "Food & Drink / Cuisine-Specific / Cajun/Creole", "Food & Drink / Cuisine-Specific / Chinese Cuisine", "Food & Drink / Cuisine-Specific / French Cuisine", "Food & Drink / Cuisine-Specific / Italian Food", "Food & Drink / Cuisine-Specific / Japanese Food", "Food & Drink / Cuisine-Specific / Mexican Cuisine", "Food & Drink / Desserts & Baking", "Food & Drink / Health/LowFat Cooking", "Food & Drink / Organic Food", "Food & Drink / Vegetarian", "Health & Fitness", "Health & Fitness / ADD", "Health & Fitness / AIDS/HIV", "Health & Fitness / Allergies", "Health & Fitness / Alternative Medicine", "Health & Fitness / Alzheimer\\ Disease", "Health & Fitness / Arthritis", "Health & Fitness / Asthma", "Health & Fitness / Autism/PDD", "Health & Fitness / Bipolar Disorder", "Health & Fitness / Brain Tumor", "Health & Fitness / Cancer", "Health & Fitness / Cancer / Breast Cancer", "Health & Fitness / Cancer / Lung Cancer", "Health & Fitness / Cancer / Prostate Cancer", "Health & Fitness / Cholesterol", "Health & Fitness / Chronic Fatigue Syndrome", "Health & Fitness / Chronic Obstructive Pulmonary Disease", "Health & Fitness / Chronic Pain", "Health & Fitness / Cold & Flu", "Health & Fitness / Deafness", "Health & Fitness / Dental Care", "Health & Fitness / Depression", "Health & Fitness / Dermatology", "Health & Fitness / Diabetes", "Health & Fitness / Epilepsy", "Health & Fitness / Exercise", "Health & Fitness / GERD/Acid Reflux", "Health & Fitness / Headaches/Migraines", 
"Health & Fitness / Heart Disease", "Health & Fitness / Heart Disease / Women\\ Heart Disease", "Health & Fitness / Hepatitis", "Health & Fitness / Herbs for Health", "Health & Fitness / Holistic Healing", "Health & Fitness / Hypertension", "Health & Fitness / IBS/Crohn\\ Disease", "Health & Fitness / Incest/Abuse Support", "Health & Fitness / Incontinence", "Health & Fitness / Infertility", "Health & Fitness / Men\\ Health", "Health & Fitness / Nursing", "Health & Fitness / Nutrition", "Health & Fitness / Orthopedics", "Health & Fitness / Orthopedics / Sports Medicine", "Health & Fitness / Panic/Anxiety Disorders", "Health & Fitness / Pediatrics", "Health & Fitness / Pharmaceutical", "Health & Fitness / Physical Therapy", "Health & Fitness / Psychology/Psychiatry", "Health & Fitness / Senior Health", "Health & Fitness / Sexuality", "Health & Fitness / Sleep Disorders", "Health & Fitness / Smoking Cessation", "Health & Fitness / Substance Abuse", "Health & Fitness / Substance Abuse / Alcoholism", "Health & Fitness / Thyroid Disease", "Health & Fitness / Weight Loss", "Health & Fitness / Women\\ Health", "Hobbies & Games", "Hobbies & Games / Arts & Crafts", "Hobbies & Games / Arts & Crafts / Beadwork", "Hobbies & Games / Arts & Crafts / Drawing/Sketching", "Hobbies & Games / Arts & Crafts / Needlework", "Hobbies & Games / Arts & Crafts / Painting", "Hobbies & Games / Arts & Crafts / Photography", "Hobbies & Games / Arts & Crafts / Woodworking", "Hobbies & Games / Astrology", "Hobbies & Games / Birdwatching", "Hobbies & Games / BoardGames/Puzzles", "Hobbies & Games / Candle & Soap Making", "Hobbies & Games / Card Games", "Hobbies & Games / Chess", "Hobbies & Games / Cigars", "Hobbies & Games / Collecting", "Hobbies & Games / Collecting / Antiques", "Hobbies & Games / Collecting / Book Collecting", "Hobbies & Games / Collecting / Miniatures", "Hobbies & Games / Collecting / Stamps & Coins", "Hobbies & Games / Creative Writing", "Hobbies & Games / Getting Published", 
"Hobbies & Games / Home Recording", "Hobbies & Games / Inventors & Patents", "Hobbies & Games / Learning a Musical Instrument", "Hobbies & Games / Learning a Musical Instrument / Guitar", "Hobbies & Games / Magic & Illusion", "Hobbies & Games / Paranormal Phenomena", "Hobbies & Games / Sci-Fi & Fantasy", "Hobbies & Games / Video Games", "Hobbies & Games / Video Games / Nintendo", "Hobbies & Games / Video Games / PSP", "Hobbies & Games / Video Games / Playstation", "Hobbies & Games / Video Games / RPG", "Hobbies & Games / Video Games / Racing", "Hobbies & Games / Video Games / X-Box", "Home & Garden", "Home & Garden / Appliances", "Home & Garden / Environmental Safety", "Home & Garden / Gardening/Landscaping", "Home & Garden / Home Repair", "Home & Garden / Interior Decorating", "News & Current Affairs", "News & Current Affairs / Law & Politics", "News & Current Affairs / Law & Politics / Immigration", "News & Current Affairs / Law & Politics / Legal Issues", "News & Current Affairs / Law & Politics / US Government Resources", "Parenting & Family", "Parenting & Family / Adoption", "Parenting & Family / Babies & Toddlers", "Parenting & Family / Daycare/Pre-School", "Parenting & Family / Parenting Children", "Parenting & Family / Parenting Teens", "Parenting & Family / Pregnancy", "Parenting & Family / Special Needs Kids", "Pets", "Pets / Aquariums", "Pets / Cats", "Pets / Dogs", "Pets / Veterinary Medicine", "Real Estate", "Real Estate / Apartments", "Real Estate / Architecture", "Real Estate / Buying/Selling Homes", "Religion", "Religion / Alternative Religions", "Religion / Atheism/Agnosticism", "Religion / Buddhism", "Religion / Catholicism", "Religion / Christianity", "Religion / Hinduism", "Religion / Islam", "Religion / Judaism", "Religion / Latter-Day Saints", "Religion / Pagan/Wiccan", "Science", "Science / Astronomy", "Science / Biology", "Science / Chemistry", "Science / Geology", "Science / Physics", "Sensitive Content", "Sensitive Content / Gambling", 
"Sensitive Content / Gambling / Sports Gambling", "Society", "Society / Dating", "Society / Divorce", "Society / Gay Life", "Society / Marriage", "Society / Senior Living", "Society / Weddings", "Sports & Recreation", "Sports & Recreation / Auto Racing", "Sports & Recreation / Auto Racing / NASCAR Racing", "Sports & Recreation / Baseball", "Sports & Recreation / Basketball", "Sports & Recreation / Bicycling", "Sports & Recreation / Bicycling / Mountain Biking", "Sports & Recreation / Bodybuilding", "Sports & Recreation / Boxing", "Sports & Recreation / Canoeing/Kayaking", "Sports & Recreation / Cheerleading", "Sports & Recreation / Climbing", "Sports & Recreation / College Sports", "Sports & Recreation / Cricket", "Sports & Recreation / Figure Skating", "Sports & Recreation / Fishing", "Sports & Recreation / Fishing / Fly Fishing", "Sports & Recreation / Fishing / Freshwater Fishing", "Sports & Recreation / Fishing / Game & Fish", "Sports & Recreation / Fishing / Saltwater Fishing", "Sports & Recreation / Football", "Sports & Recreation / Golf", "Sports & Recreation / Horses", "Sports & Recreation / Horses / Horse Racing", "Sports & Recreation / Hunting/Shooting", "Sports & Recreation / Ice Hockey", "Sports & Recreation / Inline Skating", "Sports & Recreation / Martial Arts", "Sports & Recreation / Olympics", "Sports & Recreation / Paintball", "Sports & Recreation / Rodeo", "Sports & Recreation / Rugby", "Sports & Recreation / Running/Walking", "Sports & Recreation / Sailing", "Sports & Recreation / Scuba Diving", "Sports & Recreation / Skateboarding", "Sports & Recreation / Skiing", "Sports & Recreation / Snowboarding", "Sports & Recreation / Soccer", "Sports & Recreation / Surfing/Bodyboarding", "Sports & Recreation / Swimming", "Sports & Recreation / Table Tennis/Ping-Pong", "Sports & Recreation / Tennis", "Sports & Recreation / Volleyball", "Sports & Recreation / Waterski/Wakeboard", "Sports & Recreation / Yachting", "Style & Fashion", "Style & Fashion / Body 
Art", "Style & Fashion / Cosmetics", "Style & Fashion / Fashion", "Style & Fashion / Jewelry", "Technology & Computing", "Technology & Computing / Cameras & Camcorders", "Technology & Computing / Cell Phones", "Technology & Computing / Computer Certification", "Technology & Computing / Computer Networking", "Technology & Computing / Computer Peripherals", "Technology & Computing / Computer Security", "Technology & Computing / Computer Security / Antivirus Software", "Technology & Computing / Computer Security / Network Security", "Technology & Computing / Databases", "Technology & Computing / Graphics", "Technology & Computing / Graphics / 3-D Graphics", "Technology & Computing / Graphics / Animation", "Technology & Computing / Graphics / Desktop Publishing", "Technology & Computing / Graphics / Desktop Video", "Technology & Computing / Graphics / Web Design/HTML", "Technology & Computing / Home Theater Systems", "Technology & Computing / Operating Systems", "Technology & Computing / Operating Systems / Linux", "Technology & Computing / Operating Systems / Mac OS", "Technology & Computing / Operating Systems / Unix", "Technology & Computing / Operating Systems / Windows", "Technology & Computing / Portable Device", "Technology & Computing / Programming", "Technology & Computing / Programming / C/C++", "Technology & Computing / Programming / Java", "Technology & Computing / Programming / JavaScript", "Technology & Computing / Programming / Visual Basic", "Travel", "Travel / Adventure Travel", "Travel / Africa", "Travel / Air Travel", "Travel / Asia", "Travel / Asia / Japan", "Travel / Australia & New Zealand", "Travel / Bed & Breakfasts", "Travel / Budget Travel", "Travel / Business Travel", "Travel / Camping", "Travel / Canada", "Travel / Caribbean", "Travel / Cruises", "Travel / Europe", "Travel / Europe / Eastern Europe", "Travel / Europe / France", "Travel / Europe / Greece", "Travel / Europe / Italy", "Travel / Europe / United Kingdom", "Travel / 
Honeymoons/Getaways", "Travel / Hotels", "Travel / Mexico & Central America", "Travel / National Parks", "Travel / South America", "Travel / Spas", "Travel / Theme Parks", "Travel / United States", "Travel / United States / California", "Travel / United States / Florida", "Travel / United States / Hawaii", "Travel / United States / Las Vegas, Nevada", "Travel / United States / Manhattan, New York", "Travel / United States / New England", "Travel / United States / Texas", "Travel / Weather"] 

I clean the data file and I split it, so it looks something like this:

    ['Arts & Entertainment']
    ['Arts & Entertainment', 'Animation & Comics']
    ['Arts & Entertainment', 'Books & Literature']
    ['Arts & Entertainment', 'Celebrity Gossip']
    ['Arts & Entertainment', 'Fine Art']
    ['Arts & Entertainment', 'Humor']
    ['Arts & Entertainment', 'Movies']
    ['Arts & Entertainment', 'Movies', 'Action']
    ['Arts & Entertainment', 'Movies', 'Comedy']
    ['Arts & Entertainment', 'Movies', 'Documentary']
    ['Arts & Entertainment', 'Movies', 'Drama']
    ['Arts & Entertainment', 'Movies', 'Horror']
    ['Arts & Entertainment', 'Music']
    ['Arts & Entertainment', 'Music', 'Alternative Music']
    ['Arts & Entertainment', 'Music', 'Blues']
    ['Arts & Entertainment', 'Music', 'Christian Music']
    ['Arts & Entertainment', 'Music', 'Classic Rock']
    ['Arts & Entertainment', 'Music', 'Classical Music']
    ['Arts & Entertainment', 'Music', 'Country Music']
    ['Arts & Entertainment', 'Music', 'Electronic Dance Music']
    ['Arts & Entertainment', 'Music', 'Heavy Metal']
    ['Arts & Entertainment', 'Music', 'Pop Music']
    ['Arts & Entertainment', 'Music', 'Rap']
    ['Arts & Entertainment', 'Radio Stations']
    ['Arts & Entertainment', 'Television']
    ['Arts & Entertainment', 'Television', 'Game Show']
    ['Arts & Entertainment', 'Television', 'Kids']
    ['Arts & Entertainment', 'Television', 'News']
    ['Arts & Entertainment', 'Television', 'Reality']
    ['Arts & Entertainment', 'Television', 'Science']
    ['Arts & Entertainment', 'Television', 'Sitcom']
    ['Arts & Entertainment', 'Television', 'Soap Opera']
    ['Arts & Entertainment', 'Television', 'Talk Show']
    ...

Now I'm trying to convert these lists into a nested dictionary that looks like this:

    {
        "Arts & Entertainment": {
            "Animation & Comics": {},
            "Books & Literature": {},
            "Celebrity Gossip": {},
            "Fine Art": {},
            "Humor": {},
            "Movies": {
                "Horror": {},
                "Action": {},
                "Comedy": {},
                ...
            },
            ...
    }

The problem is that I cannot figure out how to avoid overwriting my subcategories. In the example above, the Movies subcategory has three categories, but when I run my code below it contains only the Horror key, because Horror is the last item in the last list of this category. An example of what I get:

    {
        "Arts & Entertainment": {
            "Animation & Comics": {},
            "Books & Literature": {},
            "Celebrity Gossip": {},
            "Fine Art": {},
            "Humor": {},
            "Movies": {
                "Horror": {}  # notice there are no other categories in the movies section
            },
            ...
    }

The code I tried:

    def cleanup_contextweb():
        contextweb_file_path = directory_path + raw_file_names[1]
        tree = {}
        with open(contextweb_file_path, 'r') as contextweb_file:
            cats = (contextweb_file.read()
                    .replace('Manhattan, New York', 'Manhattan New York')
                    .replace('Las Vegas, Nevada', 'Las Vegas Nevada')
                    .replace('Celebrity/Gossip', 'Celebrity Gossip')
                    .replace('Atheism/Agnosticism', 'Atheism Agnosticism')
                    .replace('Pagan/Wiccan', 'Pagan Wiccan')
                    .split(','))
            #cats = re.sub(r'"|\[|\]', '', cats)
            cats = [map(str.strip, re.sub(r'"|\[|\]', '', cat).split('/')) for cat in cats]
            cats = sorted(cats)
            for cat in cats:
                if len(cat) == 1:
                    tree[cat[0]] = {}
                elif len(cat) == 2:
                    tree[cat[0]][cat[1]] = {}
                elif len(cat) == 3:
                    tree[cat[0]][cat[1]] = {}
                    tree[cat[0]][cat[1]][cat[2]] = {}
                elif len(cat) == 4:
                    tree[cat[0]][cat[1]] = {}
                    tree[cat[0]][cat[1]][cat[2]] = {}
                    tree[cat[0]][cat[1]][cat[2]][cat[3]] = {}
        with open(directory_path + 'cleaned_' + raw_file_names[1], 'w') as contextweb_file_out:
            json.dump(tree, contextweb_file_out, sort_keys=True, indent=4)
        return json.dumps(tree, sort_keys=True, indent=4)

As you can see, I build the dictionary based on the length of each list, since that tells me how deep it goes (how many keys I need). Other things I tried, since deleted, include sorting the list of categories ( cats ) by the length of each sub-list and reversing it, so that all of the 4-element lists would be iterated first. I thought I could build the keys this way, because the keys for the lower levels would then already exist. It did not help.

2 answers

Here's what it looks like with recursion:

    data = [
        ['Arts & Entertainment'],
        ['Arts & Entertainment', 'Animation & Comics'],
        # ... full data list elided for readability
        ['Arts & Entertainment', 'Television', 'Talk Show']
    ]

    def classify(in_list):
        sub_dict = {}
        # every distinct first element becomes a key at this level
        label_set = set([category[0] for category in in_list])
        for label in label_set:
            # print label
            # tails of all paths that start with this label
            sub_category = [sub[1:] for sub in in_list if sub[0] == label and len(sub) > 1]
            # print sub_category
            sub_dict[label] = classify(sub_category)
        return sub_dict

    print(classify(data))

Output (which I did not format for readability):

 {'Arts & Entertainment': {'Celebrity Gossip': {}, 'Humor': {}, 'Television': {'Game Show': {}, 'Kids': {}, 'Science': {}, 'Talk Show': {}, 'Sitcom': {}, 'Reality': {}, 'Soap Opera': {}, 'News': {}}, 'Animation & Comics': {}, 'Movies': {'Action': {}, 'Drama': {}, 'Horror': {}, 'Comedy': {}, 'Documentary': {}}, 'Radio Stations': {}, 'Music': {'Alternative Music': {}, 'Christian Music': {}, 'Electronic Dance Music': {}, 'Pop Music': {}, 'Country Music': {}, 'Classical Music': {}, 'Rap': {}, 'Heavy Metal': {}, 'Blues': {}, 'Classic Rock': {}}, 'Fine Art': {}, 'Books & Literature': {}}} 
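
Note that this recursion rescans in_list once per label. A possible variant, sketched here as an assumption rather than as part of the original answer, groups the tails by their first element in a single pass per level. The name classify_grouped and the small sample list are made up for illustration.

```python
def classify_grouped(paths):
    """Recursively nest lists like ['a', 'b', 'c'] into {'a': {'b': {'c': {}}}}."""
    groups = {}
    for path in paths:
        if not path:
            continue  # nothing left to nest for this path
        # Collect the remainder of each path under its first element.
        groups.setdefault(path[0], []).append(path[1:])
    return {label: classify_grouped(tails) for label, tails in groups.items()}


# A few entries taken from the question's cleaned data.
sample = [
    ['Arts & Entertainment'],
    ['Arts & Entertainment', 'Movies'],
    ['Arts & Entertainment', 'Movies', 'Action'],
    ['Arts & Entertainment', 'Movies', 'Horror'],
]
result = classify_grouped(sample)
print(sorted(result['Arts & Entertainment']['Movies']))  # ['Action', 'Horror']
```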

Actually, a plain for loop also gives a pretty nice solution:

    >>> data
    [['a', 'b', 'c', 'd'], ['a', 'b', 'c'], ['a', 's', 'd'], ['a', 'b', 'c', 'd', 'e']]
    >>> tree = {}
    >>> for cats in data:
    ...     curtree = tree
    ...     for c in cats:
    ...         curtree = curtree.setdefault(c, {})
    ...
    >>> tree
    {'a': {'s': {'d': {}}, 'b': {'c': {'d': {'e': {}}}}}}

The .setdefault() method ensures that a sub-dictionary is added if and only if the key (category) did not exist before.
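
To make the contrast with plain assignment concrete, here is a minimal sketch. The names tree_assign and tree_setdefault are invented for this illustration. Assigning {} replaces whatever sub-dictionary was already stored under the key, which appears to be what empties Movies in the question's code, while .setdefault() hands back the existing one.

```python
# Plain assignment: every iteration resets the 'Movies' sub-dictionary.
tree_assign = {'Movies': {}}
for leaf in ['Action', 'Comedy', 'Horror']:
    tree_assign['Movies'] = {}        # wipes the leaves added earlier
    tree_assign['Movies'][leaf] = {}

# setdefault: the existing sub-dictionary is reused, so leaves accumulate.
tree_setdefault = {}
for leaf in ['Action', 'Comedy', 'Horror']:
    tree_setdefault.setdefault('Movies', {})[leaf] = {}

print(sorted(tree_assign['Movies']))      # ['Horror']
print(sorted(tree_setdefault['Movies']))  # ['Action', 'Comedy', 'Horror']
```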

curtree starts at the root tree dictionary and walks down one level per category, creating any missing sub-dictionary along the way.
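
Applied to the original strings, the same loop could look roughly like the sketch below. It assumes the data file is a valid JSON array (the sample in the question suggests this, but it is not verified here), so json.loads can parse it and the .replace() calls for names containing commas become unnecessary. Splitting on ' / ' with the surrounding spaces also leaves names such as 'Celebrity/Gossip' intact. The function name build_tree, the sep parameter and the raw sample are made up for illustration.

```python
import json


def build_tree(paths, sep=' / '):
    """Turn 'A / B / C' style strings into nested dictionaries."""
    tree = {}
    for path in paths:
        node = tree
        # Split on ' / ' (with spaces) so 'Celebrity/Gossip' stays one name.
        for part in path.split(sep):
            node = node.setdefault(part.strip(), {})
    return tree


# A few entries copied from the question's data file, as a JSON array.
raw = '''["Arts & Entertainment",
          "Arts & Entertainment / Celebrity/Gossip",
          "Arts & Entertainment / Movies",
          "Arts & Entertainment / Movies / Action",
          "Arts & Entertainment / Movies / Horror",
          "Travel / United States / Las Vegas, Nevada"]'''

tree = build_tree(json.loads(raw))
print(json.dumps(tree, sort_keys=True, indent=4))
```

Because each path is walked from the root, the input does not need to be sorted and parent entries such as "Travel" do not have to appear before their children.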


Source: https://habr.com/ru/post/1234049/

