IMDB for MySQL: insert IMDB data into MySQL database

I'm looking for a way to import all IMDb data into my own MySQL database. I've downloaded all the IMDb data files from its home page; they are in the *.list file format (on Windows).

I want to get this information and insert it correctly into my MySQL database so that I can perform some checks and queries.

I followed a guide, but about halfway through I realized that it dates from 2004, and the tools it relied on seven years ago no longer work well today.

I've searched the web for applications, PHP scripts, and Python scripts, but had no luck. The W32 tool referenced by IMDb does not work either.

Is there anyone who knows a solution or a way to accomplish this task?

+6
3 answers

There is a nice Python script that helped me. Just set up the database connection and run it. It takes about 1 hour to get through everything.

EDIT: use this README file to set up the script.

+6

On Ubuntu:

1) Install all the necessary packages.

    sudo apt-get install -y gcc python python-dev libssl-dev libxml2-dev libxslt1-dev zlib1g-dev python-setuptools python-pip
    easy_install -U SQLObject
    pip install MySQL-python

2) Install IMDbPY.

    cd [IMDBPY_parent_directory]
    wget http://prdownloads.sourceforge.net/imdbpy/IMDbPY-5.1.tar.gz
    tar -xzf IMDbPY-5.1.tar.gz
    cd IMDbPY-5.1
    python setup.py install

3) In MySQL, create the database "imdb" and grant all privileges on it to the user "user" with the password "password".

    CREATE DATABASE imdb;
    GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
    FLUSH PRIVILEGES;

4) Download all IMDb data.

    mkdir [imdb_data_directory]
    cd [imdb_data_directory]
    wget -r --accept="*.gz" --no-directories --no-host-directories --level 1 ftp://ftp.fu-berlin.de/pub/misc/movies/database/

5) Load the IMDb data into MySQL (using MyISAM as the storage engine).

    cd [IMDBPY_parent_directory]/IMDbPY-5.1/bin
    python imdbpy2sql.py -d [imdb_data_directory] -u 'mysql://user:password@localhost/imdb' --mysql-force-myisam
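
The URI passed with -u must not contain stray spaces, and a password containing characters such as '@' or '/' would otherwise break it. Below is a minimal sketch of assembling the URI in Python. The helper name build_db_uri is hypothetical and not part of IMDbPY, and it assumes the database layer in use decodes percent-encoded credentials:

```python
from urllib.parse import quote


def build_db_uri(user, password, host, database, scheme="mysql"):
    """Assemble a scheme://user:password@host/database URI.

    Hypothetical helper, not part of IMDbPY. User and password are
    percent-encoded so characters such as '@' or '/' cannot break the URI.
    """
    return "{}://{}:{}@{}/{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""), host, database
    )


if __name__ == "__main__":
    # Prints the value to pass to imdbpy2sql.py via -u.
    print(build_db_uri("user", "password", "localhost", "imdb"))
```

For the example credentials above, the output is the same URI as in the command line of step 5.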

Adapted from Import IMDb Data from Plain Text Files to MySQL Database with some minor fixes.

+1

Changes in IMDbPY and IMDb file format mean that existing answers no longer work (as of January 2018).

I am using Ubuntu 17.10 and MariaDB 10.1 (not MySQL, but the following should work with MySQL as well).

Changes to IMDbPY

The latest version of IMDbPY is 6.2. It is implemented in Python 3, and the dependencies on gcc and SQLObject have been removed. In addition, the Python package MySQL-python is not available for Python 3, so we install mysqlclient instead (see below). (The mysqlclient API is compatible with MySQL-python.)

Changes to the IMDb Data File Format

Changes in the format of the IMDb data files were introduced in December 2017, and IMDbPY 6.2 (the current version) does not yet work with the new file format. (See this GitHub issue.)

Until this is fixed, use the latest version of the IMDb data published in the old format, which is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/. Download all *.list.gz files (excluding files from subdirectories).
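
After downloading, it can help to confirm that the data directory holds the *.list.gz files at its top level and that nothing from a subdirectory was picked up. A minimal sketch, where the function name find_list_files is hypothetical:

```python
from pathlib import Path


def find_list_files(data_dir):
    """Return the sorted names of *.list.gz files directly inside data_dir.

    Path.glob without '**' does not descend into subdirectories, which
    matches the advice to skip files from subdirectories.
    """
    return sorted(
        p.name for p in Path(data_dir).glob("*.list.gz") if p.is_file()
    )
```

Calling find_list_files on the download directory returns the file names that imdbpy2sql.py would be pointed at with -d. An empty list suggests the download went somewhere else.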

New steps to complete

  • Install Python 3 and the necessary packages:

     sudo apt install python3
     pip3 install mysqlclient

  • In MariaDB, create the imdb database and grant all privileges on it to the user user with the password password:

     CREATE DATABASE imdb;
     GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
     FLUSH PRIVILEGES;

  • Get IMDbPY 6.2:

     wget https://github.com/alberanid/imdbpy/archive/6.2.zip
     unzip 6.2.zip
     cd imdbpy-6.2
     python3 setup.py install

  • Load the IMDb data into MariaDB:

     cd bin
     python3 imdbpy2sql.py -d [imdb_dataset_directory] -u 'mysql://user:password@localhost/imdb'
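
Once the import finishes, a quick sanity check is to count the rows in a few tables. The sketch below works with any DB-API 2.0 cursor. The function names are hypothetical, and the table names title, name and cast_info are an assumption about the schema that imdbpy2sql.py creates, so adjust them to whatever SHOW TABLES reports:

```python
import re


def count_rows(cursor, tables):
    """Return {table: row count} using any DB-API 2.0 cursor.

    Table names cannot be bound as query parameters, so each one is
    checked against a strict identifier pattern before interpolation.
    """
    counts = {}
    for table in tables:
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
            raise ValueError("unexpected table name: {!r}".format(table))
        cursor.execute("SELECT COUNT(*) FROM {}".format(table))
        counts[table] = cursor.fetchone()[0]
    return counts


def count_imdb_rows():
    """Count rows in a few tables of the imdb database (names assumed)."""
    import MySQLdb  # provided by the mysqlclient package

    # Credentials match the example GRANT statement; adjust as needed.
    conn = MySQLdb.connect(
        host="localhost", user="user", passwd="password", db="imdb"
    )
    try:
        return count_rows(conn.cursor(), ["title", "name", "cast_info"])
    finally:
        conn.close()
```

Calling count_imdb_rows() against the freshly loaded database should return non-zero counts if the import worked.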

Edit: IMDbPY 6.2 does not create foreign keys. See this GitHub issue. You will need to use an earlier version of IMDbPY if you need foreign keys to be created, but there are also problems with generating foreign keys in older versions (see the related GitHub issue).

Update: It took 4.5 hours to import, and I had no problems using InnoDB tables.

Edit: if you want to use IMDbPY 6.2 and require foreign keys, you will need to add them manually to the database after it is created. A small amount of data cleanup is required before the foreign keys can be added. This cleanup and the foreign keys that need to be added are described in this GitHub issue.
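
The exact cleanup and key list are in the linked issue and are not reproduced here. As a rough illustration of the general pattern only, the sketch below builds a query that counts orphaned rows and the matching ALTER TABLE statement. The function names are hypothetical, and the example relationship cast_info.movie_id referencing title.id is an assumption about the schema:

```python
def orphan_count_sql(child, column, parent, parent_key="id"):
    """SQL counting rows in child whose column matches no row in parent."""
    return (
        "SELECT COUNT(*) FROM {c} LEFT JOIN {p} ON {c}.{col} = {p}.{k} "
        "WHERE {c}.{col} IS NOT NULL AND {p}.{k} IS NULL"
    ).format(c=child, col=column, p=parent, k=parent_key)


def add_foreign_key_sql(child, column, parent, parent_key="id"):
    """SQL adding a foreign key from child.column to parent.parent_key."""
    return (
        "ALTER TABLE {c} ADD FOREIGN KEY ({col}) REFERENCES {p} ({k})"
    ).format(c=child, col=column, p=parent, k=parent_key)


if __name__ == "__main__":
    # Assumed relationship, for illustration only.
    print(orphan_count_sql("cast_info", "movie_id", "title"))
    print(add_foreign_key_sql("cast_info", "movie_id", "title"))
```

The orphan count has to be zero before the ALTER TABLE can succeed, which is why the cleanup comes first. Foreign keys are only enforced on InnoDB tables.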

0

Source: https://habr.com/ru/post/896210/

