IMDb Top 250

This scraper collects the IMDb Top 250 chart from http://akas.imdb.com/chart/top

The resulting database includes three tables:

  • data contains the most recently scraped top 250,
  • scraper_run maps the time of each scrape to a run id,
  • movie_rating lists every ranking ever captured, tagged with its run id.
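
To make the relationship between the three tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names come from the description above. Every column name (run_id, scraped_at, position, title, rating) and both helper functions are assumptions made for illustration, not the scraper's actual schema or code.

```python
import sqlite3

# Hypothetical schema: only the three table names are taken from the
# description; all column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scraper_run (
    run_id INTEGER PRIMARY KEY,
    scraped_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS movie_rating (
    run_id INTEGER NOT NULL REFERENCES scraper_run(run_id),
    position INTEGER NOT NULL,
    title TEXT NOT NULL,
    rating REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS data (
    position INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    rating REAL NOT NULL
);
"""


def record_run(conn, scraped_at, rows):
    """Store one scrape: log the run, append to history, refresh the snapshot.

    `rows` is a list of (position, title, rating) tuples.
    """
    cur = conn.execute(
        "INSERT INTO scraper_run (scraped_at) VALUES (?)", (scraped_at,)
    )
    run_id = cur.lastrowid
    # movie_rating only ever grows: one row per movie per run.
    conn.executemany(
        "INSERT INTO movie_rating (run_id, position, title, rating) "
        "VALUES (?, ?, ?, ?)",
        [(run_id, pos, title, rating) for pos, title, rating in rows],
    )
    # data is replaced wholesale so it always holds the latest chart.
    conn.execute("DELETE FROM data")
    conn.executemany(
        "INSERT INTO data (position, title, rating) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return run_id


def position_history(conn, title):
    """Return (scraped_at, position) pairs for one title, oldest run first."""
    return conn.execute(
        "SELECT r.scraped_at, m.position FROM movie_rating AS m "
        "JOIN scraper_run AS r ON r.run_id = m.run_id "
        "WHERE m.title = ? ORDER BY r.run_id",
        (title,),
    ).fetchall()
```

Under these assumptions, joining movie_rating to scraper_run on the run id is what turns the accumulated rankings into a time series for a single movie, while data can be read directly when only the latest chart is needed.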

Injecting configuration and compiling...
Injecting scraper and running...
Traceback (most recent call last):
  File "scraper.py", line 102, in <module>
    main()
  File "scraper.py", line 89, in main
    session = _get_db_session()
  File "scraper.py", line 47, in _get_db_session
    Base.metadata.create_all(engine)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/schema.py", line 2148, in create_all
    bind.create(self, checkfirst=checkfirst, tables=tables)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1698, in create
    connection=connection, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1740, in _run_visitor
    **kwargs).traverse_single(element)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 83, in traverse_single
    return meth(obj, **kw)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/ddl.py", line 36, in visit_metadata
    collection = [t for t in sql_util.sort_tables(tables) if self._can_create(t)]
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/ddl.py", line 29, in _can_create
    return not self.checkfirst or not self.dialect.has_table(self.connection, table.name, schema=table.schema)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/dialects/sqlite/base.py", line 427, in has_table
    cursor = _pragma_cursor(connection.execute("%stable_info(%s)" % (pragma, qtable)))
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1191, in execute
    params)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1287, in _execute_text
    return self.__execute_context(context)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1302, in __execute_context
    context.parameters[0], context=context)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1401, in _cursor_execute
    context)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1394, in _cursor_execute
    context)
  File "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 299, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (DatabaseError) database disk image is malformed 'PRAGMA table_info("data")' ()
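
The final line of the log is SQLite reporting that the database file is corrupted, and the failure happens as soon as SQLAlchemy checks whether the data table exists. One way such a condition could be detected before a run is SQLite's own PRAGMA integrity_check. The sketch below uses Python's built-in sqlite3 module; the function name database_is_healthy and the idea of calling it before create_all are assumptions for illustration, not part of the scraper.

```python
import sqlite3


def database_is_healthy(path):
    """Return True if SQLite's integrity check passes for the file at `path`."""
    try:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised when the file cannot be read as a database at all,
        # for example "database disk image is malformed".
        return False
    # A healthy database yields exactly one row containing the string "ok".
    return rows == [("ok",)]
```

A caller could use the result to decide whether to restore the file from a backup or start from an empty database instead of crashing during table creation. The check only reports corruption; it does not repair anything.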


Average successful run time: 27 minutes

Total run time: 13 days

Total CPU time used: 3 days

Total disk space used: 11.5 MB


