walinchus / Chapter_23

Basics of PDF scraping


Contributors walinchus

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
The pdf file has 92908 bytes
After converting to xml it has 15093 bytes
The first 2000 characters are:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml producer="poppler" version="0.24.5">
<page number="1" position="absolute" top="0" left="0" height="1262" width="892">
<fontspec id="0" size="16" family="Times" color="#000000"/>
<fontspec id="1" size="10" family="Times" color="#000000"/>
<fontspec id="2" size="16" family="Times" color="#000000"/>
<fontspec id="3" size="12" family="Times" color="#000000"/>
<fontspec id="4" size="8" family="Times" color="#000000"/>
<text top="94" left="504" width="5" height="16" font="0"><b> </b></text>
<text top="52" left="797" width="3" height="14" font="1"><b> </b></text>
<text top="65" left="728" width="74" height="21" font="0"><b>H/16–17 </b></text>
<text top="86" left="699" width="103" height="21" font="0"><b>1st Meeting </b></text>
<text top="107" left="797" width="5" height="21" font="0"><b> </b></text>
<text top="134" left="108" width="5" height="16" font="2"> </text>
<text top="1193" left="108" width="5" height="16" font="2"> </text>
<text top="148" left="352" width="194" height="21" font="0"><b>HOUSE COMMITTEE </b></text>
<text top="169" left="446" width="5" height="21" font="0"><b> </b></text>
<text top="190" left="410" width="16" height="21" font="0"><b>M</b></text>
<text top="193" left="426" width="56" height="16" font="3"><b>INUTES</b></text>
<text top="190" left="483" width="5" height="21" font="0"><b> </b></text>
<text top="211" left="108" width="5" height="21" font="0"><b> </b></text>
<text top="232" left="395" width="108" height="21" font="0"><b>29 June 2016 </b></text>
<text top="253" left="108" width="5" height="21" font="2"> </text>
<text top="273" left="108" width="64" height="21" font="2">Present: </text>
<text top="294" left="108" width="154" height="21" font="2">L. Campbell-Savours </text>
<text top="315" left="108" width="172" height="21" font="2">B. D’Souza (Chairman) </text>
<text top="336" left="108" width="162" height="21" font="2">L. Hope of Craighead </tex
The pages are numbered: ['1', '2', '3']
{'left': '504', 'font': '0', 'height': '16', 'top': '94', 'width': '5'} <b> </b>
{'left': '797', 'font': '1', 'height': '14', 'top': '52', 'width': '3'} <b> </b>
{'left': '728', 'font': '0', 'height': '21', 'top': '65', 'width': '74'} <b>H/1617 </b>
{'left': '699', 'font': '0', 'height': '21', 'top': '86', 'width': '103'} <b>1st Meeting </b>
{'left': '797', 'font': '0', 'height': '21', 'top': '107', 'width': '5'} <b> </b>
{'left': '108', 'font': '2', 'height': '16', 'top': '134', 'width': '5'}
{'left': '108', 'font': '2', 'height': '16', 'top': '1193', 'width': '5'}
{'left': '352', 'font': '0', 'height': '21', 'top': '148', 'width': '194'} <b>HOUSE COMMITTEE </b>
{'left': '446', 'font': '0', 'height': '21', 'top': '169', 'width': '5'} <b> </b>
{'left': '410', 'font': '0', 'height': '21', 'top': '190', 'width': '16'} <b>M</b>
{'left': '426', 'font': '3', 'height': '16', 'top': '193', 'width': '56'} <b>INUTES</b>
{'left': '483', 'font': '0', 'height': '21', 'top': '190', 'width': '5'} <b> </b>
{'left': '108', 'font': '0', 'height': '21', 'top': '211', 'width': '5'} <b> </b>
{'left': '395', 'font': '0', 'height': '21', 'top': '232', 'width': '108'} <b>29 June 2016 </b>
{'left': '108', 'font': '2', 'height': '21', 'top': '253', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '273', 'width': '64'} Present:
{'left': '108', 'font': '2', 'height': '21', 'top': '294', 'width': '154'} L. Campbell-Savours
{'left': '108', 'font': '2', 'height': '21', 'top': '315', 'width': '172'} B. DSouza (Chairman)
{'left': '108', 'font': '2', 'height': '21', 'top': '336', 'width': '162'} L. Hope of Craighead
{'left': '108', 'font': '2', 'height': '21', 'top': '357', 'width': '173'} L. Hunt of Kings Heath
{'left': '108', 'font': '2', 'height': '21', 'top': '378', 'width': '74'} L. Laming
{'left': '108', 'font': '2', 'height': '21', 'top': '399', 'width': '89'} B. Manzoor
{'left': '108', 'font': '2', 'height': '21', 'top': '420', 'width': '161'} B. Stowell of Beeston
{'left': '108', 'font': '2', 'height': '21', 'top': '441', 'width': '97'} L. Wakeham
{'left': '108', 'font': '2', 'height': '21', 'top': '461', 'width': '189'} L. Wallace of Tankerness
{'left': '108', 'font': '2', 'height': '21', 'top': '482', 'width': '34'} -----
{'left': '108', 'font': '2', 'height': '21', 'top': '503', 'width': '151'} B. Cohen of Pimlico
{'left': '108', 'font': '2', 'height': '21', 'top': '524', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '545', 'width': '321'} together with the Clerk of the Parliaments.
{'left': '108', 'font': '2', 'height': '21', 'top': '566', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '587', 'width': '598'} The Finance Director and the Director of Public Information were in attendance.
{'left': '108', 'font': '2', 'height': '21', 'top': '607', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '628', 'width': '416'} Apologies: L. Cope of Berkeley, B. McDonagh, L. Stirrup
{'left': '108', 'font': '2', 'height': '21', 'top': '649', 'width': '5'}
{'left': '108', 'font': '0', 'height': '21', 'top': '670', 'width': '205'} <b>1. MINUTES OF THE 6</b>
{'left': '313', 'font': '4', 'height': '12', 'top': '670', 'width': '16'} <b>TH</b>
{'left': '329', 'font': '0', 'height': '21', 'top': '670', 'width': '379'} <b> MEETING OF THE 2015-16 SESSION AND </b>
{'left': '135', 'font': '0', 'height': '21', 'top': '691', 'width': '179'} <b>MATTERS ARISING </b>
{'left': '108', 'font': '2', 'height': '21', 'top': '712', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '733', 'width': '391'} The Committee <b>agreed</b>
{'left': '108', 'font': '2', 'height': '21', 'top': '754', 'width': '5'}
{'left': '108', 'font': '0', 'height': '21', 'top': '774', 'width': '675'} <b>2. IMPLEMENTING THE RECOMMENDATIONS OF THE LEADERS GROUP </b>
{'left': '135', 'font': '0', 'height': '21', 'top': '795', 'width': '177'} <b>ON GOVERNANCE </b>
{'left': '108', 'font': '2', 'height': '21', 'top': '816', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '837', 'width': '253'} The Leader introduced the paper.
{'left': '108', 'font': '2', 'height': '21', 'top': '858', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '879', 'width': '658'} The Committee considered whether the Senior Deputy Speaker or chairs of the Services
{'left': '108', 'font': '2', 'height': '21', 'top': '900', 'width': '611'} Committee would answer oral questions in the Chamber. The Leader clarified that
{'left': '108', 'font': '2', 'height': '21', 'top': '921', 'width': '666'} responding to written questions and questions for short debate would be delegated to the
{'left': '108', 'font': '2', 'height': '21', 'top': '942', 'width': '673'} appropriate chair, but she expected that only the Senior Deputy Speaker would respond to
{'left': '108', 'font': '2', 'height': '21', 'top': '962', 'width': '289'} oral questions from the Despatch Box.
{'left': '108', 'font': '2', 'height': '21', 'top': '983', 'width': '5'}
{'left': '108', 'font': '2', 'height': '21', 'top': '1004', 'width': '641'} The Committee discussed remuneration for the chair of the Services Committee. Lord
{'left': '108', 'font': '2', 'height': '21', 'top': '1025', 'width': '629'} Campbell-Savours suggested that the post was effectively full-time and so it should be
{'left': '108', 'font': '2', 'height': '21', 'top': '1046', 'width': '592'} remunerated. Other members noted that chairs of Select Committees were not
{'left': '108', 'font': '2', 'height': '21', 'top': '1067', 'width': '623'} remunerated; chairing the Services Committee was thought to be an equivalent role.
{'left': '108', 'font': '2', 'height': '21', 'top': '1088', 'width': '669'} Primary legislation would be required if the post were to be remunerated. The Committee
{'left': '108', 'font': '2', 'height': '21', 'top': '1109', 'width': '329'} agreed the post should not be remunerated.
{'left': '108', 'font': '2', 'height': '21', 'top': '1130', 'width': '5'}
fontspec {'ID': 1, 'text': ''}
fontspec {'ID': 2, 'text': ''}
fontspec {'ID': 3, 'text': ''}
fontspec {'ID': 4, 'text': ''}
fontspec {'ID': 5, 'text': ''}
text {'left': '504', 'font': '0', 'height': '16', 'top': '94', 'width': '5'} <b> </b> {'ID': 6, 'text': '<b> </b>\n'}
text {'left': '797', 'font': '1', 'height': '14', 'top': '52', 'width': '3'} <b> </b> {'ID': 7, 'text': '<b> </b>\n'}
text {'left': '728', 'font': '0', 'height': '21', 'top': '65', 'width': '74'} <b>H/1617 </b> {'ID': 8, 'text': '<b>H/1617 </b>\n'}
text {'left': '699', 'font': '0', 'height': '21', 'top': '86', 'width': '103'} <b>1st Meeting </b> {'ID': 9, 'text': '<b>1st Meeting </b>\n'}
text {'left': '797', 'font': '0', 'height': '21', 'top': '107', 'width': '5'} <b> </b> {'ID': 10, 'text': '<b> </b>\n'}
text {'left': '108', 'font': '2', 'height': '16', 'top': '134', 'width': '5'} {'ID': 11, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '16', 'top': '1193', 'width': '5'} {'ID': 12, 'text': ' '}
text {'left': '352', 'font': '0', 'height': '21', 'top': '148', 'width': '194'} <b>HOUSE COMMITTEE </b> {'ID': 13, 'text': '<b>HOUSE COMMITTEE </b>\n'}
text {'left': '446', 'font': '0', 'height': '21', 'top': '169', 'width': '5'} <b> </b> {'ID': 14, 'text': '<b> </b>\n'}
text {'left': '410', 'font': '0', 'height': '21', 'top': '190', 'width': '16'} <b>M</b> {'ID': 15, 'text': '<b>M</b>\n'}
text {'left': '426', 'font': '3', 'height': '16', 'top': '193', 'width': '56'} <b>INUTES</b> {'ID': 16, 'text': '<b>INUTES</b>\n'}
text {'left': '483', 'font': '0', 'height': '21', 'top': '190', 'width': '5'} <b> </b> {'ID': 17, 'text': '<b> </b>\n'}
text {'left': '108', 'font': '0', 'height': '21', 'top': '211', 'width': '5'} <b> </b> {'ID': 18, 'text': '<b> </b>\n'}
text {'left': '395', 'font': '0', 'height': '21', 'top': '232', 'width': '108'} <b>29 June 2016 </b> {'ID': 19, 'text': '<b>29 June 2016 </b>\n'}
text {'left': '108', 'font': '2', 'height': '21', 'top': '253', 'width': '5'} {'ID': 20, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '273', 'width': '64'} Present: {'ID': 21, 'text': 'Present: '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '294', 'width': '154'} L. Campbell-Savours {'ID': 22, 'text': 'L. Campbell-Savours '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '315', 'width': '172'} B. DSouza (Chairman) {'ID': 23, 'text': 'B. DSouza (Chairman) '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '336', 'width': '162'} L. Hope of Craighead {'ID': 24, 'text': 'L. Hope of Craighead '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '357', 'width': '173'} L. Hunt of Kings Heath {'ID': 25, 'text': 'L. Hunt of Kings Heath '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '378', 'width': '74'} L. Laming {'ID': 26, 'text': 'L. Laming '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '399', 'width': '89'} B. Manzoor {'ID': 27, 'text': 'B. Manzoor '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '420', 'width': '161'} B. Stowell of Beeston {'ID': 28, 'text': 'B. Stowell of Beeston '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '441', 'width': '97'} L. Wakeham {'ID': 29, 'text': 'L. Wakeham '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '461', 'width': '189'} L. Wallace of Tankerness {'ID': 30, 'text': 'L. Wallace of Tankerness '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '482', 'width': '34'} ----- {'ID': 31, 'text': '----- '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '503', 'width': '151'} B. Cohen of Pimlico {'ID': 32, 'text': 'B. Cohen of Pimlico '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '524', 'width': '5'} {'ID': 33, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '545', 'width': '321'} together with the Clerk of the Parliaments. {'ID': 34, 'text': 'together with the Clerk of the Parliaments. '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '566', 'width': '5'} {'ID': 35, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '587', 'width': '598'} The Finance Director and the Director of Public Information were in attendance. {'ID': 36, 'text': 'The Finance Director and the Director of Public Information were in attendance. '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '607', 'width': '5'} {'ID': 37, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '628', 'width': '416'} Apologies: L. Cope of Berkeley, B. McDonagh, L. Stirrup {'ID': 38, 'text': 'Apologies: L. Cope of Berkeley, B. McDonagh, L. Stirrup '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '649', 'width': '5'} {'ID': 39, 'text': ' '}
text {'left': '108', 'font': '0', 'height': '21', 'top': '670', 'width': '205'} <b>1. MINUTES OF THE 6</b> {'ID': 40, 'text': '<b>1. MINUTES OF THE 6</b>\n'}
text {'left': '313', 'font': '4', 'height': '12', 'top': '670', 'width': '16'} <b>TH</b> {'ID': 41, 'text': '<b>TH</b>\n'}
text {'left': '329', 'font': '0', 'height': '21', 'top': '670', 'width': '379'} <b> MEETING OF THE 2015-16 SESSION AND </b> {'ID': 42, 'text': '<b> MEETING OF THE 2015-16 SESSION AND </b>\n'}
text {'left': '135', 'font': '0', 'height': '21', 'top': '691', 'width': '179'} <b>MATTERS ARISING </b> {'ID': 43, 'text': '<b>MATTERS ARISING </b>\n'}
text {'left': '108', 'font': '2', 'height': '21', 'top': '712', 'width': '5'} {'ID': 44, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '733', 'width': '391'} The Committee <b>agreed</b> {'ID': 45, 'text': 'The Committee <b>agreed</b>\n'}
text {'left': '108', 'font': '2', 'height': '21', 'top': '754', 'width': '5'} {'ID': 46, 'text': ' '}
text {'left': '108', 'font': '0', 'height': '21', 'top': '774', 'width': '675'} <b>2. IMPLEMENTING THE RECOMMENDATIONS OF THE LEADERS GROUP </b> {'ID': 47, 'text': '<b>2. IMPLEMENTING THE RECOMMENDATIONS OF THE LEADERS GROUP </b>\n'}
text {'left': '135', 'font': '0', 'height': '21', 'top': '795', 'width': '177'} <b>ON GOVERNANCE </b> {'ID': 48, 'text': '<b>ON GOVERNANCE </b>\n'}
text {'left': '108', 'font': '2', 'height': '21', 'top': '816', 'width': '5'} {'ID': 49, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '837', 'width': '253'} The Leader introduced the paper. {'ID': 50, 'text': 'The Leader introduced the paper. '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '858', 'width': '5'} {'ID': 51, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '879', 'width': '658'} The Committee considered whether the Senior Deputy Speaker or chairs of the Services {'ID': 52, 'text': 'The Committee considered whether the Senior Deputy Speaker or chairs of the Services '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '900', 'width': '611'} Committee would answer oral questions in the Chamber. The Leader clarified that {'ID': 53, 'text': 'Committee would answer oral questions in the Chamber. The Leader clarified that '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '921', 'width': '666'} responding to written questions and questions for short debate would be delegated to the {'ID': 54, 'text': 'responding to written questions and questions for short debate would be delegated to the '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '942', 'width': '673'} appropriate chair, but she expected that only the Senior Deputy Speaker would respond to {'ID': 55, 'text': 'appropriate chair, but she expected that only the Senior Deputy Speaker would respond to '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '962', 'width': '289'} oral questions from the Despatch Box. {'ID': 56, 'text': 'oral questions from the Despatch Box. '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '983', 'width': '5'} {'ID': 57, 'text': ' '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1004', 'width': '641'} The Committee discussed remuneration for the chair of the Services Committee. Lord {'ID': 58, 'text': 'The Committee discussed remuneration for the chair of the Services Committee. Lord '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1025', 'width': '629'} Campbell-Savours suggested that the post was effectively full-time and so it should be {'ID': 59, 'text': 'Campbell-Savours suggested that the post was effectively full-time and so it should be '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1046', 'width': '592'} remunerated. Other members noted that chairs of Select Committees were not {'ID': 60, 'text': 'remunerated. Other members noted that chairs of Select Committees were not '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1067', 'width': '623'} remunerated; chairing the Services Committee was thought to be an equivalent role. {'ID': 61, 'text': 'remunerated; chairing the Services Committee was thought to be an equivalent role. '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1088', 'width': '669'} Primary legislation would be required if the post were to be remunerated. The Committee {'ID': 62, 'text': 'Primary legislation would be required if the post were to be remunerated. The Committee '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1109', 'width': '329'} agreed the post should not be remunerated. {'ID': 63, 'text': 'agreed the post should not be remunerated. '}
text {'left': '108', 'font': '2', 'height': '21', 'top': '1130', 'width': '5'} {'ID': 64, 'text': ' '}
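The console output follows a common pattern for pdf2xml data: list the pages, then walk each page's text elements and print their position attributes alongside their content, keeping inline <b> tags. The scraper's own code is not shown on this page, so the following is only a minimal sketch of that walk using Python's standard library. The function names and the last line of the sample are made up for illustration, and the PDF-to-XML conversion step is assumed to have already happened.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the pdf2xml layout seen in the console output.
# The last <text> line is invented to show how trailing text is kept.
SAMPLE_XML = """<pdf2xml producer="poppler" version="0.24.5">
<page number="1" position="absolute" top="0" left="0" height="1262" width="892">
<fontspec id="0" size="16" family="Times" color="#000000"/>
<text top="148" left="352" width="194" height="21" font="0"><b>HOUSE COMMITTEE </b></text>
<text top="273" left="108" width="64" height="21" font="2">Present: </text>
<text top="300" left="108" width="90" height="21" font="2">plain <b>bold</b> tail</text>
</page>
</pdf2xml>"""


def inner_markup(el):
    """Return an element's content, keeping inline child tags as text."""
    parts = [el.text or ""]
    for child in el:
        # tostring() serialises the child together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def iter_text_elements(xml_string):
    """Yield (page number, attribute dict, content) for every <text> element."""
    root = ET.fromstring(xml_string)
    for page in root.iter("page"):
        for el in page.iter("text"):
            yield page.get("number"), dict(el.attrib), inner_markup(el)


if __name__ == "__main__":
    for page_number, attrib, content in iter_text_elements(SAMPLE_XML):
        print(page_number, attrib, content)
```

The top and left attributes are what make later layout work possible: fragments such as <b>M</b> and <b>INUTES</b> in the output sit at nearly the same top value, which is a hint that they belong to one visual line.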

Data


Showing rows 10 / 64

text                    ID
(empty)                 1
(empty)                 2
(empty)                 3
(empty)                 4
(empty)                 5
<b> </b>                6
<b> </b>                7
<b>H/1617 </b>          8
<b>1st Meeting </b>     9
<b> </b>                10
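Each row in the table pairs a running ID with the text of one element; the first five rows are empty because they correspond to the fontspec elements, which carry no text. As a rough sketch of how such records could be stored, the example below uses Python's built-in sqlite3 module with ID as the primary key. The table name `data`, the helper `save_records`, and the in-memory database are assumptions for illustration, not the scraper's actual storage code.

```python
import sqlite3


def save_records(conn, records):
    """Insert or overwrite records keyed on ID (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data (ID INTEGER PRIMARY KEY, text TEXT)"
    )
    # INSERT OR REPLACE overwrites a row whose ID already exists,
    # so running the scraper twice does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO data (ID, text) VALUES (:ID, :text)", records
    )
    conn.commit()


# A few records copied from the console output above.
records = [
    {"ID": 1, "text": ""},
    {"ID": 6, "text": "<b> </b>\n"},
    {"ID": 8, "text": "<b>H/1617 </b>\n"},
    {"ID": 9, "text": "<b>1st Meeting </b>\n"},
]

conn = sqlite3.connect(":memory:")  # assumption: in-memory database for the demo
save_records(conn, records)
save_records(conn, records)  # second run replaces rows instead of adding new ones
rows = conn.execute("SELECT ID, text FROM data ORDER BY ID").fetchall()
print(rows)
```

Keying on ID keeps reruns idempotent, but it also means the IDs must be assigned in a stable order; if the element order changes between runs, old rows are silently overwritten with different content.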

Statistics

Average successful run time: less than 10 seconds

Total run time: less than 20 seconds

Total CPU time used: less than 5 seconds

Total disk space used: 29.3 KB

History

  • Manually ran revision 10d4e4c4 and completed successfully.
    64 records added, 7 records removed in the database
  • Manually ran revision b3d1a741 and failed.
    7 records added in the database
  • Manually ran revision 3272ead0 and failed.
    nothing changed in the database
  • Created on morph.io

Scraper code

Chapter_23