ylchan87 / HKCourtList

HK Court


This is a scraper that runs on Morph. To get started see the documentation

HK Court Lists Archive

What the code does

This repo scrapes Hong Kong court case listings from https://e-services.judiciary.hk/dcl/index.jsp?lang=en

This repository is a prototype for one of the projects pitched at the first g0vhk.io Hackathon, held on 2018-06-23 in Hong Kong. The main document, written by the project owner, Selina, lists the details in this Hackpad. Please read it.

The scraped data can be found at https://morph.io/ylchan87/HKCourtList

Files

scraper.py

The scraper. It is called by morph.io, makes the HTTP requests, and parses the replies with courtParser.py.
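As an illustration of the request step, here is a minimal sketch of how a per-court, per-day URL could be built. The URL shape (lang, date as DDMMYYYY, court code) is taken from the "Parsing ..." lines in the console output on this page; the function name build_list_url is hypothetical and is not claimed to exist in scraper.py.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint as it appears in the console output; treated here as an assumption.
BASE_URL = "https://e-services.judiciary.hk/dcl/view.jsp"


def build_list_url(day, court, lang="tc"):
    """Return the daily list URL for one court code on one date (hypothetical helper)."""
    query = urlencode({
        "lang": lang,
        "date": day.strftime("%d%m%Y"),  # e.g. 14052022 for 2022-05-14
        "court": court,
    })
    return f"{BASE_URL}?{query}"


print(build_list_url(date(2022, 5, 14), "BP"))
```

The resulting string can then be passed to an HTTP client such as requests, which is listed in the installed requirements.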

courtParser.py

Parses the HTML to extract the fields and saves them to the database, as defined by dataModel.py.

dataModel.py

The SQLAlchemy data model for the court cases.

extractor.py

Utility that explodes an HTML table into a Python list of lists (i.e. a 2D array).
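To make "explode" concrete, here is a minimal sketch of turning an HTML table into a rectangular list of lists. It assumes that exploding means copying a cell's text into every grid position covered by its rowspan and colspan; that reading, the name explode_table, and the use of the standard-library html.parser are assumptions for illustration, not a description of extractor.py. Nested tables are not handled.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect (text, rowspan, colspan) for every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def explode_table(html):
    """Return the table as a 2D list, repeating spanned cells in each slot."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample = """<table>
<tr><th rowspan="2">Court</th><th>Time</th><th>Case</th></tr>
<tr><td colspan="2">adjourned</td></tr>
</table>"""
print(explode_table(sample))
```

With every row padded to the same width, downstream parsing can address fields by column index without special-casing merged cells.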

testTableExtract.py

Test script for extractor.py.

About the data model

Event

A row in the court's timetable. This is the "main" table of the SQL database.

Case

The case number uniquely identifies a case. A case can have many events when there are multiple hearings or trials, and an event can deal with multiple cases at the same time.

many-to-many relationship with event

Judge

many-to-many relationship with event

Lawyer

many-to-many relationship with event

Tag

Can be either (1) the nature of the offence for the event (theft, robbery, etc.) or (2) the procedure type of the event (trial, hearing, mention, summons, etc.).

many-to-many relationship with event
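The many-to-many links described above are typically stored in association tables holding one row per pair of ids. The sketch below shows the idea for Event and Case using the standard-library sqlite3 module rather than the project's SQLAlchemy models. The column names (id, category, court, datetime, caseNo, event_id, case_id) follow the data shown on this page; the table names cases and events_cases, the helper functions, and all inserted rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, category TEXT, court TEXT, datetime TEXT);
CREATE TABLE cases  (id INTEGER PRIMARY KEY, caseNo TEXT UNIQUE);
-- association table: one row per (event, case) pair
CREATE TABLE events_cases (
    event_id INTEGER REFERENCES events(id),
    case_id  INTEGER REFERENCES cases(id),
    PRIMARY KEY (event_id, case_id)
);
""")

# Made-up rows: event 1 hears two cases, and case DEMO1/2019 has two events.
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "DEMO", "No.1", "2019-03-18 09:30:00"),
    (2, "DEMO", "No.1", "2019-03-25 09:30:00"),
])
conn.executemany("INSERT INTO cases VALUES (?, ?)", [(1, "DEMO1/2019"), (2, "DEMO2/2019")])
conn.executemany("INSERT INTO events_cases VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])
conn.commit()


def events_for_case(conn, case_no):
    """Return the ids of all events linked to one case number."""
    rows = conn.execute(
        "SELECT e.id FROM events e "
        "JOIN events_cases ec ON ec.event_id = e.id "
        "JOIN cases c ON c.id = ec.case_id "
        "WHERE c.caseNo = ? ORDER BY e.id",
        (case_no,),
    ).fetchall()
    return [r[0] for r in rows]


def cases_for_event(conn, event_id):
    """Return the case numbers heard in one event."""
    rows = conn.execute(
        "SELECT c.caseNo FROM cases c "
        "JOIN events_cases ec ON ec.case_id = c.id "
        "WHERE ec.event_id = ? ORDER BY c.caseNo",
        (event_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(events_for_case(conn, "DEMO1/2019"))
print(cases_for_event(conn, 1))
```

The same pattern would apply to Judge, Lawyer, and Tag, each with its own association table keyed on event_id.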

Contributors ylchan87

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
-----> Installing python-3.6.2
-----> Installing pip
-----> Installing requirements with pip
Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 6))
Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
Running command git clone -q http://github.com/openaustralia/scraperwiki-python.git /app/.heroku/src/scraperwiki
Running command git checkout -b morph_defaults --track origin/morph_defaults
Switched to a new branch 'morph_defaults'
Branch morph_defaults set up to track remote branch morph_defaults from origin.
Resolved http://github.com/openaustralia/scraperwiki-python.git to commit 732dda1982a3b2073f6341a6a24f9df1bda77fa0
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting lxml==3.4.4
Downloading lxml-3.4.4.tar.gz (3.5 MB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting cssselect==0.9.1
Downloading cssselect-0.9.1.tar.gz (32 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting beautifulsoup4==4.5.3
Downloading beautifulsoup4-4.5.3-py3-none-any.whl (85 kB)
Collecting pytz==2018.9
Downloading pytz-2018.9-py2.py3-none-any.whl (510 kB)
Collecting requests==2.21.0
Downloading requests-2.21.0-py2.py3-none-any.whl (57 kB)
Collecting SQLAlchemy==1.2.18
Downloading SQLAlchemy-1.2.18.tar.gz (5.7 MB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting chardet<3.1.0,>=3.0.2
Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting certifi>=2017.4.17
Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting urllib3<1.25,>=1.21.1
Downloading urllib3-1.24.3-py2.py3-none-any.whl (118 kB)
Collecting idna<2.9,>=2.5
Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
Collecting dumptruck>=0.1.2
Downloading dumptruck-0.1.6.tar.gz (15 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: lxml, cssselect, SQLAlchemy, dumptruck
Building wheel for lxml (setup.py): started
Building wheel for lxml (setup.py): still running...
Building wheel for lxml (setup.py): finished with status 'done'
Created wheel for lxml: filename=lxml-3.4.4-cp36-cp36m-linux_x86_64.whl size=3300610 sha256=dbd31b48a7031df4018e83c04800cd92c34766e4aac41890a6317e6e8b722e99
Stored in directory: /tmp/pip-ephem-wheel-cache-1n9a117i/wheels/6d/4f/4c/af39325568e80f4188c8fc7232557540270ee6293e952d3d87
Building wheel for cssselect (setup.py): started
Building wheel for cssselect (setup.py): finished with status 'done'
Created wheel for cssselect: filename=cssselect-0.9.1-py3-none-any.whl size=27016 sha256=7af89046966958b749c62c382c1a5a5d37d9581d531a34f39a6072b1f1d89688
Stored in directory: /tmp/pip-ephem-wheel-cache-1n9a117i/wheels/63/71/d5/b5473de5b6bebecb4642ef7ef61b9124f461282297e7db01d0
Building wheel for SQLAlchemy (setup.py): started
Building wheel for SQLAlchemy (setup.py): finished with status 'done'
Created wheel for SQLAlchemy: filename=SQLAlchemy-1.2.18-cp36-cp36m-linux_x86_64.whl size=1138069 sha256=297f496b0e802cd8faf22a6c9d50ce3a11ac71a9d19080d7351c377a0c0e17bf
Stored in directory: /tmp/pip-ephem-wheel-cache-1n9a117i/wheels/f0/c8/23/564790f927d5635c6e95f92bf290694ce855966b9b0e3f1496
Building wheel for dumptruck (setup.py): started
Building wheel for dumptruck (setup.py): finished with status 'done'
Created wheel for dumptruck: filename=dumptruck-0.1.6-py3-none-any.whl size=11842 sha256=27aae6d267db4752ccd0c79b8f05598d31c4e572eb6ed5db160c4001beed914c
Stored in directory: /tmp/pip-ephem-wheel-cache-1n9a117i/wheels/dd/d6/90/5b8b02a27b50092a98b66976204cb03b18cb08f6a646cbd6fe
Successfully built lxml cssselect SQLAlchemy dumptruck
Installing collected packages: urllib3, idna, chardet, certifi, requests, dumptruck, SQLAlchemy, scraperwiki, pytz, lxml, cssselect, beautifulsoup4
Running setup.py develop for scraperwiki
Successfully installed SQLAlchemy-1.2.18 beautifulsoup4-4.5.3 certifi-2021.10.8 chardet-3.0.4 cssselect-0.9.1 dumptruck-0.1.6 idna-2.8 lxml-3.4.4 pytz-2018.9 requests-2.21.0 scraperwiki-0.3.7 urllib3-1.24.3
-----> Discovering process types
Procfile declares types -> scraper
Injecting scraper and running...
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=BP
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CLCMC
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CRHPI
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CWUP
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=MIA
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=O14
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=OTD
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CLPI
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=MCL
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CACFI
Fail parsing CACFI 20220514 (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlite3.DatabaseError) database disk image is malformed [SQL: 'INSERT INTO events (category, court, datetime, parties, parties_atk, parties_def) VALUES (?, ?, ?, ?, ?, ?)'] [parameters: ('CACFI', 'No.20', '2022-05-14 09:30:00.000000', 'WONG See Kit (汪思傑)', '', '')] (Background on this error at: http://sqlalche.me/e/4xp6)
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CFA
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=CT
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=DC
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=DCMC
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=HCMC
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=LANDS
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=DCMC
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=FMC
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=ETNMAG
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=FLMAG
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=KCMAG
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=KTMAG
ParseError: Parse court failed: 法庭 Court :
Fail parsing KTMAG 20220514 This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlite3.DatabaseError) database disk image is malformed [SQL: 'INSERT INTO events (category, court, datetime, parties, parties_atk, parties_def) VALUES (?, ?, ?, ?, ?, ?)'] [parameters: ('CACFI', 'No.20', '2022-05-14 09:30:00.000000', 'WONG See Kit (汪思傑)', '', '')] (Background on this error at: http://sqlalche.me/e/4xp6)
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=STMAG
ParseError: Parse court failed: 法庭 Court :
Fail parsing STMAG 20220514 This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlite3.DatabaseError) database disk image is malformed [SQL: 'INSERT INTO events (category, court, datetime, parties, parties_atk, parties_def) VALUES (?, ?, ?, ?, ?, ?)'] [parameters: ('CACFI', 'No.20', '2022-05-14 09:30:00.000000', 'WONG See Kit (汪思傑)', '', '')] (Background on this error at: http://sqlalche.me/e/4xp6)
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=TMMAG
Parsing https://e-services.judiciary.hk/dcl/view.jsp?lang=tc&date=14052022&court=WKMAG
ParseError: Parse court failed: 法庭 Court :
Fail parsing WKMAG 20220514 This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlite3.DatabaseError) database disk image is malformed [SQL: 'INSERT INTO events (category, court, datetime, parties, parties_atk, parties_def) VALUES (?, ?, ?, ?, ?, ?)'] [parameters: ('CACFI', 'No.20', '2022-05-14 09:30:00.000000', 'WONG See Kit (汪思傑)', '', '')] (Background on this error at: http://sqlalche.me/e/4xp6)
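The log above shows one failed flush for CACFI being reported again for later courts, with the message asking for Session.rollback() first. A minimal sketch of keeping each court in its own transaction follows. It uses the standard-library sqlite3 module rather than the SQLAlchemy session the scraper uses, and the function save_all, the reduced three-column events table, and the sample rows are all hypothetical. Rolling back only clears the failed transaction; it would not repair a "database disk image is malformed" error in the underlying file.

```python
import sqlite3


def save_all(conn, parsed_by_court):
    """Insert each court's rows in its own transaction; return courts that failed."""
    failed = []
    for court, rows in parsed_by_court.items():
        try:
            conn.executemany(
                "INSERT INTO events (category, court, datetime) VALUES (?, ?, ?)",
                rows,
            )
            conn.commit()
        except sqlite3.Error:
            # Discard the partial batch so the next court starts from a clean state.
            conn.rollback()
            failed.append(court)
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT, court TEXT, datetime TEXT NOT NULL)")
conn.commit()

# Made-up input: the second row for court "A" violates NOT NULL and fails the batch.
parsed = {
    "A": [("A", "No.1", "2022-05-14 09:30:00"), ("A", "No.1", None)],
    "B": [("B", "No.2", "2022-05-14 09:30:00")],
}
failed_courts = save_all(conn, parsed)
print(failed_courts)
print(conn.execute("SELECT category FROM events").fetchall())
```

In SQLAlchemy the equivalent step is the session.rollback() call named in the error message, issued in the exception handler before moving on to the next court.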

Data

Downloaded 46 times by ylchan87 employproof oktak howawong hk01data hksoftmedia olsungd01 AlanJeffries

To download data sign in with GitHub

Download table (as CSV) Download SQLite database (79.5 MB) Use the API

rows 0 / 0

id category court datetime parties parties_atk parties_def
1
CRHPI
No.43
2019-03-18T09:30:00+00:00
hidden
hidden
2
CRHPI
No.43
2019-03-18T09:50:00+00:00
hidden
hidden
3
CRHPI
No.43
2019-03-18T10:50:00+00:00
hidden
hidden
4
CRHPI
No.43
2019-03-18T11:10:00+00:00
hidden
hidden
5
CRHPI
No.43
2019-03-18T11:30:00+00:00
hidden
hidden
6
CRHPI
No.43
2019-03-18T14:30:00+00:00
hidden
hidden
7
O14
No.41
2019-03-18T09:30:00+00:00
hidden
hidden
8
O14
No.41
2019-03-18T09:30:00+00:00
hidden
hidden
9
MCL
No.6
2019-03-18T09:30:00+00:00
hidden
10
MCL
No.6
2019-03-18T09:30:00+00:00
hidden
hidden

rows 10 / 455

id | name_zh | name_en
1 | 余敏奇聆案官 | Master R. Yu
2 | 許家灝聆案官 | Master K.H. Hui
3 | 雷健文聆案官 | Master Lui
4 | 林文瀚上訴庭副庭長 | Hon Lam VP
5 | 張澤祐上訴庭法官 | Hon Cheung JA
6 | 鮑晏明上訴庭法官 | Hon Barma JA
7 | 薛偉成上訴庭法官 | Hon Zervos JA
8 | 朱芬齡上訴庭法官 | Hon Chu JA
9 | 潘兆初上訴庭法官 | Hon Poon JA
10 | 區慶祥上訴庭法官 | Hon Au JA

rows 10 / 216531

id | caseNo | description
1 | HCPI1037/2018 |
2 | HCPI1038/2018 |
3 | HCPI1041/2018 |
4 | HCPI1042/2018 |
5 | HCPI866/2017 |
6 | HCPI519/2016 |
7 | HCA2917/2018 | 民事訴訟
8 | HCA2769/2017 | 民事訴訟
9 | HCA1185/2018 |
10 | HCA249/2019 | 民事訴訟

rows 10 / 5088

id | name_zh | name_en
1 | 反對自動解除破產 | Objections to discharge
2 | 有關無力償還的雜項申請 | Miscellaneous Insolvency Application
3 | 簡易判決 | O.14 List
4 | 破產呈請 | Bankruptcy Petition
5 | 核對列表聆訊/案件管理會議 | Check List/Case Management Conference
6 | 核對列表審核聆訊 (人身傷亡案件) | Checklist Review Hearing(PI Cases)
7 | 公司清盤呈請 | Winding Up Petition
8 | 勞資審裁處 | Labour Tribunal
9 | 傳票(停止代表訴訟當事人) | Summons (To cease acting)
10 | 法庭指示 | For Directions

rows 10 / 1115

id | name_zh | name_en
1 | 方氏律師事務所 | FONGS
2 | 劉陳高律師事務所 | Lau, Chan & Ko
3 | 柯廣輝律師事務所 | Or & Partners
4 | 梁鳳慈律師行 | Winnie Leung & Co.
5 | 尹麗儀律師行 | Mandy Wan & Co.
6 | 陳應達律師事務所 | Y.T. Chan & Co.
7 | | CHIH
8 | 蘇龍律師事務所 | So, Lung & Associates
9 | 陳、陳律師行 | Chan & Chan
10 | 何韋律師行 | Howse Williams

rows 10 / 408557

event_id | judge_id
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 2
8 | 2
9 | 3
10 | 3

rows 0 / 0

event_id | case_id
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
7 | 7
8 | 8
9 | 9
10 | 10

rows 0 / 0

event_id | tag_id
1 | 6
2 | 6
3 | 6
4 | 6
5 | 6
6 | 6
7 | 3
8 | 3
9 | 9
10 | 10

rows 10 / 137639

event_id | lawyer_id
7 | 12
7 | 13
7 | 14
8 | 15
9 | 16
10 | 17
11 | 16
12 | 10
14 | 20
17 | 23

rows 10 / 68578

event_id | lawyer_id
1 | 1
2 | 3
3 | 5
4 | 7
5 | 9
6 | 11
13 | 18
16 | 21
18 | 23
19 | 23

rows 10 / 68578

event_id | lawyer_id
1 | 2
2 | 4
3 | 6
4 | 8
5 | 10
6 | 4
13 | 19
16 | 22
18 | 24
19 | 24

Statistics

Average successful run time: 12 minutes

Total run time: 10 days

Total cpu time used: about 5 hours

Total disk space used: 79.7 MB

History

  • Auto ran revision 5e804686 and completed successfully.
  • Auto ran revision 5e804686 and completed successfully.
  • Auto ran revision 5e804686 and completed successfully.
  • Auto ran revision 5e804686 and completed successfully.
  • Auto ran revision 5e804686 and completed successfully.
  • ...
  • Created on morph.io

Scraper code

Python

HKCourtList / scraper.py