Environment variables to configure the scraper:
Get trials whose last-updated date is on or after (>=) this date.
Get trials whose last-updated date is on or before (<=) this date.
Interval between the requests we make to clinicaltrials.gov.
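As a minimal sketch of how three settings like these could be read and validated: the variable names `SCRAPER_DATE_FROM`, `SCRAPER_DATE_TO` and `SCRAPER_DELAY` are hypothetical placeholders (not the scraper's real names), and the ISO date format, the delay unit (seconds) and the 1.0 default are assumptions.

```python
import os
from datetime import date
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    date_from: Optional[date]  # lower bound on the last-updated date, or None
    date_to: Optional[date]    # upper bound on the last-updated date, or None
    delay: float               # pause between requests, assumed to be seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables (hypothetical names)."""

    def parse_date(name: str) -> Optional[date]:
        raw = env.get(name)
        # Assumes ISO format (YYYY-MM-DD); an unset or empty variable means "no bound".
        return date.fromisoformat(raw) if raw else None

    date_from = parse_date("SCRAPER_DATE_FROM")
    date_to = parse_date("SCRAPER_DATE_TO")
    if date_from and date_to and date_from > date_to:
        raise ValueError("start date must not be after end date")

    delay = float(env.get("SCRAPER_DELAY", "1.0"))  # default is an assumption
    if delay < 0:
        raise ValueError("delay must not be negative")
    return Settings(date_from, date_to, delay)
```

Passing a plain dict instead of `os.environ` makes it easy to try the function without touching the real environment.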
We need to download around 200,000 pages, and we want to be polite to the source web server:
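To get a feel for what politeness costs, here is a rough estimate of the time a full scrape spends waiting. The delay values are illustrative assumptions, not the scraper's actual settings, and download time itself is ignored, so the result is a lower bound.

```python
from datetime import timedelta


def estimated_wait(pages: int, delay_seconds: float) -> timedelta:
    """Time spent pausing between requests for a scrape of `pages` pages."""
    return timedelta(seconds=pages * delay_seconds)


if __name__ == "__main__":
    # Illustrative delays only; pick whatever the source server tolerates.
    for delay in (0.5, 1.0, 5.0):
        print(f"{delay:>4} s between requests: {estimated_wait(200_000, delay)}")
```

Even at one request per second, the waiting alone adds up to more than two days, which is why splitting the work into smaller runs is attractive.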
So we can scrape manually year by year, then pull updates for the last year automatically:
Proposed settings for scraping a single year:
Proposed settings for updating the database:
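The two modes differ only in the date bounds they use. The sketch below shows one way such bounds could be derived. The function names are hypothetical, and treating "the last year" as a rolling 365-day window is an assumption. These are not the scraper's own proposed values.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def year_window(year: int) -> Tuple[date, date]:
    """Bounds for a manual backfill of one calendar year."""
    return date(year, 1, 1), date(year, 12, 31)


def update_window(today: Optional[date] = None, days: int = 365) -> Tuple[date, date]:
    """Bounds for an automatic update covering the last `days` days."""
    today = today or date.today()
    return today - timedelta(days=days), today


if __name__ == "__main__":
    start, end = year_window(2014)
    print("backfill:", start.isoformat(), "to", end.isoformat())
    start, end = update_window()
    print("update:  ", start.isoformat(), "to", end.isoformat())
```

Each pair could then be supplied as the lower and upper last-updated bounds described above.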
Source of all data scraped by this scraper: clinicaltrials.gov.
ClinicalTrials.gov data are available to all requesters, both within and outside the United States, at no charge.
Average successful run time: about 5 hours
Total run time: 2 months
Total cpu time used: about 20 hours
Total disk space used: 69.8 KB