spendright / msd

Merge SpendRight scraper data

Scrapes morph.io



Merge Scraper Data (for great justice!)

There are a lot of consumer campaigns out there on the Internet. Consumer campaigns supported by perfectly lovely organizations, organized around causes you wholeheartedly support, that would change the world if enough people followed through on them.

But it's hard to put them into practice. And it's really hard to put more than one campaign at a time into practice. Different campaigns have different scoring systems, different names for the same company, and, when they have them at all, different apps that don't talk to each other.

msd takes messy data from a bunch of different consumer campaigns, and puts it into a single unified format.

msd currently powers SpendRight (msd's creator) and the thinkContext browser extension.

Using the data

If you're not there already, check out msd's morph.io page, where you can view and download data merged from SpendRight's scrapers.

Keep in mind that the original consumer campaigns are generally copyrighted by the non-profits that created them, and they have all sorts of different terms/licensing agreements. It's up to you to decide whether to ask them for permission now, or forgiveness later.

(This mostly applies to the claim and rating tables; facts about companies and brands are almost certainly fair game.)
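Once you have the SQLite database, Python's standard sqlite3 module is enough to explore it. The sketch below is only an illustration: the function name ratings_for_company is made up, and it relies solely on columns described under Table definitions.

```python
import sqlite3


def ratings_for_company(conn, company):
    """Return (campaign name, judgment) pairs for ratings of a company."""
    sql = (
        'SELECT campaign.campaign, rating.judgment'
        ' FROM rating JOIN campaign'
        ' ON rating.campaign_id = campaign.campaign_id'
        ' WHERE rating.company = ? AND rating.brand = ?'
        ' ORDER BY campaign.campaign'
    )
    # an empty brand means the rating is about the company itself
    return conn.execute(sql, (company, '')).fetchall()
```

To try it, open the downloaded file with sqlite3.connect() (for example sqlite3.connect('msd.sqlite'), assuming that is where you saved it) and pass the connection in.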

Installation

It's on PyPI: pip install msd

Usage

msd db1.sqlite [db2.sqlite ...]

This produces a file named msd.sqlite (you can change this with the -o switch).

msd can also take YAML files as input. The YAML files should encode a map from table name to list of rows (which are maps from column name to value). For example:

campaign:
- author: Greenpeace International
  campaign_id: greenpeace_palm_oil
rating:
- campaign_id: greenpeace_palm_oil
  company: Colgate-Palmolive
  judgment: -1
- campaign_id: greenpeace_palm_oil
  company: Danone
  judgment: 0

If you don't have the library installed (e.g. for development), you can use python -m msd.cmd in place of msd.
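If you'd rather produce SQLite input than YAML, here is a minimal sketch that writes the same rows as the YAML example using the standard sqlite3 module. The function name write_input_db is hypothetical, and the tables deliberately have no keys, since the input data does not need any.

```python
import sqlite3


def write_input_db(path):
    """Create a tiny input database with one campaign and two ratings."""
    conn = sqlite3.connect(path)
    # no primary keys or constraints: msd is described as tolerating this
    conn.execute('CREATE TABLE campaign (campaign_id TEXT, author TEXT)')
    conn.execute(
        'CREATE TABLE rating (campaign_id TEXT, company TEXT, judgment INTEGER)')
    conn.execute(
        'INSERT INTO campaign VALUES (?, ?)',
        ('greenpeace_palm_oil', 'Greenpeace International'))
    conn.executemany(
        'INSERT INTO rating VALUES (?, ?, ?)',
        [('greenpeace_palm_oil', 'Colgate-Palmolive', -1),
         ('greenpeace_palm_oil', 'Danone', 0)])
    conn.commit()
    return conn
```

The resulting file can then be passed to msd on the command line like any other input database.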

Data format

msd uses a SQLite data format, both for input and output.

The input and output format are almost identical; differences are noted in italics.

Keys

Every campaign in the input data should have a campaign_id that would work as a Python identifier (for example wwf_palm_oil).

There isn't a company_id field though; we just use the shortest name that a company is commonly referred to by. msd is smart enough to know that, for example, The Coca-Cola Company can be called Coca-Cola but that we can't refer to The Learning Company as simply "Learning".

Similarly, there isn't a brand_id field; msd just figures out the proper name for the brand (minus the ™, etc.), and puts it into the brand field. The "key" for any given brand is company and brand together.

There also aren't (product) category keys; we just put the name of the category (e.g. Chocolate) into the category field.

Finally, the initial data sources each get a scraper_id, which is one or more identifiers, separated by dots (e.g. sr.campaign.wwf_palm_oil). These serve only to help you track down problems in your input data.

Every table in the input data may have a scraper_id field to help identify which code gathered that data. The stem of whatever input file the data came from will be prepended to form the scraper_id in the output.

For example, a scraper_id of wwf_palm_oil from an input file named sr.campaign.sqlite would become sr.campaign.wwf_palm_oil in the output data.
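The prefixing rule can be sketched in a few lines. This is an illustration, not msd's own code: output_scraper_id is a made-up name, and returning the bare stem when a row has no scraper_id is an assumption.

```python
import os.path


def output_scraper_id(input_path, scraper_id=''):
    """Prepend the input file's stem to a row's scraper_id."""
    # 'sr.campaign.sqlite' -> 'sr.campaign' (only the last extension is dropped)
    stem = os.path.splitext(os.path.basename(input_path))[0]
    # assumption: rows without a scraper_id just get the stem
    return stem + '.' + scraper_id if scraper_id else stem
```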

Messy input data

msd can accept very, very messy input data. The goal is for you to be able to put as little effort as possible into writing scrapers.

no primary keys

For starters, the input data need not have primary keys, or any keys at all. The first thing we do is shovel all the input data into a single "scratch" table anyways.

It's totally fine to have two rows that would have the same keys in the output data; msd will merge them for you.

missing/extra fields

It's totally fine for the input data to be missing fields, or have fields set to NULL that are supposed to have a value (in the worst case, if you omit a required value, msd will just ignore that row).

It's fine to have extra fields; msd will just ignore them.

different names for companies and brands

It's fine to use different names for the same company or brand; msd will figure this out and merge them as appropriate.

general text cleanliness

For every text field, msd does the following things for you:

  • converts all whitespace (tabs, double spaces, etc.) to a single space
  • strips leading and trailing whitespace
  • converts "smart quotes", ligatures, and other silliness to the plain equivalent
  • normalizes all unicode into NFKD form (this basically means there aren't multiple ways to represent the same accented character).
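The cleanup steps listed above can be approximated with the standard library. This is a rough sketch rather than msd's implementation: clean_text is a hypothetical name, and the quote table only covers the four most common "smart" quote characters.

```python
import re
import unicodedata

# curly single and double quotes mapped to their plain equivalents
PLAIN_QUOTES = str.maketrans({
    '\u2018': "'", '\u2019': "'",
    '\u201c': '"', '\u201d': '"',
})


def clean_text(text):
    """Normalize unicode, straighten quotes, and collapse whitespace."""
    # NFKD also expands ligatures and turns non-breaking spaces into spaces
    text = unicodedata.normalize('NFKD', text)
    text = text.translate(PLAIN_QUOTES)
    # collapse runs of whitespace, then trim both ends
    return re.sub(r'\s+', ' ', text).strip()
```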

brand name cleaning

In addition, you can be even lazier with the brand field. msd automatically finds ™, ®, etc., puts it elsewhere for safekeeping (see the tm field, below), and ignores anything after it.

For example, if you throw something like INVOKANA™ (canagliflozin) USPI into the brand field, it'll know that the brand is named INVOKANA and is supposed to have a ™ after it.
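A minimal sketch of that behavior, assuming the mark is one of ™, ® or ℠ (the exact set msd recognizes is not spelled out here, and split_brand is a made-up name):

```python
import re

# trademark, registered, and service mark symbols (assumed set)
TM_RE = re.compile('[\u2122\u00ae\u2120]')


def split_brand(raw):
    """Split a raw brand string into (brand, tm), dropping text after the mark."""
    match = TM_RE.search(raw)
    if match is None:
        return raw.strip(), ''
    return raw[:match.start()].strip(), match.group()
```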

category name cleaning

msd formats category names in a consistent way. For example, food & beverages in the input data would become Food and Beverages in the output data.
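As a rough illustration of this kind of formatting (not msd's actual rules), the sketch below spells out "&" and capitalizes lowercase words while leaving mixed-case words alone. The function name and the list of small words are assumptions.

```python
# words kept lowercase unless they start the name (assumed list)
SMALL_WORDS = {'and', 'or', 'of', 'for', 'the', 'a', 'an', 'in', 'to'}


def clean_category(name):
    """Format a category name consistently, e.g. for use as a key."""
    words = name.replace('&', ' and ').split()
    out = []
    for i, word in enumerate(words):
        if i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        elif word.islower():
            # capitalize plain lowercase words only
            out.append(word[0].upper() + word[1:])
        else:
            # leave words like "eCommerce" or "BPA" untouched
            out.append(word)
    return ' '.join(out)
```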

rating cleanup

msd can do limited cleanup of ratings, including inferring judgment from grade. See rating table for details.
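To give a feel for what inferring judgment from grade might look like, here is a sketch. The cutoffs are assumptions chosen for illustration (A/B support, C consider, D/E/F avoid); msd's real mapping may differ, and judgment_from_grade is a hypothetical name.

```python
def judgment_from_grade(grade):
    """Map a letter grade like 'A+' or 'C-' to 1, 0, or -1 (None if unknown)."""
    letter = grade.strip().upper()[:1]
    # assumed cutoffs, not taken from msd itself
    if letter in ('A', 'B'):
        return 1
    if letter == 'C':
        return 0
    if letter in ('D', 'E', 'F'):
        return -1
    return None
```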

inferred rows

msd will infer that companies and brands exist. For example, if you include a rating for a company in the rating table, a corresponding entry will be automatically created for you in the company table.

and that's not all...

Nope, that's pretty much everything. Here are the table definitions:

Table definitions

brand: facts about brands

Primary Key: company, brand

brand: canonical name for the brand (e.g. Dove)

company: canonical name for the company (e.g. Unilever)

facebook_url: optional link to official Facebook page for the brand. (If there's only a page for the company, put that in company.facebook_url). So consumers can say nice/brutally honest things on their Facebook page.

is_former: 0 or 1. If 1, this brand no longer exists (e.g. Sanyo) or was sold to another company (e.g. LU is no longer owned by Groupe Danone). Set this to 1 in your input data to knock out out-of-date brand information from out-of-date consumer campaigns.

is_licensed: 0 or 1. If 1, this brand actually belongs to another company (e.g. The Coca-Cola Company markets products under the Evian brand). Generally a good idea to put the responsibility for a brand on its actual owner.

is_prescription: 0 or 1. If 1, this brand is available by prescription only (so you probably can't buy it on, like, Amazon.com).

logo_url: optional link to an image of this brand's logo (need not be on the brand's website).

tm: empty string, ™, ® or ℠. The thing that companies like to appear directly after the brand name.

twitter_handle: optional handle for the brand's Twitter account, including the @ (e.g. @BrownCowYogurt). So consumers can congratulate them/call them out on Twitter.

url: optional link to official web site/page for this brand. It's okay if this is just a sub-page of the company's official website.

campaign: consumer campaigns

In practice, introducing consumer campaigns to users is one of the most important parts of any tool you build; you'll probably want to just use this table as a starting point, and include some content of your own.

Primary Key: campaign_id

author: optional free-form name of the organization behind the campaign (e.g. Greenpeace International).

author_url: optional link to author's website

campaign: free-form name of the campaign (e.g. Guide to Greener Electronics)

campaign_id: unique identifier for this campaign (e.g. greenpeace_electronics). Up to you to pick something that makes sense and doesn't collide with other campaign IDs.

contributors: optional free-form description of other contributors to the consumer campaign (e.g. International Labor Rights Forum, Baptist World Aid).

copyright: optional copyright notice. Usually starts with © (e.g. © 2006-2014 Climate Counts. All Rights Reserved.).

date: optional date this campaign was created, in YYYY-MM-DD, YYYY-MM, or YYYY format. A string, not a number. Sometimes the best available data is a couple years old, and consumers deserve to know!

donate_url: optional link to a page where you can donate back to the campaign/author. Try to include this somewhere in whatever you build; create a virtuous cycle and help these consumer campaigns become financially self-sustaining!

email: optional contact email for the campaign (e.g. feedback@free2work.org)

facebook_url: optional link to official Facebook page for the campaign, so consumers can get involved in the movement!

goal: very brief (40 characters or less) description of what someone helps accomplish by being involved in this campaign (e.g. stop forced labor in Uzbekistan). Best to start this with a lowercase letter unless the first word is a proper noun.

twitter_handle: optional handle for the campaign's Twitter account, including the @ (e.g. @WWF), so that consumers can follow/reference them on Twitter.

url: optional link to campaign's web site, so consumers can learn more and get involved.
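If your scraper builds date values, a quick shape check can catch mistakes before they reach msd. This sketch only checks the three documented formats (it does not verify that the month or day exists), and is_valid_date is a made-up helper name.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD
DATE_RE = re.compile(r'\d{4}(-\d{2}(-\d{2})?)?')


def is_valid_date(value):
    """True if value is a string in one of the accepted date formats."""
    return isinstance(value, str) and DATE_RE.fullmatch(value) is not None
```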

category: product categories for companies and brands

msd doesn't build an organized category tree like, say, online retailers have; these are more like hints. See the subcategory table for details.

Primary Key: company, brand, category

brand: canonical name for the brand. Empty string if we're categorizing a company

category: free-form name for category (e.g. Food and Beverages).

company: canonical name for the company

is_implied: 0 or 1. If 1, this category was only implied by a subcategory relationship (see subcategory table). Ignored in the input data.

claim: bullet points to support ratings

Primary Key: campaign_id, company, brand, scope, claim

(claim is free-form, so this is more like a non-unique key)

brand: canonical name for the brand. Empty string if this is a claim about a company.

campaign_id: unique identifier of campaign making this claim (see campaign.campaign_id)

claim: free-form claim. Should be small enough to fit in a bullet point, and be able to stand on its own (spell out obscure acronyms and other context). Best to start this with a lowercase letter unless the first word is a proper noun.

company: canonical name for the company

date: optional date this claim was made, in YYYY-MM-DD, YYYY-MM, or YYYY format. A string, not a number.

judgment: -1, 0, or 1. Does the claim say something good (1), mixed (0), or bad (-1) about the company or brand? Need not match the campaign's rating. If a claim is totally neutral (e.g. manufactures large appliances) it doesn't belong in this table at all!

scope: optional free-form limitation on which products this applies to (e.g. Fair Trade). Usually an empty string, meaning either that there is no limitation or that this applies to whatever isn't covered by a scope elsewhere in the data (don't set this to Non-Certified).

url: optional link to web page/PDF document etc. where this claim was made. Some people like to see the supporting data!

company: facts about companies

Primary Key: company

company: canonical name for the company (e.g. Disney)

company_full: full, official name of the company (e.g. The Walt Disney Company).

email: contact/feedback email for the company (e.g. consumer.relations@adidas.com).

facebook_url: optional link to official Facebook page for the company.

feedback_url: optional link to a page where consumers can submit feedback to the company (some companies don't like to do this by email).

hq_company: optional name of the country where this company is headquartered (e.g. USA).

logo_url: optional link to an image of this company's logo (need not be on the company's website).

phone: optional phone number for customer feedback/complaints (a string, not a number)

twitter_handle: optional handle for the company's Twitter account, including the @ (e.g. @Stonyfield).

url: optional link to official web site/page for this company.

company_name: canonical, full, and alternate names for companies

Primary Key: company, company_name

company: canonical name for the company (e.g. Disney)

company_name: a name for the company. Can be the canonical name, the full name (see company.company_full) or something else (e.g. Walt Disney).

is_alias: 0 or 1. If 1, this is a name that somebody used somewhere but isn't really a recognizable name for the company (e.g. "AEO" for American Eagle Outfitters or "LGE" for "LG Electronics"). Set this in your input data to knock out weird company aliases.

is_full: 0 or 1. If 1, this is the full name for the company, which also appears in company.company_full. (There isn't an is_canonical field; just check if company = company_name.)
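One practical use of this table is resolving whatever name you have on hand to the canonical one. A minimal sketch with sqlite3, where canonical_company is a hypothetical helper and the first match wins if a name happens to map to more than one company:

```python
import sqlite3


def canonical_company(conn, name):
    """Look up the canonical company for any known name, or None."""
    row = conn.execute(
        'SELECT company FROM company_name WHERE company_name = ?', (name,)
    ).fetchone()
    return row[0] if row else None
```

Checking whether a name is itself canonical is then just canonical_company(conn, name) == name.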

rating: campaigns' judgments of brands and companies

This is where the magic happens.

brand: canonical name for the brand. Empty string if this is a rating of a company.

campaign_id: unique identifier of campaign this rating comes from (see campaign.campaign_id)

company: canonical name for the company

date: optional date this rating was last updated, in YYYY-MM-DD, YYYY-MM, or YYYY format. A string, not a number.

description: free-form, brief description of the rating (e.g. Soaring, Cannot Recommend).

grade: optional letter grade (e.g. A+, C-, F). Some campaigns use E instead of F.

judgment: -1, 0, or 1. Should consumers support (1), consider (0), or avoid (-1) the company or brand? Some campaigns will give everything a 1 (e.g. certifiers) or everything a -1 (e.g. boycott campaigns).

msd can infer judgment from grade, but otherwise you need to set it yourself in the input data.

Red for avoid, yellow for consider, and green for support is a de-facto standard among consumer campaigns. If all else fails, contact the campaign's author and ask.

max_score: if score is set, the highest score possible on the rating scale (a number).

min_score: if score is set, the lowest score possible on the rating scale (a number). If score is set but min_score is not, msd will assume min_score is zero.

num_ranked: if rank is set, the number of things ranked (an integer)

rank: if campaign ranks companies/brands, where this one ranks (this is an integer, and the best ranking is 1, not 0).

scope: optional free-form limitation on which products this applies to (e.g. Fair Trade). Usually an empty string, meaning either that there is no limitation or that this applies to whatever isn't covered by a scope elsewhere in the data (don't set this to Non-Certified).

score: optional numerical score (e.g. 57.5).

url: optional link to web page/PDF document etc. where this rating was made. Some people like to see the supporting data!
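Since different campaigns use different scales, a tool built on this table may want to put score on a common 0 to 1 scale. The sketch below is one way to do that with score, min_score and max_score; normalized_score is a made-up name, and the zero default mirrors the min_score rule described above.

```python
def normalized_score(score, max_score, min_score=None):
    """Rescale score to the range 0..1 using the rating's own scale."""
    if min_score is None:
        # same default as described for min_score
        min_score = 0
    if max_score == min_score:
        raise ValueError('max_score and min_score must differ')
    return (score - min_score) / (max_score - min_score)
```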

scraper: when data was last gathered

Primary Key: scraper_id

last_scraped: when this data was last gathered, as a UTC ISO timestamp (for example, 2015-08-03T20:55:36.795227Z).

scraper_id: unique identifier for the scraper that gathered this data
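The last_scraped format can be parsed with the standard datetime module. A small sketch (parse_last_scraped is a hypothetical helper name):

```python
from datetime import datetime, timezone


def parse_last_scraped(value):
    """Parse a timestamp like 2015-08-03T20:55:36.795227Z as an aware datetime."""
    naive = datetime.strptime(value, '%Y-%m-%dT%H:%M:%S.%fZ')
    # the trailing Z means UTC, so attach that timezone explicitly
    return naive.replace(tzinfo=timezone.utc)
```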

scraper_brand_map: names of brands in the input data

This is mostly useful for debugging your output data.

msd ignores this table if it appears in the input data

Primary Key: scraper_id, scraper_company, scraper_brand

Other Indexes: (company, brand)

brand: canonical name for the brand. (This should never be empty; that's what scraper_company_map is for.)

company: canonical name for the company

scraper_brand: name used for the brand in the input data

scraper_company: name used for the company in the input data

scraper_id: unique identifier for the scraper that used this brand and company name

scraper_category_map: names of categories in the input data

This is mostly useful for debugging your output data.

msd ignores this table if it appears in the input data

Primary Key: scraper_id, category, scraper_category

Other Indexes: (category)

category: canonical name for a category (e.g. Food and Beverages)

scraper_category: name used for the category in the input data (e.g. food & beverages).

scraper_id: unique identifier for the scraper that used this category name

scraper_company_map: names of companies in the input data

This is mostly useful for debugging your output data.

msd ignores this table if it appears in the input data

Primary Key: scraper_id, scraper_company

Other Indexes: (company)

company: canonical name for the company

scraper_company: name used for the company in the input data

scraper_id: unique identifier for the scraper that used this company name

subcategory: product category relationships

msd doesn't attempt to build a proper category tree; it's really just a directed graph of category relationships: if something is in category A (subcategory) it must also be in category B (category).

msd automatically infers implied relationships: if A is a subcategory of B and B is a subcategory of C, A is a subcategory of C.

category: canonical name for a category

is_implied: 0 or 1. If 1, this relationship was inferred by msd. Ignored in the input data.

subcategory: canonical name for a subcategory of category
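The inference rule described above is a transitive closure over (category, subcategory) pairs. Here is a small sketch of the idea, not msd's own code; implied_subcategories is a made-up name.

```python
def implied_subcategories(pairs):
    """Return the (category, subcategory) pairs implied by transitivity.

    pairs is an iterable of explicit (category, subcategory) tuples.
    """
    explicit = set(pairs)
    closure = set(explicit)
    changed = True
    while changed:
        changed = False
        for category, sub in list(closure):
            for category2, sub2 in list(closure):
                # sub is itself a category with its own subcategory sub2
                if category2 == sub and (category, sub2) not in closure:
                    closure.add((category, sub2))
                    changed = True
    # only the inferred relationships would get is_implied = 1
    return closure - explicit
```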

url: hook for scraping URLs in the scraper data

This table only exists in the input data, and is only used to fill fields in the output data that would otherwise be empty.

This allows us to build generic scrapers that can grab Twitter handles, Facebook URLs, etc. directly from a company or brand's official page. See SpendRight's scrape-urls for an example.

facebook_url: optional facebook page for a company/brand

last_scraped: when the company/brand's page was scraped, as a UTC ISO timestamp (e.g. 2015-08-03T20:55:36.795227Z). Not currently used.

twitter_handle: optional twitter handle for a company/brand, including the leading @.

url: url this data was scraped from

Writing your own scrapers

If you want to write something in Python, check out SpendRight's scrape-campaigns project (look in scrapers/ for examples), and submit a pull request.

If you'd rather write in another language, consider setting up your own scraper on morph.io, which can also handle scrapers in Ruby, PHP, Perl, and Node.js. See the morph.io Documentation for details. And let us know, so we can point msd's morph.io page at it.

Working on msd

msd is pretty straightforward. Here's a brief overview of how it works:

  1. msd starts in msd/cmd.py (look for msd.cmd.run()).
  2. It first dumps all the input data into a temporary "scratch" DB (msd-scratch.sqlite) with the correct columns and useful indexes (look for msd.scratch.build_scratch_db()).
  3. Then it creates the output database (msd.sqlite) and fills it table by table (look for msd.fill_output_db()).

Also, table definitions live in msd/table.py.

Using msd as a library

msd isn't really a library, but there's some useful stuff in msd (for example, msd/company.py knows how to strip all the various versions of "Inc." off company names).

If you want to call some of this stuff from another project, please let us know so that we can work out a sane, stable interface for you!
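Until there is a stable interface, a few lines of your own code can cover simple cases. The sketch below is not taken from msd/company.py; it is a standalone illustration of stripping a handful of common suffixes, with a made-up function name and a deliberately short suffix list.

```python
import re

# a few common company suffixes, optionally preceded by a comma
SUFFIX_RE = re.compile(r',?\s+(Inc\.?|LLC|Ltd\.?|Pty Ltd|S\.A\.?)$')


def strip_company_suffix(name):
    """Remove trailing suffixes like 'Inc.' or ', LLC' from a company name."""
    previous = None
    # repeat in case more than one suffix is stacked at the end
    while previous != name:
        previous = name
        name = SUFFIX_RE.sub('', name)
    return name
```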

Contributors davidmarin spendright-scrapers

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 ! The latest version of Python 3 is python-3.6.2 (you are using python-3.4.2, which is unsupported).
 ! We recommend upgrading by specifying the latest version (python-3.6.2).
   Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-3.4.2
-----> Installing pip
-----> Installing requirements with pip
       Collecting PyYAML>=3.11 (from -r /tmp/build/requirements.txt (line 1))
       Downloading https://files.pythonhosted.org/packages/4a/85/db5a2df477072b2902b0eb892feb37d88ac635d36245a72a6a69b23b383a/PyYAML-3.12.tar.gz (253kB)
       Collecting Unidecode>=0.04.9 (from -r /tmp/build/requirements.txt (line 2))
       Downloading https://files.pythonhosted.org/packages/59/ef/67085e30e8bbcdd76e2f0a4ad8151c13a2c5bce77c85f8cad6e1f16fb141/Unidecode-1.0.22-py2.py3-none-any.whl (235kB)
       Collecting titlecase>=0.7.1 (from -r /tmp/build/requirements.txt (line 3))
       Downloading https://files.pythonhosted.org/packages/1c/6b/1af08c2d82cf3bad9caf764bca8c398c78f811d0638e5cb268952f45c917/titlecase-0.12.0-py3-none-any.whl
       Installing collected packages: PyYAML, Unidecode, titlecase
       Running setup.py install for PyYAML: started
       Running setup.py install for PyYAML: finished with status 'done'
       Successfully installed PyYAML-3.12 Unidecode-1.0.22 titlecase-0.12.0
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
scraper: downloading https://morph.io/spendright/scrape-campaigns/data.sqlite -> sr.campaign.sqlite
scraper: downloading https://morph.io/spendright/scrape-companies/data.sqlite -> sr.company.sqlite
scraper: downloading https://morph.io/spendright/scrape-urls/data.sqlite -> sr.url.sqlite
Traceback (most recent call last):
  File "scraper.py", line 82, in <module>
    main()
  File "scraper.py", line 67, in main
    output_db_path=OUTPUT_PATH)
TypeError: run() got an unexpected keyword argument 'force_rebuild_scratch'

Data

Downloaded 23 times by spendright-scrapers MikeRalphson thinkcontext davidmarin rdible mikekhristo

To download data sign in with GitHub

Download table (as CSV) Download SQLite database (11.7 MB) Use the API

rows 10 / 10

author author_url campaign campaign_id contributors copyright date donate_url email facebook_url goal twitter_handle url
B Lab
B Corporation List
b_corp
© Copyright 2015 B Lab. All rights reserved
Redefine success in business
@bcorporation
Accord on Fire and Building Safety In Bangladesh
Bangladesh Accord Signatories
bang_accord
© 2015 Accord on Fire and Building Safety In Bangladesh
Make Bangladesh garment factories safe
@banglaccord
Climate Counts
Climate Counts Scorecard
climate_counts
Stonyfield Organic, University of New Hampshire
© 2006-2014 Climate Counts. All Rights Reserved.
Reduce large companies' climate impact
@ClimateCounts
Responsible Sourcing Network
Cotton Sourcing Snapshot
cotton_snapshot
2015
info@sourcingnetwork.org
stop forced labor in Uzbekistan
@SourcingNetwork
Not for Sale
Free2Work
free2work
International Labor Rights Forum, Baptist World Aid
©2010-2014 NOT FOR SALE
feedback@free2work.org
End Human Trafficking and Slavery
@F2W
Greenpeace International
Guide to Greener Electronics
greenpeace_electronics
© GREENPEACE 2015
Change the electronics industry
@greenpeace
Human Rights Campaign
HRC Buyer's Guide
hrc
LGBT inclusion in the workplace
@HRC
Responsible Sourcing Network
Mining the Disclosures
mining_the_disclosures
2015-09-22
info@sourcingnetwork.org
end the conflict minerals trade
@SourcingNetwork
Rank a Brand
rankabrand
buy sustainable
@rankabrand_org
World Wildlife Fund
Palm Oil Buyers Scorecard
wwf_palm_oil
Make palm oil sustainable
@WWF


rows 10 / 29

last_scraped | scraper_id
2015-10-06T05:37:36.753954Z | sr.campaign.b_corp
2015-10-08T16:07:32.969876Z | sr.campaign.bang_accord
2015-10-08T16:08:38.744642Z | sr.campaign.climate_counts
2015-10-07T17:29:42.075386Z | sr.campaign.cotton_snapshot
2015-10-08T16:23:29.823567Z | sr.campaign.free2work
2015-10-08T16:23:34.268888Z | sr.campaign.greenpeace_electronics
2016-07-07T22:47:07.001668Z | sr.campaign.hrc
2015-10-07T18:15:17.244015Z | sr.campaign.mining_the_disclosures
2015-09-19T08:00:15.747304Z | sr.campaign.rankabrand
2015-10-08T16:24:48.379906Z | sr.campaign.wwf_palm_oil


rows 10 / 1395

category | scraper_category | scraper_id
BPA Free Baby and Sport | BPA Free Baby and Sport Products | sr.campaign.b_corp
Leadership and Organisational Development | Leadership and Organisational Development | sr.campaign.b_corp
Cloud Computing Products and Services | Cloud Computing Products & Services | sr.campaign.b_corp
Business Incubation/Acceleration | Business incubation/acceleration | sr.campaign.b_corp
Fashion Apparel, Accessories and Private Label Garment Production - Design | Fashion apparel, accessories and private label garment production - design | sr.campaign.b_corp
Children's Social Care Services | Children's social care services | sr.campaign.b_corp
Business Consulting | Business Consulting | sr.campaign.b_corp
Sustainable School Supplies and Stationery | Sustainable school supplies and stationery | sr.campaign.b_corp
Marketing and eCommerce | Marketing & eCommerce | sr.campaign.b_corp
Natural Health Product | Natural health product | sr.campaign.b_corp


rows 10 / 1745

category is_implied subcategory
0
1. Consulting Services 2. Social Enterprise Shopfront 3. Talent Education
0
1
100% Local, Clean Energy
0
100% Local
100% Natural, Sugar Free With Stevia Cookies
0
100% Natural
ADHD and Life Skills Coaching
0
ADHD
Business Incubation/Acceleration
0
Acceleration
Calzado, Accesorios, Ropa, Productos De Consumo
0
Accesorios
Accessories for Adopted Animals and Their People
0
Accessories for Adopted Animals
Accounting and Bookkeeping Services for Nonprofits
0
Accounting
Accounting and Consulting Services for Small Businesses and Non-Profits
0
Accounting


rows 10 / 3823

company | scraper_company | scraper_id
Manitowoc Company | Manitowoc Company Inc. | sr.campaign.hrc
Imps&Elfs | Imps&Elfs | sr.campaign.rankabrand
Ben Sherman | Ben Sherman | sr.campaign.rankabrand
Christian Dior | Christian Dior S.A | sr.campaign.rankabrand
Community Services.Net | Community Services.Net Pty Ltd | sr.campaign.b_corp
4th Bin | 4th Bin Inc. | sr.campaign.b_corp
Airblaster | Airblaster, LLC | sr.campaign.rankabrand
L.L. Bean | L.L. Bean | sr.campaign.hrc
Next Retail | Next Retail Ltd. | sr.campaign.rankabrand
Gruppo Coin / OVS | Gruppo Coin / OVS | sr.campaign.bang_accord


rows 10 / 4510

company | company_name | is_alias | is_full
Manitowoc Company | Manitowoc Company | 0 | 0
Manitowoc Company | Manitowoc Company Inc. | 0 | 1
Imps&Elfs | Imps&Elfs | 0 | 1
Ben Sherman | Ben Sherman | 0 | 1
Christian Dior | Christian Dior | 0 | 0
Christian Dior | Christian Dior S.A. | 0 | 1
Community Services.Net | Community Services.Net | 0 | 0
Community Services.Net | Community Services.Net Pty Ltd | 0 | 1
4th Bin | 4th Bin | 0 | 0
4th Bin | 4th Bin Inc. | 0 | 1


rows 10 / 7201

brand | company | scraper_brand | scraper_company | scraper_id
10 Days | 10 Days | 10 Days | 10 Days | sr.campaign.rankabrand
Napco Marketing Corporation | 1800Flowers | Napco Marketing Corporation | 1800Flowers | sr.campaign.free2work
WineTasting Network | 1800Flowers | WineTasting Network | 1800Flowers | sr.campaign.free2work
Fannie May Confections, Inc | 1800Flowers | Fannie May Confections, Inc | 1800Flowers | sr.campaign.free2work
DesignPac Gifts LLC | 1800Flowers | DesignPac Gifts LLC | 1800Flowers | sr.campaign.free2work
1800Flowers | 1800Flowers | 1800Flowers | 1800Flowers | sr.campaign.free2work
Bloomnet | 1800Flowers | Bloomnet | 1800Flowers | sr.campaign.free2work
The Popcorn Factory | 1800Flowers | The Popcorn Factory | 1800Flowers | sr.campaign.free2work
1800-Baskets.com | 1800Flowers | 1800-Baskets.com | 1800Flowers | sr.campaign.free2work
Cheryl&Co. | 1800Flowers | Cheryl&Co. | 1800Flowers | sr.campaign.free2work


rows 10 / 6668

brand | company | facebook_url | is_former | is_licensed | is_prescription | logo_url | tm | twitter_handle | url
10 Days | 10 Days | | 0 | 0 | 0 | | | |
1800-Baskets.com | 1800Flowers | | 0 | 0 | 0 | | | |
1800Flowers | 1800Flowers | | 0 | 0 | 0 | | | |
Bloomnet | 1800Flowers | | 0 | 0 | 0 | | | |
Cheryl&Co. | 1800Flowers | | 0 | 0 | 0 | | | |
DesignPac Gifts LLC | 1800Flowers | | 0 | 0 | 0 | | | |
Fannie May Confections, Inc | 1800Flowers | | 0 | 0 | 0 | | | |
Harry London Candies | 1800Flowers | | 0 | 0 | 0 | | | |
Napco Marketing Corporation | 1800Flowers | | 0 | 0 | 0 | | | |
The Popcorn Factory | 1800Flowers | | 0 | 0 | 0 | | | |


rows 10 / 16540

brand | category | company | is_implied
10 Days | Casual Clothing | 10 Days | 0
10 Days | Fashion, Clothing and Shoes | 10 Days | 1
24Colours | Casual Clothing | 24colours | 0
24Colours | Fashion, Clothing and Shoes | 24colours | 1
Ace | Household | 3M | 0
Ace | Household and Personal Care | 3M | 1
Animalintex | Household | 3M | 1
Animalintex | Household and Personal Care | 3M | 1
Animalintex | Pet Care | 3M | 0
Attest | Household | 3M | 1


rows 10 / 10650

brand | campaign_id | claim | company | date | judgment | scope | url
Ovomaltine | rankabrand | Although ABF purchases its cocoa for European cocoa-based drinks from UTZ and International Cocoa Alliance sources, ABF does not elucidate how much of a market share this represents overall in its business operations. | ABF | | 0 | |
Ovomaltine | rankabrand | ABF indicates that its cocoa for its European cocoa-based drinks (including Ovomaltine) is sourced from UTZ certified cocoa, which encourages its farmers to partake in environmentally-conscious and sustainable farming practices. | ABF | | 1 | |
Ovomaltine | rankabrand | ABF remains a committed member of the Roundtable on Sustainable Palm Oil (RSPO). As a member, ABF will continue to ensure an increased supply of Certified Sustainable palm oil and their requisite facilities, by 2015. Given necessary supplies, all businesses will use only Certified Sustainable Palm Oil. | ABF | | 1 | |
Ovomaltine | rankabrand | ABF (brand owner of Primark) publicly reports ist total climate footprint where both, direct and indirect GHG emissions are covered. | ABF | | 1 | |
Ovomaltine | rankabrand | Neither ABF nor Wander AG specify that it purchases socially certified sugar for its brand Ovomaltine. Neither ABF nor Wander AG specify that it purchases socially certified sugar for its brand Ovomaltine. | ABF | | 0 | |
Ovomaltine | rankabrand | Neither ABF nor Wander AG does specify whether the filling and/or additional products come from sustainable sources. | ABF | | 0 | |
Ovomaltine | rankabrand | Associated British Foods (ABF) subsidiary company Wander AG which produces Ovomaltine has taken several policy measures to reduce carbon emissions, such as the use of renewable energy and energy efficiency measures. | ABF | | 1 | |
Ovomaltine | rankabrand | Neither ABF nor Wander AG communicate whether its fillings and/or additional ingredients for Ovomaltine are purchased from socially certified sources. | ABF | | 0 | |
Ovomaltine | rankabrand | Wander AG (producer of Ovomaltine) reports to further reduce its GHG emissions by 15% until 2020. Also, Wander AG reports to be "free of CO2". However, neither details on that nor a base year for its reduction goal by 2020 are provided. | ABF | | 0 | |
Ovomaltine | rankabrand | ABF communicates that it is involved with the International Cocoa Initiative and that all of its cocoa based drinks sold in Europe are UTZ certified Sustainable Cocoa certified. However, there is no specific mention of signing an agreement to ensure 100% of its products are certified by 2020. | ABF | | 0 | |


rows 10 / 3706

brand | campaign_id | company | date | description | grade | judgment | max_score | min_score | num_ranked | rank | scope | score | url
Ovomaltine | rankabrand | ABF | 2014-07-01 | Reasonable, could be better | C | 0 | 22.0 | 0.0 | | | | 8.0 |
Dr. Martens | rankabrand | AIR WAIR Int./ R. Griggs Group | 2015-02-25 | Dont buy | E | -1 | 36.0 | 0.0 | | | | 3.0 |
Asics | rankabrand | ASICS | 2015-06-24 | Dont buy | E | -1 | 36.0 | 0.0 | | | | 2.0 |
Asos | rankabrand | ASOS | 2015-01-28 | First milestones, should be better | D | -1 | 36.0 | 0.0 | | | | 9.0 |
ASUS | rankabrand | ASUS | 2014-12-01 | First milestones, should be better | D | -1 | 39.0 | 0.0 | | | | 6.0 |
Hollister | rankabrand | Abercrombie & Fitch | 2015-05-15 | Dont buy | E | -1 | 32.0 | 0.0 | | | | 3.0 |
Acer | rankabrand | Acer | 2014-12-01 | First milestones, should be better | D | -1 | 39.0 | 0.0 | | | | 9.0 |
Acne | rankabrand | Acne Studios | 2013-11-12 | Reasonable, could be better | C | 0 | 18.0 | 0.0 | | | | 7.0 |
Adidas | rankabrand | Adidas | 2015-06-23 | First milestones, should be better | D | -1 | 36.0 | 0.0 | | | | 11.0 |
Reebok | rankabrand | Adidas | 2015-06-23 | First milestones, should be better | D | -1 | 36.0 | 0.0 | | | | 11.0 |

Statistics

Average successful run time: 3 minutes

Total run time: 14 days

Total cpu time used: about 8 hours

Total disk space used: 12.2 MB

History

  • Auto ran revision 5139c3c7 and failed .
    nothing changed in the database
    3 pages scraped
  • Auto ran revision 5139c3c7 and failed .
    nothing changed in the database
    3 pages scraped
  • Auto ran revision 5139c3c7 and failed .
    nothing changed in the database
    3 pages scraped
  • Auto ran revision 5139c3c7 and failed .
    nothing changed in the database
    3 pages scraped
  • Auto ran revision 5139c3c7 and failed .
    nothing changed in the database
    3 pages scraped
  • ...
  • Created on morph.io


Scraper code

Python

msd / scraper.py