
Election Data Harvesting for Nov 2018

Direct link to this page: https://copswiki.org/Common/ElectionDataHarvestingForNov2018

Overview

We need help to harvest election data on election night and as the elections are subsequently processed. We have software that automatically checks election results at a regular interval and downloads and archives them whenever they change. Unfortunately, our election officials are all over the map in terms of where they place their election data, so we need help setting up our software with links to those results. This is where you can help, from anywhere in the world with an internet connection! (This tool is experimental at this stage, and we are frequently making improvements and changes to adapt to what we find.)
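The check-and-archive cycle described above can be sketched as follows. This is a minimal illustration, not the actual harvester code: it assumes the object is fetched with Python's standard `urllib`, that a change is detected by comparing SHA-256 digests of the downloaded bytes, and that the names `check_once`, `poll`, `save_to_dir`, and the timestamped file naming are hypothetical.

```python
import hashlib
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# Hypothetical callback type: receives (archive_file_name, content_bytes).
SaveFn = Callable[[str, bytes], None]


def fetch_url(url: str) -> bytes:
    """Download the object at `url` using only the standard library."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def save_to_dir(archive_dir: Path) -> SaveFn:
    """Return a SaveFn that writes each archived version into archive_dir."""
    def save(name: str, data: bytes) -> None:
        archive_dir.mkdir(parents=True, exist_ok=True)
        (archive_dir / name).write_bytes(data)
    return save


def check_once(fetch: Callable[[], bytes], last_digest: Optional[str],
               save: SaveFn, prefix: str, ext: str) -> Optional[str]:
    """Fetch the object once and archive it only if its bytes changed.

    Returns the new SHA-256 digest when a new version was archived,
    or None when the content is identical to the previous scan.
    """
    data = fetch()
    digest = hashlib.sha256(data).hexdigest()
    if digest == last_digest:
        return None  # unchanged since the last scan
    stamp = time.strftime("%Y%m%d-%H%M%S")
    save(f"{prefix}_{stamp}.{ext}", data)
    return digest


def poll(url: str, interval_s: int, archive_dir: Path, prefix: str, ext: str) -> None:
    """Check `url` every `interval_s` seconds until interrupted."""
    last: Optional[str] = None
    save = save_to_dir(archive_dir)
    while True:
        new = check_once(lambda: fetch_url(url), last, save, prefix, ext)
        if new is not None:
            last = new
            print(f"{prefix}: new version archived")
        time.sleep(interval_s)
```

For example, `poll(url, 15 * 60, Path("archive"), "CA_SanDiego_Results", "htm")` would check the object every 15 minutes. Passing `fetch` and `save` as callables keeps the change-detection logic testable without a network connection.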

What we can support

At this point, we support both direct links and indirect links.
  • direct - the URL provided accesses an object (like a web page, PDF, .doc, .csv, .xls, etc.) which will be updated from time to time but will always be at the same address.
  • indirect - the link accesses a web page containing a link to the object under some fixed LINKTEXT; the target of that link changes each time the data is changed.
    • For example:
      • The CPUC publishes a "Daily Calendar" as a new file every business day.
      • Try accessing it indirectly by first going to this page: http://docs.cpuc.ca.gov/SearchRes.aspx?DocTypeID=9&Latest=1 (This takes a few seconds to load).
      • Then click the link labeled "PDF".
      • Thus, to configure the harvester to access this object:
        • URL = http://docs.cpuc.ca.gov/SearchRes.aspx?DocTypeID=9&Latest=1
        • Extract Type = Indirect
        • Link Text = PDF
        • File Extension = PDF
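The indirect lookup in this example (load the page, find the link under the Link Text, keep it only if it has an allowed extension) can be sketched with Python's standard `html.parser`. This is a hypothetical illustration, not the harvester's implementation; `LinkCollector` and `resolve_indirect` are made-up names, and matching the extension against the URL path is an assumption about how "Allowed File Extension(s)" is applied.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible link text) pairs for every <a href=...> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "Latest\n Results" matches "Latest Results".
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def resolve_indirect(page_html: str, base_url: str, link_text: str,
                     allowed_exts: str = "") -> Optional[str]:
    """Return the absolute URL of the first link whose text equals link_text.

    Matching is case-insensitive. allowed_exts is a comma-separated list such
    as "PDF,CSV"; when given, links whose URL path lacks one of those
    extensions are skipped.
    """
    wanted = " ".join(link_text.split()).lower()
    exts = [e.strip().lower().lstrip(".") for e in allowed_exts.split(",") if e.strip()]
    parser = LinkCollector()
    parser.feed(page_html)
    parser.close()
    for href, text in parser.links:
        if text.lower() != wanted:
            continue
        url = urljoin(base_url, href)  # relative links resolve against the page URL
        path = urlparse(url).path.lower()
        if exts and not any(path.endswith("." + e) for e in exts):
            continue
        return url
    return None
```

To follow the CPUC example, one would download the starting URL (for instance with `urllib.request.urlopen`) and call `resolve_indirect(html, starting_url, "PDF", "PDF")`. This assumes the page's "PDF" link points at a URL ending in ".pdf"; if it does not, leave `allowed_exts` empty.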

How you can help

We need volunteers who can
  • visit the websites of election districts, locate the election results web page, and find its link (URL).
    • First check the attachment list below to see if those results are already being harvested.
    • Find out if the link stays the same as the data is changed, or if a new document is created each time at a new link.
    • If it is the latter, we need to back up one page and find the link to the intermediate page that then provides the latest link under some unchanging Link Text, like "Latest Results."
  • Submit each link separately in the form below.
  • We will review your submissions to make sure each one seems reasonable before we add it to the harvesting tool.
  • You can monitor the progress of your submission through email alerts:
    • You should get the first email as soon as the task is added to the harvester task list.
    • Whenever the object is updated, you will receive an email with a link to the new object.
    • If there is any problem accessing the page, or if it is archiving too often (with no obvious changes), we will need to disable it.
  • The harvesting task can be updated by submitting a new request and by using the same email address and prefix.
  • The harvested documents will be added as attachments to this page.

Submission

We have two ways to submit tasks for harvesting: Batch Mode and Individual Submission.

Batch Mode

Please create a spreadsheet (such as a Google Docs spreadsheet, Excel, LibreOffice Calc, or CSV file) with the following fields:
  • State - Two-character state code
  • District - Proper name of the district
  • ObjectURL - The URL of the resource
Email the following information to us:
  • The spreadsheet, or the URL of the Google Docs spreadsheet.
  • Name of the person who is submitting it.
  • Email address for alerts (if you want them).
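Before emailing a batch, a volunteer could sanity-check the spreadsheet (saved as CSV) with something like the following. This is only a sketch: the column names come from the field list above, but the specific checks (a letters-only two-character state code, a non-empty district, an http/https URL) and the `check_batch` name are assumptions about what a reviewer would accept.

```python
import csv
import io
from typing import Dict, List, Tuple
from urllib.parse import urlparse

REQUIRED = ("State", "District", "ObjectURL")


def check_batch(csv_text: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split batch rows into (valid rows, error messages)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [f for f in REQUIRED if f not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing column(s): {', '.join(missing)}"]
    good: List[Dict[str, str]] = []
    errors: List[str] = []
    for rowno, row in enumerate(reader, start=1):  # data rows, header excluded
        # Short rows yield None values, so fall back to "" before stripping.
        state = (row["State"] or "").strip().upper()
        district = (row["District"] or "").strip()
        url = (row["ObjectURL"] or "").strip()
        parsed = urlparse(url)
        if len(state) != 2 or not state.isalpha():
            errors.append(f"row {rowno}: State must be a two-character code")
        elif not district:
            errors.append(f"row {rowno}: District is empty")
        elif parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"row {rowno}: ObjectURL is not an http(s) URL")
        else:
            good.append({"State": state, "District": district, "ObjectURL": url})
    return good, errors
```

Each row reports at most one problem, so fixing the reported error and rerunning may reveal another one in the same row.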

Individual Submission

Individual submission gives you more control over each task. Additional advanced controls are not shown here and can be added by our team in special cases.

Election Data Harvesting Submission Form
Please complete the following form for each harvester task. See Form Completion Notes below for details on each field.
Your Full Name (reqd):
Your Email (reqd):
Your Phone:
State of Election District (reqd):
County or District Name (reqd):
Unique Task Item Prefix (reqd):
Enable Task: Enable      Disable
Starting URL:
Extraction Type: Direct      Indirect
Summary of this Task Item:
(For Indirect) Link Text:
(For Indirect) Allowed File Extension(s):
Default Object File Extension(s):
Scan Interval: 5 min.      15 min.      Hourly      4 hours      Daily      Weekly
No-Change Alert Interval: Daily      Weekly      Monthly      None
Enable Email Alerts: Enable      Disable
Archive Each Version / Alert Only: Archive Each Version      Alert Only
Comments:

Form Completion Notes

You don't need to include quotes in your submission fields, even if shown below. If you want to make corrections to a submission which was already accepted, resubmit the form with the same Prefix.

  • Your Full Name - Please enter your full name (first and last).
  • Your Email - Your email address. This identifies your account, and Harvester will send alert notices to this address.
  • Your Phone - Your phone number. We probably won't use this unless we need to discuss your submission because of an error or because your email isn't working.
  • State of Election District - The state of the district whose data is to be harvested, not necessarily your own state.
  • County or District Name - Normally the county election district, but this varies by state. If you are monitoring statewide results, use "Statewide".
  • Unique Task Item Prefix - Choose a prefix for the data files that will make sense, with no spaces (underscores are okay). So that results for a state are grouped together, use the two-character state code as the first two characters, then the district, and then the type of file, like "CA_SanDiego_Results". Remember what you entered here so you can update your prior requests. The prefix should be unique among all requests you submit.
  • Enable Task - Normally enabled. To disable an entry, submit the same entry again (same prefix) and choose Disable; no other fields in that submission will be considered or updated.
  • Starting URL - If "Extraction Type" is "Direct", this is the full URL of the item to be archived. If "Extraction Type" is "Indirect", this is the full (unchanging) URL of a web page containing hyperlinked text whose (changing) link points to the item to be archived.
  • Extraction Type - Choose "Direct" if the object to be archived is a file at a single, fixed URL whose contents are updated in place. "Indirect" means that the harvester first visits an intervening web page to find the (changing) link to the data item under the hyperlinked "Link Text".
  • Link Text - Used only with Indirect extraction. This is the text on the intervening page whose hyperlink points to the data item; the target of that link changes with each revision of the object.
  • Allowed File Extension(s) - Used only with Indirect extraction. Provide the file extension of the data item, like "PDF", "DOC", "XML", "XLS", "CSV", etc. (case insensitive). This excludes spurious matches on the intervening page. To allow multiple types, list the extensions separated by commas.
  • Default Object File Extension(s) - The file extension used for the work file and archive files so they can be easily opened. Default is 'htm'. This replaces active (server-side) extensions like 'aspx'.
  • Scan Interval - How often the object is checked. Default is 15 minutes. Set this as slow as reasonably possible to avoid harvesting too many intermediate revisions.
  • No-Change Alert Interval - If no update is seen within this interval, Harvester will send an email alert. The actual alert time is set 20% longer than the interval shown to allow some slack for various factors (holidays, etc.).
  • Archive Each Version / Alert Only - In some cases, an alert when the object changes may be sufficient, and you can then view it yourself at the URL. However, if the object changes dynamically, as election results do, it is best to archive each new version to provide a history of changes to the object.
  • Enable Email Alerts - Set this to Disable if you do not want email alerts. Enabled by default.
  • Comments - Note anything unusual you found, any trouble finding the links, or any need for intervals other than those shown.
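The field rules above can be captured in a small task record. This is a hypothetical sketch, not the harvester's actual data model: the `HarvestTask` class, the prefix pattern (two uppercase letters, an underscore, then letters, digits, or underscores), the set of "active" extensions, the Daily default for the no-change interval, and treating Monthly as 30 days are all assumptions. The 15-minute scan default, the 'htm' default, and the 20% slack come from the notes above.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional
from urllib.parse import urlparse

# Scan Interval choices from the form, in minutes.
SCAN_INTERVALS = {"5 min.": 5, "15 min.": 15, "Hourly": 60,
                  "4 hours": 240, "Daily": 1440, "Weekly": 10080}
# No-Change Alert Interval choices; "Monthly" as 30 days is an assumption.
NO_CHANGE_INTERVALS = {"Daily": timedelta(days=1), "Weekly": timedelta(weeks=1),
                       "Monthly": timedelta(days=30), "None": None}
# Assumed set of "active" server-side extensions to replace in archive names.
ACTIVE_EXTS = {"aspx", "asp", "php", "jsp", "cgi"}
# Assumed prefix pattern: state code, underscore, then letters/digits/underscores.
PREFIX_RE = re.compile(r"^[A-Z]{2}_[A-Za-z0-9_]+$")


@dataclass
class HarvestTask:
    """Hypothetical record for one harvester task (not the real data model)."""
    prefix: str
    state: str
    scan_interval: str = "15 min."        # documented default
    no_change_interval: str = "Daily"     # assumed default
    default_ext: str = "htm"              # documented default

    def problems(self) -> List[str]:
        """List the reasons this task would be rejected (empty if none)."""
        found = []
        if not PREFIX_RE.match(self.prefix):
            found.append("prefix must look like CA_SanDiego_Results (no spaces)")
        elif self.prefix[:2] != self.state.strip().upper():
            found.append("prefix must start with the two-character state code")
        if self.scan_interval not in SCAN_INTERVALS:
            found.append(f"unknown scan interval {self.scan_interval!r}")
        if self.no_change_interval not in NO_CHANGE_INTERVALS:
            found.append(f"unknown no-change interval {self.no_change_interval!r}")
        return found

    def alert_after(self) -> Optional[timedelta]:
        """No-change alert delay: the chosen interval plus 20% slack."""
        base = NO_CHANGE_INTERVALS[self.no_change_interval]
        return None if base is None else base * 6 / 5

    def archive_ext(self, object_url: str) -> str:
        """Extension for saved copies: keep the URL's own extension unless it
        is missing or active (e.g. 'aspx'), in which case use default_ext."""
        name = urlparse(object_url).path.rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if not ext or ext in ACTIVE_EXTS:
            return self.default_ext.lower().lstrip(".")
        return ext
```

With these assumptions, a Daily no-change interval produces an alert after 28 hours 48 minutes (24 hours plus 20%), and a results page at `Results.aspx` would be archived with an `.htm` extension.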

Archived Objects

This list is now hardcoded here because the harvester has been turned off for this election. "Equivalent" files have been purged. (It turns out that HTML files frequently contain values in hidden fields that change even when the rest of the file is unchanged. To determine whether two versions are equivalent, each is reduced to printing characters only, with all whitespace removed, and the results are compared. It may be possible to do a better job in this respect.)
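The note leaves "printing characters" open to interpretation. One reading, sketched below under that assumption, is to compare only the page's visible text: tag attributes (including hidden-field values such as an ASP.NET `__VIEWSTATE`) and script/style contents are dropped, then non-printing characters and all whitespace are removed. `fingerprint` and `equivalent` are hypothetical names; this is not the harvester's actual code.

```python
from html.parser import HTMLParser
from typing import List


class _TextOnly(HTMLParser):
    """Keep only character data outside <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        # Attribute values (e.g. hidden <input value=...>) are never kept.
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def fingerprint(html: str) -> str:
    """Reduce a page to its printable, non-whitespace text."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    return "".join(ch for ch in text if ch.isprintable() and not ch.isspace())


def equivalent(old_html: str, new_html: str) -> bool:
    """True if two versions differ only in markup, hidden fields, or whitespace."""
    return fingerprint(old_html) == fingerprint(new_html)
```

Because numbers and labels in the visible text survive the reduction, a real change in vote counts still makes two versions non-equivalent.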

Project Form

  • Project Name - Election Data Harvesting for Nov 2018
  • Project Description - Automated election data harvesting.
  • Project Founder - Ray Lutz
  • Project Curator - Ray Lutz
  • Project Type - Issue Oversight
  • Project Parents - Election Integrity
  • Related Keywords - Election Team
  • Project Status - Hot
  • Publish Status - Published
Topic revision: r1 - 29 Sep 2020, RaymondLutz
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.