I need data on United States federal agency rules scraped from the [url removed, login to view] website. I will provide a list of Regulatory Information Numbers (RINs). The job requires searching the [url removed, login to view] website for each RIN ([url removed, login to view]). There are about 569 unique RINs for which I need data. Some RIN searches will return no results because the database only covers rules produced after 1995.
Each search returns one or more records per RIN, and I need data from every record. Each RIN will therefore produce as many rows of data as there are records in its search results (see, for example, what is produced by a search for RIN #2060-AM06). This can range from 1 to as many as 20 rows of data for each RIN.
The columns would be data scraped from the “View all RIN Data” page for each search (see, for example, what this looks like for RIN #2060-AM06).
Note: The website includes a link to RIN data in XML. It may be easier to simply put all of this XML data into a spreadsheet.
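To illustrate the XML route, here is a minimal sketch of flattening XML records into spreadsheet rows, one row per record. The element names (`RIN_DATA`, `RECORD`, `PUBLICATION_ID`, `TITLE`) and the sample values are hypothetical placeholders; the site's actual XML layout is not described here and will likely differ, so the tag names would need to be adjusted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the real element names and values in the site's XML
# are assumptions made for illustration only.
SAMPLE_XML = """
<RIN_DATA>
  <RECORD>
    <RIN>2060-AM06</RIN>
    <PUBLICATION_ID>Fall 2005</PUBLICATION_ID>
    <TITLE>Example rule title</TITLE>
  </RECORD>
  <RECORD>
    <RIN>2060-AM06</RIN>
    <PUBLICATION_ID>Spring 2006</PUBLICATION_ID>
    <TITLE>Example rule title</TITLE>
  </RECORD>
</RIN_DATA>
"""


def xml_to_rows(xml_text, record_tag="RECORD"):
    """Return one dict per record element, keyed by child element tag."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; the header is the union of all keys seen."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval="" leaves a cell blank when a record lacks that field.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
print(rows_to_csv(rows))
```

The resulting CSV text can be saved to a file and opened directly in a spreadsheet program. Nested elements (for example, repeated timetable entries) would need extra handling beyond this flat sketch.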
At minimum, the specific data I need from the web pages are:
1. RIN [1 column]
2. Publication ID [1 column]
3. Publication year [1 column]: From the Publication ID
4. Publication season [1 column]: From the Publication ID. This should be either Fall or Spring.
5. Title [1 column]
6. Agency [1 column]
7. Priority [1 column]
8. RIN Status [1 column]
9. Agenda Stage of Rulemaking [1 column]
10. Major [1 column]
11. Unfunded Mandates [1 column]
12. CFR Citation [1 column]
13. Legal Authority [1 column]
14. Legal Deadline [Many columns]: In some cases this will say “None”. In other cases it will include data on an action, source, description, and date, and there may be more than one entry. I need the action, source, description, and date for each deadline listed in the table that follows “Legal Deadline” (if there is one). So, in a case where there are two entries for legal deadline, there should be 8 columns of data filled (action, source, description, date; action, source, description, date). In cases where there are no deadlines, these columns would be blank.
15. Timetable [Many columns]: This portion of the page often includes multiple entries (i.e., multiple rows of data). Each entry has an Action, a Date, and an FR Cite. In most cases there is more than one entry here. I need all of this data. So, in a case where there are four entries for timetable, there should be 12 columns of data filled (action, date, fr cite; action, date, fr cite, etc.).
16. Deadline [1 column]: The way the agency indicates that they are setting a deadline is that the day within a date is set at 00. So, for example, a date of 06/00/2005 (mm/dd/yyyy) is a deadline because there is no specific day listed. I would like a column that is coded with a 1 if any of the dates in the timetable have this format (indicating a deadline).
17. Regulatory Flexibility Analysis [1 column]
18. Government Levels Affected [1 column]
19. Federalism [1 column]
20. Included in the Regulatory Plan [1 column]
21. Related RINs [1 column]
22. Agency Contact: [9 columns; I need one column each for Name, Title, Agency, Sub-agency, Address 1, Address 2, Address 3, Phone, Email]
This data should be collected for each record for each RIN and placed in the columns.
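The derived columns (items 3, 4, and 16) and the repeating groups (items 14 and 15) could be handled along the lines of the sketch below. It assumes the Publication ID is text containing a season word and a four-digit year (such as "Fall 2005"), that the deadline column is coded 0 when no date qualifies, and that scraped entries arrive as dictionaries. The function names, column-naming scheme, and sample entries are all hypothetical.

```python
import re

# Matches mm/00/yyyy, i.e. a date whose day is set to 00.
DEADLINE_DATE = re.compile(r"^\d{2}/00/\d{4}$")


def parse_publication_id(publication_id):
    """Return (year, season) from an ID assumed to look like 'Fall 2005'."""
    season = re.search(r"(Fall|Spring)", publication_id, re.IGNORECASE)
    year = re.search(r"(\d{4})", publication_id)
    return (
        int(year.group(1)) if year else None,
        season.group(1).capitalize() if season else None,
    )


def deadline_flag(timetable):
    """Return 1 if any timetable date has the form mm/00/yyyy, else 0."""
    return int(
        any(DEADLINE_DATE.match(entry.get("date", "").strip()) for entry in timetable)
    )


def widen(entries, prefix, fields):
    """Spread repeated entries across numbered columns, e.g. timetable_1_action."""
    columns = {}
    for index, entry in enumerate(entries, start=1):
        for field in fields:
            columns[f"{prefix}_{index}_{field}"] = entry.get(field, "")
    return columns


# Hypothetical timetable entries for one record (values are made up).
timetable = [
    {"action": "Example action A", "date": "06/00/2005", "fr_cite": ""},
    {"action": "Example action B", "date": "12/15/2006", "fr_cite": "example cite"},
]

year, season = parse_publication_id("Fall 2005")
row = {"rin": "2060-AM06", "publication_year": year, "publication_season": season}
row.update(widen(timetable, "timetable", ["action", "date", "fr_cite"]))
row["deadline"] = deadline_flag(timetable)
print(row)
```

Because the number of legal deadline and timetable entries varies by record, the spreadsheet header would need as many numbered column groups as the longest record has; shorter records simply leave the extra columns blank.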
36 freelancers are bidding an average of $380 for this job
Hi, thanks for your consideration. I am an expert in web scraping with good feedback. I have visited the site and am sure I can get what you want. I hope to work with you. Thanks.
I have seen your requirements and read all the details. You can see from my profile that I deliver as promised, so I will take care of your project and provide 100% results. Thanks, Ahmed
Hi, I have read your project description. I am willing to do this job for you and ready to start work immediately, following your instructions. Kind regards, Noushin Jahan