I am sure that this tool already exists, but I could not find it.
What I need is a tool that takes an XML dump (see the two attached files) and extracts the following fields into an Excel spreadsheet:
First Author First Name (first author in the author list)
First Author Last Name
Journal Name (e.g. Nature) (in the XML dump it usually appears as Title)
The program should check whether the last name of the first author appears in the email address. If it does not, the first and last name of the Last Author (the last author in the author list) on the publication should be used instead.
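To make the rule concrete, here is a rough Python sketch of the check. The function name pick_author and the (first name, last name) tuple layout are made up for illustration, and the case-insensitive substring match is an assumption about what "present in the email address" should mean. For entries with no email at all, this sketch follows the rule literally and falls back to the last author; that is also an assumption and is easy to change.

```python
from typing import Optional, Sequence, Tuple

Author = Tuple[str, str]  # hypothetical layout: (first_name, last_name)


def pick_author(authors: Sequence[Author], email: Optional[str]) -> Optional[Author]:
    """Return the first author if their last name occurs in the email,
    otherwise the last author in the list."""
    if not authors:
        return None
    first = authors[0]
    last_name = first[1]
    # Assumption: a case-insensitive substring match counts as "present".
    if email and last_name and last_name.lower() in email.lower():
        return first
    # Assumption: a missing email is treated as "last name not present".
    return authors[-1]
```

For example, pick_author([("Jane", "Smith"), ("Bob", "Jones")], "jsmith@uni.edu") would return ("Jane", "Smith"), while the same list with "bjones@uni.edu" would return ("Bob", "Jones").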
The program must tolerate files with twenty thousand records.
Only about 20% of entries have emails.
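For the file size requirement, one possible approach is to stream the XML rather than load it whole. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree.iterparse. The element names Record, Author, FirstName, LastName and Title are placeholders, since the real names depend on the attached dump, and the function name dump_to_csv is made up. It writes CSV, which Excel can open, instead of a native Excel file, and it only pulls the first author and the title; the email check is left out here.

```python
import csv
import xml.etree.ElementTree as ET


def dump_to_csv(xml_source, csv_target, record_tag="Record"):
    """Stream records from xml_source and write one CSV row per record.

    xml_source: a filename or a binary file object.
    csv_target: a text file object opened for writing.
    Returns the number of records written.
    """
    writer = csv.writer(csv_target)
    writer.writerow(["First Name", "Last Name", "Journal"])
    count = 0
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Assumption: <Author> elements are direct children of the record,
        # so find() returns the first author in document order.
        first_author = elem.find("Author")
        first = last = ""
        if first_author is not None:
            first = first_author.findtext("FirstName", default="")
            last = first_author.findtext("LastName", default="")
        writer.writerow([first, last, elem.findtext("Title", default="")])
        count += 1
        # Drop the record's children so processed records do not pile up in memory.
        elem.clear()
    return count
```

With real files this might be called as dump_to_csv("dump.xml", open("out.csv", "w", newline="", encoding="utf-8")), where both file names are placeholders.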