Exporting Delicious Bookmarks anno 2017
The Export tab in your Delicious bookmarks account shows it has been disabled due to heavy load on the database. Too bad, but that doesn't prevent me from using the web scraping trick to download all my bookmarks.
What scraper to use?
To select the elements to scrape directly on a Delicious page, I use Web Scraper by Martins Balodis. The documentation on webscraper.io is very clear. Web Scraper is integrated into the Chrome Developer Tools, so open those to find the Web Scraper tab.
Creating a sitemap
A Web Scraper sitemap is a description of the elements you want to scrape from the webpage, taking into account that each bookmarks page contains a subset of your total collection.
{
"_id": "delicious",
"selectors": [
{
"delay": "",
"id": "per_page_bookmarks_enumeration",
"multiple": false,
"parentSelectors": [
"_root"
],
"selector": "div.profileMidpanel",
"type": "SelectorElement"
},
{
"delay": "",
"id": "single_bookmark_data",
"multiple": true,
"parentSelectors": [
"per_page_bookmarks_enumeration"
],
"selector": "div.articleThumbBlockOuter",
"type": "SelectorElement"
},
{
"delay": "",
"id": "bookmark_title",
"multiple": false,
"parentSelectors": [
"single_bookmark_data"
],
"regex": "",
"selector": "a.title",
"type": "SelectorText"
},
{
"delay": "",
"id": "bookmark_link",
"multiple": false,
"parentSelectors": [
"single_bookmark_data"
],
"regex": "",
"selector": "div.articleInfoPan p:nth-of-type(1)",
"type": "SelectorText"
},
{
"delay": "",
"extractAttribute": "a",
"id": "tag",
"parentSelectors": [
"single_bookmark_data"
],
"selector": "ul.tagName li",
"type": "SelectorGroup"
},
{
"delay": "",
"id": "description",
"multiple": false,
"parentSelectors": [
"single_bookmark_data"
],
"regex": "",
"selector": "div.thumbTBriefTxt p:nth-of-type(2)",
"type": "SelectorText"
}
],
"startUrl": [
"https://del.icio.us/<your_account_id>",
"https://del.icio.us/<your_account_id>?&page=[2-<your_maximum_page>]"
]
}
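The parentSelectors fields are what turn this flat list of selectors into a tree: each page element contains many bookmark elements, and each bookmark element holds the title, link, tags and description. The short Python sketch below is not part of Web Scraper. It only reads a trimmed copy of the sitemap (ids, types and parents) and prints that nesting, which can help when you adapt the selectors to a changed page layout.

```python
import json

# Trimmed copy of the sitemap: only the fields needed to show the nesting.
SITEMAP_JSON = """
{
  "_id": "delicious",
  "selectors": [
    {"id": "per_page_bookmarks_enumeration", "parentSelectors": ["_root"], "type": "SelectorElement"},
    {"id": "single_bookmark_data", "parentSelectors": ["per_page_bookmarks_enumeration"], "type": "SelectorElement"},
    {"id": "bookmark_title", "parentSelectors": ["single_bookmark_data"], "type": "SelectorText"},
    {"id": "bookmark_link", "parentSelectors": ["single_bookmark_data"], "type": "SelectorText"},
    {"id": "tag", "parentSelectors": ["single_bookmark_data"], "type": "SelectorGroup"},
    {"id": "description", "parentSelectors": ["single_bookmark_data"], "type": "SelectorText"}
  ]
}
"""


def selector_tree_lines(sitemap: dict, parent: str = "_root", depth: int = 0) -> list[str]:
    """Return an indented outline of all selectors nested below `parent`."""
    lines = []
    for sel in sitemap["selectors"]:
        if parent in sel["parentSelectors"]:
            # Two spaces of indentation per nesting level.
            lines.append("  " * depth + f"{sel['id']} ({sel['type']})")
            lines.extend(selector_tree_lines(sitemap, sel["id"], depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(selector_tree_lines(json.loads(SITEMAP_JSON))))
```

The output shows per_page_bookmarks_enumeration at the top, single_bookmark_data one level below it, and the four data selectors (title, link, tag, description) one level deeper.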
The sitemap also contains the set of URLs to scrape. At the end of the sitemap JSON, you will see the startUrl section. Replace <your_account_id> with your Delicious account name. I listed the start page followed by the enumeration from page 2 up to the maximum page number in your collection. Before you replace <your_maximum_page> with your real total number, first try a small subset, e.g. 2-5.
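A range such as [2-5] in a start URL makes Web Scraper visit one URL per page number. The Python sketch below reproduces that expansion for the simple integer form only, so you can check which pages a given range covers before you start a long scrape. The account name example_user is hypothetical, and Web Scraper's own range handling (for instance zero-padded ranges) may go beyond what this sketch covers.

```python
import re


def expand_start_url(url: str) -> list[str]:
    """Expand a single [start-end] integer range in a URL into one URL per page.

    Only the plain form like [2-5] is handled; a URL without a range
    is returned unchanged as a one-element list.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


if __name__ == "__main__":
    account = "example_user"  # hypothetical Delicious account name
    start_urls = [
        f"https://del.icio.us/{account}",
        f"https://del.icio.us/{account}?&page=[2-5]",
    ]
    for template in start_urls:
        for url in expand_start_url(template):
            print(url)
```

With the test range 2-5, the two start URLs together cover pages 1 to 5 of the collection.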
Under Create new sitemap, choose Import sitemap and paste the sitemap JSON.
Now start scraping.
When the scraping is done, you can Browse the results under the Sitemap section.
Then choose Export as CSV so you can post-process them.
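What post-processing means depends on where the bookmarks go next. As one possible example, the Python sketch below turns the exported CSV into a Netscape-style bookmark HTML file, a format that browsers can generally import. It assumes the CSV columns are named after the selector ids in the sitemap (bookmark_title, bookmark_link, description). It ignores the tag column because the exact layout of the group selector's output is not covered here. The sample row is made up.

```python
import csv
import html
import io


def csv_to_netscape(csv_text: str) -> str:
    """Convert scraped bookmark rows into a Netscape bookmark HTML file.

    Assumes columns named bookmark_title, bookmark_link and description;
    any other columns (such as tag) are ignored.
    """
    out = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        link = (row.get("bookmark_link") or "").strip()
        if not link:
            continue  # skip rows where no link was scraped
        # Fall back to the link itself when the title is empty.
        title = (row.get("bookmark_title") or link).strip()
        out.append(f'    <DT><A HREF="{html.escape(link)}">{html.escape(title)}</A>')
        description = (row.get("description") or "").strip()
        if description:
            out.append(f"    <DD>{html.escape(description)}")
    out.append("</DL><p>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (  # made-up example row
        "bookmark_title,bookmark_link,description\n"
        'Example,https://example.com/,"Notes & links"\n'
    )
    print(csv_to_netscape(sample))
```

To convert a real export, pass the file's contents instead of the sample, e.g. open("delicious.csv", encoding="utf-8", newline="").read(), where delicious.csv is a hypothetical file name for the downloaded CSV.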