Using the SDK for Syncs or Imports
Syncs and Imports are both ways to load content into a collection. Both can create board groups, boards, sections, cards, tags, and attachments (images, PDFs, etc.). Imports are one-time, and the imported content is editable in Guru. Synced content is not editable in Guru, but you can run the sync again to update it.
Imports and syncs both use the same .zip format, and the SDK helps you create the files in that format. You tell the SDK what content you have and how it's structured:
- What nodes exist. Each node needs an ID and a title.
- The content associated with each node, as either HTML or Markdown.
- How the nodes relate to each other: nodes can be nested inside one another.
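The three pieces of information above can be pictured as a small tree. The sketch below is a hypothetical, standalone model written only for illustration: the `Node` class, its `add()` method, and the `walk()` helper are assumptions and are not part of the Guru SDK, which you drive through `bundle.node()` and `add_to()` instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for what you tell the SDK about one node (not an SDK class)."""
    id: str                                   # every node needs an ID...
    title: str                                # ...and a title
    content: Optional[str] = None             # HTML or Markdown; grouping nodes may have none
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        # Nesting: record that `child` lives inside this node.
        self.children.append(child)
        return child


def walk(node: Node, depth: int = 0):
    """Yield (depth, node) pairs in depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


root = Node(id="albums", title="Favorite Albums")
root.add(Node(id="Pet Sounds", title="Pet Sounds", content="<p>Article HTML</p>"))
root.add(Node(id="London Calling", title="London Calling", content="<p>Article HTML</p>"))

for depth, node in walk(root):
    print("  " * depth + node.title)
```

Here the grouping node has no content and each nested node carries its own HTML, which mirrors the structure of the example that follows.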
Our SDK figures out, based on the hierarchy you describe, which items will become board groups, boards, sections, and cards. It'll create the .yaml and .html files required by our import format, zip the files, and upload them to Guru.
Here's an example
Below is a full example that downloads six Wikipedia pages and imports them as a board containing six cards. You can also find this example in the SDK here: https://github.com/guruhq/py-sdk/blob/master/examples/wikipedia_sync1.py
import guru

urls = [
    "https://en.wikipedia.org/wiki/Odessey_and_Oracle",
    "https://en.wikipedia.org/wiki/Pet_Sounds",
    "https://en.wikipedia.org/wiki/London_Calling",
    "https://en.wikipedia.org/wiki/24_Hour_Revenge_Therapy",
    "https://en.wikipedia.org/wiki/...And_Out_Come_the_Wolves",
    "https://en.wikipedia.org/wiki/Left_and_Leaving"
]

g = guru.Guru()
bundle = g.bundle("favorite_albums")

# a content-less node that groups the album pages
favorite_albums = bundle.node(id="albums", title="Favorite Albums")

for url in urls:
    doc = guru.load_html(url)
    body = doc.select(".mw-parser-output")[0]
    title = doc.find(id="firstHeading").text

    # remove elements we don't want in the guru card (the right column, footer links, etc.)
    for el in body.select(".ambox-content, .infobox, [role='navigation'], .wikitable.floatright, #toc, .shortdescription, .hatnote"):
        el.decompose()

    # one node per article; it has HTML content, so it becomes a card
    album_node = bundle.node(
        id=title,
        url=url,
        title=title,
        content=str(body)
    )
    album_node.add_to(favorite_albums)

# write the import files, then preview them locally
bundle.zip()
bundle.view_in_browser()
Now we'll step through each of the key pieces:
g = guru.Guru()
bundle = g.bundle("favorite_albums")
A "bundle" is what we call a set of content that can be loaded into Guru as either a sync or an import. At this point it doesn't matter which one it'll be; you define the content the same way in both cases. You choose between a sync and an import when the content is uploaded to Guru.
favorite_albums = bundle.node(id="albums", title="Favorite Albums")
This creates a node that we'll add the pages to. It needs an ID and a title, but since we're only using it to group the other items, it doesn't have any HTML content associated with it.
for url in urls:
    doc = guru.load_html(url)
    body = doc.select(".mw-parser-output")[0]
    title = doc.find(id="firstHeading").text
    # remove elements we don't want in the guru card (the right column, footer links, etc.)
    for el in body.select(".ambox-content, .infobox, [role='navigation'], .wikitable.floatright, #toc, .shortdescription, .hatnote"):
        el.decompose()
We load each URL and get each page's full HTML. From this HTML we can find the article's title. We also isolate the article's body in two steps:
- We find the .mw-parser-output element, which is Wikipedia's container around the entire article.
- We remove extra elements, like the table of contents, that we don't need in the Guru card.
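The two isolation steps above can be tried offline on a small HTML fragment. This sketch assumes `guru.load_html()` returns a BeautifulSoup document (the example's `select()`, `find()`, and `decompose()` calls are BeautifulSoup methods), so it calls BeautifulSoup directly; the HTML string is a made-up miniature of a Wikipedia page, not real page markup.

```python
from bs4 import BeautifulSoup

# A made-up, heavily simplified stand-in for a Wikipedia article page.
page = """
<html><body>
  <h1 id="firstHeading">Pet Sounds</h1>
  <div class="mw-parser-output">
    <div class="hatnote">Placeholder hatnote text.</div>
    <table class="infobox"><tr><td>Placeholder infobox</td></tr></table>
    <p>First paragraph of the article.</p>
    <div id="toc">Contents</div>
    <p>Second paragraph of the article.</p>
  </div>
</body></html>
"""

doc = BeautifulSoup(page, "html.parser")
title = doc.find(id="firstHeading").text

# Step 1: keep only Wikipedia's article container.
body = doc.select(".mw-parser-output")[0]

# Step 2: drop elements we don't want in the card (same selector list as the example).
for el in body.select(".ambox-content, .infobox, [role='navigation'], "
                      ".wikitable.floatright, #toc, .shortdescription, .hatnote"):
    el.decompose()

print(title)
print([p.get_text() for p in body.find_all("p")])
```

Only the two paragraphs survive, which is the HTML that `str(body)` would hand to `bundle.node()` as the card's content.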
album_node = bundle.node(
    id=title,
    url=url,
    title=title,
    content=str(body)
)
album_node.add_to(favorite_albums)
This creates a node for the Wikipedia article, using the article's title and content. The last line adds it to our "Favorite Albums" node. Each album node becomes a Guru card because it has HTML content; boards don't have content, only cards do. Since we're adding the album nodes to the "Favorite Albums" node, "Favorite Albums" becomes a board.
We know that "Favorite Albums" will become a board, but we don't need to know this or tell the SDK. As the content hierarchy gets more complicated, with more levels of nesting, the SDK figures out what needs to be a board group, board, or section.
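For this two-level example, the rule described above (nodes with content become cards, and a node that holds cards becomes a board) fits in a tiny function. This is a simplified illustration under that assumption only: the `classify` function and the `Node` tuple are hypothetical, and the SDK's real logic, which also decides when to use board groups and sections for deeper nesting, is not reproduced here.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """Hypothetical node: an ID, optional HTML content, and child nodes."""
    id: str
    content: Optional[str] = None
    children: Tuple["Node", ...] = ()


def classify(node: Node, result: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Label each node in a two-level tree as 'card' or 'board' (simplified rule)."""
    if result is None:
        result = {}
    if node.content:
        # Nodes with content become cards.
        result[node.id] = "card"
    elif all(child.content for child in node.children):
        # A node without content whose children are all cards becomes a board.
        result[node.id] = "board"
    else:
        # Deeper nesting is where board groups and sections come in; not modeled here.
        raise ValueError(f"{node.id!r} is nested more deeply than this sketch handles")
    for child in node.children:
        classify(child, result)
    return result


albums = Node("albums", children=(
    Node("Pet Sounds", content="<p>Article HTML</p>"),
    Node("London Calling", content="<p>Article HTML</p>"),
))
print(classify(albums))
# {'albums': 'board', 'Pet Sounds': 'card', 'London Calling': 'card'}
```

The sketch deliberately stops at two levels; the point of the SDK is that you don't have to write this decision logic yourself.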
bundle.zip()
bundle.view_in_browser()
The call to zip() tells the SDK we're done adding content. It can then figure out which nodes are boards, cards, etc. and write the .html and .yaml files.
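To picture what zip() produces, here is a standard-library-only sketch that writes one .html file per card plus a .yaml file describing the structure into an in-memory .zip. The file names, folder layout, and YAML keys are placeholders invented for illustration; the real layout is defined by Guru's import format, which the SDK generates for you.

```python
import io
import zipfile

# Hypothetical card data: (card id, title, HTML body).
cards = [
    ("pet_sounds", "Pet Sounds", "<p>Article HTML</p>"),
    ("london_calling", "London Calling", "<p>Article HTML</p>"),
]

# Placeholder YAML for one board and its cards (these keys are NOT Guru's real schema).
yaml_lines = ["title: Favorite Albums", "items:"]
for card_id, title, _ in cards:
    yaml_lines.append(f"  - id: {card_id}")
    yaml_lines.append(f"    title: {title}")

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("boards/favorite_albums.yaml", "\n".join(yaml_lines) + "\n")
    for card_id, _, body in cards:
        zf.writestr(f"cards/{card_id}.html", body)

# Reopen the archive to list what was written.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    print(zf.namelist())
```

Building the archive in memory keeps the sketch free of file-system side effects; a real script would more likely write to disk.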
The call to view_in_browser() opens a preview page in your web browser so you can see the content before it's loaded into Guru. This page shows the board structure that will be created and lets you preview what each card will look like. Some cards may look different once they're imported and viewed in Guru, but the preview page gives you a quicker way to check the content without waiting for a full import.
Previewing the bundle's content
This is what the preview page looks like:
The left side shows the content hierarchy: all the board groups, boards, sections, and cards that will be created. If your content is deeply nested, the SDK uses board groups and sections to handle the extra levels of nesting. In this example we have six cards grouped under one item, so it becomes one board containing the cards.
The rest of the UI is two iframes: the left one shows the content that will be imported into Guru, and the right one shows the original page. Click a card in the hierarchy on the left to preview it, or use the up/down arrow keys to cycle through the cards.
The Copy Spreadsheet button in the bottom-left copies a summary of the content to the clipboard so you can paste it into a spreadsheet. It looks like this:
When you have a lot of cards, this is a great way to identify large articles that you may want to split into multiple cards.
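If you want a similar size check inside your own script, the sketch below estimates each card's length from its HTML and prints tab-separated rows, which spreadsheets accept when pasted. The columns, the word-count measure, and the 1,000-word threshold are assumptions for illustration; they are not the columns the Copy Spreadsheet button produces.

```python
import re
from html import unescape

# Hypothetical (title, HTML) pairs, e.g. the content you passed to bundle.node().
cards = [
    ("Short Card", "<p>Just a few words here.</p>"),
    ("Long Card", "<p>" + "word " * 1500 + "</p>"),
]

WORD_LIMIT = 1000  # assumed threshold for a card worth splitting


def word_count(body: str) -> int:
    # Crude tag stripping: fine for a size estimate, not a real HTML parser.
    text = unescape(re.sub(r"<[^>]+>", " ", body))
    return len(text.split())


rows = ["title\twords\tsplit?"]
for title, body in cards:
    words = word_count(body)
    rows.append(f"{title}\t{words}\t{'yes' if words > WORD_LIMIT else 'no'}")

print("\n".join(rows))
```

Pasting the printed rows into a spreadsheet gives one row per card, so sorting by the words column surfaces the longest articles first.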