Blockifiers

Blockifiers convert data into Steamship’s native Block format.

  • A Blockifier’s input is raw bytes. Examples include a PDF, an image, audio, HTML, CSV, or JSON-formatted API output.
  • A Blockifier’s output is an object in Steamship Block format.
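The bytes-in, blocks-out contract can be illustrated with a minimal sketch. This is a toy stand-in, not Steamship's implementation: `ToyBlock` and `toy_blockify` are hypothetical names, and splitting on blank lines is an assumed strategy chosen only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a block: a chunk of text plus optional tags."""
    text: str
    tags: List[str] = field(default_factory=list)


def toy_blockify(raw: bytes, encoding: str = "utf-8") -> List[ToyBlock]:
    """Decode raw bytes and emit one block per non-empty paragraph.

    Assumption: paragraphs are separated by a blank line. A real blockifier
    would apply format-specific parsing (PDF, audio, HTML, and so on).
    """
    text = raw.decode(encoding)
    paragraphs = [p.strip() for p in text.split("\n\n")]
    return [ToyBlock(text=p) for p in paragraphs if p]


# Raw bytes go in, a list of block-like objects comes out.
blocks = toy_blockify(b"First paragraph.\n\nSecond paragraph.\n")
for block in blocks:
    print(block.text)
```

The point of the sketch is only the shape of the interface: the caller hands over undifferentiated bytes, and everything downstream works with structured blocks.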

All data imported into Steamship must be blockified before it can be used.

You can use blockifiers when developing Steamship Packages, in your own Python app code, or as one-off functions that convert data in the cloud.

Using Blockifiers

To use a blockifier, create an instance with your Steamship client and apply it to a file.

# Load a Steamship Workspace
from steamship import Steamship, File
client = Steamship(workspace="my-workspace-handle")
 
# Upload a file
file = File.create(path="path/to/some_file").data
 
# Create the blockifier instance
blockifier = client.use_plugin('blockifier-handle', 'instance-handle')
 
# Apply the blockifier to the file
task = file.blockify(blockifier.handle)
 
# Wait until the blockify task completes remotely
task.wait()
 
# Refresh the file to see the output
file.refresh()
 
# file.blocks now has the blockified content

In the above code, the two key lines are:

blockifier = client.use_plugin('blockifier-handle', 'instance-handle')
task = file.blockify(blockifier.handle)

In these lines, blockifier-handle identifies which blockifier you would like to use, and instance-handle identifies your particular instance of that blockifier in a workspace. If you load the blockifier this way again with the same instance handle, the existing instance is reused rather than a new one being created.

Common Blockifiers

Steamship maintains a growing collection of official blockifiers for common scenarios. Our goal is to always map our defaults to best-of-breed models so that you can get work done quickly without worrying about the details of model selection and tuning.

Our currently supported blockifiers are:

Input

The input to a blockify operation is a File with no Blocks.

Output

When you call blockify on a file, it returns a Task. You can wait() on this task, or continue with other work. The output of a blockify operation is Blocks, and potentially Tags, on that file. Because the operation runs asynchronously on the back end, your local file object is not updated automatically: you need to refresh() the file to see the output.
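To see why the refresh step matters, here is a small simulation of the wait-then-refresh pattern using only the Python standard library. Everything in it is a hypothetical stand-in rather than the Steamship API: `SERVER_BLOCKS` plays the role of the back end, `LocalFile` plays the role of a client-side file object, and a `concurrent.futures` future plays the role of the Task.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stand-in for server-side state; not part of the Steamship API.
SERVER_BLOCKS: Dict[str, List[str]] = {"file-1": []}


class LocalFile:
    """Local snapshot of a remote file; it goes stale when the server changes."""

    def __init__(self, file_id: str) -> None:
        self.file_id = file_id
        self.blocks: List[str] = list(SERVER_BLOCKS[file_id])

    def refresh(self) -> None:
        # Re-read the server's current state into the local snapshot.
        self.blocks = list(SERVER_BLOCKS[self.file_id])


def remote_blockify(file_id: str) -> None:
    # Pretend the back end produced two blocks for this file.
    SERVER_BLOCKS[file_id] = ["block one", "block two"]


local_file = LocalFile("file-1")
with ThreadPoolExecutor(max_workers=1) as pool:
    task = pool.submit(remote_blockify, local_file.file_id)
    task.result()  # plays the role of task.wait()

# The remote work is done, but the local snapshot has not changed yet.
blocks_before_refresh = list(local_file.blocks)

local_file.refresh()
blocks_after_refresh = list(local_file.blocks)

print(blocks_before_refresh)
print(blocks_after_refresh)
```

In this simulation, waiting on the task only guarantees that the remote work has finished. The local object still holds its old snapshot until it is explicitly refreshed, which mirrors the wait() then refresh() sequence described above.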