Ruby API Reference
Ruby gem for ISCC (ISO 24138:2024) code generation via a native Rust extension
(Magnus). All 32 Tier 1 symbols are exposed as module-level
methods, constants, and classes on IsccLib.
Installation
Or add to your Gemfile:
Quick Example
require "iscc_lib"
result = IsccLib.gen_meta_code_v0("Die Unendliche Geschichte")
puts result.iscc # "ISCC:AAA..."
Constants
Algorithm constants exposed as module-level constants on IsccLib.
| Constant | Value | Description |
|---|---|---|
| IsccLib::META_TRIM_NAME | 128 | Maximum UTF-8 byte length for the name parameter |
| IsccLib::META_TRIM_DESCRIPTION | 4096 | Maximum UTF-8 byte length for the description parameter |
| IsccLib::META_TRIM_META | 128_000 | Maximum UTF-8 byte length for the meta parameter |
| IsccLib::IO_READ_SIZE | 4_194_304 | Default read buffer size for streaming I/O (4 MB) |
| IsccLib::TEXT_NGRAM_SIZE | 13 | N-gram window size for text similarity hashing |
Result Classes
All gen_*_v0 functions return typed result objects — subclasses of IsccLib::Result, which itself
extends Hash. Results support both attribute-style (result.iscc) and hash-style
(result["iscc"]) access.
| Class | Returned by | Fields |
|---|---|---|
| IsccLib::MetaCodeResult | gen_meta_code_v0 | iscc, name, metahash, description?, meta? |
| IsccLib::TextCodeResult | gen_text_code_v0 | iscc, characters |
| IsccLib::ImageCodeResult | gen_image_code_v0 | iscc |
| IsccLib::AudioCodeResult | gen_audio_code_v0 | iscc |
| IsccLib::VideoCodeResult | gen_video_code_v0 | iscc |
| IsccLib::MixedCodeResult | gen_mixed_code_v0 | iscc, parts |
| IsccLib::DataCodeResult | gen_data_code_v0 | iscc |
| IsccLib::InstanceCodeResult | gen_instance_code_v0 | iscc, datahash, filesize |
| IsccLib::IsccCodeResult | gen_iscc_code_v0 | iscc |
| IsccLib::SumCodeResult | gen_sum_code_v0 | iscc, datahash, filesize, units? |
Fields marked with ? are optional — present only when the corresponding input was provided.
result = IsccLib.gen_meta_code_v0("Example", description: "A description")
result.iscc # attribute-style access
result["iscc"] # hash-style access
result.keys # list all present fields
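The dual access style described above can be approximated in plain Ruby. The following is a minimal sketch, not the gem's actual implementation: SketchResult is a hypothetical Hash subclass that resolves unknown method calls against its string keys, and the "ISCC:AAA..." value is the same placeholder used in the quick example.

```ruby
# Hypothetical stand-in for IsccLib::Result; the real class may be built differently.
class SketchResult < Hash
  # Resolve result.iscc to self["iscc"] when that key is present.
  def method_missing(name, *args)
    key = name.to_s
    return self[key] if args.empty? && key?(key)

    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    key?(name.to_s) || super
  end
end

result = SketchResult.new
result["iscc"] = "ISCC:AAA..."
result["name"] = "Example"

puts result.iscc              # attribute-style access
puts result["iscc"]           # hash-style access
p result.key?("description")  # an optional field that was never set
```

In a design like this, a field whose name collides with an existing Hash method (for example size) would only be reachable through hash-style access, which is one reason to keep both access styles available.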
Code Generation Functions
All 10 gen_*_v0 functions return typed Result objects and raise RuntimeError on invalid input.
gen_meta_code_v0
Generate a Meta-Code from name and optional metadata.
| Parameter | Type | Description |
|---|---|---|
| name | String | Title or name of the content (required, trimmed to 128 bytes) |
| description: | String, nil | Optional description text (trimmed to 4096 bytes) |
| meta: | String, nil | Optional JSON string or data: URL |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns MetaCodeResult with iscc, name, metahash, and optionally description, meta.
result = IsccLib.gen_meta_code_v0(
"Die Unendliche Geschichte",
description: "Von Michael Ende"
)
puts result.iscc
gen_text_code_v0
Generate a Text-Code from plain text content using MinHash over character n-grams.
| Parameter | Type | Description |
|---|---|---|
| text | String | Plain text content (required) |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns TextCodeResult with iscc and characters.
result = IsccLib.gen_text_code_v0("Hello World")
puts result.characters # number of characters processed
gen_image_code_v0
Generate an Image-Code from 1024 grayscale pixel bytes using DCT-based perceptual hashing.
| Parameter | Type | Description |
|---|---|---|
| pixels | String | 32×32 grayscale pixel data (1024 bytes, binary encoded) |
| bits: | Integer | Bit length of the generated code (default: 64, max 256) |
Returns ImageCodeResult with iscc.
gen_audio_code_v0
Generate an Audio-Code from a Chromaprint feature vector using multi-stage SimHash.
| Parameter | Type | Description |
|---|---|---|
| cv | Array<Integer> | Signed 32-bit Chromaprint feature values |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns AudioCodeResult with iscc.
gen_video_code_v0
Generate a Video-Code from MPEG-7 frame signature vectors using WTA-Hash and SimHash.
| Parameter | Type | Description |
|---|---|---|
| frame_sigs | Array<Array<Integer>> | Array of MPEG-7 frame signature vectors (each 380 integers) |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns VideoCodeResult with iscc.
gen_mixed_code_v0
Generate a Mixed-Code from multiple Content-Code strings using SimHash combination.
| Parameter | Type | Description |
|---|---|---|
| codes | Array<String> | Array of ISCC Content-Code strings (requires >= 2) |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns MixedCodeResult with iscc and parts.
result = IsccLib.gen_mixed_code_v0([text_code, image_code])
puts result.parts # Array of input code strings
gen_data_code_v0
Generate a Data-Code from raw byte data using content-defined chunking and MinHash.
| Parameter | Type | Description |
|---|---|---|
| data | String | Raw binary data |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns DataCodeResult with iscc.
gen_instance_code_v0
Generate an Instance-Code from raw byte data using BLAKE3 hashing.
| Parameter | Type | Description |
|---|---|---|
| data | String | Raw binary data |
| bits: | Integer | Bit length of the generated code (default: 64, multiple of 32) |
Returns InstanceCodeResult with iscc, datahash, and filesize.
result = IsccLib.gen_instance_code_v0("Hello World".b)
puts result.datahash # Multihash of the data
puts result.filesize # Size in bytes
gen_iscc_code_v0
Generate a composite ISCC-CODE from individual unit codes.
| Parameter | Type | Description |
|---|---|---|
| codes | Array<String> | Array of ISCC unit code strings (must include Data + Instance) |
| wide: | Boolean | true for 256-bit output, false for 128-bit (default) |
Returns IsccCodeResult with iscc.
data = ("Hello World" * 1000).b
data_code = IsccLib.gen_data_code_v0(data).iscc
instance_code = IsccLib.gen_instance_code_v0(data).iscc
result = IsccLib.gen_iscc_code_v0([data_code, instance_code])
gen_sum_code_v0
Generate a composite ISCC-CODE from a file in a single pass. Reads the file once, feeding both a DataHasher (CDC/MinHash) and an InstanceHasher (BLAKE3).
| Parameter | Type | Description |
|---|---|---|
| path | String | File path |
| bits: | Integer | Bit length of the generated code (default: 64) |
| wide: | Boolean | Enable 256-bit wide mode (default: false) |
| add_units: | Boolean | Include individual unit codes in result (default: false) |
Returns SumCodeResult with iscc, datahash, filesize, and optionally units.
result = IsccLib.gen_sum_code_v0("document.pdf", add_units: true)
puts result.iscc
puts result["units"] # Array of unit code strings
Text Utilities
Text processing functions for normalization and cleaning. Used internally by code generation but available for preprocessing.
text_clean
Clean and normalize text for display. Applies NFKC normalization, removes control characters (except newlines), normalizes CRLF to LF, collapses consecutive empty lines, and strips leading/trailing whitespace.
| Parameter | Type | Description |
|---|---|---|
| text | String | Input text |
text_remove_newlines
Remove newlines and collapse whitespace to single spaces.
| Parameter | Type | Description |
|---|---|---|
| text | String | Input text |
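The behavior described for text_remove_newlines can be approximated in plain Ruby. This is a sketch under assumptions, not the gem's code: sketch_remove_newlines is a hypothetical helper, and it only treats ASCII whitespace as separators, which may differ from the native implementation.

```ruby
# Hypothetical pure-Ruby approximation of text_remove_newlines.
def sketch_remove_newlines(text)
  # String#split without arguments splits on runs of ASCII whitespace
  # (including newlines) and ignores leading/trailing whitespace;
  # join then puts a single space between the remaining words.
  text.split.join(" ")
end

puts sketch_remove_newlines("first line\nsecond   line\r\n  third")
```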
text_trim
Trim text so its UTF-8 encoded size does not exceed nbytes. Multi-byte characters that would be
split are dropped entirely. Leading/trailing whitespace is stripped from the result.
| Parameter | Type | Description |
|---|---|---|
| text | String | Input text |
| nbytes | Integer | Maximum UTF-8 byte length |
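To illustrate the trimming rule, here is a minimal pure-Ruby sketch. sketch_text_trim is a hypothetical helper that assumes valid UTF-8 input and a non-negative nbytes; it is meant to mirror the documented behavior, not to reproduce the native implementation.

```ruby
# Hypothetical pure-Ruby approximation of text_trim.
def sketch_text_trim(text, nbytes)
  text.encode("UTF-8")
      .byteslice(0, nbytes) # cut at the byte limit, possibly mid-character
      .scrub("")            # drop the incomplete trailing character, if any
      .strip                # remove leading/trailing whitespace
end

puts sketch_text_trim("Hello World", 6)           # cut after "Hello ", then stripped
puts sketch_text_trim("a\u00e4", 2).bytesize      # the 2-byte character does not fit
```

Cutting on bytes first and then discarding the invalid tail keeps the result within the limit without ever emitting half of a multi-byte character.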
text_collapse
Normalize and simplify text for similarity hashing. Applies NFD normalization, lowercasing, removes whitespace and characters in Unicode categories C, M, and P, then recombines with NFKC.
| Parameter | Type | Description |
|---|---|---|
| text | String | Input text |
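The steps listed for text_collapse map fairly directly onto Ruby's built-in string methods. The sketch below is an approximation under that assumption; sketch_text_collapse is a hypothetical name and edge cases may differ from the native extension.

```ruby
# Hypothetical pure-Ruby approximation of text_collapse.
def sketch_text_collapse(text)
  text.unicode_normalize(:nfd)          # decompose, so accents become separate marks
      .downcase                         # lowercase
      .gsub(/[[:space:]]/, "")          # remove whitespace
      .gsub(/[\p{C}\p{M}\p{P}]/, "")    # remove control, mark, and punctuation characters
      .unicode_normalize(:nfkc)         # recombine
end

puts sketch_text_collapse("H\u00e9llo,  W\u00f6rld!")
```

Decomposing first is what allows the accents to be removed: after NFD they are separate combining marks (category M) rather than part of the base letter.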
Encoding & Codec
encode_base64
Encode bytes as base64url (RFC 4648 section 5, no padding).
| Parameter | Type | Description |
|---|---|---|
| data | String | Binary data (.b encoded) |
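For reference, the same encoding (URL-safe alphabet, no padding) can be produced with Ruby core methods alone. sketch_encode_base64 is a hypothetical helper for comparison and is not part of the gem.

```ruby
# Hypothetical pure-Ruby equivalent of base64url without padding.
def sketch_encode_base64(data)
  [data].pack("m0")   # standard base64 without line breaks
        .tr("+/", "-_") # switch to the URL-safe alphabet (RFC 4648 section 5)
        .delete("=")    # drop padding
end

puts sketch_encode_base64("Hello".b)      # SGVsbG8
puts sketch_encode_base64("\xfb\xff".b)   # -_8
```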
json_to_data_url
Convert a JSON string to a data URL with base64 encoding. Uses application/ld+json media type when
the JSON contains an @context key, otherwise application/json.
| Parameter | Type | Description |
|---|---|---|
| json | String | Valid JSON string |
Raises RuntimeError on invalid JSON input.
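The media type selection can be sketched in plain Ruby as follows. Several details here are assumptions rather than documented behavior: sketch_json_to_data_url is a hypothetical helper, it only checks for a top-level @context key, it encodes the input string unchanged with standard padded base64, and it raises JSON::ParserError (not RuntimeError) on invalid input. The gem may normalize the JSON before encoding.

```ruby
require "json"

# Hypothetical approximation of json_to_data_url; payload details are assumed.
def sketch_json_to_data_url(json)
  parsed = JSON.parse(json) # raises JSON::ParserError on invalid JSON
  linked_data = parsed.is_a?(Hash) && parsed.key?("@context")
  media_type = linked_data ? "application/ld+json" : "application/json"
  "data:#{media_type};base64,#{[json].pack("m0")}"
end

puts sketch_json_to_data_url('{"a":1}')
puts sketch_json_to_data_url('{"@context":"https://schema.org","name":"x"}')
```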
encode_component
Encode header fields and a raw digest into a base32-encoded ISCC unit string (without ISCC:
prefix).
| Parameter | Type | Description |
|---|---|---|
| mtype | Integer | MainType enum value (0–255) |
| stype | Integer | SubType enum value (0–255) |
| version | Integer | Version enum value (0–255) |
| bit_length | Integer | Digest bit length (multiple of 32) |
| digest | String | Raw digest bytes (length >= bit_length / 8) |
Raises RuntimeError on invalid input.
digest = ("\x01\x02\x03\x04\x05\x06\x07\x08").b
code = IsccLib.encode_component(0, 0, 0, 64, digest)
iscc_decompose
Decompose a composite ISCC-CODE into individual ISCC-UNITs.
| Parameter | Type | Description |
|---|---|---|
| iscc_code | String | Composite ISCC-CODE string |
Returns an Array<String> of base32-encoded ISCC-UNIT strings (without ISCC: prefix). Raises
RuntimeError on invalid input.
iscc_decode
Decode an ISCC unit string into its header components and raw digest.
| Parameter | Type | Description |
|---|---|---|
| iscc_unit | String | ISCC unit string |
Returns a 5-element Array: [maintype, subtype, version, length_index, digest] where
length_index is the encoded length code (not the bit length) and digest is a binary String.
Raises RuntimeError on invalid input.
maintype, subtype, version, length_index, digest = IsccLib.iscc_decode("ISCC:EAASKDNZNYGUUF5A")
puts "Maintype: #{maintype}, Subtype: #{subtype}" # "Maintype: 2, Subtype: 0"
puts "Digest: #{digest.unpack1("H*")}"
Algorithm Primitives
Low-level algorithm functions for SimHash, MinHash, content-defined chunking, video hashing, and sliding window n-grams.
sliding_window
Generate sliding window n-grams from a string.
| Parameter | Type | Description |
|---|---|---|
| text | String | Input string |
| width | Integer | Window width in Unicode characters (must be >= 2) |
Returns Array<String>. Raises RuntimeError if width < 2.
IsccLib.sliding_window("Hello World", 3)
# => ["Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"]
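The windowing itself is easy to reproduce in plain Ruby, which can be handy for checking expectations. sketch_sliding_window is a hypothetical helper; in particular, returning the whole input as a single window when it is shorter than width is an assumption, since the behavior for short inputs is not stated here.

```ruby
# Hypothetical pure-Ruby approximation of sliding_window.
def sketch_sliding_window(text, width)
  raise ArgumentError, "width must be >= 2" if width < 2

  chars = text.chars
  # Assumption: inputs shorter than the window yield one window holding the whole text.
  return [text] if chars.length < width

  # Every run of `width` consecutive characters, advancing one character at a time.
  chars.each_cons(width).map(&:join)
end

p sketch_sliding_window("Hello World", 3)
```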
alg_simhash
Compute a SimHash from a sequence of equal-length hash digests. Output length matches the input digest length.
| Parameter | Type | Description |
|---|---|---|
| hash_digests | Array<String> | Array of equal-length binary digest strings |
Returns a binary String. Raises RuntimeError on empty input or unequal digest lengths.
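As a rough illustration of the SimHash idea, the sketch below takes a per-bit majority vote across the input digests. It is not the library's implementation: sketch_simhash is a hypothetical helper, and both the most-significant-bit-first ordering and the rule that a tie sets the bit to 1 are assumptions.

```ruby
# Hypothetical per-bit majority vote over equal-length binary digests.
def sketch_simhash(digests)
  raise ArgumentError, "empty input" if digests.empty?

  nbytes = digests.first.bytesize
  raise ArgumentError, "unequal digest lengths" unless digests.all? { |d| d.bytesize == nbytes }

  # Count, for every bit position, how many digests have that bit set.
  counts = Array.new(nbytes * 8, 0)
  digests.each do |digest|
    digest.each_byte.with_index do |byte, i|
      8.times { |bit| counts[i * 8 + bit] += (byte >> (7 - bit)) & 1 }
    end
  end

  # Assumption: a bit is set when at least half of the digests set it.
  bits = counts.map { |count| count * 2 >= digests.size ? 1 : 0 }
  bits.each_slice(8).map { |byte_bits| byte_bits.join.to_i(2) }.pack("C*")
end

digests = ["\xFF\x00".b, "\xFF\x00".b, "\x00\xFF".b]
puts sketch_simhash(digests).unpack1("H*")
```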
alg_minhash_256
Compute a 256-bit MinHash digest from 32-bit integer features. Uses 64 universal hash functions with bit-interleaved compression.
| Parameter | Type | Description |
|---|---|---|
| features | Array<Integer> | Array of 32-bit integer hash values |
Returns a 32-byte binary String.
alg_cdc_chunks
Split data into content-defined chunks using gear rolling hash.
| Parameter | Type | Description |
|---|---|---|
| data | String | Raw binary data |
| utf32 | Boolean | If true, align cut points to 4-byte boundaries |
| avg_chunk_size | Integer | Target average chunk size in bytes (required) |
Returns Array<String> of binary chunk strings.
soft_hash_video_v0
Compute a similarity-preserving hash from video frame signatures.
| Parameter | Type | Description |
|---|---|---|
| frame_sigs | Array<Array<Integer>> | Array of MPEG-7 frame signature vectors |
| bits | Integer | Output bit length |
Returns a binary String of length bits / 8. Raises RuntimeError if input is empty.
frame_sigs = [Array.new(380, 0), Array.new(380, 1)]
hash = IsccLib.soft_hash_video_v0(frame_sigs, 256)
Streaming Hashers
Streaming hashers process large data incrementally without loading the entire content into memory.
Each hasher follows the lifecycle: new → update (repeated) → finalize.
Both hashers are single-use: after calling finalize, subsequent calls to update or finalize
raise RuntimeError. The update method returns self for method chaining.
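The lifecycle rules can be illustrated without the gem. The sketch below is a hypothetical stand-in (SketchHasher) that mirrors only the new, update, finalize contract; it uses SHA-256 from Ruby's standard library in place of the CDC/MinHash and BLAKE3 algorithms, and its result is a plain Hash rather than a typed result object.

```ruby
require "digest"

# Hypothetical single-use hasher that mirrors the documented lifecycle only.
class SketchHasher
  def initialize
    @digest = Digest::SHA256.new # stand-in algorithm, not what the gem uses
    @filesize = 0
    @finalized = false
  end

  # Feed another chunk; returns self so calls can be chained.
  def update(data)
    raise "hasher already finalized" if @finalized

    @digest.update(data)
    @filesize += data.bytesize
    self
  end

  # Produce the result once; any later call raises RuntimeError.
  def finalize
    raise "hasher already finalized" if @finalized

    @finalized = true
    { "hexdigest" => @digest.hexdigest, "filesize" => @filesize }
  end
end

hasher = SketchHasher.new
result = hasher.update("Hello ".b).update("World".b).finalize
puts result["filesize"]

begin
  hasher.update("more".b)
rescue RuntimeError => e
  puts "rejected: #{e.message}"
end
```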
DataHasher
Streaming Data-Code generator using content-defined chunking and MinHash.
update
| Parameter | Type | Description |
|---|---|---|
| data | String | Binary data (.b encoded) |
Returns self for method chaining.
finalize
| Parameter | Type | Description |
|---|---|---|
| bits: | Integer | Bit length of the generated code (default: 64) |
Returns DataCodeResult.
Example
hasher = IsccLib::DataHasher.new
File.open("large_file.bin", "rb") do |f|
while (chunk = f.read(65_536))
hasher.update(chunk)
end
end
result = hasher.finalize(bits: 64)
puts result.iscc
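Method chaining: because update returns self, calls can be chained, for example IsccLib::DataHasher.new.update(chunk_a).update(chunk_b).finalize, where chunk_a and chunk_b are binary strings.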
InstanceHasher
Streaming Instance-Code generator using BLAKE3.
update
| Parameter | Type | Description |
|---|---|---|
| data | String | Binary data (.b encoded) |
Returns self for method chaining.
finalize
| Parameter | Type | Description |
|---|---|---|
| bits: | Integer | Bit length of the generated code (default: 64) |
Returns InstanceCodeResult with iscc, datahash, and filesize.
Example
hasher = IsccLib::InstanceHasher.new
File.open("large_file.bin", "rb") do |f|
while (chunk = f.read(65_536))
hasher.update(chunk)
end
end
result = hasher.finalize
puts result.iscc
puts result.datahash
puts result.filesize
Diagnostics
conformance_selftest
Run all ISCC conformance tests against vendored test vectors.
Returns true if all tests pass, false otherwise.
Error Handling
All methods that accept user input raise RuntimeError on invalid arguments:
- Empty or nil required parameters (e.g., empty name in gen_meta_code_v0)
- Invalid bits values (must be a multiple of 32, range 32–256)
- Malformed ISCC strings passed to iscc_decode or iscc_decompose
- Streaming hasher operations after finalization
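When bits comes from user input or configuration, it can be convenient to check it before calling into the gem. The helper below is a hypothetical sketch of the rule stated above (multiple of 32, range 32–256); the gem performs its own validation internally and remains the authority on what it accepts.

```ruby
# Hypothetical pre-check for the documented bits constraint.
def sketch_valid_bits?(bits)
  bits.is_a?(Integer) && bits.between?(32, 256) && (bits % 32).zero?
end

[64, 256, 48, 0, 288].each do |bits|
  status = sketch_valid_bits?(bits) ? "within the documented range" : "outside the documented range"
  puts "#{bits}: #{status}"
end
```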