Rust¶
A guide to using iscc-lib from Rust. Covers installation, code generation, structured results, streaming, text utilities, algorithm primitives, conformance testing, and error handling.
Installation¶
iscc-lib is a pure Rust crate with no system dependencies. Add it to a project with cargo add iscc-lib.
Code generation¶
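The 10 gen_*_v0 functions follow the same pattern: pass content-specific input and, in most cases, a bits
parameter (typically 64), and receive a typed result struct with an .iscc field containing the
ISCC code string. As the examples below show, gen_iscc_code_v0 takes a boolean flag instead of bits, and
gen_sum_code_v0 takes two boolean flags in addition to bits.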
Meta-Code¶
Generate a Meta-Code from content metadata (title, description, structured metadata):
use iscc_lib::gen_meta_code_v0;
let result = gen_meta_code_v0(
    "Die Unendliche Geschichte",
    Some("Von Michael Ende"),
    None,
    64,
)?;
println!("{}", result.iscc); // "ISCC:AAA..."
println!("{}", result.name); // Normalized name
println!("{}", result.metahash); // BLAKE3 hash of metadata
The meta parameter (the third argument, None in the example above) accepts a JSON string or a data: URL
with a base64-encoded payload. When provided, meta takes precedence over description for the similarity digest.
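As a rough illustration of the data: URL form, the sketch below builds such a string from a JSON payload using only the standard library. The helper names `base64_encode` and `json_data_url` and the `application/json` media type are assumptions made for this example and are not part of iscc-lib; in a real project an established base64 crate would normally do the encoding.

```rust
/// Standard base64 alphabet (RFC 4648), used with `=` padding.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: encode bytes as padded base64.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into one 24-bit group (missing bytes are zero).
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[((group >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((group >> 12) & 63) as usize] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[((group >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(group & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Hypothetical helper: wrap a JSON string in a base64 data: URL.
fn json_data_url(json: &str) -> String {
    format!("data:application/json;base64,{}", base64_encode(json.as_bytes()))
}

fn main() {
    let url = json_data_url("{}");
    println!("{url}");
}
```

The resulting string is the kind of value that could be passed as Some(url.as_str()) in place of None for the meta argument, assuming the argument is an optional string slice as the example above suggests.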
Text-Code¶
Generate a Text-Code from plain text content:
use iscc_lib::gen_text_code_v0;
let result = gen_text_code_v0("Hello World", 64)?;
println!("{}", result.iscc); // "ISCC:EAA..."
println!("{}", result.characters); // Number of characters processed
Image-Code¶
Generate an Image-Code from a 32x32 grayscale thumbnail (1024 bytes):
use iscc_lib::gen_image_code_v0;
// Pre-process your image to 32x32 grayscale (e.g., with the `image` crate)
let pixels = vec![128u8; 1024]; // Placeholder: uniform gray
let result = gen_image_code_v0(&pixels, 64)?;
println!("{}", result.iscc); // "ISCC:EEA..."
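The pre-processing step mentioned in the comment can be sketched without external crates for the simplest case: an image that has already been resized to 32x32 and is available as a row-major RGB8 buffer. The helper name `rgb_to_gray_32x32` and the integer luma weights (the common BT.601 approximation) are assumptions of this sketch; the exact grayscale conversion and resampling expected for conforming Image-Codes are not specified here.

```rust
/// Hypothetical helper: convert a 32x32 RGB8 buffer (3072 bytes, row-major)
/// into 1024 grayscale bytes using integer BT.601-style luma weights.
/// Returns None if the buffer does not hold exactly 32 * 32 RGB pixels.
fn rgb_to_gray_32x32(rgb: &[u8]) -> Option<Vec<u8>> {
    const PIXELS: usize = 32 * 32;
    if rgb.len() != PIXELS * 3 {
        return None;
    }
    let gray: Vec<u8> = rgb
        .chunks_exact(3)
        .map(|px| {
            let (r, g, b) = (px[0] as u32, px[1] as u32, px[2] as u32);
            // Weights sum to 1000, so the result always fits in a u8.
            ((299 * r + 587 * g + 114 * b) / 1000) as u8
        })
        .collect();
    Some(gray)
}

fn main() {
    let rgb = vec![128u8; 32 * 32 * 3]; // Placeholder: uniform gray in RGB
    let gray = rgb_to_gray_32x32(&rgb).expect("buffer must hold 32x32 RGB pixels");
    println!("{} grayscale bytes, first = {}", gray.len(), gray[0]);
}
```

The returned vector has the 1024-byte length that the Image-Code example above uses for pixels.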
Audio-Code¶
Generate an Audio-Code from a Chromaprint fingerprint vector (signed integers):
use iscc_lib::gen_audio_code_v0;
// Obtain Chromaprint features externally (e.g., via FFI to libchromaprint)
let fingerprint: Vec<i32> = vec![123456, -789012, 345678, 901234];
let result = gen_audio_code_v0(&fingerprint, 64)?;
println!("{}", result.iscc); // "ISCC:EIA..."
Video-Code¶
Generate a Video-Code from MPEG-7 frame signature vectors:
use iscc_lib::gen_video_code_v0;
// Each frame signature is a Vec of 380 integers
let frame_sigs: Vec<Vec<i32>> = vec![vec![0i32; 380], vec![1i32; 380]];
let result = gen_video_code_v0(&frame_sigs, 64)?;
println!("{}", result.iscc); // "ISCC:EMA..."
Mixed-Code¶
Combine multiple Content-Codes of different types into a Mixed-Code:
use iscc_lib::{gen_text_code_v0, gen_image_code_v0, gen_mixed_code_v0};
let text_result = gen_text_code_v0("Hello World", 64)?;
let image_result = gen_image_code_v0(&vec![128u8; 1024], 64)?;
let codes: Vec<&str> = vec![&text_result.iscc, &image_result.iscc];
let result = gen_mixed_code_v0(&codes, 64)?;
println!("{}", result.iscc); // "ISCC:EQA..."
println!("{:?}", result.parts); // Input code strings
Data-Code¶
Generate a Data-Code from raw bytes using content-defined chunking and MinHash:
use iscc_lib::gen_data_code_v0;
let data = b"Hello World".repeat(1000);
let result = gen_data_code_v0(&data, 64)?;
println!("{}", result.iscc); // "ISCC:GAA..."
Instance-Code¶
Generate an Instance-Code from raw bytes using BLAKE3 hashing:
use iscc_lib::gen_instance_code_v0;
let data = b"Hello World";
let result = gen_instance_code_v0(data, 64)?;
println!("{}", result.iscc); // "ISCC:IAA..."
println!("{}", result.datahash); // Multihash of the data
println!("{}", result.filesize); // Size in bytes
ISCC-CODE¶
Combine individual ISCC unit codes into a composite ISCC-CODE:
use iscc_lib::{gen_data_code_v0, gen_instance_code_v0, gen_iscc_code_v0};
let data = b"Hello World".repeat(1000);
let data_result = gen_data_code_v0(&data, 64)?;
let instance_result = gen_instance_code_v0(&data, 64)?;
let codes: Vec<&str> = vec![&data_result.iscc, &instance_result.iscc];
let result = gen_iscc_code_v0(&codes, false)?;
println!("{}", result.iscc); // "ISCC:KAA..."
Sum-Code¶
Generate a composite ISCC-CODE from a file in a single pass:
use iscc_lib::gen_sum_code_v0;
use std::path::Path;
std::fs::write("example.bin", b"Hello World".repeat(1000))?;
let result = gen_sum_code_v0(Path::new("example.bin"), 64, false, false)?;
println!("{}", result.iscc); // "ISCC:KAA..."
println!("{}", result.datahash); // Multihash of the data
println!("{}", result.filesize); // Size in bytes
Structured results¶
Every gen_*_v0 function returns a dedicated result struct carrying the ISCC code string plus
additional fields specific to each code type. This differs from the Python API, which returns
dict-like objects.
use iscc_lib::gen_meta_code_v0;
let result = gen_meta_code_v0(
    "Example Title",
    Some("Example description"),
    None,
    64,
)?;
// Access fields directly on the struct
println!("{}", result.iscc);
println!("{}", result.name);
println!("{}", result.metahash);
// Optional fields use Option<T>
if let Some(desc) = &result.description {
    println!("Description: {desc}");
}
if let Some(meta) = &result.meta {
    println!("Meta: {meta}");
}
Result types and their fields:
| Result type | Fields |
|---|---|
| MetaCodeResult | iscc, name, metahash, description?, meta? |
| TextCodeResult | iscc, characters |
| ImageCodeResult | iscc |
| AudioCodeResult | iscc |
| VideoCodeResult | iscc |
| MixedCodeResult | iscc, parts |
| DataCodeResult | iscc |
| InstanceCodeResult | iscc, datahash, filesize |
| IsccCodeResult | iscc |
| SumCodeResult | iscc, datahash, filesize |
Fields marked with ? are Option<String>; they are Some only when the corresponding input was
provided.
All result structs are marked #[non_exhaustive], so fields can be added in future versions without
a semver-breaking change. As a consequence, code outside the crate cannot construct them with
struct-literal syntax and must include .. when destructuring them.
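The effect of #[non_exhaustive] can be sketched with a stand-in type. `TextResultSketch`, the `sketch` module, and the placeholder code string are hypothetical and only mirror the shape of a result struct. Note that the attribute is enforced across crate boundaries only, so inside this single file the struct literal in `make` is still permitted; a downstream crate would be limited to field access and patterns that end in `..`.

```rust
mod sketch {
    /// Hypothetical stand-in for a result type; not the crate's definition.
    #[non_exhaustive]
    #[derive(Debug, Clone, PartialEq)]
    pub struct TextResultSketch {
        pub iscc: String,
        pub characters: usize,
    }

    /// Inside the defining crate, struct-literal construction is allowed.
    pub fn make(iscc: &str, characters: usize) -> TextResultSketch {
        TextResultSketch { iscc: iscc.to_string(), characters }
    }
}

fn main() {
    let result = sketch::make("ISCC:PLACEHOLDER", 11);

    // Field access is always available to callers.
    println!("{} ({} characters)", result.iscc, result.characters);

    // A pattern ending in `..` keeps compiling if fields are added later.
    let sketch::TextResultSketch { iscc, .. } = result;
    println!("{iscc}");
}
```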
Streaming¶
For large files, use DataHasher and InstanceHasher to process data incrementally without loading
everything into memory. Both follow the new() -> update(&[u8]) -> finalize() pattern.
DataHasher¶
use iscc_lib::DataHasher;
use std::io::Read;
let mut hasher = DataHasher::new();
let mut file = std::fs::File::open("large_file.bin")?;
let mut buf = [0u8; 65536];
loop {
    let n = file.read(&mut buf)?;
    if n == 0 {
        break;
    }
    hasher.update(&buf[..n]);
}
let result = hasher.finalize(64)?;
println!("{}", result.iscc); // Identical to gen_data_code_v0(entire_file, 64)
InstanceHasher¶
InstanceHasher follows the same pattern. Its result includes datahash and filesize fields:
use iscc_lib::InstanceHasher;
use std::io::Read;
let mut hasher = InstanceHasher::new();
let mut file = std::fs::File::open("large_file.bin")?;
let mut buf = [0u8; 65536];
loop {
    let n = file.read(&mut buf)?;
    if n == 0 {
        break;
    }
    hasher.update(&buf[..n]);
}
let result = hasher.finalize(64)?;
println!("{}", result.iscc); // Identical to gen_instance_code_v0(entire_file, 64)
println!("{}", result.datahash); // Multihash of the complete data
println!("{}", result.filesize); // Total bytes processed
Both hashers produce results identical to their one-shot counterparts for the same byte stream.
finalize() consumes the hasher through Rust's ownership system, so any later use is rejected at
compile time and no runtime error is needed to prevent reuse.
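The new() -> update() -> finalize() pattern and its two guarantees (chunking does not change the result, and a finalized hasher cannot be reused) can be illustrated with a toy type. `ToyHasher` is hypothetical and uses 64-bit FNV-1a purely as a stand-in; it is not the algorithm behind DataHasher or InstanceHasher, and its finalize() takes no bits argument.

```rust
/// Toy streaming hasher (64-bit FNV-1a); a stand-in, not an ISCC algorithm.
struct ToyHasher {
    state: u64,
    bytes_seen: u64,
}

impl ToyHasher {
    fn new() -> Self {
        ToyHasher { state: 0xcbf29ce484222325, bytes_seen: 0 }
    }

    /// Feed another chunk; chunk boundaries do not affect the final state.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= byte as u64;
            self.state = self.state.wrapping_mul(0x100000001b3);
        }
        self.bytes_seen += data.len() as u64;
    }

    /// Takes `self` by value, so the hasher cannot be used afterwards.
    fn finalize(self) -> (u64, u64) {
        (self.state, self.bytes_seen)
    }
}

/// One-shot counterpart: hash the whole buffer in a single update.
fn one_shot(data: &[u8]) -> (u64, u64) {
    let mut hasher = ToyHasher::new();
    hasher.update(data);
    hasher.finalize()
}

fn main() {
    let data = b"Hello World".repeat(1000);

    let mut hasher = ToyHasher::new();
    for chunk in data.chunks(4096) {
        hasher.update(chunk);
    }
    let streamed = hasher.finalize();
    // hasher.update(b"more"); // would not compile: `hasher` was moved

    assert_eq!(streamed, one_shot(&data));
    println!("digest = {:016x}, bytes = {}", streamed.0, streamed.1);
}
```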
Text utilities¶
iscc-lib provides text normalization functions used internally by the code generation pipeline. These are available for preprocessing your own text inputs.
text_clean¶
Normalize text for display: applies NFKC normalization, removes control characters (except newlines), normalizes line endings, collapses consecutive empty lines, and strips leading/trailing whitespace.
use iscc_lib::text_clean;
let cleaned = text_clean(" Hello\r\n\r\n\r\nWorld ");
assert_eq!(cleaned, "Hello\n\nWorld");
text_collapse¶
Simplify text for similarity hashing: lowercases the text and strips whitespace, punctuation, and diacritics.
Used internally by gen_text_code_v0.
use iscc_lib::text_collapse;
let collapsed = text_collapse("Hello, World!");
assert_eq!(collapsed, "helloworld");
text_remove_newlines¶
Remove newlines and collapse whitespace to single spaces:
use iscc_lib::text_remove_newlines;
let single_line = text_remove_newlines("Hello\nWorld\nFoo");
assert_eq!(single_line, "Hello World Foo");
text_trim¶
Trim text so its UTF-8 byte size does not exceed a limit. Multi-byte characters that would be split are dropped entirely:
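A minimal std-only sketch of the byte-limit behaviour follows. The helper `trim_to_byte_limit` is hypothetical and stands in for the idea only; the exact signature of text_trim and any further clean-up it performs are not shown here.

```rust
/// Hypothetical helper: longest prefix of `text` that fits in `max_bytes`
/// without splitting a multi-byte UTF-8 character.
fn trim_to_byte_limit(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Step back until `end` falls on a character boundary (index 0 always is one).
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

fn main() {
    // 'é' occupies two bytes, so a 2-byte limit cannot include it.
    println!("{:?}", trim_to_byte_limit("héllo", 2));
    println!("{:?}", trim_to_byte_limit("héllo", 3));
}
```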
Algorithm primitives¶
iscc-lib exposes low-level algorithm functions as part of the public Rust API. These are building
blocks used internally by gen_*_v0 functions.
- sliding_window(text: &str, width: usize) -> Vec<String>: Generate character n-grams from text using a sliding window of the given width.
- alg_simhash(hash_digests: &[impl AsRef<[u8]>]) -> Vec<u8>: Compute a SimHash fingerprint from a collection of byte digests. Used by Meta-Code, Audio-Code, and Mixed-Code generation.
- alg_minhash_256(features: &[u32]) -> Vec<u8>: Compute a 256-bit MinHash fingerprint from a collection of u32 feature hashes. Used by Text-Code and Data-Code generation.
- alg_cdc_chunks(data: &[u8], utf32: bool, avg_chunk_size: u32) -> IsccResult<Vec<&[u8]>>: Split data into content-defined chunks using a rolling hash. Used by Data-Code generation.
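To make the first two primitives more concrete, here is a std-only sketch of character n-grams and a bitwise-majority SimHash. The functions `sliding_window_sketch` and `simhash_sketch` are hypothetical re-implementations for illustration, not the crate's code. Their edge-case behaviour is assumed: short input yields the whole input once, a width below 2 panics, and a tie sets the output bit. The crate's own functions may differ in these details.

```rust
/// Hypothetical n-gram helper mirroring the `sliding_window` signature.
/// Assumption: input no longer than `width` yields the whole input once.
fn sliding_window_sketch(text: &str, width: usize) -> Vec<String> {
    assert!(width >= 2, "width must be at least 2 in this sketch");
    let chars: Vec<char> = text.chars().collect();
    if chars.len() <= width {
        return vec![text.to_string()];
    }
    chars
        .windows(width)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical SimHash over equal-length digests.
/// Assumption: an output bit is set when at least half of the inputs set it.
fn simhash_sketch(digests: &[impl AsRef<[u8]>]) -> Vec<u8> {
    if digests.is_empty() {
        return Vec::new();
    }
    let n_bytes = digests[0].as_ref().len();
    let mut counts = vec![0usize; n_bytes * 8];
    for digest in digests {
        let bytes = digest.as_ref();
        assert_eq!(bytes.len(), n_bytes, "digests must have equal length");
        for (i, count) in counts.iter_mut().enumerate() {
            // Bit i is counted from the most significant bit of byte 0.
            if bytes[i / 8] & (0x80 >> (i % 8)) != 0 {
                *count += 1;
            }
        }
    }
    let mut out = vec![0u8; n_bytes];
    for (i, &count) in counts.iter().enumerate() {
        if count * 2 >= digests.len() {
            out[i / 8] |= 0x80 >> (i % 8);
        }
    }
    out
}

fn main() {
    println!("{:?}", sliding_window_sketch("Hello", 4));
    let digests = [[0b1111_0000u8], [0b1100_0000], [0b1010_0000]];
    println!("{:08b}", simhash_sketch(&digests)[0]);
}
```

Working on a Vec<char> rather than on bytes keeps every n-gram aligned to character boundaries, which matters for non-ASCII text.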
Conformance testing¶
Verify that the library produces correct results for all official test vectors:
Error handling¶
All gen_*_v0 functions return IsccResult<T>, which is an alias for Result<T, IsccError>. The
only error variant is IsccError::InvalidInput, used for malformed or out-of-range parameters. Use
standard Rust error handling with ? or match:
use iscc_lib::{gen_text_code_v0, IsccError};
match gen_text_code_v0("Hello World", 64) {
    Ok(result) => println!("Generated: {}", result.iscc),
    Err(IsccError::InvalidInput(msg)) => eprintln!("Invalid input: {msg}"),
}
The ? operator propagates errors in any function returning a compatible Result type.
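What counts as a "compatible" Result can be shown with a self-contained sketch. `SketchError`, `check_bits`, and the validation rule inside it are hypothetical stand-ins rather than iscc-lib items. The point is only that an error type implementing std::error::Error converts into Box<dyn Error> automatically when `?` is applied.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type standing in for a library error enum.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidInput(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
        }
    }
}

impl Error for SketchError {}

/// Hypothetical validator: accepts multiples of 32 between 32 and 256.
fn check_bits(bits: u32) -> Result<u32, SketchError> {
    if (32..=256).contains(&bits) && bits % 32 == 0 {
        Ok(bits)
    } else {
        Err(SketchError::InvalidInput(format!("unsupported bits: {bits}")))
    }
}

/// `?` converts SketchError into Box<dyn Error> via the standard From impl.
fn run(bits: u32) -> Result<String, Box<dyn Error>> {
    let checked = check_bits(bits)?;
    Ok(format!("using {checked} bits"))
}

fn main() {
    println!("{:?}", run(64));
    if let Err(err) = run(65) {
        eprintln!("error: {err}");
    }
}
```

Returning Box<dyn Error> is convenient in binaries and examples; library code that needs callers to match on specific variants would more likely keep the concrete error type in its signature.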