Submission: Introducing...book2screenplay from Crankshaft News from Shyamal Chandra (google.com)
advancecoder writes: Abstract:
We present book2screenplay, an end-to-end system that transforms a PDF book into (1) a production-formatted screenplay rendered with the screenplay.cls LaTeX class and (2) a narrated 4K picture-video with per-scene imagery, multi-voice text-to-speech, and optional rich caption overlays. The pipeline is implemented in Rust with tokio-based concurrency and orchestrates a hierarchy of local large language model (LLM) agents served by multiple Ollama instances. A Director agent produces a story bible and beat structure; parallel Writer agents draft Fountain-structured scenes; Continuity and Polish passes reconcile tone and formatting; image and speech stages materialise the screenplay for audiovisual output. We report architectural findings from development on a 96 GB Apple Silicon Mac Studio: memory is dominated not by the Rust orchestrator but by concurrent model residency and 4K ffmpeg encodes; explicit model eviction and KV-cache quantisation reduce peak resident set from 70–90 GB to 48 GB without sacrificing throughput on a two-server Ollama pool. We document failure modes—JSON truncation, schema drift in model output, and optional-filter absence in minimal ffmpeg builds—and the mitigations adopted (context budgeting, lenient deserialisation, capability probing). The system demonstrates that fully local, agentic book-to-film pre-visualisation is technically feasible for feature-length targets, though page-count fidelity and cross-scene continuity remain open research problems.
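The capability-probing mitigation mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's actual code: it assumes the probe is done by running `ffmpeg -hide_banner -filters` and scanning the listing, and it assumes `drawtext` is the optional filter the caption overlays depend on. The names `parse_filter_names`, `probe_ffmpeg_filters`, `plan_captions`, and `CaptionPlan` are hypothetical.

```rust
use std::collections::HashSet;
use std::process::Command;

/// Parse the text printed by `ffmpeg -filters` into a set of filter names.
/// Filter rows look like ` T.C drawtext  V->V  Draw text ...`. Header and
/// legend rows are skipped because their third field contains no `->`.
fn parse_filter_names(listing: &str) -> HashSet<String> {
    listing
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let _flags = fields.next()?;
            let name = fields.next()?;
            let io = fields.next()?;
            if io.contains("->") {
                Some(name.to_string())
            } else {
                None
            }
        })
        .collect()
}

/// Ask the local ffmpeg binary which filters it was built with.
/// Returns None if ffmpeg is missing or exits with an error.
fn probe_ffmpeg_filters() -> Option<HashSet<String>> {
    let output = Command::new("ffmpeg")
        .args(&["-hide_banner", "-filters"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    Some(parse_filter_names(&String::from_utf8_lossy(&output.stdout)))
}

/// Hypothetical decision: render rich captions only when the filter exists.
#[derive(Debug, PartialEq)]
enum CaptionPlan {
    Rich, // overlay captions with the optional filter
    Skip, // minimal build: encode without caption overlays
}

fn plan_captions(filters: &HashSet<String>) -> CaptionPlan {
    // Assumption: `drawtext` stands in for whichever optional filter is needed.
    if filters.contains("drawtext") {
        CaptionPlan::Rich
    } else {
        CaptionPlan::Skip
    }
}

fn main() {
    // A missing or failing ffmpeg is treated the same as an empty filter set.
    let filters = probe_ffmpeg_filters().unwrap_or_default();
    println!("caption plan: {:?}", plan_captions(&filters));
}
```

Probing once at startup and branching on the result lets the encode stage degrade to plain output on a minimal ffmpeg build instead of failing partway through a long 4K render.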
More details coming soon!