Leone Cineplex • AI & Media Infrastructure • Nicholas Leone

The Origin of Leone Cineplex

Long before Netflix existed, I had a simple dream: every movie, song, audiobook, and book I loved, available instantly at home — on my terms, with no subscription and no one else deciding what I could watch.

It started with VHS tapes that took over floors and walls. Then came DVDs, which eventually demanded entire walls of shelving. When the DVD burning era arrived, I saw the future and made the jump to digital files stored on ZIP disks and hard drives. That decision, more than twenty years ago, set me on the path of building something most people now pay monthly for.

Today, that vision lives on LeoneNAS 7 — a complete personal media empire for my family and close friends. Thousands of movies, TV shows, albums, audiobooks, comics, and books, perfectly organized and instantly streamable to any device, anywhere we are. No algorithms. No ads. No data harvested. Just our collection — preserved, improved, and fully under our control.

The same mindset that drove me from VHS to a self-hosted media library — own it, understand it, improve it — is exactly why I built LeoneNAS 9 for AI. These two machines have become both a playground and a classroom, not just for me, but for my sons.

30,000+ movies

1,000+ TV series

275 audiobooks

20,000+ albums

9,100+ authors

20,000+ books

The Path to Local AI

While most of the world rushed to use AI through cloud services, I chose to build it myself. Over the last year and a half, I’ve been building, breaking, and rebuilding AI servers because I wanted to understand these systems from the ground up — not just consume them.

Artificial intelligence is the most significant technological shift since the internet. Those who understand how it works at the infrastructure level will have a real advantage.

LeoneNAS 9 is my dedicated AI laboratory. By running powerful models locally instead of sending everything to corporate clouds, I maintain complete sovereignty over my data, my conversations, and my knowledge.

The deepest purpose is my sons. Both already show strong technical aptitude. I want them to grow up with an intimate, hands-on understanding of AI — how to run it, modify it, and trust it. This project gives them a living classroom in systems engineering, open-source architectures, and local AI orchestration.

It’s about building something that belongs to us — a foundation of knowledge and independence they can inherit and build upon for the rest of their lives.

96 TOPS GPU / NPU

36 TOPS Discrete GPU

78 TB RAIDZ Storage

64 GB LPDDR5X RAM

DEDICATED MEDIA SERVER

LeoneNAS 7

The Media Vault

This is the heart of our family media world. Every movie night, every audiobook on a long drive, every comic my boys dive into — it all flows from here. LeoneNAS 7 is quiet, reliable, and built for exactly this: holding an enormous collection and delivering it smoothly to any device.

CURRENT ROLE

Plex

Navidrome + Audiobookshelf

Calibre + Kavita

Hardware Specs

CPU

Intel Celeron N5105

4 Cores • Up to 2.9 GHz

MEMORY

32 GB DDR4

Dual-channel

STORAGE

73 TB Usable

8-bay • Seagate Exos

NETWORK

2 × 2.5 GbE (LACP)

Bonded

Why This Hardware Works Well for Media

→ The Intel UHD iGPU handles multiple 4K transcodes so everyone can watch on different devices without buffering.
→ 73 TB of reliable storage means we never have to delete anything. The full library stays in original quality.
→ Bonded 2.5 GbE and QNAP’s stable OS keep streaming smooth even when multiple people are using it at once.

DEDICATED AI INFERENCE SERVER

LeoneNAS 9

The AI Brain

This is my dedicated AI powerhouse and the home of "Alister" — my private AI assistant and digital twin. Every large language model, every personal knowledge base, and every future generative tool I build runs here, completely under my control.

I spent the last year and a half building, breaking, and rebuilding AI servers because I wanted to understand this technology from the inside out. Artificial intelligence is the biggest technological shift since the internet, and I want my sons to grow up knowing how these systems actually work at the infrastructure level.

132+ TOPS TOTAL AI

Hardware Specs

CPU + iGPU

Intel Core Ultra 7 255H

16 Cores • 96 TOPS combined (GPU / NPU)

DISCRETE GPU

NVIDIA A2 16 GB

36 TOPS • Optimized for inference

MEMORY

64 GB LPDDR5X-8400

Extremely fast unified memory

STORAGE + NET

78 TB • 2× 10 GbE (LACP)

Proxmox VE

Why This Hardware Excels at AI

→ The NVIDIA A2 + Intel Arc (96 TOPS) dramatically speed up LLM inference so responses feel fast and natural.
→ 64 GB of blazing-fast LPDDR5X RAM lets us load very large models with long context windows.
→ Proxmox gives flexibility to run Ollama, Open WebUI, AnythingLLM, and future tools securely.
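
As a rough illustration of the memory point above, the space a model's weights need is approximately the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the model sizes and bit widths are illustrative assumptions, not a list of what runs on LeoneNAS 9, and it ignores the context cache and runtime overhead, which add to the total.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, ram_gb: float) -> bool:
    """True if the weights alone fit in the given memory (no overhead counted)."""
    return weights_gb(params_billion, bits_per_param) <= ram_gb


RAM_GB = 64  # system memory figure from the spec list above

# Illustrative (hypothetical) model sizes and precisions
for params, bits in [(8, 16), (70, 4), (70, 16)]:
    size = weights_gb(params, bits)
    verdict = "fits" if fits(params, bits, RAM_GB) else "does not fit"
    print(f"{params}B at {bits}-bit: about {size:.0f} GB, {verdict} in {RAM_GB} GB")
```

The takeaway is that lower-precision (quantized) weights are what make the larger models practical on a machine of this size.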

EDUCATIONAL DEEP DIVE

How It All Works


What is it? Large Language Models are AI systems trained on enormous amounts of text. “Inference” is the process of the model actually thinking and generating a response in real time.

Running inference locally on LeoneNAS 9 means all of this computation happens in my house. Nothing is sent to OpenAI, Google, or Anthropic. Our conversations stay completely private.
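
To make "local inference" concrete, here is a minimal sketch of asking a locally running Ollama server for a response over its HTTP API. It assumes Ollama is listening on its default port on the same machine and that a model tagged `llama3` has already been pulled; the model name and the helper functions `build_request` and `ask` are illustrative, not a description of how LeoneNAS 9 is actually configured.

```python
import json
import urllib.request

# Assumption: Ollama running on this machine at its default port
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Explain inference in one sentence.")` would send the request to `localhost` only, which is the whole point: the prompt and the answer never leave the machine.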

Ollama + Llama Models

Llama is a family of powerful open-source AI models from Meta. Ollama makes it easy to download and run them locally, and the models offer excellent reasoning and conversation quality.

Open WebUI

A beautiful, private ChatGPT-like interface that connects to Ollama. It supports voice, documents, multi-user access, and RAG — making advanced local AI easy for the whole family to use.

Normal LLMs have no knowledge of your specific life or documents. Retrieval-augmented generation (RAG) fixes this by searching your private files first, then feeding the most relevant pieces to the model before it answers. The result is grounded, more accurate, and personal.
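
The search-then-feed flow described above can be sketched in a few lines. This is a toy version under stated assumptions: relevance is plain word overlap rather than a real retriever, the `NOTES` list is made-up sample data, and the function names are hypothetical. It only shows the shape of the idea, not how AnythingLLM or Open WebUI implement it.

```python
def score(query: str, chunk: str) -> int:
    """Count how many distinct query words appear in the chunk (toy relevance)."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Place the retrieved chunks ahead of the question before it goes to the model."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Use only this context to answer.\n{context}\n\nQuestion: {query}"


# Made-up sample notes standing in for private documents
NOTES = [
    "The media server runs Plex and Navidrome.",
    "The AI server runs Ollama under Proxmox.",
    "Family movie night is on Friday.",
]

print(build_prompt("which server runs Ollama", NOTES, k=1))
```

The model then answers from the supplied context instead of guessing, which is where the "grounded" part comes from.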

Meet "Alister"

My personal AI twin running on LeoneNAS 9. Built with AnythingLLM + Ollama. It knows my projects, business docs, family stories, and homelab knowledge — and can answer questions about them accurately.

Traditional search looks for exact word matches. Semantic search understands meaning and intent. On LeoneNAS 9, the GPU accelerates the creation of “embeddings,” numeric vectors that represent the meaning of each document, so searches across thousands of documents happen almost instantly, even when the exact words don’t match.
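
A small sketch of how matching by meaning works once documents are turned into vectors: documents are ranked by cosine similarity to the query vector. The three-dimensional vectors and document titles below are hand-made stand-ins (real embeddings come from a model and have hundreds of dimensions or more), and `search` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made vectors standing in for real embeddings (illustrative only)
DOCS = {
    "How to back up the NAS": [0.9, 0.1, 0.0],
    "Restoring files from a snapshot": [0.8, 0.3, 0.1],
    "Pizza dough recipe": [0.0, 0.2, 0.9],
}


def search(query_vec: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the titles of the k documents closest in meaning to the query."""
    ranked = sorted(docs, key=lambda title: cosine(query_vec, docs[title]), reverse=True)
    return ranked[:k]


# Pretend embedding for the query "recover deleted data"
query = [0.85, 0.2, 0.05]
print(search(query, DOCS))
```

The query shares no words with either storage document, yet both rank ahead of the recipe because their vectors point in a similar direction.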

Beyond answering questions, generative AI can create images, artwork, event graphics, and more. On LeoneNAS 9 we run local models (Stable Diffusion, Flux, etc.) so everything stays private and under our control — no Midjourney accounts or data leaving the house.

This infrastructure represents years of learning, building, and a commitment to ownership and privacy. It’s not just servers — it’s a foundation I’m building for my sons and anyone who cares about controlling their own digital life.