Key Features
Below is a structured layout of subsections for each key feature in the project. Each feature includes placeholders for implementation details, code snippets, and diagrams. These are purely illustrative; fill them in with the actual content as needed.
Project Structure
The overall layout of the BlueprintAI project is designed to keep extension logic, Python OCR functionality, and the React-based web UI separate yet interlinked. Below is a detailed breakdown of the directory and file organization, along with short explanations of each major part. Note that the project offers no multiple design themes for users to pick from; it provides only the functionality to generate or refine pages via AI prompts or screenshots.
Root-Level Files & Directories
• eslint.config.mjs, package.json, package-lock.json: Basic configuration and dependency management for the main VS Code extension. ESLint rules and Node modules for the backend logic live here.
• tsconfig.json: Manages TypeScript compilation settings for the extension codebase (excluding the web UI, which has its own config).
• README.md: Provides a high-level overview of BlueprintAI, plus instructions on setup and usage.
• python-ocr/: Houses Python scripts related to OCR. This includes:
• ocr_service.py – Invoked by the extension to run EasyOCR on user-uploaded screenshots. Outputs recognized text lines for the AI summarization step.
• requirements.txt – Lists the Python dependencies (e.g., numpy, opencv-python-headless, easyocr).
Core Extension Source (src Folder)
The src folder contains all TypeScript code that runs in the VS Code extension context. It includes AI prompt orchestration, Python bridging, and panel creation.
• ai/ – Manages OpenAI prompts, OCR-based summarization, and bridging to the Python script. Key files:
• blueprintAiPrompts.ts – Reusable prompt templates.
• BlueprintAiService.ts – High-level methods for final CraftJS layout generation.
• getSummariesFromScreenshot.ts – Chains OCR output with UI/GUI summarization prompts.
• pythonBridge.ts – Spawns Python processes (e.g., ocr_service.py) and returns results.
• extension.ts – The main extension entry point, registering commands and setting up the webview panel.
• panels/ – Contains MainWebViewPanel.ts, defining how the React web UI is injected into the VS Code panel and how it communicates with the backend.
• utils/ – Utility modules such as extensionContext.ts (for storing context across sessions) and validateApiKey.ts (for verifying user-supplied or stored AI keys).
React Web UI (webview-ui Folder)
The webview-ui folder encapsulates the entire front-end built with React and CraftJS. This is compiled separately (using Vite) and then embedded as the main interface within the VS Code extension’s webview.
• index.html, package.json, package-lock.json – The starting point for the React app and the local Node modules (distinct from the root-level package.json).
• postcss.config.js, tailwind.config.js – Setup for Tailwind CSS and other post-processing tools, ensuring consistent styling across the UI.
• vite.config.ts – Configuration for bundling the React source code, outputting the final webview-ui bundle.
• src/ – Contains the core React code:
• App.tsx, main.tsx, and global.css – The bootstrap logic and global styles for the web application.
• components/ – A suite of React components for the UI, including:
• AiSidebar – Where users enter text/image prompts for iterative AI changes.
• CreateWithImagination – Manages the initial “Create from text or screenshot” workflow (though final design variants are chosen by the user or AI).
• ExportMenu – Allows specifying which pages to export, generating HTML/CSS/JS, and downloading as a ZIP.
• PrimarySidebar – Left-side panel that includes LayoutTab, PagesTab, SaveModal, etc.
• PropertiesSidebar – Right-side panel for editing attributes of selected components (text, color, layout).
• SuggestedPages – Dialogs for AI-suggested page creation (e.g. “Home,” “AboutUs,” etc.).
• UserComponents – Definitions of custom draggable components (Button, Container, Image, Navigation, etc.) that appear on the CraftJS canvas.
• pages/ – High-level “page” containers, such as MainInterface (the main editor with sidebars and canvas).
• store/ – Provides global state management (e.g., store.ts) for the web UI, storing user actions, selected components, and more.
Key Workflow
When users launch the extension, extension.ts sets up the main webview by loading the compiled React app from webview-ui. Within that app, the user can:
• Generate a page from text or screenshot via the CreateWithImagination flow.
• Edit layouts on the MainInterface using Component Sidebar, Layout Sidebar, Pages Sidebar, and Properties Sidebar.
• Use AiSidebar to iteratively prompt the AI to add or modify page elements.
• Perform OCR if a screenshot is uploaded (via pythonBridge.ts), feed the text into AI summarizations, and see updated designs in real time.
• Export the final pages using ExportMenu for local storage or further development.
This layered approach separates the core AI logic (in src/ai and python-ocr) from the webview-ui front-end. It allows for easy expansion of both sides: new AI prompts or different user components can be added without disrupting the rest of the codebase.
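The round trip above relies on message passing between the React webview and the extension host. The sketch below is purely illustrative; the command names and the routing helper are assumptions for this document, not the actual MainWebViewPanel.ts implementation:

```typescript
// Hypothetical sketch of the extension <-> webview message protocol.
// Command names are illustrative; the real MainWebViewPanel.ts may differ.
type WebviewMessage =
  | { command: "generateLayout"; userText: string; screenshotBase64?: string }
  | { command: "exportPages"; pageIds: number[] };

// Decide which backend action a webview message maps to.
function routeMessage(msg: WebviewMessage): string {
  switch (msg.command) {
    case "generateLayout":
      // Would call getBlueprintLayout(...) with the text + optional screenshot.
      return msg.screenshotBase64 ? "ocr+ai" : "ai-only";
    case "exportPages":
      // Would hand off to the ExportMenu's HTML/CSS/JS ZIP generation.
      return `export:${msg.pageIds.length}`;
  }
}
```

In practice the webview would post such messages via the VS Code webview API, and the extension would reply with generated layout JSON.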
AI Prompts
The following prompts guide the AI in generating layouts based on user input. Each prompt is reproduced in full below, with explanations afterward.
You are a "UI Summarization AI" receiving raw OCR text from any type of website or application screenshot. The text may be partial, jumbled, or repeated. Your goal is to produce a **short, structured** list of lines that:

1. **Identify the UI’s domain or purpose if possible** (e.g., “YouTube” if key lines appear like “Home,” “Subscriptions,” “Trending,” “Watch later,” etc.).
2. **Group or unify lines** that are obviously connected—like a list of channel names under “Subscriptions,” or a set of recommended videos—and produce a small set of sample items.
3. **Summarize or skip** lines that are purely repeated, out of scope, or contain large blocks of text. For instance:
   - If many channels are listed, show only 1–2 examples, then note “(other channels omitted).”
   - If there’s a big list of videos, produce 2–3 example items with short “snippet” lines.
4. **Apply domain knowledge** for a known UI (e.g., YouTube has “Subscriptions” or “Trending”). If you detect “Home,” “Music,” “Gaming,” “Shorts,” it’s likely the YouTube homepage or a similar streaming/video service. Emphasize the main nav items.
5. **Redact** personal info or large amounts of text. E.g., if an OCR line includes a user’s handle or partial personal data, keep just the handle if it’s relevant as a channel name. Omit or anonymize anything sensitive.
6. **No disclaimers**: do not mention “I removed data” or “lines omitted.” Instead, incorporate placeholders or short summaries for repeated items.
7. **Avoid extraneous lines**. For example:
   - If you see “History” repeated multiple times, keep only one instance if it’s obviously the same nav item.
   - If multiple lines show partial text from overlapping UI sections or random text lumps, unify or skip them if they’re not relevant to the interface structure.
8. **Format** the final output as a short bullet or line-by-line list that reveals:
   - The main UI nav items (e.g., “Home,” “Subscriptions,” “Trending,” “Shorts,” “Search,” etc.).
   - Possibly 1–2 “example content” lines if there’s a large repeated list (channels, videos, emails).
   - Summaries for big blocks of text. For example, “Sample video #1: ‘Visiting the Most Expensive Grocery Store…’ snippet” or “Channel #1: Mrwhosetheboss.”
   - Domain-specific placeholders if recognized (like “YouTube recommended videos,” “Breaking news highlights,” etc.).
9. **Don’t** output raw lines that are obviously partial or leftover. Merge them if you can interpret them, or ignore them if they’re irrelevant.
10. **No final commentary** or disclaimers. The final output is only a short cleaned list that best represents the key UI elements plus a small set of example data.

### Special Considerations / Edge Cases:
- If the screenshot text suggests “Gmail” or a “mail client,” then:
  - Keep nav items like “Inbox,” “Sent,” “Drafts,” “Compose.”
  - Summarize multiple emails as “Email #1: subject snippet…,” “Email #2: subject snippet…,” etc.
- If it’s YouTube (lines like “Home,” “Shorts,” “Subscriptions,” channel names, video titles):
  - Group them under a short heading or keep them in bullet points, e.g., “Main navigation: Home, Shorts, Subscriptions…”
  - Summarize recommended videos with 1–2 examples.
- If lines contain references to personal or partial data, like phone #s, credit card info, or references to partial addresses:
  - Omit or anonymize them (“[REDACTED]” or skip).
- If the text strongly implies a certain domain (like “Breaking news,” “LA wildfires,” “Trending,” “Sky News”), it might be a news site:
  - Keep main nav items or top stories in short bullet points.
- If multiple lines are obviously random or worthless (like “2.13 20.13 Eh? Um:. 0.54”), skip them unless you can unify them into “some numeric data” relevant to the UI.

### Final Output Example:
- A bullet/line list:
  1) “YouTube UI recognized” (or no explicit mention if you prefer, just keep the lines)
  2) “Search,” “Home,” “Shorts,” “Subscriptions,” “Library,” etc.
  3) “Channel #1: Mrwhosetheboss,” “Channel #2: MoreSidemen,” …(others omitted)
  4) “Video #1: ‘We Tried EVERY TYPE OF POTATO’ snippet,” “Video #2: ‘Visiting the Most Expensive Grocery Store…’ snippet,” etc.
  5) Possibly “News section: LA wildfires,” “Trending,” etc.

**No disclaimers, no code fences, no mention of how you summarized**—just the cleaned lines that best reveal the UI structure plus a few key content examples.
YOU ARE A “GUI EXTRACTION AI,” SPECIALIZED IN ANALYZING WEBPAGE SCREENSHOTS.

OBJECTIVE: Receive text or descriptive clues about a screenshot. From that, produce a **concise yet complete** breakdown of the **visual GUI structure**, focusing on:
- Layout sections (header, banners, sidebars, main content columns, footers).
- Approximate positions, relative sizes (e.g., “a full‐width banner at the top, about 300px tall”).
- Prominent graphical or navigational elements (search bars, logos, key nav links).
- High‐level grouping of content (“3 columns of product panels,” “left sidebar with vertical menu,” etc.).
- Color themes or brand cues (“dominant orange accent,” “black header,” etc.).
- Redacting any personal or sensitive data (names, personal messages) in the screenshot (or replacing them with generic placeholders if needed).

IGNORE:
- Detailed textual content beyond what is needed to identify the GUI element. (E.g., if you see “Your credit card ending in 5901” text, do not quote it; mention only “Payment method line in the top bar, redacted.”)
- Exhaustive paragraphs or fluff from email bodies or personal data. We only care about the **interface structure**.

FORMAT & STYLE:
- Provide a **single, structured text** (a brief, high‐level summary) that enumerates major regions.
- Each region might look like:
  - “**Header** (approx 70px tall, white background, logo on left, search input in center, user icon on right).”
  - “**Main Banner** (full width, colorful promotional image with a short slogan).”
  - “**Column #1** (left side, ~1/3rd width), shows vertical product list…”
  - etc.
- Avoid disclaimers or extraneous commentary; just outline the interface.
- Keep the final text **under ~300 words** if possible, focusing on the layout’s core details.

EXAMPLES OF DESCRIPTIONS FOR A WEBPAGE:
1. “**Top Navigation Bar**: black background, includes left‐aligned site logo, center‐aligned search field, right‐aligned ‘Sign In’ + ‘Basket’ icons. ~60–80px tall.”
2. “**Secondary Nav**: a horizontal bar of categories below the main nav (‘All’, ‘Grocery’, ‘Electronics’). ~40px tall, dark background.”
3. “**Hero Banner**: wide, ~300–400px tall, large product image on the right, main headline on the left, orange accent color.”
4. “**Below Banner**: 3 columns of product suggestions, each ~300px wide, with white backgrounds.”
5. “**Footer**: references site disclaimers and links. ~200px tall, repeated site menu links.”

NO EXTRA OUTPUT:
- Do not output disclaimers, developer notes, or code.
- Provide only the organized GUI layout summary, **redacting** personal user info or large private content.

REMEMBER:
- You are summarizing the layout in a screenshot: mention major sections, approximate positioning, color scheme, any brand cues, and relevant nav or product placeholders.
- If personal data is recognized, omit or genericize it.
- Keep it concise and structured.
YOU ARE "BLUEPRINT AI," A HIGHLY ADVANCED SYSTEM FOR CRAFTJS LAYOUT GENERATION.

OBJECTIVE: Produce a SINGLE-PAGE layout for CraftJS as **strictly valid JSON** using only the following components (exact names):
- Button
- Container
- Navigation
- SearchBox
- Slider
- StarRating
- Text
- Video

Your output must be a single JSON object with a top-level key "layout". Within it, define "type", "props", and optional "children" objects, recursively, in valid JSON. No other top-level keys are allowed besides "layout".

-------------------------------------------------------------------------------
STRUCTURE:
{
  "layout": {
    "type": "OneOfTheAllowedComponents",
    "props": {
      // e.g. style, text, color, data, etc.
    },
    "children": [
      // zero or more child objects, each with the same structure
    ]
  }
}
-------------------------------------------------------------------------------
CRITICAL REQUIREMENTS:
1) Strictly Valid JSON
   - No code fences or additional commentary in the final output.
   - Only one top-level key: "layout".
2) Single Static Page
   - No multi-page references or navigation to other pages.
   - Everything must be in one JSON object under "layout".
3) Text/Data
   - Combine user instructions with relevant points from the GUI summary and OCR summary.
   - If there are conflicts, user instructions override.
   - If user instructions are minimal, rely on GUI/OCR for content.
   - If all sources are minimal, produce a reasonable single-page layout (e.g., a typical homepage).
4) Color & Style
   - Use any brand or style cues indicated by user, GUI, or OCR.
   - Do not leave placeholders (like "#FFFFFF") if a specific color scheme is given or implied.
   - If a brand is mentioned (e.g., eBay, Amazon), you may incorporate typical brand colors or styling.
5) User Instructions Have Highest Priority
   - If the user explicitly says "make it like X," prioritize that over any conflicting details.
   - In absence of detail, infer or invent coherent design choices.
6) GUI Summary
   - May describe layout structure, color scheme, etc.
   - If provided, interpret it as guidelines for how to structure containers, headings, etc.
   - If missing, focus on user instructions and OCR text.
7) OCR Summary
   - Text from a screenshot may hint at brand, features, or layout.
   - Incorporate relevant text if it aligns with user instructions.
   - You can omit or shorten extraneous lines if not explicitly required.
8) Charts/Data
   - Since only the specified 9 components are allowed, do not add chart components (BarChart, PieChart, etc.) even if the OCR or user mentions them.
   - If they request a chart, you cannot fulfill that request here because those components are not in this list.
9) Images or Icons
   - If a brand logo or icon is relevant, use the "Icon" component with a suitable iconName.
   - If images are forbidden or not relevant, skip them.
10) No Extra Output
   - Only return valid JSON with the single "layout" key (plus any child objects).
   - No disclaimers or placeholders.
-------------------------------------------------------------------------------
IF ANY SOURCE IS MISSING:
- If no user instructions, rely on GUI/OCR.
- If no GUI summary, rely on user/OCR.
- If no OCR text, rely on user/GUI.
- If everything is minimal, produce a simple layout with typical branding or a basic homepage.
-------------------------------------------------------------------------------
USER’S TEXTUAL INSTRUCTIONS:
"${userText}"

GUI SUMMARY (IF ANY):
${guiExtractionData}

OCR TEXT SUMMARY (IF ANY):
${ocrTextSummary}
-------------------------------------------------------------------------------
DETAILED COMPONENT REFERENCE (use exactly these component names):

1) Button
Props:
  label: string (default "Click Me")
  variant: "button" | "radio" (default "button")
  color: string (CSS color, default "#ffffff")
  background: string (CSS color, default "#007bff")
  width: string ("auto", "100px", etc., default "auto")
  height: string ("auto", "40px", etc., default "auto")
  margin: [number, number, number, number] (default [5, 5, 5, 5])
  padding: [number, number, number, number] (default [10, 20, 10, 20])
  radius: number (default 4)
  shadow: number (default 5, 0 = no shadow)
  border: { borderStyle?: "none" | "solid" | "dashed" | "dotted"; borderColor?: string; borderWidth?: number; } (default { borderStyle: "solid", borderColor: "#cccccc", borderWidth: 1 })
  checked: boolean (default false, only if variant="radio")
  onClick: (e: MouseEvent) => void (no-op in JSON)
Notes:
- Renders "button" unless variant="radio", which renders a radio input with a label.
2) Container
Props:
  layoutType: "container" | "row" | "section" | "grid" (default "container")
  background: string (CSS color, default "#ffffff")
  fillSpace: "yes" | "no" (default "no")
  width: string (default "auto")
  height: string (default "auto")
  margin: [number, number, number, number] (default [10, 10, 10, 10])
  padding: [number, number, number, number] (default [20, 20, 20, 20])
  shadow: number (default 5)
  radius: number (default 8)
  border: { borderStyle?: "none" | "solid" | "dashed" | "dotted"; borderColor?: string; borderWidth?: number; } (default { borderStyle: "solid", borderColor: "#cccccc", borderWidth: 1 })
  flexDirection: "row" | "column" (default "row")
  alignItems: "flex-start" | "flex-end" | "center" | "baseline" | "stretch" | "start" | "end" (default "flex-start")
  justifyContent: "flex-start" | "flex-end" | "center" | "space-between" | "space-around" (default "center")
  gap: number (default 0, relevant if layoutType="row")
  flexWrap: "nowrap" | "wrap" | "wrap-reverse" (default "nowrap", relevant if layoutType="row")
  columns: number (default 2, relevant if layoutType="grid")
  rows: number (default 2, relevant if layoutType="grid")
  rowGap: number (default 10, relevant if layoutType="grid")
  columnGap: number (default 10, relevant if layoutType="grid")
  justifyItems: "start" | "center" | "end" | "stretch" (default "stretch")
  alignGridItems: "start" | "center" | "end" | "stretch" (default "stretch")
Notes:
- For layoutType="grid", columns/rows define the grid.
- For layoutType="row", gap/flexWrap apply.
3) Navigation
Props:
  navType: "navbar" | "sidebar" (default "navbar")
  displayName: string (default "MySite")
  background: string (CSS color, default "#ffffff")
  collapsible: boolean (default true)
  collapsedWidth: string (default "60px")
  expandedWidth: string (default "250px")
  width: string (default "200px")
  height: string (default "100%")
  linkStyle: object (default {})
  highlightSelected: boolean (default true)
  textColor: string (CSS color, default "#333")
  margin: string (default "0")
  padding: string (default "10px")
  pageDisplayNames: Record<number, string> (optional)
Notes:
- Renders horizontal navbar or vertical sidebar.
- If sidebar + collapsible=true, toggles between collapsed/expanded widths.
- Do not reference multiple pages in the final layout JSON.

4) SearchBox
Props:
  placeholder: string (default "Search...")
  searchText: string (default "")
  backgroundColor: string (CSS color, default "#ffffff")
  textColor: string (CSS color, default "#000000")
  borderColor: string (CSS color, default "#cccccc")
  borderWidth: number (default 1)
  borderStyle: string (default "solid")
  borderRadius: number (default 4)
  padding: [number, number, number, number] (default [4, 8, 4, 8])
  margin: [number, number, number, number] (default [0, 0, 0, 0])
  shadow: number (default 0)
  width: string (default "200px")
  height: string (default "auto")
Notes:
- Renders an "input" inside a styled container.
5) Slider
Props:
  min: number (default 0)
  max: number (default 100)
  step: number (default 1)
  currentValue: number (default 50)
  orientation: "horizontal" | "vertical" (default "horizontal")
  width: string (default "300px")
  height: string (default "40px")
  thumbColor: string (default "#ffffff")
  trackColor: string (default "#0078d4")
  marginTop: string (default "0px")
  marginRight: string (default "0px")
  marginBottom: string (default "0px")
  marginLeft: string (default "0px")
  paddingTop: string (default "0px")
  paddingRight: string (default "0px")
  paddingBottom: string (default "0px")
  paddingLeft: string (default "0px")
  trackThickness: number (default 8)
  showValue: boolean (default true)
  valueColor: string (default "#000000")
  valueFontSize: string (default "14px")
  valueFontWeight: string (default "normal")
Notes:
- A simple Fluent UI-based slider.

6) StarRating
Props:
  rating: number (default 3)
  maxRating: number (default 5)
  starColor: string (default "#FFD700")
  starSpacing: number (default 4)
  background: string (default "#ffffff")
  width: string (default "150px")
  height: string (default "50px")
  margin: [number, number, number, number] (default [0, 0, 0, 0])
  padding: [number, number, number, number] (default [0, 0, 0, 0])
Notes:
- Displays filled vs. empty stars.
- Not interactive in the given code snippet.
7) Text
Props:
  renderMode: "textbox" | "link" | "dropdown" (default "textbox")
  fontSize: number (default 15)
  textAlign: "left" | "right" | "center" | "justify" (default "left")
  fontWeight: string (default "500")
  textColor: string | { r: number; g: number; b: number; a: number } (default "#5c5a5a")
  shadow: number (default 0)
  text: string (default "Text")
  selectedValue: string (dropdown mode only)
  margin: [number, number, number, number] (default [0, 0, 0, 0])
  padding: [number, number, number, number] (default [5, 5, 5, 5])
  placeholder: string (default "Enter text...")
  fontFamily: string (default "Arial, sans-serif")
  background: string (default "#ffffff")
  multiline: boolean (default false)
  disabled: boolean (default false)
  readOnly: boolean (default false)
  radius: number (default 0)
  borderColor: string (default "#000000")
  borderStyle: string (default "solid")
  borderWidth: number (default 1)
  width: string (default "auto")
  height: string (default "auto")
  maxLength: number (optional)
  rows: number (optional)
  cols: number (optional)
  autoFocus: boolean (default false)
  spellCheck: boolean (default true)
  href: string (default "#")
  linkType: "externalUrl" | "page" (default "externalUrl")
  pageId: number (optional)
  linkTitle: string (optional)
  ariaLabel: string (optional)
  hasCheckbox: boolean (default false)
  checked: boolean (default false, if hasCheckbox=true)
  checkboxPosition: "left" | "right" (default "left")
  enableResizer: boolean (default true)
Notes:
- renderMode="textbox" => "input" or "textarea" if multiline=true.
- renderMode="link" => "a" with href or page link.
- renderMode="dropdown" => "select" from items in text split by "||".
- If hasCheckbox=true, a checkbox is shown next to the text.

8) Video
Props:
  videoId: string (default "91_ZULhScRc")
  width: string (default "400px")
  height: string (default "225px")
  autoplay: boolean (default false)
  controls: boolean (default true)
  interactable: boolean (default false)
Notes:
- Embeds a YouTube player with react-player.
-------------------------------------------------------------------------------
IMPORTANT:
- Replace "\${userText}", \${guiExtractionData}, and \${ocrTextSummary} in your final code with the actual user input, GUI summary, and OCR text if applicable.
- Output ONLY the JSON with a single "layout" key and any nested children. No disclaimers, no extra keys.
- Merge references from user, GUI, and OCR. If brand cues are given, incorporate them logically.
- If minimal data, create a sensible layout with the above components in typical sections (e.g., header Navigation, main Container, optional SearchBox, etc.).
- The final layout text must be in English.
-------------------------------------------------------------------------------
ADDITIONAL INSTRUCTION:
After you produce the final CraftJS JSON layout with the single top-level "layout" key, also provide a separate output containing "suggestedPageNames". This output should be in the form of an array-like structure, for example: {"Home", "AboutUs", "ContactUs"}. These are future page ideas relevant to the design. Do NOT reference them within the final CraftJS JSON layout itself. They should appear as a separate data structure after the JSON layout is complete.
Prompt Explanations
1) UI Summarization Prompt
This prompt is used when Blueprint AI needs a concise text outline
of elements discovered through OCR on a website or application
screenshot. It instructs the AI to analyze raw or possibly jumbled
lines of text (such as navigation items or repeated labels) and turn
them into a short, bullet-style breakdown of relevant UI features.
Specifically, it groups similar lines (e.g., channel names, repeated
links), filters out duplicates and large text blocks, and redacts
personal or sensitive data. Everything is kept minimal and
domain-focused, illustrating key UI items like “Home,”
“Subscriptions,” or “Recommended Videos” when the domain is
recognized. The end result is a compact textual summary that
captures main interface elements without disclaimers or extraneous
commentary.
2) GUI Summarization Prompt
This prompt targets a higher-level, visual or structural overview of
the screenshot. Rather than focusing on text lines, it describes the
general layout (headers, footers, banners, columns, color themes),
approximate positioning, and major sections (e.g., a full-width hero
banner, left sidebar, or row of product cards). Sensitive or
personal text is either omitted or replaced with generic
placeholders. The response intentionally concentrates on the graphic
design elements, mentioning the presence of search boxes,
navigational bars, or color highlights. The resulting summary is
kept short but thoroughly outlines the GUI structure, ensuring
developers or users have a clear visual map of the screenshot’s
layout.
3) Final CraftJS Layout Prompt
This is the core directive that merges everything into a valid,
single-page CraftJS layout in strict JSON format. It tells the AI
how to create a final “layout” object using only the allowed
components (e.g., Container, Button,
Text, Navigation, etc.). The AI must synthesize
user instructions, plus any prior OCR or GUI summaries, to build a
single JSON object that stands for one complete page design. All
style, positioning, and text content stems from either the user
prompt or the summarized data. The output must be valid JSON with no
code fences or extra keys beyond “layout.” In addition, there is a
secondary array-like structure indicating suggested page names for
future expansions, but these are provided outside the main “layout”
JSON. This final prompt ensures the design is strictly formed,
referencing props for each component according to the project’s
requirements, and it also respects the user’s highest-priority
instructions if conflicts arise.
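Because the prompt demands strictly valid JSON with a single "layout" key and a fixed component set, a response can be cheaply validated before rendering. The following is a hypothetical validator (isValidLayout is not a function in the codebase) showing what that contract check might look like:

```typescript
// Hypothetical check of the AI's layout JSON contract (not from the codebase).
const ALLOWED = new Set([
  "Button", "Container", "Navigation", "SearchBox",
  "Slider", "StarRating", "Text", "Video",
]);

interface LayoutNode {
  type: string;
  props: Record<string, unknown>;
  children?: LayoutNode[];
}

function isValidLayout(raw: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not strictly valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const obj = parsed as { layout?: LayoutNode };
  // Exactly one top-level key, "layout", is allowed.
  if (Object.keys(obj).length !== 1 || !obj.layout) return false;
  const check = (node: LayoutNode): boolean =>
    ALLOWED.has(node.type) &&
    typeof node.props === "object" &&
    node.props !== null &&
    (node.children ?? []).every(check);
  return check(obj.layout);
}
```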
The “Backend” Explained
In Blueprint AI, the entire system responsible for OCR extraction, AI prompting, and final layout generation exists behind the scenes as a “backend” flow that orchestrates multiple key steps. Though the user primarily sees a visual editor and a chat-like interaction, a collection of TypeScript modules and a Python script coordinate to deliver final results. Below is an extensive, step-by-step explanation of how each relevant piece is structured, without showing full code, so you can understand precisely what happens whenever you request an AI-generated layout:
Overall Conceptual Flow
At a high level, the user’s textual instructions and optional screenshot enter the system via getBlueprintLayout(...). This triggers a multi-stage process: (1) we run OCR on the screenshot if provided, (2) we generate short textual summaries of both the UI (from OCR) and the GUI structure, and (3) we combine everything (including user instructions) into a single, final CraftJS layout JSON. The “backend” is what executes these steps, calling the appropriate prompts and bridging to Python for OCR as needed.
1) blueprintAiClient.ts
This file contains three main functions—getUiSummary, getGuiSummary, and getFinalCraftJsLayout—each referencing a distinct “meta prompt” from blueprintAiPrompts.ts. It also has a small helper named callChatGPT that handles Axios-based API calls to OpenAI. Together, these pieces let the backend dispatch prompts with the correct system instructions and user content:
• getUiSummary(...): Sends raw OCR text (and possibly a truncated, base64-encoded screenshot) to OpenAI with the UI_SUMMARY_META_PROMPT, receiving back a short bullet-style summary of the recognized interface text. This is specifically meant to highlight items like “navigation,” “list of categories,” or “footer links,” while skipping extraneous lines.
• getGuiSummary(...): Concentrates on layout structure. It encodes only the screenshot in base64 (again truncated, if too large) and applies the GUI_SUMMARY_META_PROMPT, retrieving a high-level description of the visual design (e.g., “a tall header, left sidebar, multi-column content”). This text is more about structure and color or brand cues, less about the line-by-line text content.
• getFinalCraftJsLayout(...): The culminating step that merges user instructions, UI summary, and GUI summary into a valid JSON layout for CraftJS. It uses the FINAL_CRAFTJS_META_PROMPT as a “system” prompt and injects all relevant textual data. The AI’s response is strictly JSON, containing a “layout” key with nested CraftJS components like Container, Button, Text, etc.
callChatGPT is a simple utility that sets up the “system” and “user” roles, along with the chosen OpenAI model (e.g., gpt-3.5-turbo), and handles returning the raw text from ChatGPT’s response.
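As a rough illustration of what callChatGPT assembles, the sketch below builds a chat-completions request body in the shape the public OpenAI API expects; the helper name buildChatRequest and the exact wiring are assumptions, not the project's actual code:

```typescript
// Hypothetical sketch of the request body a helper like callChatGPT might build.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
}

function buildChatRequest(
  systemPrompt: string,
  userContent: string,
  model = "gpt-3.5-turbo",
): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt }, // e.g. a meta prompt
      { role: "user", content: userContent },    // OCR text, summaries, etc.
    ],
  };
}
// The helper would then POST this body with Axios to the OpenAI
// chat-completions endpoint and return response.data.choices[0].message.content.
```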
2) BlueprintAiService.ts
This file defines an exported function named getBlueprintLayout(...) that acts as the project’s main AI entry point. Whenever the user wants a new CraftJS layout (possibly with a screenshot for reference), we (a) call getSummariesFromScreenshot to produce both the UI summary and GUI summary, then (b) pass those summaries along with the user’s text to getFinalCraftJsLayout. Here is its conceptual structure:
• Calls getSummariesFromScreenshot: If a screenshot was provided, it runs OCR to get the raw text lines, then uses getUiSummary and getGuiSummary to convert them into two distinct short-form texts. If no screenshot is passed in, the UI summary is minimal and the GUI summary is empty.
• Calls getFinalCraftJsLayout: Takes userText plus those two summaries, feeding them into the FINAL_CRAFTJS_META_PROMPT. The AI returns strictly valid JSON in the shape of a single-page CraftJS layout.
• Returns the final JSON: The raw string (with “layout”: { ... }) can then be used by the front-end to render or store the new design.
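That two-step flow can be sketched with the dependencies injected as parameters; the signatures below are illustrative assumptions, not the real ones in BlueprintAiService.ts:

```typescript
// Hypothetical shapes for the orchestration; real signatures may differ.
interface Summaries { uiSummary: string; guiSummary: string }
type Summarize = (shot?: Buffer) => Promise<Summaries>;
type Finalize = (userText: string, ui: string, gui: string) => Promise<string>;

async function getBlueprintLayout(
  userText: string,
  rawScreenshot: Buffer | undefined,
  summarize: Summarize,
  finalize: Finalize,
): Promise<string> {
  // (a) OCR + UI/GUI summaries (minimal/empty when no screenshot is given).
  const { uiSummary, guiSummary } = await summarize(rawScreenshot);
  // (b) Merge user text and summaries into the final CraftJS layout JSON.
  return finalize(userText, uiSummary, guiSummary);
}
```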
3) getSummariesFromScreenshot.ts
This is where the extension decides how to handle the screenshot, if any:
• runPythonOcr is called if rawScreenshot was provided, returning an array of recognized text objects. These text items (with bounding box info and confidence scores) are joined into a single string. If no screenshot is present, we skip OCR altogether.
• getUiSummary is invoked with that joined text and the screenshot buffer. The AI returns a condensed bullet list describing text-based UI elements.
• getGuiSummary is invoked only if a screenshot is actually there, to produce the layout-based summary. If the screenshot is missing, we leave the GUI summary empty.
• Both summaries are returned in an object so the caller can decide how to apply them next—most often passing them to the final layout generation step.
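The branching described above might look roughly like this (a sketch with assumed signatures; the real module imports these functions rather than receiving them as parameters):

```typescript
// Hypothetical sketch of the screenshot-handling branch (names assumed).
interface OcrItem { text: string; confidence: number }
type RunOcr = (shot: Buffer) => Promise<OcrItem[]>;
type UiSummarizer = (joined: string, shot?: Buffer) => Promise<string>;
type GuiSummarizer = (shot: Buffer) => Promise<string>;

async function getSummariesFromScreenshot(
  rawScreenshot: Buffer | undefined,
  runPythonOcr: RunOcr,
  getUiSummary: UiSummarizer,
  getGuiSummary: GuiSummarizer,
) {
  // Skip OCR entirely when no screenshot is supplied.
  const items = rawScreenshot ? await runPythonOcr(rawScreenshot) : [];
  const joined = items.map((i) => i.text).join("\n");
  const uiSummary = await getUiSummary(joined, rawScreenshot);
  // The layout-based GUI summary only makes sense with a screenshot.
  const guiSummary = rawScreenshot ? await getGuiSummary(rawScreenshot) : "";
  return { uiSummary, guiSummary };
}
```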
4) pythonBridge.ts
This TypeScript file encapsulates how Blueprint AI triggers a Python script for OCR:
• Locating Python & the Script: We check for a local Python environment (under python-ocr/venv/Scripts/python.exe) and the presence of ocr_service.py in the same folder. If either is missing, an error is thrown.
• Writing a Temporary Screenshot: The raw Buffer from rawScreenshot is written to a random temp filename in the globalStoragePath, ensuring we have a real image file for the Python script to process. Once done, the file is removed.
• Spawning the OCR Process: We run the Python script with the temp image path, collecting stdout and stderr. If the script completes with exit code 0, stdout is parsed as JSON to produce an array of OCR result objects. If there's an error or non-zero exit code, we handle it gracefully, optionally showing a VS Code error message.
5) ocr_service.py
This Python script forms the final link in the OCR chain. It:
• Loads and optionally upscales the image to help with small fonts, then converts it to grayscale.
• Runs EasyOCR in paragraph mode, returning recognized lines with confidence scores. Each recognized block contains bounding box coordinates plus the text itself.
• Outputs JSON to stdout, which the TypeScript code then parses. This JSON array represents each recognized line or block of text that might factor into the subsequent UI summarization.
Putting It All Together
The chain of function calls and modules described above is what drives the “backend” logic for Blueprint AI. The user sees a single “Generate Layout” action, but behind the scenes:
• If a screenshot is included, the system writes it to disk, calls Python OCR, and obtains the recognized text lines.
• We pass that text and the screenshot to getUiSummary (which uses the UI_SUMMARY_META_PROMPT) and getGuiSummary (which uses GUI_SUMMARY_META_PROMPT) for more refined summarizations.
• Finally, we merge everything with the user’s instructions in getFinalCraftJsLayout (guided by FINAL_CRAFTJS_META_PROMPT) to produce a single-page CraftJS layout in JSON form.
Code Samples
1) blueprintAiClient.ts
import axios from 'axios';
import {
UI_SUMMARY_META_PROMPT,
GUI_SUMMARY_META_PROMPT,
FINAL_CRAFTJS_META_PROMPT,
} from './blueprintAiPrompts';
/**
* Simple helper to call OpenAI ChatGPT with Axios.
* - Expects process.env.OPENAI_API_KEY to be defined.
* - Uses the gpt-3.5-turbo (or gpt-4 if you have access).
*/
async function callChatGPT(
systemPrompt: string,
userPrompt: string
): Promise<string> {
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
throw new Error('Missing OPENAI_API_KEY environment variable.');
}
// Example: using GPT-3.5-Turbo
const model = 'gpt-3.5-turbo';
try {
const response = await axios.post(
'https://api.openai.com/v1/chat/completions',
{
model,
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userPrompt },
],
temperature: 0.7,
},
{
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${apiKey}`,
},
}
);
const rawText = response.data.choices?.[0]?.message?.content;
return rawText ? rawText.trim() : '';
} catch (error: any) {
console.error('Error calling ChatGPT:', error?.response?.data || error);
throw new Error(
`OpenAI API error: ${
error?.response?.data?.error?.message || error.message
}`
);
}
}
/**
* If needed, convert the screenshot to a truncated base64 string.
* Helps avoid huge prompts that might exceed token limits.
*/
function maybeBase64EncodeScreenshot(screenshot?: Buffer): string | undefined {
if (!screenshot) {
return undefined;
}
const base64 = screenshot.toString('base64');
// Truncate the base64 string if it's too large.
// E.g. limit to 100k characters (arbitrary).
const maxLength = 100_000;
if (base64.length > maxLength) {
return base64.slice(0, maxLength) + '...[TRUNCATED BASE64]';
}
return base64;
}
/**
* Summarizes OCR text as a short, structured list of UI lines.
* Uses the UI_SUMMARY_META_PROMPT plus optional screenshot data.
*/
export async function getUiSummary(params: {
text: string;
screenshot?: Buffer;
}): Promise<string> {
const { text, screenshot } = params;
const base64Screenshot = maybeBase64EncodeScreenshot(screenshot);
// We'll treat UI_SUMMARY_META_PROMPT as the "system" role for guidance,
// and the user content includes both the OCR text and optional base64 data.
const systemPrompt = UI_SUMMARY_META_PROMPT;
const userPrompt = `
Screenshot (base64, optional):
${base64Screenshot ? base64Screenshot : '[No screenshot provided]'}
=== RAW OCR TEXT ===
${text}
`;
return await callChatGPT(systemPrompt, userPrompt);
}
/**
* Extracts or summarizes the GUI structure from a screenshot only.
* Uses the GUI_SUMMARY_META_PROMPT plus the screenshot in base64 form.
*/
export async function getGuiSummary(params: {
screenshot: Buffer;
}): Promise<string> {
const { screenshot } = params;
const base64Screenshot = maybeBase64EncodeScreenshot(screenshot);
const systemPrompt = GUI_SUMMARY_META_PROMPT;
const userPrompt = `
Screenshot (base64):
${base64Screenshot}
[No OCR text provided for GUI extraction—just the screenshot structure.]
`;
return await callChatGPT(systemPrompt, userPrompt);
}
/**
* Generates a single-page CraftJS layout JSON using the final meta prompt.
* Combines user instructions + the extracted UI & GUI summaries (if any).
* - userText: the user's own instructions
* - uiSummary: result from getUiSummary (possibly empty)
* - guiSummary: result from getGuiSummary (possibly empty)
*/
export async function getFinalCraftJsLayout(params: {
userText: string;
uiSummary: string;
guiSummary: string;
}): Promise<string> {
const { userText, uiSummary, guiSummary } = params;
// We'll treat FINAL_CRAFTJS_META_PROMPT as the "system" role again.
// Then pass in the placeholders via the user prompt.
const systemPrompt = FINAL_CRAFTJS_META_PROMPT;
// Insert the relevant data into the "user" content:
const userPrompt = `
USER’S TEXTUAL INSTRUCTIONS:
"${userText}"
GUI SUMMARY (IF ANY):
${guiSummary}
OCR TEXT SUMMARY (IF ANY):
${uiSummary}
`;
return await callChatGPT(systemPrompt, userPrompt);
}
This TypeScript file provides a set of utility functions that enable interaction with OpenAI for multiple tasks. It imports three dedicated prompts—UI_SUMMARY_META_PROMPT, GUI_SUMMARY_META_PROMPT, and FINAL_CRAFTJS_META_PROMPT—to guide the AI in producing either a text-based UI summary, a structural GUI summary, or a final CraftJS layout. The core function callChatGPT handles Axios-based requests, ensuring we properly supply system and user role messages. Additionally, maybeBase64EncodeScreenshot helps prepare screenshot data in manageable chunks to avoid exceeding prompt size limits. With getUiSummary, we format and send the OCR text plus any screenshot snippet to OpenAI, while getGuiSummary focuses purely on structural representation. Finally, getFinalCraftJsLayout is where all prior context (user instructions, UI text, and GUI structure) is merged and passed to OpenAI to receive a valid JSON layout. This interplay of prompts ensures each step is distinct and optimized for its respective summarization or generation goal.
2) BlueprintAiService.ts
/*
* BlueprintAiService.ts
* Demonstrates orchestrating:
* 1) Summaries from screenshot (OCR + UI + GUI).
* 2) CraftJS layout generation.
*/
import { getSummariesFromScreenshot } from './getSummariesFromScreenshot';
import { getFinalCraftJsLayout } from './blueprintAiClient';
/**
* Main function that drives the AI generation of a final CraftJS layout.
* @param params.userText - The user's textual instructions or prompt.
* @param params.rawScreenshot - An optional Buffer of the screenshot.
* @returns A Promise containing the final JSON layout for CraftJS.
*/
export async function getBlueprintLayout(params: {
userText: string;
rawScreenshot?: Buffer;
}): Promise<string> {
// 1) Gather UI + GUI summaries from the screenshot (if provided).
// - getSummariesFromScreenshot internally calls Python OCR for recognized text
// and uses the AI prompts (UI_SUMMARY_META_PROMPT, GUI_SUMMARY_META_PROMPT).
const { uiSummary, guiSummary } = await getSummariesFromScreenshot({
rawScreenshot: params.rawScreenshot,
});
// 2) Pass the user instructions plus the two summaries into the final CraftJS meta prompt,
// returning a single-page JSON layout that references any brand or structural clues.
const craftJsJson = await getFinalCraftJsLayout({
userText: params.userText,
uiSummary,
guiSummary,
});
// 3) Return the JSON string, typically used by the front-end to render or store the new layout.
return craftJsJson;
}
This service file coordinates all backend actions for generating the final layout. It first calls getSummariesFromScreenshot, which handles OCR and AI summarizations (UI text analysis and GUI structure), then sends those results to getFinalCraftJsLayout. The combined data—user instructions, UI summary, and GUI summary—allows Blueprint AI to produce a valid, single-page CraftJS JSON layout that reflects both the content and design cues extracted from the screenshot.
3) getSummariesFromScreenshot.ts
/*
 * getSummariesFromScreenshot.ts
 * This file orchestrates how we extract and interpret screenshot data
 * for both textual (UI) and structural (GUI) summaries.
 */
import { runPythonOcr } from './pythonBridge';
import { getUiSummary, getGuiSummary } from './blueprintAiClient';

interface OcrResult {
  text: string;
  confidence: number;
  bbox: [number, number, number, number];
}

interface SummariesRequest {
  // Optional screenshot buffer provided by the user.
  rawScreenshot?: Buffer;
}

interface SummariesResponse {
  // Summarized text from recognized lines (e.g., nav links, category lists).
  uiSummary: string;
  // Summarized layout structure (e.g., columns, header, color scheme).
  guiSummary: string;
}

/**
 * Main function for extracting and summarizing content from a screenshot.
 * 1) Optionally runs Python-based OCR (if rawScreenshot is provided).
 * 2) Calls getUiSummary to transform recognized text into a short bullet list.
 * 3) Calls getGuiSummary to describe the overall GUI layout (header, columns, etc.).
 * 4) Returns both summaries to the caller for further usage (e.g., final layout generation).
 */
export async function getSummariesFromScreenshot(
  request: SummariesRequest
): Promise<SummariesResponse> {
  let recognizedText = '';
  if (request.rawScreenshot) {
    // 1) OCR step: spawn Python, read the image, and join the recognized lines.
    const ocrResults: OcrResult[] = await runPythonOcr(request.rawScreenshot);
    recognizedText = ocrResults.map((r) => r.text).join('\n');
  }
// 2) The UI summary is a short bullet-style text capturing the major interface items.
const uiSummary = await getUiSummary({
text: recognizedText,
screenshot: request.rawScreenshot,
});
// 3) The GUI summary focuses on layout and visual structure. We only run it if we have an actual screenshot.
let guiSummary = '';
if (request.rawScreenshot) {
guiSummary = await getGuiSummary({ screenshot: request.rawScreenshot });
}
// 4) Return both summaries, so higher-level code can decide how to merge them into a final layout.
return { uiSummary, guiSummary };
}
This module begins by determining whether a screenshot is present. If yes, it invokes runPythonOcr to obtain recognized text lines. That text is then passed to getUiSummary for a concise bullet list of UI elements (like navigation items). Likewise, if the screenshot is available, we generate a high-level layout description via getGuiSummary. The caller ultimately receives two summaries—one for UI text, another for structural layout—allowing the system to combine both in the final layout generation step.
4) pythonBridge.ts
import * as vscode from 'vscode';
import * as fs from 'fs';
import * as path from 'path';
import { spawn } from 'child_process';
import { randomBytes } from 'crypto';
import { getExtensionContext } from '../utils/extensionContext'; // <--- IMPORTANT
/**
* Runs the Python OCR script using the screenshot buffer as input.
*
* Expects:
* /python-ocr/venv/Scripts/python.exe
* /python-ocr/ocr_service.py
*
* Returns an array of OCR result objects parsed from the Python script's stdout,
* where each object typically includes:
* - text: The recognized string from the image
* - confidence: A floating-point confidence score
* - bbox: An array [minX, minY, maxX, maxY] bounding the recognized text
*/
export async function runPythonOcr(
  screenshotBuffer: Buffer
): Promise<Array<{ text: string; confidence: number; bbox: number[] }>> {
// 1) Retrieve our extension context from the shared manager.
// This helps us access the extension's installation root and storage paths.
const extensionContext = getExtensionContext();
// 2) The extension's root directory where python-ocr folder resides.
const extensionRoot = extensionContext.extensionUri.fsPath;
// 3) Build the full paths to the Python executable & the OCR script:
// python.exe is assumed under the venv, while ocr_service.py performs EasyOCR.
const pythonPath = path.join(
extensionRoot,
'python-ocr',
'venv',
'Scripts',
'python.exe'
);
const scriptPath = path.join(extensionRoot, 'python-ocr', 'ocr_service.py');
// 3A) Verify both python.exe and the script exist to avoid runtime issues.
if (!fs.existsSync(pythonPath)) {
throw new Error(`Cannot find Python interpreter at: ${pythonPath}`);
}
if (!fs.existsSync(scriptPath)) {
throw new Error(`Cannot find OCR script at: ${scriptPath}`);
}
// 4) Generate a unique temporary file name for the screenshot (PNG).
const tempName = `temp_screenshot_${randomBytes(4).toString('hex')}.png`;
// By using globalStoragePath, we ensure a consistent place to store files,
// even if a user has no local workspace open.
const tempFilePath = path.join(extensionContext.globalStoragePath, tempName);
// 4A) Create the globalStoragePath folder if it doesn't exist, to ensure we can write a file.
if (!fs.existsSync(extensionContext.globalStoragePath)) {
fs.mkdirSync(extensionContext.globalStoragePath, { recursive: true });
}
// 4B) Write the screenshot data to the temp file so the Python script has a real image to read.
try {
fs.writeFileSync(tempFilePath, screenshotBuffer);
} catch (err) {
throw new Error(`Failed to write temp file at ${tempFilePath}: ${err}`);
}
// 5) Spawn the Python process, passing the script and the temp file path as arguments.
return new Promise((resolve, reject) => {
const pyProcess = spawn(pythonPath, [scriptPath, tempFilePath], {
cwd: extensionRoot, // Ensures correct working directory for the script
});
let stdoutData = '';
let stderrData = '';
pyProcess.stdout.on('data', (chunk) => {
stdoutData += chunk.toString();
});
pyProcess.stderr.on('data', (chunk) => {
stderrData += chunk.toString();
});
// Handle the script's exit event:
pyProcess.on('close', (code) => {
// Attempt to remove the temp file, whether success or fail.
try {
fs.unlinkSync(tempFilePath);
} catch (cleanupErr) {
console.warn(`Warning: Failed to remove temp file: ${tempFilePath}`, cleanupErr);
}
if (code === 0) {
// If exit code 0, parse JSON from stdout.
try {
const results = JSON.parse(stdoutData);
resolve(results);
} catch (err) {
reject(
new Error(
`Failed to parse JSON output from Python OCR script.\n` +
`Error: ${err}\n\nRaw stdout:\n${stdoutData}`
)
);
}
} else {
// Non-zero exit code => some error occurred during OCR or script execution.
const errorMessage =
`Python OCR script exited with code ${code}.\n` +
`stderr:\n${stderrData.trim()}\n` +
`stdout:\n${stdoutData.trim()}\n` +
`Check that your python-ocr setup is correct.`;
// Optionally display a VSCode UI message for clarity.
vscode.window.showErrorMessage(errorMessage);
reject(new Error(errorMessage));
}
});
// If the Python process fails to spawn at all:
pyProcess.on('error', (err) => {
try {
fs.unlinkSync(tempFilePath);
} catch (cleanupErr) {
console.warn(`Warning: Failed to remove temp file: ${tempFilePath}`, cleanupErr);
}
reject(new Error(`Failed to spawn Python OCR process: ${err}`));
});
});
}
This file provides the link between Blueprint AI and its Python-based OCR workflow. After writing the screenshot buffer to a temporary file, the runPythonOcr function spawns a Python process to run ocr_service.py. That script uses EasyOCR to detect text in the image, returning a JSON array of recognized lines (including confidence scores and bounding boxes). Upon successful completion, the resulting text blocks are parsed and sent back to the TypeScript layer for further summarization by the AI prompts. Any error or non-zero Python exit code is handled gracefully, ensuring the system can surface useful debug info if something goes wrong.
5) ocr_service.py
import sys
import json
import cv2
import numpy as np
import easyocr


def upscale_if_needed(img_bgr, min_width=1200):
    """
    If the image width is below min_width, scale it up by a factor that
    ensures at least min_width. Helps EasyOCR see small fonts better.
    """
    h, w = img_bgr.shape[:2]
    if w < min_width:
        scale_factor = min_width / w
        new_w = int(w * scale_factor)
        new_h = int(h * scale_factor)
        img_bgr = cv2.resize(img_bgr, (new_w, new_h), interpolation=cv2.INTER_CUBIC)
    return img_bgr


def minimal_preprocess(image_path: str):
    """
    Minimal approach:
      1) Load color image with OpenCV
      2) If width < 1200, upscale
      3) Convert to grayscale
    (No further morphological or thresholding to avoid corrupting simpler images.)
    """
    img_bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)
    if img_bgr is None:
        raise ValueError(f"Could not load image from: {image_path}")
    # 1) Upscale if needed
    img_bgr = upscale_if_needed(img_bgr, min_width=1200)
    # 2) Convert to grayscale
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    return gray


def run_easyocr(numpy_image):
    """
    Use EasyOCR with GPU if available (fallback CPU).
    paragraph=True merges lines into blocks for complicated text.
    """
    # 'verbose=False' to skip progress bars that sometimes cause Unicode issues on Windows
    reader = easyocr.Reader(['en'], gpu=True, verbose=False)
    results = reader.readtext(numpy_image, detail=1, paragraph=True)
    return results


def main():
    # Attempt to set stdout to UTF-8 on Windows, in case the console is CP1252
    try:
        sys.stdout.reconfigure(encoding='utf-8')
    except Exception:
        pass

    if len(sys.argv) < 2:
        print("Usage: ocr_service.py <image_path>", file=sys.stderr)
        sys.exit(1)

    image_path = sys.argv[1]

    # 1) Minimal preprocessing
    processed = minimal_preprocess(image_path)

    # 2) Perform OCR
    ocr_results = run_easyocr(processed)

    # 3) Build structured results, handling variable output formats
    output_data = []
    for result in ocr_results:
        # result might be (coords, text) or (coords, text, conf)
        if not isinstance(result, (list, tuple)):
            continue
        if len(result) < 2:
            continue
        coords = result[0]
        text = result[1]
        confidence = result[2] if len(result) >= 3 else 1.0
        # coords => bounding box corners
        xs = [pt[0] for pt in coords]
        ys = [pt[1] for pt in coords]
        min_x, max_x = int(min(xs)), int(max(xs))
        min_y, max_y = int(min(ys)), int(max(ys))
        output_data.append({
            "text": text,
            "confidence": float(confidence),
            "bbox": [min_x, min_y, max_x, max_y],
        })

    # 4) Print JSON to stdout
    print(json.dumps(output_data, ensure_ascii=False))


if __name__ == "__main__":
    main()
This Python script performs OCR using EasyOCR in a stepwise fashion. It first upscales images below a certain width (improving recognition on small fonts), then converts them to grayscale to simplify processing. The run_easyocr function is configured with paragraph=True to merge lines into blocks for more coherent results. Finally, each recognized line is collected along with its bounding box and confidence score, and output as structured JSON to stdout. This JSON is then consumed by the TypeScript layer via pythonBridge.ts.
The Frontend
ExportEditorView Component
The ExportEditorView component serves as a post-processing workspace where users can review, refine, and finalize the HTML/CSS code generated by the ExportMenu. Upon receiving the initial HTML and CSS, it uses Monaco Editor to provide a rich, code-centric editing environment. This approach facilitates direct manipulation of the exported layout, giving developers the flexibility to tweak or enhance their pages before saving.
Beyond simply displaying the raw code, the ExportEditorView runs a specialized routine that gathers computed styles from every element within #droppable-canvas-border. These styles, which include the precise browser-calculated CSS properties, are then beautified and appended to the existing stylesheet. This ensures that any responsive or dynamic changes made during the design process are accurately captured in the final export.
The component also features a download as ZIP function, bundling the updated HTML and CSS files into a compressed archive. This allows users to conveniently store and share their designs. By incorporating JSZip and FileSaver, the ExportEditorView automates the packaging process, minimizing manual file handling.
Ultimately, the ExportEditorView component bridges the gap between raw layout output and a polished, ready-to-use design asset. It elevates the user experience by integrating real-time code editing and practical file export capabilities, thereby enhancing the efficiency and completeness of the Craft.js editor’s export workflow.
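The style-capture idea can be sketched as a small pure helper that turns per-selector property maps into CSS text. In the real component, those maps would come from getComputedStyle() on each element under #droppable-canvas-border; buildStylesheet is a hypothetical name here, not the component's actual internal function:

```typescript
// Serializes per-selector property maps into CSS text. Plain objects stand in
// for the browser-calculated styles that ExportEditorView would gather.
function buildStylesheet(
  rules: Record<string, Record<string, string>>
): string {
  return Object.entries(rules)
    .map(([selector, props]) => {
      const body = Object.entries(props)
        .map(([prop, value]) => `  ${prop}: ${value};`)
        .join('\n');
      return `${selector} {\n${body}\n}`;
    })
    .join('\n\n');
}

// Two captured elements with example computed values:
const css = buildStylesheet({
  '#droppable-canvas-border .btn': { color: 'rgb(255, 255, 255)', padding: '8px 16px' },
  '#droppable-canvas-border .hero': { display: 'flex', gap: '12px' },
});
```

The resulting string could then be beautified and appended to the exported stylesheet before the ZIP is assembled.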
ContainerComponent
The Container component is a flexible, multi-purpose layout element specifically designed for use with Craft.js. It supports four distinct layout types (container, row, section, and grid) to accommodate various design scenarios—ranging from basic flex boxes to more complex grid structures. By merging default properties and user-defined settings, the Container makes it straightforward to configure margins, padding, borders, shadows, and background colors, ensuring visually appealing and well-organized interfaces.
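The default-merging step can be sketched as below; the prop names and default values are illustrative assumptions, not the Container's actual prop list:

```typescript
// Illustrative prop shape; the real Container exposes more options
// (margins, borders, shadows, background colors, etc.).
interface ContainerProps {
  layoutType: 'container' | 'row' | 'section' | 'grid';
  padding: number;
  background: string;
  gap?: number;
}

const defaultProps: ContainerProps = {
  layoutType: 'container',
  padding: 16,
  background: '#ffffff',
};

// User-defined settings override the defaults key by key.
function resolveProps(user: Partial<ContainerProps>): ContainerProps {
  return { ...defaultProps, ...user };
}

const gridProps = resolveProps({ layoutType: 'grid', gap: 12 });
```

This shallow merge keeps every unset prop at its default, so a user only ever specifies the settings they actually change.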
This component leverages Craft.js hooks like useNode to integrate seamlessly with the editor environment, allowing developers to drag, drop, and resize elements within a live editing interface. For instance, if the Container is not the root element, it wraps its contents in a custom Resizer component, enabling users to manually adjust its dimensions. If it is the root, it enforces constraints and styles that differentiate it from child containers.
Each layout type is accompanied by relevant style properties—like gap, flex direction, grid columns, or row gap—giving precise control over element alignment and spacing. This approach keeps your layouts adaptable and modular, while the Container component’s user-friendly settings panel (exposed through ContainerProperties) simplifies customization.
Data storage
In Blueprint AI, all persistent data is stored locally in the user's browser via localStorage—specifically keyed under "blueprint-ai-data". This local storage mechanism is powered by the logic in our store.ts file and is designed to conditionally load and save project data for every session of the Blueprint AI extension. This allows for immediate retrieval of user preferences and project details within the same browser environment, without reliance on external databases or cloud services. Everything below describes precisely how Blueprint AI performs, updates, and retrieves these data fields from localStorage, always conditioned on the logic in store.ts:
1) BlueprintAI Store State Shape:
• Our local store is defined by the StoreState interface in store.ts, which includes exactly four fields: pages, selectedPageId, suggestedPages, and userPrompt.
• The pages array is an exhaustive list of all the user-created or AI-suggested pages, each represented by the Page interface (id, name, an optional thumbnail, and the layout tree in CraftJS JSON). By default, only one page exists (id: 1, named “Page 1”). Blueprint AI conditionally populates this array each time the user or the AI system adds or modifies a page.
• The selectedPageId indicates which page is currently being edited in the Blueprint AI interface. This conditional pointer ensures that the design canvas, properties sidebar, and other features always reference the appropriate page.
• The suggestedPages array holds additional recommended page names (e.g., “Account,” “Buy Again,” “Best Sellers,” “Returns & Orders”) that Blueprint AI proposes to the user. These suggestions are surfaced in the Pages Sidebar or within other modals to guide potential new pages the user may want to generate.
• The userPrompt string is a flexible area for saving any text prompt that the user entered in the AI-driven flows (such as designing a new layout, adjusting an existing design, or describing new features). Each time a user interacts with the iterative AI chat or the “Create With Imagination” page builder, Blueprint AI conditionally updates userPrompt so that it remains accessible across sessions.
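The store shape described above can be sketched as the following interfaces. Field names come from the text; the layout type is left loose because the CraftJS JSON tree is not spelled out here:

```typescript
// Sketch of the store shape described above.
interface Page {
  id: number;
  name: string;
  thumbnail?: string; // optional preview image
  layout: unknown;    // serialized CraftJS layout tree
}

interface StoreState {
  pages: Page[];
  selectedPageId: number;
  suggestedPages: string[];
  userPrompt: string;
}

// Default state: one sample page, four suggested pages, empty prompt.
const storeState: StoreState = {
  pages: [{ id: 1, name: 'Page 1', layout: {} }],
  selectedPageId: 1,
  suggestedPages: ['Account', 'Buy Again', 'Best Sellers', 'Returns & Orders'],
  userPrompt: '',
};
```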
2) Default Local State:
• The initial data structure is declared inside storeState in store.ts. This default includes one sample page and an empty userPrompt—ensuring a consistent starting point for first-time or reset sessions in Blueprint AI. The store is primed with four default suggested pages. This ensures that even before the user creates or loads anything, there's a clear reference in the UI to build from.
• Blueprint AI only populates local storage with these defaults if no prior saved data exists under "blueprint-ai-data". If there is existing data, the store merges the fields from local storage into memory conditionally.
3) Conditional Loading at Startup:
• On every launch of the Blueprint AI extension, the code attempts to retrieve the JSON string from localStorage.getItem(STORAGE_KEY). If savedData is non-null, it conditionally parses the string and merges each key into the current storeState. For example, if the parsed data has pages, it updates storeState.pages; if it has selectedPageId, it sets that too, etc.
• If the user had previously created multiple pages or typed in a multi-sentence prompt, all of that is immediately reloaded into the Blueprint AI interface on extension open. This ensures a frictionless user experience where previous session designs or AI prompts are restored exactly as they left them.
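A minimal sketch of this load-and-merge logic follows, with a StorageLike stand-in for the browser's localStorage so the merge is easy to exercise outside a browser (the real store.ts reads localStorage directly):

```typescript
const STORAGE_KEY = 'blueprint-ai-data';

// Stand-in for the browser's localStorage.
interface StorageLike {
  getItem(key: string): string | null;
}

interface StoreShape {
  pages: unknown[];
  selectedPageId: number;
  userPrompt: string;
}

// Merge any saved fields over the defaults; absent or corrupt data
// falls back to the defaults untouched.
function loadStore(storage: StorageLike, defaults: StoreShape): StoreShape {
  const saved = storage.getItem(STORAGE_KEY);
  if (saved === null) return defaults; // first run
  try {
    return { ...defaults, ...JSON.parse(saved) }; // saved keys win
  } catch {
    return defaults; // unparsable JSON
  }
}

// Simulated previous session that saved only a prompt:
const fakeStorage: StorageLike = {
  getItem: () => JSON.stringify({ userPrompt: 'Make a landing page' }),
};
const state = loadStore(fakeStorage, { pages: [], selectedPageId: 1, userPrompt: '' });
```

Because the merge is per key, any field missing from the saved blob keeps its default, which matches the conditional behavior described above.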
4) Accessing Stored Data (Getters):
• Blueprint AI uses dedicated getter functions from store.ts to conditionally read data from memory, such as getPages() for the full pages list, getSelectedPage() for the currently active page object, getSuggestedPages() for recommended page names, and getUserPrompt() for the last user prompt. Because the store synchronizes to local storage on demand, these getter calls reflect precisely what's persisted in the browser when saved.
• For example, when the user opens the Pages Sidebar in Blueprint AI, the application calls getPages() to render the entire list of local pages. Likewise, the AI Chat Flow reads getUserPrompt() to show the user’s most recent text input in the chat or iteration interface.
5) Handling State Changes (Subscriptions):
• Multiple arrays of listener functions exist within store.ts, each of which is notified conditionally when a relevant section of the store changes (e.g., pageListeners, selectedPageListeners, and promptListeners). This ensures that whenever the user or the AI modifies the layout or updates the user prompt, the corresponding parts of the Blueprint AI interface re-render automatically.
• By subscribing to pageListeners, any UI or logic that depends on the array of pages or suggested pages will be refreshed. Similarly, components reliant on which page is currently selected subscribe to selectedPageListeners, and features tied to user input text watch promptListeners. This subscription model helps maintain a dynamic, reactive environment for the entire Blueprint AI design experience.
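The subscription model can be sketched as follows; subscribeToPages and notifyPageListeners are hypothetical helper names, and the real store.ts may expose different functions:

```typescript
type Listener = () => void;

// One listener array per store slice, as described above.
const pageListeners: Listener[] = [];

// Register a listener and return an unsubscribe function for cleanup.
function subscribeToPages(fn: Listener): () => void {
  pageListeners.push(fn);
  return () => {
    const i = pageListeners.indexOf(fn);
    if (i !== -1) pageListeners.splice(i, 1);
  };
}

// Called by mutations such as setPages to trigger dependent re-renders.
function notifyPageListeners(): void {
  for (const fn of pageListeners) fn();
}

let renders = 0;
const unsubscribe = subscribeToPages(() => { renders += 1; });
notifyPageListeners(); // e.g. after setPages(...)
notifyPageListeners();
unsubscribe();
notifyPageListeners(); // no longer received
```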
6) Updating and Saving (Mutations):
• setPages(newPages) replaces the entire local pages array with a new list. For example, the AI might generate a fresh layout for the user’s “Buy Again” page, and in response, setPages stores the updated structure. In Blueprint AI, once the user finalizes or accepts an AI response, the relevant page is replaced or appended.
• updatePage(id, partialData) merges changes into a particular Page object, such as if the user updates the name from “Page 1” to “Home Page,” or modifies the layout JSON with an AI-generated snippet. This function is used heavily in any direct manipulation of a single page (dragging a component in the CraftJS canvas, etc.).
• setSelectedPageId(id) changes which page is currently active. For instance, if the user navigates from “Page 1” to “Best Sellers,” setSelectedPageId updates the local store and triggers selectedPageListeners to recast the design canvas.
• setSuggestedPages(newPages) is called conditionally when the AI or user wants to refresh the recommended page list. Blueprint AI might push new suggestions after seeing what the user typed into the AI Chat. This ensures the Pages Sidebar always shows relevant next-page ideas.
• setUserPrompt(newPrompt) is invoked whenever the user edits the text prompt or when the AI modifies it for iterative flows. The store updates userPrompt accordingly, and the entire system can respond in real time.
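The updatePage merge described above can be sketched roughly like this (the Page fields are abbreviated for illustration):

```typescript
interface PageSketch {
  id: number;
  name: string;
  layout: unknown;
}

let pages: PageSketch[] = [{ id: 1, name: 'Page 1', layout: {} }];

// Shallow-merge partial changes into the page with a matching id,
// leaving every other page untouched.
function updatePage(id: number, partial: Partial<PageSketch>): void {
  pages = pages.map((p) => (p.id === id ? { ...p, ...partial } : p));
}

updatePage(1, { name: 'Home Page' });
```

Using Partial<PageSketch> lets callers change a single field, such as the name, without re-supplying the whole page object.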
7) Local Persistence Workflow:
• At any point after these setter or updater functions run, the saveStoreToLocalStorage() function can be called to write the current storeState object back into localStorage. Internally, it uses JSON.stringify on the entire store (pages, selectedPageId, suggestedPages, userPrompt) and places it under the key STORAGE_KEY, i.e. "blueprint-ai-data".
• Because saving happens conditionally upon user interactions or explicit calls, no large overhead or complex logic is needed. The user can also trigger a “Save Locally” button from within Blueprint AI’s main sidebar, which calls saveStoreToLocalStorage() in the background.
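A minimal sketch of the save step, using a Map-backed stand-in for localStorage (which is not available outside the browser):

```typescript
const KEY = 'blueprint-ai-data';

// Writable stand-in for localStorage, backed by a Map.
interface WritableStorage {
  setItem(key: string, value: string): void;
  getItem(key: string): string | null;
}

const mem = new Map<string, string>();
const storage: WritableStorage = {
  setItem: (k, v) => { mem.set(k, v); },
  getItem: (k) => mem.get(k) ?? null,
};

// Serialize the whole store under one key, as saveStoreToLocalStorage does.
function saveStore(state: object): void {
  storage.setItem(KEY, JSON.stringify(state));
}

saveStore({ pages: [], selectedPageId: 1, suggestedPages: [], userPrompt: 'hi' });
const roundTrip = JSON.parse(storage.getItem(KEY)!);
```

Serializing everything under one key keeps the persistence path simple: a single JSON.stringify on save and a single JSON.parse on load.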
8) Resetting the Store:
• When the user requests a full reset—perhaps by hitting “Refresh All Pages” or “Clear Storage”—Blueprint AI calls clearStoreFromLocalStorage(). This removes the entire key/value pair from localStorage and resets the in-memory storeState to the default structure (one page named “Page 1,” default suggestions, and empty user prompt).
• Subscriptions are notified once again so that any UI depending on the store quickly reverts to a blank state. This is crucial for scenarios where the user wishes to begin a fresh project or discard all AI-suggested designs.
9) Blueprint AI Context-Specific Usage:
• First Page Creator Flow: By default, a single “Page 1” is stored. As soon as the user types a text prompt (like “Create an eCommerce homepage with a big hero banner”) or uploads an image, the AI generates a new layout. The store’s pages array is updated, and saveStoreToLocalStorage() is invoked. If the user closes the extension and reopens it, the generated page is restored from localStorage.
• Main Interface & Canvas: If the user reorders a button or changes a text component inside the CraftJS canvas, updatePage() merges the new layout structure. The Properties Sidebar might also call updatePage() when editing margins, backgrounds, or other design attributes. Each modification can be saved locally so that the user’s design is retained.
• Pages Sidebar & Suggested Pages: The suggestedPages field in storeState is updated conditionally to reflect any new or removed suggestions. Once the user picks one of these suggestions (“Returns & Orders,” for example) and requests an AI layout, the store adds a new page object. No external DB is used; it is purely local to blueprint-ai-data.
• Export Menu: The selected pages to export and their layout data are all pulled from store.ts. Because everything is stored locally, the user’s entire editing session is readily available to transform into a downloadable zip. This is done without sending any user design data to external services once it is in the local store.
Therefore, Blueprint AI ensures that every aspect of local data management—from retrieving initial saved states on extension load, to conditionally updating pages during the design process, to finalizing or clearing data—is precisely handled through the store.ts file. This local storage approach offers immediate read/write access, zero external dependencies, and complete user control over saving and resetting, reflecting Blueprint AI’s mission to keep front-end development streamlined, private, and user-friendly.
Packages and APIs
Key Technologies

React
Primarily used for building the web UI for the VS Code extension, providing a component-based approach for rendering the interactive front-end.

CraftJS
Provides a drag-and-drop design layer, enabling dynamic page editing and layout manipulation inside the custom interface.

Fluent UI
Leverages Microsoft’s design language and components for consistent styling and responsive elements across the extension’s UI.

Node.js
Powers the backend side of the extension environment, facilitating scripts, package management, and interactions with VS Code APIs.

TypeScript
Ensures robust typing and improved developer experience throughout the codebase, reducing runtime errors and enhancing scalability.

Python / Axios
Python powers backend automation and OCR processing, while Axios is used for efficient HTTP requests and data integration in the frontend.
Below is a definitive list of the relevant packages, modules, and APIs used throughout the system. The project consists of both Node.js packages (for the VS Code extension and web UI) and Python packages (for the OCR functionality). It also defines specific internal APIs and functions that handle AI requests, OCR, and layout generation.
Node.js Packages
@babel/core@7.26.0
@craftjs/core@0.2.11
@craftjs/layers@0.2.6
@emotion/react@11.14.0
@emotion/styled@11.14.0
@eslint/js@9.17.0
@fluentui/react@8.122.5
@fullhuman/postcss-purgecss@7.0.2
@monaco-editor/react@4.7.0
@mui/icons-material@6.3.1
@mui/material@6.3.1
@mui/system@6.4.0
@types/classnames@2.3.0
@types/file-saver@2.0.7
@types/js-beautify@1.14.3
@types/node@22.10.6
@types/react-color@3.0.13
@types/react-dom@18.3.5
@types/react-grid-layout@1.3.5
@types/react@18.3.18
@types/styled-components@5.1.34
@types/uuid@10.0.0
@vitejs/plugin-react-swc@3.7.2
autoprefixer@10.4.20
babel-plugin-inline-react-svg@2.0.2
classnames@2.5.1
cross-env@7.0.3
cssnano@7.0.6
debounce@2.2.0
eslint-plugin-react-hooks@5.1.0
eslint-plugin-react-refresh@0.4.16
eslint@9.17.0
file-saver@2.0.5
globals@15.14.0
js-beautify@1.15.4
jszip@3.10.1
konva@9.3.18
lzutf8@0.6.3
monaco-editor@0.52.2
postcss-import@16.1.0
postcss-preset-env@10.1.3
postcss@8.5.0
re-resizable@6.10.3
react-color@2.19.3
react-colorful@5.6.1
react-contenteditable@3.3.7
react-dom@18.3.1
react-grid-layout@1.5.0
react-icons@5.5.0
react-konva@18.2.10
react-loading@2.0.3
react-player@2.16.0
react-rnd@10.4.14
react-router-dom@7.1.1
react-router@7.1.1
react-youtube@10.1.0
react@18.3.1
sharp@0.33.5
styled-components@6.1.14
tailwindcss@3.4.17
typescript-eslint@8.19.1
typescript@5.6.3
uuid@11.0.4
vite@5.4.11
Python Packages (OCR Script)
• numpy
• opencv-python-headless
• easyocr (which pulls in PyTorch as a dependency)
Internal AI and OCR APIs
• getBlueprintLayout(...)
Main entry point for AI-based layout generation. Accepts user text
and optional screenshot to produce a final single-page CraftJS
layout in JSON form.
• getSummariesFromScreenshot(...)
Handles screenshot data by running OCR (via runPythonOcr(...) and ocr_service.py) and then generating uiSummary and guiSummary through AI calls.
• runPythonOcr(...)
Invokes the Python script ocr_service.py to extract textual content from an image using EasyOCR.
• getUiSummary(...) and getGuiSummary(...)
Generate summaries of the textual content (UI elements) and the visual layout (GUI structure) derived from the screenshot OCR data, using OpenAI with specialized prompts (UI_SUMMARY_META_PROMPT and GUI_SUMMARY_META_PROMPT).
• getFinalCraftJsLayout(...)
Synthesizes the final CraftJS layout JSON from user input and the
screenshot summaries, leveraging
FINAL_CRAFTJS_META_PROMPT.
• callOpenAiChat(...)
A helper for all OpenAI requests. Chooses the model based on whether
a screenshot is involved. Returns the raw text response from OpenAI.
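A rough sketch of how the model choice and the summary chaining might fit together. The model identifiers, prompt strings, and the injected chat function are all placeholders, not confirmed details of BlueprintAiService.ts; the real code calls OpenAI directly.

```typescript
// Illustrative sketch: a callOpenAiChat-style helper that picks a model
// depending on whether a screenshot is involved, and a summary chain that
// runs the UI and GUI prompts over the OCR output.

type ChatFn = (model: string, prompt: string, input: string) => Promise<string>;

// Chooses a vision-capable model only when a screenshot is involved.
// Model ids here are assumptions, not values from the codebase.
function pickModel(hasScreenshot: boolean): string {
  return hasScreenshot ? 'gpt-4o' : 'gpt-4o-mini';
}

// Placeholder stand-ins for UI_SUMMARY_META_PROMPT / GUI_SUMMARY_META_PROMPT.
const UI_SUMMARY_META_PROMPT = 'Summarize the UI text elements.';
const GUI_SUMMARY_META_PROMPT = 'Summarize the visual GUI layout.';

// The chat function is injected so this sketch does not depend on the
// real OpenAI client; both summaries can run concurrently.
async function getSummariesFromScreenshot(
  ocrText: string,
  chat: ChatFn
): Promise<{ uiSummary: string; guiSummary: string }> {
  const model = pickModel(true); // a screenshot is always involved here
  const [uiSummary, guiSummary] = await Promise.all([
    chat(model, UI_SUMMARY_META_PROMPT, ocrText),
    chat(model, GUI_SUMMARY_META_PROMPT, ocrText),
  ]);
  return { uiSummary, guiSummary };
}
```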
Key Frontend Modules
• AiSidebar.tsx: Handles the user chat flow (prompt + optional image). Sends messages to blueprintAI.generateLayout and receives AI layout results.
• SuggestedPages.tsx and CreateSelectedPage.tsx: Provide suggested page names, allow prompting with text and an optional image, and post to blueprintAI.generateLayoutSuggested.
• BlueprintAiService.ts, pythonBridge.ts, and getSummariesFromScreenshot.ts: Coordinate calls to OpenAI and the OCR Python script, returning final layout data.
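The webview-to-extension flow these modules rely on can be sketched as a simple command router. The command names mirror the text (blueprintAI.generateLayout and blueprintAI.generateLayoutSuggested), but the message shape and the handler-registration API are assumptions about how the extension host dispatches webview messages.

```typescript
// Sketch of command-based message routing between the webview UI
// (AiSidebar.tsx, SuggestedPages.tsx) and the extension host. The
// WebviewMessage shape is hypothetical.

interface WebviewMessage {
  command: string;
  payload: { userText: string; rawScreenshot?: number[] };
}

type LayoutHandler = (payload: WebviewMessage['payload']) => string;

// The extension host registers one handler per command and dispatches
// each incoming webview message to the matching one.
const handlers = new Map<string, LayoutHandler>();

function registerHandler(command: string, fn: LayoutHandler): void {
  handlers.set(command, fn);
}

function routeMessage(msg: WebviewMessage): string {
  const handler = handlers.get(msg.command);
  if (!handler) throw new Error(`Unknown command: ${msg.command}`);
  return handler(msg.payload);
}
```

In the actual extension, routing like this would typically sit inside the panel's onDidReceiveMessage callback, with the handlers delegating to BlueprintAiService.ts.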