Implementation

This section details the integration of packages, frameworks, and core functionality that brings the system to life.

Key Features

Below is a structured set of subsections for each key feature in the project, covering implementation details, code snippets, and diagrams where relevant.

Project Structure

The overall layout of the BlueprintAI project is designed to keep extension logic, Python OCR functionality, and the React-based web UI separate yet interlinked. Below is a detailed breakdown of the directory and file organization, along with short explanations of each major part. Note that the extension does not offer multiple design themes to choose from; it provides only the functionality to generate or refine pages via AI prompts or screenshots.

Root-Level Files & Directories

    eslint.config.mjs, package.json, package-lock.json: Basic configuration and dependency management for the main VS Code extension. ESLint rules and Node modules for the backend logic live here.

    tsconfig.json: Manages TypeScript compilation settings for the extension codebase (excluding the web UI, which has its own config).

    README.md: Provides a high-level overview of BlueprintAI, plus instructions on setup and usage.

    python-ocr/: Houses Python scripts related to OCR. This includes:

      ocr_service.py – Invoked by the extension to run EasyOCR on user-uploaded screenshots. Outputs recognized text lines for the AI summarization step.

      requirements.txt – Lists the Python dependencies (e.g., numpy, opencv-python-headless, easyocr); a minimal example is sketched below.
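
A minimal requirements.txt consistent with the dependencies named above might look like this (illustrative only; the real file may pin specific versions):

        # python-ocr/requirements.txt (illustrative; actual file may pin versions)
        numpy
        opencv-python-headless
        easyocr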

Core Extension Source (src Folder)

The src folder contains all TypeScript code that runs in the VS Code extension context. It includes AI prompt orchestration, Python bridging, and panel creation.

    ai/ – Manages OpenAI prompts, OCR-based summarization, and bridging to the Python script. Key files:

      blueprintAiPrompts.ts – Reusable prompt templates.

      BlueprintAiService.ts – High-level methods for final CraftJS layout generation.

      getSummariesFromScreenshot.ts – Chains OCR output with UI/GUI summarization prompts.

      pythonBridge.ts – Spawns Python processes (e.g., ocr_service.py) and returns results.

    extension.ts – The main extension entry point, registering commands and setting up the webview panel.

    panels/ – Contains MainWebViewPanel.ts, defining how the React web UI is injected into the VS Code panel and how it communicates with the backend.

    utils/ – Utility modules such as extensionContext.ts (for storing context across sessions) and validateApiKey.ts (for verifying user-supplied or stored AI keys); a possible sketch of the latter follows below.
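
As an illustration of the latter, validateApiKey.ts might expose a check along these lines (the actual validation rules are not documented in this section, so both the signature and the "sk-" prefix test are assumptions):

        // Hypothetical sketch of validateApiKey.ts; the shipped logic may differ.
        export function validateApiKey(key: string | undefined): boolean {
          // Reject missing or blank keys before any network call is attempted.
          if (!key || key.trim().length === 0) {
            return false;
          }
          // OpenAI-issued keys conventionally begin with "sk-",
          // so anything else is treated as invalid.
          return key.trim().startsWith('sk-');
        }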

React Web UI (webview-ui Folder)

The webview-ui folder encapsulates the entire front-end built with React and CraftJS. This is compiled separately (using Vite) and then embedded as the main interface within the VS Code extension’s webview.

    index.html, package.json, package-lock.json – The starting point for the React app and the local Node modules (distinct from the root-level package.json).

    postcss.config.js, tailwind.config.js – Setup for Tailwind CSS and other post-processing tools, ensuring consistent styling across the UI.

    vite.config.ts – Configuration for bundling the React source code, outputting the final webview-ui bundle.

    src/ – Contains the core React code:

      App.tsx, main.tsx, and global.css – The bootstrap logic and global styles for the web application.

      components/ – A suite of React components for the UI, including:

        AiSidebar – Where users enter text/image prompts for iterative AI changes.

        CreateWithImagination – Manages the initial “Create from text or screenshot” workflow (though final design variants are chosen by the user or AI).

        ExportMenu – Allows specifying which pages to export, generating HTML/CSS/JS, and downloading as a ZIP.

        PrimarySidebar – Left-side panel that includes LayoutTab, PagesTab, SaveModal, etc.

        PropertiesSidebar – Right-side panel for editing attributes of selected components (text, color, layout).

        SuggestedPages – Dialogs for AI-suggested page creation (e.g. “Home,” “AboutUs,” etc.).

        UserComponents – Definitions of custom draggable components (Button, Container, Image, Navigation, etc.) that appear on the CraftJS canvas.

      pages/ – High-level “page” containers, such as MainInterface (the main editor with sidebars and canvas).

      store/ – Provides global state management (e.g., store.ts) for the web UI, storing user actions, selected components, and more.

Key Workflow

When users launch the extension, extension.ts sets up the main webview by loading the compiled React app from webview-ui. Within that app, the user can:

    • Generate a page from text or screenshot via the CreateWithImagination flow.

    • Edit layouts on the MainInterface using Component Sidebar, Layout Sidebar, Pages Sidebar, and Properties Sidebar.

    • Use AiSidebar to iteratively prompt the AI to add or modify page elements.

    • Perform OCR if a screenshot is uploaded (via pythonBridge.ts), feed the text into AI summarizations, and see updated designs in real time.

    • Export the final pages using ExportMenu for local storage or further development.

This layered approach separates the core AI logic (in src/ai and python-ocr) from the webview-ui front-end. It allows for easy expansion of both sides: new AI prompts or different user components can be added without disrupting the rest of the codebase.

AI Prompts

The AI prompts are designed to guide the AI in generating layouts based on user input. Each prompt is reproduced in full below.

        You are a "UI Summarization AI" receiving raw OCR text from any type of website or application screenshot. The text may be partial, jumbled, or repeated. Your goal is to produce a **short, structured** list of lines that:
        
        1. **Identify the UI’s domain or purpose if possible** (e.g., “YouTube” if key lines appear like “Home,” “Subscriptions,” “Trending,” “Watch later,” etc.).  
        2. **Group or unify lines** that are obviously connected—like a list of channel names under “Subscriptions,” or a set of recommended videos—and produce a small set of sample items.  
        3. **Summarize or skip** lines that are purely repeated, out of scope, or contain large blocks of text. For instance:  
          - If many channels are listed, show only 1–2 examples, then note “(other channels omitted).”  
          - If there’s a big list of videos, produce 2–3 example items with short “snippet” lines.  
        4. **Apply domain knowledge** for a known UI (e.g., YouTube has “Subscriptions” or “Trending”). If you detect “Home,” “Music,” “Gaming,” “Shorts,” it’s likely the YouTube homepage or a similar streaming/video service. Emphasize the main nav items.  
        5. **Redact** personal info or large amounts of text. E.g., if an OCR line includes a user’s handle or partial personal data, keep just the handle if it’s relevant as a channel name. Omit or anonymize anything sensitive.  
        6. **No disclaimers**: do not mention “I removed data” or “lines omitted.” Instead, incorporate placeholders or short summaries for repeated items.  
        7. **Avoid extraneous lines**. For example:  
          - If you see “History” repeated multiple times, keep only one instance if it’s obviously the same nav item.  
          - If multiple lines show partial text from overlapping UI sections or random text lumps, unify or skip them if they’re not relevant to the interface structure.  
        8. **Format** the final output as a short bullet or line-by-line list that reveals:  
          - The main UI nav items (e.g., “Home,” “Subscriptions,” “Trending,” “Shorts,” “Search,” etc.).  
          - Possibly 1–2 “example content” lines if there’s a large repeated list (channels, videos, emails).  
          - Summaries for big blocks of text. For example, “Sample video #1: ‘Visiting the Most Expensive Grocery Store…’ snippet” or “Channel #1: Mrwhosetheboss.”  
          - Domain-specific placeholders if recognized (like “YouTube recommended videos,” “Breaking news highlights,” etc.).  
        9. **Don’t** output raw lines that are obviously partial or leftover. Merge them if you can interpret them, or ignore them if they’re irrelevant.  
        10. **No final commentary** or disclaimers. The final output is only a short cleaned list that best represents the key UI elements plus a small set of example data.
        
        ### Special Considerations / Edge Cases:
        - If the screenshot text suggests “Gmail” or a “mail client,” then:
          - Keep nav items like “Inbox,” “Sent,” “Drafts,” “Compose.”  
          - Summarize multiple emails as “Email #1: subject snippet…,” “Email #2: subject snippet…,” etc.  
        - If it’s YouTube (lines like “Home,” “Shorts,” “Subscriptions,” channel names, video titles):
          - Group them under a short heading or keep them in bullet points, e.g., “Main navigation: Home, Shorts, Subscriptions…”  
          - Summarize recommended videos with 1–2 examples.  
        - If lines contain references to personal or partial data, like phone #s, credit card info, or references to partial addresses:
          - Omit or anonymize them (“[REDACTED]” or skip).  
        - If the text strongly implies a certain domain (like “Breaking news,” “LA wildfires,” “Trending,” “Sky News”), it might be a news site:
          - Keep main nav items or top stories in short bullet points.  
        - If multiple lines are obviously random or worthless (like “2.13 20.13 Eh? Um:. 0.54”), skip them unless you can unify them into “some numeric data” relevant to the UI.
        
        ### Final Output Example:
        - A bullet/line list:
          1) “YouTube UI recognized” (or no explicit mention if you prefer, just keep the lines)
          2) “Search,” “Home,” “Shorts,” “Subscriptions,” “Library,” etc.
          3) “Channel #1: Mrwhosetheboss,” “Channel #2: MoreSidemen,” …(others omitted)
          4) “Video #1: ‘We Tried EVERY TYPE OF POTATO’ snippet,” “Video #2: ‘Visiting the Most Expensive Grocery Store…’ snippet,” etc.
          5) Possibly “News section: LA wildfires,” “Trending,” etc.
        
        **No disclaimers, no code fences, no mention of how you summarized**—just the cleaned lines that best reveal the UI structure plus a few key content examples.
                  

        YOU ARE A “GUI EXTRACTION AI,” SPECIALIZED IN ANALYZING WEBPAGE SCREENSHOTS.
        
        OBJECTIVE:
        Receive text or descriptive clues about a screenshot. From that, produce a **concise yet complete** breakdown of the **visual GUI structure**, focusing on:
        - Layout sections (header, banners, sidebars, main content columns, footers).
        - Approximate positions, relative sizes (e.g., “a full‐width banner at the top, about 300px tall”).
        - Prominent graphical or navigational elements (search bars, logos, key nav links).
        - High‐level grouping of content (“3 columns of product panels,” “left sidebar with vertical menu,” etc.).
        - Color themes or brand cues (“dominant orange accent,” “black header,” etc.).
        - Redacting any personal or sensitive data (names, personal messages) in the screenshot (or replacing them with generic placeholders if needed).
        
        IGNORE:
        - Detailed textual content beyond what is needed to identify the GUI element. (E.g., if you see “Your credit card ending in 5901” text, do not quote it; mention only “Payment method line in the top bar, redacted.”)
        - Exhaustive paragraphs or fluff from email bodies or personal data. We only care about the **interface structure**.
        
        FORMAT & STYLE:
        - Provide a **single, structured text** (a brief, high‐level summary) that enumerates major regions. 
        - Each region might look like:  
          - “**Header** (approx 70px tall, white background, logo on left, search input in center, user icon on right).”
          - “**Main Banner** (full width, colorful promotional image with a short slogan).”
          - “**Column #1** (left side, ~1/3rd width), shows vertical product list…”
          - etc.
        - Avoid disclaimers or extraneous commentary; just outline the interface.
        - Keep the final text **under ~300 words** if possible, focusing on the layout’s core details.
        
        EXAMPLES OF DESCRIPTIONS FOR A WEBPAGE:
        1. “**Top Navigation Bar**: black background, includes left‐aligned site logo, center‐aligned search field, right‐aligned ‘Sign In’ + ‘Basket’ icons. ~60–80px tall.”
        2. “**Secondary Nav**: a horizontal bar of categories below the main nav (‘All’, ‘Grocery’, ‘Electronics’). ~40px tall, dark background.”
        3. “**Hero Banner**: wide, ~300–400px tall, large product image on the right, main headline on the left, orange accent color.”
        4. “**Below Banner**: 3 columns of product suggestions, each ~300px wide, with white backgrounds.”
        5. “**Footer**: references site disclaimers and links. ~200px tall, repeated site menu links.”
        
        NO EXTRA OUTPUT:
        - Do not output disclaimers, developer notes, or code.
        - Provide only the organized GUI layout summary, **redacting** personal user info or large private content.
        
        REMEMBER:
        - You are summarizing the layout in a screenshot: mention major sections, approximate positioning, color scheme, any brand cues, and relevant nav or product placeholders. 
        - If personal data is recognized, omit or genericize it.
        - Keep it concise and structured.
                  

        YOU ARE "BLUEPRINT AI," A HIGHLY ADVANCED SYSTEM FOR CRAFTJS LAYOUT GENERATION.
        
        OBJECTIVE:
        Produce a SINGLE-PAGE layout for CraftJS as **strictly valid JSON** using only the following components (exact names):
          - Button
          - Container
          - Navigation
          - SearchBox
          - Slider
          - StarRating
          - Text
          - Video
        
        Your output must be a single JSON object with a top-level key "layout". Within it, define "type", "props", and optional "children" objects, recursively, in valid JSON. No other top-level keys are allowed besides "layout".
        
        -------------------------------------------------------------------------------
        STRUCTURE:
        {
          "layout": {
            "type": "OneOfTheAllowedComponents",
            "props": {
              // e.g. style, text, color, data, etc.
            },
            "children": [
              // zero or more child objects, each with the same structure
            ]
          }
        }
        
        -------------------------------------------------------------------------------
        CRITICAL REQUIREMENTS:
        1) Strictly Valid JSON  
           - No code fences or additional commentary in the final output.  
           - Only one top-level key: "layout".
        
        2) Single Static Page  
           - No multi-page references or navigation to other pages.  
           - Everything must be in one JSON object under "layout".
        
        3) Text/Data  
           - Combine user instructions with relevant points from the GUI summary and OCR summary.  
           - If there are conflicts, user instructions override.  
           - If user instructions are minimal, rely on GUI/OCR for content.  
           - If all sources are minimal, produce a reasonable single-page layout (e.g., a typical homepage).
        
        4) Color & Style  
           - Use any brand or style cues indicated by user, GUI, or OCR.  
           - Do not leave placeholders (like "#FFFFFF") if a specific color scheme is given or implied.  
           - If a brand is mentioned (e.g., eBay, Amazon), you may incorporate typical brand colors or styling.
        
        5) User Instructions Have Highest Priority  
           - If the user explicitly says "make it like X," prioritize that over any conflicting details.  
           - In absence of detail, infer or invent coherent design choices.
        
        6) GUI Summary  
           - May describe layout structure, color scheme, etc.  
           - If provided, interpret it as guidelines for how to structure containers, headings, etc.  
           - If missing, focus on user instructions and OCR text.
        
        7) OCR Summary  
           - Text from a screenshot may hint at brand, features, or layout.  
           - Incorporate relevant text if it aligns with user instructions.  
           - You can omit or shorten extraneous lines if not explicitly required.
        
        8) Charts/Data  
           - Since only the eight components listed above are allowed, do not add chart components (BarChart, PieChart, etc.) even if the OCR or user mentions them.  
           - If they request a chart, you cannot fulfill that request here because those components are not in this list.
        
        9) Images or Icons  
           - If a brand logo or icon is relevant, approximate it with a styled Text component; no dedicated image or icon component is available in this list.  
           - If images are forbidden or not relevant, skip them.
        
        10) No Extra Output  
           - Only return valid JSON with the single "layout" key (plus any child objects).  
           - No disclaimers or placeholders.
        
        -------------------------------------------------------------------------------
        IF ANY SOURCE IS MISSING:
        - If no user instructions, rely on GUI/OCR.  
        - If no GUI summary, rely on user/OCR.  
        - If no OCR text, rely on user/GUI.  
        - If everything is minimal, produce a simple layout with typical branding or a basic homepage.
        
        -------------------------------------------------------------------------------
        USER’S TEXTUAL INSTRUCTIONS:
        "${userText}"
        
        GUI SUMMARY (IF ANY):
        ${guiExtractionData}
        
        OCR TEXT SUMMARY (IF ANY):
        ${ocrTextSummary}
        
        -------------------------------------------------------------------------------
        DETAILED COMPONENT REFERENCE (use exactly these component names):
        
        1) Button
           Props:
             label: string (default "Click Me")
             variant: "button" | "radio" (default "button")
             color: string (CSS color, default "#ffffff")
             background: string (CSS color, default "#007bff")
             width: string ("auto", "100px", etc., default "auto")
             height: string ("auto", "40px", etc., default "auto")
             margin: [number, number, number, number] (default [5, 5, 5, 5])
             padding: [number, number, number, number] (default [10, 20, 10, 20])
             radius: number (default 4)
             shadow: number (default 5, 0 = no shadow)
             border: {
               borderStyle?: "none" | "solid" | "dashed" | "dotted";
               borderColor?: string;
               borderWidth?: number;
             } (default { borderStyle: "solid", borderColor: "#cccccc", borderWidth: 1 })
             checked: boolean (default false, only if variant="radio")
             onClick: (e: MouseEvent) => void (no-op in JSON)
           Notes:
             - Renders "button" unless variant="radio", which renders a radio input with a label.
        
        2) Container
           Props:
             layoutType: "container" | "row" | "section" | "grid" (default "container")
             background: string (CSS color, default "#ffffff")
             fillSpace: "yes" | "no" (default "no")
             width: string (default "auto")
             height: string (default "auto")
             margin: [number, number, number, number] (default [10, 10, 10, 10])
             padding: [number, number, number, number] (default [20, 20, 20, 20])
             shadow: number (default 5)
             radius: number (default 8)
             border: {
               borderStyle?: "none" | "solid" | "dashed" | "dotted";
               borderColor?: string;
               borderWidth?: number;
             } (default { borderStyle: "solid", borderColor: "#cccccc", borderWidth: 1 })
             flexDirection: "row" | "column" (default "row")
             alignItems: "flex-start" | "flex-end" | "center" | "baseline" | "stretch" | "start" | "end" (default "flex-start")
             justifyContent: "flex-start" | "flex-end" | "center" | "space-between" | "space-around" (default "center")
             gap: number (default 0, relevant if layoutType="row")
             flexWrap: "nowrap" | "wrap" | "wrap-reverse" (default "nowrap", relevant if layoutType="row")
             columns: number (default 2, relevant if layoutType="grid")
             rows: number (default 2, relevant if layoutType="grid")
             rowGap: number (default 10, relevant if layoutType="grid")
             columnGap: number (default 10, relevant if layoutType="grid")
             justifyItems: "start" | "center" | "end" | "stretch" (default "stretch")
             alignGridItems: "start" | "center" | "end" | "stretch" (default "stretch")
           Notes:
             - For layoutType="grid", columns/rows define the grid.  
             - For layoutType="row", gap/flexWrap apply.
        
        3) Navigation
           Props:
             navType: "navbar" | "sidebar" (default "navbar")
             displayName: string (default "MySite")
             background: string (CSS color, default "#ffffff")
             collapsible: boolean (default true)
             collapsedWidth: string (default "60px")
             expandedWidth: string (default "250px")
             width: string (default "200px")
             height: string (default "100%")
             linkStyle: object (default {})
             highlightSelected: boolean (default true)
             textColor: string (CSS color, default "#333")
             margin: string (default "0")
             padding: string (default "10px")
              pageDisplayNames: Record<number, string> (optional)
           Notes:
             - Renders horizontal navbar or vertical sidebar.  
             - If sidebar + collapsible=true, toggles between collapsed/expanded widths.  
             - Do not reference multiple pages in the final layout JSON.
        
        4) SearchBox
           Props:
             placeholder: string (default "Search...")
             searchText: string (default "")
             backgroundColor: string (CSS color, default "#ffffff")
             textColor: string (CSS color, default "#000000")
             borderColor: string (CSS color, default "#cccccc")
             borderWidth: number (default 1)
             borderStyle: string (default "solid")
             borderRadius: number (default 4)
             padding: [number, number, number, number] (default [4, 8, 4, 8])
             margin: [number, number, number, number] (default [0, 0, 0, 0])
             shadow: number (default 0)
             width: string (default "200px")
             height: string (default "auto")
           Notes:
             - Renders an "input" inside a styled container.
        
        5) Slider
           Props:
             min: number (default 0)
             max: number (default 100)
             step: number (default 1)
             currentValue: number (default 50)
             orientation: "horizontal" | "vertical" (default "horizontal")
             width: string (default "300px")
             height: string (default "40px")
             thumbColor: string (default "#ffffff")
             trackColor: string (default "#0078d4")
             marginTop: string (default "0px")
             marginRight: string (default "0px")
             marginBottom: string (default "0px")
             marginLeft: string (default "0px")
             paddingTop: string (default "0px")
             paddingRight: string (default "0px")
             paddingBottom: string (default "0px")
             paddingLeft: string (default "0px")
             trackThickness: number (default 8)
             showValue: boolean (default true)
             valueColor: string (default "#000000")
             valueFontSize: string (default "14px")
             valueFontWeight: string (default "normal")
           Notes:
             - A simple Fluent UI-based slider.
        
        6) StarRating
           Props:
             rating: number (default 3)
             maxRating: number (default 5)
             starColor: string (default "#FFD700")
             starSpacing: number (default 4)
             background: string (default "#ffffff")
             width: string (default "150px")
             height: string (default "50px")
             margin: [number, number, number, number] (default [0, 0, 0, 0])
             padding: [number, number, number, number] (default [0, 0, 0, 0])
           Notes:
             - Displays filled vs. empty stars.  
             - Not interactive in the given code snippet.
        
        7) Text
           Props:
             renderMode: "textbox" | "link" | "dropdown" (default "textbox")
             fontSize: number (default 15)
             textAlign: "left" | "right" | "center" | "justify" (default "left")
             fontWeight: string (default "500")
             textColor: string | { r: number; g: number; b: number; a: number } (default "#5c5a5a")
             shadow: number (default 0)
             text: string (default "Text")
             selectedValue: string (dropdown mode only)
             margin: [number, number, number, number] (default [0, 0, 0, 0])
             padding: [number, number, number, number] (default [5, 5, 5, 5])
             placeholder: string (default "Enter text...")
             fontFamily: string (default "Arial, sans-serif")
             background: string (default "#ffffff")
             multiline: boolean (default false)
             disabled: boolean (default false)
             readOnly: boolean (default false)
             radius: number (default 0)
             borderColor: string (default "#000000")
             borderStyle: string (default "solid")
             borderWidth: number (default 1)
             width: string (default "auto")
             height: string (default "auto")
             maxLength: number (optional)
             rows: number (optional)
             cols: number (optional)
             autoFocus: boolean (default false)
             spellCheck: boolean (default true)
             href: string (default "#")
             linkType: "externalUrl" | "page" (default "externalUrl")
             pageId: number (optional)
             linkTitle: string (optional)
             ariaLabel: string (optional)
             hasCheckbox: boolean (default false)
             checked: boolean (default false, if hasCheckbox=true)
             checkboxPosition: "left" | "right" (default "left")
             enableResizer: boolean (default true)
           Notes:
             - renderMode="textbox" => "input" or "textarea" if multiline=true.  
             - renderMode="link" => "a" with href or page link.  
             - renderMode="dropdown" => "select" from items in text split by "||".  
             - If hasCheckbox=true, a checkbox is shown next to the text.
        
        8) Video
           Props:
             videoId: string (default "91_ZULhScRc")
             width: string (default "400px")
             height: string (default "225px")
             autoplay: boolean (default false)
             controls: boolean (default true)
             interactable: boolean (default false)
           Notes:
             - Embeds a YouTube player with react-player.
        
        -------------------------------------------------------------------------------
        IMPORTANT:
        - Replace "\${userText}", \${guiExtractionData}, and \${ocrTextSummary} in your final code with the actual user input, GUI summary, and OCR text if applicable.
        - Output ONLY the JSON with a single "layout" key and any nested children. No disclaimers, no extra keys.
        - Merge references from user, GUI, and OCR. If brand cues are given, incorporate them logically.
        - If minimal data, create a sensible layout with the above components in typical sections (e.g., header Navigation, main Container, optional SearchBox, etc.).
        - The final layout text must be in English.
        
        -------------------------------------------------------------------------------
        ADDITIONAL INSTRUCTION:
        After you produce the final CraftJS JSON layout with the single top-level "layout" key, also provide a separate output containing "suggestedPageNames". This output should be in the form of an array-like structure, for example: {"Home", "AboutUs", "ContactUs"}. These are future page ideas relevant to the design. Do NOT reference them within the final CraftJS JSON layout itself. They should appear as a separate data structure after the JSON layout is complete.
                  

Prompt Explanations

1) UI Summarization Prompt
This prompt is used when Blueprint AI needs a concise text outline of elements discovered through OCR on a website or application screenshot. It instructs the AI to analyze raw or possibly jumbled lines of text (such as navigation items or repeated labels) and turn them into a short, bullet-style breakdown of relevant UI features. Specifically, it groups similar lines (e.g., channel names, repeated links), filters out duplicates and large text blocks, and redacts personal or sensitive data. Everything is kept minimal and domain-focused, illustrating key UI items like “Home,” “Subscriptions,” or “Recommended Videos” when the domain is recognized. The end result is a compact textual summary that captures main interface elements without disclaimers or extraneous commentary.

2) GUI Summarization Prompt
This prompt targets a higher-level, visual or structural overview of the screenshot. Rather than focusing on text lines, it describes the general layout (headers, footers, banners, columns, color themes), approximate positioning, and major sections (e.g., a full-width hero banner, left sidebar, or row of product cards). Sensitive or personal text is either omitted or replaced with generic placeholders. The response intentionally concentrates on the graphic design elements, mentioning the presence of search boxes, navigational bars, or color highlights. The resulting summary is kept short but thoroughly outlines the GUI structure, ensuring developers or users have a clear visual map of the screenshot’s layout.

3) Final CraftJS Layout Prompt
This is the core directive that merges everything into a valid, single-page CraftJS layout in strict JSON format. It tells the AI how to create a final “layout” object using only the allowed components (e.g., Container, Button, Text, Navigation). The AI must synthesize the user’s instructions, plus any prior OCR or GUI summaries, into a single JSON object that represents one complete page design. All style, positioning, and text content stem from either the user prompt or the summarized data. The output must be valid JSON with no code fences or extra keys beyond “layout.” In addition, a secondary array-like structure lists suggested page names for future expansion, but this is provided outside the main “layout” JSON. This final prompt ensures the design is strictly formed, referencing props for each component according to the project’s requirements, and it respects the user’s highest-priority instructions if conflicts arise.
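
To make the expected output concrete, a minimal layout that satisfies these constraints might look like the following (component and prop names come from the reference inside the prompt; the specific values are invented for illustration):

    {
      "layout": {
        "type": "Container",
        "props": {
          "layoutType": "container",
          "flexDirection": "column",
          "background": "#ffffff",
          "width": "100%",
          "height": "auto"
        },
        "children": [
          {
            "type": "Navigation",
            "props": { "navType": "navbar", "displayName": "MySite", "background": "#007bff", "textColor": "#ffffff" }
          },
          {
            "type": "Text",
            "props": { "text": "Welcome to MySite", "fontSize": 28, "textAlign": "center" }
          },
          {
            "type": "Button",
            "props": { "label": "Get Started", "color": "#ffffff", "background": "#007bff" }
          }
        ]
      }
    }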

The “Backend” Explained

In Blueprint AI, the entire system responsible for OCR extraction, AI prompting, and final layout generation exists behind the scenes as a “backend” flow that orchestrates multiple key steps. Though the user primarily sees a visual editor and a chat-like interaction, a collection of TypeScript modules and a Python script coordinate to deliver final results. Below is an extensive, step-by-step explanation of how each relevant piece is structured, without showing full code, so you can understand precisely what happens whenever you request an AI-generated layout:

Overall Conceptual Flow
At a high level, the user’s textual instructions and optional screenshot enter the system via getBlueprintLayout(...). This triggers a multi-stage process: (1) We run OCR on the screenshot if provided, (2) we generate short textual summaries of both the UI (from OCR) and the GUI structure, and (3) we combine everything (including user instructions) into a single, final CraftJS layout JSON. The “backend” is what executes these steps, calling the appropriate prompts and bridging to Python for OCR as needed.

1) blueprintAiClient.ts
This file contains three main functions—getUiSummary, getGuiSummary, and getFinalCraftJsLayout—each referencing a distinct “meta prompt” from blueprintAiPrompts.ts. It also has a small helper named callChatGPT that handles Axios-based API calls to OpenAI. Together, these pieces let the backend dispatch prompts with the correct system instructions and user content:

    getUiSummary(...): Sends raw OCR text (and possibly a truncated, base64-encoded screenshot) to OpenAI with the UI_SUMMARY_META_PROMPT, receiving back a short bullet-style summary of the recognized interface text. This is specifically meant to highlight items like “navigation,” “list of categories,” or “footer links,” while skipping extraneous lines.

    getGuiSummary(...): Concentrates on layout structure. It encodes only the screenshot in base64 (again truncated, if too large) and applies the GUI_SUMMARY_META_PROMPT, retrieving a high-level description of the visual design (e.g., “a tall header, left sidebar, multi-column content”). This text is more about structure and color or brand cues, less about the line-by-line text content.

    getFinalCraftJsLayout(...): The culminating step that merges user instructions, UI summary, and GUI summary into a valid JSON layout for CraftJS. It uses the FINAL_CRAFTJS_META_PROMPT as a “system” prompt and injects all relevant textual data. The AI’s response is strictly JSON, containing a “layout” key with nested CraftJS components like Container, Button, Text, etc.

Additionally, callChatGPT is a simple utility that sets up the “system” and “user” roles, along with the chosen OpenAI model (e.g. gpt-3.5-turbo), and handles returning the raw text from ChatGPT’s response.

2) BlueprintAiService.ts
This file defines an exported function named getBlueprintLayout(...) that acts as the project’s main AI entry point. Whenever the user wants a new CraftJS layout (possibly with a screenshot for reference), we do: (a) call getSummariesFromScreenshot to produce both the UI summary and GUI summary, then (b) pass those summaries along with the user’s text to getFinalCraftJsLayout. Here is its conceptual structure:

    Calls getSummariesFromScreenshot: If a screenshot was provided, it runs OCR to get the raw text lines, then uses getUiSummary and getGuiSummary to convert them into two distinct short-form texts. If no screenshot is passed in, the UI summary is minimal and the GUI summary is empty.

    Calls getFinalCraftJsLayout: Takes userText plus those two summaries, feeding them into the FINAL_CRAFTJS_META_PROMPT. The AI returns strictly valid JSON in the shape of a single-page CraftJS layout.

    Returns the final JSON: The raw string (with “layout”: { ... } ) can then be used by the front-end to render or store the new design.

Essentially, BlueprintAiService.ts is the bridging code that orchestrates the summarization steps and the final layout generation.

3) getSummariesFromScreenshot.ts
This is where the extension decides how to handle the screenshot, if any:

    runPythonOcr is called if rawScreenshot was provided, returning an array of recognized text objects. These text items (with bounding box info and confidence scores) are joined into a single string. If no screenshot is present, we skip OCR altogether.

    getUiSummary is invoked with that joined text and the screenshot buffer. The AI returns a condensed bullet list describing text-based UI elements.

    getGuiSummary is invoked only if a screenshot is actually there, to produce the layout-based summary. If the screenshot is missing, we leave the GUI summary empty.

    Both summaries are returned in an object so the caller can decide how to apply them next—most often passing them to the final layout generation step.

Any errors during OCR or summarization are captured and returned as textual error messages, ensuring the front-end can display an appropriate notification if something fails.

4) pythonBridge.ts
This TypeScript file encapsulates how Blueprint AI triggers a Python script for OCR:

    Locating Python & the Script: We check for a local Python environment (under python-ocr/venv/Scripts/python.exe) and the presence of ocr_service.py in the same folder. If either is missing, an error is thrown.

    Writing a Temporary Screenshot: The raw Buffer from rawScreenshot is written to a random temp filename in the globalStoragePath, ensuring we have a real image file for the Python script to process. Once done, the file is removed.

    Spawning the OCR Process: We run the Python script with the temp image path, collecting stdout and stderr. If the script completes with exit code 0, stdout is parsed as JSON to produce an array of OCR result objects. If there's an error or non-zero exit code, we handle it gracefully, optionally showing a VSCode error message.

By encapsulating all of these steps, we keep the front-end logic free from direct Python calls. pythonBridge.ts is the only TypeScript file that knows how to spawn the OCR script, parse the results, and handle errors.

5) ocr_service.py
This Python script forms the final link in the OCR chain. It:

    Loads and optionally upscales the image to help with small fonts, then converts it to grayscale.

    Runs EasyOCR in paragraph mode, returning recognized lines with confidence scores. Each recognized block contains bounding box coordinates plus the text itself.

    Outputs JSON to stdout, which the TypeScript code then parses. This JSON array represents each recognized line or block of text that might factor into the subsequent UI summarization.

Because Python’s EasyOCR can handle many image scenarios, it’s well-suited for capturing textual details from user-provided screenshots. Combined with minimal pre-processing (grayscale, optional upscaling), it typically extracts text for a wide range of UI designs.
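
For reference, the JSON array written to stdout (and later parsed by pythonBridge.ts) contains objects of roughly this shape; the field names match the OCR result structure described above, while the values are invented for illustration:

    [
      { "text": "Home", "confidence": 0.98, "bbox": [12, 34, 96, 58] },
      { "text": "Subscriptions", "confidence": 0.91, "bbox": [12, 70, 168, 94] }
    ]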

Putting It All Together
The chain of function calls and modules described above is what drives the “backend” logic for Blueprint AI. The user sees a single “Generate Layout” action, but behind the scenes:

    • If a screenshot is included, the system writes it to disk, calls Python OCR, and obtains the recognized text lines.

    • We pass that text and the screenshot to getUiSummary (which uses the UI_SUMMARY_META_PROMPT) and getGuiSummary (which uses GUI_SUMMARY_META_PROMPT) for more refined summarizations.

    • Finally, we merge everything with the user’s instructions in getFinalCraftJsLayout (guided by FINAL_CRAFTJS_META_PROMPT) to produce a single-page CraftJS layout in JSON form.

This approach ensures that text-based clues from OCR and visual layout hints from the screenshot can be combined with the user’s explicit requests—resulting in a high-fidelity design that closely reflects the screenshot or user concept. By maintaining each piece (OCR, UI summarization, GUI summarization, final layout generation) as its own step, we keep the entire process modular and easier to debug or update in the future.
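
As a rough sketch of how this pipeline could be invoked from the extension side, the following shows a hypothetical command registration in extension.ts (the command ID, input box, and logging are assumptions; only getBlueprintLayout and its parameters come from the modules described above):

    import * as vscode from 'vscode';
    import { getBlueprintLayout } from './ai/BlueprintAiService';

    // Hypothetical command wiring; the real extension.ts may differ.
    export function registerGenerateCommand(context: vscode.ExtensionContext) {
      context.subscriptions.push(
        vscode.commands.registerCommand('blueprintAi.generateLayout', async () => {
          const userText = await vscode.window.showInputBox({
            prompt: 'Describe the page you want to generate',
          });
          if (!userText) {
            return;
          }
          // A screenshot buffer would normally arrive from the webview; omitted here.
          const layoutJson = await getBlueprintLayout({ userText });
          vscode.window.showInformationMessage('Blueprint AI layout generated.');
          console.log(layoutJson);
        })
      );
    }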

Code Samples

1) blueprintAiClient.ts

                            
            import axios from 'axios';
            import {
              UI_SUMMARY_META_PROMPT,
              GUI_SUMMARY_META_PROMPT,
              FINAL_CRAFTJS_META_PROMPT,
            } from './blueprintAiPrompts';

            /**
             * Simple helper to call OpenAI ChatGPT with Axios.
             * - Expects process.env.OPENAI_API_KEY to be defined.
             * - Uses gpt-3.5-turbo (or gpt-4 if you have access).
             */
            async function callChatGPT(
              systemPrompt: string,
              userPrompt: string
            ): Promise<string> {
              const apiKey = process.env.OPENAI_API_KEY;
              if (!apiKey) {
                throw new Error('Missing OPENAI_API_KEY environment variable.');
              }

              // Example: using GPT-3.5-Turbo
              const model = 'gpt-3.5-turbo';

              try {
                const response = await axios.post(
                  'https://api.openai.com/v1/chat/completions',
                  {
                    model,
                    messages: [
                      { role: 'system', content: systemPrompt },
                      { role: 'user', content: userPrompt },
                    ],
                    temperature: 0.7,
                  },
                  {
                    headers: {
                      'Content-Type': 'application/json',
                      Authorization: `Bearer ${apiKey}`,
                    },
                  }
                );

                const rawText = response.data.choices?.[0]?.message?.content;
                return rawText ? rawText.trim() : '';
              } catch (error: any) {
                console.error('Error calling ChatGPT:', error?.response?.data || error);
                throw new Error(
                  `OpenAI API error: ${
                    error?.response?.data?.error?.message || error.message
                  }`
                );
              }
            }

            /**
             * If needed, convert the screenshot to a truncated base64 string.
             * Helps avoid huge prompts that might exceed token limits.
             */
            function maybeBase64EncodeScreenshot(screenshot?: Buffer): string | undefined {
              if (!screenshot) {
                return undefined;
              }
              const base64 = screenshot.toString('base64');

              // Truncate the base64 string if it's too large,
              // e.g. limit to 100k characters (arbitrary).
              const maxLength = 100_000;
              if (base64.length > maxLength) {
                return base64.slice(0, maxLength) + '...[TRUNCATED BASE64]';
              }

              return base64;
            }

            /**
             * Summarizes OCR text as a short, structured list of UI lines.
             * Uses the UI_SUMMARY_META_PROMPT plus optional screenshot data.
             */
            export async function getUiSummary(params: {
              text: string;
              screenshot?: Buffer;
            }): Promise<string> {
              const { text, screenshot } = params;
              const base64Screenshot = maybeBase64EncodeScreenshot(screenshot);

              // UI_SUMMARY_META_PROMPT acts as the "system" role for guidance;
              // the user content includes both the OCR text and optional base64 data.
              const systemPrompt = UI_SUMMARY_META_PROMPT;
              const userPrompt = `
            Screenshot (base64, optional):
            ${base64Screenshot ? base64Screenshot : '[No screenshot provided]'}

            === RAW OCR TEXT ===
            ${text}
            `;

              return await callChatGPT(systemPrompt, userPrompt);
            }

            /**
             * Extracts or summarizes the GUI structure from a screenshot only.
             * Uses the GUI_SUMMARY_META_PROMPT plus the screenshot in base64 form.
             */
            export async function getGuiSummary(params: {
              screenshot: Buffer;
            }): Promise<string> {
              const { screenshot } = params;
              const base64Screenshot = maybeBase64EncodeScreenshot(screenshot);

              const systemPrompt = GUI_SUMMARY_META_PROMPT;
              const userPrompt = `
            Screenshot (base64):
            ${base64Screenshot}

            [No OCR text provided for GUI extraction—just the screenshot structure.]
            `;

              return await callChatGPT(systemPrompt, userPrompt);
            }

            /**
             * Generates a single-page CraftJS layout JSON using the final meta prompt.
             * Combines user instructions + the extracted UI & GUI summaries (if any).
             * - userText: the user's own instructions
             * - uiSummary: result from getUiSummary (possibly empty)
             * - guiSummary: result from getGuiSummary (possibly empty)
             */
            export async function getFinalCraftJsLayout(params: {
              userText: string;
              uiSummary: string;
              guiSummary: string;
            }): Promise<string> {
              const { userText, uiSummary, guiSummary } = params;

              // FINAL_CRAFTJS_META_PROMPT again serves as the "system" role;
              // the placeholders are filled in via the user prompt.
              const systemPrompt = FINAL_CRAFTJS_META_PROMPT;

              // Insert the relevant data into the "user" content:
              const userPrompt = `
            USER’S TEXTUAL INSTRUCTIONS:
            "${userText}"

            GUI SUMMARY (IF ANY):
            ${guiSummary}

            OCR TEXT SUMMARY (IF ANY):
            ${uiSummary}
            `;

              return await callChatGPT(systemPrompt, userPrompt);
            }
              

This TypeScript file provides a set of utility functions that enable interaction with OpenAI for multiple tasks. It imports three dedicated prompts—UI_SUMMARY_META_PROMPT, GUI_SUMMARY_META_PROMPT, and FINAL_CRAFTJS_META_PROMPT—to guide the AI in producing either a text-based UI summary, a structural GUI summary, or a final CraftJS layout. The core function callChatGPT handles Axios-based requests, ensuring we properly supply system and user role messages. Additionally, maybeBase64EncodeScreenshot prepares screenshot data in manageable chunks to avoid oversized prompts. With getUiSummary, we format and send the OCR text plus any screenshot snippet to OpenAI, while getGuiSummary focuses purely on structural representation. Finally, getFinalCraftJsLayout is where all prior context (user instructions, UI text, and GUI structure) is merged and passed to OpenAI to receive a valid JSON layout. This interplay of prompts ensures each step is distinct and optimized for its respective summarization or generation goal.
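
Taken together, the three exported functions compose as in this short usage sketch (the file path, OCR stand-in text, and prompt wording are placeholders for illustration):

            import * as fs from 'fs';
            import {
              getUiSummary,
              getGuiSummary,
              getFinalCraftJsLayout,
            } from './blueprintAiClient';

            // Usage sketch: chain the three client calls by hand.
            async function demo(): Promise<void> {
              const screenshot = fs.readFileSync('screenshot.png'); // assumed test image
              const uiSummary = await getUiSummary({
                text: 'Home\nSubscriptions\nTrending', // stand-in for joined OCR lines
                screenshot,
              });
              const guiSummary = await getGuiSummary({ screenshot });
              const layout = await getFinalCraftJsLayout({
                userText: 'Recreate this page with a dark header',
                uiSummary,
                guiSummary,
              });
              console.log(layout);
            }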

2) BlueprintAiService.ts

              
          /*
           * BlueprintAiService.ts
           * Demonstrates orchestrating:
           *  1) Summaries from screenshot (OCR + UI + GUI).
           *  2) CraftJS layout generation.
           */
          
          import { getSummariesFromScreenshot } from './getSummariesFromScreenshot';
          import { getFinalCraftJsLayout } from './blueprintAiClient';
          
          /**
           * Main function that drives the AI generation of a final CraftJS layout.
           * @param params.userText - The user's textual instructions or prompt.
           * @param params.rawScreenshot - An optional Buffer of the screenshot.
           * @returns A Promise containing the final JSON layout for CraftJS.
           */
          export async function getBlueprintLayout(params: {
            userText: string;
            rawScreenshot?: Buffer;
          }): Promise<string> {
            // 1) Gather UI + GUI summaries from the screenshot (if provided).
            //    - getSummariesFromScreenshot internally calls Python OCR for recognized text
            //      and uses the AI prompts (UI_SUMMARY_META_PROMPT, GUI_SUMMARY_META_PROMPT).
            const { uiSummary, guiSummary } = await getSummariesFromScreenshot({
              rawScreenshot: params.rawScreenshot,
            });
          
            // 2) Pass the user instructions plus the two summaries into the final CraftJS meta prompt,
            //    returning a single-page JSON layout that references any brand or structural clues.
            const craftJsJson = await getFinalCraftJsLayout({
              userText: params.userText,
              uiSummary,
              guiSummary,
            });
          
            // 3) Return the JSON string, typically used by the front-end to render or store the new layout.
            return craftJsJson;
          }
            

This service file coordinates all backend actions for generating the final layout. It first calls getSummariesFromScreenshot, which handles OCR and AI summarizations (UI text analysis and GUI structure), then sends those results to getFinalCraftJsLayout. The combined data—user instructions, UI summary, and GUI summary—allows Blueprint AI to produce a valid, single‐page CraftJS JSON layout that reflects both the content and design cues extracted from the screenshot.

3) getSummariesFromScreenshot.ts

              
          /*
           * getSummariesFromScreenshot.ts
           * This file orchestrates how we extract and interpret screenshot data
           * for both textual (UI) and structural (GUI) summaries.
           */

          import { runPythonOcr } from './pythonBridge';
          import { getUiSummary, getGuiSummary } from './blueprintAiClient';

          interface OcrResult {
            text: string;
            confidence: number;
            bbox: [number, number, number, number];
          }

          interface SummariesRequest {
            // Optional screenshot buffer provided by the user.
            rawScreenshot?: Buffer;
          }

          interface SummariesResponse {
            // Summarized text from recognized lines (e.g., nav links, category lists).
            uiSummary: string;
            // Summarized layout structure (e.g., columns, header, color scheme).
            guiSummary: string;
          }

          /**
           * Main function for extracting and summarizing content from a screenshot.
           * 1) Optionally runs Python-based OCR (if rawScreenshot is provided).
           * 2) Calls getUiSummary to transform recognized text into a short bullet list.
           * 3) Calls getGuiSummary to describe the overall GUI layout (header, columns, etc.).
           * 4) Returns both summaries to the caller for further usage (e.g., final layout generation).
           */
          export async function getSummariesFromScreenshot(
            request: SummariesRequest
          ): Promise<SummariesResponse> {
            let recognizedText = '';

            if (request.rawScreenshot) {
              // 1) OCR step: spawn Python, read the image, and join the recognized lines.
              const ocrResults: OcrResult[] = await runPythonOcr(request.rawScreenshot);
              recognizedText = ocrResults.map((r) => r.text).join('\n');
            }

            // 2) The UI summary is a short bullet-style text capturing the major interface items.
            const uiSummary = await getUiSummary({
              text: recognizedText,
              screenshot: request.rawScreenshot,
            });

            // 3) The GUI summary focuses on layout and visual structure; run it only if a screenshot exists.
            let guiSummary = '';
            if (request.rawScreenshot) {
              guiSummary = await getGuiSummary({ screenshot: request.rawScreenshot });
            }

            // 4) Return both summaries so higher-level code can decide how to merge them into a final layout.
            return { uiSummary, guiSummary };
          }
            

This module begins by determining whether a screenshot is present. If yes, it invokes runPythonOcr to obtain recognized text lines. That text is then passed to getUiSummary for a concise bullet list of UI elements (like navigation items). Simultaneously, if the screenshot is available, we generate a high‐level layout description via getGuiSummary. The caller ultimately receives two summaries—one for UI text, another for structural layout—allowing the system to combine both in the final layout generation step.

4) pythonBridge.ts

                      
          import * as vscode from 'vscode';
          import * as fs from 'fs';
          import * as path from 'path';
          import { spawn } from 'child_process';
          import { randomBytes } from 'crypto';
          import { getExtensionContext } from '../utils/extensionContext'; // <--- IMPORTANT
          
          /**
           * Shape of each OCR result object returned by ocr_service.py.
           */
          export interface OcrResult {
            text: string;
            confidence: number;
            bbox: [number, number, number, number]; // [minX, minY, maxX, maxY]
          }

          /**
           * Runs the Python OCR script using the screenshot buffer as input.
           *
           * Expects:
           *   /python-ocr/venv/Scripts/python.exe
           *   /python-ocr/ocr_service.py
           *
           * Returns an array of OCR result objects parsed from the Python script's stdout,
           * where each object typically includes:
           *   - text: The recognized string from the image
           *   - confidence: A floating-point confidence score
           *   - bbox: An array [minX, minY, maxX, maxY] bounding the recognized text
           */
          export async function runPythonOcr(screenshotBuffer: Buffer): Promise<OcrResult[]> {
            // 1) Retrieve our extension context from the shared manager.
            //    This helps us access the extension's installation root and storage paths.
            const extensionContext = getExtensionContext();
          
            // 2) The extension's root directory where python-ocr folder resides.
            const extensionRoot = extensionContext.extensionUri.fsPath;
          
            // 3) Build the full paths to the Python executable & the OCR script:
            //    python.exe is assumed under the venv, while ocr_service.py performs EasyOCR.
            const pythonPath = path.join(
              extensionRoot,
              'python-ocr',
              'venv',
              'Scripts',
              'python.exe'
            );
            const scriptPath = path.join(extensionRoot, 'python-ocr', 'ocr_service.py');
          
            // 3A) Verify both python.exe and the script exist to avoid runtime issues.
            if (!fs.existsSync(pythonPath)) {
              throw new Error(`Cannot find Python interpreter at: ${pythonPath}`);
            }
            if (!fs.existsSync(scriptPath)) {
              throw new Error(`Cannot find OCR script at: ${scriptPath}`);
            }
          
            // 4) Generate a unique temporary file name for the screenshot (PNG).
            const tempName = `temp_screenshot_${randomBytes(4).toString('hex')}.png`;
          
            // By using globalStoragePath, we ensure a consistent place to store files,
            // even if a user has no local workspace open.
            const tempFilePath = path.join(extensionContext.globalStoragePath, tempName);
          
            // 4A) Create the globalStoragePath folder if it doesn't exist, to ensure we can write a file.
            if (!fs.existsSync(extensionContext.globalStoragePath)) {
              fs.mkdirSync(extensionContext.globalStoragePath, { recursive: true });
            }
          
            // 4B) Write the screenshot data to the temp file so the Python script has a real image to read.
            try {
              fs.writeFileSync(tempFilePath, screenshotBuffer);
            } catch (err) {
              throw new Error(`Failed to write temp file at ${tempFilePath}: ${err}`);
            }
          
            // 5) Spawn the Python process, passing the script and the temp file path as arguments.
            return new Promise((resolve, reject) => {
              const pyProcess = spawn(pythonPath, [scriptPath, tempFilePath], {
                cwd: extensionRoot, // Ensures correct working directory for the script
              });
          
              let stdoutData = '';
              let stderrData = '';
          
              pyProcess.stdout.on('data', (chunk) => {
                stdoutData += chunk.toString();
              });
          
              pyProcess.stderr.on('data', (chunk) => {
                stderrData += chunk.toString();
              });
          
              // Handle the script's exit event:
              pyProcess.on('close', (code) => {
                // Attempt to remove the temp file, whether success or fail.
                try {
                  fs.unlinkSync(tempFilePath);
                } catch (cleanupErr) {
                  console.warn(\`Warning: Failed to remove temp file: \${tempFilePath}\`, cleanupErr);
                }
          
                if (code === 0) {
                  // If exit code 0, parse JSON from stdout.
                  try {
                    const results = JSON.parse(stdoutData);
                    resolve(results);
                  } catch (err) {
                    reject(
                      new Error(
                        \`Failed to parse JSON output from Python OCR script.\n\` +
                        \`Error: \${err}\n\nRaw stdout:\n\${stdoutData}\`
                      )
                    );
                  }
                } else {
                  // Non-zero exit code => some error occurred during OCR or script execution.
                  const errorMessage =
                    \`Python OCR script exited with code \${code}.\n\` +
                    \`stderr:\n\${stderrData.trim()}\n\` +
                    \`stdout:\n\${stdoutData.trim()}\n\` +
                    \`Check that your python-ocr setup is correct.\`;
          
                  // Optionally display a VSCode UI message for clarity.
                  vscode.window.showErrorMessage(errorMessage);
                  reject(new Error(errorMessage));
                }
              });
          
              // If the Python process fails to spawn at all:
              pyProcess.on('error', (err) => {
                try {
                  fs.unlinkSync(tempFilePath);
                } catch (cleanupErr) {
                  console.warn(\`Warning: Failed to remove temp file: \${tempFilePath}\`, cleanupErr);
                }
                reject(new Error(\`Failed to spawn Python OCR process: \${err}\`));
              });
            });
          }
            

This file provides the link between Blueprint AI and its Python-based OCR workflow. After writing the screenshot buffer to a temporary file, the runPythonOcr function spawns a Python process to run ocr_service.py. That script uses EasyOCR to detect text in the image, returning a JSON array of recognized lines (including confidence scores and bounding boxes). Upon successful completion, the resulting text blocks are parsed and sent back to the TypeScript layer for further summarization by the AI prompts. Any error or non-zero Python exit code is handled gracefully, ensuring the system can surface useful debug info if something goes wrong.

5) ocr_service.py

          import sys
          import json
          import cv2
          import numpy as np
          import easyocr
          
          def upscale_if_needed(img_bgr, min_width=1200):
              """
              If the image width is below min_width, scale it up by a factor
              that ensures at least min_width. Helps EasyOCR see small fonts better.
              """
              h, w = img_bgr.shape[:2]
              if w < min_width:
                  scale_factor = min_width / w
                  new_w = int(w * scale_factor)
                  new_h = int(h * scale_factor)
                  img_bgr = cv2.resize(img_bgr, (new_w, new_h), interpolation=cv2.INTER_CUBIC)
              return img_bgr
          
          def minimal_preprocess(image_path: str):
              """
              Minimal approach:
              1) Load color image with OpenCV
              2) If width < 1200, upscale
              3) Convert to grayscale
              (No further morphological or thresholding to avoid corrupting simpler images.)
              """
              img_bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)
              if img_bgr is None:
                  raise ValueError(f"Could not load image from: {image_path}")
          
              # 1) Upscale if needed
              img_bgr = upscale_if_needed(img_bgr, min_width=1200)
          
              # 2) Convert to grayscale
              gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
              return gray
          
          def run_easyocr(numpy_image):
              """
              Use EasyOCR with GPU if available (fallback CPU).
              paragraph=True merges lines into blocks for complicated text.
              """
              # verbose=False skips progress bars that sometimes cause Unicode issues on Windows
              reader = easyocr.Reader(['en'], gpu=True, verbose=False)
              results = reader.readtext(numpy_image, detail=1, paragraph=True)
              return results
          
          def main():
              # Attempt to set stdout to UTF-8 on Windows, in case the console is CP1252
              try:
                  sys.stdout.reconfigure(encoding='utf-8')
              except Exception:
                  pass
          
              if len(sys.argv) < 2:
                  print("Usage: ocr_service.py ", file=sys.stderr)
                  sys.exit(1)
          
              image_path = sys.argv[1]
          
              # 1) Minimal Preprocessing
              processed = minimal_preprocess(image_path)
          
              # 2) Perform OCR
              ocr_results = run_easyocr(processed)
          
              # 3) Build structured results, handling variable output formats
              output_data = []
              for result in ocr_results:
                  # result might be (coords, text) or (coords, text, conf)
                  if not isinstance(result, (list, tuple)):
                      continue
                  if len(result) < 2:
                      continue
          
                  coords = result[0]
                  text = result[1]
                  confidence = result[2] if len(result) >= 3 else 1.0
          
                  # coords => bounding box corners
                  xs = [pt[0] for pt in coords]
                  ys = [pt[1] for pt in coords]
                  min_x, max_x = int(min(xs)), int(max(xs))
                  min_y, max_y = int(min(ys)), int(max(ys))
          
                  output_data.append({
                      "text": text,
                      "confidence": float(confidence),
                      "bbox": [min_x, min_y, max_x, max_y],
                  })
          
              # 4) Print JSON to stdout
              print(json.dumps(output_data, ensure_ascii=False))
          
          if __name__ == "__main__":
              main()
            

This Python script performs OCR using EasyOCR in a stepwise fashion. It first upscales images below a certain width (improving recognition on small fonts), then converts them to grayscale to simplify processing. The run_easyocr function is configured with paragraph=True to merge lines into blocks for more coherent results. Finally, each recognized line is collected along with its bounding box and confidence score, and output as structured JSON to stdout. This JSON is then consumed by the TypeScript layer via pythonBridge.ts.

The Frontend

AiSidebar Component


The AiSidebar component is a sophisticated, interactive module designed to streamline the integration of AI-driven design enhancements within the Craft.js editor.

It establishes a mini “AI workflow” that seamlessly combines multiple input modalities and real-time feedback mechanisms. Specifically, the component leverages Craft.js to continuously monitor and reflect the currently selected element in the editor, ensuring that any AI-generated modifications are contextually relevant. Additionally, it interfaces with a global store to persist and synchronize user prompts, guaranteeing that changes remain consistent across sessions.

Users can input their design intentions through a text area or by uploading an image. The image undergoes client-side validation and is converted into a base64-encoded preview for immediate visual confirmation.

When the “Generate” button is triggered, the component simulates an asynchronous AI process that takes the user’s input (textual, visual, or a combination of both) and produces a preliminary design layout. A loading overlay keeps the user informed while this operation runs.

Upon completion of the AI generation, the component presents a generated preview of the proposed changes. If the showAcceptChanges property is enabled, the interface dynamically reveals accept and reject options, allowing users to finalize or discard the AI modifications with granular control.

This feedback loop not only promotes rapid prototyping but also empowers designers and developers to iteratively refine their projects with minimal disruption. Overall, the AiSidebar is engineered to bridge manual design input with automated AI enhancements, thereby fostering an efficient, user-centric, and adaptive design process.
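Since the sidebar's source is not reproduced here, the following is a minimal sketch of the input flow just described, in the project's TypeScript/React idiom. Names such as onGenerate, handleUpload, and MAX_IMAGE_BYTES are illustrative assumptions; the real component additionally wires in Craft.js selection state and the global prompt store.

      // Hypothetical sketch of the AiSidebar input flow; not the actual implementation.
      import React, { useState } from 'react';

      interface AiSidebarProps {
        showAcceptChanges?: boolean;
        onGenerate: (prompt: string, imageBase64?: string) => Promise<void>;
        onAccept?: () => void;
        onReject?: () => void;
      }

      const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // assumed client-side validation threshold

      export const AiSidebarSketch: React.FC<AiSidebarProps> = (props) => {
        const [prompt, setPrompt] = useState('');
        const [imagePreview, setImagePreview] = useState<string>();
        const [loading, setLoading] = useState(false);
        const [hasResult, setHasResult] = useState(false);

        // Validate the upload, then convert it to a base64 data URL for the preview.
        const handleUpload = (e: React.ChangeEvent<HTMLInputElement>) => {
          const file = e.target.files?.[0];
          if (!file || !file.type.startsWith('image/') || file.size > MAX_IMAGE_BYTES) {
            return; // reject non-images and oversized files
          }
          const reader = new FileReader();
          reader.onload = () => setImagePreview(reader.result as string);
          reader.readAsDataURL(file);
        };

        const handleGenerate = async () => {
          setLoading(true); // drives the loading overlay
          try {
            await props.onGenerate(prompt, imagePreview);
            setHasResult(true);
          } finally {
            setLoading(false);
          }
        };

        return (
          <div>
            <textarea value={prompt} onChange={(e) => setPrompt(e.target.value)} />
            <input type="file" accept="image/*" onChange={handleUpload} />
            {imagePreview && <img src={imagePreview} alt="Upload preview" />}
            <button onClick={handleGenerate} disabled={loading}>Generate</button>
            {loading && <div className="overlay">Generating…</div>}
            {hasResult && props.showAcceptChanges && (
              <>
                <button onClick={props.onAccept}>Accept</button>
                <button onClick={props.onReject}>Reject</button>
              </>
            )}
          </div>
        );
      };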

ExportMenu Component

                  

The ExportMenu component is designed to capture the current layout from the Craft.js editor’s canvas, transform it into clean HTML, and prepare a basic CSS template for further editing. This is achieved by identifying the DOM element labeled #droppable-canvas-border and extracting its HTML structure to ensure that any user-configured elements and styles are preserved during export.

In its simplified form, the component maintains two key pieces of state: one for the generated HTML and another for a default CSS snippet. When the user clicks the Export button, the raw HTML is retrieved, beautified for readability, and saved in a React state variable. This streamlined HTML is then paired with minimal CSS to offer a starting point for any future design modifications.

Once the HTML and CSS are prepared, the ExportMenu transitions to the ExportEditorView, passing along the captured code. This ensures a seamless handoff, enabling the user to continue refining and customizing the exported layout within a more comprehensive editor interface. The ExportMenu thereby simplifies the process of extracting a layout from the editor for external usage.

Finally, the ExportMenu provides a concise yet effective user experience by combining clear layout extraction, basic styling defaults, and an intuitive UI flow. This minimalistic structure allows for straightforward integration with broader functionality, such as folder selection and multi-page exports, while maintaining clarity in its core export purpose.
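A condensed sketch of that capture step is shown below, using the js-beautify package from the dependency list; the DEFAULT_CSS value and the hook packaging are simplifying assumptions rather than the component's actual structure.

      // Sketch of the ExportMenu capture step; structure is illustrative.
      import { useState } from 'react';
      import beautify from 'js-beautify';

      const DEFAULT_CSS = 'body { font-family: sans-serif; margin: 0; }'; // assumed starter stylesheet

      export function useExportCapture() {
        const [exportedHtml, setExportedHtml] = useState('');
        const [exportedCss] = useState(DEFAULT_CSS);

        const handleExport = () => {
          // Grab the rendered canvas subtree so user-configured elements survive the export.
          const canvas = document.querySelector('#droppable-canvas-border');
          if (!canvas) {
            return;
          }
          // Beautify the raw markup for readability before handing it to ExportEditorView.
          setExportedHtml(beautify.html(canvas.innerHTML, { indent_size: 2 }));
        };

        return { exportedHtml, exportedCss, handleExport };
      }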

ExportEditorView Component

                  

The ExportEditorView component serves as a post-processing workspace where users can review, refine, and finalize the HTML/CSS code generated by the ExportMenu. Upon receiving the initial HTML and CSS, it uses Monaco Editor to provide a rich, code-centric editing environment. This approach facilitates direct manipulation of the exported layout, giving developers the flexibility to tweak or enhance their pages before saving.

Beyond simply displaying the raw code, the ExportEditorView runs a specialized routine that gathers computed styles from every element within #droppable-canvas-border. These styles, which include the precise browser-calculated CSS properties, are then beautified and appended to the existing stylesheet. This ensures that any responsive or dynamic changes made during the design process are accurately captured in the final export.

The component also features a download as ZIP function, bundling the updated HTML and CSS files into a compressed archive. This allows users to conveniently store and share their designs. By incorporating JSZip and FileSaver, the ExportEditorView automates the packaging process, minimizing manual file handling.

Ultimately, the ExportEditorView component bridges the gap between raw layout output and a polished, ready-to-use design asset. It elevates the user experience by integrating real-time code editing and practical file export capabilities, thereby enhancing the efficiency and completeness of the Craft.js editor’s export workflow.
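The style-capture and download steps might look like the sketch below. The per-element selector scheme is a simplified assumption, while the JSZip and FileSaver calls correspond to the packages listed later in this section.

      // Sketch of ExportEditorView's computed-style capture and ZIP download.
      import JSZip from 'jszip';
      import { saveAs } from 'file-saver';

      // Collect browser-computed styles for every element in the canvas subtree.
      function collectComputedCss(root: Element): string {
        let css = '';
        root.querySelectorAll('*').forEach((el, i) => {
          const computed = window.getComputedStyle(el);
          const selector = `.exported-el-${i}`; // simplified selector scheme (assumed)
          let rules = '';
          for (let j = 0; j < computed.length; j++) {
            const prop = computed.item(j);
            rules += `  ${prop}: ${computed.getPropertyValue(prop)};\n`;
          }
          css += `${selector} {\n${rules}}\n`;
        });
        return css;
      }

      // Bundle the final HTML/CSS into a compressed archive and trigger a download.
      async function downloadAsZip(html: string, css: string): Promise<void> {
        const zip = new JSZip();
        zip.file('index.html', html);
        zip.file('styles.css', css);
        const blob = await zip.generateAsync({ type: 'blob' });
        saveAs(blob, 'blueprint-export.zip');
      }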

Container Component

                  

The Container component is a flexible, multi-purpose layout element specifically designed for use with Craft.js. It supports four distinct layout types (container, row, section, and grid) to accommodate various design scenarios—ranging from basic flex boxes to more complex grid structures. By merging default properties and user-defined settings, the Container makes it straightforward to configure margins, padding, borders, shadows, and background colors, ensuring visually appealing and well-organized interfaces.

This component leverages Craft.js hooks like useNode to integrate seamlessly with the editor environment, allowing developers to drag, drop, and resize elements within a live editing interface. For instance, if the Container is not the root element, it wraps its contents in a custom Resizer component, enabling users to manually adjust its dimensions. If it is the root, it enforces constraints and styles that differentiate it from child containers.

Each layout type is accompanied by relevant style properties, such as gap, flex direction, grid columns, or row gap, giving precise control over element alignment and spacing. This approach keeps layouts adaptable and modular, while the Container component's settings panel (exposed through ContainerProperties) simplifies customization.
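As a rough illustration of the layout-type switch, the sketch below maps each of the four types onto flex or grid styles via useNode from @craftjs/core. The default values are assumptions, and the Resizer wrapper and root-node checks are omitted for brevity.

      // Simplified Container sketch; defaults are illustrative, Resizer omitted.
      import React from 'react';
      import { useNode } from '@craftjs/core';

      type LayoutType = 'container' | 'row' | 'section' | 'grid';

      interface ContainerProps {
        layoutType?: LayoutType;
        background?: string;
        padding?: number;
        gap?: number;
        columns?: number;
        children?: React.ReactNode;
      }

      export const ContainerSketch: React.FC<ContainerProps> = (props) => {
        // Merge assumed defaults with user-defined settings.
        const {
          layoutType = 'container',
          background = '#ffffff',
          padding = 16,
          gap = 8,
          columns = 2,
        } = props;
        const { connectors: { connect, drag } } = useNode();

        // Map each layout type onto the flex/grid styles it needs.
        const layoutStyles: Record<LayoutType, React.CSSProperties> = {
          container: { display: 'flex', flexDirection: 'column', gap },
          row: { display: 'flex', flexDirection: 'row', gap },
          section: { display: 'flex', flexDirection: 'column', rowGap: gap },
          grid: { display: 'grid', gridTemplateColumns: `repeat(${columns}, 1fr)`, gap },
        };

        return (
          <div
            ref={(el) => { if (el) connect(drag(el)); }}
            style={{ background, padding, ...layoutStyles[layoutType] }}
          >
            {props.children}
          </div>
        );
      };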

Data Storage

                      

In Blueprint AI, all persistent data is stored locally in the user's browser via localStorage, keyed under "blueprint-ai-data". This mechanism is implemented in our store.ts file, which loads and saves project data for every session of the Blueprint AI extension. User preferences and project details are therefore retrieved immediately within the same browser environment, with no reliance on external databases or cloud services. The points below describe how Blueprint AI stores, updates, and retrieves each of these data fields through the logic in store.ts:

1) BlueprintAI Store State Shape:
  • Our local store is defined by the StoreState interface in store.ts, which includes exactly four fields: pages, selectedPageId, suggestedPages, and userPrompt.
  • The pages array lists every user-created or AI-suggested page, each represented by the Page interface (id, name, an optional thumbnail, and the layout tree in CraftJS JSON). By default, only one page exists (id: 1, named “Page 1”). Blueprint AI updates this array each time the user or the AI system adds or modifies a page.
  • The selectedPageId indicates which page is currently being edited in the Blueprint AI interface. This pointer ensures that the design canvas, properties sidebar, and other features always reference the appropriate page.
  • The suggestedPages array holds additional recommended page names (e.g., “Account,” “Buy Again,” “Best Sellers,” “Returns & Orders”) that Blueprint AI proposes to the user. These suggestions are surfaced in the Pages Sidebar or within other modals to guide potential new pages the user may want to generate.
  • The userPrompt string is a flexible area for saving any text prompt that the user entered in the AI-driven flows (such as designing a new layout, adjusting an existing design, or describing new features). Each time a user interacts with the iterative AI chat or the “Create With Imagination” page builder, Blueprint AI updates userPrompt so that it remains accessible across sessions.

2) Default Local State:
  • The initial data structure is declared as storeState in store.ts. It contains one sample page, the four default suggested pages, and an empty userPrompt, giving first-time or reset sessions a consistent starting point and a clear reference in the UI to build from even before the user creates or loads anything.
  • Blueprint AI only falls back to these defaults if no prior saved data exists under "blueprint-ai-data". If saved data does exist, the store merges its fields from local storage into memory instead.
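Reconstructed from the description above (a sketch, not the verbatim store.ts source), the state shape and defaults look roughly like this; the layout field name on Page is an assumption:

      // Sketch of the store's state shape, reconstructed from the documentation.
      interface Page {
        id: number;
        name: string;
        thumbnail?: string;
        layout?: string; // serialized CraftJS JSON layout tree (field name assumed)
      }

      interface StoreState {
        pages: Page[];
        selectedPageId: number;
        suggestedPages: string[];
        userPrompt: string;
      }

      const STORAGE_KEY = 'blueprint-ai-data';

      // Defaults: one sample page, four suggested pages, empty prompt.
      const storeState: StoreState = {
        pages: [{ id: 1, name: 'Page 1' }],
        selectedPageId: 1,
        suggestedPages: ['Account', 'Buy Again', 'Best Sellers', 'Returns & Orders'],
        userPrompt: '',
      };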

3) Conditional Loading at Startup:
  • On every launch of the Blueprint AI extension, the code attempts to retrieve the JSON string via localStorage.getItem(STORAGE_KEY). If savedData is non-null, it parses the string and merges each key into the current storeState: if the parsed data has pages, it updates storeState.pages; if it has selectedPageId, it sets that too, and so on.
  • If the user had previously created multiple pages or typed in a multi-sentence prompt, all of that is immediately reloaded into the Blueprint AI interface on extension open. This ensures a frictionless user experience where previous session designs or AI prompts are restored exactly as they left them.
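Continuing the sketch above, the startup merge reduces to roughly the following; the loader's name is an assumption:

      // Assumed loader name; merges each persisted field into the in-memory state.
      function loadStoreFromLocalStorage(): void {
        const savedData = localStorage.getItem(STORAGE_KEY);
        if (savedData === null) {
          return; // first run: keep the defaults
        }
        try {
          const parsed = JSON.parse(savedData);
          if (parsed.pages) storeState.pages = parsed.pages;
          if (parsed.selectedPageId) storeState.selectedPageId = parsed.selectedPageId;
          if (parsed.suggestedPages) storeState.suggestedPages = parsed.suggestedPages;
          if (parsed.userPrompt) storeState.userPrompt = parsed.userPrompt;
        } catch {
          // Corrupt JSON: fall back to the defaults rather than crash the webview.
        }
      }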

4) Accessing Stored Data (Getters):
  • Blueprint AI uses dedicated getter functions from store.ts to read data from memory, such as getPages() for the full pages list, getSelectedPage() for the currently active page object, getSuggestedPages() for recommended page names, and getUserPrompt() for the last user prompt. Because the store synchronizes to local storage on demand, these getter calls reflect precisely what's persisted in the browser once a save has run.
  • For example, when the user opens the Pages Sidebar in Blueprint AI, the application calls getPages() to render the entire list of local pages. Likewise, the AI Chat Flow reads getUserPrompt() to show the user’s most recent text input in the chat or iteration interface.

5) Handling State Changes (Subscriptions):
  • Multiple arrays of listener functions exist within store.ts, each of which is notified when a relevant section of the store changes (e.g., pageListeners, selectedPageListeners, and promptListeners). This ensures that whenever the user or the AI modifies the layout or updates the user prompt, the corresponding parts of the Blueprint AI interface re-render automatically.
  • By subscribing to pageListeners, any UI or logic that depends on the array of pages or suggested pages will be refreshed. Similarly, components reliant on which page is currently selected subscribe to selectedPageListeners, and features tied to user input text watch promptListeners. This subscription model helps maintain a dynamic, reactive environment for the entire Blueprint AI design experience.

6) Updating and Saving (Mutations):
  • setPages(newPages) replaces the entire local pages array with a new list. For example, the AI might generate a fresh layout for the user’s “Buy Again” page, and in response, setPages stores the updated structure. In Blueprint AI, once the user finalizes or accepts an AI response, the relevant page is replaced or appended.
  • updatePage(id, partialData) merges changes into a particular Page object, such as if the user updates the name from “Page 1” to “Home Page,” or modifies the layout JSON with an AI-generated snippet. This function is used heavily in any direct manipulation of a single page (dragging a component in the CraftJS canvas, etc.).
  • setSelectedPageId(id) changes which page is currently active. For instance, if the user navigates from “Page 1” to “Best Sellers,” setSelectedPageId updates the local store and triggers selectedPageListeners to refresh the design canvas.
  • setSuggestedPages(newPages) is called whenever the AI or user wants to refresh the recommended page list. Blueprint AI might push new suggestions after seeing what the user typed into the AI Chat, ensuring the Pages Sidebar always shows relevant next-page ideas.
  • setUserPrompt(newPrompt) is invoked whenever the user edits the text prompt or when the AI modifies it for iterative flows. The store updates userPrompt accordingly, and the entire system can respond in real time.
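Points 5 and 6 combine into roughly the pattern below, continuing the same sketch. subscribeToPages is a hypothetical registration helper, and setSuggestedPages follows the same shape as the setters shown.

      type Listener = () => void;

      // One listener array per slice of state, as described in point 5.
      const pageListeners: Listener[] = [];
      const selectedPageListeners: Listener[] = [];
      const promptListeners: Listener[] = [];

      const notify = (listeners: Listener[]) => listeners.forEach((fn) => fn());

      // Hypothetical registration helper: components re-render when their slice changes.
      export function subscribeToPages(fn: Listener): () => void {
        pageListeners.push(fn);
        return () => {
          const i = pageListeners.indexOf(fn);
          if (i !== -1) pageListeners.splice(i, 1);
        };
      }

      export function setPages(newPages: Page[]): void {
        storeState.pages = newPages;
        notify(pageListeners);
      }

      export function updatePage(id: number, partialData: Partial<Page>): void {
        storeState.pages = storeState.pages.map((p) =>
          p.id === id ? { ...p, ...partialData } : p
        );
        notify(pageListeners);
      }

      export function setSelectedPageId(id: number): void {
        storeState.selectedPageId = id;
        notify(selectedPageListeners);
      }

      export function setUserPrompt(newPrompt: string): void {
        storeState.userPrompt = newPrompt;
        notify(promptListeners);
      }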

7) Local Persistence Workflow:
  • At any point after these setter or updater functions run, the saveStoreToLocalStorage() function can be called to write the current storeState object back into localStorage. Internally, it uses JSON.stringify on the entire store (pages, selectedPageId, suggestedPages, userPrompt) and places it under the key STORAGE_KEY, i.e. "blueprint-ai-data".
  • Because saving happens conditionally upon user interactions or explicit calls, no large overhead or complex logic is needed. The user can also trigger a “Save Locally” button from within Blueprint AI’s main sidebar, which calls saveStoreToLocalStorage() in the background.

8) Resetting the Store:
  • When the user requests a full reset—perhaps by hitting “Refresh All Pages” or “Clear Storage”—Blueprint AI calls clearStoreFromLocalStorage(). This removes the entire key/value pair from localStorage and resets the in-memory storeState to the default structure (one page named “Page 1,” default suggestions, and empty user prompt).
  • Subscriptions are notified once again so that any UI depending on the store quickly reverts to a blank state. This is crucial for scenarios where the user wishes to begin a fresh project or discard all AI-suggested designs.
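The persistence and reset helpers named in points 7 and 8 then reduce to a few lines, with the reset values mirroring the defaults sketched under point 2:

      export function saveStoreToLocalStorage(): void {
        // Serialize the full store under the single "blueprint-ai-data" key.
        localStorage.setItem(STORAGE_KEY, JSON.stringify(storeState));
      }

      export function clearStoreFromLocalStorage(): void {
        localStorage.removeItem(STORAGE_KEY);
        // Reset the in-memory state to the defaults and notify all subscribers.
        storeState.pages = [{ id: 1, name: 'Page 1' }];
        storeState.selectedPageId = 1;
        storeState.suggestedPages = ['Account', 'Buy Again', 'Best Sellers', 'Returns & Orders'];
        storeState.userPrompt = '';
        notify(pageListeners);
        notify(selectedPageListeners);
        notify(promptListeners);
      }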

9) Blueprint AI Context-Specific Usage:
  • First Page Creator Flow: By default, a single “Page 1” is stored. As soon as the user types a text prompt (like “Create an eCommerce homepage with a big hero banner”) or uploads an image, the AI generates a new layout. The store’s pages array is updated, and saveStoreToLocalStorage() is invoked. If the user closes the extension and reopens it, the generated page is restored from localStorage.
  • Main Interface & Canvas: If the user reorders a button or changes a text component inside the CraftJS canvas, updatePage() merges the new layout structure. The Properties Sidebar might also call updatePage() when editing margins, backgrounds, or other design attributes. Each modification can be saved locally so that the user’s design is retained.
  • Pages Sidebar & Suggested Pages: The suggestedPages field in storeState is updated conditionally to reflect any new or removed suggestions. Once the user picks one of these suggestions (“Returns & Orders,” for example) and requests an AI layout, the store adds a new page object. No external DB is used; it is purely local to blueprint-ai-data.
  • Export Menu: The selected pages to export and their layout data are all pulled from store.ts. Because everything is stored locally, the user’s entire editing session is readily available to transform into a downloadable zip. This is done without sending any user design data to external services once it is in the local store.

Therefore, Blueprint AI ensures that every aspect of local data management—from retrieving initial saved states on extension load, to conditionally updating pages during the design process, to finalizing or clearing data—is precisely handled through the store.ts file. This local storage approach offers immediate read/write access, zero external dependencies, and complete user control over saving and resetting, reflecting Blueprint AI’s mission to keep front-end development streamlined, private, and user-friendly.

Packages and APIs

Key Technologies


React


Primarily used for building the web UI for the VS Code extension, providing a component-based approach for rendering the interactive front-end.


CraftJS


Provides a drag-and-drop design layer, enabling dynamic page editing and layout manipulation inside the custom interface.


Fluent UI


Leverages Microsoft’s design language and components for consistent styling and responsive elements across the extension’s UI.


Node.js


Powers the backend side of the extension environment, facilitating scripts, package management, and interactions with VS Code APIs.


TypeScript


Ensures robust typing and improved developer experience throughout the codebase, reducing runtime errors and enhancing scalability.


Python / Axios


Python powers backend automation and OCR processing, while Axios is used for efficient HTTP requests and data integration in the frontend.

Below is a definitive list of the relevant packages, modules, and APIs used throughout the system. The project consists of both Node.js packages (for the VS Code extension and web UI) and Python packages (for the OCR functionality), along with specific internal APIs and functions that handle AI requests, OCR, and layout generation.

Node.js Packages

@babel/core@7.26.0

@craftjs/core@0.2.11

@craftjs/layers@0.2.6

@emotion/react@11.14.0

@emotion/styled@11.14.0

@eslint/js@9.17.0

@fluentui/react@8.122.5

@fullhuman/postcss-purgecss@7.0.2

@monaco-editor/react@4.7.0

@mui/icons-material@6.3.1

@mui/material@6.3.1

@mui/system@6.4.0

@types/classnames@2.3.0

@types/file-saver@2.0.7

@types/js-beautify@1.14.3

@types/node@22.10.6

@types/react-color@3.0.13

@types/react-dom@18.3.5

@types/react-grid-layout@1.3.5

@types/react@18.3.18

@types/styled-components@5.1.34

@types/uuid@10.0.0

@vitejs/plugin-react-swc@3.7.2

autoprefixer@10.4.20

babel-plugin-inline-react-svg@2.0.2

classnames@2.5.1

cross-env@7.0.3

cssnano@7.0.6

debounce@2.2.0

eslint-plugin-react-hooks@5.1.0

eslint-plugin-react-refresh@0.4.16

eslint@9.17.0

file-saver@2.0.5

globals@15.14.0

js-beautify@1.15.4

jszip@3.10.1

konva@9.3.18

lzutf8@0.6.3

monaco-editor@0.52.2

postcss-import@16.1.0

postcss-preset-env@10.1.3

postcss@8.5.0

re-resizable@6.10.3

react-color@2.19.3

react-colorful@5.6.1

react-contenteditable@3.3.7

react-dom@18.3.1

react-grid-layout@1.5.0

react-icons@5.5.0

react-konva@18.2.10

react-loading@2.0.3

react-player@2.16.0

react-rnd@10.4.14

react-router-dom@7.1.1

react-router@7.1.1

react-youtube@10.1.0

react@18.3.1

sharp@0.33.5

styled-components@6.1.14

tailwindcss@3.4.17

typescript-eslint@8.19.1

typescript@5.6.3

uuid@11.0.4

vite@5.4.11

Python Packages (OCR Script)

    • numpy

    • opencv-python-headless

    • easyocr (which installs PyTorch automatically as a dependency)

Internal AI and OCR APIs

    getBlueprintLayout(...)
    Main entry point for AI-based layout generation. Accepts user text and optional screenshot to produce a final single-page CraftJS layout in JSON form.

    getSummariesFromScreenshot(...)
    Handles screenshot data by running OCR (via runPythonOcr(...) and ocr_service.py) and then generating uiSummary and guiSummary through AI calls.

    runPythonOcr(...)
    Invokes the Python script ocr_service.py to extract textual content from an image using EasyOCR.

    getUiSummary(...) and getGuiSummary(...)
    Summaries of textual content (UI elements) and visual layout (GUI structure) derived from screenshot OCR data. Uses OpenAI with specialized prompts (UI_SUMMARY_META_PROMPT and GUI_SUMMARY_META_PROMPT).

    getFinalCraftJsLayout(...)
    Synthesizes the final CraftJS layout JSON from user input and the screenshot summaries, leveraging FINAL_CRAFTJS_META_PROMPT.

    callOpenAiChat(...)
    A helper for all OpenAI requests. Chooses the model based on whether a screenshot is involved. Returns the raw text response from OpenAI.
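Putting these together, a screenshot-driven request flows roughly as follows; the signatures are simplified and the intermediate summary type is an assumption:

      // Simplified sketch of the call chain; actual signatures may differ.
      async function getBlueprintLayoutSketch(
        userText: string,
        screenshot?: Buffer
      ): Promise<string> {
        let summaries: { uiSummary?: string; guiSummary?: string } = {};

        if (screenshot) {
          // Runs runPythonOcr internally, then getUiSummary / getGuiSummary
          // (each of which goes through callOpenAiChat with its meta-prompt).
          summaries = await getSummariesFromScreenshot(screenshot);
        }

        // Synthesizes the final single-page CraftJS layout JSON, leveraging
        // FINAL_CRAFTJS_META_PROMPT via callOpenAiChat.
        return getFinalCraftJsLayout(userText, summaries.uiSummary, summaries.guiSummary);
      }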

Key Frontend Modules

    AiSidebar.tsx: Handles the user chat flow (prompt + optional image). Sends messages to blueprintAI.generateLayout and receives AI layout results.

    SuggestedPages.tsx and CreateSelectedPage.tsx: Provide suggested page names, allow prompting with text and an optional image, and post to blueprintAI.generateLayoutSuggested.

    BlueprintAiService.ts, pythonBridge.ts, and getSummariesFromScreenshot.ts: Coordinate calls to OpenAI and the OCR Python script, returning final layout data.
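For context, the webview side of that exchange typically reduces to a postMessage round trip. The message shape below and the blueprintAI.layoutGenerated reply command are illustrative assumptions, not the exact protocol defined in MainWebViewPanel.ts.

      // Assumed message protocol between the React webview and the extension host.
      declare function acquireVsCodeApi(): { postMessage(msg: unknown): void };

      const vscodeApi = acquireVsCodeApi();

      // Webview side: AiSidebar posts the prompt (and optional image) to the extension.
      function requestLayout(prompt: string, imageBase64?: string): void {
        vscodeApi.postMessage({
          command: 'blueprintAI.generateLayout',
          payload: { userText: prompt, image: imageBase64 },
        });
      }

      // Webview side: listen for the generated CraftJS layout coming back.
      window.addEventListener('message', (event) => {
        const { command, payload } = event.data ?? {};
        if (command === 'blueprintAI.layoutGenerated') {
          // e.g., deserialize the CraftJS JSON into the editor here.
          console.log('Received layout:', payload);
        }
      });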