Developing a 'Chat Log Manager' Desktop App with Electron

2024-10-18

The code repository is at github.com/wangziao9/directory-of-dialogues and is updated periodically.

Motivation: Managing Chat Sessions like Notes

The Universality of Linear Reverse Order Lists

Most AI chat assistants, whether web or desktop applications, follow ChatGPT's convention of managing chat sessions as a linear list in reverse chronological order. This resembles the recency ordering of an LRU cache and works well in most cases. The problem is that users cannot organize sessions in any form (folders? a canvas?), nor can they directly search their contents.

ChatGPT does allow data export, producing a large JSON file and an HTML file for browsing and searching chat logs, but opening the HTML file in a browser still gives you a reverse chronological list of chats.

Sessions as Notes

In my own usage, a session can have a long lifecycle, for example a session dedicated to:

Chatting about VPS configuration, asking how to deploy an application today, inquiring about the technical principles of the application a week later, and troubleshooting an error with the deployed application a month later.
Discussing a specific technical direction, like Web development, where all related questions are asked in that session.
Keeping a cooking log: asking about unclear steps in a recipe, sending photos to ask how to use kitchen utensils, and finally recording how the dish turned out to reflect on why it succeeded or failed.
...

The benefit of long-lived, specialized sessions is that the AI can use the context to give more targeted suggestions, and you can also look back at earlier Q&A, so the session doubles as a note.

The Shortcomings of Linear Reverse Order Lists

If sessions are treated as notes, a linear reverse-order list is a poor choice as the only index: finding a note requires an O(N) linear scan of the entire list.

Moreover, every short-lived, low-priority session you open takes a place at the top of the list, pushing your "notes" further down and making them harder to find.

Traditional electronic notes are organized in a hierarchical folder structure. As long as the hierarchy is reasonable, the structure stays clear and a lookup takes roughly O(log N) steps. The operating system's file system already provides such a hierarchy, so the client only needs to support exporting chat sessions (as JSON) to the file system and importing them back to overcome the shortcomings of the linear reverse-order list.
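A minimal sketch of the export/import step, assuming a session is a plain object such as { title, messages } (a hypothetical shape, not necessarily what the app actually stores) and using Node's built-in fs module. The function names exportSession and importSession are illustrative only:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write one chat session to <dir>/<fileName>.json, creating folders as needed.
// The folder hierarchy itself becomes the index, e.g. servers/vps/deploy.json.
function exportSession(dir, fileName, session) {
  fs.mkdirSync(dir, { recursive: true });
  const filePath = path.join(dir, `${fileName}.json`);
  fs.writeFileSync(filePath, JSON.stringify(session, null, 2), 'utf8');
  return filePath;
}

// Read a session back from a JSON file.
function importSession(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Usage: save a hypothetical session into a nested folder under the OS temp dir.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'dialogues-'));
const session = {
  title: 'VPS deployment',
  messages: [
    { role: 'user', content: 'How do I deploy this app on my VPS?' },
    { role: 'assistant', content: '(assistant reply)' },
  ],
};
const saved = exportSession(path.join(root, 'servers', 'vps'), 'deploy', session);
const loaded = importSession(saved);
console.log(saved, loaded.title);
```

Because each session is an ordinary file, the user can move, rename, and search sessions with the file manager or any existing search tool.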

JavaScript Programming Challenges

This section draws mainly on the Runoob tutorials.

JavaScript Browser Script Programming vs Node.js Programming

JavaScript has two usage scenarios:

As script code embedded in HTML and run by the browser (frontend). Common operations include manipulating the DOM (Document Object Model), handling events, and sending requests to the server (AJAX).
Running on the server side (backend) on a dedicated runtime (Node.js). Common operations include file system access and network requests.

Generally, the former is called JavaScript programming and the latter Node.js programming. For this project I chose Electron as the desktop application framework, since it lets a desktop app define both "frontend" and "backend" behavior.

Node.js Module

Node.js modules provide a way for Node.js files to call each other. A simple example can be found at Runoob.

== main.js ==
var hello = require('./hello');
hello.world();

== hello.js ==
exports.world = function() {
  console.log('Hello World');
}

Node.js modules include built-in (core) modules (such as fs and http), local modules (see the example above), and third-party modules (such as electron).

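The ECMAScript 2015 standard (ES6) introduced ES Modules, the language's own module system, which uses the import and export keywords instead of the require and module.exports of Node.js's CommonJS module system. Node.js supports ES Modules (for example in .mjs files, or when package.json sets "type": "module"), but the Node.js REPL (interactive interpreter) still does not accept static import statements; only the dynamic import() expression works there.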

A Closer Look at the JavaScript Runtime

JavaScript code runs on a single thread. The runtime maintains an event loop that executes code and handles all pending events.

JavaScript uses tasks and microtasks to schedule asynchronous work. The event loop proceeds as follows: 1) run the synchronous code; 2) run microtasks until the microtask queue is empty; 3) run one task from the task queue, then go back to step 2. For example:

console.log('Start');

setTimeout(() => {
  console.log('Timeout Task');
}, 0);

Promise.resolve().then(() => {
  console.log('Promise Microtask');
});

console.log('End');

Note that Promise callbacks (then/catch) and the code after an await are scheduled as microtasks, while setTimeout schedules a task. Therefore, the output order is:

Start
End
Promise Microtask
Timeout Task
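Step 2 drains the microtask queue until it is empty, and that includes microtasks queued by other microtasks. A small sketch (plain Node.js, recording events in an array instead of printing them one by one) shows that even a chain of three microtasks runs before a zero-delay timer:

```javascript
const order = [];

// A zero-delay timer: this goes into the task queue.
setTimeout(() => order.push('timeout task'), 0);

// A microtask that queues another microtask, which queues a third via a Promise.
queueMicrotask(() => {
  order.push('microtask 1');
  queueMicrotask(() => {
    order.push('microtask 2');
    Promise.resolve().then(() => order.push('microtask 3'));
  });
});

order.push('sync');

// Inspect the order once the timer has fired.
setTimeout(() => {
  console.log(order.join(' -> '));
  // prints: sync -> microtask 1 -> microtask 2 -> microtask 3 -> timeout task
}, 10);
```

A consequence worth remembering: a microtask that keeps queueing microtasks can starve timers and I/O callbacks indefinitely.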

JavaScript Asynchronous Programming

A synchronous function blocks until its work is done; an asynchronous function returns without blocking and delivers its result later through a callback.

Callback Functions

The concept of callback functions is simple: a callback function defines what to do after an asynchronous task finishes:

// Example: AJAX - Asynchronous JavaScript and XML
var xhr = new XMLHttpRequest();
 
xhr.onload = function () {
    document.getElementById("log").innerHTML += " xhr.onload() ";
    document.getElementById("demo").innerHTML=xhr.responseText;
}
 
// Send an asynchronous GET request
xhr.open("GET", "https://www.runoob.com/try/ajax/ajax_info.txt", true);
xhr.send();
document.getElementById("log").innerHTML += " done ";

In the log element, done appears before xhr.onload(), indicating that xhr.send() did not block execution.
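The XHR example needs a browser. In Node.js the same pattern appears as error-first callbacks; a minimal sketch using the built-in fs.stat (any callback-based API would illustrate the same point):

```javascript
const fs = require('fs');
const os = require('os');

const log = [];

// fs.stat starts the I/O and returns immediately; the callback runs later.
fs.stat(os.tmpdir(), (err, stats) => {
  if (err) {
    log.push('error: ' + err.message); // Node convention: the error comes first
    return;
  }
  log.push('callback: isDirectory = ' + stats.isDirectory());
  console.log(log);
});

// This line runs before the callback, just like " done " in the XHR example.
log.push('done');
```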

Promise

The Promise constructor takes an "executor function" as its argument. The executor function is synchronous and runs immediately when the Promise is constructed. It receives resolve and reject as parameters: on success it calls resolve with the result; otherwise it calls reject with the reason for failure.

The asynchronous part is that calling resolve or reject settles the Promise, after which the callbacks passed to then or catch are run as microtasks.

const promise = new Promise((resolve, reject) => {
  // Asynchronous operation
  setTimeout(() => {
    if (Math.random() < 0.5) {
      resolve('success');
    } else {
      reject('error');
    }
  }, 1000);
});
 
promise.then(result => {
  console.log(result);
}).catch(error => {
  console.log(error);
});

After about one second, this program outputs either "success" or "error", each with probability 1/2.
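In practice the executor is most often used to wrap an existing callback-based API, calling resolve on the success path and reject on the error path. A sketch that wraps fs.stat by hand (statAsync is a hypothetical name; Node's util.promisify and fs.promises provide the same thing ready-made):

```javascript
const fs = require('fs');
const os = require('os');

// Wrap the error-first callback API fs.stat in a Promise.
function statAsync(path) {
  return new Promise((resolve, reject) => {
    fs.stat(path, (err, stats) => {
      if (err) reject(err); // failure: the Promise becomes rejected
      else resolve(stats);  // success: the Promise becomes fulfilled
    });
  });
}

statAsync(os.tmpdir())
  .then(stats => console.log('is directory:', stats.isDirectory()))
  .catch(err => console.log('failed:', err.code));

statAsync('/this/path/should/not/exist')
  .then(() => console.log('unexpected success'))
  .catch(err => console.log('failed:', err.code)); // typically ENOENT
```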

Q: Since the executor function is synchronous, why can it call the asynchronous setTimeout within its body?
A: As mentioned earlier, a synchronous function blocks, while an asynchronous function returns without blocking and reports back through a callback. Nothing prevents a synchronous function from calling an asynchronous one. On the contrary, if a synchronous function f needs to perform a time-consuming operation, it is better to do it asynchronously, because that shortens the time f blocks the main thread.
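The ordering can be checked directly: the executor body runs during new Promise(...), the setTimeout it starts fires later, and the then callback runs only after resolve is called. A minimal sketch:

```javascript
const order = [];

const p = new Promise(resolve => {
  order.push('executor');     // runs immediately, synchronously
  setTimeout(() => {           // asynchronous work started from synchronous code
    order.push('timer fired');
    resolve('value');
  }, 0);
});
order.push('after constructor');

p.then(v => {
  order.push('then: ' + v);
  console.log(order.join(' -> '));
  // prints: executor -> after constructor -> timer fired -> then: value
});
```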

Async Function

ECMAScript 2017 (ES8) standardized asynchronous functions (async function), which are now widely supported by browsers and by Node.js.

Calling an async function returns a Promise object.

The await keyword is typically applied to calls of functions that return a Promise. When function f evaluates await g(), g is called and runs synchronously until it returns or reaches its own await; f is then suspended, and the statements after the await in f are resumed as a microtask only once the Promise returned by g has settled.

The await keyword can only be used within the body of an async function (or at the top level of an ES module). This is because a synchronous function must execute the statements in its body in order, without its control flow being suspended.
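For comparison, the Promise example from the previous section can be rewritten with async/await, where a rejected Promise surfaces as an exception thrown at the await. In this sketch the outcome is a parameter (instead of Math.random()) so that both paths are visible; randomTask and run are hypothetical names:

```javascript
// Same behavior as the Promise example, but the caller chooses the outcome.
function randomTask(succeed) {
  return new Promise((resolve, reject) => {
    setTimeout(() => (succeed ? resolve('success') : reject('error')), 100);
  });
}

async function run(succeed) {
  try {
    // run is suspended here; the rest resumes as a microtask once the Promise settles.
    const result = await randomTask(succeed);
    console.log(result);
    return result;
  } catch (error) {
    console.log(error); // a rejection is thrown at the await
    return error;
  }
}

const pending = run(true);
console.log(pending instanceof Promise); // true: an async function always returns a Promise
run(false);
```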

Comprehensive: JavaScript Asynchronous Programming Interview Question

Question: In what order do the debugging statements in the following program output? Why?

async function async1() {
    console.log('async1 start')
    await async2()
    console.log('async1 end')
}
async function async2() {
    console.log('async2 start')
    await async3()
    console.log('async2 end')
}
async function async3() {
    console.log('async3')
}
console.log('script start')
setTimeout(() => {
    console.log('setTimeout')
}, 0);
async1()
new Promise(resolve => {
    console.log('promise1')
    resolve()
})
    .then(() => {
        console.log('promise2')
    })
console.log('script end')

First, all synchronous code runs in order:

Output "script start"
setTimeout registers a task (macrotask)
async1() is called and outputs "async1 start". Evaluating await async2() first calls async2
async2 outputs "async2 start". Evaluating await async3() first calls async3
async3 outputs "async3" and returns an already-resolved Promise, so the rest of async2 is queued as a microtask and async2 suspends. The Promise returned by async2 is still pending, so async1 suspends until it settles, and control returns to the top-level script
A Promise is constructed; its executor runs immediately and outputs "promise1". Since resolve() is called synchronously, the then callback is queued as a microtask
Finally, output "script end"

Then the microtask queue is drained:

The rest of async2 runs and outputs "async2 end". async2 then returns, which resolves its Promise and queues the rest of async1 at the back of the microtask queue
The then callback of the resolved Promise runs and outputs "promise2"
The queue is still not empty: the rest of async1 runs and outputs "async1 end"

Finally, one task is taken from the task (macrotask) queue:

Output "setTimeout"
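The walkthrough can be checked by running the same program with each console.log replaced by a push into an array (only so the order can be inspected programmatically). On a current Node.js release the array ends up in the order described above; older engines spent extra microtask ticks per await, so answers written for them may differ:

```javascript
const out = [];
const log = msg => out.push(msg); // stands in for console.log

async function async1() {
  log('async1 start');
  await async2();
  log('async1 end');
}
async function async2() {
  log('async2 start');
  await async3();
  log('async2 end');
}
async function async3() {
  log('async3');
}

log('script start');
setTimeout(() => {
  log('setTimeout');
  console.log(out.join('\n')); // print the full order once everything has run
}, 0);
async1();
new Promise(resolve => {
  log('promise1');
  resolve();
}).then(() => {
  log('promise2');
});
log('script end');
```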

Structure of Electron Applications

Electron's Process Model

Electron's process model is inherited from Chromium's multi-process architecture, in which each tab is rendered in its own process so that a crash in one tab does not take down the others. A single main (browser) process manages all of the tabs' renderer processes.

Electron application developers control two kinds of processes: the main process (main) and renderer processes (renderer). The main process runs in a full Node.js environment and can load modules with require; a renderer process behaves like a browser page and, by default, has no Node.js environment and cannot require modules.

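Before the renderer process loads its web content, it runs preload.js. The preload script has higher privileges than renderer.js (it can use Electron APIs such as contextBridge and ipcRenderer) and can expose selected interfaces to renderer.js.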

Inter-process Communication in Electron

Communication between main.js and renderer.js goes through the preload script preload.js. Specifically, preload.js uses contextBridge to register the functions that should be exposed to renderer.js. The simplest example is:

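const { contextBridge } = require('electron');
// Assumes marked v4 or later, which exports { marked } and provides marked.parse().
// Requiring an npm package here may require sandbox: false in the window's webPreferences.
const { marked } = require('marked');

contextBridge.exposeInMainWorld('api', {
  render: (markdown) => marked.parse(markdown),
  // f1: (...) => {...},
  // f2: (...) => {...}
});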

Callback functions complicate matters, because ipcRenderer, which carries messages between the Electron main process (main.js) and the renderer process (renderer.js), is callback-based.

The main process listens on a set of named channels and registers a callback for each one, and the renderer in turn has to register callbacks for the messages the main process sends back. This makes inter-process communication harder to follow.
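Electron code cannot run outside an Electron app, so as a rough model of this callback style the sketch below uses Node's EventEmitter as a stand-in for the IPC channels. The channel names (chat-request, stream-chunk, stream-end) and the two emitters are hypothetical; in real Electron code the corresponding calls would be ipcMain.on and webContents.send on the main side, and ipcRenderer.send and ipcRenderer.on in preload.js:

```javascript
const { EventEmitter } = require('events');

// Two emitters stand in for the two directions of an IPC connection.
const toMain = new EventEmitter();     // renderer -> main
const toRenderer = new EventEmitter(); // main -> renderer

// "Main process": listen on a channel and register a callback for it.
toMain.on('chat-request', (prompt) => {
  const chunks = ['Echo: ', prompt]; // pretend these arrive from a streaming API
  chunks.forEach((chunk, i) => {
    setTimeout(() => toRenderer.emit('stream-chunk', chunk), i * 10);
  });
  setTimeout(() => toRenderer.emit('stream-end'), chunks.length * 10);
});

// "Renderer": register reply callbacks first, then send the request.
let text = '';
toRenderer.on('stream-chunk', (chunk) => { text += chunk; });
toRenderer.on('stream-end', () => console.log('final text:', text));
toMain.emit('chat-request', 'hello');
```

Even in this toy version, the flow of one request is spread across three callbacks on two sides, which is exactly what makes the real code harder to follow.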

Packaging Applications for Different Platforms

This application is packaged with Electron Forge. Following the tutorial:

npm install --save-dev @electron-forge/cli
npx electron-forge import

This step modifies package.json and package-lock.json and creates forge.config.js. Next, open the modified package.json and manually fill in the author and description fields. Finally, run:

npm run make

By default, Electron Forge builds an application for your current platform only. To build for other platforms, you need to change the maker configuration, which after the import step lives in forge.config.js.
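For orientation, a sketch of a forge.config.js resembling what the import step generates. The exact contents depend on the Forge version and template, so treat this list as an assumption and compare it with your own file; each maker listed also needs its npm package installed as a dev dependency:

```javascript
// forge.config.js (sketch): each entry in makers produces one output format.
module.exports = {
  packagerConfig: {},
  makers: [
    { name: '@electron-forge/maker-squirrel', config: {} },       // Windows installer
    { name: '@electron-forge/maker-zip', platforms: ['darwin'] }, // zip archive for macOS
    { name: '@electron-forge/maker-deb', config: {} },            // Debian/Ubuntu package
    { name: '@electron-forge/maker-rpm', config: {} },            // Fedora/RHEL package
  ],
};
```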

The application for your current platform appears in the out folder. If you copy the executable alone to another folder, it no longer runs because the DLLs it needs are missing: it only works alongside the DLLs and other resources in the same folder (including the .env file from the source folder, which provides the OpenAI API key).

Pitfalls in Web Development

Input using Textarea

Q: How to make a textarea box for user input occupy the full width?
A: Set display: flex on the parent element, and flex-grow: 1 on the textarea so that it takes up the remaining space.

Q: How can a textarea adjust its height as the user inputs text?
A: CSS alone is not enough; you need JS to listen for every input event and adjust the height:

// Adjust textarea height based on content
textarea.addEventListener('input', function () {
    this.style.height = 'auto'; // Reset the height
    this.style.height = Math.min(this.scrollHeight, 10 * 1.5 * 16) + 'px'; // Cap at 10 lines: 10 lines * 1.5 line-height * 16px font
});

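The key step is setting the height to the element's scroll height (scrollHeight); without it, the textarea never changes height. Resetting the height to 'auto' first matters too: scrollHeight never reports less than the current height, so without the reset the box could grow but never shrink when text is deleted.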
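The same logic can be wrapped in a reusable helper that also shows a scrollbar only once the cap is reached. A sketch; fitTextareaHeight and its parameters are hypothetical, and the line height is passed in rather than read from CSS:

```javascript
// Resize a textarea-like element to fit its content, capped at maxLines.
// lineHeightPx is an assumption, e.g. 16px font * 1.5 line-height = 24px.
function fitTextareaHeight(el, maxLines, lineHeightPx) {
  const maxHeight = maxLines * lineHeightPx;
  el.style.height = 'auto'; // reset so scrollHeight can shrink after deletions
  const target = Math.min(el.scrollHeight, maxHeight);
  el.style.height = target + 'px';
  // Show a scrollbar only when the content is taller than the cap.
  el.style.overflowY = el.scrollHeight > maxHeight ? 'auto' : 'hidden';
  return target;
}

// In the renderer it would be attached like this:
// textarea.addEventListener('input', () => fitTextareaHeight(textarea, 10, 24));
```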

How to Specify HTML Elements

By opening the developer tools in a web browser (including an Electron window) and switching to the Console tab, you can access the global variables defined in renderer.js. You can also start from the document variable and walk the DOM (Document Object Model), for example document.children[0].children[1]...

After you type an expression for an HTML element into the console, the developer tools display the object's type, highlight the corresponding area on the page, and offer autocompletion. This lets you work out how to reach the element you want to modify from your JS code. The element accesses in the snippet below were worked out this way.

api.onStreamChunk((chunk) => {
    console.log("chunk = ", chunk);
    // Append chunk to the output element
    const c = messageList.children; // declare c instead of creating an implicit global
    c[c.length-1].children[0].innerText += chunk;
});

Rare Bug: Timing of Setting Callback Functions

When asked how to display the response as it is being generated, a code-generation tool produced the following solution:

const outputElement = document.getElementById('output'); // Assuming there's an element with id 'output'

async function main() {
  const messages = [{ role: "user", content: "Say this is a test" }];

  // Start the streaming process
  await window.api.startStream(messages);

  // Handle each chunk of data as it arrives
  window.api.onStreamChunk((chunk) => {
    outputElement.textContent += chunk; // Append chunk to the output element
  });

  // Handle the end of the stream
  window.api.onStreamEnd(() => {
    console.log('Stream ended');
  });
}

// Call the main function on page load or button click
main();

With this solution, the response never appears on the page, and the callbacks passed to onStreamChunk and onStreamEnd are never called. Where is the problem?

The await is the culprit. The rest of main does not run until startStream has returned, but startStream only returns after the main process has received all the chunks, sent each one over the stream-chunk channel, and finally sent stream-end. So while the main process is sending stream-chunk messages, the corresponding callbacks have not been registered yet, and nothing handles those messages.
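The race can be reproduced without Electron. The sketch below builds a hypothetical mock of window.api on Node's EventEmitter, reusing the names startStream, onStreamChunk and onStreamEnd from the generated code; the mock's startStream only resolves after every chunk and the end event have been emitted, mirroring the behavior described above. Registering the listener before starting the stream fixes the problem:

```javascript
const { EventEmitter } = require('events');

// Hypothetical mock of the preload API; each call creates an independent channel.
function makeApi() {
  const bus = new EventEmitter();
  return {
    onStreamChunk: (cb) => bus.on('stream-chunk', cb),
    onStreamEnd: (cb) => bus.on('stream-end', cb),
    // Resolves only after all chunks and the end event have been emitted.
    startStream: async (messages) => {
      for (const chunk of ['This ', 'is ', 'a ', 'test']) {
        await new Promise((r) => setTimeout(r, 5)); // simulate network delay
        bus.emit('stream-chunk', chunk);
      }
      bus.emit('stream-end');
    },
  };
}

const messages = [{ role: 'user', content: 'Say this is a test' }];

// Buggy order: the listener is registered after the stream has already finished.
async function buggy(api) {
  let text = '';
  await api.startStream(messages);
  api.onStreamChunk((chunk) => { text += chunk; });
  return text;
}

// Fixed order: register the listener first, then start the stream.
async function fixed(api) {
  let text = '';
  api.onStreamChunk((chunk) => { text += chunk; });
  await api.startStream(messages);
  return text;
}

buggy(makeApi()).then((t) => console.log('buggy:', JSON.stringify(t)));
fixed(makeApi()).then((t) => console.log('fixed:', JSON.stringify(t)));
```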

There are two lessons. First, when using await, know exactly what you are waiting for. Second, it is best to register callbacks at the top level, for example at the top of the file, and especially not inside the body of another callback such as a click handler. Otherwise the logic becomes confusing: a new callback is registered every time the outer callback runs, which is rarely what you want. Below is an example of such confusing code:

addbutton.addEventListener('click', async () => {
  ...
  api.onStreamChunk((chunk) => {
    ...
  })
})
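To see why registering inside a click handler goes wrong, assume (hypothetically) that onStreamChunk simply forwards to ipcRenderer.on, which adds one more listener on every call. Modeling that with an EventEmitter, three clicks leave three listeners, so each chunk is appended three times; registering once at the top level avoids this. makeApi, onClickA and onClickB are illustrative names:

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the preload API: each call to onStreamChunk adds
// one more listener, as ipcRenderer.on does.
function makeApi() {
  const ipc = new EventEmitter();
  return { ipc, onStreamChunk: (cb) => ipc.on('stream-chunk', cb) };
}

// Confusing version: the listener is registered inside the click handler.
const apiA = makeApi();
let outputA = '';
function onClickA() {
  apiA.onStreamChunk((chunk) => { outputA += chunk; });
  // (start the request here)
}

// Clearer version: register once at the top of the file.
const apiB = makeApi();
let outputB = '';
apiB.onStreamChunk((chunk) => { outputB += chunk; });
function onClickB() {
  // (start the request here)
}

// Simulate three clicks on each version, then one chunk arriving.
for (let i = 0; i < 3; i++) { onClickA(); onClickB(); }
apiA.ipc.emit('stream-chunk', 'x');
apiB.ipc.emit('stream-chunk', 'x');
console.log(apiA.ipc.listenerCount('stream-chunk'), outputA); // prints: 3 xxx
console.log(apiB.ipc.listenerCount('stream-chunk'), outputB); // prints: 1 x
```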