Motivation
Finite state automata underlie many foundational areas of computer science, including regular language definitions, state machines in programs, model checking, and more. Their simple visual representation makes hand-drawing an automaton on paper or a whiteboard the typical first step of implementation. However, the translation from a visual representation to a working implementation is often tedious and error-prone. This project aims to create a system that takes a hand-drawn automaton, derives logical constraints from the image, and synthesizes an implementation of the desired automaton.
Solving this problem is important because it will reduce both the time and the potential for bugs when implementing systems based on automata, letting software developers focus more on the program logic and less on the minutiae of implementation. I am personally interested in this problem because my area of research is program synthesis, and an image-based program specification is novel in this area. I have also had to perform this automaton-image-to-implementation translation manually many times, and it was time-consuming and error-prone; a system like the one proposed would have been beneficial to me.
Project Goal
The goal of this project is to provide a system that takes an image of an automaton drawn on a whiteboard (including positive/negative examples) and synthesizes an executable program implementing the drawn automaton. Formally, we desire to extract the following components from the image:
- \(Q'\), the set of observed states
- \(F'\), the set of observed accepting states
- \(T'\), the set of observed transitions, where each element is a tuple in \(Q \times \Sigma \times Q\)
- \(S^+\) and \(S^-\), the sets of positive and negative examples
From these, find:
- \(Q\), the set of “true” states, including a zero (terminal error) state and zero or more “extra” states
- \(F\), the set of “true” accepting states. In the current implementation, this is always equal to \(F'\)
- \(q_0\), the initial state
- \(\delta : Q \times \Sigma \to Q\), a closed-form transition function
This is a true program synthesis problem, as \(T'\) may be incomplete and no initial state is provided in the image. It is possible to do a “mechanical” translation from \(T'\) to \(\delta\); however, this assumes that all transitions for all states and symbols are specified (or have reasonable defaults) and that all states (except a possible zero state) are present in the image. Additionally, a mechanical translation ignores any examples provided by the user, which prevents the tooling from performing any error checking.
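As a concrete (hypothetical) illustration of these components, consider an automaton over \(\Sigma = \{0, 1\}\) that accepts strings containing an even number of 1s. The extracted pieces might look like the following Python sketch; the state names and examples are invented for illustration, not taken from our test images:

```python
# Hypothetical extraction result for an "even number of 1s" automaton.
observed_states = {"q0", "q1"}            # Q': states found in the image
observed_accepting = {"q0"}               # F': double-circled states
observed_transitions = {                  # T': (source, symbol, target)
    ("q0", "1", "q1"),
    ("q1", "1", "q0"),
    ("q0", "0", "q0"),
    # note: ("q1", "0", "q1") was not drawn, so T' is incomplete
}
positive_examples = {"", "11", "0110"}    # S+
negative_examples = {"1", "011"}          # S-
```

The synthesizer must still determine the initial state and complete \(\delta\) for the transition on 0 out of q1, which was left unspecified by \(T'\).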
Prior Art
Recognition of hand-drawn finite automata diagrams has seen recent work. Bresler, Průša, and Hlaváč (2016) [1] studied *on-line* recognition of flow charts and finite automata, where the input is a set of line strokes from a drawing program and the output is a structural description of the diagram. This work was extended to off-line recognition by deriving line strokes from images and applying the existing on-line techniques. Other approaches, such as that of Schäfer, Keuper, and Stuckenschmidt (2021) [2], use off-line recognition to derive diagram features directly from the images using deep learning. I am less interested in deep learning techniques and more interested in “classical” computer vision techniques (i.e., engineering specific features rather than learning them), because the end result is a program specification, which must be exact and precise. I expect the format of a drawn diagram to be more restricted in my system than in a neural-network-based approach, but I believe the predictability and observability of my approach will increase confidence that the correct program specification is produced, as opposed to one that appears similar but is incorrect.
I was not able to find prior work addressing this exact problem of going from an automaton image all the way to a synthesized program. Similar approaches either rely on drawing the automaton on the computer (so the system is directly told all constraints and simply renders the automaton) or on direct input of the automaton constraints. Using an image as a program specification in this manner appears to be mostly novel [3]. Existing approaches either focus on UI design (e.g., Microsoft’s Sketch2Code) or on synthesizing image manipulations (e.g., using the actual image data as a specification, as in Ellis, Ritchie, et al., 2018 [4]), as opposed to specifying an executable program through diagrams in an image (i.e., extracting features and text that specify a program). For this particular case of automata, deriving a specification from an image is a natural step forward in usability.
Feature Extraction
To extract the observed states, transitions, and examples from the automaton image, we use “classical” computer vision techniques, relying heavily on morphological properties to distinguish the relevant features of the image. In particular, we detect states and example boxes through their circularity (resp. rectangularity), and arrows through their endpoints and orientation.
Engineering these features proved to be a challenging problem, mostly due to the medium: whiteboards are often dirty in practice, creating uneven “whiteboard noise”, compounded by messy drawing and uneven marker strokes. This led to significant reliance on heuristics to clean and distinguish features. However, we were ultimately able to engineer features that worked consistently across all our example images.
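As a rough sketch of how circularity-based state detection can work (assuming OpenCV and an already-binarized image; the thresholds here are illustrative, not the tuned values used in the project):

```python
import cv2
import numpy as np

def find_state_candidates(binary_img, min_area=500.0, min_circularity=0.7):
    """Return contours that plausibly correspond to drawn state circles.

    Circularity = 4*pi*area / perimeter^2 is 1.0 for a perfect circle
    and decreases for elongated or jagged shapes.
    """
    contours, _ = cv2.findContours(
        binary_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    candidates = []
    for contour in contours:
        area = cv2.contourArea(contour)
        perimeter = cv2.arcLength(contour, True)
        if area < min_area or perimeter == 0:
            continue  # too small to be a state, or degenerate
        circularity = 4 * np.pi * area / perimeter**2
        if circularity >= min_circularity:
            candidates.append(contour)
    return candidates
```

Rectangular example boxes can be filtered analogously, e.g., by comparing contour area against the area of the minimal bounding rectangle.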
Read more about feature extraction…
Specification Generation
Once the states, transitions, and examples have been extracted from an automaton image, a formal specification in SMT-LIB2 is generated. This specification provides a declarative description of the automaton, laying out the states, transitions, and examples directly. This is in contrast to the target program, which is a closed-form executable transition function.
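To make the shape of the specification concrete, here is a minimal sketch of the same style of encoding, written with Z3's Python bindings rather than the emitted SMT-LIB2 text; the states, symbols, and examples are hypothetical:

```python
from itertools import product
from z3 import Function, Int, IntSort, Solver

NUM_STATES = 3          # |Q|: observed states plus a zero (error) state
SYM_0, SYM_1 = 0, 1     # alphabet symbols encoded as integers

delta = Function("delta", IntSort(), IntSort(), IntSort())  # Q x Sigma -> Q
q0 = Int("q0")          # initial state, to be solved for

spec = Solver()
spec.add(0 <= q0, q0 < NUM_STATES)

# delta must be total over the state set (closed form).
for q, a in product(range(NUM_STATES), (SYM_0, SYM_1)):
    spec.add(0 <= delta(q, a), delta(q, a) < NUM_STATES)

# Observed transitions T' become direct equations on delta.
for src, sym, dst in [(1, SYM_1, 2), (2, SYM_1, 1), (1, SYM_0, 1)]:
    spec.add(delta(src, sym) == dst)

# A positive example "11" must end in an accepting state (here F' = {1}).
spec.add(delta(delta(q0, SYM_1), SYM_1) == 1)
# A negative example "1" must not.
spec.add(delta(q0, SYM_1) != 1)
```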
Read more about specification generation…
Synthesis
Once the SMT-LIB2 specification is generated, synthesis is as simple as passing the specification to an SMT solver, asking it to check satisfiability (“does a transition function and an initial state exist such that the specification is true?”), and having it produce a model (the satisfying function and initial state). This model contains both the desired initial state and a closed-form transition function. The SMT solver can also report that the specification is unsatisfiable, indicating an invalid automaton.
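In the pipeline, this amounts to little more than invoking the solver on the generated file. A hedged sketch, assuming the Z3 command-line tool and a specification file ending in `(check-sat)` and `(get-model)`:

```python
import subprocess

def synthesize(spec_path: str) -> str:
    """Run Z3 on a generated SMT-LIB2 specification file.

    Returns the model text (initial state and closed-form delta) on
    success; raises if the specification is unsatisfiable.
    """
    result = subprocess.run(["z3", spec_path], capture_output=True, text=True)
    if result.stdout.startswith("unsat"):
        raise ValueError("unsatisfiable specification: invalid automaton")
    return result.stdout  # "sat" followed by interpretations of delta and q0
```

The returned model can then be read off into an executable transition function in the target language.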
Read more about our program synthesis implementation…
Results
Our system was able to successfully synthesize all ten of the prepared example automata. The automata were divided into two categories: automata A1, A2, and A3 were development images, against which the system was initially developed; the remaining seven were designated as test automata, used to evaluate the system’s performance. Three of these test images were synthesized as-is, while the remaining four required tweaks to the system heuristics to properly detect all components.
| Name | All Features? | Extra States | Tweaks Needed |
| --- | --- | --- | --- |
| A1 | Yes | 0 | Development |
| A2 | Yes | 1 | Development |
| A3 | Yes | 1 | Development |
| Odd One | Yes | 0 | Transitions, OCR Heuristics |
| One of Each | Yes | 0 | None |
| Stutter | Yes | 0 | None |
| Multiple of 3 | Yes | 0 | OCR Heuristics |
| ASCII Digits | Yes | 0 | Transitions |
| Syllables | Yes | 0 | None |
| Five Ones | Yes | 0 | Transitions |
Discussion and Future Work
We were able to successfully engineer features that worked for all of our images, yielding 100% accuracy. This claim is, ultimately, a bit misleading, as many of the images required significant effort to tweak the heuristics enough to properly recognize their features while continuing to work on the earlier images.
Despite the required effort, I think this approach is valuable in the context of program synthesis, as opposed to deep learning techniques. In particular, since we are translating images into formal logic, it is more valuable to get an answer of “cannot recognize” from this system than some arbitrary value from a deep learning system.
Another advantage is debuggability: during development, I was able to track down exactly why various features were not being recognized and correct the system to account for the problem. This is in contrast to the OCR system: I was not able to debug strange results, such as the confusion of `B` and `1`, and I needed to use a layer of heuristics after the OCR process to account for situations like this.
Perhaps a two-layer approach would be valuable: a deep learning system segments the image and performs initial feature recognition, while a “classical” layer filters the output of the deep learning system and performs a rational reconstruction of the target automaton.
Three other areas of future work were identified:

- Improvements to OCR. Currently, only `A`, `B`, `0`, and `1` are supported. It would be nice to accept an arbitrary input language. It would also be nice to improve accuracy to the point where the post-OCR heuristics are no longer needed.
- Additional automata types. Currently, only deterministic finite automata are supported. Extending this to non-deterministic finite automata would be relatively trivial, and it would be interesting to attempt more exotic types, such as push-down automata, transducers, and Büchi automata.
- Synthesis under uncertainty. Currently, we accept the states, transitions, and examples provided by the user as absolute truth. However, instead of requiring “perfect” recognition, we could give confidence values of various features to the synthesizer to attempt to construct a “best” program. This is an interesting synthesis problem, but is this something a user would actually want, instead of the system simply asking for feedback on uncertain features?
References

1. Bresler, M., Průša, D. & Hlaváč, V. Online recognition of sketched arrow-connected diagrams. IJDAR 19, 253–267 (2016). https://doi.org/10.1007/s10032-016-0269-z
2. Schäfer, B., Keuper, M. & Stuckenschmidt, H. Arrow R-CNN for handwritten diagram recognition. IJDAR 24, 3–17 (2021). https://doi.org/10.1007/s10032-020-00361-1
3. An honorable mention goes to the joke paper “93% of Paint Splatters are Valid Perl Programs” by McMillen and Toady.
4. Ellis, K., Ritchie, D., Solar-Lezama, A. & Tenenbaum, J. Learning to infer graphics programs from hand-drawn images. Advances in Neural Information Processing Systems 31 (2018).