= Design Notes

== Problems

Translating C to Go is harder than it looks.

Jan says: It's impossible in the general case to turn a C char* into
a Go []byte. For concrete C code it is probably often possible,
depending partly on the author's C coding style. The first problem
this runs into is that Go does not guarantee that a slice's backing
array keeps a stable address, because Go stacks are movable. C
expects the opposite: a pointer never magically modifies itself, so
some code will fail.

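For the concrete cases Jan mentions, one way to sidestep address stability is to never treat a char* as an address at all. The sketch below assumes the C code only ever moves a pointer within a single buffer, and it models char* as a slice plus an offset. The type cPtr and its methods are hypothetical illustrations, not part of ccgo. Because no raw address is stored, the runtime remains free to move the backing array.

```go
package main

import "fmt"

// cPtr is a hypothetical model of a C char* that never leaves one buffer:
// it stores the slice and an index into it, never a raw address, so it
// stays valid even if the runtime moves the backing array.
type cPtr struct {
	buf []byte
	off int
}

// deref corresponds to C's *p.
func (p cPtr) deref() byte { return p.buf[p.off] }

// add corresponds to C's p + n.
func (p cPtr) add(n int) cPtr { return cPtr{p.buf, p.off + n} }

// sub corresponds to C's p - q, which C only defines when both
// pointers point into the same array; this model assumes they do.
func (p cPtr) sub(q cPtr) int { return p.off - q.off }

// strlen is a line-by-line translation of the C function
//
//	size_t strlen(const char *s) { const char *p = s; while (*p) p++; return p - s; }
func strlen(s cPtr) int {
	p := s
	for p.deref() != 0 {
		p = p.add(1)
	}
	return p.sub(s)
}

func main() {
	buf := []byte("Hello\x00")
	fmt.Println(strlen(cPtr{buf, 0})) // 5
	fmt.Println(strlen(cPtr{buf, 2})) // 3
}
```

The model breaks down exactly where the general case does. Some C code casts a pointer to an integer (for example, to test alignment), and some compares pointers into different objects. Neither has a counterpart in a (slice, offset) pair.
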
// TODO: insert code examples illustrating the problem here

== How the parser works

There are no comment nodes in the C AST. Instead, every cc.Token has a
Sep field: https://godoc.org/modernc.org/cc/v3#Token

When the parser is configured to do so, Sep captures all white space
preceding the token, including any comments, combined into one
string. So we have all white space and comment information for every
token in the AST. The final white space/comment, preceding EOF, is
available as the TrailingSeperator field of the AST:
https://godoc.org/modernc.org/cc/v3#AST.

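To make the Sep convention concrete, here is a toy lexer that follows it. It is not cc's lexer. As simplifying assumptions, a token is any run of non-blank characters, and only white space and /* */ comments count as separators. Each token carries everything that preceded it. Whatever follows the last token is returned separately, in the role of TrailingSeperator, so concatenating every Sep and Value plus the trailing text reproduces the input exactly.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a toy version of the Sep idea: Sep holds all white space and
// comments that preceded the token, combined into one string.
type Token struct {
	Sep   string
	Value string
}

func isBlank(c byte) bool { return c == ' ' || c == '\t' || c == '\n' }

// lex splits src into tokens. Whatever follows the last token is returned
// as trailing, playing the role of AST.TrailingSeperator.
func lex(src string) (toks []Token, trailing string) {
	i := 0
	for {
		// Collect the separator: white space and /* */ comments.
		start := i
		for i < len(src) {
			if isBlank(src[i]) {
				i++
			} else if strings.HasPrefix(src[i:], "/*") {
				end := strings.Index(src[i+2:], "*/")
				if end < 0 {
					i = len(src) // unterminated comment: swallow the rest
				} else {
					i += 2 + end + 2
				}
			} else {
				break
			}
		}
		sep := src[start:i]
		if i == len(src) {
			return toks, sep
		}
		// Collect the token: a run of non-blank characters (toy rule).
		j := i
		for j < len(src) && !isBlank(src[j]) && !strings.HasPrefix(src[j:], "/*") {
			j++
		}
		toks = append(toks, Token{Sep: sep, Value: src[i:j]})
		i = j
	}
}

func main() {
	src := "/* an example comment */\n\nint main() { }\n"
	toks, trailing := lex(src)
	var b strings.Builder
	for _, t := range toks {
		fmt.Printf("Sep=%q Value=%q\n", t.Sep, t.Value)
		b.WriteString(t.Sep + t.Value)
	}
	b.WriteString(trailing)
	fmt.Println(b.String() == src) // the round trip loses nothing
}
```
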
To get the lexically first white space/comment for any node, use
tokenSeparator():
https://gitlab.com/cznic/ccgo/-/blob/6551e2544a758fdc265c8fac71fb2587fb3e1042/v3/go.go#L1476

comment() does the same, with a default value:
https://gitlab.com/cznic/ccgo/-/blob/6551e2544a758fdc265c8fac71fb2587fb3e1042/v3/go.go#L1467

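The linked helpers work on cc's own node types. The sketch below is a simplified, self-contained model of the same idea, not ccgo's code. It assumes a node is just its own tokens plus child nodes, and that every token carries a sequence number giving its lexical position. Here tokenSeparator returns the Sep of the token with the smallest sequence number in the subtree. The comment function returns that separator when it contains a C comment and the default otherwise. That reading of "with a default value" is an assumption, as are the types and signatures.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a simplified stand-in for cc.Token. Seq is the token's lexical
// position; Sep holds the white space and comments that preceded it.
type Token struct {
	Seq   int
	Sep   string
	Value string
}

// Node is a simplified AST node: its own tokens plus its child nodes.
type Node struct {
	Tokens   []Token
	Children []*Node
}

// tokenSeparator returns the Sep of the lexically first token in n's subtree,
// i.e. the token with the smallest Seq, or "" if the subtree has no tokens.
func tokenSeparator(n *Node) string {
	var first *Token
	var walk func(*Node)
	walk = func(m *Node) {
		if m == nil {
			return
		}
		for i := range m.Tokens {
			if t := &m.Tokens[i]; first == nil || t.Seq < first.Seq {
				first = t
			}
		}
		for _, c := range m.Children {
			walk(c)
		}
	}
	walk(n)
	if first == nil {
		return ""
	}
	return first.Sep
}

// comment returns n's leading separator if it contains a C comment,
// and dflt otherwise (an assumed reading of "with a default value").
func comment(dflt string, n *Node) string {
	if s := tokenSeparator(n); strings.Contains(s, "/*") || strings.Contains(s, "//") {
		return s
	}
	return dflt
}

func main() {
	// int main preceded by a comment, as in hello.c.
	spec := &Node{Tokens: []Token{{Seq: 1, Sep: "/* an example comment */\n\n", Value: "int"}}}
	name := &Node{Tokens: []Token{{Seq: 2, Sep: " ", Value: "main"}}}
	// Children are deliberately out of lexical order: tree order alone
	// does not identify the first token, Seq does.
	fn := &Node{Children: []*Node{name, spec}}

	fmt.Printf("%q\n", tokenSeparator(fn))  // "/* an example comment */\n\n"
	fmt.Printf("%q\n", comment("\n", name)) // "\n"
}
```
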
== Looking forward

Eric says: In my visualization of how the translator would work, the
output of a ccgo translation of a module at any given time is a file
of pseudo-Go code in which some sections may be enclosed in Unicode
bracketing characters (presently the guillemets « and », U+00AB and
U+00BB). A bracketed section means "this is not Go yet" and
intentionally makes the Go compiler barf. The brackets thus express a
color on the AST nodes: a node is colored either C (not translated
yet) or Go.

So, for example, if I'm translating hello.c with a ruleset that does not
include printf -> fmt.Printf, this:

---------------------------------------------------------
#include <stdio.h>

/* an example comment */

int main(int argc, char *argv[])
{
	printf("Hello, World!\n");
}
---------------------------------------------------------

becomes this without any explicit rules at all:

---------------------------------------------------------
package main

«#include <stdio.h>»

/* an example comment */

func main() {
	«printf(»"Hello, World!\n"«)»
}
---------------------------------------------------------

Then, when the rule printf -> fmt.Printf is added, it becomes:

---------------------------------------------------------
package main

import (
	"fmt"
)

/* an example comment */

func main() {
	fmt.Printf("Hello, World!\n")
}
---------------------------------------------------------

because with that rule the AST node corresponding to the printf
call can be translated and colored "Go". Translating it implies an
import of fmt. We then observe that no C-colored spans remain, and
drop the #include lines.

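The coloring scheme can be sketched in a few lines. This is a hypothetical model, not ccgo code, and the names Span, Rule, translateCall and file are all illustrative. Translator output is a list of spans colored C or Go, and rendering wraps the C spans in guillemets. A rule such as printf -> fmt.Printf recolors a whole call as Go and records the import it implies. The #include lines are dropped only once no C-colored span remains.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Span is a piece of translator output colored either Go or C ("not Go yet").
type Span struct {
	Text string
	Go   bool
}

// Rule is a hypothetical translation rule such as printf -> fmt.Printf.
type Rule struct {
	GoName string // replacement callee
	Import string // package the replacement requires
}

// render emits the pseudo-Go text, bracketing each C-colored span with
// guillemets so that the Go compiler rejects the file until it is resolved.
func render(spans []Span) string {
	var b strings.Builder
	for _, s := range spans {
		if s.Go {
			b.WriteString(s.Text)
		} else {
			b.WriteString("«" + s.Text + "»")
		}
	}
	return b.String()
}

// translateCall colors a C call. If a rule covers the callee, the whole call
// becomes Go and the rule's import is recorded; otherwise only the arguments
// (assumed to be valid Go already) are Go-colored and the call syntax stays C.
func translateCall(fn string, args []string, rules map[string]Rule, imports map[string]bool) []Span {
	argText := strings.Join(args, ", ")
	if r, ok := rules[fn]; ok {
		imports[r.Import] = true
		return []Span{{Text: r.GoName + "(" + argText + ")", Go: true}}
	}
	return []Span{{Text: fn + "("}, {Text: argText, Go: true}, {Text: ")"}}
}

// file assembles the output. While any C span remains, the #include lines
// stay as C-colored spans; once everything is Go they are dropped and the
// imports implied by the applied rules are emitted instead.
func file(includes []string, body []Span, imports map[string]bool) string {
	var head []Span
	hasC := false
	for _, s := range body {
		hasC = hasC || !s.Go
	}
	if hasC {
		for _, inc := range includes {
			head = append(head, Span{Text: inc}, Span{Text: "\n", Go: true})
		}
	} else if len(imports) > 0 {
		var lines []string
		for name := range imports {
			lines = append(lines, fmt.Sprintf("\t%q\n", name))
		}
		sort.Strings(lines) // map order is random; keep output stable
		head = append(head, Span{Text: "import (\n" + strings.Join(lines, "") + ")\n", Go: true})
	}
	return render(append(head, body...))
}

func main() {
	includes := []string{"#include <stdio.h>"}
	args := []string{`"Hello, World!\n"`}

	// No rules: the call syntax stays C-colored and the #include is kept.
	imports := map[string]bool{}
	fmt.Println(file(includes, translateCall("printf", args, nil, imports), imports))

	// With printf -> fmt.Printf: no C spans remain, so fmt is imported
	// and the #include is dropped.
	rules := map[string]Rule{"printf": {GoName: "fmt.Printf", Import: "fmt"}}
	imports = map[string]bool{}
	fmt.Println(file(includes, translateCall("printf", args, rules, imports), imports))
}
```

A real translator would color AST nodes rather than strings, but the flow is the one described above. Rules turn spans Go, each applied rule can imply imports, and C-only scaffolding disappears once nothing C-colored remains.
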
// end