Fork me on GitHub
#parinfer
<
2017-07-20
>
cfleming01:07:20

@shaunlebron Ok, thanks. My problem is that instead of returning a whole string, I just return a series of edits, since that’s much more efficient in IntelliJ (the edits are nearly always small).

cfleming01:07:41

I’m finding that the edits are not consistently created in one space or the other (input or output)

cfleming01:07:56

I think it’s probably a subtlety in the paren trail handling - I also process the whole document, I don’t break it into lines. So if result.x isn’t correctly updated when closing the paren trail out, that wouldn’t affect your processing but probably affects mine.

cfleming01:07:49

Returning edits also meant that I got the cursor movement for free, since that wasn’t there previously.

cfleming01:07:59

I think the issue is this: when replaceWithinLine is called, that then shifts the x values that you need to use for changes after the current one, unless the edits are consistently made in input space.

cfleming01:07:52

That happens in commitChar because ch is set to "". However it doesn’t happen in correctParenTrail, but I think it doesn’t matter for you because you’re modifying line by line.

cfleming02:07:59

I’m thinking I’m going to try to consistently produce the edits in input space, which is what I had previously and I already have the code for working out the correct deltas to apply the changes. The only complication is that I then need the paren trail x values in input space, hopefully that’s not too tricky.

cfleming02:07:53

@shaunlebron Actually, do you only use the output coords for knowing where to make your edits? If I want my edits to be in input coords, could I simplify things by only working with input coords?

shaunlebron02:07:44

i track my edits in input coords

shaunlebron02:07:05

but I’m not able to follow much of what you said actually

shaunlebron02:07:49

is the divergence in behavior due to an inconvenience in how parinfer.js is behaving, related to intellij?

shaunlebron02:07:16

i know there may be particular reasons for creating separate implementation, perhaps mostly to explore ideas that may inform core—having said that, I think it will be difficult to help you keep yours in sync with parinfer.js as changes will be inevitable due to the pace of smartMode changes

cfleming03:07:23

The edits I’m seeing are definitely not all in input coords. For example, edits are often created from the paren trail x values, and they’re created from result.x

cfleming03:07:40

Should they be created from result.inputX?

cfleming03:07:48

So my initial implementation diverged from how parinfer.js worked for a couple of reasons. Mostly the difference was that I process the whole input document rather than splitting it into lines, to reduce GC pressure.

cfleming03:07:48

Similarly, I return a series of edits to apply rather than returning a whole new document.

cfleming03:07:35

When I was perf testing, node was actually faster at a lot of this string manipulation than the JVM, presumably they have to optimise it based on usage patterns since JS doesn’t have a mutable StringBuffer-like thing.

cfleming03:07:00

That made the performance much better, and wasn’t a significant change to the algorithm.

cfleming03:07:36

However those changes seem to have been complicated by the new input/output coord stuff.

cfleming03:07:06

It may also just be a simple bug in my port, but it’s hard to tell - there’s quite a lot of code now, and it’s subtle stuff in places.

shaunlebron03:07:19

ah, I’m sorry I was thinking of changes when you said edits for some reason

cfleming03:07:46

No, I mean the x values passed to replaceWithinLine

cfleming03:07:17

But it actually seems like the paren trail x values should be in input coords anyway, since they’re compared to the cursorX values.

shaunlebron03:07:59

yeah, edits are incremental, so they have to be in output coords given what i’m currently doing

shaunlebron03:07:09

cursorX is updated when edits are made behind it

cfleming03:07:49

So if I wanted all my edits to be in input coords, do you think I could get rid of the x/inputX distinction altogether?

cfleming03:07:07

Is that likely to complicate my life?

cfleming03:07:41

I accumulate a list of edits, and when I’m applying them maintain a running delta, so I take care of the incremental nature at that time.

shaunlebron03:07:43

I think that running delta will probably be fine

cfleming03:07:21

Ok. I’ll try those changes in my port and see if that helps.

shaunlebron03:07:12

I guess you keep track the line number and column as a derivative value of your “global” index?

cfleming03:07:30

I’ll see afterwards if I could apply the same changes to the JS version, and send a pull request if so. It’s a fairly minor change to not require splitting into lines, and that will help the GC on JS anyway even if it’s better at optimising it.

shaunlebron03:07:15

cool! I think line splitting and array joining is super fast on JS, but probably not elsewhere?

cfleming03:07:34

Right, I maintain an offset which is a char offset into the original doc. Then I count lines as I see newlines, and reset the x at that point.

shaunlebron03:07:06

I just have no concept of how fast insertion and removal of characters inside a giant string can be

cfleming03:07:16

Right, but on a big file you’re still (at least) doubling the memory requirements - in a long-running editor that can add up.

cfleming03:07:54

It would be interesting to see if creating an array of edits was faster in JS too.

shaunlebron03:07:06

this is a bit of a blind spot for me—isn’t a single character insertion inside a giant string reallocating the whole thing anyway?

cfleming03:07:25

I suspect that in any editor that maintains non-trivial indices, applying small edits is likely much faster.

shaunlebron03:07:45

oh, I see, you’re not reallocating, you’re modifying the editor buffer in-place with a list of edits

cfleming03:07:46

Well, editors don’t generally use a giant string for their documents.

shaunlebron03:07:15

I like this idea

cfleming03:07:49

It would be interesting to know if e.g. Atom works natively with a single offset into a document rather than cols/lines - I’m willing to bet that it requires some calculation to work out line/col coords.

cfleming03:07:14

Although they probably do some trickery to make it efficient.

cfleming03:07:30

What might be possible would be an API which returns a list of changes, and then one like the current one that calls the other, and then applies the edits to the original string and returns that. Building that new string is probably still much more efficient than line splitting, since you’re only splitting/joining strings where things have actually changed.

cfleming03:07:18

Ok, I have to go now - I might actually look at getting the JS version working like this first, which would make my port easier, and you can take a look at it.

shaunlebron03:07:37

that would be great, thanks colin!

cfleming03:07:48

Then I can start from something working, and ensure the tests keep working.

cfleming03:07:06

Cool, seeya