Fork me on GitHub
#parinfer
<
2016-02-24
>
cfleming00:02:46

BTW benchmark.js shows the main parinfer algorithm running at 8ms, I think the previous benchmarking code didn’t warm the VM up enough.

cfleming04:02:46

Actually, I just updated those with a few things I saw while porting the change to the JVM and force pushed the new changes: https://github.com/cursive-ide/parinfer/commit/0cfa086277accb1371776c8f71707b1ec9402257 and https://github.com/cursive-ide/parinfer/commit/7c99ba029a19bb86a6a44f71b27e56b42e46f233

shaunlebron07:02:50

hey @snoe, loud and clear on the change api concerns. the early change api was particular to input, whereas this new stuff is now about output.

shaunlebron07:02:19

i’m looking through the changes now

shaunlebron07:02:47

I got distracted and read through nvim-parinfer.js, nice code @snoe!

shaunlebron07:02:53

@cfleming, hey thanks for forwarding the changes

shaunlebron07:02:01

I looked at calculateText to get a feel for the structure of result.edits

shaunlebron08:02:54

I’m seeing how the full text can be constructed from these edits

shaunlebron08:02:46

but I’m not seeing how this edit structure helps an editor make incremental changes to a document

shaunlebron08:02:09

i might be stuck in viewing edits as a sequence of mutations to a string though

shaunlebron08:02:41

if I apply the first edit to the document, the location of the second edit must be offset by the amount added or removed by the first edit

shaunlebron08:02:12

maybe parinfer should accept readChar and writeChar editor functions

shaunlebron08:02:28

so we can directly write to and read from the editor buffer

shaunlebron08:02:40

each would have to take a lineNo and x coordinate though

shaunlebron08:02:32

that would allow us to do away with 1. tracking all edits in terms of source space replacements (colin’s new solution) 2. making the edits twice: one for parinfer’s line array, and one for the editor’s buffer (current solution)

shaunlebron08:02:11

and about performance differences, i think any solution that changes the way edits are made will have no significance on JS because string ops seem super optimized, and there seem to be few edits that Parinfer will make in really_long_file (as you mentioned)

cfleming23:02:08

@shaunlebron: Giving parinfer direct access to the editor buffer is an interesting idea. I’d worry that all editors might not allow that, or that it might not be efficient for something like neovim’s RPC.

cfleming23:02:40

You’re basically right that the edits are a sequence of mutations to a string (the editor buffer).

cfleming23:02:16

However applying these edits is likely to be much more efficient than replacing the whole text.

cfleming23:02:04

The problem is that it’s difficult to know what the best return value is to allow efficient implementation across all editors. Perhaps @chrisoakman might have a better idea of that.

cfleming23:02:30

I’m going to try implementing the Cursive version with the diffs since for IntelliJ I’m almost certain it’s better.

chrisoakman23:02:30

catching up on this now

chrisoakman23:02:56

I haven't really looked at the big diffs you linked

chrisoakman23:02:24

Is there a one or two sentence summary of what this is about?

chrisoakman23:02:47

a series of small character changes as opposed to text replacement?

cfleming23:02:52

Basically, my change works by creating a series of diffs to … right

cfleming23:02:41

In IntelliJ, this has a lot of advantages. The document isn’t stored as one linear string, and small diffs can be more efficiently applied.

cfleming23:02:09

It will also create more sensible diff events for anything else listening to document events.

cfleming23:02:22

I speculate that it’s much faster, but have no idea how to benchmark it.

cfleming23:02:01

I thought that the parinfer algorithm would be much faster, but it’s not. It will certainly create less garbage, on the JVM at least, but that doesn’t appear to affect runtime.

chrisoakman23:02:27

that's really interesting; that is definitely not the case for the editors I have worked with

chrisoakman23:02:54

re: this statement "The document isn’t stored as one linear string, and small diffs can be more efficiently applied."

cfleming23:02:24

Oh, and the algorithm doesn’t have to explicitly track cursor movement, since the cursor will be moved automatically when applying the diffs (same for the selection)

cfleming23:02:50

Well, one thing is the API that the editor presents, and another thing is what it uses behind the scenes.

cfleming23:02:12

IntelliJ presents a view of the document as a linear CharSequence, but it’s not stored like that.

chrisoakman23:02:11

are there docs for the IntelliJ API?

cfleming23:02:58

I’d be very surprised if any editor actually used a single char buffer, since the vast majority of editing activity is small insertions and deletions at arbitrary points in the document. This would be very inefficient if you had to copy all the time to do that.

cfleming23:02:08

Not to speak of, what are you looking for?

cfleming23:02:46

There’s some Javadoc on some of the interfaces, but it doesn’t explain the implementation.

chrisoakman23:02:58

I was just curious

chrisoakman23:02:35

so everytime parinfer runs it returns a data structure of the edits to be made to the text it processed - is that correct?

chrisoakman23:02:05

my first thought is dealing with "undo"

cfleming23:02:07

Yes, and the current API (subject to discussion) returns a function that you can invoke to get the fully-processed text.

cfleming23:02:25

It’s possible that this change is only useful to me.

cfleming23:02:42

I have it ported to the Kotlin version but haven’t checked that in anywhere yet.

chrisoakman23:02:52

that would be almost impossible to work with the Sublime Text API

chrisoakman23:02:57

in terms of handling "undo"

chrisoakman23:02:08

it would be possible, but very very tricky

cfleming23:02:14

You can’t group a series of changes into a single undoable thing?

cfleming23:02:38

You must still have to group the text change and the related caret movement, right?

chrisoakman23:02:32

undo in Sublime Text was pretty difficult; it doesn't give you the option to apply any change and not have it added to the undo stack

cfleming23:02:46

That’s… limited.

chrisoakman23:02:02

so I basically had to create a sentinel command for any parinfer change, and then overwrite "undo" to check if that command was the last command and undo twice

chrisoakman23:02:40

I have no idea what it would do if someone else's plugin overwrote "undo" like the Parinfer plugin does; I suspect one or both of them would break

cfleming23:02:03

That’s… funky simple_smile

chrisoakman23:02:21

it took a while to 1) figure that out and 2) make it work correctly

chrisoakman23:02:37

Atom lets you apply changes to the buffer and skip the undo stack

chrisoakman23:02:50

so it would be relatively simple to apply N number of changes that all skipped the undo stack

cfleming23:02:56

Ok. I’m totally open to a decision that this change is not generally useful. I’ll definitely use it for my implementation.

chrisoakman23:02:26

Were you able to do whole line replacement in IntelliJ?

chrisoakman23:02:41

Was there a performance problem with that?

cfleming23:02:22

@shaunlebron proposed above a modification where we passed a function to parinfer so it could modify the document directly. It sounds like this is not an option due to the variety of editor APIs.

cfleming23:02:57

So in the original algorithm, I’m worried about the garbage that it creates, although I didn’t see any problems in the benchmarks.

chrisoakman23:02:02

yeah; I don't think that would work well in certain circumstances

cfleming23:02:19

On the JVM, String.substring() copies the underlying char array.

cfleming23:02:34

Since the V8 version is so fast, I suspect it doesn’t.

chrisoakman23:02:00

I thought the JVM version was faster than the JS version once we set up that benchmark?

cfleming23:02:26

They’re about the same. One of my changes was to use benchmark.js to benchmark the V8 version - it runs in about 8ms.

cfleming23:02:40

The previous code probably didn’t warm the JVM up enough

cfleming23:02:56

So that’s about the same speed as the JVM one is now.

chrisoakman23:02:32

seems like the sort of things parinfer is doing would be relatively simple for a VM to optimize

cfleming23:02:44

I profiled the code, it spends 70% of its time in processChar and updateParenTrailBounds.

chrisoakman23:02:46

object creation, string manipulation

chrisoakman23:02:59

yeah - that is consistent with the Python results too

cfleming23:02:50

I’m planning to use this diff version I have and implement it using the Cursive lexer - I speculate that will be faster due to the fact that updateParenTrailBounds will only run per token rather than per char.

cfleming23:02:26

Although it’s already pretty fast.

cfleming23:02:51

It’s really hard to determine what the performance impact in the editor as a whole of the different approaches is, though.