Digging into xonsh history backends

Motivation

I want to write a per-directory-history xontrib for xonsh, like the one I used for zsh. The piece of information that I need to do this is missing from xonsh history entries right now: the working directory a historical command was executed in. jimhester/per-directory-history tracks this by hooking into zsh's history searching and creation commands and putting each command in two history files, a global one and one specifically for the directory the command was executed in. I want to write my xontrib in the most xonshious (my word for something that fits the xonsh philosophy) way, so I don't want to rip this implementation scheme from per-directory-history and jam it into xonsh if there's a better way. So I have to see where I can collect, store, and read this metadata.

Some ideas I have so far are:

add a new history backend that writes entries with the additional metadata of wherever the command was executed
essentially the first idea, but instead of adding a new history backend, augment whatever history backend is in use with this new functionality (composition by way of monkey-patching)
listen to some existing hooks/events for xonsh history lookup and additions and add the functionality there

Or some combination of the above, depending on what I find.

This post documents my exploration of how history is implemented in xonsh.

I'll pay particular attention to how history backends work with the shell backend abstraction so that I can write a xontrib that is as agnostic as possible about the shell implementation in use (ptk, ptk2, readline, jupyter, etc.).

Reading the existing documentation first

So I figured that I should read any existing documentation first, since it's possible that:

The xonsh docs already include a section that tells me how to do this or something close to it
I might learn something that I realize is undocumented, and then I can contribute that back to the project docs

I found three documents dealing with history on xon.sh:

Tutorial: History - explains the richer model of history that xonsh offers, and introduces history command usage
Tutorial: Writing Your Own History Backend - walks through authoring a new history backend with a CouchDB-backed history backend and replacing the default history backend with this new one
History API - reference for the history classes, autogenerated from docstrings in the xonsh source

While each of these is good at doing what it says, notice that none of them discusses how history backends are instantiated or how history entries are constructed during shell execution. The History API docs come closest, but that's cheating because those docs are autogenerated from docstrings in the Python source for xonsh.

How history entries are managed

Since there is no smoking gun in the xonsh docs about how history backends are created or where the components of history entries come from, I decided to dig into the xonsh code now rather than later.

Rather than just explain when and how xonsh creates new history entries (which I will do some of), I also want to explain how I came to this understanding, since it's incredibly unlikely you're reading this doc just to learn how to write a clone of jimhester/per-directory-history.

xonsh has support for multiple history backends, as we know. It ships with 3 backend implementations: history.json.JsonHistory, history.sqlite.SqliteHistory, and the do-nothing history.dummy.DummyHistory.

These backends are implementations of the history backend abstraction history.base.History. history.base.History doesn't do anything useful on its own - it is just inherited by implementations and defines the things the xonsh shell expects a history backend to be able to do:

append (add something to the history)
flush (force whatever is in memory to persist to the backend's storage, such as disk)
items (getting items for the current history session)
all_items (getting... all the items)
info (providing shell history info)
run_gc (garbage collecting).
It also allows list-like behavior via index access and slicing with __getitem__.
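To make that interface concrete, here is a minimal in-memory sketch. ToyHistory is a hypothetical stand-in written for this post: it does not inherit from xonsh's real base class, and the real method signatures may differ. It only mirrors the method names listed above.

```python
class ToyHistory:
    """In-memory stand-in for a history backend (not xonsh's real base class)."""

    def __init__(self):
        self.buffer = []  # entries not yet "persisted"
        self.stored = []  # entries that flush() has moved to "storage"

    def append(self, cmd):
        """Add one history entry (a dict) to the in-memory buffer."""
        self.buffer.append(cmd)

    def flush(self):
        """Move buffered entries to storage (a real backend would write to disk)."""
        self.stored.extend(self.buffer)
        self.buffer.clear()

    def items(self):
        """Yield entries from the current session, oldest first."""
        yield from self.stored
        yield from self.buffer

    def all_items(self):
        """With only one session, this is the same as items()."""
        yield from self.items()

    def info(self):
        """Return a small summary of the backend's state."""
        return {"backend": "toy", "length": len(self)}

    def run_gc(self):
        """Nothing to collect in memory; a real backend might prune old files."""

    def __len__(self):
        return len(self.stored) + len(self.buffer)

    def __getitem__(self, index):
        """Support hist[0] and hist[1:3] by delegating to a list."""
        return list(self.items())[index]


hist = ToyHistory()
hist.append({"inp": "ls\n", "rtn": 0})
hist.append({"inp": "pwd\n", "rtn": 0})
hist.flush()
hist.append({"inp": "whoami\n", "rtn": 0})
print([entry["inp"] for entry in hist[1:]])
```

The point of the sketch is that calling code only ever touches these method names, so it cannot tell which backend it is talking to.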

The fact that history backends implement history.base.History is our first clue into how xonsh backends work. This fact means the xonsh shell does not interact directly with a history backend, so the shell's code doesn't know what backend it's working with - this is handled by our good friend polymorphism. For understanding how history entries are created, this establishes some constraints on what a history backend can accept as input - if the central part of the xonsh shell's code is interacting with a unique history backend through a generic abstraction, that unique history backend cannot use input that isn't passed into the generic abstraction. In other words, the xonsh shell gives a particular history item data structure to every history backend, no matter how special that history backend is, and if we want the history backend to be able to act on some other piece of data (such as the working directory the history item was executed in!), we have to alter that data structure.

The history entry data structure

I had trouble finding where these entries were defined and where they were appended to the history backend, but I soon realized I could drop an ipdb break statement into my active history backend's append method (JsonHistory.append) and use the debugger's where command to get a stacktrace, leading me directly to where xonsh appends history to the backend. I started up my debuggified xonsh, ran a command, watched as it paused in ipdb, and got the traceback:

(Note that you should make sure $XONSH_DEBUG is on or, alternatively, install xonsh as an editable package, so that you avoid amalgamation and can see your changes right away without re-running setup.py.)

eddie@eddie-ubuntu ~ $ echo 'hey'
hey
> /home/eddie/source/xonsh/xonsh/history/json.py(353)append()
    352         import ipdb; ipdb.set_trace()
--> 353         self.buffer.append(cmd)
    354         self._len += 1  # must come before flushing

ipdb> where
  /home/eddie/.virtualenvs/xonsh/bin/xonsh(7)<module>()
      5 __file__ = '/home/eddie/source/xonsh/scripts/xonsh'
      6 with open(__file__) as f:
----> 7     exec(compile(f.read(), __file__, 'exec'))

  /home/eddie/source/xonsh/scripts/xonsh(4)<module>()
      2 
      3 from xonsh.main import main
----> 4 main()

  /home/eddie/source/xonsh/xonsh/main.py(402)main()
    401         args = premain(argv)
--> 402         return main_xonsh(args)
    403     except Exception as err:

  /home/eddie/source/xonsh/xonsh/main.py(431)main_xonsh()
    430             try:
--> 431                 shell.shell.cmdloop()
    432             finally:

  /home/eddie/source/xonsh/xonsh/ptk2/shell.py(194)cmdloop()
    193                     line = self.precmd(line)
--> 194                     self.default(line)
    195             except (KeyboardInterrupt, SystemExit):

  /home/eddie/source/xonsh/xonsh/base_shell.py(375)default()
    374             tee_out = tee.getvalue()
--> 375             self._append_history(inp=src, ts=[ts0, ts1], tee_out=tee_out)
    376             self.accumulated_inputs += src

  /home/eddie/source/xonsh/xonsh/base_shell.py(410)_append_history()
    409         if hist is not None:
--> 410             hist.append(info)
    411             hist.last_cmd_rtn = hist.last_cmd_out = None

> /home/eddie/source/xonsh/xonsh/history/json.py(353)append()
    352         import ipdb; ipdb.set_trace()
--> 353         self.buffer.append(cmd)
    354         self._len += 1  # must come before flushing

Maybe that wouldn't have been so hard to track down manually, but now we know for sure: history is appended in BaseShell.default() through a method called BaseShell._append_history().
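If dropping into a debugger is inconvenient, the same "who called append?" question can be answered with the standard library's traceback module. This is a minimal sketch, not xonsh code: the three functions are hypothetical stand-ins that only borrow the names from the traceback above.

```python
import traceback

calls = []  # function names on the stack at the moment append() runs


def append(entry):
    # extract_stack() lists frames oldest first; the last one is append() itself.
    stack = traceback.extract_stack()
    calls.extend(frame.name for frame in stack)


def _append_history(entry):
    append(entry)


def default(line):
    _append_history({"inp": line})


default("echo 'hey'\n")
print(" -> ".join(calls[-3:]))
```

The last three names are the call chain that ends in append(), which is the same information the debugger's where command gave us.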

So what kind of information is passed to _append_history?

def default(self, line):
    """Implements code execution."""
    line = line if line.endswith("\n") else line + "\n"
    src, code = self.push(line)
    if code is None:
        return

    events.on_precommand.fire(cmd=src)

    env = builtins.__xonsh__.env
    hist = builtins.__xonsh__.history  # pylint: disable=no-member
    ts1 = None
    enc = env.get("XONSH_ENCODING")
    err = env.get("XONSH_ENCODING_ERRORS")
    tee = Tee(encoding=enc, errors=err)
    try:
        ts0 = time.time()
        run_compiled_code(code, self.ctx, None, "single")
        ts1 = time.time()
        if hist is not None and hist.last_cmd_rtn is None:
            hist.last_cmd_rtn = 0  # returncode for success
    except XonshError as e:
        print(e.args[0], file=sys.stderr)
        if hist is not None and hist.last_cmd_rtn is None:
            hist.last_cmd_rtn = 1  # return code for failure
    except Exception:  # pylint: disable=broad-except
        print_exception()
        if hist is not None and hist.last_cmd_rtn is None:
            hist.last_cmd_rtn = 1  # return code for failure
    finally:
        ts1 = ts1 or time.time()
        tee_out = tee.getvalue()
        self._append_history(inp=src, ts=[ts0, ts1], tee_out=tee_out)
        self.accumulated_inputs += src
        if (
            tee_out
            and env.get("XONSH_APPEND_NEWLINE")
            and not tee_out.endswith(os.linesep)
        ):
            print(os.linesep, end="")
        tee.close()
        self._fix_cwd()
    if builtins.__xonsh__.exit:  # pylint: disable=no-member
        return True

In the finally block, we see inp is src, which, after digging around a bit into what happens above this call, appears to be the string that was typed into the command prompt, as opposed to code, which is the code object that was compiled from src and run (successfully or not). Interestingly, this means we are typing in source code each time we enter text into the xonsh REPL, and xonsh is compiling and running it. The essential piece of a history entry is a bit of uncompiled source code (like ls -alh or import sys)!
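The split between src and code has a direct analogue in plain Python. The sketch below uses the built-in compile() and exec() only; xonsh has its own parser and execution machinery (plain Python could not compile something like ls -alh), so treat this as an illustration of the idea rather than what self.push() does.

```python
import types

src = "x = 6 * 7\n"  # what the user typed, plus a trailing newline
code = compile(src, "<prompt>", "single")  # the compiled form of that text

ctx = {}  # namespace the code runs in, playing the role of the shell's context
exec(code, ctx)  # run the compiled code

print(type(src).__name__, type(code).__name__, ctx["x"])
```

Only src is something a history backend could sensibly store and show back to a user; the code object is just a means of running it.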

Let's follow an ls command entry from the prompt through BaseShell.default() and the code that appends the entry to history.

The code block picks up just after I've entered the ls command at the prompt.

> /home/eddie/source/xonsh/xonsh/base_shell.py(348)default()
    347         src, code = self.push(line)
--> 348         if code is None:
    349             return

ipdb> code
<code object <module> at 0x7f3682be44b0, file "/home/eddie/.virtualenvs/xonsh/lib/python3.7/site-packages/xontrib/fzf-widgets.xsh", line 1>
ipdb> src
'ls\n'

Note that code is apparently wrapped in some non-ls xontrib code I have installed. I'm unsure exactly why right now.

But note that src is the ls command I typed in, followed by a newline.

Once we get down to the actual appending, we see that ts0 and ts1 are the start and end timestamps of the code's execution. tee_out is simply the output of the command.

--> 377             self._append_history(inp=src, ts=[ts0, ts1], tee_out=tee_out)
    378             self.accumulated_inputs += src
    379             if (
    380                 tee_out
    381                 and env.get("XONSH_APPEND_NEWLINE")
    382                 and not tee_out.endswith(os.linesep)
    383             ):
    384                 print(os.linesep, end="")
    385             tee.close()
    386             self._fix_cwd()
    387         if builtins.__xonsh__.exit:  # pylint: disable=no-member
    388             return True
    389

ipdb> src
'ls\n'
ipdb> ts0
1560283660.1879137
ipdb> ts1
1560283660.3324323

Let's step into _append_history():

def _append_history(self, tee_out=None, **info):
    """Append information about the command to the history.

    This also handles on_postcommand because this is the place where all the
    information is available.
    """
    hist = builtins.__xonsh__.history  # pylint: disable=no-member
    info["rtn"] = hist.last_cmd_rtn if hist is not None else None
    tee_out = tee_out or None
    last_out = hist.last_cmd_out if hist is not None else None
    if last_out is None and tee_out is None:
        pass
    elif last_out is None and tee_out is not None:
        info["out"] = tee_out
    elif last_out is not None and tee_out is None:
        info["out"] = last_out
    else:
        info["out"] = tee_out + "\n" + last_out
    events.on_postcommand.fire(
        cmd=info["inp"], rtn=info["rtn"], out=info.get("out", None), ts=info["ts"]
    )
    if hist is not None:
        hist.append(info)
        hist.last_cmd_rtn = hist.last_cmd_out = None

It isn't the most exciting code. It is really just a matter of adding the return code and, if available, the output of the command to the info (history entry) provided to the backend. As a funny aside, most of this method is a heuristic for deciding whether to use the tee output or last_cmd_out from the history backend, and last_cmd_out seems to be an unused property in at least all the built-in history backends. It would be interesting to know why it ever existed at all!
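The four-way branch is easier to read when pulled out into a pure function. merge_output is a hypothetical helper name, not something in xonsh; the logic is copied from the method above.

```python
def merge_output(tee_out, last_out):
    """Return the text to store under "out", or None to store nothing."""
    tee_out = tee_out or None  # treat an empty string as "no output"
    if last_out is None and tee_out is None:
        return None
    if last_out is None:
        return tee_out
    if tee_out is None:
        return last_out
    return tee_out + "\n" + last_out


print(repr(merge_output("", None)))
print(repr(merge_output("test\n", None)))
print(repr(merge_output("a", "b")))
```

When the function returns None, the real method simply leaves the "out" key off the entry.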

The crucial thing we learn here, though, is that info is effectively what we've been calling the history entry. It is the "packet" (concretely, a dict) of information that is appended to the history. It defines what our history backend can save, delete, search, manipulate, etc. So any additional information we would need to add for our history backend would have to be added to info.

Let's take a look at the info object for two different cases. In the first, I'll call ls in a directory with exactly one empty regular file, test, and in the second I'll call grep something test in the same directory. The ls call will provide a successful return value and the grep call will not (since test is empty).

Calling ls:

{'inp': 'ls\n', 'ts': [1560285242.9592671, 1560285243.0506482], 'rtn': 0, 'out': 'test\n'}

Calling grep something test:

{'inp': 'grep something test\n', 'ts': [1560285373.0232306, 1560285373.125136], 'rtn': 1}

There you have it - all the information available to a history backend's append() method as far as I can tell.
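One practical consequence of these two entries is that "out" is optional, so anything consuming an entry should read it with .get(). A small sketch using the two dicts above; summarize is a hypothetical helper, not part of xonsh.

```python
ls_entry = {"inp": "ls\n", "ts": [1560285242.9592671, 1560285243.0506482],
            "rtn": 0, "out": "test\n"}
grep_entry = {"inp": "grep something test\n",
              "ts": [1560285373.0232306, 1560285373.125136], "rtn": 1}


def summarize(entry):
    """Reduce a history entry to the fields a backend is likely to care about."""
    start, end = entry["ts"]
    return {
        "command": entry["inp"].rstrip(),
        "succeeded": entry["rtn"] == 0,
        "seconds": end - start,
        "output": entry.get("out"),  # "out" is absent when nothing was captured
    }


for entry in (ls_entry, grep_entry):
    print(summarize(entry))
```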

Thoughts on where I should go

So I've been digging around in here to ultimately change what history items are loaded when a user interactively scrolls through the history, uses the history command, etc., with the aim of showing only the history items that are associated with the current working directory. To do that, I have to get cwd information into each history item.

To fast-forward a bit, I've now done that, and it's pretty simple, though it did require a change to the xonsh source code:

diff --git a/xonsh/base_shell.py b/xonsh/base_shell.py
index b7e9aff2..7088427f 100644
--- a/xonsh/base_shell.py
+++ b/xonsh/base_shell.py
@@ -393,6 +393,8 @@ class BaseShell(object):
         """
         hist = builtins.__xonsh__.history  # pylint: disable=no-member
         info["rtn"] = hist.last_cmd_rtn if hist is not None else None
+        if builtins.__xonsh__.env.get("XONSH_STORE_CWD"):
+            info['cwd'] = os.getcwd()
         tee_out = tee_out or None
         last_out = hist.last_cmd_out if hist is not None else None
         if last_out is None and tee_out is None:

Luckily, when I had asked whether such a xontrib as per-directory-history existed yet, xonsh creator Anthony Scopatz told me he'd be up for modifying the history mechanism to support this kind of xontrib, so we're good here.
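For comparison, the second idea from the start of this post (augmenting whatever backend is in use) could look roughly like the sketch below. This is not the route taken here, and ListHistory is a hypothetical stand-in rather than a real xonsh backend; the only assumption carried over is that append() receives the entry as a dict.

```python
import os


class ListHistory:
    """Hypothetical stand-in for an existing backend: stores entries in a list."""

    def __init__(self):
        self.entries = []

    def append(self, cmd):
        self.entries.append(cmd)


class CwdRecordingMixin:
    """Adds a "cwd" key to every entry before handing it to the real backend."""

    def append(self, cmd):
        cmd = dict(cmd, cwd=os.getcwd())  # copy so the caller's dict is untouched
        super().append(cmd)


class CwdListHistory(CwdRecordingMixin, ListHistory):
    pass


hist = CwdListHistory()
hist.append({"inp": "ls\n", "rtn": 0})
print(hist.entries[0]["cwd"] == os.getcwd())
```

Like the patch above, this reads the directory after the command has finished, so a cd ends up recorded under its destination directory.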

The next question I had was how I could make history:

aware of this new information
optionally able to use this information by installing a xontrib
hopefully prompt-backend agnostic

To make history aware of this new info, I had to alter the history backends - each history backend has a different way of handling the attributes of history items. I decided to follow a depth-first way of experimenting, hoping that if I got my history functionality working with JsonHistory, probably the most commonly used backend, I could either figure out how to get it working with other backends, or (less good) just make my xontrib available to people using the JsonHistory backend.

Next, I looked at where history strings are loaded by xonsh, so that I could start limiting the items loaded to those that matched by cwd. My thought was that each time history was searched by the user, by whatever mechanism, if I found the point where history strings were loaded, I could filter out those that didn't match.

I thought that creating a history backend with an overridden method would help, since custom history backends are easily pluggable with the XONSH_HISTORY_BACKEND environment variable, thus making any solution that used a custom one pretty easily installable via a xontrib.

Unfortunately there was no clear and easy way to override the history backend functionality to filter out history entries on arbitrary criteria, so I added yet another thing to the xonsh source:

diff --git a/xonsh/history/json.py b/xonsh/history/json.py
index 50b6326b..7313cfc9 100644
--- a/xonsh/history/json.py
+++ b/xonsh/history/json.py
@@ -328,6 +328,7 @@ class JsonHistory(History):
         self.tss = JsonCommandField("ts", self)
         self.inps = JsonCommandField("inp", self)
         self.outs = JsonCommandField("out", self)
+        self.cwds = JsonCommandField("cwd", self)
         self.rtns = JsonCommandField("rtn", self)

     def __len__(self):
@@ -382,10 +383,11 @@ class JsonHistory(History):
     def items(self, newest_first=False):
         """Display history items of current session."""
         if newest_first:
-            items = zip(reversed(self.inps), reversed(self.tss))
+            items = zip(
+                reversed(self.inps), reversed(self.tss), reversed(self.cwds))
         else:
-            items = zip(self.inps, self.tss)
-        for item, tss in items:
+            items = zip(self.inps, self.tss, self.cwds)
+        for item, tss, _ in items:
             yield {"inp": item.rstrip(), "ts": tss[0]}

     def all_items(self, newest_first=False, **kwargs):
@@ -413,10 +415,16 @@ class JsonHistory(History):
             if newest_first:
                 commands = reversed(commands)
             for c in commands:
-                yield {"inp": c["inp"].rstrip(), "ts": c["ts"][0]}
+                if self._include_history_item(c):
+                    yield {"inp": c["inp"].rstrip(), "ts": c["ts"][0]}
         # all items should also include session items
         yield from self.items()

+    def _include_history_item(self, item):
+        """Whether to include the history item.
+        Allows filtering history results by subclass."""
+        return True
+

In short, this diff just adds a method that checks whether a history item should be used, and in the default case (the JsonHistory base class), it simply allows all history items.
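The pattern is a plain overridable hook. Here is a self-contained sketch of it with toy classes: ToyJsonHistory and OnlyInDirectory are hypothetical names, and the entries are made up, but all_items() has the same shape as the patched method.

```python
class ToyJsonHistory:
    """Hypothetical stand-in for a backend whose all_items() consults a hook."""

    def __init__(self, commands):
        self.commands = commands  # pretend these were loaded from history files

    def all_items(self):
        for c in self.commands:
            if self._include_history_item(c):
                yield {"inp": c["inp"].rstrip(), "ts": c["ts"][0]}

    def _include_history_item(self, item):
        return True  # default: keep everything


class OnlyInDirectory(ToyJsonHistory):
    """Keeps only entries recorded in one directory."""

    def __init__(self, commands, directory):
        super().__init__(commands)
        self.directory = directory

    def _include_history_item(self, item):
        return item.get("cwd") == self.directory


commands = [
    {"inp": "ls\n", "ts": [1.0, 1.1], "cwd": "/home/eddie"},
    {"inp": "cd test\n", "ts": [2.0, 2.1], "cwd": "/home/eddie/test"},
    {"inp": "echo old\n", "ts": [0.5, 0.6]},  # written before cwd was stored
]
print([i["inp"] for i in ToyJsonHistory(commands).all_items()])
print([i["inp"] for i in OnlyInDirectory(commands, "/home/eddie").all_items()])
```

Note that entries written before cwd was stored have no "cwd" key, so a directory filter like this one silently drops them.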

In my xontrib, I created a custom history backend that performed the filtering I wanted:

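class JsonPerDirectoryHistory(JsonHistory):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_local_history = True

    def _include_history_item(self, item):
        # The run_in_terminal prints are verbose development logging only.
        run_in_terminal(lambda: print(f'Got item {item}'))
        run_in_terminal(lambda: print(f'Use local history: {self.use_local_history}'))
        if not self.use_local_history:
            # Global mode: include every item, as the base class does.
            run_in_terminal(lambda: print('Using item'))
            return True
        # Local mode: include only items recorded in the current directory.
        # Items without a 'cwd' key (stored before the patch) never match.
        if item.get('cwd') == os.getcwd():
            run_in_terminal(lambda: print('Using item'))
            return True
        run_in_terminal(lambda: print('Not using item'))
        return False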

Notice in this that there are some Prompt Toolkit-specific run_in_terminal print calls, which I've added just for very verbose logging while developing. I'd remove these when releasing this xontrib, assuming I keep support for non-Prompt Toolkit shells.

In the xontrib, I set a prompt_toolkit2 keybinding to switch this functionality on and off, and to tell the user what mode they've switched to:

import os
from sys import stdout

from prompt_toolkit import keys, print_formatted_text
from prompt_toolkit.application import run_in_terminal

from builtins import __xonsh__
from xonsh.history.json import JsonHistory
from xonsh.platform import ptk_shell_type


def toggle_per_directory_history():
    if isinstance(__xonsh__.history, JsonPerDirectoryHistory):
        hist = __xonsh__.history
        hist.use_local_history = not hist.use_local_history
        if hist.use_local_history:
            return 'local'
        else:
            return 'global'


@events.on_ptk_create
def custom_keybindings(bindings, **kw):
    def do_nothing(func):
        pass

    if ptk_shell_type() == 'prompt_toolkit2':
        binder = bindings.add
    else:
        binder = bindings.registry.add_binding

    key = ${...}.get('PER_DIRECTORY_HISTORY_TOGGLE')

    @binder(key)
    def switch_between_global_and_local_history(_):
        new_hist_type = toggle_per_directory_history()
        run_in_terminal(lambda: print(f'Switching to {new_hist_type} history.'))

Finally, I set my history backend to my custom one in my .xonshrc and turned on per-directory history:

from xontrib.per_directory_history import JsonPerDirectoryHistory
$XONSH_HISTORY_BACKEND = JsonPerDirectoryHistory
$XONSH_STORE_CWD = True

Initial results

Failure, mostly. Upon opening a gnome-terminal instance, I saw the debugging messages printed from my history backend, which was nice:

Got item {'cwd': '/home/eddie/source/xonsh', 'inp': 'ls\n', 'rtn': 0, 'ts': [1560543198.0524652, 1560543198.1038635]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie/source/xonsh', 'inp': 'z xonsh\n', 'rtn': 0, 'ts': [1560543197.440549, 1560543197.4461908]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie', 'inp': 'cd ..\n', 'rtn': 0, 'ts': [1560543047.8660042, 1560543047.8693159]}
Use local history: True
Using item
Got item {'cwd': '/home/eddie/test', 'inp': 'fancy mccheeese\n', 'rtn': 1, 'ts': [1560543038.735667, 1560543039.5000844]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie/test', 'inp': 'cd test\n', 'rtn': 0, 'ts': [1560543024.288666, 1560543024.2920816]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie', 'inp': 'ls\n', 'rtn': 0, 'ts': [1560543021.7835386, 1560543021.8042111]}
Use local history: True
Using item
Got item {'cwd': '/home/eddie', 'inp': 'ls\n', 'rtn': 0, 'ts': [1560543020.0776978, 1560543020.1177895]}
Use local history: True
Using item
Got item {'cwd': '/home/eddie/test', 'inp': 'cd test\n', 'rtn': 0, 'ts': [1560542986.1126633, 1560542986.1164727]}
Use local history: True
Not using item

My history backend was apparently being used to load existing history strings, and it was only returning those that matched the cwd, which, in a new gnome-terminal, is /home/eddie for me. Notice how commands with cwd info that matches /home/eddie are the only history items being used.

Cool. So I decided to switch to another directory. If things are working as expected, I should be able to enter history commands, go back through the history, and only get commands for this new directory. Before entering any commands in this directory, I shouldn't see any history items!

But I did. I could scroll back through all the history items that my command output just said were being used. What's more, if I switched to yet another directory, I could access all the commands I'd entered in the current prompt session since opening it.

Why???

This may be specific to Prompt Toolkit, but I don't know yet. It appears xonsh uses history backends in conjunction with Prompt Toolkit like this:

xonsh spins up a new shell instance in xonsh/shell.py when a new shell process is started
xonsh creates a Prompt Toolkit instance and hands it all the history data up to this point, which, in the case of JsonHistory, is all the history in our JSON history files
Prompt Toolkit runs the prompt
Each time a command is entered, Prompt Toolkit feeds this command to the xonsh shell, which appends the command history to the history backend in use
Independently, Prompt Toolkit maintains its own searchable/scrollable record of the history since the Prompt Toolkit instance was created
The next time a shell is loaded, the previous history is given to Prompt Toolkit from the xonsh history backend.

Do you see my problem? I had indeed changed what history items are loaded by the history backend, but I hadn't changed anything about what the Prompt Toolkit history mechanism does once a shell is up and running.
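The problem can be reproduced with two toy classes. This models only the sequence of steps described above, not Prompt Toolkit's or xonsh's actual internals; ToyBackend, ToyPrompt, and their methods are hypothetical names.

```python
class ToyBackend:
    """Stand-in for the xonsh history backend, filtering by directory on load."""

    def __init__(self, entries):
        self.entries = list(entries)

    def append(self, entry):
        self.entries.append(entry)

    def load_for(self, cwd):
        return [e["inp"] for e in self.entries if e["cwd"] == cwd]


class ToyPrompt:
    """Stand-in for the prompt's own history: seeded once, then independent."""

    def __init__(self, backend, cwd):
        self.strings = backend.load_for(cwd)  # the filter applies only here
        self.backend = backend

    def run(self, inp, cwd):
        self.backend.append({"inp": inp, "cwd": cwd})
        self.strings.append(inp)  # never re-filtered after startup


backend = ToyBackend([
    {"inp": "ls", "cwd": "/home/eddie"},
    {"inp": "cd test", "cwd": "/home/eddie/test"},
])
prompt = ToyPrompt(backend, cwd="/home/eddie")
prompt.run("cd source", cwd="/home/eddie/source")
prompt.run("make", cwd="/home/eddie/source")

print(prompt.strings)  # what scrolling back actually shows
print(backend.load_for("/home/eddie/source"))  # what per-directory history should show
```

The two printed lists differ: the prompt still offers ls from the startup directory, even though the backend's filter would exclude it for the directory the shell is in now.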

Where do I go from here?

I am going to bring this post to a close since it contains lots of cool info about how xonsh history backends work, but I will pick back up on my own historical exploits in another post.