Bookwyrm — development thread - Programming On Unix
Since sometime last year I've been working on and off on bookwyrm, a CLI program that takes some input that identifies a book (e.g. authors, title, year of release) and then searches through a variety of sources. Found items that match the input are (will be) presented in a menu à la mutt, where the user can select which items to download.

The project started when I began uni and ran into the requirement of textbooks that cost a lot of money. I found out about a few specific sources that more often than not had these books available as PDFs, and it felt only natural to want a way to get them via the command line. And here we are.

I began writing the project in Python. And while progress was fast, the program felt like it lacked structure, and I constantly remade the data structures. After a while I started rewriting what I had in C++, and while there's still some stuff left, I feel like I've written better code.

Project history aside, I'm creating this thread in hopes of spawning some discussion about design, both project- and usage-wise. Best case scenario, I'll end up with a specification well ahead of v1.0.0.

So far, I've decided that the program takes POSIX-style arguments, after which a mutt-like menu (more of a list, really) opens and presents all found items and their properties. The user can then mark which items to download. The list will update as items are found.
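A minimal sketch of what POSIX-style parsing could look like with getopt(3). The flags -a (author), -t (title) and -y (year) and the Options struct are hypothetical placeholders; the thread doesn't list bookwyrm's real options.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <unistd.h>  // POSIX getopt(), optarg, optind, opterr

// Hypothetical option set; the real bookwyrm flags may differ.
struct Options {
    std::string author;
    std::string title;
    int year = -1;  // -1 means "not given"
};

// Parses POSIX-style short options. Returns std::nullopt on an unknown
// option, a missing option-argument or a malformed year.
std::optional<Options> parse_options(int argc, char* argv[])
{
    Options opts;
    optind = 1;  // start scanning at argv[1]
    opterr = 0;  // don't let getopt print its own messages
    int c;
    while ((c = getopt(argc, argv, "a:t:y:")) != -1) {
        switch (c) {
        case 'a': opts.author = optarg; break;
        case 't': opts.title = optarg; break;
        case 'y': {
            char* end = nullptr;
            long y = std::strtol(optarg, &end, 10);
            if (*optarg == '\0' || *end != '\0') return std::nullopt;  // not a number
            opts.year = static_cast<int>(y);
            break;
        }
        default:  // '?': unknown option or missing argument
            return std::nullopt;
        }
    }
    return opts;
}
```

Leaning on getopt instead of hand-rolled parsing gives standard behaviour for free, such as `-aKnuth` (argument attached to the flag) and `--` ending option processing.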

Project-wise, an item (pdf, djvu, epub, etc.) has exact, non-exact and misc properties. The first group is anything that can be represented with an integer/enum (year, file format, etc.). Non-exact properties are titles, authors, etc.: stuff that can be mistyped. These are stored as strings and matched fuzzily. The misc group is for anything else. (Currently it only contains the ISBN(s) of the item.)
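One way the three property groups could be modelled is sketched below. The field names, the Format enum and the use of -1 for "unknown" are assumptions for illustration, not bookwyrm's actual types.

```cpp
#include <string>
#include <vector>

// Illustrative only: not bookwyrm's real definitions.
enum class Format { unknown, pdf, djvu, epub };

struct Exact {                       // anything representable as an integer/enum
    int year = -1;                   // -1: unknown / not given
    Format format = Format::unknown;
};

struct NonExact {                    // free-form text, compared fuzzily
    std::string title;
    std::vector<std::string> authors;
};

struct Misc {                        // everything else
    std::vector<std::string> isbns;
};

struct Item {
    Exact exact;
    NonExact nonexact;
    Misc misc;
};

// Exact properties either match or they don't. A field left unset in the
// wanted item acts as a wildcard.
bool exact_matches(const Exact& wanted, const Exact& found)
{
    if (wanted.year != -1 && wanted.year != found.year) return false;
    if (wanted.format != Format::unknown && wanted.format != found.format) return false;
    return true;
}
```

Only the exact group gets a yes/no check like this; the non-exact strings would go through a fuzzy score with some threshold instead.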

Progress-wise, I've so far implemented argument parsing, logging, and item matching. Since I enjoyed using it when the project was written in Python, I've decided to port fuzzywuzzy (project here; there is some bug-smashing to be done).
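For comparison while porting, here is a simpler fuzzy score: the Levenshtein distance normalised by the longer string's length. Note that this is a swapped-in technique. fuzzywuzzy's `ratio` is built on difflib's SequenceMatcher (or python-Levenshtein when installed), so this sketch will not reproduce its exact numbers.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance (insert, delete and substitute all cost 1),
// keeping only two rows of the dynamic-programming table.
std::size_t levenshtein(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Similarity in [0, 100]; 100 means identical strings.
int similarity(const std::string& a, const std::string& b)
{
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100;
    double d = static_cast<double>(levenshtein(a, b));
    return static_cast<int>(std::lround(100.0 * (1.0 - d / longest)));
}
```

This compares bytes, not characters, so a non-ASCII letter in UTF-8 counts as two or more edits.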

Initially, I planned to write support for sources in C++, but I think scripting these might be a better idea. This way, the C++ part is the "front-end" that handles user input and filters items, whereas the scripts feed the program found items. It also allows a user to easily add support for more sources (including those that cannot be pushed upstream). Is this a good idea? (Keeping the front-end this simple, I might one day rewrite it in C.)

So: any comments? Is there anything you think I should plan to implement? Is there anything I should change?

Bookwyrm can be found here.

So I've implemented Python interfacing via a not-yet-merged branch of pybind11. Presently, a Python script constructs the exact and non-exact PODs as dicts and then passes a tuple back to the program, from which the final item is created.

I chose Python mainly for requests (edit: and BeautifulSoup), which make source parsing so much easier, but if this thing catches on and people request bindings for other languages, I'll take a look at those too.

Next up is implementing the actual sources and writing the menu. Library Genesis was already implemented in the python/master branch, but will need some revisiting to work with bookwyrm (and will probably need some cleanup; I don't recall it being good code).

When it comes to the menu, I've seen it mentioned multiple times that ncurses isn't very fun to work with, so I'll look for alternatives. Are there any you'd recommend?

Aside from this, I'm not entirely sure how to handle paths to find sources. Right now, a relative path is hard-coded for debug mode, but that won't fly in a release. I want user-defined sources to live in $XDG_CONFIG_HOME/bookwyrm/sources, but I'm not sure how portable that is. Can I safely assume $HOME/.config when $XDG_CONFIG_HOME isn't set? And for system-wide sources, will /etc/bookwyrm/sources work on *nix systems, or do some store these elsewhere?
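On the XDG question: the XDG Base Directory Specification itself says to use $HOME/.config when $XDG_CONFIG_HOME is unset or empty, to ignore relative paths in these variables, and to look for system-wide configuration in $XDG_CONFIG_DIRS (colon-separated, preference-ordered, default /etc/xdg). A sketch of resolving both, with the bookwyrm/sources suffix taken from the post; whether to additionally check /etc/bookwyrm/sources is left open.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Value of an environment variable if it is set and absolute; the XDG spec
// says relative paths in these variables should be ignored.
static std::string absolute_env(const char* name)
{
    const char* v = std::getenv(name);
    if (v == nullptr || v[0] != '/') return "";
    return v;
}

// $XDG_CONFIG_HOME/bookwyrm/sources, falling back to $HOME/.config.
std::string user_sources_dir()
{
    std::string base = absolute_env("XDG_CONFIG_HOME");
    if (base.empty()) {
        const char* home = std::getenv("HOME");
        if (home == nullptr || home[0] == '\0') return "";  // nowhere to look
        base = std::string(home) + "/.config";
    }
    return base + "/bookwyrm/sources";
}

// System-wide candidates from $XDG_CONFIG_DIRS (default /etc/xdg), in the
// order of preference given by the variable.
std::vector<std::string> system_sources_dirs()
{
    const char* v = std::getenv("XDG_CONFIG_DIRS");
    std::string dirs = (v != nullptr && v[0] != '\0') ? v : "/etc/xdg";
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= dirs.size()) {
        std::size_t end = dirs.find(':', start);
        if (end == std::string::npos) end = dirs.size();
        std::string d = dirs.substr(start, end - start);
        if (!d.empty() && d[0] == '/') out.push_back(d + "/bookwyrm/sources");
        start = end + 1;
    }
    return out;
}
```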
What follows is the first draft of the v1.0.0 specification (note: I do not know how to write specifications):
Program start:
After command line options have been parsed (and no quit-directly flag has been passed), verify them and the download directory.
Terminate if the user didn't pass any "main" options or if we can't write to the download directory.
Finally, construct the wanted item from the passed options.
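The two termination checks above could look roughly like this. Treating author and title as the "main" options is an assumption, and validate_startup is a hypothetical name.

```cpp
#include <string>
#include <unistd.h>  // access()

// Returns an error message, or an empty string when it is fine to start.
std::string validate_startup(const std::string& author,
                             const std::string& title,
                             const std::string& download_dir)
{
    if (author.empty() && title.empty())
        return "no main option given (need at least an author or a title)";
    // Creating a file inside a directory needs both write and search
    // (execute) permission on that directory.
    if (access(download_dir.c_str(), W_OK | X_OK) != 0)
        return "cannot write to download directory: " + download_dir;
    return "";
}
```

access() only answers for the moment it is called; permissions can change before the download starts, which is why the download code should still handle a failing open() even after the second write-OK check.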

Finding sources:
bookwyrm will first try to import any Python modules found in $XDG_CONFIG_HOME/bookwyrm/sources,
after which it will do the same with modules in /etc/bookwyrm/sources/ (or your platform's equivalent).

Running the scripts:
The scripts will be given the wanted item and will be able to fetch configuration options from $XDG_CONFIG_HOME/bookwyrm/config.ext, which will hold login credentials and whatnot. Support for external commands, such as pass, is planned. The format will be either .yaml or ini-style, I think (to section out source-specific options).
With this, and whatever other modules it wants (or perhaps a selection? Hard to enforce via the program itself, I think), it will search for the wanted item on a single source.
Found items will be passed to bookwyrm.
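If the ini-style route is chosen, per-source sections need very little code. A minimal sketch follows; the section names and keys in the example (libgen, pass_cmd) are hypothetical, and the parser does not handle quoting or escapes.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// section -> (key -> value); keys before any header go in section "".
using Config = std::map<std::string, std::map<std::string, std::string>>;

static std::string trim(const std::string& s)
{
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Reads "[section]" headers and "key = value" lines; '#' and ';' start
// comment lines, and lines without '=' are ignored.
Config parse_ini(std::istream& in)
{
    Config cfg;
    std::string section, line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#' || line[0] == ';') continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        cfg[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}

Config parse_ini_string(const std::string& text)
{
    std::istringstream in(text);
    return parse_ini(in);
}
```

A script for a given source would then only be handed its own section. The YAML alternative would need a library such as yaml-cpp instead.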

The menu:
While the scripts all run asynchronously, a menu (à la mutt) will open that presents all found items.
Each entry will represent a single found item. The item info seen on this screen will be whatever info is available from the source's search result (e.g. a lot will be available from Library Genesis, since a lot of info is available from /search.php).
From this menu, the user can enter an entry (like opening an email in mutt) to try and fetch more data about the item.
Data is fetched from some database (Worldcat, Open Library (both? priority list?)) and written out in some neat format.

Returning to the main menu, the user can select which items to download.
The downloads — done after a second write-OK check on the download dir (we never know, permissions might change) — will most likely be handled with libcurl.

I have some other ideas that might fit in, but they are better suited for releases past the first major.
This looks promising. I imagine aggregating data from multiple sources is hard, yet still very useful.

So you ported a library and a program from Python to C++, while still interacting with Python modules. That sounds like a fairly complete and rich programming experience.

ncurses is probably hard to deal with, I cannot even compile it from the latest source. :-(

I suggest going with an external tool such as dmenu or an equivalent for the terminal, or building a simple one from scratch in plain C / C++.

You can print terminal escape sequences like '\033[7m\033[K' to highlight a whole line, then '\033[m' at the end. Then you can redraw the screen at every cursor movement with '\033[H\033[J', which also puts the cursor at the top.
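Put together, one full redraw with those sequences could be built like this (render_menu is a hypothetical name). As pointed out in the reply below, hard-coding VT100/ANSI sequences is not portable; terminfo, via curses or tput, is the portable way to look them up.

```cpp
#include <string>
#include <vector>

// Builds one full redraw of a list menu with raw escape sequences:
//   \033[H\033[J  cursor to top-left, then clear to end of screen
//   \033[7m       reverse video on; \033[K then paints the whole line
//   \033[m        reset attributes
std::string render_menu(const std::vector<std::string>& entries, std::size_t selected)
{
    std::string out = "\033[H\033[J";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (i == selected)
            out += "\033[7m\033[K" + entries[i] + "\033[m";
        else
            out += entries[i];
        out += "\n";
    }
    return out;
}
```

The caller writes the returned string to the terminal after every cursor movement.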
(28-05-2017, 05:00 AM)josuah Wrote: You can print terminal escape sequences like '\033[7m\033[K' to highlight a whole line, then '\033[m' at the end. Then you can redraw the screen at every cursor movement with '\033[H\033[J', which also puts the cursor at the top.
This won't be portable; that's the reason we use curses, which wraps termcap/terminfo.
On the command line it would be preferable to use `tput(1)` instead of the escape codes you shared, and the same goes for any language.
(28-05-2017, 05:00 AM)josuah Wrote: So you ported a library and a program from python to C++ [...]
Not such a big feat as it might sound. I never finished the Python program to begin with, and the library was written in CPython (Cython?), so it was just a matter of removing Python interface cruft and wrapping it in some C++. I can't yet completely reproduce the behaviour of the Python library, though. I have a feeling it might be because I'm not using the wchar functions.
(28-05-2017, 09:26 AM)Tmplt Wrote: Not such a big feat as it might sound.

All right... But you still did it. :-P

(28-05-2017, 09:26 AM)Tmplt Wrote: I have a feeling it might be because I'm not using the wchar functions.

If you think this is the reason, you can try replacing every str* function with its wcs* counterpart, and every char with wchar_t. There are also the print functions (wprintf and friends), and string literals to convert from "string" to L"string", but otherwise this should be enough.

I only tried this once, though.
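One caveat with the wchar route: the width of wchar_t is platform-dependent (16 bits on Windows, 32 on Linux/glibc), and mbstowcs() depends on the current locale's LC_CTYPE. A locale-independent alternative, swapped in here rather than taken from either post, is to decode UTF-8 explicitly into std::u32string so that every code point is one element.

```cpp
#include <string>

// Decodes UTF-8 into one char32_t per code point. Invalid or truncated
// sequences become U+FFFD. Sketch only: overlong encodings and surrogate
// code points are not rejected.
std::u32string utf8_decode(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;  // continuation bytes expected after the lead byte
        char32_t cp;
        if (c < 0x80)                { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else { out += U'\uFFFD'; ++i; continue; }  // stray or invalid byte

        bool ok = s.size() - i - 1 >= extra;  // enough bytes left?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok) { out += U'\uFFFD'; ++i; continue; }
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

Running an edit-distance routine over the decoded strings instead of the raw bytes makes "é" cost one edit instead of two, which is one plausible source of score differences against the Python library.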

(28-05-2017, 07:02 AM)venam Wrote: This won't be portable; that's the reason we use curses, which wraps termcap/terminfo.
On the command line it would be preferable to use `tput(1)` instead of the escape codes you shared, and the same goes for any language.

You are right: even if the terminals for which this does not work are rare, you can run into some that break. My approach was to ignore and avoid those terminals, at the cost of being less flexible (and more lazy).
I've begun working on the menu using ncurses's menu library. I had some trouble getting my hands on documentation, but then I found an answer on SO that pointed to the man pages! Who'd'a thunk it? So far I haven't had any bigger issues with it, so given all the flak it appears to get, I assume I haven't reached that part of the implementation yet.

Luckily, I stumbled upon my first multithreading problem this early in development. It seems solved with mutex guards, but since I lack deeper knowledge about them, I'll have to revisit this part.
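For reference, the usual shape of a mutex-guarded container shared between source threads and the menu is sketched below; ItemList and its methods are hypothetical names, and items are plain strings to keep it short. The key rule is that every access, reads included, takes the same mutex.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Source threads append; the menu thread reads a snapshot.
class ItemList {
public:
    void add(std::string item)
    {
        // lock_guard unlocks automatically at end of scope, even on exceptions.
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
    }

    // Copying under the lock lets the menu draw without holding the mutex
    // (and without seeing the vector mid-reallocation).
    std::vector<std::string> snapshot() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock it
    std::vector<std::string> items_;
};
```

Usage: each source thread calls `add()` as it finds items, and the menu calls `snapshot()` whenever it redraws.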

Aside from that, I learned that regular print-outs to stdout are written on top of curses's stdscr. I always thought of curses as some kind of "layer" atop stdout, so that if something was written to stdout while curses was up, the user would see it after the program terminated. Is something like this possible? I'd like to log warnings and errors to std{out,err} during runtime, but should I instead save them until the program has shut down curses?
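A common answer to the question above is yes, hold them back: queue messages while the full-screen UI owns the terminal and print them once endwin() (or termbox's tb_shutdown()) has restored it. Writing logs to a file is the other usual option. A minimal sketch of the queueing approach; DeferredLog is a hypothetical name.

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <vector>

// While the UI is active, messages are queued instead of written, because
// anything printed to stdout/stderr would land on top of the screen.
class DeferredLog {
public:
    explicit DeferredLog(std::ostream& out = std::cerr) : out_(out) {}

    void set_ui_active(bool active)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ui_active_ = active;
    }

    void log(const std::string& msg)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (ui_active_) pending_.push_back(msg);
        else out_ << msg << '\n';
    }

    // Call after the terminal has been restored: writes out the queue.
    void flush()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& m : pending_) out_ << m << '\n';
        pending_.clear();
    }

    std::size_t pending() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    std::ostream& out_;
    mutable std::mutex mutex_;  // sources may log from their own threads
    bool ui_active_ = false;
    std::vector<std::string> pending_;
};
```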
After a few days of tinkering with why set_menu_items() would sometimes segfault when using an intermediate std::vector to add new entries to the menu, I got fed up with ncurses and decided to try termbox instead.

With termbox, the terminal is just a big canvas of character cells. You use tb_change_cell() to alter a single cell on it, and then print the changes out with tb_present(). Making a simple menu wasn't as troublesome as I thought it would be. As it looks now, I'll keep using it. Hopefully it's portable enough.
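Since termbox only draws cells, deciding which entries are visible is up to the menu itself. A sketch of that logic, independent of termbox's API (scroll_top is a hypothetical helper): given the current first visible row, return the new one so the selection stays on screen, scrolling only when the selection leaves the window.

```cpp
#include <cstddef>

// Index of the first entry to draw in a window of `height` rows, given the
// previous first index `top` and the selected entry.
std::size_t scroll_top(std::size_t top, std::size_t selected, std::size_t height)
{
    if (height == 0) return selected;
    if (selected < top) return selected;                         // moved above the window
    if (selected >= top + height) return selected - height + 1;  // moved below it
    return top;                                                  // still visible
}
```

The draw loop would then write entries top through top + height - 1 cell by cell with tb_change_cell() and finish with tb_present().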

Here is an asciinema of the initial menu implementation in action:

Entries prefixed with '2' are from a second Python source script (and thus a second thread).
Bookwyrm is slowly taking shape. The index menu has seen some improvement:
The part where the index menu is minimized is where I open another (unfinished) screen. In this screen, the program will fetch some item details from Open Library and print them out in a nice format. Hopefully we'll end up fetching info about the right item, but as long as we get an ISBN (which most sources provide), it shouldn't be a problem.
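Sources may report the same book as ISBN-10 or ISBN-13, with or without hyphens, so normalizing before the lookup helps. A sketch (normalize_isbn is a hypothetical name) that strips separators, validates check digits, and converts ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Returns the ISBN-13 form, or std::nullopt if the input is not a valid ISBN.
std::optional<std::string> normalize_isbn(const std::string& raw)
{
    std::string d;
    for (char c : raw)
        if (c != '-' && c != ' ')
            d += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));

    // ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    auto isbn13_check = [](const std::string& first12) {
        int sum = 0;
        for (std::size_t i = 0; i < 12; ++i)
            sum += (first12[i] - '0') * (i % 2 == 0 ? 1 : 3);
        return static_cast<char>('0' + (10 - sum % 10) % 10);
    };
    auto all_digits = [](const std::string& s) {
        for (char c : s)
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        return true;
    };

    if (d.size() == 10) {
        if (!all_digits(d.substr(0, 9))) return std::nullopt;
        char last = d[9];
        if (!std::isdigit(static_cast<unsigned char>(last)) && last != 'X') return std::nullopt;
        // ISBN-10 rule: sum of digit * (10 - position) must be divisible by 11.
        int sum = 0;
        for (int i = 0; i < 9; ++i) sum += (d[i] - '0') * (10 - i);
        sum += (last == 'X') ? 10 : last - '0';
        if (sum % 11 != 0) return std::nullopt;
        std::string twelve = "978" + d.substr(0, 9);
        return twelve + isbn13_check(twelve);
    }
    if (d.size() == 13 && all_digits(d)) {
        if (isbn13_check(d.substr(0, 12)) != d[12]) return std::nullopt;
        return d;
    }
    return std::nullopt;
}
```

For example, 0-306-40615-2 and 978-0-306-40615-7 both normalize to 9780306406157.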

Soon I can get my hands dirty writing Python scripts, but before that I'll need to find a way to gently interrupt a std::thread. Unfortunately, this isn't something C++ allows natively. I need it because we don't want to wait for the source scripts to finish before we can terminate the program. The easiest way of doing this would probably be to poll a program-is-terminating flag within the scripts themselves, but I'd like to require as little boilerplate as possible.
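The standard C++ answer is cooperative cancellation: std::thread offers no way to interrupt a thread, so the worker checks a shared atomic flag between units of work (C++20 later standardized this pattern as std::jthread with std::stop_token). A sketch on the C++ side, where StoppableWorker and the step callback are hypothetical names. It only helps where the loop regains control, so a script blocked in a network call would still need a timeout; running each source as a separate process that can simply be killed might be worth considering if per-script polling is too much boilerplate.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Runs step() repeatedly on its own thread until step() returns false or
// stop() is called; stop() then waits for the current step to finish.
class StoppableWorker {
public:
    template <typename Step>
    explicit StoppableWorker(Step step)
        : thread_([this, step] {
              while (!stop_.load()) {
                  if (!step()) break;  // the source reports it is done
              }
          }) {}

    ~StoppableWorker() { stop(); }

    void stop()
    {
        stop_.store(true);
        if (thread_.joinable()) thread_.join();
    }

private:
    std::atomic<bool> stop_{false};
    std::thread thread_;  // declared after stop_, so stop_ is initialised first
};
```

Usage: wrap one search request per `step()` call, and call `stop()` (or let the destructor do it) when the user quits.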
