Bookwyrm: development thread - Programming On Unix

Tmplt
Long time nixers
Since sometime last year I've been working on and off on bookwyrm, a CLI program that takes input identifying a book (e.g. authors, title, year of release) and then searches a variety of sources for it. Found items that match the input are (will be) presented in a menu à la mutt, where the user can select which items to download.

The project was prompted by my starting uni and its required textbooks, which cost a lot of money. I found out about a few specific sources that, more likely than not, had these books available as PDFs, and it felt only natural to want a way to get them from the command line. And here we are.

I began writing the project in Python. While progress was fast, the program felt like it lacked structure, and I was constantly reworking the data structures. After a while I started rewriting what I had in C++, and while some parts are still left, I feel like I've written better code.

Project history aside, I'm creating this thread in the hope of sparking some discussion about design, both project- and usage-wise. In the best case, I'll end up with a specification well ahead of v1.0.0.

So far, I've decided that the program takes POSIX-style arguments, after which a menu (more like a list) similar to mutt's opens and presents all found items along with their properties. The user can then mark which items to download. The list updates as items are found.

Project-wise, an item (PDF, DjVu, EPUB, etc.) has exact, non-exact and misc properties. The first group is anything that can be represented with an integer or enum (year, file format, etc.). Non-exact properties are titles, authors, etc.: things that can be mistyped. These are all stored as strings and matched fuzzily. The misc group is for everything else (currently it only contains the item's ISBN(s)).
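A minimal sketch of how the three property groups could be laid out in C++. The type names (`file_format`, `exacts_t`, `nonexacts_t`, `misc_t`, `item`) and the `exacts_match` rule are assumptions for illustration, not bookwyrm's actual definitions.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical types; bookwyrm's real definitions may differ.
enum class file_format { any, pdf, djvu, epub };

// Exact properties: anything representable as an integer or enum.
struct exacts_t {
    std::optional<int> year;             // empty means "not specified"
    file_format fmt = file_format::any;  // any means "not specified"
};

// Non-exact properties: free text that may be mistyped; matched fuzzily.
struct nonexacts_t {
    std::string title;
    std::vector<std::string> authors;
};

// Misc properties: currently only ISBNs.
struct misc_t {
    std::vector<std::string> isbns;
};

struct item {
    exacts_t exacts;
    nonexacts_t nonexacts;
    misc_t misc;
};

// A found item is rejected only if the user specified an exact property
// and the found value differs from it.
bool exacts_match(const exacts_t &wanted, const exacts_t &found)
{
    if (wanted.year && found.year != wanted.year)
        return false;
    if (wanted.fmt != file_format::any && found.fmt != wanted.fmt)
        return false;
    return true;
}
```

Leaving an exact property unset (an empty `std::optional` or `file_format::any`) makes it a wildcard, so only the properties the user actually passed can rule an item out.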

Progress-wise, I've implemented argument parsing, logging, and item matching so far. Since I enjoyed using it when the project was written in Python, I've decided to port fuzzywuzzy (project here; there is still some bug-smashing to be done).
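For a feel of what fuzzy matching involves, here is a simplified similarity score built on plain Levenshtein distance. It is in the spirit of fuzzywuzzy's `fuzz.ratio` but is not a faithful port: fuzzywuzzy computes its ratio differently, so the scores will not match exactly. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein distance: insertion, deletion and substitution all cost 1.
// Keeps only two rows of the DP table.
std::size_t levenshtein(const std::string &a, const std::string &b)
{
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = a[i - 1] == b[j - 1] ? 0 : 1;
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Similarity in [0, 100], rounded; 100 means the strings are identical.
int ratio(const std::string &a, const std::string &b)
{
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0)
        return 100;
    const double dist = static_cast<double>(levenshtein(a, b));
    return static_cast<int>(100.0 * (1.0 - dist / longest) + 0.5);
}
```

Note that this compares bytes, so non-ASCII titles are scored on their UTF-8 encoding rather than on characters.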

Initially, I planned to write support for sources in C++, but I think scripting them might be a better idea. This way, the C++ part is the "front-end" that handles user input and filters items, whereas the scripts feed the program found items. It also allows users to easily add support for more sources (including those that cannot be pushed upstream). Is this a good idea? (With the front-end kept this simple, I might one day rewrite it in C.)
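The split described above can be sketched as a callback interface: each source receives the wanted item plus a function to report results, and only the front-end decides what to keep. In bookwyrm the sources would be Python scripts; here they are plain C++ callables so the example stays self-contained, and all names (`book`, `report_fn`, `source_fn`, `run_sources`) are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

struct book {
    std::string title;
    int year; // 0 means "any year" in the wanted item
};

// A source reports each result through this callback.
using report_fn = std::function<void(const book &found)>;
// A source gets the wanted item and the callback; it never filters.
using source_fn = std::function<void(const book &wanted, const report_fn &report)>;

// The front-end owns filtering: sources just report whatever they find.
std::vector<book> run_sources(const std::vector<source_fn> &sources, const book &wanted)
{
    std::vector<book> accepted;
    for (const auto &source : sources) {
        source(wanted, [&](const book &found) {
            if (wanted.year == 0 || found.year == wanted.year)
                accepted.push_back(found);
        });
    }
    return accepted;
}
```

Because sources only report and never filter, a third-party script needs no knowledge of the matching rules.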

So: any comments? Is there anything you think I should plan to implement? Is there anything I should change?

Bookwyrm can be found here.

Thanks,
Tmplt
Tmplt
Long time nixers
So I've implemented Python interfacing via a not-yet-merged branch of pybind11. Presently, a Python script builds the exact and non-exact PODs as dicts and then passes a tuple back to the program, from which the final item is created.

I chose Python mainly for requests (edit: and BeautifulSoup), which make source parsing so much easier, but if this thing catches on and people request bindings for other languages, I'll take a look at those too.

Next up is implementing the actual sources and writing the menu. Library Genesis was already implemented in the python/master branch, but will need some revisiting to work with bookwyrm (and will probably need some cleanup; I don't recall it being good code).

When it comes to the menu, I've seen it mentioned multiple times that ncurses isn't much fun to work with, so I'll look for alternatives. Are there any you'd recommend?

Aside from this, I'm not entirely sure how to handle the paths used to find sources. Right now, a relative path is hard-coded for debug mode, but that won't fly in a release. I want user-defined sources to live in $XDG_CONFIG_HOME/bookwyrm/sources, but I'm not sure how portable that is. Can I safely assume $HOME/.config when $XDG_CONFIG_HOME isn't set? And for system-wide sources, will /etc/bookwyrm/sources work on *nix systems, or do some store these elsewhere?
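On the `$XDG_CONFIG_HOME` question: the XDG Base Directory Specification says that when the variable is unset or empty, `$HOME/.config` should be used, and that a relative path in it should be ignored. A sketch of that lookup (the function name is hypothetical):

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Resolve the per-user sources directory following the XDG Base Directory
// spec: use $XDG_CONFIG_HOME if it holds an absolute path, else $HOME/.config.
std::optional<std::string> user_sources_dir()
{
    const char *xdg = std::getenv("XDG_CONFIG_HOME");
    if (xdg && xdg[0] == '/') // unset, empty or relative values are ignored
        return std::string(xdg) + "/bookwyrm/sources";

    const char *home = std::getenv("HOME");
    if (home && *home)
        return std::string(home) + "/.config/bookwyrm/sources";

    return std::nullopt; // neither variable is usable
}
```

For system-wide files, the same specification defines `$XDG_CONFIG_DIRS`, defaulting to `/etc/xdg`, so a spec-following fallback would be `/etc/xdg/bookwyrm/sources`; `/etc/bookwyrm/sources` also works but is not what the spec prescribes.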
Tmplt
Long time nixers
What follows is the first draft of the v1.0.0 specification (note: I do not know how to write specifications):
Program start:
After command-line options have been parsed (and no quit-directly flag has been passed), verify them and the download directory.
Terminate if the user didn't pass any "main" options or if we can't write to the download directory.
Finally, construct the wanted item from the passed options.
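The download-directory check could look like the sketch below, using POSIX `stat(2)` and `access(2)`. Creating files in a directory requires both write and search (execute) permission on it. The function name is hypothetical, and the check is only advisory: permissions can change between the check and the download, which is why the spec repeats it before downloading.

```cpp
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Returns true if `path` is an existing directory in which this process
// may create files. access(2) checks against the real UID/GID, which is
// what a non-setuid CLI program wants.
bool download_dir_writable(const std::string &path)
{
    struct stat st;
    if (stat(path.c_str(), &st) != 0 || !S_ISDIR(st.st_mode))
        return false;
    return access(path.c_str(), W_OK | X_OK) == 0;
}
```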

Finding sources:
bookwyrm will first try to import any python modules found in $XDG_CONFIG_HOME/bookwyrm/sources,
after which it will do the same with modules in /etc/bookwyrm/sources/ (or your platform's equivalent).
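A sketch of that lookup order using C++17 `std::filesystem`: `.py` files are collected from the user directory first and the system directory second. The function name and the per-directory sorting are assumptions; directory iteration order is unspecified, so sorting keeps the load order stable. Whether a user module should shadow a system module with the same name is left open here.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Collect *.py files from each directory, in the given priority order.
// Directories that do not exist (or cannot be read) are skipped silently.
std::vector<fs::path> find_source_modules(const std::vector<fs::path> &dirs)
{
    std::vector<fs::path> modules;
    for (const auto &dir : dirs) {
        std::vector<fs::path> found;
        std::error_code ec;
        for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
            if (it->is_regular_file() && it->path().extension() == ".py")
                found.push_back(it->path());
        }
        std::sort(found.begin(), found.end()); // stable order within a directory
        modules.insert(modules.end(), found.begin(), found.end());
    }
    return modules;
}
```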

Running the scripts:
The scripts will be given the wanted item and will be able to fetch configuration options from $XDG_CONFIG_HOME/bookwyrm/config.ext, which will hold login credentials and whatnot. Support for external commands, such as pass, is planned. The format will be either YAML or INI-style, I think (to section out source-specific options).
With this, and whatever other modules it wants (or perhaps a selection? Hard to enforce from the program itself, I think), each script will search a single source for the wanted item.
Found items will be passed to bookwyrm.
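If the INI-style option wins, a minimal reader with one section per source might look like this. The format details (`[section]` headers, `key = value` lines, `#`/`;` comments, no quoting) and all names are assumptions; a real implementation would more likely use an existing parser.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// section name -> (key -> value), e.g. a hypothetical [libgen] section
using ini_t = std::map<std::string, std::map<std::string, std::string>>;

// Strip leading/trailing spaces, tabs and carriage returns.
static std::string trim(const std::string &s)
{
    const char *ws = " \t\r";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse "[section]" headers and "key = value" lines; lines starting with
// '#' or ';' are comments. No quoting, escaping or multi-line values.
ini_t parse_ini(std::istream &in)
{
    ini_t cfg;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#' || line[0] == ';')
            continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            continue; // malformed line: silently skipped in this sketch
        cfg[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}
```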

The menu:
While the scripts all run asynchronously, a menu (à la mutt) will open that presents all found items.
Each entry will represent a single found item. The item info shown on this screen will be whatever is available from the source's search results (e.g. Library Genesis provides a lot, since much of the info is available from /search.php).
From this menu, the user can open an entry (like opening an email in mutt) to try and fetch more data about the item.
That data is fetched from some database (WorldCat, Open Library (both? priority list?)) and written out in some neat format.

Returning to the main menu, the user can select which items to download.
The downloads (done after a second write-OK check on the download dir, since permissions might change in the meantime) will most likely be handled with libcurl.

I have some other ideas that might fit in, but they are better suited for releases past the first major.
josuah
Long time nixers
This looks promising. I imagine aggregating data from multiple sources is hard, yet still very useful.

So you ported a library and a program from Python to C++, while still interacting with Python modules. Sounds like a fairly complete and rich programming experience.

ncurses is probably hard to deal with; I cannot even compile it from the latest source. :-(

I suggest going with an external tool such as dmenu (or a terminal equivalent), or building a simple one from scratch in plain C / C++.

You can print terminal escape sequences like '\033[7m\033[K' to highlight a whole line, then '\033[m' at the end. Then you can redraw the screen at every cursor movement with '\033[H\033[J', which also puts the cursor at the top.

https://en.wikipedia.org/wiki/ANSI_escape_code
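A sketch of the approach described above: each redraw homes the cursor, clears the screen, and prints the rows, with the selected one in reverse video. It builds a string rather than printing directly so the output is easy to inspect. As the reply below points out, hard-coded sequences like these are not portable across terminals, which is what terminfo/curses (or `tput(1)`) are for. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build one full redraw of a simple menu using raw ANSI/VT100 sequences.
std::string render_menu(const std::vector<std::string> &rows, std::size_t selected)
{
    std::string out = "\033[H\033[J"; // cursor home, then clear to end of screen
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i == selected)
            // reverse video, erase to end of line (paints the whole line), reset
            out += "\033[7m\033[K" + rows[i] + "\033[m";
        else
            out += rows[i];
        out += "\r\n"; // explicit CR in case the terminal's output processing is off
    }
    return out;
}
```

After each cursor movement the result would be written out in one go, e.g. with `std::fputs(render_menu(rows, sel).c_str(), stdout)` followed by `std::fflush(stdout)`.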
venam
Administrators
(28-05-2017, 05:00 AM)josuah Wrote: You can print terminal escape sequences like '\033[7m\033[K' to highlight a whole line, then '\033[m' at the end. Then you can redraw the screen at every cursor movement with '\033[H\033[J' which also put the cursor at the top.
This won't be portable; that's the reason we use curses, which wraps termcap/terminfo.
On the command line it would be preferable to use `tput(1)` instead of the escape codes you shared, and the same goes for any language.
Tmplt
Long time nixers
(28-05-2017, 05:00 AM)josuah Wrote: So you ported a library and a program from python to C++ [...]
Not such a big feat as it might sound. I never finished the Python program to begin with, and the library was written in CPython (Cython?), so it was just a matter of removing Python interface cruft and wrapping it in some C++. I can't yet completely reproduce the behaviour of the Python library, though. I have a feeling it might be because I'm not using the wchar functions.
josuah
Long time nixers
(28-05-2017, 09:26 AM)Tmplt Wrote: Not such a big feat as it might sound.

All right... But you still did it. :-P

(28-05-2017, 09:26 AM)Tmplt Wrote: I have a feeling I might be because I'm not using the wchar functions.

If you think this is the reason, you can try to replace every str* function with its wcs* counterpart, and every char with wchar_t. There are also the print functions (wprintf and friends), and "string" literals to convert to L"string", but otherwise this should be enough.

I only tried this once, though.
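One way to check whether encoding is behind the mismatch: a byte-oriented `std::string` counts UTF-8 code units, while Python 3 strings (and a `std::wstring` on Linux, where `wchar_t` is 32 bits) count code points. Any edit-distance or length-based score computed over bytes will therefore differ for non-ASCII titles. The variable and function names are illustrative.

```cpp
#include <cstddef>
#include <string>

// "Gödel" as UTF-8 bytes; the literal is split so \xb6 is not followed by
// the hex digit 'd', which would extend the escape sequence.
const std::string narrow = "G\xc3\xb6" "del";  // 6 bytes
const std::wstring wide = L"G\u00f6del";       // 5 wchar_t, one per code point

// Count code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string &s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```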

(28-05-2017, 07:02 AM)venam Wrote: This won't be portable, this is the reason we use curses, they are wrapper over termcap/terminfo.
On the command line it would be preferable to use `tput(1)` instead of those escape code you shared, same for any language.

You are right: even if the terminals for which this does not work are rare, you can encounter some that break. My approach was to ignore and avoid those terminals, at the cost of being less flexible (and lazier).
Tmplt
Long time nixers
I've begun working on the menu using ncurses's menu library. I had some trouble getting my hands on documentation, but then I found an answer on SO that pointed to the man pages! Who'd'a thunk it? So far I haven't had any major issues with it, so given all the flak it appears to get, I assume I haven't reached that part of the implementation yet.

Luckily, I stumbled upon my first multithreading problem this early in development. It seems solved with mutex guards, but since I lack deeper knowledge of them I'll have to revisit this part.
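For reference, the usual pattern is to keep the shared container and its mutex together and to lock in every accessor, reads included; `std::lock_guard` unlocks automatically when it leaves scope, even if an exception is thrown. The `item_list` class is a hypothetical illustration, not bookwyrm's code.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Items found by several source threads end up in one shared list.
class item_list {
public:
    void add(std::string item)
    {
        std::lock_guard<std::mutex> guard(mutex_); // released at end of scope
        items_.push_back(std::move(item));
    }

    // Return a copy so the menu can draw without holding the lock.
    std::vector<std::string> snapshot() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_; // mutable so const readers can lock it
    std::vector<std::string> items_;
};
```

Handing out a copy from `snapshot()` keeps the lock held only briefly, so drawing the menu never blocks the source threads.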

Aside from that, I learned that regular output to stdout is drawn right over curses's stdscr. I always thought of curses as some kind of "layer" atop stdout, so that if something were written to stdout while curses is up, the user could see it after the program terminates. Is something like this possible? I'd like to log warnings and errors to std{out,err} during runtime, or should I save these until the program has shut down curses instead?
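curses does not keep stdout on a separate layer: both write to the same terminal, so anything printed while curses is active lands on the curses screen, where it can garble the display or be painted over. One common workaround is to queue messages while the UI is up and print them after `endwin()`; another is redirecting stderr to a log file. A sketch of the first, with a hypothetical `deferred_log` class:

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// While a full-screen UI owns the terminal, warnings are queued; once the
// UI has been torn down (e.g. after curses' endwin()), flush() prints them.
class deferred_log {
public:
    void set_ui_active(bool active) { ui_active_ = active; }

    void warn(const std::string &msg)
    {
        if (ui_active_)
            pending_.push_back(msg);
        else
            std::cerr << "warning: " << msg << '\n';
    }

    // Call after the UI is closed to print everything that was queued.
    void flush(std::ostream &out = std::cerr)
    {
        for (const auto &msg : pending_)
            out << "warning: " << msg << '\n';
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    bool ui_active_ = false;
    std::vector<std::string> pending_;
};
```

This sketch is not thread-safe; if source threads log too, `warn()` and `flush()` need a mutex.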
Tmplt
Long time nixers
After a few days of tinkering with why set_menu_items() would sometimes segfault when I used an intermediate std::vector to add new entries to the menu, I got fed up with ncurses and decided to try termbox instead.

With termbox, the terminal is just a big grid of character cells. You use tb_change_cell() to alter a single cell, and then draw the changes with tb_present(). Making a simple menu wasn't as troublesome as I thought it would be. As it looks now, I'll keep using it. Hopefully it's portable enough.

Here is an asciinema recording of the initial menu implementation in action: https://asciinema.org/a/wTiIdEBvd58x4hX9adk9rTaii

Entries prefixed with '2' are from a second Python source script (and thus a second thread).
Tmplt
Long time nixers
Bookwyrm is slowly taking shape. The index menu has seen some improvement: https://asciinema.org/a/0NAQk2ws62XRLy7Q7H6wkFCDI
The part where the index menu is minimized is where I open another (unfinished) screen. In this screen, the program will fetch some item details from Open Library and print them in a nice format. Hopefully we'll end up fetching info about the right item, but as long as we get an ISBN (which most sources provide), it shouldn't be a problem.

Soon I can get my hands dirty writing Python scripts, but first I need a way to gently interrupt a std::thread, which unfortunately isn't something C++ supports natively. The reason is that we don't want to wait for the source scripts to finish before we can terminate the program. The easiest way would probably be to poll whether the program is terminating from within the scripts themselves, but I'd like to require as little boilerplate as possible.
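Since a `std::thread` cannot be stopped from outside, the standard route is cooperative cancellation: the worker checks a shared `std::atomic<bool>` at convenient points and returns early. (C++20 adds `std::jthread` and `std::stop_token` for exactly this.) In the sketch below, `fake_source` is a hypothetical stand-in for a source script.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Stand-in for a source: "fetches" results one at a time until it has
// max_results or until `stop` is set. Returns how many it produced.
int fake_source(const std::atomic<bool> &stop, int max_results)
{
    int produced = 0;
    while (produced < max_results && !stop.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // pretend I/O
        ++produced;
    }
    return produced;
}
```

One untested idea for keeping boilerplate out of the scripts: perform this check inside the C++ function the scripts call to hand over an item, so a script stops at its next reported result without polling on its own. A script blocked in a long HTTP request would still only notice once that request returns.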
Tmplt
Long time nixers
Short update: after a friend of mine helped with some debugging last night, the idea of library-fying the backend sprang to life. Restructuring bookwyrm into a library that manages plugins and exposes a sane API, upon which one can easily write a front-end, seems to be a better idea. This will cut down the code base a ton and relieve me of some of the responsibility for a decent front-end, which I frankly don't enjoy maintaining and developing as much. I will still write a TUI, however, since that's what I want.
Houseoftea
Long time nixers
This is really neat!
I hadn't seen this thread before but I'm glad I did.
I'm sure it'll be helpful for uni.
Tmplt
Long time nixers
(12-01-2018, 10:36 AM)Houseoftea Wrote: I'm sure it'll be helpful for uni.
Yes, this is the main reason behind the project! It will be especially useful for CS curricula, where most textbooks in use are in English and most of those are available on Library Genesis.
NetherOrb
Long time nixers
As a CS student I can't thank you enough.
Tmplt
Long time nixers
(24-02-2018, 01:41 AM)NetherOrb Wrote: As a cs student I cant thank you enough.
Thanks for your appreciation so far, but the project isn't in a usable state yet!

By the by, I'm currently having some issues with pybind11 on one of my systems. If anyone is interested in helping me, please fetch the resolve-35 branch and run the unit test. Does the test exit with an uncaught exception? For some reason it does on Arch Linux, but not on NixOS.

Code:
$ git clone https://github.com/Tmplt/bookwyrm.git && cd bookwyrm
$ nix-shell # if you're running NixOS
$ git checkout -b resolve-35 origin/resolve-35
$ git submodule update --init --recursive
$ mkdir build && cd build
$ cmake .. && cd unit-test && make && cd ..
$ unit-test/bookwyrm-test
Tmplt
Long time nixers
The above issue has been fixed, along with another segfault, with some much-appreciated help from the pybind11 devs. Unfortunately, the fix is Linux- and GCC-specific (please see this issue), but I do aim to make it work with multiple compilers and platforms.

Only two issues remain under the first-release tag: a review of the Libgen parser and writing proper CMake installation targets. A friend of mine has offered to do some QA before I tag a release. Hopefully no serious bugs will appear.
Tmplt
Long time nixers
bookwyrm is now in a quasi-usable state as of v0.5.0 (out now!). A fresh copy can be acquired from your closest mirror: probably git.nixers.net:bookwyrm.git or https://github.com/Tmplt/bookwyrm/archive/v0.5.0.tar.gz.

I welcome anyone interested to add to the ever-growing list of issues at https://github.com/Tmplt/bookwyrm/issues/53, or to send me a PM/email. All of these issues should be resolved before the release of v0.6.0, for which there is no set deadline. For now, I can only confirm that bookwyrm builds on Linux with GCC; sorry about that.

The only available plugin right now is for Library Genesis.

EDIT: the AUR release turned out to be a bit wonky. I'll see about fixing it.
Tmplt
Long time nixers
Bookwyrm now installs all files into the correct directories; just specify a CMAKE_INSTALL_PREFIX and it will be respected.

The Python libraries bs4, furl, requests and isbnlib are required for the libgen.py plugin.
Tmplt
Long time nixers
The first usable release (v0.6.0) is out! Get your tarball at https://github.com/Tmplt/bookwyrm/releases! Unfortunately, GitHub does not check out submodules when building the release tarball, but build instructions are available in README.md. After all submodules are initialized, it's the well-known CMake song and dance. Don't forget to install all Python packages listed in etc/requirements.txt.

In its current form, bookwyrm only queries a subset of Library Genesis, namely its textbook search and foreign fiction (foreign here meaning non-Russian, I believe). Subsequent releases will query all of its categories, along with other sources.

Known problems: bookwyrm cannot be built with Clang, and I'm unsure if any *BSD systems will work; thus far I've only tested with GCC on Linux (NixOS specifically).