Bookwyrm — development thread



	Tmplt Offline \| 13-04-2017, 03:22 PM \| #1

Since sometime last year I've been working on and off on bookwyrm, a CLI-based program that takes some input that can identify a book (e.g. authors, title, year of release) and then searches through a variety of sources. Found items that match the input are (will be) presented in a menu à la mutt, where the user can select which items to download.

This project came about because I started uni, where the required textbooks cost a lot of money. I found out about a few specific sources that more likely than not had these books available as PDFs, and it felt only natural to want a way to get them via the command line. And here we are.

I began writing the project in Python. While progress was fast, it felt like the program lacked structure, and I constantly remade the data structures. After a while I started to rewrite what I had in C++, and while I still have some stuff left, I feel like I've written some better code.

Project history aside, I'm creating this thread in hopes of spawning some discussion about design, both project- and usage-wise. Best case scenario I will end up with a specification far ahead of v1.0.0.

Thus far, it's been decided that the program takes arguments POSIX-style, after which a menu (or a list, more like) akin to mutt's will open and present all found items and their properties. The user can then mark which items to download. The list will update as items are found.

Project-wise, an item (pdf, djvu, epub, etc.) has exact, non-exact and misc properties. The first group is anything that can be represented with an integer/enum (year, file format, etc.). Non-exact properties are titles, authors, etc.; stuff that can be mis-entered. These are all stored in strings and matched fuzzily. The misc group is for anything else. (Currently it only contains the ISBN(s) of the item.)
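
To make the three property groups concrete, here is a minimal Python sketch of how such an item and its matching might look. Everything in it is an assumption for illustration: the `Item` fields, the `matches` helper and the 0.75 threshold are hypothetical, and the standard library's `difflib` stands in for the fuzzywuzzy port, so the scores will differ from bookwyrm's.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Item:
    # Exact properties: representable as an integer/enum, compared with ==.
    year: Optional[int] = None
    extension: Optional[str] = None  # stand-in for a file-format enum
    # Non-exact properties: free text that may be mis-entered, matched fuzzily.
    title: str = ""
    authors: list = field(default_factory=list)
    # Misc properties: everything else (only ISBNs at the moment).
    isbns: list = field(default_factory=list)


def ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is an assumed stand-in for fuzzywuzzy."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches(wanted: Item, found: Item, threshold: float = 0.75) -> bool:
    """True if `found` satisfies every property that `wanted` specifies."""
    # Exact properties must be equal whenever the user supplied them.
    if wanted.year is not None and wanted.year != found.year:
        return False
    if wanted.extension is not None and wanted.extension != found.extension:
        return False
    # Non-exact properties only need to be similar enough.
    if wanted.title and ratio(wanted.title, found.title) < threshold:
        return False
    for author in wanted.authors:
        if not any(ratio(author, other) >= threshold for other in found.authors):
            return False
    return True
```

With this split, a misspelled title such as "The C Programing Language" can still match, while a wrong year rejects the item outright.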

Progress-wise, I've thus far implemented argument parsing, logging, and item matching. Since I enjoyed using it when the project was written in Python, I've decided to port fuzzywuzzy (project here; there is some bug-smashing to be done).

Initially, I planned to write support for sources in C++, but I think scripting these might be a better idea. This way, the C++ part is the "front-end" that handles user input and filters items, whereas the scripts feed the program found items. It also allows a user to easily add support for more sources (including those that cannot be pushed upstream). Is this a good idea? (Keeping the front-end this simple, I might one day rewrite it in C.)

So: any comments? Is there anything you think I should plan to implement? Is there anything I should change?

Bookwyrm can be found here.

Thanks,
Tmplt

	Tmplt Offline \| 10-05-2017, 05:09 PM \| #2

So I've implemented Python interfacing via a not-yet-merged branch of pybind11. Presently, a Python script will construct the exact and nonexact PODs with dicts and then pass a tuple back to the program, from which the final item will be created.

I chose Python mainly for requests (edit: and BeautifulSoup), which make source parsing so much easier, but if this thing catches on and people request bindings for other languages, I'll take a look at those too.

Next up is implementing the actual sources and writing the menu. Library Genesis was already implemented in the python/master branch, but will need some revisiting to work with bookwyrm (and will probably need some cleanup; I don't recall it being good code).

When it comes to the menu, I've seen it mentioned multiple times that ncurses isn't very fun to work with, so I'll look for alternatives. Are there any that you recommend?

Aside from this, I'm not entirely sure how to handle paths to find sources. Right now, a relative path is hard-coded for debug mode, but that won't fly in a release. I want user-defined sources to exist in $XDG_CONFIG_HOME/bookwyrm/sources, but I'm not sure how portable that is. Can I safely assume $HOME/.config when $XDG_CONFIG_HOME isn't set? And for system-wide sources, will /etc/bookwyrm/sources work for *nix systems, or do some store these elsewhere?
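
On the first question: the XDG Base Directory Specification itself says that an unset or empty $XDG_CONFIG_HOME means $HOME/.config, so that fallback is the documented behaviour rather than a guess. A minimal sketch of the lookup follows; the `source_dirs` name is hypothetical, and the system-wide /etc/bookwyrm/sources entry is only the location proposed above, not an established convention.

```python
import os
from pathlib import Path


def source_dirs(env=os.environ):
    """Return candidate plugin directories, user-specific first."""
    home = env.get("HOME") or str(Path.home())
    config_home = env.get("XDG_CONFIG_HOME", "")
    # The spec treats unset, empty and relative values as invalid,
    # in which case $HOME/.config is used instead.
    if not config_home or not os.path.isabs(config_home):
        config_home = os.path.join(home, ".config")
    return [
        Path(config_home) / "bookwyrm" / "sources",
        # Assumption: the system-wide location floated in this post.
        Path("/etc/bookwyrm/sources"),
    ]
```

Passing the environment in as a parameter keeps the function easy to test without touching the real environment.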

	Tmplt Offline \| 27-05-2017, 02:45 PM \| #3

What follows is the first draft of the v1.0.0 specification (note: I do not know how to write specifications):

I have some other ideas that might fit in, but they are better suited for releases past the first major.

	josuah Offline \| 28-05-2017, 05:00 AM \| #4

This looks promising. I imagine aggregating data from multiple sources is hard, yet still very useful.

So you ported a library and a program from Python to C++, while still interacting with Python modules. That looks like a fairly complete and rich programming experience.

ncurses is probably hard to deal with, I cannot even compile it from the latest source. :-(

I suggest you go with an external tool such as dmenu (or an equivalent for the terminal), or build a simple one from scratch in plain C / C++.

You can print terminal escape sequences like '\033[7m\033[K' to highlight a whole line, then '\033[m' at the end. Then you can redraw the screen at every cursor movement with '\033[H\033[J', which also puts the cursor at the top.

https://en.wikipedia.org/wiki/ANSI_escape_code
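
As a rough illustration of the idea, the sketch below builds one frame of such a menu as a string. The `render` function and the constant names are made up for this example, and the sequences are hard-coded rather than looked up in terminfo, so this only behaves on terminals that understand these particular codes; whether the erased part of the line is drawn in reverse video also varies between terminals.

```python
CLEAR = "\033[H\033[J"  # cursor to the top-left corner, then erase to end of screen
REVERSE = "\033[7m"     # switch to reverse video
ERASE_LINE = "\033[K"   # erase from the cursor to the end of the line
RESET = "\033[m"        # reset all attributes


def render(entries, selected):
    """Build one full-screen frame with the selected entry highlighted."""
    lines = []
    for index, entry in enumerate(entries):
        if index == selected:
            lines.append(REVERSE + ERASE_LINE + entry + RESET)
        else:
            lines.append(entry)
    return CLEAR + "\n".join(lines) + "\n"
```

Writing `render(entries, selected)` to stdout after every cursor movement redraws the whole list from the top.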

	venam Offline \| 28-05-2017, 07:02 AM \| #5

(28-05-2017, 05:00 AM)josuah Wrote: You can print terminal escape sequences like '\033[7m\033[K' to highlight a whole line, then '\033[m' at the end. Then you can redraw the screen at every cursor movement with '\033[H\033[J', which also puts the cursor at the top.

This won't be portable; this is the reason we use curses, which is a wrapper over termcap/terminfo.
On the command line it would be preferable to use `tput(1)` instead of the escape codes you shared; the same goes for any language.

	Tmplt Offline \| 28-05-2017, 09:26 AM \| #6

(28-05-2017, 05:00 AM)josuah Wrote: So you ported a library and a program from python to C++ [...]

Not such a big feat as it might sound. I never finished the Python program to begin with, and the library was written in CPython (Cython?), so it was just a matter of removing Python interface cruft and wrapping it in some C++. I can't yet completely reproduce the behaviour of the Python library, though. I have a feeling it might be because I'm not using the wchar functions.

	josuah Offline \| 30-05-2017, 02:56 PM \| #7

(28-05-2017, 09:26 AM)Tmplt Wrote: Not such a big feat as it might sound.

All right... But you still did it. :-P

(28-05-2017, 09:26 AM)Tmplt Wrote: I have a feeling it might be because I'm not using the wchar functions.

If you think this is the reason, you can try replacing every str* function with its wcs* counterpart, and every char with wchar_t. There are also the print functions (the wprintf family), and some "strings" to convert to L"strings", but otherwise this should be enough.

I only tried this once, though.

(28-05-2017, 07:02 AM)venam Wrote: This won't be portable; this is the reason we use curses, which is a wrapper over termcap/terminfo.
On the command line it would be preferable to use `tput(1)` instead of the escape codes you shared; the same goes for any language.

You are right: even if the terminals for which this does not work are rare, you can encounter some that break. My approach was to ignore and avoid those terminals, at the cost of being less flexible (and lazier).

	Tmplt Offline \| 09-06-2017, 09:28 PM \| #8

I've begun working on the menu using ncurses's menu library. I had some issues getting my hands on documentation, but I then found an answer on SO that pointed to the man pages! Who'd'a thunk it? So far I haven't had any bigger issues with it, so I assume I haven't reached that part of the implementation yet, given all the flak it appears to get.

I luckily stumbled upon my first multithreading problem this early in development. It seems solved with mutex guards, but since I lack any deeper knowledge about them I'll have to revisit this part.
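
For readers unfamiliar with the pattern, here is a small Python analogue of guarding shared state with a scoped lock. It is not bookwyrm's C++ code: the `ItemStore` class and `run_sources` helper are hypothetical, and Python's `with lock:` merely plays the role that a scoped mutex guard plays in C++.

```python
import threading


class ItemStore:
    """Found items shared between source threads and the menu."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        # `with` acquires the lock and always releases it on exit,
        # even if the body raises.
        with self._lock:
            self._items.append(item)

    def snapshot(self):
        # Copy under the lock so the menu never iterates over a list
        # that a source thread is modifying.
        with self._lock:
            return list(self._items)


def run_sources(store, n_sources=4, per_source=100):
    """Simulate several source threads feeding the same store."""
    def source(source_id):
        for n in range(per_source):
            store.add((source_id, n))

    threads = [threading.Thread(target=source, args=(i,)) for i in range(n_sources)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The point of the sketch is that every access to the shared list, reads included, goes through the same lock.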

Aside from that, I learned that regular print-outs to stdout write on top of curses's stdscr. I had always thought of curses as some kind of "layer" atop stdout, so that if something is written to stdout while curses is up, the user could see it after terminating the program. Is something like this possible? I'd like to log warnings and errors to std{out,err} during runtime, but should I save these until the program has closed down curses instead?
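
One way to get the "save these until curses has closed down" behaviour is to buffer log records in memory and replay them afterwards. The sketch below shows that idea with Python's `logging` module; the `DeferredHandler` class and its `replay` method are hypothetical names, and the same structure would have to be rebuilt on top of whatever logger the C++ side uses.

```python
import logging
import sys


class DeferredHandler(logging.Handler):
    """Keep log records in memory until the TUI has been torn down."""

    def __init__(self):
        super().__init__()
        self.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        self.records = []

    def emit(self, record):
        # Called by the logging module for every record. Nothing is
        # written to the terminal here, so the screen stays intact.
        self.records.append(record)

    def replay(self, stream=None):
        # Call this once the TUI has shut down and the terminal is restored.
        if stream is None:
            stream = sys.stderr
        for record in self.records:
            stream.write(self.format(record) + "\n")
        self.records.clear()
```

After attaching the handler with `logger.addHandler(handler)`, warnings accumulate silently while the menu is up, and a single `handler.replay()` prints them once the program is back on the normal screen.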

	Tmplt Offline \| 19-06-2017, 05:45 PM \| #9

After a few days of tinkering with why set_menu_items() would sometimes segfault when using an intermediate std::vector to add new entries to the menu, I got fed up with ncurses and decided to try termbox instead.

With termbox, the terminal is just a big pixel canvas. You use tb_change_cell() to alter a single character on it, and then print the changes out with tb_present(). Making a simple menu wasn't as troublesome as I thought it would be. As it looks now, I'll keep using it. Hopefully it's portable enough.
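
The cell-canvas model can be illustrated with a toy back buffer in Python. This is not termbox's API: `CellBuffer`, `put_line` and the return value of `present` are invented for the sketch, which only mimics the "change cells, then present" workflow described above.

```python
class CellBuffer:
    """Toy model of a cell-based terminal canvas (not termbox itself)."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.cells = [[" "] * width for _ in range(height)]

    def change_cell(self, x, y, ch):
        # In this toy, writes outside the canvas are silently ignored.
        if 0 <= x < self.width and 0 <= y < self.height:
            self.cells[y][x] = ch

    def put_line(self, y, text):
        # A menu entry is just a run of cells on one row, clipped to the width.
        for x, ch in enumerate(text[: self.width]):
            self.change_cell(x, y, ch)

    def present(self):
        # A real library would flush the changes to the terminal here;
        # the toy just returns the rows as strings.
        return ["".join(row) for row in self.cells]
```

Drawing a menu then amounts to one `put_line` call per visible entry followed by a single `present`.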

Here is an asciinema recording of the initial menu implementation in action: https://asciinema.org/a/wTiIdEBvd58x4hX9adk9rTaii

Entries prefixed with '2' are from a second Python source script (and thus thread).

	Tmplt Offline \| 07-08-2017, 07:44 PM \| #10

Bookwyrm is slowly taking form. The index menu has seen some improvement: https://asciinema.org/a/0NAQk2ws62XRLy7Q7H6wkFCDI
The part where the index menu is minimized is where I open another (unfinished) screen. In this screen, the program will fetch some item details from Open Library and print them out in a nice format. Hopefully we'll end up fetching info about the right item. But as long as we get an ISBN (which most sources provide), it shouldn't be a problem.

Soon I can get my hands dirty writing Python scripts, but before that I'll need to find a way to gently interrupt a std::thread. Unfortunately this isn't something C++ allows natively. The reason is that we don't want to wait for the source scripts to end before we can terminate the program. The easiest way of doing this would probably be to poll a program-is-terminating flag within the scripts themselves, but I'd like to require as little boilerplate as possible.
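
For reference, the polling approach could look roughly like the sketch below on the script side. The names `source_script` and `run_and_interrupt` are hypothetical, and a `threading.Event` stands in for whatever terminating flag the front-end would actually expose to the scripts.

```python
import threading


def source_script(stop, found):
    """Stand-in for a source plugin that checks `stop` between units of work."""
    page = 0
    while not stop.is_set():
        found.append(f"item from page {page}")  # pretend to fetch one page
        page += 1
        # wait() doubles as an interruptible sleep: it returns as soon
        # as the event is set instead of sleeping the full interval.
        stop.wait(0.01)


def run_and_interrupt():
    """Start one script, ask it to stop, and report whether it is still alive."""
    stop = threading.Event()
    found = []
    worker = threading.Thread(target=source_script, args=(stop, found))
    worker.start()
    stop.set()              # the front-end asks the script to wind down
    worker.join(timeout=2)  # returns quickly because the script polls `stop`
    return worker.is_alive(), found
```

The cost is visible in the sketch: every script has to check the flag itself, which is exactly the boilerplate the post would like to avoid.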

	Tmplt Offline \| 12-01-2018, 08:16 AM \| #11

Short update: after a friend of mine helped with some debugging last night, the idea of library-fying the backend sprang to life. Restructuring bookwyrm to be but a library that manages plugins, and exposing a sane API so one can easily write a front-end on top of it, seems to be a better idea. This will cut down the code-base a ton, and relieves me of some of the responsibility for a decent front-end, which I frankly don't enjoy maintaining and developing as much. I will still write a TUI since that's what I want, however.

	Houseoftea Offline \| 12-01-2018, 10:36 AM \| #12

This is really neat!
I hadn't seen this thread before but I'm glad I did.
I'm sure it'll be helpful for uni.

	Tmplt Offline \| 12-01-2018, 07:40 PM \| #13

(12-01-2018, 10:36 AM)Houseoftea Wrote: I'm sure it'll be helpful for uni.

Yes, this is the main reason behind the project! It will be especially useful for CS curriculums where most text books in use are in English, most of which are on Library Genesis.

	NetherOrb Offline \| 24-02-2018, 01:41 AM \| #14

As a CS student I can't thank you enough.

	Tmplt Offline \| 24-02-2018, 08:48 AM \| #15

(24-02-2018, 01:41 AM)NetherOrb Wrote: As a CS student I can't thank you enough.

I thank you for your appreciation thus far, but the project isn't in a usable state yet!

By the by, I'm currently having some issues with pybind11 on one of my systems. If anyone is interested in helping me, please fetch the resolve-35 branch and run the unit test. Does the test exit with an uncaught exception? For some reason it does on Arch Linux, but not on NixOS.

Code:
$ git clone https://github.com/Tmplt/bookwyrm.git && cd bookwyrm
$ nix-shell # if you're running NixOS
$ git submodule update --init --recursive
$ git checkout -b resolve-35 origin/resolve-35
$ mkdir build && cd build
$ cmake .. && cd unit-test && make && cd ..
$ unit-test/bookwyrm-test

	Tmplt Offline \| 05-05-2018, 08:50 AM \| #16

The above issue has been fixed, along with another segfault, with some very appreciated help from the pybind11 devs. Unfortunately the fix is Linux and GCC specific (please see this issue), but I do aim to make it work with multiple compilers and platforms.

Only two issues remain under the first-release tag: a review of the Libgen parser and the writing of proper CMake installation targets. A friend of mine has offered to do some QA before I tag a release. Hopefully no overly serious bugs will appear.

	Tmplt Offline \| 11-05-2018, 07:59 PM \| #17

bookwyrm is now in a quasi-usable state as of v0.5.0 (out now!). A fresh copy can be acquired at your closest mirror, probably git.nixers.net:bookwyrm.git or https://github.com/Tmplt/bookwyrm/archive/v0.5.0.tar.gz.

I welcome anyone interested to append to the ever-growing list of various issues at https://github.com/Tmplt/bookwyrm/issues/53, or to send me a PM/email. All of these issues should be resolved before the release of v0.6.0, the deadline of which is undefined. I can only confirm that bookwyrm builds on Linux with GCC for now; sorry about that.

The only available plugin right now is for Library Genesis.

EDIT: the AUR release turned out to be a bit wonky. I'll see about fixing it.

	Tmplt Offline \| 27-05-2018, 06:17 PM \| #18

Bookwyrm now installs all files into the correct directories; just specify a CMAKE_INSTALL_PREFIX and it will be adhered to.

The Python libraries bs4, furl, requests and isbnlib are required for the libgen.py plugin.

	Tmplt Offline \| 26-12-2018, 10:18 PM \| #19

A first usable release (v0.6.0) is now out! Get your tarball over at https://github.com/Tmplt/bookwyrm/releases! Unfortunately, GitHub does not check out submodules before building the release tarball, but build instructions are available in README.md. After all submodules are initialized it's the well-known CMake song and dance. Don't forget to install all Python packages listed in etc/requirements.txt.

In its current form, bookwyrm only queries a subset of Library Genesis, namely its text book search, and foreign fiction (foreign here meaning not Russian, I believe). Subsequent releases will query all of its categories, along with other sources.

Known problems: bookwyrm cannot be built with Clang, and I'm unsure if any *BSD systems will work; thus far I've only tested with GCC on Linux (NixOS specifically).
