Nixers project: Bittorrent library - Community & Forums Related Discussions
|
|||
Hello fellow nixers!
I just started a new project, and thought it would be great to involve more people in it! This project is a BitTorrent library in C, as there doesn't seem to be much out there (even though many people reimplement the protocol over and over). Here is a link to the project: http://git.nixers.net/libgbt The goal is to implement the full BTP/1.0 RFC and add on top of that the ability to handle the DHT protocol. I know it might sound like a big project for some of you, but it is a good occasion to get started with C, or with a collaborative project! Who's up for it? |
|||
|
|||
I would love to, but I currently have exams until mid-August.
After that I'll try to catch up and help :) |
|||
|
|||
Cool! Is the plan to extend synk(1) with this?
I'm definitely up for it. It's about time I learn C. |
|||
|
|||
That's the plan, but the library will be an all-purpose lib, so we'll write it without thinking about synk(1) :)
I started writing a bit of code, be sure to check it out! |
|||
|
|||
I did, but I believe I need to (at least) skim through the RFC.
Is "gbt" an abbreviation for grizzly bittorrent? |
|||
|
|||
I started writing some tests for the bencoding_parse() function, but they're not reliable enough, so it might be a good idea to write the tests first.
|
|||
|
|||
The library can now parse integers, strings and lists in "Bencoding" format (check the RFC).
I added a test case as well, which reveals a segfault whenever an incorrect char is parsed (this should be fixed obviously). I also don't free a single bit of memory yet (I know, I know...), so we need a proper way to free a whole bencoding data struct. Something hard as well: NAMING. functions are "bencoding_*", variables/types "be*"... We need consistency, but I don't feel like using the full format name for variables. It will become unreadable. As the next sections are THP and PWP, I was considering "ben" as a prefix for everything. I'm just wondering if that makes sense... |
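To make the "incorrect char" problem concrete, here is a minimal sketch of how integer and string parsing could reject bad input instead of crashing. The names ben_parseint() and ben_parsestr() are hypothetical (they just follow the "ben" prefix idea above) and are not libgbt's actual API. The sketch also ignores integer overflow and accepts leading zeros, which a strict parser should refuse.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helpers, not libgbt's actual API.
 * Each returns the number of bytes consumed, or 0 on malformed input.
 */

/* Parse "i<digits>e" (optionally negative), e.g. "i-42e". */
size_t
ben_parseint(const char *buf, size_t len, long *out)
{
    size_t i = 1;
    long n = 0;
    int neg = 0;

    assert(buf != NULL && out != NULL);
    if (len < 3 || buf[0] != 'i')
        return 0;
    if (buf[i] == '-') {
        neg = 1;
        i++;
    }
    /* At least one digit is required. */
    if (i >= len || !isdigit((unsigned char)buf[i]))
        return 0;
    for (; i < len && isdigit((unsigned char)buf[i]); i++)
        n = n * 10 + (buf[i] - '0');
    /* Anything other than the closing 'e' is an error. */
    if (i >= len || buf[i] != 'e')
        return 0;
    *out = neg ? -n : n;
    return i + 1;
}

/* Parse "<length>:<bytes>", e.g. "4:spam". *str points into buf. */
size_t
ben_parsestr(const char *buf, size_t len, const char **str, size_t *slen)
{
    size_t i = 0, n = 0;

    assert(buf != NULL && str != NULL && slen != NULL);
    if (len == 0 || !isdigit((unsigned char)buf[0]))
        return 0;
    for (; i < len && isdigit((unsigned char)buf[i]); i++)
        n = n * 10 + (size_t)(buf[i] - '0');
    if (i >= len || buf[i] != ':')
        return 0;
    i++;
    /* The announced length must fit in what is left of the buffer. */
    if (n > len - i)
        return 0;
    *str = buf + i;
    *slen = n;
    return i + n;
}
```

Returning the number of bytes consumed lets a list parser call these in a loop and stop cleanly at the first 0.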
|||
|
|||
(30-07-2017, 08:45 PM)z3bra Wrote: I also don't free a single bit of memory yet (I know, I know...), so we need a proper way to free a whole bencoding data struct.
Have you thought about including a simple garbage collector such as libgc ( http://www.hboehm.info/gc/ )? It might not be as flexible as freeing the memory manually, but it's still handy. |
|||
|
|||
No, I didn't think about it. I want to avoid external dependencies as much as possible, and only use them for things I cannot handle myself (HTTP protocol, crypto, ...).
Freeing memory is a basic concept, so I don't feel like having an external library for the sake of being lazy :) |
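As an illustration of freeing a whole bencoding structure by hand, here is a rough sketch. The struct bnode layout is an assumption made up for this example (a hand-rolled linked list rather than whatever libgbt really uses), and bnode_new() / bnode_free() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed node layout, for illustration only. */
enum btype { BINT, BSTR, BLIST };

struct bnode {
    enum btype type;
    long num;            /* payload for BINT */
    char *str;           /* payload for BSTR, heap-allocated */
    struct bnode *child; /* first element for BLIST */
    struct bnode *next;  /* next sibling in the parent list */
};

/* Allocate a zeroed node of the given type; returns NULL on failure. */
struct bnode *
bnode_new(enum btype type)
{
    struct bnode *n = calloc(1, sizeof(*n));

    if (n != NULL)
        n->type = type;
    return n;
}

/*
 * Free a node, all its following siblings and all their children.
 * Returns the number of nodes released, which is handy in tests.
 */
size_t
bnode_free(struct bnode *n)
{
    size_t count = 0;

    while (n != NULL) {
        struct bnode *next = n->next; /* save before freeing n */

        if (n->type == BLIST)
            count += bnode_free(n->child);
        free(n->str); /* free(NULL) is a no-op */
        free(n);
        count++;
        n = next;
    }
    return count;
}
```

Siblings are walked iteratively and only nesting recurses, so the recursion depth follows the nesting depth of the data, not its length. Running the test suite under valgrind would confirm nothing is left behind.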
|||
|
|||
Usually, your compiler's static analyzer (you do use a decent compiler, don't you?) should find forgotten mallocs for you and annoy you to fix them. Having a library to do that is only one step behind just using Java or something. /s
-- <mort> choosing a terrible license just to be spiteful towards others is possibly the most tux0r thing I've ever seen |
|||
|
|||
(31-07-2017, 04:13 AM)jkl Wrote: Usually, your compiler's static analyzer (you do use a decent compiler, don't you?) should find forgotten mallocs for you and annoy you to fix them. Having a library to do that is only one step behind just using Java or something. /s
I use valgrind to hunt forgotten mallocs. It works well enough for my needs. You're right about the lib, and I don't see the point of using a library to do garbage collection when some languages offer it natively. |
|||
|
|||
What a cool looking project! As someone who knows a little C (I was taught C++ at uni last year), is there something I can do to contribute? Or should I try to get more accustomed to C before trying to help? I'm looking forward to seeing the evolution of this project.
|
|||
|
|||
No need to be a pro in C: start reading what is already there, and ask questions about things you don't understand; it will help others. Then fire up your editor, and start coding!
|
|||
|
|||
Isn't the dictionary missing in your implementation?
And do we create branches or are we working on master? |
|||
|
|||
(01-08-2017, 05:31 PM)r4ndom Wrote: Isn't the dictionary missing in your implementation?
A lot of things are missing! Dictionaries are next on my list. Actually, they're the same thing as lists, except that there is a relation between the two members of each pair. So the difference really lies in how you use them, rather than how you store them. For reference, this commit adds the dictionary parsing functionality :) What needs to be implemented now is skimming through dictionaries to find a specific key.
As you can see, I'm working directly on master. I prefer linear histories, and rebasing commits upon merging is always a pain (also, stagit won't show the branch history, but that's another story). I'd advise you to work on master, unless you want to try a new feature that might not end up on master (e.g. it would be painful to roll back) or if you want to work separately for a later review (might be great if you're not confident enough in your code). The end goal is to keep master easy to browse, so people joining can easily replay the project history with "git log". |
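The "a dictionary is a list whose members go in pairs" idea could be sketched like this. struct bitem and bdict_get() are hypothetical names, and the flat array stands in for whatever list type the library really uses. Keys are NUL-terminated here for brevity, whereas real bencoded strings carry an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed item layout, for illustration only. */
struct bitem {
    const char *str; /* string payload, NULL for non-strings */
    long num;        /* integer payload */
};

/*
 * Treat items[0..n-1] as a dictionary stored as a flat list:
 * even slots hold keys, odd slots hold the matching values.
 * Returns the value for key, or NULL if the key is absent.
 */
const struct bitem *
bdict_get(const struct bitem *items, size_t n, const char *key)
{
    size_t i;

    assert(items != NULL && key != NULL);
    /* Step two by two so that only key slots are compared. */
    for (i = 0; i + 1 < n; i += 2) {
        if (items[i].str != NULL && strcmp(items[i].str, key) == 0)
            return &items[i + 1];
    }
    return NULL;
}
```

Storage is exactly that of a list; only the lookup knows about the pairing, which matches the point that the difference is in usage.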
|||
|
|||
I finally implemented key search in dictionaries. That was way easier than expected ^^
I also wrote a TODO file with a summary of what (I think) should be done. You can pick a task from there and start working if you want. I guess we can use this thread to discuss who's doing what for now? For anyone that's new to C, I'd suggest doing the "bencode" function :) It's already written in the test case, but in a quick and dirty way. The goal is to implement it in the library directly, and expose it through the API. Who feels up for it? (I can help if needed, no worries) |
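For anyone wondering what a "bencode" function involves, here is a small sketch for the two scalar types. bencode_int() and bencode_str() are hypothetical names and signatures, not what the TODO file or the test case actually specify.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Encode n as "i<n>e" into buf (len bytes, NUL included).
 * Returns the encoded length, or 0 if buf is too small.
 */
size_t
bencode_int(char *buf, size_t len, long n)
{
    int r = snprintf(buf, len, "i%lde", n);

    return (r < 0 || (size_t)r >= len) ? 0 : (size_t)r;
}

/*
 * Encode slen bytes of s as "<slen>:<bytes>" into buf.
 * Returns the encoded length, or 0 if buf is too small.
 */
size_t
bencode_str(char *buf, size_t len, const char *s, size_t slen)
{
    int r = snprintf(buf, len, "%zu:", slen);

    /* Need room for the prefix, the payload and a final NUL. */
    if (r < 0 || (size_t)r >= len || slen >= len - (size_t)r)
        return 0;
    memcpy(buf + r, s, slen); /* payload may contain NUL bytes */
    buf[(size_t)r + slen] = '\0';
    return (size_t)r + slen;
}
```

A list would then be an 'l', the encoded members back to back, and a closing 'e'; a dictionary is the same with 'd' and alternating keys and values.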
|||
|
|||
(02-08-2017, 03:53 AM)z3bra Wrote: I finaly implemented key search in dictionaries. That was way easier than expected ^^
Nice. But what I do not get: there is a `parseint`, a `parsestr`, and a `parselist` function, but why no `parsedict`? I know parselist and parsedict are somewhat the same, but isn't it counterintuitive to just have a parselist and no parsedict? Or is this irrelevant, since what is currently written is just the logic within the library, and the API is the point where verbosity matters?
(02-08-2017, 03:53 AM)z3bra Wrote: Who feels up for it? (I can help if needed, no worries)
Sure, I can do it. But beforehand I have some questions:
_____ Thanks for fixing the dictionary typo :P |
|||
|
|||
You could always create something like this:
Code: struct blist *
But IMO it clutters the code, and anyway "bparselist" is not a good name for exposing it. I was considering a more general name, à la "bdecode()" (either renaming bparselist(), or making it a wrapper). For TAILQ, the man page is good enough IMO, but you need to understand the concept of linked lists first. We can discuss it on IRC if you want. For the API, it's just about finding good names for functions and how to use them (it's more a discussion than actual code though). |
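One possible reading of the "bdecode()" idea is a single entry point that looks at the first byte and hands off to the type-specific parser, so whether a separate parsedict exists becomes an internal detail. The sketch below only shows the dispatch step; enum bkind and bkindof() are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum bkind { BK_INVALID, BK_INT, BK_STR, BK_LIST, BK_DICT };

/*
 * Guess the kind of the bencoded value starting at buf from its
 * first byte: 'i' integer, 'l' list, 'd' dictionary, digit string.
 */
enum bkind
bkindof(const char *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0)
        return BK_INVALID;
    switch (buf[0]) {
    case 'i':
        return BK_INT;
    case 'l':
        return BK_LIST;
    case 'd':
        return BK_DICT;
    default:
        /* Strings start with their decimal length. */
        return isdigit((unsigned char)buf[0]) ? BK_STR : BK_INVALID;
    }
}
```

With something like this, a wrapper could route both 'l' and 'd' to the same list parser and just tag the result differently.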
|||
|
|||
i never trust HDT
always russian hax0rz in that pool |
|||
|
|||
Then that's a good way to prove this library is strong as a grizzly!
|
|||
|
|||
A typo, he meant DHT (distributed hash table)
|
|||
|
|||
It could be good to have a small client along with it, just like BearSSL has brssl. At least for testing that it works across the whole chain.
If libgbt needs other libraries, this opens a library chain: someone writes a wrapper library over libgbt to make building a client more "dev friendly", which is in turn used in a GUI widget to stream videos, itself used in a larger program. What about an open and free implementation of sha1.c and md5.c in the repo? Maybe it could skip md5 support entirely, as it is quite deprecated in the RFC. |
|||
|
|||
Having a client shipped with it is indeed a good idea. The library is too young yet though.
Funny that you're talking about a free impl. of SHA1, as I just started exporting sha1.c from libtomcrypt! For MD5, we could do the same if needed. |
|||
|
|||
(05-08-2017, 06:18 PM)z3bra Wrote: The library is too young yet though.
A tool to produce a bencoded .torrent metainfo file is already useful, and would allow testing bencoding and sha1 against other software already. First little reward. :)
(30-07-2017, 08:45 PM)z3bra Wrote: "ben" as a prefix for everything
The "ben" prefix is fine for me. Would maintaining a list of prefixes in comments make it easier for devs new to the project?
[EDIT]: .torrent metainfo file != torrent file |
|||
|
|||
r4ndom is working on the bencode() function right now. I wrote a small code to dump a torrent's content to stdout, but it's not really useful. For such tools, I don't think it's a good idea to include them in the repo, as we'll end up with many tools for no practical reason. I prefer keeping them outside the tree until the lib is feature complete.
As for the prefixes, I settled on simply 'b' for bencoding related functions. And no, a list of prefixes in a comment shouldn't be needed, as it should be obvious when reading existing code. That's also why the first patches someone submits should be reviewed by existing contributors. |
|||
|
|||
(06-08-2017, 07:10 AM)z3bra Wrote: As for the prefixes, I settled on simply 'b' for bencoding related function
Perfect :)
(06-08-2017, 07:10 AM)z3bra Wrote: to include them in the repo
Maybe some could be turned into tests.
It may be a bit early to think about it, but this came up on its own while reading about BitTorrent: once we start to transfer data from a peer, how do we store it? Here is a proposal: store the parts in files and directories. Parts get downloaded to a memory buffer, and once one is complete, it gets saved to disk as <hash of torrent>/<hash of the part>:
Code: |-- 2072a695613e5103d9ac03c2885c5e2656cb5ff0 # hash of the torrent #1
Advantages:
Disadvantage:
To overcome this, it would be possible to store every single part in a unique dir shared by all the torrents, but then a race condition could occur if two processes/threads download the same part at the same time and both try to write it to disk. Instead, before starting a download, a worker could look in the parts directories of the other torrents to see if the part already exists. I would go the simplest way, with one directory per torrent, which still permits optimizations. [EDIT] torrent != parts of a torrent |
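The one-directory-per-torrent layout boils down to building a path out of two hashes. Here is a sketch, assuming 20-byte SHA-1 digests printed as lowercase hex; partpath() and hexify() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASHLEN 20 /* SHA-1 digest size in bytes */
#define PATHLEN (2 * HASHLEN + 1 + 2 * HASHLEN + 1) /* hex, '/', hex, NUL */

/* Write the hex form of a HASHLEN-byte digest into out (2*HASHLEN+1 bytes). */
static void
hexify(const unsigned char *hash, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < HASHLEN; i++) {
        out[2 * i] = digits[hash[i] >> 4];
        out[2 * i + 1] = digits[hash[i] & 0x0f];
    }
    out[2 * HASHLEN] = '\0';
}

/* Write "<hash of torrent>/<hash of part>" into path (PATHLEN bytes). */
void
partpath(const unsigned char *torrent, const unsigned char *part, char *path)
{
    assert(torrent != NULL && part != NULL && path != NULL);
    hexify(torrent, path);
    path[2 * HASHLEN] = '/'; /* overwrite the NUL left by hexify() */
    hexify(part, path + 2 * HASHLEN + 1);
}
```

A worker checking whether a part already exists elsewhere would only need to swap the directory component and test for the file.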
|||
|
|||
After talking a bit on ##bittorrent @freenode, I learned how clients seem to implement it:
Some put parts in multiple files in some way or another (like above). But most put the parts directly in the torrent file: 1 - Write parts at the beginning of the torrent file (the full data blob, not the .torrent metainfo file), and sort them as they come:
2 - Or they allocate storage for the file (such as an empty 2GB file) and fill it with the parts as they come, writing them at the correct offset. This way is much simpler: as you have a list of which part goes where, there is no sorting involved: look up where the part should go, and you know where to write it. With this latter approach, in the case of multifile torrents:
These two approaches (1- and 2-) have an advantage: no need to keep the part files (which cost a lot of storage [EDIT] and inodes). On the other hand, if the final file is moved, it cannot be seeded anymore. If it were me, I would still do one file per part, but you are not me. :) |
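For reference, approach 2 (preallocate, then write each part at its offset) could look roughly like this on a POSIX system. blob_open() and blob_writepiece() are hypothetical names, and the sketch assumes a single-file torrent where every part except possibly the last has the same length.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() and ftruncate() */

#include <assert.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Create path and reserve total bytes for it; returns an fd or -1. */
int
blob_open(const char *path, off_t total)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    /* Extends the file with zeros (usually sparse on disk). */
    if (ftruncate(fd, total) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Write part number idx (len bytes) at offset idx * piecelen.
 * Returns 0 on success, -1 on error.
 */
int
blob_writepiece(int fd, size_t idx, size_t piecelen, const void *data, size_t len)
{
    off_t off = (off_t)idx * (off_t)piecelen;
    const char *p = data;

    assert(data != NULL);
    /* pwrite() may write less than asked, so loop until done. */
    while (len > 0) {
        ssize_t w = pwrite(fd, p, len, off);

        if (w <= 0)
            return -1;
        p += w;
        off += w;
        len -= (size_t)w;
    }
    return 0;
}
```

Parts can arrive in any order, since each one carries its own index. pwrite() does not move the file offset, so several workers can share one descriptor.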
|||