I'm using the latest version of phonetrack 0.1.1 on Nextcloud 12.0.4.
I installed the app and limited it to a group (so that only a few people can use it). Then I've noticed that the GPS Logger app on Android can't upload the data. I checked the debug log of the Android app and saw the following error: Could not send to custom URL.
I tried to send the request manually and got the following message from nextcloud:
Access forbiddenApp is not enabled
There are no errors in the nextcloud.log, the web server log, and the PHP log.
As soon as I uncheck the Limit to groups box, everything works.
Designs
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
This is a serious problem. Please bear with me here. (I believe you closed this too early.)
I'll try to explain:
The current situation makes it possible to pollute the database with millions of records. Everybody who has an account can now use the PhoneTrack application.
If you collect data every minute, you have about half a million records per year. This is about 40-45 MB (estimated from an average record length I saw for a track point). This is one session, one device. Now please imagine 1,000 users doing this with several devices, because they also want to track their family members and friends. Whoa. Now we are in deep trouble. The tables do not even have indexes except for the primary keys, which don't help, if you filter for session and/or device.
So now we have a table with billions of records and no proper indexes (even with indexes this is too much to handle unless you use Range Partitioning and Clustering). A simple statement will bring the nextcloud database to a crawl and every other functionality of nextcloud will have a performance problem.
I love this application and I was looking forward to using it. But I'm to afraid making it available to everyone.
Please let's keep this issue open until there's nothing left to discuss or say.
Ok if i understand what you say : the problem is you really want to restrict PhoneTrack to a user group to avoid seeing your database grow too much.
Indeed there is no restriction on the amount of data stored by users in PhoneTrack... There is no relation with files quota. I never thought about that aspect.
As i wrote in #73 (closed), the problem is related to public page access when an app is restricted to a group : this Nextcloud issue.
Do you have any idea to bypass this problem ?
I think the page reached by logging URL must be public as not all logging apps are designed to send authentication information on each logging attempt. Even if they could, it would not be easy to perform a Nextcloud authentication before logging a point...
Yes, this is the main reason for limiting the PhoneTrack app. Although there is also a use case for people not having access or seeing the app.
I wouldn't care, if a separate DB was used or if it was a standalone web app (not part of nextcloud).
However, tracking/logging can grow the database in size considerably, although the main issue might be the number of rows. I'm not even sure, if a relational database is the best way to handle this kind of data. I think MongoDB would be better suited. Anyhow, there are ways to tune the db structure and add indexes to make queries (for filtering) faster.
I might have an idea how to bypass the current restriction. What about splitting your app in 2 parts. One is the API part (to log the GPS points), the other one is the GUI part. In this case you can make the API available to everyone and the GUI only available to certain groups (or also everyone). Without the GUI noone can create a session and thus a link for logging apps.
Would that work?
Your solution would work but would not be very satisfying... It would add complexity and make it less natural for the NC admins to install PhoneTrack.
It would be nice if an app could just declare it wants its public pages to be accessible even if there is a group restriction. NC was not designed for that case.
Let's consider options that don't involve group restriction and that will solve DB-related issues :
PhoneTrack could create an SQLite DB in each user file storage. This way it would not "polute" NC DB and the amount of data would be limited by files quota... Only problem is performance with SQLite.
Your solution would work but would not be very satisfying... It would add complexity and make it less natural for the NC admins to install PhoneTrack.
Yes, the complexity of the app would increase, but I don't think that an NC admin had any issues installing 2 apps instead of one. That's rather trivial.
It would be nice if an app could just declare it wants its public pages to be accessible even if there is a group restriction. NC was not designed for that case.It would be nice if an ap
Maybe there's a way to suggest a change to nextcloud. I mean currently it seems to be possible to have certain app URLs to be public while others are only accessible after authentication. At least it looks that way, otherwise the GUI would be accessible by everyone all the time. The fact that restricting an app to groups changes this behavior looks like an arbitrary restriction or bug.
Let's consider options that don't involve group restriction and that will solve DB-related issues :
PhoneTrack could create an SQLite DB in each user file storage. This way it would not "polute" NC DB and the amount of data would be limited by files quota... Only problem is performance with SQLite.
In theory this sounds like a good idea. Are you talking about one SQLite db file per user or per session?
Unfortunately this might introduce other problems. If you create a SQLite db file in the user's file storage, wouldn't that file be synced when using a sync client? So every time there's an update (let's say 60 seconds as it is the default for GPS Logger), the entire DB file will be synced. This would certainly be a problem, especially, if you had sync clients on different machines.
Also, as you mentioned, SQLite could turn out to be a performance bottleneck. I haven't used SQLite with millions of records. On the other side, we don't need multi-user capability (concurrency) so it could be alright. I believe some tests should be done to figure that out.
Even, if you were to use a separate MySQL database, you still would see cache pollution. All databases within a MySQL instance use the same memory space for their bufferpools. Therefore, a large table scan or even index scan would fill the memory with the GPS data while evicting "nextcloud" data at the same time.
Using SQLite would solve this problem, but I don't know how SQLite works internally. Let's say you have a 200MB SQLite db file and run an SQL query to retrieve the GPS points for a session or device. Is all data transferred into the PHP memory space? If that's the case, it is pretty much unusable, since this transfer would be done for every request.
I feel that's not so trivial to explain why the app has been splitted in two parts and that just one can be group-restricted and the other one must not... It looks like a dirty fix that brings confusion for everyone including me. I don't know if i can make all that appear like if there was just one app for the end users...
Maybe i'll post a new issue about that in Nextcloud's github project but considering the lack of answer of this issue, our chances of seeing that evolve are small.
Are you talking about one SQLite db file per user or per session?
One DB per user seems much easier to manage considering features like "moving a device from one session to another".
One thing that bothers me about making several databases is the management of features related to sharing. I don't want to explore all users DB to search for shared sessions. Having one single DB with all the information is the natural way to do it.
wouldn't that file be synced when using a sync client?
I think there's a way to keep some user files out of the "Files app storage" so that they are not sync. But maybe they are not considered in user space quota.
Is all data transferred into the PHP memory space?
SQLite is not that bad in terms of performance. I've seen big queries executed quickly and without using much RAM. On my NC instances with SQLite, in PhoneTrack, getting thousands of points from the DB is clearly not the bottleneck. Network transfer and graphic rendering take way more time.
To conclude, i don't think i want to use another DB than the main NC one which is made to store user's structured data. I think the real problem is about NC not letting public pages accessible when an app is group-restricted. I can push performance optimization further by trying to make data smaller in the DB etc... but it seems to me that having DB performance problems because there are too many points produced by PhoneTrack is out of my responsibility. MySQL and PostgreSQL are supposed to be efficient even with thousands of rows and it's up to NC admins to adapt their hardware/software architecture to the load generated by their users. I could warn them that PhoneTrack can cause a big increase of their DB size but that's inevitable, it's the PhoneTrack's purpose to generate data. An admin can't avoid having a user who adds millions of contacts in Contacts app. If they were able to restrict the app to a group to limit this amount of data, i would consider it satisfyingly enough .
You are talking about thousands of rows, while I'm talking about millions, potentially billions of rows.
When someone is the only one who is using the nextcloud instance or only has a few users, I wouldn't be too worried either.
But as I pointed out in my previous comment, all this can change when you have a lot of users.
I agree that relational databaase systems are supposed to be efficient, but they won't be, if indexes are not correctly set. Currently the database will do a table scan, if you filter the results and I hope I do not have to explain, why a table scan is bad.
Here are a few suggestions to make the table structure more efficient (performance wise):
change the type of deviceid in PREFIX_phonetrack_points from varchar(100) to int(11) and create a foreign key to column id of PREFIX_phonetrack_devices (I'm not sure why this field is a varchar in the first place, since it only holds the id which is in PREFIX_phonetrack_devices)
create an index on deviceid in PREFIX_phonetrack_points
create an index on timestamp in PREFIX_phonetrack_points
Should you ever decide to create a standalone application, I'm more than happy to work on the database backend and interface. I'd probably go for MongoDB where it is quite easy to create partial indexes (e.g. only for the last 200,000 records).
Nicely spotted ! I forgot to change the type of deviceid column in oc_phonetrack_points. In the early days of PhoneTrack, when it was just a proof-of-concept, the deviceid was the device name. I changed this in the controller but forgot to adapt the column type. It's now done.
I'll probably publish v0.2.1 very soon as v0.2.0 is full of mistakes...
Indexes, are mentioned though. My knowledge about databases does not go that far. I'll have to read things about that. You can probably make a first hole in the fog for me : When there's an index for a column, does it change the way queries are written to filter on this column ?
I don't think i'll ever create a standalone PhoneTrack-like application. I'm quite happy to benefit from the authentication/storage/sharing system of Nextcloud. Besides, if indexes are supported by Sqlite, MySQL, MariaDB, PostgreSQL and if it's possible to create indexes in a generic way in Nextcloud apps then PhoneTrack can use them.
If you want to make changes in some queries or the database schema, you're more than welcome. If your commits pass the CI tests, i'll be glad to accept your merge requests. I might be faster than you to put indexes into PhoneTrack tables if i grasp the concept quickly.
Thanks a lot for your interest and your advices BTW.
Indexes are now created for deviceid and timestamp in oc_phonetrack_points table. I don't see a drastic performance improvement when selecting 70000 points but i only tried with SQLite. The query takes 3 or 4 seconds.
At the end of this article, they say it slows down insertion (which is pretty logic). I tried to import a large file with 100000 points and didn't see a big change.
I wish I had a whiteboard and you in the same room as me. I'm not quite sure how proficient I will be to explain this in a post.
Furthermore I will have to go through your current database structure and your queries to make more suggestions. As far as I could see there's no relation (1:n or n:m) from sessions to devices. Thus I'm not completely sure how much of a performance impact certain indexes will have.
When data is inserted into a table in a database, the data will be physically stored in sequence (that is if you are not using clustered tables and/or indexes). In such a case, if you now want to retrieve all rows that fulfill a certain criterion (a where clause), the database engine has to retrieve all records in the table to return the correct result set. When an index exist on that criterion, only a subset of the data has to be retrieved for the result set. See an index as a reference guide to the physical data in a table.
As an example, you want to get all rows where the decviceid equals X. In your table you inserted records in the following sequence (I only list the deviceid for this example): 1,1,2,3,2,6,3,3,1,2,8,1
Now you want all records for deviceid 1. The engine does not know that at the end there is a 1 or where records are stored with deviceid 1, thus it will have to retrieve the entire table (table scan) to return the correct rows. If you have an index on deviceid, it will look like this (1 -> 1,2,9,12 | 2 -> 3,5,10 | 3 -> 4,7,8 | 6 -> 6 | 8 -> 11). This is very simplified but you get the idea. If you want all records for deviceid 1, the index will tell you that the data is available on positions 1,2,9,12) and it is right in the beginning of the index. So there's no reason to read the entire table and even the entire index.
This also has advantages if you have a start and/or a stop predicate (where deviceid > 3 and deviceid < 8).
Now imagine a query where you want deviceid = X and timetamp < (2 days ago). In this case a combined index would be even more efficient. Anyway, I believe this would go to far to elaborate here, but suffice to say index placement should be done in accordance to the queries.
While most database systems support interesecting indexes, some do not. (So even if you had an index on deviceid and another index on timestamp, the performance would be almost as fast as having an index on both columns: index on (deviceid,timestamp).)
And yes, you are correct. Having indexes makes the insert slower. It makes sense, if you think about it, because the database engine does not only have to insert the data into the table, but also has to maintain the index (insert references). However, this application does not have a high transaction load, where you have to insert thousands of records per second. So the insert performance is not the issue. With this application the query performance is more important.
As mentioned before, I will have to go through your structure again, because if there's no relation between session and device, queries might still be ineffecient. That's because of the fact that the engine does not know which devices belong to a session.
Please let me know, if you have any specific questions.
because if there's no relation between session and device
What do you mean by relation ? Relation in database terms (foreign key) ? Relation in real life terms ? Actually there is a relation, device's sessionid is session's token. I kept old column names to avoid having to migrate the values from the old one to the new. I admit it would be clearer if sessionid was more explicit like sessiontoken.
Anyway the bottleneck is not about session-device relation. The only frequent big query is the one to get the points. There are much less queries to get devices than to get points. Don't you think ?
When I create a tracking session in the GUI, I can also add several devices to this session.
When I now click on the session in the GUI to watch this session, the gps data for this session has to be retrieved from the database. But the table oc_phonetrack_points does not have a sessionid field. Thus I need to know, which devices belong to a certain session.
I couldn't find a relation, but maybe I just missed it.
Furthermore I am extremly puzzled by the automatic refresh (5 secs by default) in the GUI. As I have earlier mentioned, the size of one year's data for one device is about 40MB. Does this mean that every 5 seconds these 40MB are retrieved from the database? What if the session has more than one device, e.g. 3 devices and there's gps data available for 3 years? All of a sudden, we have 360MB to transfer every 5 seconds.
This seems truly excessive. Even, if I set the refresh rate to 600 seconds, I'd have to retrieve 360MB (+delta) for maybe 10-20 additional (new) GPS points - and this is done every 10 minutes.
IMO the retrieval algorithm should only retrieve the delta (the data that is newer than the last refresh).
There's no way to turn off the automatic refresh. When I set it to 0, it uses the default of 5 seconds.
Is there any chance you could turn off the automatic referesh, when 0 is selected?
There is a database relation. Otherwise it would be impossible to get the points for a session
Point->deviceid ==> Device->id
Device->sessionid ==> Session->token
The UI remembers the oldest and most recent point for each device. On refresh, it just asks for points older than the oldest or more recent than the most recent. "Refresh" should probably be named "Update points".
Did you use PhoneTrack a lot ? It seems you assume more than observe .
Several aspects of PhoneTrack make it possible to deal with huge amounts of points like you do :
filters : i always keep filtering active with just the 7 last days. If i need to see more, i can then adjust the filters. On server's side, timestamp filters are applied to what's sent to reduce the amount of data to download.
i recently changed the default behaviour : when activating a session, lines and points are disabled for all devices. So the UI just gets the last point of each device which is faster than light.
You can set automatic refresh to 10000000 to almost disable it but yes, it seems a good idea to disable it when it's set to 0.
Please submit a new issue when the subject becomes too different.
Otherwise it would be impossible to get the points for a session
Haha, exactly. That's what I thought.
Did you use PhoneTrack a lot ? It seems you assume more than observe .
You are absolutely right. I haven't used it that much, since I had to deactivate it temporarily because of the nextcloud problem. I cannot make it available to all my nextcloud users thus I will have to find some sort of a workaround.
I also mentioned I would have to check the code in more detail (to understand the database access), but unfortunately I haven't had the time to do so yet. Thus all my dumb and annoying questions...
I will setup a test instance of nextcloud to run a few other tests (SQL injection, edge cases, ...). I'm always kind of paranoid when it comes to an open API endpoint.
You can set automatic refresh to 10000000 to almost disable it but yes, it seems a good idea to disable it when it's set to 0.
It still does the count down thingy when set to a high number, which drives me crazy...
You can now disable auto refresh by setting the value to an empty string, 0 or whatever that is not a number.
This is great. Thank you. This app is truly awesome. I just hope I can find a workaround for this ticket's original problem.
Every string input in SQL query is protected with dbconnection->quote() generic escape method provided by ownCloud/Nextcloud API. Every string variable in the UI is wrapped in escapeHTML Javascript function.
I recently wrote controller-side tests. In those tests, i try a lot of edge cases. Tests are executed by the continuous integration system. If you want to have a look : tests/php/controller/pageControllerTest.php.
I realize i didn't test SQL injection resistance in the tests. Thanks for mentioning it ! I'll probably add it soon. Feel free to add tests. If you fork this repo, CI tests will be executed every time you push to your master branch. No need to deploy a test environment locally.
Thanks for the compliment, it became progressively better thanks to a lot of issues like yours.
About the group restriction issue, i think we have very few chances to see it move on Nextcloud's development side. Maybe we should submit another issue and post something in help.nextcloud.com.
The only workaround i see is to split the app like you said. I'm still not a big fan of this solution .
Thanks for the info. I currently don't have time to add test cases, but I'll keep it in mind.
About the group restriction issue, i think we have very few chances to see it move on Nextcloud's development side. Maybe we should submit another issue and post something in help.nextcloud.com.
Yes, I think that's a good idea. IMO it's a bug in nextcloud, since it is an arbitrary restriction.
The only workaround i see is to split the app like you said. I'm still not a big fan of this solution .
I understand your reluctance. I don't particularly like it either. It was the only workaround I could think of at the time. Maybe there's another one. Or even better yet: nextcloud fixes their shitty behaviour.
Guess what : I fixed Nextcloud core source code. Non-logged users can access public pages of group-restricted apps. It was very simple to fix and it's not a dirty fix. I wonder why nobody reacted to the two issues.
I still need to perform unit tests and submit a pull-request for that fix but i see the light at the end of the tunnel...
...with a tiny little problem, logged-in users who are not in the authorized group cannot access public pages. The routing system redirects them to "Files" app. I didn't find where to fix that yet.
I just hope the Nextcloud devs take the PR seriously. Also this part: Logged-in users who are not in the authorized group cannot access public pages of an app. They are redirected to "Files" app.
I'm sure it would be a matter of a few minutes for core devs to figure out what to change, but I am afraid they rather develop new features. In certain cases I truly wonder if any of the designers put a rational thought in the architecture. Anyway, let's hope for the best.
Congratulations, this is awesome. Although I'm not quite sure, why they don't add that fix to the next point release for 13. I definitely will fix my 12.0.4 manually. ;-)
From what they added, i think the second problem (Logged-in users who are not in the authorized group) is still there.
I'm puzzled why the core devs didn't have a quick look at that. They could've at least pointed in the right direction. Right now I can only think of using xdebug to run a php trace.... :-(
Hmm, your original commit was changed. According to the code, the current change might fix the second problem as well. But I haven't tested it yet and my understanding of the nextcloud code is insufficient to make that determination just by reading it.
I've tried with NC14 beta 3. I confirm it's now possible to access public pages from an app which is limited to a group when not logged in. Yeepa !
I also tried to access a public page when logged in as a user who does not have access to the app and...it does not work. Request is routed to "Files" app. It's not a blocking issue as there are very few chances a request to PhoneTrack logging URLs is done by something/someone logged in (moreover as a non-allowed user).
I've tried with NC14 beta 3. I confirm it's now possible to access public pages from an app which is limited to a group when not logged in. Yeepa !
This is great news.
I also tried to access a public page when logged in as a user who does not have access to the app and...it does not work. Request is routed to "Files" app. It's not a blocking issue as there are very few chances a request to PhoneTrack logging URLs is done by something/someone logged in (moreover as a non-allowed user).
This is not so great news. A public page is public, even for logged in users. Is this an issue with Nextcloud and has to be fixed server side? Or is this an app issue?
Is this an issue with Nextcloud and has to be fixed server side?
Yes it is caused by the routing system of Nextcloud core. Nothing can be done on the app's side, pages are declared public or not and Nextcloud does the rest.
The fix for the first aspect (access when not logged) was quite easy to implement. It was just a blocking condition in the security system.
The second one (logged in users) is trickier. We need to spot the part of the routing system that ignores page/app access rights. Maybe i'll dive into this part to fix it one day. It seems Nextcloud's core team is not very concerned about this issue.
It seems Nextcloud's core team is not very concerned about this issue.
Yes, Nextcloud developers want to make money by creating new features instead of fixing bad design issues. The priorities in the Nextcloud project are way off and it's not going to change. I like the product (even though I think a 5 year old came up with some of the design), but I highly dislike the development practice and their attitude.