├── .github
│   └── workflows
│       └── python-publish.yml
├── LICENSE
├── README.md
├── channel_analyse.py
├── config.json
├── nltk_analyse.py
├── requirements.txt
├── stopwords_list.py
├── telanalysis.py
├── utils.py
└── words_analyze.py

/.github/workflows/python-publish.yml:
--------------------------------------------------------------------------------
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build
    - name: Build package
      run: python -m build
    - name: Publish package
      uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. <> 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 
7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 
43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". 
"Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 
117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 
146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 
180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 
216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 
246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. 
If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 
309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 
336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 
360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. 
If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 
428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. 
If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 
486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 
512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. 
If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 
578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 
613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | 635 | Copyright (C) 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 
651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | Copyright (C) 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# TelAnalysis

![image](https://user-images.githubusercontent.com/107117398/223553327-a0ef0115-6cfe-4c38-9f0b-67062354a79c.png)
![image](https://user-images.githubusercontent.com/107117398/223553309-ba92ee44-ff54-4e3e-b49a-70596cde4198.png)
![image](https://user-images.githubusercontent.com/107117398/223553300-a5874615-fe67-4f8d-a042-df3aa5e3b0e6.png)
![image](https://user-images.githubusercontent.com/107117398/209858730-fe6ff0a3-9fcd-4d13-be6a-3f2a6bdd198b.png)

## Description

TelAnalysis is a tool for analysing messages in Telegram chats, groups and channels. It extracts message text, identifies keywords, analyses sentiment, and extracts contact information such as email addresses and phone numbers.

## New features and improvements

1. **Sentiment analysis**:
- Added sentiment analysis for each message, displayed next to the text.
- Average sentiment per user, displayed above the list of top words.
- Overall sentiment analysis across all messages in the chat (or channel).

2. **Improved message processing**:
- Fixed errors in the average sentiment calculation to avoid exceptions during summation.
- Added data type checks to ensure that processed values are valid.

3. **Contact information extraction**:
- Added extraction of email addresses and phone numbers from messages.

## Installation

1. Clone the repository:

```bash
git clone https://github.com/krakodjaba/TelAnalysis.git
```

2. Install the dependencies:

```bash
pip install -r requirements.txt
```

## Running

Run the script:

```bash
python3 telanalysis.py
```

## Contributing

If you would like to contribute, please fork the repository and submit a pull request.

## Donations

If you like the project and want to support its development, you can donate for a coffee! ☕️
tg@e_isaevsan
```
) (
( ) )
) ( (
mrf_______)_
.-'---------|
( C|/\/\/\/\/|
'-./\/\/\/\/|
'_________'
'-------'
```

Thank you for your support!

--------------------------------------------------------------------------------
/channel_analyse.py:
--------------------------------------------------------------------------------
import json
import jmespath
import nltk_analyse
import utils
import time
from pywebio import input, config
from pywebio.output import put_html, put_text, put_image, put_table
from wordcloud import WordCloud


# Read the configuration
select_type_stem = utils.read_conf('select_type_stem')
most_com = utils.read_conf('most_com_channel')

def channel(filename):
    # Reduce the uploaded path to its base name (no directory, no extension)
    filename = filename.split(".")[0].split("/")[1]
    with open(f'asset/{filename}.json', 'r', encoding='utf-8') as f:
        jsondata = json.load(f)
    name_channel = jmespath.search('name', jsondata)

    # Display the channel name
    put_html(f"{name_channel}")
    messages_find = jmespath.search('messages[*].text', jsondata)

    text_list = []

    # Process the channel messages
    for message in messages_find:
        if isinstance(message, list):
            # A message may be a list of plain strings and formatted fragments
            for mes in message:
                text = jmespath.search('text', mes) or mes
                text_list.append(utils.remove_emojis(text))
        else:
            message = message.replace(" ", " ").replace("\n", "").replace("\t", "").strip()
            if len(message) > 4:
                text_list.append(utils.remove_emojis(message))

    # Analyse the text
    fdist, tokens = nltk_analyse.analyse(text_list, most_com)
    all_tokens = list(tokens)
    all_tokens, data = nltk_analyse.analyse_all(all_tokens, most_com)

    # Generate the word cloud
    text_raw = " ".join(data)
    wordcloud = WordCloud().generate(text_raw)
    filename_path = f'asset/{filename}_wordcloud.png'
    wordcloud.to_file(filename_path)

    # Display the result
    with open(filename_path, 'rb') as img_file:
        img = img_file.read()

    time.sleep(2)
    put_text(f"Wordcloud[{most_com}]:")
    put_image(img, width='600px')
    put_text(f"\nCount of all tokens: {len(tokens)}")
    put_text(f"\nChannel frequency analysis[{most_com}]:")

    # Format the data for the table
    gemy = [[x, y] for x, y in all_tokens]
    put_table(gemy, header=['word', 'count'])

--------------------------------------------------------------------------------
/config.json:
--------------------------------------------------------------------------------
{"select_type_stem": "Off", "most_com": 30, "most_com_channel": 100}

--------------------------------------------------------------------------------
/nltk_analyse.py:
--------------------------------------------------------------------------------
import json
import re
import string
import collections
import nltk
from nltk import word_tokenize
from nltk.probability import FreqDist
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from utils import remove_chars_from_text, remove_emojis, read_conf
import stopwords_list

# Initialise the stemmer
stemmer = SnowballStemmer("russian")

# Special characters to strip from the text
spec_chars = string.punctuation + '\n\xa0«»\t—…"<>?!.,;:꧁@#$%^&*()_-+=№%༺༺\\༺/༺-•'

# Service actions that should be ignored in the text
action_map = ['Invite Member', 'Kicked Members', 'Joined by Link', 'Pinned Message']

def analyse(data, most_com):
    # Set up the stop words
    russian_stopwords = stopwords.words("russian")
    russian_stopwords.extend(['это', 'ну', 'но', 'еще', 'ещё', 'оно', 'типа'])
    english_stopwords = stopwords.words("english")

    # Lowercase the text and remove unwanted characters
    text = str(data).lower().replace("'", "").replace(",", "").replace("[", "").replace("]", "").replace("-", " ")

    for action in action_map:
        text = text.replace(action.lower(), "")

    text = remove_chars_from_text(text, spec_chars)
    text = remove_chars_from_text(text, string.digits)

    # Make sure the text is not empty
    if len(text) < 1:
        return [], []

    # Tokenise the text
    text_tokens = word_tokenize(text)

    # Stem the tokens if stemming is enabled
    if read_conf('select_type_stem') == 'On':
        text_tokens = [stemmer.stem(word) for word in text_tokens]

    # Filter the tokens (3 to 25 characters, no stop words, no links)
    text_tokens = [token.strip() for token in text_tokens if
                   token not in russian_stopwords and
                   len(token) >= 3 and
                   len(token) < 26 and
                   token not in english_stopwords and
                   'http' not in token and
                   token not in stopwords_list.stopword_txt]

    # Frequency distribution
    text = nltk.Text(text_tokens)
    fdist = FreqDist(text)
    fdist = fdist.most_common(most_com)

    return fdist, text_tokens

def analyse_all(data, most_com):
    # Set up the stop words
    russian_stopwords = stopwords.words("russian")
    english_stopwords = stopwords.words("english")
    russian_stopwords.extend(['это', 'ну', 'но', 'еще', 'ещё', 'оно', 'типа'])

    # Lowercase the text and remove unwanted characters
    text = str(data).lower().replace("'", "").replace(",", "").replace("[", "").replace("]", "").replace("-", " ")
    text = remove_chars_from_text(text, spec_chars)
    text = remove_chars_from_text(text, string.digits)
    #text = remove_emojis(text)

    if len(text) >= 1:
        text_tokens = word_tokenize(text)
    else:
        return [], []

    # Stem the tokens if stemming is enabled
    if read_conf('select_type_stem') == 'On':
        text_tokens = [stemmer.stem(word) for word in text_tokens]

    # Filter the tokens (4 to 25 characters, no stop words, no links)
    text_tokens = [token.strip() for token in text_tokens if
                   token not in russian_stopwords and
                   len(token) >= 4 and
                   len(token) < 26 and
                   token not in english_stopwords and
                   'http' not in token and
                   token not in stopwords_list.stopword_txt]

    # Frequency distribution
    text = nltk.Text(text_tokens)
    fdist = FreqDist(text)
    fdist = fdist.most_common(most_com)

    # Keep only the words, without their counts
    data = [i[0] for i in fdist]
    return fdist, data

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pywebio
jmespath
nltk
networkx
matplotlib
emoji
wordcloud
phonenumbers
validate_email
pandas
vaderSentiment
requests
PyQt6
PyQt6-Qt6
PyQt6-sip

--------------------------------------------------------------------------------
/stopwords_list.py:
--------------------------------------------------------------------------------
stopword_txt = [
    "а", "без", "белый", "больше", "большой", "будем", "будет", "будешь",
    "буду", "будут", "будь", "бы", "бывает", "был", "была", "были",
    "было", "в", "ваш", "все", "всем", "всех", "всего", "вы", "где",
    "да", "даже", "два", "долго", "друг", "для", "е", "его", "ее",
    "ей", "ему", "если", "есть", "здесь", "и", "или", "им", "к",
    "как", "когда", "кого", "ком", "кто", "мы", "на", "наш", "не",
    "нет", "ни", "один", "одиннадцать", "она", "они", "оно", "опять",
    "от", "по", "потом", "просто", "с", "сам", "свой", "так", "также",
    "там", "тебя", "то", "только", "у", "хотя", "что", "чтобы", "я",
    "являюсь", "все", "это", "этого", "это", "этой", "этим", "это",
    "такой", "все еще", "весь", "где-то", "зачем", "такой", "чуть",
    "им", "вместе", "сейчас", "тоже", "то", "другой", "вдруг", "и т.д."
]

--------------------------------------------------------------------------------
/telanalysis.py:
--------------------------------------------------------------------------------
#Telanalysis by Eduard Isaev @e_isaevsan

from pywebio import start_server, input, config
from pywebio.output import put_html,put_text,put_image, put_button,put_table, put_collapse, put_code, clear, put_file,Output, toast
from pywebio.input import file_upload as file
from pywebio.session import run_js
from pywebio.input import select, slider
import json, re, jmespath, string, collections, time
from utils import remove_chars_from_text, remove_emojis, clear_user, clear_console, read_conf, write_conf,open_url
import nltk_analyse, channel_analyse
import networkx as nx
import matplotlib.pyplot as plt
import sys
import matplotlib
matplotlib.use('Agg')

global select_type_stem


## config pywebio
config(theme='dark',title="TelAnalysis", description="Analysing Telegram CHATS-CHANNELS-GROUPS")


def generator(filename):
    import collections  # Imported here as well, since it is used in this function
    import networkx as nx
    import matplotlib.pyplot as plt

    tables = []
    clear_console()
    filename = filename.split(".")[0]
    filename = filename.split("/")[1]
    dates_list = []
    names = []

    # Start edges.csv with its header row
    with open(f'asset/edges_{filename}.csv', 'w', encoding='utf-8') as edges_file:
        edges_file.write("source,target,label")

    with open(f'asset/{filename}.json', 'r', encoding='utf-8') as f:
        jsondata = json.load(f)
    group_name = jmespath.search('name', jsondata)
    put_html(f"{group_name}")
    sf = jmespath.search('messages[*]', jsondata)
    toast(content='Wait Result..', duration=0)

    for message in sf:
        fromm = jmespath.search('from', message)
        if fromm is None:
            continue
        from_id = jmespath.search('from_id', message)
        date = jmespath.search('date', message)
        dates_list.append(date)

        if from_id in ['source', 'target', None]:
            continue

        name_id = f'{fromm}, {from_id}'
        names.append(name_id)

        text_message = jmespath.search('text', message)
        if isinstance(text_message, list):
            # The text is a list of plain strings and formatted fragments:
            # iterate over the fragments (not over the message dict) and join them
            parts = []
            for textt in text_message:
                if isinstance(textt, dict) and 'text' in textt:  # Fragment dict with a 'text' key
                    test = textt['text']
                elif isinstance(textt, str):  # Plain string, use it as is
                    test = textt
                else:
                    continue

                test = test.replace("\\n", "").replace("\n", "").strip()
                try:
                    parts.append(remove_emojis(test))
                except Exception:
                    parts.append(test)
            message_clean = "".join(parts)
        else:
            try:
                message_clean = remove_emojis(text_message)
            except Exception:
                message_clean = text_message

        if not message_clean:
            continue

        reply_to_message_id = jmespath.search('reply_to_message_id', message)
        if reply_to_message_id:
            # Find the message being replied to and record an edge to its author
            for reply_message in sf:
                message_id = jmespath.search('id', reply_message)
                if reply_to_message_id == message_id:
                    reply_to = jmespath.search('from', reply_message)
                    reply_to_id = jmespath.search('from_id', reply_message)
                    reply_name_id = f'{reply_to}, {reply_to_id}'
                    names.append(reply_name_id)
                    try:
                        with open(f'asset/edges_{filename}.csv', 'a', encoding='utf-8') as edges_file:
                            edges_file.write(f'\n{from_id},{reply_to_id},{fromm}-{reply_to}')
                    except Exception as ex:
                        print(ex)
        else:
            # Not a reply: record a self-loop for the author
            try:
                with open(f'asset/edges_{filename}.csv', 'a', encoding='utf-8') as edges_file:
                    edges_file.write(f'\n{from_id},{from_id},{fromm}')
            except Exception as ex:
                print(ex)

    # Create nodes.csv
    with open(f'asset/nodes_{filename}.csv', 'w', encoding='utf-8') as odin:
        odin.write("id,label,weight")
        c = collections.Counter(names)
        users_table = []
        for i in c:
            # Entries look like "name, id"; split on the last comma and drop the padding space
            name_stroka, id_stroka = i.rsplit(',', 1)
            id_stroka = id_stroka.strip()
            if id_stroka in ['id', 'label', 'weight', 'None']:
                continue

            weight = c[i]
            users_table.append([id_stroka.replace("user", ""), name_stroka, weight])
            odin.write(f'\n{id_stroka},{name_stroka},{weight}')

    # Show the table of users
    put_table(users_table, header=['USER ID', 'USERNAME', 'COUNT'])

    # Visualise the graph
    try:
        G = nx.DiGraph()  # Create the graph

        # Read the nodes
        with open(f'asset/nodes_{filename}.csv', 'r', encoding='utf-8') as nodes:
            for node in nodes:
                node = node.strip()
                if node == "" or node.startswith("id,label,weight"):
                    continue  # Skip the header and empty lines

                parts = node.split(',')
                if len(parts) != 3:
                    print(f"Skipping malformed node line: {node}")
                    continue

                ids, label, weight = parts
                try:
                    weight = float(weight)  # Convert to float
                    if weight < 0:
                        weight = 1
                except ValueError:
                    print(f"Invalid weight value: {weight} for node {label}. Skipping...")
                    continue

                G.add_node(label, weight=weight)
                print(f"Added node: {label} with weight: {weight}")

        # Read the edges (same encoding as used when writing the file)
        with open(f'asset/edges_{filename}.csv', 'r', encoding='utf-8') as edges:
            for edge in edges:
                if 'source,target,label' in edge or 'None' in edge:
                    continue
                source, target, label = edge.strip().split(',')
                G.add_edge(source, target, weight=1.3)

        # Collect node sizes, colours and labels
        sizes = []
        colors = []
        labels = {}

        for n in G.nodes:
            weight = G.nodes[n].get('weight', 1)  # Default to 1 if the node has no weight

            if isinstance(weight, (int, float)) and weight >= 0:  # Only non-negative numbers
                min_size = 50  # Minimum node size
                scale_factor = 10  # Scaling factor

                sizes.append(max(min_size, weight * scale_factor))

                colors.append(weight)
                labels[n] = f"{n} - {weight}"  # Only valid nodes get a label
            else:
                print(f"Invalid weight for node {n}: {weight} (type: {type(weight)})")

        pos = nx.circular_layout(G)  # Node positions
        nx.draw(
            G, pos,
            with_labels=True,
            labels=labels,
            font_weight='bold',
            node_size=sizes if sizes else 300,  # Default value
            node_color=colors if colors else "blue",  # Default value
            cmap=plt.cm.Blues  # Colour map
        )

        plt.savefig(f'asset/{filename}.png', bbox_inches='tight')  # Save the graph to a file
        plt.close()  # Close the figure
    except Exception as ex:
        print(f"Error generating graph: {ex}")

    # Show the dates of the first and last messages
    firstmes = dates_list[0].replace("T", " ")
    lastmes = dates_list[-1].replace("T", " ")
    put_table([[firstmes]], header=['First Message'])
    put_table([[lastmes]], header=['Last Message'])

    # Offer the generated files for download
    try:
        nodes_content = open(f'asset/nodes_{filename}.csv', 'rb').read()
        put_file(f'nodes_{filename}.csv', label='Nodes', content=nodes_content)
    except Exception as ex:
        put_text(f"Error: {ex}")

    try:
        edges_content = open(f'asset/edges_{filename}.csv', 'rb').read()
        put_file(f'edges_{filename}.csv', label='Edges', content=edges_content)
    except Exception as ex:
        put_text(f"Error: {ex}")

    try:
        graph_content = open(f'asset/{filename}.png', 'rb').read()
        put_file(f'{filename}.png', label='Graph', content=graph_content)
    except Exception as ex:
        put_text(f"Error: {ex}")

    put_button("clear", onclick=lambda: run_js('window.location.reload()'))
    put_button("Scroll Up", onclick=lambda: run_js('window.scrollTo(0, 0)'))


def start_gen():
    clear_console()
    clear()
    put_button("Scroll Down",onclick=lambda: run_js('window.scrollTo(0, document.body.scrollHeight)'))
    put_button("Return",onclick=lambda: run_js('window.location.reload()'), color='danger')
    put_html("Graph of Telegram Chat")
    f = file("Select a file:", accept='.json')
    open('asset/'+f['filename'], 'wb').write(f['content'])
    print(f['filename'])
    generator(f"asset/{f['filename']}")


def start_two():
    clear_console()
    clear()
    put_button("Scroll Down",onclick=lambda: run_js('window.scrollTo(0, document.body.scrollHeight)'))
    put_button("Return",onclick=lambda: run_js('window.location.reload()'), color='danger')
    put_html("Analyse of Telegram Chat")
    f = file("Select a file:", accept='.json')
    open('asset/'+f['filename'], 'wb').write(f['content'])
    filename = 'asset/'+f['filename']
    # Run the chat analysis with the current interpreter; passing the arguments
    # as a list keeps the uploaded file name away from the shell
    import subprocess
    subprocess.run([sys.executable, 'words_analyze.py', filename])

def start_three():
    clear_console()
    clear()
    put_button("Scroll Down",onclick=lambda: run_js('window.scrollTo(0, document.body.scrollHeight)'))
    put_html("Analyse of Telegram Channel")
    put_button("Return",onclick=lambda: run_js('window.location.reload()'), color='danger')
    f = file("Select a file:", accept='.json')
    open('asset/'+f['filename'], 'wb').write(f['content'])
    filename = 'asset/'+f['filename']
    channel_analyse.channel(filename)

def config():
    def show_header():
        # Redraw the configuration page with the currently saved values
        clear()
        put_button("Close",onclick=lambda: run_js('window.location.reload()'), color='danger')
        put_html("Configuration")
        put_text(f"select_type_stem: {read_conf('select_type_stem')}")
        put_text(f"most_common: {read_conf('most_com')}")
        put_text(f"most_common_channel: {read_conf('most_com_channel')}")

    while True:
        clear_console()
        # Step 1: stemming mode
        try:
            show_header()
            select_type_stem = select('Stemming mode:', ['Off','On'], multiple=False)
            most_com = read_conf('most_com')
            most_com_channel = read_conf('most_com_channel')
            write_conf({"select_type_stem":select_type_stem, "most_com":most_com, "most_com_channel":most_com_channel})
            toast("Config saved.")
        except Exception as ex:
            toast(f"Error: {ex}")
        # Step 2: number of most common words per user
        try:
            show_header()
            most_com = slider('Most Common words [USER]:')
            most_com_channel = read_conf('most_com_channel')
            write_conf({"select_type_stem":select_type_stem, "most_com":most_com, "most_com_channel":most_com_channel})
            toast("Config saved.")
        except Exception as ex:
            toast(f"Error: {ex}")
        # Step 3: number of most common words per channel
        try:
            show_header()
            most_com_channel = slider('Most Common words [Channel]:')
            write_conf({"select_type_stem":select_type_stem, "most_com":most_com, "most_com_channel":most_com_channel})
            toast("Config saved.")
        except Exception as ex:
            toast(f"Error: {ex}")

def default():
    clear()
    clear_console()
    put_button("Config", onclick=config, color='warning')
    put_html("Welcome to TelAnalysis")
    put_html("Select a module:")
    put_button("Generating Graphs", onclick=start_gen)
    put_button("Analysing Chat", onclick=start_two)
    put_button("Analysing Channel", onclick=start_three)

def starting():
    # Import os before its first use; importing it only inside the loop below
    # made the config check fail and reset config.json on every start
    import os
    clear_console()
    try:
        if not os.path.exists('config.json'):
            write_conf({"select_type_stem": "Off", "most_com": 30, "most_com_channel":100})
        else:
            select_type_stem = read_conf('select_type_stem')
            most_com = read_conf('most_com')
            most_com_channel = read_conf('most_com_channel')
    except Exception:
        write_conf({"select_type_stem": "Off", "most_com": 30, "most_com_channel":100})
    while True:
        import nltk
        nltk.download('stopwords')
        nltk.download('punkt')
        nltk.download('punkt_tab')

        clear_console()
        try:
            os.makedirs('asset', exist_ok=True)
            open_url()
            start_server(default, host='127.0.0.1', port=9993, debug=True, background='gray')
        except KeyboardInterrupt:
            break
        except Exception as ex:
            print(ex)
            break

if __name__ == "__main__":
    starting()

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
import re
import string
import emoji
import os
import subprocess
import platform
import json, jmespath, requests
from pywebio.output import put_html,toast,put_text,put_image,put_collapse, put_button, put_code, clear, put_file, popup, put_table
spec_chars = string.punctuation + '\n\xa0«»\t—…"<>?!.,;:꧁@#$%^&*()_+=№%༺༺\\༺/༺•'

##config telanalysis
def read_conf(option):
    try:
        with open('config.json', 'r') as conf_file:
            conf = json.load(conf_file)
        select_type_stem = jmespath.search(f'{option}', conf)
        return select_type_stem
    except:
write_conf('{"select_type_stem": "Off", "most_com": 30, "most_com_channel":100}') 20 | 21 | def write_conf(dct): 22 | with open('config.json', 'w') as fw: 23 | json.dump(dct, fw) 24 | 25 | def clear_console(): 26 | system = platform.system() 27 | if system == 'Windows': 28 | subprocess.run('cls', shell=True) 29 | elif system == 'Darwin' or system == 'Linux': 30 | subprocess.run('clear', shell=True) 31 | 32 | def open_url(): 33 | system = platform.system() 34 | if system == 'Windows': 35 | subprocess.run(f'start http://127.0.0.1:9993', shell=True) 36 | elif system == 'Darwin' or system == 'Linux': 37 | subprocess.run('open http://127.0.0.1:9993', shell=True) 38 | 39 | def remove_chars_from_text(text, char=None): 40 | if char is None: 41 | char = spec_chars 42 | 43 | # Используем регулярное выражение для замены нежелательных символов на пробелы 44 | pattern = f"[{re.escape(char)}]" 45 | text = re.sub(pattern, ' ', text) # Заменяем спецсимволы на пробелы 46 | text = re.sub(r'\s+', ' ', text).strip() # Удаляем лишние пробелы 47 | return text 48 | 49 | toast(content='Wait Result..',duration=0) 50 | phonenumbers = [] 51 | ids = [] 52 | telegram_ids = [] 53 | firstnames = [] 54 | surnames = [] 55 | emails = [] 56 | trades = [] 57 | social_medias = [] 58 | addresses = [] 59 | technicals_data = [] 60 | tg_id = int(tg_id.replace("user","").replace("channel","").strip()) 61 | if tg_id: 62 | #print(int(tg_id)) 63 | req = requests.post(f'https://osintframework.ru/api/telegram/telegram-user-somevendor', json={"telegram_id": int(tg_id)}, 64 | headers={"Authorization": token}, 65 | timeout=60) 66 | try: 67 | finded_data = req.json()["telegram_id_somevendor"]["finded_data"] 68 | except: 69 | print('error') 70 | print(req) 71 | raise 72 | if len(finded_data) == 0: 73 | toast(content="Can't find result.",duration=1) 74 | pass 75 | else: 76 | if finded_data: 77 | #print(finded_data) 78 | for i in finded_data: 79 | for j in i: 80 | if 'phone_number' in j: 81 | phonenumber = i[j] 82 | 
if phonenumber not in phonenumbers: 83 | phonenumbers.append(phonenumber) 84 | phonenumberss = '\n'.join(phonenumbers) 85 | if 'id' in j: 86 | id = i[j] 87 | if id not in ids: 88 | ids.append(id) 89 | try: 90 | idss = '\n'.join(ids) 91 | except: 92 | idss = ids 93 | if 'telegram_id' in j: 94 | telegram_id = i[j] 95 | if telegram_id not in telegram_ids: 96 | telegram_ids.append(telegram_id) 97 | try: 98 | telegram_idss = '\n'.join(telegram_ids) 99 | except: 100 | telegram_idss = telegram_ids 101 | if 'firstname' in j: 102 | firstname = i[j] 103 | if firstname not in firstnames: 104 | firstnames.append(firstname) 105 | firstnamess = '\n'.join(firstnames) 106 | if 'surname' in j: 107 | surname = i[j] 108 | if surname not in surnames: 109 | surnames.append(surname) 110 | surnamess = '\n'.join(surnames) 111 | if 'email' in j: 112 | email = i[j] 113 | if email not in emails: 114 | emails.append(email) 115 | emailss = '\n'.join(emails) 116 | if 'trade' in j: 117 | trade = i[j] 118 | if trade not in trades: 119 | trades.append(trade) 120 | tradess = '\n'.join(trades) 121 | if 'social_media' in j: 122 | social_media = i[j] 123 | if social_media not in social_medias: 124 | social_medias.append(social_media) 125 | social_mediass = '\n'.join(social_medias) 126 | if 'address' in j: 127 | address = i[j] 128 | if address not in addresses: 129 | addresses.append(address) 130 | addressess = '\n'.join(addresses) 131 | if 'technical_data' in j: 132 | technical_data = i[j] 133 | if technical_data not in technicals_data: 134 | technicals_data.append(technical_data) 135 | technicals_datas = '\n'.join(technicals_data) 136 | tg_data = [] 137 | tg_data.append([phonenumbers, telegram_ids, firstnames, emails, addresses,trades, social_medias, technicals_datas]) 138 | data = f""" 139 | PhoneNumber: {phonenumbers} 140 | Telegram: {telegram_ids} 141 | surname: {surnames} 142 | firstname: {firstnames} 143 | email: {emails} 144 | address: {addresses} 145 | trade: {trades} 146 | social_media: 
{social_medias} 147 | technical_data: {technicals_data} 148 | """ 149 | popup('Telegram INFO', [ 150 | put_html(f'Telegram ID:{telegram_ids}
'), 151 | put_table(tg_data, header=['PhoneNumber', 'Telegram', 'Firstname','Email', 'Address', 152 | 'Trades','Social_media','Technicals Data']) 153 | ]) 154 | 155 | def remove_emojis(data): 156 | 157 | emoj = re.compile("[" 158 | u"\U0001F600-\U0001F64F" 159 | u"\U0001F300-\U0001F5FF" 160 | u"\U0001F680-\U0001F6FF" 161 | u"\U0001F1E0-\U0001F1FF" 162 | u"\U00002500-\U00002BEF" 163 | u"\U00002702-\U000027B0" 164 | u"\U000024C2-\U0001F251" 165 | u"\U0001f926-\U0001f937" 166 | u"\U00010000-\U0010ffff" 167 | u"\U00002700-\U000027BF" 168 | u"\U00002600-\U000026FF" 169 | u"\U0001F900-\U0001F9FF" 170 | u"\U0001FA70-\U0001FAFF" 171 | u"\u2640-\u2642" 172 | u"\u2600-\u2B55" 173 | u"\u200d" 174 | u"\u23cf" 175 | u"\u23e9" 176 | u"\u231a" 177 | u"\u180b" 178 | u"\u180c" 179 | u"\u0489" 180 | u"\u2019" 181 | u"\u00A4" 182 | u"\u035c" 183 | u"\u2328" 184 | u"\ufe0f" 185 | u"\u3030" 186 | u"\u231A-\u231B" 187 | u"\u23E9-\u23EC" 188 | u"\u25FD-\u25FE" 189 | u"\u2614-\u2615" 190 | u"\u2648-\u2653" 191 | u"\u26AA-\u26AB" 192 | u"\u26BD-\u26BE" 193 | u"\u26C4-\u26C5" 194 | u"\u26F2-\u26F3" 195 | u"\u270A-\u270B" 196 | u"\u2753-\u2755" 197 | u"\u2795-\u2797" 198 | u"\u2B1B-\u2B1C" 199 | u"\U0001F191-\U0001F19A" 200 | u"\U0001F232-\U0001F236" 201 | u"\U0001F238-\U0001F23A" 202 | u"\U0001F250-\U0001F251" 203 | u"\U0001F300-\U0001F30C" 204 | u"\U0001F30D-\U0001F30E" 205 | u"\U0001F313-\U0001F315" 206 | u"\U0001F316-\U0001F318" 207 | u"\U0001F31D-\U0001F31E" 208 | u"\U0001F31F-\U0001F320" 209 | u"\U0001F32D-\U0001F32F" 210 | u"\U0001F330-\U0001F331" 211 | u"\U0001F332-\U0001F333" 212 | u"\U0001F334-\U0001F335" 213 | u"\U0001F337-\U0001F34A" 214 | u"\U0001F34C-\U0001F34F" 215 | u"\U0001F351-\U0001F37B" 216 | u"\U0001F37E-\U0001F37F" 217 | u"\U0001F380-\U0001F393" 218 | u"\U0001F3A0-\U0001F3C4" 219 | u"\U0001F3CF-\U0001F3D3" 220 | u"\U0001F3E0-\U0001F3E3" 221 | u"\U0001F3E5-\U0001F3F0" 222 | u"\U0001F3F8-\U0001F407" 223 | u"\U0001F409-\U0001F40B" 224 | u"\U0001F40C-\U0001F40E" 225 | 
u"\U0001F40F-\U0001F410" 226 | u"\U0001F411-\U0001F412" 227 | u"\U0001F417-\U0001F429" 228 | u"\U0001F42B-\U0001F43E" 229 | u"\U0001F442-\U0001F464" 230 | u"\U0001F466-\U0001F46B" 231 | u"\U0001F46C-\U0001F46D" 232 | u"\U0001F46E-\U0001F4AC" 233 | u"\U0001F4AE-\U0001F4B5" 234 | u"\U0001F4B6-\U0001F4B7" 235 | u"\U0001F4B8-\U0001F4EB" 236 | u"\U0001F4EC-\U0001F4ED" 237 | u"\U0001F4F0-\U0001F4F4" 238 | u"\U0001F4F6-\U0001F4F7" 239 | u"\U0001F4F9-\U0001F4FC" 240 | u"\U0001F4FF-\U0001F502" 241 | u"\U0001F504-\U0001F507" 242 | u"\U0001F50A-\U0001F514" 243 | u"\U0001F516-\U0001F52B" 244 | u"\U0001F52C-\U0001F52D" 245 | u"\U0001F52E-\U0001F53D" 246 | u"\U0001F54B-\U0001F54E" 247 | u"\U0001F550-\U0001F55B" 248 | u"\U0001F55C-\U0001F567" 249 | u"\U0001F595-\U0001F596" 250 | u"\U0001F5FB-\U0001F5FF" 251 | u"\U0001F601-\U0001F606" 252 | u"\U0001F607-\U0001F608" 253 | u"\U0001F609-\U0001F60D" 254 | u"\U0001F612-\U0001F614" 255 | u"\U0001F61C-\U0001F61E" 256 | u"\U0001F620-\U0001F625" 257 | u"\U0001F626-\U0001F627" 258 | u"\U0001F628-\U0001F62B" 259 | u"\U0001F62E-\U0001F62F" 260 | u"\U0001F630-\U0001F633" 261 | u"\U0001F637-\U0001F640" 262 | u"\U0001F641-\U0001F644" 263 | u"\U0001F645-\U0001F64F" 264 | u"\U0001F681-\U0001F682" 265 | u"\U0001F683-\U0001F685" 266 | u"\U0001F68A-\U0001F68B" 267 | u"\U0001F691-\U0001F693" 268 | u"\U0001F699-\U0001F69A" 269 | u"\U0001F69B-\U0001F6A1" 270 | u"\U0001F6A4-\U0001F6A5" 271 | u"\U0001F6A7-\U0001F6AD" 272 | u"\U0001F6AE-\U0001F6B1" 273 | u"\U0001F6B3-\U0001F6B5" 274 | u"\U0001F6B7-\U0001F6B8" 275 | u"\U0001F6B9-\U0001F6BE" 276 | u"\U0001F6C1-\U0001F6C5" 277 | u"\U0001F6D1-\U0001F6D2" 278 | u"\U0001F6D6-\U0001F6D7" 279 | u"\U0001F6DD-\U0001F6DF" 280 | u"\U0001F6EB-\U0001F6EC" 281 | u"\U0001F6F4-\U0001F6F6" 282 | u"\U0001F6F7-\U0001F6F8" 283 | u"\U0001F6FB-\U0001F6FC" 284 | u"\U0001F7E0-\U0001F7EB" 285 | u"\U0001F90D-\U0001F90F" 286 | u"\U0001F910-\U0001F918" 287 | u"\U0001F919-\U0001F91E" 288 | u"\U0001F920-\U0001F927" 289 | 
u"\U0001F928-\U0001F92F" 290 | u"\U0001F931-\U0001F932" 291 | u"\U0001F933-\U0001F93A" 292 | u"\U0001F93C-\U0001F93E" 293 | u"\U0001F940-\U0001F945" 294 | u"\U0001F947-\U0001F94B" 295 | u"\U0001F94D-\U0001F94F" 296 | u"\U0001F950-\U0001F95E" 297 | u"\U0001F95F-\U0001F96B" 298 | u"\U0001F96C-\U0001F970" 299 | u"\U0001F973-\U0001F976" 300 | u"\U0001F977-\U0001F978" 301 | u"\U0001F97C-\U0001F97F" 302 | u"\U0001F980-\U0001F984" 303 | u"\U0001F985-\U0001F991" 304 | u"\U0001F992-\U0001F997" 305 | u"\U0001F998-\U0001F9A2" 306 | u"\U0001F9A3-\U0001F9A4" 307 | u"\U0001F9A5-\U0001F9AA" 308 | u"\U0001F9AB-\U0001F9AD" 309 | u"\U0001F9AE-\U0001F9AF" 310 | u"\U0001F9B0-\U0001F9B9" 311 | u"\U0001F9BA-\U0001F9BF" 312 | u"\U0001F9C1-\U0001F9C2" 313 | u"\U0001F9C3-\U0001F9CA" 314 | u"\U0001F9CD-\U0001F9CF" 315 | u"\U0001F9D0-\U0001F9E6" 316 | u"\U0001F9E7-\U0001F9FF" 317 | u"\U0001FA70-\U0001FA73" 318 | u"\U0001FA78-\U0001FA7A" 319 | u"\U0001FA7B-\U0001FA7C" 320 | u"\U0001FA80-\U0001FA82" 321 | u"\U0001FA83-\U0001FA86" 322 | u"\U0001FA90-\U0001FA95" 323 | u"\U0001FA96-\U0001FAA8" 324 | u"\U0001FAA9-\U0001FAAC" 325 | u"\U0001FAB0-\U0001FAB6" 326 | u"\U0001FAB7-\U0001FABA" 327 | u"\U0001FAC0-\U0001FAC2" 328 | u"\U0001FAC3-\U0001FAC5" 329 | u"\U0001FAD0-\U0001FAD6" 330 | u"\U0001FAD7-\U0001FAD9" 331 | u"\U0001FAE0-\U0001FAE7" 332 | u"\U0001FAF0-\U0001FAF6" 333 | u"\u23F0" 334 | u"\u23F3" 335 | u"\u267F" 336 | u"\u2693" 337 | u"\u26A1" 338 | u"\u26CE" 339 | u"\u26D4" 340 | u"\u26EA" 341 | u"\u26F5" 342 | u"\u26FA" 343 | u"\u26FD" 344 | u"\u2705" 345 | u"\u2728" 346 | u"\u274C" 347 | u"\u274E" 348 | u"\u2757" 349 | u"\u27B0" 350 | u"\u27BF" 351 | u"\u2B50" 352 | u"\u2B55" 353 | u"\U0001F004" 354 | u"\U0001F0CF" 355 | u"\U0001F18E" 356 | u"\U0001F201" 357 | u"\U0001F21A" 358 | u"\U0001F22F" 359 | u"\U0001F30F" 360 | u"\U0001F310" 361 | u"\U0001F311" 362 | u"\U0001F312" 363 | u"\U0001F319" 364 | u"\U0001F31A" 365 | u"\U0001F31B" 366 | u"\U0001F31C" 367 | u"\U0001F34B" 368 | u"\U0001F350" 
369 | u"\U0001F37C" 370 | u"\U0001F3C5" 371 | u"\U0001F3C6" 372 | u"\U0001F3C7" 373 | u"\U0001F3C8" 374 | u"\U0001F3C9" 375 | u"\U0001F3CA" 376 | u"\U0001F3E4" 377 | u"\U0001F3F4" 378 | u"\U0001F408" 379 | u"\U0001F413" 380 | u"\U0001F414" 381 | u"\U0001F415" 382 | u"\U0001F416" 383 | u"\U0001F42A" 384 | u"\U0001F440" 385 | u"\U0001F465" 386 | u"\U0001F4AD" 387 | u"\U0001F4EE" 388 | u"\U0001F4EF" 389 | u"\U0001F4F5" 390 | u"\U0001F4F8" 391 | u"\U0001F503" 392 | u"\U0001F508" 393 | u"\U0001F509" 394 | u"\U0001F515" 395 | u"\U0001F57A" 396 | u"\U0001F5A4" 397 | u"\U0001F600" 398 | u"\U0001F60E" 399 | u"\U0001F60F" 400 | u"\U0001F610" 401 | u"\U0001F611" 402 | u"\U0001F615" 403 | u"\U0001F616" 404 | u"\U0001F617" 405 | u"\U0001F618" 406 | u"\U0001F619" 407 | u"\U0001F61A" 408 | u"\U0001F61B" 409 | u"\U0001F61F" 410 | u"\U0001F62C" 411 | u"\U0001F62D" 412 | u"\U0001F634" 413 | u"\U0001F635" 414 | u"\U0001F636" 415 | u"\U0001F680" 416 | u"\U0001F686" 417 | u"\U0001F687" 418 | u"\U0001F688" 419 | u"\U0001F689" 420 | u"\U0001F68C" 421 | u"\U0001F68D" 422 | u"\U0001F68E" 423 | u"\U0001F68F" 424 | u"\U0001F690" 425 | u"\U0001F694" 426 | u"\U0001F695" 427 | u"\U0001F696" 428 | u"\U0001F697" 429 | u"\U0001F698" 430 | u"\U0001F6A2" 431 | u"\U0001F6A3" 432 | u"\U0001F6A6" 433 | u"\U0001F6B2" 434 | u"\U0001F6B6" 435 | u"\U0001F6BF" 436 | u"\U0001F6C0" 437 | u"\U0001F6CC" 438 | u"\U0001F6D0" 439 | u"\U0001F6D5" 440 | u"\U0001F6F9" 441 | u"\U0001F6FA" 442 | u"\U0001F7F0" 443 | u"\U0001F90C" 444 | u"\U0001F91F" 445 | u"\U0001F930" 446 | u"\U0001F93F" 447 | u"\U0001F94C" 448 | u"\U0001F971" 449 | u"\U0001F972" 450 | u"\U0001F979" 451 | u"\U0001F97A" 452 | u"\U0001F97B" 453 | u"\U0001F9C0" 454 | u"\U0001F9CB" 455 | u"\U0001F9CC" 456 | u"\U0001FA74" 457 | u"\u00A9" 458 | u"\uFE0F" 459 | u"\u00AE" 460 | u"\u203C" 461 | u"\u2049" 462 | u"\u2122" 463 | u"\u2139" 464 | u"\u2194" 465 | u"\u2195" 466 | u"\u2196" 467 | u"\u2197" 468 | u"\u2198" 469 | u"\u2199" 470 | u"\u21A9" 471 | u"\u21AA" 
472 | u"\u23CF" 473 | u"\u23ED" 474 | u"\u23EE" 475 | u"\u23EF" 476 | u"\u23F1" 477 | u"\u23F2" 478 | u"\u23F8" 479 | u"\u23F9" 480 | u"\u23FA" 481 | u"\u24C2" 482 | u"\u25AA" 483 | u"\u25AB" 484 | u"\u25B6" 485 | u"\u25C0" 486 | u"\u25FB" 487 | u"\u25FC" 488 | u"\u2600" 489 | u"\u2601" 490 | u"\u2602" 491 | u"\u2603" 492 | u"\u2604" 493 | u"\u260E" 494 | u"\u2611" 495 | u"\u2618" 496 | u"\u261D" 497 | u"\u2620" 498 | u"\u2622" 499 | u"\u2623" 500 | u"\u2626" 501 | u"\u262A" 502 | u"\u262E" 503 | u"\u262F" 504 | u"\u2638" 505 | u"\u2639" 506 | u"\u263A" 507 | u"\u2640" 508 | u"\u2642" 509 | u"\u265F" 510 | u"\u2660" 511 | u"\u2663" 512 | u"\u2665" 513 | u"\u2666" 514 | u"\u2668" 515 | u"\u267B" 516 | u"\u267E" 517 | u"\u2692" 518 | u"\u2694" 519 | u"\u2695" 520 | u"\u2696" 521 | u"\u2697" 522 | u"\u2699" 523 | u"\u269B" 524 | u"\u269C" 525 | u"\u26A0" 526 | u"\u26A7" 527 | u"\u26B0" 528 | u"\u26B1" 529 | u"\u26C8" 530 | u"\u26CF" 531 | u"\u26D1" 532 | u"\u26D3" 533 | u"\u26E9" 534 | u"\u26F0" 535 | u"\u26F1" 536 | u"\u26F4" 537 | u"\u26F7" 538 | u"\u26F8" 539 | u"\u26F9" 540 | u"\u2702" 541 | u"\u2708" 542 | u"\u2709" 543 | u"\u270C" 544 | u"\u270D" 545 | u"\u270F" 546 | u"\u2712" 547 | u"\u2714" 548 | u"\u2716" 549 | u"\u271D" 550 | u"\u2721" 551 | u"\u2733" 552 | u"\u2734" 553 | u"\u2744" 554 | u"\u2747" 555 | u"\u2763" 556 | u"\u2764" 557 | u"\u27A1" 558 | u"\u2934" 559 | u"\u2935" 560 | u"\u2B05" 561 | u"\u2B06" 562 | u"\u2B07" 563 | u"\u303D" 564 | u"\u3297" 565 | u"\u3299" 566 | u"\U0001F170" 567 | u"\U0001F171" 568 | u"\U0001F17E" 569 | u"\U0001F17F" 570 | u"\U0001F202" 571 | u"\U0001F237" 572 | u"\U0001F321" 573 | u"\U0001F324" 574 | u"\U0001F325" 575 | u"\U0001F326" 576 | u"\U0001F327" 577 | u"\U0001F328" 578 | u"\U0001F329" 579 | u"\U0001F32A" 580 | u"\U0001F32B" 581 | u"\U0001F32C" 582 | u"\U0001F336" 583 | u"\U0001F37D" 584 | u"\U0001F396" 585 | u"\U0001F397" 586 | u"\U0001F399" 587 | u"\U0001F39A" 588 | u"\U0001F39B" 589 | u"\U0001F39E" 590 | 
u"\U0001F39F" 591 | u"\U0001F3CB" 592 | u"\U0001F3CC" 593 | u"\U0001F3CD" 594 | u"\U0001F3CE" 595 | u"\U0001F3D4" 596 | u"\U0001F3D5" 597 | u"\U0001F3D6" 598 | u"\U0001F3D7" 599 | u"\U0001F3D8" 600 | u"\U0001F3D9" 601 | u"\U0001F3DA" 602 | u"\U0001F3DB" 603 | u"\U0001F3DC" 604 | u"\U0001F3DD" 605 | u"\U0001F3DE" 606 | u"\U0001F3DF" 607 | u"\U0001F3F3" 608 | u"\U0001F3F5" 609 | u"\U0001F3F7" 610 | u"\U0001F43F" 611 | u"\U0001F441" 612 | u"\U0001F4FD" 613 | u"\U0001F549" 614 | u"\U0001F54A" 615 | u"\U0001F56F" 616 | u"\U0001F570" 617 | u"\U0001F573" 618 | u"\U0001F574" 619 | u"\U0001F575" 620 | u"\U0001F576" 621 | u"\U0001F577" 622 | u"\U0001F578" 623 | u"\U0001F579" 624 | u"\U0001F587" 625 | u"\U0001F58A" 626 | u"\U0001F58B" 627 | u"\U0001F58C" 628 | u"\U0001F58D" 629 | u"\U0001F590" 630 | u"\U0001F5A5" 631 | u"\U0001F5A8" 632 | u"\U0001F5B1" 633 | u"\U0001F5B2" 634 | u"\U0001F5BC" 635 | u"\U0001F5C2" 636 | u"\U0001F5C3" 637 | u"\U0001F5C4" 638 | u"\U0001F5D1" 639 | u"\U0001F5D2" 640 | u"\U0001F5D3" 641 | u"\U0001F5DC" 642 | u"\U0001F5DD" 643 | u"\U0001F5DE" 644 | u"\U0001F5E1" 645 | u"\U0001F5E3" 646 | u"\U0001F5E8" 647 | u"\U0001F5EF" 648 | u"\U0001F5F3" 649 | u"\U0001F5FA" 650 | u"\U0001F6CB" 651 | u"\U0001F6CD" 652 | u"\U0001F6CE" 653 | u"\U0001F6CF" 654 | u"\U0001F6E0" 655 | u"\U0001F6E1" 656 | u"\U0001F6E2" 657 | u"\U0001F6E3" 658 | u"\U0001F6E4" 659 | u"\U0001F6E5" 660 | u"\U0001F6E9" 661 | u"\U0001F6F0" 662 | u"\U0001F6F3" 663 | u"\u0023" 664 | u"\u20E3" 665 | u"\u002A" 666 | u"\u0030" 667 | u"\u0031" 668 | u"\u0032" 669 | u"\u0033" 670 | u"\u0034" 671 | u"\u0035" 672 | u"\u0036" 673 | u"\u0037" 674 | u"\u0038" 675 | u"\u0039" 676 | u"\U0001F1E6" 677 | u"\U0001F1E8" 678 | u"\U0001F1E9" 679 | u"\U0001F1EA" 680 | u"\U0001F1EB" 681 | u"\U0001F1EC" 682 | u"\U0001F1EE" 683 | u"\U0001F1F1" 684 | u"\U0001F1F2" 685 | u"\U0001F1F4" 686 | u"\U0001F1F6" 687 | u"\U0001F1F7" 688 | u"\U0001F1F8" 689 | u"\U0001F1F9" 690 | u"\U0001F1FA" 691 | u"\U0001F1FC" 692 | 
u"\U0001F1FD" 693 | u"\U0001F1FF" 694 | u"\U0001F1E7" 695 | u"\U0001F1ED" 696 | u"\U0001F1EF" 697 | u"\U0001F1F3" 698 | u"\U0001F1FB" 699 | u"\U0001F1FE" 700 | u"\U0001F1F0" 701 | u"\U0001F1F5" 702 | u"\U000E0067" 703 | u"\U000E0062" 704 | u"\U000E0065" 705 | u"\U000E006E" 706 | u"\U000E007F" 707 | u"\U000E0073" 708 | u"\U000E0063" 709 | u"\U000E0074" 710 | u"\U000E0077" 711 | u"\U000E006C" 712 | u"\U0001F3FB" 713 | u"\U0001F3FC" 714 | u"\U0001F3FD" 715 | u"\U0001F3FE" 716 | u"\U0001F3FF" 717 | u"\u270A" 718 | u"\u270B" 719 | u"\U0001F385" 720 | u"\U0001F3C2" 721 | u"\U0001F3C3" 722 | u"\U0001F3C4" 723 | u"\U0001F442" 724 | u"\U0001F443" 725 | u"\U0001F446" 726 | u"\U0001F447" 727 | u"\U0001F448" 728 | u"\U0001F449" 729 | u"\U0001F44A" 730 | u"\U0001F44B" 731 | u"\U0001F44C" 732 | u"\U0001F44D" 733 | u"\U0001F44E" 734 | u"\U0001F44F" 735 | u"\U0001F450" 736 | u"\U0001F466" 737 | u"\U0001F467" 738 | u"\U0001F468" 739 | u"\U0001F469" 740 | u"\U0001F46B" 741 | u"\U0001F46C" 742 | u"\U0001F46D" 743 | u"\U0001F46E" 744 | u"\U0001F470" 745 | u"\U0001F471" 746 | u"\U0001F472" 747 | u"\U0001F473" 748 | u"\U0001F474" 749 | u"\U0001F475" 750 | u"\U0001F476" 751 | u"\U0001F477" 752 | u"\U0001F478" 753 | u"\U0001F47C" 754 | u"\U0001F481" 755 | u"\U0001F482" 756 | u"\U0001F483" 757 | u"\U0001F485" 758 | u"\U0001F486" 759 | u"\U0001F487" 760 | u"\U0001F48F" 761 | u"\U0001F491" 762 | u"\U0001F4AA" 763 | u"\U0001F595" 764 | u"\U0001F596" 765 | u"\U0001F645" 766 | u"\U0001F646" 767 | u"\U0001F647" 768 | u"\U0001F64B" 769 | u"\U0001F64C" 770 | u"\U0001F64D" 771 | u"\U0001F64E" 772 | u"\U0001F64F" 773 | u"\U0001F6B4" 774 | u"\U0001F6B5" 775 | u"\U0001F90F" 776 | u"\U0001F918" 777 | u"\U0001F919" 778 | u"\U0001F91A" 779 | u"\U0001F91B" 780 | u"\U0001F91C" 781 | u"\U0001F91D" 782 | u"\U0001F91E" 783 | u"\U0001F926" 784 | u"\U0001F931" 785 | u"\U0001F932" 786 | u"\U0001F933" 787 | u"\U0001F934" 788 | u"\U0001F935" 789 | u"\U0001F936" 790 | u"\U0001F937" 791 | u"\U0001F938" 792 | 
u"\U0001F939" 793 | u"\U0001F93D" 794 | u"\U0001F93E" 795 | u"\U0001F977" 796 | u"\U0001F9B5" 797 | u"\U0001F9B6" 798 | u"\U0001F9B8" 799 | u"\U0001F9B9" 800 | u"\U0001F9BB" 801 | u"\U0001F9CD" 802 | u"\U0001F9CE" 803 | u"\U0001F9CF" 804 | u"\U0001F9D1" 805 | u"\U0001F9D2" 806 | u"\U0001F9D3" 807 | u"\U0001F9D4" 808 | u"\U0001F9D5" 809 | u"\U0001F9D6" 810 | u"\U0001F9D7" 811 | u"\U0001F9D8" 812 | u"\U0001F9D9" 813 | u"\U0001F9DA" 814 | u"\U0001F9DB" 815 | u"\U0001F9DC" 816 | u"\U0001F9DD" 817 | u"\U0001FAC3" 818 | u"\U0001FAC4" 819 | u"\U0001FAC5" 820 | u"\U0001FAF0" 821 | u"\U0001FAF1" 822 | u"\U0001FAF2" 823 | u"\U0001FAF3" 824 | u"\U0001FAF4" 825 | u"\U0001FAF5" 826 | u"\U0001FAF6" 827 | u"\u0e4b" 828 | u"\u0489" 829 | u"\u0338" 830 | u"\u1D11E" 831 | u"\u035E" 832 | u"\u1F132" 833 | u"\uA9C2" 834 | u"\u0335" 835 | u"\u00AD" 836 | u"\u10121" 837 | u"\u00BF" 838 | u"\u1F153" 839 | "]+", re.UNICODE) 840 | emoj = re.compile(r'(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])') 841 | data = re.sub(emoj, '', data) 842 | spec_chars = string.punctuation + '\n\xa0«»\t—…"<>?!.,;:꧁@#$%^&*()_+=№%༺༺\༺/༺•' 843 | data = remove_chars_from_text(data, spec_chars) 844 | data = re.sub(r'[\x00-\x7f]', ' ', data) 845 | data = data.replace(" "," ").strip() 846 | try: 847 | #print(data) 848 | data = emoji.demojize(data) 849 | #print(data) 850 | data = str(data.split(":")[0]) 851 | #print(data) 852 | except: 853 | data = data 854 | return str(data).replace("[","").replace("]","").replace("'","").replace(" ","").replace(",","") 855 | 856 | def clear_user(user): 857 | # Strip special characters and emojis, then clean up the text 858 | user = str(user).replace(" ", "").replace('"', '').replace(".", "").replace("꧁", "") 859 | user = remove_chars_from_text(user) 860 | user = remove_emojis(user) 861 | 862 | return user.strip() # Strip leading and trailing whitespace --------------------------------------------------------------------------------
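The character-stripping step in `remove_chars_from_text` above can be tried in isolation. The following is only a minimal sketch of that approach, not the project's code: `SPEC_CHARS` and `strip_special_chars` are hypothetical names, and the character set is a shortened stand-in for the module-level `spec_chars`.

```python
import re
import string

# Assumption: a shortened stand-in for the module-level spec_chars string.
SPEC_CHARS = string.punctuation + '\n\xa0«»\t…'


def strip_special_chars(text, chars=SPEC_CHARS):
    """Replace every character listed in `chars` with a space, then collapse whitespace."""
    # re.escape keeps characters such as ']' or '\' literal inside the character class.
    pattern = f"[{re.escape(chars)}]"
    text = re.sub(pattern, ' ', text)
    # Collapse the runs of spaces left behind and trim both ends.
    return re.sub(r'\s+', ' ', text).strip()


if __name__ == "__main__":
    print(strip_special_chars("«Hello», world!!\tHow are you…"))
```

Escaping the whole set once and wrapping it in a single character class keeps the pattern short regardless of how many characters are listed.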
/words_analyze.py: -------------------------------------------------------------------------------- 1 | from utils import remove_chars_from_text, remove_emojis, clear_user, read_conf 2 | import nltk_analyse 3 | import sys 4 | from pywebio import config, output, pin, session 5 | import json, re, jmespath 6 | from validate_email import validate_email 7 | import phonenumbers 8 | import pandas as pd 9 | from concurrent.futures import ThreadPoolExecutor, as_completed 10 | 11 | from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 12 | 13 | # Global variables 14 | emails, phoness, all_tokens, users = [], [], [], {} 15 | count_messages = 0 16 | 17 | # Map of actions for the different message types 18 | action_map = { 19 | 'invite_members': 'Invite Member', 20 | 'remove_members': 'Kicked Members', 21 | 'join_group_by_link': 'Joined by Link', 22 | 'pin_message': 'Pinned Message', 23 | # Add other actions as needed 24 | } 25 | 26 | # List for the reverse mapping of actions 27 | action_reverse = ['Invite Member', 'Kicked Members', 'Joined by Link', 'Pinned Message'] 28 | 29 | # Initialize the sentiment analyzer 30 | analyzer = SentimentIntensityAnalyzer() 31 | 32 | # Interface configuration 33 | config(theme='dark', title="TelAnalysis", description="Analysing Telegram CHATS-CHANNELS-GROUPS") 34 | output.toast(content='Wait..', duration=2) 35 | output.put_button("Scroll Down", onclick=lambda: session.run_js('window.scrollTo(0, document.body.scrollHeight)')) 36 | output.put_button("Close", onclick=lambda: session.run_js('window.close()'), color='danger') 37 | output.put_html("Analyse of Telegram Chat") 38 | 39 | # User ID input 40 | pin.put_input('ID') 41 | output.put_button("Search ID", onclick=lambda: session.run_js(f'window.find({pin.pin.ID}, true)'), color='warning') 42 | 43 | # Open the file and load the data 44 | filename = sys.argv[1].split(".")[0].split("/")[1] 45 | 46 | # Make sure the file is opened with the correct encoding 47 | with open(f'asset/{filename}.json', 'r', encoding='utf-8', errors='replace') as datas: 48 | data = json.load(datas) 49 | 50 | sf = jmespath.search('messages[*]', data) 51 | group_name = jmespath.search('name', data) 52 | 53 | # Sentiment analysis function 54 | def analyze_sentiment(text): 55 | try: 56 | score = analyzer.polarity_scores(str(text)) 57 | return float(score['compound']) # Cast to float to be safe 58 | except: 59 | return float(0.0) 60 | 61 | # Function to extract emails and phone numbers from text 62 | def extract_emails_and_phone_numbers(text): 63 | emails_list = [] 64 | emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text) 65 | for email in emails: 66 | if validate_email(email, verify=False): 67 | emails_list.append(email) 68 | phones_list = [] 69 | phone_numbers = re.findall(r'\+?[0-9]{1,3}?[-. (]?[0-9]{1,4}[-. )]?[0-9]{1,4}[-. 
]?[0-9]{1,9}', text) 70 | for phones in phone_numbers: 71 | try: 72 | phone_number = phonenumbers.parse(phones, None) 73 | if phonenumbers.is_valid_number(phone_number): 74 | phones_list.append(phones) 75 | except Exception: 76 | pass 77 | return emails_list, phones_list 78 | 79 | # Function to extract text from a message, with improved handling of nested structures 80 | def extract_text_from_message(message): 81 | texts = set() # Use a set to keep text values unique 82 | 83 | if isinstance(message, dict): 84 | # Extract every possible text field directly 85 | if 'text' in message: 86 | if isinstance(message['text'], str) and message['text'].strip(): 87 | texts.add(message['text']) 88 | elif isinstance(message['text'], list): # If "text" is a list 89 | for item in message['text']: 90 | if isinstance(item, str): 91 | texts.add(item) 92 | 93 | # Extract text from other fields, such as caption for media 94 | if 'caption' in message: 95 | if isinstance(message['caption'], str) and message['caption'].strip(): 96 | texts.add(message['caption']) 97 | 98 | # Look for text entities in text_entities 99 | entities = jmespath.search('text_entities[*].text', message) 100 | if entities: 101 | for entity in entities: 102 | texts.add(entity) 103 | 104 | # Handle nested structures: forwarded messages and replies 105 | if 'forwarded_from' in message: 106 | texts.update(extract_text_from_message(message['forwarded_from'])) 107 | 108 | if 'reply_to_message' in message: 109 | texts.update(extract_text_from_message(message['reply_to_message'])) 110 | 111 | # Recursively process nested structures 112 | for key, value in message.items(): 113 | if isinstance(value, (list, dict)): 114 | texts.update(extract_text_from_message(value)) 115 | 116 | elif isinstance(message, list): 117 | for item in message: 118 | texts.update(extract_text_from_message(item)) 119 | 120 | return texts 121 | 122 | # Function to process each message 123 | def 
process_message(message): 124 | global count_messages 125 | action = "" 126 | user = jmespath.search('from_id', message) 127 | 128 | if not user: 129 | user = jmespath.search('actor_id', message) 130 | if user: 131 | user = user.replace(" ", "") 132 | if user not in users: 133 | users[user] = [] 134 | 135 | action = jmespath.search('action', message) 136 | if action: 137 | tex = jmespath.search('text', message) or '' 138 | action_text = action_map.get(action, action) 139 | 140 | if action in ['invite_members', 'remove_members']: 141 | members = jmespath.search('members', message) 142 | members = ",".join(str(x) for x in members if x) 143 | users[user].append((f"{action_text} - {members}", 0.0)) 144 | else: 145 | users[user].append((f"{action_text} {tex}", 0.0)) 146 | return 147 | 148 | user = user.replace(" ", "") 149 | if user not in users: 150 | users[user] = [] 151 | count_messages += 1 152 | 153 | unique_texts = extract_text_from_message(message) 154 | for clean_text in unique_texts: 155 | if clean_text: 156 | sentiment_score = analyze_sentiment(clean_text) 157 | users[user].append((clean_text, sentiment_score)) # Store the text and its score 158 | # Extract emails and phone numbers 159 | extracted_emails, extracted_phone_numbers = extract_emails_and_phone_numbers(clean_text) 160 | emails.extend(extracted_emails) 161 | phoness.extend(extracted_phone_numbers) 162 | 163 | # Process the messages using a thread pool 164 | with ThreadPoolExecutor() as executor: 165 | future_to_message = {executor.submit(process_message, msg): msg for msg in sf} 166 | for future in as_completed(future_to_message): # Use as_completed correctly 167 | try: 168 | future.result() # Check that each thread has finished 169 | except Exception as e: 170 | print(f"Error processing message: {e}") 171 | 172 | # Now display in the required format: user - user_from 173 | for user, da in users.items(): 174 | user_from = "" 175 | messages = jmespath.search(f"messages[*]", data) 
176 | 177 | for m in messages: 178 | if jmespath.search('from_id', m) == user: 179 | user_from = jmespath.search('from', m) 180 | break 181 | 182 | if user_from: 183 | user_display = f"{user_from} - {user}" 184 | else: 185 | user_display = user 186 | 187 | # Extract the sentiment scores 188 | user_sentiment_scores = [float(x[1]) for x in da if isinstance(x[1], float)] 189 | average_user_sentiment = sum(user_sentiment_scores) / len(user_sentiment_scores) if user_sentiment_scores else 0 190 | 191 | try: 192 | most_com = read_conf('most_com') 193 | genuy, tokens = nltk_analyse.analyse(da, most_com) 194 | 195 | gemy = [[x, y] for x, y in genuy] 196 | gery = [[x[0]] for x in da] 197 | #print(len(all_tokens)) 198 | all_tokens.extend(tokens) 199 | 200 | if gery or gemy: 201 | output.put_collapse(user_display, [ 202 | f'Messages of {user_display}', # Message with user_from 203 | output.put_text(f'Average Sentiment for {user_display}: {average_user_sentiment:.2f}'), 204 | output.put_table([[x[0], x[1]] for x in da], header=['Messages', 'Sentiment Score']), 205 | output.put_table(gemy, header=['word', 'count']) 206 | ], open=False) 207 | 208 | except Exception as ex: 209 | print(f"[{user}] error: {ex}") 210 | 211 | # Overall analysis of all messages 212 | most_com = read_conf('most_com') 213 | 214 | # Check that nltk_analyse.analyse_all returns valid data 215 | try: 216 | all_tokens, data = nltk_analyse.analyse_all(all_tokens, most_com) 217 | except Exception as e: 218 | print(f"Error in analyse_all: {e}") 219 | 220 | # Check that all_tokens is not empty 221 | if all_tokens: 222 | all_chat = [[i[0], i[1]] for i in all_tokens] 223 | output.put_collapse(f'TOP words of {group_name}', [ 224 | output.put_table(all_chat, header=['word', 'count']), 225 | ], open=False) 226 | 227 | # Overall sentiment analysis of the chat 228 | all_sentiment_scores = [analyze_sentiment(msg[0]) for msg in all_tokens] # Extract the text for analysis 229 | average_chat_sentiment = 
sum(all_sentiment_scores) / len(all_sentiment_scores) if all_sentiment_scores else 0 230 | output.put_text(f'Average Sentiment for {group_name}: {average_chat_sentiment:.2f}') 231 | else: 232 | #print("No tokens found for analysis.") 233 | output.put_text(f"No tokens found for {group_name}. Sentiment analysis is unavailable.") 234 | 235 | 236 | # Process emails and phone numbers 237 | emaills = [[email] for email in set(emails)] 238 | phonness = [[ph] for ph in set(phoness)] 239 | output.put_collapse('Found Emails and Numbers', [ 240 | output.put_table(emaills, header=['Emails:']), 241 | output.put_table(phonness, header=['Numbers:']) 242 | ], open=False) 243 | 244 | # Additional buttons 245 | output.put_button("Close", onclick=lambda: session.run_js('window.close()'), color='danger') 246 | output.put_button("Scroll Up", onclick=lambda: session.run_js('window.scrollTo(document.body.scrollHeight, 0)')) 247 | --------------------------------------------------------------------------------
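The recursive walk in `extract_text_from_message` above can be illustrated without `jmespath` or the rest of the script. The following is only a rough sketch under assumptions: `collect_texts` is a hypothetical helper, and the `sample` message is an invented structure that loosely imitates a Telegram export entry rather than real data.

```python
def collect_texts(node):
    """Recursively gather unique, non-empty strings stored under 'text' keys."""
    found = set()
    if isinstance(node, dict):
        value = node.get("text")
        if isinstance(value, str) and value.strip():
            found.add(value)
        elif isinstance(value, list):
            # A list-valued "text" may mix plain strings and entity dicts.
            found.update(item for item in value if isinstance(item, str) and item.strip())
        # Descend into every nested container, whatever its key is called.
        for child in node.values():
            if isinstance(child, (list, dict)):
                found.update(collect_texts(child))
    elif isinstance(node, list):
        for item in node:
            found.update(collect_texts(item))
    return found


if __name__ == "__main__":
    # Invented example message; field names follow the ones used in the script above.
    sample = {
        "from_id": "user1",
        "text": ["Hello ", {"type": "bold", "text": "world"}],
        "text_entities": [
            {"type": "plain", "text": "Hello "},
            {"type": "bold", "text": "world"},
        ],
    }
    print(sorted(collect_texts(sample)))
```

Collecting into a set means a fragment that appears both in `text` and in `text_entities` is kept only once, which is why the script scores each distinct fragment a single time.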